
A Mobile-Based Skill Trading Platform for Enhancing Practical Skill Development


International Research Journal of Engineering and Technology (IRJET) | e-ISSN: 2395-0056 | p-ISSN: 2395-0072 | Volume: 12 Issue: 12 | Dec 2025 | www.irjet.net

Automation Agent for Task Completion

Surabhi M¹, Mehraj Fathima Ansari², Umme Kulsum Ansari³, Muskan Sahani⁴, Ms. Ambili K⁵

¹²³⁴ Students, Department of CSE-AIML, AMC Engineering College, Bengaluru, Karnataka, India
⁵ Assistant Professor, Department of CSE-AIML, AMC Engineering College, Bengaluru, Karnataka, India


Abstract - With the increasing complexity of applications and of repetitive tasks such as grocery booking, food ordering, cab booking, and social media interactions, there is a growing demand for intelligent automation solutions. This paper presents the design and development of an Automation Agent that enables users to perform multi-step tasks across web applications using text or voice. The system uses Gemini to interpret user intent and generate dynamic task flows, while UiPath automates interactions within apps. The architecture is modular and has four key components: a native web interface for capturing input and initiating actions, a cloud-based AI interpreter for understanding tasks, a backend service for maintaining user memory and preferences, and an automation engine that converts planned steps into real-time UI actions. By integrating technologies such as MongoDB and Google's Gemini model, the assistant adapts its responses to the user and automates workflows dynamically.

Key Words: Automation Agent, Gemini, UiPath, Task Automation, Voice Commands, Natural Language Processing

1. INTRODUCTION

In today's mobile-centric world, users increasingly depend on applications for daily needs such as booking rides on Ola/Uber, ordering food, grocery shopping, and WhatsApp messaging and calls. Despite advances in voice assistants, these routine tasks often require repetitive navigation across app interfaces, filling out forms, and confirming actions manually. This consumes time and also presents usability challenges for users with limited digital literacy or accessibility needs. As a result, demand for a more intelligent and user-friendly solution is growing rapidly.

1.1 Problem Statement

Traditional virtual assistants such as Google Assistant and Siri are limited in their ability to perform complex tasks inside third-party applications. These systems are built primarily around predefined voice shortcuts or app-level integrations, and they often cannot adapt to changing user interfaces, routine-based tasks, or complex workflows. They also lack robust interaction capabilities and cannot autonomously execute a sequence of steps across different apps. Existing assistants cannot perform deep, personalized, cross-application automation such as booking cabs, ordering food, or sending messages. There is a clear need for a unified AI-driven system that understands natural language, works across different platforms, and automates repetitive tasks efficiently.

1.2 Significance

The system simplifies application interactions, especially for users with limited digital skills or accessibility needs. It enhances productivity by enabling hands-free control and personalized automation of everyday tasks.

1.3 Scope

The automation agent is designed for web apps and interacts with services such as Ola/Uber, Blinkit, and WhatsApp. It navigates app interfaces in real time using on-screen content, without relying on external APIs. A backend stores task history and user preferences for personalized automation.

Objectives

- Develop a voice/text-based AI assistant using Gemini.
- Automate app workflows without using external APIs.
- Learn user preferences to personalize responses.

2. LITERATURE REVIEW

The paper [1] presents a novel approach that combines vision-based UI understanding with large language model planning by translating screenshots of mobile app interfaces into natural language descriptions. This enables task automation without access to the underlying app view hierarchies, which makes it suitable for restricted or closed environments. Despite its innovation, the system struggles with animated or frequently changing user interfaces, which can reduce accuracy. It also does not currently support personalized user workflows.

The paper [2] reports that AI techniques, especially pre-trained deep-learning models, are widely used in areas such as personalization and media processing. However, it highlights significant gaps, including the lack of real-time adaptive learning and of agent-driven task execution within mobile environments. The study also points out the absence of integration with large language models for natural language command interpretation, emphasizing a

© 2025, IRJET | Impact Factor value: 8.315 | ISO 9001:2008 Certified Journal | Page 53
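The following minimal, hypothetical Python sketch shows how a single request could flow through the four-component architecture described in the abstract: web interface, AI interpreter, user memory and preference service, and automation engine. It is not the authors' implementation, and all class and function names are assumptions made for illustration. `interpret` is a keyword-based stand-in for the Gemini call. `PreferenceStore` is an in-memory stand-in for the MongoDB-backed memory service. `AutomationEngine` only records the planned actions and does not drive UiPath or a real interface.

```python
from dataclasses import dataclass, field


@dataclass
class TaskStep:
    """One planned UI action, such as opening an app or tapping a button."""
    action: str  # e.g. "open", "type", "tap"
    target: str  # app name, text to enter, or on-screen label


@dataclass
class PreferenceStore:
    """Hypothetical in-memory stand-in for the backend memory service (MongoDB in the paper)."""
    prefs: dict = field(default_factory=dict)    # user -> {key: value}
    history: list = field(default_factory=list)  # (user, command) task log

    def get(self, user: str, key: str, default: str = "") -> str:
        return self.prefs.get(user, {}).get(key, default)

    def set(self, user: str, key: str, value: str) -> None:
        self.prefs.setdefault(user, {})[key] = value

    def log(self, user: str, command: str) -> None:
        self.history.append((user, command))


def interpret(command: str, user: str, store: PreferenceStore) -> list[TaskStep]:
    """Hypothetical stand-in for the Gemini interpreter.

    A real system would send the command and stored preferences to the model
    and parse the returned plan. A keyword rule keeps this sketch offline.
    """
    text = command.lower()
    if "cab" in text:
        # Personalization: use a stored address, else fall back to a default.
        pickup = store.get(user, "home_address", "current location")
        return [
            TaskStep("open", "cab app"),
            TaskStep("type", f"pickup: {pickup}"),
            TaskStep("tap", "Book"),
        ]
    if "message" in text:
        return [
            TaskStep("open", "WhatsApp"),
            TaskStep("type", command),
            TaskStep("tap", "Send"),
        ]
    return []  # Unrecognized intent: plan nothing.


class AutomationEngine:
    """Hypothetical stand-in for the UI automation layer (UiPath in the paper).

    It records each step as a string instead of driving a real interface.
    """

    def __init__(self) -> None:
        self.performed: list[str] = []

    def run(self, steps: list[TaskStep]) -> None:
        for step in steps:
            self.performed.append(f"{step.action} -> {step.target}")


def handle_request(command: str, user: str, store: PreferenceStore,
                   engine: AutomationEngine) -> list[TaskStep]:
    """Entry point the web interface would call: plan, execute, remember."""
    steps = interpret(command, user, store)
    engine.run(steps)
    store.log(user, command)
    return steps


if __name__ == "__main__":
    store = PreferenceStore()
    store.set("u1", "home_address", "MG Road")
    engine = AutomationEngine()
    handle_request("Book a cab home", "u1", store, engine)
    print(engine.performed)
    # ['open -> cab app', 'type -> pickup: MG Road', 'tap -> Book']
```

In this sketch the interpreter, the memory service, and the executor are kept behind separate objects, which mirrors the modular split described in the abstract. Under that assumption, replacing the keyword stub with a real model call, or the recorder with a UI-automation backend, would not require changes to `handle_request`.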

