
International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 12 Issue: 11 | Nov 2025

p-ISSN: 2395-0072

www.irjet.net

HOTPIN — WEARABLE AI ASSISTANT

Keshav Patil¹, Vighnesh Nilajakar², Sanket Jatrate³, Vinayak Patil⁴, Darshan Kalkuppi⁵

¹Assistant Professor, Maratha Mandal’s Engineering College, Belagavi, Karnataka, India
²,³,⁴,⁵Student, Maratha Mandal’s Engineering College, Belagavi, Karnataka, India

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract – This paper presents the development of HOTPIN, a compact wearable assistant designed to offer real-time voice interaction and optional visual analysis using a lightweight hardware platform. The system uses an ESP32-based controller connected to an I²S microphone and digital audio amplifier to process speech requests, while a low-power camera module can be activated only when visual input is needed. User audio is captured, converted to text, and transmitted over Wi-Fi to an edge server that communicates with a locally hosted large language model (LLM). The backend processes both text and image inputs and returns a contextual response, which is then converted to audio on the wearable device. This on-demand multimodal approach ensures reduced power consumption while maintaining responsiveness. Experimental evaluation shows efficient performance, with a median latency of around 300 ms for voice-only requests and approximately 800 ms for combined voice-and-image interactions. Power measurements indicate average current draws of 280 mA and 450 mA, enabling several hours of operation from a 5 V, 2000 mAh Li-ion battery. These results highlight the feasibility of integrating local AI processing with a wearable form factor, providing an energy-aware and flexible platform for personal ambient intelligence.

Key Words: Wearable AI, ESP32-CAM, Multimodal Assistant, Edge AI, LLM, I²S Audio.

1. INTRODUCTION

Wearable AI devices are evolving rapidly as users increasingly demand hands-free interaction, personalization, and real-time assistance without relying on cloud connectivity. To address these requirements, the HOTPIN wearable assistant combines speech recognition, contextual processing, and selective image capture through a compact embedded platform and a local LLM-powered backend. The objective of the system is to provide an efficient multimodal assistant capable of performing tasks such as question answering, environmental understanding, and interactive guidance, while keeping power consumption and latency low.

The ESP32 microcontroller serves as the central controller, handling digital audio through an I²S microphone and speaker amplifier. A camera module remains disabled during idle operation and is activated only when the user triggers a visual query, ensuring minimal standby consumption. Speech data is sent to a local REST API that passes the request to an LLM, enabling fast and private inference. The system’s architecture reflects the growing trend of shifting computation from cloud-based services to nearby edge devices, increasing responsiveness and privacy. The prototype demonstrates the practicality of integrating voice and vision processing within a small, battery-powered wearable assistant.

1.1 Technology Assessment

The HOTPIN prototype demonstrates emerging trends in embedded AI, where multimodal processing is enabled using a combination of microcontrollers and edge-hosted language models. The use of I²S audio modules allows precise digital capture and playback, while the selective activation of the camera provides visual context only when necessary. However, the limitations of the ESP32 platform must be considered, including constrained memory, limited parallel peripheral usage, and occasional conflicts between camera and audio drivers. These challenges indicate that while ESP32-based designs are suitable for early-stage wearables, achieving seamless multimodal performance may require more advanced microcontrollers or specialized AI-focused hardware in future iterations.

1.2 Cost–Benefit Analysis

The overall hardware cost of the HOTPIN wearable remains relatively low because it is built using widely available microcontroller components. Despite its modest cost, the system delivers strong functional benefits: real-time voice assistance, optional image-based interaction, reduced user effort, and improved energy efficiency. The on-demand activation of the camera avoids unnecessary power consumption, extending battery life. Although additional work is required to refine monetization and production strategies, the system demonstrates a high value-to-cost ratio for personal and experimental use cases.

2. SYSTEM ARCHITECTURE

The HOTPIN assistant follows a layered design consisting of the Wearable Device Layer, Edge/API Layer, and LLM Backend Layer.
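The three layers named above, together with the on-demand camera behaviour, can be illustrated with a minimal Python sketch. It is not the authors' firmware or server code: the class names (`WearableDevice`, `EdgeApi`), the `Request` structure, the `stub_llm` function, and the fake image bytes are all hypothetical stand-ins chosen for illustration, and Python is used here only to show the request flow, not as the language of the ESP32 firmware.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    """Payload the wearable layer sends to the edge/API layer (hypothetical shape)."""
    text: str                      # transcribed speech
    image: Optional[bytes] = None  # present only for visual queries


class WearableDevice:
    """Wearable Device Layer: the camera is used only when a visual query is triggered."""

    def __init__(self, capture_image: Callable[[], bytes]):
        self._capture_image = capture_image  # hypothetical camera driver hook
        self.camera_activations = 0          # counts how often the camera was used

    def build_request(self, text: str, visual_query: bool = False) -> Request:
        image = None
        if visual_query:
            # The camera is touched only on this path; voice-only requests skip it.
            self.camera_activations += 1
            image = self._capture_image()
        return Request(text=text, image=image)


class EdgeApi:
    """Edge/API Layer: validates the request and forwards it to the LLM backend."""

    def __init__(self, llm_backend: Callable[[str, Optional[bytes]], str]):
        self._llm_backend = llm_backend

    def handle(self, request: Request) -> str:
        if not request.text.strip():
            raise ValueError("empty query")
        return self._llm_backend(request.text, request.image)


def stub_llm(text: str, image: Optional[bytes]) -> str:
    """LLM Backend Layer stand-in; a real deployment would call a locally hosted model."""
    mode = "voice+image" if image is not None else "voice-only"
    return f"[{mode}] {text}"


# Wire the three layers together with stubbed hardware and model.
device = WearableDevice(capture_image=lambda: b"\xff\xd8fake-jpeg")
api = EdgeApi(llm_backend=stub_llm)

voice_reply = api.handle(device.build_request("what time is it"))
visual_reply = api.handle(device.build_request("what is this", visual_query=True))
print(voice_reply)
print(visual_reply)
```

Keeping the camera call behind the `visual_query` flag mirrors the paper's point that visual input is captured only when requested, so a voice-only request never touches the camera path.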

Impact Factor value: 8.315

At the wearable layer, an ESP32 module interfaces with an INMP441 I²S microphone for audio capture and a MAX98357A I²S amplifier for audio output. A camera module remains powered off during idle periods and is enabled only

ISO 9001:2008 Certified Journal
