
International Research Journal of Engineering and Technology (IRJET) Volume: 12 Issue: 11 | Nov 2025

www.irjet.net

e-ISSN: 2395-0056 p-ISSN: 2395-0072

Impact Factor value: 8.315

Speech-To-Text for Hearing Impaired

Asst. Prof. Madhuri Martis1, Harshitha2, Chaithanya S B3, NaveenKumar4, Deeksha P5

1Assistant Professor, Department of Information Science and Engineering, Bapuji Institute of Engineering and Technology, Davangere, Karnataka, India.

2,3,4,5Bachelor of Engineering, Information Science and Engineering, Bapuji Institute of Engineering and Technology, Davangere, affiliated to VTU Belagavi, Karnataka, India.

----------------------------------------------------------------------***----------------------------------------------------------------------

Abstract - This project presents a multilingual and emotion-aware Speech-to-Text (STT) system aimed at improving communication accessibility for hearing-impaired users. The system accepts English audio input, performs automatic speech recognition, and enhances the transcript through named-entity highlighting, part-of-speech emphasis, and keyword-based emotion detection. Additionally, it supports translation of the generated text into regional languages such as Kannada and Telugu to improve inclusivity for native speakers. Implemented as a lightweight Flask web application, the system integrates ASR, NLP, and translation modules while offering features such as emoji tagging, user-friendly visualization, and PDF export. Experimental testing demonstrates reliable transcription accuracy for short audio inputs and efficient real-time processing, making the system suitable for educational, assistive, and everyday communication environments.

Key Words: Speech-to-Text (STT), Automatic Speech Recognition (ASR), Natural Language Processing (NLP), Emotion Detection, Multilingual Translation, Kannada, Telugu, Accessibility, Hearing-Impaired Users, Flask Web Application, Emoji Tagging, PDF Generation.

1. INTRODUCTION

In recent years, advancements in Artificial Intelligence (AI), Automatic Speech Recognition (ASR), and Natural Language Processing (NLP) have significantly improved human–computer interaction. Speech-to-text technologies, in particular, have become essential for accessibility, communication support, and real-time information processing. However, despite progress in ASR and Neural Machine Translation (NMT), many existing systems still struggle with regional languages, emotional understanding, and real-world noise conditions. These limitations make current tools less effective for hearing-impaired users who rely heavily on accurate, expressive, and context-aware transcription.

This project focuses on developing a multilingual and emotion-aware Speech-to-Text (STT) system that converts English audio into text, highlights linguistic features, identifies emotional tone, and translates the transcript into Kannada and Telugu. The system integrates ASR for transcription, NLP techniques for entity recognition and emotion detection, and lightweight translation models for regional language support. By adding emoji cues and semantic highlighting, the system enhances readability and user comprehension, especially for individuals with hearing impairments.

Although current STT systems such as Google Live Transcribe, Microsoft Azure Speech Services, and Whisper provide high accuracy, they often depend on strong internet connectivity, lack customization, or fail to support low-resource languages effectively. Many tools also overlook user experience (UX) elements like adjustable text size, contextual feedback, and inclusive design, which are essential for accessibility-based applications. This project addresses these challenges by designing a lightweight, web-based application built using Flask that provides fast processing, entity highlighting, emotion tagging, multilingual translation, and PDF export capabilities. The proposed system aims to deliver an accessible, accurate, and user-friendly platform that enhances communication for hearing-impaired users and broadens linguistic inclusivity.

1.1 System Architecture

The proposed Speech-to-Text system is designed to be lightweight, modular, and highly accessible for hearing-impaired users. Its architecture ensures accurate transcription, emotion detection, multilingual translation, and smooth user interaction. To guarantee efficient operation, the system integrates several coordinated layers and technologies.

1. Frontend Layer: The user interface is developed using HTML5, CSS3, JavaScript, and Bootstrap, providing a simple and responsive experience. It allows users to upload or record audio, view highlighted transcripts, select translation options, and download the final PDF output with ease.

2. Backend Layer: The backend is built using the Flask framework, which manages all user requests, processes audio files, handles NLP operations, and coordinates ASR, emotion detection, and translation modules. This layer controls the data flow between audio input, text processing, translation, and PDF generation.

3. Audio Processing & Speech Recognition Module: This module uses Python’s SpeechRecognition library and Google’s Web Speech API to convert uploaded .wav audio into English text. It ensures accurate Automatic Speech
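The transcription step of this module can be illustrated with a minimal sketch built on the SpeechRecognition library and its Google Web Speech backend. The helper names `is_wav_upload` and `transcribe_wav`, the `en-US` language code, and the error-handling policy are assumptions made for illustration; the paper does not show its actual code.

```python
from pathlib import Path


def is_wav_upload(filename: str) -> bool:
    """Return True when the uploaded file name carries a .wav extension."""
    return Path(filename).suffix.lower() == ".wav"


def transcribe_wav(path: str, language: str = "en-US") -> str:
    """Transcribe a .wav file to English text via Google's Web Speech API.

    Returns an empty string when the speech cannot be understood.
    """
    if not is_wav_upload(path):
        raise ValueError("expected a .wav file")

    # Third-party dependency (pip install SpeechRecognition), imported lazily
    # so the validation helper above can be used without it.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory

    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        # The service returned no usable hypothesis for this audio.
        return ""
    except sr.RequestError as exc:
        # Network failure or the remote service rejected the request.
        raise RuntimeError("speech service unavailable") from exc
```

Because `recognize_google` sends the audio to a remote service, a sketch like this needs network access at run time; rejecting non-.wav uploads before the recognizer is created keeps invalid requests cheap.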

ISO 9001:2008 Certified Journal

Page 758
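The keyword-based emotion detection and emoji tagging mentioned in the abstract can be sketched in a few lines of plain Python. The lexicon, the emoji choices, and the function names `tag_emotion` and `annotate` are hypothetical; the paper does not list the keywords it uses, so this shows only the general technique of counting keyword hits per emotion.

```python
import re
from typing import Tuple

# Hypothetical lexicon: emotion label -> (emoji, trigger keywords).
EMOTION_LEXICON = {
    "joy": ("😊", {"happy", "glad", "great", "wonderful", "love"}),
    "sadness": ("😢", {"sad", "sorry", "unhappy", "miss", "lost"}),
    "anger": ("😠", {"angry", "furious", "hate", "annoyed"}),
    "fear": ("😨", {"afraid", "scared", "worried", "nervous"}),
}
NEUTRAL = ("neutral", "😐")


def tag_emotion(sentence: str) -> Tuple[str, str]:
    """Return (label, emoji) for the emotion with the most keyword hits."""
    words = re.findall(r"[a-z']+", sentence.lower())
    best_label, best_emoji = NEUTRAL
    best_hits = 0
    for label, (emoji, keywords) in EMOTION_LEXICON.items():
        hits = sum(1 for word in words if word in keywords)
        if hits > best_hits:  # ties keep the earlier lexicon entry
            best_label, best_emoji, best_hits = label, emoji, hits
    return best_label, best_emoji


def annotate(transcript: str) -> str:
    """Append an emotion emoji to each sentence of a transcript."""
    parts = re.split(r"(?<=[.!?])\s+", transcript)
    sentences = [part.strip() for part in parts if part.strip()]
    return " ".join(f"{s} {tag_emotion(s)[1]}" for s in sentences)
```

For example, `annotate("I am happy. I am worried about the exam.")` tags the first sentence with the joy emoji and the second with the fear emoji. A lexicon lookup of this kind is fast and needs no model download, but it ignores negation and sarcasm, which is the usual trade-off of keyword-based approaches.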