International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 11 Issue: 10 | Oct 2024
p-ISSN: 2395-0072
www.irjet.net
Automatic Speech Recognition for Indic Languages
Manish Godbole1, Kaustubh Joshi2, Aditya Kadu3, Dr. Mukta Taklikar4
1Fourth Year Computer Engineering, SCTR’s Pune Institute of Computer Technology, Pune, India
2Fourth Year Computer Engineering, SCTR’s Pune Institute of Computer Technology, Pune, India
3Fourth Year Computer Engineering, SCTR’s Pune Institute of Computer Technology, Pune, India
4Associate Professor, Computer Engineering, SCTR’s Pune Institute of Computer Technology, Pune, India
Abstract - In collaboration with Lightbees, a fintech startup, this final-year B.E. project focuses on enhancing 'AbleCredit', the company's flagship product for simplifying loan applications for micro, small and medium enterprises (MSMEs). Drawing on Artificial Intelligence and Machine Learning (AIML), Natural Language Processing (NLP), and digital signal processing, the project aims to innovate in and streamline financial processes. The resulting solutions are designed to improve user experience and operational efficiency and to contribute new methods to the broader fintech landscape. The project combines theoretical knowledge with practical application to address real-world challenges in financial technology.

Key Words: Automatic Speech Recognition (ASR), OpenAI Whisper, AI4Bharat, Multilingual Speech Recognition, Natural Language Processing, Artificial Intelligence, Machine Learning.

1. INTRODUCTION
Automatic Speech Recognition (ASR) systems have changed how people interact with machines, enabling voice-based interfaces for applications such as virtual assistants and transcription services. An ASR system transcribes spoken language into written text, which is valuable in a world where much of our communication is mediated by technology. These systems are, however, often constrained by linguistic diversity and complexity, especially in a country like India, which has 22 officially recognized (scheduled) languages and hundreds of dialects. Indic languages pose a distinctive challenge because of their complex phonetic structures, tonal variation, and frequent code-switching between languages.

In this light, OpenAI's Whisper, a highly capable multilingual ASR model, together with models from AI4Bharat, offers a good starting point for addressing these problems. Fine-tuning Whisper and AI4Bharat models for Indic languages aims to produce a speech recognition system that captures the finer points of these languages, improving accessibility and digital inclusion across India's diverse linguistic landscape.

2. LITERATURE REVIEW
Liddy, E.D. [1] describes Natural Language Processing (NLP) as computer-assisted text analysis with both theoretical and practical foundations. Because it is an expanding area of research and development, NLP has no single agreed definition, although any definition would need to contain certain core features.

Diksha Khurana, Aditya Koli, Kiran Khatter and Sukhdev Singh [2] note that NLP has recently gained significant attention for its ability to represent and analyze human language computationally. Its applications now span areas such as machine translation, email spam detection, information extraction, summarization, healthcare, and question answering. The authors organize their survey into four phases: they first discuss the levels of NLP and the components of Natural Language Generation, then give a historical overview of the field's evolution, then cover the current state of the art, including applications, emerging trends, and open challenges, and finally summarize the available datasets, models, and evaluation metrics.

Aditya Jain, Gandhar Kulkarni and Vraj Shah [3] observe that modern text-processing algorithms assign entities to categories that are established by user preferences. These algorithms underpin features such as smart replies and smart suggestions in many applications, which are designed to reduce users' workload and response time by providing accurate and efficient responses. Despite significant progress over the past decade, speech processing remains an unsolved problem. Neural networks and deep learning play a major role in both text and speech processing for industrial applications, bringing the accuracy of results close to human-level comprehension. AI systems execute text- and speech-processing algorithms mainly to evaluate user needs.
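The introduction singles out code-switching as a distinctive feature of Indic speech. As a rough illustration of what this looks like in a transcript, the sketch below labels each whitespace-separated token by Unicode script (the Devanagari block, U+0900 to U+097F, versus ASCII Latin letters) and lists the points where a Hindi-English utterance switches script. This is a minimal sketch under stated assumptions, not part of the authors' system: the helper names `token_script` and `switch_points` and the sample utterance are hypothetical, and script is only a proxy for language, since romanized Hindi is written in Latin script and English loanwords are sometimes written in Devanagari.

```python
# Hypothetical helpers illustrating code-switching in a transcript.
# Assumption: script is used as a rough proxy for language. It misses
# romanized Hindi (Latin script) and English words written in Devanagari.

DEVANAGARI = range(0x0900, 0x0980)  # Unicode Devanagari block U+0900..U+097F


def token_script(token: str) -> str:
    """Label a token 'Devanagari', 'Latin', 'Mixed' or 'Other' by its characters."""
    scripts = set()
    for ch in token:
        if ord(ch) in DEVANAGARI:
            scripts.add("Devanagari")
        elif ch.isascii() and ch.isalpha():
            scripts.add("Latin")
        # Digits, punctuation and other scripts are ignored.
    if len(scripts) > 1:
        return "Mixed"
    return scripts.pop() if scripts else "Other"


def switch_points(transcript: str) -> list[tuple[str, str]]:
    """Return (previous_token, next_token) pairs where the script label changes."""
    # Drop tokens with no Devanagari or Latin characters (e.g. bare numbers).
    tokens = [t for t in transcript.split() if token_script(t) != "Other"]
    labels = [token_script(t) for t in tokens]
    return [
        (tokens[i - 1], tokens[i])
        for i in range(1, len(tokens))
        if labels[i] != labels[i - 1]
    ]


if __name__ == "__main__":
    # Illustrative Hinglish utterance: "When will my loan application be approved?"
    for prev, nxt in switch_points("मेरा loan application कब approve होगा?"):
        print(f"{prev} -> {nxt}")
```

For the sample utterance, `switch_points` reports four switches (मेरा to loan, application to कब, कब to approve, approve to होगा?), which hints at why a monolingual acoustic or language model struggles with such speech.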
© 2024, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 171