Review On Speech Recognition using Deep Learning

International Research Journal of Engineering and Technology (IRJET) | e-ISSN: 2395-0056 | p-ISSN: 2395-0072 | Volume: 09 Issue: 10 | Oct 2022 | www.irjet.net

Anushree Raj1, Sahir Abdulla2, Vishwas N3
1 Assistant Professor, IT Department, AIMIT, Mangaluru, anushreeraj@staloysius.ac.in
2 MCA Student, AIMIT, Mangaluru, 2117044sahir@staloysius.ac.in
3 MCA Student, AIMIT, Mangaluru, 2117117vishwas@staloysius.ac.in

Abstract: Speech is the most effective means for humans to communicate their ideas and emotions across a variety of languages. Every language has its own set of speech characteristics, and tempo and dialect vary from person to person even within the same language, which can make it difficult for some listeners to understand the message being delivered. Long speeches can also be challenging to follow because of inconsistent pronunciation, tempo, and other factors. Speech recognition, an interdisciplinary area of computational linguistics, supports the development of technology that recognizes and transcribes voice into text. Text summarization then extracts the most crucial information from that text and presents it in an adequately condensed form.

Keywords: Speech recognition, deep learning, computational linguistics, feature extraction, feature vectors.

1. INTRODUCTION

Voice is frequently used and regarded as important information when engaging with others. Through comprehension and recognition, voice recognition technology enables machines to convert human vocal signals into equivalent commands. Speech is the most effective form of expression for thoughts and feelings when learning new languages, and the survey we conducted shows how useful it is when we want to communicate with others. This project converts speech to text, or text to speech, using a deep learning technique based on a CNN (convolutional neural network), much like Google's Google Assistant, Apple's Siri, and Samsung's Bixby. The suggested work combines speech-to-text conversion with text summarization; this hybrid approach benefits applications that call for concise summaries of lengthy talks and is very helpful for documentation. Deep learning is a form of AI and machine learning that mimics the way people learn certain types of information. Nowadays, numerous applications use human-machine interaction [1], and speech is one of the interactional media. The primary difficulty in human-machine interaction is identifying emotions in speech.
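To make the CNN-based approach mentioned above concrete, the following is a minimal sketch, not taken from the paper, of a convolutional acoustic classifier over MFCC features written in PyTorch; the layer sizes, the number of target word classes, and the input shape are illustrative assumptions.

# Minimal sketch of a CNN acoustic classifier over MFCC features (illustrative only).
import torch
import torch.nn as nn

class SpeechCNN(nn.Module):
    def __init__(self, n_classes=30):          # n_classes is an assumed vocabulary of spoken words
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # input: 1 x 13 MFCCs x time frames
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),                 # pool to a fixed size regardless of utterance length
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)

    def forward(self, x):                        # x: (batch, 1, n_mfcc, n_frames)
        z = self.features(x)
        return self.classifier(z.flatten(1))     # logits over candidate words

model = SpeechCNN()
dummy = torch.randn(2, 1, 13, 100)               # two fake utterances: 13 MFCCs x 100 frames
print(model(dummy).shape)                        # torch.Size([2, 30])

A network of this kind would classify short utterances; a full speech-to-text system would add a decoder over word or character sequences, as outlined in the objectives below.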

Speech recognition provides input for automatic translation, generates print-ready dictation, and allows hands-free operation of various devices and equipment, all of which are especially helpful to many disabled people. Medical dictation software and automated telephone systems were among the first speech recognition applications [2]. Speech recognizers are made up of a few components, such as the speech input, feature extraction, feature vectors, a decoder, and a word output. To select the proper output, the decoder leverages acoustic models, a pronunciation dictionary, and language models.
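As a rough illustration of the feature-extraction stage in that pipeline, the sketch below turns an audio file into MFCC feature vectors; the librosa library, the file name, and the parameter values are assumptions chosen for illustration and are not prescribed by the paper.

# Illustrative feature extraction: MFCC vectors from a speech recording (assumed file name).
import librosa

signal, sample_rate = librosa.load("utterance.wav", sr=16000)      # load audio resampled to 16 kHz
mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=13)   # 13 coefficients per frame
print(mfcc.shape)   # (13, number_of_frames): one feature vector per analysis frame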

Benefits:

• It can help to increase productivity in many businesses, such as the healthcare industry.
• It can capture speech much faster than you can type.
• You can use text-to-speech in real time.
• The software can spell with the same ability as any other writing tool.
• It helps those who have problems with speech or sight.

2. OBJECTIVES

The objective of voice recognition is to use linguistic and phonetic data to convert the input sequence of speech feature vectors into a sequence of words. According to the system's structure, a full voice recognition system consists of a feature extraction algorithm, an acoustic model, a language model, and a search algorithm. A multidimensional pattern recognition system is essentially what a speech recognition system does.
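The search step that ties these components together can be pictured, very roughly, as scoring each candidate word sequence by combining its acoustic-model and language-model probabilities and keeping the best one; the toy word lists and scores below are invented for illustration and do not come from the paper.

# Toy decoder sketch: pick the word sequence with the best combined acoustic + language score.
import math
from itertools import product

# Invented example log-probabilities for two time steps.
acoustic_logp = [{"recognize": -1.2, "wreck a nice": -1.0},
                 {"speech": -0.8, "beach": -1.5}]
bigram_logp = {("recognize", "speech"): -0.5, ("recognize", "beach"): -3.0,
               ("wreck a nice", "speech"): -2.5, ("wreck a nice", "beach"): -0.9}
lm_weight = 1.0   # assumed language-model weight

best_seq, best_score = None, -math.inf
for w1, w2 in product(acoustic_logp[0], acoustic_logp[1]):
    score = acoustic_logp[0][w1] + acoustic_logp[1][w2] + lm_weight * bigram_logp[(w1, w2)]
    if score > best_score:
        best_seq, best_score = (w1, w2), score

print(best_seq, best_score)   # ('recognize', 'speech'), the hypothesis with the highest combined score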

3. LITERATURE REVIEW

The most crucial component of human communication is speech. Although there are many ways to express what we think and feel, speaking is often regarded as the primary form of communication. The Google API can be used to convert recorded speech to text. Because the retrieved text does not contain periods, it is challenging to split content produced by the Google API into sentences. In the suggested model, a period is therefore added at the end of each phrase to distinguish the phrases from one another. One study explained the theoretical algorithms used to construct voice recognition: it first describes the precise steps involved, such as biometrics acquisition, preprocessing, feature extraction, biometric pattern matching, and recognition outcomes, and then gives a detailed introduction to speech recognition as a biological feature [3]. The primary procedures, recognition strategies,
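As a sketch of the Google-API transcription step described above, the snippet below uses the SpeechRecognition Python package's recognize_google method and appends a period to each recognized chunk so the text can later be split into sentences; the package choice, the file name, and the fixed 30-second chunking are assumptions made for illustration rather than details from the reviewed work.

# Sketch: transcribe a recording with Google's web speech API and add sentence-ending periods.
import speech_recognition as sr

recognizer = sr.Recognizer()
sentences = []
with sr.AudioFile("lecture.wav") as source:               # assumed input file
    for _ in range(3):                                    # read three 30-second chunks as "phrases"
        chunk = recognizer.record(source, duration=30)
        try:
            text = recognizer.recognize_google(chunk)     # raw text comes back without punctuation
            sentences.append(text.strip() + ".")          # add a period so sentences can be split later
        except sr.UnknownValueError:                      # chunk contained no recognizable speech
            continue

print(" ".join(sentences))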



