International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056 | p-ISSN: 2395-0072
Volume: 04 Issue: 04 | Apr-2017
www.irjet.net
Integration of Speech, Image & Text Processing Technologies
Akhil S. Deshpande1, Shreyas S. Vaidya2, Pravin B. Swami3, Pavan R. Jaiswal4
1Student, Computer Engineering (B.E.), P.I.C.T, Pune, Maharashtra, India
2Student, Computer Engineering (B.E.), P.I.C.T, Pune, Maharashtra, India
3Student, Computer Engineering (B.E.), P.I.C.T, Pune, Maharashtra, India
4Professor, Computer Engineering (B.E.), P.I.C.T, Pune, Maharashtra, India
Abstract - In everyday life, speech is considered the most important medium of communication. When a message is conveyed, the most widely used form is the speech signal. The main objective of this work is to use the speech signal to simplify the everyday life of common people. Technology is rapidly gaining popularity because it is meant to simplify people's day-to-day lives, but on a deeper level it is in turn making things more complicated, as a person has to go through numerous automated client support systems. A person has to deal with a huge amount of text at every moment, so the use of the speech signal has become important, especially when multitasking. Therefore, this paper proposes a design that satisfies these general requirements using three methodologies, namely image-to-text, text-to-speech and speech-to-text. It describes the basic working (processing) of each methodology and a way in which the three can be combined.

Key Words: Text-to-Speech, Image-to-Text, Speech-to-Text, Unit Selection Synthesis, Natural Language Processing, Image Processing, Text Processing

1. INTRODUCTION
Speech is the most important form of communication; indeed, communication is largely defined by speech itself. In our everyday life, we need to communicate with other people in order to carry out our tasks. This is referred to as man-to-man communication. There is, however, another aspect of communication, known as man-to-machine communication. In this era of technology, a person needs to interact with a large number of client support systems to get even the simplest tasks done. This becomes tedious and time-consuming if the communication is through text, which explains why humans want speech as a communication and interaction medium with computers as well.

1.1 Speech Input and Speech Output
In general, a speech-based user interface requires both speech input (recognition) and speech output (speech synthesis). Such an interface offers several advantages. Speech is convenient because it leaves the hands and eyes free for other activities. In addition, communication with a machine and with other humans is possible simultaneously. In this kind of system design the user is not bound to a fixed place and has freedom of movement and orientation. The system can be used especially by visually impaired and other disabled people (e.g. the physically handicapped), which gives it considerable social value. However, there are a few drawbacks that need to be taken care of. For example, speech input can disturb the surroundings, and recognizers are extremely sensitive to environmental noise, above all to background speakers. For some applications (e.g. those with high security requirements), the recognition accuracy might be insufficient. Thus, considerable effort in system training is usually necessary.

1.2 STT, TTS AND ITT
When a user speaks to a conversational interface, the system has to be able to recognize what was said. The speech-to-text (STT) component processes the acoustic signal that represents the spoken utterance and outputs a sequence of word hypotheses, thus transforming the speech into text. The other side of the coin is text-to-speech synthesis (TTS), in which written text is transformed into speech. There has been extensive research in both these areas, and striking improvements have been made over the past decade. In the following sections, an overview of the processes of STT and TTS is provided.
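The STT component is described as producing a sequence of word hypotheses. One standard way to choose among competing hypotheses W for an acoustic signal O (the noisy-channel formulation, which this paper does not itself specify) is to maximize P(O | W) · P(W), i.e. the sum of an acoustic log-probability and a language-model log-probability. The Python sketch below rescores an n-best list in this way; the bigram table, the back-off value, the `lm_weight` parameter and all scores are hypothetical, illustrative numbers rather than the output of a real recognizer.

```python
import math

# Hypothetical bigram language model: log P(word | previous word).
# "<s>" marks the sentence start. Values are illustrative, not trained.
BIGRAM_LOGPROB = {
    ("<s>", "recognize"): math.log(0.4),
    ("recognize", "speech"): math.log(0.5),
    ("<s>", "wreck"): math.log(0.05),
    ("wreck", "a"): math.log(0.3),
    ("a", "nice"): math.log(0.2),
    ("nice", "beach"): math.log(0.3),
}
# Crude fixed penalty for unseen bigrams (an assumption, not real smoothing).
UNSEEN_LOGPROB = math.log(1e-4)


def lm_logprob(words):
    """Log P(W) under the toy bigram model (end-of-sentence token omitted)."""
    total = 0.0
    prev = "<s>"
    for w in words:
        total += BIGRAM_LOGPROB.get((prev, w), UNSEEN_LOGPROB)
        prev = w
    return total


def best_hypothesis(nbest, lm_weight=1.0):
    """Return the words maximizing log P(O|W) + lm_weight * log P(W).

    nbest: list of (words, acoustic_logprob) pairs, as a recognizer
    might emit for one utterance.
    """
    return max(nbest, key=lambda h: h[1] + lm_weight * lm_logprob(h[0]))[0]


# Two hypothetical hypotheses for the same utterance.
nbest = [
    (["wreck", "a", "nice", "beach"], -10.0),
    (["recognize", "speech"], -10.5),
]
print(best_hypothesis(nbest))  # ['recognize', 'speech']
```

With the default weight, the language model favours the more plausible sentence even though its acoustic score is slightly worse; setting `lm_weight=0.0` falls back to the acoustic scores alone.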
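The paper proposes combining image-to-text, text-to-speech and speech-to-text. One way to structure such a combination is to hide each engine behind a small interface so that the components can be chained or replaced independently. The sketch below assumes hypothetical interfaces (`ImageToText`, `TextToSpeech`, `SpeechToText`) and a hypothetical `SpeechImageTextPipeline` class. The stand-in engines only fake OCR, synthesis and recognition so that the wiring can be exercised; a real system would substitute actual OCR, TTS and STT engines.

```python
from dataclasses import dataclass
from typing import Protocol


class ImageToText(Protocol):
    def extract_text(self, image: bytes) -> str: ...


class TextToSpeech(Protocol):
    def synthesize(self, text: str) -> bytes: ...


class SpeechToText(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


@dataclass
class SpeechImageTextPipeline:
    """Hypothetical wiring of the ITT, TTS and STT components."""
    itt: ImageToText
    tts: TextToSpeech
    stt: SpeechToText

    def read_image_aloud(self, image: bytes) -> bytes:
        # Image -> text (OCR), then text -> speech.
        text = self.itt.extract_text(image)
        return self.tts.synthesize(text)

    def dictate(self, audio: bytes) -> str:
        # Speech -> text, e.g. to answer a support system instead of typing.
        return self.stt.transcribe(audio)


# --- Stand-in engines for testing only (not real OCR/TTS/STT) ---
class EchoOCR:
    """Pretends the image bytes are already UTF-8 text."""
    def extract_text(self, image: bytes) -> str:
        return image.decode("utf-8")


class TaggedAudioTTS:
    """Produces fake 'audio' by prefixing the text with a tag."""
    def synthesize(self, text: str) -> bytes:
        return ("AUDIO:" + text).encode("utf-8")


class TaggedAudioSTT:
    """Recovers text from the fake 'audio'; returns '' for anything else."""
    PREFIX = b"AUDIO:"

    def transcribe(self, audio: bytes) -> str:
        if audio.startswith(self.PREFIX):
            return audio[len(self.PREFIX):].decode("utf-8")
        return ""


pipeline = SpeechImageTextPipeline(EchoOCR(), TaggedAudioTTS(), TaggedAudioSTT())
audio = pipeline.read_image_aloud(b"Bus 12 leaves at 9:30")
print(pipeline.dictate(audio))  # Bus 12 leaves at 9:30
```

Using `typing.Protocol` keeps the pipeline independent of any particular engine: an OCR library, a unit-selection synthesizer or a speech recognizer could each be wrapped to match the interface without changing the pipeline code.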
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 251