International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 11 Issue: 04 | Apr 2024
p-ISSN: 2395-0072
www.irjet.net
Image Text To Speech Conversion with Raspberry-Pi Using OCR
Shivkanya V Dahiphale1, Prof. S. J. Nandedkar2
1PG Student, Electronics & Telecommunication Engineering, Aurangabad, India
2Professor, Electronics & Telecommunication Engineering, Aurangabad, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract – In this paper, we propose an image-text-to-speech converter using Raspberry Pi. It is very difficult to read text from text images or text boards. Visual impairment is one of the greatest handicaps of humanity, especially in today’s world, where information is communicated through text messages rather than voice. We extract text from an image, i.e., we capture an image that contains only text and convert it into speech. This is done using Raspberry Pi and Optical Character Recognition (OCR). The captured image undergoes several image processing steps to find only the part of the image that contains the text. Various tools, including OCR software and a TTS (Text to Speech) engine, are used to convert the new image (which contains only text) into speech, and the audio output can be heard through speakers or earphones.
Key Words: OCR, Text Translator, TTS, Raspberry Pi, Visually impaired person, Text Extraction
1. INTRODUCTION
Every year, the number of visually challenged persons increases due to eye diseases, age-related causes, traffic accidents, and other causes. As reading is one of the most important tasks in the daily routine of humankind (text is present everywhere), visually impaired people face many difficulties. Speech supports visually challenged persons by reading the text aloud. The focus of this research is to let a visually challenged person receive textual information in audio format. This paper presents the design of a camera-based reading system that extracts text from an image, identifies the text characters and strings in the captured image, and finally converts the text into audio. The captured image goes through a series of image pre-processing steps that locate only the part of the image that contains the text and remove the background. The OCR and TTS stages then process the result. OCR has become one of the most popular applications of text recognition and AI. Optical character recognition (OCR) is the process of converting scanned images of machine-printed or handwritten text (numerals, letters, and symbols) into machine-readable text [1].
1.1 Summary of Literature Review
In this section, we present some previous research on assisting visually challenged people with text-to-speech technology. This literature review studies different image-text-to-speech conversion techniques. Using these research papers, the survey is carried out with a view to developing new techniques.
The work in [2] offers a solution for converting images into sound. The system includes a head-mounted video camera, and its design is portable and low-power. In that work, images are converted into text, and the text is then converted into sound using MATLAB code, which is commonly used for image processing. There is scope to increase the database of the proposed system [2].
The device in [3] is a portable aid for the visually impaired. The setup is foldable, which enhances its portability. It can be broken down into two parts, and it takes barely 5 seconds to set up again. The two parts of the device are 1) the stand, onto which the Raspberry Pi board is mounted, with a slot in the wooden board for the camera, and 2) a plain slate that has slots for inserting the paper [3].
The work in [4] presents an approach for text extraction and its conversion to speech. OCR (optical character recognition) converts the text images into machine-encoded text and saves it in a text file. Tesseract is the OCR engine used to extract the English text from the image and store it in a text file. The text-to-speech engine then converts the text into speech output [4].
Rama Mohan Babu, P. Srimaiyee, and A. Srikrishna worked with text whose characters have different shapes and structures. Text extraction may employ binarization or process the original image directly; their work includes a survey of existing techniques for page layout analysis. Mathematical morphology is a geometry-based approach to analyzing images. It provides powerful tools for extracting geometrical structures and representing shapes in many applications. Morphological Feature Extraction (MPE) has been proven to be a powerful tool for character detection and document analysis, particularly when using dedicated hardware. They proposed an algorithm for text extraction based on morphological operations [5].
OCR technology allows a machine to automatically recognize a character through an optical mechanism. OCR is the method of translating images that contain text into machine-editable form. If we read a page in a language other than our own, we may recognize the various characters, but we may be unable to recognize words. However, on the same page, we are usually able to interpret numerical statements, because the symbols for numbers are universally used [6].
Bhushan Sonawane, Kiran Patil, Nikhil Pathak, and Ram Gamane used the Microsoft Office Document Imaging OCR technology to extract text tokens, prototypes, and templates. Then they performed the following processes:
© 2024, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 545