Audio computing Image to Text Synthesizer - A Cutting-Edge Content Generator Application


1Student, School of Engineering, Ajeenkya DY Patil University, Pune, Maharashtra, India

2Student, School of Engineering, Ajeenkya DY Patil University, Pune, Maharashtra, India

3Professor, Ajeenkya DY Patil University, Pune, Maharashtra, India

Abstract - In today's world, digital technology is used extensively, and a variety of methods are available for a person to capture images. Such images may contain important textual information that the user may want to edit or store digitally. This can be done using Optical Character Recognition (OCR) with the help of the Tesseract OCR engine. OCR is a branch of artificial intelligence that is used in applications to recognize text in scanned documents or images. The recognized text can also be converted to audio to help visually impaired people, as well as people who cannot read, hear the information they wish to understand. Present-day applications that convert images to text or to handwritten notes, and later provide the content as audio, therefore rely on an OCR tool.

We also introduce new features such as image to text and text to speech, and the text can be converted to any language as per the user's requirement, which makes the application more accessible and familiar to use. In addition, our project includes a translator based on the Google Translate package. In this work we explore different packages; the project comprises a web page where the user can upload an image, and the backend processes the input and sends the result back through an API. This application can be used for character recognition from scanned documents so that information can be digitized. The data can also be converted to audio form to help visually impaired people obtain the information easily. In the future, the application can be extended to recognize more languages and different fonts, and various accents can be added for the audio files.

Key Words: OCR (Optical Character Recognition), translator, handwritten notes, Tesseract, Text-to-Speech (TTS), OCR engine.

1. INTRODUCTION

The Audio Computing Text and Image Synthesizer makes it possible to extract text from pictures in order to automate the processing of text in images, videos, and scanned documents. In this paper, we show how to convert an image to text with React and Tesseract.js (OCR), pre-process images, deal with the limitations of Tesseract (OCR), and later provide an output in audio format that can be downloaded and saved for future reference. Text is readily available from many sources in the form of documents, newspapers, faxes, printed information, handwritten notes, etc. Many people simply scan a document to keep the information on their computers. When a document is scanned with a scanner, it is saved in the form of an image. But these images are not editable, and it is very hard to find what the user requires, as they have to go through the entire image, inspecting each line and word to determine whether it is relevant to their need. Images also take up more space than text documents on the computer. It is essential to be able to store this information in such a way that it becomes easier to search and edit the data. There is a growing demand for applications that can recognize characters from scanned documents or captured photographs and make them editable and easily accessible [1].

As reading is of great importance in the day-to-day activities of mankind (text being present everywhere, from newspapers and commercial products to sign-boards and digital displays), visually impaired people face a lot of difficulties. Our application assists the visually impaired, and also the illiterate, by reading out the text to them [2].

This application can be useful in many ways, as follows:

1.1 Digitalizing Documents

An OCR application can convert printed or handwritten documents into digital text format, making it easier to store, edit, and share the information.

1.2 Saves Time

Rather than manually typing out text from a document, an OCR application can quickly and accurately extract the text, saving time and reducing the risk of errors. It also provides an audio file as output that can be downloaded and listened to in your free time.

International Research Journal of Engineering and Technology (IRJET) | e-ISSN: 2395-0056 | p-ISSN: 2395-0072 | Volume: 10 Issue: 05 | May 2023 | www.irjet.net
© 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal

1.3 Accessibility

For visually impaired individuals, an OCR application can convert text from a photo into a form that can be read aloud by a text-to-speech converter application.

1.4 Language translation

For people who are illiterate, an OCR application can translate text from one language to another, making it much easier for people to understand and communicate with others who speak different languages.

1.5 Data Extraction

An OCR application can be used to extract specific information from documents, such as names, dates, and addresses, making it much easier to analyze and organize the data.

2. DETAIL DESCRIPTION OF TECHNOLOGY USED

2.1 Tesseract

Tesseract is an open-source optical character recognition (OCR) engine, originally developed at Hewlett-Packard and later sponsored by Google. Its primary purpose is to recognize text embedded in images and convert it into machine-readable text. Tesseract is well known for quickly and accurately identifying written text in a variety of languages, such as English, French, Spanish, German, and many more. Applications of Tesseract include data extraction, document management, and machine translation. It is simple to use and integrate because it can be used from several programming languages, including Python, Java, and C++. Tesseract can recognise text from scanned documents and supports a number of image formats, including JPG, PNG, and TIFF [7].

2.2 Next.js

The well-known open-source web framework Next.js, which is built on React, helps programmers create server-side rendered (SSR) and statically generated web apps. Additionally, Next.js has a number of built-in capabilities that improve performance and shorten development times, such as automatic code splitting, hot module replacement, and optimised image loading. It also has a sizable and vibrant community and a variety of plugins and libraries that may be used to extend its capabilities.

2.3 Flask

Flask is a popular open-source Python web framework that enables programmers to create web apps quickly and with little effort. Since Flask is a micro-framework, it is small and doesn't need any specialised libraries or tools to operate. Developers can select the tools and libraries they want to use, making it flexible and simple to work with. Flask offers a wide range of plugins and extensions, making it simple to add functionality to the application [5].

2.4 Firebase (Cloud Storage)

Google offers developers the Firebase Cloud Storage service, a cloud-based storage solution that enables them to store and serve user-generated content including photographs, videos, and audio files. Firebase Cloud Storage is built on Google Cloud Storage, a dependable and scalable object storage service. It is simple to use and to integrate into web and mobile applications. It offers a straightforward API that enables developers to handle metadata and access control as well as upload and download files. In order to protect user data, Firebase Cloud Storage additionally offers built-in security measures like encryption at rest and in transit.

2.5 googletrans

The googletrans Python package makes it simple to translate text using Google Translate. Text is translated between languages using the Google Translate API. With googletrans, Python application developers can translate text across languages rapidly and efficiently. More than a hundred languages are supported, including widely used ones like English, Spanish, French, German, and Chinese as well as less common ones like Afrikaans, Bengali, and Icelandic. Simplicity and ease of use are two of its main advantages. It offers a straightforward API that enables programmers to translate text using very little code: developers specify the source and target languages, and the library takes care of the rest.

2.6 gTTS

The gTTS (Google Text-to-Speech) Python package lets developers convert text into spoken language using Google Translate's text-to-speech API. It offers a simple interface that makes it possible to create audio files from text in a range of languages. With gTTS, Python programmers can rapidly and simply generate spoken-language files from written text. It supports a broad variety of languages and accents, including widely used ones like English, Spanish, French, German, and Chinese as well as less widely used ones like Bengali, Gujarati, and Swahili. Simplicity and usability are two of its main advantages. It offers a straightforward API that enables programmers to perform text-to-speech conversion using very little code [4].

Fig -1: Image to Text Audio Convertor application

2.7 PyWhatKit

Python's pywhatkit package provides a simple interface for turning text into handwritten notes. It creates images of notes that look like handwriting by using the Pillow library. It is easy for developers to use because it is built on top of some of the most widely used Python modules, including PyAutoGUI, Pillow, and Paperclip. Simplicity and ease of use are two of PyWhatKit's primary advantages. It provides a simple API that enables programmers to complete difficult tasks with a small amount of code. Developers can use pywhatkit to, for instance, send emails with attachments, convert text to handwriting, or even take screenshots.

3. METHODOLOGY

OCR (optical character recognition) is a technology that converts text embedded in images into machine-readable text. Using the Tesseract OCR library, the application extracts text from images, and that text can be saved in the cloud, that is, in Firebase Cloud Storage. It can take a text input and convert it into another language, selected from a dropdown, using the googletrans API. The translated text can also be saved in Firebase storage. In the next step, the application converts the text into audio using gTTS (Google Text-to-Speech), saves it in Firebase, and displays it on the UI (user interface). Finally, it can convert the text into handwritten notes using the PyWhatKit library and display them in the application.

3.2 Select Language

To obtain the text extracted from the image in a different language, the user selects the target language into which the text should be converted.
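The language selection step can be sketched in Python as a lookup from the chosen dropdown entry to a language code, followed by a call to googletrans. This is a minimal sketch under stated assumptions: the `LANGUAGE_CODES` table and the `translate_text` helper are hypothetical names, the four languages are an arbitrary subset, and the call assumes the synchronous `Translator.translate` interface of older googletrans releases (newer releases may expose it as a coroutine).

```python
# Hypothetical dropdown entries mapped to ISO 639-1 language codes
# (an illustrative subset, not the application's actual list).
LANGUAGE_CODES = {"English": "en", "Hindi": "hi", "Marathi": "mr", "French": "fr"}


def translate_text(text: str, language_name: str) -> str:
    """Translate `text` into the language picked in the dropdown."""
    try:
        dest = LANGUAGE_CODES[language_name]
    except KeyError:
        raise ValueError(f"unsupported language: {language_name}") from None

    # Third-party import kept local; assumes the synchronous googletrans API.
    from googletrans import Translator

    return Translator().translate(text, dest=dest).text
```

A call such as `translate_text(extracted_text, "Hindi")` would then return the translated string, provided the package is installed and the machine has network access.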

3.3 Image Pre-Processing

This step consists of colour to greyscale conversion, edge detection, noise removal, warping and cropping, and thresholding. The image is converted to greyscale because many OpenCV functions require a greyscale image as input. These steps allow us to detect and extract only the region that contains text and to remove the unwanted background. Finally, thresholding is performed so that the picture looks like a scanned document. This is done to allow the OCR to convert the image to text effectively.
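Two of these steps, greyscale conversion and thresholding, can be illustrated without any image library. The sketch below is a pure-Python illustration rather than the OpenCV calls used in the application: images are plain nested lists of (r, g, b) tuples, the greyscale weights are the standard ITU-R BT.601 luma coefficients, and the fixed cutoff of 128 is an assumption chosen only for the example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def to_greyscale(image: List[List[Pixel]]) -> List[List[int]]:
    """Convert rows of (r, g, b) pixels to 8-bit greyscale values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in image
    ]


def threshold(grey: List[List[int]], cutoff: int = 128) -> List[List[int]]:
    """Binarize: values at or above the cutoff become white, the rest black."""
    return [[255 if value >= cutoff else 0 for value in row] for row in grey]


# A tiny 2x2 "photo": light paper on the left, dark ink on the right.
image = [
    [(255, 255, 255), (30, 30, 30)],
    [(200, 180, 160), (90, 60, 40)],
]
binary = threshold(to_greyscale(image))
print(binary)  # [[255, 0], [255, 0]]
```

After thresholding, every pixel is either pure white or pure black, which is what makes a photograph resemble a scanned page before it is passed to the OCR engine.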

3.4 Image to Text Convertor

The figure (fig. 3) shows the flow of text-to-speech conversion. The first block consists of the image pre-processing modules and the OCR. It converts the pre-processed image, which is in .png, .jpg, or .jpeg format, to a .txt file. We use the Tesseract OCR engine.
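On a Python backend, the image-to-.txt conversion could look roughly like the sketch below. It assumes the pytesseract wrapper and Pillow are installed alongside the Tesseract binary; the paper does not state which binding is used, and the helper name `image_to_txt` and its parameters are hypothetical.

```python
from pathlib import Path


def image_to_txt(image_path: str, txt_path: str, lang: str = "eng") -> str:
    """Run Tesseract on an image and save the recognised text to a .txt file."""
    # Third-party imports are kept local so the module loads without them.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        text = pytesseract.image_to_string(img, lang=lang)

    Path(txt_path).write_text(text, encoding="utf-8")
    return text
```

For example, `image_to_txt("scan.png", "scan.txt")` would write the recognised text next to the image; the `lang` argument selects the Tesseract language data to use.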

Fig -2: Working Flowchart

3.1 Image Input

To convert an image to text, the input data is required in the form of images, which may be in various formats. The very first step is to upload the image for pre-processing so that the text content can be extracted.

3.5 Text to Audio-Convertor

As shown in fig. 4, this step converts the .txt file to an audio output. Here, the text is converted to speech using a speech synthesizer. The audio file can also be generated in various other languages.
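The text-to-audio step can be sketched with gTTS as follows. This is a minimal sketch, assuming gTTS is installed and network access is available at run time; the helper name `txt_to_audio` and the choice of MP3 output path are illustrative rather than taken from the application.

```python
from pathlib import Path


def txt_to_audio(txt_path: str, audio_path: str, lang: str = "en") -> str:
    """Read a UTF-8 text file and save its spoken form as an audio file."""
    from gtts import gTTS  # third-party; contacts Google when saving

    text = Path(txt_path).read_text(encoding="utf-8").strip()
    if not text:
        raise ValueError("nothing to synthesise: the text file is empty")

    # `lang` is a language code such as "en" or "hi".
    gTTS(text=text, lang=lang).save(audio_path)
    return audio_path
```

Passing a different `lang` code is how the same text file could be spoken in another language, provided the text itself has already been translated into that language.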

Fig -3: Conversion of English to Various Languages

Fig -4: Text to Audio Generator

3.6 Image to Handwritten Notes

This application can also convert the image text into handwritten format with the help of PyWhatKit.

Fig -5: Conversion of Text to Handwritten notes

4. RESULTS

Text extraction from images is really useful in many real-world applications. The amount of data stored as text is huge, and it is desirable to store this information in such a way that it can be searched without difficulty whenever required. Eliminating the use of paper is one of the steps towards an electronic world. Converting information to audio form is also a way to ease the lives of visually impaired people. Likewise, the recognized text can be translated into a variety of languages and processed in the chosen language into speech or a document. The training data is created for all available fonts and handwritten texts in English so that the OCR is able to convert any text present in the image into text. The system also recognizes text in different styles or fonts and processes it so that it is available for the aforementioned features, such as conversion to speech or document, and it also supports translation.

The website also recognizes handwritten text and processes it so that it is available for the aforementioned features, such as conversion to speech or file, and it also supports translation. The extracted text can be converted into audio. The accuracy of the extraction is higher than that of other techniques, it is very fast to carry out, and it can also be used in an Android application. The sound quality obtained with TTS is good.

This application lets its users recognize text in pictures and convert it into a document and speech. The text can be in a number of languages, and it can also be translated into a range of languages. The essential feature of the system is its ability to convert written text into handwritten notes, which can later be converted to a different language or into an audio file. Converting a large volume of images into text makes translation easier, and the text can be converted to an audio file as well as to handwritten notes, as shown in fig. 5.

4.1 Performance

The precision-recall curve and the F1 score are used to assess the model's performance. A dataset of 100 images containing ground-truth text and the associated OCR output from Tesseract is used to evaluate the model.

To evaluate the performance of the OCR and calculate the F1 score, the following counts are used:

• True positive (TP): The OCR result agrees with the ground-truth text.
• False positive (FP): The OCR output differs from the ground-truth text.
• False negative (FN): The OCR failed to recognise the ground-truth text.

The model recognised the text correctly in 85 of the images (TP), wrongly in 5 of the images (FP), and not at all in 10 images (FN), as shown in fig. 6.

• Precision = TP / (TP + FP) = 85 / (85 + 5) = 0.9444
• Recall = TP / (TP + FN) = 85 / (85 + 10) = 0.8947
• F1 score = 2 * (precision * recall) / (precision + recall) = 2 * (0.9444 * 0.8947) / (0.9444 + 0.8947) = 0.9189

A confusion matrix is a method for assessing how well a classification model is working. It displays how many true positives, false positives, false negatives, and true negatives a model produced for a given collection of data, as shown in fig. 7.
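The precision, recall, and F1 arithmetic for this evaluation can be checked with a few lines of Python. The counts TP = 85, FP = 5, and FN = 10 are the image-level counts reported for the 100-image dataset; the function name `ocr_scores` is a hypothetical helper introduced only for this check.

```python
def ocr_scores(tp: int, fp: int, fn: int) -> dict:
    """Return precision, recall, and F1 for the given image-level counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = ocr_scores(tp=85, fp=5, fn=10)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
# precision: 0.9444
# recall: 0.8947
# f1: 0.9189
```

The F1 score can equivalently be written as 2TP / (2TP + FP + FN) = 170 / 185, which gives the same 0.9189.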

Fig -6: Precision-Recall curve

Fig -7: Confusion matrix

5. FUTURE SCOPE

In the future, we can extend the application by including more languages and different fonts and by improving the handwritten notes. Various accents can also be added for the audio output. Currently, we take only one image at a time as input; in the future, we can accept multiple images for pre-processing. Beyond images, we can take any type of video, break it down into frames, and process the images obtained from the video as well. This will help in generating subtitles.

6. CONCLUSIONS

Our application helps users recognize text in a wide range of languages from images and convert it into text and later into speech. It also includes the feature of translating text into a variety of languages. Many famous written works can be translated into a number of languages so that they reach different people. This approach can read text from a range of sources and even generate synthesized speech as audio. It also converts text into the form of handwritten notes for easier understanding, as shown in fig. 5. It is convenient to use, highly secure, accurate, and can be used anywhere.

REFERENCES

[1] Nisha Pawar, Zainab Shaikh, Poonam Shinde, Prof. Y. P. Warke, "Image to Text Conversion Using Tesseract," International Research Journal of Engineering and Technology (IRJET), Feb 02, 2019

[2] Asha G. Hagargund, Sharsha Vanria Thota, Mitadru Bera, Eram Fatima Shaik, "Image to Speech Conversion for Visually Impaired," International Journal of Latest Research in Engineering and Technology (IJLRET), ISSN: 2454-5031, Volume 03, June 06 2017

[3] Nivetha. S, Kameshwari. S, "Image to Text and Speech Converter," International Research Journal of Engineering and Technology (IRJET), e-ISSN: 2395-0056; p-ISSN: 2395-0072, Volume 07, Issue Nov 11 2020

[4] Arjun Pratap, Kunal Wavhule, Viraj Patil, Vaibhav Narawade, "OCR-WRITTEN TEXT TO AUDIO CONVERTER," IJARIIE, ISSN(O)-2395-4396, 2022

[5] Umatia, S., Varma, A., Syed, A., Tiwari, K., & Shah, F. (2022, November 30). Text Recognition from Images. International Journal for Research in Applied Science and Engineering Technology, 10(11), 1003–1009. https://doi.org/10.22214/ijraset.2022.47498

[6] Kumar Garai, Sayan, Ojaswita Paul, Upayan Dey, Sayan Ghoshal, Neepa Biswas, and Sandip Mondal. "A Novel Method for Image to Text Extraction Using Tesseract-OCR." American Journal of Electronics & Communication 3, no. 2 (2022): 8-11

[7] Lestari, Ikha Novie Tri, and Dadang Iskandar Mulyana. "Implementation of OCR (Optical Character Recognition) Using Tesseract in Detecting Character in Quotes Text Images." Journal of Applied Engineering and Technological Science (JAETS) 4, no. 1 (2022): 58-63

[8] Patil, Shruti, Vijayakumar Varadarajan, Supriya Mahadevkar, Rohan Athawade, Lakhan Maheshwari, Shrushti Kumbhare, Yash Garg, Deepak Dharrao, Pooja Kamat, and Ketan Kotecha. 2022. "Enhancing Optical Character Recognition on Images with Mixed Text Using Semantic Segmentation." Journal of Sensor and Actuator Networks 11, no. 4: 63. https://doi.org/10.3390/jsan11040063

[9] Karthikeyan G, Bharanidharan G, Jeevanandham D, and Balaji B G. 2022. "Text Recognition Images Using OCR." International Journal of Progressive Research in Science and Engineering 3 (05): 57-60.
