“SKYE : Voice Based AI Desktop Assistant”


1,2,3 Student, Department of Computer Science & Engineering, GHRCEM, Pune, Maharashtra, India

4Professor, Department of Computer Science & Engineering, GHRCEM, Pune, Maharashtra, India

***

Abstract - Technology is advancing rapidly. In the past, computers could perform only a limited set of tasks, but the emergence of machine learning, artificial intelligence, and deep learning has advanced computer systems to the point where they can perform a wide variety of tasks. One goal of artificial intelligence (AI), among the most popular of these technologies, is to achieve natural dialogue between humans and machines. This paper presents our proposed voice assistant, which offers aid to individuals who are visually impaired or have other disabilities. A voice assistant is a software application that uses voice technology to comprehend spoken language and generate artificial vocal responses. The goal is to create a Python-based desktop voice assistant that can assist users with a wide range of tasks without requiring them to interact with a keyboard. The purpose of this project is to explore the intelligent behavior of voice assistants and their potential applications for both daily use and education.

Key Words: Python, speech recognition, voice assistant, TTS (Text to Speech), STT (Speech to Text), Desktop Assistant, Artificial Intelligence.

1. INTRODUCTION

In today's fast-paced world, we often find ourselves wanting to complete tasks more efficiently by using voice commands instead of relying solely on a keyboard and mouse. This is particularly useful for multitasking, and companies such as Google, Amazon, and Apple are working to make this technology more widely available. For instance, we can easily set reminders or alarms using voice commands. With this in mind, we have developed a platform that can be installed in any location and used to assist people with various tasks through voice-based communication.

A virtual assistant is an artificial intelligence technology designed to help users with basic tasks by responding to natural language commands. The system works by converting audio signals into digital data, which is then analyzed by the software. This personalized speech recognition project is built on Python and is capable of recognizing user commands, interacting with users, and completing tasks accordingly. Examples of tasks it can perform include greeting users based on the time of day or event, playing music, providing weather updates, analyzing weather conditions and advising users on whether it's safe to go out, opening applications and folders, creating new folders, changing directories, sending emails, and more. Using a virtual assistant to perform these tasks can save users a significant amount of time and effort.

Focusing on what is most important, whether it is personal or professional, is crucial. However, individuals often spend too much time on mundane tasks that can be automated with personal assistants. When people are not familiar with a work environment, they may have trouble locating necessary applications such as browsers, IDEs, or other software. This can result in hours wasted searching for applications. To mitigate this issue, a voice-enabled personal assistant can be employed to automate this process. The user simply issues a voice command, and the assistant takes care of the rest. Artificial intelligence-based voice assistants have numerous applications, including IT helpdesk, home automation, HR-related tasks, and voice-based search. Voice-based search is expected to be the future of next-generation technology, with users relying heavily on voice assistants for all their needs. These assistants will also be particularly useful for visually impaired individuals. Worldwide, 15% of the population has some form of disability, with 2-4% experiencing significant difficulties with their daily functioning. Using a website can be challenging for people with disabilities, so we aimed to develop a unique way for people with different needs to access the internet. To create our project, we used Visual Studio Code, as well as modules and libraries such as pyttsx3, SpeechRecognition, Datetime, Wikipedia, Smtplib, and more.

2. LITERATURE REVIEW

In modern times, we teach our machines to emulate human thinking and perform tasks independently, resulting in machines replacing human labor. As a result, the concept of voice assistants has emerged, which can accomplish various tasks for humans based on voice commands. The virtual assistant has the ability to comprehend and sort out particular directives provided by the user, providing pertinent information in response.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 10 Issue: 03 | Mar 2023 www.irjet.net p-ISSN: 2395-0072 © 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page787

Voice control is becoming a new way for humans to interact with machines, in which analog speech signals are converted into digital form. Smartphone usage has skyrocketed in recent years, resulting in extensive use of voice assistants such as Apple's Siri, Google's Assistant, Microsoft's Cortana, and Amazon's Alexa. These voice assistants are created utilising technologies such as voice recognition, speech synthesis, and Natural Language Processing (NLP) to provide consumers with a variety of applications that make their lives easier and more pleasant.

Deepak Shende and Ria Umabiya stated that AIVA, which was introduced in 2018, is an intelligent assistant, along with Microsoft's Cortana and Google Assistant. Their objective is to create a personal assistant that is controlled by voice commands and can perform a variety of tasks, including conducting Internet searches and posting comments on social media platforms such as Facebook and Twitter. The assistant also has the capability to provide weather updates for the user's region. These things can readily be achieved with simple voice instructions. [5]

C. Vimala and V. Radha stated that among humans, speech is the most prevalent mode of communication. Automated speech recognition has become popular because humans tend to prefer machines that can communicate using speech, which is considered the most sophisticated method. DTW and HMM are the most commonly used speech recognition techniques. MFCC, which provides distinct dimensions of the sound signal, is used for speech feature extraction. Previous research has shown that MFCC is more accurate and realistic than other techniques used for mining voice characteristics. The study was conducted in MATLAB, and the results indicate that the machine is capable of detecting words with a high level of accuracy. [4]

Dr. Kshama V. Kulhalli conducted a survey to compare the performance of the top voice assistants, namely Google Assistant, Apple's Siri, and Microsoft's Cortana. The survey results indicated that Google Assistant provided the most accurate responses compared to the other assistants. Google Assistant was able to easily recognize variations in voice. [7]

Speech technology is a popular and versatile technology that can be used for various applications. It enables robots to interact with humans in a structured and appropriate way, providing useful services. The study covers the fundamentals of the speech recognition process and its various models and applications, as well as a description of the ongoing research into the various techniques used in speech recognition systems. Speech recognition systems continue to advance and have limitless applications. [6]

Tulshan suggested that frequent typing can lead to finger injuries. To prevent such issues, we require a system that can perform tasks based on voice commands. The system will recognize the user's voice, process the recognized words, and display them on the screen if they are relevant and meaningful. Following this, specific keywords will be recognized to compile and execute the program. [3]

3. TECHNOLOGIES USED

a. Python:

• Python is a popular high-level programming language recognized for its simplicity, readability, and user-friendliness.

• Python 3.10.0 is used to develop the voice assistant project.

b. Visual Studio Code:

• Microsoft has created a well-known open-source code editor called Visual Studio Code, often referred to as VS Code.

• VS Code supports a wide range of programming languages and offers features such as syntax highlighting, code completion, debugging, Git integration, and extensions that can be installed to add more functionality.


4. METHODOLOGY

Fig. 1: Basic workflow of model

The assistant has three modules. The first takes the user's voice input. The second analyses the input and translates it to the appropriate intent and function. The third returns the outcome to the user through speech. The assistant first begins receiving human input. When it receives the input, it transforms the analogue voice input into digital text. If the assistant is unable to turn the voice into text, it prompts the user for input once more. Once converted, the input is processed and mapped to a certain function. The output is then provided to the user by voice. The user's request or inquiry is chopped into separate commands, which makes it easy for our desktop voice assistant to recognize.

• The user's inquiry is compared against the entries in the command list.

• The voice assistant receives these orders through the command list.

• Once the voice assistant accepts or receives a command, it immediately determines the appropriate action to take.

• If the user's inquiry is not understandable, the voice assistant asks for clarification before proceeding.

• In particular, the voice assistant detects what we want to get.

• When the voice assistant recognises the command and determines that it can proceed, it provides the user with the necessary information.
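The flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SKYE's actual source: the command table COMMANDS, the helper names find_handler and run_once, the spoken replies, and the three-attempt limit are all hypothetical. The listen and speak callables are passed in as parameters so that the loop can be exercised without a microphone or a speech engine.

```python
from typing import Callable, Dict, Optional

# Hypothetical command table: keyword -> handler returning the spoken reply.
COMMANDS: Dict[str, Callable[[], str]] = {
    "date": lambda: "Here is today's date.",
    "open google": lambda: "Opening Google.",
}


def find_handler(text: str) -> Optional[Callable[[], str]]:
    """Return the handler whose keyword occurs in the recognized text."""
    lowered = text.lower()
    for keyword, handler in COMMANDS.items():
        if keyword in lowered:
            return handler
    return None


def run_once(listen: Callable[[], Optional[str]],
             speak: Callable[[str], None],
             max_attempts: int = 3) -> bool:
    """Listen, map the text to a command, and speak the result.

    `listen` returns None when speech could not be converted to text,
    in which case the user is prompted again, up to `max_attempts` times.
    """
    for _ in range(max_attempts):
        text = listen()
        if text is None:
            speak("Sorry, I did not catch that. Please repeat.")
            continue
        handler = find_handler(text)
        if handler is None:
            speak("I am not sure what you mean. Could you clarify?")
            continue
        speak(handler())
        return True
    return False
```

With stub callables, for example a listen function that first returns None and then a recognized sentence, the loop prompts once for a repeat and then speaks the reply of the matched command.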

For example, when a person says, "SKYE, open WhatsApp or Wikipedia," the voice assistant listens to the command and takes the appropriate action, such as opening the relevant website. After the user has finished speaking, the voice assistant pauses for a few seconds to ensure it has captured the entire request, and then it searches its database for the inquiry to provide the relevant result.
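A request like the one in this example might be broken into separate targets as sketched below. The paper does not describe how SKYE parses a sentence, so the wake-word handling, the splitting on "or" and "and", the names WAKE_WORD, SITES, split_targets and resolve, and the URLs are all assumptions made for illustration.

```python
import re
from typing import List

WAKE_WORD = "skye"  # assumption: the assistant's name doubles as the wake word

# Hypothetical targets; the paper does not list the URLs actually used.
SITES = {
    "whatsapp": "https://web.whatsapp.com",
    "wikipedia": "https://www.wikipedia.org",
}


def split_targets(utterance: str) -> List[str]:
    """Strip the wake word and the verb 'open', then split on 'or'/'and'."""
    text = utterance.lower().strip()
    text = re.sub(rf"^{WAKE_WORD}\W*", "", text)  # drop the leading "SKYE, "
    text = re.sub(r"^open\s+", "", text)          # drop the verb
    parts = re.split(r"\s+(?:or|and)\s+|\s*,\s*", text)
    return [p.strip() for p in parts if p.strip()]


def resolve(utterance: str) -> List[str]:
    """Return the URL of every known target named in the utterance."""
    return [SITES[t] for t in split_targets(utterance) if t in SITES]
```

Unknown targets are silently skipped here; a fuller assistant would more likely ask the user for clarification, as the methodology describes.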


The following are several modules employed to create a voice assistant capable of performing various common functions.

4.1 Packages used:

a. SpeechRecognition: The SpeechRecognition library is employed to capture spoken words from a microphone and process them to determine their meaning and convert them into text format. This library enables machine systems to comprehend and interpret human language.

Fig. 2: Detailed workflow of model

Fig. 3: Different packages used

b. Pyttsx3: The Pyttsx3 library, which stands for Python text to speech, is utilized to enable our voice assistant to communicate with us audibly. It supports various text to speech engines that can convert text into speech, allowing the voice assistant to speak to its user. We can select the voice to be male or female based on our preferences.

c. Wikipedia: To retrieve information from Wikipedia on any topic, perform a search, or seek solutions to a query, we must employ the Wikipedia library. This Python library requires an internet connection to obtain results, and it can present the findings to the user in both text and voice format.

d. Datetime: This module is crucial for facilitating date and time-related functionalities. It is useful when a user needs to access the current date and time or when they want to schedule a task at a specific time.

e. PyWhatKit: PyWhatKit is a Python library that offers several functionalities, including sending messages and images via WhatsApp, playing YouTube videos, converting images to ASCII art, sending emails, and more.

f. OS (Operating System): The os module in Python is used for interacting with the operating system. Specifically, we can use the os.startfile() function, which is available on Windows, to launch an installed application on our system.

g. webbrowser: The webbrowser module in Python offers a user-friendly way to open and show web pages in a web browser window. It provides the option to open a web page in either a new browser window or the current one, depending on your preferences.
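To make the roles of the first three packages concrete, the sketch below wires SpeechRecognition, pyttsx3 and the Wikipedia library into a listen, look up, speak round trip. It is an illustration under assumptions rather than SKYE's actual code: the function names, the filler-word list, the use of Google's free web recognizer, the timeout values and the speaking rate are all hypothetical choices. The third-party packages are imported inside the functions so that the pure helpers can be tried without a microphone, and sr.Microphone additionally needs the PyAudio package.

```python
from typing import Optional, Sequence

# Words assumed to carry no topic information in a spoken lookup request.
FILLERS = {"skye", "search", "wikipedia", "for", "about", "tell", "me"}


def extract_topic(command: str) -> str:
    """Drop filler words and punctuation so only the search topic remains."""
    words = [w.strip(",.?!") for w in command.lower().split()]
    return " ".join(w for w in words if w and w not in FILLERS)


def pick_voice_id(voices: Sequence, index: int = 0) -> str:
    """Return the id of the voice at `index`, falling back to the first voice.

    Which index is a male or a female voice depends on the platform.
    """
    if not voices:
        raise ValueError("no text-to-speech voices are installed")
    return voices[index].id if 0 <= index < len(voices) else voices[0].id


def listen_once(timeout: float = 5.0, phrase_time_limit: float = 8.0) -> Optional[str]:
    """Capture one utterance from the microphone; None if nothing usable was heard."""
    import speech_recognition as sr  # third-party: SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        try:
            audio = recognizer.listen(source, timeout=timeout,
                                      phrase_time_limit=phrase_time_limit)
        except sr.WaitTimeoutError:
            return None  # the user stayed silent
    try:
        # Assumption: Google's free web recognizer, which needs internet access.
        return recognizer.recognize_google(audio).lower()
    except (sr.UnknownValueError, sr.RequestError):
        return None


def speak(text: str, voice_index: int = 0, rate: int = 175) -> None:
    """Speak `text` aloud through the platform's default speech engine."""
    import pyttsx3  # third-party: pyttsx3

    engine = pyttsx3.init()
    engine.setProperty("voice", pick_voice_id(engine.getProperty("voices"), voice_index))
    engine.setProperty("rate", rate)  # words per minute
    engine.say(text)
    engine.runAndWait()


def wiki_answer(command: str, sentences: int = 2) -> str:
    """Return a short Wikipedia summary for the topic named in `command`."""
    topic = extract_topic(command)
    if not topic:
        return "Please tell me what to look up."
    import wikipedia  # third-party: wikipedia; needs internet access

    try:
        return wikipedia.summary(topic, sentences=sentences)
    except wikipedia.exceptions.DisambiguationError as err:
        return f"{topic} is ambiguous. Options include {', '.join(err.options[:3])}."
    except wikipedia.exceptions.PageError:
        return f"I could not find a Wikipedia page for {topic}."
```

A single round trip would then read `speak(wiki_answer(listen_once() or ""))`. The filler-word filter is deliberately naive and would also strip those words from inside a topic.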

5. RESULTS

a. Asking for today’s date

As shown in Fig. 4, we asked SKYE for the current date and it told us the current date.

Fig. 4: Output of screen for displaying date
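A date query of this kind, together with the time-based greeting mentioned in the introduction, can be sketched with the standard datetime module. The hour boundaries, the wording of the replies, and the function names greeting_for and spoken_date are assumptions chosen for illustration.

```python
from datetime import datetime
from typing import Optional


def greeting_for(hour: int) -> str:
    """Pick a greeting from the hour of the day (0-23); boundaries are assumed."""
    if 5 <= hour < 12:
        return "Good morning"
    if 12 <= hour < 17:
        return "Good afternoon"
    if 17 <= hour < 22:
        return "Good evening"
    return "Hello"


def spoken_date(now: Optional[datetime] = None) -> str:
    """Format the date as a sentence suitable for text-to-speech output."""
    now = now or datetime.now()
    return now.strftime("Today is %A, %d %B %Y")
```

Passing an explicit datetime makes the reply reproducible, while calling spoken_date() with no argument uses the system clock.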

b. Opening Google

As shown in Fig. 5, we asked SKYE to open Google. It receives the request and performs the action.

Fig. 5: Output of screen for Google search

c. Opening YouTube and searching for Tom and Jerry

Fig. 6: Output of screen for opening YouTube
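Opening a site or running a YouTube search, as in these results, can be sketched with the standard webbrowser and os modules. The helper names and the SITES table are hypothetical; os.startfile exists only on Windows, so the sketch guards that call.

```python
import os
import sys
import webbrowser
from urllib.parse import quote_plus

# Hypothetical site table for "open <name>" requests.
SITES = {
    "google": "https://www.google.com",
    "youtube": "https://www.youtube.com",
}


def youtube_search_url(query: str) -> str:
    """Build the URL of a YouTube results page for `query`."""
    return "https://www.youtube.com/results?search_query=" + quote_plus(query)


def open_site(name: str) -> bool:
    """Open a known site in a new browser tab; False if the name is unknown."""
    url = SITES.get(name.lower())
    if url is None:
        return False
    return webbrowser.open(url, new=2)  # new=2 requests a new tab


def launch_app(path: str) -> bool:
    """Launch a local application or file with its default handler (Windows only)."""
    if sys.platform.startswith("win") and os.path.exists(path):
        os.startfile(path)
        return True
    return False
```

For the request shown in Fig. 6, `webbrowser.open(youtube_search_url("tom and jerry"))` would open the corresponding results page.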

6. FUTURE SCOPE

At present, the program is restricted to the English language; however, there are intentions to broaden its accessibility to other languages shortly. The goal is to create reliable software that requires minimal typing and can be operated entirely through voice commands, providing a seamless user experience. To ensure widespread adoption, it's important to minimize the software's reliance on the local environment and operating system.

Our virtual assistant will soon have a read-aloud function that will allow individuals with disabilities to listen to and access desired information from various web resources. Currently, this feature is only available on PC, but in the future, it will be available on all devices. Additionally, the feature will be designed with ease of use and user-friendliness in mind for individuals with disabilities, ensuring that they do not need constant supervision to use it effectively.

7. CONCLUSIONS

AI-powered voice assistants for desktops have transformed the way we interact with our computers. By utilizing advanced algorithms and technologies, these assistants can understand and interpret user speech, allowing for more efficient and effortless task performance. With a broad range of functions such as messaging, calling, and playing music, these voice assistants have become an indispensable tool for many users. As technology progresses, we can anticipate these voice assistants to become even more sophisticated, further simplifying our lives. This project will benefit individuals of all ages and those with disabilities or unique circumstances. The personal voice assistant will be user-friendly and minimize the need for manual human efforts to accomplish various tasks. The current voice assistant system operates exclusively on desktops. However, the modular nature of the system allows for additional features to be added without disrupting the current system functionalities.

As shown in Fig. 7, we asked SKYE to open YouTube and search for a Tom and Jerry video. SKYE receives the request and performs the given task.

Fig. 7: Output of screen for running a video on YouTube

Fig. 8: Output of screen for running a video on YouTube

8. REFERENCES

[1] “ASKI The Virtual Desktop AI-Based Voice Assistant”, ISSN 2581-9429 Volume 02, Number 1 (2022).

[2] “Desktop voice guide using python and Artificial Intelligence,” ISSN 2582-5208 Volume 04, Number 5 (2022).

[3] Tulshan, Amrita & Dhage, Sudhir. (2019). “Survey on Virtual Assistant: Google Assistant, Siri, Cortana, Alexa”, 4th International Symposium SIRS 2018, Bangalore, India, September 19–22, 2018, Revised Selected Papers. 10.1007/978-981-13-5758-9_17.

[4] V. Radha and C. Vimala, “A review on speech recognition challenges and approaches,” doaj.org, vol. 2, no. 1, pp. 1–7, 2012.

[5] Deepak Shende, Ria Umabiya, Monika Raghorte, Aishwarya Bhisikar, Anup Bhange, "AI Based Voice Assistant Using Python", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN 2349-5162, Vol. 6, Issue 2, page no. 506-509, February-2019.

[6] Srivastava S., Prakash S. (2020) Security Enhancement of IoT Based Smart Home Using Hybrid Technique. In: Bhattacharjee A., Borgohain S., Soni B., Verma G., Gao XZ. (eds) Machine Learning, Image Processing, Network Security and Data Sciences. MIND 2020. Communications in Computer and Information Science, vol 1241. Springer, Singapore. https://doi.org/10.1007/978-981-15-6318-8_44

[7] Dr. Kshama V. Kulhalli, Dr. Kotrappa Sirbi, Mr. Abhijit J. Patankar, "Personal Assistant with Voice Recognition Intelligence", International Journal of Engineering Research and Technology. ISSN 0974-3154 Volume 10, Number 1 (2017)

[8] “Comparative Analysis of Smart Voice Assistants”, IEEE International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS) 2021.
