“SKYE: Voice-Based AI Desktop Assistant”
Hardik Muley1, Jaydeep Ghosh2, Ankit Lal Sinha3, Prof. Padmavati Sarode4
1,2,3 Student, Department of Computer Science & Engineering, GHRCEM, Pune, Maharashtra, India
4 Professor, Department of Computer Science & Engineering, GHRCEM, Pune, Maharashtra, India
***
Abstract - Technology is advancing rapidly. In the past, computers could perform only a limited set of tasks, but the emergence of technologies such as machine learning, artificial intelligence, and deep learning has advanced computer systems to the point where they can perform a very wide range of tasks. One goal of a popular technology, artificial intelligence (AI), is to achieve a natural dialogue between humans and machines. This paper presents our proposed voice assistant, which offers aid to individuals who are visually impaired or have other disabilities. A voice assistant is a software application that uses voice technology to comprehend spoken language and generate artificial vocal responses. The goal is to create a Python-based desktop voice assistant that can assist users with a wide range of tasks without requiring them to interact with a keyboard. The purpose of this project is to explore the intelligent behavior of voice assistants and their potential applications for both daily use and education.
Key Words: Python, speech recognition, voice assistant, TTS (Text to Speech), STT (Speech to Text), Desktop Assistant, Artificial Intelligence.
1. INTRODUCTION
In today's fast-paced world, we often find ourselves wanting to complete tasks more efficiently by using voice commands instead of relying solely on a keyboard and mouse. This is particularly useful for multitasking, and companies such as Google, Amazon, and Apple are working to make this technology more widely available. For instance, we can easily set reminders or alarms using voice commands. With this in mind, we have developed a platform that can be installed in any location and used to assist people with various tasks through voice-based communication.
A virtual assistant is an artificial intelligence technology designed to help users with basic tasks by responding to natural language commands. The system works by converting audio signals into digital data, which is then analyzed by the software. This personalized speech recognition project is built on Python and is capable of recognizing user commands, interacting with users, and completing tasks accordingly. Examples of tasks it can perform include greeting users based on the time of day or event, playing music, providing weather updates, analyzing weather conditions and advising users on whether it's safe to go out, opening applications and folders, creating new folders, changing directories, sending emails, and more. Using a virtual assistant to perform these tasks can save users a significant amount of time and effort.
Focusing on what is most important, whether it is personal or professional, is crucial. However, individuals often spend too much time on mundane tasks that can be automated with personal assistants. When people are not familiar with a work environment, they may have trouble locating necessary applications such as browsers, IDEs, or other software, and can waste hours searching for them. To mitigate this issue, a voice-enabled personal assistant can be employed to automate this process. The user simply issues a voice command, and the assistant takes care of the rest. Artificial intelligence-based voice assistants have numerous applications, including IT helpdesk, home automation, HR-related tasks, and voice-based search. Voice-based search is expected to be the future of next-generation technology, with users relying heavily on voice assistants for all their needs. These assistants will also be particularly useful for visually impaired individuals. Worldwide, 15% of the population has some form of disability, with 2-4% experiencing significant difficulties with their daily functioning. Using a website can be challenging for people with disabilities, so we aimed to develop a unique way for people with different needs to access the internet. To create our project, we used Visual Studio Code, as well as modules and libraries such as pyttsx3, SpeechRecognition, datetime, wikipedia, smtplib, and more.
2. LITERATURE REVIEW
In modern times, we teach our machines to emulate human thinking and perform tasks independently, resulting in machines replacing human labor. As a result, the concept of voice assistants has emerged, which can accomplish various tasks for humans based on voice commands. The virtual assistant has the ability to comprehend and sort out particular directives provided by the user, providing pertinent information in response.
Voice control is becoming a new way for humans to interact with machines, in which analog speech signals are converted into digital form. Smartphone usage has skyrocketed in recent years, resulting in extensive use of voice assistants such as Apple's Siri, Google's Assistant, Microsoft's Cortana, and Amazon's Alexa. These voice assistants are built using technologies such as voice recognition, speech synthesis, and Natural Language Processing (NLP) to provide consumers with a variety of applications that make their lives easier and more pleasant.
Deepak Shende and Ria Umabiya stated that AIVA, which was introduced in 2018, is an intelligent assistant, along with Microsoft's Cortana and Google Assistant. Their objective is to create a personal assistant that is controlled by voice commands and can perform a variety of tasks, including conducting Internet searches and posting comments on social media platforms such as Facebook and Twitter. The assistant also has the capability to provide weather updates for the user's region. Simple voice instructions can readily achieve these things. [5]
C. Vimala and V. Radha stated that among humans, speech is the most prevalent mode of communication. Automated speech recognition has become popular because humans tend to prefer machines that can communicate using speech, which is considered the most sophisticated method. Dynamic Time Warping (DTW) and Hidden Markov Models (HMM) are the most commonly used speech recognition techniques. Mel-Frequency Cepstral Coefficients (MFCC), which provide distinct dimensions of the sound signal, are used for speech feature extraction. Previous research has shown that MFCC is more accurate and realistic than other techniques used for extracting voice characteristics. The study was conducted in MATLAB, and the results indicate that the machine is capable of detecting words with a high level of accuracy. [4]
Dr. Kshama V. Kulhalli conducted a survey to compare the performance of the top voice assistants, namely Google Assistant, Apple's Siri, and Microsoft's Cortana. The survey results indicated that Google Assistant provided the most accurate responses compared to the other assistants. Google Assistant was able to easily recognize variations in voice. [7]
Speech technology is a popular and versatile technology that can be used for various applications. It enables robots to interact with humans in a structured and appropriate way, providing useful services. The study covers the fundamentals of the speech recognition process and its various models and applications, as well as a description of the ongoing research into the various techniques used in speech recognition systems. Speech recognition systems continue to advance and have limitless applications. [6]
Tulshan suggested that frequent typing can lead to finger injuries. To prevent such issues, we require a system that can perform tasks based on voice commands. The system will recognize the user's voice, process the recognized words, and display them on the screen if they are relevant and meaningful. Following this, specific keywords will be recognized to compile and execute the program. [3]
3. TECHNOLOGIES USED
a. Python:
Python is a popular high-level programming language known for its simplicity, readability, and ease of use.
Python 3.10.0 is used in the development of the voice assistant project.
b. Visual Studio Code:
Visual Studio Code, often referred to as VS Code, is a well-known open-source code editor created by Microsoft.
VS Code supports a wide range of programming languages and offers features such as syntax highlighting, code completion, debugging, Git integration, and extensions that can be installed to add more functionality.
4. METHODOLOGY
Fig. 1: Basic workflow of the model
There are three modules in this assistant. First, the assistant takes the user's voice input. Second, it analyses the input and maps it to the appropriate intent and function. Third, it delivers the outcome to the user through speech. The assistant first begins receiving human input. When it receives the input, it transforms the analogue voice input into digital text. If the assistant is unable to turn the voice into text, it prompts the user for input once more. Once converted, the input is processed and mapped to a certain function. The output is then provided to the user by voice. The user's request or inquiry is split into separate commands, which makes it easier for our desktop voice assistant to recognize.
The inquiry is compared against the entries in the command list.
The voice assistant receives these orders through the command list.
Once the voice assistant accepts or receives a command, it immediately determines the appropriate action to take.
If the user's inquiry is not understandable, the voice assistant asks for clarification before proceeding.
In particular, the voice assistant detects what the user wants to obtain.
When the voice assistant recognises the command and determines that it can proceed, it provides the user with the necessary information.
For example, when a person says, "SKYE, open WhatsApp or Wikipedia," the voice assistant listens to the command and takes the appropriate action, such as opening the relevant website. After the user has finished speaking, the voice assistant pauses for a few seconds to ensure it has captured the entire request, and then it searches its database for the inquiry to provide the relevant result.
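The workflow described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the names handle_request, listen, speak, and handlers are hypothetical, and speech input and output are passed in as plain functions so that the control flow (retry on failed recognition, keyword lookup in a command list, spoken reply) can be followed without any audio hardware.

```python
def handle_request(listen, speak, handlers, max_attempts=3):
    """Process one spoken request and return the reply that was spoken.

    listen   -- hypothetical callable returning recognised text, or None on failure
    speak    -- hypothetical callable that voices a string to the user
    handlers -- command list: maps a keyword to a function of the full command text
    """
    text = None
    for _ in range(max_attempts):
        text = listen()
        if text:
            break
        # Recognition failed, so prompt the user for input once more.
        speak("Sorry, I did not catch that. Please repeat.")
    if not text:
        return None

    # Search the command list for the first keyword contained in the request.
    for keyword, handler in handlers.items():
        if keyword in text:
            reply = handler(text)
            speak(reply)
            return reply

    # No entry in the command list matched the request.
    reply = "Sorry, I cannot help with that yet."
    speak(reply)
    return reply
```

Keeping the command list as a dictionary means a new capability can be added by registering one more keyword and handler, which fits the modular design the paper describes.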
The following are several modules employed to create a voice assistant capable of performing various common functions.
4.1 Packages used:
a. SpeechRecognition: The SpeechRecognition library is employed to capture spoken words from a microphone and process them to determine their meaning and convert them into text format. This library enables machine systems to comprehend and interpret human language.
b. Pyttsx3: The pyttsx3 library, which stands for Python text to speech, is utilized to enable our voice assistant to communicate with us audibly. It supports various text-to-speech engines that can convert text into speech, allowing the voice assistant to speak to its user. We can select the voice to be male or female based on our preferences.
c. Wikipedia: To retrieve information from Wikipedia on any topic, perform a search, or seek solutions to a query, we employ the wikipedia library. This Python library requires an internet connection to obtain results, and it can present the findings to the user in both text and voice format.
d. Datetime: This module is crucial for facilitating date and time-related functionalities. It is useful when a user needs to access the current date and time or when they want to schedule a task at a specific time.
e. PyWhatKit: PyWhatKit is a Python library that offers several functionalities, including sending messages and images via WhatsApp, playing YouTube videos, converting images to ASCII art, sending emails, and more.
f. OS (Operating System): The os module in Python is utilized for interacting with the operating system. Specifically, we can use the 'startfile()' function (available on Windows) to launch an installed application on our system.
g. webbrowser: The webbrowser module in Python offers a user-friendly way to open and show web pages in a web browser window. It provides the option to open a web page in either a new browser window or the current one, depending on your preferences.
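As an illustration of how the first two packages fit together, the sketch below wraps speech input and speech output in small functions. It is written under assumptions rather than taken from the SKYE source: the function names, the "skye" wake word handling, the choice of the recognize_google recogniser, and the "en-US" language code are all hypothetical. The third-party imports are deferred into the functions that need them, so the text-cleaning helper can be used on a machine without a microphone or the audio libraries installed.

```python
import re

WAKE_WORD = "skye"  # assumption: the assistant's name doubles as a wake word


def normalize_command(text):
    """Lower-case a transcript and drop a leading wake word and its punctuation."""
    cleaned = text.strip().lower()
    cleaned = re.sub(rf"^{WAKE_WORD}\b[\s,]*", "", cleaned)
    return cleaned.strip()


def listen_once(language="en-US"):
    """Capture one utterance from the default microphone; return text or None."""
    import speech_recognition as sr  # third-party SpeechRecognition package

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Calibrate briefly against background noise before listening.
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source)
    try:
        # Assumption: the online Google Web Speech recogniser is used.
        return normalize_command(recognizer.recognize_google(audio, language=language))
    except sr.UnknownValueError:
        return None  # the speech could not be understood
    except sr.RequestError:
        return None  # the recognition service could not be reached


def speak(text, voice_index=0):
    """Voice a string with pyttsx3, using one of the installed system voices."""
    import pyttsx3  # third-party text-to-speech package

    engine = pyttsx3.init()
    voices = engine.getProperty("voices")
    if voices:
        # Which index is a male or female voice depends on the operating system.
        engine.setProperty("voice", voices[min(voice_index, len(voices) - 1)].id)
    engine.say(text)
    engine.runAndWait()
```

Returning None from listen_once on any recognition failure lets the caller decide whether to prompt the user again, which matches the retry behaviour described in the methodology.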
5. RESULTS
a. Asking for today’s date
As shown in Fig. 4, we asked SKYE for the current date, and it told us the current date.
As shown in Fig. 5, we asked SKYE to open Google. It received the request and acted on it.
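Handlers for the two requests above might look like the following sketch. The function names, the SITES table, and the reply wording are hypothetical and are not taken from the SKYE source; only the datetime and webbrowser calls come from the Python standard library. The opener parameter exists so the function can be exercised without actually launching a browser.

```python
import datetime
import webbrowser

# Hypothetical lookup table from a spoken site name to the URL to open.
SITES = {
    "google": "https://www.google.com",
    "wikipedia": "https://www.wikipedia.org",
}


def date_reply(today=None):
    """Build the sentence the assistant would speak for the current date."""
    today = today or datetime.date.today()
    return today.strftime("Today is %A, %d %B %Y")


def open_site(command, opener=webbrowser.open):
    """Open the first known site named in the command and return a spoken reply."""
    lowered = command.lower()
    for name, url in SITES.items():
        if name in lowered:
            opener(url)  # by default this opens the page in the system browser
            return f"Opening {name}"
    return "I do not know that site."
```

A call such as open_site("open Google") would open the Google home page in the default browser and return the sentence for the assistant to speak.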
6. FUTURE SCOPE
At present, the program is restricted to the English language; however, there are intentions to broaden its accessibility to other languages shortly. The goal is to create reliable software that requires minimal typing and can be operated entirely through voice commands, providing a seamless user experience. To ensure widespread adoption, it's important to minimize the software's reliance on the local environment and operating system.
Our virtual assistant will soon have a read-aloud function that will allow individuals with disabilities to listen to and access desired information from various web resources. Currently, this feature is only available on PC, but in the future, it will be available on all devices. Additionally, the feature will be designed with ease of use and user-friendliness in mind for individuals with disabilities, ensuring that they do not need constant supervision to use it effectively.
7. CONCLUSIONS
AI-powered voice assistants for desktops have transformed the way we interact with our computers. By utilizing advanced algorithms and technologies, these assistants can understand and interpret user speech, allowing for more efficient and effortless task performance. With a broad range of functions such as messaging, calling, and playing music, these voice assistants have become an indispensable tool for many users. As technology progresses, we can anticipate these voice assistants to become even more sophisticated, further simplifying our lives. This project will benefit individuals of all ages and those with disabilities or unique circumstances. The personal voice assistant will be user-friendly and minimize the need for manual human efforts to accomplish various tasks. The current voice assistant system operates exclusively on desktops. However, the modular nature of the system allows for additional features to be added without disrupting the current system functionalities.
8. REFERENCES
[1] “ASKI The Virtual Desktop AI-Based Voice Assistant”, ISSN 2581-9429, Volume 02, Number 1 (2022).
[2] “Desktop voice guide using python and Artificial Intelligence,” ISSN 2582-5208, Volume 04, Number 5 (2022).
[3] Tulshan, Amrita & Dhage, Sudhir. (2019). “Survey on Virtual Assistant: Google Assistant, Siri, Cortana, Alexa”, 4th International Symposium SIRS 2018, Bangalore, India, September 19–22, 2018, Revised Selected Papers. 10.1007/978-981-13-5758-9_17.
[4] V. Radha and C. Vimala, “A review on speech recognition challenges and approaches,” doaj.org, vol. 2, no. 1, pp. 1–7, 2012.
[5] Deepak Shende, Ria Umabiya, Monika Raghorte, Aishwarya Bhisikar, Anup Bhange, "AI Based Voice Assistant Using Python", International Journal of Emerging Technologies and Innovative Research (www.jetir.org), ISSN 2349-5162, Vol. 6, Issue 2, page no. 506-509, February 2019.
[6] Srivastava S., Prakash S. (2020) Security Enhancement of IoT Based Smart Home Using Hybrid Technique. In: Bhattacharjee A., Borgohain S., Soni B., Verma G., Gao XZ. (eds) Machine Learning, Image Processing, Network Security and Data Sciences. MIND 2020. Communications in Computer and Information Science, vol 1241. Springer, Singapore. https://doi.org/10.1007/978-981-15-6318-8_44
[7] Dr. Kshama V. Kulhalli, Dr. Kotrappa Sirbi, Mr. Abhijit J. Patankar, "Personal Assistant with Voice Recognition Intelligence", International Journal of Engineering Research and Technology, ISSN 0974-3154, Volume 10, Number 1 (2017).
[8] “Comparative Analysis of Smart Voice Assistants”, IEEE International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), 2021.