COMPUTER VISION-ENABLED GESTURE RECOGNITION FOR DIFFERENTLY-ABLED PEOPLE


International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 10 Issue: 01 | Jan 2023 www.irjet.net p-ISSN: 2395-0072


Department of Computer Science and Engineering, PSG College of Technology, Avinashi Road, Coimbatore, Tamil Nadu - 641004

Abstract - This research aims to lower the barrier of day-to-day communication for disabled people by developing a user-friendly, cost-effective system that determines the most appropriate character from the sign that the user shows to the system. About nine million people in the world are deaf and mute. Communication between differently-abled people and the general public has always been a challenging task, but sign language helps them communicate with other people. However, not everyone understands sign language, and this is where our system comes into the picture. Various machine-learning algorithms have been investigated to facilitate pattern identification and processing. Python is used to train the model using the image features that were obtained. In response to the sign presented to the model, the trained model predicts the most appropriate words using the datasets that are fed into the system, and the predicted words are generated as voice output. Sign language provides a way for speech-impaired and hearing-impaired people to communicate with other people. Instead of a voice, sign language uses gestures to communicate. Sign language is a standardized way of communication in which every word and alphabet is assigned to a distinct gesture. The solution aims to assist those in need, ensuring social relevance by offering an interface that facilitates simple and unrestricted communication between a wide range of people. Computer vision and machine learning researchers are now conducting intensive research in the area of image-based hand gesture identification, with the intention of making human-computer interaction (HCI) simpler and more natural without the use of additional devices. Therefore, the main objective of research on gesture recognition is to develop systems that can recognize specific human gestures and use them, for instance, to convey information.

Key Words: MediaPipe, OpenCV, SLR, gTTS, HCI

1. INTRODUCTION

A crucial application of gesture recognition is sign language detection. Current technologies for gesture recognition can be divided into two types: sensor-based and vision-based. In sensor-based methods, data gloves or motion sensors are incorporated, from which the data of gestures can be extracted. Even minute details of the gesture can be captured by the data-capturing glove, which ultimately enhances the performance of the system. However, this method requires wearing a data-capturing hand glove with embedded sensors, which makes it a bulky device to carry. It affects the signer's usual signing ability and also reduces user comfort. Vision-based methods rely on image processing. This approach provides a comfortable experience to the user: the image is captured with the help of cameras, and no extra devices are needed. This method deals with the attributes of the image, such as colour and texture, that are required for interpreting the gesture. Although the vision-based approach is straightforward, it faces many challenges, such as the complexity of the background, variations in illumination, and tracking other postures along with the hand object. Sign language provides a way for speech-impaired and hearing-impaired people to communicate with other people. Instead of a voice, sign language uses gestures to communicate. Sign language is a standardized way of communication in which every word and alphabet is assigned to a distinct gesture. It would be a win-win situation for both differently-abled people and the general public if a system were developed in which sign language could be converted into text/speech. Technology is advancing day after day, but no significant improvements have been undertaken for the betterment of specially-abled communities. Nine million people worldwide are both deaf and mute. Differently-abled persons and regular people have long found it difficult to communicate, but sign language makes it easier for them to do so. However, not everyone is fluent in sign language, which is where our approach will be useful.

1.1 CHALLENGES IN THE DAY-TO-DAY LIFE OF A BLIND PERSON

1. Large Sections of the Internet Are Unusable

Almost all of the world's main operating systems have some sort of software that can assist in turning the text on your screen into audio messages. Unfortunately, the majority of these accessibility solutions lack advanced technology. In other words, despite all the amazing things technology can achieve, if you are unable to see a webpage, you might have to listen to its full contents in linear order.

2. Braille Illiteracy Is Growing Due To A Lack Of Infrastructure

The worrying trend of braille illiteracy is a problem in the blind world. Braille is clunky, challenging to master, and sluggish to process, according to some blind readers. A braille book is much longer than the same book in print because braille takes up much more space on a page.

3. Withdrawing Money From An ATM Can Be Time-Consuming

Although the ability to use headphones to hear prompts is becoming more prevalent, it is still uncommon. Braille on the buttons does not help if you cannot see the matching messages and numbers on the screen, so an ATM that does not speak is effectively inaccessible to a blind user.

4. Lowered Expectations Can Be Life-Halting Hurdles

Due to widespread misperceptions about blindness, blind persons are frequently dismissed on a systemic level before they have the opportunity to show their potential. Blind persons are frequently employed by factories that pay them cents on the dollar for their labour after training them to perform menial activities in "sheltered workshops."

2. LITERATURE REVIEW

1. Conversion of Sign Language into Text - Mahesh Kumar N B

This paper revolves around the establishment of a general framework for creating an SLR model. SLR data processing involves sign representation, normalization and filtering, data formatting and organization, feature extraction and feature selection.

In this research, MATLAB is used to demonstrate the recognition of 26 hand motions in Indian sign language. The suggested system has four modules: pre-processing and hand segmentation, feature extraction, sign recognition, and sign-to-text conversion. The suggested system aids in the reduction of dimensions. It becomes more difficult to visualize the training set and subsequently work on it as the number of features increases. Sometimes, the majority of these features are redundant since they are correlated. Algorithms for dimensionality reduction are useful in this situation. Dimensionality reduction is the process of reducing the number of random variables under consideration by obtaining a set of principal variables. It can be split into two categories: feature extraction and feature selection.

BSL uses a two-handed fingerspelling system in the current systems, as opposed to ASL's (and FSL's) use of a one-handed approach.

Gesture recognition has several uses, including the understanding of sign language.

Sign language recognition has two different approaches:

- Glove-based approaches

- Vision-based approaches

Glove-based techniques: Signers in this category must put on a sensor glove or a coloured glove. Wearing gloves makes the segmentation task simpler. The disadvantage of this strategy is that the signer must wear the sensor hardware and the glove while the device is in operation. LDA is mostly utilised in machine learning, pattern identification, and statistics. It is employed to identify a linear combination of properties that distinguishes between at least two classes of objects or occurrences. LDA describes how to model the distinctions between the various data classes. In LDA, continuous measurements are taken on independent variables for each observation.

Demerits

• Designing SLR tools that approach 100% accuracy on a large vocabulary has not yet been achieved.

• Future emphasis should be paid more to the usability factor.

• Generating criticism (feedback).

• Techniques that allow for the quick identification of frequent mistakes while guaranteeing that user input is respected.

2. Deep Learning for Sign Language Recognition: Current Techniques, Benchmarks, and Open Issues - Muhammad Al-Qurishi, Thariq Khalid, Riad Souissi

The solution's input layer is made up of a hand sign presentation display and an input device based on the SLR data collection technique. The second layer is the pre-processing layer, which decodes a sign into the necessary data format and filters gesture data. There may also be extra stages, including sample normalization or integrating data from a video's subsequent frames. Feature extraction is the first operation the system performs after receiving sign data. Visual features, hand movement characteristics, 3D skeleton features, and face features are just a few examples of the many kinds of features that can be employed as the main source of information. One of the key elements affecting the SLR method's effectiveness is the choice of features to be used in the algorithm training. Before being supplied to the modelling layer, the data are normally processed and converted into a vector format.

Modules involved –

Data collection can be divided into three categories: hardware-based, vision-based and hybrid.

• Hardware-based methods make use of tools or wearable sensors. To recognize sign language, wearable technology frequently attaches sensors to the user or uses a glove-based system. These tools, whether they are gloves, rings, or sensors, can translate sign language into text or speech. Because cumbersome apparatus must be worn, sensor-based techniques are never natural. The authors instead suggest RealSense, a cutting-edge method that can naturally identify and track hand placements.

• Vision-based techniques are less restrictive on users than sensor-based ones. Recent SLR studies often rely on input in the form of visual data. They are limited by the subpar functionality of traditional cameras. Another issue is that simple hand features can lead to ambiguity, while complex features take more time to process.

• Hybrid - When compared to other methods, hybrid methods perform as well as or better. This method calls for hardware and vision-based modalities to be calibrated, which can be particularly difficult. This approach is quicker because it does not need to be retrained. Feature extraction is a crucial stage for all SLR models, since it influences how the models are trained and, consequently, how quickly they may become skilled at discerning between distinct signs or words. Features are always derived from raw data and correspond to the positions of body components crucial for sign language communication (certain spots on the hands and face). Features are determined using statistical methods with predetermined weights and expressed as vectors.

Essentially, feature selection means reducing the amount of information in the data to a select few relevant statistical characteristics, which are then fed into the machine learning network. The goal is to reduce the number of calculations required to produce an accurate forecast by only including those features that significantly improve the algorithm's ability to distinguish between distinct classes.

3. Signet: A Deep Learning based Indian Sign Language Recognition System - Sruthi C. J and Lijiya A

After reviewing several existing methods in sign language recognition, a Convolutional Neural Network architecture is implemented for Indian Sign Language static alphabet recognition from the binary silhouette of the signer's hand region. The proposed method was implemented successfully with an accuracy of 98.64%, which is superior to the majority of the currently used techniques. Hand gesture recognition problems are approached using either a glove-based method or a vision-based one. Since these particular devices capture data directly from signers, gloves offer good accuracy.

Based on features determined using various image or video processing techniques, vision-based systems perform the task of object recognition from images or videos.

Demerits:

The suggested architecture can eventually be expanded with new approaches and modules to create a fully automated sign language recognition system. Facial expression and context analysis will soon be incorporated into sign language recognition. The resulting recognition accuracy outperforms the majority of the available techniques.

4. Real Time Sign Language Recognition and Speech Generation - Kartik Shenoy, Tejas Dastane, Varun Rao, Devendra Vyavaharkar

Image processing and neural network-based sign language recognition systems are chosen because they are more accurate than gadget-based methods. The aim is to create a neural-network-trained sign language recognition system that is both user-friendly and accurate, generating both text and speech from the input gesture.

Merits:

• The proposed solution was put to the test in real-world scenarios, and it was demonstrated that the classification models that were created could identify every taught gesture.

• The system will be improved in the future, and trials using full language datasets will be conducted.

5. Indian Sign Language Recognition System - Yogeshwar Ishwar Rokade, Prashant Jadav

In gesture recognition, sign language recognition forms an important application. It consists of two different approaches.

1) Glove-based approach:

The signer in this case needs to wear a sensor glove or a coloured glove. Using the glove makes the segmentation phase easier. The drawback of this strategy is that the signer must carry the sensor hardware, including the glove, throughout the entire process.

2) Vision-based approach:

It makes use of image processing algorithms for detecting and tracking the hand signs, including the signer's facial expressions. This vision-based approach is simple since the signers need not wear additional hardware. In the proposed system, the vision-based approach is used.

PROPOSED APPROACH:

A. Image Acquisition

The signs used in the system are A, B, D, E, F, G, H, J, K, O, P, Q, S, T, X, Y and Z.

B. Hand Object Detection:

• Hand Segmentation

• Filter and Noise Removal

• Feature Extraction

• Classification

Table-1: Accuracy chart

6. Real-Time Indian Sign Language (ISL) Recognition - Kartik Shenoy, Tejas Dastane, Varun Rao, Devendra Vyavaharkar

The method described in this work uses grid-based characteristics to recognise hand positions and motions from the Indian Sign Language (ISL) in real-time. With the help of this system, the communication gap between the hearing- and speech-impaired and the general public is meant to be closed. The current solutions are either not real-time or offer just a modest level of accuracy; the output from this system is good for both parameters. It can recognise 33 hand positions and a few ISL movements. A smartphone camera records sign language, and the frames are sent to a remote server for processing. It is user-friendly because no other hardware is required, such as gloves or the Microsoft Kinect sensor. Hand detection and tracking employ methods including face detection, object stabilisation, and skin colour segmentation. A grid-based feature extraction approach is also used, representing the posture of the hand as a feature vector. The k-Nearest Neighbours method is then used to categorise hand positions. The motion and intermediate hand posture observation sequences, on the other hand, are given to Hidden Markov Model chains corresponding to the 12 pre-selected gestures defined in ISL for gesture categorization. The system can attain an accuracy of 99.7% for static hand postures and an accuracy of 97.23% for gestures using this methodology.

Dataset Used

Pre-Processing: Hand Extraction and Tracking

Feature Extraction: Grid-Based Fragmentation Technique

Classification: Recognition of ISL hand poses using kNN

• Gesture Classification using HMM

• Temporal Segmentation

Table-2: Accuracy chart

7. Text to Speech Conversion - S. Venkateswarlu, Duvvuri B K Kamesh Duvvuri, Sastry Jammalamadaka

The user can now hear the contents of text images rather than reading them, thanks to a novel, effective, and cost-beneficial technique proposed in this research. It combines the principles of a Text-to-Speech Synthesizer (TTS) and Optical Character Recognition (OCR) on the Raspberry Pi. People who are blind or visually challenged can efficiently communicate with computers using this type of device. In computer vision, text extraction from colour images is a difficult task. Using OCR technology, text-to-speech conversion reads English alphabets and numbers present in images and converts them into voice. The device's design, implementation, and experimental findings are covered in this publication. The voice processing module and the picture processing module make up this device. The device is built on the Raspberry Pi v2 platform, which provides a 900 MHz processor.

8. Design and Implementation of Text To Speech Conversion for Visually Impaired People - Itunuoluwa Isewon, Jelili Oyelade, Olufunke Oladipupo

There are two basic stages to the text-to-speech (TTS) synthesis process. First is text analysis, which converts the input text into a phonetic or other linguistic representation; second is speech waveform creation, which creates the output using this phonetic and prosodic data. Typically, these two stages are referred to as high-level and low-level synthesis. The input text could be, for instance, scanned text from a newspaper, standard ASCII from email, a mobile text message, or data from a word processor. The character string is then pre-processed and examined to produce a phonetic representation, which is typically a string of phonemes plus some extra information for proper intonation, length, and stress. The information from the high-level synthesiser is ultimately used to make speech sounds using the low-level synthesiser. There have been numerous reported mechanical attempts to produce speech-like sounds since the eighteenth century.

9. P. V. V. Kishore, P. Rajesh Kumar, E. Kiran Kumar & S. R. C. Kishore, 2011, Video Audio Interface for Recognizing Gestures of Indian Sign Language, International Journal of Image Processing (IJIP), Volume 5, Issue 4, 2011, pp. 479-503

The authors of the paper [12] suggested a system that can recognise ISL gestures from a video feed and translate them into English voice and text. They achieved this by employing a variety of image processing techniques, including edge detection, wavelet transform, and image fusion, to separate the shapes in the video stream. Shape features were extracted using Ellipsoidal Fourier descriptors, and the feature set was optimised and reduced using PCA. The system was trained using a fuzzy inference system, and it achieved an accuracy of 91%.

10. Adithya, V., Vinod, P. R., Gopalakrishnan, U., and IEEE (2013). "Artificial Neural Network Based Method for Indian Sign Language Recognition."

The authors of the paper [13] proposed a method for automatically identifying gestures in Indian sign language. The suggested method transforms the input image before using it to detect hands with the YCbCr colour space and digital image processing techniques. Some of the methods used to extract the features include distance transformation, projection of distance transformation coefficients, Fourier descriptors, and feature vectors. The data were classified using an artificial neural network, and the recognition rate was 91.11 percent.

3. METHODOLOGY

We have used MediaPipe Holistic for real-time image tracking. Separate models for pose, face, and hand components are individually integrated into the MediaPipe Holistic pipeline and are each customized for their respective fields. As a multi-stage pipeline, MediaPipe Holistic is made to treat each region with the proper image resolution. We start by collecting key points from MediaPipe Holistic, gathering data from key points on our hands, our body and our face, and saving the data in the form of NumPy arrays. We can vary the number of sequences according to our needs, but each sequence will have 30 frames.
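A minimal sketch of this key-point collection step is given below. The extract_keypoints helper and the per-component landmark counts (33 pose, 468 face and 21 landmarks per hand, as published for MediaPipe Holistic) are illustrative; the exact vector layout is an assumption rather than a fixed part of the system.

import cv2
import numpy as np
import mediapipe as mp

def extract_keypoints(results):
    # Flatten pose, face and hand landmarks into one fixed-length vector;
    # components that were not detected are zero-filled.
    pose = (np.array([[p.x, p.y, p.z, p.visibility] for p in results.pose_landmarks.landmark]).flatten()
            if results.pose_landmarks else np.zeros(33 * 4))
    face = (np.array([[p.x, p.y, p.z] for p in results.face_landmarks.landmark]).flatten()
            if results.face_landmarks else np.zeros(468 * 3))
    left = (np.array([[p.x, p.y, p.z] for p in results.left_hand_landmarks.landmark]).flatten()
            if results.left_hand_landmarks else np.zeros(21 * 3))
    right = (np.array([[p.x, p.y, p.z] for p in results.right_hand_landmarks.landmark]).flatten()
             if results.right_hand_landmarks else np.zeros(21 * 3))
    return np.concatenate([pose, face, left, right])    # 1662 values per frame

# Example: run Holistic on a single webcam frame and collect its key points.
cap = cv2.VideoCapture(0)
ok, frame = cap.read()
cap.release()
with mp.solutions.holistic.Holistic() as holistic:
    results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
keypoints = extract_keypoints(results)                   # shape: (1662,)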

• We then build an LSTM model to train with our stored data, which helps us to detect actions across several frames. The usage of LSTM is advantageous because LSTM networks are an extension of recurrent neural networks (RNNs) developed primarily to address failures in RNNs. LSTMs provide us with a large range of parameters such as learning rates, and input and output biases, so there is no need for precise modifications. With LSTMs, updating each weight is much more similar to Back Propagation Through Time (BPTT), reducing complexity to O(1).

• Image captioning is the process of analysing a picture and turning the results into sentences. For this, a dataset with a significant number of images and correspondingly detailed captions is needed. To forecast the characteristics of the photos in the dataset, a trained model is employed; this constitutes the photo data. The dataset is then analysed so that only the most interesting terms are included in it; this constitutes the text data. We try to fit the model with these two sorts of data. By using the image and the words previously predicted by the model, the model's task is to produce a descriptive sentence for the picture, one word at a time.

• We have utilized categorical classification for model training, as we include multiple classes of signs with no intrinsic ordering of the categories. The number of epochs for the model is determined by us; if we increase the number of epochs, the accuracy increases, but the time taken to run the model also increases and overfitting of the model can happen, which is problematic for gesture recognition.

• Once training is done, we can use this model for real-time hand gesture detection and simultaneously convert the gesture to speech using OpenCV. OpenCV is a significant open-source toolkit for image processing, machine learning, and computer vision. Python, C++, Java, and many other programming languages are supported by OpenCV. It can analyse pictures and videos to find faces, objects, and even human handwriting. It can also be integrated with various libraries, such as NumPy, a highly optimized library for numerical operations.

Fig-1: System flow – 1


Table-3: Comparison between existing and proposed model

4. SYSTEM ARCHITECTURE

4.1 Input Video Acquisition:

We have used OpenCV for capturing the input video through several devices such as a desktop webcam, phone camera and DSLR. Python comes with several libraries for processing images and videos, and OpenCV is one of them. OpenCV is a large library that provides a variety of methods for image and video operations. We can record video from the camera using OpenCV: it lets you create a video capture object, which is helpful for capturing video through a webcam and then editing it as you see fit. We can specify the countdown timer instead of choosing one of the predefined countdowns. When the appropriate key is pressed, the countdown timer is started, and we display the countdown on the camera feed using the cv2.putText() function. When the countdown reaches zero, we take an image, display it for a predetermined amount of time, and then write or save it to disk.
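A minimal sketch of this capture-with-countdown routine follows; the key bindings, countdown length and output filename are illustrative choices rather than fixed parts of the system.

import time
import cv2

cap = cv2.VideoCapture(0)          # webcam as the video source
countdown_secs = 3

while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow('capture', frame)
    key = cv2.waitKey(1) & 0xFF
    if key == ord('c'):            # start the countdown when 'c' is pressed
        end = time.time() + countdown_secs
        while time.time() < end:
            ok, frame = cap.read()
            remaining = int(end - time.time()) + 1
            cv2.putText(frame, str(remaining), (50, 80),
                        cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 3)
            cv2.imshow('capture', frame)
            cv2.waitKey(1)
        cv2.imshow('capture', frame)        # show the captured image briefly
        cv2.waitKey(2000)
        cv2.imwrite('capture.jpg', frame)   # save the frame to disk
    elif key == ord('q'):          # quit on 'q'
        break

cap.release()
cv2.destroyAllWindows()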

4.2 Feature extraction using Media Pipe Holistic:

We have used MediaPipe Holistic for feature extraction. The MediaPipe Holistic pipeline includes separate models for pose, face, and hand components that are each tailored for their particular domain. MediaPipe Holistic, a multi-stage pipeline, is designed to treat each region with the appropriate image resolution. Starting with key points from MediaPipe Holistic, we gather data from key points such as our hands, our body, and our face and record the information as NumPy arrays. The number of sequences can be changed to suit our needs, but each sequence will include 30 frames.
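The sketch below shows how such 30-frame key-point sequences might be gathered and stored as NumPy arrays; the gesture labels, directory layout and sequence count are placeholders, and extract_keypoints refers to the helper sketched in Section 3.

import os
import cv2
import numpy as np
import mediapipe as mp

ACTIONS = ['hello', 'thanks', 'yes']   # assumed gesture labels
NUM_SEQUENCES, SEQ_LEN = 30, 30        # 30 sequences of 30 frames per gesture
DATA_DIR = 'keypoint_data'

cap = cv2.VideoCapture(0)
with mp.solutions.holistic.Holistic() as holistic:
    for action in ACTIONS:
        for seq in range(NUM_SEQUENCES):
            out_dir = os.path.join(DATA_DIR, action, str(seq))
            os.makedirs(out_dir, exist_ok=True)
            for frame_idx in range(SEQ_LEN):
                ok, frame = cap.read()
                results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                keypoints = extract_keypoints(results)           # 1662 values per frame
                np.save(os.path.join(out_dir, f'{frame_idx}.npy'), keypoints)
cap.release()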

4.3 Training the input data with Long Short-Term Memory (LSTM) model:

Here, we employ a special class of RNN called an LSTM, which enables our network to learn long-term dependencies; training on our stored data enables us to recognize an action over several frames. The second technique involves sending the Inception CNN model's predicted labels to an LSTM, where the input to the LSTM is provided by the final hidden layer of the CNN. After extracting the bottleneck features, we utilized a network with a single layer of several LSTM units, then a fully connected layer using SoftMax activation. A regression layer is then applied.
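A rough sketch of this second technique is given below: a pre-trained Inception backbone is used as a frozen per-frame feature extractor whose bottleneck features feed a single LSTM layer and a fully connected SoftMax classifier. The layer sizes, clip length and number of classes are assumptions, not the exact configuration used.

from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D, LSTM, TimeDistributed
from tensorflow.keras.models import Sequential

NUM_CLASSES = 10                       # assumed number of sign classes

# Frozen Inception backbone acting as the bottleneck feature extractor.
backbone = InceptionV3(weights='imagenet', include_top=False,
                       input_shape=(224, 224, 3))
backbone.trainable = False

seq_model = Sequential([
    # Apply the CNN to every frame of a 30-frame clip.
    TimeDistributed(backbone, input_shape=(30, 224, 224, 3)),
    TimeDistributed(GlobalAveragePooling2D()),   # one feature vector per frame
    LSTM(256),                                   # single LSTM layer over the clip
    Dense(NUM_CLASSES, activation='softmax'),    # fully connected SoftMax classifier
])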

4.4 Prediction of gestures:

We choose the number of epochs for the model; while more epochs increase accuracy, they also lengthen the time it takes to run and increase the risk of overfitting the model, which is problematic for gesture recognition.

4.5 Usage of Text-to-Speech and Displaying output text on screen:

The gestures were translated into vocal output using Google Text-to-Speech. In addition to this, the output text is displayed on the screen. This whole process takes place again for recognizing a sentence.

Fig-2: System flow – 2

5. IMPLEMENTATION AND RESULT

5.1 Extract Media Pipe Holistic Key points

The MediaPipe Holistic pipeline integrates separate models for pose, face and hand components, each of which is optimized for its particular domain. However, because of their different specializations, the input to one component is not well-suited for the others. The pose estimation model is therefore run first, and its output is used to locate the regions that are passed to the hand and face models.
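For illustration, the sketch below runs MediaPipe Holistic on a webcam stream and overlays the landmarks produced by the separate pose and hand models (cf. Fig-3 below); the window name and confidence thresholds are arbitrary choices.

import cv2
import mediapipe as mp

mp_holistic = mp.solutions.holistic
mp_drawing = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        # Overlay the landmarks returned by the pose and hand models.
        mp_drawing.draw_landmarks(frame, results.pose_landmarks,
                                  mp_holistic.POSE_CONNECTIONS)
        mp_drawing.draw_landmarks(frame, results.left_hand_landmarks,
                                  mp_holistic.HAND_CONNECTIONS)
        mp_drawing.draw_landmarks(frame, results.right_hand_landmarks,
                                  mp_holistic.HAND_CONNECTIONS)
        cv2.imshow('MediaPipe Holistic', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()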

Fig-3: MediaPipe integration of separate models

5.2 Build a Sign Language model using Action Detection powered by LSTM layers

The vanilla LSTM network has three layers: an input layer and a single hidden layer followed by a standard feedforward output layer. The proposed system is designed to develop a real-time sign language detector using the TensorFlow object detection API and train it through transfer learning.
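The sketch below builds such a vanilla LSTM classifier in Keras and trains it with categorical cross-entropy; the input shape, number of classes, epoch count and the .npy file names are placeholder assumptions.

import numpy as np
from tensorflow.keras.layers import LSTM, Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.utils import to_categorical

SEQ_LEN, NUM_FEATURES, NUM_CLASSES = 30, 1662, 10   # assumed shapes

model = Sequential([
    LSTM(64, activation='relu',
         input_shape=(SEQ_LEN, NUM_FEATURES)),      # single hidden LSTM layer
    Dense(NUM_CLASSES, activation='softmax'),       # feedforward output layer
])

# Categorical classification: one-hot labels and categorical cross-entropy loss.
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])

X = np.load('sequences.npy')                        # assumed: (num_sequences, 30, 1662)
y = to_categorical(np.load('labels.npy'), num_classes=NUM_CLASSES)

# More epochs can raise accuracy but also training time and overfitting risk;
# a validation split helps expose overfitting during long runs.
model.fit(X, y, epochs=2000, validation_split=0.1)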


Fig-4: Building a model powered by LSTM layers

5.3 Predict sign language in real time using video sequences

The system makes real-time predictions of the signs shown by the user to the camera and appends the labels of the predictions to a sentence belt on the screen. It then voices out the prediction made.
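A condensed sketch of this prediction loop is shown below: the last 30 key-point frames are kept in a sliding window, the trained model predicts a sign, and confident, non-repeated predictions are appended to the sentence belt. The confidence threshold is arbitrary, and model, ACTIONS and extract_keypoints are assumed from the earlier sketches.

from collections import deque
import cv2
import numpy as np
import mediapipe as mp

window = deque(maxlen=30)     # sliding window of the last 30 key-point frames
sentence = []                 # predicted words shown on the sentence belt
THRESHOLD = 0.8

cap = cv2.VideoCapture(0)
with mp.solutions.holistic.Holistic() as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        window.append(extract_keypoints(results))
        if len(window) == 30:
            probs = model.predict(np.expand_dims(list(window), axis=0))[0]
            label = ACTIONS[int(np.argmax(probs))]
            # Append only confident, non-repeated predictions to the sentence.
            if probs.max() > THRESHOLD and (not sentence or sentence[-1] != label):
                sentence.append(label)
        cv2.putText(frame, ' '.join(sentence[-5:]), (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
        cv2.imshow('sign detection', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()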


Fig-5: Real-time detection and prediction

5.4 Voice Output for predictions made using gTTS

There are several APIs available to convert text to speech in Python. One such API is the Google Text-to-Speech API, commonly known as the gTTS API. gTTS is a very easy-to-use tool which converts the entered text into audio that can be saved as an mp3 file.
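A minimal example of voicing a predicted sentence with gTTS follows; the playback command is platform-dependent and only illustrative.

import os
from gtts import gTTS

def speak(text, filename='prediction.mp3'):
    # Convert the predicted text to speech and save it as an mp3 file.
    tts = gTTS(text=text, lang='en')
    tts.save(filename)
    # Playback is platform-dependent; 'mpg123' is just one possible player.
    os.system(f'mpg123 {filename}')

speak('hello thanks')   # voice out the predicted sentence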

Fig-6: Voice output for predictions made using gTTS

5.5 Training and Loss graphs

The model is trained with categorical classification for over 2000 epochs. The trained model is found to have an accuracy of 98.2%, and the loss over the training epochs is found to be nearly zero.

Chart-1: Categorical accuracy graphs

Chart-2: Training loss graphs

6. FUTURE SCOPE

There are several motivations to pursue further research and develop an improved version of the proposed system: enhancing the recognition capability under various lighting conditions, implementing and identifying a greater number of gestures while maintaining high accuracy, and applying this gesture recognition to accessing internet applications. We intend to broaden the scope of our domain scenarios and integrate our tracking system with a range of hardware, such as digital TVs and mobile devices. We want to make this technique accessible to a variety of people.

7. CONCLUSION

Our hand gesture recognition method acknowledges that constructing a database from scratch using video sequences and frames is time-consuming and that the process is sensitive to variations in gesture. It is a well-known issue that some people are unable to talk or hear.


The proposed technology provides a user interface that makes communicating with people with disabilities straightforward. The Long Short-Term Memory network has performed exceptionally well at recognizing sign language hand gestures in video clips. The technology recognizes sign language more rapidly and accurately than other techniques reported in the literature. Hand gestures, which are an effective form of human communication, have several potential uses in human-computer interaction. When compared to conventional technologies, vision-based hand gesture recognition algorithms offer several demonstrated advantages. The current work is only a small step towards achieving the outcomes required in the field of sign language recognition, as hand gesture recognition is a challenging problem. The chosen hand features worked well with machine learning techniques and could be applied in a variety of real-time sign language recognition systems. The system will continue to be improved, and trials using full language datasets will be conducted in the future. Where trustworthy solutions are needed for the problem of identifying the start and end of a gesture, it is also planned to evaluate systems that can read dynamic sign language gestures. The proposed approach is a strong foundation for the creation of any vision-based sign language recognition user interface system, even if there is still more work to be done in this field.

8. REFERENCES

[1] Mahesh Kumar N B, "Conversion of Sign Language into Text", International Journal of Applied Engineering Research, ISSN 0973-4562, Volume 13, Number 9 (2018)

[2] Muhammad Al-Qurishi, Thariq Khalid, Riad Souissi, "Deep Learning for Sign Language Recognition: Current Techniques, Benchmarks, and Open Issues", IEEE, vol. 9

[3] Muhammad Al-Qurishi, Thariq Khalid, Riad Souissi, "Deep Learning for Sign Language Recognition: Current Techniques, Benchmarks, and Open Issues", IEEE, vol. 9

[4] Sruthi C. J and Lijiya A, "Signet: A Deep Learning based Indian Sign Language Recognition System", 2019 International Conference

[5] Yogeshwar Ishwar Rokade, Prashant Jadav, "Indian Sign Language Recognition System", July 2017, International Journal of Engineering and Technology 9(3S):189-196

[6] Kartik Shenoy, Tejas Dastane, Varun Rao, Devendra Vyavaharkar, "Real-Time Indian Sign Language (ISL) Recognition", 9th ICCCNT 2018, July 10-12, 2018, IISc

[7] S. Venkateswarlu, Duvvuri B K Kamesh Duvvuri, Sastry Jammalamadaka, "Text to Speech Conversion", October 2016, Indian Journal of Science and Technology 9(38)

[8] Itunuoluwa Isewon, Jelili Oyelade, Olufunke Oladipupo, "Design and Implementation of Text To Speech Conversion for Visually Impaired People", International Journal of Applied Information Systems (IJAIS), Volume 7, No. 2, April 2014

[9] Al-Jarrah, A. Halawani, "Recognition of gestures in Arabic sign language using neuro-fuzzy systems", The Journal of Artificial Intelligence 133 (2001) 117-138

[10] Rajam, Subha & Ganesan, Balakrishnan (2011), "Real time Indian Sign Language Recognition System to aid deaf-dumb people", International Conference on Communication Technology Proceedings

[11] Geetha M., Manjusha U. C., "Vision Based Recognition of Indian Sign Language Alphabets and Numerals Using B-Spline Approximation", International Journal of Computer Science and Engineering (IJCSE)

[12] P. V. V. Kishore, P. Rajesh Kumar, E. Kiran Kumar & S. R. C. Kishore, "Video Audio Interface for Recognizing Gestures of Indian Sign Language", International Journal of Image Processing (IJIP), Volume 5, Issue 4, 2011

[13] Adithya, V., Vinod, P. R., Gopalakrishnan, U., and IEEE (2013), "Artificial Neural Network Based Method for Indian Sign Language Recognition"

[14] "A Deep Learning-based Indian Sign Language Recognition System", International Journal of Advanced Research in Science, Communication and Technology (IJARSCT), Volume 2, Issue 2, February 2022

