
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 09 Issue: 10 | Oct 2022 www.irjet.net p-ISSN: 2395-0072

Survey on Human Behavior Recognition using CNN

Anushree Raj1, Sadiya Ayub Humbarkar2, Sumedha E3

1 Assistant Professor- IT Department, AIMIT, Mangaluru, anushreeraj@staloysius.ac.in

2 MCA Student, AIMIT, Mangaluru, 2117097SADIYA@staloysius.ac.in

3 MCA Student, AIMIT, Mangaluru, 2117112SUMEDHA@staloysius.ac.in

Abstract: Human behavior recognition is a crucial area of scientific research in computer vision that has significant applications in a variety of industries, including intelligent surveillance, smart homes, and virtual reality. Traditional manual approaches have a hard time meeting the demands of high recognition accuracy and applicability in the contemporary complicated environment. Deep learning's arrival has opened up new avenues for behavior recognition research. The major focus of this paper is behavior recognition using convolutional neural networks (CNN). Before discussing and analyzing the classical learning methods and deep learning methods of behavior recognition, the research context and importance of behavior recognition are first introduced. Based on a convolutional neural network designed for specific human behavior in public areas, we develop a series of human behavior recognition systems. In order to extract the moving foreground characters of the body, the video of the human behavior dataset will first be divided into images, which will then be processed using the background removal approach. Second, the planned convolutional neural network is trained with the training data sets, and the deep learning network is built using stochastic gradient descent. Finally, using the developed network model, the numerous sample behaviors are categorized and recognized, and the recognition outcomes are evaluated against state-of-the-art techniques. The findings demonstrate that CNN is capable of learning human behavior models automatically and recognizing human behaviors without the need for manually annotated training.

Keywords: Convolutional Neural Network (CNN); deep learning; YOLOv3 algorithm; LSTM (Long Short-Term Memory networks); R-CNN (Region-Based Convolutional Neural Network)

1. INTRODUCTION

Human behavior recognition is the technique of classifying and recognizing human behaviors, such as activities or expressions, based on observations. Over the past few decades, the global aspects of a picture have become increasingly important in traditional human behavior identification. To characterize human behavior, these static elements include edge features, shape features, statistical features, and transform features. It is a method for categorizing and detecting activities based on observations, such as sensor data streams. Recognition of human behavior involves several processes, including detection, description, clustering, and recognition. Recently, the field of ubiquitous computing has made major advancements in the study of device-free human behavior recognition as well as behavior recognition in video and photos.

There are a number of techniques for recognizing human activity, including conventional techniques that rely on features retrieved from photos. Deep learning is the most advanced technology for object detection, processing and detecting vast amounts of picture data with the least amount of latency. The CNN model-based behavior recognition implementation has received a lot of attention.

A Convolutional Neural Network (CNN) is a class of deep learning models with a focus on computer vision, mostly used for object detection and picture processing. A lightweight convolutional neural network is created for the purpose of recognizing human behavior in order to lower the number of network parameters and reduce the demand for processing and storage resources. It is suggested to use a combined training approach that combines pre-training, fine-tuning training, and migration training to increase the deep CNN model's recognition performance. When we input an image into a CNN, it passes through numerous layers, and as each layer generates activation maps, the following layer receives them. The network may recognize increasingly complex elements, including objects, faces, etc., as we go deeper into it.
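The layer-by-layer feature extraction described above can be sketched with a plain NumPy convolution. The kernels and input image below are illustrative placeholders, not weights from any trained model; they only show how each layer's activation map feeds the next layer.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Nonlinear activation applied after each convolution layer."""
    return np.maximum(x, 0.0)

# Two stacked layers: an edge-detecting kernel followed by a smoothing kernel.
image = np.random.rand(8, 8)
edge_kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # vertical-edge detector
smooth_kernel = np.full((3, 3), 1.0 / 9.0)      # averaging kernel

layer1 = relu(conv2d(image, edge_kernel))     # 6x6 activation map (low-level edges)
layer2 = relu(conv2d(layer1, smooth_kernel))  # 4x4 map of higher-level responses
print(layer2.shape)  # (4, 4)
```

Real CNNs learn the kernel values by gradient descent instead of fixing them by hand, but the flow of activation maps from layer to layer is the same.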

2. OBJECTIVE

The objective of this paper is to show the capability of deep learning to implement human behavior recognition. To achieve the recognition of human behavior activities in real time, a Convolutional Neural Network (CNN) framework is established. Data augmentation is applied to the training dataset in the expectation of better prediction. The prime task of this research is to extract visual information from digital video in which human body movement needs to be captured. A myriad of digital video and image processing techniques is suitable for extracting information and factual characteristics from human behavior observation. This requires technologies such as digital image processing, pattern recognition, machine learning, and deep learning.

© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 654


3. RELATED WORKS

To recognize pedestrians, Jia Lu, Wei Qi Yan, and Minh Nguyen demonstrated a deep learning-based detection method. The study used the YOLO model, a deep learning technique that enables real-time detection. To reduce the time consumed while deep learning models are trained and tested, GPU acceleration is needed. Suitable hyperparameters should be carefully chosen in order to fine-tune the model, because various hyperparameters can influence the outcomes. Extending the YOLO detection approach should be the focus of future development. Deep learning demonstrates the capacity to recognize objects and assign each one to the appropriate class.

Shuqin Wang, Jerry Zeyu Gao, Hanping Lin, Mayur Shitole, Layla Reza, and Sheng Zhou propose well-defined emoji-based human behavior patterns to facilitate machine learning-based dynamic behavior classification and detection. The work concentrates on four different human actions: standing, moving quickly, moving slowly, and sitting. Additionally, a system is described to facilitate real-time human dynamic behavior identification and categorization based on the suggested machine learning model and emoji-based behavior patterns. The paper also presents some prior case study results for dynamic human behavior detection and classification utilizing emoji representation. Live streaming as well as pre-recorded videos can both be played on the system without any issues.

A deep network and HMM-based behavior recognition technique is proposed by Chen Chen. This study maximizes the benefits of traditional approaches to retain features by combining them with deep learning techniques. The benefits of deep networks (self-extraction, self-training, and time information processing) allow the suggested strategy to have a positive impact on the identification of interactive behavior. However, this method is still not timely due to the laborious manual extraction of features by conventional methods.

Zhengjie Wang and Yinjing Guo provide the current general methods of behavior identification, along with related surveys, the concept of channel state information (CSI), and an explanation of the principles of CSI-based behavior recognition. The paper also goes into great detail about the general framework for behavior recognition, including fundamental signal selection, signal preprocessing, and behavior identification techniques employing pattern-based, model-based, and deep learning-based approaches. The paper divides the existing research and applications into three categories based on the aforementioned recognition methodologies and describes each typical application in detail, including the test equipment, experimental situations, user count, observed behaviors, classifier, and system performance. Additionally, it examines a few particular applications and includes in-depth discussions on the choice of recognition methods and performance assessment. These discussions offer some valuable suggestions for creating an identification system.

In the AcFR active face recognition system of Masaki Nakada, Han Wang, and Demetri Terzopoulos, not much thought was given to which way the system would move when it decided to alter its perspective. This can be problematic, since the system might choose to examine the person from behind rather than the front, which is how individuals typically move to see a subject's face more clearly. For more accurate active face recognition, the direction of the face must be estimated.

Using EEG brain waves, Sumin Jin, Yungcheol Byun, and Sangyong Byun have suggested a method to identify specific human behaviors or actions. They identified six behaviors for this and recorded the brain waves associated with each behavior. They used CNN and LSTM models, and the studies revealed that they were able to recognize 66% of behaviors using EEG brain waves. This is a promising result given the complexity of the interaction between brain waves and behaviors. Due to dynamic information, the LSTM model produced a better outcome.

An enhanced deep learning-based method for identifying abnormal human behavior is proposed by Weihu Zhang and Chang Liu. This approach has a greater rate of recognition, extracts features more precisely, and yields a simpler model than the conventional approach. To obtain precise critical areas of human motion and an optical flow map, the Gaussian model is chosen, and the Farneback dense optical flow technique is applied. The benefits of CNN and LSTM are combined to produce an accurate recognition effect.

Franjo Matković, Darijan Marčetić, and Slobodan Ribarić offer a method for identifying motion patterns and abnormal crowd behavior in surveillance video. It is based on an analysis of fuzzy predicates and fuzzy logic formulas derived from human interpretation of real video sequences, (multi-agent) crowd simulators, and commonsense data. To identify and categorize motion patterns in line with the given taxonomy of fuzzy logic predicates, fuzzy logic predicates are used to analyze the motion patterns of an individual or a group of individuals. The detection and classification of unusual crowd behavior uses fuzzy logic functions. The fuzzy predicates serve as the fundamental building blocks of fuzzy logic functions, and the assignment functions for these predicates are created by expertly interpreting training video sequences in conjunction with fuzzy logic operators. Genuine trajectories obtained by the proposed pipelined multi-person tracker and ground truth annotations of actual video sequences are used to evaluate the proposed technique. Positive and reassuring results have been found in early tests.

To address the issue of long-term modelling in existing behavior recognition algorithms, Feng Xiufang and Dong Xiaoyu suggested a group feature behavior recognition algorithm based on the attention mechanism. The redundant frames between frames in video sequences are successfully


eliminated using sparse sampling. The original frame images are used to model spatial features in the CNN to efficiently extract motion change information. Progressive networks with pyramid pools are used to extract picture features during network training. The final feature vector is then produced by adding an attention layer after the video frame features have been consecutively encoded by Bi-GRU. The experimental results demonstrate that this paper's data features can effectively enhance the network's expressive ability, and that this paper's network structure can well model the long-term attention of videos.

4. METHODOLOGY

As illustrated here, feature extraction and the identification and comprehension of human behavior and motion are the two fundamental components of human behavior recognition. The process of feature extraction involves taking the important features from video or picture data. Since feature information is crucial for recognition, feature extraction has a direct bearing on the outcome of recognition.
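The foreground-extraction step mentioned in the abstract (splitting the video into frames and removing the static background) can be sketched as follows. The median-background model and the threshold value here are illustrative assumptions, since the paper does not specify the exact subtraction method; the toy clip and function name are ours.

```python
import numpy as np

def extract_foreground(frames, threshold=25):
    """Median-background subtraction: pixels far from the per-pixel median
    of the clip are marked as moving foreground."""
    frames = np.asarray(frames, dtype=np.float32)    # (T, H, W) grayscale frames
    background = np.median(frames, axis=0)           # static background estimate
    masks = np.abs(frames - background) > threshold  # boolean foreground masks
    return masks

# Toy clip: a bright 2x2 "person" moving across a dark background.
T, H, W = 5, 10, 10
clip = np.zeros((T, H, W))
for t in range(T):
    clip[t, 4:6, t:t + 2] = 200.0

masks = extract_foreground(clip)
print(masks[0].sum())  # 4 foreground pixels in the first frame
```

Each boolean mask isolates the moving body region, which is what the downstream CNN consumes in place of the raw frame.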

Data Collection

There are numerous publicly available datasets for the identification of human behavior, including the Weizmann dataset, UT-Interaction dataset, KTH dataset, UCF dataset, BEHAVE dataset, HMDB51 dataset, and MS COCO dataset. A brief summary of these datasets is included in Table 1 below.

Weizmann: consists of 90 videos of nine participants performing 10 distinct movements, including sprinting, jumping in place, forward jumping, bending, waving one hand, jumping jacks, side jumping, standing on one leg, strolling, and waving two hands.

UT-Interaction: contains footage of human-to-human encounters from the six classes of handshake, point, hug, push, kick, and punch, performed continuously.

KTH: contains six actions: walking, jogging, running, boxing, hand waving, and hand clapping. Each action is performed by 25 different people, and the setting is systematically changed for each actor's action to accommodate performance nuance.

UCF101: contains 13,320 video clips that are divided into 101 different categories. These 101 categories can be assigned to 5 types (body motion, human-human interactions, human-object interactions, playing musical instruments, and sports). The videos have all been compiled from YouTube.

BEHAVE: a collection of data on interactions between people and objects in the wild. It features 20 objects being used by 8 persons in 5 different natural settings.

HMDB51: a compilation of realistic footage taken from a range of media, including television and the web. 6,849 video clips from 51 action categories make up the collection.

MS COCO: a large collection of 328,000 pictures of people and common objects.

Table 1: Summary of the datasets

Data Pre-Processing

The raw data gathered by motion sensors needs to be pre-processed in the following ways in order to feed the suggested network with a fixed data dimension and increase the model's accuracy.

1) Linear Interpolation: Although the aforementioned datasets are accurate, the subjects wore wireless sensors, so some data could be lost during the collection procedure. NaN or 0 is commonly used to indicate the missing data in these circumstances. This problem was resolved by employing the linear interpolation method to fill in the missing values in this investigation.

2) Scaling and Normalization: It is essential to normalize the input data to the 0–1 range, because training models directly on large values from channels may fail.
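Both pre-processing steps can be sketched in a few lines of NumPy. The function names are our own, and the min-max scaling assumes each channel has a nonzero range:

```python
import numpy as np

def fill_missing(signal):
    """Linearly interpolate NaN gaps left by dropped sensor packets."""
    signal = np.asarray(signal, dtype=np.float64)
    nans = np.isnan(signal)
    idx = np.arange(len(signal))
    # Interpolate the missing samples from the surrounding known samples.
    signal[nans] = np.interp(idx[nans], idx[~nans], signal[~nans])
    return signal

def min_max_scale(signal):
    """Scale a channel to the 0-1 range before feeding it to the network."""
    lo, hi = signal.min(), signal.max()
    return (signal - lo) / (hi - lo)

raw = [1.0, np.nan, 3.0, 4.0, np.nan, 6.0]
clean = fill_missing(raw)      # [1, 2, 3, 4, 5, 6]
scaled = min_max_scale(clean)  # [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
print(scaled)
```

In practice each sensor channel would be scaled independently so that one high-magnitude channel does not dominate the network input.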

Data Augmentation

A large-scale dataset is the premise of a successful application of convolutional neural networks (CNNs). In order to create training samples and increase the size of the training dataset, data augmentation methods alter the training image in a number of random ways. Increasing the depth and width of a neural network typically improves its learning capacity, making it easier to fit the distribution of the training data. Our research demonstrates that in the convolutional neural network, depth is more significant than width. However, as the depth of neural networks increases, so do the parameters that must be learned, which will result in overfitting. Too many parameters will fit the properties of the dataset when it is small. The data augmentation consists of random noise, scale, rotation, and crop.
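A minimal sketch of the four augmentations named above, written in plain NumPy; the crop size, noise level, and scale range are illustrative choices of ours, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image, rng):
    """Apply the four augmentations: random noise, scale, rotation, crop."""
    out = image + rng.normal(0.0, 0.02, image.shape)  # additive random noise
    out = out * rng.uniform(0.9, 1.1)                 # random intensity scale
    out = np.rot90(out, k=rng.integers(0, 4))         # random 90-degree rotation
    h, w = out.shape
    top = rng.integers(0, h - 24 + 1)                 # random 24x24 crop
    left = rng.integers(0, w - 24 + 1)
    return out[top:top + 24, left:left + 24]

# Eight augmented variants generated from a single 32x32 frame.
frame = rng.random((32, 32))
batch = np.stack([augment(frame, rng) for _ in range(8)])
print(batch.shape)  # (8, 24, 24)
```

Because every variant is derived randomly, one original frame yields many distinct training samples, which is exactly how augmentation counters overfitting on a small dataset.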

[Figure: recognition pipeline: Action Video Sequence → Extract Feature → Behavior Recognition]


Proposed Method

We suggest a YOLOv3 strategy for identifying and classifying dynamic human behavior patterns, which is motivated by prior research and methodologies. YOLOv3 (You Only Look Once, Version 3) is a real-time object detection system that recognizes particular things in videos, live feeds, or still photos. To find an item, the YOLO machine learning system leverages features that a deep convolutional neural network has learned.

YOLO is a Convolutional Neural Network (CNN) capable of quickly recognizing things. CNNs are classifier-based systems that are able to examine input images as organized arrays of data and find connections between them. YOLO has the advantage of being faster than other networks while keeping accuracy. The model can see the complete image at test time, which helps it make more accurate predictions. Regions are scored by YOLO and other convolutional neural network algorithms based on how closely they resemble predetermined classifications.

Initially, the YOLOv3 algorithm divides an image into a grid. Each grid cell predicts the presence of a specific number of boundary boxes (also known as anchor boxes) around items that score well in the aforementioned predetermined classes. Only one object is detected by each boundary box, which has a corresponding confidence score indicating how correct it expects that prediction to be. Before creating the boundary boxes, the dimensions of the ground truth boxes from the original dataset are clustered to find the most prevalent sizes and shapes.
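The anchor-clustering step can be sketched with plain k-means over the ground-truth (width, height) pairs. Note this is a simplification: YOLOv3 clusters with a (1 - IoU) distance rather than the Euclidean distance used here, and the toy boxes below are invented for illustration.

```python
import numpy as np

def kmeans_anchors(box_wh, k=3, iters=50, seed=0):
    """Plain k-means over (width, height) pairs to find the k most common
    box shapes, used to seed the anchor (boundary) boxes."""
    rng = np.random.default_rng(seed)
    centers = box_wh[rng.choice(len(box_wh), k, replace=False)]
    for _ in range(iters):
        # Distance of every box to every current center, then reassign.
        d = np.linalg.norm(box_wh[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = box_wh[assign == j].mean(axis=0)
    return centers

# Toy ground-truth boxes: tall "pedestrian" shapes plus a few wide ones.
boxes = np.array([[20, 60], [22, 58], [18, 62],
                  [60, 20], [58, 22], [30, 30]], dtype=float)
anchors = kmeans_anchors(boxes, k=3)
print(np.round(anchors))
```

The resulting cluster centers become the prior box shapes each grid cell adjusts at detection time, so anchors matching the dataset's dominant shapes (here, tall pedestrian boxes) make the regression easier.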

R-CNN (Region-based Convolutional Neural Networks, introduced in 2014), Fast R-CNN (an R-CNN upgrade from 2015), and Mask R-CNN (2017) are further comparable algorithms that can accomplish the same goal. However, in contrast to systems like R-CNN and Fast R-CNN, YOLO is trained to do classification and bounding box regression simultaneously.

5. ANALYSIS

Compared to past studies of behavior recognition using CNNs, this study gives more insight with respect to accuracy on human behaviors. With this approach, it is possible to anticipate human behavior more accurately from raw data while also simplifying the model and doing away with the necessity for sophisticated feature engineering. Datasets are selected on the basis of accuracy, complexity, and the features that can be obtained from them. This study can be further extended and implemented for different behaviors. Although many have implemented this recognition method using different models such as LSTM, YOLO, and R-CNN, the most important features are obtained from the CNN model.

6. CONCLUSION

In this research study, human behavior and activity are recognized using a convolutional neural network. Human behavior identification is a complex task, which is why a series of images has been analyzed so that every moment can be captured for analysis and prediction. A larger dataset is prepared by the data augmentation process; more training data makes the system robust and more accurate. The deep learning process handles multidimensional input data and has the capability of hierarchical, sequential calculation along with an adaptation process. This ability makes it suitable for behavior analysis. To make the network understand, the video clip is converted into a series of images so that the machine can learn deeply from every moment of human activity. In comparison to other approaches, the suggested approach can efficiently cut down on complexity while maintaining network performance. Future research can develop a new parameter selection technique to boost recognition performance even more. The deep learning network has the scope of weight updating, which is useful for identifying dynamic behavior resulting from behavior change activity.

REFERENCES

[1] Jia Lu, Wei Qi Yan and Minh Nguyen, "Human Behaviour Recognition Using Deep Learning", 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS).

[2] Shuqin Wang, Jerry Zeyu Gao, Hanping Lin, Mayur Shitole, Layla Reza, Sheng Zhou, "Dynamic Human Behavior Pattern Detection and Classification", 2019 IEEE Fifth International Conference on Big Data Computing Service and Applications (BigDataService).

[3] An Gong, Chen Chen, and Mengtang Peng, "Human Interaction Recognition Based on Deep Learning and HMM", IEEE Access, 2019 (Volume: 7).

[4] Zhengjie Wang, Kangkang Jiang, Yushan Hou, Wenwen Dou, Chengming Zhang, Zehua Huang, and Yinjing Guo, "A Survey on Human Behavior Recognition Using Channel State Information", IEEE Access, 2019 (Volume: 7).

[5] Chenxi Huang, Yutian Xiao, and Gaowei Xu, "Predicting Human Intention-Behavior Through EEG Signal Analysis Using Multi-Scale CNN", IEEE/ACM Transactions on Computational Biology and Bioinformatics, IEEE 2020 (Volume: 18).

[6] Masaki Nakada, Han Wang, Demetri Terzopoulos, "AcFR: Active Face Recognition Using Convolutional Neural Networks", 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).


[7] Sumin Jin, Yungcheol Byun, Sangyong Byun, "Analysis of Brain Waves for Detecting Behaviors", 2018 International Conference on Intelligent Informatics and Biomedical Sciences (ICIIBMS), Volume: 3.

[8] Weihu Zhang, Chang Liu, "Research on Human Abnormal Behavior Detection Based on Deep Learning", 2020 International Conference on Virtual Reality and Intelligent Systems (ICVRIS).

[9] Franjo Matković, Darijan Marčetić, Slobodan Ribarić, "Abnormal Crowd Behaviour Recognition in Surveillance Videos", 2019 15th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS).

[10] Feng Xiufang, Dong Xiaoyu, "Research on Human Behavior Recognition Method Based on Static and Dynamic History Sequence", 2020 Eighth International Conference on Advanced Cloud and Big Data (CBD).

