DEEPFAKE DETECTION USING MACHINE LEARNING by IRJET Journal

DEEPFAKE DETECTION USING MACHINE LEARNING

Chandan Mahto1 , Prapti Khaparde2 , Priya Khatake3 , Dr. Rajesh Kadu4

1,2,3 Student at Mahatma Gandhi Mission’s College of Engineering and Technology, Navi Mumbai

4 Associate Professor at Mahatma Gandhi Mission’s College of Engineering and Technology, Navi Mumbai

Abstract – With the increasing misuse of artificial intelligence to generate deepfakes in the form of fake images, cloned voices, and manipulated videos, ensuring media authenticity has become a significant challenge. This paper presents a unified, machine learning-based, multi-modal deepfake detection system capable of detecting forgeries in audio, image, and video formats. For image-based detection, a pipeline using MTCNN for face detection and InceptionResNetV1 for classification is used. Audio deepfakes are detected using CNN models trained on mel spectrograms derived from the ASVspoof 2019 dataset. For video analysis, a ResNet-based frame feature extractor and an LSTM model, trained on the Celeb-DF dataset, are used to capture temporal inconsistencies. All detection models are integrated into a single user interface built with Streamlit, allowing users to input any media type and receive instant detection results. The system achieves high accuracy across all modalities and provides a practical, scalable solution for deepfake identification.

Key Words: Deepfake Detection, Multi-Modal Detection, Audio Forgery, Image Manipulation, Video Deepfakes, CNN, LSTM.

1. Introduction

The rapid advancement of AI-generated synthetic media, popularly known as deepfakes, has given rise to significant concerns around misinformation, identity theft, and manipulation of public discourse. Deepfakes can take the form of realistic synthetic videos, voice clones, or altered images that are often indistinguishable from genuine media to the human eye or ear.

In this work, we propose a comprehensive machine learning-based system capable of detecting deepfakes in three distinct modalities: image, audio, and video. Our system is backed by well-known datasets (ASVspoof 2019 and Celeb-DF) and architectures (MTCNN, InceptionResNetV1, ResNet, LSTM). It includes an intuitive UI built using Streamlit: the user uploads the media to be checked, the corresponding model classifies it as real or deepfake, and the result is displayed back to the user in the UI.

2. Motivation

Most existing deepfake detection systems focus on a single domain, such as only audio, only video, or only images. In real-world scenarios, however, manipulated content can appear in any format. The motivation behind this project is to build a separate model for each modality (one for deepfake images, one for deepfake audio, and one for deepfake video) and to combine them into a detection system that can detect deepfakes regardless of the input type and deliver results to the user in a simple and interactive way.

The primary goals of this system are:

1. To develop an accurate detection model for each media type, helping users detect deepfake images, audio, and video so that they can learn the reality behind the content, make the public aware of it, avoid the traps of manipulated content, and act responsibly.

2. To integrate all detection pipelines into a single user interface. Previously, users had to use a different UI for each content type, as there was no single UI where images, audio, and video could all be submitted in one place. We therefore provide a single UI where the user supplies the input for the respective model and learns whether it is real or deepfake.

3. To build a practical tool for general users that makes them better informed about deepfake content and protects them from its traps. It also makes society more responsive to deepfake content, since people can check whether content is genuine and see the real picture behind it. All three models are designed to give accurate results for the user's input with a short response time.

4. To raise awareness in society of deepfake content created using newly emerged technologies such as AI and ML. The detection system gives users more clarity about deepfake content and a better idea of how to protect themselves, their relatives, and other people from harmful situations.

Volume: 12 Issue: 04 | Apr 2025 www.irjet.net

3. Literature Survey

Andreas et al. [1] examine the realism of state-of-the-art image manipulations and how difficult it is to detect them, either automatically or by humans. After the data is collected and manipulated, convolutional neural networks (CNNs) are used to detect whether an image is fake or real.

Yuezun Li et al. [2] note that the need to develop and evaluate deepfake detection algorithms calls for large-scale datasets. However, the deepfake datasets available at the time suffer from low visual quality and do not resemble the deepfake videos circulated on the Internet. The use of DNNs has made the process of creating convincing fake videos increasingly easy and fast. In this work, they present a new large-scale and challenging deepfake video dataset, Celeb-DF, for the development and evaluation of deepfake detection algorithms.

Brian et al. [3] present the DFDC dataset, described as the largest publicly available face-swap video dataset at the time of its release. The dataset contains over 100,000 clips from 3,426 paid actors and was created using several deepfake, GAN-based, and non-learning techniques.

Kaede et al. [4] introduce new synthetic training data, dubbed self-blended images (SBIs), for identifying deepfakes. To replicate forgery artifacts, SBIs are created by blending source and target images that have been slightly altered from a single authentic image.

Nicolò et al. [6] take up the challenge of detecting face manipulation in video sequences produced by contemporary facial manipulation methods. Using more than 10,000 videos, a CNN-based approach is used to recognize fake videos.

[7] proposes a method for detecting facial forgery that operates at the mesoscopic level of analysis. Microscopic analysis based on image noise becomes inapplicable to video, because video compression degrades the image noise. Similarly, at a higher level it is difficult for the human eye to classify fake images, especially when the images show human faces. The authors therefore recommend a deep neural network with a small number of layers as an intermediate approach.

The authors of [8] employed a deep learning framework for audio deepfake detection. A long short-term memory (LSTM)-based network is used to recognize events in sub-sampled signals, which increases model separability.

[10] proposed a method that uses residual noise, defined as the difference between the original image and its noise-free version. Residual noise has been shown to be useful in deepfake detection due to its specificity and discriminative power, which can be exploited by neural networks with adaptive learning. The method was tested on two datasets: low-resolution FaceForensics++ videos and high-resolution videos from the Kaggle Deepfake Detection Challenge (DFDC). The authors propose an adaptive learning-based classifier that uses convolutional neural networks to learn the noise of real and fake videos.

4. Existing System

Existing systems do not use a single UI to display results for audio, video, and image detection. Users have to use a different UI and model for each of audio, image, and video.

1. Audio Deepfake Detectors: Rely mostly on signal processing and spectrogram analysis. Each is a single model that accepts only audio as input and no other input such as images.

2. Image Detectors: Use face detection followed by classification but do not address other media types, so users face problems when they want to check other kinds of content.

3. Video Detection Models: Focus on frame-level analysis or short sequences using LSTMs or 3D CNNs and cannot be used for image or audio detection.

Limitations:

• Lack of a unified system that supports all input types.

• No interactive UI for real-time testing.

• Performance may degrade when deepfakes come from newer generation models not present in the training data.

Overall, while there are many existing systems for deepfake detection, many either lack real-time adaptability, require complex infrastructure, or do not provide user-friendly outputs that let everyday users check content.

5. Proposed System

The proposed system aims to give users a more advanced and efficient way to detect deepfake content. To that end, we develop a single Streamlit-based UI where users can submit the content they wish to check and learn whether the given input is real or deepfake.

The proposed system is a multi-modal deepfake detection platform built using machine learning and deep learning techniques, integrated with a Streamlit-based user interface.


Components:

Image Detection Module: Uses MTCNN for face detection and InceptionResNetV1 for binary classification (real/fake). The user provides an image as input, and after detection the module displays whether the provided image is real or not.

Fig 1. Real to Fake image

Audio Detection Module: Converts input audio into mel spectrograms and classifies them using a CNN model. The user provides audio as input, and the module reports whether the audio is deepfake or not.

Fig 2. Audio detection Process

Video Detection Module: Extracts frames, uses ResNet to extract features, and applies an LSTM to model temporal relationships for classification. It lets users check whether a given video is manipulated or not.

User Interface:

Built using Streamlit to allow users to:

1. Upload image/audio/video files

2. View predictions instantly

Fig 3: Video Detection Process

All three modules are trained on their respective datasets and achieve accuracy above 85%, which helps users learn whether content is real or has been manipulated using newly emerging technologies.

6. Methodology

6.1 Audio Deepfake Detection:

Dataset: To train the model, we used the ASVspoof 2019 Logical Access dataset.

Preprocessing: Once the model is trained, it accepts audio as input; the audio is converted into mel spectrograms for further analysis and classification.

Model Architecture: CNN with convolutional, pooling, and dense layers.
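To make the mel-spectrogram step more concrete, the following minimal sketch shows how frequencies map onto the mel scale and where the centers of mel bands fall. It assumes the common HTK-style formula and an illustrative choice of 8 bands between 0 and 8000 Hz; the paper does not state which mel variant, band count, or frequency range was used, so these values and the function names are assumptions.

```python
import math
from typing import List


def hz_to_mel(freq_hz: float) -> float:
    """HTK-style mel scale: mel = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_centers(n_mels: int, f_min: float, f_max: float) -> List[float]:
    """Center frequencies (Hz) of n_mels bands spaced evenly on the mel scale."""
    low, high = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (high - low) / (n_mels + 1)
    return [mel_to_hz(low + step * (i + 1)) for i in range(n_mels)]


# Assumed, illustrative settings (not taken from the paper).
centers = mel_band_centers(n_mels=8, f_min=0.0, f_max=8000.0)
for index, center in enumerate(centers):
    print(f"band {index}: {center:.1f} Hz")
```

The bands are evenly spaced in mel units, so in Hz they are narrow at low frequencies and wider at high frequencies. A mel spectrogram sums short-time spectral energy inside such bands before the result is passed to the CNN.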

6.2 Image Deepfake Detection:

Face Detection: MTCNN locates and crops facial regions. The cropped face is then passed to the model, which determines whether the image is manipulated or not.

Model: InceptionResNetV1 fine-tuned for deepfake classification.
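The cropping step between face detection and classification can be sketched as follows. This is only an illustration under stated assumptions: the bounding box is taken as already produced by a face detector such as MTCNN, and the 20% margin, the box layout, and the helper name `expand_and_clamp` are hypothetical, since the paper does not describe how the crop is padded.

```python
from typing import Tuple

# (left, top, right, bottom) in pixels; this layout is an assumption.
Box = Tuple[int, int, int, int]


def expand_and_clamp(box: Box, image_size: Tuple[int, int], margin: float = 0.2) -> Box:
    """Grow a face box by `margin` of its size on each side, clamped to the image."""
    left, top, right, bottom = box
    width, height = image_size
    pad_x = int(round((right - left) * margin))
    pad_y = int(round((bottom - top) * margin))
    return (
        max(0, left - pad_x),
        max(0, top - pad_y),
        min(width, right + pad_x),
        min(height, bottom + pad_y),
    )


# Hypothetical detector output for a 256x256 image.
crop = expand_and_clamp((100, 80, 200, 220), image_size=(256, 256))
print(crop)
```

Keeping some context around the face can preserve blending boundaries, which are one place manipulation artifacts may appear; whether it helps in practice depends on how the classifier was trained.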

6.3 Video Deepfake Detection:

Dataset: The Celeb-DF dataset is used to train the video detection model.

Preprocessing: Once the model is trained, it accepts a video as input; frames and faces are extracted from the video for analysis.
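One simple way to choose which frames to extract is to sample them evenly across the clip. The sketch below assumes uniform sampling and a hypothetical helper name; the paper does not specify its sampling strategy or how many frames per video are used.

```python
from typing import List


def sample_frame_indices(total_frames: int, num_samples: int) -> List[int]:
    """Pick up to num_samples frame indices spread evenly across the clip."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    if num_samples >= total_frames:
        # Short clip: use every frame.
        return list(range(total_frames))
    step = total_frames / num_samples
    # Take the middle frame of each of the num_samples equal-length segments.
    return [int(step * i + step / 2) for i in range(num_samples)]


# Assumed example: a 300-frame clip reduced to 10 frames.
indices = sample_frame_indices(total_frames=300, num_samples=10)
print(indices)
```

A fixed number of frames per clip keeps the sequence length constant, which is convenient when the per-frame features are later fed to a recurrent model.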

Model: ResNet for spatial feature extraction; LSTM to capture temporal dynamics.
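The role of the LSTM can be illustrated with a small pure-Python sketch that runs a standard LSTM cell over a sequence of per-frame feature vectors and maps the final hidden state to a score. Everything here is an assumption for illustration: the weights are random and untrained, the feature vectors are tiny made-up stand-ins for ResNet features, and the function names are hypothetical. It is not the trained model described in this paper.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix: List[Vector], vector: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def init_params(input_dim: int, hidden_dim: int, seed: int = 0) -> Dict:
    """Random (untrained) weights; a real system would load trained ones."""
    rng = random.Random(seed)

    def rand_matrix(rows: int, cols: int) -> List[Vector]:
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    gates = "ifog"  # input gate, forget gate, output gate, candidate
    return {
        "W": {g: rand_matrix(hidden_dim, input_dim) for g in gates},
        "U": {g: rand_matrix(hidden_dim, hidden_dim) for g in gates},
        "b": {g: [0.0] * hidden_dim for g in gates},
        "w_out": [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)],
        "b_out": 0.0,
    }


def lstm_step(x: Vector, h: Vector, c: Vector, p: Dict) -> Tuple[Vector, Vector]:
    """One LSTM time step: returns the new hidden state and cell state."""

    def pre_activation(gate: str) -> Vector:
        wx = matvec(p["W"][gate], x)
        uh = matvec(p["U"][gate], h)
        return [a + b + bias for a, b, bias in zip(wx, uh, p["b"][gate])]

    i = [sigmoid(v) for v in pre_activation("i")]
    f = [sigmoid(v) for v in pre_activation("f")]
    o = [sigmoid(v) for v in pre_activation("o")]
    g = [math.tanh(v) for v in pre_activation("g")]
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new


def fake_probability(frame_features: List[Vector], p: Dict) -> float:
    """Run the LSTM over all frames and squash the last hidden state to (0, 1)."""
    hidden_dim = len(p["w_out"])
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    for x in frame_features:
        h, c = lstm_step(x, h, c, p)
    logit = sum(w * hj for w, hj in zip(p["w_out"], h)) + p["b_out"]
    return sigmoid(logit)


# Made-up 4-dimensional "frame features" for a 3-frame clip.
params = init_params(input_dim=4, hidden_dim=3)
clip = [[0.1, 0.2, 0.3, 0.4], [0.0, -0.1, 0.2, 0.1], [0.5, 0.5, -0.5, 0.0]]
score = fake_probability(clip, params)
print(f"score with untrained weights: {score:.3f}")
```

Because the cell state is carried from frame to frame, the final score depends on the order of the frames and not only on their individual content, which is what allows a trained model of this kind to pick up temporal inconsistencies.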


6.4 Streamlit UI Integration:

A simple and lightweight web interface lets users upload content and check its authenticity in one place. The user selects the input type (image/audio/video), uploads the file, and views the result in real time.
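The dispatch logic behind such an interface can be sketched independently of the UI framework. In the sketch below, the extension table, the function names, and the stand-in detectors are all hypothetical; the paper does not list the accepted file formats. In a Streamlit app the filename would typically come from the uploaded file object, and the returned label would be shown on the page.

```python
from pathlib import Path
from typing import Callable, Dict, Tuple

# Hypothetical extension table; the paper does not state the accepted formats.
MODALITY_BY_EXTENSION = {
    ".jpg": "image", ".jpeg": "image", ".png": "image",
    ".wav": "audio", ".flac": "audio", ".mp3": "audio",
    ".mp4": "video", ".avi": "video", ".mov": "video",
}


def detect_modality(filename: str) -> str:
    """Map a filename to 'image', 'audio', or 'video' by its extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in MODALITY_BY_EXTENSION:
        raise ValueError("Unsupported file type: " + filename)
    return MODALITY_BY_EXTENSION[suffix]


def route(filename: str, detectors: Dict[str, Callable[[str], str]]) -> Tuple[str, str]:
    """Call the detector registered for the file's modality."""
    modality = detect_modality(filename)
    return modality, detectors[modality](filename)


# Stand-in detectors that return fixed labels; real ones would run the models.
detectors = {
    "image": lambda path: "real",
    "audio": lambda path: "fake",
    "video": lambda path: "real",
}

print(route("interview.mp4", detectors))
```

Keeping the routing in a plain function makes it testable without starting the web app, and adding a new modality only requires a new table entry and detector.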

7. Result

The image model shows high sensitivity to facial manipulation artifacts. The audio model performs well on various types of voice spoofing. The video model successfully captures both frame-level and sequence-level manipulation. Performance metrics, including precision, recall, and F1-score, were also calculated for each model. The results indicate that the system is effective at distinguishing between real and fake media, with few false positives and false negatives.
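For reference, precision, recall, and F1-score for a binary real/fake classifier can be computed as in the sketch below. The labels are made-up toy data used only to exercise the function; they are not results from this paper, and treating "fake" as the positive class is an assumption.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "fake"
) -> Dict[str, float]:
    """Precision, recall, and F1 with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only.
y_true = ["fake", "fake", "fake", "real", "real", "real"]
y_pred = ["fake", "fake", "real", "fake", "real", "real"]
metrics = classification_metrics(y_true, y_pred)
print({name: round(value, 3) for name, value in metrics.items()})
```

Precision falls when real media is flagged as fake (false positives), and recall falls when fakes are missed (false negatives), so reporting both alongside F1 gives a fuller picture than accuracy alone.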

We built an interactive web application using Streamlit, fully integrated with the trained models. The web application allows users to:

• Give the input as image, audio, or video.

• Optionally view specific details for the image using Grad-CAM.

• Display the result as fake or real.
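The core Grad-CAM computation can be sketched in a few lines: each feature map of a convolutional layer is weighted by the spatial mean of the gradient of the class score with respect to that map, the weighted maps are summed, and negative values are clipped. The sketch below uses tiny hand-written activation and gradient maps as assumptions; in the real application these would come from the image model, and the paper does not say which layer is visualized.

```python
from typing import List

Map2D = List[List[float]]


def grad_cam(activations: List[Map2D], gradients: List[Map2D]) -> Map2D:
    """activations[k] and gradients[k] are H x W maps for channel k.

    gradients holds d(class score) / d(activation) for the same layer.
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight = spatial mean of that channel's gradient map.
    weights = [sum(sum(row) for row in grad) / (height * width) for grad in gradients]
    heatmap = [[0.0] * width for _ in range(height)]
    for weight, act in zip(weights, activations):
        for r in range(height):
            for c in range(width):
                heatmap[r][c] += weight * act[r][c]
    # ReLU: keep only regions that push the score up.
    heatmap = [[max(0.0, v) for v in row] for row in heatmap]
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[v / peak for v in row] for row in heatmap]
    return heatmap


# Two made-up 2x2 channels: the first supports the class, the second opposes it.
activations = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(activations, gradients)
print(heatmap)
```

In practice the resulting low-resolution map is upsampled to the input size and overlaid on the face crop, so the user can see which regions contributed most to the decision.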

Fig 4. Overall Flowchart of Project.

Fig 7.1: User Interface I

Fig 7.2: Image detection prediction result

International Research Journal of Engineering and Technology (IRJET) e-ISSN:2395-0056 p-ISSN:2395-0072


8. Conclusion

This paper introduces a unified deepfake detection system that supports audio, image, and video inputs. By using specialized machine learning models for each type and integrating them into an easy-to-use Streamlit interface, the system becomes a practical tool for detecting deepfakes in real-world scenarios. The multi-modal approach enhances robustness, and results show strong performance across all tested media formats. The system demonstrated high accuracy and real-time performance, making it a valuable tool for applications in security, media, and social platforms. The proposed approach represents a significant step toward creating unified, scalable solutions for deepfake detection across various media formats. These models help make people better informed and more responsible, reducing the proliferation of fake content in society and limiting the spread of defamation against any person.

9. Future Scope

A) Model Fusion: Combine the outputs of different models for ensemble decision-making.

B) Cross-Dataset Generalization: Improve robustness against unseen data from other deepfake generation techniques.

C) Mobile App: Deploy lightweight versions for mobile use.

D) Real-Time Streaming Detection: Extend video detection to live streams.

E) Explainability: Add explainable AI features to visualize manipulated regions or suspicious audio segments.

F) Model performance can be enhanced so that content authenticity is detected more accurately.
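As an illustration of item A, one simple form of model fusion is a weighted average of the fake probabilities produced by different models for the same item, for example the frames and the soundtrack of one clip. The sketch below is only one possible design: the weights, the 0.5 threshold, and the function names are assumptions, and no fusion is implemented in the current system.

```python
from typing import Dict


def fuse_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-model fake probabilities (late fusion)."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def decide(probability: float, threshold: float = 0.5) -> str:
    """Turn a fused probability into a label."""
    return "fake" if probability >= threshold else "real"


# Made-up scores for the video track and the audio track of one clip.
scores = {"video": 0.8, "audio": 0.4}
weights = {"video": 3.0, "audio": 1.0}
fused = fuse_scores(scores, weights)
print(decide(fused), round(fused, 3))
```

The weights could be set from each model's validation performance, so that the more reliable modality has more influence on the final decision.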

10. References

[1] Andreas Rossler, Davide Cozzolino, Luisa Verdoliva, Christian Riess, Justus Thies, Matthias Nießner. (2019). FaceForensics++: Learning to Detect Manipulated Facial Images. IEEE Conference Publication

[2] Yuezun Li, Xin Yang, Pu Sun, Honggang Qi and Siwei Lyu. (2020). Celeb-DF: A Large-scale Challenging Dataset for DeepFake Forensics. IEEE Conference Publication

[3] Brian Dolhansky, Joanna Bitton, Ben Pflaum, Jikuo Lu, Russ Howes, Menglin Wang, Cristian Canton Ferrer. (2020). The DeepFake Detection Challenge (DFDC) Dataset. arXiv:2006.07397 Vol 4

[4] Kaede Shiohara, Toshihiko Yamasaki. (2022). Detecting Deepfakes with Self-Blended Images. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

[5] Tolosana, R., Vera-Rodríguez, R., Fierrez, J., Morales, A., & Ortega-Garcia, J. (2020). DeepFakes and Beyond: A Survey of Face Manipulation and Fake Detection. arXiv:2001.00179.

Fig 7.3: Audio detection result

Fig 7.4: Video detection result


[6] Nicolò Bonettini, Daniele Cannas, Sara Mandelli, Luca Bondi, Paolo Bestagini, Stefano Tubaro. (2020). Video Face Manipulation Detection Through Ensemble of CNNs. 25th International Conference on Pattern Recognition (ICPR)

[7] Darius Afchar, Vincent Nozick, Junichi Yamagishi and Isao Echizen. MesoNet: a Compact Facial Video Forgery Detection Network. arXiv:1809.00888v1 [cs.CV], 4 Sep 2018.

[8] A. Abbasi, A. R. R. Javed, A. Yasin, Z. Jalil, N. Kryvinska, and U. Tariq, "A large-scale benchmark dataset for anomaly detection and rare event classification for audio forensics," IEEE Access, vol. 10, pp. 38885–38894, 2022.

[9] Z. Khanjani, G. Watson, and V. P. Janeja, "How deep are the fakes? Focusing on audio deepfake: A survey," arXiv preprint arXiv:2111.14203, 2021.

[10] A. Malik, M. Kuribayashi, S. M. Abdullahi and A. N. Khan, "DeepFake Detection for Human Face Images and Videos: A Survey," in IEEE Access, vol. 10, pp. 18757-18775, 2022, doi: 10.1109/ACCESS.2022.3151186. S. Hochreiter and J. Schmidhuber, "Long short-term memory," N.

[11] Raza A, Munir K, Almutairi M, "A novel deep learning approach for deepfake image detection" Applied Sciences 2022 Sep 29.

[12] Suratkar S, Kazi F, "Deep fake video detection using transfer learning approach" Arabian Journal for Science and Engineering. 2023 Aug 2021

[13] Khalil, Hady A., and Shady A. Maged. "Deepfakes creation and detection using deep learning." 2021 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC) IEEE, 2021

[14] Gupta G, Raja K, Gupta M, Jan T, Whiteside ST, Prasad M. "A Comprehensive Review of DeepFake Detection Using Advanced Machine Learning and Fusion Methods" Electronics. 2023 Dec 25 2020

[15] Passos LA, Jodas D, Costa KA, Souza Júnior LA, Rodrigues D, Del Ser J, Camacho D, Papa JP. "A review of deep learning‐based approaches for deepfake content detection" Expert Systems. 2022

[16] Chen B, Li T, Ding W. "Detecting deepfake videos based on spatiotemporal attention and convolutional LSTM". Information Sciences. 2022 Jul 1

[17] Masud U, Sadiq M, Masood S, Ahmad M, Abd El-Latif AA. "LW-DeepFakeNet: a lightweight time distributed CNN-LSTM network for real-time DeepFake video detection" Signal, Image and Video Processing. 2023 Nov;17

[18] Saikia, Pallabi, et al. "A hybrid CNN-LSTM model for video deepfake detection by leveraging optical flow features." 2022 International Joint Conference on Neural Networks (IJCNN). IEEE, 2022.

[19] Al-Dhabi, Yunes, and Shuang Zhang. "Deepfake video detection by combining convolutional neural network (CNN) and recurrent neural network (RNN)." 2021 IEEE International Conference on Computer Science, Artificial Intelligence and Electronic Engineering (CSAIEE). IEEE, 2021

[20] Zhang T. "Deepfake generation and detection, a survey" Multimedia Tools and Applications. 2022 Feb

[21] Rebello, Lian, et al. "Detection of Deepfake Video using Deep Learning and MesoNet." 2023 8th International Conference on Communication and Electronics Systems (ICCES). IEEE,