A SURVEY ON DEEPFAKES CREATION AND DETECTION
Rajnandini Santosh Bhingare, Sunil Hirekhan
Department of Electronics and Telecommunication, Government College of Engineering, Aurangabad.
Abstract: In the evolving world of technology, deep learning has been skillfully employed to solve many kinds of complex problems that humans cannot solve. However, deep learning has also been applied to generate fake content that can threaten democracy, privacy and security. One such deep-learning-powered approach is "DeepFake". DeepFake techniques can generate fake images and videos that humans cannot distinguish from real media. This may threaten world security as well as privacy. The malicious use of this technique increasingly exceeds its positive use. To prevent and control this threat, various researchers have worked on detecting it.
In this survey paper, we review manipulation techniques and the types of DeepFake creation and detection, with reference to previous work on DeepFakes.
I. INTRODUCTION
Fake images and videos that include facial data are produced with digital manipulation techniques, specifically DeepFake techniques. The word DeepFake combines "deep learning" and "fake". Deepfake is a method in which fake images or videos are created by swapping the face of one person with the face of another person. Deepfake images can be created with easy-to-use yet powerful tools such as GANs (Generative Adversarial Networks). To manipulate public belief, many unreal things such as fake news and celebrity videos or images with swapped faces can be created using the DeepFake method [1], [2].
In 2017, the first deepfake video was created by swapping the face of a well-known person onto a porn actor. Videos of speeches by famous leaders were created for disinformation purposes, which was alarming for public trust worldwide. Some DeepFakes are not very hard to detect, as they are created for entertainment. Nevertheless, verifying digital integrity is tough if the Deepfake image or video involves an ordinary person.
Research on the detection of DeepFake images and videos is growing rapidly, and establishing the truth in the digital domain is becoming more and more critical. Fake detection is pursued by international projects such as MediFor (Media Forensics), supported by DARPA (Defense Advanced Research Projects Agency). NIST (National Institute of Standards and Technology) also started the Media Forensics Challenge (MFC18). Facebook has been working on deepfake detection models, and its efforts are accelerating detection and verification. Recently, Facebook launched the DeepFake Detection Challenge (DFDC) in collaboration with Microsoft, and the top-performing AI model in the challenge could detect artificial videos with 82.56% accuracy [3].
II. DeepFake Manipulation Techniques
Depending on the level of manipulation, facial manipulations can be classified into four categories, namely: 1] Entire Face Synthesis 2] Identity Swap 3] Attribute Manipulation 4] Expression Swap
1. Entire Face Synthesis:
Using a significant deep learning technique, the GAN, this technique by T. Karras et al. generates completely non-existent face images. Recently, StyleGAN has generated excellent face images with a high degree of realism [7].
2. Identity Swap:
Identity swap is carried out with the help of two different methods: i) FaceSwap and ii) DeepFake. In this type of manipulation, the face of a source person in a video is swapped with the face of a target person.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 06 | June 2022 www.irjet.net p-ISSN: 2395-0072
3. Attribute Manipulation:
For this manipulation, StarGAN by Y. Choi et al. [9] is used. The technique, as studied by E. Gonzalez et al. [8], changes features of the face such as hair or skin color, the gender of a person, or the addition or removal of glasses. This method of manipulation is also known as the face modifying technique.
4. Expression Swap:
Generally, this technique focuses on Face2Face and NeuralTextures by J. Thies et al. [10]. In this type of manipulation expressions are modified, i.e., the expression of a source face is swapped with the expression of a target face [11].
Fig 1: Examples of the four manipulation types. Entire face synthesis: real images from http://www.whichfaceisreal.com/ and fake images from https://thispersondoesnotexist.com. Attribute manipulation: real images are extracted from http://www.whichfaceisreal.com/ and fake images are generated using FaceApp. Identity swap: face images are extracted from the Celeb-DF database. Expression swap: images are extracted from FaceForensics++.
III. Generation of DeepFakes
Day by day, more techniques are introduced for manipulating visual content. DeepFakes have become popular because of the high quality of the altered videos and because the tools can be used by people with widely varying computer skills. The most common procedures for generating fake content are adding, detaching or removing objects in an image.
Fig 2: Image and Video Manipulation [1].
To make the altered visual content more plausible, operations such as splicing, copy-move and inpainting can be performed by various DeepFake tools that are in extensive use. [Splicing: inserting a new object by copying it from another image. Copy-move: copying an object within the same image. Inpainting: filling in missing data or information to complete an image.]
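The three operations defined above can be illustrated on toy grayscale images stored as lists of rows. This is only a minimal sketch: the function names, the rectangular regions and the neighbour-averaging fill are assumptions made for illustration, and real tools use far more sophisticated blending and synthesis.

```python
# Toy grayscale images as lists of rows; every name here is illustrative.

def crop(img, top, left, h, w):
    """Return an h x w patch of img starting at (top, left)."""
    return [row[left:left + w] for row in img[top:top + h]]

def paste(img, patch, top, left):
    """Return a copy of img with patch written at (top, left); patch must fit."""
    out = [row[:] for row in img]
    for r, patch_row in enumerate(patch):
        out[top + r][left:left + len(patch_row)] = patch_row
    return out

def splice(target, donor, src, dst, size):
    """Splicing: copy an object from a different image into target."""
    return paste(target, crop(donor, src[0], src[1], size[0], size[1]), dst[0], dst[1])

def copy_move(img, src, dst, size):
    """Copy-move: duplicate an object within the same image."""
    return paste(img, crop(img, src[0], src[1], size[0], size[1]), dst[0], dst[1])

def inpaint(img, missing):
    """Inpainting (crude): fill each missing pixel with the mean of its known 4-neighbours."""
    out = [row[:] for row in img]
    for r, c in missing:
        known = [img[rr][cc]
                 for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                 if 0 <= rr < len(img) and 0 <= cc < len(img[0])
                 and (rr, cc) not in missing]
        out[r][c] = sum(known) / len(known) if known else 0
    return out

background = [[0] * 4 for _ in range(4)]
donor = [[9] * 4 for _ in range(4)]
spliced = splice(background, donor, src=(0, 0), dst=(1, 1), size=(2, 2))
moved = copy_move(spliced, src=(1, 1), dst=(0, 2), size=(1, 2))
filled = inpaint([[2, 4, 2], [4, None, 4], [2, 4, 2]], missing={(1, 1)})
```

Splicing and copy-move differ only in where the patch comes from, which is why both reduce to the same crop-and-paste step here.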
Deep learning-based approaches (like GAN) [12]-[14] are popular for their effectiveness in representing complex, high-dimensional data. A deep network called an autoencoder (encoder-decoder) is a variant of deep learning that is extensively used for dimensionality reduction and image compression. This method of generating DeepFakes was first used in FakeApp, released by a Reddit user, where an encoder-decoder pair was applied [15], [16]. In this method, the encoder reduces the face image to a compact representation and the decoder regenerates the face image from it. To swap the source face with the target face, two encoder-decoder pairs are needed. Each encoder-decoder pair is trained on one image set, and the parameters of the encoder are shared between the two pairs of networks.
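The shared-encoder arrangement described above can be sketched with tiny linear layers in pure Python. This is a toy under stated assumptions, not the FakeApp implementation: the 4-value "faces", the latent size, the learning rate and all function names are made up for illustration, and real systems use convolutional networks trained on many images.

```python
import random

random.seed(0)
DIM, LATENT = 4, 2  # toy "image" size and latent size (assumed values)

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

encoder = rand_matrix(LATENT, DIM)            # shared by both identities
decoders = {"A": rand_matrix(DIM, LATENT),    # reconstructs faces of A
            "B": rand_matrix(DIM, LATENT)}    # reconstructs faces of B

def reconstruct(face, identity):
    return matvec(decoders[identity], matvec(encoder, face))

def total_loss(data):
    """Summed squared reconstruction error over both identities."""
    return sum((y - x) ** 2
               for identity, samples in data.items()
               for face in samples
               for y, x in zip(reconstruct(face, identity), face))

def train_step(data, lr=0.01):
    """One full-batch gradient-descent step on total_loss."""
    grad_enc = [[0.0] * DIM for _ in range(LATENT)]
    grad_dec = {k: [[0.0] * LATENT for _ in range(DIM)] for k in decoders}
    for identity, samples in data.items():
        dec = decoders[identity]
        for face in samples:
            z = matvec(encoder, face)
            err = [2 * (y - x) for y, x in zip(matvec(dec, z), face)]
            back = [sum(dec[i][j] * err[i] for i in range(DIM)) for j in range(LATENT)]
            for i in range(DIM):
                for j in range(LATENT):
                    grad_dec[identity][i][j] += err[i] * z[j]
            for j in range(LATENT):
                for k in range(DIM):
                    grad_enc[j][k] += back[j] * face[k]
    for j in range(LATENT):
        for k in range(DIM):
            encoder[j][k] -= lr * grad_enc[j][k]
    for identity in decoders:
        for i in range(DIM):
            for j in range(LATENT):
                decoders[identity][i][j] -= lr * grad_dec[identity][i][j]

faces = {"A": [[1.0, 0.8, 0.1, 0.0], [0.9, 1.0, 0.0, 0.2]],
         "B": [[0.0, 0.2, 1.0, 0.9], [0.1, 0.0, 0.8, 1.0]]}
loss_before = total_loss(faces)
for _ in range(300):
    train_step(faces)
loss_after = total_loss(faces)

# Face swap: encode a face of A with the shared encoder, decode with B's decoder.
swapped = matvec(decoders["B"], matvec(encoder, faces["A"][0]))
```

Because both decoders read the same latent code, the encoder is pushed to capture what the two identities have in common, while each decoder supplies the identity-specific appearance. That is what makes the final cross-decoding step meaningful.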
In short, autoencoders and GANs (H. Huang et al. [18]) have been adopted as powerful solutions, especially for face manipulation, and an excellent level of image realism has been attained.
Image tampering can be driven by a sketch (E. Zakharov et al. [20]) or by a text description (T. Park et al. [21]). For manipulating an image, GAN-based translation has been adopted to modify the painting style or to swap apples and oranges (P. Isola et al. [22]). Many methods for expression modification and face swapping have been introduced (Y. Choi et al. [23]). More recently, C. Chan et al. [24] introduced a method that transfers motion from a source person to a target person.
Fig 3: Encoder-decoder pair. To create a deepfake, two encoder-decoder pairs are used. In the first, the original faces are encoded; in the second, the face of A is decoded with the decoder of face B. [3]
IV. Detection of DeepFakes
Nowadays, the threat of tampered media generated with DeepFake tools has become a critical concern. Notably, even non-professionals can easily generate fake videos, since enough information is available on the internet. With DeepFake tools, non-existent faces can be created using GANs, and faces in video clips can be tampered with to modify their attributes. As soon as this threat came into the picture, DeepFake detection methods were introduced. Early attempts relied on handcrafted features derived from the artifacts and inconsistencies of the fake video synthesis process. The methods used for detecting forged images and videos are as follows:
A. Detection of Forged Images
Faces in images can be swapped with faces drawn from a large collection of images. Face swapping has many applications in video compositing, in transforming portraits and especially in security, which makes it attractive.
Deep learning approaches such as CNNs and GANs are now used for swapping faces, which has made face swapping even more difficult to verify. Zhang et al. [25] used the bag-of-words method to extract a set of features and fed it into a variety of classifiers, such as SVM, Random Forest (RF) and Multilayer Perceptron (MLP), to distinguish forged faces from real ones.
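A bag-of-words feature of the kind mentioned above can be sketched as follows: local descriptors are assigned to their nearest codeword, and the normalised counts form the feature vector given to a classifier. The 2-D descriptors, the three-word codebook and the nearest-centroid classifier are all assumptions chosen to keep the sketch short. The nearest-centroid rule is only a stand-in for the SVM, RF and MLP classifiers named in the text, not the method of [25].

```python
def nearest(vec, centers):
    """Index of the center closest to vec (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, centers[i])))

def bow_histogram(descriptors, codebook):
    """Normalised count of how often each codeword is the nearest one."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest(d, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts]

# Hypothetical 2-D local descriptors and a 3-word codebook (made-up numbers).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
real_face = [[0.1, 0.0], [0.0, 0.1], [0.9, 0.1], [0.1, 0.9]]
fake_face = [[0.9, 0.0], [1.0, 0.1], [0.8, 0.1], [0.1, 0.1]]

real_hist = bow_histogram(real_face, codebook)
fake_hist = bow_histogram(fake_face, codebook)

# Stand-in classifier: nearest class centroid in histogram space,
# with a single training histogram per class.
centroids = [real_hist, fake_hist]
labels = ["real", "fake"]
query = bow_histogram([[1.0, 0.0], [0.9, 0.2], [0.0, 0.0], [0.9, 0.0]], codebook)
prediction = labels[nearest(query, centroids)]
```

The histogram discards where in the face each descriptor occurred, which keeps the feature length fixed regardless of image size.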
DeepFakes generated with GANs are more sophisticated, and it is not easy to verify whether they are genuine or forged. X. Xuan et al. [26] proposed another approach, in which pre- and post-processing operations are used to augment the data and to improve generalization.
The results show that CNN-generated images share some common defects, which allow their synthetic nature to be detected even across unseen architectures, datasets and training methods. Along a different line, Z. Liu et al. [27] proposed another solution, a CNN that incorporates global image texture information and shows good generalization.
Fig 4: Eye color is different (top); teeth are abruptly shaped. [1]
B. Detection of Forged Videos
Frame information is strongly degraded after video compression, so most image detection techniques cannot be applied to video. Some methods are designed only for still frames, whereas videos have temporal attributes that differ among sets of frames. Moreover, fake faces created with CGI and with deep learning tools are not very different from each other: both lack the distinctive characteristics of human faces captured by real cameras.
The detection approach of D. T. Dang-Nguyen et al. [28] measures the spatio-temporal deformation of a 3D model fitted to the face. Real faces, in particular, are complex and vary along many dimensions, which is what motivates the use of a 3D model. There are two kinds of methods for detecting manipulated videos: detection using temporal (sequential) attributes across frames, and methods that examine individual frames using deep learning.
1. Sequential attributes across video frames:
Spatio-temporal attributes of video streams can be exploited to expose deepfakes. Sabir et al. [28] observed that temporal coherence is not enforced effectively in the synthesis process of deepfakes.
Deepfake manipulations in video can also be revealed through some distinct visual artifacts. These artifacts can be seen in the eyes and teeth: for example, eye details that are missing or rendered as a white spot, or abruptly shaped teeth that appear as white spots, as shown in Fig. 4. This was observed by F. Matern et al. [30].
Y. Li et al. [31] introduced a system based on eye blinking, which in real persons has a distinct rate and duration that is not reproduced in forged videos. Other methods rely on face-warping artifacts [32], facial landmark locations [33] or head-pose inconsistency [34].
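The eye-blinking cue can be illustrated with a simple counter over a per-frame eye-openness signal. This is a minimal sketch under assumptions: the openness values, both thresholds and the rule of flagging clips with a very low blink rate are hypothetical, and it is not the learned model used in [31].

```python
def count_blinks(openness, closed_below=0.2, min_frames=2):
    """Count runs of at least min_frames consecutive frames with the eye closed."""
    blinks, run = 0, 0
    for value in openness:
        if value < closed_below:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # a blink still in progress when the clip ends
        blinks += 1
    return blinks

def blink_rate_per_minute(openness, fps):
    """Blinks per minute for a clip sampled at fps frames per second."""
    return count_blinks(openness) * 60.0 * fps / len(openness)

# Hypothetical eye-openness values (1 = fully open, 0 = closed), 10 frames per second.
real_clip = [0.9] * 20 + [0.1] * 3 + [0.9] * 20 + [0.1] * 3 + [0.9] * 14  # 60 frames
fake_clip = [0.9] * 60                                                    # never blinks

real_rate = blink_rate_per_minute(real_clip, fps=10)
fake_rate = blink_rate_per_minute(fake_clip, fps=10)
suspicious = fake_rate < 5.0  # assumed threshold, not taken from the cited work
```

The `min_frames` parameter reflects the point about blink duration: a single dark frame is treated as noise rather than as a blink.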
In [32], the starting point is that deepfake techniques can only create images of limited resolution, so the generated face must be further warped to match the source face in the video. This warping leaves unusual artifacts, which a CNN detects by examining the face region and its adjacent region.
In [33], the observation is that GAN-based systems can create high-quality, realistic faces with plenty of detail, but lack an explicit constraint over the locations of the parts of the face. Because of this, the locations of facial parts such as the eyes, mouth and nose can be employed as discriminative features for checking the genuineness of GAN images.
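One way to turn the locations of facial parts into a feature vector is to rescale the landmark coordinates into a unit square, so that the feature does not depend on where the face sits in the image or how large it is. The sketch below assumes that landmarks are already available as (x, y) pixel pairs; the four landmarks and the normalisation are illustrative choices, not necessarily those of [33].

```python
def landmark_features(landmarks):
    """Scale (x, y) landmarks into the unit square and flatten them."""
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    width = (max(xs) - min(xs)) or 1.0    # avoid division by zero
    height = (max(ys) - min(ys)) or 1.0
    return [v for x, y in landmarks
            for v in ((x - min(xs)) / width, (y - min(ys)) / height)]

# Hypothetical pixel coordinates: left eye, right eye, nose tip, mouth centre.
face = [(100, 120), (180, 120), (140, 160), (140, 200)]
# The same face, twice as large and shifted within the frame.
same_face_scaled = [(x * 2 + 30, y * 2 + 10) for x, y in face]

features = landmark_features(face)
scaled_features = landmark_features(same_face_scaled)
```

A vector like `features` could then be passed to any standard classifier; an implausible arrangement of eyes, nose and mouth would show up as an unusual point in this feature space.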
The main advantage of this technique is that the visual artifacts are not affected by resizing and compression. Also, some fake media can be identified by means of handcrafted solutions, which involve reduced risk.
2. Detection of Video frames using Deep learning
Methods that detect deepfake videos using sequential attributes across frames mostly depend on deep recurrent network models. In contrast, the methods in this group decompose a video into frames and examine each single frame to obtain discriminative attributes. These attributes are then fed to deep or shallow classifiers to distinguish between fake and real videos.
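The frame-by-frame pipeline described above can be sketched as a function that scores each frame independently and then aggregates the scores into one decision for the whole video. Averaging and the 0.5 threshold are assumptions, and `toy_scorer` is a hypothetical stand-in for a trained per-frame classifier.

```python
from statistics import mean

def classify_video(frames, frame_scorer, threshold=0.5):
    """Average per-frame fake scores and compare the mean to a threshold."""
    scores = [frame_scorer(frame) for frame in frames]
    video_score = mean(scores)
    return ("fake" if video_score > threshold else "real"), video_score

def toy_scorer(frame):
    """Hypothetical stand-in: treats the mean pixel value as a fake score."""
    return mean(frame)

frames = [[0.9, 0.7], [0.6, 0.8], [0.2, 0.4]]  # three tiny made-up "frames"
label, score = classify_video(frames, toy_scorer)
```

Since every frame is scored on its own, this design ignores temporal order entirely, which is the main difference from the recurrent models mentioned in the same paragraph.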
In general, deepfake videos are generated at limited resolution, so the face must be warped to match the configuration of the real faces. In [35], two networks are proposed. The first has four layers of convolution and pooling, followed by a dense network with one hidden layer. The second is a variant built on a different module, which uses dilated convolutions.
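The extended convolutions mentioned for the second network are read here as dilated convolutions, which is an interpretation of the text. A one-dimensional sketch shows the idea: the same kernel weights are applied to samples spaced `dilation` positions apart, so each output covers a wider stretch of the input without adding weights. The signal and kernel values are made up for illustration.

```python
def conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D convolution (cross-correlation form) with a dilation factor."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [sum(k * signal[start + i * dilation] for i, k in enumerate(kernel))
            for start in range(len(signal) - span)]

signal = [1, 2, 3, 4, 5, 6]
kernel = [1, 0, -1]
dense = conv1d(signal, kernel, dilation=1)    # each output spans 3 neighbouring samples
dilated = conv1d(signal, kernel, dilation=2)  # same 3 weights, spread over 5 samples
```

With `dilation=2` the three-tap kernel reaches across five input samples, which is why dilation is a cheap way to enlarge the receptive field.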
V. Conclusion
In view of the ongoing advances in digital manipulation, especially DeepFakes, this survey has dealt with the types of manipulation and with the methods used to generate DeepFakes and to detect DeepFake images and videos, treated separately.
Generally, most manipulated faces generated with CGI can be detected and controlled easily. When the fake media is generated with deep learning tools, however, the detection task may be difficult. This situation can be changed with the help of continuously improving detection techniques.
To provide a robust tool as an alternative to traditional image/video detectors, Tursman et al. introduced a real-time system [36]. Such achievements can further protect visual media from harm.
REFERENCES
[1] L. Verdoliva, "Media Forensics and DeepFakes: An Overview," IEEE Journal of Selected Topics in Signal Processing, vol. 14, no. 5, pp. 910-932, Aug. 2020, doi: 10.1109/JSTSP.2020.3002101.
[2] R. Tolosana, R. Vera-Rodriguez, J. Fierrez, A. Morales, and J. Ortega-Garcia, "DeepFakes and Beyond: A Survey of Face Manipulation and Fake Detection," Information Fusion, vol. 64, 2020, doi: 10.1016/j.inffus.2020.06.014.
[3] T. Nguyen, C. M. Nguyen, T. Nguyen, D. Nguyen, and S. Nahavandi, "Deep Learning for Deepfakes Creation and Detection: A Survey," 2019.
[4] Bloomberg, "How faking videos became easy and why that's so scary," Sep. 11, 2018. Available at https://fortune.com/2018/09/11/deepfakes-obama-video/
[5] R. Chesney and D. Citron, "Deepfakes and the new disinformation war: The coming age of post-truth geopolitics," Foreign Affairs, vol. 98, p. 147, 2019.
[6] M. Schroepfer, "Creating a data set and a challenge for deepfakes," Sep. 5, 2019. Available at https://ai.facebook.com/blog/deepfake-detection-challenge
[7] T. Karras, S. Laine, and T. Aila, "A Style-Based Generator Architecture for Generative Adversarial Networks," in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.
[8] E. Gonzalez-Sosa, J. Fierrez, R. Vera-Rodriguez, and F. Alonso-Fernandez, "Facial Soft Biometrics for Recognition in the Wild: Recent Works, Annotation and COTS Evaluation," IEEE Transactions on Information Forensics and Security, vol. 13, no. 8, pp. 2001-2014, 2018.
[9] Y. Choi, M. Choi, M. Kim, J. Ha, S. Kim, and J. Choo, "StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation," in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.
[10] J. Thies, M. Zollhöfer, M. Stamminger, C. Theobalt, and M. Nießner, "Face2Face: Real-Time Face Capture and Reenactment of RGB Videos," in Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2016.
[11] J. Thies, M. Zollhöfer, and M. Nießner, "Deferred Neural Rendering: Image Synthesis using Neural Textures," ACM Transactions on Graphics, vol. 38, no. 66, pp. 1-12, 2019.
[12] A. Punnappurath and M. S. Brown, "Learning raw image reconstruction-aware deep image compressors," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, doi: 10.1109/TPAMI.2019.2903062.
[13] Z. Cheng, H. Sun, M. Takeuchi, and J. Katto, "Energy compaction-based image compression using convolutional autoencoder," IEEE Transactions on Multimedia, 2019, doi: 10.1109/TMM.2019.2938345.
[14] J. Chorowski, R. J. Weiss, S. Bengio, and A. V. D. Oord, "Unsupervised speech representation learning using WaveNet autoencoders," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 27, no. 12, pp. 2041-2053, 2019.
[15] Faceswap: Deepfakes software for all. Available at https://github.com/deepfakes/faceswap
[16] FakeApp 2.2.0. Available at https://www.malavida.com/en/soft/fakeapp/
[17] T. Karras, S. Laine, and T. Aila, "A style-based generator architecture for generative adversarial networks," in IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 4401-4410.
[18] H. Huang, P. Yu, and C. Wang, "An introduction to image synthesis with generative adversarial nets," arXiv:1803.04469v2, 2018.
[19] T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive growing of GANs for improved quality, stability, and variation," in International Conference on Learning Representations, 2018.
[20] E. Zakharov, A. Shysheya, E. Burkov, and V. Lempitsky, "Few-shot adversarial learning of realistic neural talking head models," arXiv preprint arXiv:1905.08233v2, 2019.
[21] T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu, "Semantic image synthesis with spatially-adaptive normalization," in IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 2337-2346.
[22] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, "Image-to-image translation with conditional adversarial networks," in IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[23] J.-Y. Zhu, T. Park, P. Isola, and A. Efros, "Unpaired image-to-image translation using cycle-consistent adversarial networks," in IEEE International Conference on Computer Vision, 2017.
[24] C. Chan, S. Ginosar, T. Zhou, and A. Efros, "Everybody dance now," in International Conference on Computer Vision, 2019.
[25] Y. Zhang, L. Zheng, and V. L. Thing, "Automated face swapping and its detection," in 2017 IEEE 2nd International Conference on Signal and Image Processing (ICSIP), Aug. 2017, pp. 15-19.
[26] X. Xuan, B. Peng, W. Wang, and J. Dong, "On the generalization of GAN image forensics," in Chinese Conference on Biometric Recognition, 2019.
[27] Z. Liu, X. Qi, and P. Torr, "Global texture enhancement for fake face detection in the wild," arXiv preprint arXiv:2002.00133v3, 2020.
[28] D. T. Dang-Nguyen, G. Boato, and F. De Natale, "3D-model-based video analysis for computer generated faces identification," IEEE Transactions on Information Forensics and Security, vol. 10, no. 8, pp. 1752-1763, Aug. 2015.
[29] T. Bianchi and A. Piva, "Image forgery localization via block-grained analysis of JPEG artifacts," IEEE Trans. Inf. Forensics Security, vol. 7, no. 3, pp. 1003-1017, 2012.
[30] F. Matern, C. Riess, and M. Stamminger, "Exploiting visual artifacts to expose deepfakes and face manipulations," in IEEE WACV Workshop on Image and Video Forensics, 2019.
[31] Y. Li, M.-C. Chang, and S. Lyu, "In Ictu Oculi: Exposing AI created fake videos by detecting eye," in IEEE Workshop on Information Forensics and Security, 2018.
[32] Y. Li and S. Lyu, "Exposing deepfake videos by detecting face warping artifacts," in IEEE CVPR Workshops, 2019.
[33] X. Yang, Y. Li, H. Qi, and S. Lyu, "Exposing GAN-synthesized faces using landmark locations," in ACM Workshop on Information Hiding and Multimedia Security, June 2019, pp. 113-118.
[34] X. Yang, Y. Li, and S. Lyu, "Exposing deepfakes using inconsistent head pose," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2019.
[35] D. Afchar, V. Nozick, J. Yamagishi, and I. Echizen, "MesoNet: a compact facial video forgery detection network," in IEEE International Workshop on Information Forensics and Security, 2018, pp. 1-7.
[36] G. Huang, Z. Liu, L. van der Maaten, and K. Weinberger, "Densely connected convolutional networks," in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 4700-4708.