
International Research Journal of Engineering and Technology (IRJET) | e-ISSN: 2395-0056 | p-ISSN: 2395-0072
Volume: 11 Issue: 11 | Nov 2024 | www.irjet.net

Decoding Deepfakes: An LSTM-Driven Approach with Attention Mechanisms and Grad-CAM Explainability

Aditya Aiya1, Nishant Wani1, Mayur Ramani1
1School of Computer Science Engineering and Applications, DY Patil International University, Pune-411035

Abstract: Deepfake technology has alarmed security and truth campaigners around the world because it makes it possible to spread misinformation at scale. Deepfake videos are an increasingly serious problem, with real consequences for the authenticity of digital content. In this work, we propose a deep learning architecture that fuses Bidirectional Long Short-Term Memory (BiLSTM) networks with attention mechanisms to accurately determine whether videos are real or fake. We train and evaluate the model on the Celeb-DF (Celeb Deepfakes) dataset, which consists of high-quality deepfake videos. Guided by the principles of AI ethics, transparency, and explainability, we use Grad-CAM (Gradient-weighted Class Activation Mapping) to provide visual explanations that help observers follow the model's decision-making process, making it more accountable. The model achieves 91% accuracy and an AUC of 90%, with robust precision, recall, and F1-scores. These results showcase the potential of explainable AI systems for deepfake detection through LSTM-based models with attention mechanisms, and reinforce the need to prioritize interpretability when designing reliable and fair AI systems.

Keywords: DeepFakes, Celeb-DF, BiLSTM-Attention, Grad-CAM, Explainability

1. INTRODUCTION

The advent of deepfake technology in recent years has raised serious concerns across fields such as digital communication, media, and security. Deepfakes, mostly videos, are manipulated content created with deep learning methods to look real even though the content is entirely synthetic. These technologies use deep learning models, chiefly Generative Adversarial Networks (GANs) and autoencoders, to alter or synthesize features such as voice, emotion, and facial expression, making it progressively harder to tell genuine videos from fake ones. Consequently, there is a growing threat that deepfakes will be put to malicious use, from financial crime and privacy invasion to misinformation and political manipulation. This is why an extremely reliable and precise deepfake detection system is needed [1].

Addressing these issues, the research community has proposed numerous approaches for deepfake detection, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and hybrid methods. These models analyze the spatial and temporal features of manipulated media in a video to detect inconsistencies. Deepfake detection is a highly competitive field, and achieving both high accuracy and interpretability is demanding, particularly given the arms race with deepfake generation, which keeps advancing and makes manipulations ever harder to trace [2].

This study proposes a hybrid Bidirectional Long Short-Term Memory (BiLSTM) network with an attention mechanism, which can capture long-term dependencies while emphasizing the most important parts of a sequence. A BiLSTM-Attention model was chosen because LSTMs can learn the temporal dynamics of sequences, and in video data temporal change is an important cue for manipulation detection. This architecture improves not only accuracy but also interpretability, which is essential for verifying how the detection system reaches its decisions. The attention mechanism lets the model focus on the video frames it deems relevant to the task, indicating which parts of the media contribute most to the detection decision [3][4].

The Celeb-DF dataset used in this study includes a wide variety of deepfake and natural videos of celebrity faces. It is well known for its difficult, high-quality deepfake samples and is therefore a top choice for benchmarking the robustness of detection algorithms. It was chosen because it matches the study's aim of testing models against advanced manipulations, such as altered facial motions and temporal consistency across long videos. FaceForensics++ [5] and the DeepFake Detection Challenge [6] have previously been used to evaluate many detection algorithms, alongside curated benchmarks [7][8].
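To make the proposed architecture concrete, the following is a minimal illustrative sketch of a BiLSTM with additive attention over per-frame features, written in PyTorch. The feature dimension, hidden size, frame count, and the assumption of a pretrained CNN frame encoder are placeholders for illustration, not the paper's reported configuration.

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    """Illustrative BiLSTM + additive attention video classifier.

    Assumes each video arrives as a sequence of per-frame feature
    vectors (e.g. from a pretrained CNN encoder); all dimensions here
    are placeholders, not the paper's reported configuration.
    """
    def __init__(self, feat_dim=2048, hidden_dim=256, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # One additive-attention score per timestep, softmaxed over time.
        self.attn = nn.Linear(2 * hidden_dim, 1)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, x):                    # x: (batch, frames, feat_dim)
        h, _ = self.lstm(x)                  # (batch, frames, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # (batch, frames, 1)
        context = (weights * h).sum(dim=1)   # attention-weighted summary
        return self.classifier(context), weights.squeeze(-1)

# Example: a batch of 8 clips, 30 frames each, 2048-d frame features.
model = BiLSTMAttention()
logits, frame_weights = model(torch.randn(8, 30, 2048))
```

Returning the per-frame attention weights alongside the logits is what enables the frame-level interpretability described above: high-weight frames are the ones the model relied on for its real/fake decision.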


Deepfake detection has been key to the development of explainable AI (XAI). Transparency and interpretability are important, especially in the cases in
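As a companion to the Grad-CAM visual explanations described in the abstract, here is a minimal sketch of Grad-CAM applied to a frame-level CNN encoder. The ResNet-50 backbone, the choice of layer4 as the target layer, and the hook-based capture are assumptions for illustration; this excerpt does not specify the paper's actual backbone or target layer.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet50

# Hypothetical frame encoder: ResNet-50 / layer4 are illustrative
# choices only, not confirmed details of the paper's model.
model = resnet50(weights=None).eval()
target_layer = model.layer4

activations, gradients = {}, {}
target_layer.register_forward_hook(
    lambda mod, inp, out: activations.update(a=out))
target_layer.register_full_backward_hook(
    lambda mod, gin, gout: gradients.update(g=gout[0]))

def grad_cam(frame, class_idx):
    """frame: (1, 3, H, W) tensor; returns an (H, W) heatmap in [0, 1]."""
    logits = model(frame)
    model.zero_grad()
    logits[0, class_idx].backward()
    a, g = activations["a"], gradients["g"]      # both (1, C, h, w)
    weights = g.mean(dim=(2, 3), keepdim=True)   # channel importances
    cam = F.relu((weights * a).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=frame.shape[-2:],
                        mode="bilinear", align_corners=False)
    return (cam / cam.max().clamp(min=1e-8)).squeeze()

# Example call on a dummy frame; class index 1 might denote "fake".
heatmap = grad_cam(torch.randn(1, 3, 224, 224), class_idx=1)
```

The resulting heatmap can be overlaid on the input frame to show which facial regions drove the classification, which is the kind of visual evidence Grad-CAM contributes to model accountability.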


