International Research Journal of Engineering and Technology (IRJET) Volume: 12 Issue: 11 | Nov 2025
www.irjet.net
e-ISSN: 2395-0056 p-ISSN: 2395-0072
Deepfake Audio Detection Using Machine Learning
Manjushree K R¹, Anusha M², Dheeraj K U³, Harshavardhan K B⁴, Pooja Balaganur⁵
¹Assistant Professor, Information Science and Engineering, Bapuji Institute of Engineering and Technology, Davangere, affiliated to VTU Belagavi, Karnataka, India.
²,³,⁴,⁵Bachelor of Engineering, Information Science and Engineering, Bapuji Institute of Engineering and Technology, Karnataka, India.
Abstract - The rapid evolution of artificial intelligence has enabled the creation of highly realistic synthetic audio, commonly referred to as deepfake audio. Such fabricated speech poses serious threats to privacy, security, and trust in digital communication. This paper presents a machine learning–based framework that detects deepfake audio by analyzing acoustic features extracted from each recording. The proposed system uses Mel Frequency Cepstral Coefficients (MFCCs) to capture the frequency-based characteristics of audio samples. A Random Forest classifier is trained on a curated dataset of authentic and manipulated speech samples to distinguish real from synthetic voices. The model is integrated into a Flask-based web interface that allows users to upload audio files, visualize and play them back, and receive real-time prediction results. Experimental results show that the system identifies fake audio samples with high accuracy while maintaining a low false-positive rate. The approach offers an efficient and scalable way to improve the reliability of digital voice data and can be extended to multimedia deepfake detection in the future.
Key Words: Deepfake Audio, Machine Learning, Audio Forensics, Random Forest Classifier, Feature Extraction, Flask Web Application, MFCC Features.
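The abstract reports accuracy and false-positive rate without defining them. As a hedged illustration, the sketch below computes both from binary predictions, assuming a hypothetical label convention in which 1 marks fake audio (the positive class) and 0 marks genuine audio, so a false positive is a genuine recording wrongly flagged as fake. The function name `detection_metrics` is illustrative and is not taken from the authors' code.

```python
from typing import Dict, Sequence


def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, false-positive rate and true-positive rate for a real/fake detector.

    Assumed (hypothetical) label convention: 1 = fake (positive class), 0 = genuine.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fakes correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # genuine correctly accepted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # genuine wrongly flagged as fake
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fakes that were missed
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of genuine recordings that the detector flags as fake
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of fake recordings that the detector catches
        "true_positive_rate": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

For example, `detection_metrics([1, 1, 0, 0, 0, 1], [1, 0, 0, 1, 0, 1])` gives an accuracy of 4/6 and a false-positive rate of 1/3. Reporting the false-positive rate alongside accuracy matters because, on an imbalanced test set, accuracy alone can hide how often genuine speakers are wrongly rejected.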
1. INTRODUCTION
The rapid growth of artificial intelligence and deep learning has revolutionized content creation in speech, images, and video. However, these advancements have also created serious security and ethical challenges, particularly through the rise of deepfake audio: synthetically generated voices that closely mimic real human speech. Such audio can be misused for fraud, misinformation, impersonation, and evidence manipulation, and it is often convincing enough that manual or traditional verification methods are inadequate. To address these risks, this work proposes a lightweight machine learning–based framework for deepfake audio detection. The system combines audio preprocessing and MFCC feature extraction with a Random Forest classifier, and it is deployed through a Flask web application that lets users upload audio, analyze its authenticity, and view the results. The framework aims to provide an accessible and effective countermeasure to the growing threat of audio-based deepfakes. The paper is organized as follows: Section 2 reviews related research, Section 3 describes the proposed methodology, Section 4 presents experimental results, and Section 5 concludes with future directions.
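The introduction mentions preprocessing without listing the individual steps. A minimal sketch of steps commonly applied before MFCC extraction is shown below, assuming the audio has already been decoded into a NumPy array: stereo input is averaged to mono, the DC offset is removed, the signal is peak-normalized, and a pre-emphasis filter is applied. The function `preprocess` and the 0.97 pre-emphasis coefficient are conventional but assumed choices, not details reported by the authors.

```python
import numpy as np


def preprocess(signal: np.ndarray, pre_emphasis: float = 0.97) -> np.ndarray:
    """Hypothetical preprocessing: mono mix-down, DC removal, peak normalization, pre-emphasis."""
    x = np.asarray(signal, dtype=np.float64)
    if x.ndim == 2:  # assumed layout (samples, channels): average channels to mono
        x = x.mean(axis=1)
    if x.size == 0:
        return x
    x = x - x.mean()  # remove any DC offset
    peak = np.max(np.abs(x))
    if peak > 0:
        x = x / peak  # scale to [-1, 1] so recording loudness does not dominate features
    # Pre-emphasis y[n] = x[n] - a * x[n-1] boosts high frequencies, where
    # synthesis artifacts are often reported to appear
    return np.append(x[:1], x[1:] - pre_emphasis * x[:-1])
```

Peak normalization is used here only because it is simple; loudness (RMS) normalization would be an equally reasonable assumption.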
1.1 Description
Audio deepfake detection has become a crucial research area as AI-driven voice cloning, text-to-speech systems, and neural vocoders increasingly enable the creation of highly realistic synthetic speech. These deepfakes pose serious risks, including identity theft, fraud, misinformation, and biometric spoofing, which makes reliable detection essential. Audio deepfake detection systems aim to distinguish genuine speech from artificially generated or manipulated audio using signal processing and machine learning techniques. Modern approaches extract features such as MFCCs, spectrograms, and temporal patterns and feed them to deep learning models such as CNNs, RNNs, and transformer-based architectures; trained on diverse real and fake audio datasets, these models learn to identify subtle artifacts that are often imperceptible to human listeners.
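MFCCs, the feature used by the proposed system, are typically computed in five steps: split the signal into overlapping windowed frames, take each frame's power spectrum, pool it through triangular mel-spaced filters, take logarithms, and apply a DCT-II. The NumPy sketch below follows that recipe. The frame settings (25 ms windows with a 10 ms hop at 16 kHz), 26 mel filters, and 13 coefficients are common defaults assumed here, not values reported by the authors. The helper `clip_features`, which summarizes each clip as per-coefficient means and standard deviations so that a Random Forest receives one fixed-length vector per file, is likewise a hypothetical design choice.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mfcc(signal, sr=16000, n_fft=512, hop=160, win=400, n_mels=26, n_mfcc=13):
    """Return an (n_frames, n_mfcc) matrix of MFCCs for a 1-D float signal."""
    x = np.asarray(signal, dtype=np.float64)
    if len(x) < win:  # pad very short clips to one full frame
        x = np.pad(x, (0, win - len(x)))
    n_frames = 1 + (len(x) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * np.hamming(win)  # overlapping, Hamming-windowed frames
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft  # power spectrum per frame

    # Triangular filters spaced evenly on the mel scale between 0 Hz and sr / 2
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):  # rising edge of filter m
            fbank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):  # falling edge of filter m
            fbank[m - 1, k] = (right - k) / max(right - centre, 1)
    log_mel = np.log(power @ fbank.T + 1e-10)  # log mel-band energies (epsilon avoids log 0)

    # DCT-II decorrelates the log energies; keep only the first n_mfcc coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi / n_mels * (n[None, :] + 0.5) * np.arange(n_mfcc)[:, None])
    return log_mel @ dct.T


def clip_features(signal, sr=16000):
    """Hypothetical clip-level vector: mean and std of each MFCC over all frames."""
    coeffs = mfcc(signal, sr=sr)
    return np.concatenate([coeffs.mean(axis=0), coeffs.std(axis=0)])
```

With these defaults, one second of 16 kHz audio yields a 98 × 13 MFCC matrix and a 26-dimensional clip vector. In practice a library such as librosa (`librosa.feature.mfcc`) offers a better-tested implementation, although its default parameters differ from the ones assumed above.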
1.2 Existing System
Existing deepfake audio detection systems use signal processing, machine learning, and deep learning approaches. Common detection features include spectrograms, MFCCs, prosodic cues, and phoneme timing irregularities. Widely used deep learning models include CNNs, light CNNs (LCNNs), and RawNet2, which operates directly on raw waveforms. Benchmark datasets such as ASVspoof and WaveFake are used for training and evaluation.
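Results on ASVspoof are commonly reported as the equal error rate (EER): the error rate at the decision threshold where the share of genuine recordings flagged as fake equals the share of fakes that are missed. The sketch below approximates it by scanning every observed score as a candidate threshold, assuming a hypothetical score convention in which higher scores mean "more likely fake". It is an illustrative threshold scan, not the official challenge scoring script.

```python
from typing import Sequence, Tuple


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Approximate EER for a detector whose higher scores mean 'more likely fake'.

    labels: 1 = fake/spoofed, 0 = genuine (assumed convention).
    Returns (eer, threshold) at the threshold where the two error rates are closest.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one fake and one genuine sample")
    best = (float("inf"), 0.0, 0.0)  # (|FPR - FNR|, eer estimate, threshold)
    for t in sorted(set(scores)):
        fpr = sum(s >= t for s in neg) / len(neg)  # genuine flagged as fake
        fnr = sum(s < t for s in pos) / len(pos)   # fakes that slip through
        gap = abs(fpr - fnr)
        if gap < best[0]:
            best = (gap, (fpr + fnr) / 2, t)  # average the two rates at the crossing
    return best[1], best[2]
```

With perfectly separated scores, such as `equal_error_rate([0.9, 0.8, 0.7, 0.3, 0.2, 0.1], [1, 1, 1, 0, 0, 0])`, the function returns an EER of 0.0 at threshold 0.7. Unlike accuracy, the EER does not depend on choosing a single operating threshold in advance, which is why it is favored for comparing detectors across benchmarks.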
© 2025, IRJET | Impact Factor value: 8.315 | ISO 9001:2008 Certified Journal | Page 673