Distinguishing AI-Generated Voices from Human Voices Using Spectral Analysis

International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056 | p-ISSN: 2395-0072
Volume: 12 Issue: 07 | Jul 2025 | www.irjet.net

Agasthya Bhatia¹
¹Dhirubhai Ambani International School, Mumbai, Maharashtra, India

Abstract - This study investigates the effectiveness of spectral analysis techniques in distinguishing between AI-generated and human voices. Using frequency spectrum data from three human voice samples and five AI voice generation systems (Apple Translate, Google Translate, ElevenLabs, Murf Labs, and Natural Readers), we conducted a comprehensive spectral feature analysis including spectral centroid, bandwidth, rolloff, skewness, kurtosis, Shannon entropy, and high-frequency content ratios. Our findings reveal significant distinguishable patterns between human and AI voices, with particular differences in spectral centroid distribution, entropy levels, and high-frequency content. The analysis demonstrates that spectral analysis alone can provide moderate to strong distinguishing capability, with an overall classification potential of approximately 65-75%. Results show that human voices exhibit broader frequency utilization (4206 Hz average rolloff vs. 2903 Hz for AI), higher spectral complexity (6.387 vs. 6.017 bits of entropy), and more natural high-frequency content (29.78% vs. 21.96%). The study validates the hypothesis that human voices demonstrate more "free-flowing" frequency patterns than AI systems.

The motivation for this research stems from the increasing sophistication of AI voice synthesis systems and their potential misuse in various applications. Recent advances in neural voice synthesis, particularly with models like WaveNet, Tacotron, and more recent transformer-based approaches, have made it increasingly difficult to distinguish synthetic speech from human speech using traditional methods [4][5].
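As a toy illustration of how the reported group means could support the threshold-based classification the study describes, the sketch below votes across three features. The midpoint thresholds are derived from the figures quoted above purely for illustration; they are not the study's fitted classifier, and the function and field names are assumptions.

```python
# Toy threshold-based vote over three spectral features. The thresholds are
# the midpoints of the human vs. AI group means reported in the abstract
# (rolloff: 4206 vs 2903 Hz; entropy: 6.387 vs 6.017 bits; high-frequency
# content: 29.78% vs 21.96%) -- chosen here for illustration only.
THRESHOLDS = {
    "rolloff_hz": (4206 + 2903) / 2,      # 3554.5 Hz
    "entropy_bits": (6.387 + 6.017) / 2,  # 6.202 bits
    "hf_ratio": (0.2978 + 0.2196) / 2,    # 0.2587
}

def classify(features):
    """Majority vote: values above threshold point toward 'human'."""
    votes = sum(features[k] > t for k, t in THRESHOLDS.items())
    return "human" if votes >= 2 else "ai"

# Hypothetical feature vectors for two samples.
print(classify({"rolloff_hz": 4100, "entropy_bits": 6.35, "hf_ratio": 0.30}))  # human
print(classify({"rolloff_hz": 2950, "entropy_bits": 6.05, "hf_ratio": 0.21}))  # ai
```

A majority vote is used rather than a single cutoff so that no one feature dominates; the paper's own decision rule may differ.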

Key Words: Voice synthesis detection, spectral analysis, AI voice generation, deepfake detection, digital forensics, frequency domain analysis

1. INTRODUCTION

The rapid advancement of AI voice synthesis technology has created an urgent need for reliable detection methods to distinguish artificial voices from human speech. Modern text-to-speech (TTS) systems can produce increasingly realistic synthetic voices, raising concerns about potential misuse in deepfakes, fraud, and misinformation campaigns [1]. This study addresses the fundamental research question: can spectral analysis alone effectively distinguish between AI-generated and human voices?

1.1 Research Objectives

The primary objectives of this study are:

- To identify unique spectral characteristics that differentiate AI-generated and human voices
- To evaluate the effectiveness of statistical spectral analysis for voice authentication
- To develop threshold-based classification methods using spectral features
- To validate the hypothesis that human voices show more natural frequency distribution patterns

1.2 Research Scope

This research focuses on analyzing the frequency spectrum characteristics of a controlled dataset in which the same speech content (an excerpt of Martin Luther King Jr.'s "I Have a Dream" speech) is used across all samples, ensuring content consistency and eliminating content-based variation from the analysis.

2. LITERATURE REVIEW

Spectral analysis has been widely used in audio signal processing for voice characterization. The detection of synthetic speech has become increasingly important with the advancement of neural voice synthesis technologies [6].

Previous research has explored various approaches to synthetic voice detection, including machine learning techniques with complex neural networks and multi-modal analysis [2][3]. However, the computational complexity of these methods often limits real-time application. This study therefore focuses on statistical spectral analysis methods that could provide efficient, interpretable, and easily implementable solutions for voice authenticity verification.

2.1 Spectral Features in Voice Analysis

Key spectral features commonly employed in voice analysis include [7][8]:

Spectral Centroid: Represents the "center of mass" of the spectrum and is perceptually related to the brightness of a sound. It

© 2025, IRJET | Impact Factor value: 8.315 | ISO 9001:2008 Certified Journal | Page 161
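The features named in Section 2.1 and the abstract can all be computed from a magnitude spectrum. The following is a minimal, illustrative NumPy sketch rather than the authors' implementation; the 85% rolloff percentage and the 4 kHz high-frequency cutoff are assumptions, since the paper's exact parameters are not given in this excerpt.

```python
import numpy as np

def spectral_features(signal, sr, hf_cutoff=4000.0, rolloff_pct=0.85):
    """Illustrative spectral features from a single magnitude spectrum."""
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    power = mag ** 2
    total = power.sum()

    # Spectral centroid: power-weighted mean frequency ("center of mass").
    centroid = (freqs * power).sum() / total

    # Spectral bandwidth: power-weighted standard deviation around the centroid.
    bandwidth = np.sqrt(((freqs - centroid) ** 2 * power).sum() / total)

    # Spectral rolloff: frequency below which rolloff_pct of the power lies.
    cumulative = np.cumsum(power)
    rolloff = freqs[np.searchsorted(cumulative, rolloff_pct * total)]

    # Shannon entropy (bits) of the normalized power spectrum.
    p = power / total
    p = p[p > 0]
    entropy = -(p * np.log2(p)).sum()

    # High-frequency content ratio: share of power at or above hf_cutoff.
    hf_ratio = power[freqs >= hf_cutoff].sum() / total

    return {"centroid": centroid, "bandwidth": bandwidth,
            "rolloff": rolloff, "entropy": entropy, "hf_ratio": hf_ratio}

# Sanity check: a pure 1 kHz tone at a 16 kHz sampling rate should place
# both the centroid and the rolloff at 1 kHz, with negligible HF content.
sr = 16000
t = np.arange(sr) / sr
feats = spectral_features(np.sin(2 * np.pi * 1000.0 * t), sr)
```

In practice these statistics would be averaged over short analysis frames rather than taken from one whole-file FFT, but the per-spectrum definitions are the same.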

