International Research Journal of Engineering and Technology (IRJET) Volume: 13 Issue: 02 | Feb 2026
www.irjet.net
e-ISSN: 2395-0056 p-ISSN: 2395-0072
Analysis of Framework for Robust Gender Recognition from Speech Signals
Digambar B. Gote¹, Prof. Dr. T. B. Mohite-Patil²
¹ME (E & TC) Student, D. Y. Patil College of Engg. & Technology, Kolhapur
²Prof. Dr. T. B. Mohite-Patil, D. Y. Patil College of Engg. & Technology, Kolhapur
------------------------------------------------------------------------***------------------------------------------------------------------------
Abstract: Speech-based gender recognition is a fundamental paralinguistic task with wide applicability in speech-driven human–computer interaction, assistive technologies, and intelligent voice services. Despite significant progress achieved through deep learning, existing methods often suffer from limited robustness to noise and channel variability, sensitivity to utterance duration, and poor interpretability of model decisions. This work proposes a compact and explainable framework for gender recognition from speech that emphasizes effective acoustic representation and attention-driven feature refinement. Log-Mel and cepstral features are analyzed in conjunction with a lightweight convolutional neural network augmented by an attention mechanism to selectively emphasize informative spectro-temporal regions. A focused experimental analysis evaluates the impact of utterance duration, noise conditions, and channel mismatch on model behavior. In addition, attention-based visualization is employed to provide insights into the decision-making process, improving transparency and trustworthiness. The results demonstrate that the proposed framework achieves a balanced trade-off between robustness, efficiency, and interpretability, making it suitable for practical real-world deployment.
Keywords: Speech processing, Gender recognition, Attention-based learning, Acoustic feature analysis, Explainable AI
I. Introduction
Speech-based gender recognition has emerged as an important paralinguistic task in speech processing, with applications spanning human–computer interaction, voice-based authentication, assistive technologies, and adaptive dialogue systems. Human speech inherently encodes gender-related characteristics through physiological and behavioral factors such as vocal tract length, fundamental frequency distribution, formant structure, and speaking style. Advances in deep learning have significantly improved the ability to model these cues by learning discriminative representations directly from acoustic signals. However, recent studies reveal persistent challenges related to robustness under noisy and channel-mismatched conditions, sensitivity to utterance duration, and limited interpretability of model decisions. Moreover, the increasing reliance on complex architectures often leads to trade-offs between performance, computational efficiency, and transparency, which are critical considerations for real-world deployment.
Motivated by these challenges, this work presents a compact and explainable speech-based gender recognition framework that synthesizes insights from recent deep learning and optimization-driven approaches. The proposed pipeline emphasizes effective acoustic representation, attention-based feature refinement, and systematic analysis of robustness and interpretability. Rather than introducing excessive architectural complexity, the focus is placed on identifying and validating a minimal yet effective set of design choices that contribute to reliable gender discrimination across varied acoustic conditions.
Contributions of this work are summarized as follows:
• A lightweight attention-enhanced CNN framework for speech-based gender recognition.
• A systematic analysis of key factors including feature representation, utterance duration, noise robustness, and channel mismatch.
• An explainability-driven evaluation using attention-based visualization to enhance interpretability and trust.
• A consolidated experimental protocol that balances performance, robustness, and deployment feasibility.
II. Literature Survey
Review of Recent Speech-Based Gender Recognition Studies (2022–2025)
Sindha and Rana [1] developed an optimized artificial neural network for vocal gender recognition by integrating a self-attention mechanism to emphasize gender-discriminative acoustic regions in time–frequency space. The work typically begins by converting a speech waveform into a short-time time–frequency representation using the STFT,
© 2025, IRJET | Impact Factor value: 8.315 | ISO 9001:2008 Certified Journal | Page 325