
International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056 | p-ISSN: 2395-0072

Volume: 11 Issue: 04 | Apr 2024 | www.irjet.net

Impact Factor value: 8.226

UNVEILING SPEECH EMOTIONAL SPECTRUM THROUGH SOUND USING CONVOLUTIONAL NEURAL NETWORKS

Prof. H. Sheik Mohideen1, A. Aslam Sujath2, P. Pradeep3, E. Selvakumar4

1 Assistant Professor, Department of CSE, Government College of Engineering, Srirangam, Tamil Nadu, India
2,3,4 UG Student, Department of CSE, Government College of Engineering, Srirangam, Tamil Nadu, India

Abstract - Research in Speech Emotion Recognition (SER) has attracted significant attention, particularly in Human-Computer Interaction (HCI), where the focus is on personal assistants and assistive robots. SER analyzes subtle variations in tone and pitch and uses these aural cues to classify human emotions such as calm, happy, sad, anger, fear, surprise, and disgust. Annotated datasets such as RAVDESS support this research by providing recordings of actors expressing various emotions. Deep learning techniques are emerging as powerful tools for processing emotional speech signals, especially convolutional neural networks (CNNs). CNNs automatically learn hierarchical representations from raw data, which makes them well suited to capturing complex patterns in audio signals. This approach improves human-technology interaction by enabling machines to recognize and respond to the emotions people convey through spoken language. Integrating SER into HCI research therefore contributes to the design of better interactive computer systems.

Keywords: Speech Emotion Recognition, Preprocessing, CNN Classification, Feature Extraction (MFCC, ZCR, CHROMA)

1. INTRODUCTION

Human-computer intelligence is an emerging field of research. It aims to make computers learn from experience and decide how to respond to a particular situation, and it has improved interaction between users and computers. With suitable algorithms and procedures, a computer can detect various characteristics of an audio sample and infer the underlying emotion. Within human-computer interaction, this project aims to develop a Speech Emotion Recognition System (SERS) using Convolutional Neural Networks (CNNs). The RAVDESS dataset adds depth to our approach because it provides a diverse set of emotional audio samples. The main goal is to train the system to classify spoken utterances into seven different emotions. Using deep learning techniques, particularly CNNs, we aim to equip machines with the ability to decipher human emotions and respond effectively. The project's importance lies in its potential to improve applications ranging from virtual assistants to emotion recognition technology. As technology becomes more integrated into daily life, machines' ability to understand and adapt to human emotions will become increasingly important. Our SERS project aims to bridge the gap between this need and machines' current capabilities, creating more intuitive and empathetic interactions between humans and machines. Ultimately, our focus is on improving the overall user experience, not just on innovation. Human emotion is a complex landscape, and a voice emotion recognition system like ours can play a key role in making interactions more meaningful and responsive in the ever-evolving field of artificial intelligence.

1.1 CNNs in speech emotion recognition

Convolutional neural networks (CNNs) are a valuable tool for speech emotion recognition (SER) because they can effectively analyze the intricate patterns found in audio signals. CNNs were originally designed for image processing, but they have been successfully adapted to sequential (time-series) data such as audio. They perform well at extracting both the low-level and high-level features that are crucial for emotion recognition. In SER, CNNs use hierarchical feature learning to pinpoint variations in pitch, intensity, prosody, and other acoustic attributes that signal different emotional states. A CNN also reuses the same filter weights at every position of its input. This parameter sharing gives it far fewer trainable parameters than a comparable fully connected network, which helps the model generalize across diverse emotional expressions.
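The following is a minimal NumPy sketch that illustrates two ideas from this paper. It is not the authors' implementation. The first idea is frame-level acoustic features: the zero-crossing rate named in the keywords, plus the short-term energy and first-order derivatives mentioned in Section 2. The second idea is a 1-D convolution whose filter weights are shared across all time steps, as described in Section 1.1. The sample rate, frame length, hop size, kernel width, filter count, and synthetic test signal are all assumptions chosen for illustration. MFCC and chroma features are omitted to keep the sketch dependency-free. In practice, a library such as librosa (`librosa.feature.mfcc`, `librosa.feature.chroma_stft`) would typically supply them.

```python
import numpy as np

# Illustrative sketch only: frame sizes, filter counts, and the synthetic input
# are assumptions, not the configuration used in this paper.

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames; trailing samples that do not fill a frame are dropped."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]  # shape (n_frames, frame_len)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    neg = np.signbit(frames)
    return np.mean(neg[:, 1:] != neg[:, :-1], axis=1)

def short_term_energy(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)

def first_order_delta(feats):
    """Frame-to-frame difference along time; the first frame's delta is zero."""
    return np.diff(feats, axis=0, prepend=feats[:1])

def conv1d_valid(x, w, b):
    """1-D convolution over time (cross-correlation, as CNN libraries compute it), no padding, stride 1.

    x: (T, C_in), w: (k, C_in, C_out), b: (C_out,).
    The same w and b are reused at every time step (parameter sharing).
    """
    k = w.shape[0]
    t_out = x.shape[0] - k + 1
    out = np.empty((t_out, w.shape[2]))
    for t in range(t_out):
        out[t] = np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1])) + b
    return out

def conv1d_params(k, c_in, c_out):
    """Trainable parameters of a 1-D conv layer: one k x c_in kernel per filter, plus biases."""
    return k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Trainable parameters of a fully connected layer with biases."""
    return n_in * n_out + n_out

# Demo on one second of synthetic audio (assumed 16 kHz, 25 ms frames, 10 ms hop).
sr = 16000
rng = np.random.default_rng(0)
n = np.arange(sr)
signal = np.sin(2 * np.pi * 220 * n / sr) + 0.05 * rng.standard_normal(sr)

frames = frame_signal(signal, frame_len=400, hop=160)
base = np.column_stack([zero_crossing_rate(frames), short_term_energy(frames)])
feats = np.hstack([base, first_order_delta(base)])  # (T, 4): ZCR, energy, and their deltas

k, c_out = 5, 8
w = 0.1 * rng.standard_normal((k, feats.shape[1], c_out))
b = np.zeros(c_out)
feature_maps = conv1d_valid(feats, w, b)

print("features:", feats.shape, "feature maps:", feature_maps.shape)
print("conv params:", conv1d_params(k, feats.shape[1], c_out))
print("dense params for same in/out sizes:", dense_params(feats.size, feature_maps.size))
```

With these assumed settings, the signal yields 98 frames of 4 features and 94 × 8 feature maps. The convolution stores a single set of 5 × 4 × 8 weights plus 8 biases (168 parameters), and this count does not depend on the number of frames. A fully connected layer mapping the same 98 × 4 input to the same 94 × 8 output would need 295,536 parameters. That difference is the parameter-sharing saving referred to in Section 1.1.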

2. RELATED WORK

In recent years, speech emotion recognition has been widely adopted in human-computer interaction, giving machines the ability to recognize and learn human emotions. Despite significant advances, the performance of emotion recognition systems still falls short of researchers' expectations. Two primary challenges have been identified in speech emotion recognition. The first is identifying effective speech emotion features, and the second is constructing a suitable recognition model. Previous studies have explored various feature parameters to improve emotion recognition. One study used pitch frequency, short-term energy, formant frequency, and chaotic characteristics to construct a 144-dimensional emotion feature vector. Another achieved encouraging results by combining energy, zero-crossing rate, and first-order derivative parameters. Despite this progress, the challenge of high dimensionality and feature redundancy persists, necessitating the filtration

ISO 9001:2008 Certified Journal
