Efficient Speech Emotion Recognition using SVM and Decision Trees

International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 04 Issue: 07 | July 2017

p-ISSN: 2395-0072

www.irjet.net

T. V. Vamsikrishna1, P. Naga vyshnavi2

1Assistant Professor, Computer Science and Engineering Department, Vignan’s lara, Andhra Pradesh, India

2Student, Computer Science and Engineering Department, Vignan’s lara, Andhra Pradesh, India

Abstract - Speech emotion recognition (SER) is an active research topic in the field of Human Computer Interaction (HCI). The interaction between human beings and computers will be more natural if computers are able to perceive and respond to human non-verbal communication such as emotions. In this paper, Speech Emotion Recognition Using Binary Support Vector Machines, seven emotional states are considered: anger, boredom, disgust, fear, happy, sad and neutral. The speech features extracted include variance, standard deviation, energy, pitch and timing. The emotional speech corpus used in this work is Emo-DB, which contains 535 speech segments from 10 speakers, 5 male and 5 female. The speech segments are given as input to openSMILE for feature extraction. The extracted features may include irrelevant ones, so feature selection algorithms are used to eliminate them. The selected feature set is divided into a training set and a test set. The training set is used to train the classifier, and the performance of the classifier is evaluated on the test set. The classifier achieves an accuracy of 78.9% on the test set and 92.6% on the training set. Thus the average accuracy of the classifier is about 85%.

Key Words: SER; Telugu Emo-DB; SVM

1. INTRODUCTION

Speech is one of the most fundamental and natural means of communication for human beings. With the exponential growth in available computing power and significant progress in speech technologies, spoken dialogue systems (SDS) have been successfully applied to several domains. However, for the goal of affective interaction via speech, several problems in speech technologies still exist, including low accuracy in the recognition of highly affective speech and a lack of affect-related common sense and basic knowledge. So, in order to accomplish the goal of affective communication through speech, emotions are considered. Emotions are fundamental for humans, impacting perception and everyday activities such as communication, learning, and decision making. They can be expressed through speech, facial expressions, gestures, and other nonverbal cues. For better communication, the emotions of the other person need to be recognized. Many applications can benefit from an accurate emotion recognizer. For example, customer care interactions (with a human or an automated agent) can use emotion recognition systems to assess customer satisfaction and quality of service (e.g., lack of frustration).

For Speech Emotion Recognition, a good-quality speech corpus is very important. From the many pre-recorded emotional speech corpora, a German database, Emo-DB, was selected for use in this project. The speech samples from the corpus were given to a speech processing tool, openSMILE, to extract features such as pitch, loudness, variance, and standard deviation. Selected features are used to train a classifier for seven different classes of emotions (anger, boredom, disgust, fear, happy, sad, and neutral). The classifier used in this project is the Support Vector Machine (SVM). The classifier is then tested using a validation set to assess the recognition performance. The main stages in the proposed system are input data collection, preprocessing, feature extraction, feature selection, classification and recognition.
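The feature extraction stage described above can be illustrated with a minimal Python sketch. It is not the openSMILE implementation and does not reproduce its feature set; it only shows, under simple assumptions, how utterance-level features such as energy, variance, standard deviation and pitch might be computed from raw samples. The function name `extract_features`, the pitch search range (75 to 400 Hz) and the autocorrelation-based pitch estimate are assumptions made for illustration, and the input is a synthetic tone rather than an Emo-DB recording.

```python
import math
import statistics


def extract_features(samples, sample_rate, min_f0=75.0, max_f0=400.0):
    """Compute a few utterance-level features for one mono signal.

    Hypothetical helper for illustration; the pitch range is an assumption.
    """
    n = len(samples)
    mean = statistics.fmean(samples)
    centered = [s - mean for s in samples]

    energy = sum(s * s for s in samples) / n   # mean power of the signal
    variance = statistics.pvariance(samples)   # population variance
    std_dev = math.sqrt(variance)

    # Crude pitch estimate: pick the lag with the largest autocorrelation
    # inside the range of plausible pitch periods.
    min_lag = int(sample_rate / max_f0)
    max_lag = int(sample_rate / min_f0)
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        corr = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr

    return {
        "energy": energy,
        "variance": variance,
        "std_dev": std_dev,
        "pitch_hz": sample_rate / best_lag,
    }


if __name__ == "__main__":
    # Synthetic 200 Hz tone, 0.1 s at 8 kHz, standing in for a speech segment.
    rate = 8000
    tone = [0.5 * math.sin(2 * math.pi * 200 * i / rate) for i in range(800)]
    for name, value in extract_features(tone, rate).items():
        print(f"{name}: {value:.4f}")
```

In a full system, one such feature vector per speech segment would be passed on to feature selection and then to the classifier; real speech would normally be analysed frame by frame rather than as a single block.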

2. SPEECH EMOTION RECOGNITION (SER)

The speech signal is the fastest and most natural method of communication between humans. This fact has motivated researchers to think of speech as a fast and efficient method of interaction between human and machine. SER involves data collection, preprocessing, feature extraction, feature selection, classification and recognition. Data collection is a critical task in speech emotion recognition [2], and the collected data may contain a lot of noise that must be cleaned in the preprocessing phase. The output obtained in the preprocessing phase is given as input to the feature extraction phase, and feature selection can then follow a filter, wrapper or embedded approach. For classification, a Support Vector Machine is used, and the classifier is trained on the selected features with LIBSVM, a tool for SVM classification. Classification can be thought of as two separate problems: binary and multiclass classification. In binary classification only two classes are involved, whereas

