The aim of this project is to propose a speech emotion recognition method based on both speech features and speech transcriptions (text). Modelling emotional behaviour is challenging because of the variability in how emotions are perceived and described.
emotions. We try to perform emotion analysis on the speech by collecting speech and textual features and applying a deep neural
network model which can classify the sentiments of the speech. Ideally, we would like to experiment with several deep neural
network models which take in different combinations of speech features and text as inputs. Speech features such as MelFrequency Cepstral Coefficients (MFCC) help retain emotion related low-level characteristics in speech whereas text helps
capture the semantic meaning, both of which help in different aspects of emotion detection.
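To illustrate the kind of acoustic features involved, the following is a minimal sketch of MFCC extraction, assuming the librosa library; the file path and parameter values are hypothetical and not necessarily those used in this work.

```python
# Minimal sketch of MFCC feature extraction (assumes librosa is installed).
# The audio path and parameter choices are illustrative only.
import librosa
import numpy as np

# Load an utterance as 16 kHz mono audio (hypothetical path).
signal, sample_rate = librosa.load("utterance.wav", sr=16000)

# Compute 13 MFCCs per frame; result has shape (13, num_frames).
mfccs = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=13)

# One simple utterance-level representation: mean and standard deviation
# of each coefficient over time, giving a fixed-length feature vector.
features = np.concatenate([mfccs.mean(axis=1), mfccs.std(axis=1)])
print(features.shape)  # (26,)
```

Such fixed-length acoustic vectors (or the frame-level MFCC sequences themselves) can then be fed to a neural network alongside a representation of the transcribed text.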