International Research Journal of Engineering and Technology (IRJET) Volume: 09 Issue: 05 | May 2022
www.irjet.net
e-ISSN: 2395-0056 p-ISSN: 2395-0072
A Survey on Emotion Recognition Using CNN
Prof. S. S. Kale, Prachi Naidu, Priyanka Gaikwad, Akanksha Kaundinya, Sakshi Gajbhiye
Computer Science, NBN Sinhgad School of Engineering, Ambegaon, Maharashtra, India
---------------------------------------------------------------------***------------------------------------------------------------------
Abstract - Many automated system applications, such as robotics, artificial intelligence, and security, rely heavily on facial expression recognition (FER). Accurately recognising facial expressions can be quite difficult for machines. FER has a wide range of applications in training, online business, health, and security. This study examines a variety of CNN-based facial expression recognition systems and the methods proposed by various researchers. It also describes how a CNN is used for FER and discusses the issues to consider when deciding whether or not to use a CNN for this task.
Key Words: CNN, FER, Emotion Recognition, Survey, Deep learning
1. INTRODUCTION Facial expressions are one of the most important ways in which humans interact, and the ability to read them is highly developed in humans. As computers are used to automate more and more tasks, giving them the ability to recognize emotions has become a very popular research subject. Emotion recognition can be useful in marketing research, academics, robotics and security. This survey focuses on models aimed at real-time facial emotion recognition, as static emotion recognition is less useful for such applications. Darwin's study of human feelings stated that the face is a significant factor in how humans communicate [1]. The six basic emotions proposed by Ekman et al. are fear, disgust, anger, happiness, sadness and surprise [2]. A CNN, which is a deep learning model, can be used to learn facial expressions. Facial features such as the eyes, ears, mouth and hair play an important part in achieving the outcome. Extracting these facial features reduces the time and resources required for the procedure without sacrificing important information. The deep learning methods used for this task consist of hidden layers and classification layers. The hidden layers, also called feature extraction layers, consist of convolution layers followed by pooling layers. They are followed by the classification part.
2. TRADITIONAL APPROACH Most papers on traditional approaches base their research on MLP (Multi-Layer Perceptron), SVM (Support Vector Machine) and KNN (K-Nearest Neighbors) models. The difference between these traditional methods and a CNN is that in the older methods features need to be extracted manually, whereas a CNN learns these features itself to detect an emotion.
© 2022, IRJET
|
Impact Factor value: 7.529
3. DATASET Various datasets are available online for different purposes. The most commonly used dataset is FER-2013, with almost 30,000 training images (35,887 images in total). FER-2013 has 7 categories: sad, angry, happy, surprise, disgust, neutral and fear.
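As an illustration of how such a dataset might be read, the sketch below parses one FER-2013-style record into a label and a square image. It assumes the commonly distributed CSV layout, in which each row holds an integer emotion code and a space-separated string of 48×48 grayscale pixel values; the label order in `EMOTIONS` and the helper name `parse_row` are assumptions made for this example and are not taken from the surveyed papers.

```python
# Assumed label order for the emotion codes 0..6 (not stated in this survey).
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]
SIDE = 48  # assumed image side length in pixels


def parse_row(emotion, pixels, side=SIDE):
    """Turn one (emotion code, pixel string) record into (label, 2-D image)."""
    values = [int(p) for p in pixels.split()]
    if len(values) != side * side:
        raise ValueError(f"expected {side * side} pixels, got {len(values)}")
    # Cut the flat pixel list into `side` rows of `side` values each.
    image = [values[r * side:(r + 1) * side] for r in range(side)]
    return EMOTIONS[int(emotion)], image
```

A caller would typically loop over the rows of the CSV file with `csv.DictReader` and pass the two fields to `parse_row`.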
4. PRE-PROCESSING Images come in a range of sizes and colors, and from different sources. All adjustments made to the raw data before it is input to the machine learning or deep learning algorithm are referred to as pre-processing. Sometimes the amount of data available is not sufficient to perform classification; data augmentation is used to solve that problem. Data augmentation can be done by flipping, rotating, adding noise, cropping, etc.
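The augmentation operations listed above can be sketched in a few lines of plain Python. This is a minimal illustration only: it assumes a grayscale image stored as a list of rows with pixel values in the range 0 to 255, and the function names and the noise level `sigma` are hypothetical choices for this example. In practice an image library would normally be used instead.

```python
import random


def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def add_noise(img, sigma=5.0, rng=None):
    """Add Gaussian noise to every pixel and clamp the result to 0..255."""
    rng = rng or random.Random()
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in img]


def crop(img, top, left, height, width):
    """Cut out a height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]
```

Each function returns a new image, so one training image can yield several variants that all keep the original emotion label.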
5. CNN ARCHITECTURE After pre-processing, the data is fed into the CNN model. A CNN is a deep learning algorithm; it differs from traditional machine learning algorithms in that it has hidden layers for image processing and involves convolution operations. A CNN has two main parts: 1. hidden layers and 2. fully connected layers. The hidden layers are used for feature extraction and are themselves of two kinds, convolution layers and pooling layers. The first layer is the convolution layer, which is used to extract features from the input images. A filter of size N×N glides over the input image, and the dot product of the filter and the image area it covers is taken at each position. The output that is stored is called the feature map. The convolution layer is followed by a pooling layer. The aim of this layer is to reduce the size of the feature map, which reduces computational cost and time. Max pooling takes the maximum value in the filter window as the output, whereas average pooling calculates the average. Pooling reduces the size of the feature map without losing any important feature. The pooling layer is followed by fully connected layers. The output from the previous layers is flattened and fed into the fully connected layer. This layer usually comes before the output, and it is where the classification process begins. To avoid overfitting, i.e., when a model fits the training data so closely that its performance on test data suffers, dropout layers are used. Dropout layers randomly drop nodes from the neural network during training. The activation function is the last component and is used to introduce non-linearity into the output. The ReLU function is the one mainly used in CNNs. It returns 0 if the input value is negative and returns the input value unchanged otherwise.
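The building blocks described in this section can be illustrated with a small, dependency-free sketch. It is not the implementation used by any of the surveyed papers: the function names are hypothetical, the filter slides with stride 1 and no padding, the pooling windows do not overlap, and dropout is written in the common "inverted" form that rescales the surviving values. All of these are assumptions made for the example.

```python
import random


def conv2d(image, kernel):
    """Slide an N x N filter over the image and take the dot product at each position."""
    n = len(kernel)
    out_h = len(image) - n + 1
    out_w = len(image[0]) - n + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(n) for b in range(n))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Return 0 for negative values and the value itself otherwise."""
    return [[max(0, v) for v in row] for row in fmap]


def pool2d(fmap, size=2, mode="max"):
    """Shrink the feature map with non-overlapping max or average windows."""
    def reduce_window(window):
        return max(window) if mode == "max" else sum(window) / len(window)

    out = []
    for i in range(0, len(fmap) - size + 1, size):
        row = []
        for j in range(0, len(fmap[0]) - size + 1, size):
            window = [fmap[i + a][j + b]
                      for a in range(size) for b in range(size)]
            row.append(reduce_window(window))
        out.append(row)
    return out


def flatten(fmap):
    """Turn the 2-D feature map into the 1-D input of a fully connected layer."""
    return [v for row in fmap for v in row]


def dropout(values, rate, rng):
    """Zero each value with probability `rate`; rescale the rest (training only)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]
```

Chaining `conv2d`, `relu`, `pool2d` and `flatten` on a 5×5 input with a 2×2 filter gives a 4×4 feature map, a 2×2 pooled map and finally a vector of four values, which shows how each stage reduces the amount of data passed to the classifier.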
|
ISO 9001:2008 Certified Journal
|
Page 115