
International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 11 Issue: 11 | Nov 2024

p-ISSN: 2395-0072

www.irjet.net

Image Classification using CNN and CIFAR-10

Pratham Kiran Mehta¹, Rahul Jain², Manan Batra³

¹,²,³B. Tech student, Computer Science and Engineering, Vellore Institute of Technology, Tamil Nadu, India

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract - Convolutional Neural Networks (CNNs) have been at the forefront of advances in the application of Computer Vision. Their automated and adaptive nature helps them extract the hierarchical structures present in input images, allowing them to capture and interpret complex spatial patterns. This ability has made CNNs a powerful tool in the field of image analysis. This paper reviews the use of CNNs for classifying images into categories and outlines the pre-processing methods needed to prepare the dataset before it is fed into the CNN model. Using the ten standard object classes of the CIFAR-10 dataset, several CNN models were trained to classify these images and compared against one another to show the effectiveness of each. During training and testing, each model's performance is evaluated quantitatively using measures such as accuracy, precision, validation loss, and the value of the loss function. These metrics indicate how well the selected architecture performs. This work contributes to the existing knowledge of CNNs applied to image classification tasks with new datasets and can offer helpful suggestions for adapting the basic structure to object recognition tasks.

1.1 Different Models of CNN Used for Image Classification

LeNet, introduced in 1998 by Yann LeCun and his co-workers Léon Bottou, Yoshua Bengio, and Patrick Haffner, was designed for handwritten digit recognition. It is often described as the 'Hello World' of deep learning and is one of the first successful convolutional neural network (CNN) architectures. Its architecture comprises convolutional layers, pooling layers, and fully connected layers. The LeNet-5 variant has seven layers, not counting the input: three convolutional layers interleaved with two subsampling (pooling) layers, followed by a fully connected layer and the output layer. LeNet was among the first networks to bring CNNs to practical computer vision tasks. Early networks of this kind could be difficult to train, a difficulty often attributed to the vanishing gradient problem. The pooling layers placed between the convolutional layers reduce the spatial size of the feature maps, which not only helps to avoid overfitting but also improves the training speed of CNNs.

AlexNet was created by A. Krizhevsky, I. Sutskever, and G. Hinton. It resembles the LeNet architecture but is deeper, with a larger number of layers and with convolutional layers stacked directly on top of one another. AlexNet has five convolutional layers interleaved with max-pooling and other layers, three fully connected layers, and two dropout layers. Each hidden layer uses the ReLU activation function, and the output layer uses the Softmax activation function. In total, the architecture has about 60 million parameters.
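The building blocks named above (ReLU activations, max-pooling, and a Softmax output) can be illustrated with a minimal sketch in plain Python. It is not taken from the paper: the function names, the 4×4 feature map, and the three class scores are hypothetical values chosen only to show how pooling halves each spatial dimension and how Softmax turns scores into probabilities.

```python
import math

def relu(x):
    """ReLU activation: negative values become zero."""
    return max(0.0, x)

def max_pool_2x2(fmap):
    """2x2 max-pooling with stride 2 (assumes even height and width)."""
    pooled = []
    for r in range(0, len(fmap), 2):
        row = []
        for c in range(0, len(fmap[0]), 2):
            row.append(max(fmap[r][c], fmap[r][c + 1],
                           fmap[r + 1][c], fmap[r + 1][c + 1]))
        pooled.append(row)
    return pooled

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical 4x4 feature map, as might come out of a convolutional layer.
fmap = [
    [-1.0,  2.0,  0.5, -3.0],
    [ 4.0, -2.0,  1.5,  0.0],
    [ 0.0,  1.0, -1.0, -2.0],
    [ 3.0, -4.0, -0.5,  2.5],
]

activated = [[relu(v) for v in row] for row in fmap]
pooled = max_pool_2x2(activated)      # 4x4 -> 2x2
probs = softmax([2.0, 1.0, 0.1])      # hypothetical scores for three classes

print(pooled)
print(probs)
```

Pooling keeps only the strongest response in each 2×2 window, so the next layer processes a quarter as many values, which is the size reduction and speed-up described in the text.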

Key Words: Convolutional Neural Network, CIFAR-10, Image processing, Image classification, ResNet

1. INTRODUCTION

Convolutional Neural Networks (CNNs) are a class of deep learning models used mostly for identification and detection problems, such as image recognition, object detection, and image segmentation. CNNs resemble other neural networks, but their use of multiple convolutional layers makes them somewhat more complex. These layers apply an operation called convolution: a small filter (kernel) slides over the input, and at each position the element-wise products of the filter and the underlying patch are summed. The operation is linear and can be expressed as a matrix multiplication. Because each output value depends only on a small region of the input, the layer extracts local appearance features while preserving the spatial arrangement of the pixels.
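The convolution operation described above can be sketched in a few lines of plain Python. This is an illustrative example rather than the paper's implementation: the function name `conv2d_valid`, the 4×4 image, and the 2×2 kernel are assumptions. As in most deep learning libraries, the kernel is not flipped, so strictly speaking the sketch computes a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Both arguments are lists of lists of numbers. The kernel is not
    flipped, matching the convention used in CNN libraries.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0
            # Sum of element-wise products between the kernel and the patch.
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

# Hypothetical 4x4 image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The non-zero column in the output marks where the vertical edge sits in the input, which shows how a convolution extracts a local feature while keeping its spatial position.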

ZFNet is a CNN architecture composed of convolutional layers and fully connected layers. It was created by Matthew Zeiler and Rob Fergus. Like AlexNet, ZFNet stacks several convolutional layers with pooling layers, but the sizes of the middle convolutional layers have been tuned, as have the stride and the filter size of the first layer (7×7 filters with a stride of 2, in place of AlexNet's 11×11 filters with a stride of 4). Zeiler and Fergus trained the model on the ImageNet data set. As in AlexNet, five convolutional layers, some followed by downsampling max-pooling layers, feed into fully connected layers. For regularization, dropout is applied in the fully connected layers before the output layer. According to the article, a deconvolutional network attached to the CNN layers is used to visualize the features learned at each layer, and these visualizations guided the architectural changes relative to AlexNet.

GoogLeNet is designed by Jeff Dean, Christian Szegedy, Alexandro Szegegy

CIFAR-10 is a popular image dataset named after the Canadian Institute for Advanced Research (CIFAR) and is widely used to train and benchmark computer vision and machine learning models. It is one of the most frequently used datasets in machine learning research. CIFAR-10 comprises sixty thousand 32×32-pixel color images divided into ten classes: airplanes, automobiles, birds, cats, deer, dogs, frogs, horses, ships, and trucks. Each class has 6,000 images.
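The dataset figures above, together with two pre-processing steps commonly applied before feeding images to a CNN, can be sketched as follows. The choice of steps (scaling pixel values to the range [0, 1] and one-hot encoding the labels) is an assumption made for illustration and is not stated in this section; the helper names and the synthetic image are likewise hypothetical, and no real CIFAR-10 data is loaded.

```python
NUM_CLASSES = 10
IMAGES_PER_CLASS = 6000
IMAGE_SIDE = 32
CHANNELS = 3  # red, green, blue

# Class names in the dataset's standard label order (indices 0 to 9).
CLASS_NAMES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

def normalize(pixels):
    """Scale 8-bit pixel values (0..255) to floats in [0, 1]."""
    return [p / 255.0 for p in pixels]

def one_hot(label, num_classes=NUM_CLASSES):
    """Encode an integer class label as a one-hot vector."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

total_images = NUM_CLASSES * IMAGES_PER_CLASS            # 60000
values_per_image = IMAGE_SIDE * IMAGE_SIDE * CHANNELS    # 3072

# Synthetic stand-in for one flattened image (not real CIFAR-10 data).
fake_image = [(i * 7) % 256 for i in range(values_per_image)]

x = normalize(fake_image)
y = one_hot(CLASS_NAMES.index("frog"))

print(total_images, values_per_image)
print(y)
```

Each image therefore contributes 3,072 input values, and each label becomes a ten-element vector that matches the ten outputs of a Softmax layer.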

Impact Factor value: 8.315

ISO 9001:2008 Certified Journal

Page 317