
International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 11 Issue: 02 | Feb 2024

p-ISSN: 2395-0072

www.irjet.net

Comparative Study of Performance of K Nearest Neighbor and Support Vector Machine Classifiers in Sentiment Analysis

Tula Kanta Deo1, Rajesh Keshavrao Deshmukh2, Gajendra Sharma3

1 Department of Computer Science and Engineering, Kalinga University, Naya Raipur, India
2 Department of Computer Science and Engineering, Kalinga University, Naya Raipur, India
3 Department of Computer Science and Engineering, Kathmandu University, Kavre, Nepal

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract – Sentiment analysis is one of the most important branches of natural language processing. It deals with the classification of text, where the class can be positive, negative or other. This study evaluates and compares the performance of the K Nearest Neighbor (KNN) and Support Vector Machine (SVM) classifiers. The datasets used in this study are the all_tweets dataset and the financial phrase bank dataset. These datasets are preprocessed, and the preprocessed datasets are split into 80% training and 20% testing subsets. The training subsets are used for feature extraction and training of the classifiers. The testing subsets are used for feature extraction and evaluation of the classifiers. The results of this study show that the performance of KNN and SVM is consistent with most earlier studies. In this study, SVM outperforms KNN.

Keywords: Sentiment Analysis, K Nearest Neighbor, Support Vector Machine, Precision, Recall, Accuracy, F1 score.

1. INTRODUCTION

Sentiment analysis (SA), also known as opinion mining, is a subfield of natural language processing (NLP) that deals with automatically determining the emotional tone of a piece of text. It aims to understand whether the text expresses positive, negative, or neutral sentiment towards a topic, entity, or event[1].

The objective of this study is to evaluate and compare the performance of the KNN and SVM classifiers on sentiment classification tasks. This includes assessing their accuracy, precision, recall and F1-score to understand how well each classifier performs in sentiment analysis.

The KNN classifier is renowned for its simplicity and effectiveness in classification tasks. It belongs to the family of instance-based algorithms, where predictions are made based on the similarity of new instances to known instances in the training data. KNN is a non-parametric algorithm; that is, it makes no assumptions about the underlying data distribution. This makes it versatile and applicable to a wide range of datasets. KNN is a lazy learning algorithm because it postpones the learning process until the prediction phase. It stores the entire training dataset in memory and performs computation only when a prediction is required. The choice of the hyperparameter K (number of neighbors) is crucial in KNN. A smaller value of K leads to a more flexible decision boundary, potentially resulting in overfitting, while a larger value of K may lead to underfitting. KNN is easy to implement and understand, making it an ideal choice for beginners and as a baseline model for comparison with more complex algorithms[2].

KNN is intuitive and requires minimal assumptions about the data. Because KNN does not build an explicit model during training, its training phase is fast and computationally inexpensive. KNN handles both binary and multi-class classification problems and, for a sufficiently large K, is fairly robust to noisy data[3].

However, KNN must compute the distance between a new instance and every instance in the training dataset, which makes prediction computationally expensive for large datasets. With a small K, KNN is sensitive to outliers and noise in the data, which can reduce the accuracy of predictions. Its performance also deteriorates in high-dimensional feature spaces due to the curse of dimensionality[3].

SVM is a widely used supervised learning algorithm known for its effectiveness in classification and regression tasks, and it handles both linear and non-linear classification problems. SVM seeks the separating hyperplane with the maximum margin, that is, the largest distance between the hyperplane and the nearest training instances of each class (the support vectors). This property improves its generalization ability and makes SVM less sensitive to outliers. SVM uses kernel functions such as the linear, polynomial, radial basis function, and sigmoid kernels to handle non-linear decision boundaries by implicitly mapping the input space into a higher-dimensional feature space. The soft-margin formulation introduces slack variables that allow some instances to be misclassified in order to achieve better overall performance. SVM often yields sparse solutions, meaning that the decision boundary depends only on a subset of the training data, which makes it memory-efficient and suitable for large-scale datasets[4].
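As a concrete illustration of the KNN procedure and the four evaluation metrics discussed in this section, the following is a minimal sketch in plain Python. It is not the implementation used in this study. The function names (`euclidean`, `knn_predict`, `binary_metrics`), the Euclidean distance, the value K = 3, and the toy two-dimensional features (hypothetical counts of positive and negative words per text) are all assumptions made for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Lazy learning: nothing is fitted beforehand; every stored training
    # point is compared with the query at prediction time.
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def binary_metrics(y_true, y_pred, positive="pos"):
    """Accuracy, precision, recall and F1 with `positive` as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical features: (count of positive words, count of negative words).
train_X = [(3, 0), (2, 1), (4, 1), (0, 3), (1, 2), (1, 4)]
train_y = ["pos", "pos", "pos", "neg", "neg", "neg"]
test_X = [(3, 1), (0, 2), (3, 2), (1, 3), (2, 0)]
test_y = ["pos", "neg", "pos", "neg", "neg"]

preds = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
scores = binary_metrics(test_y, preds)
print(preds)
print(scores)
```

In this toy run the last test instance, (2, 0) with true label "neg", is predicted as "pos". That single false positive gives an accuracy of 0.8, a precision of 2/3, a recall of 1.0 and an F1-score of 0.8, which shows how the four metrics can diverge on the same predictions.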

Impact Factor value: 8.226

SVM performs well even in high-dimensional feature spaces, making it suitable for complex classification tasks
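The maximum-margin and slack-variable ideas behind SVM can be illustrated with a small linear soft-margin classifier. The sketch below minimizes the regularized hinge loss by batch sub-gradient descent in plain Python. It is an illustrative stand-in and not the solver or configuration used in this study. The names `decision`, `train_linear_svm` and `predict`, the hyperparameters `lam`, `lr` and `epochs`, and the toy features (hypothetical counts of positive and negative words) are assumptions.

```python
def decision(w, b, x):
    """Signed score w.x + b; its sign is the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize (lam/2)*||w||^2 + mean(max(0, 1 - y*(w.x + b))).

    Labels in `y` must be +1 or -1.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the margin (L2) term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Only points inside the margin or misclassified (non-zero
            # slack) contribute to the hinge-loss sub-gradient.
            if yi * decision(w, b, xi) < 1:
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 according to the side of the hyperplane."""
    return 1 if decision(w, b, x) >= 0 else -1


# Hypothetical features: (count of positive words, count of negative words).
X = [(3, 0), (2, 1), (4, 1), (0, 3), (1, 2), (1, 4)]
y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, y)
train_preds = [predict(w, b, x) for x in X]
test_preds = [predict(w, b, x) for x in [(3, 1), (0, 2), (1, 3), (4, 0)]]
print(w, b)
print(train_preds, test_preds)
```

Once a training point has a margin of at least 1 it stops contributing to the update, so the learned hyperplane is determined by the points closest to the boundary, which play the role of support vectors. The parameter `lam` sets the trade-off between a wide margin and few margin violations.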

ISO 9001:2008 Certified Journal

Page 501