International Research Journal of Engineering and Technology (IRJET) | e-ISSN: 2395-0056 | p-ISSN: 2395-0072 | Volume: 11 Issue: 07 | July 2024 | www.irjet.net

Sentiment Analysis of Amazon Customer Reviews: An Ensemble Learning Approach with Data Augmentation

Abhinav Palanivel, Student, IB DP, Canadian International School, Bangalore

Dr. Nandini N., Associate Professor, Department of Computer Science and Engineering, Dr. Ambedkar Institute of Technology, Bengaluru, India

Abstract - This study examines the effectiveness of ensemble learning methods, specifically bagging and boosting, combined with data augmentation for sentiment analysis of Amazon product reviews. Businesses need to evaluate the sentiment of online reviews in order to understand their customers' needs and develop their products. Sentiment analysis is difficult, however, because reviews use casual language and subtleties such as slang and sarcasm. We compare the performance of ensemble learning models against individual models and investigate how resampling strategies, used as data augmentation, can enhance the algorithms' capacity to handle unseen data points. Our results show that, in the sentiment classification of Amazon shoe reviews, Gradient Boosting performs better than other ensemble learning models such as Random Forest and AdaBoost. By demonstrating the value of ensemble techniques and data augmentation for processing informal language, this study advances sentiment analysis.

Key Words: Gradient Boosting, Random Forest, AdaBoost, Bagging, Boosting, Ensemble Model

1. INTRODUCTION

Online reviews are now a priceless resource for both companies and customers. By sharing consumer experiences and providing comprehensive product and service information, these reviews help prospective customers make well-informed choices. Businesses can gain important insights into customer satisfaction, product strengths and shortcomings, and overall brand perception by conducting sentiment analysis on online reviews. By carefully examining these reviews, businesses can improve the way they develop their products, target their marketing campaigns more effectively, and ultimately increase customer satisfaction.

Sentiment analysis of product reviews on Amazon, however, poses a special difficulty. Because of the overwhelming amount of casual language that users employ in their reviews, typical machine learning models operate in a difficult context. Slang, irony, and mixed feelings frequently mask the genuine sentiment expressed in a review, which can result in misinterpretation and imprecise sentiment analysis. Even though Gaussian Naive Bayes, Random Forest classifiers, Support Vector Machines, and Logistic Regression are strong machine learning algorithms, they may find it difficult to capture these subtleties accurately. To tackle these issues, this work examines how well ensemble learning methods, and stacking in particular, work when combined with data augmentation to analyze sentiment in Amazon product reviews. We chose to concentrate on Amazon product reviews because they are so common and so much data is available. Our goal is to create a robust model that can reliably classify consumer sentiment and handle the complexity of informal language by analyzing sentiment in this large dataset. Our hypothesis is that ensemble learning can outperform individual models by leveraging the strengths of multiple models, thereby overcoming their individual limitations. In addition, we investigate data augmentation using resampling methods as a way to potentially strengthen the model's capacity to handle unseen data points and improve generalizability.

This paper makes several contributions to the field of sentiment analysis. First, we discuss the unique difficulties involved in sentiment analysis of Amazon product reviews. Second, we demonstrate the potency of ensemble learning with stacking as a strong solution to these problems. Third, we examine how data augmentation by resampling affects model performance, offering important information about its possible advantages for this particular task. The rest of this paper describes our methodology, presents the outcomes of our experiments, and evaluates the performance of the various models. We then discuss our findings, examining how they might affect sentiment analysis of Amazon product reviews and suggesting possible directions for future study.

1.1 Literature review

A growing number of studies examine sentiment analysis approaches to better understand customer attitudes in online product reviews. In 2015, Xing Fang et al. presented a general sentiment analysis process for categorizing the sentiment of Amazon product reviews. That work investigates sentiment polarity categorization at the sentence and review levels and reports promising results [1]. Xeenia
