International Research Journal of Engineering and Technology (IRJET) Volume: 11 Issue: 10 | Oct 2024
www.irjet.net
e-ISSN: 2395-0056 p-ISSN: 2395-0072
Analyzing Consumer Sentiment Using Big Data from E-Commerce Sites
Maria Anurag Reddy Basani
Texas A&M University, Corpus Christi

Abstract
This study presents a sentiment analysis framework that covers six widely spoken languages: English, Spanish, French, German, Chinese, and Arabic. We evaluate three deep learning (DL) models, BERT, CNN, and LSTM, on a large-scale multilingual dataset of 2.5 million reviews, processed using big data technologies such as Apache Spark and Hadoop. Our results show that BERT achieved an accuracy of 95% across all languages, significantly outperforming LSTM, which recorded a moderate accuracy of 80%, and CNN, which achieved only 65%. Notably, CNN performed particularly poorly on Chinese and Arabic data, reflecting its difficulty in handling complex linguistic features. This research underscores the importance of leveraging big data to develop inclusive sentiment analysis models capable of handling diverse linguistic contexts, setting a new benchmark in the field.
Keywords: Sentiment Analysis, Multilingual, Big Data, Deep Learning, BERT, Classification Performance, Natural Language Processing, CNN, LSTM, Cross-Language
1 Introduction
1.1 Background
Sentiment analysis, also referred to as opinion mining, involves extracting the general attitude of consumers toward particular topics by analyzing their expressed thoughts and opinions (R. S. Kumar et al., 2021). In e-commerce, vast amounts of user-generated content, such as reviews, ratings, and comments, are produced continuously (Li et al., 2022). Traditional sentiment analysis methods are insufficient for handling such data at scale (Wankhade et al., 2022). Tools like Apache Hadoop and Apache Spark have become essential for processing the large datasets produced by these platforms (Gabdullin et al., 2024). These technologies enable e-commerce companies to analyze consumer sentiment from millions of reviews efficiently, which would be impractical with standard tools (Ivanov et al., 2024). By leveraging distributed computing frameworks like Hadoop and Spark, businesses can manage unstructured data, including text reviews, from multiple sources (Arif and Zeebaree, 2024). These tools are crucial for large-scale sentiment analysis because they process huge datasets in parallel, significantly reducing computation time (Modi et al., 2024). For instance, a large dataset of product reviews can be processed in minutes using Apache Spark, allowing companies to gain near-real-time insights into customer preferences (Yadav, 2024).

Moreover, integrating TensorFlow or PyTorch for Machine Learning (ML) tasks enables companies to apply advanced models, such as Deep Learning (DL), to extract sentiment from reviews more accurately. Beyond handling volume, these tools help process diverse types of data, such as text reviews and numerical ratings, so that companies can perform sentiment analysis on both structured and unstructured data (Yang et al., 2024). For example, Natural Language Processing (NLP) models built using TensorFlow can handle noisy, unstructured text from customer reviews, addressing challenges such as spelling errors, short comments, and slang.
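The partition-and-combine pattern behind this kind of parallel processing can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the tiny word lexicon, the function names, and the thread pool standing in for a Spark cluster are all assumptions made for illustration, and the lexicon lookup is a placeholder for a trained model. The sketch shows only the structure (normalize noisy text, label each partition independently, merge the partial counts), not the speedup a real cluster would provide.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import re

# Hypothetical toy lexicon (assumption); a real system would use a trained model.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def normalize(text):
    """Lowercase, collapse any character repeated 3+ times to one, and tokenize."""
    text = re.sub(r"(.)\1{2,}", r"\1", text.lower())
    return re.findall(r"[a-z']+", text)


def label(review):
    """Label one review by counting positive and negative lexicon hits."""
    tokens = normalize(review)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def count_partition(reviews):
    """Map step: label every review in one partition and count the labels."""
    return Counter(label(r) for r in reviews)


def analyze(reviews, n_partitions=4):
    """Split reviews into partitions, map over them concurrently, then reduce."""
    partitions = [reviews[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partial_counts = list(pool.map(count_partition, partitions))
    total = Counter()
    for counts in partial_counts:  # Reduce step: merge the per-partition counts.
        total.update(counts)
    return total


reviews = [
    "Greeeat phone, love it!!!",
    "terrible battery. bad screen",
    "Arrived on Tuesday",
    "GOOD value",
]
result = analyze(reviews)
print(result)
```

Because each partition is labeled independently and the partial counts are merged by simple addition, the same structure carries over to a distributed framework, where the partitions would live on different machines.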
These insights allow companies to tailor their products and services based on customer feedback, improving customer satisfaction and driving sales. This study leverages distributed computing frameworks and DL libraries like Apache Spark, HDFS, TensorFlow, and PyTorch to perform a large-scale sentiment analysis of consumer reviews from e-commerce platforms. Using datasets in Turkish, Arabic, and English, we explore the performance of ML and DL models, as well as pre-trained language models, in processing real-world, unstructured e-commerce data.
1.2 Motivation
The increasing amount of customer feedback generated on e-commerce platforms, including millions of reviews and ratings, necessitates specialized tools for effective analysis. Traditional methods fall short when it comes to handling the
© 2024, IRJET | Impact Factor value: 8.315 | ISO 9001:2008 Certified Journal | Page 717