International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 09 Issue: 04 | Apr 2022
p-ISSN: 2395-0072
www.irjet.net
Derogatory Comment Classification

Sudhanshu Chaurasia1, KMR Dayaasaagar2, Jayesh Girdhar3, Dhanesh S4, Sunil Shelke5

1,2,3,4 UG Student, Dept. of Computer Engineering, Pillai College of Engineering, New Panvel, India.
5 Assistant Professor, Dept. of Information Technology, Pillai College of Engineering, New Panvel, India.
---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract - Social media platforms have become an outlet for people's opinions and reviews, where users share their views on a wide variety of topics. However, some users exploit this facility to attack those whose opinions differ from their own, posting harmful, racist, gender-biased, and threatening comments. Even as social media platforms become indispensable, they often struggle to facilitate healthy conversation, effectively forcing many communities to shut down user comments. This motivates us to study the problem and build a model that detects and classifies derogatory comments. We collect a dataset from Kaggle and experiment with machine learning and deep learning approaches: Naive Bayes, SVM, LSTM, and BERT. We compare these algorithms and conclude which is the most effective at classifying derogatory comments.

Key Words: Language recognition, multilingual, offensive word, NBSVM

1. INTRODUCTION

In today's society, online toxicity is a serious problem. The Internet is an open space in which everyone can freely express their opinions. With the massive increase in social interaction on online social networks, there has also been an increase in hateful activity that exploits this infrastructure. This toxicity negatively affects how many people engage in conversation and deters some from participating online entirely.

As a result, online platforms struggle to facilitate conversations effectively, leading many communities to limit or completely shut down user comments. Harassment and abuse discourage people from sharing their ideas and degrade the online environment. Motivated by this problem, we want to build a system that detects and classifies derogatory comments. The input to our model is a comment, and we use three different models to predict scores for "toxicity" (our target), "severe toxicity", "obscenity", "identity attack", "insult", and "threat".

2. Literature Survey

A. Baseline Model: NVB-SVM. Naive Bayes (NB) and Support Vector Machine (SVM) models are often used as baselines for other methods in text categorization. NVB-SVM, developed by Sida Wang and Christopher Manning, combines the two by training an SVM on NB log-count-ratio features; in this case it provides higher accuracy for further classification.

B. BERT. Bidirectional Encoder Representations from Transformers (BERT) is an NLP model designed to pretrain deep bidirectional representations from unlabeled text and then be fine-tuned on labeled text for different NLP tasks [7]. In this way, BERT can be used to create state-of-the-art models for many NLP tasks; the results obtained by BERT on different NLP tasks are reported in [7].

C. Toxic Comment Detection using LSTM. The authors of this paper developed an LSTM model that classifies speech as hate or not with an accuracy of 95%. LSTM stands for long short-term memory and is an improvement over a plain RNN. The model not only classifies a given sentence as toxic or non-toxic but also gives the degree of toxicity or non-toxicity of the sentence.

D. Research on Text Classification Based on CNN and LSTM. In this paper, a new model is developed by combining an LSTM and a CNN; it outperforms both algorithms used individually.

2.1 Summary of Related Work
The summary of the methods used in the literature is given in Table 1.
Table 1: Summary of methods used in the literature

Literature                                                                              | Model   | Accuracy
----------------------------------------------------------------------------------------|---------|---------
Sida Wang, Baselines and Bigrams: Simple, Good Sentiment and Topic Classification, 2020 | NVB-SVM | 87%
Hong Fu, Social Media Toxicity Classification Using Deep Learning, 2021                 | BERT    | 98%
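As a concrete illustration of the NVB-SVM baseline surveyed above, the following is a minimal NumPy sketch of its Naive Bayes side: the log-count-ratio features of Wang and Manning, which the full NVB-SVM feeds into a linear SVM. The four-word vocabulary and the comments are hypothetical toy data, not taken from the Kaggle dataset.

```python
import numpy as np

def log_count_ratio(X_pos, X_neg, alpha=1.0):
    """Wang & Manning's log-count ratio r over binarized
    bag-of-words matrices (documents x vocabulary)."""
    p = alpha + X_pos.sum(axis=0)    # smoothed term counts in toxic docs
    q = alpha + X_neg.sum(axis=0)    # smoothed term counts in clean docs
    return np.log((p / p.sum()) / (q / q.sum()))

# Toy vocabulary: ["idiot", "stupid", "great", "thanks"] (hypothetical data)
X_pos = np.array([[1, 1, 0, 0],      # toxic comments
                  [1, 0, 0, 0]])
X_neg = np.array([[0, 0, 1, 1],      # clean comments
                  [0, 0, 1, 0]])

r = log_count_ratio(X_pos, X_neg)

# MNB decision rule: sign of r.x + b, with b = log(N_pos / N_neg).
# Full NVB-SVM instead trains an SVM on the features r * x.
b = np.log(X_pos.shape[0] / X_neg.shape[0])
doc = np.array([1, 0, 0, 0])         # unseen comment containing "idiot"
score = r @ doc + b
print("toxic" if score > 0 else "clean")   # prints "toxic"
```

Words frequent in toxic comments get positive weights (r[0] = log 3 here) and words frequent in clean comments get negative ones, so even this untrained rule separates the toy example.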
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 3749
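To make the LSTM mechanics from Section 2.C concrete, here is a minimal NumPy sketch of a single LSTM cell run over one comment. The weights are random and untrained, and the toxicity output head is hypothetical; a real model (e.g., built with Keras) would learn all of these parameters from the Kaggle data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4H, D), U: (4H, H), b: (4H,).
    Gate order in the stacked weights: input, forget, output, candidate."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])              # input gate
    f = sigmoid(z[H:2*H])            # forget gate
    o = sigmoid(z[2*H:3*H])          # output gate
    g = np.tanh(z[3*H:4*H])          # candidate cell state
    c = f * c_prev + i * g           # new cell state (long-term memory)
    h = o * np.tanh(c)               # new hidden state (short-term output)
    return h, c

rng = np.random.default_rng(0)
D, H = 8, 4                          # embedding size, hidden size (toy values)
W = rng.normal(0, 0.1, (4*H, D))
U = rng.normal(0, 0.1, (4*H, H))
b = np.zeros(4*H)

# Run a hypothetical 5-token comment (random embeddings) through the cell.
h, c = np.zeros(H), np.zeros(H)
for t in range(5):
    h, c = lstm_step(rng.normal(size=D), h, c, W, U, b)

# Hypothetical toxicity head: squash the final hidden state to a probability,
# matching the surveyed model's "percentage of toxicity" output.
w_out = rng.normal(0, 0.1, H)
p_toxic = sigmoid(w_out @ h)
print(float(p_toxic))
```

The explicit forget gate is what distinguishes the cell from a plain RNN: it lets the cell state carry information across many tokens, which is why LSTMs handle long comments better.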