
International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 12 Issue: 09 | Sep 2025

p-ISSN: 2395-0072

www.irjet.net

Comparative Evaluation of Supervised Machine Learning Techniques for Hate Speech Detection on a Curated Dataset

Nisar Ahmad Kangoo1

1 Higher Education Department, Union Territory of Jammu and Kashmir

-------------------------------------------------------------------------***-------------------------------------------------------------------------

Abstract - This research uses a large dataset titled “A Curated Dataset for Hate Speech Detection on Social Media Text”, hosted in the Mendeley Data repository, to detect hate speech and analyze its frequency and traits. The dataset contains 451,709 text samples: 371,452 labeled as non-hateful and 80,250 as hateful. Our analysis combines machine learning techniques with natural language processing methods. We find that a substantial portion of the hateful content is directed at demographic groups. Additionally, we identify common themes and linguistic patterns associated with hate speech. This work contributes to ongoing efforts to combat online hate and has implications for community moderation and content governance.

Hate speech exists in many forms, ranging from verbal, nonverbal and symbolic expressions to dissemination through veiled language, which makes the phenomenon challenging to identify and respond to. For example, hate on social media is often covert, relying more on implication and insinuation than on direct hostility, which makes it harder to confront effectively.
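The class counts reported in the abstract imply a noticeable class imbalance, which matters when classifiers are compared. The following is a minimal sketch in Python that works only from the figures quoted in the abstract; the variable names are illustrative and not taken from the paper. Note that the two class counts sum to 451,702, seven fewer than the stated total of 451,709, so the sketch uses the sum of the labeled counts as the denominator.

```python
# Class counts as quoted in the abstract.
NON_HATEFUL = 371_452
HATEFUL = 80_250
STATED_TOTAL = 451_709  # total sample count as stated in the abstract

# Sum of the two labeled classes (differs from the stated total by 7).
labeled_total = NON_HATEFUL + HATEFUL

# Share of samples labeled hateful.
hateful_share = HATEFUL / labeled_total

# Accuracy of a trivial classifier that always predicts "non-hateful".
majority_baseline_accuracy = NON_HATEFUL / labeled_total

# Number of non-hateful samples per hateful sample.
imbalance_ratio = NON_HATEFUL / HATEFUL

print(f"labeled total:              {labeled_total:,}")
print(f"hateful share:              {hateful_share:.1%}")
print(f"majority-baseline accuracy: {majority_baseline_accuracy:.1%}")
print(f"non-hateful : hateful       {imbalance_ratio:.2f} : 1")
```

Under these counts, a model that never flags anything would still be right on roughly four out of five samples, so accuracy alone is likely a weak basis for comparing classifiers on this dataset.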

Aim: This paper aims to rigorously assess multiple supervised machine learning methods on a large dataset in order to determine which algorithms are most effective for detecting and analyzing hate speech in digital communication. Identifying and removing hate speech on social networking sites and in other online communities can help maintain religious, regional, cultural and social peace on a global scale and lead to a more inclusive and respectful online discourse.
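One common way to compare supervised classifiers on an imbalanced task is to score each one by precision, recall and F1 on the hateful class. The sketch below illustrates that idea in plain Python. It is an assumption for illustration, not the paper's stated protocol: this section does not say which metrics are used, and the labels, predictions and the names `model_a` and `model_b` are made up.

```python
from typing import Dict, List, Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy, made-up labels (1 = hateful, 0 = non-hateful), for illustration only.
y_true: List[int] = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]

# Hypothetical predictions from two hypothetical classifiers.
predictions: Dict[str, List[int]] = {
    "model_a": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # always predicts non-hateful
    "model_b": [1, 0, 1, 1, 0, 0, 0, 0, 0, 0],
}

scores = {name: precision_recall_f1(y_true, y_pred) for name, y_pred in predictions.items()}

# Pick the model with the highest F1 on the hateful class.
best = max(scores, key=lambda name: scores[name][2])

for name, (p, r, f1) in scores.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print("best by F1:", best)
```

In this toy data, `model_a` is correct on 7 of 10 samples yet detects no hateful text, so its F1 is zero. This is why a per-class metric is often preferred over raw accuracy when hateful samples are the minority.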

Keywords: Hate speech, machine learning, NLP, online content moderation, text analysis

1. INTRODUCTION

Hate speech is defined as any speech that degrades, discriminates against, or incites violence against a person or group of people based on characteristics such as race, color, creed, gender identity, sexual preference, national origin, or any other salient group identifier. It is a pervasive and sharply polarizing phenomenon in a contemporary world society where communication extends beyond the borders of territory and culture. Rooted in hostility and discrimination, hate speech is highly harmful to the social fabric, is a serious attack on the basic principles of free speech, and can cause serious damage to the identities and rights of the groups and individuals it targets, perpetuating existing practices of discrimination and social exclusion.

2. LITERATURE REVIEW

Numerous researchers have addressed the complex problem of hate speech detection, which remains one of the most critical problems in modern digital communication. They have analyzed a wide range of data systematically gathered from major social media platforms, including YouTube, Twitter (now known as X), MySpace, Wikipedia, Usenet, Instagram and Facebook, to provide a holistic view of the phenomenon. A summary of previous research papers and their conclusions is presented in Table 1 below. The associated literature also provides important perspectives on identifying hate speech in languages other than English, the dominant language studied, which establishes that the problem exists worldwide.

Having a complete understanding of what hate speech is, how it manifests, and its complex consequences, both at a personal level and in the wider social context, is extremely important for creating an environment that supports diversity, mutual respect and inclusion in an increasingly digital and interconnected global community.

Impact Factor value: 8.315

ISO 9001:2008 Certified Journal

Page 224