
sustainability Article

How to Detect Online Hate towards Migrants and Refugees? Developing and Evaluating a Classifier of Racist and Xenophobic Hate Speech Using Shallow and Deep Learning

Carlos Arcila-Calderón 1,*, Javier J. Amores 1, Patricia Sánchez-Holgado 1, Lazaros Vrysis 2, Nikolaos Vryzas 2 and Martín Oller Alonso 3

Citation: Arcila-Calderón, C.; Amores, J.J.; Sánchez-Holgado, P.; Vrysis, L.; Vryzas, N.; Oller Alonso, M. How to Detect Online Hate towards Migrants and Refugees? Developing and Evaluating a Classifier of Racist and Xenophobic Hate Speech Using Shallow and Deep Learning. Sustainability 2022, 14, 13094. https://doi.org/10.3390/su142013094

Academic Editors: Stefano Ruggieri and Alessia Passanisi

Received: 19 September 2022

Accepted: 11 October 2022

1 Facultad de Ciencias Sociales, Campus Unamuno, University of Salamanca, 37007 Salamanca, Spain

2 Multidisciplinary Media & Mediated Communication Research Group (M3C), Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

3 Department of Social and Political Sciences, Università degli Studi di Milano, 20122 Milano, Italy

* Correspondence: carcila@usal.es

Abstract: Hate speech spreading online is a matter of growing concern, since social media allows for its rapid, uncontrolled, and massive dissemination. For this reason, several researchers are already developing prototypes that detect cyberhate automatically and on a large scale. However, most of these prototypes detect hate only in English, and very few focus specifically on racism and xenophobia, the category of discrimination in which the most hate crimes are recorded each year. In addition, ad hoc datasets manually generated by several trained coders are rarely used in developing these prototypes, since almost all researchers rely on already available datasets. The objective of this research is to overcome the limitations of those previous works by developing and evaluating classification models capable of detecting racist and/or xenophobic hate speech spread online, first in Spanish, and later in Greek and Italian. In developing these prototypes, three distinct machine learning strategies are tested. First, various traditional shallow learning algorithms are used. Second, deep learning is used, specifically an RNN model developed ad hoc. Finally, a BERT-based model is developed in which transformers and neural networks are used. The results confirm that deep learning strategies perform better in detecting anti-immigration hate speech online. For this reason, the deep architectures were the ones ultimately improved and tested for hate speech detection in Greek and Italian and in a multisource setting. The results of this study represent an advance in the scientific literature in this field of research, since up to now no online anti-immigration hate detectors had been tested in these languages with this type of deep architecture.

Keywords: hate speech; racism; xenophobia; migration; social media; deep learning

Published: 13 October 2022

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1. Introduction

Violent speech is not a communicative dysfunction exclusive to contemporary societies, but it seems more worrying today than ever because of its massive diffusion on digital platforms. The internet and information and communication technologies have allowed online hate speech to increase unabated. In this new context, social media has become the forum in which this type of message spreads most quickly and uncontrollably, as evidenced by the latest reports published by the Anti-Defamation League [1,2]. This growth in online hate speech also coincides with an unstoppable increase in registered hate crimes in Europe [3], which may be evidence of the correlation between the two phenomena pointed out by Müller and Schwarz [4]. Moreover, if this connection holds, and since most of the hate crimes committed in Europe are motivated by racism and/or xenophobia (according to the data collected through the OSCE’s hate crime reporting), we could affirm that most of the increasing hate speech that is spread online is based on this type of discrimination

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Sustainability 2022, 14, 13094. https://doi.org/10.3390/su142013094

https://www.mdpi.com/journal/sustainability