
International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 11 Issue: 12 | Dec 2024

p-ISSN: 2395-0072

www.irjet.net

Comprehensive Analysis of Methods for Detecting Malicious URLs and Classifying Harmful Content on Social Media Platforms

Ritwik Sinha1, Rahul Gupta2

1M.Tech. (CSE) Scholar, Department of Computer Science and Engineering, S. R. Institute of Management and Technology, Lucknow, Uttar Pradesh, India

2Assistant Professor, Department of Computer Science and Engineering, S. R. Institute of Management and Technology, Lucknow, Uttar Pradesh, India

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract - The rapid rise of social media has made it a focal point for communication, commerce, and entertainment, but it has also become a target for malicious activities, including phishing attacks, malware distribution, and the spread of misinformation. One of the key aspects of mitigating these risks is the identification of malicious URLs and the classification of community posts that could potentially harm users or violate platform policies. This paper provides an in-depth analysis of the procedures used to detect malicious URLs and classify community posts on social media platforms. It reviews the challenges involved, existing methodologies, and emerging technologies in the areas of machine learning, natural language processing (NLP), and cybersecurity. The paper also discusses the effectiveness, limitations, and future directions of these approaches.

Key Words: Malicious URLs, Harmful Content, Social Media Platforms, Natural Language Processing (NLP), Cybersecurity Techniques

1. INTRODUCTION

Social media platforms, such as Facebook, Twitter, Instagram, and TikTok, have become pivotal in shaping modern communication. While they offer immense benefits in terms of connectivity and information dissemination, they also come with significant risks. Among the most dangerous threats are malicious URLs and harmful posts, which can lead to security breaches, social manipulation, and physical harm. Malicious URLs are often used for phishing attacks, malware distribution, and fraud. They are typically disguised as legitimate links, making them difficult for users to identify. Social media sites, with their open architecture, are especially vulnerable to such threats. Community posts (which include text, images, and videos), on the other hand, can harbor harmful content like hate speech, cyberbullying, misinformation, and spam, all of which can cause societal harm.

As the scale of social media platforms grows, the need for efficient and accurate detection methods for both malicious URLs and harmful posts has become even more urgent. This paper aims to provide a comprehensive review of the procedures and methodologies employed to detect malicious URLs and classify community posts on social media websites.

2. Literature Review

Abbas and Martin (2020) provide a comprehensive review of phishing detection techniques, focusing on URL-based approaches. They categorize phishing detection techniques into two main types: signature-based and anomaly-based methods. Signature-based methods compare incoming URLs with a known list of malicious URLs, while anomaly-based methods detect unusual URL patterns. The review also explores machine learning techniques like decision trees, support vector machines (SVMs), and random forests for more sophisticated detection. They conclude that while signature-based methods are fast, anomaly-based and machine learning techniques offer better detection rates for novel phishing URLs [1].

Lee and Lee (2021) conduct an extensive survey on content moderation strategies used in social media platforms. They examine traditional content filtering techniques such as keyword-based detection, as well as machine learning approaches, including supervised and unsupervised learning. The paper highlights the increasing importance of context-aware systems, especially in detecting hate speech and abusive language. The authors suggest hybrid methods, combining rule-based techniques with machine learning, to achieve better accuracy and scalability. They also discuss challenges such as handling multilingual content and real-time moderation [2].

Zhao et al. (2022) propose a deep learning-based approach for detecting phishing websites, specifically using convolutional neural networks (CNNs). They develop a model that learns from URL features like domain name, URL length, and character patterns. The paper demonstrates how deep learning models outperform traditional methods, such as decision trees and SVMs, in terms of accuracy and generalization to new phishing techniques. The authors also discuss the trade-

Impact Factor value: 8.315

ISO 9001:2008 Certified Journal

Page 374
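
As an illustration of the URL-based techniques summarized in the literature review on this page (signature-based lookup against a known list, and lexical URL features such as URL length, domain name, and character patterns), the following minimal Python sketch combines a blocklist check with a few hand-picked heuristics. The blocklist entries, feature names, thresholds, and function names are illustrative assumptions and are not taken from the reviewed papers. In particular, the fixed-threshold score only stands in for the learned models (decision trees, SVMs, CNNs) that those papers discuss.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical blocklist for illustration only; a real system would
# load a maintained threat-intelligence feed instead.
KNOWN_MALICIOUS_HOSTS = frozenset({"malicious.example", "phish.example"})


def normalize_host(url):
    """Return the lower-cased hostname of a URL, or '' if none is found.

    Assumes the URL includes a scheme (e.g. 'http://'); without one,
    urlsplit does not recognize the host part.
    """
    host = urlsplit(url).hostname
    return host.lower() if host else ""


def host_is_ip(host):
    """True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def matches_signature(url, blocklist=KNOWN_MALICIOUS_HOSTS):
    """Signature-based check: exact match of the host against a known list."""
    return normalize_host(url) in blocklist


def lexical_features(url):
    """Extract simple lexical features from the URL string."""
    host = normalize_host(url)
    return {
        "url_length": len(url),
        "dots_in_host": host.count("."),
        "hyphens_in_host": host.count("-"),
        "digit_count": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip(host),
    }


def anomaly_score(features):
    """Count how many heuristic flags fire. Thresholds are arbitrary
    placeholders, not values reported in the literature."""
    flags = [
        features["url_length"] > 75,
        features["dots_in_host"] > 3,
        features["hyphens_in_host"] > 1,
        features["has_at_symbol"],
        features["host_is_ip"],
    ]
    return sum(flags)


def classify(url, suspicious_threshold=2):
    """Blocklist first (fast path), then fall back to lexical heuristics."""
    if matches_signature(url):
        return "blocked"
    if anomaly_score(lexical_features(url)) >= suspicious_threshold:
        return "suspicious"
    return "unflagged"


if __name__ == "__main__":
    for candidate in (
        "http://phish.example/login",
        "http://192.168.0.1/secure-login@update",
        "https://www.example.com/",
    ):
        print(candidate, "->", classify(candidate))
```

Checking the blocklist first mirrors the observation that signature-based methods are fast, while the lexical fallback gives some coverage of URLs that are not yet on any list. In practice the feature dictionary would more plausibly feed a trained classifier than a fixed flag count.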