
International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 12 Issue: 08 | Aug 2025

p-ISSN: 2395-0072

www.irjet.net

SMART PHISHING URL DETECTION SYSTEM USING MACHINE LEARNING

Anwesha Sahoo¹, Bidipta Das², Ankita Mukherjee³, Monidipa Ghoshal⁴, Anubhab Chattopadhyay⁵, Dr. Krishna Bhowal⁶

¹²³⁴⁵B.Tech Student, Dept. of CSE, Academy of Technology, Adisaptagram, India
⁶Associate Professor, Dept. of CSE, Academy of Technology, Adisaptagram, India

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract - Phishing is one of the most widespread and damaging forms of cybercrime today. With the increase in the use of digital services, attackers are increasingly creating deceptive websites to trick users into sharing sensitive data. This paper proposes a Smart Phishing URL Detection System using machine learning techniques. The system extracts various features from a URL and uses a trained classification model to detect whether the URL is phishing or legitimate. The model was trained on a publicly available dataset and implemented in a Flask-based web application, allowing real-time detection through a user-friendly interface. Experimental results show high accuracy, making this solution effective for practical applications.

Key Words: Phishing Detection, Machine Learning, URL Classification, Flask Web App, Gradient Boosting Classifier.

1. INTRODUCTION

Phishing is a social engineering technique that deceives users into revealing personal and confidential information by mimicking legitimate websites. Traditional approaches, like blacklists or browser filters, are often inadequate due to the dynamic nature of phishing URLs. Hence, we propose a machine learning-based approach to identify phishing websites by analyzing URL features and predicting malicious intent.

1.1 Background and Need for Automation

The rising dependence on online platforms for banking, shopping, and communication has led to a dramatic increase in phishing attacks. Static blacklist-based detection systems are reactive in nature and often fail to keep up with new threats. Therefore, there is a need for dynamic, real-time, and intelligent phishing detection systems. According to the Anti-Phishing Working Group (APWG), over 1 million phishing attacks were recorded in 2023 alone, indicating a sharp rise in cyber threats. Given the dynamic nature of such attacks, a machine learning-driven approach provides a scalable and intelligent defense mechanism.

1.2 Objectives and Scope

This project aims to develop a smart phishing URL detection system using machine learning, capable of classifying URLs as phishing or legitimate. The solution is built as a web-based interface where users can input any suspicious URL and receive real-time predictions, backed by a robust model trained on URL-based features.

2. LITERATURE REVIEW

Machine learning has been used extensively for phishing detection over the past decade. Researchers have implemented techniques ranging from blacklists and content-based filters to advanced ensemble classifiers.

2.1 Traditional Approaches

Traditional detection systems rely on static blacklists or heuristic rule engines. However, these are reactive in nature and can be easily bypassed by attackers who slightly alter the URL structure or domain names.

2.2 Machine Learning for Phishing Detection

ML techniques such as decision trees, random forests, and gradient boosting classifiers have shown improved accuracy in detecting phishing attempts. These models analyze several features from the URL structure, domain information, and presence of special symbols.

Some existing studies have implemented Random Forest or Decision Tree models for phishing detection but have not integrated them into real-time systems. Our work enhances this by providing a working web application that makes detection accessible to non-technical users.
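To make the idea of URL-based features concrete, the following is a minimal sketch of how a few lexical features (URL structure and presence of special symbols) might be computed from a raw URL string. The function name `extract_url_features` and the particular feature set are illustrative assumptions, not the feature set used in this work; only the Python standard library is used.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Compute a few illustrative lexical features from a URL string.

    The feature names and choices here are hypothetical examples,
    not the features used by the system described in this paper.
    """
    # urlparse only fills in the host when a scheme is present,
    # so a default scheme is assumed for bare inputs.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""

    return {
        # Overall length of the URL as entered by the user
        "url_length": len(url),
        # 1 if the URL explicitly uses HTTPS, else 0
        "uses_https": int(parsed.scheme == "https"),
        # 1 if an '@' symbol appears anywhere in the URL, else 0
        "has_at_symbol": int("@" in url),
        # Number of dots in the host (a rough proxy for subdomain depth)
        "num_dots_in_host": host.count("."),
        # Number of hyphens in the host
        "num_hyphens_in_host": host.count("-"),
        # 1 if the host looks like a dotted IPv4 address, else 0
        "host_is_ip": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
        # Count of digit characters anywhere in the URL
        "num_digits": sum(ch.isdigit() for ch in url),
    }


if __name__ == "__main__":
    for sample in ("https://www.example.com/", "http://192.168.0.1/login@secure"):
        print(sample, extract_url_features(sample))
```

A dictionary like this could be converted into a fixed-order numeric vector and passed to any of the classifiers discussed in this paper; which features actually help is an empirical question that depends on the training data.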

Impact Factor value: 8.315 | ISO 9001:2008 Certified Journal

Table -1: Comparison of ML Models for Phishing Detection

Model                                Accuracy (%)   Precision (%)   Recall (%)   Reference
GBC (Gradient Boosting Classifier)   96.1           95.9            96.2         Sahoo et al. (2017)
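For readers unfamiliar with the metrics reported in Table 1, the sketch below shows the standard definitions of accuracy, precision, and recall in terms of confusion-matrix counts, with phishing treated as the positive class. The function name `classification_metrics` and the counts in the usage example are hypothetical and are not taken from any experiment in this paper.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return accuracy, precision, and recall from confusion-matrix counts.

    tp: phishing URLs correctly flagged as phishing
    fp: legitimate URLs wrongly flagged as phishing
    tn: legitimate URLs correctly passed as legitimate
    fn: phishing URLs wrongly passed as legitimate
    """
    total = tp + fp + tn + fn
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
    for name, value in metrics.items():
        print(f"{name}: {value * 100:.1f}%")
```

In a phishing setting, recall measures how many phishing URLs are caught, while precision measures how many flagged URLs are actually phishing, so the two are usually reported together with accuracy.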
