
International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 10 Issue: 07 | July 2023

p-ISSN: 2395-0072

www.irjet.net

Machine Learning-Based Phishing Detection

Mohammed Naif1, Allen Jeriel K2, Sneh Patil3, Ananya Rangaraju4, Peraka Divyanjali5, Kandukuri Pavan Sai Praveen6

Abstract - The internet connects millions of users around the globe today, and as a result, users rely increasingly on it for browsing, online transactions, and downloading information. Cybersecurity is the collection of technologies and procedures used to safeguard software and hardware against intrusion, damage, and attack. DoS attacks, Man-in-the-Middle attacks, phishing attacks, and SQL injection attacks are among the most commonly seen cybersecurity threats. Over the past few years, there has been an uptick in consumers losing control of their sensitive and private information. Fraudsters use such methods to trick their victims into revealing personal information, including usernames, passwords, bank account details, and credit card details. Attacks against users are frequently delivered via spoofed emails, illegitimate websites, malware, and similar channels. Handling such large and complex volumes of data requires a structured, automated technique, and research indicates that machine learning is one of the most common and effective approaches to this problem. The most widely used machine learning methods include neural networks, decision trees, logistic regression, and support vector machines (SVM). In this study, a group of deep learning and machine learning models will be trained to identify phishing websites.

Key Words: Machine Learning, Cyber Security, Phishing, Neural Network, Website Security

1. INTRODUCTION

The majority of our daily activities, including shopping and banking, have moved online thanks to the internet. An unregulated or unprotected network serves as a launchpad for a variety of cyberattacks, creating major security risks not only for networks but also for ordinary computer users, even experienced ones. Protecting users from these intrusions is therefore crucial. Phishing website attacks are among the most frequent types of cyberattack. A phishing website is a popular social engineering technique that imitates trustworthy URLs and web pages. Through such attacks, the phisher targets unsuspecting web users in an effort to trick them into disclosing private information, which is then used fraudulently. Because the end user is the weak link, attackers keep devising new approaches that can deceive even seasoned users into providing personal information while believing the page is genuine. Software-based phishing detection solutions are therefore recommended as decision-support tools for users.

This project uses machine learning algorithms and techniques. The goal of this research is to use the collected dataset to train machine learning models and deep neural networks that predict phishing websites. Both phishing and benign website URLs are collected to build a dataset from which the necessary URL-based and website-content-based attributes can be extracted. The performance of each model is assessed and compared. Before the data is fed to the algorithms, features are extracted in four groups: address-bar-based, domain-based, HTML-based, and JavaScript-based. The machine learning methods used in this project are Decision Trees, Random Forests, Multilayer Perceptrons, XGBoost, and Support Vector Machines. The models will be evaluated with accuracy as one of the evaluation criteria. XGBoost is expected to perform better in this setting because it has built-in L1 (Lasso regression) and L2 (Ridge regression) regularisation, which helps prevent the model from overfitting. Additionally, XGBoost can handle missing values out of the box: at each node, it tries sending the rows with a missing value down both the left and the right branch, and learns for that node the default direction that yields the greatest reduction in loss. A limitation of this project is that the website itself has insufficient security, which leaves it open to cyberattacks.

1.1 Novelty

Our proposed methodology considers not only the URL-based features of phishing websites but also their domain-based features, as well as HTML- and JavaScript-based features, during feature extraction. It aims to reduce both the False Positive Rate and the False Negative Rate and to improve overall accuracy. With this additional set of features, the model can be trained to recognise phishing sites that replace textual content with embedded objects such as Flash, JavaScript, and HTML files.

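The missing-value behaviour of XGBoost described in the introduction can be made concrete with a small, hypothetical sketch. It scores one candidate split with the gain formula from the XGBoost paper, ½ [G_L² / (H_L + λ) + G_R² / (H_R + λ) - (G_L + G_R)² / (H_L + H_R + λ)] - γ. Here G and H are the sums of first- and second-order gradients in a child, λ is the L2 penalty, and γ is the per-split penalty. The L1 penalty α is applied by shrinking each gradient sum toward zero, which we assume mirrors how the library uses its `alpha` (`reg_alpha`) parameter. The helper names and the single fixed threshold are illustrative assumptions, not XGBoost's API. The library itself scans many candidate thresholds per feature.

```python
# Hypothetical helper names; this is a sketch, not XGBoost's API.


def soft_threshold(g: float, alpha: float) -> float:
    """Shrink a gradient sum toward zero by the L1 penalty alpha."""
    if g > alpha:
        return g - alpha
    if g < -alpha:
        return g + alpha
    return 0.0


def leaf_score(g: float, h: float, lam: float, alpha: float) -> float:
    """Structure score of a leaf with gradient sum g and hessian sum h."""
    return soft_threshold(g, alpha) ** 2 / (h + lam)


def split_gain(gl, hl, gr, hr, lam=1.0, alpha=0.0, gamma=0.0):
    """Loss reduction obtained by splitting one node into two children."""
    parent = leaf_score(gl + gr, hl + hr, lam, alpha)
    children = leaf_score(gl, hl, lam, alpha) + leaf_score(gr, hr, lam, alpha)
    return 0.5 * (children - parent) - gamma


def best_default_direction(gl, hl, gr, hr, gm, hm, lam=1.0, alpha=0.0, gamma=0.0):
    """Route the rows with a missing value (gradient sum gm, hessian sum hm)
    left, then right, and keep the direction with the larger gain, i.e. the
    larger reduction in training loss."""
    gain_left = split_gain(gl + gm, hl + hm, gr, hr, lam, alpha, gamma)
    gain_right = split_gain(gl, hl, gr + gm, hr + hm, lam, alpha, gamma)
    if gain_left >= gain_right:
        return "left", gain_left
    return "right", gain_right


# The missing rows have a negative gradient sum, like the left child,
# so grouping them on the left reduces the loss more.
direction, gain = best_default_direction(-4.0, 2.0, 3.0, 2.0, -2.0, 1.0)
print(direction, round(gain, 4))  # left 5.25
```

With these example sums, the node would learn "left" as its default direction. A positive gradient sum for the missing rows would flip the choice to "right".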

2. ALGORITHM

Compile a dataset of both phishing and legitimate websites from open-source platforms.

Extract the necessary features from the URL database.

Use exploratory data analysis (EDA) techniques to analyse and pre-process the dataset.

Split the dataset into training and testing sets.

Train and evaluate several deep learning and machine learning algorithms, such as SVM, Random Forest, and Autoencoder.
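The feature-extraction step can be sketched with the Python standard library. This minimal example covers a few address-bar features and a few HTML/JavaScript features. The feature names and rules (for example, treating a `//` after position 7 as a redirect hint) are hypothetical choices for illustration, not the exact feature set used in this project. Domain-based features such as domain age or DNS records need WHOIS or DNS lookups and are omitted here.

```python
import ipaddress
import re
from urllib.parse import urlparse


def is_ip_address(host: str) -> bool:
    """True if the host part of the URL is a raw IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_features(url: str, html: str = "") -> dict:
    """Map one URL (and optionally its page source) to numeric features.

    Feature names and rules are illustrative assumptions.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""  # text after any '@', lower-cased
    ip_host = is_ip_address(host)
    return {
        # Address-bar features
        "has_ip": int(ip_host),
        "url_length": len(url),
        "has_at_symbol": int("@" in url),
        # A '//' after the scheme separator hints at a redirect
        "has_double_slash_redirect": int(url.rfind("//") > 7),
        "has_dash_in_host": int("-" in host),
        "subdomain_count": 0 if ip_host else max(host.count(".") - 1, 0),
        "uses_https": int(parsed.scheme == "https"),
        # HTML / JavaScript features, computed from already-downloaded source
        "iframe_count": len(re.findall(r"<iframe\b", html, re.IGNORECASE)),
        "has_onmouseover": int("onmouseover" in html.lower()),
        "disables_right_click": int(
            re.search(r"event\.button\s*==\s*2", html) is not None
        ),
    }


print(extract_features("http://192.168.0.1/login", "<iframe src='x'></iframe>"))
```

Each URL becomes one row of numeric features, which is the tabular form that Decision Trees, Random Forests, XGBoost, and SVMs expect.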

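For the step that creates training and testing sets, a stratified split keeps the ratio of phishing to legitimate sites the same in both sets, so accuracy on the test set is not skewed by class imbalance. This is a standard-library sketch. It assumes each sample is a `(features, label)` pair with label 1 for phishing and 0 for legitimate. The function name `stratified_split` and the 80/20 ratio are illustrative assumptions.

```python
import random
from collections import defaultdict


def stratified_split(samples, test_ratio=0.2, seed=42):
    """Split (features, label) pairs so that each class keeps the same
    proportion in the training and testing sets."""
    by_label = defaultdict(list)
    for sample in samples:
        by_label[sample[1]].append(sample)

    rng = random.Random(seed)  # fixed seed -> reproducible split
    train, test = [], []
    for label in sorted(by_label):
        group = list(by_label[label])
        rng.shuffle(group)
        n_test = round(len(group) * test_ratio)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    # Mix the classes so the sets are not ordered by label
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


# Toy data: 50 phishing (1) and 50 legitimate (0) samples.
data = [({"id": i}, i % 2) for i in range(100)]
train, test = stratified_split(data)
print(len(train), len(test))  # 80 20
```

In practice, scikit-learn's `train_test_split` with its `stratify` argument performs the same kind of split on feature matrices.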

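The project aims to lower the False Positive Rate and the False Negative Rate while improving accuracy. All three can be computed from a model's predictions on the test set. This sketch assumes label 1 means phishing and 0 means legitimate. The function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Accuracy, false positive rate and false negative rate, assuming
    label 1 = phishing and 0 = legitimate."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))  # legitimate site flagged as phishing
    fn = pairs.count((1, 0))  # phishing site let through as legitimate
    return {
        "accuracy": (tp + tn) / len(pairs),
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "fnr": fn / (fn + tp) if fn + tp else 0.0,
    }


metrics = evaluate([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1, 0])
print(metrics)  # {'accuracy': 0.75, 'fpr': 0.2, 'fnr': 0.3333333333333333}
```

Reporting FPR and FNR alongside accuracy shows which kind of mistake a model makes. A false negative lets a phishing page through, and a false positive blocks a legitimate one.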