
International Research Journal of Engineering and Technology (IRJET) Volume: 09 Issue: 05 | May 2022

www.irjet.net

e-ISSN: 2395-0056 p-ISSN: 2395-0072

Phishing Detection Using Machine Learning Based on URL’s

Ritika Verma1, Muskaan Singh2, Aarti Goswami3

1,2,3Dept. of CSE Engineering, MIET college, UP, India

Abstract- Phishing is the act of obtaining private information from a user by impersonating a website the user trusts. A variety of strategies have been proposed to combat phishing, but no single silver bullet eliminates the threat. Data mining is an effective method for detecting phishing attacks. This article presents an intelligent approach to identifying phishing attempts. We employ several data mining techniques to classify websites as legitimate or fraudulent, and compare multiple classifiers in order to construct an accurate phishing analysis system. The performance of each technique was assessed using classification accuracy, the area under the receiver operating characteristic (ROC) curve (AUC), and F-measure. The results show that Random Forest performs best among the classification algorithms, with an accuracy of 97.36 percent. The Random Forest algorithm is also fast and can handle a variety of phishing sites.

1. Introduction

A phishing URL is created to obtain a user's personal data, such as usernames and passwords, or to attack the user's system or deliver malicious data to it. Typically, the attacker manipulates the user into clicking a link and then collects sensitive information. Phishers can clone a legitimate site so that the user is tricked into entering confidential details, which the phisher can then misuse for personal gain. This puts users in a very difficult position. According to a survey, phishing attacks are increasing day by day, and considerable effort has therefore gone into minimizing them. A site can be judged to be a phishing site or not either by examining the content of its web pages or by using URL metadata. In our project, we use URL metadata to decide whether a website is a phishing site.
By using metadata in the URL, we no longer need to visit phishing websites or download any of their content, which makes the analysis much safer. We can examine specific parts of the URL, such as the number of slashes, keywords in the URL path, etc. Once these features have been extracted, the feature data for a set of URLs is all that a learning algorithm needs in order to classify them. In our project, we used the Support Vector Machine (SVM) and Random Forest algorithms.

2. Methodology

We used machine learning in our project to address the phishing attack problem. Because a large amount of data about phishing attack patterns is available, the problem is a good fit for a machine learning approach. Our idea is to apply basic ML algorithms to a pre-defined dataset in order to detect phishing in real time. Since we aimed to deploy the model in real time, we decided to create a web extension written in JavaScript. We also deployed the ML model in a chatbot built using Python, so that a person can check for phishing by sending a URL to the bot.

To build a model that can be deployed in real time, we focused on three requirements. First, the dataset and the training procedure must yield high accuracy, so that the end user does not get false results. Second, since we want real-time protection, the algorithm must execute quickly, so that the user does not have to wait long for the result. Third, our dataset must contain both false-positive examples (websites that seem like phishing but are not) and true-positive examples (genuine phishing sites), so that the ML model can be trained well and the end user gets correct results.
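As an illustration of the URL-based features mentioned in the introduction (the number of slashes and keywords in the URL path), the following is a minimal Python sketch that computes such features from the URL string alone, without visiting the site. It is not the extraction code used in this project. The function name extract_url_features, the keyword list SUSPICIOUS_KEYWORDS, and the extra features url_length and num_dots are assumptions made for the example.

```python
from urllib.parse import urlparse

# Hypothetical keyword list, chosen only for illustration; the paper does not
# state which keywords it searches for in the URL path.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "update")


def extract_url_features(url: str) -> dict:
    """Compute simple lexical features from a URL string without fetching it."""
    parsed = urlparse(url)
    path = parsed.path.lower()
    return {
        # Features named in the text: slash count and keywords in the path.
        "num_slashes": url.count("/"),
        "num_path_keywords": sum(kw in path for kw in SUSPICIOUS_KEYWORDS),
        # Assumed extra features, not taken from the paper.
        "url_length": len(url),
        "num_dots": parsed.netloc.count("."),
    }


if __name__ == "__main__":
    # example.com is a reserved documentation domain, used here as a stand-in.
    print(extract_url_features("http://example.com/secure/login/update.php"))
```

Each dictionary can be turned into one numeric row of a training table, so that a set of labelled URLs could then be passed to a classifier such as Random Forest or SVM. Because only string operations are involved, this step adds very little delay, which is in line with the real-time requirement stated above.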

Fig 1: Phishing Detection

Impact Factor value: 7.529

ISO 9001:2008 Certified Journal
