International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 10 Issue: 05 | May 2023
p-ISSN: 2395-0072
www.irjet.net
DETECTION OF PHISHING WEBSITES USING MACHINE LEARNING
Vedavyas J1, Bhupathiraju Deepthi2, Harini Hardageri3, G Veda Samhitha4, H Niveditha5
1Assistant Professor, Department of CSE, Ballari Institute of Technology & Management, Ballari
2,3,4,5Final Year Students, Department of CSE, Ballari Institute of Technology & Management, Ballari
---------------------------------------------------------------------***-------------------------------------------------------------------
Abstract - Phishing is a technique widely employed to deceive unsuspecting people into exposing their personal information by means of fake websites. Phishing website URLs are created with the intention of collecting user data, such as usernames, passwords, and details of online financial activities. Phishers use websites that are semantically and visually identical to the authentic websites they imitate. Phishing strategies evolve rapidly as technology advances, and anti-phishing technologies that recognize phishing can help to contain them. Machine learning is used as a powerful tool to prevent phishing attempts. The five techniques used in this paper are AdaBoost Classifier, XGBoost Classifier, Random Forest Classifier, Gradient Boosting Classifier, and Support Vector Machine (SVM).
Key Words: AdaBoost Classifier, XGBoost Classifier, Random Forest Classifier, Gradient Boosting Classifier, Support Vector Machine (SVM).
1. INTRODUCTION
Phishing is a risky and illegal activity in the online world. Phishing attempts have increased considerably over the years as more people use the online services offered by governmental and private institutions. Con artists increased their efforts once they found phishing to be a profitable business model. Phishers use a range of methods to target the unwary, including voice over IP (VoIP), messaging, spoofed links, and fake websites. It is simple to create a fraudulent website that looks like a genuine one. Even the information on these websites may be identical to that on the genuine versions. The aim of these websites is to gather user information, such as account numbers, login credentials, and debit and credit card passwords. Attackers also pose as high-level security checks and ask users to respond to security questions. Those who respond to such inquiries are more likely to fall victim to phishing scams. Various groups throughout the world have carried out many investigations into stopping phishing attacks. Phishing attacks can be stopped by identifying phishing websites and by educating people to recognize them. Machine learning techniques are among the most effective ways to spot phishing websites.
One of the key techniques underpinning artificial intelligence (AI) is machine learning. It is founded on algorithms designed to learn and recognize patterns from massive amounts of data in order to build a system that can predict anomalous behavior and events. The system adapts over time as it learns typical behavioral tendencies.
2. LITERATURE REVIEW
A literature review summarizes the information currently available on a particular subject, including significant findings and theoretical and methodological contributions. M. Somesha et al. investigated the architecture of a system that comprises feature collection, feature selection, and classification procedures. A list of website URLs is used as input to the feature collector, which extracts the required features from three sources (URL obfuscation, anchor text, and other sources). The extracted features are then fed into the information gain (IG) attribute ranking algorithm. The drawback of the proposed model is that it depends on external services, so its performance is limited when these services are unavailable. Moreover, the suggested model may be unable to identify phishing websites that replace textual content with embedded objects [1].
Erzhou Zhu et al. presented an effective phishing website detection model (OFS-NN) based on neural network technology and an optimal feature selection method. In this model, an index termed the feature validity value (FVV) is constructed to evaluate the effect of each feature on the identification of phishing websites. Based on this newly created index, an algorithm is designed to select the optimal features from phishing websites. The selected algorithm greatly reduces the over-fitting problem of the neural network. The neural network is then trained on these optimal features to create a classifier that can identify phishing URLs. Nevertheless, the problem is that the OFS must continually gather additional features because the number of features vulnerable to phishing attempts keeps growing [2].
Mahdieh Zabihimayvan and Derek Doran discussed Fuzzy Rough Set (FRS) theory, which was developed into a tool that selects the most effective features from a small number of standardized datasets. These features are then passed to a few classifiers in order to detect phishing. A dataset of 14,500 website samples is used to train the models in order to examine FRS feature selection for building a generalizable phishing detection method. Nevertheless, the disadvantage is that the distinctive properties of the method are not stated [3].
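The information gain (IG) ranking of URL-derived features mentioned in the literature review can be illustrated with a minimal sketch. It extracts a few binary lexical features from each URL and ranks them by how much they reduce the entropy of the phishing/legitimate label. The feature names (`has_at_symbol`, `host_is_ip`, `many_subdomains`, `uses_https`), the function names, and the toy URLs and labels are assumptions made for illustration only; they are not the features or the dataset used in the cited works or in this paper.

```python
import math
from collections import Counter
from urllib.parse import urlparse


def url_features(url):
    """Return hypothetical binary lexical features of a URL (illustrative only)."""
    host = urlparse(url).hostname or ""
    host_is_ip = host.replace(".", "").isdigit()
    return {
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        "many_subdomains": (not host_is_ip) and host.count(".") >= 3,
        "uses_https": url.startswith("https://"),
    }


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on values."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_features(urls, labels):
    """Rank feature names by information gain, highest first."""
    rows = [url_features(u) for u in urls]
    gains = {name: information_gain([r[name] for r in rows], labels)
             for name in rows[0]}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


# Toy data invented for illustration: 1 = phishing, 0 = legitimate.
URLS = [
    "http://192.168.10.5/login",
    "http://secure.login.bank.example.com/verify",
    "http://user@example.net/account",
    "https://example.org/",
    "https://www.example.com/about",
    "https://docs.example.edu/help",
]
LABELS = [1, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    for name, gain in rank_features(URLS, LABELS):
        print(f"{name}: {gain:.3f}")
```

On a real dataset, the top-ranked features would be passed to a classifier such as those listed in the abstract. The ranking produced by this toy data carries no evidential weight, since the six URLs were chosen by hand.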
© 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 436