
International Research Journal of Engineering and Technology (IRJET) | e-ISSN: 2395-0056 | p-ISSN: 2395-0072

Volume: 11 Issue: 08 | Aug 2024 | www.irjet.net

AI-Driven Phishing URL Detection

Ankit Das¹, Anushka Behere²

¹Student, Dept. of Computer Science Engineering (Cyber Security), Thakur College of Engineering and Technology, Mumbai, Maharashtra

²Student, Dept. of Computer Science Engineering (Cyber Security), Thakur College of Engineering and Technology, Mumbai, Maharashtra

Abstract - Phishing attempts are becoming more complex and employ deceptive URLs to trick people into revealing sensitive information. This study investigates how well AI and machine learning can identify various phishing attempts. We evaluated the efficacy of three machine learning models (XGBoost, SGD, and AdaBoost) in identifying malicious URLs on a UC Irvine dataset, examining characteristics such as URL length, special characters, HTTPS usage, and the presence of suspicious keywords. According to our results, XGBoost outperformed AdaBoost and SGD, achieving the highest accuracy, 99.95%. This illustrates how sophisticated machine learning techniques can improve the identification of phishing attempts and emphasizes the need for ongoing model adaptation and improvement to combat evolving cyberthreats.

Key Words: Phishing Attacks, Cybersecurity, AI, Machine Learning, XGBoost, SGD, AdaBoost, Feature Extraction, Accuracy, Precision, Recall, Adversarial Attacks.

1. INTRODUCTION

The internet's rapid expansion has completely changed how individuals obtain services and information, providing unprecedented levels of convenience and connectedness. However, increasing digitalization has also given rise to a number of cyberthreats, the most prevalent and harmful of which is phishing. Phishing is a deceptive technique in which cybercriminals impersonate reputable companies to trick victims into divulging private information such as passwords, credit card details, and other sensitive data. Phishing attacks have changed over time, becoming more complex and more difficult to identify. In recent years they have evolved into new, stealthier forms, such as manipulating and altering website URLs. Attackers now use sophisticated techniques to produce spoofed URLs that are almost identical to legitimate ones, fooling even the most watchful internet users. The heuristic-based technique in [1] can identify newly created malicious websites in real time by using signatures of known attack payloads. However, this approach fails to detect novel attacks such as zero-day exploits, and attackers often evade signature detection by changing their patterns and applying obfuscation techniques. Attackers use these obfuscation techniques to evade static detection of malicious URLs. Since obfuscation-based features have been widely used in phishing attacks [2,3], we also study the effect of obfuscation techniques on different types of malicious URLs to determine which attack type is most affected by which kind of obfuscation technique.

The number of attacks has increased significantly as a result of this shift in phishing strategies. According to a comprehensive analysis by SlashNext, which examined billions of link-based URLs, attachments, and natural-language messages across email, mobile, and browser channels, more than 255 million phishing attacks were reported, and the frequency of these attacks has risen by 61% since 2021 [4]. These results highlight a crucial point: legacy security solutions such as firewalls, secure email gateways, and proxy servers can no longer defeat the increasingly complex strategies used by attackers. To further complicate detection, attackers now launch their attacks from trusted platforms, such as personal and professional messaging apps, in addition to compromised servers.

Recognising fraudulent URLs is made harder by the continual evolution of phishing techniques. Cybercriminals constantly modify their tactics to evade detection, making it challenging for existing security measures to keep up. Effective detection requires extracting relevant features from URLs, such as their length, the protocol used (HTTP or HTTPS), and the number of special characters [5,6]. However, attackers further impede detection with techniques such as URL obfuscation and the abuse of trusted websites. Furthermore, real-time detection systems face scalability issues, particularly for organisations with limited infrastructure, because evaluating URLs and their associated content requires significant computational power.

In response to these issues, the incorporation of artificial intelligence (AI) and machine learning (ML) into cybersecurity has emerged as a viable approach to improving phishing detection. AI-driven systems learn from large-scale datasets to find patterns and anomalies indicative of phishing attempts, greatly increasing detection accuracy [7]. Because such systems can keep learning from and adapting to new threats, they help defenders stay one step ahead of cybercriminals.

Impact Factor value: 8.315

ISO 9001:2008 Certified Journal

Page 725
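The abstract describes classifying URLs from lexical characteristics (URL length, special characters, HTTPS usage, suspicious keywords) with XGBoost, SGD, and AdaBoost, and evaluating them by accuracy, precision, and recall. The following is a minimal, standard-library-only sketch of that pipeline, not the authors' implementation: the keyword list, the set of "special characters", the extra IP-address-host obfuscation feature, the helper names (`url_features`, `fit_adaboost`, `predict`, `evaluate`), and the six example URLs are all hypothetical illustrations, not taken from the paper or the UC Irvine dataset. AdaBoost with decision stumps is used only because it is short to write from scratch; XGBoost or an SGD-trained linear model would take the place of `fit_adaboost`.

```python
import ipaddress
import math
from urllib.parse import urlparse

# Hypothetical keyword list; the paper's actual list is not given in this section.
SUSPICIOUS_KEYWORDS = ("login", "verify", "update", "secure", "account", "bank", "confirm")
SPECIAL_CHARS = set("@-_?=&%~+!#")  # assumed definition of "special characters"


def url_features(url):
    """Lexical features named in the abstract, plus one obfuscation indicator (IP-address host)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1
    except ValueError:
        host_is_ip = 0
    lowered = url.lower()
    return [
        len(url),                                          # URL length
        sum(ch in SPECIAL_CHARS for ch in url),            # number of special characters
        1 if parsed.scheme == "https" else 0,              # HTTPS usage
        sum(kw in lowered for kw in SUSPICIOUS_KEYWORDS),  # suspicious keyword hits
        host_is_ip,                                        # obfuscation: raw IP address as host
    ]


def stump(x, feature, threshold, polarity):
    """Decision stump: +polarity if x[feature] >= threshold, else -polarity."""
    return polarity if x[feature] >= threshold else -polarity


def fit_adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost over decision stumps; y holds +1 (phishing) or -1 (legitimate)."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, feature, threshold, polarity)
    for _ in range(n_rounds):
        best = None
        # Exhaustive search for the stump with the lowest weighted error.
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                for p in (1, -1):
                    err = sum(w for x, lab, w in zip(X, y, weights) if stump(x, f, t, p) != lab)
                    if best is None or err < best[0]:
                        best = (err, f, t, p)
        err, f, t, p = best
        if err >= 0.5:
            break  # no stump beats chance on the weighted data
        alpha = 0.5 * math.log((1.0 - max(err, 1e-10)) / max(err, 1e-10))
        ensemble.append((alpha, f, t, p))
        if err == 0:
            break  # training data already separated perfectly
        # Up-weight misclassified samples, down-weight correct ones, then renormalise.
        weights = [w * math.exp(-alpha * lab * stump(x, f, t, p)) for x, lab, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump(x, f, t, p) for a, f, t, p in ensemble)
    return 1 if score >= 0 else -1


def evaluate(y_true, y_pred):
    """Accuracy, precision and recall with +1 (phishing) as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == -1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == -1 for t, p in zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical toy examples, not drawn from the UC Irvine dataset.
urls = [
    ("https://www.example.com/", -1),
    ("https://docs.python.org/3/library/", -1),
    ("https://en.wikipedia.org/wiki/Phishing", -1),
    ("http://192.168.10.5/secure-login/verify-account", 1),
    ("http://example-bank.com.update-info.biz/confirm?id=7", 1),
    ("http://login-secure-paypal.example.net/account_update", 1),
]
X = [url_features(u) for u, _ in urls]
y = [label for _, label in urls]
model = fit_adaboost(X, y)
print(evaluate(y, [predict(model, x) for x in X]))  # prints (1.0, 1.0, 1.0)
```

The perfect score is measured on the six training examples only, so it says nothing about real-world performance; figures comparable to those reported in the abstract require a held-out split or cross-validation over a real labelled dataset. In practice, library implementations such as scikit-learn's `AdaBoostClassifier` and `SGDClassifier` or the xgboost package's `XGBClassifier` would normally replace the hand-written learner.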