International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 12 Issue: 10 | Oct 2025

p-ISSN: 2395-0072

www.irjet.net

PhishGuard AI: A Framework for Multilingual Phishing Detection using XLM-RoBERTa & mT5

J Supreetha Sri¹, Nishika Timilsina², Kundan Jha³, Rahul Trivedi⁴, Saismaran Revinipati⁵, Sannidhya Ghosh⁶

¹ National Model Senior Secondary School, 14-B, Kalluri Nagar, Peelamedu, Coimbatore - 641016, India
² Kathmandu Model Secondary School, Bagbazar, Kathmandu 44600, Nepal
³ Chirec International, 1-55/12, Botanical Garden Rd, Sri Ram Nagar, Kondapur, Telangana 500084, India
⁴ S. N. Kansagra School, University Rd, opposite Akashwani Quarter, Panchayat Nagar, Rajkot, Gujarat 360005, India
⁵ Chirec International, 1-55/12, Botanical Garden Rd, Sri Ram Nagar, Kondapur, Telangana 500084, India
⁶ Phoenix Greens School of Learning, Kokapet, Hyderabad, Telangana 500075, India

----------------------------------------------------------------------------***------------------------------------------------------------------------------

At the same time, new opportunities for cyber threats, especially phishing scams, have been created [1]. In these scams, individuals are tricked into providing personal information, downloading harmful software, or making financial mistakes. In 2024 alone, global financial losses caused by scams were estimated to exceed $1.03 trillion [2]. In addition to the financial impact, risks to identity and personal privacy have increased significantly [3].

Abstract -

Phishing attacks, which frequently trick users with highly sophisticated messages, continue to be a serious threat to digital communication platforms. This study presents PhishGuard AI, a multilingual, AI-powered phishing detection system that combines a user-friendly Chrome Extension with a transformer-based language model. The system architecture consists of a lightweight frontend for real-time user interaction and a FastAPI backend housing optimised models. To train XLM-RoBERTa for binary classification, two publicly accessible datasets, the SMS Spam Collection and a Phishing Email dataset, were selected and preprocessed. mT5 was used to generate multilingual explanations. The training pipeline used Hugging Face's Trainer API and the AdamW optimiser, together with a custom preprocessing workflow comprising data cleaning, tokenization, and padding. Model performance on imbalanced datasets was evaluated using accuracy, precision, recall, F1-score, and ROC-AUC. Ethical measures incorporated into the design included data encryption, local storage protocols, and secure API design. Overall, PhishGuard AI improves user protection and digital literacy by correctly detecting phishing attempts and informing users with concise, relevant explanations.
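The choice of several metrics for imbalanced data can be illustrated with a small worked example. The sketch below is not the authors' evaluation code; it is a minimal, dependency-free illustration in which the function names (`binary_metrics`, `roc_auc`), the 0.5 decision threshold, and the ten toy labels and scores are all assumptions made for demonstration. It shows that a classifier which never flags phishing can still reach high accuracy while its recall is zero.

```python
from typing import Dict, List


def binary_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class (1 = phishing)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one example of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up data: 8 legitimate messages (0) and 2 phishing messages (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.1, 0.3, 0.6, 0.2, 0.1, 0.4, 0.9, 0.45]

# Baseline that labels every message as legitimate.
always_legit = [0] * len(y_true)

# Hypothetical model output thresholded at 0.5 (an assumed cut-off).
thresholded = [1 if s >= 0.5 else 0 for s in scores]

print("baseline:", binary_metrics(y_true, always_legit))
print("model:   ", binary_metrics(y_true, thresholded))
print("ROC-AUC: ", roc_auc(y_true, scores))
```

On this toy data both the baseline and the thresholded model reach the same accuracy, yet only precision, recall and F1 reveal that the baseline detects no phishing messages at all. ROC-AUC additionally summarises ranking quality independently of the chosen threshold.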

Traditionally, protection against such threats has been provided through spam filters and blacklist-based systems. While these tools provide some level of protection, their effectiveness has steadily declined as phishing techniques have evolved [4]. Messages are now crafted to mimic trusted brands and use convincing language to bypass conventional filters. As a result, people are frequently left vulnerable to these attacks [5]. To prevent this, more intelligent and advanced systems that can identify subtle differences in language and behaviour are required. Among the most affected groups are teenagers and seniors [6]. In many cases, teens are exposed to new digital platforms before they are aware enough to detect scams. Seniors, though often cautious, may not be familiar with the latest online communication trends [7]. Hence, the ability to recognize scams is limited for both groups. This highlights the need for tools that not only detect threats but also educate users so that they can develop safer habits [8].

Keywords - Cybersecurity, Machine Learning, Natural Language Processing (NLP), Neural Networks, Phishing Detection, XLM-RoBERTa

To meet this need, an AI-powered tool was developed to detect phishing and scam messages [9]. To increase its educational value, the tool does not simply mark a message as phishing; it also provides insight into the reasoning behind its classification to help users avoid similar threats in the future [10]. This solution works as a user-friendly browser extension, integrating directly with popular email platforms to offer immediate protection and

1. INTRODUCTION

With the integration of digital technologies into our everyday lives, greater convenience and constant connectivity have been enabled.

Impact Factor value: 8.315

ISO 9001:2008 Certified Journal

Page 417