International Research Journal of Engineering and Technology (IRJET) | e-ISSN: 2395-0056 | p-ISSN: 2395-0072
Volume: 9, Issue: 05 | May 2022 | www.irjet.net
Detection of Phishing Websites

Prof. Sumedha Ayachit1, Pradnya Digole2, Parth Chaudhari3, Pratiksha Bhalerao4, Madhuri Aradwad5

1 Professor, Dept. of Information Technology, JSCOE, Pune, Maharashtra, India
2,3,4,5 Dept. of Information Technology, Jayawantrao Sawant College of Engineering, Pune, Maharashtra, India
Abstract – The World Wide Web handles an enormous amount of data, doubling in size every six to ten months. It lets anyone upload and download content and is used in every field; as a result, websites have become a prime target for attackers, who embed malicious content in web pages alongside advertisements and legitimate, useful data. Inexperienced browser users often know nothing about the domain of the page they are visiting and may be tricked into giving out personal information or downloading malicious files. Our goal is to build a browser extension that acts as middleware between users and malicious websites, reducing the risk such sites pose, so that browsing stays safe regardless of which website the user visits: even if a user lands on a criminal website built to steal sensitive information, steps are taken to protect them from harm.

One way to do this is to check the URL against a list of known malicious websites from a reliable source. The drawback of this method is that the list is never complete, since it grows every day, and searching such a large list increases the system's response time, which frustrates the user; moreover, harmful content can never be fully catalogued in advance, as new content keeps appearing. We therefore use a web-crawling approach in which URLs are classified by YARA rules, continuously sorting newly seen content into specific categories so that appropriate action can be taken. The system takes web traffic, web content, and the Uniform Resource Locator (URL) as input features and classifies websites as phishing or non-phishing based on them. For years, companies have used powerful computers to filter supermarket scanner data and analyze market research reports, and continuing improvements in processing power, disk storage, and statistical software keep increasing the accuracy of such analysis while reducing its cost. Discovering useful information on the World Wide Web remains a major challenge for researchers; web mining is the application of data mining techniques to automatically find and extract such information from the web. Based on a comparison of different strategies for keeping users away from criminal websites, YARA rules perform considerably better. Python, an open-source language with a wide variety of libraries, easy-to-understand syntax, and abundant resources, proves to be well suited for implementing the machine learning components.

Key Words: YARA rules, Malicious website, Phishing URL, Web Crawling
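As an illustration of the rule-based classification described above, a minimal YARA rule that flags URL strings with common phishing traits might look as follows. The rule name, strings, and condition here are hypothetical examples for exposition, not rules taken from this work:

```yara
rule Suspicious_Phishing_URL
{
    meta:
        description = "Toy example: flags URL strings with common phishing traits"

    strings:
        // Raw IP address used in place of a domain name
        $ip_host = /https?:\/\/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}/
        // '@' in a URL can hide the real destination host
        $at_sign = "@"
        $login   = "login" nocase
        $verify  = "verify" nocase

    condition:
        $ip_host or ($at_sign and 1 of ($login, $verify))
}
```

Rules of this kind can be evaluated over crawled URLs and page content (for example via the yara-python bindings), and a match routes the site into the phishing category.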
1. INTRODUCTION

Malicious web pages are those containing content that attackers can use to exploit end users. This includes pages hosting phishing URLs that steal sensitive information, spam URLs, malicious JavaScript, adware, and more. Detecting such threats is very difficult today because new attack strategies are continuously being developed, and not all users are aware of the different kinds of attacks that can be used against them. If a user lands on a harmful web page without realizing it, this tool will help keep them safe despite their lack of security knowledge.

2. Background

Our work relates to research in heuristic web content detection, machine learning based web content detection, and deep learning document classification. One body of web detection work focuses solely on using URL strings to detect malicious web content. [1] proposes crawling web pages to detect malicious URLs, focusing on manually defined features chosen to maximize detection accuracy. [2] also detects malicious web content from URLs, but whereas [1] relies on manual feature engineering, [2] shows that learning features from raw data with a deep neural network achieves better performance. [3]
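The URL-string detection work discussed above relies on lexical features extracted from the URL itself. A minimal sketch of such feature extraction, with a toy scoring heuristic, is shown below; the specific features, thresholds, and score cutoff are illustrative assumptions, not the feature set used in the cited work:

```python
import re
from urllib.parse import urlparse

def extract_url_features(url: str) -> dict:
    """Extract simple lexical features commonly used in URL-based phishing detection."""
    parsed = urlparse(url)
    host = parsed.netloc.split(":")[0]  # drop any port
    return {
        "url_length": len(url),
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "has_at_symbol": "@" in url,
        # Raw IP address as the host is a classic phishing indicator
        "has_ip_host": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
        "uses_https": parsed.scheme == "https",
    }

def looks_suspicious(url: str) -> bool:
    """Toy heuristic: flag URLs that exhibit several phishing-like lexical traits."""
    f = extract_url_features(url)
    score = (
        f["has_at_symbol"]
        + f["has_ip_host"]
        + (f["num_dots"] > 3)
        + (f["url_length"] > 75)
        + (not f["uses_https"])
    )
    return score >= 2
```

In a learning-based system, vectors like the one returned by `extract_url_features` would be fed to a trained classifier instead of a hand-tuned score.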
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 2362