
Detecting spam mail using machine learning algorithm


International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 09 Issue: 05 | May 2022

p-ISSN: 2395-0072

www.irjet.net

Muddala Bhavani¹, Machetti Bala Santoshi², Ummidi Anu Pravallika³, Nuni Madhav⁴

¹²³⁴Final Year B.Tech, CSE, Sanketika Vidya Parishad Engineering College, Visakhapatnam, A.P, India

Guided by: Mrs. Dr. K.N.S. Lakshmi, Professor, SVPEC, Visakhapatnam, A.P, India

Abstract - Spam emails are defined as unsolicited and unwanted commercial or deceptive emails received by a specific person or company. Spam can be identified through natural language processing and machine learning methodologies. Machine learning (ML) methods classify emails as either valid messages or unwanted messages by applying ML classifiers. The proposed work is useful for identifying distinguishing features in the content of documents. A large body of work has been done in the area of spam filtering, but it is restricted to certain domains. Existing research on spam email detection either applies natural language processing methodologies with a single machine learning algorithm or applies one natural language processing technique to multiple machine learning algorithms. In this project, a model-based approach is developed to review the machine learning methodologies used for automatic spam detection.

Key Words: NLP, Feature Selection, Spam Detection.

1. INTRODUCTION

E-mails are used by everyone, but they also bring unnecessary, undesirable bulk mails, which are referred to as spam mails [15]. Anyone with access to the Internet can receive spam on their devices. Email is one of the most effective and commonly used means of communication, and its popularity lies in its cost-effective and fast nature. Unfortunately, email is becoming vulnerable to spam. Spam emails are uninvited emails sent by unwanted users, also referred to as spammers [1], with the motive of making money. Email users spend much of their valuable time sorting these spam mails [2]. Multiple copies of the same message are sent again and again, which irritates the receiving user. Spam emails are not only intrusive to the user's inbox, they also produce a large amount of unwanted data and therefore affect the network's capacity and usage. In this paper, a Spam Mail Detection (SMD) system is proposed which classifies email data into spam and ham emails. Spam filtering focuses on three main levels: the e-mail address, the subject, and the content of the message [4]. All mails have a standard structure, i.e., the subject of the e-mail and the body of the e-mail, so a typical spam mail can be classified by filtering its content.

Spam mail detection relies on the assumption that the content of spam mail differs from that of legitimate (ham) mail, for example words associated with the marketing of a product, endorsement of services, dating-related content, etc. Spam email detection can be broadly categorized into two approaches: the knowledge engineering approach and the machine learning approach [5]. Knowledge engineering is a network-based approach in which the IP (Internet Protocol) address and network address, along with a set of defined rules, are considered for email classification. This approach has shown promising results, but it is very time-consuming, and maintaining and updating the rules is not convenient for all users. The machine learning approach, on the other hand, does not require a set of rules and is more efficient than the knowledge engineering approach [6]. The classification algorithm classifies the email based on its content and other attributes. For most classification problems, feature extraction and selection are very important, because features play a central role in the classification process. In this paper, a correlation-based feature selection (CFS) [7] method is used for feature extraction. The CFS approach extracts the best features from the pool of features for efficient classification. The proposed spam mail detection system is inspired by the effectiveness of the machine learning approach.

In the spam mail detection system, email data is first collected. The collected email data is raw and unstructured. To reduce computation and obtain accurate results, the email data must be pre-processed: stop words are removed, stemming is applied, and word tokenization is performed to extract valuable information. Then CFS is performed to obtain the best features from the pool of features. The pre-processing step reduces the dimensionality of the data, and features in the form of a bag of words are then extracted. For classification, a bagged hybrid approach (a combination of the Naïve Bayes classifier and J48) is used to make the classification stronger and more accurate.

Spam emails can fill up inboxes and storage capacity and slow down Internet speed to a great extent. These emails can corrupt a user's system by importing viruses into it, steal useful data, and scam gullible people. The identification of spam emails

© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1365
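The pre-processing stage described in the introduction (word tokenization, stop-word removal, stemming, and extraction of bag-of-words features) can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the crude suffix-stripping stemmer (a simplified stand-in for a real stemmer such as Porter's), the sample emails, and the function names `preprocess`, `build_vocabulary`, and `bag_of_words` are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the paper does not publish the list it used.
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "are", "to", "of", "for",
              "in", "on", "you", "your", "this", "it", "we", "our", "now"}

# Crude suffix stripping, a simplified stand-in for a real stemmer (e.g. Porter).
SUFFIXES = ("ing", "ed", "s")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Tokenize, drop stop words, then stem the remaining tokens."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def build_vocabulary(docs: list[list[str]]) -> list[str]:
    """Sorted list of every distinct term seen in the pre-processed documents."""
    return sorted({term for doc in docs for term in doc})


def bag_of_words(tokens: list[str], vocab: list[str]) -> list[int]:
    """Count how often each vocabulary term occurs in one document."""
    counts = Counter(tokens)
    return [counts[term] for term in vocab]


if __name__ == "__main__":
    emails = ["Win a FREE prize now, claim your prizes!",
              "Meeting notes for the project are attached."]
    docs = [preprocess(e) for e in emails]
    vocab = build_vocabulary(docs)
    for doc in docs:
        print(bag_of_words(doc, vocab))
```

For the two sample emails the vocabulary is attach, claim, free, meet, note, prize, project, win, and the count vectors are [0, 1, 1, 0, 0, 2, 0, 1] and [1, 0, 0, 1, 1, 0, 1, 0]. Stemming maps "prize" and "prizes" to one term, which is how pre-processing reduces the dimensionality of the feature space.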

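The correlation-based feature selection (CFS) step [7] rewards feature subsets whose members correlate with the class but not with each other. A minimal sketch, assuming Hall's merit heuristic, Merit(S) = k·mean|r_cf| / sqrt(k + k(k−1)·mean|r_ff|) for a subset of k features, with absolute Pearson correlation as the correlation measure (Hall's original formulation uses symmetrical uncertainty for discrete features) and a greedy forward search. The paper does not state its correlation measure or search strategy; the function names, the column-per-feature layout (the transpose of the bag-of-words document vectors), and the `min_gain` stopping threshold are hypothetical.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def merit(subset: list[int], columns: list[list[float]], labels: list[int]) -> float:
    """CFS merit: k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|)."""
    k = len(subset)
    # Mean feature-class correlation (relevance).
    r_cf = sum(abs(pearson(columns[f], labels)) for f in subset) / k
    # Mean feature-feature correlation (redundancy); zero for a single feature.
    pairs = list(combinations(subset, 2))
    r_ff = (sum(abs(pearson(columns[a], columns[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(columns: list[list[float]], labels: list[int],
                min_gain: float = 1e-9) -> list[int]:
    """Greedy forward search: repeatedly add the feature that most improves merit."""
    selected: list[int] = []
    remaining = list(range(len(columns)))
    best = 0.0
    while remaining:
        candidate = max(remaining, key=lambda f: merit(selected + [f], columns, labels))
        score = merit(selected + [candidate], columns, labels)
        if score - best < min_gain:  # stop when no candidate improves the subset
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = score
    return selected
```

With a strongly class-correlated feature, an exact duplicate of it, and a weakly correlated feature, the search keeps only the first: the duplicate adds no merit because its redundancy term cancels its relevance, and the weak feature lowers the mean relevance.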

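For classification the paper uses a bagged hybrid of Naïve Bayes and J48 but does not specify how the two learners are combined. The following is a minimal sketch under stated assumptions: each bootstrap sample trains one multinomial Naïve Bayes model (Laplace smoothing) and one decision tree, and all models vote by simple majority. The tree is a small ID3-style tree (word present/absent splits chosen by information gain, fixed depth limit); it is a simplified stand-in for J48, which is Weka's implementation of C4.5 and additionally uses gain ratio and pruning. All function names and the `n_bags`, `max_depth`, and `seed` parameters are hypothetical.

```python
import math
import random
from collections import Counter


def train_nb(X, y, alpha=1.0):
    """Multinomial Naive Bayes: per-class log prior and smoothed log likelihoods."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        totals = [sum(col) for col in zip(*rows)]  # word counts per class
        denom = sum(totals) + alpha * len(X[0])
        model[c] = (math.log(len(rows) / len(X)),
                    [math.log((t + alpha) / denom) for t in totals])
    return model


def predict_nb(model, x):
    """Pick the class with the highest log posterior (up to a constant)."""
    def score(c):
        prior, loglik = model[c]
        return prior + sum(n * w for n, w in zip(x, loglik))
    return max(model, key=score)


def entropy(labels):
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def train_tree(X, y, depth=0, max_depth=3):
    """ID3-style tree on word presence; leaves are labels, nodes are tuples."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == max_depth or len(set(y)) == 1:
        return majority
    best_gain, best_f = 0.0, None
    for f in range(len(X[0])):
        absent = [lab for x, lab in zip(X, y) if x[f] == 0]
        present = [lab for x, lab in zip(X, y) if x[f] > 0]
        if not absent or not present:
            continue
        remainder = (len(absent) * entropy(absent)
                     + len(present) * entropy(present)) / len(y)
        gain = entropy(y) - remainder  # information gain of splitting on word f
        if gain > best_gain:
            best_gain, best_f = gain, f
    if best_f is None:
        return majority
    lo = [i for i, x in enumerate(X) if x[best_f] == 0]
    hi = [i for i, x in enumerate(X) if x[best_f] > 0]
    return (best_f,
            train_tree([X[i] for i in lo], [y[i] for i in lo], depth + 1, max_depth),
            train_tree([X[i] for i in hi], [y[i] for i in hi], depth + 1, max_depth))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, absent_branch, present_branch = node
        node = present_branch if x[f] > 0 else absent_branch
    return node


def train_bagged_hybrid(X, y, n_bags=5, seed=0):
    """Train one NB model and one tree on each bootstrap resample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_bags):
        idx = [rng.randrange(len(X)) for _ in X]  # sample with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        models.append(("nb", train_nb(Xb, yb)))
        models.append(("tree", train_tree(Xb, yb)))
    return models


def predict_bagged_hybrid(models, x):
    """Majority vote over all Naive Bayes and tree members."""
    votes = Counter(predict_nb(m, x) if kind == "nb" else predict_tree(m, x)
                    for kind, m in models)
    return votes.most_common(1)[0][0]
```

Bagging trains each member on a bootstrap resample, which reduces the variance of the unstable tree learner, while mixing in Naïve Bayes adds a model with a different inductive bias. This is one plausible reading of "bagged hybrid", not a confirmed description of the authors' system.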