Email Spam Detection Using Machine Learning


International Research Journal of Engineering and Technology (IRJET) Volume: 09 Issue: 11 | Nov 2022

www.irjet.net

e-ISSN: 2395-0056 p-ISSN: 2395-0072

Prof. Prachi Nilekar, Tamboli Abdul Salam, Manish Kumar Gupta, Krishna Sharma, Safwan Attar

ALARD COLLEGE OF ENGINEERING & MANAGEMENT (ALARD Knowledge Park, Survey No. 50, Marunje, Near Rajiv Gandhi IT Park, Hinjewadi, Pune-411057)

Approved by AICTE. Recognized by DTE. NAAC Accredited. Affiliated to SPPU (Pune University).

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract – Email spam has become a major problem, and with the rapid growth in the number of internet users, spam emails are also increasing. People use them for phishing, fraud, and other illegal and unethical practices, such as sending malicious links that can harm or break into the recipient's system. It is very easy for spammers to create a fake profile and email account and pose as a real person in their spam emails, and they target people who are not aware of these frauds. There is therefore a need to identify fraudulent spam emails. This project identifies such spam using machine learning techniques. The paper discusses several machine learning algorithms, applies them to our dataset, and selects the best algorithm for email spam detection based on accuracy and precision.

The training dataset is used to fit the model, and the test dataset is used to evaluate the model.

Fig -1: Train and Test Model

Machine learning algorithms are used to classify text into two categories, spam and ham. The algorithm will predict the score more accurately. The objective of developing this model is to detect and score words faster and more accurately.
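The abstract states that the algorithm is chosen based on accuracy and precision. As a minimal sketch of how these two metrics could be computed for spam/ham predictions, the following uses plain Python; the function names and the example labels are hypothetical and are not taken from the paper's experiments.

```python
def accuracy(y_true, y_pred):
    """Fraction of messages whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def precision(y_true, y_pred, positive="spam"):
    """Fraction of messages predicted as `positive` that really are `positive`."""
    predicted_pos = sum(p == positive for p in y_pred)
    true_pos = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    return true_pos / predicted_pos if predicted_pos else 0.0


# Hypothetical labels, invented for illustration only.
y_true = ["spam", "spam", "ham", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "ham", "spam"]

print(accuracy(y_true, y_pred))   # 4 of 6 predictions are correct
print(precision(y_true, y_pred))  # 2 of 3 predicted spam messages are spam
```

Precision matters for spam filtering because a low value means legitimate (ham) messages are being flagged as spam.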

Key Words: (Machine Learning, Naive Bayes, Support Vector Machine, DTS, Random Forest, Bagging, Boosting)

1. INTRODUCTION

2. MACHINE LEARNING CLASSIFICATION ALGORITHMS

Machine learning approaches are more efficient: they use a set of training data consisting of emails that have already been classified. Many machine learning algorithms can be used for email filtering, including Naive Bayes, support vector machines, neural networks, k-nearest neighbors, and random forests.

Naive Bayes: Naive Bayes is a classification algorithm suitable for both binary and multiclass classification. Naive Bayes performs better for categorical input variables than for numerical variables. It is useful for making predictions and forecasting data based on historical results.
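To make the idea concrete, here is a minimal sketch of a Naive Bayes spam classifier in plain Python. It assumes a word-count (multinomial) variant with add-one smoothing, which the paper does not specify; the function names and the toy messages are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(messages, labels):
    """Count messages per class and word frequencies per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(messages, labels):
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def predict_naive_bayes(model, text):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocabulary = model
    total_messages = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_messages)      # log prior of the class
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue                              # skip words unseen in training
            # log likelihood of the word with add-one (Laplace) smoothing
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, invented for illustration only.
train_messages = [
    "win a free prize now",
    "free money claim your prize",
    "meeting agenda for monday",
    "lunch on monday with the team",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = train_naive_bayes(train_messages, train_labels)
print(predict_naive_bayes(model, "claim your free prize"))        # spam
print(predict_naive_bayes(model, "agenda for the team meeting"))  # ham
```

Working with log probabilities avoids numerical underflow when many small word probabilities are multiplied together.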

Why Machine Learning: Machine learning allows the user to feed a computer algorithm an immense amount of data and have the computer analyze and make data-driven recommendations and decisions based on only the input data.

P(B) is Marginal Probability: Probability of Evidence.

Support Vector Machine: SVMs are used in intrusion detection, face detection, email classification, gene classification, web pages, etc. They can handle classification and regression on linear and non-linear data.
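As an illustration of the linear case only, the sketch below trains a linear SVM by subgradient descent on the regularized hinge loss. This is one simple way to fit such a model and is not necessarily what the authors used; the function names, hyperparameters, and the 2-D toy points are all assumptions made for the example.

```python
def train_linear_svm(points, labels, epochs=100, learning_rate=0.1, reg=0.01):
    """Fit weights w and bias b by subgradient descent on the hinge loss.

    Labels must be +1 or -1.
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is misclassified or inside the margin:
                # shrink w slightly and move it towards y * x.
                w = [wi - learning_rate * (reg * wi - y * xi) for wi, xi in zip(w, x)]
                b += learning_rate * y
            else:
                # Point is safely classified: apply only the regularization step.
                w = [wi - learning_rate * reg * wi for wi in w]
    return w, b


def predict_svm(w, b, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable feature vectors (+1 = spam, -1 = ham).
points = [(2.0, 2.0), (3.0, 3.0), (-2.0, -2.0), (-3.0, -3.0)]
labels = [1, 1, -1, -1]

w, b = train_linear_svm(points, labels)
print([predict_svm(w, b, x) for x in points])
```

Non-linear data would additionally need a kernel or a feature transformation, which this sketch leaves out.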

What are Train and Test Datasets: The main difference between training data and test data is that training data is the subset of the original data used to train a machine learning model, whereas test data is used to check the accuracy of the model. The training dataset is usually larger than the test dataset. Training and test datasets are two key concepts in machine learning.
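A split of this kind can be sketched in a few lines of plain Python. The 75/25 ratio, the fixed seed, and the function name are assumptions for illustration; the paper does not state the split it used.

```python
import random


def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the samples and split it into train and test lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed keeps the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labelled emails: (text, label) pairs invented for illustration.
emails = [(f"message {i}", "spam" if i % 3 == 0 else "ham") for i in range(20)]

train_set, test_set = train_test_split(emails, test_fraction=0.25)
print(len(train_set), len(test_set))  # 15 5
```

Shuffling before splitting keeps the test set from being dominated by whatever order the emails were collected in.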


Impact Factor value: 7.529


P(A) is Prior Probability: The probability of a hypothesis before seeing the evidence.
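For reference, the prior P(A) and the marginal P(B) defined in this paper are terms of Bayes' theorem, on which Naive Bayes is based. In its standard form, with P(A | B) as the posterior and P(B | A) as the likelihood (standard names that the surviving text does not define), the theorem reads:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```

For spam detection, A can be read as "the email is spam" and B as the observed words of the email.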

What is a Dataset: A dataset is a collection of data or related information composed of separate elements. A dataset for email spam detection contains both spam and non-spam messages.
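One simple way to hold such a dataset in memory is a list of (text, label) pairs. The messages below are hypothetical and are not drawn from the dataset used in this paper.

```python
from collections import Counter

# Hypothetical labelled messages, invented for illustration only.
dataset = [
    ("Congratulations, you won a free prize", "spam"),
    ("Are we still meeting at noon?", "ham"),
    ("Claim your reward by clicking this link", "spam"),
    ("Please find the project report attached", "ham"),
    ("See you at the seminar tomorrow", "ham"),
]

# Separate the message texts from their labels.
texts = [text for text, _ in dataset]
labels = [label for _, label in dataset]

# Count how many messages fall into each class.
class_balance = Counter(labels)
print(class_balance["spam"], class_balance["ham"])  # 2 3
```

Checking the class balance early is useful, since a heavily skewed dataset can make accuracy alone misleading.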

© 2022, IRJET



ISO 9001:2008 Certified Journal


Page 735

