Overview of Anti-spam filtering Techniques


International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056 | p-ISSN: 2395-0072
Volume: 04 Issue: 01 | Jan-2017 | www.irjet.net

Sushma L. Wakchaure, Shailaja D. Pawar, Ganesh D. Ghuge, Bipin B. Shinde
Amrutvahini Polytechnic, Sangamner
Professor, Dept. of Computer Engineering, Amrutvahini Polytechnic College, Maharashtra, India.

Abstract - Electronic mail (E-mail) is an essential communication tool that has been greatly abused by spammers to disseminate unwanted information (messages) and spread malicious content to Internet users. Current Internet technologies have further accelerated the distribution of spam. Spam-reduction techniques have developed rapidly over the last few years, as spam volumes have increased. We believe that no one anti-spam solution is the “right” answer, and that the best approach is a multifaceted one, combining various forms of filtering with infrastructure changes, financial changes, legal recourse, and more, to provide a stronger barrier to spam than can be achieved with one solution alone. SpamGuru addresses the part of this multifaceted approach that can be handled by technology on the recipient’s side, using plug-in tokenizers and parsers, plug-in classification modules, and machine-learning techniques to achieve high hit rates and low false-positive rates. Effective controls need to be deployed to counter the ever-growing spam problem. Machine learning provides better protective mechanisms that are able to control spam. This paper summarizes the most common techniques used for anti-spam filtering by analyzing the e-mail content, and also looks into machine learning algorithms such as Naïve Bayesian, support vector machine and neural network that have been adopted to detect and control spam. Each machine learning algorithm has its own strengths and limitations; as such, appropriate preprocessing needs to be carefully considered to increase the effectiveness of any given algorithm.

Key Words: Anti-spam filters, text categorization, electronic mail (E-mail), Spam Bayes, training email filters, content filtering, false negatives, user-level spam filtering, machine learning

1. INTRODUCTION

Spam-reduction techniques have developed rapidly over the last few years, as spam volumes have increased. We believe that the spam problem requires a multi-faceted solution that combines a broad array of filtering techniques with various infrastructural changes, changes in financial incentives for spammers, legal approaches, and more [1]. This paper describes one part of a more comprehensive anti-spam research effort undertaken by us and our colleagues: SpamGuru, a collaborative anti-spam filter that combines several learning, tokenization, and user interface elements to provide enterprise-wide spam protection with high spam detection rates and low false-positive rates.

E-mail, or electronic mail, is an electronic messaging system that transmits messages across computer networks. Users simply type in the message, add the recipient’s e-mail address(es) and click the send button. Users can access any free e-mail service such as Yahoo mail, Gmail, Hotmail, or register with ISPs (Internet Service Providers) in order to obtain an e-mail account at no cost except for the Internet connection charges. Besides that, e-mail can also be received almost immediately by the recipient once it is sent out. E-mail allows users to communicate with each other at a low cost and provides an efficient mail delivery system. The reliability, user-friendliness and availability of a wide range of free e-mail services make it the most popular and preferred communication tool. As such, businesses and individual users alike rely heavily on this communication tool to share information and knowledge. Businesses can drastically cut down on communication costs since e-mail is extremely fast and inexpensive; furthermore, it is a very powerful marketing tool. Businesses can capitalize on this technology since it is a very popular advertising tool.

However, the simplicity of sending e-mail and its almost nonexistent cost pose another problem: spam. Spam refers to bulk unsolicited commercial e-mail sent indiscriminately to users. Table 1 enumerates some of them. Based on the Ferris Research (2009), spam can be categorized into the following:

1. Health, such as fake pharmaceuticals;
2. Promotional products, such as fake fashion items (for example, watches);
3. Adult content, such as pornography and prostitution;
4. Financial and refinancing, such as stock kiting, tax solutions, loan packages;
5. Phishing and other fraud, such as “Nigerian 419” and “Spanish Prisoner”;
6. Malware and viruses, such as Trojan horses attempting to infect your PC with malware;
7. Education, such as online diplomas;
8. Marketing, such as direct marketing material, sexual enhancement products;
9. Political, such as US presidential votes.

*Corresponding author. E-mail: alaa_taqa@um.edu.my.

© 2017, IRJET | Impact Factor value: 5.181

2. E-MAIL STRUCTURE

E-mail messages are divided into two parts: header information and the message body. Header information, or the header field, consists of information about the message’s
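
As a minimal sketch of the two-part structure described above, the following Python fragment uses the standard-library email package to split a raw message into its header fields and its body. The sample message, the addresses, and the helper name split_message are illustrative assumptions and do not come from this paper; a content-based filter would typically examine both parts.

```python
from email import message_from_string
from typing import Dict, Tuple

# Hypothetical sample message: header fields, a blank line, then the body.
RAW_MESSAGE = """\
From: alice@example.com
To: bob@example.com
Subject: Quarterly report

Hi Bob,
the report is attached.
"""


def split_message(raw: str) -> Tuple[Dict[str, str], str]:
    """Return (headers, body) for a simple, non-multipart e-mail."""
    msg = message_from_string(raw)
    # msg.items() yields (field name, field value) pairs from the header.
    headers = {name: value for name, value in msg.items()}
    # For a non-multipart message, get_payload() returns the body as a string.
    body = msg.get_payload()
    return headers, body


headers, body = split_message(RAW_MESSAGE)
print(sorted(headers))     # names of the header fields that were found
print(body.splitlines())   # lines of the message body
```

This sketch handles only plain, single-part messages. Multipart messages (for example, those with attachments) would need to walk the individual parts instead of reading a single payload.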

ISO 9001:2008 Certified Journal | Page 429

