A Paper on Web Data Segmentation for Terrorism Detection using Named Entity Recognition Technique


International Research Journal of Engineering and Technology (IRJET) | Volume: 04 Issue: 01 | Jan 2017
e-ISSN: 2395-0056 | p-ISSN: 2395-0072 | www.irjet.net

Ms. Pooja S. Kade¹, Prof. N.M. Dhande²
¹Student of Computer Science & Engineering, RTMNU University, A.C.E, Wardha, Maharashtra, India.
²HOD of Computer Science & Engineering, RTMNU University, A.C.E, Wardha, Maharashtra, India.


Abstract - Terrorism has grown day by day, and its roots run quite deep in some parts of the world. With increasing terrorist activity, it has become very important to control terrorism and stop its spread in time. The internet has been identified as a major channel for spreading terrorism through speeches, images and videos. Terrorist organizations use the internet to brainwash individuals, especially young people, and to promote terrorist activities through provocative web pages that inspire vulnerable people and college students to join them. We therefore propose an efficient web data mining and segmentation system that detects such web properties and automatically flags them for human review. Because websites built on different platforms have different data structures and are difficult for a single algorithm to read, we use the DOM tree concept to extract the web data and SIFT (Scale-Invariant Feature Transform) features to extract local image features from the organized web data. We also use the K-means algorithm for segmentation and KNN for classification. In this way we can judge whether a web page may be promoting terrorism. The system can be useful in the anti-terrorism sector and to search engines that classify web pages into different categories.

Key Words: Data mining, Web mining, Patterns, DOM tree technique, Object recognition, Segmentation.

1. INTRODUCTION
Terrorism has grown its roots quite deep in certain parts of the world. With increasing terrorist activity, it has become important to curb terrorism and stop its spread in time. The internet has been identified as a major source for spreading terrorism through speeches and images. Terrorist organizations use the internet to brainwash individuals and to promote terrorist activities through provocative web pages that inspire vulnerable people to join them. We therefore propose an efficient web data mining system to detect such web properties and flag them automatically for human review.

Data mining is a technique used to mine patterns of useful data from large data sets and make the most of the obtained results. Data mining and web mining are sometimes used together for efficient system development. Web mining also includes text mining methodologies that allow us to scan and extract useful content from unstructured data. Text mining allows us to detect patterns, keywords and relevant information in unstructured text. Both web mining and data mining systems are widely used for mining text. Data mining algorithms are efficient at manipulating structured data sets, while web mining algorithms are widely used to scan and mine unstructured web pages and text data available on the internet. Websites created on various platforms have different data structures and are difficult for a single algorithm to read. Since it is not feasible to build a separate algorithm for each web technology, we need efficient web mining algorithms to mine this huge amount of web data. Web pages are written in HTML (Hypertext Markup Language) in various arrangements, with images, videos, etc. intermixed on a single page. We therefore propose to use the DOM tree concept to extract text data from web pages, and carefully designed web mining algorithms to mine the textual information on those pages and detect its relevance to terrorism. In this way we can judge web pages and check whether they may be promoting terrorism. The system can be useful in the anti-terrorism sector and even to search engines that classify web pages into this category. A page's relevance to the field helps classify and sort it appropriately and flag it for human review.
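The DOM tree text extraction and relevance check described in the introduction can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it builds a simplified DOM tree from the events of Python's standard `html.parser.HTMLParser`, collects visible text with a depth-first walk that skips `script`, `style` and `noscript`, and scores relevance as the fraction of tokens found in a keyword set. The names `Node`, `TreeBuilder`, `visible_text`, `relevance_score` and any keyword set passed in are hypothetical; the paper does not specify its lexicon or scoring rule.

```python
from html.parser import HTMLParser

# Elements whose content is not visible page text.
SKIP_TAGS = {"script", "style", "noscript"}
# Void elements have no closing tag, so they never become the current node.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Node:
    """One element of a simplified DOM tree (hypothetical helper)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # child Nodes and text strings, in document order


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM tree from the standard library's parser events."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:
            self.current = node

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; ignore stray end tags.
        node = self.current
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None:
            self.current = node.parent

    def handle_data(self, data):
        self.current.children.append(data)


def visible_text(node):
    """Depth-first walk collecting non-empty text outside SKIP_TAGS."""
    if node.tag in SKIP_TAGS:
        return []
    parts = []
    for child in node.children:
        if isinstance(child, str):
            if child.strip():
                parts.append(child.strip())
        else:
            parts.extend(visible_text(child))
    return parts


def relevance_score(text, keywords):
    """Fraction of word tokens that appear in the (hypothetical) keyword set."""
    tokens = [t.strip(".,!?;:\"'()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(t in keywords for t in tokens) / len(tokens)


if __name__ == "__main__":
    page = "<html><body><h1>News</h1><script>x=1</script><p>Join the attack.</p></body></html>"
    builder = TreeBuilder()
    builder.feed(page)
    builder.close()
    text = " ".join(visible_text(builder.root))
    print(text, relevance_score(text, {"attack"}))
```

A raw keyword ratio is used here only because it is easy to follow; a page scoring above some threshold would be the kind of page the paper proposes to flag for human review, and a practical system would replace the ratio with a trained classifier.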
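The abstract names K-means for segmentation and KNN for classification without giving details. The sketch below shows both textbook algorithms in plain Python on small numeric feature vectors. It is an illustration under assumptions, not the authors' implementation: the feature vectors, the labels and the parameter values (`k`, `iters`, `seed`) are hypothetical, and the paper does not state which features it clusters or classifies.

```python
import math
import random
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means on a list of equal-length tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(iters):
        # Assignment step: every point joins its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(centroids[c])  # an empty cluster keeps its centroid
        if updated == centroids:
            break  # converged: assignments can no longer change
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


def knn_predict(train, query, k=3):
    """Majority label among the k training examples nearest to query.

    train is a list of (feature_tuple, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

With two well-separated groups of vectors, `kmeans(points, 2)` places each group in its own cluster, and `knn_predict` labels a new vector by the majority of its nearest labelled neighbours. An odd `k` avoids ties when there are two classes.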

© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 902

2. LITERATURE SURVEY

1. TwiNER: Named Entity Recognition in Targeted Twitter Stream [1]
In this paper, a named entity recognition (NER) system for targeted Twitter streams, called TwiNER, is proposed to address the challenge of named entity recognition in such streams.
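To make the NER task concrete, the sketch below extracts named-entity candidates from a short text such as a tweet. It is not the TwiNER algorithm. It is a much simpler dictionary-based (gazetteer) baseline swapped in for illustration, which uses greedy longest-match lookup of token n-grams. The function name `gazetteer_ner`, the `max_len` parameter and the example gazetteer are hypothetical.

```python
def gazetteer_ner(text, gazetteer, max_len=4):
    """Greedy longest-match lookup of token n-grams in an entity dictionary.

    gazetteer maps a lowercased phrase to an entity type (hypothetical data).
    Returns a list of (phrase, type) pairs in text order.
    """
    # Strip common punctuation and tweet markers such as '#' and '@'.
    tokens = [t.strip(".,!?;:\"'()#@") for t in text.split()]
    tokens = [t for t in tokens if t]
    entities = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram first so "New York" wins over "York".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase.lower() in gazetteer:
                entities.append((phrase, gazetteer[phrase.lower()]))
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return entities


if __name__ == "__main__":
    gaz = {"new york": "LOCATION", "york": "LOCATION", "united nations": "ORGANIZATION"}
    print(gazetteer_ner("Protest near the United Nations in New York today #news", gaz))
```

A fixed dictionary cannot recognize entities it has never seen, which is exactly the weakness that systems designed for noisy Twitter streams, such as TwiNER, set out to address.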

