International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 04 Issue: 03 | Mar-2017
p-ISSN: 2395-0072
www.irjet.net
PREDICTION OF USER RARE SEQUENTIAL TOPIC PATTERNS OF INTERNET USERS
M. SANGEEGTHA1, D. SWATHI2, J. PRIYANKA3, SHALINI YUVARAJ4
1 Assistant Professor, Department of Computer Science and Engineering, Panimalar Engineering College, Tamilnadu, India
2, 3, 4 UG Students, Department of Computer Science and Engineering, Panimalar Engineering College, Tamilnadu, India
Abstract - Advances in technology have given Internet users all over the world easy access to textual documents. Sequential patterns have long been a focus of data mining, and analysing their behaviour is useful in many applications, such as predicting the next event. The difficulty is that mining may have to generate or examine a combinatorially explosive number of intermediate subsequences. In this paper, we examine abnormal behaviours of Internet users on Gmail and Twitter: we propose Sequential Topic Patterns (STPs) and formulate the problem of mining User-aware Rare Sequential Topic Patterns (URSTPs) in document streams on the Internet. Although such patterns are rare overall, some of them are frequent for specific users, so they can be used to detect abnormal user behaviour in real-time scenarios. To achieve this, we present a set of algorithms for pre-processing user content, generating all STP candidates with their support values through efficient pattern growth, and selecting user-aware rare sequential topic patterns by rare pattern domain analysis.
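The pipeline described in the abstract (per-user sessions, support values by pattern growth, then selection of patterns that are frequent for one user but rare overall) can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from this paper: each document has already been reduced to a single topic label during pre-processing (a simplification), sessions are already grouped per user, PrefixSpan-style pattern growth stands in for the pattern-growth step, and the function names and the thresholds `user_min_support` and `global_max_support` are hypothetical. Support is measured here as the fraction of sessions that contain an STP as an ordered, possibly gapped, subsequence.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Topic = str
Pattern = Tuple[Topic, ...]   # an STP: topics in temporal order
Session = List[Topic]         # topic labels of one user's successive documents


def is_subsequence(pattern: Pattern, session: Session) -> bool:
    """True if the pattern's topics occur in the session in order (gaps allowed)."""
    it = iter(session)
    # `topic in it` advances the iterator past the match, which enforces order.
    return all(topic in it for topic in pattern)


def grow_patterns(sessions: List[Session], min_count: int,
                  max_len: int = 3) -> Dict[Pattern, int]:
    """PrefixSpan-style pattern growth: every STP of up to max_len topics that
    occurs in at least min_count sessions, mapped to its session count."""
    results: Dict[Pattern, int] = {}

    def grow(prefix: Pattern, projected: List[Session]) -> None:
        if len(prefix) >= max_len:
            return
        # Count how many projected suffixes contain each candidate topic.
        counts: Dict[Topic, int] = defaultdict(int)
        for suffix in projected:
            for topic in set(suffix):
                counts[topic] += 1
        for topic, count in sorted(counts.items()):
            if count < min_count:
                continue  # infrequent extensions are pruned, never grown
            pattern = prefix + (topic,)
            results[pattern] = count
            # Project each suffix onto the part after the first occurrence of topic.
            next_projected = [s[s.index(topic) + 1:] for s in projected if topic in s]
            grow(pattern, next_projected)

    grow((), sessions)
    return results


def select_urstps(user_sessions: Dict[str, List[Session]],
                  user_min_support: float = 0.5,
                  global_max_support: float = 0.3,
                  max_len: int = 3) -> List[Tuple[str, Pattern, float, float]]:
    """Return (user, STP, user support, global support) for STPs that are
    frequent within one user's sessions but rare across all sessions."""
    all_sessions = [s for sessions in user_sessions.values() for s in sessions]
    selected = []
    for user, sessions in user_sessions.items():
        if not sessions:
            continue
        min_count = max(1, math.ceil(user_min_support * len(sessions)))
        for pattern, count in grow_patterns(sessions, min_count, max_len).items():
            global_count = sum(is_subsequence(pattern, s) for s in all_sessions)
            global_support = global_count / len(all_sessions)
            if global_support <= global_max_support:
                selected.append((user, pattern, count / len(sessions), global_support))
    return sorted(selected)


if __name__ == "__main__":
    streams = {
        "alice": [["work", "travel", "work"], ["work", "sports"], ["travel", "work"]],
        "bob": [["sports", "music"], ["music", "sports"], ["sports", "music", "work"]],
        "carol": [["work", "music"], ["music", "work"], ["work", "sports"]],
    }
    for user, pattern, user_sup, global_sup in select_urstps(streams):
        print(user, pattern, round(user_sup, 2), round(global_sup, 2))
```

With the sample data, the sketch reports ('travel',) and ('travel', 'work') for alice and ('sports', 'music') for bob, each with user support 0.67 and global support 0.22. A pattern such as ('work',) is frequent for alice but also common across all users, so it is filtered out. Growing only from projected suffixes means that just the extensions already meeting the per-user count are explored, which is one way to avoid enumerating every candidate subsequence.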
Keywords: document streams, sequential topic pattern, pattern-growth, domain analysis, rare sequential topic.
INTRODUCTION
Data mining, also known as knowledge discovery in databases, has long been a promising area of database research. Web services such as Gmail and Twitter provide a rich and freely accessible source of document streams generated and published by their users.
Chart 1: Gmail and Twitter users
The numbers of Gmail and Twitter users have increased enormously over the years, so this paper concentrates on the users of these real-time applications. Document streams in the form of microblogs, tweets, chat messages and emails are extracted to provide thorough insight into the behaviour of an Internet user. The topics obtained from successive documents of a specific user may be correlated, and these correlations can be described by Sequential Topic Patterns. STPs not only summarize content on top of topic modelling, but also help investigate users' intrinsic characteristics and psychological states. The real intention behind publishing these document streams is hard to reveal directly from individual messages; both the content information and the temporal relations between messages are required for the analysis, especially for abnormal behaviours without prior knowledge. STPs
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 965