Improving Association Rule Mining by Defining a Novel Data Structure


International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 04 Issue: 07 | July -2017

p-ISSN: 2395-0072

www.irjet.net

Vinayak Suresh Shukla (1), Prof. Dr. Mrs. S. A. Itkar (2)

(1) Student, PES's Modern College of Engineering, Pune 5.
(2) HOD, Computer Engineering Dept, PES's Modern College of Engineering, Pune 5.

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract - In recent years, digital data storage has grown rapidly owing to the ease of use and lower cost of digital storage media. This data is high dimensional and heterogeneous in nature, and both properties hamper the process of knowledge discovery. Association rule mining (ARM) is one such knowledge discovery process. Although many association rule mining algorithms have been proposed in recent years to deal with large volumes of data, the mining process under-performs when the data set contains a very large number of records. Hence the aim of this work is not to design a new mining algorithm, but to design a new data structure that stores the data reliably. The original data is simplified and reorganized, and access to it is sped up, to improve efficiency in terms of time and main memory requirements. Lower main memory requirements and faster data access are achieved by means of Shuffling, Inverted Index Mapping and Run Length Encoding. The resulting data structure can be used with existing association rule mining algorithms to speed up mining and reduce main memory requirements without changing the original algorithms. It is further improved by replacing Run Length Encoding with a Modified Run Length Encoding algorithm for better memory utilization and more efficient mining.

Key Words: Association Rule Mining (ARM), Data Compression, Data Structure, Index Compression, Knowledge Discovery, Modified RLE.

1. INTRODUCTION

In recent years, the digitization of technical and non-technical fields has led to the production of large amounts of digital data every day [1]. Storing this data is affordable, since storage media now cost less than earlier media. Hence the cost of storage has a negligible association with the amount and heterogeneity of the data. The mining process, however, is strongly affected: with so much data, mining has become interesting but time consuming. Association rule mining [2] is a well-known and well-researched field for mining rules from given data. Many algorithms have been introduced to speed up data analysis, using strategies such as reducing the number of candidate sets, the number of transactions, the number of comparisons, or a combination of these. Even so, any mining process on extremely high-dimensional data [1] can be hard. So instead of changing existing algorithms or devising a new one to speed up mining, it is better to introduce a new data structure [3] that stores large data in a compressed manner. Here we therefore implemented a new compressed data structure that stores the original data in compressed form without losing the original content. It is obtained by applying three techniques to the original data in sequence: Shuffling, Inverted Index Mapping and Run Length Encoding. It is then further improved by Modified Run Length Encoding. The new data structure helps in handling the data, speeding up access to it, and computing quickly when used with any of the existing mining algorithms.

2. REVIEW OF LITERATURE

Association rule mining [2] is an important technique for mining rules. It aims to extract strong relationships between items within the input data set. It was originally designed for market basket analysis, to help shopkeepers arrange sale items in order to grow their business. Apriori [4] was among the first algorithms for association rule mining. It is based on exhaustive search and works in two steps: I. finding all frequent item sets in the data set, and II. deriving association rules from those frequent item sets. If the data set has k single items and N transactions, then 2^k - 1 item sets can be generated. If the data set is huge, processing is hard, because the complexity of this operation is O(N × M × k) just to obtain the candidate sets. The FP-growth [5] algorithm was introduced to reduce the number of transactions and related comparisons. The data needs to be scanned only twice, since FP-growth stores the frequent items in a tree structure. But it suffers from a large number of I/O operations and the large amount of memory required to store all the sets.


To overcome these drawbacks, such as large memory requirements and long computation times, many evolutionary association rule mining algorithms were proposed [6], [7], [8], [9]. But even when rule mining is approached from these different angles, working with high-dimensional data remains hard. As a solution, creating a new data structure that reduces the size of the original data set is advantageous: it can speed up mining with current algorithms without changing how they work.
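To make the idea of a compressed data structure concrete, the following is a minimal Python sketch of the three techniques named in this paper (shuffling, inverted index mapping, run length encoding), together with a support count computed from the compressed form. It is an illustration under stated assumptions, not the authors' implementation: this section does not define the shuffling step, so a plain sort of the transactions stands in for it, and all function names and the toy transactions are hypothetical. Modified Run Length Encoding is not shown, since its details are not given here.

```python
from itertools import groupby


def shuffle(transactions):
    # ASSUMPTION: "shuffling" is approximated by sorting the transactions so
    # that similar ones become adjacent, which tends to lengthen runs.
    return sorted(transactions, key=lambda t: sorted(t))


def inverted_index(transactions):
    """Map each item to a 0/1 list holding one bit per transaction."""
    items = sorted({item for t in transactions for item in t})
    return {item: [1 if item in t else 0 for t in transactions] for item in items}


def rle_encode(bits):
    """Run-length encode a bit list as (bit, run_length) pairs."""
    return [(bit, len(list(run))) for bit, run in groupby(bits)]


def rle_decode(runs):
    """Expand (bit, run_length) pairs back into the original bit list."""
    return [bit for bit, length in runs for _ in range(length)]


def support(itemset, index):
    """Count the transactions that contain every item of `itemset`."""
    columns = [rle_decode(index[item]) for item in itemset]
    return sum(all(bits) for bits in zip(*columns))


# Hypothetical toy data set with four transactions.
transactions = [{"bread", "milk"}, {"beer"}, {"bread", "milk", "beer"}, {"bread"}]
ordered = shuffle(transactions)
compressed = {item: rle_encode(bits) for item, bits in inverted_index(ordered).items()}

print(compressed["beer"])                      # [(1, 2), (0, 2)]
print(support({"bread", "milk"}, compressed))  # 2
```

Because `support` reads only the per-item columns, a mining algorithm such as Apriori could in principle count candidate item sets against this structure without rescanning the raw transactions. For simplicity the sketch decodes each column before intersecting; an implementation aiming at the memory savings described above would more plausibly intersect the runs directly.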

© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 1988

