
International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 04 Issue: 07 | July -2017

p-ISSN: 2395-0072

www.irjet.net

An Efficient and Scalable UP-Growth Algorithm with Optimized Threshold (min_util) for Mining High Utility Item sets from Transactional Database

Mr. Sunil H. Sangale¹, Prof. Dr. D.V. Patil², Prof. R.C. Samant³

¹ PG Student, Dept. of Computer Engg., R.H. Sapat College, Pune University, Nashik, Maharashtra, India
² Head of Dept. of Computer Engg., R.H. Sapat College, Pune University, Nashik, Maharashtra, India
³ Asst. Professor, Dept. of Computer Engg., R.H. Sapat College, Pune University, Nashik, Maharashtra, India

Abstract - Mining high utility itemsets from a large transactional database is an emerging topic in data mining. It refers to the discovery of knowledge such as high utility itemsets (profits) subject to a user-specified minimum utility threshold, min_util. Although a number of relevant algorithms have been proposed in recent years, they suffer from the problem of producing a large number of candidate itemsets for high utility itemsets. Such a huge number of candidate itemsets degrades mining performance in terms of time and space complexity. Moreover, setting min_util properly is a difficult problem for users. Generally speaking, finding a suitable minimum utility threshold by trial and error is a tedious process. If min_util is set too small, a very large set of high utility itemsets will be generated, which may make the mining process very inefficient. On the other hand, if min_util is set too large, it is likely that no high utility itemsets will be found. In this paper, we address the above issues by proposing a new framework for high utility itemset mining in which the user specifies the desired number of HUIs to be mined. We also present a structural comparison of the two algorithms, with a discussion of their advantages and limitations. Experimental evaluations on both real and synthetic datasets show that the performance of the proposed algorithms is close to that of the optimal case of state-of-the-art utility mining algorithms.

Key Words: Candidate pruning, frequent itemset, high utility itemset, utility mining, data mining.

1. INTRODUCTION

Frequent itemset mining (FIM) is a fundamental research topic in data mining. Traditional FIM may yield a large number of frequent but low-value itemsets and may lose information on valuable itemsets that have low selling frequencies. Hence, it cannot satisfy users who want to discover itemsets with high profits. For example, the association rule mining algorithm Apriori finds the candidate itemsets and then derives the frequent itemsets based on a minimum support value. Apriori uses a join-and-prune mechanism to find the itemsets.

Utility mining came into existence to address these limitations of frequent itemset mining. In utility mining, each item is associated with a unit profit and, in each transaction, a purchased quantity. An itemset is called a high utility itemset (HUI) if its utility is no less than a user-specified minimum utility threshold, min_util. Mining high utility itemsets efficiently is not an easy task, because the downward closure property used in FIM does not hold for the utility of itemsets. In other words, pruning the search space for HUI mining is difficult because a superset of a low utility itemset can be a high utility itemset. To tackle this problem, the transaction-weighted utilization (TWU) model was introduced. In this model, an itemset is called a high transaction-weighted utilization itemset (HTWUI) if its TWU is no less than min_util, where the TWU of an itemset is an upper bound on its utility.

Depending on the threshold value, the search space can be very small or very large, and the choice of threshold greatly influences the performance of the algorithms. If the threshold is set too low, too many high utility itemsets are generated and it is difficult for users to comprehend the results. A huge search space also makes mining algorithms inefficient or can even cause them to run out of memory, because the more HUIs the algorithms generate, the more resources they consume. Conversely, if the threshold is set too high, no HUI will be found. To find a proper value for min_util, users need to try different thresholds, guessing a value and re-executing the algorithms over and over until they are satisfied with the result.

In this paper, we address all of the above challenges by proposing a novel framework for high utility itemset mining in which the user specifies k, the desired number of HUIs to be mined. The technique mines the complete set of top-k HUIs in a database without the need to specify the min_util threshold. The strategy can be applied to any one-phase algorithm that produces itemsets together with their utilities.
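The notions of itemset utility, TWU, and threshold-free top-k mining used in this paper can be illustrated with a minimal Python sketch. The toy database, the unit profits, and the function names `utility`, `twu`, and `top_k_hui` are hypothetical and chosen only for illustration. The top-k search is a brute-force enumeration, not UP-Growth or the framework proposed here; it only shows how a min-heap of size k can raise an internal min_util while TWU is used as an upper bound for pruning.

```python
import heapq
from itertools import combinations

# Hypothetical toy data: unit profit per item and purchased
# quantities per transaction (item -> quantity).
PROFIT = {"a": 5, "b": 2, "c": 1}
DB = [
    {"a": 1, "b": 2},  # T1, transaction utility 9
    {"b": 3, "c": 4},  # T2, transaction utility 10
    {"a": 2, "c": 1},  # T3, transaction utility 11
]

def utility(itemset, db=DB, profit=PROFIT):
    """Sum of quantity * unit profit of the itemset's items,
    over the transactions that contain the whole itemset."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )

def twu(itemset, db=DB, profit=PROFIT):
    """Sum of the full transaction utilities of the transactions
    that contain the itemset (an upper bound on its utility)."""
    return sum(
        sum(q * profit[i] for i, q in t.items())
        for t in db
        if all(i in t for i in itemset)
    )

def top_k_hui(k, db=DB, profit=PROFIT):
    """Brute-force top-k search. A min-heap of size k keeps the best
    itemsets seen so far; its smallest utility serves as an
    internally raised min_util, so the user never supplies one."""
    items = sorted(profit)
    heap = []      # entries are (utility, itemset)
    min_util = 0
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            if twu(itemset, db, profit) < min_util:
                continue  # utility <= TWU < min_util, cannot qualify
            u = utility(itemset, db, profit)
            if len(heap) < k:
                heapq.heappush(heap, (u, itemset))
            elif u > heap[0][0]:
                heapq.heapreplace(heap, (u, itemset))
            if len(heap) == k:
                min_util = heap[0][0]
    return sorted(heap, reverse=True)

# Utility is not downward closed: {c} has utility 5, yet its
# superset {a, c} has utility 11.
print(utility(("c",)), utility(("a", "c")))
# TWU is downward closed: TWU({a, c}) = 11 <= TWU({c}) = 21.
print(twu(("c",)), twu(("a", "c")))
print(top_k_hui(2))
```

With a min_util of 10 in this toy database, {c} would be a low utility itemset while {a, c} would be a high utility itemset, which is exactly the case that prevents utility-based pruning. TWU does not have this problem, since a superset can only occur in a subset of the transactions.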

Impact Factor value: 5.181

2. LITERATURE SURVEY

R. Agrawal et al. [2] proposed the Apriori algorithm, which is used to find frequent itemsets in a database. In mining association rules, the problem is to generate all association rules whose support and confidence are greater than the user-specified minimum support and minimum confidence thresholds, respectively.
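As an illustration of the level-wise idea behind Apriori, the following minimal Python sketch generates candidates with a join step and a prune step and then counts their support. It is a simplified reading of the algorithm, not the implementation from [2]; the function names `apriori` and `confidence` and the toy transactions are hypothetical, and support is expressed as an absolute count, with the threshold treated as inclusive, which is an assumption of this sketch.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset whose
    count is at least min_support (an absolute count here)."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Support count = number of transactions containing the candidate.
        return {c: sum(c <= t for t in transactions) for c in candidates}

    singletons = {frozenset([i]) for t in transactions for i in t}
    level = {c: n for c, n in count(singletons).items() if n >= min_support}
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: discard candidates with any infrequent (k-1)-subset.
        candidates = {
            c for c in joined
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = {c: n for c, n in count(candidates).items() if n >= min_support}
        frequent.update(level)
        k += 1
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent: support of the union
    divided by support of the antecedent."""
    return frequent[antecedent | consequent] / frequent[antecedent]

# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
print(confidence(result, frozenset({"bread"}), frozenset({"milk"})))
```

The prune step relies on the downward closure property of support: every subset of a frequent itemset must itself be frequent. In this example the candidate {bread, butter, milk} is discarded without counting because {butter, milk} is not frequent.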

ISO 9001:2008 Certified Journal
