DU-HUIM: a novel dynamic utility high-utility itemset miner algorithm for dynamic utility variations by IRJET Journal

DU-HUIM: a novel dynamic utility high-utility itemset miner algorithm for dynamic utility variations in big data

Arkan A. Ghaib

Department of Information Technologies, Management Technical College, Southern Technical University, Basrah, 61004, Iraq

Abstract - High-UtilityItemsetMining(HUIM)hasrecently emergedasanimportantdataminingtechnique thatfocuses on discovering those itemsets that offer high utility. Nonetheless, this is limited in applicability with real-world settingswhereutilityvalueschangedynamically overtime,as the vast majority of existing methods either assume static values for items. In order to overcome this limitation, we propose for the first time in a big data context a new algorithm called Dynamic Utility High-Utility Itemset Miner (DU-HUIM), that efficiently manages these dynamic utility variations.DU-HUIMcontainsadynamicutilityliststructure, adaptivepruning strategies,andparallelprocessingabilities to achieve high-utility itemset mining. On standard datasets, the experimental results show that DU-HUIM exhibits a remarkable improvement over the prior art with respect to runningtime, scalability,and resistancetochangesinutility.

Key Words: BigData,HighUtilityItemset,FrequentItemset Mining,Parallelcomputing,DynamicMiner.

1.INTRODUCTION

The problemofHigh-UtilityItemsetMining(HUIM)[1]has been extensively studied and has various applications in different domains, such as retail, e-commerce, and healthcare. It then finds itemsets whose utility values are higher thanaspecifiedminimumutilityvalue,whereutility indicatestheimportanceofanitem,suchasvalue,weight,or othermetrics.However,traditionalHUIMalgorithmsassume that utility values are constant over all transactions, an assumption that can be unrealistic in dynamic environments.Forexample,pricesanddemandforproducts canchangedependingonthingslikeseasons,promotions,or the market environment. Especially, Frequent Itemset Mining(FIM) [2],acanonicalproblemindatamining,finds frequent patterns from transactional databases. FIM methods like Apriori and FP-Growth [3] focus on the frequency of itemsets and don't consider the utility or significance of items in practical situations. High-Utility Itemset Mining (HUIM) broadens FIM from the search of itemsetsyielditemsettothesearchofitemsetsyield high utility which utility is a metric for the significance, profitablityorimportanceofanitemsetinadataset(Pandi). This has converged HUIM to become of paramount importance for areas like retail analytics, e-commerce, healthcare, picking and replenishment, inventory and supplychainmanagement[4].InHUIM,utilityper itemcan

be a product of internal utility (e.g. the Transactional volume)andexternalutility(e.g.profitperunit)High-Utility Itemsets(HUIs)[5] aretheitemsetswhoseutilityishigher than a pre-defined threshold. While FIM can utilize antimonotonicitypropertiesforpruning,thisdoes notholdfor HUIM making the mining beyond costly. Techniques to approximateutility,likeTransaction-WeightedUtility(TWU) [6], have been developed to alleviate this search space problem.Toimprovethe efficiencyandscalabilityofHUIM, several algorithms have been presented. Absolute HighUtility Itemset Miner (AHUIM) [7] presented a parallel approachformininghugedatasetseffectivelyusingpruning strategies such as Absolute Subtree Utility (ASU) and Absolute Local Utility (ALU) in order to speed up computations.Recentimprovementstotheenhancedmodels likeEFIM-Parand PHUI-Minermadeuseofdistributedand parallel computingframeworkstoscaleevenmore.EFIMPar proposed efficient usage of memory and designed pruning strategies that can be applied to large-scale datasets[5,8].Toaddresstheissueofdiscovering frequent patternsonmassivedatasets,MRFP-Growthisbeingused. Withthelarge datasets,mostofthealgorithmsforfrequent patternsfailtowork.[9,10]

However,mostoftheseHUIMalgorithmsstillassumestatic utility values for the entire transactional database. This assumption limits these algorithms because in the real world, the utility values can dynamically change and this canbeaffectedby:

Seasonal patterns: In the retail sector, prices and user demandvary withseasonsorpromotionalcampaigns.

Dynamic pricing: In e-commerce, based on algorithms, pricesarevariableaspersupplyanddemand.

Resource-constrained:Inhealthcare,utilitymay dependon urgencyofallocatinglimitedresources.

Staticutility assumptionsleadtothewronginsightshaving beenmadeandactionabledecisionopportunitiesmissing. For example, AHUIM and EFIM-Par can retrieve goodqualitypatternsfromstaticdatasets,buttheyarenotwell equipped to address temporal changes in the utility of patterns.

In this paper, we fill the gap and propose the Dynamic Utility High-Utility Itemset Miner (DU-HUIM), the first

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 01 | Jan 2025 www.irjet.net p-ISSN: 2395-0072

generalhigh-utilityitemsetminerabletodealwithdynamic utility.

2. Related Work

Since the inception, there has been considerable development in High-Utility Itemset Mining (HUIM), including a variety of algorithms introduced to improve efficiency, scalability, and applicability to domains. In this section,wewillmentionsignificantworksfocusingonHUIM and mention the existing gaps filled by the proposed DUHUIMalgorithm.

2.1 Conventional HUIM Methods

Yaoetal.thattheutilityofanitemsetistheproductofits internal and external utility values and proposed the key features of HUIM (2004). They argued that utility-based mining should take precedence over frequency-based approaches,“settingthestage”forfuturework [11].Liuetal. (2005) suggested a Two-Phase algorithm based on a measurecalledTransaction-WeightedUtility(TWU)which prunes the search space efficiently. Although effective in reducingcomputationalcomplexity,thealgorithmemploys multipledatabasescanswhichmakeitlessoptimalforlarge datasets [12 Zida et el. (2017) proposed EFIM, a new algorithmforHUIMwhichisfastanduseslittlememory.We useeffectivepruningstrategiessuchaslocalutilitypruning toreduce thecandidatesearchspace[1],andwecantrainon datauptoOctober2023.EFIMgreatlyenhancesruntimeand memoryefficiency, butitreliesonstaticutilityvalues,which limitsitsapplicabilityindynamicscenarios[5].

2.2 Parallel & Distributed HUIM Algorithms

Asthedatasetscouldbelarger,parallelanddistributed frameworks forHUIMwereintroduced.Firstintroducedby Dalal and Dahiya (2020) using a parallel architecture, the AbsoluteHigh-UtilityItemsetMiner(AHUIM)mines itemsets from larger datasets. AHUIM proposed improved pruning with Absolute Subtree Utility (ASU) and Absolute Local Utility (ALU) metrics. By not accounting for the dynamic utility variations, AHUIM is effective but only with less dynamic datasets [13]. Sethi et al. The authors in [10] proposedP-FHM+,aparallelizedFHM,thatfurtherimproves scalability and decreases execution time on distributed systems. While P-FHM+ distributes computational loads acrossnodes,itdoesnotaccountforreal-timeutilitychanges, making it less applicable as environments change [14]. Followingthat,Chen(2016)proposedPHUI-Miner,aSparkbased framework that utilizes the principles of data compressionandsamplingtoefficientlyanalyzelarge-scale datasets. Although PHUI-Miner benefits from fast runtime performance,itsapproximate methodscomeatthepriceof accuracy[8],especiallyfordynamicvariationsoftheutility. Ghaib etal. EN-listandIDUIMcanbe implemented for the distributedutilityitemsetsminingoverthebigdata[15-17],

whereaHadoop-basedprepostalgorithm(HPrepost)based ontheMapreduceprogrammingmodelhasbeenproposed.

2.3 Recent Techniques in HUIM

Nguyenetal.(2019)pEFIM:aparallel versionofEFIM basedon depth-firstsearchfor improvingthe efficiencyof mining.ThoughpEFIMscaleslinearlyinsize, pEFIM'sstatic assumptionsonutilityvalueslimititsapplicationtodynamic datasets [18]. Vo et al. (2020)HMiner-Closed: High Utility Closed Itemset Mining. We then use compact utility-list structures and enhanced pruning strategies combine together, in a new approach of course, to improve the efficiency. A relevant work for closed patterns is HMinerClosed [ 19 ], but it does not take into consideration the differencesinutilityvaluesamongtransactions.Forexample, Krishnaand Vadlamani(2021)investigatedtheeffectiveness ofevolutionaryalgorithmsforHUIMandintroducedabinary differentialevolutiontechniqueforcustomersegmentation. Thisapproachisinnovative,butitismoreaboutoptimization rather than adaptation to the dynamic changes of utilities [20]

3. Research Problem

3.1 Problem Description

The majority of existing High-Utility Itemset Mining (HUIM) approaches were built for static datasets, where utilityvaluesareconstantineachtransaction.Althoughthis assumption makes computation easier, it does not reflect reality.Inreality,thereareawholehostoffactorsthatwill leadtoutilityvaluesdiverging,whetherthatbefromseasonal trends, dynamic pricing strategies, or changing customer preferences.CurrentalgorithmslikeEFIMand AHUIMdonot have the capabilityto deal with such dynamic fluctuations resulting in poor performance when utility values change overtime.Keychallengesinclude:

Dynamic Utility:Transactionlevelrepresentationandpace

Scalability:Theminingprocess shouldproduceresultsboth effectively and efficiently on large datasets with dynamic utilityvalues.

Using Dynamic Search Space Pruning to Minimize ComputationalOverhead

3.2 Literature Review

ThedevelopmentofHUIMalgorithmshasbeenboosted byseveralstudies:

EFIM(Zidaetal.,2017)proposeditsowneffectivewaytodo pruningbystaticutilitydatasets.Nonetheless,EFIM isstill oblivious to the utility differences of transactions. AHUIM (Dalal and Dahiya, 2020): Used parallel processing to improvescalability,butwaslimitedtostaticassumptionson

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 01 | Jan 2025 www.irjet.net p-ISSN: 2395-0072

whatthevalueofutilitywas.P-FHM+(Sethietal.,2019)2.1: Distributedcomputingforscalability,butno mechanismsfor dynamicutilities.PHUI-Miner(Chen, 2016):Improveddata compressiononlargedatasetsatdynamicscost.

Thistrendunderpinstheneedforamechanismthatcan provide dynamicrepresentationoftheutilityandadaptive methodstoswitchbetweenvaryingconditions.

3.3 DU-HUIM

Intheprevioussection,wehavelistedsomegapsinthe literature. DU-HUIM proposes novel methods for dynamic portrayal ofutility,progressivefiltering,aswellasscaling. Someofthe majorelementsofDU-HUIMare:

Dynamic Utility List (DUL): Used for capturing and updating source differences per transaction. Dynamically tune pruning threshold based on utility event patterns. Parallel Processing Framework: Allow for large dataset miningbysplittingupcomputationsacrossnodes.

Algorithm: DU-HUIM

Output: Transactional database DD, the utility lower boundminUtilminUtil oftheitem.Output:HH,high-utility itemsets.

Steps:

Initialization:

OnyourDynamicUtilityList(DUL),find allitemsinDD.

Do1passtocomputeinitialTransactionWeightedUtility (TWU)values forpruning

DynamicPruning:

IfTWU(X)

Prunethumbsupthresholdandthumbsdown threshold dynamicallybasedonobservedutilitytrends.

RecursiveMining:

Fuse the candidate itemsets breadth first to join their element.

ForeachcandidateX:

Calculateitsdynamicutility byDUL.

IfU(X)≥minUtilU(X)≥minUtil, keepX

ParallelProcessing:

SplitDD tovariousprocessingnodes

Parallel processingthesteps1-3foreachpartition.

Fromeachnodereturntheresultsand combinethemto createthefinalsetofhighutilityitemsets.

ResultCompilation:

Output: H{Allhigh-utilityitemsets}

2025, IRJET | Impact Factor value: 8.315 |

PseudocodeDU-HUIMAlgorithm

Algorithm DU-HUIM

1.SetDULofallitemsinDto beinitialized

2:ComputeTWUforallitems

3.Forall itemswhereTWU<minUtil,prune.

4: foreachpartitionPinparallel:

5:Candidateitemsets Generation

6.foreach candidateXdo:

7: CalculatedynamicutilityU(X)fromDUL

8:if U(X)≥minUtilthen

9:AddXtoH

10:Collateresults fromeachpartition

11:ReturnH

4. Experimental Results

DU-HUIM was implemented and its performance was compared with four state-of-the-art algorithms, Enhanced Absolute High-Utility Itemset Miner (EAHUIM), Absolute High-UtilityItemsetMiner(AHUIM),PHUI-MinerandEFIMPar. We evaluate the algorithms using execution time, memorydemands,andscalabilityinexperiments onseveral benchmarkdatasets.

4.1 Experimental Setup

Datasets To evaluate the performance of the algorithm EAHUIMand,consequently,tocompareitwithDU-HUIM,the real-worlddatasetsChainstore,Kosarak,BMS2and Connect wereemployed.Thesedatasetswereallacquiredfromthe SPMFlibrary,oneof thecommonbusinessesintheselection of the datasets for the pattern mining. To establish a comprehensive benchmark for the performance and scalability of the algorithms,weemployed a diverseset of transactional datasets, covering different scales, item distributions,andtransactiondensities.

These datasets are summarized in Table 1. Here, #Transactions is the number of transactions in a dataset, #Itemsisthenumberofuniqueitems,and#AvgItemsisthe averagenumberofitemspertransaction.

Table1:Characteristicsofdatasets.

Volume: 12 Issue: 01 | Jan 2025 www.irjet.net p-ISSN: 2395-0072

Chainstore: This dataset is from a big retail chain and consistsofmorethanamilliontransactions;itservesasan excellentdatasetto testscalability.

Kosarak:Onemoredatasetwhichhasbeen derivedfrom clickstream data, Kosarak has significant transaction diversity, and can be used to evaluate algorithms in cases whereitemshavecomplexrelationshipswitheachother.

BMS2: This is a retail dataset with limitations in the numberoftransactionsanditems,BMS2servesasasolution totestefficiencyinasmallerdataset.

Connect:Connect-4isasmallitemcountbutaverybig averagenumberofitemspertransaction,compilingadense dataset that serves as a good challenge for memory management andprocessing.

4.2 Execution Time

Table 2 Execution Time The results of the runtime experiments using DU-HUIM and other state-of-the-art methodssuchasEAHUIM,AHUIM,PHUI-Miner,andEFIM-Par arepresentedinTable2.Thedatasetsbelowhavebeenused forthis evaluation:Chainstore,Kosarak,BMS2,andConnect asdescribedinSection5.ItshowstheperformanceofDUHUIM relativelytoothersimilarmethods.

Table2:Runtimeperformancesforvariousdatasets.

Dataset

ChainstoreDataset:Withthebest-in-classruntimeof35.2 second,DU-HUIM outperformedEAHUIM by around26%, andAHUIMbyaround31%.

Kosarak Dataset: DU-HUIM outperformed its rivals in terms of runtime again, taking only 72.3 seconds, while EAHUIMtook90.4secondsandAHUIM96.7seconds.

BMS2Dataset:OnsmallerdatasetssimilartoBMS2,DUHUIMpreserveditsedge out-classifyingwithafunctioning timeof5.6secondswhichis24%quickerthanEAHUIMand 31%quickerthanAHUIM.

Connect Dataset: For dense datasets like Connect, DUHUIMoutshineditscompetitors,finishingthetaskin18.9s against23.7s ofEAHUIMand25.5forAHUIM

4.3 Memory requirements

The memory requirements of DU-HUIM and other algorithms (EAHUIM, AHUIM, PHUI-Miner, and EFIM-Par)

overthe datasetsdefinedinSection5areshowninTable3. The memory consumption was performed at runtime to check how efficient the memory management was on the partofeachalgorithm.

Table3:MemoryusageforDU-HUIMandother algorithmsfordifferentdatasets.

Dataset DUHUIM (MB) EAHUIM (MB) AHUIM (MB) PHUIMiner (MB) EFIMPar (MB)

Chainstore 620 750 800 980 700

Kosarak 890 1100 1250 1300 1020

BMS2 190 250

Chainstore Dataset: DU-HUIM had the least memory overhead to be stored (620MB), which is 17%, and 22% lowerthanEAHUIM,andAHUIM,respectivelyThisisdueto their dynamic utility list structure and adaptive pruning strategies.

Kosarak Dataset: DU-HUIM was memory efficient with 19%and29% lessmemoryutilizedcomparedtoEAHUIM andAHUIMrespectively.

BMS2Dataset:Inthecaseofsmallerdatasets,DU-HUIM kept the memory consumption low (190 MB) with a 24% reduction compared to the EAHUIM and a 30% reduction overtheAHUIM.

ConnectDataset:Onthisdataset,DU-HUIMachieved17% and 21% memory saving over EAHUIM and AHUIM respectively,confirmingthatDU-HUIMcandealwithdense datasetsefficiently.

4.4 Scalability

In Table 4, the scalability test result is shown for DUHUIM andotheralgorithmssuchasEAHUIM,AHUIM,PHUIMiner,andEFIM-Par.Thetestsperformedreceived onhow thealgorithmsscalewithdata,asthenumberoftransactions intheChainstoredatasetwasincreased.Thefulldatasetwas incrementallyincreasedfrom25% to100%oftheoriginal datasetsize.

Table4:Scalabilitytestforthealgorithms.

Dataset Size(%)

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 01 | Jan 2025 www.irjet.net p-ISSN: 2395-0072

Linear Scalability: The execution time of DU-HUIM increased linearly with the dataset size, exhibiting nearlinearscalability.Thenormalforminwhichthe algorithmis expressed allows for parallel processing and extensive pruning.

In comparison with EAHUIM and AHUIM at all dataset sizes DU-HAUIM completed the 100% dataset test in a consistent manner that was 26% and 31% faster than EAHUIMandAHUIMrespectively.

PHUI-Miner did not scale as well with dataset size in termsofruntimeasDU-HUIM,withsteeplinearincreasesin runtimeobservedintheformer.

EFIM-Par Efficiency: Although EFIM-Par achieves good scalability,DU-HUIMhassuperiorperformanceby18%on average because of its adaptive pruning and the proposed dynamicutilityliststructures.

3. CONCLUSIONS

Weintroducedanewalgorithm,DynamicUtilityHigh-Utility ItemsetMiner(DU-HUIM),toovercomethesechallengesand dynamic utility changes faced by traditional HUIM approaches. Through a dynamic utility list structure, adaptive pruning strategies and a parallel processing framework, DU-HUIM delivers orders of magnitude improvements in execution time, memory efficiency, and scalability over state-of-the-art algorithms like EAHUIM, AHUIM,PHUI-Miner,andEFIM-Par.DU-HUIMconsistently performsbetterthanthecurrentmethodsonexperimental resultsinbenchmarkdatasets, fordynamicutilityvariation scenarios,theimprovementeffectsaremoresignificant.It wouldshowthatwithoutsacrificingaccuracyfortimeand space efficiency, the new algorithm computes high-utility itemsetsadaptedtothedynamicenvironmentsunderreal worldconditions.Itslinearscalabilityandefficientmemory utilizationarevaluable contributionstodatamining.

REFERENCES

[1] Wu,J.M.-T.,Lin,J.C.-W.,&Tamrakar,A.(2019).Highutilityitemsetminingwithef-fectivepruningstrategies. ACMTransactionsonKnowledgeDiscoveryfromData, 13(6),Article58.Doi:10.1145/3363571.

[2] Agrawal, R. , & Srikant, R. (2003). Fast algorithms for mining association rules in large databases. In InternationalConferenceonVeryLargeDatabases(pp. 487–499).

[3] R. Agrawal and R. Srikant, "Fast algorithm for mining associationrulesinlargedatabases", Proceedingsof20th VLDB conference,pp.487-499,1994.

[4] Dahiya, V., & Dalal, S. (2020). Parallel approaches of utility mining for big data. Webology, 17 (2), 31–43. 10.14704/WEB/V17I2/WEB17014.

[5] Zida, S., Fournier-Viger, P., Lin, J. C. W., Wu, C. W., & Tseng,V.S.(2017).EFIM:Afastandmemory-efficient algorithm for high-utility itemset mining. Knowledge InformationSystems,51 (2),595–625.10.1007/s10115016-0986-0.

[6] Chan, R., Yang, Q., & Shen, Y.-D. (2003). Mining high utilityItemsets.In ProceedingsoftheThirdInternational Conference on Data Mining (pp. 1–19). 10.1109/ICDM.2003.1250893.

[7] Dalal,S.,&Dahiya,V.(2021).Performancecomparison of absolute high utility itemset mining (AHUIM) algorithm for big data. International Journal of Engineering Trends and Technology, 69 (1), 17–23. 10.14445/22315381/IJETT-V69I1P203

[8] Chen, Y. (2016). Approximate parallel high utility itemset mining. Big Data Research , 26–42. 10.1016/j.bdr.2016.07.001.

[9] Al-Hamodi, A. A., & Lu, S. F. (2016, June). MapReduce FrequentItemsetsforMiningAssociationRules.In 2016 International Conference on Information System and ArtificialIntelligence(ISAI) (pp.281-284).IEEE.

[10] Al-Hamodi,A.A.,&Lu,S.(2016,April).MRFP:discovery frequent patterns using MapReduce frequent pattern growth. In 2016 International Conference on Network and Information Systems for Computers (ICNISC) (pp. 298-301).IEEE.

[11] Yao,H.,Howard,J.H.,&Cory,J.B.(2004).Afundamental approach to mining itemset utilities from databases. Fourth International Conference on Data Mining. doi:10.1137/1.9781611972740.51.

[12] Liu,Y.,Liao,W.,&Choudhary,A.(2005).ATwo-Phase Algorithm for Fast Discovery of High Utility Itemsets. Advances in Knowledge Discovery and Data Mining doi:10.1007/11430919_79.

[13] Dalal, S., & Dahiya, V. (2020). Absolute High Utility Itemset Miner (AHUIM) for Big Data. International Journal of Advanced Trends in Computer Science and Engineering, 9(5), 7451-7460. doi:10.30534/ijatcse/2020/78952020.

[14] Sethi, K. K., Ramesh, D., & Sreenu, M. (2019). P-FHM+: Parallel High Utility Itemset Mining Algorithm for Big Data Processing. Lecture Notes in Computer Science, 1131, 233-243. doi:10.1007/978-3-030-05366-6_9.

[15] Ghaib,A.A.,Alsalhi,Y.E.A.,Hayder,I.M.,Younis,H.A.,& Nahi, A. A. (2023). Improving the Efficiency of Distributed Utility Item Sets Mining in Relation to Big Data. Journal of Computer Science and Technology Studies, 5(4),122-131.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 01 | Jan 2025 www.irjet.net p-ISSN: 2395-0072

[16] Ghaib, A. A., & Nahi, A. A. (2024). Enhancing N-List StructureandPerformanceforEfficientLargeDataset Analysis.

[17] Al-Hamodi,A.A.,&Lu,S.(2017).Anovelapproachfor fastminingfrequentitemsetsuseN-liststructurebased onMapReduce. arXivpreprintarXiv:1704.04599

[18] Nguyen,T.D.D.,Nguyen,L.T.T.,&Vo,B.(2019).pEFIM: A Parallel Algorithm for Mining High Utility Itemsets. AdvancesinIntelligentSystemsandComputing,853,267279.doi:10.1007/978-3-319-99996-8_26.

[19] Vo,B.,Nguyen,L.T.,Bui,N.,Nguyen,T.D.,Huynh,V.N.,& Hong,T.P.(2020).HMiner-Closed:AnEfficientMethod forMiningHighUtilityClosedItemsets. IEEE Access, 8, 78-99.doi:10.1109/ACCESS.2020.2974104.

[20] Krishna, J. K., & Vadlamani, R. (2021). High Utility ItemsetMiningUsingBinaryDifferentialEvolution:An ApplicationtoCustomerSegmentation. Expert Systems with Applications, 181, Article 115122. doi:10.1016/j.eswa.2021.115122.