CLUSTERING MODELS FOR MUTUAL FUND RECOMMENDATION


1,2,3,4 B.Tech Student, Dept. of Information Technology, VJTI College, Mumbai, Maharashtra, India

5 Associate Professor, Dept. of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra, India

Abstract - The mutual fund industry has expanded significantly, providing investors with many investment options. Investors need reliable mutual fund information to invest prudently, yet novice investors may find the financial environment complex owing to the abundance of information. A mutual fund recommendation system based on machine learning and data analytics can address this issue. We propose clustering models for recommending mutual funds, informed by theories of mutual fund investment and returns.

Key Words: Mutual funds, Clustering models, K-means, DBSCAN, Hierarchical, Agglomerative

1. INTRODUCTION

Mutual fund investing has become an integral component of portfolio management for investors and financial institutions. Yet, choosing the best mutual funds to invest in may be difficult owing to the vast number of possibilities and the complexity of the elements that affect their performance. It is essential to accurately forecast the performance of mutual funds in order to make educated investing selections. In this paper, we describe clustering models for recommending mutual fund investments. The suggested models take into consideration a number of implicit and explicit parameters, such as expense ratios, fund manager experience, past performance, and net asset values, in order to create investment recommendations that correspond to an investor's preferences and risk profile. Models such as K-means, hierarchical clustering, and DBSCAN group mutual funds based on their comparable traits and performance. This allows the models to offer suggestions based not just on the characteristics of individual funds, but also on the performance and behavior of funds with comparable characteristics. Using these clustering techniques, our models aim to provide a complete solution for investors seeking to improve their mutual fund investments.

2. PROBLEM

2.1 Problem statement

The goal is to propose clustering models for recommending mutual funds. Today, there is a lack of personalized and accurate recommendations for investors due to the vast amount of data and the complex nature of mutual funds. The existing approaches are limited and may not provide a satisfactory solution for novice investors. Hence, there is a need for an efficient and reliable recommendation system that can consider the individual preferences and risk tolerance of investors to provide tailored recommendations for mutual fund investments.

2.2 Problem elaboration

With the rise of online trading platforms, retail investors now have access to a wider variety of investment options, but the sheer number of options can be overwhelming. Additionally, many investors may lack the financial expertise to evaluate the risks and returns of different mutual funds effectively.

A mutual fund recommendation system could provide personalized investment advice based on a user's investment goals, risk tolerance, and other relevant factors. However, designing an effective system would require addressing several challenges. One of the primary challenges is building a model that can accurately predict the performance of different mutual funds based on historical data. This requires identifying relevant features that are predictive of mutual fund returns and developing algorithms that can effectively learn from this data.

Another challenge is ensuring that the system can provide personalized recommendations that reflect each user's unique investment goals and preferences. This requires developing effective methods for capturing user preferences and incorporating them into the recommendation process.

Finally, it is important to ensure that the system is transparent and easy to use for novice investors. This means designing an intuitive user interface that explains the rationale behind each recommendation and provides users with the information they need to make informed decisions.

Overall, a mutual fund recommendation system has the potential to empower novice investors and help them navigate the complex world of mutual fund investments. However, designing an effective system requires addressing several technical and user-facing challenges.

3. DATA

3.1 Data collection

We acquired our data from the Value Research Online website. It is a well-established website that provides financial information and analysis to help investors make informed decisions about their investments. The website offers a wide range of services, including mutual fund research. Additionally, the website follows strict editorial policies to ensure the accuracy and reliability of its content.

© 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page435
International Research Journal of Engineering and Technology (IRJET) e-ISSN:2395-0056 Volume: 10 Issue: 04 | Apr 2023 www.irjet.net p-ISSN:2395-0072

The data provided includes critical attributes for equity, debt, and hybrid mutual fund types, which are financial vehicles that investors can use to invest in the financial markets. Equity funds invest in stocks and have higher risk and return potential, while debt funds invest in fixed income securities with a fixed rate of return and lower risk compared to equity funds. Debt funds are further classified based on the duration of the bonds they invest in. Hybrid funds invest in a mix of equity and debt securities, offering a balanced mix of risk and return potential.

3.2 Data preprocessing

The data contained many shortcomings that needed to be dealt with before passing it to the machine learning model. Data was scattered across separate databases with different schemas, and several records had null values. Hence, data integration and data cleaning steps had to be performed. To make the data more usable, the following data preprocessing steps were applied.

1) Data integration

For separate features, data was extracted into separate CSV files. The columns differed across the three kinds of mutual funds, i.e., Equity, Hybrid and Debt. Hence, a common schema was necessary to unify these records under a common dataset.
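The integration step can be sketched with pandas as follows. This is a minimal illustration under stated assumptions: the three small frames and their column names (`fund_name`, `expense_ratio`, `equity_fund_style`, `avg_maturity_years`) are hypothetical stand-ins for the per-type CSV extracts, and the common schema is taken to be the union of all columns plus a `category` label.

```python
import pandas as pd

# Hypothetical per-type extracts; the real CSV columns are not given in the paper.
equity = pd.DataFrame({"fund_name": ["E1", "E2"], "expense_ratio": [1.2, 0.9],
                       "equity_fund_style": ["Growth", "Value"]})
debt = pd.DataFrame({"fund_name": ["D1"], "expense_ratio": [0.4],
                     "avg_maturity_years": [3.5]})
hybrid = pd.DataFrame({"fund_name": ["H1"], "expense_ratio": [1.0],
                       "equity_fund_style": ["Blend"]})

frames = {"Equity": equity, "Debt": debt, "Hybrid": hybrid}

# pd.concat aligns on column names and fills columns a fund type lacks with NaN,
# which yields one dataset under a common (union) schema.
funds = pd.concat(
    [frame.assign(category=name) for name, frame in frames.items()],
    ignore_index=True,
    sort=False,
)

print(funds)
```

Columns that exist for only one fund type are left as missing values for the others, so they flow naturally into the null-handling step.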

2) Feature selection

Based on the relevance of all features, only those features that may help in predicting mutual fund performance were selected.

Data cleaning

a) Dealing with null values:

• Records missing critical features:

There are several records in the dataset where critical features such as Sharpe Ratio, Standard Deviation, and Sortino Ratio are missing. It is difficult to evaluate the risk involved without these features. Hence, records without these features were discarded completely.

• Records missing a non-critical value:

Such values were filled with the average value (mean) of the whole column.

b) Dealing with duplicates: Duplicates were deleted.

c) Handling outliers: We used graphical methods such as box-and-whisker plots to determine the outliers.
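The cleaning rules above can be sketched in pandas as follows. The toy records and column names are hypothetical, and the 1.5 × IQR rule is used here as the numeric equivalent of reading outliers off a box plot; the paper does not state a specific threshold, so that choice is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical records: B lacks a critical feature, C is duplicated,
# D lacks a non-critical value, and E has an extreme standard deviation.
funds = pd.DataFrame({
    "fund_name": ["A", "B", "C", "C", "D", "E"],
    "sharpe_ratio": [1.1, np.nan, 0.8, 0.8, 0.5, 0.9],
    "standard_deviation": [12.0, 10.0, 15.0, 15.0, 11.0, 60.0],
    "sortino_ratio": [1.5, 1.2, 1.0, 1.0, 0.7, 1.1],
    "expense_ratio": [1.0, 0.8, 0.9, 0.9, np.nan, 0.6],
})
critical = ["sharpe_ratio", "standard_deviation", "sortino_ratio"]
non_critical = ["expense_ratio"]

# a) Discard records missing any critical feature, then b) delete duplicates.
cleaned = funds.dropna(subset=critical).drop_duplicates().copy()

# a) Fill missing non-critical values with the column mean.
for column in non_critical:
    cleaned[column] = cleaned[column].fillna(cleaned[column].mean())

# c) Flag outliers with the box-plot whisker rule (assumed 1.5 * IQR).
q1, q3 = cleaned["standard_deviation"].quantile([0.25, 0.75])
iqr = q3 - q1
cleaned["is_outlier"] = (
    (cleaned["standard_deviation"] < q1 - 1.5 * iqr)
    | (cleaned["standard_deviation"] > q3 + 1.5 * iqr)
)

print(cleaned)
```

The sketch only flags outliers; whether flagged records are removed or capped is a separate decision that the text does not specify.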

3) Feature extraction

a) To make the data more expressive, we converted a few categorical columns that take only a few values into one-hot encoded vectors. Clustering algorithms usually operate on numerical data, and feeding them categorical data in raw form might produce erroneous results. Hence, in order for the clustering algorithms to work more efficiently and to remove any bias, we converted columns such as fund category and fund style.

b) A few new features were added to extract valuable information from the existing columns. For example, the column called ‘date’ was converted to ‘age_in_months’ by applying appropriate mathematical functions. To work with manager_tenure, only the primary manager's tenure was extracted from an array of managers.
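A minimal sketch of these transformations is given below. The sample rows, the reference date of April 2023, and the assumption that the first entry of the manager array is the primary manager are all illustrative choices, not details confirmed by the dataset.

```python
import pandas as pd

# Hypothetical rows; 'date' is assumed to hold the fund launch date.
funds = pd.DataFrame({
    "fund_name": ["A", "B"],
    "category": ["Equity", "Debt"],
    "date": ["2015-04-01", "2020-01-15"],
    "manager_tenure": [[5.0, 2.0], [1.5]],
})

# b) Convert 'date' to the fund's age in whole months at an assumed reference date.
reference_date = pd.Timestamp("2023-04-01")
launch = pd.to_datetime(funds["date"])
funds["age_in_months"] = (
    (reference_date.year - launch.dt.year) * 12
    + (reference_date.month - launch.dt.month)
)

# b) Keep only the primary manager's tenure (assumed to be the first entry).
funds["primary_manager_tenure"] = funds["manager_tenure"].apply(lambda t: t[0])

# a) One-hot encode the low-cardinality categorical column.
encoded = pd.get_dummies(
    funds.drop(columns=["date", "manager_tenure"]),
    columns=["category"],
    prefix="Category",
    dtype=int,
)

print(encoded)
```

With this prefix the encoded columns take names of the form `Category_Equity`, which matches the style of feature names reported later in the paper.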

4) Exploratory data analysis

This step involved analyzing the data and comparing different features with each other. This resulted in a correlation matrix between all the features. Using this matrix, features whose values were extremely correlated with each other had to be removed in order to remove bias. For example, the columns NAV_latest and NAV_previous had a correlation of 1. These columns were combined to form a single column called NAV_latest.
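The correlation-based pruning can be sketched as follows. The synthetic data and the 0.95 cutoff are assumptions made for illustration; the paper reports only that NAV_latest and NAV_previous had a correlation of 1.

```python
import numpy as np
import pandas as pd

# Synthetic data: NAV_previous is a scaled copy of NAV_latest (correlation 1).
rng = np.random.default_rng(0)
nav = rng.uniform(10, 100, size=50)
funds = pd.DataFrame({
    "NAV_latest": nav,
    "NAV_previous": nav * 0.99,
    "expense_ratio": rng.uniform(0.1, 2.5, size=50),
})

# Absolute correlation matrix, keeping only the strict upper triangle so that
# each pair of features is inspected once.
corr = funds.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop the later column of every pair above the (assumed) threshold.
threshold = 0.95
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
reduced = funds.drop(columns=to_drop)

print(to_drop)
```

Keeping the first column of each correlated pair mirrors the outcome described above, where only NAV_latest is retained.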

5) Scaling data

Before passing the data to the next step, the data needs to be normalized or scaled so that bigger values don’t skew the clustering output. All the numerical values were scaled.
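As an illustration of this step, the sketch below standardizes two numeric columns of very different magnitude. The choice of scikit-learn's `StandardScaler` (zero mean, unit variance) is an assumption, since the paper does not name the scaler used; min-max scaling would be an equally plausible reading.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical columns: expense ratio (small values) and net assets (large values).
X = np.array([
    [1.2, 250.0],
    [0.8, 40.0],
    [2.0, 1200.0],
    [0.5, 15.0],
])

# Standardize each column so that no feature dominates Euclidean distances.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print(X_scaled.round(3))
```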

By implementing these steps, we can ensure that the dataset is cleaned, filtered, and transformed into a more useful format for recommendation modeling.

Finally, after performing all these pre-processing steps, the data contained attributes denoting fund type (equity, debt or hybrid), fund performance metrics such as expense ratio, returns and fund manager tenure, fund style (growth, value or blend), and several other numerical attributes such as risk factor, net asset value, standard deviation and Sharpe ratio.


4. CLUSTERING MODELS

We propose four clustering models:

1) K-means: It is used to cluster and partition data into groups based on similarities by minimizing the sum of squared distances between centroids and data points.

2) Hierarchical: It is used to group data into clusters in a hierarchical manner, based on the distance between data points, without needing to specify the number of clusters beforehand.

3) Agglomerative: It is a hierarchical clustering algorithm that starts with each point as a single cluster and gradually merges them into larger clusters with more points based on their similarity, until all points belong to a single cluster.

4) DBSCAN: It is a density-based clustering algorithm that groups data points together that are closely packed and separated from other clusters, based on a user-defined minimum number of points and a maximum distance between them.
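The four models can be sketched on synthetic data as follows. This is only one possible mapping to library calls and is not confirmed by the paper: K-means, Agglomerative and DBSCAN use scikit-learn, while the Hierarchical variant is assumed to be a SciPy linkage tree cut at a distance threshold, so that no cluster count is specified in advance. All parameter values are illustrative.

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs

# Two well-separated synthetic groups stand in for the scaled fund features.
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [8, 8]],
                  cluster_std=0.5, random_state=0)

# 1) K-means: minimizes the sum of squared distances to the centroids.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# 2) Hierarchical: build the full merge tree, then cut it at a distance
#    threshold instead of fixing the number of clusters.
tree = linkage(X, method="ward")
hier_labels = fcluster(tree, t=30.0, criterion="distance")

# 3) Agglomerative: bottom-up merging until the requested cluster count.
agglo_labels = AgglomerativeClustering(n_clusters=2).fit_predict(X)

# 4) DBSCAN: density-based; eps is the maximum neighbor distance and
#    min_samples the minimum neighborhood size. Label -1 marks noise.
dbscan_labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)

for name, labels in [("K-means", kmeans_labels), ("Hierarchical", hier_labels),
                     ("Agglomerative", agglo_labels), ("DBSCAN", dbscan_labels)]:
    print(name, "clusters:", len(set(labels) - {-1}))
```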

The process of analyzing and clustering data involves various techniques that can assist in identifying patterns and structures within the data. One such technique is scaling the data to normalize and standardize it, which ensures that different features or variables are comparable and easier to interpret.

The scaled dataset was used to implement these four types of clustering algorithms, i.e. Agglomerative, DBSCAN, Hierarchical, and K-means. The effectiveness of the clusters formed using these algorithms was evaluated against two metrics: inertia and silhouette score.

Inertia measures the sum of squared distances between each point and its assigned centroid in the cluster. A lower inertia value indicates that the clusters are more tightly packed, which is a desirable outcome. Silhouette score measures how well each data point fits into its assigned cluster, by comparing the mean distance between the point and other points in its own cluster (cohesion) to the mean distance between the point and points in the nearest neighboring cluster (separation).

A high silhouette score (closer to 1) indicates well-separated clusters, while a low score (closer to -1) indicates poorly separated clusters.
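Both metrics can be written out directly from these definitions. The sketch below is a plain NumPy illustration on four hand-made points, with each centroid taken as its cluster's mean; the function names are hypothetical and this is not the implementation used for the reported scores.

```python
import numpy as np

def inertia(X, labels):
    """Sum of squared distances from each point to its cluster mean."""
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        total += ((members - members.mean(axis=0)) ** 2).sum()
    return total

def silhouette(X, labels):
    """Mean of s(i) = (b - a) / max(a, b) over all points."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() == 1:
            continue  # common convention: a singleton cluster scores 0
        # a: mean distance to the other members of the same cluster (cohesion)
        a = dist[i, own].sum() / (own.sum() - 1)
        # b: mean distance to the nearest other cluster (separation)
        b = min(dist[i, labels == k].mean() for k in clusters if k != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()

X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
labels = np.array([0, 0, 1, 1])

total_inertia = inertia(X, labels)
mean_silhouette = silhouette(X, labels)
print(total_inertia, round(mean_silhouette, 3))
```

For these two tight, distant pairs the inertia is small and the silhouette score is close to 1, which is the pattern the text describes as desirable.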

By comparing the results of the clustering algorithms against these metrics, it was determined which algorithm produced the best-scoring clusters.

We defined hyperparameter search dictionaries for these clustering algorithms. The parameters for each algorithm were specified with ranges of possible values. Additionally, a dictionary containing a list of features was created to use in the grid search.
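A grid search of this kind might look like the sketch below. The parameter ranges, the synthetic data and the `search_space` layout are assumptions for illustration; the actual dictionaries and the feature-list dimension of the search are not reproduced here.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.model_selection import ParameterGrid

X, _ = make_blobs(n_samples=150, centers=[[0, 0], [8, 8]],
                  cluster_std=0.5, random_state=0)

# Hypothetical search dictionaries: one grid of candidate values per algorithm.
search_space = {
    "K-means": (KMeans, {"n_clusters": [2, 3, 4], "n_init": [10],
                         "random_state": [0]}),
    "Agglomerative": (AgglomerativeClustering,
                      {"n_clusters": [2, 3, 4], "linkage": ["ward", "average"]}),
    "DBSCAN": (DBSCAN, {"eps": [0.5, 1.0], "min_samples": [5, 10]}),
}

results = []
for name, (model_cls, grid) in search_space.items():
    for params in ParameterGrid(grid):
        labels = model_cls(**params).fit_predict(X)
        n_labels = len(set(labels))
        # The silhouette score is only defined for 2 .. n_samples - 1 labels.
        # Note that DBSCAN's noise label (-1) is scored as if it were a cluster.
        if n_labels < 2 or n_labels >= len(X):
            continue
        results.append((name, params, silhouette_score(X, labels)))

# Keep the best-scoring configuration per algorithm.
best = {}
for name, params, score in results:
    if name not in best or score > best[name][1]:
        best[name] = (params, score)

for name, (params, score) in best.items():
    print(name, params, round(score, 3))
```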

5. OUTPUT

For each combination of model and hyperparameters, clustering was performed and the results were recorded. We compare these models on the basis of the silhouette score. Fig. 1 shows the top K-means silhouette scores, with a maximum score of 0.256 for 2 clusters having counts of 598 and 328. Similarly, Fig. 2, Fig. 3 and Fig. 4 show the top scores for the Hierarchical, Agglomerative and DBSCAN clustering models respectively, along with their cluster counts.

Fig 1: Top K-means Silhouette score

Fig 2: Top Hierarchical Silhouette score

5.1 Critical clustering features

In order to determine which features were the most important to each clustering approach, we constructed a Random Forest Classifier model.

We used split criteria such as the Gini index and entropy to identify the key features that drive the formation of distinct clusters in a clustering algorithm.

Fig. 5 shows that the most effective feature while using agglomerative clustering is ‘Equity_fund_style_Growth’, followed by ‘Standard_Deviation’ and ‘Category_Equity’. Likewise, Fig. 6, Fig. 7 and Fig. 8 show the most effective features in the Hierarchical, K-means and DBSCAN methods respectively. It is clear from the observations that ‘Category’ columns play a major role in almost all the clustering algorithms to divide the mutual funds into clusters.
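The idea of ranking features by how well they explain cluster membership can be sketched as follows: cluster labels are treated as classification targets, a random forest is fitted, and its impurity-based importances are read off. The synthetic data, in which a single one-hot style column drives the clusters, and the three feature names are illustrative assumptions rather than the paper's data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

# Synthetic features: a 0/1 category flag separates the funds, while the
# other two columns are low-variance noise.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.integers(0, 2, size=n).astype(float),  # Category_Equity (drives clusters)
    rng.normal(0.0, 0.1, size=n),              # Standard_Deviation (noise here)
    rng.normal(0.0, 0.1, size=n),              # expense_ratio (noise here)
])
feature_names = ["Category_Equity", "Standard_Deviation", "expense_ratio"]

# Step 1: cluster, then use the cluster labels as classification targets.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Step 2: fit a forest per split criterion and collect feature importances.
importances = {}
for criterion in ("gini", "entropy"):
    forest = RandomForestClassifier(n_estimators=100, criterion=criterion,
                                    random_state=0).fit(X, labels)
    importances[criterion] = dict(zip(feature_names, forest.feature_importances_))

for criterion, scores in importances.items():
    top = max(scores, key=scores.get)
    print(criterion, "top feature:", top)
```

Comparing the rankings under both criteria gives a rough check that the leading features are not an artifact of one impurity measure.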

Fig 3: Top Agglomerative Silhouette score

Fig 4: Top DBSCAN Silhouette score

Table 1: Silhouette scores across different algorithms

Algorithm        Average     Maximum
Agglomerative    0.209734    0.209734
K-means          0.246271    0.256181
Hierarchical     0.528714    0.591561
DBSCAN           0.743419    0.743419

Fig 5: Most effective parameters of Agglomerative

6. CONCLUSION

In this study, we implemented various clustering algorithms, including K-means, DBSCAN, Hierarchical, and Agglomerative, to cluster mutual funds. We evaluated the performance of these algorithms using metrics such as silhouette score and inertia, allowing for a comparative analysis to identify the most suitable method for clustering mutual funds. Additionally, we employed the Random Forest algorithm to determine the most influential features that contributed to the clustering results. This analysis revealed the order of importance of the features in the mutual fund clustering process, providing insights for future research and investment decision-making.

7. FUTURE SCOPE

Developing an efficient clustering model to analyze and categorize users into distinct clusters based on the similarities found in their data points will be the next step. The suggested methods must be further analyzed to determine which among them gives the best result on the given dataset. The best clustering algorithm can effectively group users together based on shared features or characteristics within a given feature space. Once users are assigned to their respective clusters, a personalized and effective recommendation can be generated based on the cluster to which the user belongs. Importantly, this recommendation is tailored while taking into careful consideration the unique constraints and limitations that apply to each user, ensuring that it aligns with their preferences, requirements, and other relevant factors. This approach ensures that the recommendations provided are relevant and valuable, while accommodating each user's specific needs and constraints.

Fig 6: Most effective parameters of Hierarchical

Fig 7: Most effective parameters of K-means

Fig 8: Most effective parameters of DBSCAN

Furthermore, a sophisticated recommendation system can be built that takes into account individual investor characteristics such as investment horizon, risk profile, investment type, minimum investment and so on to recommend the best possible mutual fund schemes to that particular investor. Such a system can aid novice as well as experienced investors in choosing the best scheme to invest in out of the thousands available today.


BIOGRAPHIES

Aayush N Shah, B. Tech Student, Dept. of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra, India.

Aayushi Joshi, B. Tech Student, Dept. of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra, India.

Dhanvi Sheth, B. Tech Student, Dept. of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra, India.

Miti Shah, B. Tech Student, Dept. of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra, India.

Prof. Pramila M. Chawan is working as an Associate Professor in the Computer Engineering Department of VJTI, Mumbai. She has done her B.E. (Computer Engineering) and M.E. (Computer Engineering) from VJTI College of Engineering, Mumbai University. She has 30 years of teaching experience and has guided 85+ M.Tech. projects and 130+ B.Tech. projects. She has published 148 papers in International Journals and 20 papers in National/International Conferences/Symposiums. She has worked as an Organizing Committee member for 25 International Conferences and 5 AICTE/MHRD sponsored Workshops/STTPs/FDPs. She has participated in 17 National/International Conferences. She has worked as Consulting Editor on JEECER, JETR, JETMS, Technology Today, JAM&AER Engg. Today, The Tech. World; Editor: Journals of ADR; Reviewer: IJEF, Inderscience. She has worked as NBA Coordinator of the Computer Engineering Department of VJTI for 5 years. She had written a proposal under TEQIP-I in June 2004 for ‘Creating Central Computing Facility at VJTI’. Rs. Eight Crore were sanctioned by the World Bank under TEQIP-I on this proposal. The Central Computing Facility was set up at VJTI through this fund, which has played a key role in improving the teaching learning process at VJTI. Awarded by SIESRP with the Innovative & Dedicated Educationalist Award, Specialization: Computer Engineering & I.T., in 2020. AD Scientific Index Ranking (World Scientist and University Ranking 2022): 2nd Rank - Best Scientist, VJTI Computer Science domain; 1138th Rank - Best Scientist, Computer Science, India.
