Recommendation System using Machine Learning Techniques by IRJET Journal

Recommendation System using Machine Learning Techniques

Shailesh D. Kalkar1, Prof. Pramila M. Chawan2

1M. Tech Student, Dept. of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra, India

2Associate Professor, Dept. of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra, India

Abstract - The goal of a recommendation system is to predict user interests and infer their mental processes. By taking into account a user's demands and interests, such a system can supply the information they need. Better recommendations require a more thorough analysis of the data. Numerous recommendation systems have been developed using diverse methodologies. As OTT platforms, shopping, travel, and other websites proliferate and strive to quickly improve their user suggestions, research into such systems has gained popularity. In this paper, we implement a movie recommendation system using machine learning techniques. We study and compare different recommendation models and use the best model to implement a system that recommends movies to the user. Machine learning is used in the movie recommendation system because it allows a system to learn from data without being explicitly programmed.

Key Words: Recommendation System, Machine Learning, Movies, Recommendation models, Content filtering, Collaborative filtering

1. INTRODUCTION

Recommendation systems are widely used today in applications ranging from entertainment to retail. The data needed by these applications is also growing every day as a result of the internet's widespread accessibility. There is therefore room for improvement and a need to offer better suggestions that can effectively handle enormous amounts of data. Our movie recommendation engine is primarily built using machine learning, the cosine similarity metric, and content-based filtering. Based on the user's past behaviour or explicit feedback, content-based filtering uses movie features to suggest additional films that are comparable to the user's favorites. In cosine similarity, two movies are viewed as two vectors in an m-dimensional user space, and the cosine of the angle between the vectors measures how similar they are to one another. In our system, machine learning is employed to create recommendation models and to retrieve information. Machine learning lets a system learn from data without explicit programming.

In content filtering [9], movies are recommended to the user based on movie details such as title and actors, and on the user's past history. Recommendation systems that are based purely on content filtering have certain drawbacks: for example, the recommendations lack variety or novelty, and scalability is difficult. Collaborative filtering [1] compiles user ratings for services such as items or movies, finds patterns among users based on their ratings, and generates fresh recommendations for the user based on inter-user comparisons. Recommendation systems that are based purely on collaborative filtering also have certain drawbacks, such as the cold start problem and the difficulty of including side features for items or movies. The side features for movie recommendations may include a user's country or age. Including available side features improves the quality of the model. Using machine learning techniques in movie recommendation can help to improve the efficiency of the recommendation system. We used the TMDB dataset for movie recommendation. In this paper, we first studied the dataset and performed exploratory data analysis on it to recognize patterns and understand it properly. We then preprocessed the dataset and created different machine learning based recommendation models using content and collaborative filtering. Next, we trained and tested these models. Using the best recommendation model, we created a REST API and then a recommendation system GUI for movies. In the GUI, the user enters movie details such as the title, and the system recommends similar movies.

2. LITERATURE REVIEW

There are various recommendation approaches, such as content filtering, collaborative filtering, and demographic filtering. These approaches can be combined with various machine learning methods and techniques to improve movie recommendations for the user. Different recommendation approaches have advantages and disadvantages that can affect the precision and effectiveness of a system.

M Viswa Murali, Vishnu T G, et al. [1] created a collaborative filtering-based recommender system for new trends in any research field. The three main building blocks of the proposed recommender system are datasets, prediction ratings based on users, and cosine similarity. The quantity of accurate ratings submitted by users decides how accurate the predicted ratings are. Cosine similarity is then used to order the findings.

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 09 Issue: 09 | Sep 2022 www.irjet.net p-ISSN: 2395-0072

Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal

Ramni Harbir Singh, Sargam Maurya, et al. [2] created a movie recommendation system using cosine similarity and KNN. Their study outlines a method that provides users with generalized suggestions based on the popularity and/or genre of a film. The implementation of the content-based recommender system involves several deep learning techniques. The study also provides a glimpse into the difficulties that content-based recommendation systems encounter, along with the authors' efforts to address them.

Shivganga Gavhane, Jayesh Patil, et al. [3] developed a recommendation system using KNN and cosine similarity. They worked on machine learning based technology that helps to understand requirements and provides recommendations for the user's chosen product. In this research, different machine learning algorithms are compared for suggesting products based on users' purchase patterns, with the aim of providing more accurate search results.

Shubham Pawar, Pritesh Patne, et al. [4] created a recommendation system using cosine similarity. The algorithm not only offers recommendations but also details about the movie that was searched for. The rating of the film, its premiere date, cast, and genres are among the supplementary information, and the system also offers more details about the cast. In addition, the system conducts sentiment analysis on the movie reviews, categorizing them as "Good" or "Bad" to save the user time when reading reviews.

Chen et al. [5] used CCAM (co-clustering with augmented matrices) to develop a variety of techniques, including heuristic scoring, conventional classification, machine learning, and the incorporation of content-based hybrid recommendation systems in conjunction with collaborative filtering models, to build a recommendation system.

Zhou et al. [6] created the collaborative filtering-based ALS algorithm for the Netflix Prize. ALS addresses the scalability problem of large datasets. This work built a movie recommendation system that predicts user ratings using the ALS algorithm. The system could not show slightly better results because the Restricted Boltzmann Machine (RBM) had not been improved.

Tiantian He et al. [7] put forward a graph clustering methodology that uses contextual correlation to identify groups in a graph whose vertices exhibit multiview properties. Earlier methodologies focused on the features of a single view and ignored the contextual link between features. To accomplish clustering in a multiview featured graph, their solution combines graph clustering and multiview learning. Because the model is based on unsupervised learning, it does not allow vertex embedding for attributed graphs, which is a feature of supervised learning models.

Zhiheng Wu et al. [8] suggested integrating the recommendation system with user reputation. Online recommendation systems make recommendations using information about users' interests or user types. However, recommendation systems occasionally push particular goods or services without confirming their reputation. The authors suggest removing skewed user ratings by examining each user's rating history and credibility. To identify biased users, one might employ algorithms such as the cumulative sum method, after which recommendations are made using collaborative filtering. Their approach gives genuine users a greater reputation value and fraudulent users a lower one. Consequently, their system is unable to distinguish between the two types of users if this value is the same for both.

3. PROPOSED SYSTEM

3.1 Problem Statement

“To implement the movies recommendation system using machine learning techniques”

3.2 Problem Elaboration

Today, recommendation systems are used in various domains, and the major challenge is to provide better recommendations of services to the user by using the huge amount of data present within the application. In this project, we carry out a comparative study of machine learning techniques implemented using collaborative filtering and content filtering. After the comparison, the best technique among them is used to build the movie recommendation system. The purpose of using machine learning in a recommendation system is to create a model that predicts movies for the user, instead of explicitly programming each recommendation.

3.3 Proposed Methodology

Following our research and literature review, we found that systems based only on content or collaborative filtering had a number of disadvantages. We therefore combined these filtering approaches with a number of machine learning algorithms, including ALS (Alternating Least Squares), SVD (Singular Value Decomposition), KNN (K-Nearest Neighbors), co-clustering, and cosine similarity, to improve the recommendation system. After comparing these strategies, we decide which is best and can be employed to construct the final movie recommendation system. To obtain information on the movies, we used the TMDB dataset. During implementation, we use the following categories of machine learning techniques (or algorithms):

1) ALS (Alternating Least Squares)

Alternating least squares (ALS) is a very popular algorithm for collaborative filtering. ALS recommendation is a matrix factorization approach that uses Alternating Least Squares with Weighted-Lambda-Regularization (ALS-WR). It runs the ALS algorithm in parallel and factors the user-to-item matrix A into the user-to-feature matrix U and the item-to-feature matrix M. To minimize the squared difference between predicted and actual ratings, the ALS algorithm seeks the latent factors that best explain the observed user-to-item ratings.
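The alternating updates described above can be sketched in a few lines of NumPy. This is only an illustrative, single-process sketch, not the implementation used in our experiments: the function name `als_wr`, the toy rating matrix, the rank `k`, the regularization weight `lam`, and the convention that a 0 means "not rated" are all assumptions made for the example. Each step fixes one factor matrix and solves a small regularized least-squares problem for every row of the other, with the regularization weighted by the number of observed ratings as in ALS-WR.

```python
import numpy as np

def als_wr(ratings, mask, k=2, lam=0.1, iters=20, seed=0):
    """Approximate ratings by U @ M.T using only the observed entries.

    ratings: 2-D float array (users x items); values where mask is False are ignored.
    mask:    boolean array of the same shape, True where a rating was observed.
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    U = rng.normal(scale=0.1, size=(n_users, k))   # user-to-feature matrix
    M = rng.normal(scale=0.1, size=(n_items, k))   # item-to-feature matrix
    eye = np.eye(k)
    for _ in range(iters):
        # Fix M and solve a ridge regression for each user.
        for u in range(n_users):
            idx = mask[u]
            n = idx.sum()
            if n == 0:
                continue  # user has no ratings; keep the current factors
            Mu = M[idx]
            U[u] = np.linalg.solve(Mu.T @ Mu + lam * n * eye, Mu.T @ ratings[u, idx])
        # Fix U and solve a ridge regression for each item.
        for i in range(n_items):
            idx = mask[:, i]
            n = idx.sum()
            if n == 0:
                continue  # item has no ratings; keep the current factors
            Ui = U[idx]
            M[i] = np.linalg.solve(Ui.T @ Ui + lam * n * eye, Ui.T @ ratings[idx, i])
    return U, M

# Toy 4 users x 4 movies matrix (hypothetical data); 0 marks a missing rating.
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)
observed = R > 0
U, M = als_wr(R, observed)
predicted = U @ M.T   # includes predictions for the unobserved cells
```

The entries of `predicted` at positions where `observed` is False are the model's estimates for ratings the user has not given, which is what a recommender would rank.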

2) KNN (K-Nearest Neighbors)

KNN is a classification technique that assumes that similar entities lie close to one another. The algorithm assumes similarity between a new case and the existing cases and places the new case in the category that it most resembles. It stores all of the existing data and classifies a new data point based on its similarity to them.

3) SVD (Singular Value Decomposition)

Singular Value Decomposition (SVD), a traditional linear algebra technique, is becoming increasingly popular in data science and machine learning. This popularity results from its use in building recommender systems. Many online user-centric applications, such as video players, music players, and e-commerce applications, suggest additional content for users to interact with. It can be difficult to find and suggest suitable products that users will like and choose. SVD is one of the strategies employed for this purpose.
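To make the idea concrete, the sketch below applies a plain truncated SVD to a small, fully filled rating matrix and keeps only the top k singular values. This is an assumption-laden illustration: the matrix values and the helper name `low_rank_approx` are hypothetical, and recommender libraries often use an SVD-inspired factorization trained only on observed ratings rather than the exact decomposition shown here.

```python
import numpy as np

def low_rank_approx(R, k):
    """Best rank-k approximation of R (in the least-squares sense) via truncated SVD."""
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Hypothetical dense user x movie rating matrix.
R = np.array([[5.0, 4.0, 1.0, 1.0],
              [4.0, 5.0, 1.0, 2.0],
              [1.0, 1.0, 5.0, 4.0],
              [2.0, 1.0, 4.0, 5.0]])

R2 = low_rank_approx(R, 2)          # two latent "taste" dimensions
error = np.linalg.norm(R - R2)      # how much detail the compression loses
```

Keeping fewer singular values gives a coarser summary of user tastes; keeping all of them reproduces the original matrix.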

4) Co-clustering

Co-clustering groups similar rows and similar columns at the same time, clustering along both the row and column dimensions. The key distinction from the standard K-means algorithm is that the row cluster centroid and the column cluster centroid are calculated from the co-cluster centroid.

5) Cosine Similarity

Cosine similarity is a metric used in a variety of machine learning techniques, including KNN for calculating the distance between neighbors, recommendation systems for suggesting comparable movies, and text analysis for determining the similarity of text in documents. Applications such as data mining and information retrieval use machine learning and cosine similarity.
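For two feature vectors A and B, cosine similarity is A·B / (|A| |B|). A minimal content-based sketch of this idea is given below; it assumes each movie is described by a short string of tags that is turned into a word-count vector. The movie titles, tags, and function names are hypothetical and serve only to illustrate the ranking step.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(title, tags_by_title, n=2):
    """Return the n titles whose tag vectors are most similar to the given title."""
    vectors = {t: Counter(tags.lower().split()) for t, tags in tags_by_title.items()}
    query = vectors[title]
    scored = [(cosine_similarity(query, v), t) for t, v in vectors.items() if t != title]
    scored.sort(reverse=True)
    return [t for _, t in scored[:n]]

# Hypothetical catalogue: title -> descriptive tags.
catalogue = {
    "Space Quest": "space adventure alien crew",
    "Star Voyage": "space alien crew mission",
    "Love in Paris": "romance paris love",
    "Alien Romance": "alien romance space",
}
similar = recommend("Space Quest", catalogue, n=2)
```

In practice the tag string would be built from fields such as genres, keywords, cast, and overview, but which fields to combine is a design choice that this sketch leaves open.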

4. IMPLEMENTATION

4.1 Data Collection

The TMDB dataset provides information about every movie. The Movie Database (TMDB) is a community-built film and television database. Every piece of data has been added by members of its community since 2008. TMDB's large dataset and strong focus on foreign markets are largely unmatched. The files in the dataset total around 900 MB. The collection includes a number of movie-related files:

(1) movies.csv - This file contains data on 45,500 films that are included in the Full MovieLens dataset. Its columns are Adult, Budget, Genres, Homepage, Id, Imdb Id, Original Language, Original Title, Overview, Popularity, Poster Path, Production Companies, Production Countries, Release Date, Revenue, Runtime, Spoken Languages, Status, Tagline, Vote Average, and Vote Count.

(2) credits.csv - This file contains columns such as cast, crew, and id, and has around 45,500 rows. It contains details of the cast and crew for all the movies, stored as stringified JSON objects.

(3) keywords.csv - This file includes the movie plot keywords for the MovieLens films, stored as stringified JSON objects. It has around 46,000 rows and contains columns such as id and keywords.

(4) links.csv - This file contains the TMDB and IMDB IDs of all the movies in the Full MovieLens dataset. It consists of around 45,800 entries and contains columns such as Movieid, Imdbid, and Tmdbid.

(5) Links_small.csv - This file contains the TMDB and IMDB IDs for a small subset of 9,000 movies from the Full Dataset. It contains the same columns as links.csv.

(6) Rating_small.csv - This file contains a subset of around 100,000 ratings left by 700 users for 9,000 films. It contains columns such as Userid, movieid, rating, and timestamp.
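Because fields such as cast, crew, and keywords are stored as stringified objects, they have to be parsed before they can be used as features. The helper below is a hedged sketch of one way to do this: it assumes each cell holds a list of objects with a "name" key, and it falls back to Python-literal parsing in case the text uses single quotes instead of strict JSON. The function names and the "name" key are assumptions for illustration.

```python
import ast
import json

def parse_stringified(value):
    """Parse a stringified list of objects; return [] if the text cannot be parsed."""
    if not isinstance(value, str) or not value.strip():
        return []
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        pass
    try:
        # Fallback for single-quoted, Python-style text (an assumption about the data).
        return ast.literal_eval(value)
    except (ValueError, SyntaxError):
        return []

def names(value, limit=None):
    """Extract the 'name' entries from a stringified list of objects."""
    items = parse_stringified(value)
    if not isinstance(items, list):
        return []
    found = [d["name"] for d in items if isinstance(d, dict) and "name" in d]
    return found[:limit]

keywords = names('[{"id": 1, "name": "space"}, {"id": 2, "name": "alien"}]')
```

The optional `limit` argument is handy when only the first few cast members or keywords are wanted as features.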

4.2 Preprocessing and creating machine learning models

The raw data in the dataset has to be cleaned up and preprocessed. We handle duplicate, invalid, and null values in the dataset. This is necessary to transform the raw data into a more comprehensible, practical, and effective form.

After the dataset was preprocessed, we developed a number of machine learning-based recommendation models: ALS (Alternating Least Squares), SVD (Singular Value Decomposition), co-clustering, and KNN (K-Nearest Neighbors), which were implemented using the collaborative filtering approach, and a cosine similarity-based recommendation model, which was implemented using the content filtering approach. Once a comparative analysis of the models is complete, the best model is used in the final movie recommendation system.

4.3 Training and Testing

Once a model has been created, it must be trained and tested. The dataset is split 80:20: 80% of the dataset is used to train the model, while the remaining 20% is used to test it. To evaluate the models, we used the metrics RMSE (Root Mean Square Error) and MAE (Mean Absolute Error). RMSE is the square root of the average squared difference between a model's predicted values and the observed values. MAE is the average absolute difference between the predicted value of an observation and its actual value.

[Figure: workflow diagram for the project]
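The 80:20 split and the two error metrics can be sketched with the standard library alone. This is an illustrative sketch rather than our evaluation code: the function names, the fixed random seed, and the example numbers are assumptions. For n test ratings, RMSE = sqrt((1/n) Σ (actual − predicted)²) and MAE = (1/n) Σ |actual − predicted|.

```python
import math
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def rmse(actual, predicted):
    """Root Mean Square Error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean Absolute Error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

rows = list(range(100))                 # stand-in for rating records
train, test = train_test_split(rows)    # 80 training rows, 20 test rows

example_rmse = rmse([3, 4, 5], [2, 4, 7])   # sqrt((1 + 0 + 4) / 3)
example_mae = mae([3, 4, 5], [2, 4, 7])     # (1 + 0 + 2) / 3
```

Because RMSE squares each error before averaging, it penalizes a few large mistakes more heavily than MAE does, which is why the two metrics can rank models differently.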

5. RESULTS

This section discusses the outcomes of our experimentation with and implementation of various machine learning-based recommendation algorithms. We used two metrics, RMSE (Root Mean Square Error) and MAE (Mean Absolute Error), to evaluate the recommendation models.

Table -1: Comparison of recommendation models

Recommendation Model                                  RMSE     MAE
Collaborative + SVD (Singular Value Decomposition)    0.8675   0.6729
Collaborative + K-Nearest Neighbors                   0.9552   0.7257
Collaborative + Co-Clustering                         0.9544   0.7249
Collaborative + ALS (Alternating Least Squares)       0.8219   0.7632
Content + Cosine Similarity                           0.7481   0.6316

In the above table, the model with the lowest RMSE and MAE values is considered the best model because it has the least error. The model using content filtering and cosine similarity is therefore the best of the models compared. To create the movie recommendation GUI, we used the Python Flask framework and created an API for the best machine learning based recommendation model.

[Figure: screenshot of the GUI recommending movies to the user using cosine similarity]


6. CONCLUSION

In this paper, we implemented various recommendation models using content and collaborative filtering based on different machine learning techniques to improve user recommendations in a movie recommendation system. After studying, comparing, and experimenting with these recommendation models, we found that the model based on content filtering and cosine similarity performed better than the other models. Using the best model, we created an API with Python Flask and used it in the movie recommendation GUI, which recommends similar movies to the user. Our proposed solution is intended to let the system make more accurate recommendations and thereby improve user satisfaction.

In future, we can implement a recommendation system that works on real-time user information. We can also try to implement a cross-domain recommendation system.

REFERENCES

[1] M Viswa Murali, Vishnu T G, Nancy Victor, "A Collaborative Filtering based Recommender System for Suggesting New Trends in Any Domain of Research", 2019, (ICACCS), DOI: 10.1109/ICACCS.2019.8728409

[2] Ramni Harbir Singh, Sargam Maurya, Tanisha Tripathi, Tushar Narula, Gaurav Srivastav, "Movie Recommendation System using Cosine Similarity and KNN", 2020, (IJEAT), DOI: 10.35940/ijeat.E9666.069520

[3] Shivganga Gavhane, Jayesh Patil, Harshal Kadwe, Projwal Thackrey, Sushovan Manna, "Recommendation System using KNN and Cosine Similarity", 2020,

[4] Shubham Pawar, Pritesh Patne, Priya Ratanghayra, Simran Dadhich, Shree Jaswal, "Movies Recommendation System using Cosine Similarity", (IJISRT), Volume 7, Issue 4, April 2022, 342-346.

[5] Y. C. Chen, "User behavior analysis and commodity recommendation for point earning apps," In 2016 Conference on Technologies and Applications of Artificial Intelligence (TAAI). IEEE, 2016.

[6] Y. H. Zhou, D. Wilkinson, R. Schreiber, "Large scale parallel collaborative filtering for the Netflix prize," In Proceedings of 4th International Conference on Algorithmic Aspects in Information and Management (pp. 337–348). Shanghai: Springer, 2008

[7] Tiantian He, Yang Liu, Tobey H. Ko, Keith C. C. Chan, and Yew-Soon Ong, "Contextual Correlation Preserving Multiview Featured Graph Clustering", (2019), (IEEE transactions)

[8] Zhiheng Wu, Jinglin Li, Qibo Sun, Ao Zhou, "Service recommendation with context-aware user reputation evaluation", (2017), (IEEE conf)

[9] Khamael Raqim Raheem, Israa Hadi Ali, "Content-based Recommender System Improvement using Hybrid Technique", (2020), (IEEE Xplore)

[10] Shailesh Kalkar, Prof. Pramila Chawan, "A Survey on Recommendation System based on Knowledge Graph and Machine Learning", (2022), (IRJET), Volume: 09 Issue: 06 | Jun 2022

[11] A. A. Ewees, Mohamed Eisa, M. M. Refaat, "Comparison of cosine similarity and k-NN for automated essays scoring", (2014), (IJARCCE), DOI 10.17148/IJARCCE

BIOGRAPHIES

Shailesh D. Kalkar, M.Tech Computer Engineering, VJTI Mumbai

Prof. Pramila M. Chawan is working as an Associate Professor in the Computer Engineering Department of VJTI, Mumbai. She completed her B.E. (Computer Engg.) and M.E. (Computer Engg.) at VJTI College of Engineering, Mumbai University. She has 28 years of teaching experience and has guided 85+ M.Tech projects and 130+ B.Tech projects. She has published 143 papers in international journals and 20 papers in national/international conferences and symposiums. She has worked as an Organizing Committee member for 25 international conferences and 5 AICTE/MHRD-sponsored workshops/STTPs/FDPs, and has participated in 16 national/international conferences. She has worked as Consulting Editor on JEECER, JETR, JETMS, Technology Today, JAM&AER Engg. Today, and The Tech. World; as Editor of Journals of ADR; and as Reviewer for IJEF, Inderscience. She has worked as NBA Coordinator of the Computer Engineering Department of VJTI for 5 years.

She wrote a proposal under TEQIP-I in June 2004 for 'Creating Central Computing Facility at VJTI'. Rs. Eight Crore were sanctioned by the World Bank under TEQIP-I on this proposal. The Central Computing Facility was set up at VJTI through this fund, and it has played a key role in improving the teaching-learning process at VJTI. She was awarded the Innovative & Dedicated Educationalist Award (Specialization: Computer Engineering & I.T.) by SIESRP in 2020. In the AD Scientific Index Ranking (World Scientist and University Ranking 2022), she holds the 2nd Rank, Best Scientist, VJTI Computer Science domain, and the 1138th Rank, Best Scientist, Computer Science, India.
