Movie Recommendation System Using Machine Learning by IRJET Journal

Movie Recommendation System Using Machine Learning

Divya Gupta1, Bhumika Singhal2, Shruti Mishra3, Shruti Mittal4, Priyanka Agarwal5

1,2,3,4 B.Tech Scholars, Department of Computer Science and Engineering, MIET Meerut, UP, India
5 Professor, Department of Computer Science and Engineering, MIET Meerut, UP, India

Abstract - To improve the user experience, this study creates a system that suggests movies based on user interests. As digital information becomes more widely available, enhancing user engagement requires an effective recommendation system. This study uses the k-Nearest Neighbors (KNN) algorithm, which is a popular method in recommendation systems because of its efficiency in handling sparse data. The aim of this research is to develop a precise and effective model that forecasts user preferences by utilizing past ratings and film attributes. To enhance model performance, data preprocessing procedures such as data cleaning, normalization, and feature extraction are applied to the dataset. Similarity measures such as cosine similarity and Euclidean distance are applied to compute relationships between users and movies, enhancing recommendation accuracy. The findings demonstrate that the KNN-based system outperforms conventional heuristic-based methods in providing highly relevant recommendations. The results align with past research on collaborative and content-based filtering techniques. Future work may focus on integrating deep learning techniques to address cold-start problems and further improve recommendation quality.

Keywords: Movie recommendation system, k-Nearest Neighbors (KNN), machine learning, collaborative filtering, content-based filtering, personalization.

1. INTRODUCTION

As online content continues to grow exponentially, streaming services like Netflix, Amazon Prime, and Disney+ need to deal with huge volumes of data and keep users engaged. The sheer number of options tires users searching for the right movie, creating decision fatigue [1]. As a result, an efficient movie recommendation system is required to enhance customer satisfaction by presenting personalized recommendations that are aligned with user interests. Conventional methods of browsing are no longer viable, and recommendation algorithms are now the pillars of contemporary content delivery systems.

The objective of this research is to investigate and create an effective movie recommendation system that is capable of predicting user preferences. Through the application of machine learning methods, this research explores how such models can enhance the personalization of movie recommendations. One of the questions explored in this research is the identification of the best machine learning model for movie recommendation. Although there are a number of approaches, such as collaborative filtering, content-based filtering, and hybrid models, the current study emphasizes the k-Nearest Neighbors (KNN) algorithm because it is efficient in processing sparse data and flexible in dynamic settings [3]. Further, the study compares various measures of similarity, such as cosine similarity and Euclidean distance, to assess their effect on recommendation accuracy.

Another key area of this research is examining how the proposed system enhances current recommendation models. Many classical methods are prone to issues such as cold-start problems, where new users or movies lack sufficient data to make proper recommendations [2], [6]. Scalability and computational overhead are also areas of concern in handling large volumes of data. This research intends to improve the recommendation process through feature engineering and similarity computation.

By examining various machine learning models and enhancing current frameworks, the research contributes to the advancement of movie recommendation systems. The results will assist in enhancing the personalization of streaming services, making it easier and more efficient for users to discover movies.

2. LITERATURE REVIEW

Over the past few years, film recommendation systems have become a major area of interest in research and development. Different research studies have proposed different strategies that focus on improving the accuracy of recommendations and providing a better user experience. Methods like collaborative filtering, content-based filtering, and hybrid approaches have been extensively used to handle problems within recommendation technologies [1], [2], [5], [6]. These techniques keep evolving, helping to make recommendation systems more efficient today.

2.1 Content Based Filtering

Content-based filtering recommends movies by analyzing item attributes such as genre, director, and cast, aligning them with user preferences as shown in Figure 1. Prior work highlighted that content-based filtering effectively personalizes recommendations but struggles with the cold-start problem, limiting its effectiveness for new users with no prior interactions [1], [6]. To mitigate this, researchers have explored enhancements using NLP and metadata augmentation techniques [7].

International Research Journal of Engineering and Technology (IRJET) e-ISSN:2395-0056

Volume: 12 Issue: 05 | May 2025 www.irjet.net p-ISSN:2395-0072

2.2 Collaborative Filtering

Collaborative filtering analyzes user preferences based on similar users' behavior, as shown in Figure 2. It was shown that collaborative filtering models, especially matrix factorization methods like Singular Value Decomposition (SVD), can effectively identify user patterns but encounter difficulties with scalability and data sparsity [5]. Related research indicated that embedding deep learning-based collaborative filtering approaches, such as neural collaborative filtering (NCF), can enhance recommendation diversity and accuracy [8].

Figure 2. Collaborative Filtering

2.3 Hybrid Models

Hybrid models combine content-based and collaborative filtering techniques, as shown in Figure 3. A weighted hybrid model was proposed that integrates content-based KNN with a Restricted Boltzmann Machine (RBM), improving accuracy by leveraging both user interaction data and movie attributes [5]. The findings suggested that hybrid approaches enhance both recommendation diversity and personalization, making them more effective than standalone filtering techniques [8].

2.4 K-Nearest Neighbour Algorithm

Additionally, KNN-based collaborative filtering has been widely explored to refine recommendation quality. An adaptive KNN-based model incorporating user cognition parameters was introduced, which dynamically adjusts neighbor selection based on behavioral patterns [3]. That work showed enhanced precision and recall, which indicates the strength of incorporating social network analysis in collaborative filtering.

In general, these works highlight the importance of sophisticated models that strike a balance between accuracy, diversity, and scalability in recommendation systems. The combination of deep learning, graph-based methods, and reinforcement learning offers promising avenues for improving movie recommendations, overcoming current challenges like data sparsity and over-specialization.

3. PROPOSED METHODOLOGY

The methodology outlines the systematic approach used to develop, implement, and evaluate the movie recommendation system. This research follows a structured workflow, including data collection, preprocessing, model implementation, evaluation, and result interpretation.

3.1 Data Collection

For this study, we chose the IMDB dataset, a popular dataset for recommendation systems. The data offers an abundant set of user-movie interactions and is therefore well suited for evaluating content-based filtering, collaborative filtering, and combined methods [1].

The dataset provides user IDs and movie IDs, allowing us to uniquely identify each user-movie relationship. It also gives ratings from 1 to 10 representing user preferences, and timestamps recording when each rating was given. Additional movie metadata such as genre, title, and year of release aid in applying content-based filtering by looking at movie attributes.

Figure 1. Content Based Filtering

Figure 3. Hybrid Models

The MovieLens dataset was chosen due to its large-scale user interactions, ensuring diverse and reliable data for training machine learning models [5]. Its structure allows us to explore different recommendation techniques and assess their performance in generating personalized movie suggestions.

3.2 Module Description

In order to improve the precision and effectiveness of the recommendation system, data preprocessing was conducted prior to model deployment. This helps ensure that the dataset is clean, organized, and in a suitable format for analysis, thereby eliminating inconsistencies that may harm model performance.

3.2.1 Handling Duplicate and Missing Data

One of the major problems when dealing with real-world datasets is handling missing and duplicate values. For the MovieLens dataset, entries with missing ratings were excluded in order to preserve the integrity of the collaborative filtering model, because missing ratings might have resulted in false user preference estimation [4].

For missing movie metadata, such as genre or release year, we applied an imputation technique where missing values were filled based on the most common attributes of similar movies (mode method). This ensured that content-based filtering models could still make accurate recommendations without being affected by incomplete metadata.

Additionally, duplicate records were identified and removed to prevent biased recommendations that could arise from repeated entries. Removing these inconsistencies helped in maintaining the dataset's reliability and ensuring unbiased model training.
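The three cleaning steps described above can be sketched with Pandas as follows. This is a minimal illustration under stated assumptions, not the authors' code: the column names (`user_id`, `movie_id`, `rating`, `genre`) and the toy rows are hypothetical, and the imputation uses the global mode for simplicity, whereas the text describes taking the mode over similar movies.

```python
import pandas as pd

# Hypothetical toy data; the column names are assumptions, not the paper's schema.
ratings = pd.DataFrame({
    "user_id":  [1, 1, 2, 2, 3],
    "movie_id": [10, 10, 10, 20, 30],
    "rating":   [8.0, 8.0, None, 6.0, 9.0],
})
movies = pd.DataFrame({
    "movie_id": [10, 20, 30, 40],
    "genre":    ["Drama", "Drama", None, "Comedy"],
})

# 1. Exclude interactions that carry no rating.
ratings = ratings.dropna(subset=["rating"])

# 2. Remove repeated entries so no interaction is counted twice.
ratings = ratings.drop_duplicates().reset_index(drop=True)

# 3. Mode imputation for missing metadata (global mode, a simplification).
most_common_genre = movies["genre"].mode().iloc[0]
movies["genre"] = movies["genre"].fillna(most_common_genre)

print(ratings)
print(movies)
```

In this toy example the ratings table shrinks from five rows to three, and the one missing genre is filled with the most frequent value.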

3.2.2 Data Normalization and Transformation

To address variations in user ratings, we normalized ratings using Min-Max scaling, which scaled all ratings to a range of 0-1 [2]. This standardized the data to prevent the rating patterns of some users (e.g., always giving high or low ratings) from having too much effect on recommendations.
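As a small illustration of Min-Max scaling, the sketch below maps ratings on the 1 to 10 scale mentioned in Section 3.1 onto the range 0 to 1. The function name `min_max_scale` and the fixed bounds are assumptions made for this example; the source does not state whether the bounds were global or computed per user.

```python
import numpy as np

def min_max_scale(ratings, low=1.0, high=10.0):
    """Map ratings from the interval [low, high] onto [0, 1]."""
    ratings = np.asarray(ratings, dtype=float)
    return (ratings - low) / (high - low)

# The lowest, middle, and highest possible ratings.
scaled = min_max_scale([1, 5.5, 10])
print(scaled)
```

A per-user variant would pass each user's own minimum and maximum rating as `low` and `high`.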

Further, we extracted relevant features to improve the recommendation system. The popularity of a movie was established by the number of ratings it had received, enabling the model to distinguish between popular and less popular movies. Moreover, patterns of user behavior were examined by identifying users who consistently rated movies of particular genres, assisting in the enhancement of personalized recommendations.
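The two derived features described here, movie popularity and per-user genre activity, can be computed with simple group-by operations. The following is a hedged sketch on hypothetical toy data; the column names and the use of raw counts are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data; the schema is an assumption.
ratings = pd.DataFrame({
    "user_id":  [1, 1, 2, 2, 3],
    "movie_id": [10, 20, 10, 30, 10],
    "rating":   [8, 6, 9, 7, 5],
})
movies = pd.DataFrame({
    "movie_id": [10, 20, 30],
    "genre":    ["Drama", "Comedy", "Drama"],
})

# Popularity: the number of ratings each movie received.
popularity = ratings.groupby("movie_id")["rating"].count().rename("num_ratings")

# User behavior: how many movies each user rated in each genre.
genre_counts = (
    ratings.merge(movies, on="movie_id")
    .groupby(["user_id", "genre"])
    .size()
    .unstack(fill_value=0)
)

print(popularity)
print(genre_counts)
```

Here movie 10 is the most popular with three ratings, and user 2 has rated only dramas.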

These preprocessing steps ensured that the dataset was optimized for training, leading to better performance and improved recommendation accuracy.

3.3 Model Selection and Implementation

To build an efficient movie recommendation system, we implemented three different approaches: content-based filtering, collaborative filtering using KNN, and a hybrid recommendation system. Each model has its strengths and limitations, and the hybrid approach aims to leverage the benefits of both filtering techniques. Finally, the system returns movie suggestions, excluding the movie supplied as the query where necessary.

3.3.1 Content-Based Filtering

Content-based filtering suggests films by comparing their properties, including genre, director, actors, and plot summary. This method makes the assumption that if a user liked a specific kind of movie in the past, then the user would enjoy similar films in the future.

To process movie descriptions and metadata, we used vectorization, which transforms textual data into numerical values. This method assists in extracting significant words from movie descriptions and minimizing the impact of frequently used words. After vectorizing the data, cosine similarity is employed to calculate the similarity between movies [1]. The system then suggests the movies with the highest similarity scores to those the user has previously watched or rated highly.
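A minimal sketch of this pipeline is given below. It assumes TF-IDF as the vectorization scheme, since TF-IDF down-weights frequently used words as described, but the source does not name the vectorizer. The titles, descriptions, and the `recommend` helper are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical catalogue used only for illustration.
titles = ["Space Saga", "Galaxy Wars", "Love Story"]
descriptions = [
    "space adventure with aliens and starships",
    "aliens attack starships in a space war",
    "a romantic story about love in paris",
]

# Turn each description into a TF-IDF vector (assumed vectorizer).
tfidf = TfidfVectorizer(stop_words="english")
matrix = tfidf.fit_transform(descriptions)

# Pairwise cosine similarity between all movies.
similarity = cosine_similarity(matrix)

def recommend(title, top_n=2):
    """Return the top_n movies most similar to `title`, excluding the query itself."""
    idx = titles.index(title)
    ranked = similarity[idx].argsort()[::-1]
    return [titles[i] for i in ranked if i != idx][:top_n]

print(recommend("Space Saga", top_n=1))
```

The two science-fiction descriptions share several terms, so each is the other's closest match, while the romance shares none with them.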

However, content-based filtering has a cold-start problem, meaning it struggles to recommend movies for new users who have not rated any films yet or for movies with very little metadata. This limitation reduces its effectiveness when dealing with newly added content or first-time users.

3.3.2 Collaborative Filtering using KNN

Collaborative filtering makes suggestions on the basis of user interactions instead of movie features. It discovers patterns of user preferences and makes predictions about what a user will enjoy based on ratings from similar users. We employed K-Nearest Neighbors (KNN) collaborative filtering, which operates in two ways:

User-User Collaborative Filtering: This approach identifies users with similar rating histories and recommends movies that similar users have liked. For instance, if two users rated a number of movies in the same way, a movie enjoyed by one user can be suggested to the other [3].

Item-Item Collaborative Filtering: Rather than considering users, this method looks for similarities between movies based on ratings given by various users. If two movies have been rated alike by most users, they are marked as similar, and recommendations are made on that basis [3].
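The item-item variant can be sketched with Scikit-Learn's `NearestNeighbors` using the cosine metric. The rating matrix, the movie labels, and the `similar_movies` helper are hypothetical, and treating unrated entries as zeros is a simplifying assumption rather than something the source specifies.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Rows are movies, columns are users; 0 means "not rated" (an assumption).
movie_ids = ["A", "B", "C", "D"]
item_user = np.array([
    [9, 8, 0, 1],  # A
    [8, 9, 0, 2],  # B
    [1, 0, 9, 8],  # C
    [0, 2, 8, 9],  # D
], dtype=float)

# Brute-force search with cosine distance between rating vectors.
knn = NearestNeighbors(metric="cosine", algorithm="brute")
knn.fit(item_user)

def similar_movies(movie_id, k=1):
    """Return the k movies whose rating vectors are closest to `movie_id`."""
    idx = movie_ids.index(movie_id)
    # Request k + 1 neighbours because the query movie is its own nearest neighbour.
    _, indices = knn.kneighbors(item_user[idx:idx + 1], n_neighbors=k + 1)
    return [movie_ids[i] for i in indices[0] if i != idx][:k]

print(similar_movies("A"))
print(similar_movies("C"))
```

Transposing the matrix so that rows are users gives the user-user variant with the same code.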

3.3.3 Hybrid Recommendation System

To overcome the limitations of both content-based and collaborative filtering, we implemented a hybrid recommendation system that integrates both techniques. The hybrid model operates by:

Applying content-based filtering for new users who do not have a rating history. This ensures that recommendations are still provided based on movie metadata even in the absence of past interactions.

Utilizing collaborative filtering for users with sufficient rating history, allowing the system to leverage user preferences and behavioral patterns.

Combining both strategies to enhance the accuracy of recommendations, maintaining a balance between personalization and suggestion diversity.

By combining several methods, the hybrid approach improves recommendation efficiency, generating more precise and appropriate movie recommendations while alleviating the cold-start issue and handling sparse data [5].
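One way to express this switching and blending logic is sketched below. The threshold `min_ratings`, the weight `alpha`, the function names, and the example scores are all illustrative assumptions; the source does not give the actual combination rule or its parameters.

```python
def hybrid_scores(content_scores, collab_scores, n_user_ratings,
                  min_ratings=5, alpha=0.5):
    """Blend per-movie scores from the two recommenders.

    Users with fewer than `min_ratings` ratings fall back to content-based
    scores only; otherwise the two scores are mixed with weight `alpha`
    on the collaborative part. Both parameters are hypothetical.
    """
    if n_user_ratings < min_ratings:
        return dict(content_scores)
    movies = set(content_scores) | set(collab_scores)
    return {
        m: alpha * collab_scores.get(m, 0.0)
        + (1 - alpha) * content_scores.get(m, 0.0)
        for m in movies
    }

def top_n(scores, n=2):
    """Return the n highest-scoring movies."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Hypothetical scores in [0, 1] from each recommender.
content = {"A": 0.9, "B": 0.2, "C": 0.4}
collab = {"A": 0.1, "B": 0.8, "C": 0.7}

print(top_n(hybrid_scores(content, collab, n_user_ratings=0), 1))
print(top_n(hybrid_scores(content, collab, n_user_ratings=20), 1))
```

For the new user the content-based favourite wins, while for the experienced user the blended score promotes the movie that both recommenders rate reasonably well.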

3.4 Experimental Setup and Training

The movie recommendation system was created with Python along with Pandas, NumPy, Scikit-Learn, and Streamlit for data processing, machine learning, and the user interface. KNN-based collaborative filtering was applied for enhanced recommendations [5]. The dataset was divided into 80% training and 20% testing sets so that model performance could be evaluated on held-out data. The system was tested on Windows 10/Linux with an Intel Core i7 10th Gen processor, 16GB RAM, and an NVIDIA RTX 3060 GPU to ensure sufficient processing capacity and scalability for future developments.
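The 80/20 split can be reproduced with Scikit-Learn's `train_test_split`. The sketch below uses a hypothetical ten-row ratings table, and the fixed `random_state` is an assumption added only to make the example repeatable.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical ratings table; the schema is an assumption.
ratings = pd.DataFrame({
    "user_id":  list(range(10)),
    "movie_id": list(range(100, 110)),
    "rating":   [7, 8, 5, 9, 6, 4, 8, 7, 10, 3],
})

# Hold out 20% of the interactions for testing.
train, test = train_test_split(ratings, test_size=0.2, random_state=42)

print(len(train), len(test))
```

A random split of interactions is the simplest choice; the source does not say whether the split was random, per user, or by timestamp.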

3.5 Evaluation Metrics

To assess the performance of the recommendation models, we utilized several evaluation measures for accuracy and dependability. We used Root Mean Square Error (RMSE) to quantify the difference between predicted and actual ratings, where a smaller RMSE indicates better prediction accuracy. We also used Mean Absolute Error (MAE) to measure the average absolute difference between predicted and actual ratings, indicating the reliability of predictions.
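Both error measures follow directly from their definitions, as the short NumPy sketch below shows. The function names and the example ratings are hypothetical.

```python
import numpy as np

def rmse(actual, predicted):
    """Root Mean Square Error: square root of the mean squared difference."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def mae(actual, predicted):
    """Mean Absolute Error: mean of the absolute differences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))

# Hypothetical actual and predicted ratings.
actual = [8, 6, 9, 4]
predicted = [7, 6, 7, 5]

print(rmse(actual, predicted))
print(mae(actual, predicted))
```

Because RMSE squares each error before averaging, the single error of 2 weighs more heavily in RMSE than in MAE.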

Apart from numerical accuracy, we evaluated the recommendation system using precision, recall, and F1-score. Precision indicated the proportion of recommended movies that were relevant, whereas recall indicated the proportion of relevant movies that were recommended. The F1-score, the harmonic mean of the two, then gave a balance of both. These measures assured that the system was accurate and effective [5], [8].
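For a single user's recommendation list, the three measures can be computed from set overlaps. This is a minimal sketch; the helper name and the example lists are hypothetical, and how "relevant" is defined (for instance, a rating threshold) is not stated in the source.

```python
def precision_recall_f1(recommended, relevant):
    """Compute precision, recall, and F1 for one recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Four movies recommended, three relevant, two in common.
p, r, f1 = precision_recall_f1(["A", "B", "C", "D"], ["A", "B", "E"])
print(p, r, f1)
```

Two of the four recommendations are relevant (precision 0.5) and two of the three relevant movies were found (recall 2/3).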

3.6 Result and Interpretation

The hybrid recommendation system outperformed both content-based and collaborative filtering approaches. KNN-based collaborative filtering achieved high precision for experienced users but struggled with the cold-start problem for new users. Content-based filtering worked well for users with clear genre preferences but lacked diversity, often suggesting similar movies. The hybrid approach combined the strengths of both methods, resulting in lower RMSE for improved accuracy and higher precision, ensuring more relevant and diverse recommendations.

Graphical Representation of Model Performance

The hybrid model achieved the best results, demonstrating its effectiveness in handling both new and experienced users.

This methodology provides a repeatable, structured approach for developing and evaluating movie recommendation systems. By following these steps, other researchers can validate our findings or extend them for further improvements.

4. EXPERIMENTAL RESULTS AND DISCUSSION

The performance of the movie recommendation system was measured using several metrics across three models: content-based filtering (CBF), collaborative filtering (CF), and the hybrid model. The findings highlight the effectiveness of the various models in producing optimal recommendations.


4.1 Model Performance Comparison


The hybrid model performs better than the single approaches, with the lowest RMSE of 0.96, reflecting greater accuracy in user rating prediction. It also has the highest precision (0.83), recall (0.77), and F1-score (0.80), demonstrating effectiveness in generating useful and diverse recommendations.
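As a quick consistency check on the reported figures, the F1-score can be recomputed as the harmonic mean of the stated precision and recall. Only the two input values come from the text; the variable names are illustrative.

```python
# Precision and recall of the hybrid model as reported in the text.
precision, recall = 0.83, 0.77

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 2))
```

The result rounds to 0.80, which agrees with the F1-score reported for the hybrid model.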

4.2 Statistical Significance of Improvements

To check the statistical significance of the enhancements in recommendation precision, we performed a paired t-test between the hybrid model and the individual filtering models. The findings were:

• CBF vs Hybrid: p-value = 0.0021 (significant improvement).

• CF vs Hybrid: p-value = 0.0085 (significant improvement).

As the p-values are < 0.05, the hybrid model's enhancements are statistically significant, indicating that it generates more precise and reliable recommendations than the individual models.
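A paired t-test of this kind can be run with SciPy's `ttest_rel`. The sketch below is illustrative only: the per-fold precision values are hypothetical and are not the paper's data, and pairing the scores by evaluation fold is an assumption, since the source does not say what the paired observations were.

```python
from scipy.stats import ttest_rel

# Hypothetical per-fold precision scores; NOT the paper's measurements.
hybrid_precision = [0.84, 0.82, 0.83, 0.85, 0.81]
cbf_precision = [0.74, 0.73, 0.75, 0.74, 0.72]

# Paired test: each fold yields one score per model.
result = ttest_rel(hybrid_precision, cbf_precision)
significant = result.pvalue < 0.05

print(result.statistic, result.pvalue, significant)
```

A positive test statistic together with a p-value below 0.05 would indicate that the first model's scores are significantly higher on the paired folds.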

Key Findings

• Collaborative filtering outperforms content-based filtering but is affected by data sparsity and cold-start problems.

• Content-based filtering is appropriate for users who have explicit genre preferences but has limited diversity.

• The hybrid method greatly enhances accuracy by bringing together the merits of the two models, and it is therefore the best recommendation system.

5. CONCLUSION

This research demonstrates that the hybrid recommendation method, which integrates content-based filtering and collaborative filtering, is more accurate and diverse than either individual model alone. The hybrid model achieved a lower RMSE and higher precision compared to single-method approaches, addressing cold-start issues and data sparsity. Collaborative filtering based on KNN helped enhance recommendation quality by successfully finding similar user preferences.

Future Improvements

• Integration of deep learning techniques, such as neural collaborative filtering, to enhance accuracy.

• Incorporating sentiment analysis from user reviews to refine recommendations.

• Real-time recommendation updates to adapt dynamically based on user activity.

• Optimizing KNN-based filtering to handle large-scale datasets efficiently.

6. REFERENCES

[1] Yadav, R., et al. (2024). "Addressing Cold-Start Problems in Content-Based Movie Recommendation Systems."

[2] Sharma, A., et al. (2022). "Cold Start and Sparsity Handling in Recommendation Systems Using Metadata Augmentation."

[3] Nguyen, T., et al. (2023). "Adaptive KNN-Based Collaborative Filtering Using User Cognition Parameters for Enhanced Recommendations."

[4] Singh, S., et al. (2021). "Data Preprocessing Techniques for Recommender Systems: A Comparative Study."

[5] Behera, R., et al. (2021). "A Hybrid Movie Recommendation System Based on Content-Based and Collaborative Filtering Using Restricted Boltzmann Machines."

[6] Zhang, S., et al. (2020). "Deep Learning Based Recommender System: A Survey and New Perspectives."

[7] Wu, L., et al. (2022). "Graph Neural Networks for Recommendation: Advances and Opportunities."

[8] He, X., et al. (2017). "Neural Collaborative Filtering: Advances in Deep Learning Recommendation Systems."

[9] Chen, M., et al. (2019). "Top-K Off-Policy Correction for a REINFORCE-Based Recommender System."