International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 10 | Oct 2022 www.irjet.net p-ISSN: 2395-0072
Saumya Singh1, Soumyadeepta Das2, Ananya Sajwan3, Ishanika Singh4, Ashish Alok5
1,2,3,4,5 Vellore Institute of Technology, Tamil Nadu, India
Abstract - Consumers today face a major challenge in choosing among the many alternatives available in any product category. With the growing demand for and production of cars, thousands of brands release hundreds of new models every year. Recommending cars that match a customer’s specific needs is therefore essential to both manufacturers and clients. Recommendations based purely on specifications and technical details are becoming impractical, because not everyone has detailed knowledge of cars. Clients prefer review-based recommendations from people who have hands-on experience with the particular product. Recommendation systems also have a surprisingly large impact on the material consumers engage with over the course of their daily lives. Hence, our proposed solution is a system for car recommendation based on customer reviews, using the power of ML, NLP and Data Analytics.
Key Words: Machine Learning, Natural Language Processing, Data Analytics, Data Visualization, Recommendation, Recommendation System
As the world's population grows, so does the value of products in the market. Global distribution has increased global trade and led to many variants of each product. Soap, for example, comes in different types depending on flavour, aroma, brand (including international brands), etc. The same applies to cars.
As the number of cars on the international market increases, so does the information that each individual can find about a product online. People’s fascination with vehicles dates back to the Neolithic period, the last part of the Stone Age, when the wheel was invented. As time went by, technology grew into what we see today. Today most people are aware of what is happening around them. As market competition grows, cars with similar features enter the market, and people become confused about what to choose. Here the recommendation algorithm plays a role, because it assists the customer or end user by promoting the right product based on their taste.
This research project is about a web-based recommendation program for vehicles. Existing systems that rely on recommendation by specifications and details are becoming impractical, because not everyone has detailed knowledge of cars. Different customers use different strategies: some are knowledgeable and up to date, whereas others need advice and reviews from peers. Our goal is to bring out the best of both worlds, be it specific search or recommendation, seller claims or reviews. Existing solutions require users to have proper knowledge of specifications and of their own requirements, and are mostly based on content-type recommendation. By using customer reviews, our system becomes more customer oriented and less knowledge driven, but is not confined to that.
The main purpose of this work is to recommend a car according to the user model and the item profile. In this paper, the proposed algorithm draws on hybrid user-to-user and collaborative filtering techniques and is driven by actual textual data from the end users or clients of the product, using ML, NLP and Data Analytics.
The dataset for the project is the Consumer Car Reviews dataset from Edmunds.com (also available on Kaggle.com), supplemented by other sources. It contains lakhs of reviews of multiple brands given by consumers.
1. T. G. Thomas and V. Vaidehi, in “Vehicle Recommendation System Design The Web uses a Hybrid Recommender Algorithm”, developed a web-based recommender program for vehicles. The main purpose of that work is to recommend a car according to the user model and the item profile. In the paper, the proposed hybrid recommendation algorithm combines user-to-user and collaborative filtering methods to generate vehicle recommendations. The user model is built from personality features, click data and browsing history. The profile of the item is built using various car attributes. Forty car types are used including 224 car types in this project.
2. Vengatesan K, A. Srivastava, A. Kumar, S. Samee, P. T. Vijay and P. S. Tanesh, in “A Novel Approach of Car Recommendation Using Machine Learning Algorithm”, studied and found that more than 90% of planned drivers regularly show that eliminating any driving pollution can control their chances and adaptability. Competitive drivers have exerted pressure on the low-key concept of open-vehicle integration. This weight conveys the idea that it is actually improving in terms of how half of the late respondents had eliminated any driving distractions that they felt were an open car that, or in some way, lacked.
3. G. Prabowo, M. Nasrun and R. A. Nugrahaeni, in “Recommendations for Car Selection System Using Item-Based Collaborative Filtering (CF)”, proposed a program that provides information about vehicles in line with user preferences, i.e., a recommendation system. Such a system must produce appropriate recommendations. The study focuses on the problem of recommending cars by building a recommendation system with a collaborative filtering process.
Recommendation systems are an important part of business and e-commerce, including the auto industry. The currently available car recommender systems depend heavily upon users having proper knowledge of cars and their specifications.
Most of the systems use a content-based recommender approach, which is based on the user’s history and suggests items similar to their past purchases. This type of system is at a clear disadvantage, as users could be first-time buyers and may not have proper knowledge of car specifications.
These drawbacks are addressed by our application, which uses a collaborative filtering approach, i.e., a “people to people” approach. Collaborative filtering is applied to filter products that might interest a particular user based on the reactions of similar users. This method is better suited to recommending cars, as it is less knowledge-driven and more customer-oriented, keeping in mind that not all customers are informed about the specifics of cars.
Challenges: One of the challenges faced was cleaning the user reviews available in the datasets. This was done by applying natural language processing methods such as lemmatization and the removal of stop words and punctuation. The datasets were combined into a single data frame and cleaned further by removing unnecessary columns.
Another challenge was integrating the machine learning notebook with the Flask application to create a fully functional product. The challenge here was to ensure that the recommendation was fast, accurate and efficient. This involved multiple testing scenarios and removing bugs, which improved the accuracy and the efficiency of the recommender system.
• Python (as programming language, version >= 3.0)
• Python interpreter
• Conda or Jupyter environment (used for development)
• Flask
• Web browser
The proposed car recommendation system involves development processes such as Data Collection, Data Pre-processing, Model Design, Model Building and Recommendation. The system architecture uses Python as the main programming language for the recommendation model.
The architecture starts from the Edmunds Car Reviews dataset, which consists of the details of the reviewer, the car reviewed, the text review and the rating given by the customer for the car. The dataset is pre-processed, which involves stop word removal, lemmatization, removal of common words, etc. After pre-processing, the dataset is passed through count vectorization and inverse document frequency weighting and then fed to a topic modelling model based on Non-Negative Matrix Factorization (NMF). At this stage the topics are extracted and distributed based on the dataset and the reviews of each car. To choose the optimal number of topics and avoid overfitting and high-variance decisions, we check the coherence values against the number of topics, and the number with the highest coherence value is taken as the number of topics. Once the optimal number of topics is decided, the topics and the output are fed to the Latent Dirichlet Allocation (LDA) model to score each record and find the underlying relationship of a particular query with the reviews of a car. As a result, a specific topic number is allocated to each car, which is used for recommendation based on whatever query the user provides.
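The count vectorization and inverse document frequency weighting mentioned above can be illustrated with a small standard-library sketch. The paper does not state its exact weighting formula, so the smoothed IDF variant, the function name `tfidf_matrix` and the toy reviews below are assumptions made only for illustration.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Return (vocabulary, matrix), where matrix[i][j] is the TF-IDF weight
    of vocabulary[j] in docs[i]. Each document is a list of tokens."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (assumed variant): ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    matrix = []
    for doc in docs:
        counts = Counter(doc)  # count vectorization of one document
        matrix.append([counts[t] * idf[t] for t in vocab])
    return vocab, matrix

# Toy, already-tokenised reviews (made up for this sketch).
docs = [["smooth", "ride", "quiet", "cabin"],
        ["powerful", "engine", "smooth", "ride"],
        ["cheap", "fuel", "economy"]]
vocab, X = tfidf_matrix(docs)
```

Terms that occur in fewer reviews (such as "cabin") receive a larger weight than terms shared by several reviews (such as "ride"), which is what lets the later factorization separate topics.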
The application allows both quantitative searches based on car type, such as SUV or sedan, and qualitative searches, in which a user describes their requirements as they would to an expert, who in return uses their expertise to recommend cars. This application and recommendation model provides the capability for both.
Stages:
1. Data Collection
2. Data Pre-processing
3. Model Building:
a. Topic Modelling using NMF
b. Selecting optimal number of topics
c. LDA model training
d. Assigning topic scores to cars and mapping back to dataset
4. Recommendation
The usability of the dataset is rated 7.1-7.5, which is decent and means it has few null or ill-formatted values. Hence, as the source provides a large amount of data with proper columns and fields and has a high usability score, we decided to work with this dataset. The Consumer Car Reviews Dataset contains a huge database of car reviews from varying brands.
It contains 7 columns, namely:
• Key
• Review Date
• Author Name
• Vehicle Title
• Review Title
• Review
• Rating
The models are trained on the Date, Vehicle Name and Review Title, and mainly on the Review and Rating.
b) Data Pre-processing: The datasets are combined and loaded into a single data frame, and we consider only data after 2010 for relevancy. The dataset is cleaned by removing unnecessary columns, which leaves the main data: the Vehicle, Review and Rating columns.
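The combining and filtering step can be sketched as follows. This is a minimal illustration with plain Python records instead of a data frame. The column name `Review_Date`, the reading of "after 2010" as "2011 onward" and the sample rows are assumptions, not details taken from the dataset.

```python
from datetime import date

KEEP = ("Vehicle", "Review", "Rating")  # columns kept for modelling

def combine_and_filter(brand_tables, min_year=2011):
    """Merge per-brand review tables, keep reviews dated min_year or later,
    and drop every column except Vehicle, Review and Rating."""
    cleaned = []
    for rows in brand_tables:
        for row in rows:
            if row["Review_Date"].year >= min_year:
                cleaned.append({col: row[col] for col in KEEP})
    return cleaned

# Hypothetical per-brand tables (made-up rows).
brand_a = [
    {"Review_Date": date(2009, 5, 1), "Author": "u1",
     "Vehicle": "2009 Brand A Model X Sedan", "Review": "Old review", "Rating": 4.0},
    {"Review_Date": date(2015, 3, 9), "Author": "u2",
     "Vehicle": "2015 Brand A Model X Sedan", "Review": "Quiet cabin", "Rating": 5.0},
]
brand_b = [
    {"Review_Date": date(2012, 7, 4), "Author": "u3",
     "Vehicle": "2012 Brand B Model Y SUV", "Review": "Roomy", "Rating": 4.5},
]
reviews = combine_and_filter([brand_a, brand_b])
```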
Data pre-processing techniques applied include:
• Stop Word Removal
• Lemmatization
• Removal of common basic words
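The three techniques listed above can be sketched in a few lines. The word lists and the lookup-table lemmatizer here are deliberately tiny stand-ins chosen for illustration. A real pipeline would use a full stop-word list and a proper lemmatizer from an NLP library, and the paper does not say which ones were used.

```python
import re

# Tiny illustrative word lists (assumptions, not the lists used in the paper).
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "it", "this", "of", "to", "very"}
COMMON_WORDS = {"car", "vehicle", "drive"}  # domain words present in most reviews
LEMMAS = {"seats": "seat", "engines": "engine", "drives": "drive", "loved": "love"}

def clean_review(text):
    """Lowercase, strip punctuation, lemmatize via lookup, then drop
    stop words and common domain words."""
    tokens = re.findall(r"[a-z]+", text.lower())   # removes punctuation and digits
    tokens = [LEMMAS.get(tok, tok) for tok in tokens]
    return [tok for tok in tokens
            if tok not in STOP_WORDS and tok not in COMMON_WORDS]

cleaned = clean_review("The seats are comfortable, and this car drives very smoothly!")
# cleaned == ['seat', 'comfortable', 'smoothly']
```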
One important inference during the pre-processing stage was that cars with higher ratings had reviews with shorter text. This held true even after cleaning and pre-processing the dataset, and it is used for model building.
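A relationship of this kind can be checked by averaging review length per rating value. The sketch below shows only the computation. The three reviews are made up and do not reproduce the paper's data or results.

```python
from collections import defaultdict

def mean_length_by_rating(reviews):
    """reviews: iterable of (rating, text) pairs.
    Returns {rating: mean number of words per review}."""
    totals = defaultdict(lambda: [0, 0])  # rating -> [total words, review count]
    for rating, text in reviews:
        totals[rating][0] += len(text.split())
        totals[rating][1] += 1
    return {r: words / n for r, (words, n) in sorted(totals.items())}

# Made-up reviews, used only to exercise the function.
sample = [
    (5, "Great car"),
    (5, "Love it so much"),
    (2, "The transmission failed twice and the dealer refused to help"),
]
lengths = mean_length_by_rating(sample)
# lengths == {2: 10.0, 5: 3.0}
```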
Fig -1: Architecture Diagram
a) Dataset: The dataset used for the project is the Consumer Car Reviews dataset from Edmunds.com (also available on Kaggle.com), which contains lakhs of reviews from multiple brands given by consumers. It contains datasets pertaining to different companies, which can be used separately for each company or combined into a single general database.
Fig -2: Preprocessing Inference (ratings inversely proportional to review length)
c) Topic Modelling: The processed data is used to train the topic model. Topic modelling is done using the Non-Negative Matrix Factorization (NMF) model from sklearn, which takes the input matrix and outputs two matrices:
• The W factor contains the document membership weights relative to each of the k topics. Each row corresponds to a single document, and each column corresponds to a topic.
• The H factor contains the term weights relative to each of the k topics. In this case, each row corresponds to a topic, and each column corresponds to a unique term in the corpus vocabulary.
The distributed topics and the top words for each topic are shown.
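To make the shapes of W and H concrete, the factorization can be sketched without any dependencies. The code below is not the sklearn implementation used in the paper. It is a stand-in based on the classic multiplicative update rules for the Frobenius-norm objective, and the function names, iteration count and toy matrix are assumptions.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def frobenius_error(V, W, H):
    """Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative n x m matrix V into W (n x k) and H (k x m)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H

# Toy document-term matrix: 4 documents (rows) x 4 terms (columns).
V = [[2, 1, 0, 0],
     [4, 2, 0, 0],
     [0, 0, 1, 3],
     [0, 0, 2, 6]]
W, H = nmf(V, k=2)   # W: one row per document, H: one row per topic
```

Each row of W can be read as a document's weights over the k topics, and the largest entries in each row of H identify the top words of that topic.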
d) Latent Dirichlet Allocation (LDA) model fitting: This helps us find the underlying relationships in the distributed topics, so that every text pertaining to each car can be scored and each car can be assigned a topic. The topic with the highest value for a particular car is assigned to it. Based on this LDA model, the user’s query is scored, the top topics for the query are identified, and the top-rated cars from those topics are recommended.
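The assignment of one topic per car can be sketched as an argmax over topic weights. This sketch assumes that a fitted LDA model has already produced a topic distribution for every review, and that a car's reviews are aggregated by summing their distributions. The paper does not specify the aggregation, and the car names are placeholders.

```python
def assign_car_topics(review_topics):
    """review_topics: iterable of (car, topic_weights) pairs, one per review.
    Returns {car: index of the topic with the highest total weight}."""
    sums = {}
    for car, weights in review_topics:
        if car not in sums:
            sums[car] = [0.0] * len(weights)
        sums[car] = [s + w for s, w in zip(sums[car], weights)]
    # The argmax of the summed weights equals the argmax of the mean weights.
    return {car: max(range(len(total)), key=total.__getitem__)
            for car, total in sums.items()}

# Placeholder per-review topic distributions (3 topics).
scored_reviews = [
    ("Car A", [0.7, 0.2, 0.1]),
    ("Car A", [0.5, 0.4, 0.1]),
    ("Car B", [0.1, 0.2, 0.7]),
]
car_topics = assign_car_topics(scored_reviews)
# car_topics == {'Car A': 0, 'Car B': 2}
```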
The plot below shows the distribution of topics and the most salient terms with their frequency distribution.
Fig -3: Topics from topic modelling, with top words for each topic
After this stage, the optimal number of topics to keep for the actual recommendation is determined from all the distributed topics. This keeps the recommendations from overlapping too much due to redundant topics. It is done by plotting coherence values against the number of topics.
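The selection step can be sketched as "score each candidate topic count, then keep the best". The paper does not name the coherence measure it plots, so the UMass-style score below is an assumption, and the helper names and the example scores are hypothetical.

```python
import math
from itertools import combinations

def umass_coherence(top_words, docs):
    """UMass-style coherence of one topic. top_words is ordered from most to
    least important; docs is a list of token sets."""
    score = 0.0
    for i, j in combinations(range(len(top_words)), 2):  # i ranks above j
        w_hi, w_lo = top_words[i], top_words[j]
        d_hi = sum(1 for d in docs if w_hi in d)
        d_both = sum(1 for d in docs if w_hi in d and w_lo in d)
        if d_hi:  # skip words that never occur in the corpus
            score += math.log((d_both + 1) / d_hi)
    return score

def best_topic_count(coherence_by_k):
    """Return the number of topics with the highest coherence value."""
    return max(coherence_by_k, key=coherence_by_k.get)

docs = [{"engine", "power"}, {"engine", "power"},
        {"seat", "comfort"}, {"seat", "comfort"}]
coherent = umass_coherence(["engine", "power"], docs)   # words co-occur
mixed = umass_coherence(["engine", "seat"], docs)       # words never co-occur

# Made-up average coherence per candidate topic count.
chosen_k = best_topic_count({2: -1.9, 3: -1.2, 4: -1.5})
```

In practice the score would be averaged over all topics of a model trained with each candidate k, and the k with the highest average would be kept.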
Fig -5: LDA topic distributions and word distributions for each topic
Word count of each topic’s top words from the LDA model and their importance in that topic (Fig. 6 & Fig. 7):
Fig -6
Fig -7: Visualization showing distribution of topics assigned
Fig -4: Coherence values vs. number of topics
From the plot, the number of optimal topics is determined to be 8.
Fig -8: Visualization of topic distributed (8 optimal topics spatial distribution)
e) Recommendation: After the LDA model assigns each record a specific topic, the user’s query is scored with the LDA model to find the relations expressed in the query and the topics that reflect it. Then the top-rated cars from the top topics are recommended back to the user. For specific types of cars such as SUV, sedan or hatchback, we extracted the type of each car from its model name and appended the category for class-wise recommendation. This allows holistic recommendation based on quantitative features like category and type as well as qualitative features based on the user’s abstract requirements.
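The final ranking can be sketched as a filter on topic and category followed by a sort on rating. The record schema (`title`, `topic`, `rating`), the keyword-based category extraction, the default of two query topics and the sample cars are all assumptions made for illustration.

```python
def category_from_title(title, categories=("suv", "sedan", "hatchback")):
    """Return the first known body type mentioned in a vehicle title, else None."""
    words = title.lower().split()
    for cat in categories:
        if cat in words:
            return cat
    return None

def recommend(query_topic_scores, cars, top_topics=2, top_n=10, category=None):
    """Keep cars whose assigned topic is among the query's strongest topics,
    optionally restrict to one body type, and return the best-rated titles."""
    ranked = sorted(range(len(query_topic_scores)),
                    key=lambda t: query_topic_scores[t], reverse=True)
    wanted = set(ranked[:top_topics])
    pool = [c for c in cars
            if c["topic"] in wanted
            and (category is None or category_from_title(c["title"]) == category)]
    pool.sort(key=lambda c: c["rating"], reverse=True)
    return [c["title"] for c in pool[:top_n]]

# Placeholder catalogue: each car already carries its assigned topic and rating.
cars = [
    {"title": "2015 Brand A Model X SUV", "topic": 0, "rating": 4.8},
    {"title": "2014 Brand B Model Y Sedan", "topic": 0, "rating": 4.9},
    {"title": "2016 Brand C Model Z SUV", "topic": 1, "rating": 4.2},
    {"title": "2013 Brand D Model W Hatchback", "topic": 2, "rating": 5.0},
]
query_scores = [0.6, 0.3, 0.1]  # topic scores of the user's query (made up)
top_two = recommend(query_scores, cars, top_n=2)
suvs = recommend(query_scores, cars, category="suv")
```

Here `top_two` contains the two best-rated cars from the query's two strongest topics, and `suvs` applies the same ranking restricted to titles that mention "SUV".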
Pre-processing inference:
From the visualization, it can be inferred that ratings increase as review length decreases across the population.
Fig -10: LDA topic distributions and word distributions for each topic
Post topic modelling: choosing the optimal number of topics. From the plot, the number of optimal topics is determined to be 8.
Fig -9: Pre-processing inference (ratings inversely proportional to review length)
The distribution shows that the topics are evenly distributed and that each explains a different section, indicating an efficient model.
Fig -11: Coherence Values vs. No. of Topics
Topic Distributions, results: Intertopic distances show that the topics are properly spaced out and that each topic explains a different category of automobile and features.
Fig -12: Visualization of topic distributed (8 optimal topics spatial distribution)
We successfully developed a Flask application that takes a textual query as input, analyses it, runs the topic modelling and Latent Dirichlet Allocation (LDA) models, and recommends the top 10 cars accordingly. The recommender system uses not only quantitative inputs but also qualitative inputs, which include reviews from other customers.
Another feature of the Flask app itself is that if the type of automobile is specified, such as sedan or hatchback, the query is restricted to those cars from the start, giving better results.
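Separating the car type from the descriptive part of a query can be sketched as below. The list of recognised types and the function name `split_query` are assumptions. The paper does not describe how the Flask app parses the query.

```python
import re

CAR_TYPES = ("suv", "sedan", "hatchback")  # assumed set of recognised types

def split_query(query):
    """Return (car_type, description): the first body type named in the query
    (or None) and the remaining free-text words."""
    tokens = re.findall(r"[a-z]+", query.lower())
    car_type = next((tok for tok in tokens if tok in CAR_TYPES), None)
    description = " ".join(tok for tok in tokens if tok not in CAR_TYPES)
    return car_type, description

typed = split_query("Stylish and affordable SUV")
untyped = split_query("quiet family car")
# typed == ('suv', 'stylish and affordable')
# untyped == (None, 'quiet family car')
```

The detected type can then narrow the candidate cars before the descriptive text is scored against the topics.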
Generally, car recommendation systems are content based, whereas we implement a collaborative recommendation system. This lets us recommend cars from textual queries as well, based on the reviews given by actual customers. Existing systems mostly either search on the basis of specifications or provide content-based recommendation.
The proposed system can act as a specific search tool or recommender, as well as an expert car advisor or a community-based suggestion service, but in digital form.
The rise of the global market and the demand for new products in the Indian economy are leading to the arrival of new models. All foreign car manufacturers see the Indian market as a growth point for their share of the global automotive economy. As the world progresses towards a new era, recommendations become an inevitable reality. Almost all technical and non-technical products in use today rely on recommendations. Recommendations are so closely tied to new technology because of its accuracy, precision, and reliability.
Fig -13: The web application interface recommending top 10 cars for SUV type, entered by user
The recommendation reflects the personal preferences behind the user’s needs. The proposed method, a combination of user-to-user and item-to-item collaborative filtering, works well for suggesting cars. The biggest problem with car databases is that their data is dynamic, because it is difficult to predict which car models will be released. In addition, the performance of the proposed system could be improved by using a real-time network that allows websites to be built and session information to be accessed. This research can be extended to information-based recommender programs using a variety of information representations. Expert recommendations from an expert system could also be considered using knowledge bases.
[1] Dataset sources, Edmunds.com / Kaggle.com, (https://www.kaggle.com/ankkur13/edmundsconsumer-car-ratings-and-reviews)
Fig -14: Recommending top 10 cars for “stylish and affordable” text query, entered by user
[2] Q. Zhang, J. Wang and K. Fan, "Research on passenger car recommendation based on comments mining of Internet," 2017 12th IEEE Conference on Industrial Electronics and Applications (ICIEA), 2017, pp. 122-127, doi: 10.1109/ICIEA.2017.8282826.
[3] G. Prabowo, M. Nasrun and R. A. Nugrahaeni, "Recommendations for Car Selection System Using Item-Based Collaborative Filtering (CF)," 2019 IEEE International Conference on Signals and Systems (ICSigSys), 2019, pp. 116-119, doi: 10.1109/ICSIGSYS.2019.8811083.
[4] Shrey Talati, Anukrity, Priyanka Salian and Anam Hussain. Article: Recommendation System for Automobile Purchasing: A Survey. IJCA Proceedings on National Conference on Advancements in Computer & Information Technology NCACIT 2016(6):23-27, May 2016.
[5] Vengatesan K, A. Srivastava, A. Kumar, S. Samee, P. T. Vijay, P. S. Tanesh, “A Novel Approach of Car Recommendation Using Machine Learning Algorithm”, (https://www.ijstr.org/final-print/jun2020/A-NovelApproach Of-Car-Recommendation-Using-MachineLearning-Algorithm.pdf)