International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 05 | May 2022 www.irjet.net p-ISSN: 2395-0072
Siddhi Divekar1, Gunashree Attarde2, Adesh Chavan3, Ankita Dahiphale4, Prof. Rahul Patil5
1,2,3,4 Students, Dept. of Computer Engineering, Pimpri Chinchwad College of Engineering, Pune, Maharashtra, India.
5 Professor, Dept. of Computer Engineering, Pimpri Chinchwad College of Engineering, Pune, Maharashtra, India.
Abstract: Due to the spread of COVID-19, many people lost their jobs and, to earn a living, started small businesses. These businesses are still largely unknown and struggle to earn profits. As a helping hand to these people, we propose an e-commerce website that will help them earn profits and obtain genuine reviews from customers, which in turn will help them improve in their sectors. To support profitability, we plan to build a recommendation system that identifies the best-selling product using the Boyer-Moore Voting Algorithm. The product analysis will also be shown through data visualization in Power BI. We will use various algorithms, such as SVM, LinearSVC, and Naïve Bayes, to detect whether a submitted review is real or fake.
Key Words: Data stream mining, Power BI, Recommendation, Review, SVM, Naïve Bayes and Ecommerce.
Due to the pandemic, many people started their own small-scale businesses. We are providing an e-commerce platform for these small-scale entrepreneurs, which will help them sell their products and get product reviews from customers. The review-based recommendation will be of two types:
1. The reviews from the customers will arrive in streaming form, which will then be converted into a data visualization and will further help in the product recommendation system.
2. The supplementary occupations, from which customers can also buy supplementary products along with the product actually purchased, will help in the occupation recommendation system.
3. We will also ensure that the submitted reviews are not fake by applying various fake-review detection algorithms.
[1] Many of our regular activities have been affected by the Internet's fast expansion. E-commerce is one of the fastest-growing areas. Customers can post evaluations of e-commerce services in general. These reviews can be used as a source of data. Companies, for example, can use them to develop goods or services, while potential customers can use them to decide whether to buy or use a product. Unfortunately, some people have tried to generate false reviews in order to boost the popularity of a product or to discredit it. The goal of this study is to use the language and rating properties of a review to detect fraudulent product reviews. In summary, the suggested system (ICF++) assesses the honesty of a review, the trustworthiness of the reviewers, and the product's dependability. Text mining and opinion mining techniques are used to determine a review's honesty value. The results of the experiment demonstrate that the suggested system has a higher accuracy than the iterative computation framework (ICF) method.
[2] Fake review detection has received a lot of attention in recent years, from both the business and research communities. Fake reviews are a significant obstacle to identifying reviews that represent actual user experiences and opinions. Supervised learning has been one of the primary methods for resolving the issue. Obtaining labeled fake reviews for training, on the other hand, is challenging, because it is extremely difficult, if not impossible, to reliably identify fakes by manual examination. Previous studies have therefore used various forms of training reviews that are not genuine fake reviews. The pseudo-fake reviews created with the Amazon Mechanical Turk (AMT) crowdsourcing tool are perhaps the most intriguing. Using only word n-gram features, an accuracy of 89.6% was reported on AMT-created fake reviews. This level of accuracy is both surprising and promising. The AMT-produced reviews, although fake, are not actual fake reviews on an e-commerce website. When producing such reviews, the Turkers are unlikely to be in the same psychological state as the authors of actual fake reviews, who have businesses to promote or competing products to downgrade. This notion is supported by our research. It is therefore reasonable to compare fake review detection accuracies on pseudo AMT data and on real-life data to determine whether different states of mind lead to different writing and, as a result, different classification accuracies. We undertake a complete set of classification tests using only n-gram features on real review data, using filtered and non-fake reviews from Yelp.com. Although the accuracy of fake review identification on Yelp's real-life data is just 68.8%, this accuracy suggests that n-gram features are indeed useful. The information-theoretic measure KL-divergence and its asymmetric property are then used to offer a novel and principled technique for characterizing the precise difference between the two types of review data. This exposes some fascinating psycholinguistic phenomena concerning fake reviewers, both forced and natural. We offer a new set of behavioral features about reviewers and their reviews for learning, which substantially improves the classification result on real-life opinion spam data.
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 2799
[3] Power BI has transformed the worlds of business data visualisation, intelligence, and analytics. Power BI is a web-based application that enables users to search for data, transform it, visualise it, and share the reports and dashboards they create with other users in the same or different departments or organizations, as well as with the general public. As of February 2017, Power BI was used by over 200,000 businesses in 205 countries. Power BI has emerged as a viable candidate for use as a business intelligence tool in small and medium businesses, thanks to a free version that includes sufficient features and capabilities. Power BI's Quick Insights feature (Michael Hart, 2017) is a new tool built on a growing collection of advanced analytical algorithms. After a dataset is uploaded to Power BI, a single click can activate this function, which generates a number of reports based on analysis of the data without the need for human interaction. This also helps reduce human mistakes in computations and statistical procedures, which may otherwise lead to unverified research. Power BI is simple to use as a platform for research data analysis and visualization, and it accepts even Excel files as input. The goal of this article is to demonstrate how quickly Power BI can turn a dataset of research data into a collection of reports and dashboards that can be easily shared.
[4] The ability to store, gather, and manipulate data has greatly increased as technology has advanced. Data analysis has grown more crucial as the amount of information and its complexity grow at a rapid pace. The purpose of this article is to suggest to the user goods that are more likely to be purchased. The paper first discusses several recommendation approaches and research on recommendation systems, then proposes a better strategy for a successful recommendation system and explains the outcomes of that approach. On a transactional dataset, a combination of the k-means clustering method and the Apriori algorithm is used to provide a better recommendation list.
[5] The quantity and influence of online reviews grow with the increasing significance of the internet worldwide. Comments, reviews, and feedback about services are very important for item and service providers, because they influence consumers and are frequently the most convenient way for a customer to decide whether or not to buy a particular product. Reviews can have a positive as well as a negative impact. Hence, trusting reviews blindly is not advisable, because it involves risk for both customers and sellers. Some selling organizations offer incentives to people who post positive reviews and feedback for their services, while others may pay people to write negative reviews about competing product or service providers. Such reviews have a bad influence on consumers and distort their decision of whether or not to buy a product. These false reviews are called spam reviews and are very common in online e-commerce systems. Consumers must therefore be careful when going through reviews and selecting a particular product or service on the basis of them. In this article, we explain how the suggested system aids in the detection and removal of false reviews, with a focus on data mining techniques using the "J48 Algorithm," as well as the system's performance.
[6] User input in the form of app ratings and reviews is becoming increasingly common in app stores. Researchers and, more recently, tool providers have offered analytics and data mining solutions to developers and analysts, e.g., to assist release decisions. Positive feedback, according to research, boosts app downloads and revenue, and therefore an app's success. As a result, a market for fake, incentivized app reviews arose, with yet-to-be-determined ramifications for developers, app users, and app store owners. This study investigates fake reviews, their sources, their characteristics, and the degree to which they may be identified automatically. To understand their tactics and services, we ran disguised questionnaires with 43 fake review providers and analyzed their review rules. By comparing 60 thousand fake reviews with 62 million reviews from the App Store, we discovered substantial differences between the corresponding apps, reviewers, rating distribution, and frequency. This prompted the creation of a simple classifier that can automatically detect fraudulent app store reviews. Our classifier has a recall of 91 percent and an AUC/ROC value of 98 percent on a labelled and imbalanced dataset in which one-tenth of the reviews are fake, as reported in other domains. Our findings are discussed, as well as their implications for software engineering, app consumers, and app store owners.
[7] The importance of internet reviews for businesses has risen dramatically in recent years, and they are now critical in determining business performance in a wide range of industries, from restaurants to hotels to e-commerce. Unfortunately, some individuals use unethical methods to boost their internet image, such as creating false reviews of their own companies or of competitors. Fake review detection has already been studied in a variety of sectors, including product and company reviews for restaurants and hotels. Despite its economic importance, however, the consumer electronics industry has yet to be properly investigated. This paper presents a feature framework for identifying fraudulent reviews in the consumer electronics domain, which has been tested. The four-part contribution is as follows: a) creating a dataset covering four different cities for the consumer electronics domain in order to classify fake reviews; b) identifying a feature framework for the detection of false reviews; c) developing a classification method based on the proposed framework; d) analysing the output for each city. The AdaBoost classifier was shown to be the best by statistical methods according to the Friedman test, with an F-score of 82 percent on the classification task.
[8] In this field of study, two types of datasets are typically used: pseudo-fake and real-life reviews. The literature shows that classification models perform poorly on real-world datasets compared with pseudo-fake reviews. Following our analysis, we discovered that behavioral and contextual features are crucial for detecting fraudulent reviews. In particular, we used an important behavioral feature of reviewers known as "reviewer deviation." Our research focuses on the relationship between reviewer deviation and other contextual and behavioral features. The relevance of a certain feature set for a classification algorithm to detect fraudulent reviews was empirically demonstrated. We ranked the features in a chosen feature set, and reviewer deviation came in eighth. We scaled the dataset to test the feasibility of the selected feature set and found that scaling the dataset can increase both recall and accuracy. A contextual feature in our chosen feature set captures the text similarity between a reviewer's reviews. For calculating the text similarity of reviews, we used the NNC, LTC, and BM25 term weighting methods. BM25 outperformed the other term weighting schemes, according to our findings.
The main motive of our system is to generate recommendations and to identify whether a review is true or fake.
To achieve the first motive, the recommendation, we will create a Google Form that collects feedback from customers about the products they purchased. The form data will then be converted into an Excel sheet, which will be the input to the Power BI software and will give us a clear data visualization of the product sales. After data preprocessing, this product sales data will be given as input to the streaming algorithm (the Boyer-Moore voting streaming algorithm), which will help us identify the best-selling product and help small-scale entrepreneurs analyze their profit and loss.
The next motive is to let the small-scale entrepreneurs know whether the reviews provided through the Google feedback form are true or fake. We will use various machine learning algorithms, such as Naïve Bayes, SVM, and random forest, which will help us classify whether a review is true or fake.
Parameters on which the review will be classified are:
Time span of the review
Technical terms in the review
Ratings
Verification of the purchase
Inspection of the user profile
Customer jacking
3.2
Boyer-Moore Voting Streaming Algorithm:
The Boyer-Moore voting method is one of the most widely used optimal algorithms for finding the majority element, that is, the element that occurs more than N/2 times among N elements.
Using this algorithm, we will obtain the best-selling product as the output, for recommendation purposes.
Time Complexity = O(N), Space Complexity = O(1)
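The voting step described above can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the project: the function names and the sample product names are hypothetical. The single voting pass only yields a candidate, so the sketch adds a second counting pass to confirm that the candidate really occurs more than N/2 times; if no product holds a strict majority, it returns `None`.

```python
from typing import List, Optional


def majority_candidate(stream: List[str]) -> Optional[str]:
    """One pass of Boyer-Moore voting: O(N) time, O(1) extra space."""
    candidate = None
    count = 0
    for item in stream:
        if count == 0:
            # No surviving candidate: adopt the current item.
            candidate = item
            count = 1
        elif item == candidate:
            count += 1
        else:
            # A different item cancels one vote of the candidate.
            count -= 1
    return candidate


def majority_element(items: List[str]) -> Optional[str]:
    """Return the element occurring more than N/2 times, or None."""
    candidate = majority_candidate(items)
    # Verification pass: voting alone does not prove a majority exists.
    if candidate is not None and items.count(candidate) > len(items) // 2:
        return candidate
    return None


# Hypothetical stream of purchased product names from the feedback form.
sales = ["soap", "candle", "soap", "soap", "mask", "soap", "candle"]
print(majority_element(sales))  # soap
```

Because the algorithm only detects a strict majority, a product that sells best without exceeding half of all sales would not be reported; a plain frequency count would be needed for that case.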
Gaussian Naïve Bayes is a variant of the Naïve Bayes algorithm that assumes the features of each class follow a Gaussian (normal) distribution. It is therefore suited to continuous data.
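To make the Gaussian assumption concrete, the following is a small from-scratch sketch using only the Python standard library. It is illustrative only: the two features (star rating and review length in words), the six training rows, and the function names are hypothetical and are not taken from the project's data. For each class it estimates a per-feature mean and variance, then picks the class with the highest log prior plus summed Gaussian log likelihood.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class prior, per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # Small constant avoids division by zero for constant features.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior for one sample."""
    def log_posterior(prior, means, variances):
        total = math.log(prior)
        for v, m, var in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return total

    return max(model, key=lambda label: log_posterior(*model[label]))


# Hypothetical features per review: [star rating, length in words].
X = [[5, 8], [5, 6], [4, 10], [4, 45], [3, 60], [4, 52]]
y = ["fake", "fake", "fake", "real", "real", "real"]

model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [5, 7]))
print(predict_gaussian_nb(model, [4, 50]))
```

On this toy data the short five-star review is assigned to the "fake" class and the longer four-star review to the "real" class.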
LinearSVC:
This classifier divides data into groups by finding the best-fitting hyperplane.
SVM:
Various investigations have shown that if you employ SVC's default kernel, the Radial Basis Function (RBF) kernel, you are likely using a nonlinear decision boundary, which can greatly outperform a linear decision boundary on this dataset.
Random Forest: This approach, which is provided by the sklearn package, has also been used for classification. It builds numerous decision trees, each on a random sample of the training data.
After applying all of these classifiers, their accuracies in detecting fake reviews are compared and evaluated.
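The comparison step can be sketched with scikit-learn as shown below. This is only an outline under stated assumptions: the data is a synthetic stand-in generated with `make_classification`, not the project's review dataset, and the feature count, split ratio, and hyperparameters are arbitrary choices made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC, LinearSVC

# Synthetic stand-in for numeric review features (assumption, not real data).
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The four classifiers named in this section.
classifiers = {
    "GaussianNB": GaussianNB(),
    "LinearSVC": LinearSVC(max_iter=10000),
    "SVC (RBF)": SVC(kernel="rbf"),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each classifier on the same split and record its test accuracy.
accuracies = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, clf.predict(X_test))

# Report the classifiers from most to least accurate.
for name, acc in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Using one shared train/test split keeps the accuracies comparable; the numbers obtained on synthetic data say nothing about performance on real reviews.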
We will be using the ITERATIVE MODEL, because the iterative methodology starts with a modest implementation of a limited set of software requirements and repeatedly improves the evolving versions until the entire system is built and ready for deployment. The Iterative and Incremental model is depicted in the figure below.
3.5 UML
Fig 3.3 Model
Fig 3.4 Use Case Diagram
Fig 3.2 Flow Diagram
Fig 4.1 Streaming data from the form
Fig 4.2 Successfully updated timestamp of streaming data
Fig 4.3 Data visualization of the product sales
We presented an overview of our e-commerce website, which will help people earn profits through similar-occupation recommendations for the searched product and obtain genuine reviews of their sales, so that these reviews help them improve in their field. As future scope, the website can also be used for marketing and advertising products to earn more profits.
We express our heartfelt gratitude to Prof. Rahul Patil, our Project Guide, for his encouragement and support throughout our project, particularly for the helpful suggestions made during the project and for laying the groundwork for our work's accomplishment.
We would also like to express our sincere gratitude to Prof. Dr. S. V. Shinde, our Research & Innovation Coordinator, and Prof. S. R. Vispute, our Project Coordinator, for their help, real support, and guidance from the beginning of the seminar until the end. We would like to express our gratitude to Prof. Dr. K. Rajeswari, Head of the Computer Engineering Department, for her unflinching support during the seminar.
[1] https://www.researchgate.net/publication/303499094_Fake_Review_Detection_From_a_Product_Review_Using_Modified_Method_of_Iterative_Computation_Framework
[2]http://www2.cs.uh.edu/~arjun/papers/UIC CS TR yelp spam.pdf
[3]http://ir.inflibnet.ac.in:8080/ir/bitstream/1944/2116/1 /2
[4] Application of Data Mining to E-Commerce Recommendation Systems
[5] https://www.ijsr.net/archive/v7i10/ART20191163.pdf
[6] https://ir.inflibnet.ac.in/bitstream/1944/2116/1/24.pdf
[7]https://www.youtube.com/watch?v=AGrl H87pRU