International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 04 Issue: 03 | Mar-2017
p-ISSN: 2395-0072
www.irjet.net
Evaluating and Enhancing Efficiency of Recommendation System using Big Data Analytics
Archit Verma1, Dharmendra Kumar2
1 M.Tech Student, Computer Science and Engineering at United College of Engineering and Research, Allahabad, Uttar Pradesh, India, Email: architv11@gmail.com
2 Associate Professor, Computer Science Department at United College of Engineering and Research, Allahabad, Uttar Pradesh, India, Email: kumar.dharmendra@rediffmail.com
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - A recommendation system helps a purchaser find the most suitable item to buy out of a large number of items by predicting ratings for items that the purchaser has not yet rated. It shows the user a list of items that he or she may like to buy, based on the user's past purchases. The large amount of information on the Internet, on e-commerce sites such as Amazon and eBay, and on social media such as Twitter and LinkedIn has motivated the development of recommendation systems, which filter useful information from huge data sets. Recommender systems are of two types: content-based, which recommends items on the basis of item features, and collaborative filtering based, which recommends items on the basis of the user's social environment. In this paper we discuss collaborative filtering, which is further classified as (1) user-based, which recommends items on the basis of user similarity; (2) item-based, which recommends items on the basis of item similarity; and (3) matrix factorization based (Alternating Least Squares (ALS)). Because of the large number of users and items, a recommendation system faces a scalability problem. To handle this scalability problem we use Big Data analytics. Big data refers to datasets that are not only high in volume but also high in variety and velocity, known as the 3 V's of big data, which makes them difficult to handle using traditional tools and techniques. Hadoop is one answer to this Big Data problem. Apache Mahout is a machine learning tool that provides scalable machine learning algorithms for collaborative filtering on Hadoop; it also provides non-Hadoop implementations. Mahout provides non-Hadoop implementations of user-based and item-based collaborative filtering and a Hadoop implementation of item-based collaborative filtering. Apache Spark is a fast, general-purpose cluster computing technology. MLlib is Apache Spark's scalable machine learning library.
MLlib provides an implementation of matrix factorization based on the ALS algorithm for collaborative filtering. HBase is a distributed, column-oriented NoSQL database built on top of the Hadoop file system; we use HBase to store rating data entered by users through web forms written in JSP (Java Server Pages). Apache Phoenix offers fast access to HBase data and provides an SQL interface to this NoSQL data. For analysis we use the MovieLens dataset. We evaluate the recommendation algorithms of Mahout with different similarity measures, along with ALS from Spark MLlib, on the basis of MAE, RMSE, precision and time taken. Finally, we use the algorithms that our analysis finds best in terms of accuracy and time taken to implement recommendation systems.
Key Words: Recommendation System, Collaborative Filtering, User Based, Item Based, Matrix Factorization, Alternating Least Squares, Big Data, MAE, RMSE, Precision, Recall, F-Score, Hadoop, Mahout, Spark, MLlib, HBase, Apache Phoenix, Pig, Crontab.
© 2017, IRJET | Impact Factor value: 5.181
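The accuracy metrics named in the abstract (MAE, RMSE and precision) can be illustrated with a minimal Python sketch. It is not the evaluation code used with Mahout or Spark MLlib. It assumes the standard textbook definitions, and the function names `mae`, `rmse` and `precision` and the sample ratings in the usage note are hypothetical.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error; penalizes large errors more than MAE."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def precision(recommended, relevant):
    """Fraction of recommended items that are relevant to the user."""
    if not recommended:
        raise ValueError("recommended list must be non-empty")
    relevant_set = set(relevant)
    hits = sum(1 for item in recommended if item in relevant_set)
    return hits / len(recommended)
```

For example, with hypothetical predicted ratings `[4.0, 3.0, 5.0, 2.0]` and actual ratings `[5.0, 3.0, 4.0, 4.0]`, `mae` returns 1.0 and `rmse` returns the square root of 1.5. Lower MAE and RMSE mean more accurate rating prediction, while higher precision means a better top-N recommendation list.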
1. INTRODUCTION
A recommender system is an information filtering system that predicts the ‘rating’ that a user would give to an item or social element that they have not yet considered or rated, using a model built either from the characteristics of an item (content-based approaches) or from the user's social environment (collaborative filtering approaches). Recommendation systems are used by e-commerce sites (like Amazon, eBay), social media (like Facebook, Twitter, LinkedIn) and Internet radio services (like Pandora Radio). Amazon uses an item-based collaborative filtering approach for recommendation. These systems use collaborative filtering to predict, from large amounts of data, the ratings of items that a particular user has not purchased, and provide a narrow set of suggestions to that user. LinkedIn also makes substantial use of item-based collaborative filtering. As an example, each member's profile on LinkedIn has a "People Who Viewed This Profile Also Viewed" recommendation module. Because of these large data sets, a recommendation system faces a scalability problem, so there is a strong need for Big Data analytics.
Big data is a term that refers to data sets or combinations of data sets whose volume, variability and velocity make them difficult to be captured, managed, processed or analyzed by conventional technologies and tools, such as relational databases and desktop statistics or visualization packages, within the time necessary to make them useful [3].
V’s of Big Data [2],[3]
Volume: Volume refers to the vast amount of data generated every second. We are not talking terabytes, but zettabytes or brontobytes of data.
Velocity: The term velocity refers to the speed at which data is generated and processed.
ISO 9001:2008 Certified Journal | Page 1518