Big Data – A Review


International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395 -0056

Volume: 04 Issue: 04 | Apr -2017

p-ISSN: 2395-0072

www.irjet.net

Dipti Shikha Singh1, Garima Singh2

1 Student, Computer Science Department, Babu Banarasi Das University, Lucknow, U.P, India

2 Assistant Professor, Computer Science Department, Babu Banarasi Das University, Lucknow, U.P, India

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract - The worldwide use of the Internet and related technologies, whether for social, personal or professional purposes, is generating Big Data at an incredible speed. Big Data analysis has emerged as an important activity for many organizations. There is still debate about which tools to use, and traditional data management frameworks are ineffective with Big Data. This paper reviews several works that explain the idea of Big Data and the new technologies that support it. We also discuss the challenges that grow with the use of large data sets, and the search for the right approach to extract valuable information from them.

Key Words: Hadoop, Big Data, 5V’s, Hive, Pig, etc.

1. INTRODUCTION

Traditional data sources such as business data, together with automatically generated sensor data, social data, and data from billions of devices such as mobile phones, smartphones, laptops, and cameras, create a wealth of information. A few years ago data were measured in megabytes and gigabytes, while today they are measured in terabytes and petabytes, and this growth is expected to continue. The current rate of data generation is estimated at approximately 2.5 exabytes per day, that is, about 2,500,000 terabytes [1]. These data come from a variety of sources, including sensors that transmit meteorological data, social networking sites such as Facebook and Twitter, and digital content sites such as YouTube [1]. Gone are the days when data were generated by people and usually recorded in tabular form. Now the challenge is how to transform these unstructured data into information. Various challenges arise when a Big Data application must manage unstructured data and provide near-real-time analysis along with fault tolerance. In addition, such applications need high storage and processing capacity. The great variety and size of these data sets make them impractical for traditional data management tools and applications. Big Data therefore requires its own set of applications, tools, and frameworks.

2. BIG DATA: DEFINITION AND CHARACTERISTICS

The term Big Data seems to define data on the basis of size, but it is not limited to size: it also covers the means to analyze the data in order to make sense of it and extract valuable information. The massive size of these data goes beyond petabytes and exabytes, and the volume, velocity, and variety of the data are assessed relative to the storage capacity of an organization. Doug Laney defined the 3V model in 2001, characterizing Big Data by three basic characteristics: volume, variety, and velocity. Many organizations and professionals have extended the 3V model to a 4V model by adding a fourth V, “value”. The 4V model is in turn extended to a 5V model by the concept of veracity [1, 3].

• Volume: Refers to the size of the data. With the growth of social media, data volume is growing very fast. Machines generate large amounts of data, surpassing human-generated data. This sheer size is what is meant by large data volume.

• Velocity: Refers to the speed at which data is generated. In today's competitive world, decision makers want important information delivered in real time, within a fraction of a second. Examples include tweets on Twitter and status updates, likes, and shares on Facebook.

• Variety: Refers to the different formats in which data is generated. About 70% of the data generated today is unstructured. Before the development of Big Data, the industry did not have powerful tools to manage unstructured data. Organizations must now work not only with structured and semi-structured data such as traditional tables, flat files, and relational databases, but also with unstructured data stored as images, audio, web logs, sensor data, etc.

• Value: Refers to the ability of companies to analyze data in order to better understand key areas such as customer behavior, to provide personalized services, and to obtain information about problems that was not previously accessible. The value can therefore be viewed as the monetary worth that a data technology brings to a company or organization.

• Veracity: Refers to the accuracy or truthfulness of the data. Uncertainty in the data can arise for various reasons, including legal questions, privacy issues, duplication, etc.

© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 822
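To make the Volume scale concrete, the units named in the Introduction (megabytes through exabytes) can be related with a short conversion. The following is a minimal sketch, not part of the original paper: it assumes decimal (SI) prefixes, where each unit is 1000 times the previous one, and the names `UNITS` and `convert` are illustrative.

```python
# Decimal (SI) byte units, smallest to largest; each step is a factor of 1000.
# Assumption: decimal prefixes are used, not binary ones (1024-based).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def convert(value: float, src: str, dst: str) -> float:
    """Convert a data size from unit `src` to unit `dst` (hypothetical helper)."""
    # Express the value in bytes, then divide by the size of the target unit.
    bytes_per_src = 1000 ** UNITS.index(src)
    bytes_per_dst = 1000 ** UNITS.index(dst)
    return value * bytes_per_src / bytes_per_dst


if __name__ == "__main__":
    # The 2.5 exabytes per day cited in the Introduction, in smaller units.
    print(convert(2.5, "EB", "PB"))  # 2500.0
    print(convert(2.5, "EB", "TB"))  # 2500000.0
```

Under this convention, one exabyte corresponds to 1,000,000 terabytes, so 2.5 exabytes per day is 2,500,000 terabytes per day.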

