International Research Journal of Engineering and Technology (IRJET) Volume: 04 Issue: 02 | Feb 2017
www.irjet.net
e-ISSN: 2395-0056 p-ISSN: 2395-0072
Big Data Processing with Hadoop: A Review
Gayathri Ravichandran
Student, Department of Computer Science, M.S Ramaiah Institute of Technology, Bangalore, India
Abstract - We live in an era where data is being generated by everything around us. The rate of data generation is so high that it has created a pressing need for easy and cost-effective data storage and retrieval mechanisms. Furthermore, big data needs to be analyzed for insights and attribute relationships, which can lead to better decision-making and more efficient business strategies. In this paper, we give a formal definition of Big Data and look into its industrial applications. We then examine why traditional mechanisms prove inadequate for processing big data, given its sheer volume, velocity and variety. Next, we describe the Hadoop architecture and its underlying functionality, including HDFS and the MapReduce framework. Finally, we review the Hadoop ecosystem and explain each of its components in detail.

Key Words: Big Data, Hadoop, MapReduce, Hadoop Components, HDFS

1. INTRODUCTION

1.1 Big Data: Definition

Big data is a collection of large datasets (structured, unstructured or semi-structured) that are generated from multiple sources at a very high rate. The key enablers of the growth of big data are increasing storage capacity, increasing processing power and the availability of data. It is therefore important to develop mechanisms for easy storage and retrieval. Some of the fields that come under the umbrella of big data are stock exchange data (which includes buying and selling decisions), social media data (e.g., Facebook and Twitter), power grid data (which contains information about the power consumed by each node in a power station) and search engine data (e.g., Google). Structured data includes data held in relational database systems such as MySQL. Unstructured data includes text files in .doc and .pdf formats, as well as media files.

1.2 Benefits of Big Data

Analysis of big data helps in analyzing business trends, finding innovative solutions, customer profiling and sentiment analysis. It also helps in identifying the root causes of failures and re-evaluating risk portfolios. In addition, it helps personalize customer interactions.

1. Valuable Insights

Valuable insights can be derived from big datasets by employing proper tools and methodologies. This data includes data stored in the company database, as well as data obtained from social media and other third-party sources. When the data is processed and analyzed, one can draw valuable relationships between various attributes, which can improve the quality of decision-making. Statistics and industry knowledge can be combined to obtain useful insights.

2. New Products and Services

Analyzing big data helps an organization understand how customers perceive its products and services. This aids in developing new products that are aligned with customer needs and demands. It also facilitates the redevelopment of existing products to suit customer requirements.

3. Smart Cities

Population increase begets demand. To help cities deal with the consequences of rapid expansion, big data is being used for the benefit of citizens and the environment. For example, the city of Portland, Oregon adopted a mechanism for optimizing traffic signals in response to high congestion. This not only reduced traffic jams in the city, but also helped eliminate 157,000 metric tons of carbon dioxide emissions.

4. Risk Analysis

Risk is defined as the probability of injury or loss. Risk management is a crucial process that is often overlooked. Frequent analysis of data helps mitigate potential risks. Predictive analysis helps the organization keep up to date with recent technologies, services and products. It also identifies the risks involved and how they can be mitigated.

5. Miscellaneous

Big data also aids media, government, technology, scientific research and healthcare in making crucial decisions and predictions. For example, Google Flu Trends (GFT) provided estimates of influenza activity for more than 25 countries and made accurate predictions about flu activity.

1.3 Challenges of Big Data

1. Volume
Data is being generated at an alarming rate. The sheer volume of data being generated makes the issue of data
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 448