International Research Journal of Engineering and Technology (IRJET)
e-ISSN: 2395-0056
Volume: 04 Issue: 04 | Apr-2017
p-ISSN: 2395-0072
www.irjet.net
Polyglot Persistence on Oracle Cloud using Hadoop Map Reduce
Ms. Namrata Rawal1, Ms. Vatika Sharma2
1Research Scholar, Network Security, GTU PG School, Ahmedabad, Gujarat, India
2Developer, Hadoop Technology, I-verve Infoweb Company Ahmedabad, Gujarat, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - Handling Big Data means handling huge databases. Multiple data stores on multiple platforms cannot easily be handled at the same time, so polyglot persistence was introduced. The term describes the use of different data storage technologies to handle multiple data stores at the same time. This paper focuses on polyglot persistence with MapReduce on Oracle Cloud, because research on applying multiple storage technologies on the cloud is limited. We apply polyglot persistence (handling multiple databases on multiple platforms at the same time) and analyze it on Hadoop, a distributed framework for handling big data. Hadoop lets us work on multiple nodes at a time by creating master and slave nodes. We also analyze performance metrics (response time and complexity).
Key Words: Datastores, polyglot persistence, map reduce, KVstore, Cygwin.
1. INTRODUCTION
Big data cannot be handled by traditional data management systems, so technologies like Hadoop are used for managing large, distributed data. Hadoop is an open source framework that offers features such as the Hadoop Distributed File System (HDFS) and MapReduce [1], which are reliable for big data. However, Hadoop is not able to handle multiple datastores (SQL and NoSQL datastores) at one time [3], so the term polyglot persistence is used. Polyglot persistence means the simultaneous use of SQL and NoSQL (key-value, column-oriented, document-oriented and graph-based) datastores, and it is gradually becoming popular in application development [4]. In this research we apply Hadoop technology to polyglot persistence to obtain a distributed environment and to handle multiple datastores (SQL and NoSQL) at one time on Oracle Cloud. Based on that, we create a bank simulator and analyze its performance.
2. BACKGROUND AND RELATED WORK
2.1 Motivation to Hadoop
A great deal of research has been done on handling big data with Hadoop, which provides distributed storage and processing of huge data using the Hadoop Distributed File System and MapReduce [1]. Hadoop addresses various challenges of big data such as scalability, unstructured data, accessibility, real-time analytics, fault tolerance and many more [2]. A Hadoop cluster is used to store huge amounts of unstructured data in a distributed environment [5].
2.2. Polyglot persistence
Modern applications mostly use distributed data [6], and the volume of data being sent increases day by day, so managing data is not an easy task. The data is heterogeneous, and requirements such as horizontal scalability, schema flexibility and failure safety have become indispensable for application development. Managing distributed data with only a single SQL or NoSQL database is not possible; polyglot persistence overcomes this problem by allowing multiple datastores to be handled simultaneously. As an alternative to a one-fits-all database, it may improve development productivity and performance.
2.3. Hadoop with Polyglot Persistence
2.3.1 Oracle NoSQL Database
Oracle NoSQL Database is a distributed key-value NoSQL database from Oracle Corporation [7]. It provides transactional semantics and horizontal scalability. Its model is very simple: unlike an SQL database, no specific query language is required to manage data in the database. NoSQL databases are based on the CAP theorem and BASE properties, which are very reliable for transaction processing [7].
2.3.2. Advantages
a. Distributed environment
b. Handling big data
c. Fast processing
d. Very low expenses
e. Scalable and flexible
2.3.3. Disadvantages
a. Cluster management is a difficult task
b. More management is required as data increases
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 2130
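The combination of polyglot persistence and MapReduce described in Sections 2.2 and 2.3 can be illustrated with a minimal sketch. It is not the implementation used in this paper: an in-memory SQLite database stands in for the SQL datastore, a plain Python dict stands in for a key-value store such as Oracle NoSQL Database, and the map, shuffle and reduce steps run in a single process instead of on a Hadoop cluster. The table layout, the "account_id/txn_id" key format, the function names and the sample records are all assumptions made up for illustration.

```python
import sqlite3
from collections import defaultdict

# SQL side (assumption): account master data in an in-memory SQLite database.
sql_store = sqlite3.connect(":memory:")
sql_store.execute("CREATE TABLE accounts (account_id TEXT PRIMARY KEY, owner TEXT)")
sql_store.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("A1", "Asha"), ("A2", "Ravi")],  # made-up sample rows
)

# Key-value side (assumption): a dict standing in for a key-value store.
# Keys use a hypothetical "account_id/txn_id" layout, values are amounts.
kv_store = {
    "A1/t1": 500.0,
    "A1/t2": -120.0,
    "A2/t1": 300.0,
}


def map_phase(key, amount):
    """Emit one (account_id, amount) pair per key-value record."""
    account_id = key.split("/")[0]
    yield account_id, amount


def shuffle(pairs):
    """Group mapped values by key, as the framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(account_id, amounts):
    """Sum all transaction amounts for one account."""
    return account_id, sum(amounts)


def balances_by_owner():
    """Aggregate the key-value data, then join the result with the SQL data."""
    mapped = [
        pair
        for key, amount in kv_store.items()
        for pair in map_phase(key, amount)
    ]
    reduced = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
    rows = sql_store.execute(
        "SELECT account_id, owner FROM accounts ORDER BY account_id"
    )
    return [(owner, reduced.get(account_id, 0.0)) for account_id, owner in rows]


if __name__ == "__main__":
    for owner, balance in balances_by_owner():
        print(owner, balance)
```

The point of the sketch is only the division of labour: each datastore keeps the data that suits its model, and a map and reduce step combines them at query time. On a real cluster the map and reduce functions would run on separate nodes and the stores would be remote services.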