Fast Range Aggregate Queries for Big Data Analysis

International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 04 Issue: 03 | Mar-2017

p-ISSN: 2395-0072

www.irjet.net

FAST RANGE AGGREGATE QUERIES FOR BIG DATA ANALYSIS

M.R. ABHIMAN RAAM1, V.S. ARAVINDAKSHAN2, P.A. HARISH3, S. KARTHIK4

1,2,3,4 Department of Information Technology, Valliammai Engineering College, Tamilnadu, India

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract - In a big data environment, aggregate queries are important tools for finding individual behavior, trends, and activities in the real world. A range-aggregate query applies an aggregate function to all the tuples within a specified range. Existing approaches do not provide fast results for large datasets such as those held by banks and financial institutions, so effective methods and tools for big data analysis are needed. In the proposed system, Fast RAQ divides big data into partitions using a partitioning algorithm and generates a local estimate for each partition. When a query request is given, the algorithm obtains the result directly by combining the local estimates from all partitions into a collective result. This system applies Fast RAQ to the banking domain. The banking datasets are divided into multiple partitions and stored in different sets of the database across different places. The proposed method tracks the multiple accounts that the same user maintains in different banks, along with their transaction details. This helps in finding tax violators using their unique id.

Key Words: Hadoop, Big data, Cloud, Fast RAQ, Balanced partitioning

1. INTRODUCTION

Big data analysis is generally used to explore hidden patterns in large datasets. It provides a new approach to discovering solutions to various real-world difficulties. It is vital to provide cost-effective and time-saving methods and tools for analysis in a big data environment.

The main aim of the project is to identify tax violators in the banking sector. The transactions of users in multiple banks are tracked and monitored using Fast Range Aggregate Queries with the Balanced Partitioning algorithm. The result is obtained by summarizing the local estimates from all the partitions into a collective result.

The transaction details of different banks are taken and stored in the cloud. The datasets are partitioned before they are uploaded to the cloud storage. The algorithm partitions the datasets according to their attributes, interests, etc. In this project, the time taken to process a given query is greatly reduced. It provides efficient and accurate results for large datasets. Hadoop and MapReduce are used for storage and processing of the large datasets.

2. LITERATURE SURVEY

Xiaochun Yun, et al. [1] describe the implementation of a low-cost and fast approach for getting accurate results from queries in big data analysis.

Zhiqiang Zhang, et al. [2] proposed Hadoop online aggregation in a distributed environment. Random sampling and sample size estimation are analyzed. Two sample values are considered: (1) the user-calculated sample value and (2) the system-calculated sample value. The approach also ensures that approximate aggregation results are produced.

A. Munar, et al. [3] present work based on the highly scalable and fault-tolerant MapReduce model for large-scale databases. It uses various big data analytics to handle systems with different requirement specifications. The paper mainly focuses on providing good performance even when the database grows enormously.

Y. Shi, et al. [4] present a cloud-based system for online aggregation, which provides progressive approximate aggregate answers for both single tables and multiple joined tables.

3. PROPOSED SYSTEM

In the proposed system, the datasets are divided into partitions using the partitioning algorithm. A sample value is then obtained from each partition, and the analysis of the datasets is built from these values. When a query arrives, further cost factors in analyzing big data are network synchronization and the scanning of files in every transaction while the range-aggregate query is processed. In the proposed system, the fast range-aggregate query passes over every tuple, and counter values from the aggregated columns and sample values from the rows are calculated, so the cost of network synchronization and of scanning files can be reduced. This leads to an accurate result.

© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 2267
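As an illustration of the partition-then-merge idea described for the proposed system, the following Python sketch splits a small set of transactions into partitions, computes a local sum per user in each partition for a queried day range, and adds the local results together. It is a minimal sketch under stated assumptions, not the Fast RAQ implementation: the Transaction fields, the round-robin partitioning, the day-based range, and the threshold are all hypothetical, and exact local sums stand in for the sample-based local estimates.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    # All field names are hypothetical, chosen only for this illustration.
    user_id: str   # unique id shared by a user's accounts across banks
    bank: str
    day: int       # day index, used here as the range attribute
    amount: float


def partition(records, num_partitions):
    """Round-robin split, so partition sizes differ by at most one record."""
    parts = [[] for _ in range(num_partitions)]
    for i, rec in enumerate(records):
        parts[i % num_partitions].append(rec)
    return parts


def local_sums(part, day_lo, day_hi):
    """Per-partition step: total amount per user for day_lo <= day <= day_hi."""
    totals = defaultdict(float)
    for rec in part:
        if day_lo <= rec.day <= day_hi:
            totals[rec.user_id] += rec.amount
    return totals


def range_sum_by_user(parts, day_lo, day_hi):
    """Merge step: add up the local results from every partition."""
    merged = defaultdict(float)
    for part in parts:
        for user_id, subtotal in local_sums(part, day_lo, day_hi).items():
            merged[user_id] += subtotal
    return dict(merged)


def flag_users(totals, threshold):
    """Return the user ids whose combined total exceeds the threshold."""
    return sorted(u for u, t in totals.items() if t > threshold)


records = [
    Transaction("U1", "BankA", 1, 400.0),
    Transaction("U1", "BankB", 2, 700.0),
    Transaction("U2", "BankA", 2, 300.0),
    Transaction("U1", "BankC", 9, 900.0),  # outside the queried range
    Transaction("U2", "BankB", 3, 200.0),
]
parts = partition(records, 3)
totals = range_sum_by_user(parts, day_lo=1, day_hi=5)
flagged = flag_users(totals, threshold=1000.0)
print(totals, flagged)
```

In this sketch each partition is processed independently, so the per-partition step could run on separate nodes (for example as map tasks) and only the small per-user subtotals would need to be sent to the merge step.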
