Mining Big Data using Genetic Algorithm

International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 04 Issue: 07 | July-2017

p-ISSN: 2395-0072

www.irjet.net

Surbhi Jain

Assistant Professor, Department of Computer Science, India

Abstract – In today’s era, the amount of data available in the world is growing at a very rapid pace because of the use of the internet, smartphones, social networks, etc. This collection of large and complex data sets is referred to as Big Data. Traditional database systems are unable to capture, store and analyse this large amount of data. It is necessary to improve text processing so that previously unknown information or relevant knowledge can be mined from the text. This paper argues the need for an algorithm for the clustering problem of big data that combines the genetic algorithm with some of the known clustering algorithms. The main idea is to combine the advantages of genetic algorithms and clustering to process large amounts of data. A genetic algorithm is an algorithm used to optimize results. This paper gives an overview of concepts such as data mining, genetic algorithms and big data.

Key Words: Genetic Algorithms, Big Data, Clustering, Chromosomes, Mining

1. INTRODUCTION

In the current Big Data age, data is becoming more and more available owing to advances in information and communication technology, and enterprises are gaining meaningful information, relevant knowledge and insight from this huge volume of data to support decision making. Big data mining is the ability to extract valuable information from huge and complex data sets or data streams, i.e. Big Data. One of the important data mining techniques for big data analysis is clustering. Applying clustering techniques to big data is difficult because of the enormous amount of data arriving on a daily basis. Many clustering techniques are available, the most common of which is the K-means algorithm, which is used to analyze information from a dataset. However, because of the sheer volume of data now available, the existing clustering algorithms are not very efficient. As Big Data refers to terabytes and petabytes of data, clustering it with these algorithms incurs high computational costs. We can think of designing an algorithm which combines the features of some of the clustering algorithms and the genetic algorithm to process big data.

Mining is the process of extracting meaningful information from source data. It is a set of computerized techniques used to extract previously unknown or buried information from large databases. Successful data mining makes it possible to uncover patterns and relationships, and then to use this “new” information for making proactive, knowledge-driven business decisions. Many algorithms are used for mining information from plain text. Genetic algorithms are the algorithms used to solve optimization problems. These algorithms work on search-based inputs and eventually generate useful solutions for such problems.

2. GENETIC ALGORITHMS

Genetic Algorithms are a family of computational models inspired by Darwin’s theory of evolution. According to Darwin, the species which are fittest and can adapt to changing surroundings survive; the rest tend to die away. Darwin also stated that “the survival of an organism can be maintained through the process of reproduction, crossover and mutation”. The basic working mechanism of a GA is as follows: the algorithm starts with a set of solutions (represented by chromosomes) called the population. Solutions from one population are taken and used to form a new population (reproduction). This is driven by optimism that the new population will be superior to the old one, which is why GAs are often termed optimistic search algorithms. The reproductive opportunities are distributed in such a way that chromosomes which represent a better solution to the target problem are given more chances to reproduce than those which represent inferior solutions.

Impact Factor value: 5.181

Genetic algorithms search through a huge number of parameter combinations to find the best match. For example, they can search through different combinations of materials and designs to find the combination that results in a stronger, lighter and overall better final product.
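The working mechanism described in this section (a population of chromosomes, reproduction that favours fitter chromosomes, crossover and mutation) can be illustrated with a minimal sketch. This is not the algorithm proposed in this paper: the bit-string encoding, the toy fitness function (number of 1-bits), the function names, and all parameter values are assumptions chosen only to keep the example short and runnable.

```python
import random


def fitness(chromosome):
    # Toy objective (an assumption for illustration): number of 1-bits.
    return sum(chromosome)


def select(population, rng):
    # Fitness-proportionate selection: fitter chromosomes get more chances
    # to reproduce. The +1 keeps every weight strictly positive.
    weights = [fitness(c) + 1 for c in population]
    return rng.choices(population, weights=weights, k=2)


def crossover(parent_a, parent_b, rng):
    # Single-point crossover: prefix from one parent, suffix from the other.
    point = rng.randrange(1, len(parent_a))
    return parent_a[:point] + parent_b[point:]


def mutate(chromosome, rate, rng):
    # Flip each bit independently with probability `rate`.
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]


def run_ga(n_bits=20, pop_size=30, generations=60, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Start with a random population of candidate solutions (chromosomes).
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Form a new population from the old one (reproduction).
        new_population = []
        for _ in range(pop_size):
            parent_a, parent_b = select(population, rng)
            child = crossover(parent_a, parent_b, rng)
            new_population.append(mutate(child, mutation_rate, rng))
        population = new_population
        # Remember the best chromosome seen so far.
        best = max(population + [best], key=fitness)
    return best


if __name__ == "__main__":
    solution = run_ga()
    print(solution, fitness(solution))
```

For a real problem such as the materials-and-designs example, only the chromosome encoding and the fitness function would need to change; the selection, crossover and mutation steps would keep the same shape.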

ISO 9001:2008 Certified Journal | Page 743
