Review of Existing Methods in K-means Clustering Algorithm

International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056 | p-ISSN: 2395-0072

Volume: 04 Issue: 02 | Feb 2017

www.irjet.net

Ms. Kavita Shiudkar¹, Prof. Sachine Takmare²

¹ ME CSE, Bharti Vidyapeeth College of Engineering, Kolhapur, Maharashtra, India

² Assistant Professor, Dept. of CSE, Bharti Vidyapeeth College of Engineering, Kolhapur, Maharashtra, India

Abstract – Data mining is the process of extracting useful information from large amounts of data and converting it into an understandable form for further use. Clustering is the process of grouping data objects by their attributes and features so that objects in the same group are more similar to one another than to objects in other groups. Clustering has become very challenging because of the sharp increase in the volume of data generated by many applications. K-means is a simple and widely used algorithm for clustering data. However, traditional k-means is computationally expensive, is sensitive to outliers (atypical or noisy data points), and produces unstable results, so it becomes inefficient when dealing with very large datasets. Solving these issues is the subject of many recent research works. In this paper, we review k-means clustering algorithms.

Key Words: Initial Centroids, Clustering, Data mining, Data sets, K-means clustering, MapReduce.

1. INTRODUCTION

Big Data is an evolving term that describes any voluminous amount of structured, semi-structured, and unstructured data. It is characterized by the "5Vs": volume (size of the data set), variety (range of data types and sources), velocity (speed of data in and out), value (how useful the data is), and veracity (quality of the data). Big Data creates challenges in data collection, processing, management, and analysis. Because new data and updates arrive constantly, data mining is needed to tackle these challenges. The purpose of data mining is to extract information from a large data set and transform it into an understandable form for further use. Data mining is also known as knowledge discovery in databases (KDD). Technically, data mining is the process of finding patterns among many fields in large relational databases; it is what distinguishes useful information from raw data. Data mining involves extracting, transforming, and loading transaction data into a data warehouse system; storing and managing the data in a multidimensional database system; providing data access to business analysts and information technology professionals; analyzing the data with application software; and presenting the data in a useful format, such as a graph or table.

2. CLUSTERING

Clustering plays an important role in data analysis and data mining applications. It divides data into groups of similar objects based on their features, so that each group (cluster) is a collection of similar objects. Clustering is a form of unsupervised learning. High-quality clusters have high intra-class similarity and low inter-class similarity. Many algorithms have been designed to perform clustering, each based on a different principle; they are divided into hierarchical, partitioning, density-based, model-based, and grid-based algorithms.

Fig 1: Clustering stages (Raw Input Data → Clustering Algorithms → Data Clusters)

The two most common types of clustering are hierarchical and partitioning clustering:

1. Hierarchical Clustering - A set of nested clusters organized in the form of a tree.

2. Partitioning Clustering - A division of data objects into subsets (clusters) such that each data object is in exactly one subset.
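The two types above, together with the quality criterion of high intra-class and low inter-class similarity, can be illustrated with a small sketch. It is not taken from any of the reviewed methods: it assumes Euclidean distance as the (dis)similarity measure and uses single-linkage agglomerative clustering as one example of a hierarchical method. The function names (`single_linkage_levels`, `to_labels`, `mean_intra_inter`) and the toy data are hypothetical. Each recorded level is a flat partition, and consecutive levels are nested, so together they form the tree of a hierarchical clustering; cutting the tree at two clusters gives a partitioning in which every object is in exactly one subset.

```python
import math
from itertools import combinations

# Illustrative sketch only: Euclidean distance stands in for (dis)similarity,
# and single linkage is just one of several hierarchical strategies.


def single_linkage_levels(points):
    """Agglomerative (hierarchical) clustering with single linkage.

    Starts from singleton clusters and repeatedly merges the two closest
    clusters, recording every intermediate partition. Consecutive levels are
    nested, so together they form a tree (dendrogram)."""
    clusters = [frozenset([i]) for i in range(len(points))]
    levels = [list(clusters)]

    def linkage(pair):
        # Single linkage: distance between the closest members of two clusters.
        a, b = pair
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        levels.append(list(clusters))
    return levels


def to_labels(partition, n):
    """Flat (partitioning) view: each object must get exactly one cluster label."""
    labels = [None] * n
    for label, cluster in enumerate(partition):
        for i in cluster:
            if labels[i] is not None:
                raise ValueError(f"object {i} is in more than one cluster")
            labels[i] = label
    if None in labels:
        raise ValueError("some object is in no cluster")
    return labels


def mean_intra_inter(points, labels):
    """Mean distance over same-cluster pairs vs. different-cluster pairs.

    Small intra distance ~ high intra-class similarity;
    large inter distance ~ low inter-class similarity."""
    intra, inter = [], []
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        (intra if labels[i] == labels[j] else inter).append(d)
    return sum(intra) / len(intra), sum(inter) / len(inter)


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of three points.
    points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    levels = single_linkage_levels(points)

    # Cutting the tree at two clusters yields a flat partition.
    labels = to_labels(levels[-2], len(points))
    intra, inter = mean_intra_inter(points, labels)
    print("labels:", labels)
    print(f"mean intra-cluster distance: {intra:.2f}")
    print(f"mean inter-cluster distance: {inter:.2f}")
```

On this toy data the two-cluster cut separates the two groups of three points, and the mean intra-cluster distance is much smaller than the mean inter-cluster distance. Because single linkage re-examines every pair of clusters at each merge, this version is only practical for very small inputs.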

3. K-MEANS CLUSTERING

K-means is one of the most widely used clustering algorithms in scientific and industrial applications. It is a method of cluster analysis that partitions N objects into k clusters in such a way that each object belongs to the cluster
