Cancer data partitioning with data structure and difficulty independent clustering scheme

International Research Journal of Engineering and Technology (IRJET) Volume: 04 Issue: 02 | Feb-2017

www.irjet.net

e-ISSN: 2395-0056 p-ISSN: 2395-0072

K. R. Kavitha1, G. Angeline Prasanna2

1 Research Scholar, Department of Computer Science, Kaamadhenu Arts and Science College, Tamilnadu, India
2 Head and Assistant Professor, Dept. of Computer Application & IT, Kaamadhenu Arts and Science College, Sathy,

--------------------------------------------------------------------***--------------------------------------------------------------------

Abstract - Hidden knowledge extraction is the main operation of data mining applications. Decision-making processes are carried out with the support of the discovered knowledge. Relevant records are grouped by using clustering methods. Cancer diagnosis data values are maintained in a high-dimensional model. Microarray data models are adapted to process the high-dimensional data values. Distance measures are estimated to identify the record relationship levels. The cluster representative elements are referred to as cluster ensembles. All the relationship analysis is carried out through the ensemble analysis mechanism. A cluster ensemble consolidates the transactions of the individual cluster results. Distributed computing, knowledge reuse, and quality and robustness are the key features of cluster ensemble models. The ensemble members are fetched using the Incremental Ensemble Membership Selection (IEMS) scheme. The clustering operations are performed with the Incremental Semi-Supervised Cluster Ensemble (ISSCE) framework. The cancer expressions are compared using Similarity Functions (SF). The ISSCE scheme depends on the data and its structure. The cancer data partitioning process uses breast cancer data values. Noisy data removal and missing value replacement are carried out during data preprocessing. The Dynamic Ensemble Membership Selection (DEMS) scheme is built to support a clustering process that is independent of data structure and complexity. Data clustering operations are performed through the Partition Around Medoids (PAM) clustering technique. The PAM clustering technique and the DEMS scheme are combined to handle the ensemble-based data partitioning process. The clustering accuracy level is increased in the healthcare data partitioning process.

Key Words: ISSCE (Incremental Semi-Supervised Cluster Ensemble), IEMS (Incremental Ensemble Membership Selection), SF (Similarity Function), DEMS (Dynamic Ensemble Membership Selection), PAM (Partition Around Medoids).

Impact Factor value: 5.181

1. INTRODUCTION

1.1 Clustering Concepts

Clustering is the classification of objects into different groups, or more precisely, the partitioning of a data set into subsets, so that the data in each subset share some common trait, often proximity according to some defined distance measure. Data clustering is a common technique for statistical data analysis and is used in many fields, including machine learning, data mining, pattern recognition, image analysis and bioinformatics. Homogeneous clusters can be guaranteed by breaking apart any non-homogeneous cluster into smaller clusters that are homogeneous. Clustering is used mostly for consolidating data into a high-level view and for general grouping of records into like behaviours. Space is defined as default n-dimensional space, or is defined by the user, or is a predefined space driven by part. Besides the term data clustering, there are a number of terms with similar meanings, including cluster analysis, automatic classification, numerical taxonomy, botryology and typological analysis.

Clustering is called an unsupervised learning technique. It is unsupervised in the sense that, when it is run, the models are not created for a particular purpose such as prediction. In clustering, there is no particular sense of why certain records are near each other or why they all fall into the same cluster.
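The idea of partitioning records by proximity under a distance measure can be illustrated with a small medoid-based sketch, in the spirit of the Partition Around Medoids (PAM) technique named in the abstract. This is not the authors' implementation: the Euclidean distance, the naive choice of the first k points as starting medoids, the first-improvement swap rule, and the function names are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length tuples (assumed measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def total_cost(points, medoids, dist=euclidean):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def assign(points, medoids, dist=euclidean):
    """For each point, return the index of its nearest medoid."""
    return [min(medoids, key=lambda m: dist(p, points[m])) for p in points]

def k_medoids(points, k, dist=euclidean):
    """Greedy medoid swapping: accept any swap that lowers the total cost."""
    medoids = list(range(k))              # naive start: the first k points
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                # Try replacing the i-th medoid with a non-medoid point.
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial, dist)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    return medoids, assign(points, medoids, dist)

# Hypothetical two-dimensional records forming two visible groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]
medoids, labels = k_medoids(points, 2)
print(medoids, labels)
```

Each label is the index of the medoid that represents the record's cluster, so records sharing a label fall in the same subset. Because medoids are actual records, any distance or similarity function can be substituted for `euclidean` without changing the rest of the sketch.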

Use of Clustering in Data Mining

Clustering is often one of the first steps in data mining analysis. It identifies groups of related records that can be used as a starting point for exploring further relationships. This technique supports the development of population segmentation models, such as demographic-based customer segmentation. A company that sells a variety of products may need to examine the sales of all its products in order to find out which products sell extensively and which are lagging. This is done by data mining techniques. However, if the system clusters the products that yield fewer sales, then only that cluster needs to be checked, rather than comparing the sales values of all the products. Clustering thus facilitates the mining process.
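The product example can be sketched as a one-dimensional, two-cluster split of sales figures, so that only the low-sales group needs further inspection. The product names, the sales numbers, and the use of a simple two-centre k-means are assumptions for illustration only; the paper does not prescribe a method for this example.

```python
def low_sales_cluster(sales, max_iters=100):
    """Split products into low/high sales groups with 1-D two-centre k-means.

    `sales` maps product name -> units sold. Returns the sorted names of the
    products that end up nearest the low-sales centre.
    """
    values = list(sales.values())
    lo, hi = min(values), max(values)      # initial centres: the extremes
    for _ in range(max_iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        new_lo = sum(low) / len(low)       # low is never empty: min(values) is in it
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):   # centres stopped moving
            break
        lo, hi = new_lo, new_hi
    return sorted(p for p, v in sales.items() if abs(v - lo) <= abs(v - hi))

# Hypothetical sales data (units sold per product).
sales = {"pens": 950, "notebooks": 870, "staplers": 40, "rulers": 65, "markers": 910}
print(low_sales_cluster(sales))
```

With the hypothetical figures above, only the products in the returned group would need to be reviewed, rather than comparing every product against every other.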

ISO 9001:2008 Certified Journal | Page 1784
