International Research Journal of Engineering and Technology (IRJET) Volume: 04 Issue: 02 | Feb 2017
www.irjet.net
e-ISSN: 2395-0056 p-ISSN: 2395-0072
A Firefly-Based Improved Clustering Algorithm
Priyanka Singhai, Prof. Abhey Kothari, Mr. Rahul Moriwal
M.Tech, Computer Science & Engineering, Acropolis Institute of Technology & Research, Indore, M.P., India
Abstract—Computing systems need methods that store and handle data effectively. Data mining techniques are therefore used to evaluate data and extract meaningful patterns that reveal hidden knowledge. This work investigates cluster analysis, a technique that analyses data in an unsupervised manner and divides it into distinct groups according to user inputs. The primary objective is high similarity among the elements placed in the same group, and meeting this objective is what yields good performance from a clustering algorithm. The work studies clustering algorithms in detail and addresses the issues that must be resolved to obtain good clusters. It then adopts a clustering algorithm based on the firefly optimization algorithm. Because this technique suffers from long running times, an improved clustering algorithm that combines the k-means algorithm with the firefly algorithm is proposed. The proposed technique simplifies centroid selection, models the data efficiently and accurately, and is intended to reduce the processing time of the algorithm. It is implemented in the Visual Studio environment and compared with the traditional firefly algorithm, using accuracy, error rate and resource consumption as the primary performance parameters. The experimental results show high performance in data evaluation and accurate cluster formation.
Keywords—data mining, cluster analysis, performance improvement, firefly algorithm, k-means.
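The abstract does not spell out how k-means and the firefly algorithm are combined, so the following is only a minimal sketch under our own assumptions, not the authors' Visual Studio implementation. Here k-means (Lloyd's algorithm) supplies the initial centroids, and a standard firefly search refines them: each firefly encodes all k centroids as one flat vector, its brightness is the negative within-cluster sum of squared errors (SSE), and a dimmer firefly moves toward a brighter one with attractiveness beta0 * exp(-gamma * r^2) plus a random step alpha. The function names (`kmeans`, `firefly_refine`, `sse`, `assign`), the SSE objective, the perturbation-based initial swarm and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid (lower is better)."""
    return sum(min(sum((p - c) ** 2 for p, c in zip(pt, cen)) for cen in centroids)
               for pt in points)


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centroids[j])))
            for pt in points]


def kmeans(points, k, iters=20, rng=None):
    """Plain Lloyd's k-means, used here only to seed the firefly swarm (assumption)."""
    rng = rng or random.Random(0)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def firefly_refine(points, seed_centroids, n_fireflies=10, iters=30,
                   beta0=1.0, gamma=0.1, alpha=0.2, rng=None):
    """Refine k-means centroids with a basic firefly search; returns (centroids, SSE)."""
    rng = rng or random.Random(1)
    k, d = len(seed_centroids), len(seed_centroids[0])
    flat = [v for c in seed_centroids for v in c]
    # Hypothetical swarm: firefly 0 is the k-means solution, the rest are noisy copies of it.
    swarm = [flat[:]] + [[v + rng.gauss(0, 1.0) for v in flat]
                         for _ in range(n_fireflies - 1)]

    def cost(x):
        # Unflatten the firefly into k centroids of dimension d and score it.
        return sse(points, [x[j * d:(j + 1) * d] for j in range(k)])

    costs = [cost(x) for x in swarm]
    best_i = min(range(n_fireflies), key=costs.__getitem__)
    best, best_cost = swarm[best_i][:], costs[best_i]
    for _ in range(iters):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if costs[j] < costs[i]:  # j is brighter (lower SSE), so i moves toward j
                    r2 = sum((a - b) ** 2 for a, b in zip(swarm[i], swarm[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    swarm[i] = [a + beta * (b - a) + alpha * (rng.random() - 0.5)
                                for a, b in zip(swarm[i], swarm[j])]
                    costs[i] = cost(swarm[i])
                    if costs[i] < best_cost:  # keep the best solution seen so far
                        best, best_cost = swarm[i][:], costs[i]
        alpha *= 0.97  # shrink the random step over time (common heuristic, assumed here)
    return [best[j * d:(j + 1) * d] for j in range(k)], best_cost


if __name__ == "__main__":
    rng = random.Random(42)
    centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
            for cx, cy in centers for _ in range(20)]
    km = kmeans(data, 3)
    refined, refined_cost = firefly_refine(data, km)
    print(f"k-means SSE: {sse(data, km):.3f}  firefly-refined SSE: {refined_cost:.3f}")
```

Because the k-means solution is placed in the initial swarm and the best firefly ever seen is retained, the refined SSE in this sketch can never be worse than the k-means seed; whether it is better depends on the data and on the assumed values of `beta0`, `gamma` and `alpha`.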
1. INTRODUCTION
Data mining is a domain of automatic data analysis. Two approaches are used to evaluate data: supervised learning and unsupervised learning. This work uses the unsupervised learning approach for investigation and demonstration. Data mining is a technique for analysing data and extracting meaningful information for real-world applications. Extracting such information from a raw data set requires a computational data model that evaluates the data against certain criteria and returns the matching data that the application requires. The evaluation can be performed either with or without a supervisor. In machine learning and data mining, the supervisor is the labelled data supplied for analysis, and the class labels are used to keep track of the learning process. Most supervised learning algorithms are classification algorithms, whereas most unsupervised learning algorithms are clustering algorithms. Supervised learning algorithms are generally more accurate than unsupervised techniques, but they can only be applied to labelled data, and the amount of such data is limited. Unsupervised techniques such as clustering are used instead when the data is unlabelled or available in large quantities. The proposed work therefore explores data clustering and the performance improvement of traditional clustering approaches, using an optimization-based technique, the firefly algorithm, for investigation and solution design. Clustering requires identifying optimal cluster centers with optimization techniques; once the centers are finalized, the data is clustered around them. The centroid selection process therefore needs to be improved first, so that the solution becomes more effective and accurate for data analysis. Data mining techniques depend directly on the data used for analysis and pattern recovery. If the size of the data is small, predefined classes exist, and the data is also refined and
© 2017, IRJET | Impact Factor value: 5.181 | ISO 9001:2008 Certified Journal | Page 215