Reviews on swarm intelligence algorithms for text document clustering by IRJET Journal

International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 09 Issue: 04 | Apr 2022

p-ISSN: 2395-0072

www.irjet.net

Reviews on swarm intelligence algorithms for text document clustering

S DHANALAKSHMI1, S SATHIYABAMA2

1,2 Department of Computer Science, Thiruvalluvar Government Arts College, Rasipuram, Tamilnadu, India

Swarm intelligence (SI) is the collective behavior of self-organized, decentralized systems whose individuals, whether intelligent or not, follow simple rules and use only limited local information to accomplish very complicated tasks. SI algorithms were developed by observing natural or artificial behaviors such as bird flocking, fish schooling, and ant foraging. Examples of SI algorithms include particle swarm optimization (PSO), the bat algorithm (BA), grey wolf optimization (GWO), the firefly algorithm (FFA), ant colony optimization (ACO), the artificial fish swarm algorithm (AFSA), and artificial bee colony optimization (ABC) [10].

Abstract - Text clustering is an unsupervised learning technique that divides a large number of text documents into a small number of clusters. Each cluster contains similar documents, while different clusters contain dissimilar documents. Swarm intelligence (SI) optimization algorithms have been used effectively to solve various optimization problems, including text document clustering. This paper reviews the relevant literature on SI-based text document clustering applications, covering basic, enhanced, and hybrid variants. The main procedure of text clustering, the distance and similarity functions, and a theoretical analysis are also discussed.

By comparing the many SI optimization algorithms available for text document clustering, this research aims to give the reader an accurate overview of them. This work examines each class of algorithm and the implementation of a suitable optimization algorithm for each. This research should therefore help academics and practitioners choose methods and algorithms that are appropriate for a wide range of text clustering applications. Accordingly, this study investigates the field of SI clustering methods with the following goals:

Key Words: text mining, text clustering, swarm intelligence, optimization algorithms, data mining

1. INTRODUCTION

Clustering is a general text mining technique for representing a dataset with a limited set of clusters, sometimes a fixed number of them, based on similarities between its elements [1, 2]. Partitioning clustering is widely applied in real-world applications including data clustering, image clustering, marketing, and bioinformatics. Text clustering divides a large number of documents into a set of related groups, and its goal is to create optimal clusters of related documents: documents within a group are similar to one another, while documents in different groups are dissimilar [3-5]. This study describes the overall technique for treating text clustering as an optimization problem, including its formulation, mathematical notation, preprocessing stages, document format, solution representation, and fitness function. This can help future studies obtain a clear, broad view of the problem [6, 7].
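To make the idea of a fitness function for text clustering concrete, the following is a minimal sketch, not a method taken from the reviewed literature. It assumes whitespace tokenization, a plain TF-IDF weighting, and a fitness defined as the average cosine similarity between each document and its cluster centroid. The function names (`tfidf_vectors`, `cosine`, `centroid`, `fitness`) and the toy documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

def tfidf_vectors(docs):
    """Turn whitespace-tokenized documents into sparse TF-IDF dicts (illustrative weighting)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def fitness(vectors, labels, k):
    """Average, over k clusters, of the mean document-to-centroid cosine similarity."""
    score = 0.0
    for c in range(k):
        members = [v for v, lab in zip(vectors, labels) if lab == c]
        if not members:
            continue  # an empty cluster contributes nothing to the score
        cen = centroid(members)
        score += sum(cosine(v, cen) for v in members) / len(members)
    return score / k

# Toy corpus (assumed data): two documents about swarms, two about markets.
docs = ["ant colony swarm", "swarm ant bee",
        "stock market price", "market price trade"]
vectors = tfidf_vectors(docs)
good = fitness(vectors, [0, 0, 1, 1], k=2)  # topical grouping
bad = fitness(vectors, [0, 1, 0, 1], k=2)   # mixed grouping
```

Under these assumptions, a candidate clustering is just a list of cluster labels, and an optimizer would search for the labeling (or the set of centroids) that maximizes `fitness`. On the toy corpus the topical grouping scores higher than the mixed one.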

To discuss the pre-processing steps of text clustering.

To discuss related works that use swarm intelligence algorithms.

To provide a comprehensive classification of clustering evaluation criteria that can be used in experimental research.

To conduct a theoretical examination of each class's best representative SI optimization techniques.

The remainder of this study is organized as follows. Section 2 presents the main procedures of text clustering. Section 3 describes the variants of SI algorithms that have been employed to solve text clustering problems. Section 4 discusses the evaluation criteria used in text clustering applications. Section 5 contains a discussion and theoretical analysis. Finally, Section 6 presents the survey's findings and directions for future research.

Extracting meaningful data from documents is a difficult task that requires fast, high-quality document clustering techniques. The K-means algorithm is a simple, fast, unsupervised partitioning algorithm that is easy to parallelize and produces comprehensible results [8, 9]. However, it has drawbacks: it can become trapped in local optima, which lowers accuracy and prevents it from reaching a global solution. SI optimization algorithms can help overcome these limitations of the K-means algorithm by searching for a global optimum.
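The contrast with K-means can be illustrated with a minimal sketch of PSO-based clustering. It is an illustration under stated assumptions, not an implementation from any of the reviewed papers: each particle encodes k candidate centroids, the objective is the sum of squared Euclidean distances from each point to its nearest centroid, and the standard global-best velocity update with an inertia weight is used. The function names (`sse`, `pso_cluster`), the parameter values (`w=0.72`, `c1=c2=1.49`), and the 2-D toy points standing in for document vectors are all assumptions made for the example.

```python
import random

def sse(points, centroids):
    """Sum of squared Euclidean distances from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total

def pso_cluster(points, k, n_particles=15, n_iter=60,
                w=0.72, c1=1.49, c2=1.49, seed=0):
    """Global-best PSO over sets of k centroids (illustrative parameter values)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle holds k candidate centroids, initialised on random data points.
    pos = [[list(rng.choice(points)) for _ in range(k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in particle] for particle in pos]
    pbest_score = [sse(points, particle) for particle in pos]
    g = min(range(n_particles), key=pbest_score.__getitem__)
    gbest, gbest_score = [c[:] for c in pbest[g]], pbest_score[g]
    history = [gbest_score]  # best objective value seen after each iteration
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Inertia + pull toward personal best + pull toward global best.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            score = sse(points, pos[i])
            if score < pbest_score[i]:
                pbest[i], pbest_score[i] = [c[:] for c in pos[i]], score
                if score < gbest_score:
                    gbest, gbest_score = [c[:] for c in pos[i]], score
        history.append(gbest_score)
    return gbest, gbest_score, history

# Toy data (assumed): two well-separated groups of 2-D points.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, score, history = pso_cluster(points, k=2)
```

Because many particles explore the space of centroid sets at once and share the best solution found so far, the search depends less on a single initialization than K-means does. The sketch guarantees only that the best score never gets worse across iterations; it does not guarantee that the global optimum is reached.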


