Text Document Classification System


International Research Journal of Engineering and Technology (IRJET)
Volume: 09 Issue: 06 | June 2022 | www.irjet.net | e-ISSN: 2395-0056 | p-ISSN: 2395-0072

Sarosh Dandoti

Abstract - Document classification is needed in day-to-day activities when arranging large numbers of text documents containing various kinds of articles on different topics. Text document classification is essentially the process of assigning a category to each text document. Text classification covers a wide range of applications, from detecting the emotion in a sentence to finding the general context of a summary of an article. In this paper, however, we focus on the classification of newspaper articles in order to arrange them into different sections. The goal of this research is to design a multi-label classification model with parameter tuning to improve performance and predictions.

Keywords: Text, Classification, Document, Categorizing, Python, Model Tuning, Supervised Learning.

1. INTRODUCTION

This paper deals with text document classification using supervised machine learning algorithms. For document classification, we have used multiple ML models such as KNN, Naive Bayes, SVM, and Random Forest, and we present a comparative analysis of each model. Machine learning-based text classification is considered to be more useful for applications that classify text documents in a soft format. The importance of these applications can be seen from their use in spam filtering in email, web services, fake news detection, and opinion mining. As online articles and blogging have become popular in web-based services, text and article classification plays an important role in this field.

Text and document classification has become an important part of today's social internet media. Tweets, messages, and posts must be monitored to detect hateful speech and cyberbullying. Classifiers can be used in these areas to make sure that no content is posted that violates a social platform's rules.

Social listening and opinion classification

Businesses are interested in hearing what their consumers have to say about them. One of the most efficient methods is to use sentiment analysis to categorize social media comments and reviews based on their emotional nature. Sentiment analysis is a subset of NLP-based systems that focuses on deciphering the emotion, viewpoint, or attitude indicated in a text. Such systems can distinguish between words with positive and negative implications. This is how a business can automatically assess customer feedback on, or reactions to, its products or services. For example, a business that designs airports uses sentiment analysis to categorize criticism left on social media by tourists. Managers can use opinion mining to make better decisions, win contracts, and deliver better services.

1.1 Document Classification

Document classification is the process of labeling documents with categories, depending on their content. Document classification can be manual or automated and is used to sort and manage texts, images, or videos. There are advantages and disadvantages to both approaches. Classifying documents manually gives humans greater control over the process, and they can decide which categories to use. However, when handling large volumes of documents, this process can be slow and repetitive. Hence, it is much faster, more cost-efficient, and often more accurate to carry out document classification using machine learning. To carry out this task, we need to create a solution workflow, and the first step is to understand the data and its characteristics. Document classification can also be done using OCR, where ML models recognize the structure of the document so that it can be analyzed and segregated into different sectors. Here we perform classification based on the text of the document and not on the way it is structured. At a general level, we analyze the frequently occurring words in the documents of each category and then train the model accordingly. When a new document comes in, we compare the words in that document with the words in the training classes and then predict its label accordingly.

1.2 Text and Document Classification

Even though these terms sound very similar, there is one major difference between them. In document classification, we analyze the entire document and get a broad understanding of what the article is talking about. But we can also go deeper into the document, divide it into bits of text, and get a granular understanding of the document through these text bits and the context and emotion they represent.

© 2022, IRJET | Impact Factor value: 7.529

A document may discuss the pros and cons of a certain topic, and it is up to us to determine the level of detail we want to delve into. This was a very simple explanation. When we are working with more complex text classification problems, we require natural language processing (NLP).
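As a rough illustration of the word-frequency idea from Section 1.1 and of the document-level versus text-level distinction described above, the following Python sketch trains a very small Naive Bayes style classifier on word counts and then labels an article once as a whole and once sentence by sentence. It is not the implementation used in this paper: the class name `WordCountClassifier`, the training snippets, the labels, and the sample article are all hypothetical and chosen only to keep the example self-contained.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class WordCountClassifier:
    """Tiny multinomial Naive Bayes over word counts (illustrative only)."""

    def fit(self, documents, labels):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter(labels)        # label -> number of documents
        for text, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        tokens = tokenize(text)
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus log likelihood of each word, add-one smoothed.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(self.vocabulary)
            for word in tokens:
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical snippets standing in for labeled newspaper articles.
TRAIN_DOCS = [
    "The team won the match after scoring a late goal",
    "The striker scored twice and the coach praised the players",
    "The company reported strong quarterly profit and rising shares",
    "Investors watched the stock market as the bank raised rates",
]
TRAIN_LABELS = ["sports", "sports", "business", "business"]

model = WordCountClassifier().fit(TRAIN_DOCS, TRAIN_LABELS)

article = (
    "The club won the final match with a late goal from the striker. "
    "Following the win, the club's shares rose on the stock market."
)

# Document-level view: one label for the whole article.
document_label = model.predict(article)

# Text-level view: one label per sentence of the same article.
sentences = [s for s in re.split(r"(?<=[.!?])\s+", article) if s]
sentence_labels = [model.predict(s) for s in sentences]

print("document:", document_label)
for sentence, label in zip(sentences, sentence_labels):
    print(label, "<-", sentence)
```

With this toy data the article as a whole is labeled sports, while its second sentence on its own is labeled business, which is the kind of finer detail that text-level classification can expose. A realistic system would need far more training data and proper preprocessing.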

ISO 9001:2008 Certified Journal | Page 2847

