International Research Journal of Engineering and Technology (IRJET) Volume: 09 Issue: 11 | Nov 2022
www.irjet.net
e-ISSN: 2395-0056 p-ISSN: 2395-0072
An in-depth review on News Classification through NLP
Labade Saurabh Ravindra1, Thorve Rushikesh Vallabh2, Wavhal Ruturaj Sagar3, Prof. M. G. Sinalkar4
1,2,3Department of Computer Engineering, Jaihind College of Engineering, Pune, Maharashtra, India
4Asst. Professor, Department of Computer Engineering, Jaihind College of Engineering, Pune, Maharashtra, India
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - An ever-increasing number of people in the world depend on online news outlets as their major source of everyday information and current events. As more people discover how useful digital data can be, both the amount of information available and the frequency with which it is accessed are expected to grow considerably. With so many publishers producing massive amounts of data, it can be challenging for average consumers to find what they need. Current search engines, meanwhile, return so many items that only a small fraction of them are relevant to the user's request. It may therefore be helpful to layer search engines with a classifier, that is, an algorithm designed to sort large amounts of data into predetermined categories. The current methodologies reviewed in this survey paper show considerable inconsistencies in realizing an effective news classification approach. There is thus a need for an effective news classification approach that combines Natural Language Processing and feature extraction with Decision Making and a fuzzy list. This approach will be defined in detail in future research on this paradigm.
Key Words: Natural Language Processing, Feature Extraction, Decision Making, Fuzzy List.
1. INTRODUCTION
As computing power and connectivity have improved over the last several years, so too has the volume of online data. Most people now get their news from the big news portals. The growing amount of news content, unfortunately, poses significant difficulties for those portals. Text categorization techniques that have been in use for decades can no longer satisfy the requirements of modern society. As a result, the development of text categorization models has recently become more important in the area of knowledge discovery. An automated news text categorization technique can process all text input and predict category labels with impressive speed and accuracy. Automated classification may therefore provide a cost-effective way for a media outlet to complete the text categorization job. The study of automated text categorization is becoming more significant in the age of big data.
In today's information age, there are many online resources, such as Yandex, Bing, and others, from which anybody may access a wide range of data. In general, the content of such portals is organized into distinct subcategories for the convenience of viewers, so that anyone may quickly and easily find the specific pieces of news and data that most interest them. These categories might include "economic," "educational," "sports," and so on. Even so, various categories contain numerous news stories that are completely unrelated to the topic at hand. In recent years, several studies have been presented for the purpose of news categorization. Researchers in this area have used a wide range of taxonomical approaches to analyze text in their native languages.
As more people discover how useful digital information can be, both the volume of information and the frequency with which it is accessed are expected to soar. With the massive amounts of data being produced by a plethora of publishers, it may be challenging for average consumers to get the information they need. Current search engines return so many items that only a small fraction of them are relevant to user queries. As a result, it might be helpful to layer search engines with a classification model, an algorithm created to sort large amounts of data into predetermined categories. Multiple approaches exist that can accurately categorize an English text. The majority of these algorithms can be broken down into four stages: text pre-processing, feature extraction, categorization, and evaluation of effectiveness.
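As a rough illustration of the four stages named in this section (text pre-processing, feature extraction, categorization, and evaluation), the following minimal Python sketch may help. It is not a method taken from any of the surveyed papers: the stopword list, the toy headlines, the smoothed TF-IDF weighting, and the nearest-centroid classifier are all assumptions chosen for brevity, and every function name is hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Assumption: a tiny illustrative stopword list, not a standard one.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "for", "as", "after", "by"}

def preprocess(text):
    """Stage 1: lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def fit_idf(token_lists):
    """Learn a smoothed inverse document frequency from the training documents."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}

def tfidf(tokens, idf):
    """Stage 2: unit-length TF-IDF vector; terms unseen in training are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def cosine(u, v):
    """Dot product of two sparse unit-length vectors, i.e. their cosine similarity."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def train(labelled_docs):
    """Stage 3 (training): build one unit-length centroid vector per category."""
    token_lists = [preprocess(text) for text, _ in labelled_docs]
    idf = fit_idf(token_lists)
    sums = defaultdict(lambda: defaultdict(float))
    for tokens, (_, label) in zip(token_lists, labelled_docs):
        for term, weight in tfidf(tokens, idf).items():
            sums[label][term] += weight
    centroids = {}
    for label, vec in sums.items():
        norm = math.sqrt(sum(w * w for w in vec.values()))
        centroids[label] = {t: w / norm for t, w in vec.items()}
    return idf, centroids

def classify(text, idf, centroids):
    """Stage 3 (prediction): pick the category whose centroid is most similar."""
    vec = tfidf(preprocess(text), idf)
    # sorted() makes tie-breaking deterministic (alphabetical by label).
    return max(sorted(centroids), key=lambda label: cosine(vec, centroids[label]))

def accuracy(test_docs, idf, centroids):
    """Stage 4: fraction of held-out documents assigned their true category."""
    hits = sum(classify(text, idf, centroids) == label for text, label in test_docs)
    return hits / len(test_docs)

# Assumption: invented toy headlines, used only to exercise the pipeline.
TRAIN = [
    ("The central bank raised interest rates to curb inflation", "economy"),
    ("Stock markets fell as investors feared a recession", "economy"),
    ("The striker scored twice in the cup final", "sports"),
    ("The team won the championship after a tense match", "sports"),
    ("Universities announced new scholarship programs for students", "education"),
    ("Schools will reopen with a revised exam schedule", "education"),
]
TEST = [
    ("Inflation and interest rates worry investors", "economy"),
    ("The match ended after the striker scored", "sports"),
    ("Students prepare for the exam at schools", "education"),
]

idf, centroids = train(TRAIN)
print(classify("Investors fear inflation", idf, centroids))
print(accuracy(TEST, idf, centroids))
```

A nearest-centroid rule is used here only because it fits in a few lines; any of the classifiers discussed in the literature could take its place in stage 3 without changing the other stages.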
© 2022, IRJET | Impact Factor value: 7.529
One of the most critical and difficult problems in machine learning is text categorization, the task of automatically assigning a label from a set of possible labels to each document in a corpus. The significance of a document is determined by the labels applied to it. For this reason, picking the right set of labels can be ambiguous even for a human reader. In general, documents may be assigned to one or more categories according to what is known about them. In this data corpus, however, each document must be filed under exactly one of the available categories. X. Liang et al. present a basic method for identifying content marketing (CM) articles using TF-IDF features, with only moderate success [1]. Finally, a graph-based approach is proposed to further improve the identification of content marketing
ISO 9001:2008 Certified Journal | Page 717