News article classification using Naive Bayes Algorithm
Abstract – The Naive Bayes classifier, a probabilistic machine learning technique, is useful for classification tasks. It is based on Bayes' theorem, which states that the probability of an event given some observed evidence is proportional to the prior probability of the event multiplied by the likelihood of the evidence given that event. The Naive Bayes classifier can be trained on a dataset of labelled news articles, each of which is associated with a particular class or category, for the purpose of classifying news articles. The features of the articles, such as the words used and the length of the article, can then be used by the classifier to predict the class of an unseen article. The "naive" assumption, which is one of the key assumptions of the Naive Bayes classifier, is that the articles' features are independent of one another given the class. Because of this assumption, the classifier can make predictions without taking into account how features interact with one another. Despite this assumption, the Naive Bayes classifier can still perform well on many classification tasks, including the classification of news articles.
Key Words: Natural Language Toolkit, Python, machine learning algorithm
1. INTRODUCTION
The process of classifying a news article according to its content is known as news article classification. This is a common issue in information retrieval and natural language processing, and it can be useful for organizing and searching through large collections of news articles.
One approach to categorising news stories is to use a machine learning algorithm like the Naive Bayes classifier. The Naive Bayes classifier is a probabilistic model that makes predictions based on the likelihood that particular events occur given the observed features. The events are the classes or categories to which news articles can be assigned, and the features are the words or other characteristics of the articles.
A dataset of labelled news articles, each of which is associated with a distinct class, is required to train a Naive Bayes classifier for news article classification. The classifier learns the probability distribution of the features for each class and then uses this information to predict the class of articles that it has not seen before. The Naive Bayes classifier is able to simplify calculations and make predictions efficiently because it assumes that the articles' features are independent of one another.
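The effect of the independence assumption can be written out explicitly. The following is the standard textbook form of the Naive Bayes decision rule, given here as a sketch; the symbols $c$ (a class), $x_1, \dots, x_n$ (the features of an article) and $\hat{c}$ (the predicted class) are notation introduced for illustration and do not appear in the original text.

```latex
\begin{align}
P(c \mid x_1, \dots, x_n) &= \frac{P(c)\, P(x_1, \dots, x_n \mid c)}{P(x_1, \dots, x_n)} \\
P(x_1, \dots, x_n \mid c) &= \prod_{i=1}^{n} P(x_i \mid c) \quad \text{(naive independence assumption)} \\
\hat{c} &= \arg\max_{c}\; P(c) \prod_{i=1}^{n} P(x_i \mid c)
\end{align}
```

Because the denominator does not depend on $c$, it can be dropped when comparing classes, so only the class priors and the per-class feature probabilities need to be estimated from the training data.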
One advantage of the Naive Bayes classifier is its relative simplicity and ease of use, which benefits many classification tasks. It can also do well on a variety of classification problems, such as classifying news articles. However, to ensure that the classifier is effective, it is essential to evaluate its performance on your particular dataset and problem.
2. LITERATURE REVIEW
R. Siva Subhramanian and D. Prabha [22] contributed their paper in February 2020. This research seeks to identify potential customers.
They used the SBC method to modify the NB model with the goal of enhancing prediction by removing unnecessary dataset features.
According to the experimental findings, the WSNB running time is 0.03 seconds at depth 1, 0.06 seconds at depth 2, and 0.15 seconds at depth 3. The running time for standard Naive Bayes was 0.16 seconds. This demonstrates that WSNB shortens the model's running time compared to traditional Naive Bayes.
Faculty of Agriculture, University of Novi Sad [23] published their article in 2022. The effectiveness of the Naive Bayes approach for predicting water quality was studied by the author. Nine water quality factors were examined, including temperature, oxygen saturation values, and others. Five locations and 68 samples of data were used to assess the water quality using the Naive Bayes model. The testing report ranked each parameter as very good, excellent, good, or bad; after analysing the report and using the method, the author came to the conclusion that the model correctly identified the water class in 64 out of 68 instances.
Disha Sharma and Sumit Chaudhary [24] studied various sources of stress, which include 1) the surrounding environment, 2) social stress, 3) physiological and 4) thoughts.
The authors applied four machine learning techniques: logistic regression, Naïve Bayes, multilayer perceptron, and Bayes Net.
Parameters such as false positive rate, true positive rate, precision, and recall were considered for the performance. After comparing the results of the four methods, they concluded that the Bayes Net classifier gives the highest accuracy of 88 percent and Naive Bayes gives an accuracy of 86 percent.
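The measures named in this comparison can be computed directly from the counts in a binary confusion matrix. The snippet below is a minimal sketch of those formulas; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the cited study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from binary confusion-matrix counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives and true negatives.
    """
    total = tp + fp + fn + tn
    return {
        # Fraction of predicted positives that are really positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Recall is the same quantity as the true positive rate.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Fraction of real negatives wrongly flagged as positive.
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Fraction of all instances classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
    }


# Illustrative counts only (not results from any cited paper).
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting precision and recall alongside accuracy matters when classes are imbalanced, since accuracy alone can look high even if one class is rarely predicted correctly.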
Mamata Thakur and team [25] addressed the problem of the huge growth of the internet and the difficulty of finding relevant topics for a search. The authors chose some news websites, after that the important attributes from these
The Naive Bayes algorithm was used by the authors to classify data from 10 different websites, and the results of comparative studies with other current algorithms on the same dataset demonstrate that Naive Bayes outperforms them.
Yi Ying [26]: The author of this study employed a variety of news stories for the research and used news categories including sports, politics, business, etc. The confusion matrix results show that the Sarcasm model developed using the Naive Bayes approach achieved an accuracy of 66%, a recall of 70%, and a precision of 68%.
Summarising the literature review, we can understand that Naive Bayes sometimes gives good results but is not able to give 100% correct results, and some other machine learning algorithms are more effective than NB, so more research can be done to increase the efficiency of NB.
3. METHODOLOGY
3.1 PROPOSED SYSTEM
Here is a general approach to utilising the provided code to categorise news articles:
1. Assemble and classify a dataset of news stories, each with a category tagged (e.g., sports, tech, business, entertainment). The classifier will be trained and tested using this dataset.
2. Remove all stop words from the data and lowercase each word in each article as part of the pre-processing.
3. To turn the text input into numerical feature vectors, create a TfidfVectorizer.
4. The training and test data should be converted into feature vectors using the TfidfVectorizer.
5. Making use of the training data, create a Multinomial Naive Bayes classifier.
6. Calculate the classifier's accuracy by evaluating it against the test data.
7. Make predictions for fresh, unlabelled news articles using the classifier by converting them into feature vectors and passing them into the classifier's predict method.
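The pre-processing in step 2 can be sketched in plain Python. This is only an illustration: the `preprocess` helper and the tiny `STOP_WORDS` set are hypothetical, and a real system would more likely use a full stop-word list such as the one shipped with the Natural Language Toolkit.

```python
import re

# Assumption: a deliberately small, hand-written stop-word list used
# only for illustration; it is not the list used by the authors.
STOP_WORDS = {"a", "an", "the", "is", "in", "on", "of", "and", "to", "for", "at", "by"}


def preprocess(text):
    """Lowercase the text, split it into word tokens and drop stop words."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(token for token in tokens if token not in STOP_WORDS)


cleaned = preprocess("The striker scored a goal in the final of the cup.")
print(cleaned)
```

Lowercasing and stop-word removal shrink the vocabulary, so the classifier estimates probabilities for fewer, more informative words.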
[Flowchart: Import modules → Create training and test data → Pre-process the data → Create TF-IDF vectorizer → Create MultinomialNB → Train the classifier → Evaluate the classifier → Make prediction for new data → Print predicted data]
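The workflow described in this section can be sketched with scikit-learn, which provides the `TfidfVectorizer` and `MultinomialNB` classes named in the steps. The code below is a minimal sketch and not the authors' program: the toy articles and labels are invented for illustration, and lowercasing and stop-word removal are delegated to the vectorizer's `lowercase` and `stop_words` options, which is an assumption about how the pre-processing could be done.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Toy training and test data, invented for illustration only.
train_texts = [
    "The team won the football match",
    "The striker scored a goal in the cup final",
    "The company released a new smartphone",
    "New software update improves laptop battery life",
    "Stock markets fell as the bank raised interest rates",
    "The startup reported strong quarterly profits",
]
train_labels = ["sports", "sports", "tech", "tech", "business", "business"]
test_texts = [
    "The football team scored a late goal",
    "The bank reported higher profits",
]
test_labels = ["sports", "business"]

# Pre-process and vectorize: lowercase, drop English stop words, weight by TF-IDF.
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
X_train = vectorizer.fit_transform(train_texts)  # learn vocabulary on training data
X_test = vectorizer.transform(test_texts)        # reuse the same vocabulary

# Create and train the Multinomial Naive Bayes classifier.
classifier = MultinomialNB()
classifier.fit(X_train, train_labels)

# Evaluate the classifier on the held-out test data.
predictions = classifier.predict(X_test)
accuracy = accuracy_score(test_labels, predictions)
print("Test accuracy:", accuracy)

# Make and print predictions for new, unlabelled articles.
new_articles = ["New smartphone software update released"]
new_predictions = classifier.predict(vectorizer.transform(new_articles))
for article, category in zip(new_articles, new_predictions):
    print(article, "->", category)
```

The vectorizer is fitted on the training texts only and then reused for the test and new articles, so that all feature vectors share one vocabulary and no information from the test set leaks into training.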
RESULT AND DISCUSSION
The program we wrote is a simple classification program that uses a Naive Bayes classifier to predict the category of text data. The program first generates two sets of text data: a training set and a test set, where each text is assigned to the appropriate category. The text data is then lowercased and stop words are removed as part of the pre-processing. The text data is then transformed into feature vectors via a TF-IDF vectorizer, which are then fed into the classifier as input. The classifier is then trained on the training set and applied to the test set to provide predictions.
Because of the limited dataset and small amount of data, this program cannot make good predictions based on the input data.
Use a larger and more varied training and test data set to enhance the effectiveness of this program. Additionally, you can experiment with various classifiers, feature extraction methods, and text pre-processing approaches. It can also be beneficial to fine-tune the model parameters, use pre-trained models, and fine-tune them using your own dataset.
Before selecting a particular technique or model, it's crucial to take the context of the problem at hand and the precise requirements of the task into account.
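One way to tune model parameters is a cross-validated grid search. The sketch below assumes scikit-learn and tries a few values of the Naive Bayes smoothing parameter `alpha` together with two n-gram ranges for the vectorizer; the toy texts, the parameter grid, and the choice of two folds are illustrative assumptions, not settings from this work.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy labelled articles, invented for illustration only.
texts = [
    "The team won the football match",
    "The striker scored a goal in the cup final",
    "The coach praised the football team",
    "The goalkeeper saved a penalty in the match",
    "The company released a new smartphone",
    "New software update improves laptop battery life",
    "The smartphone app received a software update",
    "Engineers built a faster laptop processor",
]
labels = ["sports"] * 4 + ["tech"] * 4

# Chain vectorizer and classifier so both are refitted inside each fold.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(stop_words="english")),
    ("nb", MultinomialNB()),
])

# Hypothetical search space: smoothing strength and unigrams vs. bigrams.
param_grid = {
    "nb__alpha": [0.1, 0.5, 1.0],
    "tfidf__ngram_range": [(1, 1), (1, 2)],
}

search = GridSearchCV(pipeline, param_grid, cv=2, scoring="accuracy")
search.fit(texts, labels)

best_alpha = search.best_params_["nb__alpha"]
print("Best parameters:", search.best_params_)
print("Cross-validated accuracy:", search.best_score_)
```

With a dataset this small the scores are not meaningful; the same pattern becomes useful once a larger and more varied collection of articles is available.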
4. CONCLUSION
The Naive Bayes classifier is a popular machine learning method that can be used to categorise news stories. It has several strengths: it is easy to use, works efficiently, and does well on many classification tasks. However, its performance on a particular dataset must be evaluated. Further work in this area can improve the classifier's performance, handle more complex problems, and apply the classifier to new domains. Natural language processing and information retrieval could benefit greatly from using the Naive Bayes classifier.
FUTURE SCOPE:
In the field of news article classification using Naive Bayes classifiers, there are numerous potential future directions for research and development. These include expanding the application of the classifier to new domains and languages, enhancing the classifier's performance, and incorporating it into news analysis systems. The Naive Bayes classifier offers a lot of potential for solving a variety of real-world issues, and more research may be done on its capabilities and limitations when it comes to categorising news items.
REFERENCES
[1] Kuldeep Vayadande, Aditya Bodhankar, Ajinkya Mahajan, Diksha Prasad, Shivani Mahajan, Aishwarya Pujari and Riya Dhakalkar, "Classification of Depression on social media using Distant Supervision", ITM Web Conf. Volume 50, 2022.
[2] Kuldeep Vayadande, Rahebar Shaikh, Suraj Rothe, Sangam Patil, Tanuj Baware and Sameer Naik, "Blockchain-Based Land Record System", ITM Web Conf. Volume 50, 2022.
[3] Kuldeep Vayadande, Kirti Agarwal, Aadesh Kabra, Ketan Gangwal and Atharv Kinage, "Cryptography using Automata Theory", ITM Web Conf. Volume 50, 2022.
[4] Samruddhi Mumbare, Kunal Shivam, Priyanka Lokhande, Samruddhi Zaware, Varad Deshpande and Kuldeep Vayadande, "Software Controller using Hand Gestures", ITM Web Conf. Volume 50, 2022.
[5] Preetham, H. D., and Kuldeep Baban Vayadande. "Online Crime Reporting System Using Python Django."
[6] Vayadande, Kuldeep B., et al. "Simulation and Testing of Deterministic Finite Automata Machine." International Journal of Computer Sciences and Engineering 10.1 (2022): 13-17.
[7] Vayadande, Kuldeep, et al. "Modulo Calculator Using Tkinter Library." EasyChair Preprint 7578 (2022).
[8] Vayadande, Kuldeep. "Simulating Derivations of Context-Free Grammar." (2022).
[9] Vayadande, Kuldeep, Ram Mandhana, Kaustubh Paralkar, Dhananjay Pawal, Siddhant Deshpande, and Vishal Sonkusale. "Pattern Matching in File System." International Journal of Computer Applications 975: 8887.
[10] Vayadande, Kuldeep, Ritesh Pokarne, Mahalakshmi Phaldesai, Tanushri Bhuruk, Tanmay Patil, and Prachi Kumar. "Simulation Of Conway's Game of Life Using Cellular Automata." SIMULATION 9, no. 01 (2022).
[11] Gaurav, Rohit, Sakshi Suryakant, Parth Narkhede, Sankalp Patil, Sejal Hukare, and Kuldeep Vayadande. "Universal Turing machine simulator." International Journal of Advance Research, Ideas and Innovations in Technology, ISSN (2022).
[12] Vayadande, Kuldeep B., Parth Sheth, Arvind Shelke, Vaishnavi Patil, Srushti Shevate, and Chinmayee Sawakare. "Simulation and Testing of Deterministic Finite Automata Machine." International Journal of Computer Sciences and Engineering 10, no. 1 (2022): 13-17.
[13] Vayadande, Kuldeep, Ram Mandhana, Kaustubh Paralkar, Dhananjay Pawal, Siddhant Deshpande, and Vishal Sonkusale. "Pattern Matching in File System." International Journal of Computer Applications 975: 8887.
[14] Vayadande, Kuldeep B., and Surendra Yadav. "A Review paper on Detection of Moving Object in Dynamic Background." International Journal of Computer Sciences and Engineering 6, no. 9 (2018): 877-880.
[15] Vayadande, Kuldeep, Neha Bhavar, Sayee Chauhan, Sushrut Kulkarni, Abhijit Thorat, and Yash Annapure. Spell Checker Model for String Comparison in Automata. No. 7375. EasyChair, 2022.
[16] Vayadande, Kuldeep, Harshwardhan More, Omkar More, Shubham Mulay, Atharva Pathak, and Vishwam Talnikar. "Pac Man: Game Development using PDA and OOP." (2022).
[17] Preetham, H. D., and Kuldeep Baban Vayadande. "Online Crime Reporting System Using Python Django."
[18] Vayadande, Kuldeep. "Harshwardhan More, Omkar More, Shubham Mulay, Atahrv Pathak, Vishwam Talanikar, “Pac Man: Game Development using PDA and OOP”." International Research Journal of Engineering and Technology (IRJET), e-ISSN (2022): 2395-0056.
[19] Ingale, Varad, Kuldeep Vayadande, Vivek Verma, Abhishek Yeole, Sahil Zawar, and Zoya Jamadar. "Lexical analyzer using DFA." International Journal of Advance Research, Ideas and Innovations in Technology, www.IJARIIT.com.
[20] Manjramkar, Devang, Adwait Gharpure, Aayush Gore, Ishan Gujarathi, and Dhananjay Deore. "A Review Paper on Document text search based on nondeterministic automata." (2022).
[21] Chandra, Arunav, Aashay Bongulwar, Aayush Jadhav, Rishikesh Ahire, Amogh Dumbre, Sumaan Ali, Anveshika Kamble, Rohit Arole, Bijin Jiby, and Sukhpreet Bhatti. Survey on Randomly Generating English Sentences. No. 7655. EasyChair, 2022.
[22] R. Siva Subramaniyam, D. Prabha, "Customer behavior analysis using weighted naïve Bayesian background", International Journal of Innovative Technology and Exploring Engineering (IJITEE).
[23] Department of Water Management, Faculty of Agriculture, University of Novi Sad, "Water quality prediction based on naive Bayes algorithm" (2022).
[24] Disha Sharma, Sumit Chaudhary, "Stress prediction of professional student using machine learning", International Journal of Engineering and Advanced Technology (IJEAT).
[25] Mamata Thakur, Priyanka Thakur, Pritam Thakur, Govinda Rao Meetu, "Classification of news using naïve algorithm", International Journal of Creative Research Thoughts (IJCRT) (2018).
[26] Yi Ying, "Effectiveness of the News Text Classification Test Using the Naïve Bayes", Journal of Physics: Conference Series (2021).
BIOGRAPHIES
Praasad Namdeo Rathod
"Student, Department of Artificial Intelligence and Data Science, Vishwakarma Institute of Technology, Pune, India"
Swapnil Lahu Gawali
"Student, Department of Artificial Intelligence and Data Science, Vishwakarma Institute of Technology, Pune, India"
Shivprasad Vyankat Kavathe
"Student, Department of Artificial Intelligence and Data Science, Vishwakarma Institute of Technology, Pune, India"
Amit Vasant Dolas
"Student, Department of Artificial Intelligence and Data Science, Vishwakarma Institute of Technology, Pune, India"