PREDICTION OF COVID-19 USING MACHINE LEARNING APPROACHES


International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 09 Issue: 07 | July 2022 www.irjet.net p-ISSN: 2395-0072


UTKARSH ANIL BAVISKAR

Abstract - The outbreak of COVID-19, caused by the SARS-CoV-2 virus, has created a pandemic situation worldwide, and the number of cases is increasing rapidly every day. Machine learning (ML) and cloud computing can be used very effectively to track diseases, predict epidemic growth, and develop strategies and guidelines to control their spread. This study uses improved mathematical models to analyze and predict the growth of the virus. We applied an improved ML-based model to predict potential COVID-19 threats in countries around the world. The prediction is made by using iterative weighting to fit a generalized inverse Weibull distribution. The model has been deployed on cloud platforms to predict the dynamics of epidemic growth more accurately and realistically. This more accurate, data-driven approach can be very useful to governments and citizens. We present this study to lay the groundwork for further research.

Key Words: Machine learning, COVID-19, AI, SVM, Random Forest, Decision Tree, Linear Regression

1. INTRODUCTION

The novel coronavirus disease (COVID-19) was first reported in Wuhan, Hubei Province, China, on December 31, 2019, and it began to spread rapidly around the world. The cumulative incidence of the virus (SARS-CoV-2) is increasing rapidly and has affected 196 countries and territories, of which the United States, Spain, Italy, the United Kingdom, and France have been most affected. The WHO has declared the coronavirus disease (COVID-19) a global pandemic, and the virus continues to spread. As of May 4, 2020, there were 3,581,884 confirmed cases and 248,558 deaths. The main difference between the CoV-2 pandemic and those of related viruses such as severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS) is that CoV-2 spreads rapidly through human contact and nearly 20% of infected subjects remain asymptomatic carriers. In addition, various studies have reported that CoV-2-induced disease poses a greater risk to people with weakened immune systems. The elderly and those with life-threatening diseases such as cancer, diabetes, neurological diseases, coronary heart disease, and HIV/AIDS are more vulnerable to the serious consequences of COVID-19. In the absence of drugs, the only solution is to slow the spread of the virus by applying "social distancing" to break the transmission chain.

This behavior of CoV-2 requires the development of a robust mathematical framework to track its spread and the automation of tracking tools to make dynamic decisions online. Innovative solutions are needed to develop, manage, and analyze big data on growing target networks, patient information, and movement within communities, and to integrate it with clinical trial and pharmaceutical data, genomic data, and public health data. Multiple data sources, including text messages, online communications, social media, and web articles, can be very useful in analyzing the increase in infections caused by community behavior. By combining this data with machine learning (ML) and artificial intelligence (AI), researchers can predict when and where a disease may spread and notify the affected area so that it can agree on the necessary action. By automatically tracking the travel history of infected subjects, epidemiological correlations with disease spread in specific communities can be studied.

2. MOTIVATION AND OUR CONTRIBUTIONS

ML can be used to process large amounts of data and intelligently predict the spread of a disease. Cloud computing can be used to rapidly improve the forecasting process with high-speed computing. New energy-efficient peripheral systems can be used to collect data while reducing power consumption. In this article, we present a predictive model deployed using the FogBus framework to accurately predict the number of COVID-19 cases, the increase and decrease in the number of cases in the near future, and the date the pandemic can be expected to end in different countries. We also provide a detailed comparison with the baseline model and show how devastating the impact can be if a poorly fitting model is used. We empower governments and citizens to be proactive by presenting a prediction framework based on machine learning models that can be used to make real-time predictions from remote cloud nodes. In conclusion, we summarize this work and present various directions for further research.

3. SOFTWARE PLATFORMS

Python

Python is an open-source programming language that is currently in high demand in the IT industry. It is mainly used for machine learning, website development, data processing, and general software development. Python can handle many of the same kinds of tasks as C, although its syntax is different. It supports many types of tasks and can be used for machine learning, data analysis, and complex statistical calculations.

Machine Learning Model

Many recent studies have shown that the spread of COVID-19 follows an exponential distribution. Empirical estimates of the SARS-CoV-2 pandemic and previous datasets have shown that many sources have a large number of outliers in the data corresponding to new cases over time, which may or may not follow a standard distribution such as a Gaussian or exponential distribution. In a recent study by the Singapore University of Technology and Design (SUTD) Data-Driven Innovation Lab, a Susceptible-Infected-Recovered (SIR) model was used to construct a regression curve, and a Gaussian distribution was fitted to estimate the number of cases over time. However, an earlier study reported that data for an older version of the virus called SARS-CoV-2 followed the generalized inverse Weibull (GIW) distribution better than a Gaussian distribution.
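The idea of fitting a GIW-shaped curve with iterative weighting can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from this paper: the density form f(t) = gamma * beta * alpha^beta * t^-(beta+1) * exp(-gamma * (alpha/t)^beta), the Cauchy-type weights, the grid search over alpha and beta, and the synthetic data. The function names giw_pdf, weighted_fit, and robust_giw_fit are hypothetical. In this form gamma and alpha enter only through the product gamma * alpha^beta, so the sketch fixes gamma = 1.

```python
import math

def giw_pdf(t, alpha, beta, gamma=1.0):
    """Generalized inverse Weibull density for t > 0 (assumed parameterization)."""
    return (gamma * beta * alpha ** beta * t ** (-(beta + 1.0))
            * math.exp(-gamma * (alpha / t) ** beta))

def weighted_fit(ts, ys, ws, alphas, betas):
    """Grid-search alpha and beta; the amplitude k has a closed-form weighted
    least-squares solution for each candidate shape."""
    best = None
    for alpha in alphas:
        for beta in betas:
            g = [giw_pdf(t, alpha, beta) for t in ts]
            denom = sum(w * gi * gi for w, gi in zip(ws, g))
            if denom == 0.0:
                continue
            k = sum(w * y * gi for w, y, gi in zip(ws, ys, g)) / denom
            sse = sum(w * (y - k * gi) ** 2 for w, y, gi in zip(ws, ys, g))
            if best is None or sse < best[0]:
                best = (sse, k, alpha, beta)
    return best[1], best[2], best[3]

def robust_giw_fit(ts, ys, alphas, betas, n_iter=5):
    """Iteratively reweighted least squares: points with large residuals
    (likely outliers) get smaller weights in the next round."""
    ws = [1.0] * len(ts)
    for _ in range(n_iter):
        k, alpha, beta = weighted_fit(ts, ys, ws, alphas, betas)
        res = [y - k * giw_pdf(t, alpha, beta) for t, y in zip(ts, ys)]
        # Robust scale from the median absolute residual (small constant avoids 0).
        scale = 1.4826 * sorted(abs(r) for r in res)[len(res) // 2] + 1e-9
        ws = [1.0 / (1.0 + (r / scale) ** 2) for r in res]
    return k, alpha, beta, ws

# Synthetic daily counts (not real data): a GIW-shaped curve with four
# injected outliers that a plain least-squares fit would chase.
ts = list(range(1, 121))
ys = [50000.0 * giw_pdf(t, 40.0, 3.0) for t in ts]
for i in (20, 45, 70, 95):
    ys[i] *= 2.0

alphas = [float(a) for a in range(20, 61)]
betas = [1.0 + 0.5 * j for j in range(11)]
k_hat, alpha_hat, beta_hat, weights = robust_giw_fit(ts, ys, alphas, betas)
print(k_hat, alpha_hat, beta_hat)
```

The grid search keeps the sketch free of external dependencies; a real implementation would more likely use a nonlinear least-squares routine for alpha and beta.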

4. PREDICTION MODEL AND PERFORMANCE COMPARISONS

The machine learning (ML) and data science communities are working hard to improve the predictions of epidemiological models and to analyze information shared via Twitter in order to develop governance strategies and evaluate the impact of policies to contain the spread. Various datasets on this topic have been published openly. However, as COVID-19 spreads globally, more data needs to be collected, processed, and analyzed. The novel coronavirus is having a serious socio-economic impact worldwide. Countries with large populations should be more vigilant, as the virus is easily transmitted through respiratory droplets or nasal discharge, mainly when an infected person coughs or sneezes. To obtain detailed information about the impact of COVID-19 on the world's population, and to predict the number of COVID-19 cases in different countries and when the pandemic is expected to end, we propose a machine learning model that can be run continuously in cloud data centers (CDCs) to accurately predict outbreaks and proactively develop strategic responses.

5. DATASETS

The dataset used in this case study is the Our World in Data dataset by Hannah Ritchie. The dataset is updated daily based on status reports from the World Health Organization (WHO). More information about the dataset can be found at: https://ourworldindata.org/coronavirus-source-data
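A dataset of this kind is typically a CSV file with one row per country and day. The following minimal sketch shows how such a file could be aggregated per country. The column names location, date, and new_cases are assumptions about the file layout, the function name cases_by_location is hypothetical, and the sample rows are made-up placeholders rather than real figures.

```python
import csv
import io

def cases_by_location(csv_text):
    """Sum daily new cases per location from CSV text.

    Assumes columns named 'location' and 'new_cases'; empty cells are
    treated as missing reports and skipped.
    """
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row["new_cases"]
        if value == "":
            continue
        totals[row["location"]] = totals.get(row["location"], 0.0) + float(value)
    return totals

# Placeholder rows for illustration only (not real case counts).
sample = """location,date,new_cases
Country A,2020-05-01,10
Country A,2020-05-02,20
Country B,2020-05-01,5
Country B,2020-05-02,
"""
totals = cases_by_location(sample)
print(totals)
```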

6. ALGORITHMS

1. SVM

SVM was chosen because it can turn an inseparable problem into a separable one by using the kernel trick to map a low-dimensional input space into a high-dimensional space. We split the dataset into a training set and a test set in a ratio of 7:3. With a linear kernel, the SVM classifier uses a hyperplane to divide the data linearly. The classes are bounded by parallel hyperplanes that are chosen to keep the distance between them (the margin) as large as possible.
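The 7:3 split and the linear maximum-margin classifier described above can be sketched as follows. This is not the implementation used in this study: the solver is Pegasos-style stochastic subgradient descent on the hinge loss, swapped in here to keep the example dependency-free, and the function names, the regularization value lam, and the synthetic two-cluster data are all assumptions.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split them (7:3 by default)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def train_linear_svm(rows, lam=0.01, n_steps=2000, seed=0):
    """Pegasos-style subgradient descent on the regularized hinge loss.

    No bias term is fitted, which is only adequate because the synthetic
    classes below are symmetric around the origin.
    """
    rng = random.Random(seed)
    w = [0.0] * len(rows[0][0])
    for step in range(1, n_steps + 1):
        x, y = rng.choice(rows)
        eta = 1.0 / (lam * step)
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1.0 - eta * lam) * wi for wi in w]      # shrink (regularization)
        if margin < 1.0:                              # point violates the margin
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict(w, x):
    """Classify by the side of the hyperplane w . x = 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0.0 else -1

# Synthetic, linearly separable data: two Gaussian clusters with labels +1 / -1.
rng = random.Random(42)
data = []
for _ in range(50):
    data.append(([rng.gauss(2.0, 0.4), rng.gauss(2.0, 0.4)], 1))
    data.append(([rng.gauss(-2.0, 0.4), rng.gauss(-2.0, 0.4)], -1))

train, test = train_test_split(data)
w = train_linear_svm(train)
accuracy = sum(predict(w, x) == y for x, y in test) / len(test)
print(len(train), len(test), accuracy)
```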

2. Random Forest

Random Forest is an ensemble technique that can perform both regression and classification tasks using multiple decision trees and a technique called bootstrap aggregation, commonly known as bagging. The main idea is to combine multiple decision trees to determine the final result instead of relying on individual decision trees.

A Random Forest uses several decision trees as its base learning models. We randomly sample rows and features from the dataset to form a sample dataset for each model. This step is called bootstrapping.
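The bootstrapping and aggregation steps can be sketched in a few lines. This is an illustrative sketch under assumptions, not the implementation used in this study: the function names bootstrap_sample and aggregate are hypothetical, the toy rows are synthetic, and the base learners themselves are omitted.

```python
import random

def bootstrap_sample(rows, n_features, rng):
    """Draw len(rows) rows with replacement and a random subset of feature columns."""
    n_total = len(rows[0][0])
    feature_idx = sorted(rng.sample(range(n_total), n_features))
    sample = []
    for _ in range(len(rows)):
        x, y = rng.choice(rows)                       # sampling with replacement
        sample.append(([x[j] for j in feature_idx], y))
    return sample, feature_idx

def aggregate(predictions):
    """Average the per-tree predictions (regression).

    For classification, a majority vote would be used instead.
    """
    return sum(predictions) / len(predictions)

# Synthetic rows: (three features, target).
rng = random.Random(0)
rows = [([float(i), float(i % 3), float(i * i)], float(2 * i)) for i in range(10)]
sample, feature_idx = bootstrap_sample(rows, n_features=2, rng=rng)
print(feature_idx, len(sample), aggregate([1.0, 2.0, 3.0]))
```

Each tree in the forest would be trained on its own sample, and the forest's output is the aggregate of the individual tree outputs.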

3. Decision Tree

Decision trees are very successful classifiers that have been applied in many domains. A decision tree is constructed using a recursive partitioning process in which the data points are partitioned at each node using a selected splitting criterion. The path from the root node to a leaf is the rule used for prediction. An ensemble of classifiers consists of a set of classifiers [18], and its final decision is a combination of the decisions of all member classifiers. Ensembles generally perform better than their individual members when the members are accurate and diverse. Decision tree ensembles are fairly robust and perform well. The experiment uses several ensembles of decision trees. Because the data are imbalanced, decision tree ensembles designed for imbalanced datasets are also used.
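Recursive partitioning can be illustrated with a minimal single-feature tree. The sketch below is an assumption-laden illustration, not the implementation used in this study: it uses a squared-error splitting criterion (a regression tree, chosen because the targets here are case counts), and the function names and toy data are hypothetical.

```python
def squared_error(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)

def best_split(xs, ys):
    """Return (threshold, cost) for the split that most reduces squared error.

    The threshold is None when no split improves on leaving the node whole.
    """
    best = (None, squared_error(ys))
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        cost = squared_error(left) + squared_error(right)
        if cost < best[1]:
            best = (thr, cost)
    return best

def build_tree(xs, ys, depth=0, max_depth=3):
    """Partition recursively until no split helps or max_depth is reached."""
    thr, _ = best_split(xs, ys)
    if thr is None or depth >= max_depth:
        return {"leaf": sum(ys) / len(ys)}
    left = [(x, y) for x, y in zip(xs, ys) if x <= thr]
    right = [(x, y) for x, y in zip(xs, ys) if x > thr]
    return {
        "threshold": thr,
        "left": build_tree([x for x, _ in left], [y for _, y in left],
                           depth + 1, max_depth),
        "right": build_tree([x for x, _ in right], [y for _, y in right],
                            depth + 1, max_depth),
    }

def predict(tree, x):
    """Follow the path from the root to a leaf; the leaf value is the prediction."""
    while "leaf" not in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["leaf"]

# Toy data with one obvious break between x = 3 and x = 10.
xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
tree = build_tree(xs, ys)
print(tree["threshold"], predict(tree, 2.5), predict(tree, 11.5))
```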

4. Linear Regression

Observations suggest that COVID-19 data do not follow a linear trend. Rather, they follow a linear trend for a short time and then change direction. Such data are not well suited to linear regression, and if linear regression is used anyway, the predictions are far from the real values.
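A small worked example makes this limitation concrete. The sketch below fits an ordinary least-squares line to synthetic counts that rise linearly and then fall; the data and the function name fit_line are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic counts: linear growth for four days, then a change of direction.
days = [0, 1, 2, 3, 4, 5, 6]
cases = [0, 10, 20, 30, 20, 10, 0]

# Fitting only the early, linear segment works well.
early_slope, early_intercept = fit_line(days[:4], cases[:4])

# Fitting the whole series yields an almost flat line that misses the peak.
slope, intercept = fit_line(days, cases)
peak_error = abs(cases[3] - (slope * days[3] + intercept))
print(early_slope, slope, peak_error)
```

On the early segment the fitted slope matches the growth rate, but over the full series the rise and fall cancel out, so the line carries almost no information about the peak.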

6. MAE/MSE

Name of Algorithm          MAE score              MSE score
SVM                        113952693.00743757     1.4189100362824158e+16
Linear Regression          29503683.939641554     876863594001211.8
Decision Tree              25613676.38095238      869104703463814.2
Random Forest Classifier   25613676.38095238      869104703463814.2
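For reference, the two scores are computed as MAE = (1/n) * sum(|y_i - p_i|) and MSE = (1/n) * sum((y_i - p_i)^2), where y_i is the actual value and p_i the predicted value. Because MSE is expressed in squared units, it is many orders of magnitude larger than MAE when the errors themselves are large. A minimal sketch with made-up numbers (the function names are illustrative):

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_error(actual, predicted):
    """Average squared difference; penalizes large errors more heavily."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Made-up values for illustration only.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 200.0, 280.0]
mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
print(mae, mse)
```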


7. RESULTS

Fig. 7.1: Number of Cases

Fig. 7.2: Number of Cases pie chart

Fig. 7.3: SVM Predictions

Fig. 7.4: Linear Regression Prediction

Fig. 7.5: Decision Tree Prediction

Fig. 7.6: Random Forest Prediction

Fig. 7.7: Final Prediction

8. OUTPUT SCREENS

Fig. 8.1: Dashboard

Fig. 8.2: Dashboard with FAQ questions

Fig. 8.3: Prediction using Weibull Distribution

Fig. 8.4: Select country for prediction with Weibull Distribution

Fig. 8.5: Final Prediction using Weibull Distribution

Fig. 8.6: Prediction using ML algorithms


9. SUMMARY AND CONCLUSIONS

In this study, we discussed how machine learning and cloud computing can help in predicting the growth of the pandemic. Additionally, case studies have been presented that demonstrate the severity of the spread of CoV-2 in countries around the world. Using the proposed robust Weibull model based on iterative weighting, we show that our model can make statistically better predictions than the baseline. The baseline Gaussian model paints an overly optimistic picture of the COVID-19 scenario. Among the ML algorithms compared, the SVM had the largest errors, with an MAE of 113952693.00743757 and an MSE of 1.4189100362824158e+16.

10. REFERENCES

[1] COVID Live - Coronavirus Statistics - Worldometer. (2021). https://www.worldometers.info/coronavirus/

[2] Wang, C., Horby, P. W., Hayden, F. G., & Gao, G. F. (2020). A novel coronavirus outbreak of global health concern. The Lancet, 395(10223), 470-473. https://doi.org/10.1016/s0140-6736(20)30185-9.

[3] Guangdi Li and Erik De Clercq. Therapeutic options for the 2019 novel coronavirus (2019-ncov), 2020.

[4] Smriti Mallapaty. What the cruise-ship outbreaks reveal about covid-19. Nature, 580(7801):18-18, 2020.

[5] Kai Liu, Ying Chen, Ruzheng Lin, and Kunyuan Han. Clinical features of covid-19 in elderly patients: A comparison with young and middle-aged patients. Journal of Infection, 2020.

[6] Shi Zhao, Qianyin Lin, Jinjun Ran, Salihu S Musa, Guangpu Yang, Weiming Wang, Yijun Lou, Daozhou Gao, Lin Yang, Daihai He, et al. Preliminary estimation of the basic reproduction number of novel coronavirus (2019-ncov) in china, from 2019 to 2020: A data-driven analysis in the early phase of the outbreak. International Journal of Infectious Diseases, 92:214-217, 2020.

[7] Shreshth Tuli, Shikhar Tuli, Gurleen Wander, Praneet Wander, Sukhpal Singh Gill, Schahram Dustdar, Rizos Sakellariou, and Omer Rana. Next generation technologies for smart healthcare: Challenges, vision, model, trends and future directions. Internet Technology Letters, page e145.

[8] Chaolin Huang, Yeming Wang, Xingwang Li, Lili Ren, Jianping Zhao, Yi Hu, Li Zhang, Guohui Fan, Jiuyang Xu, Xiaoying Gu, et al. Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. The Lancet, 395(10223):497-506, 2020.
