ANALYSIS AND PREDICTION OF RAINFALL USING MACHINE LEARNING TECHNIQUES

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 09 Issue: 08 | Aug 2022 www.irjet.net p-ISSN: 2395-0072

Anurag Kumar1, Lalsingh Chouhan2

1Assistant professor, CSE, Dr. A.P.J. Abdul Kalam UIT Jhabua, Madhya Pradesh, India

2Assistant professor, CSE, Dr. A.P.J. Abdul Kalam UIT Jhabua, Madhya Pradesh, India

Abstract - Weather and climate prediction are dominated by high dimensionality, interactions on many different spatial and temporal scales, and chaotic dynamics. Machine learning techniques can predict rainfall by extracting hidden patterns from historical weather data. In this work we apply Multiple Linear Regression (MLR) and Support Vector Regression (SVR) models to rainfall prediction. To design and implement the system, we gathered 117 years of data, from 1901 to 2017, from Kaggle. The proposed approach has been tested and validated with both Multiple Linear Regression and Support Vector Regression. The compared results show satisfactory performance, with SVR providing the higher accuracy.

Key Words: Rainfall prediction, Machine Learning, Linear Regression, Support Vector Regression, Accuracy

1. INTRODUCTION

Weather forecasting on the basis of historical data is a complex but very helpful task, which comes with several problems that need to be solved in order to achieve an optimal result. Rainfall prediction is important all over the world and plays a key role in human life. It is difficult to predict rainfall precisely under varying atmospheric conditions. Accurate rainfall predictions are crucial for several areas of society, especially agriculture. India is an agricultural country, and the success of its agriculture depends on rainfall. There are several sources of water, but in India agriculture usually depends on rainfall. The weather has a significant impact on the agricultural industry, so the ability to predict it helps farmers in their day-to-day decisions, such as how to plan efficiently, minimize costs and maximize yields. Machine learning is being used in every sector to reduce labour cost and increase productivity. Every machine learning algorithm has three steps: depiction, judgment and development. Depiction guides how the knowledge discovered through data mining is represented. Here we have used two of the most popular machine learning techniques to predict rainfall: Support Vector Regression and Multiple Linear Regression.

Linear Regression [1][12] is very useful for finding the relationship between two continuous variables, one an independent variable and the other a dependent variable. In statistics, linear regression refers to a model that shows the relationship between two variables and how one can affect the other. It shows how the variation in the "dependent variable" can be captured by changes in the "independent variables". Linear regression is a statistical technique used to generate insights into consumer behaviour, to understand a business, and to identify the factors influencing profitability. It can be used in business to evaluate trends and make decisions for the future. For example, if an organisation's sales have increased regularly every month for the last few years, a linear analysis of the monthly sales data lets the company forecast sales in future months. We have used a Multiple Linear Regression model; unlike simple linear regression, MLR has multiple independent variables. SVR is a regression algorithm, so it works with continuous values, whereas SVM is used for classification [2]. In ordinary regression we try to minimise the error, while in SVR we try to fit the error within a certain threshold.

2. RELATED WORK

There are many works in the literature on the prediction of rainfall. This section discusses some of the work related to our proposed methodology.

Kumar Abhishek et al. proposed a rainfall prediction technique using a neural network in [3]. The model in [3] predicts the rainfall of Udupi district in the Karnataka state of India. Three networks were tested: a feed-forward back-propagation neural network (BPNN), a layer-recurrent network, and a cascade feed-forward BPNN. The model takes 70% of the data for training and 30% for testing. The recurrent network gives better accuracy than BPNN; the MSE is high for BPNN.

Nasimul Hasan and Nayan Nath (2015) present a robust rainfall prediction technique based on recent rainfall data of Bangladesh using Support Vector Regression (SVR), a regression methodology of Support Vector Machine (SVM). It was challenging to make a 100 percent perfect prediction, and the data was preprocessed manually to suit the algorithm [5]. The evaluation results of the study show that the proposed technique performs better than conventional frameworks in terms of accuracy and running time [5]. The approach yielded a maximum prediction accuracy of almost 99.92%.

© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1908

G. Mahalakshmi and S. Sridevi (2016) presented a detailed survey of the various techniques applied to forecasting different types of time series. The survey covers the overall forecasting models, the algorithms used within the models, and other optimization techniques used for better performance and accuracy [6]. The various performance evaluation parameters used for evaluating the forecasting models are also discussed [6]. The study gives the reader an overview of the research being done on forecasting with time series data.

The paper in [7] introduced a rainfall prediction system using a data mining KNN technique. A single value K is given, which sets the number of nearest neighbours used to determine the class label of unknown data. Similar parameters are grouped into the same cluster, and KNN is then used to determine the category of a specific data point. The algorithm requires no training time for classification or regression. The system may not reach good accuracy if an incorrect value of K is picked.
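The role of K described above can be illustrated with a minimal sketch. It uses only the Python standard library; the function name `knn_predict`, the toy feature vectors and the labels are hypothetical and are not taken from [7].

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k):
    """Return the majority label among the k training points nearest to query."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbours; there is no separate training phase.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two features per sample, two classes.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9)]
labels = ["dry", "dry", "dry", "wet", "wet"]

# The same query is labelled differently depending on the chosen K.
print(knn_predict(points, labels, (4.0, 4.0), k=1))
print(knn_predict(points, labels, (4.0, 4.0), k=5))
```

With K equal to the full training set size, the majority class always wins regardless of distance, which is one way a poorly chosen K degrades accuracy.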

Sandeep Mohpatra and Anamika Upadhyay (2017) presented a paper that focuses on the use of data mining techniques for predicting the rainfall of an area on the basis of dependent features such as precipitation and wet day frequency. They collected data for Bangalore, India, for the years 1901 to 2002 [8]. The regression model they developed was trained and validated against the actual rainfall of that area. The performance of the algorithm was further boosted using ensemble techniques with k-fold [8].

Chandrasegar and K S Harsha (2017) carried out an experiment on heuristic prediction of rainfall using machine learning techniques. The paper discusses the rate of rainfall in previous years according to crop seasons such as Rabi, Kharif and Zaid, and predicts the rainfall in future seasons [9]. It also measures the different categories of data with the linear regression method. The results help farmers decide correctly when to harvest a particular crop according to crop season. The linear regression method suggests a low correlation between the various crop seasons [9].

3. METHODOLOGY

In this paper we have used Multiple Linear Regression and Support Vector Regression to predict the amount of rainfall.

3.1 Machine Learning Model

The proposed method is based on multiple linear regression and support vector regression. The data for the prediction is collected from publicly available sources; 70 percent of the data is used for training and 30 percent for testing. Figure 1 shows the block diagram of the proposed methodology. Multiple regression is a statistical method used to predict values with the help of descriptive variables. It assumes a linear relationship between the descriptive variables and the output values. The following is the equation for multiple linear regression:

$$ Y_i = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon $$

Here "k" is the number of predictor variables, so there are k+1 regression parameters: β0 is the constant term, β1 is the coefficient of x1, β2 is the coefficient of x2, βk is the coefficient of xk, and ε is the error associated with the predicted value.

Figure 1. Block diagram of the proposed methodology

Support Vector Regression (SVR) uses the same principle as SVM, but only for regression problems. SVR works on the principle of structural risk minimization from statistical learning theory [2] and establishes a hyperplane that can predict the distribution of the data. The SVR algorithm is given a set of input training data {(Ai, Bi), i = 1, 2, ..., k}, Ai ∈ R^M, where Ai is the input 3-D vector, Bi ∈ R is the response output, and k is the number of samples. The optimal linear decision function in the high-dimensional feature space is expressed as follows:

$$ f(A_i) = \omega \cdot A_i + b $$

where ω refers to the weight vector and b denotes the bias.
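As a rough illustration of the linear decision function and of fitting the error within a threshold, the sketch below evaluates f(A) = ω · A + b for one input and applies the ε-insensitive loss commonly associated with SVR. The weights, bias, sample and ε are hypothetical values chosen for illustration; the source does not report the fitted parameters or the ε it used.

```python
def decision_function(weights, bias, features):
    """Linear SVR output f(A) = w . A + b for one input vector."""
    return sum(w * a for w, a in zip(weights, features)) + bias

def epsilon_insensitive_loss(actual, predicted, epsilon):
    """Zero inside the epsilon tube, linear in the excess error outside it."""
    return max(0.0, abs(actual - predicted) - epsilon)

# Hypothetical 3-D input, weights and bias, chosen only for illustration.
weights = [0.5, -0.2, 0.1]
bias = 1.0
sample = [2.0, 1.0, 4.0]

prediction = decision_function(weights, bias, sample)
print(prediction)
# An error smaller than epsilon costs nothing; a larger one is penalised.
print(epsilon_insensitive_loss(2.3, prediction, epsilon=0.5))
print(epsilon_insensitive_loss(3.5, prediction, epsilon=0.5))
```

Errors that fall inside the ε band contribute no loss, which is what distinguishes this objective from the squared error minimised by ordinary least squares.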

3.2 Data and Sources of Data

For this study, data was collected from the publicly available source Kaggle [11]. It contains the monthly rainfall of each state of India from Jan 1901 to Dec 2017.

3.3 Performance metrics

Mean Absolute Error:

MAE [10] is the average of the absolute differences between the actual value and the model's predicted value. The bigger the MAE, the more serious the error.

Root Mean Squared Error:

RMSE, or Root Mean Squared Error [10], is the square root of MSE. Because MSE is computed from squared errors, taking the root returns the metric to the units of the predicted quantity.

Mean Squared Error:

MSE, or Mean Squared Error [10], is one of the most popular metrics for regression algorithms. It is simply the average of the squared differences between the real values and the regression model's predicted values.

R2 Score:

R², or the Coefficient of Determination, is a prevalent metric [10]. R² uses two mean squared error calculations: the first is the mean squared difference between each real value and the average of the observations, and the second is the mean squared error between the actual and predicted values. R² is one minus the ratio of the second to the first.

Mean Absolute Percentage Error:

MAPE, or Mean Absolute Percentage Error [10], is the average absolute difference between the actual value and the value predicted by the model, divided by the actual value.
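The five metrics above can be computed directly from their definitions. The following is a minimal standard-library sketch; the function name `regression_metrics` and the sample values are hypothetical and are not results from this study.

```python
import math

def regression_metrics(actual, predicted):
    """Return MAE, MSE, RMSE, MAPE and R^2 for two equal-length sequences."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE is undefined when an actual value is zero, so completely dry
    # months would need separate handling.
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n

    mean_actual = sum(actual) / n
    total_ss = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - sum(e * e for e in errors) / total_ss
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}

# Made-up values for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(regression_metrics(actual, predicted))
```

Lower values are better for MAE, MSE, RMSE and MAPE, while an R² closer to 1 indicates a better fit.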

4. RESULTS AND DISCUSSION

We have considered the dataset from 1901 to 2017 for Bihar state, India. Visualization through different graphs helps us understand the data better and guides the choice of the next step. It provides important insights.

Forecasting gives appropriate and reliable input regarding present, past and future activities using definite numerical and scientific methods. Predicting numerical values for a specific task involves several steps. The first step is to recognize the problem through complete analysis, and the second is to collect appropriate data to analyze the problem for further estimation. After estimation, the actual and estimated values are compared and any necessary action is taken. The data is arranged so that rainfall is plotted by year, i.e., the yearly rainfall totals are shown in Graph 1.

Graph 1: Overall monthly data plot of rainfall from 1901 to 2017

Graph 2: Stacked bar chart of each year

The stacked bar chart above depicts the rainfall of each month for each year. From the sorted overall bar heights we can see that 1987 has the highest rainfall and 2010 the lowest.

Graph 3 shows the minimum, maximum and median rainfall in each month using a box plot. It clearly indicates that:

The rainfall in January, February, March, April, November and December is very low.

The rainfall in May and October is average.

The rainfall in June, July, August and September is high compared to the other months of the year.

We can see a seasonal effect with a cycle of 12 months.
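The per-month minimum, median and maximum that a box plot summarises can be computed as in the sketch below. It assumes the data is available as (month, rainfall) pairs; the helper name `monthly_summary` and the readings are made up for illustration and are not values from the Bihar dataset.

```python
import statistics

def monthly_summary(records):
    """Group (month, rainfall) pairs by month and return min, median and max."""
    by_month = {}
    for month, rainfall in records:
        by_month.setdefault(month, []).append(rainfall)
    return {
        month: (min(values), statistics.median(values), max(values))
        for month, values in by_month.items()
    }

# Made-up readings for illustration; not values from the Bihar dataset.
records = [
    ("JAN", 5.0), ("JAN", 12.0), ("JAN", 8.0),
    ("JUL", 280.0), ("JUL", 350.0), ("JUL", 310.0),
]
for month, (low, mid, high) in monthly_summary(records).items():
    print(month, low, mid, high)
```

Comparing these summaries across the twelve months is what exposes the seasonal pattern noted above.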

Graph 3: Box Plot graph describing the rainfall in each month.

Graph 4 shows the rainfall in each month from 1901 to 2017.

Graph 4: Monthly rainfall through history.

We divide the data into training and test sets, taking 70% for training and 30% for testing the models: a total of 982 training entries and 422 test entries. We then compare the MLR and SVR models to understand which gives the better result. Plotting actual against predicted rainfall gives the following graphs. Graph 5 shows the scatter plot of actual vs predicted rainfall using the MLR model. In Graph 6 we can clearly see the comparison between each actual and predicted value.
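The split sizes are consistent with 1404 monthly entries (117 years of 12 months) when the 30% test share is rounded up. The sketch below reproduces that arithmetic and then fits a multiple linear regression by least squares on a 70/30 split. The source does not state which predictors or which splitting routine were used, so the synthetic data, the ordered split, and the helper names `split_sizes`, `fit_mlr` and `predict` are assumptions made for illustration.

```python
import math
import random

def split_sizes(n_samples, test_fraction):
    """Test size is rounded up; the remainder is used for training."""
    n_test = math.ceil(n_samples * test_fraction)
    return n_samples - n_test, n_test

def fit_mlr(rows, targets):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk via the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    size = len(design[0])
    # Build the augmented system [X^T X | X^T y].
    system = [
        [sum(r[i] * r[j] for r in design) for j in range(size)]
        + [sum(r[i] * y for r, y in zip(design, targets))]
        for i in range(size)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(system[r][col]))
        system[col], system[pivot] = system[pivot], system[col]
        for row in range(size):
            if row != col:
                factor = system[row][col] / system[col][col]
                system[row] = [a - factor * b for a, b in zip(system[row], system[col])]
    return [system[i][size] / system[i][i] for i in range(size)]

def predict(coefficients, row):
    """Evaluate the fitted linear model for one row of predictors."""
    return coefficients[0] + sum(c * x for c, x in zip(coefficients[1:], row))

print(split_sizes(1404, 0.3))  # 117 years x 12 months = 1404 monthly entries

# Synthetic data (hypothetical): y = 2 + 3*x1 - x2 plus small noise.
rng = random.Random(0)
rows = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
targets = [2 + 3 * x1 - x2 + rng.gauss(0, 0.1) for x1, x2 in rows]

n_train, n_test = split_sizes(len(rows), 0.3)
coefficients = fit_mlr(rows[:n_train], targets[:n_train])
test_mse = sum(
    (y - predict(coefficients, row)) ** 2
    for row, y in zip(rows[n_train:], targets[n_train:])
) / n_test
print(coefficients, test_mse)
```

Evaluating only on the held-out 30% is what makes the test scores a fair basis for comparing the two models.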

Graph 5: Scatter Plot of Actual vs Predicted rainfall using MLR

Graph 6: Actual vs Predicted rainfall using MLR

Graph 7: Scatter Plot of Actual vs Predicted using SVR

Graph 8: Actual vs Predicted rainfall using SVR

Graph 7 shows the scatter plot of actual vs predicted rainfall using the SVR model.

In Graph 8 we compare each actual and predicted value.

Table 1 Comparison of MLR and SVR Performance

S. No.  Metric       MLR                SVR
1       Train score  0.827149236683498  0.828585087676299
2       Test score   0.811790903215551  0.836907038442013
3       MAE          0.249229855446679  0.248383868743224
4       MSE          0.154636268612835  0.151737104047022
5       RMSE         0.393238183055556  0.389534470935528
6       MAPE         0.674927107614465  0.614009940253406
7       R2 Score     0.833790903215551  0.836907038442013

5. CONCLUSION

Here, we use time series analysis to predict rainfall from monthly rainfall data for the years 1901 to 2017. To obtain accurate predictions, the machine learning algorithms MLR and SVR were implemented and tested on the dataset for Bihar state. The two algorithms were compared on accuracy. Comparing the different performance metrics, we conclude that SVR is more accurate than MLR.

REFERENCES

[1] Amanpreet Singh, Narina Thakur, Aakanksha Sharma, “A review of supervised machine learning algorithms”, 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), IEEE, Oct 2016.

[2] Mariette Awad, Rahul Khanna, “Support Vector Regression”, in “Efficient Learning Machines: Theories, Concepts, and Applications for Engineers and System Designers”, Apress, pp. 67-80, January 2015.

[3] Kumar Abhishek, Abhay Kumar, Rajeev Ranjan, Sarthak Kumar, “A Rainfall Prediction Model using Artificial Neural Network”, 2012 IEEE Control and System Graduate Research Colloquium (ICSGRC 2012), pp. 82-87, 2012.

[4] H. M. Meighani, C. Ghotbi, T. J. Behbahani, and K. Sharifi, “Evaluation of PC-SAFT model and support vector regression (SVR) approach in prediction of asphaltene precipitation using the titration data”, Fluid Phase Equilibria, vol. 456, pp. 171–183, Jan. 2018.

[5] Nasimul Hasan, Nayan Chandra Nath, Risul Islam Rasel, “A Support Vector Regression Model for Forecasting Rainfall”, Proceeding of International Conference on Electrical Information and Communication Technology (EICT 2015), IEEE, pp. 554-559.

[6] G. Mahalakshmi, Dr. S. Sridevi, Dr. S. Rajaram, “A Survey on Forecasting of Time Series Data”, IEEE, 2016.

[7] Zahoor Jan, Muhammad Abrar, Shariq Bashir and Anwar M Mirza, “Seasonal to interannual climate prediction using data mining KNN technique”, International Multi-Topic Conference, pp. 40-51, 2008.

[8] Sandeep Kumar Mohpatra, Anamika Upadhyay, Channabasava Gola, “Rainfall Prediction Based on 100 years of Meteorological Data”, International Conference on Computing and Communication Technologies for Smart Nation (IC3TSN), IEEE, 2017, pp. 162-166.

[9] Chandreshekhar Thirumalai, M. Laxmi Deepak, K Sri Harsha, K Chaitanya Krishna, “Heuristic Prediction of Rainfall using Machine Learning Techniques”, International

[10] Ravish Raj, “Evaluation Metrics for Regression Models in Machine Learning”, https://www.enjoyalgorithms.com/blog/evaluationmetrics-regression-models

[11] https://www.kaggle.com/datasets/saisaran2/rainfall-data-from-1901-to-2017-for-india

[12] Thirumalai, C., Harsha, K. S., Deepak, M. L., & Krishna, K. C. (2017). Heuristic prediction of rainfall using machine learning techniques. 2017 International Conference on Trends in Electronics and Informatics (ICEI).
