Heart Disease Prediction Using Multi Feature and Hybrid Approach


International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056


Smt. Akshitha Katkeri¹, Nagashree M S², Shilpa R³, Srilakshmi N⁴, Srilalitha C S⁵

¹Assistant Professor, VTU, Department of CSE, BNM Institute of Technology, Bangalore, Karnataka, INDIA
²,³,⁴,⁵VTU, Department of CSE, BNM Institute of Technology, Bangalore, Karnataka, INDIA

Abstract - Heart disease is commonly caused by a build-up of fatty plaques in the arteries and of calcium in the major artery outside the heart. Many techniques based on various algorithms have been used to address this problem. Manual consultation is difficult and time-consuming in severe cases. This study proposes an easy method of user-system interaction based on a hybrid approach over several algorithms: Logistic Regression, Gaussian NB, Linear SVC, K Neighbours, Decision Tree and Random Forest. In this hybrid approach, the best-performing algorithm is used in the final evaluation. Results: For heart disease detection, the Linear SVC model achieved the best results, with accuracy: 90.78%, precision: 96.87%, sensitivity: 83.78%, F1 score: 89.85%, ROC: 90.60%. Conclusion: The proposed system illustrates the use of an interactive system to predict heart disease using multi-feature classification and a hybrid approach, with promising results compared to previous studies and methods.

Key Words: Gaussian NB, Linear SVC, Random Forest, K Neighbours, Decision Tree, Arteries.

1. INTRODUCTION

Heart disease, also known as cardiovascular disease (CVD), remains the number one cause of death globally. There are various cardiovascular diseases, such as angina, heart failure, coronary heart disease, congenital heart disease and so on. Nearly 17.9 million people, many of them in their early 70s, lose their lives to CVD. The main risk factors for heart disease today are unhealthy diets, alcohol and tobacco intake, smoking, lack of physical activity and work-related stress. These risk factors lead to raised blood pressure, raised blood lipids, overweight and so on. Another main cause of CVD is the build-up of calcium in the major artery outside the heart, which is a predictor of future heart attack or stroke. The more extensive the calcium in the walls of the blood vessels, the greater the risk of future CVD.

There are several classifiers used to detect heart disease, such as Logistic Regression, Gaussian NB, Linear SVC, Decision Tree, K Neighbours and Random Forest. Logistic Regression is a supervised machine learning algorithm that models the probability of a certain class or event. It is used when the data is linearly separable and the outcome is binary in nature.
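As a minimal sketch of how logistic regression turns a weighted sum of features into a class probability, the snippet below applies the sigmoid function. The weights, bias and feature values are hypothetical placeholders chosen for illustration; they are not parameters fitted in this study.

```python
import math

def predict_probability(features, weights, bias):
    """Return P(class = 1 | features) under a logistic regression model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(features, weights, bias, threshold=0.5):
    """Binary outcome: 1 if the modelled probability reaches the threshold."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0

# Hypothetical weights for two standardized features (illustration only).
weights = [0.8, -0.5]
bias = 0.1
print(predict_probability([0.0, 0.0], weights, bias))
print(predict_class([2.0, 0.0], weights, bias))
```

The 0.5 threshold is a common default; a different cut-off could be chosen if sensitivity matters more than precision.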

Gaussian NB is a generative model. It assumes that the features of each class follow a Gaussian distribution. It is used specifically when the features have continuous values.
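The Gaussian assumption can be illustrated with a small sketch for a single continuous feature: each class is summarized by a mean and variance, and a new value is assigned to the class with the largest prior times likelihood. The sample values and equal priors are made up for illustration and do not come from the study's dataset.

```python
import math

def gaussian_pdf(x, mean, variance):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def fit_class_stats(values):
    """Estimate the mean and (population) variance of one feature within one class."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance

def classify(x, stats_by_class, priors):
    """Pick the class with the largest prior * likelihood for a single feature value."""
    scores = {
        label: priors[label] * gaussian_pdf(x, *stats_by_class[label])
        for label in stats_by_class
    }
    return max(scores, key=scores.get)

# Made-up values of one continuous feature for two classes (illustration only).
stats = {0: fit_class_stats([1.0, 2.0, 3.0]), 1: fit_class_stats([7.0, 8.0, 9.0])}
priors = {0: 0.5, 1: 0.5}
print(classify(2.5, stats, priors), classify(8.5, stats, priors))
```

With several features, the per-feature likelihoods would be multiplied together, which is the "naive" independence assumption.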

Linear SVC fits the provided data and returns the best-fit hyperplane that categorizes the data. Once the hyperplane is obtained, features can be fed to the classifier to check what the predicted class is.
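Once a hyperplane has been learned, prediction reduces to checking on which side of it a point falls. The sketch below shows only this prediction step; the hyperplane coefficients are hypothetical and were not learned from the study's data.

```python
def decision_value(features, weights, bias):
    """Signed score w . x + b of a point relative to the hyperplane."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def predict(features, weights, bias):
    """Class 1 on the positive side of the hyperplane, class 0 otherwise."""
    return 1 if decision_value(features, weights, bias) > 0 else 0

# Hypothetical hyperplane x1 + x2 - 1 = 0 (illustration only).
weights, bias = [1.0, 1.0], -1.0
print(predict([2.0, 2.0], weights, bias), predict([0.0, 0.0], weights, bias))
```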

Decision Tree uses various algorithms to decide how to split a node into two or more sub-nodes. As sub-nodes are created, their purity increases. The data is split repeatedly according to the specified parameters.
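Node purity can be made concrete with an impurity measure. The sketch below uses Gini impurity, which is one common splitting criterion; the paper does not state which criterion was used, so this choice is an assumption, and the labels are toy values.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted Gini impurity of the two sub-nodes produced by a split."""
    total = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / total

parent = [0, 0, 1, 1]                   # toy labels
print(gini(parent))                     # impurity before the split
print(split_impurity([0, 0], [1, 1]))   # impurity after a perfect split
```

A split is worth making when the weighted impurity of the sub-nodes is lower than the impurity of the parent node.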

The main goal of this study is to develop a hybrid model from the algorithms that best suit the prediction task and to make the model more accurate, so that people learn about their health condition well in advance, receive proper treatment and recover without serious complications, thereby reducing the global death rate due to heart disease.

2. METHODOLOGY

The proposed methodology aims to predict whether or not the patient is suffering from heart disease. This automation helps doctors to analyze the critical condition of patients, and hence also helps to improve treatment. Patients can take precautions, which helps to save many lives. In this project, we use various algorithms, that is, we implement a hybrid technique on a multiclass dataset. The multiclass dataset represents various levels, i.e. 0, 1, 2, 3, 4. The model makes use of several machine learning techniques and algorithms in an effort to offer a more precise answer to the problem. Numerous ML techniques are applied to the dataset, for instance K Nearest Neighbors, Random Forest, Logistic Regression, Gaussian NB, Regression Tree, etc. A hybrid model is created employing all these techniques for increased accuracy. Additionally, the model works with practically all patient record types. Pre-processing of the dataset involves reducing noise and outliers. The dataset is then split into train and test data: 75% of the data is used as train data and the remaining 25% as test data. The methods shown in the figures below are used to generate the ML models.
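The 75%/25% split can be sketched as follows. Shuffling with a fixed seed is an assumption made here for reproducibility; the paper does not say how records were assigned to each portion, and the integers below are only stand-ins for patient records.

```python
import random

def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle the records and split them into train and test portions."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

records = list(range(303))  # stand-ins for the 303 patient records of Section 2.1
train, test = train_test_split(records)
print(len(train), len(test))
```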

Volume: 09 Issue: 07 | July 2022 www.irjet.net p ISSN: 2395 0072 © 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page2425

Fig -1: Flow Chart

The figure below shows the data flow of the proposed model:

Fig -2: Proposed System

2.1 Data source and Dataset Description

In this project, the dataset is taken from the UCI Repository of Machine Learning Databases. The dataset contains a total of 303 records with 14 medical features. The original target values 1, 2, 3, 4 were merged into a single value indicating the presence of heart disease. All features have values in the dataset. The attributes are explained in the table below.

Sr. no | Attribute | Attribute representation | Description
1 | Age | Age | Age of patient in years
2 | Sex | Sex | 0 = female, 1 = male
3 | Chest pain | Cp | 1 = typical angina, 2 = atypical angina, 3 = non-anginal pain, 4 = asymptomatic
4 | Resting blood pressure | Trestbps | Blood pressure
5 | Serum cholesterol | Chol | Minimum cholesterol: 126, maximum cholesterol: 564
6 | Fasting blood sugar | Fbs | 0 = false, 1 = true
7 | Resting electrocardiograph | Restecg | 0 = normal, 1 = ST abnormality, 2 = left ventricular hypertrophy
8 | Max heart rate | Thalach | Maximum heart rate achieved
9 | Exercise-induced angina | Exang | 0 = no, 1 = yes
10 | ST depression | Oldpeak | ST depression induced by exercise relative to rest
11 | Slope | Slope | Slope of peak exercise ST segment: 1 = upsloping, 2 = flat, 3 = downsloping
12 | No. of vessels | Ca | Major vessels (0-3) colored by fluoroscopy
13 | Thalassemia | Thal | 3 = normal, 6 = fixed defect, 7 = reversible defect

Table 1: Detailed heart disease dataset attributes with description
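The merging of the original target values into a single "disease present" class, as described for this dataset, can be sketched as a simple mapping. The function name and the example labels are illustrative and not taken from the authors' code.

```python
def binarize_target(value):
    """Map the original target (0-4) to 0 = no disease, 1 = disease present."""
    if value not in (0, 1, 2, 3, 4):
        raise ValueError(f"unexpected target value: {value!r}")
    return 0 if value == 0 else 1

original = [0, 2, 1, 0, 4, 3]  # made-up example labels
print([binarize_target(v) for v in original])
```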

Table 2: Statistical study of the data related to heart disease


2.2 Data pre-processing

Data pre-processing is the procedure used to prepare a dataset efficiently for classification. Real-world data that has been gathered and saved in the database may contain missing values. This is the most common issue, because patients may have entered their information incorrectly. Missing values are filled in as part of normalizing the attribute data.

$$ z = \frac{x - \mu}{\sigma} $$

where $\mu$ = mean, $\sigma$ = standard deviation and $x$ = a single feature value. The data features are standardized to zero mean and unit variance.
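The standardization step, in which the mean is subtracted from each value and the result is divided by the standard deviation, can be sketched in plain Python. The cholesterol values are illustrative (126 and 564 are the minimum and maximum listed in Table 1; the middle value is made up), and the handling of a constant column is an assumption.

```python
import math

def standardize(values):
    """Return z-scores (x - mean) / std for a list of numeric values."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(x - mean) / std for x in values]

chol = [126.0, 240.0, 564.0]  # illustrative values only
z = standardize(chol)
print(z)
```

After this transformation the column has zero mean and unit variance, so features measured on different scales contribute comparably to distance-based and linear classifiers.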

2.3 Feature extraction

The goal of feature extraction is to derive a subset of new features from the original set using some functional mapping. Once the feature importance graph is plotted, the most significant features are chosen for prediction.

Chart 1: Feature Importance Plot

The figure below shows the attribute distribution.

Chart 2: Visualization of the Data


3. MODELING AND ANALYSIS

3.1 Classification

Following pre-processing, the data is separated into train data and test data. In this study, a hybrid model is proposed. A variety of classification techniques are trained on the train data. Gaussian NB, Linear SVC, Logistic Regression, Decision Tree Classifier, Random Forest Classifier, KNN and SVM are the algorithms employed in the proposed model.

3.2 Confusion matrix

A confusion matrix is a table used to describe how well a classification system performs. It shows and summarizes a classification algorithm's performance. The confusion matrix for each classifier is shown in the figures below. Each entry in the confusion matrix is defined as follows:

True positives (TP): the number of predictions where the actual class was positive and the predicted class was also positive.

False positives (FP): the number of predictions where the actual class was negative but the predicted class was positive.

True negatives (TN): the number of predictions where the actual class was negative and the predicted class was also negative.

False negatives (FN): the number of predictions where the actual class was positive but the predicted class was negative.
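The four confusion matrix entries, and the accuracy, precision, sensitivity and F1 score derived from them, can be computed as in the sketch below. Here a positive label (1) means heart disease is present; the label lists are toy data and are not the study's predictions.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = disease, 0 = no disease)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Accuracy, precision, sensitivity (recall) and F1 score from the counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "sensitivity": sensitivity, "f1": f1}

# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts, metrics(*counts))
```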

Fig 3. Random Forest classifier

Fig 4. Logistic Regression

Fig 5. Gaussian NB

Fig 6. Linear SVC

Fig 7. Decision Tree

Fig 8. KNN

Fig 9. SVM

Table 3. Performance table of classifier algorithms

Based on the performance table of classifier algorithms, the algorithm with the greatest accuracy score is chosen for predicting the patient's heart disease. The Linear SVC model is used for heart disease prediction since it provides the best accuracy of all the classifiers.
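The selection step of the hybrid approach, which keeps the classifier with the highest test accuracy, can be sketched as follows. The model names and predictions are placeholders for illustration; they are not the paper's classifiers or results.

```python
def accuracy(y_true, y_pred):
    """Fraction of test samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def select_best(y_true, predictions_by_model):
    """Return (name, accuracy) of the model with the highest test accuracy."""
    scores = {name: accuracy(y_true, preds)
              for name, preds in predictions_by_model.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy predictions for illustration only; these are not the paper's results.
y_true = [1, 0, 1, 1, 0]
predictions = {
    "model_a": [1, 0, 0, 1, 0],
    "model_b": [1, 0, 1, 1, 0],
    "model_c": [0, 0, 0, 1, 1],
}
best_name, best_score = select_best(y_true, predictions)
print(best_name, best_score)
```

Selecting on a single accuracy figure is simple, but other criteria such as sensitivity could be substituted in `select_best` when missed diagnoses are the main concern.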

4. RESULTS

4.1 UI Design

A result is the final consequence of actions or events, expressed qualitatively or quantitatively. Performance analysis is an operational analysis: a set of basic quantitative relationships between performance quantities.


Figure 10: Login Page

This is the login page, where the user enters the website by providing the name and password set during registration. Once the details are entered, the system checks whether they match the details given while registering. If they match, the user is allowed to log in to the website.

Figure 11: Sign Up Page

The above figure shows the sign-up page. A user visiting the website for the first time should register by providing the personal information asked for in the form. After filling in the details, the user should click the Create Account button. If the user is already registered, an alert is shown stating that the user is already registered. A registered user can instead click the "Already have one?" option, which takes the user to the login page.


Figure 12: Database Page

The above figure shows the database page. The data provided by the user during registration is stored in the database. Each user has a unique username. If a new user registers with a username that already exists, the system reports that the user already exists and prompts for a different username. While logging in, the user has to enter the correct username and password. If the username and password provided match the database, the user is successfully logged in.
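The registration and login checks described above can be sketched with an in-memory dictionary standing in for the database. The function names are hypothetical, and the salted password hashing is an assumption added here as good practice; the paper does not describe how passwords are stored.

```python
import hashlib
import hmac
import os

users = {}  # in-memory stand-in for the database: username -> (salt, password hash)

def _hash(password, salt):
    """Derive a salted hash so the plain password is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def register(username, password):
    """Store a new user; return False if the username already exists."""
    if username in users:
        return False
    salt = os.urandom(16)
    users[username] = (salt, _hash(password, salt))
    return True

def login(username, password):
    """Return True only if the username exists and the password matches."""
    if username not in users:
        return False
    salt, stored = users[username]
    return hmac.compare_digest(stored, _hash(password, salt))

print(register("demo_user", "secret"), register("demo_user", "other"))
print(login("demo_user", "secret"), login("demo_user", "wrong"))
```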

Figure 13: Input Page

The above figure is the user input page. Once the user logs in to the website, they are directed to this page. The user has to input data according to their health conditions. Age is of integer type; the user has to enter the age as a number. The Sex field has the options male and female, of which the user has to choose one. Resting Blood Pressure, Serum Cholesterol in mg/dl, Maximum Heart Rate and ST Depression Induced are of integer type; the user has to input these from the medical record provided. For the other fields, such as Chest Pain Type, Fasting Blood Sugar, Resting ECG Results and Exercise Induced Angina, the user selects one of the options from the drop-down menu.

Figure 14: Result for Test Cases

Fig 15. Prediction Form

Fig 16. Result for Test Cases

Figure 17: Diet Plan Page

The above figure shows the diet plan page, which is reached from the results page. Based on the user's results and how severe the disease risk is, the system provides a diet plan to help the user maintain their health. By following the diet plan, the user can bring their health condition from severe to normal.

4.2 Research Implications

The proposed methodology aims to predict whether or not the patient is suffering from heart disease. This automation helps doctors to analyze the critical condition of patients, and hence also helps to improve treatment. A user interface is created to take input from the user; the model predicts the presence of heart disease and recommends diet plans. This is useful for improving the user's health effectively.

5. CONCLUSION

Heart disease prediction is now a very common problem. The proposed user interface platform lets anyone register, log in, enter their data and learn their health status. Based on the given data, it predicts whether or not heart disease is present. The proposed system helps to identify the disease at a very early stage and so to reduce the death rate. A database is also created so that all patient data can be stored. It is a hybrid system approach that has been used successfully for heart disease prediction with a higher accuracy rate.

The above figure shows the output page. Once the user provides the input and selects Predict, the site redirects to the result page and, after processing, shows whether or not heart disease is present. The page also contains the option "go back to home page", which takes the user to the home page.

6. FUTURE ENHANCEMENTS

In future, a model can be planned for direct transfer of patients from old age homes or other home care centers to the Intensive Care Unit (ICU) through ambulance services. An artificially intelligent system will take clinical parameter data from old age homes or other care centers. The model produces a single output that reveals the patient's stage: healthy, first/second stage of sickness, or critical stage. The system will show green if the person's status is healthy, and the person will be informed via SMS that they are 'Healthy'. Otherwise, if the person is at the first/second stage of sickness, an SMS 'Do frequent monitoring' will be sent to his/her mobile number.

