PARKINSON’S DISEASE DETECTION USING MACHINE LEARNING


International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 09 Issue: 06 | June 2022 www.irjet.net p-ISSN: 2395-0072


Shreevallabhadatta G¹, Suhas M S¹, Vignesh¹, Manoj C¹, Rudramurthy V C², Bhagyashri R Hanji³

¹Students, Department of Computer Science and Engineering, Global Academy of Technology
²Assistant Professor, Department of Computer Science and Engineering, Global Academy of Technology
³Head of the Department, Department of Computer Science and Engineering, Global Academy of Technology

Abstract: In this global era, technology plays an important part in our lives, from our lifestyle and healthcare to the management of resources and assets. In healthcare, technology grows every day to counter the diseases and symptoms emerging in the present world. One such disease is Parkinson’s disease, a neurological disorder of the brain that causes tremors in the body and hands as well as stiffness in the body. At present there is no proper cure, and treatment is possible only when the condition is detected early, at its onset. Early detection and treatment not only lower the cost of the illness but may also save lives. As a result, a project called "Parkinson's Disease Detection Using Machine Learning Technologies" was launched in an attempt to diagnose the disease at an early stage. Parkinson's disease is a progressive neurological disease that affects the brain's dopamine-producing neurons. Various machine learning techniques and Python libraries are therefore employed to develop a model capable of reliably detecting the presence of the disease in a person. Current models rely on either image or audio analysis to diagnose the disease, which encourages the development of a new model that uses both.

Key Words: Parkinson’s disease, deep learning, ensemble learning, early detection, premotor features, feature importance.

2. INTRODUCTION

This entire data processing process can be automated and made effective by using machine learning algorithms, mathematical modelling and statistical expertise. Graphs, movies, charts, tables, photos and a variety of other formats can be generated as a result of this process, depending on the task at hand and the machine's requirements. Early detection is currently the best way to tackle Parkinson's disease: to protect neuron integrity and slow the progression of the disease, it is critical to diagnose it early. Various machine learning techniques and algorithms can help patients receive early medication or treatment and can help improve medical standards. Let's have a look at a few machine learning applications in the healthcare industry, as seen in Figure 1.

Figure 1: Applications in the Healthcare Sector

Machine Learning: Machine learning (ML) is the study of computer algorithms that can learn and develop on their own with experience and data.

Parkinson’s Disease (PD): Parkinson's disease (PD) is the most prevalent movement disorder caused by neurodegeneration. Degradation of dopaminergic neurons is a feature of this condition.

The five major sections of the paper are: 1. Abstract, 2. Introduction, 3. Research and Elaboration, 4. Results or Findings, 5. Conclusion.

3. RESEARCH AND ELABORATION

3.1 RELATED WORK

Shrihari K Kulkarni and K R Sumana, the researchers in [1], used Decision Tree, Logistic Regression and Naive Bayes, along with a deep learning algorithm, the Recurrent Neural Network (RNN), and predicted performance parameters to build the model. Machine learning approaches are used to construct prediction models that can differentiate early PD from healthy normal subjects using the Movement Disorder Society Unified Parkinson's Disease Rating Scale (MDS-UPDRS). Logistic Regression, Random Forests and Support Vector Machine were employed for subject and record validation.


The drawbacks of this paper were that the data collection techniques are weakly regulated, resulting in unreliable results such as out-of-range or non-existent values, and that the model relies purely on the evaluation of motions, which is not the only source of data available on disease bearers or healthy citizens.

Yatharth Nakul, Ankit Gupta and Hritik Sachdeva, the researchers in [2], compared supervised learning algorithms such as Random Forest, Support Vector Machine and Naïve Bayes. A confusion matrix was used to check accuracy, and different classification methods were used. ML classification techniques improve accuracy and reduce possible loopholes, and hyperparameter tuning was used to achieve the maximum accuracy. The maximum accuracy of 98.30% was achieved using K-nearest neighbor (KNN) classification.

The main drawbacks are that results are delayed, output progression is slow, and the best proposed methodology gives a higher error rate when the confusion matrix is plotted.
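The hyperparameter tuning of a K-nearest neighbor classifier described for [2] can be illustrated with scikit-learn's `GridSearchCV`. This is a minimal sketch under stated assumptions, not the code used in [2]: synthetic data from `make_classification` stands in for a real voice-feature dataset, and the searched values of `n_neighbors` and `weights` are hypothetical choices.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for a voice-feature dataset (22 numeric features, binary label).
X, y = make_classification(n_samples=200, n_features=22, n_informative=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# KNN is distance based, so features are standardized before the neighbor search.
pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())

# Hypothetical search space; make_pipeline names the step "kneighborsclassifier".
param_grid = {
    "kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9],
    "kneighborsclassifier__weights": ["uniform", "distance"],
}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)  # 5-fold cross-validation on the training set only

y_pred = search.predict(X_test)
print("best params:", search.best_params_)
print("test accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))  # rows: true class, columns: predicted class
```

Tuning is done only on the training split, so the reported test accuracy stays an estimate on unseen data.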

According to Wu Wang, Junho Lee, Fouzi Harrou and Ying Sun [3], SGD (Stochastic Gradient Descent) is used to train the models, and an FNN (Feed-forward Neural Network) is put into action. The linear discriminant analysis approach used had the best sensitivity, which means it has the best likelihood of correctly identifying a real patient. The proposed deep learning model achieved a 96.45% accuracy rate, owing to its ability to learn linear and nonlinear features from PD data without the need for hand-crafted feature extraction.

The biggest drawback is that deep learning is frequently employed as a black-box algorithm: trained neural networks are difficult to evaluate, and it is theoretically difficult to understand how deep learning produces good results.
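A feed-forward neural network trained with stochastic gradient descent, the combination described for [3], can be sketched with scikit-learn's `MLPClassifier` and `solver="sgd"`. This is an illustrative sketch, not the architecture from [3]: the layer sizes, learning rate, momentum and synthetic data are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic data stands in for PD voice features.
X, y = make_classification(n_samples=300, n_features=22, n_informative=8, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

scaler = StandardScaler().fit(X_train)  # fit scaling on training data only

fnn = MLPClassifier(
    hidden_layer_sizes=(32, 16),  # two hidden layers (illustrative sizes)
    activation="relu",
    solver="sgd",                 # weights updated by stochastic gradient descent
    learning_rate_init=0.01,
    momentum=0.9,
    max_iter=500,
    random_state=1,
)
fnn.fit(scaler.transform(X_train), y_train)

test_accuracy = fnn.score(scaler.transform(X_test), y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

No hand-crafted feature selection happens here: the hidden layers learn their own combinations of the input features, which is the property [3] credits for its accuracy.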

The researchers in [4], Muthumanickam S, Gayathri J and Eunice Daphne J, used Support Vector Machine (SVM), Feedforward Back-Propagation based Artificial Neural Network (FBANN) and Random Tree (RT) classifiers, binary logistic regression, Linear Discriminant Analysis (LDA), Convolutional Neural Network (CNN), Deep Belief Network (DBN) and deep neural network classifiers. The approach has a higher level of accuracy than a deep neural network. Linear regression is simple to comprehend and can be tweaked to prevent overfitting, and the sgd command can be used to update linear models. The algorithm and the outputs of binary logistic regression are easy to interpret.

The main drawbacks are that a lot of training data is required and the computing expenses are higher. Linear regression fails badly on non-linear relationships, and such models are neither very adaptable nor quick.

The researchers in [5], Timothy J. Wroge, Yasin Ozkanca, Cenk Demiroglu and Dong Si, employed a VAD (Voice Activity Detection) algorithm to clean the dataset. Cross-validation was performed using a decision tree and a Support Vector Machine, and neural networks built with Keras and TensorFlow were also used. Machine learning architectures based on non-invasive vocal biomarkers can be used to diagnose and forecast the disease, and machine learning classifiers are effective for noisy and high-dimensional data.

The accuracy of this particular model is comparatively low, and the algorithm's performance is limited because it uses only clinicians' data.
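Cross-validating a decision tree and an SVM, as done in [5], can be sketched with scikit-learn's `cross_val_score`. The sketch assumes 10 stratified folds, an RBF-kernel SVM and synthetic data in place of the voice recordings; none of these settings come from [5].

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in for cleaned vocal-biomarker features.
X, y = make_classification(n_samples=200, n_features=22, n_informative=8, random_state=2)

# Stratified folds keep the patient/healthy ratio similar in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=2)

models = {
    "decision_tree": DecisionTreeClassifier(random_state=2),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)),
}

cv_results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")  # one score per fold
    cv_results[name] = scores
    print(f"{name}: mean={scores.mean():.3f} std={scores.std():.3f}")
```

Reporting the spread across folds, not just the mean, shows how stable each model is on a small dataset.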

Jayashree R. J., Ganesh S., Karanth S. C. and Lalitha S., the researchers in [6], explain in depth how spectral features (e.g. spectral contrast and the STFT) and temporal features (e.g. zero-crossing rate) are extracted, and how classification is done using XGBoost together with classifiers such as Random Forest and regression models such as Logistic Regression. The work has several advantages, including real-time speech analysis that clearly shows how noise and other factors affect Parkinson’s disease detection, and it achieves better accuracy with a distinctive approach to showing how noise affects the prediction of Parkinson’s disease. Its biggest drawback is that it is based on a limited dataset; if more data were available, a more practical approach could be designed. Only a few classifiers are used, and the feature analysis is constrained to just eight features, which is comparatively limited.
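Of the temporal features mentioned for [6], the zero-crossing rate is simple enough to compute directly with NumPy. This is a minimal sketch, not the feature extractor from [6]: the 8 kHz sampling rate, the 400-sample frames with a 200-sample hop, and the normalization by `len(frame) - 1` are assumptions (some tools normalize by the frame length instead).

```python
import numpy as np


def zero_crossing_rate(frame: np.ndarray) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    negative = np.signbit(frame)  # True where a sample is negative
    crossings = np.count_nonzero(negative[1:] != negative[:-1])
    return crossings / (len(frame) - 1)


def frame_signal(signal: np.ndarray, frame_length: int, hop_length: int) -> np.ndarray:
    """Cut a 1-D signal into frames of frame_length samples, hop_length apart."""
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    return np.stack(
        [signal[i * hop_length: i * hop_length + frame_length] for i in range(n_frames)]
    )


sample_rate = 8000  # assumed sampling rate in Hz
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 100 * t)  # smooth 100 Hz tone: few sign changes
noise = np.random.default_rng(0).standard_normal(sample_rate)  # noise: many sign changes

tone_zcr = np.array([zero_crossing_rate(f) for f in frame_signal(tone, 400, 200)])
noise_zcr = np.array([zero_crossing_rate(f) for f in frame_signal(noise, 400, 200)])
print(f"tone ZCR: {tone_zcr.mean():.3f}, noise ZCR: {noise_zcr.mean():.3f}")
```

Noise drives the zero-crossing rate up sharply, which is one reason this feature is sensitive to the recording conditions studied in [6].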

3.2 SYSTEM DESIGN

The proposed architecture has four parts, Data Acquisition (the dataset), Feature Extraction, Classification and Output Production, as shown below in Figure 2.

Data Acquisition involves acquiring all available data, namely voice samples of people, which may contain noisy or noiseless features.

Feature Extraction involves voice analysis based on its features, such as MDVP.Jitter.

Classification involves processing the given features of the dataset using different classifiers such as SVM, XGBoost, a Random Tree Classifier and Logistic Regression.

The final output is produced as a prediction of the disease for the individual.
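The four stages above can be sketched as a small scikit-learn program. This is a hypothetical outline, not the project's code: `acquire_data`, `build_classifier` and `produce_output` are illustrative names, synthetic data replaces the voice recordings, and because the voice features are assumed to be already extracted, the feature stage is represented only by standardization before an SVM.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def acquire_data(seed: int = 0):
    """Stage 1 (hypothetical): stand-in for loading pre-extracted voice features."""
    return make_classification(n_samples=200, n_features=22, n_informative=8, random_state=seed)


def build_classifier() -> Pipeline:
    """Stages 2-3: scale the features, then classify with an SVM."""
    return Pipeline([
        ("scale", StandardScaler()),
        ("classify", SVC(kernel="rbf")),
    ])


def produce_output(model: Pipeline, sample) -> str:
    """Stage 4: turn the predicted label (1 = patient) into a readable result."""
    label = int(model.predict(sample.reshape(1, -1))[0])
    return "Parkinson's detected" if label == 1 else "Healthy"


X, y = acquire_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
model = build_classifier().fit(X_train, y_train)
print(produce_output(model, X_test[0]))
```

Wrapping scaling and classification in one `Pipeline` means the same preprocessing is applied at training time and at prediction time.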


Figure 2: System Design

3.3 DATAFLOW DIAGRAM

The data flow diagram explains the basic flow of data through the various steps of Parkinson’s disease detection.

• In the first step, we collect data from the patient using different medical equipment.

• The collected data is then sent for processing and training, where it is classified using different ML algorithms.

• After the data is split into training and test datasets, the datasets are used to predict whether the patient has the disease or not.

Figure 3 shows the dataflow diagram for the proposed solution for the detection of Parkinson’s disease.

Figure 3: Dataflow Diagram 1.0

Figure 4 gives the combined analysis of both voice and image data to detect Parkinson’s disease.

Figure 4: Data Flow Diagram 2.0

3.4 CLASS DIAGRAM

Figure 5 shows the class diagram for Parkinson’s disease detection.

Figure 5: Class Diagram

3.5 MODULES

• MODULE 1: Dataset Extraction

Functionality: The modules needed for data analysis, data cleaning and model building are imported, and the dataset is imported from a fixed folder or directory. The input is the voice dataset. The dataset is imported for analysis and cleaning and assigned to a data frame variable, and the features and targets are fetched from the data frame using pandas. The dataset comprises all available data, namely voice samples of people, which may be noisy or noiseless, and the available SPECT images. Figure 6 explains importing the different modules for data analysis, data cleaning and model building.
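Loading the voice dataset and separating features from the target can be sketched with pandas. This sketch assumes a CSV layout like the public UCI Parkinson's voice dataset (an identifier column `name`, MDVP-style feature columns, and a binary `status` target); the four inline rows are placeholders so the example runs without a file, and in practice `pd.read_csv` would be given the dataset's path.

```python
import io

import pandas as pd

# Hypothetical placeholder rows; column names assume a UCI-style voice dataset.
CSV_TEXT = """name,MDVP:Fo(Hz),MDVP:Jitter(%),MDVP:Shimmer,status
s01,119.992,0.00784,0.04374,1
s02,122.400,0.00968,0.06134,1
s03,197.076,0.00289,0.01098,0
s04,199.228,0.00241,0.01015,0
"""


def load_voice_dataset(source) -> pd.DataFrame:
    """Read the voice dataset (a path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


def features_and_target(df: pd.DataFrame, target: str = "status", id_columns=("name",)):
    """Split the DataFrame into a feature matrix X and a target vector y.

    Identifier columns are dropped because they carry no diagnostic information.
    """
    X = df.drop(columns=[target, *id_columns])
    y = df[target]
    return X, y


df = load_voice_dataset(io.StringIO(CSV_TEXT))
X, y = features_and_target(df)
print(X.shape, y.tolist())  # (4, 3) [1, 1, 0, 0]
```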


Figure 6: Importing dataset

• MODULE 2: Dataset Analysis and Dataset Cleaning

Here the imported voice dataset, with features such as jitter and shimmer, is the input and is chosen for analysis. Some columns of the dataset are not required for classification, so these unwanted columns are removed. Some columns contain null values, which must be filled (here, with the column mean) so that an error-free model can be built. Figure 7 explains data analysis, and Figures 8.1 and 8.2 explain data cleaning.

Fig 7: Data analysis

Figure 8.1: Data cleaning

Figure 8.2: Data cleaning 2.0
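The two cleaning steps, dropping unneeded columns and filling nulls with the column mean, can be sketched with pandas. The column names and values below are hypothetical, and treating `name` as the unwanted column is an assumption.

```python
import numpy as np
import pandas as pd


def clean_voice_dataset(df: pd.DataFrame, unwanted_columns) -> pd.DataFrame:
    """Drop columns not needed for classification and fill missing
    numeric values with the mean of their column."""
    cleaned = df.drop(columns=unwanted_columns)  # returns a new DataFrame
    numeric_cols = cleaned.select_dtypes(include="number").columns
    # .mean() skips NaN by default, so each mean uses only the observed values.
    cleaned[numeric_cols] = cleaned[numeric_cols].fillna(cleaned[numeric_cols].mean())
    return cleaned


# Hypothetical raw data with missing jitter and shimmer measurements.
raw = pd.DataFrame({
    "name": ["s01", "s02", "s03", "s04"],
    "MDVP:Jitter(%)": [0.0078, np.nan, 0.0029, 0.0024],
    "MDVP:Shimmer": [0.0437, 0.0613, np.nan, 0.0102],
    "status": [1, 1, 0, 0],
})
clean = clean_voice_dataset(raw, unwanted_columns=["name"])
print(clean)
```

Mean imputation is the simplest choice; when a column is skewed, filling with the median is a common alternative.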

• MODULE 3: Data Splitting into Training and Testing Datasets

The cleaned dataset is split into training and testing datasets for model building, with twenty percent of the data held out for testing. The input is the voice dataset, and the output of this module is two randomly distributed datasets, training and testing, each containing both the input features and the target value. We used train_test_split from sklearn.model_selection to split the dataset into an eighty percent training dataset and a twenty percent test dataset. Figure 9 explains dataset splitting, along with the decision trees plotted to analyze the best approach for the specified output.


Fig 9: Dataset splitting
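The 80/20 split can be sketched with `train_test_split` from `sklearn.model_selection`. Synthetic data stands in for the cleaned voice dataset, and the fixed `random_state` and `stratify=y` are added assumptions (the text does not say whether the split was stratified).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Assumption: 100 synthetic samples with 22 features stand in for the cleaned dataset.
X, y = make_classification(n_samples=100, n_features=22, n_informative=8, random_state=3)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,     # twenty percent held out for testing
    random_state=3,    # fixed seed so the split is reproducible
    stratify=y,        # keep the class ratio similar in both sets (assumption)
)
print(X_train.shape, X_test.shape)  # (80, 22) (20, 22)
```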

• MODULE 4: Model Building

In this module the model is built on the training dataset and then tested with the unseen test dataset. The input for this step is the trained model and the unseen dataset, and the output is the model accuracy measured with various classification metrics. Classification metrics such as Accuracy Score, F1 Score, Precision Score and Recall Score are used to check the classification accuracy of the trained model, and the accuracy is checked on both the training and testing datasets. Here we used an SVM (Support Vector Machine) or a Random Forest Classifier to classify whether the patient has Parkinson’s disease or not. Figure 10 explains model building, and Figure 11 shows the final output, depicting all the classification metrics and the detection of the disease for a particular patient.
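Training an SVM and a Random Forest and scoring both on the training and test sets can be sketched as follows. The synthetic data, the RBF kernel and `n_estimators=200` are assumptions rather than the project's settings; comparing train and test scores is a quick way to spot overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic data stands in for the cleaned voice features.
X, y = make_classification(n_samples=200, n_features=22, n_informative=8, random_state=4)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=4, stratify=y
)

models = {
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=4),
}


def classification_metrics(y_true, y_pred) -> dict:
    """The four metrics named in the text, with label 1 as the positive class."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }


report = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    report[name] = {
        "train": classification_metrics(y_train, model.predict(X_train)),
        "test": classification_metrics(y_test, model.predict(X_test)),
    }
    print(name, {k: round(v, 3) for k, v in report[name]["test"].items()})
```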

Fig 10: Model building

Fig 11: Final Output and Prediction

4. RESULTS AND FINDINGS

Performance metrics such as Accuracy Score, Mean Absolute Error, Root Mean Squared Error, Precision, F1 Score and Recall Score play a prominent part in the outcomes of any machine learning project. Gathering all of these metrics is critical because it allows us to assess the model’s strengths and weaknesses, and model performance is critical when producing predictions in novel scenarios and other sequence-oriented cases. After comparing all of the models on our dataset, we plotted the performance metrics for each model.

1. Accuracy Score: The most frequently applied metric across all models; it is the ratio of correctly classified observations (true positives plus true negatives) to all observations. Figure 15 shows a graphical depiction of the accuracy scores for the various models we examined in this project.

Fig 15: Accuracy score


2. Precision Score: Measures the proportion of positive predictions that are actually positive. When the classes are imbalanced, precision is a useful measure of prediction success. Mathematically, it is the ratio of true positives to the sum of true positives and false positives. Figure 12 shows the precision score for the various models we used in this project.

Figure 12: Precision Score

3. F1 Score: An overall score that combines precision and recall as their harmonic mean. It is often used when optimizing models. Figure 13 gives the F1 Score graph for the various classification models.

Figure 13: F1 Score

4. Recall Score: The proportion of actual positive cases that are correctly predicted as positive, i.e. the ratio of true positives to the sum of true positives and false negatives. It indicates how well a model avoids missing positive cases, which matters when optimizing the required output. Figure 14 offers a graphical analysis of the recall scores of the different models.

Figure 14: Recall Score
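The four metrics follow directly from the confusion-matrix counts: accuracy = (TP + TN) / (TP + TN + FP + FN), precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 · precision · recall / (precision + recall). The sketch below computes them by hand from hypothetical counts and cross-checks F1 against scikit-learn; returning 0.0 on a zero denominator is an assumption that mirrors scikit-learn's default fallback.

```python
from sklearn.metrics import accuracy_score, f1_score


def safe_divide(numerator: float, denominator: float) -> float:
    """Return 0.0 when the denominator is zero (e.g. no positive predictions)."""
    return numerator / denominator if denominator else 0.0


def metrics_from_counts(tp: int, fp: int, tn: int, fn: int) -> dict:
    accuracy = safe_divide(tp + tn, tp + tn + fp + fn)
    precision = safe_divide(tp, tp + fp)
    recall = safe_divide(tp, tp + fn)
    f1 = safe_divide(2 * precision * recall, precision + recall)  # harmonic mean
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical confusion-matrix counts for one model on a test set.
m = metrics_from_counts(tp=8, fp=2, tn=6, fn=4)
print(m)

# Label vectors that reproduce the same counts, for a cross-check with scikit-learn.
y_true = [1] * 8 + [0] * 2 + [0] * 6 + [1] * 4
y_pred = [1] * 8 + [1] * 2 + [0] * 6 + [0] * 4
print(abs(m["f1"] - f1_score(y_true, y_pred)) < 1e-12)
print(abs(m["accuracy"] - accuracy_score(y_true, y_pred)) < 1e-12)
```

Because F1 is a harmonic mean, it always lies between precision and recall, and it is pulled toward the smaller of the two.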

It can be inferred from the figures above that Random Forest outperforms all the other models implemented, with an Accuracy Score of around 97.43%, a Precision Score of around 96.55% and an F1 Score of around 98.24%, whereas XGBoost leads the Recall Score with 97.252%. Hence, considering all the performance metrics, Random Forest is the most obvious choice: it outperformed the other models and can be trusted on a problem that allows very little room for error, given that it is part of a field like healthcare.

5. CONCLUSION

Parkinson's disease is a brain disorder that affects the central nervous system (CNS), and there is currently no cure for it; it can be treated only if it is diagnosed early. Late detection leaves no opportunity for therapy and can lead to death, so early detection is critical. We used machine learning algorithms such as SVM (Support Vector Machine), Decision Tree, Random Tree Classifier and neural networks for early disease detection because they are known for their efficiency and quick retrieval. Most importantly, speech processing has a lot of potential for Parkinson's disease detection, classification and diagnosis. We expect more machine learning based technologies and medical techniques to become available soon to save people from this disease.

REFERENCES

[1] Shrihari K Kulkarni, K R Sumana, "Detection of Parkinson’s Disease Using Machine Learning and Deep Learning Algorithms", International Journal of Engineering Science Invention (IJESI), ISSN (Online): 2319-6734, ISSN (Print): 2319-6726, Volume 8, Issue 8, Aug 2021, pp. 1189-1192.

[2] Yatharth Nakul, Ankit Gupta, Hritik Sachdeva, "Parkinson Disease Detection Using Machine Learning Algorithms", International Journal of Science and Research (IJSR), ISSN: 2319-7064, SJIF (2020): 7.803, Volume 10, Issue 6, June 2021, pp. 314-318.


[3] Wu Wang, Junho Lee, Fouzi Harrou, Ying Sun, "Early Detection of Parkinson’s Disease Using Deep Learning and Machine Learning", IEEE Access, Volume 8, 2020, pp. 147635-147646, DOI: 10.1109/ACCESS.2020.3016062.

[4] Muthumanickam S, Gayathri J, Eunice Daphne J, "Parkinson’s Disease Detection and Classification Using Machine Learning and Deep Learning Algorithms: A Survey", International Journal of Engineering Science Invention (IJESI), ISSN (Online): 2319-6734, ISSN (Print): 2319-6726, www.ijesi.org, Volume 7, Issue 5, Ver. 1, May 2018, pp. 56-63.

[5] Timothy J. Wroge, Yasin Ozkanca, Cenk Demiroglu, Dong Si, David C. Atkins, Reza Hosseini Ghomi, "Parkinson’s Disease Diagnosis Using Machine Learning and Voice", 2018 IEEE Signal Processing in Medicine and Biology Symposium (SPMB), DOI: 10.1109/SPMB.2018.8615607.

[6] Jayashree R. J., Ganesh S., Karanth S. C., Lalitha S., "Automatic Detection of Parkinson Speech Under Noisy Environment", in: Thampi S. M., Gelenbe E., Atiquzzaman M., Chaudhary V., Li K.-C. (eds), Advances in Computing and Network Communications, Lecture Notes in Electrical Engineering, vol 736, Springer, Singapore, 2021. https://doi.org/10.1007/978-981-33-6987-0_16.
