Review on Mesothelioma Diagnosis

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 09 Issue: 12 | Dec 2022 www.irjet.net p-ISSN: 2395-0072

Department of Computer Science and Engineering, KDK College Rd, Opposite Telephone Exchange, Nandanvan, Nagpur-440009, Maharashtra, India

Abstract - Asbestos is a carcinogenic substance that threatens human health. Malignant Mesothelioma is one of the most dangerous kinds of cancer caused by the asbestos mineral. The most common symptoms of the disease are progressive shortness of breath and constant pain. Early diagnosis and treatment are necessary; otherwise, the disease can be fatal within a short period of time. In this paper, different artificial intelligence methods are compared for effective classification of Malignant Mesothelioma. Support Vector Machine, Neural Network and Decision Tree are selected as regular machine learning methods. Additionally, Bagging and Adaboost resampling are adopted as ensemble learning methods. A total of 324 Malignant Mesothelioma records, each consisting of 34 features, are used in this study. K-fold cross-validation is performed with different K values to measure the performance of the algorithms. 100% classification accuracy is obtained with three of the tested methods: Support Vector Machine, Decision Tree and Bagging. Additionally, the processing time of each method is measured to assess its suitability for large amounts of data. In this sense, the methods are evaluated based on both accuracy and time complexity. The results of this paper are also compared with previous studies that use the same Malignant Mesothelioma dataset.

Keywords: Malignant Mesothelioma, Support Vector Machine, Decision Tree, Neural Network, Ensemble Learning

1. INTRODUCTION

Malignant Mesothelioma (MM) is a type of cancer. It appears in the thin layer of tissue covering the internal organs and spreads rapidly to various organs [1]. The linings of the lungs and the chest wall are the most commonly affected parts [2] [3]. Symptoms such as difficulty in breathing, pain in the chest wall, cough, a bloated abdomen, exhaustion and extreme weight loss can be seen. The disease advances rapidly while the symptoms appear slowly [4].

The asbestos mineral plays an important role in mesothelioma. According to medical reports, 80% of cases are caused by the mineral [3]. Greater exposure to the mineral increases the risk of developing the disease. In this sense, people living in industrialized regions encounter it more than those living in small towns. More specifically, the disease is mostly seen in miners and producers who work with the asbestos mineral. Normally, the incubation period of the disease is around 40 years [3]. This late awareness of Malignant Mesothelioma makes timely diagnosis extremely difficult.

Diagnosis is performed by examining chest X-ray images and computed tomography scan findings. With both techniques, doctors mainly examine the fluid produced by the cancer or the tissue obtained by biopsy [4].

In addition to these regular techniques, computerized methods have also been used in a few studies. Currently, computer-based diagnosis systems, named Computer Assisted Systems (CAS), are becoming more popular due to their highly accurate, consistent and efficient results [5]. CAS mainly employ artificial intelligence methods such as Support Vector Machine (SVM), Decision Tree (DT) and Neural Networks (NN) on stored numerical data. Similar to various other medical applications, MM diagnosis is basically a classification problem. Methods may produce different results depending on how the data are arranged [6]. In this sense, several artificial intelligence algorithms need to be tested in order to identify the most useful method for the data at hand.

In this study, the data for Malignant Mesothelioma are classified and the test results are compared. The study also provides a decision support system that assists doctors in their diagnostic decisions. The paper is organized as follows: current studies on MM diagnosis are presented in Section 2. The methods used in testing are briefly explained in Section 3, together with information about the data. Results and explanations are given in Section 4. The paper is concluded with future work and final remarks in the last section.

2. LITERATURE REVIEW

Visual inspection of medical images for diagnosis is a time-consuming and subjective procedure. The experience of the doctor plays an important role in the decision step. In this sense, using image processing algorithms and artificial intelligence methods prevents diagnoses from varying between doctors, for example in computed tomography analyses. The computer-based technique presented in [7] easily identifies the pleural contours and detects

© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page143
Prof. Vaishali Surjuse, Anish Khobragade, Ajeet Sah, Shubham Soneji

pleural thickenings in two steps. First, the authors detect the thorax, and then they remove the air and trachea. In both steps, they implement 3D morphological operations. According to the paper, an image retrieval system for MM diagnosis is a promising method to detect the disease.

Another study, published by Chen et al. [8], explains the implementation of the random walk-based segmentation method [9]. They used mesothelioma computed tomography image datasets and aimed to establish an automatic segmentation. They observed the progression of the disease through volumetric assessments in order to decide on treatments. Similar to this approach, Onoma et al. used a 3D version of the random walk-based segmentation method on PET images [10]. They aimed to increase success rates for the detection of lung tumors.

Er et al. used a numerical dataset instead of images. They adapted probabilistic neural networks (PNN) for use in the diagnosis of MM. They compared the results with multilayer and learning vector quantization neural networks. They reported in [11] that PNN was the best classifier, with 96.30% accuracy.

A different approach to MM diagnosis is presented in [12] by K. Chaisaowong et al. They observed the contours of the pleura in healthy subjects and in patients. By comparing the traced contours, they detected the thickenings. To this end, they formed a tissue-specific segmentation by implementing a 3D Gibbs-Markov random field (GMRF) [13], which is adopted to distinguish thickenings from thoracic tissue. Then, morphometric analyses and volumetric assessments are performed for 3D modeling. According to the results of the paper, the authors state that the automated approach can help physicians diagnose pleural mesothelioma at an early stage.

3. METHODOLOGY

Several machine learning algorithms have already been applied to the mesothelioma dataset. However, classification results might be improved with other methods. Hence, in this study, different machine learning methods are tested on the mesothelioma dataset. The methods are selected because they have not been applied to the dataset before. If they give more accurate results, they can be used for advanced diagnosis. Five fundamental classification methods are tested in this study. The methods are grouped under two headings: a) machine learning methods and b) ensemble learning methods. Brief descriptions of the methods and their parameter settings are given separately in the following subsections.

Machine Learning Methods

A great number of machine learning algorithms, and variations of them with differently selected parameters, are described in the literature for classification. The majority of them have been heavily modified for biomedical datasets. Accurate results provide a more informative and meaningful diagnosis. For this reason, three fundamental machine learning methods are adapted for the mesothelioma dataset.

a) Support Vector Machine (SVM)

SVM is one of the prominent classification algorithms; it can be used on large-scale datasets and provides accurate results. This can be achieved even with small training sets with the help of a well-fitted cost function in kernel space [15].

SVM uses the core idea of kernel-based learning. It aims to separate data in a high-dimensional feature space with a kernel function. SVM creates a decision surface between the samples of different classes using an optimal hyperplane. SVM natively provides binary classification of two-class datasets. For multi-class problems, "one against one" and "one against all" are the most popular strategies in the literature. Each strategy has its own advantages and disadvantages, as discussed in [16]. In our study, the "one against one" strategy is used because the dataset has only 2 classes.

In order to find well-fitted SVM settings for the mesothelioma dataset, different kernels, penalty parameters and kernel parameters are tested in the initial part of the study. Table 2 presents the results of these parameter tests.
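The kernel families compared in this study can be illustrated with a minimal sketch. The formulas are the standard textbook definitions; the parameter names and default values (degree, coef0, gamma, scale, offset) are assumptions chosen for illustration and are not the settings used in the paper.

```python
import math

def linear_kernel(u, v):
    """Linear kernel: k(u, v) = u . v"""
    return sum(a * b for a, b in zip(u, v))

def polynomial_kernel(u, v, degree=3, coef0=1.0):
    """Polynomial kernel: k(u, v) = (u . v + coef0) ** degree.
    With degree=2 this is a quadratic kernel."""
    return (linear_kernel(u, v) + coef0) ** degree

def rbf_kernel(u, v, gamma=0.5):
    """RBF kernel: k(u, v) = exp(-gamma * ||u - v||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

def mlp_kernel(u, v, scale=1.0, offset=-1.0):
    """Sigmoid ("MLP") kernel: k(u, v) = tanh(scale * u . v + offset)"""
    return math.tanh(scale * linear_kernel(u, v) + offset)

# Two hypothetical feature vectors (the real records have 34 features).
u, v = [1.0, 2.0], [3.0, 0.5]
for name, k in [("linear", linear_kernel), ("polynomial", polynomial_kernel),
                ("rbf", rbf_kernel), ("mlp", mlp_kernel)]:
    print(name, k(u, v))
```

Only the RBF kernel calls the exponential function for every pair of samples, which is one way to see why it can cost more time than the linear kernel on large datasets.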

b) Decision Tree (DT)

Decision Tree is known as a rule-based machine learning method [17]. In principle, it works with a tree structure. Each path from the root to a leaf represents a classification rule. The root represents the most informative feature and the leaves indicate the labels. Information gain (IG) is the criterion that defines the rules. The most widely used measures for calculating the IG are entropy, twoing, and Gini.

Decision Tree is easy to implement. Additionally, its classifications are much easier to interpret than those of other methods. It is also useful for some regression problems. However, DT gives lower performance than SVM on large-scale datasets with few training samples [18]. The pruning needed to avoid overfitting is another difficulty. According to the results of preliminary studies on parameter settings, the DT model is configured with pruning and with the Gini Diversity Index for IG.
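As an illustration of the split criterion mentioned above, the following minimal sketch computes the Gini diversity index of a node and the impurity decrease of a candidate split. The function names and the toy labels are hypothetical; the paper does not list its implementation.

```python
from collections import Counter

def gini(labels):
    """Gini diversity index: 1 - sum_k p_k^2 (0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_gain(parent, left, right):
    """Impurity decrease obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

# Toy node with a balanced mix of the two classes used in the dataset.
parent = ["sick"] * 4 + ["healthy"] * 4
perfect_split = gini_gain(parent, ["sick"] * 4, ["healthy"] * 4)
weak_split = gini_gain(parent,
                       ["sick"] * 3 + ["healthy"],
                       ["sick"] + ["healthy"] * 3)
print(gini(parent), perfect_split, weak_split)
```

A tree builder would evaluate this gain for every candidate feature threshold and keep the split with the largest value.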

c) Neural Networks - Multi-Layer Perceptron (MLP)

Multi-Layer Perceptron (MLP) is the advanced version of NN [19]. At least two layers connected by two functions should be used. Different parameters and functions are tested in initial studies. According to the results, the MLP network is configured with the weight and bias fixed at 0.8 and 1, respectively.
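A forward pass through a two-layer perceptron can be sketched as follows. This is only an illustration: treating 0.8 and 1 as the initial value of every weight and bias is an assumption about what the text means, and the hidden layer size and the activation functions (tanh and sigmoid) are hypothetical choices that the paper does not specify.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [activation(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(x, n_hidden=3, init_weight=0.8, init_bias=1.0):
    """Two-layer forward pass with every weight and bias set to one value.
    All sizes and values here are illustrative assumptions."""
    hidden_w = [[init_weight] * len(x) for _ in range(n_hidden)]
    hidden_b = [init_bias] * n_hidden
    out_w = [[init_weight] * n_hidden]
    out_b = [init_bias]
    hidden = dense(x, hidden_w, hidden_b, math.tanh)
    return dense(hidden, out_w, out_b, sigmoid)[0]

# Score in (0, 1) for a hypothetical, already scaled feature vector.
print(mlp_forward([0.2, -0.4, 0.1]))
```

In practice the weights would be updated by a training procedure such as backpropagation; the sketch only shows how the two layers and their two functions are connected.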

Ensemble Learning Methods

Ensemble learning emerged from the principles of machine learning. The key idea behind an ensemble is the proper combination of several machine learning algorithms. Instead of a single learner as in regular methods, multiple learners take part in the decision step, which is intended to give higher success. The machine learning classifiers being combined, such as DT, KNN, etc., are called base learners.

Two ensemble models that combine the same base learner (Decision Tree - DT) but use different sample selection strategies are evaluated in this study. Majority voting is used to determine the final decision of the base learners.

a) Bagging with DTs

Bagging, in other words bootstrap aggregation, is a way of improving classification with the aid of well-formed training samples. It is also referred to as a re-sampling process in the literature [20]. The idea of bagging is to perturb the dataset by re-sampling and to train weak learners on the re-sampled training sets. The perturbation is driven by the weight parameters of the samples. In bagging these weights are fixed equally; therefore, the training sets are selected randomly. Consequently, different samples are used in the training set at each iteration, which provides more diversity in the distribution of the samples. The average of the decisions of the base learners determines the final decision. More information can be found in [20].
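The re-sampling and voting steps described above can be sketched as follows. To keep the example short, a one-feature threshold rule (a decision stump) stands in for the full decision trees used in the study; the stump, the toy data and all function names are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(X, y, rng):
    """Draw len(X) samples with replacement; every sample has equal weight."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_stump(X, y):
    """Toy base learner: the single-feature threshold rule with fewest errors."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for lo, hi in ((0, 1), (1, 0)):
                pred = [lo if row[f] <= t else hi for row in X]
                errors = sum(p != label for p, label in zip(pred, y))
                if best is None or errors < best[0]:
                    best = (errors, f, t, lo, hi)
    _, f, t, lo, hi = best
    return lambda row: lo if row[f] <= t else hi

def bagging_fit(X, y, n_learners=11, seed=0):
    """Train each base learner on its own bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(*bootstrap_sample(X, y, rng)) for _ in range(n_learners)]

def bagging_predict(learners, row):
    """Majority vote over the base learners."""
    votes = Counter(learner(row) for learner in learners)
    return votes.most_common(1)[0][0]

# Toy one-feature data: class 0 for small values, class 1 for large values.
X = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]]
y = [0, 0, 0, 1, 1, 1]
model = bagging_fit(X, y)
print([bagging_predict(model, row) for row in X])
```

An odd number of learners avoids tied votes in a two-class problem.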

b) Adaptive Boosting (Adaboost) with DTs

Boosting is another re-sampling technique similar to bootstrap. The difference is that bootstrap ignores the weight values of the samples and re-samples randomly, whereas boosting assigns a different weight to each sample after the first iteration. The probabilities of misclassified samples are then boosted for the second step, and subsequent classifiers are trained. The remaining steps continue in the same way with different weight parameters. Readers are referred to the tutorial in [21] for the theory of boosting.

Adaptive boosting generally outperforms other regular boosting techniques and is more robust against over-fitting. However, it is still easily affected by noise and outliers owing to the iterative re-arrangement of the weights.
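The re-weighting step that distinguishes boosting from bagging can be sketched as one round of the standard discrete AdaBoost update for labels in {-1, +1}. This is the textbook formulation and not necessarily the exact variant used in the experiments; the function name and toy values are hypothetical.

```python
import math

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost re-weighting step for labels in {-1, +1}.

    `weights` must sum to 1. Returns (alpha, new_weights), where alpha is
    the vote strength of the current weak learner.
    """
    # Weighted error of the current weak learner.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified samples (t * p == -1) are scaled up, correct ones down.
    updated = [w * math.exp(-alpha * t * p)
               for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(updated)
    return alpha, [w / total for w in updated]

y_true = [1, 1, -1, -1]
y_pred = [1, -1, -1, -1]          # the second sample is misclassified
alpha, new_weights = adaboost_round([0.25] * 4, y_true, y_pred)
print(alpha, new_weights)
```

After the update, the misclassified sample carries as much weight as all correctly classified samples together, so the next learner concentrates on it. This also shows why noisy or outlying samples can dominate later rounds.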

Dataset

The dataset is obtained from the UCI dataset repository [22]. It includes patient records obtained from Dicle University, Faculty of Medicine. 324 patient records were collected and tested with the aforementioned AI methods. These data were also investigated by Orhan Er et al. using PNN, as mentioned in Section 2 [11].

In the dataset, each of the 324 samples has 34 features of multivariate type. There are no "unidentified" or "missing" values in the dataset. Details of the data and features can be found in [11]. The decision labels are provided by doctors as sick and healthy (2 classes).

4. RESULT & DISCUSSION

Classification of the mesothelioma dataset is performed with three regular machine learning methods and two ensemble learning methods. DT, SVM and NN are selected as the regular machine learning methods. Bagging and Adaboost, both with the same weak learner (DT), are used as the ensemble methods. Accuracy and computational time are considered as the evaluation metrics. Computational time is recorded to estimate the efficiency of each method for big data problems, since many patients suffer from MM. In future studies with more patient records, time complexity will become a more important factor, because each additional record carries 34 features.
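The two evaluation metrics can be computed with a few lines of code. The sketch below is a generic illustration; the helper names are hypothetical and the paper does not state how its timings were taken.

```python
import time

def accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)

def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Toy labels: three of four predictions are correct.
score, seconds = timed(accuracy, [1, 0, 1, 1], [1, 0, 0, 1])
print(score, seconds)
```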

Computational time is measured only for the 10-fold cross-validation tests. Lower computational time and a higher accuracy rate are preferred when identifying the best algorithm. Overall results are presented in Table 1.

           DT      SVM (Linear)   MLP     Bagging   Adaboost
10-Fold    100     100            96.87   100       70.54
5-Fold     100     100            95.82   100       65.35
2-Fold     100     100            94.44   100       68.82
Time       0.019   0.095          13.89   17.52     0.25

Table 1: Overall results of methods (accuracy in %)
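The K-fold protocol behind these results can be sketched as follows. The shuffling, the seed and the way folds are formed are assumptions, since the paper does not describe how its folds were drawn; `fit` and `predict` are placeholders for any of the tested classifiers.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(X, y, k, fit, predict, seed=0):
    """Mean accuracy (%) over k folds; each fold is the test set once."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        model = fit([X[j] for j in train_idx], [y[j] for j in train_idx])
        hits = sum(predict(model, X[j]) == y[j] for j in test_idx)
        scores.append(100.0 * hits / len(test_idx))
    return sum(scores) / k

# Fold sizes for a dataset of 324 records with the K values used here.
for k in (2, 5, 10):
    print(k, [len(fold) for fold in k_fold_indices(324, k)])
```

With K = 2 each model is trained on half of the records, while with K = 10 it is trained on about nine tenths of them, which is why results that depend on training size change across the rows of the table.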

According to Table 1, plain DT and SVM among the regular machine learning methods, and Bagging with DT among the ensemble methods, outperform the other methods with a common 100% accuracy rate. Differently formed training sets (2-, 5- and 10-fold) have no effect in general. However, the other ensemble method, Adaboost, which uses the same DT base learner but a different sample selection strategy (weighted re-sampling), stays far behind all the other methods. In this sense, random selection of training samples is the more effective strategy for the detection of mesothelioma. Weighted sample selection appears to be ineffective here, which may be due to the large number of features (34) used in classification. However, Bagging needs more computational time because of its irregular sample selection process. Therefore, Bagging is not the preferred method compared with DT and SVM, which reach the same accuracy rate.

One of the prominent kernel-based methods, SVM, is tested with different kernels and parameters. The highest result obtained for each kernel across the different parameters is recorded in Table 2. The linear kernel gives the best result, with 100% for all K values. The outcome of the RBF (radial basis function) kernel depends on the training size. It reaches a 100% accuracy rate with more training samples, but its success decreases when the training set is reduced. Besides its inconsistent results, RBF involves exponential operations and thus needs more time to classify big data. To avoid that time-consuming process, linear SVM might be used in practice owing to the simplicity of the algorithm and its lower time complexity. The polynomial, quadratic and MLP (Multi-Layer Perceptron) kernels generally reach about 97%, 88% and 90%, respectively. These kernels are also directly affected by the training sample size. In addition to their lower accuracies, their computational times offer no substantial advantage over the linear kernel. Therefore, SVM should be used with the linear kernel to classify the mesothelioma dataset. The results suggest that it might also perform better than the other methods on big data.

As a final method, MLP from the Neural Network family is adapted. Normally, MLP gives higher accuracies on non-linear classification problems, but it deals with all samples in the dataset. Consequently, its success might easily be decreased by outliers, and it needs more computational time, as shown in Table 1. The dataset has 34 features over 324 observations, which means the data are 34-dimensional. In that case, MLP reaches a 97% accuracy rate owing to the complexity of the dataset. On the other hand, SVM focuses only on the support vectors, the samples closest to the decision boundary. Therefore, SVM surpasses MLP owing to its lower complexity and its use of pre-arranged data.

Kernels    Polynomial   Quadratic   MLP     Linear   RBF
10-Fold    97.72        88.98       90.93   100      100
5-Fold     97.18        88.75       89.21   100      99.84
2-Fold     92.40        84.01       86.11   100      99.07
Time       0.186        0.385       0.089   0.095    0.286

Table 2: SVM results with different kernels (accuracy in %)

5. CONCLUSION

In this study, different machine learning and ensemble learning methods are tested on the detection of mesothelioma. For this purpose, a widely used dataset provided by Orhan Er et al. [11] is used to evaluate the methods.

Orhan Er et al. previously published a study on the classification of their dataset with PNN. They reported 96% success with 3-fold cross-validation. In this study, we also ran an MLP network with a weight of 0.8 and a bias of 1 and obtained the same results. This indicates that the testing methodologies are similar and comparable. In that sense, the other results obtained here can be regarded as consistent.

DT and SVM as regular machine learning methods, and Bagging as an ensemble learning method, are highly suitable algorithms for the mesothelioma dataset according to Table 1. These methods provide a 100% accuracy rate in classification. However, linear kernel SVM and DT are simpler algorithms and require less computational time than Bagging. In this sense, Bagging is not preferable. The rule-based algorithm, DT, fails on big data problems according to [14]. Therefore, it is also of limited use in practice, given the numerous patients suffering from mesothelioma. More records are necessary in order to generalize the results, and under that condition DT might give a misleading diagnosis. As a result, linear SVM might be the better method to use in practice for the reasons given above.

As future work, the above-mentioned methods will be tested on more data. A more generic diagnosis system can then be developed.

6. REFERENCES

[1] Malignant Mesothelioma, Retrieved 3 May 2016, http://www.cancer.gov/types/mesothelioma

[2] General Information About Malignant Mesothelioma, Retrieved 3 May 2016, http://www.cancer.gov/types/mesothelioma/patient/mesothelioma-treatment-pdq

[3] B.M. Robinson, "Malignant pleural mesothelioma: an epidemiological perspective", Annals of Cardiothoracic Surgery, vol. 1(4), 2012.

[4] S. Kondola, D. Manners, A.K. Nowak, "Malignant pleural mesothelioma: an update on diagnosis and treatment options", Therapeutic Advances in Respiratory Disease, 2016.

[5] Delp, S. L., Loan, J. P., Robinson, C. B., Wong, A. Y., & Stulberg, S. D. (1997). U.S. Patent No. 5,682,886. Washington, DC: U.S. Patent and Trademark Office.

[6] H. Kadoz, S. Ozsen, A. Arslan, and S. Gunes, "Medical application of information gain based artificial immune recognition system (AIRS): diagnosis of thyroid disease", Expert Syst Appl, vol. 36(2), 2008.

[7] J. Lerdsinmongkol, K. Chaisaowong, S. Roongruangsorakarn, T. Kraus, and T. Aach, "Efficient Application of 3D Morphological Operations in the Framework of a Computer-Assisted Diagnosis System", 9th International Conference on Signal Processing, pp. 857-860, 2008.

[8] M. Chen, E. Helm, N. Joshi, S.M. Brady, "Random walk-based automated segmentation for the prognosis of malignant pleural mesothelioma", IEEE International Symposium on Biomedical Imaging: From Nano to Macro, pp. 1978-1981, 2011.

[9] Grady, L. (2006). Random walks for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(11), 1768-1783.

[10] Onoma, D. P., Ruan, S., Gardin, I., Monnehan, G. A., Modzelewski, R., & Vera, P. (2012, May). 3D random walk based segmentation for lung tumor delineation in PET imaging. In 2012 9th IEEE International Symposium on Biomedical Imaging (ISBI) (pp. 1260-1263). IEEE.

[11] O. Er, A.C. Tanrikulu, A. Abakay, and F. Temurtas, "An approach based on probabilistic neural network for diagnosis of Mesothelioma's disease", Computers & Electrical Engineering, vol. 38(1), pp. 75-81, 2012.

[12] K. Chaisaowong, C. Akkawutvanich, C. Wilkmann, and T. Kraus, "A fully automatic probabilistic 3D approach for the detection and assessment of pleural thickenings from CT data", Computational Intelligence in Medical Imaging (CIMI), IEEE Fourth International Workshop on, pp. 14-21, 2013.

[13] Schroder, M., Rehrauer, H., Seidel, K., & Datcu, M. (1998). Spatial information retrieval from remote-sensing images. II. Gibbs-Markov random fields. IEEE Transactions on Geoscience and Remote Sensing, 36(5), 1446-1455.

[14] E. Lotfi, A. Keshavarz, "Gene expression microarray classification using PCA–BEL", Computers in Biology and Medicine, vol. 54, pp. 180–187, 2014.

[15] B. Scholkopf, A.J. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, MIT Press, 2001.

[16] J. Milgram, M. Cheriet, R. Sabourin, "One against one" or "one against all": Which one is better for handwriting recognition with SVMs?, in: Tenth International Workshop on Frontiers in Handwriting Recognition, Suvisoft.

[17] Quinlan, J. Ross. "Induction of decision trees." Machine Learning 1.1 (1986): 81-106.

[18] L. Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone, Classification and Regression Trees, Wadsworth International Group, Belmont, CA, 1984.

[19] Hagan, Martin T., Howard B. Demuth, and Mark H. Beale. Neural network design. Boston: Pws Pub., 1996.

[20] L. Breiman, Random forests, Machine Learning 45 (2001) 5–32.

[21] R. Rojas, AdaBoost and the super bowl of classifiers: a tutorial introduction to adaptive boosting (2009).

[22] UCI Machine Learning Repository, Mesothelioma Disease Data Set, Retrieved 3 May 2016, http://archive.ics.uci.edu/ml/machine-learning-databases/00351/
