Blood Cell Image Classification for Detecting Malaria using CNN


International Research Journal of Engineering and Technology (IRJET) e-ISSN:2395-0056

Volume: 09 Issue: 08 | Aug 2022 www.irjet.net p-ISSN:2395-0072


Student, Dept. of Computer Science and Engineering, IIT Bombay, Maharashtra, India

Abstract - Artificial intelligence (AI) combined with open-source tools can improve the diagnosis of malaria, a potentially fatal disease. Malaria detection is not an easy procedure, and the limited availability of qualified personnel around the globe is a serious concern in the diagnosis and treatment of cases. We present a real-world medical imaging case study of malaria detection. Easy-to-build, open-source techniques leveraging AI can give us state-of-the-art accuracy in detecting malaria, thus enabling AI for social good. In our project, we used a convolutional neural network (CNN) with max pooling, dropout, the Rectified Linear Unit (ReLU) activation, and the Adam optimizer to detect malaria.

Key Words: Deep learning, Convolutional Neural Network, image detection, Convolutional layers, Max Pooling, Dropout.

1. INTRODUCTION

Malaria is a serious and life-threatening disease caused by Plasmodium parasites, which enter a person's bloodstream and travel to the liver. The parasites infect the red blood cells, which results in fatal symptoms. However, this deadly parasite can live in the body for one year without causing any symptoms. It is therefore necessary to detect malaria early to save lives. Artificial intelligence (AI) combined with open-source tools can be used to improve the diagnosis of malaria in a fast, easy, and accurate way. This is our motivation for building a model using a CNN (convolutional neural network) that helps us detect malaria by classifying blood cell images.

2. Dataset

2.1 Source of the Dataset

For our project, we used the dataset collected by the Lister Hill National Center for Biomedical Communications (LHNCBC), part of the National Library of Medicine (NLM), USA. The dataset contains 27558 cell images divided into two classes: parasitized cell images (13779 images) and uninfected cell images (13779 images). The cell images were collected from 201 people, of whom 151 were infected and 50 were healthy.

3. Methodology

3.1 Environment Setup and Library Import:

For this project, we used Google Colab from our browser, which lets us write and execute Python code. We imported different libraries that work as follows:

● NumPy: It is a Python library used for working with arrays, matrices, and image processing by using concepts such as vectorization and advanced indexing.

© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 1086

Figure 1: Sample Images of Blood Cell.


● Pandas: It is a software library that is used for data manipulation and analysis.

● TensorFlow: It is a Python-friendly open-source library for numerical computation. It also enables the user to implement a deep neural network that is used for solving image recognition/classification tasks.

● OpenCV: It is a Python library that is used for solving computer vision problems.

● Matplotlib: It is a plotting library for data visualization.

● Seaborn: It is a Python data visualization library based on matplotlib, which is used to give a high-level interface for statistical graphics.

3.2 Data Processing:

We resized our images to 64*64 pixels and rotated them by 40 degrees to vary their orientation. We also shuffled the cell images so that infected and uninfected cells appear in varying positions in the data. We normalized the images by dividing the pixel values by 255, which scales the intensities to the range [0, 1]. We labeled the uninfected images as 0 and the infected images as 1. Each image has three color channels: red, green, and blue.
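The labeling, normalization, and shuffling steps described above can be sketched in NumPy as follows. This is a minimal illustration on synthetic data, not the code used in the project: the array names, the random seed, and the small batch size are assumptions, and the resizing and rotation steps (which would typically rely on OpenCV or TensorFlow utilities) are omitted.

```python
import numpy as np

# Hypothetical stand-in for cell images already resized to 64x64 RGB (uint8, 0-255).
rng = np.random.default_rng(seed=0)
parasitized = rng.integers(0, 256, size=(8, 64, 64, 3), dtype=np.uint8)
uninfected = rng.integers(0, 256, size=(8, 64, 64, 3), dtype=np.uint8)

# Stack both classes and label them: uninfected = 0, infected = 1.
images = np.concatenate([uninfected, parasitized], axis=0)
labels = np.concatenate([
    np.zeros(len(uninfected), dtype=np.int64),
    np.ones(len(parasitized), dtype=np.int64),
])

# Normalize pixel intensities from [0, 255] to [0, 1].
images = images.astype(np.float32) / 255.0

# Shuffle images and labels with the same permutation so pairs stay aligned.
order = rng.permutation(len(images))
images, labels = images[order], labels[order]

print(images.shape, images.dtype, int(labels.sum()))
```

Using a single permutation for both arrays is the important detail here: shuffling the images and labels independently would break the image-to-label correspondence.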

After processing the data, we divided it into training and test sets (train 80%, test 20%). The test set was then divided into validation and test sets (validation 50%, test 50%), so that the overall split is 80% training, 10% validation, and 10% test.
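The two-stage split can be illustrated with a short sketch. The text does not state which function was used for splitting, so the helper below (`split_indices`, a hypothetical name) uses only the Python standard library and an assumed random seed.

```python
import random

def split_indices(n_samples, train_frac=0.8, seed=42):
    """Return (train, validation, test) index lists for an 80/10/10 split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # First cut: 80% training, 20% held out.
    n_train = int(n_samples * train_frac)
    train, held_out = indices[:n_train], indices[n_train:]

    # Second cut: the held-out part is halved into validation and test.
    n_val = len(held_out) // 2
    return train, held_out[:n_val], held_out[n_val:]

train_idx, val_idx, test_idx = split_indices(27558)
print(len(train_idx), len(val_idx), len(test_idx))
```

For the 27558 images in the dataset, this yields 22046 training, 2756 validation, and 2756 test samples.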

Figure 2: Dataset Divided into Training, Validation, and Testing.

3.3 The architecture of our Model:

Figure 3: CNN Model Architecture

We used a CNN to train on our dataset. CNNs have so far been the most popular networks for analyzing images. Although image analysis is their most widespread use, CNNs can also be applied to other data analysis and classification problems.

● Convolutional layers: Here, we used convolutional layers to detect spatial patterns in the data. This pattern detection makes the CNN model useful for our image analysis. A CNN has hidden layers called convolutional layers, and in our project we used three of them. A convolutional layer receives an input, transforms it, and passes the transformed output to the next convolutional layer. In our project, the first convolutional layer learns small, local patterns such as edges and corners; it has 40 filters and a kernel size of [5,5]. The second convolutional layer learns larger patterns based on the features from the first layer; it has 70 filters and a kernel size of [3,3]. The last convolutional layer produces the logits used for calculating the accuracy; it has 15 filters and the same kernel size as the second convolutional layer.

● Max Pooling: After that, we used max pooling, which downsamples the feature maps and reduces their dimensions. It also reduces the computational cost by reducing the number of parameters.

● Dropout: We used dropout to randomly drop some units during training. By preventing complex co-adaptations on the training data, it reduces overfitting in artificial neural networks.

● Rectified Linear Unit (ReLU): For high-performance analysis, we used the rectified linear activation function. It runs faster and performs better because it mitigates the vanishing gradient problem. It outputs the input unchanged if the input is positive; otherwise, it outputs 0.
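To make the layer description more concrete, the following sketch traces the feature-map size and the number of trainable parameters through three convolutional layers with the filter counts and kernel sizes given above. Several details are assumptions, since the text does not specify them: unpadded ("valid") convolutions with stride 1, and a 2x2 max pooling step after every convolutional layer. The function names are illustrative only.

```python
def conv_output_size(size, kernel, stride=1):
    """Spatial size after a 'valid' (unpadded) convolution."""
    return (size - kernel) // stride + 1

def conv_param_count(kernel, in_channels, filters):
    """Weights plus one bias per filter for a square-kernel 2D convolution."""
    return (kernel * kernel * in_channels + 1) * filters

# (filters, kernel size) for the three convolutional layers reported in the text.
conv_layers = [(40, 5), (70, 3), (15, 3)]

size, channels = 64, 3  # 64x64 input with red, green, and blue channels
summary = []
for filters, kernel in conv_layers:
    params = conv_param_count(kernel, channels, filters)
    size = conv_output_size(size, kernel)  # assumption: 'valid' padding, stride 1
    size = size // 2                       # assumption: 2x2 max pooling after each layer
    channels = filters
    summary.append((size, channels, params))

for i, (s, c, p) in enumerate(summary, start=1):
    print(f"conv block {i}: output {s}x{s}x{c}, {p} parameters")
```

Under these assumptions, the pooling steps shrink the 64x64 input to 6x6 after the third block, which illustrates how max pooling keeps the number of values passed to later layers small.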

4. Results/Analysis:

● For evaluating the model, we used the confusion matrix, classification report, and accuracy score functions of scikit-learn. The performance of our model with relevant classification metrics is given below:

Figure 4: Final Classification Report for Test data

● Our model performs well on the test dataset. We obtained a training loss of 14.13%, while at the evaluation stage the accuracy is 93.39% and the loss is 18.23%.

● For the test set, the classification report shows 90% precision and 97% recall when predicting healthy cells, and 97% precision and 90% recall when predicting infected cells.


Figure 5: Some Random Predictions and Actual Value

● Some Random Predictions and Actual Value: (Infected Cell = 1, Uninfected Cell = 0)

● If we look at the outputs of our model, we see some false positive and false negative predictions. The F1-score is 93%, so there is some scope to improve the model.
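The reported F1-score can be checked against the per-class precision and recall given above. The sketch below is only an arithmetic check based on the rounded figures from the classification report; the function and variable names are illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Per-class precision and recall as reported for the test set.
reported = {
    "uninfected (0)": {"precision": 0.90, "recall": 0.97},
    "infected (1)": {"precision": 0.97, "recall": 0.90},
}

f1_by_class = {
    name: f1_score(m["precision"], m["recall"]) for name, m in reported.items()
}
for name, f1 in f1_by_class.items():
    print(f"{name}: F1 = {f1:.2%}")
```

Because precision and recall are simply swapped between the two classes, both classes give the same F1 value, which rounds to the reported 93%.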

5. CONCLUSIONS

Malaria detection is a complex procedure that requires qualified personnel to diagnose the disease. We tried to make the process easy and accessible for everyone using machine learning models. We achieved an accuracy of 93.39 percent using convolutional neural networks. This model can be used for detecting malaria. In the future, we will try to improve the accuracy of our model for better results and develop a mobile application so that everyone can diagnose malaria in the quickest possible time.



BIOGRAPHIES

Completed M.Tech CSE at IIT Bombay, Maharashtra.
