International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 11 | Nov 2022 www.irjet.net p-ISSN: 2395-0072
1Student, Department of Computer Engineering, SCTR’s Pune Institute of Computer Technology, Pune, Maharashtra, India
2Student, Department of Computer Engineering, SCTR’s Pune Institute of Computer Technology, Pune, Maharashtra, India
3Student, Department of Computer Engineering, SCTR’s Pune Institute of Computer Technology, Pune, Maharashtra, India
4Assistant Professor, Department of Computer Engineering, SCTR’s Pune Institute of Computer Technology, Pune, Maharashtra, India
Abstract - Cancer is a collection of diseases driven by changes in the cells of the body that push their growth beyond normal control. Its prevalence is increasing yearly, and research is advancing along with it to counter its occurrence and provide solutions.
Detecting cancer in its early stages is required to provide proper treatment to the patient and reduce the risk of death, as detecting cancer cells at later stages leads to more suffering and increases the chances of death.
This research aims to study various techniques for detecting cancer in its early stages.
Key Words: Convolutional Neural Network, Random Forest, Machine Learning, Linear Regression
Cancer is one of the major diseases that needs to be treated in its early stages; otherwise, the excess cancer cells cause severe damage to the body and weaken the person. It is a priority to detect these cancer cells at an early stage so that they can be cured simply, before they harm the person's life. If we find cancer cells before they proceed to further stages, we will be able to save many lives. Many people cannot afford to spend money to test for or cure cancer, so our main aim is to make this test available at as low a cost as possible, so that everyone can afford it and be cured at an early stage with no harm to their lives. To help with all this, we need a system that detects these cells and gives output accordingly, so machine learning can be an option here to predict these cells and provide accurate results. This survey paper presents supervised, semi-supervised, and unsupervised machine learning algorithms for classifying and detecting cancer cells in their early stages.
Classification in machine learning is established by making the machine learn from a training dataset. This learning can be classified into three types: supervised, semi-supervised, and unsupervised learning. In supervised learning, labeled data is present from the beginning. In semi-supervised learning, only some of the class labels are known, whereas in unsupervised learning, class labels are not available. Once the training phase is finished, features are extracted from the data based on term frequency, and then the classification technique is applied.
Hajela et al. [1] trained a classifier to teach the machine to detect cancer cells using features that were first used to detect cancer cells in the early stages. They used several techniques to solve this problem, such as the Convolutional Neural Network (CNN), image analysis, and the K-Nearest Neighbors algorithm. The reported specificity is 91% and the average accuracy is 90%. Image analysis scans the image and extracts its fundamental features to give the desired output. CNN is a major technique in the field of deep learning. CNN uses 1-D, 2-D, and multidimensional convolutional models. It uses the softmax function, which converts a vector of N values into a probability distribution over N possible outcomes. It takes images as input and gives an output that is easier to understand without losing the key features. K-Nearest Neighbor is a supervised ML algorithm that can be applied to both regression and classification problems. It predicts the output by calculating the distance between data points, but doing so increases the time required for all the calculations. Among all these algorithms, CNN is the best method, with the best accuracy and specificity.
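The softmax step described above can be illustrated with a minimal sketch. The function name `softmax` and the two example scores are assumptions chosen for illustration; they are not taken from [1].

```python
import math

def softmax(scores):
    """Map a list of N real scores to N probabilities that sum to 1."""
    # Subtracting the maximum score keeps exp() numerically stable.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw network scores for the classes (benign, malignant).
probs = softmax([1.0, 3.0])
print(probs)  # two probabilities; the larger score gets the larger share
```

The class with the highest probability would then be reported as the prediction.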
J. Awatramani et al. [2] use supervised and unsupervised learning features to detect malignant cells. The algorithms they experimented with are Random Forest Tree, SVM (support vector machine), K-SVM (kernel support vector machine), K-Nearest Neighbor, and Decision Tree. Random Forest Tree reaches 98% accuracy, which makes it one of the best methods in this paper. The random forest method uses a number of decision trees, and the prediction with the maximum votes among the trees is selected. It is a supervised mode of learning. A Decision Tree is a method in which we gather a dataset, split it into sub-parts, and keep subdividing them. The output emerges as the tree grows, and the result is read off at the terminal level. The linear SVM used here is a type of supervised learning in which the data is separated by hyperplanes, and the distance between the data and these hyperplanes is considered. For KNN, the Euclidean distance is used to calculate the distance between data points. An incoming data point is classified by taking the closest data points related to it. In kernel SVM a polynomial kernel is used, which performs better because many datasets are non-linear.
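The KNN procedure described above (Euclidean distance, then a vote among the closest points) can be sketched as follows. The helper names `euclidean` and `knn_predict` and the toy two-feature dataset are hypothetical and are not taken from [2].

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: (features, label). The values are made up for illustration.
train = [
    ((1.0, 1.0), "benign"),
    ((1.2, 0.8), "benign"),
    ((0.9, 1.1), "benign"),
    ((4.0, 4.2), "malignant"),
    ((4.1, 3.9), "malignant"),
    ((3.8, 4.0), "malignant"),
]

print(knn_predict(train, (1.1, 1.0)))  # benign
print(knn_predict(train, (4.0, 4.0)))  # malignant
```

Every query is compared against every training point, which is why prediction time grows with the size of the dataset.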
H. Sami et al. [3] use ML methods to diagnose cancer and detect cancer cells. The metrics considered are accuracy and susceptibility. The different methods used for detection and diagnosis are Sparse Compact incremental learning machines, Gauss-Newton Representation, CNN, and Gene Expression Learning. The Sparse Compact incremental learning machine works on microarray gene expression data and is robust against diverse noise and outliers. Due to its compact nature, it can also perform classification tasks. The Gauss-Newton Representation uses sparse representation with training sample representation. It is useful for recognizing patterns. CNN predicts the result by considering the majority of votes. Gene expression learning makes the machine learn about the various genes present so that it can determine whether a gene is normal or abnormal depending on the data it contains.
R. Mosayebi et al. [4] address the detection of cancer cells in blood vessels. The methods are mobile nanosensors and molecular communication anomaly detection. Mobile nanosensors (MNs) detect cancer cells by counting the biomarkers that flow independently in the blood vessels. If cancer biomarkers are observed, the MN immediately raises a warning according to the readings at different points, which are combined using the summation method. Molecular communication anomaly detection relies on communication between different nano-machines, which communicate among themselves and determine whether cancer cells are present.
M. Vijay et al. [5] use histopathological images to identify cancer cells. The techniques used here are simple CNN, dilated CNN, and channel-wise separable CNN. The simple CNN consists of three max-pooling layers, three fully connected layers, and one output layer. The dilated CNN trains better than the usual CNN. It does not have a pooling layer, as it skips dimension reduction on image pixels, and it produces output based on softmax and SVM classification layers. It is also faster and takes less time to compute than the simple CNN.
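For readers unfamiliar with dilation: a dilated convolution spaces the kernel taps apart by a dilation factor, so each output covers a wider region of the input without a pooling layer. The following is a minimal 1-D sketch under that general definition. The helper `dilated_conv1d`, the signal, and the kernel are assumptions for illustration and do not reproduce the architecture in [5].

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid 1-D convolution (no padding) with kernel taps spaced by `dilation`."""
    span = (len(kernel) - 1) * dilation  # input width covered by one output
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span)
    ]

signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

print(dilated_conv1d(signal, kernel, dilation=1))  # [-2, -2, -2, -2, -2]
print(dilated_conv1d(signal, kernel, dilation=2))  # [-4, -4, -4]
```

With `dilation=2` the same three-tap kernel spans five input samples instead of three, which is how a dilated layer widens its view without adding parameters.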
Yillin Yan et al. [6] discuss deep learning algorithms, techniques, and applications. The algorithms and techniques are CNN, RNN, RvNN, and DBN. RvNN uses a tree-like structure, which is suited to NLP. RNN is appropriate for sequential information and is suited to NLP and speech processing. CNN was originally used for image recognition but is also used for NLP, speech processing, and computer vision. DBN is an unsupervised learning technique that uses directed connections.
Cruz-Roa et al. [7] focus on image processing, visual interpretation, and automation. The techniques used in this paper are unsupervised and follow a series of steps to get the desired output. An autoencoder is used to produce an output as similar as possible to the given input. Image representation is accomplished through convolution and pooling by mapping the image to a set of k feature maps, and the original size of the image is increased by a factor of k in the process. Basal cell carcinoma (BCC) is detected via softmax, as used in logistic regression, which calculates the theta vector that tells whether the input image is cancerous. The decision is based on a value between 0 and 1 produced by the sigmoid activation function.
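The final decision step described above, a theta vector combined with a sigmoid that yields a value between 0 and 1, can be sketched as follows. The names `sigmoid` and `predict_cancerous`, the 0.5 threshold, and all numeric values are assumptions for illustration, not parameters from [7].

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_cancerous(features, theta, bias=0.0, threshold=0.5):
    """Return (probability, decision) for one feature vector."""
    z = sum(w * x for w, x in zip(theta, features)) + bias
    p = sigmoid(z)
    return p, p >= threshold

# Hypothetical learned weights and two hypothetical feature vectors.
theta = [2.0, -1.0]
print(predict_cancerous([1.5, 0.5], theta))  # probability above 0.5
print(predict_cancerous([0.0, 3.0], theta))  # probability below 0.5
```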
Abien Fred et al. [8] aim to detect breast cancer, taking visual processes into consideration and applying several algorithms. The techniques used are SVM, softmax, linear regression, and GRU-SVM. SVM has the highest accuracy among the algorithms used, at 89.28%. GRU-SVM took 2 minutes and 54 seconds to complete its training. Linear classifiers are used, namely linear regression and SVM.
Mehedi Masud et al. [9] train convolutional neural networks on breast cancer data to get efficient output. Transferring knowledge is more feasible than learning from scratch: the CNN model does not need to learn everything from scratch; instead, knowledge from a pre-trained model can be transferred to the desired model. Models such as ResNet and AlexNet are tested to see which gives the best accuracy. ResNet has an accuracy of 96.4%. It has 152 layers and also includes a residual layer, which is very important in copying the image to the next layer. The performance metrics considered are accuracy, specificity, recall, and sensitivity. CNN models provide automatic detection of breast cancer via ultrasound images.
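The role of the residual layer mentioned above can be shown with a minimal sketch: the block adds its input back onto the transformed output, so the input is carried forward even if the transform contributes nothing. The function `residual_block` and the toy transforms are hypothetical and do not represent the ResNet implementation evaluated in [9].

```python
def residual_block(x, transform):
    """Return transform(x) + x element-wise (the skip connection)."""
    fx = transform(x)
    return [f + xi for f, xi in zip(fx, x)]

x = [1.0, 2.0, 3.0]

# If the transform outputs zeros, the block passes the input through unchanged.
print(residual_block(x, lambda v: [0.0] * len(v)))  # [1.0, 2.0, 3.0]

# Otherwise the transform only has to model a correction to the input.
print(residual_block(x, lambda v: [0.5 * a for a in v]))  # [1.5, 3.0, 4.5]
```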
M. E. Gamil et al. [10] address breast cancer detection in the early stages using image processing. The first block is the filtering block: the ultrasound scan contains noise that creates speckle, which blurs the image. Filtering and smoothing help to remove noise from the image. The single image is then divided into multiple segments to extract accurate information from it. The malignancy value is considered, and the output is passed on accordingly.
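As an illustration of the filtering step, the sketch below removes an isolated bright speck with a median filter. The median filter is a simple stand-in chosen here for brevity; it is not claimed to be the filter used in [10]. The helper `median_filter` and the 5x5 test image are assumptions.

```python
def median_filter(image, size=3):
    """Apply a size x size median filter to a 2-D list of pixel values.

    Border pixels use only the neighbours that fall inside the image.
    """
    h, w = len(image), len(image[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = sorted(
                image[y][x]
                for y in range(max(0, i - r), min(h, i + r + 1))
                for x in range(max(0, j - r), min(w, j + r + 1))
            )
            out[i][j] = window[len(window) // 2]
    return out

# Flat 5x5 image with one noisy pixel in the centre (made-up values).
noisy = [[10] * 5 for _ in range(5)]
noisy[2][2] = 255

clean = median_filter(noisy)
print(clean[2][2])  # 10
```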
Priyanka Shahane et al. [11] was referred to for an idea of how to write a survey paper on machine learning.
Priyanka Shahane et al. [12] was referred to for a brief idea of survey papers and how to write a survey paper on related topics.
After the dataset is prepared, the machine learning algorithms are used to build an acceptable model that predicts cancer stages when fed similar data. Generating a model here means training a machine learning model by feeding it data so that it learns to recognize a particular target and can predict its values when fed additional data without the target.
The algorithms tried in this case were the simple regression model, the logistic regression model, and the random forest regressor model. These models were trained against various sets of parameters to see which setting gives the highest accuracy, and the final parameters were:
'Specimen Type', 'Sample Type', 'DNA Input', 'TMB (nonsynonymous)', 'Sex', 'Diagnosis Age', 'Tumor Purity', 'Treatment'
The dataset was split in a ratio of 75:25, where 75 percent of the dataset was used in training the model and 25 percent in testing the accuracy.
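A 75:25 split of this kind can be sketched in a few lines. The helper `split_dataset`, the fixed seed, and the placeholder records are assumptions for illustration; the paper does not state how the split was performed or whether it was shuffled.

```python
import random

def split_dataset(rows, train_fraction=0.75, seed=0):
    """Shuffle a copy of rows and split it into (train, test) parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

records = list(range(100))  # placeholder for 100 dataset rows
train_rows, test_rows = split_dataset(records)
print(len(train_rows), len(test_rows))  # 75 25
```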
Finally, the random forest regressor model gave the highest accuracy, with a delta of 0.7 when predicting a cancer stage. Precision is calculated as,
Precision = True Positive / (True Positive + False Positive)
Where a TP (True Positive) test result detects the condition when the condition is present, and an FP (False Positive) test result detects the condition when the condition is absent.
Recall (Sensitivity / TP Rate) can be calculated using the following,
Recall = True Positive / (True Positive + False Negative)
Where an FN (False Negative) test result does not detect the condition when the condition is present.
A TN (True Negative) test result does not detect the condition when the condition is absent.
FP Rate can be calculated as,
False Positive Rate = False Positive / (False Positive + True Negative)
Accuracy can be calculated as,
Accuracy = sum(absolute(Expected Output – Actual Output)) / 2
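As a worked illustration of the confusion-matrix metrics, the sketch below uses the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and false positive rate = FP / (FP + TN). The function names and the counts are hypothetical and are not results from this study.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives that were detected (sensitivity)."""
    return tp / (tp + fn)

def false_positive_rate(fp, tn):
    """Fraction of actual negatives that were wrongly flagged."""
    return fp / (fp + tn)

# Hypothetical confusion-matrix counts for 100 test samples.
tp, fp, fn, tn = 40, 10, 5, 45

print(precision(tp, fp))            # 0.8
print(recall(tp, fn))               # 40 / 45
print(false_positive_rate(fp, tn))  # 10 / 55
```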
| Sr. | Paper | Technique | Algorithms | Dataset | Accuracy | Year | Authors |
|---|---|---|---|---|---|---|---|
| 1 | | | | | | | … A. V. Pawar and S. Ahirrao |
| 2 | … of Malignant Cells: A Step Towards Better Life [2] | … Method | … and Decision Tree | … decision boundary | … accuracy in Random Forest Tree | | … Awatramani and N. Hasteer |
| 3 | Machine Learning approaches in Cancer detection and diagnosis [3] | CNN | Gauss-Newton representation, Gene expression learning, Sparse compact incremental learning machine (SCILM) | Mammographic mass datasets | Accuracy and susceptibility | 2017 | H. Sami, M. Sagheer, K. Riaz, M. Q. Mehmood and M. Zubair |
| 4 | Early Cancer Detection in Blood Vessels Using Mobile Nanosensors [4] | Mobile nanosensors (MNSs) | Molecular communication, anomaly detection | NA | Accuracy not mentioned | 2018 | R. Mosayebi, A. Ahmadzadeh, W. Wicke, V. Jamali, R. Schober and M. Nasiri-Kenari |
| 5 | Diagnosing Cancer Cells Using Histopathological Images with Deep Learning [5] | Channelwise separable with dilated CNN | Simple CNN, Dilated CNN, Channelwise separable CNN | RNA-Seq | 99.7% accuracy | 2021 | S. K. V. N. and M. Vijay |
| 6 | A Survey on Deep Learning: Algorithms, Techniques, and Applications [6] | Convolutional Neural Networks (CNN) | Recurrent Neural Network (RNN), RvNN, DBN, DBM | NA | No accurate accuracy | 2018 | Yillin Yan, S. S Iyengar, Shu Yeng Ching |
| 7 | Architecture for Image Representation, Visual Interpretability, and Automated [7] | Basal Cell Carcinoma (BCC) | Multineural networks | BCC dataset | 89.4% in F-measure and 91.4% in balanced accuracy | 2018 | Cruz-Roa, A. A., Arevalo Ovalle, J. E., Madabhushi, A., González Osorio |
| 8 | On breast cancer detection: an application of machine learning algorithms on the Wisconsin diagnostic dataset [8] | SVM | Linear Regression, Nearest Neighbor (NN) search, and Softmax Regression | 70% for the training phase and 30% for the testing phase; linear classifier as dataset | …9.28% test accuracy | 2018 | Abien Fred M. Agarap |
| 9 | Pre-Trained Convolutional Neural Networks for Breast Cancer Detection Using Ultrasound Images [9] | VGG16, DenseNet, ResNet | CNN | | Around 99% accuracy | 2021 | Mehedi Masud, M. Shamim Hossain |
| 10 | Fully automated CADx for early breast cancer detection using image processing and machine learning [10] | Automated CAD | Logistic regression, anisotropic diffusion | Image dataset (ultrasound) | 95.3% accuracy | 2018 | M. E. Gamil, M. Mohamed Fouad, M. A. Abd El Ghany and K. Hoffinan |
From this survey we can conclude that the problem of detecting cancer cells in the early stages can be addressed using various machine learning techniques and algorithms such as CNN, SVM, RNN, linear regression, BCC, KNN, Random Forest, and so on. Out of all these algorithms, CNN and linear regression have the best accuracy and can be used to obtain efficient and accurate output, with a reported accuracy of up to 99.7%. Further, deep learning and softmax regression can be used to increase throughput.
[1] P. Hajela, A. V. Pawar and S. Ahirrao, "Deep Learning for Cancer Cell Detection and Segmentation: A Survey," 2018 IEEE Punecon, 2018, pp. 1-6, doi: 10.1109/PUNECON.2018.8745420.
[2] J. Awatramani and N. Hasteer, "Early Stage Detection of Malignant Cells: A Step Towards Better Life," 2019 International Conference on Computing, Communication, and Intelligent Systems (ICCCIS), 2019, pp. 262-267, doi: 10.1109/ICCCIS48478.2019.8974543.
[3] H. Sami, M. Sagheer, K. Riaz, M. Q. Mehmood and M. Zubair, "Machine Learning-Based Approaches For Breast Cancer Detection in Microwave Imaging," 2021 IEEE USNC-URSI Radio Science Meeting (Joint with AP-S Symposium), 2021, pp. 72-73, doi: 10.23919/USNCURSI51813.2021.9703518.
[4] R. Mosayebi, A. Ahmadzadeh, W. Wicke, V. Jamali, R. Schober and M. Nasiri-Kenari, "Early Cancer Detection in Blood Vessels Using Mobile Nanosensors," in IEEE Transactions on NanoBioscience, vol. 18, no. 2, pp. 103-116, April 2019, doi: 10.1109/TNB.2018.2885463.
[5] S. K. V. N. and M. Vijay, "Diagnosing Cancer Cells Using Histopathological Images with Deep Learning," 2021 Sixth International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), 2021, pp. 148-152, doi: 10.1109/WiSPNET51692.2021.9419468.
[6] Yillin Yan, S. S Iyengar, Shu-Yeng Ching, "A Survey on Deep Learning: Algorithms, Techniques, and Applications", ACM 2018, Volume 51, Issue 5.
[7] Cruz-Roa, A.A., Arevalo Ovalle, J.E., Madabhushi, A., González Osorio, F.A. (2013). A Deep Learning Architecture for Image Representation, Visual Interpretability and Automated Basal-Cell Carcinoma Cancer Detection. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013. MICCAI 2013. Lecture Notes in Computer Science, vol 8150. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40763-5_50
[8] Abien Fred M. Agarap. 2018. On breast cancer detection: an application of machine learning algorithms on the Wisconsin diagnostic dataset. Proceedings of the 2nd International Conference on Machine Learning and Soft Computing (ICMLSC '18). Association for Computing Machinery, New York, NY, USA, 5–9. https://doi.org/10.1145/3184066.3184080
[9] Mehedi Masud, M. Shamim Hossain, Hesham Alhumyani, Sultan S. Alshamrani, Omar Cheikhrouhou, Saleh Ibrahim, Ghulam Muhammad, Amr E. Eldin Rashed, and B. B. Gupta. 2021. Pre-Trained Convolutional Neural Networks for Breast Cancer Detection Using Ultrasound Images. ACM Trans. Internet Technol. 21, 4, Article 85 (November 2021), 17 pages. https://doi.org/10.1145/3418355
[10] M. E. Gamil, M. Mohamed Fouad, M. A. Abd El Ghany and K. Hoffinan, "Fully automated CADx for early breast cancer detection using image processing and machine learning," 2018 30th International Conference on Microelectronics (ICM), 2018, pp. 108-111, doi: 10.1109/ICM.2018.8704097.
[11] Priyanka Shahane, Deipali Gore, "A Survey on Classification Techniques to Determine Fake vs. Real Identities on Social Media Platforms," IJRDT, 2018.
[12] Priyanka Shahane, "A Survey on Book Recommendation System," Volume 9, Issue 5, May 2021.