International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 09 Issue: 08 | Aug 2022 www.irjet.net p-ISSN: 2395-0072
1,2,3 B.Tech Students, Vignan’s University, Vadlamudi, Guntur, Andhra Pradesh, India
4 B.Sc Student, Sri Baba Gurudev Degree and PG College, Sattenapalli, Andhra Pradesh, India
5 Asst. Professor, Vignan’s University, Vadlamudi, Guntur, Andhra Pradesh, India
Abstract - Among emerging technologies, machine learning has gained much popularity in the medical field due to its high performance and accuracy. It has become essential to use machine learning models wherever higher accuracy is needed, particularly in medicine, where health is given great importance. Breast cancer is one of the most dangerous cancers known to date. Early detection alone is not the solution; curing the disease is equally important. As the population grows rapidly, deaths due to breast cancer have increased exponentially. Here we focus on detection, since the earlier the detection, the higher the chance of a cure. In this study, machine learning algorithms such as SVM and KNN are employed to detect the disease on WBCD, a publicly available dataset used in many applications. The main aim of this research work is a comparison and analysis of the applied algorithms in terms of accuracy, precision, recall and F-score. These studies demonstrate that modern machine learning methods could increase the accuracy of early cancer tumour prediction. The WBCD comprises 569 instances and 32 attributes with no missing values, which helps us identify the target as either malignant or benign.
Key Words: Breast cancer prediction, Classifier algorithms, SVM, KNN.
Breast cancer is one of the most significant issues to be taken seriously in medicine today, since deaths due to breast cancer are increasing exponentially. According to recent reports and reviews, 53% of the Indian women among all reported cases have died due to breast cancer (87090/162468). Some of the causes of breast cancer are hormones, radiation therapy, obesity, etc. An interesting fact that we came across while writing this paper is that men can get breast cancer too. However, fewer than 1% of men face breast cancer, which is negligible, and building a model that can also identify that 1% is a challenging task. For early detection of breast cancer there exist techniques such as mammography and computer-aided detection (CAD). In this paper we examine the influence of machine learning algorithms. Studies in recent years have shown that ML models have gained 30 percent in predictive power. Breast cancer is one of the most common illnesses in India, causing many deaths at present. The pattern of most cancer cases in women is changing day by day as a result of changes in food and lifestyle. It is the second most common cause of death among women in the world.

To begin with, the paper describes the dataset, and some important insights from the data are mentioned for a clear understanding of it. Preprocessing techniques are applied where necessary, since preprocessing the data is highly recommended for improved performance and accuracy. The results of this work are presented in a comparative table that includes accuracy, precision, recall, etc. Although many algorithms are available, Logistic Regression is identified as the best for the WBCD dataset as it gives high accuracy.
This work uses machine learning (ML) to predict breast cancer based on the data received. The several stages of breast cancer are diagnosed through proper treatment and detailing. If we do not provide proper treatment to our patients, it may lead to their death. Earlier strategies for classifying data have been used, but their accuracy was too low for reliable categorization and prediction. Deep learning algorithms and machine learning techniques for numerical datasets are used to extract knowledge and hidden patterns. The traditional approach, based mostly on regression, detects the presence of cancer, while newer ML techniques and algorithms are built on model construction. In its training and testing phases, the model is expected to predict unknown data and give a satisfactory final result. These techniques are used to differentiate between benign and malignant tumors.
As shown in Fig. 1, there are six different varieties of cancer that can lead to death. The primary goal of this paper is to implement different ML algorithms and determine which ML model is best suited to predicting breast cancer, and to determine the number of patients with non-cancerous and cancerous tumours, as well as the type of tumour.
[1] The second major cause of cancer-related mortality in women is breast cancer. Breast cancer development is a multi-step process involving several different cell types, and its prevention is still challenging worldwide. One of the best ways to prevent this illness is early detection of breast cancer. Due to early detection and treatment, the 5-year relative survival rate of breast cancer patients is above 80% in several developed countries. Significant progress has been achieved during the past ten years both in the development of prevention measures and in the understanding of breast cancer in general. The identification of breast cancer stem cells is a valuable resource for understanding the aetiology and the processes behind tumor treatment resistance, and several breast cancer-related genes have been identified. People now have more medication choices for chemoprevention of breast cancer, and biological prevention has recently been developed to improve the quality of life of cancer patients. That review highlights significant research from recent years on the pathophysiology, related genes, risk factors, and preventative measures of breast cancer.
On the basis of historical data and existing conditions, several researchers have already developed a variety of methodologies for risk analysis and prediction of breast cancer. The analysis of breast cancer data for risk identification was the main focus of their studies. However, a system that forecasts risk based on both historical and current data is a more crucial requirement. Clinical oncologists can use such a model to help them make decisions. The different types of persons who are initially exposed to the risk must also be taken into account. To anticipate the risks more accurately, a rule-based system that can recognize the symptoms sooner and perform a temporal analysis of the data will be helpful.
China is the world's most populous country. According to a recent report by the organization (GLOBOCAN-2018), the male-to-female breast cancer ratio is 8.6% for males and 19.2% for females. This disease claims the lives of over 1.3 million people each year. According to statistics, approximately 400 men and 41,900 women are predicted to die as a result of this disease.
According to a survey from the United States, 3.8 million women are alive yet have breast cancer. In 2019, 59,838 cases of Ductal Carcinoma in Situ (DCIS) breast cancer were detected in the United States. The total number of breast cancer deaths is 458,000. Breast cancer was the leading cause of mortality in China in 2012. Cancer accounted for 48% of all deaths in 2012, whereas the global death rate was 52%. In 2015, the statistics of 1,517 women were evaluated to determine the breast cancer survival and recurrence rate.
[2] The authors state that breast cancer is one of the most common cancers among women worldwide, accounting for the majority of new cancer cases and cancer-related deaths according to worldwide statistics, making it a notable public health problem in present-day society. That paper gives an outline of the evolution of big data in the healthcare system and applies four learning algorithms to a breast cancer data set. The authors explain how dangerous this cancer is, implement a variety of algorithms, and achieve a good accuracy, which is sufficient to convey the seriousness of the cancer. The experimental results show that SVM gives the highest accuracy, 97.9%. The findings help to select the best classification machine learning algorithm for breast cancer prediction.
[3] A malignant growth called a breast tumor develops inside the glandular epithelium of the breast. It is regarded as one of the malignancies that most frequently affects women worldwide. However, there is not always a very effective method of treating breast cancer. The early diagnosis and assessment of breast tumours is therefore a crucial component in lowering the risk of mortality. Assessment of medical images from several modalities is typically required for an accurate appraisal of breast malignancies, and there is great demand for automated tools that can properly examine these images. In that paper, the authors introduce some commonly used medical imaging techniques for the diagnosis of breast cancer and, based on them, review some recently proposed techniques for breast cancer detection with computer vision and machine learning. Finally, they analyse and compare the detection performance of various techniques on histological images and mammography images respectively.

Breast cancer is a malignant tumor that occurs within the glandular epithelium of the breast. Sometimes the process of cell growth goes wrong: new cells form even when the body does not need them, and old or damaged cells do not die as they should. When this happens, a buildup of cells often forms a mass of tissue referred to as a lump, growth, or tumor. Its onset is frequently associated with heredity, and the incidence of breast cancer is higher among women between the ages of 40 and 60 or around the menopause.

Fig -1: Various Kinds of Breast Cancer
We obtained the Wisconsin Breast Cancer Diagnosis dataset, which is publicly available on the internet, and used Google Colab as the development platform. The supervised classification algorithms used in our methodology are the Support Vector Classifier, KNN, Random Forest, AdaBoost, and XGBoost classifiers, along with the K-fold cross-validation technique.
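The K-fold cross-validation step named above can be sketched as follows. This is a minimal, standard-library illustration and not the authors' Colab code: the function names `k_fold_indices` and `cross_validate`, the `fit_predict` callback, the default `k=5` and the shuffling seed are all assumptions introduced here. The paper does not state which library or which value of K was used.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread samples as evenly as possible: the first n % k folds get one extra.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

def cross_validate(X, y, fit_predict, k=5):
    """Return the mean accuracy of `fit_predict` over k folds.

    `fit_predict(X_train, y_train, X_test)` is a hypothetical callback that
    trains any classifier and returns predicted labels for X_test.
    """
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        X_test = [X[i] for i in test_idx]
        y_pred = fit_predict(X_train, y_train, X_test)
        correct = sum(p == y[i] for p, i in zip(y_pred, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)
```

Each sample is used for testing exactly once and for training in the remaining folds, so the averaged score depends less on one particular split than a single hold-out estimate does. Any of the classifiers listed above could be plugged in through the callback.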
There are mainly three steps that define our proposed model. They concentrate on the data as well as on the algorithm used, that is, the model built to predict the output. First, feature extraction plays a pivotal role in predicting the output of any machine learning model. The accuracy of any ML model depends heavily on how the data is extracted and arranged for training and testing. Second, model building is the process in which the machine learning algorithm is employed to make predictions. The output is predicted based on patterns in the data: the model first learns from the training data and then predicts on the test data. Finally, model evaluation is the last step, which measures how accurate the output of the developed model is.
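The three steps described above can be illustrated with a small end-to-end sketch. It is only an illustration under stated assumptions: the helper names (`standardize`, `knn_predict`, `accuracy`) are hypothetical, standardization stands in for the feature preparation step, a plain k-nearest-neighbours vote stands in for the model, and the toy measurements are made up rather than taken from WBCD.

```python
import math
from collections import Counter

def standardize(train_rows, rows):
    """Scale `rows` using the per-feature mean and std of `train_rows`."""
    n_features = len(train_rows[0])
    means = [sum(r[j] for r in train_rows) / len(train_rows) for j in range(n_features)]
    stds = []
    for j in range(n_features):
        var = sum((r[j] - means[j]) ** 2 for r in train_rows) / len(train_rows)
        stds.append(math.sqrt(var) or 1.0)  # fall back to 1.0 for constant features
    return [[(r[j] - means[j]) / stds[j] for j in range(n_features)] for r in rows]

def knn_predict(X_train, y_train, X_test, k=3):
    """Predict each test label by majority vote among the k nearest training rows."""
    predictions = []
    for x in X_test:
        nearest = sorted(range(len(X_train)), key=lambda i: math.dist(x, X_train[i]))[:k]
        votes = Counter(y_train[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical toy measurements (two made-up features per sample), NOT real WBCD rows.
X_train = [[12.0, 0.08], [11.5, 0.07], [13.1, 0.09],
           [20.3, 0.15], [19.8, 0.14], [21.0, 0.16]]
y_train = ["B", "B", "B", "M", "M", "M"]
X_test = [[12.4, 0.08], [20.1, 0.15]]
y_test = ["B", "M"]

# Step 1: feature preparation (scaling uses training statistics only).
X_train_s = standardize(X_train, X_train)
X_test_s = standardize(X_train, X_test)
# Step 2: model building (k-NN stores the training data; prediction does the work).
y_pred = knn_predict(X_train_s, y_train, X_test_s, k=3)
# Step 3: model evaluation on the held-out samples.
test_accuracy = accuracy(y_test, y_pred)
```

Scaling the test rows with statistics computed from the training rows keeps information from the test set out of the model, which is why the evaluation step can be trusted as an estimate on unseen data.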
Fig -2: Work Flow
The heatmap above covers the values of a variety of different features. A heatmap is used to show the correlation among the features. As shown in Fig. 3, a darker colour indicates more correlation than a lighter colour; that is, as the strength of the colour increases, the correlation between the features moves towards positive correlation.
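The numbers behind such a heatmap are pairwise correlation coefficients. The sketch below assumes the Pearson coefficient, which the paper does not name explicitly, and uses hypothetical column values that are not taken from WBCD; `pearson` and `correlation_matrix` are illustrative helper names.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return {(a, b): r} for every ordered pair of named feature columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical values for three made-up columns, NOT taken from WBCD.
features = {
    "radius": [10.0, 12.0, 14.0, 16.0, 18.0],
    "perimeter": [65.0, 78.0, 91.0, 104.0, 117.0],  # exactly linear in radius
    "smoothness": [0.12, 0.09, 0.11, 0.08, 0.10],
}
corr = correlation_matrix(features)
```

Each cell of the heatmap would be coloured by one of these values, which range from -1 (perfect negative correlation) through 0 (no linear relationship) to +1 (perfect positive correlation). Pairs of features with coefficients close to +1 carry largely redundant information.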
As per our earlier discussion, we have used the WBCD dataset, which is available on the internet. Our subsequent step is to apply several classification algorithms to categorise our results into two classes: benign and malignant. Overall, our dataset consists of 569 values, of which 357 are benign, i.e., non-cancerous cells, and 212 are malignant, i.e., cancerous cells. The table below shows the accuracy gained by the different machine learning algorithms used in this project. Each algorithm has its own advantages and disadvantages. XGBoost, a gradient-boosted tree algorithm, gains the maximum accuracy and also has a 0% Type-II error, which indicates that the model produced no false negatives.
Table -1: Accuracy of proposed system
S.NO Algorithm
As per the above table, we obtained a 0% Type-II error. This means that the model produced no false negatives; in other words, no malignant case was misclassified as benign.
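To make the relationship between Type-II error and the other reported metrics concrete, the sketch below computes them from confusion-matrix counts, treating malignant as the positive class. The counts are hypothetical: the paper does not report its confusion matrix, and the values here were chosen only so that the accuracy comes to roughly 98.2% with zero false negatives. The function name `classification_metrics` is likewise an assumption.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard metrics, treating malignant as the positive class."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        # Type-I error rate: benign cases wrongly flagged as malignant.
        "type_1_error": fp / (fp + tn) if fp + tn else 0.0,
        # Type-II error rate: malignant cases missed (false negatives).
        "type_2_error": fn / (fn + tp) if fn + tp else 0.0,
    }

# Hypothetical counts for a 114-sample hold-out set; the paper does not report them.
metrics = classification_metrics(tp=42, fp=2, tn=70, fn=0)
```

With these illustrative counts the Type-II error is 0 and recall is 1, yet accuracy stays below 100% because two benign cases are flagged as malignant. A 0% Type-II error therefore rules out missed malignancies but does not by itself imply that every prediction is correct.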
Research in recent years has shown that machine learning models are gaining much popularity due to their high accuracy and predictive power. In this paper it is shown that the XGBoost model achieved 98.24% accuracy. Currently, many machine learning techniques exist to analyse medical data. Building precise and computationally efficient classifiers for medical applications is a challenging problem in the era of digital technologies. In order to find the best classification accuracy, we applied machine learning algorithms to the Wisconsin Breast Cancer (WBCD) dataset in this paper. The XGBoost classifier provided the highest level of classification accuracy. Early diagnosis is therefore crucial, and detection with invasive techniques makes mass prediction much simpler. The results of the evaluation are shown to be quite accurate in predicting breast cancer. The proposed system can quickly ascertain the severity of the disease and forecast whether the patient will survive the illness or whether it will develop into malignancy.
[1] Y.-S. Sun et al., “Risk Factors and Preventions of Breast Cancer,” International Journal of Biological Sciences, vol. 13, no. 11, p. 1387, 2017.
[2] Y. Khourdifi and M. Bahaj, “Applying Best Machine Learning Algorithms for Breast Cancer Prediction and Classification,” in 2018 International Conference on Electronics, Control, Optimization and Computer Science (ICECOCS), pp. 1–5, IEEE.
[3] Y. Lu, J. Y. Li, Y. T. Su, and A. A. Liu, “A Review of Breast Cancer Detection in Medical Images,” in 2018 IEEE Visual Communications and Image Processing (VCIP), pp. 1–4, IEEE.
[4] Cios KJ, Moore GW. Uniqueness of medical data mining. Artificial Intelligence in Medicine 2002; DOI 26:1-24
[5] B M Gayathri and C P Sumathi. An Automated Technique using Gaussian Naïve Bayes Classifier to Classify Breast Cancer. International Journal of Computer Applications, 2016. DOI 10.5120/ijca2016911146
[6] Houston, Andrea L. and Chen, et al. Medical Data Mining on the Internet: Research on a Cancer Information System. Artificial Intelligence Review 1999; DOI 13:437-466 4. Witten IH, Frank E: Data Mining: Practical Machine Learning Tools and Techniques 2006 DOI 10.1186
[7] K. Balachandran and R. Anitha, “Ensemble based optimal classification model for pre-diagnosis of lung cancer”, 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT), IEEE (2013), DOI 10.1109/ICCCNT.2013.6726467.
[8] M. Kumar, S. S. Tomar and B. Gaur, “Mining based Optimization for Breast Cancer Analysis: A Review”, International Journal of Computer Applications, vol. 19, no. 13, (2015).
[9] Priyanka Jain & Santosh Kr. Vishwakarma (2016). Collaborative Analysis of Cancer Patient Data using Rapid Miner. International Journal of Computer Applications, 145, 8-13.
[10] Priyanka Gupta & Prof. Shalini L (2018): Analysis of Machine Learning Techniques for Breast Cancer Prediction. International Journal of Engineering and Computer Science 7(05), ISSN: 2319-7242
[11] S. B. Kotsiantis, Supervised Machine Learning: A Review of Classification Techniques, Informatica 31 (2007) 249-268, 2007
“I am a constant learner in the programming and computer science field and also have a keen interest in ML and data science.”
“Ms. Geetha is an enthusiast of machine learning and its applications in the contemporary world. She has carried out several projects in the same area.”
“I am very enthusiastic about computers and programming. As I am an adaptive person, I always learn new technologies in this field.”
“A passionate coder interested in computer science and related fields. Addicted to learning new technologies in related fields.”
2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal