Intrusion Detection System Using Machine Learning: An Overview

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 09 Issue: 06 | Jun 2022 www.irjet.net p-ISSN: 2395-0072

1 Student, Master of Technology, Computer Science Engineering
2 Assistant Professor, Computer Science Department, Dronacharya College of Engineering, Gurugram

Abstract - Today's wireless networks face rapid growth in errors, flaws, and attacks that threaten to undermine their security. Since computer networks and applications are built on multiple platforms, network security is becoming increasingly important. Both complex and expensive operating programs may have security vulnerabilities. The term "intrusion" refers to attempts to compromise confidentiality, integrity, and availability. Network security vulnerabilities and abnormalities can be identified using an intrusion detection system (IDS). The development of intrusion detection technology has been a burgeoning field, despite often being regarded as immature and not an ultimately comprehensive method of fighting intrusions. Security experts and network administrators have also made it a priority task. This means that more secure systems cannot replace it completely. Using data mining to detect intrusion, an IDS is able to predict future intrusions based on detected intrusions. An extensive review of the literature on the use of data mining methods for IDS is presented in this paper. First, we review data mining approaches for detecting intrusions using real-time and benchmark datasets. The paper then presents a comparison of methods of detecting intrusions in the network with their merits and demerits. Finally, we propose approaches to improve network intrusion detection.

Key Words: Intrusion detection, Security, Machine Learning, Data mining.

1. INTRODUCTION

Due to the rapid increase in the number of applications and organizations using computer networks, security is becoming increasingly important. Most companies use network security tools like antivirus and anti-spam software to protect themselves from network attacks. These tools can't detect complex or new attacks, however.

An IDS [1] enables computer networks and computers to detect and eliminate unwanted intrusions. Such systems can collect and process information from various sources within a network or computer, identifying threats, such as misuse and intrusion, that leave users vulnerable. IDSs (Intrusion Detection Systems) [2] are systems that continuously monitor and analyze events occurring on a network to detect malicious activity. IDSs are now regarded as an important element of the security infrastructure in most companies. By detecting intrusions, companies can deter attacks on their networks. Security professionals could use this method to reduce current network security risks and the complexity of current threats.

Attackers compromise databases by gaining access and then obtaining additional privileges, while authorized users may abuse the permissions they have been granted. An IDS identifies activity that appears to be unusual or harmful in purpose [3]. Various types of intrusions have been identified using different techniques, but there are no heuristics to confirm their accuracy. The majority of traditional IDSs rely on human analysts to distinguish between intrusive and non-intrusive network data. Because of the considerable time needed to notice an attack, detecting fast attacks is not practical. For network owners and operators, access to the internet is a particularly delicate topic. As the online world presents different hazards, several solutions are designed to prevent internet attacks.

The data mining method is used to derive models from massive datasets. The technology behind machine learning and deep learning has enabled a wide range of data mining techniques in recent years. Intrusion detection research uses a variety of techniques, including classifiers, link analysis, and sequence analysis.

Data mining is essential for detecting intrusions by machine learning. It can provide insights into possible behaviors based on prior experiences. The most common data mining techniques are hybrid association, clustering, and classification. Grouping data based on similarity is called clustering. Clustering is most commonly done using K-means.

The most popular technique used by mining analysts is classification and prediction, which creates models, characterizes data, and projects the future to extract important insights. It extends the IDS by categorizing outcomes as normal or abnormal using metric-based classification. The techniques used to mine audit data were questioned for consistency, which resulted in several proposals for improvements to the present data mining technologies.

A variety of data mining approaches have been described in the literature as being useful in detecting network breaches. This paper examines in depth how data mining algorithms may be used for intrusion detection. Advantages, restrictions, and effectiveness are also discussed in the paper. With this extensive investigation, IDS functionality may be enhanced even further in the future.

© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal

As far as content is concerned, Section 2 of the paper summarizes previous studies on intrusion detection based on data mining. Section 3 summarizes these techniques, identifying their strengths and weaknesses and evaluating their performance; Section 4 reviews prior studies; and Section 5 wraps up the discussion by outlining potential future improvements.

2. LITERATURE SURVEY

Based on fuzzy entropy, Varma et al. [4] proposed a feature selection technique for real-time intrusion detection datasets using Ant Colony Optimization (ACO). Both discrete and continuous traffic features could be extracted using this technique. To determine the most valuable feature among the detected features, ACO uses the fuzzy entropy heuristic. Classifiers are therefore more accurate at detecting intrusions. For real-time intrusion detection, this technique provided the best solution.

Thaseen and Kumar [5] describe an intrusion detection method based on chi-square feature selection and a support vector machine (SVM). By calculating the largest variance for each feature, they tuned the parameters of the SVM. Taking the inverse of the variance yields a considerably smaller value, which improves the kernel parameters. Using this variance tuning, the SVM parameters were improved in the intrusion detection model. The result was improved classification accuracy.
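To make the chi-square step concrete, the following is a minimal sketch of chi-square feature scoring. It is not the implementation from [5]; the function name `chi_square_score` and the toy records (`protocol`, `flag`, `label`) are hypothetical and chosen only for illustration. A larger score suggests a stronger association between a categorical feature and the class label.

```python
from collections import Counter


def chi_square_score(feature_values, labels):
    """Chi-square statistic of independence between one categorical
    feature and the class label (larger means more strongly associated)."""
    n = len(labels)
    feature_counts = Counter(feature_values)
    label_counts = Counter(labels)
    observed = Counter(zip(feature_values, labels))
    score = 0.0
    for f, f_count in feature_counts.items():
        for c, c_count in label_counts.items():
            # Count expected in this cell if feature and label were independent.
            expected = f_count * c_count / n
            score += (observed[(f, c)] - expected) ** 2 / expected
    return score


# Hypothetical toy connection records (illustration only, not from any dataset).
protocol = ["tcp", "tcp", "udp", "udp", "tcp", "udp", "tcp", "udp"]
flag = ["S0", "SF", "SF", "S0", "S0", "SF", "S0", "SF"]
label = ["attack", "normal", "normal", "attack",
         "attack", "normal", "attack", "normal"]

scores = {
    "protocol": chi_square_score(protocol, label),
    "flag": chi_square_score(flag, label),
}
# Rank features from most to least associated with the label.
ranking = sorted(scores, key=scores.get, reverse=True)
print(scores, ranking)
```

In this toy data the `flag` feature separates the two classes perfectly, so it scores higher than `protocol` and would be kept first by a score-based filter.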

Khammassi and Krichen [6] derived the best subset of features for IDS using a feature selection method. To reduce the dataset size, the pre-processed dataset was first re-sampled, and then a wrapper method was used. Genetic algorithms and logistic regression were used in the method. With the features chosen by the wrapper, network intrusions were identified using NBTree, Random Forest (RF), and C4.5 classifiers.

An anomaly-based IDS has been proposed by Aljawarneh et al. [7]. To determine which features were most important, a voting method was used together with information gain. In this way, the probabilities of the base learners could be combined. Several classifiers, including REPTree, AdaBoostM1, MetaPaging, Naive Bayes, and RandomTree, were implemented to identify network intrusions with the selected attributes.
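Information gain, mentioned above as the feature-ranking criterion, can be sketched as follows. This is an illustrative computation under simple assumptions (categorical features, hypothetical toy records), not the procedure used in [7]; the helper names `entropy` and `information_gain` are introduced here for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, y in zip(feature_values, labels):
        groups.setdefault(value, []).append(y)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy connection records (illustration only).
protocol = ["tcp", "tcp", "udp", "udp", "tcp", "udp", "tcp", "udp"]
flag = ["S0", "SF", "SF", "S0", "S0", "SF", "S0", "SF"]
label = ["attack", "normal", "normal", "attack",
         "attack", "normal", "attack", "normal"]

gain_flag = information_gain(flag, label)
gain_protocol = information_gain(protocol, label)
print(gain_flag, gain_protocol)
```

A feature with higher information gain removes more uncertainty about whether a record is normal or an attack, so a ranking by this value is one plausible way to choose the attributes passed to the classifiers.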

Kabir et al. [8] developed an approach based on LS-SVM (Least Squares Support Vector Machine). There were two stages to this method. First, the dataset was divided into preset subgroups based on arbitrary criteria. The characteristics that distinguished these groups were then analyzed and listed together in the same order. To determine the most efficient allocation method, the variability of the data within subgroups was examined. In phase two, LS-SVM was applied to the samples extracted from the network data.

According to Khan et al. [9], intrusion detection should be performed in two stages. First, network traffic was categorised using probability score values. When determining whether traffic was normal or an attack, deep learning used this probability score as an additional measure in the second stage. The probability score in stage two was applied to avoid overfitting. By using this two-stage technique, it is possible to handle large volumes of unlabeled data effectively and automatically.

Based on Convolutional Neural Networks (CNNs) and feature reduction techniques, Xiao et al. [10] developed an intrusion detection model. The first step is dimensionality reduction, which eliminates irrelevant or redundant features. A CNN was then used to extract features from the reduced data. A supervised learning approach was used to obtain the features that are most effective in detecting intrusions.

Among the approaches to detecting network intrusion is the deep hierarchical network presented by Zhang et al. [11]. Spatial and temporal features of network flows were learned using LeNet-5 and LSTM. A cascade mechanism was used to train the deep hierarchical network as a whole instead of training the two networks separately. The study also examined how the flow of information in the network varies.

3. OBSERVATION

This section presents a comparative study of the strengths and limitations of the intrusion detection technologies discussed in the preceding section. Each approach has advantages and downsides, as illustrated in Table 1. The tabulated data makes it simple to determine which strategy works best and provides the most advantages. The observation table shows how current methods are flawed and suggests new ideas for addressing those flaws.

This observation table gives an idea of the state of IDS research using machine learning methods. The fuzzy-entropy-based method gives 99.5% accuracy, with the convergence time of ACO as a limitation. The chi-square feature selection method gives 95.8% accuracy, but the selection of functions in SVM is difficult. The genetic algorithm, with 99.9% accuracy, failed to obtain the optimal subset.

The hybrid model gives 99.2% accuracy and supports fully distributed networks. These types of algorithms, used especially in network-based IDS, obtain higher accuracy during the learning process. Several approaches to data mining were tested on two sets of intrusion data, NSL-KDD and UNSW-NB15, to determine the effectiveness of intrusion detection. The study provides information on unbalanced data distributions, convergence times, normal and abnormal traffic distributions, classification, and detection rates. The combination of network intrusion algorithms and techniques such as OSS, SMOTE, and BiLSTM with a deep hierarchical network and learning algorithm provides superior performance in terms of accuracy and precision.
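Since the comparison relies on accuracy, precision, and detection rate (recall), a short sketch of how these are commonly computed from predictions may help. The function name `detection_metrics` and the toy labels are hypothetical; the definitions are the standard confusion-matrix ones and are not taken from any of the surveyed papers.

```python
def detection_metrics(y_true, y_pred, positive="attack"):
    """Accuracy, precision, and recall (detection rate) for a binary IDS."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Share of raised alarms that were real attacks.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Share of real attacks that were detected.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical ground truth and classifier output (illustration only).
y_true = ["attack", "attack", "attack", "normal",
          "normal", "normal", "normal", "normal"]
y_pred = ["attack", "attack", "normal", "normal",
          "normal", "normal", "normal", "attack"]

metrics = detection_metrics(y_true, y_pred)
print(metrics)
```

On imbalanced traffic, accuracy alone can look high even when few attacks are caught, which is one reason precision and recall are usually reported alongside it.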

4. METHODOLOGY

1.1 Machine Learning

The creation of analytical models is automated through the use of machine learning, a data analysis tool. It is a type of artificial intelligence that uses data analysis to detect patterns, recognize trends, and take action with minimal human involvement. When machines learn from sufficient data and develop models capable of detecting attack variants and new attacks, intelligent intrusion detection systems are able to achieve satisfactory detection levels. Our study is primarily focused on identifying and summarizing IDSs that have utilized machine learning in the past.

1.1.1 Decision Tree

Decision Trees are a class of supervised learning algorithms that can be used for predicting categorical or continuous variables. A decision tree works by breaking data from the root node into smaller and smaller subsets while incrementally building an associated tree. Decision nodes create a rule and leaf nodes deliver a result. In addition to CART, C4.5, and ID3, there are numerous other DT models available. Methods based on multiple decision trees, such as RF and XGBoost, are used to build advanced learning algorithms.

1.1.2 K-Nearest Neighbors

In KNN, "feature similarity" is used to predict a data sample's class based on its features. By calculating the distance between a sample and its neighbors, KNN classifies the sample according to the classes of those neighbors. A parameter called k affects KNN algorithm performance. The model can overfit with small values of k. Karatas et al. [12] used CSE-CIC-IDS2018 as a benchmark dataset to compare the performance of ML algorithms. Selecting very large k values led to incorrect classification of the sample instances. Using the Synthetic Minority Oversampling Technique (SMOTE) to resolve dataset imbalance improved the detection rate for minority-class attacks.

1.1.3 Support vector machine (SVM)

SVM is a supervised machine learning algorithm that finds a hyperplane in an n-dimensional feature space separating the classes with the largest possible margin. Both linear and nonlinear problems can be solved with SVM algorithms. A kernel function is typically applied to nonlinear problems. With the kernel function, an input vector is first transformed into a high-dimensional feature space. After that, the support vectors are used to determine the maximal-margin hyperplane, which serves as the decision boundary. NIDS can be improved by using the SVM algorithm to correctly identify normal and malicious traffic.

1.1.4 K-means clustering

A cluster is a group of similar data points. K-means clustering is an unsupervised method for dividing data into meaningful clusters using centroid-based iterative learning. Each cluster is represented by a centroid (cluster center), and K is the number of clusters. To assign data points to clusters, a distance calculation is usually used. During clustering, the main objective is to reduce the distance between data points and the centroid of their cluster.
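The iterative assign-and-update loop described above can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and a naive initialization that takes the first K points as centroids; the function name `kmeans` and the two-dimensional toy points are hypothetical.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain K-means: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    # Naive deterministic initialization (an assumption of this sketch).
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        assignments = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments


# Hypothetical 2-D feature vectors, e.g. two scaled traffic statistics.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0),
          (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]

centroids, assignments = kmeans(points, k=2)
print(centroids, assignments)
```

In an intrusion detection setting, points far from every centroid, or small clusters that sit apart from the bulk of the traffic, are typical candidates for closer inspection.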

The clustering concept is used together with the RF model in the multilevel intrusion detection framework of Yao et al. [13]. Four modules were combined into the proposed solution: clustering, pattern discovery, fine-grained classification, and model updating. A potential attack that isn't detected in one module is passed to the next one. The KDD Cup'99 dataset was used in testing this proposed methodology. The model still showed superior performance even with few attack samples in the dataset.

1.1.5 Artificial neural network

A neural network simulates the way the brain works. An ANN consists of an input layer, one or more hidden layers, and an output layer. Units in adjacent layers are fully connected to one another. The ANN is one of the most popular machine learning techniques and has proven to be an efficient method for detecting different types of malware. The most frequently used learning algorithm for supervised learning is backpropagation (BP). Weights are randomly assigned at the start of training. A weight-tuning algorithm is then applied to find the hidden unit representation that best minimizes the misclassification error. IDS based on ANN still requires improvement, especially for less frequent attacks. Less frequent attacks have a smaller training dataset than more common attacks, meaning an ANN has a harder time learning their characteristics correctly.
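As a rough illustration of backpropagation, the sketch below trains a network with one hidden layer on a handful of hypothetical two-feature samples. The layer size, learning rate, epoch count, squared-error loss, and sigmoid activation are all assumptions made for the example, not settings taken from the surveyed work.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_hidden, w_out):
    """Forward pass; a constant 1.0 is appended as the bias input."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x + [1.0])))
              for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, output


def mean_squared_error(data, w_hidden, w_out):
    return sum((y - forward(x, w_hidden, w_out)[1]) ** 2
               for x, y in data) / len(data)


def train(data, n_hidden=3, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Weights are randomly assigned at the start of training.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    initial_error = mean_squared_error(data, w_hidden, w_out)
    for _ in range(epochs):
        for x, y in data:
            hidden, out = forward(x, w_hidden, w_out)
            # Error signal at the output unit (squared error, sigmoid).
            delta_out = (out - y) * out * (1.0 - out)
            # Propagate the error signal back to each hidden unit.
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            for j, h in enumerate(hidden + [1.0]):
                w_out[j] -= lr * delta_out * h
            for j in range(n_hidden):
                for i, xi in enumerate(x + [1.0]):
                    w_hidden[j][i] -= lr * delta_hidden[j] * xi
    return w_hidden, w_out, initial_error


# Hypothetical scaled features; target 1.0 = attack, 0.0 = normal.
data = [([0.1, 0.2], 0.0), ([0.2, 0.1], 0.0),
        ([0.9, 0.8], 1.0), ([0.8, 0.9], 1.0)]

w_hidden, w_out, initial_error = train(data)
final_error = mean_squared_error(data, w_hidden, w_out)
print(initial_error, final_error)
```

With only a few examples of a rare attack class, the same update rule sees that class far less often, which is consistent with the difficulty noted above for less frequent attacks.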

1.1.6 Ensemble method

The principal advantage of ensemble methods is that they allow you to take advantage of different classifiers by using them together. Classifiers differ in their behaviors, so a single classifier may not always be optimal. Some types of detectors may work well for detecting some types of attacks but fail to detect others. An ensemble approach combines weak classifiers: many classifiers are trained, and the best output is selected through a voting algorithm.
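One simple form of the voting step is majority voting over the labels predicted by each classifier. The sketch below assumes the individual classifiers have already produced their predictions; the function name `majority_vote` and the three prediction lists are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample
    by taking the most common vote for each sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Hypothetical outputs of three base classifiers on the same four samples.
clf_a = ["attack", "normal", "attack", "normal"]
clf_b = ["attack", "attack", "normal", "normal"]
clf_c = ["normal", "normal", "attack", "normal"]

combined = majority_vote([clf_a, clf_b, clf_c])
print(combined)
```

Using an odd number of classifiers for a two-class problem avoids ties; weighted or adaptive voting schemes replace the plain count with per-classifier weights.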

Shen et al. [14] proposed an ensemble-based IDS that exploits the features of the extreme learning machine (ELM). During the ensemble pruning phase, the proposed methodology is optimized using a bat optimization algorithm. Data from the KDD Cup'99, NSL-KDD, and Kyoto datasets were used to test the model. Results of the experiments indicated that combining multiple ELMs in an ensemble outperformed each ELM individually.

Using a deep neural network (DNN) and DT as base classifiers, Gao et al. [15] proposed an adaptive ensemble model that uses an adaptive voting algorithm to pick the best classifier. The proposed methodology was verified in experiments performed with the NSL-KDD dataset. It was compared with other models to demonstrate its efficiency. For weaker attack classes, it didn't perform well.

CONCLUSION AND RECOMMENDATIONS

A detailed overview of data mining strategies for IDS, mostly network-based, is offered in this study. The advantages and disadvantages of these strategies are also examined in order to offer future options for improving intrusion detection performance and thereby improving IDS. The findings of the comparison revealed that insider threat detection utilising deep hierarchical networks had greater accuracy, precision, and recall. However, the training period of the intrusion detection algorithm is lengthy. Because the efficiency of wireless intrusion detection systems is poorly quantified in the preceding comparisons, a machine learning-based network detection model is presented. Machine learning capabilities that automatically extract and select features reduce the difficulty of calculating domain-specific, manually generated features and allow you to skip the traditional attribute selection phase. Deep learning (DL) is also widely used in a variety of fields and has proven to be effective. Therefore, over the next few years, we will use machine learning and deep learning algorithms to prevent overfitting with zero elements, address model training issues when only a limited percentage of samples belong to attack classes, and avoid misclassifications by DNNs caused by adversarial inputs. This would increase the effectiveness of intrusion detection and prevention and ultimately help solve the problem of instability caused by cyber attacks.

REFERENCES

[1] Mohit S D, Gayatri B K, Vrushali G M, Archana L G and Namrata R. B (2015). Using Artificial Neural Network Classification and Invention of Intrusion in Network Intrusion Detection System. International Journal of Innovative Research in Computer and Communication Engineering, 3(2).

[2] Zaman S, El-Abed M and Karray F (2013 January). Features selection approaches for intrusion detection systems based on evolution algorithms.

[3] Nazir A (2013). A Comparative Study of different Artificial Neural Networks based Intrusion Detection Systems. International Journal of Scientific and Research Publications.

[4] Varma P R K, Kumari V and Kumar S S (2016). Feature selection using relative fuzzy entropy and ant colony optimization applied to real-time intrusion detection system. Procedia Computer Science, 85, 503-510.

[5] Thaseen I S and Kumar C A (2017). Intrusion detection model using fusion of chi-square feature selection and multi class SVM. Journal of King Saud University-Computer and Information Sciences, 29(4), 462-472.

[6] Khammassi C and Krichen S (2017). A GA-LR Wrapper Approach for Feature Selection in Network Intrusion Detection. Computers & Security, 70, 255-277.

[7] Aljawarneh S, Aldwairi M and Yassein M B (2018). Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model. Journal of Computational Science, 25, 152-160.

[8] Kabir E, Hu J, Wang H and Zhuo G (2018). A novel statistical technique for intrusion detection systems. Future Generation Computer Systems, 79, 303-318.

[9] Khan F A, Gumaei A, Derhab A and Hussain A (2019). A novel two-stage deep learning model for efficient network intrusion detection. IEEE Access, 7, 30373-30385.

[10] Xiao Y, Xing C, Zhang T and Zhao Z (2019). An intrusion detection model based on feature reduction and convolutional neural networks. IEEE Access, 7, 42210-42219.

[11] Zhang Y, Chen X, Jin L, Wang X and Guo D (2019). Network intrusion detection: Based on deep hierarchical network and original flow data. IEEE Access, 7, 37004-37016.

[12] Karatas G, Demir O, Sahingoz OK. Increasing the performance of machine learning-based IDSs on an imbalanced and up-to-date dataset. IEEE Access. 2020;8:32150-32162. https://doi.org/10.1109/ACCESS.2020.2973219

[13] Yao H, Fu D, Zhang P, Li M, Liu Y. MSML: a novel multilevel semi-supervised machine learning framework for intrusion detection system. IEEE IoT J. 2018;6(2):1949-1959. https://doi.org/10.1109/JIOT.2018.2873125

[14] Shen Y, Zheng K, Wu C, Zhang M, Niu X, Yang Y. An ensemble method based on selection using bat algorithm for intrusion detection. Comput J. 2018;61(4):526-538. https://doi.org/10.1093/comjnl/bxx101

[15] Gao X, Shan C, Hu C, Niu Z, Liu Z. An adaptive ensemble machine learning model for intrusion detection. IEEE Access. 2019;7:82512-82521. https://doi.org/10.1109/ACCESS.2019.2923640
