Mischievous Urls detection based on multi-feature using Soft Voting Classifier



International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056

Volume: 12 Issue: 3 | Mar-2025 www.irjet.net p-ISSN: 2395-0072

Mischievous Urls detection based on multi-feature using Soft Voting Classifier

1Research Scholar, Department of CSE, Shri Ram Institute of Science and Technology, Jabalpur, M.P.

2Professor, Department of CSE, Shri Ram Institute of Science and Technology, Jabalpur, M.P.

3Professor, Department of CSE, Shri Ram Institute of Science and Technology, Jabalpur, M.P.

Abstract – URLs allow Internet users to move from one website to another. They represent access to content stored on servers somewhere in the world. URLs are reached by simply clicking on a link or image or by typing them into a browser. A favourite method used by attackers and script kiddies is to deceive social media users, because regular users still click on any link or visit any URL they find. Blocking such URLs is a fundamental and essential way to provide a basic level of security. With the advent of internet technology, network security is under various threats. In particular, cybercriminals can spread malicious URLs to carry out attacks such as stealing sensitive information and sending spam. Detecting malicious URLs is vital in preventing these attacks. Cybercriminals use malicious URLs as distribution channels to spread malicious software on the web. Attackers use browser vulnerabilities to install malicious software so that they can access the victim's computer remotely. A malware program aims to gain access to the network, exfiltrate sensitive information, and secretly monitor targeted computer systems. In this project, we compare models and find the best predictive model for classifying spam and ham URLs. Our proposed prediction model seeks to improve prediction accuracy by using multiple features that take into account the interaction effect of different parameters.

Keywords: Uniform resource locators, Phishing, Diversity, Machine learning, Feature Engineering, Soft Voting Classifier, Accuracy.

I. INTRODUCTION

Phishing attacks are a cybercrime that uses social engineering to deceive users and steal their information, such as personal identity, financial information, etc. Masquerading as legitimate sources, attackers can reach victims by sending fraudulent messages using emails (such as Gmail, Outlook, etc.) or social media platforms (like Twitter, Facebook, etc.). Users become vulnerable if they input their information or download attachment files [1]. In recent years, there has been an increase in social media platform attacks, since it is easy for attackers to reach many users from anywhere in the world by posting a single message [2]. According to [2], the Anti-Phishing Working Group (APWG) reports that the number of phishing attacks increased by 250000 in one month in Jan 2021. In addition, the number of business compromises increased 56% from the last quarter of 2020 to the first quarter of 2021. Fig. 1.1 shows that the most targeted industries in 2021 are financial institutions, social media, and web emails [2].

According to figure 1.1, the main goal of attackers is to steal victims’ financial or personal information by targeting financial markets and social media platforms, respectively. Attackers can also send malware that can lead to other network attacks, such as malware attacks, ransomware attacks, etc. Most organizations now rely on human knowledge to detect these attacks [4]. However, due to the similarity between legitimate and fake messages, phishing attacks are difficult to detect even for experts. Therefore, cybersecurity experts are paying more attention to email links such as uniform resource locators (URLs) or email IDs to identify phishing emails. However, attackers keep improving their techniques to create phishing attacks that are difficult to detect. For example, they create phishing URLs that imitate https://www.facebook.com/, such as https://www.faceb00k.com/ and https://www.facebook, and web pages that look like harmless ones. Therefore, it is important to find ways to distinguish between phishing URLs and harmless URLs. Researchers have proposed various solutions against phishing in recent years, such as blacklists [5], traditional machine learning [6], and deep learning (DL) [7], [8], [9].

Below we provide a brief review of each solution.

• A blacklist is a list of website URLs that are most likely to be phishing sites. Any URL or IP on this list will be blocked. However, there are drawbacks to this approach. The system needs to already know a phishing URL in order to block it; if the URL is not on the list, it will not be detected.

Figure 1.1: APWG report 2023 [2].


• Traditional machine learning models are used to detect phishing attacks. However, traditional machine learning models require manual feature extraction [11]. Extracting a set of features is therefore labor-intensive and time-consuming [12]. These features are based on existing URLs. Therefore, when attackers create new phishing URLs, more analysis and extraction is needed, which causes the feature set to grow larger [10]. Despite efforts to examine various features and dimensions, it is impossible to avoid attacks from new phishing URLs [10].

• The advantage of using deep learning to detect phishing URLs is that the model can extract features from text and images without human intervention. However, as phishing attacks and deep learning techniques both evolve, some problems remain. For example, a model trained to identify long URLs may not detect short URLs [13]. In addition, deep learning has some disadvantages, such as requiring a large amount of data to train, test, and validate the model [14], [15]. The cost of a deep learning model is also high due to its complexity [12].

Phishing attacks can be detected using different types of information, such as URL-based [9], content-based [16], [17], and hybrid-based [18]. URL-based methods extract data from the URL alone, without looking at other information such as web pages, directories, etc. However, extracting only URL-based features misses important features of phishing web pages, such as page names and page code. It is also difficult to identify short URLs using only URL-based features. Content-based systems extract information from the web page, such as images, JavaScript, text, and hypertext markup language (HTML) code, so the page has to be loaded to capture them. Hybrid methods combine URL-based and content-based features.

1.1 Phishing Attacks

As discussed in the previous section, phishing attacks are on the rise. Part of the increase is due to the ease of creating these attacks. They can be launched from anywhere, and the attacker does not have to be in the same location as the victim. These characteristics make phishing attacks one of the most dangerous attacks for individuals and organizations. Phishing works by sending fake URLs to victims using email and social media. These URLs send victims to fake websites that trick them into sharing personal information, such as credit card numbers and login credentials.

There are four parts to a URL. First, the URL protocol tells us how the data is transferred. Second, the host, which contains the top-level domain (TLD) and second-level domain (SLD) [19]. TLDs help distinguish the purpose of the domain name; for example, edu represents a domain used for education. The SLD usually holds the website name. The third is the path, containing the address of the requested page. The last part can be a query, which may contain several parameters. It can also be an anchor that takes the user to a specific section of the web page.

Attackers can use different techniques to create fake URLs. First, attackers can create phishing URLs through structured query language (SQL) injection or cross-site scripting (XSS) attacks [20]. Second, an attacker can use an organization name (i.e., the SLD) with a different TLD, such as www.ua.edu instead of www.ua.com [21]. Thus, uninformed users can fall into this trap and become victims. Therefore, organizations focus on developing systems that detect phishing attacks by analyzing domain names, SLD and TLD together. Third, URLs can use different protocols such as “shttp”. Fourth, an attacker can create a URL using random words, which can make the URL very long. Fifth, the attacker creates the URL and hides it behind a shorter URL. TinyURL is a service that shortens URLs by creating new URLs with different patterns [22], for example, https://tinyurl.com/brbm97cx. Normal detection models fail to detect tiny URLs because they have a different structure than the original URL. Finally, the attacker uses popular blog hosting platforms such as Google Sites [23] to create a website and places the phishing URL as a link on the fake blog. Thus, the attacker hides the phishing URL on a legitimate website. Phishing detection technology will not detect these attacks because it identifies URLs generated by Google Sites as legitimate URLs, not phishing URLs, and so fails to protect users from the threat.

When we use the internet to facilitate our work, many attackers are at the same time trying to steal information from our systems. There are many ways to combat bad URLs. Blocklists are included in antivirus programs, blockchain/tracking systems, and spam filters. The blocklist method is simple and gives good accuracy if the list is updated in a timely manner, but this method will not detect newly created malicious URLs.
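The URL parts described in this section can be inspected with Python's standard `urllib.parse` module. The following is a minimal sketch, not the authors' implementation: the function name `split_url` and the example address are hypothetical, and taking the last two host labels as TLD and SLD is a simplifying assumption that does not handle multi-label suffixes such as `co.uk`.

```python
from urllib.parse import urlsplit, parse_qs


def split_url(url: str) -> dict:
    """Split a URL into protocol, host (with naive SLD/TLD), path, query and anchor."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    return {
        "protocol": parts.scheme,
        "host": host,
        # Assumption: the last label is the TLD and the one before it is the SLD.
        "tld": labels[-1] if len(labels) > 1 else "",
        "sld": labels[-2] if len(labels) > 1 else "",
        "path": parts.path,
        "query": parse_qs(parts.query),
        "anchor": parts.fragment,
    }


# Hypothetical example URL, used only to illustrate the parts.
example = split_url("https://www.example.edu/courses/list?term=fall&page=2#schedule")
print(example)
```

Comparing the `sld` and `tld` fields against the values expected for a known organization is one simple way to flag the "same name, different TLD" trick mentioned above.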

Nowadays, machine learning is used in many areas, and network security is one of them. In today's malicious URL detection, machine learning plays an important role in identifying malicious URLs. A URL identifies the location of a resource on the World Wide Web (WWW). It has two main parts:

a) Host name, i.e. the domain name or IP address where the resource is located.

b) Protocol, which specifies which protocol to use.

Machine learning uses a set of URL data for statistical learning, in order to learn a predictive function that classifies URLs as malicious or benign. This allows the model to generalize to new URLs instead of relying on a blocklist. A key requirement for training learning models is the availability of training data. In the case of malicious URL detection, this is a large set of URLs. Machine learning can be divided into supervised, unsupervised, and semi-supervised learning, where the training data is labeled, unlabeled, or labeled only for a limited portion, respectively. Labels carry the information about whether a URL is malicious or harmless. After collecting the data required for training, we need to extract features that describe the URL and allow mathematical processing by the machine learning model. Here, we first extract the length, count, and binary features of the existing URLs in the dataset, then add them as columns to the dataset and analyze these features. To understand the data, we first explore and visualize it. Machine learning algorithms such as AdaBoost and Random Forest are used to separate malicious URLs from benign URLs. A voting ensemble is used to vote for the classification that provides the highest accuracy.
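To make the three feature families (length, count, binary) concrete, here is a minimal sketch using only the Python standard library. The specific features, the function name `extract_features`, and the small `SHORTENERS` set are illustrative assumptions; the paper does not list the exact features it uses.

```python
import re
from urllib.parse import urlsplit

# Assumption: a tiny illustrative list of URL-shortening hosts.
SHORTENERS = {"tinyurl.com", "bit.ly", "t.co"}
IPV4_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_features(url: str) -> dict:
    """Return length, count and binary (0/1) lexical features for one URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        # Length features
        "url_length": len(url),
        "hostname_length": len(host),
        "path_length": len(parts.path),
        # Count features
        "count_dots": url.count("."),
        "count_digits": sum(ch.isdigit() for ch in url),
        "count_hyphens": url.count("-"),
        # Binary features
        "uses_ip": int(bool(IPV4_PATTERN.match(host))),
        "uses_shortener": int(host in SHORTENERS),
        "uses_https": int(parts.scheme == "https"),
    }


features = extract_features("https://tinyurl.com/brbm97cx")
print(features)
```

Each dictionary can become one row of the dataset, with one column per feature.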

Machine learning algorithms were used to achieve the purpose of this article. Machine learning algorithms are well suited to processing large data sets and identifying bad URLs in a better and more accurate way. The role of machine learning is to use historical data to predict outcomes. This article uses the AdaBoost and Random Forest algorithms. The voting ensemble is used to vote for the classification that provides the best accuracy when separating spam and normal URLs.

II. RELATED WORK

Divya Kapil et al. [24] experimented with four algorithms using the Weka tool and also compared the different detection techniques. The ISCX-URL2016 dataset was used for the experiment, and results are reported using the performance metrics TPR, FPR, Precision, Recall, and F-measure. A random number of samples was taken. The sample dataset contains 47 attributes. Some features were rejected so that overfitting problems could be avoided. The dataset covers the ‘Malware’, ‘Spam’, ‘Benign’, ‘Defacement’, and ‘Phishing’ categories. Malware URLs are a critical issue for researchers, and the authors argue that detecting malicious URLs with machine learning is a better idea than conventional techniques. The dataset is divided into 80% for training and 20% for testing. J48, Random Forest, Bayes-Net, and lazy classifiers were used and multi-class classification was performed, where classes were labeled as Defacement, Phishing, Spam, Malware, and Benign. It is observed that Random Forest achieves the highest TPR of about 96%, followed by the lazy classifier with 95% TPR.

Jino S Ganesh et al. [25] identified phishing URLs under weak supervision, which requires only a small amount of labeled data to start the learning process. They implemented a new hybrid model which combined NLP (natural language processing-based features) and word vectors. Two parallel modules are used to extract feature representations of URLs. The first is a character-level CNN module. The other is an attention-based hierarchical RNN module proposed for phishing URL detection. The detection process is based on:

• Finding the data, retrieving and summarizing the data.

• Making the prediction based on the analysis data.

• Calculating the probabilities of the specific results.

• Adapting to certain developments autonomously.

• Optimizing the process based on the recognized pattern.

Four different machine learning algorithms are used: logistic regression, decision tree, random forest, and multilayer perceptron neural networks. Random forest achieves the highest accuracy, about 98.6%.

R. Naresh et al. [26] identified malicious uniform resource locators employing a combination of URL lexical features, payload size, and Python source features. Feature extraction techniques such as host-based features, lexical-based features, and popularity-based features were used to detect the malicious URLs.

Detecting process: web crawling, feature extraction and processing, training of classifiers, and running classification to detect malicious URLs. URLs were collected from the Alexa ranking website. The dataset consisted of 400,000 URLs, of which 80,000 were malicious and the others clean. The research mainly targets domain-name and URL attributes.

The classifiers used are SVM and logistic regression. They used a support vector machine with a polynomial kernel and logistic regression to attain maximum accuracy. Logistic regression achieves an accuracy of about 98%.

Rajesh Kumar et al. [27] used black and white list technology together with machine learning algorithms to form a multilayer filtering model for detection of malicious URLs. A threshold was trained for each machine learning algorithm (the naive Bayesian classifier and the decision tree classifier), and this threshold was used to guide the two classifiers in filtering URLs. The naive Bayesian classifier, decision tree classifier, and SVM classifier were combined in one multilayer model to improve the malicious URL detection system in terms of accuracy. The multilayer filter model performs better than all three individual classifier models, achieving an accuracy of about 79.55%.

Gopinath Palaniappan et al. [28] explored an active DNS analysis approach for classifying a domain name as benign or malicious by including the web-based features of the domain name in addition to the commonly used lexical-based and DNS-based features. They extracted features of a domain name under DNS-based, web-based, blacklisting, and lexical-based categories, trained a logistic regression classifier, and tested it on an unlabeled dataset of domain names, obtaining an accuracy of about 60% using a small dataset of about 10000 domain names. The work demonstrates the use of web-based features of domain names, in addition to blacklists, DNS data, and lexical features, to identify malicious domains.


III. PROPOSED WORK

The proposed model flow is shown below in figure 3.1.

Figure 3.1: Model workflow.

3.1 STEPS IN BUILDING THE MODEL ARE:

(i) Data Encoding: This is the initial step in building the model. Feature extraction is carried out in such a way that the outputs are binary values (0 and 1).

(ii) Feature Extraction: Length, count, and binary features are extracted and added to our dataset using lambda functions.

(iii) URL parser: This is a freely available tool/parser which is used in this project to split the given URL into separate parts, from which features such as hostname length, path length, and TLD length are derived.

(iv) Tokenizing URL: In Python, tokenization basically means separating text into smaller units such as sentences or words, including text in non-English languages. Various tokenizer functions are built into the nltk module itself and can be used in programs. Here, URLs are subdivided into smaller fields for data analysis.

(v) Data Scaling: No data scaling was needed, as the dataset contained no columns that required it.

(vi) Training and Test Set: The next step is to split the data into training data and test data using the ‘train_test_split’ function available in the ‘sklearn’ library. The split ratio is 80:20, which means that 80% of the data is used for training and the remaining 20% is held out to evaluate the model on unseen URLs. A high percentage of training data helps the model train better.

(vii) Balancing Data: The dataset is highly imbalanced, with 76.80% benign URLs and 23.20% malicious URLs. The dataset has to be balanced before training the model. Hence, as the next step, the training data is balanced using the oversampling technique. If the data is imbalanced, the model will be biased towards the majority class.

(viii) Feature Selection: The next step is to select the key features for the model. This step plays a crucial role, as it is very important to find the features most relevant to the machine learning model.

(ix) Model Building: The next step is to build a binary classification model to detect and classify benign and malicious URLs. The AdaBoost and Random Forest classifiers are used, compared, and combined using the voting classifier; the comparison shows that the Random Forest classifier gives better performance metrics.

(x) Cross-Validation: A 5-fold cross-validation strategy is implemented to validate the machine learning model and check the validity of the results.

(xi) Detection: A function has been created, and different URLs were provided as user input to detect and classify whether the given URL belongs to the benign class or the malicious class. If the given URL is identified as malicious, an alert message box is displayed; otherwise a "Safe URL" message is printed.
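Steps (vi), (vii), and (x) above can be sketched in plain Python to show what they do to the data. This is an illustrative sketch under stated assumptions, not the authors' code: the paper uses `train_test_split` from `sklearn` and does not name its oversampling method, so simple random oversampling is assumed here, and the helper names and toy rows are hypothetical.

```python
import random


def train_test_split_simple(rows, test_ratio=0.2, seed=42):
    """Shuffle rows and hold out test_ratio of them (80:20 by default)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def random_oversample(rows, seed=42):
    """Duplicate random minority-class rows until every class has equal size."""
    rng = random.Random(seed)
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append((features, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced


def k_fold_indices(n_rows, k=5):
    """Yield (training_indices, validation_indices) for each of k folds."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i in range(k):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, folds[i]


# Toy data mirroring the reported imbalance: 76.8% benign (0), 23.2% malicious (1).
rows = [([i], 0) for i in range(768)] + [([i], 1) for i in range(232)]
train_rows, test_rows = train_test_split_simple(rows)
balanced_rows = random_oversample(train_rows)  # balance the training split only
print(len(train_rows), len(test_rows), len(balanced_rows))
```

Oversampling is applied after the split so that duplicated rows never leak into the held-out test set.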

The AdaBoost and Random Forest classifiers were used in model building and were later passed to the voting classifier to check which model is best for our dataset.

A) Adaptive Boosting Algorithm (AdaBoost):

The Adaptive Boosting algorithm, popularly known as AdaBoost, is one of the machine learning methods used as an ensemble method. The most common base learner used with AdaBoost is a single-level decision tree, that is, a decision tree with only one split. These trees are also called decision stumps. The algorithm builds a first model and gives equal weights to all data points. It then assigns higher weights to the points that are misclassified.

The points with higher weights are given more importance in the next model. The algorithm keeps training models until a sufficiently low error is reached.
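The reweighting idea can be illustrated with one boosting round. The sketch below uses the standard discrete AdaBoost update as an assumption (the paper does not give its formulas): the weighted error of a stump gives its vote strength `alpha`, misclassified points are scaled up by `exp(alpha)`, correct ones are scaled down by `exp(-alpha)`, and the weights are renormalized. The function name and toy labels are hypothetical.

```python
import math


def adaboost_round(weights, y_true, y_pred):
    """Run one discrete AdaBoost reweighting step for a single weak learner."""
    # Weighted error: total weight of the misclassified points.
    error = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    error = min(max(error, 1e-10), 1 - 1e-10)  # avoid division by zero / log(0)
    alpha = 0.5 * math.log((1 - error) / error)
    updated = [
        w * math.exp(alpha if t != p else -alpha)
        for w, t, p in zip(weights, y_true, y_pred)
    ]
    total = sum(updated)
    return alpha, [w / total for w in updated]


# Five points with equal initial weights; the stump gets the fourth one wrong.
y_true = [1, 1, -1, -1, 1]
y_pred = [1, 1, -1, 1, 1]
initial_weights = [1 / len(y_true)] * len(y_true)
alpha, new_weights = adaboost_round(initial_weights, y_true, y_pred)
print(alpha, new_weights)
```

After this round the misclassified point carries half of the total weight, so the next stump is pushed to classify it correctly.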

B) Voting Classifier:

A voting classifier is a machine learning ensemble method that combines multiple trained models and outputs the class that has the highest probability of being predicted. The predictions from each classifier are sent to the voting classifier, which combines them and gives the final prediction based on the majority of votes. In this way, instead of using separate models and comparing their individual outputs, the predictions of multiple classifiers are passed to a single voting classifier, which predicts based on the majority of votes. There are two types of voting supported by the voting classifier.

1. Hard Voting: In hard voting, the final predicted outcome is the outcome which has the majority of votes, i.e., the class predicted most often by the individual classifiers. Suppose there are three classifiers which predicted the outcome class as (2, 1, 2) respectively. Here the majority of classifiers predicted 2 as the outcome. Hence 2 will be the final prediction.

2. Soft Voting: In soft voting, the final prediction is based on the average of the probabilities given to each class by each of the classifiers. Suppose, for the inputs provided to the classifiers, the prediction probabilities for class 1 are (0.20, 0.37) and for class 2 are (0.11, 0.53). The average for class 1 is 0.285 and for class 2 is 0.32. Class 2 therefore has the highest average probability. Hence, the prediction of the voting classifier through soft voting is class 2.
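The two worked examples above can be checked with a few lines of Python. This is a minimal sketch with hypothetical function names; it reproduces the arithmetic of the examples and is not the `sklearn` voting classifier itself.

```python
from collections import Counter


def hard_vote(predictions):
    """Return the class label predicted by the most classifiers."""
    return Counter(predictions).most_common(1)[0][0]


def soft_vote(class_probabilities):
    """Average each class's probabilities and return (winning class, averages)."""
    averages = {
        label: sum(probs) / len(probs)
        for label, probs in class_probabilities.items()
    }
    winner = max(averages, key=averages.get)
    return winner, averages


# Hard voting: three classifiers predict classes 2, 1 and 2.
hard_result = hard_vote([2, 1, 2])

# Soft voting: per-class probabilities from two classifiers, as in the text.
soft_result, soft_averages = soft_vote({1: [0.20, 0.37], 2: [0.11, 0.53]})
print(hard_result, soft_result, soft_averages)
```

Both schemes pick class 2 here, but soft voting can differ from hard voting when one classifier is much more confident than the others.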

IV. RESULTS

The results of the models are given below in figure 4.1. From the figure it is concluded that all the classifiers achieve high accuracy. The accuracy is, however, reduced after applying the data balancing technique.

Figure 4.1: Accuracy Comparison Chart.

The training accuracy of the model is 100% and the test accuracy is 99%. The small gap between the two suggests that the model is not overfitting. We have also applied a 5-fold cross-validation strategy to cross-check the validity of the model. Both malicious and benign URLs were given as user input to the model to validate it. The model is observed to give high accuracy and to predict well when tested.

4.1 CONFUSION MATRIX

A confusion matrix is a square matrix which is used to assess the performance of a classification model. The size of the matrix is N × N, where N is the number of target classes. The confusion matrix compares the actual target values with the values predicted by the model. Taking malicious as the positive class, the confusion matrix for the classifiers contains the following information.

● True positive(s) (TP): The links which we have predicted spam/malicious, and they are actually spam/malicious.

● True negative(s) (TN): The links which we have predicted benign, and they are actually benign.

● False positive(s) (FP): The links that we have predicted malicious, but they are actually benign. (Also known as “Type I error”)

● False negative(s) (FN): The links which we have predicted benign, but they are actually malicious. (Also known as “Type II error”)
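As an illustration, the four counts and the usual metrics derived from them can be computed as follows. This is a sketch that assumes malicious is treated as the positive class; the function names and the toy labels are hypothetical and are not results from the paper.

```python
def confusion_counts(y_true, y_pred, positive="malicious"):
    """Count TP, TN, FP and FN, treating `positive` as the positive class."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts


def summary_metrics(counts):
    """Derive accuracy, precision, recall and F-measure from the four counts."""
    tp, tn, fp, fn = (counts[key] for key in ("TP", "TN", "FP", "FN"))
    total = tp + tn + fp + fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for six URLs (illustrative only).
y_true = ["malicious", "malicious", "benign", "benign", "benign", "malicious"]
y_pred = ["malicious", "benign", "benign", "malicious", "benign", "malicious"]
counts = confusion_counts(y_true, y_pred)
metrics = summary_metrics(counts)
print(counts, metrics)
```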

V. CONCLUSION

The malicious URL detection model performs binary classification using machine learning classifiers, namely the Random Forest and AdaBoost classifiers. The voting classifier is used to check which model gives the highest accuracy among all the models. The results show that the Random Forest algorithm performs well compared to the AdaBoost classifier. The Random Forest classifier gives an accuracy of about 99.8%, whereas the AdaBoost classifier gives an accuracy of about 99.5%, and hence the voting classifier indicates that the Random Forest algorithm performs best.

A function has been created which identifies a URL as a spam or ham URL. URLs are accepted as user input and classified as benign or malicious. If the URL is found to be malicious, an alert message box is displayed showing “Avoid clicking on such URLs”; otherwise a “Safe URL” message is printed. For the URLs that are found malicious, some precautionary measures can be carried out, such as blacklisting and red-listing the URLs so that they do not appear any more in the future.

VI. REFERENCES

[1] Y. Zhang, Y. Xiao, K. Ghaboosi, J. Zhang, and H. Deng, “A survey of cyber crimes,” Secur. Commun. Netw., vol. 5, no. 4, pp. 422–437, 2012.

[2] APWG Developers. (2021). Phishing Activity Trends Report. [Online]. Available: https://apwg.org/trendsreports.

[3] M. Lei, Y. Xiao, S. V. Vrbsky, and C.-C. Li, “Virtual password using random linear functions for on-line services, ATM machines, and pervasive computing,” Comput. Commun., vol. 31, no. 18, pp. 4367–4375, Dec. 2008.


[4] P. Burda, L. Allodi, and N. Zannone, “Don’t forget the human: A crowd sourced approach to automate response and containment against spear phishing attacks,” in Proc. IEEE Eur. Symp. Secur. Privacy Workshops (EuroSPW), Sep. 2020, pp. 471–476.

[5] P. Prakash, M. Kumar, R. R. Kompella, and M. Gupta, “PhishNet: Predictive blacklisting to detect phishing attacks,” in Proc. IEEE INFOCOM, Mar. 2010, pp. 1–5.

[6] W. Zhang, Y.-X. Ding, Y. Tang, and B. Zhao, “Malicious web page detection based on on-line learning algorithm,” in Proc. Int. Conf. Mach. Learn. Cybern., vol. 4, Jul. 2011, pp. 1914–1919.

[7] A. C. Bahnsen, E. C. Bohorquez, S. Villegas, J. Vargas, and F. A. González, “Classifying phishing URLs using recurrent neural networks,” in Proc. APWG Symp. Electron. Crime Res. (eCrime), 2017, pp. 1–8.

[8] B. Cui, S. He, X. Yao, and P. Shi, “Malicious URL detection with feature extraction based on machine learning,” Int. J. High Perform. Comput. Netw., vol. 12, no. 2, pp. 166–178, 2018.

[9] Y. Fang, C. Zhang, C. Huang, L. Liu, and Y. Yang, “Phishing email detection using improved RCNN model with multilevel vectors and attention mechanism,” IEEE Access, vol. 7, pp. 56329–56340, 2019.

[10] J. Feng, L. Zou, O. Ye, and J. Han, “Web2Vec: Phishing webpage detection method based on multidimensional features driven by deep learning,” IEEE Access, vol. 8, pp. 221214–221224, 2020.

[11] H. Cheng, J. Liu, T. Xu, B. Ren, J. Mao, and W. Zhang, “Machine learning based low-rate DDoS attack detection for SDN enabled IoT networks,” Int. J. Sens. Netw., vol. 34, no. 1, pp. 56–69, 2020.

[12] S. Christin, É. Hervet, and N. Lecomte, “Applications for deep learning in ecology,” Methods Ecol. Evol., vol. 10, no. 10, pp. 1632–1644, Oct. 2019.

[13] A. Aggarwal, A. Rajadesingan, and P. Kumaraguru, “PhishAri: Automatic realtime phishing detection on Twitter,” in Proc. eCrime Res. Summit, Oct. 2012, pp. 1–12.

[14] H. Ma, Y. Zuo, and T. Li, “Vessel navigation behavior analysis and multiple-trajectory prediction model based on AIS data,” J. Adv. Transp., vol. 2022, pp. 1–10, Jan. 2022.

[15] J. Fang, B. Li, and M. Gao, “Collaborative filtering recommendation algorithm based on deep neural network fusion,” Int. J. Sens. Netw., vol. 34, no. 2, pp. 71–80, 2020.

[16] E. S. Gualberto, R. T. De Sousa, T. P. De Brito Vieira, J. P. C. L. Da Costa, and C. G. Duque, “The answer is in the text: Multi-stage methods for phishing detection based on feature engineering,” IEEE Access, vol. 8, pp. 223529–223547, 2020.

[17] W. Kong, Z. Y. Dong, Y. Jia, D. J. Hill, Y. Xu, and Y. Zhang, “Short-term residential load forecasting based on LSTM recurrent neural network,” IEEE Trans. Smart Grid, vol. 10, no. 1, pp. 841–851, Jan. 2019.

[18] E. Zhu, Y. Chen, C. Ye, X. Li, and F. Liu, “OFS-NN: An effective phishing websites detection model based on optimal feature selection and neural network,” IEEE Access, vol. 7, pp. 73271–73284, 2019.

[19] T. Mahjabin, Y. Xiao, T. Li, and C. L. P. Chen, “Load distributed and benign-bot mitigation methods for IoT DNS flood attacks,” IEEE Internet Things J., vol. 7, no. 2, pp. 986–1000, Feb. 2020.

[20] W. Yang, W. Zuo, and B. Cui, “Detecting malicious URLs via a keyword based convolutional gated-recurrent-unit neural network,” IEEE Access, vol. 7, pp. 29891–29900, 2019.

[21] M. Somesha, A. R. Pais, R. S. Rao, and V. S. Rathour, “Efficient deep learning techniques for the detection of phishing websites,” Sādhanā, vol. 45, no. 1, pp. 1–18, Dec. 2020.

[22] A. Aljofey, Q. Jiang, Q. Qu, M. Huang, and J.-P. Niyigena, “An effective phishing detection model based on character level convolutional neural network from URL,” Electronics, vol. 9, no. 9, p. 1514, Sep. 2020.

[23] Google Developers. (2020). Google Site. [Online]. Available: https://sites.google.com/.

[24] D. B. A. N. J. Kapil, “Machine Learning-Based Malicious URL Detection,” 2, vol. 8, no. 4S, pp. 22–26, 2020.

[25] S. L. R. R. L. A. V. M. G. L. M. R. J. Jany Shabu, “Machine Learning-Based Malicious Website Detection,” Journal of Computational and Theoretical Nanoscience, vol. 17, no. 8, pp. 3468–3472, 2020.

[26] R. A. G. S. Naresh, “Malicious URL detection system using combined SVM and logistic regression model,” International Journal of Advanced Research in Engineering and Technology, vol. 11, no. 4, pp. 63–73, 2020.

[27] A. L. L. W. P. S. S. Joshi, “Using Lexical Features for Malicious URL Detection – A Machine Learning Approach,” 2019.

[28] G. S. S. S. B. S. S. B. B. W. Palaniappan, “Malicious Domain Detection Using Machine Learning on Domain Name Features, Host-Based Features and Web-Based Features,” Procedia Computer Science, vol. 171, no. 2019, pp. 654–661, 2020.
