International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 10 Issue: 01 | Jan 2023 www.irjet.net p-ISSN: 2395-0072
2 Department of Computer Science and Engineering, MIT School of Engineering, MIT Arts Design and Technology University, Pune, India
Abstract - Employees are the most valuable asset of a company. An organisation runs smoothly only because of its employees, and hence employee attrition is one of the key metrics that companies focus on these days. Attrition may sometimes occur due to unavoidable circumstances such as transfer to a different city, retirement, etc. But when attrition starts to burn a hole in the pockets of a business, it needs to be monitored. A business spends a large share of its resources on hiring employees. To reduce rehiring and to maintain a strong workforce, machine learning models need to be analysed systematically so that a suitable model can be chosen to measure the risk of attrition. This not only saves the resources of a business but also helps maintain an equilibrium in the workforce.
Key Words: Employee, Attrition, Machine Learning, Analysis.
An employee is a boon to any company. Every employee who joins a business is bound to leave it at some point of time due to various reasons. Attrition can thus be defined as the exit of any employee due to avoidable or unavoidable circumstances including retirement, death, transfer, better opportunities, etc. The organization spends a lot of time and resources when hiring an employee. When employees' departures start to affect the business in a negative way, it becomes a topic of concern for everyone in the business, but especially for HR. Due to the exit of skilled employees, the business not only loses its skilled professionals but also needs to rehire and train new people. This makes its workforce weaker, thus affecting the business as a whole. Due to increased globalization, especially in the post-pandemic era, there has been a vast number of opportunities in every field. For better opportunities and further growth, an employee decides to depart from one business and join another. This attrition influences a business in a negative way for a brief period of time. To maintain manpower and reduce costs, Artificial Intelligence can be incorporated to predict attrition.
This paper discusses the various methods that can be used to predict employee attrition and also analyses the best possible solution with the help of model comparison.
Fig 1 shows the various reasons due to which an employee may decide to leave an organization.
Many researchers have studied the causes and effects of employee attrition. One such paper [1] states that the retention of skilled and deserving employees is a significant aspect that HR needs to pay attention to. The study pointed out the most appropriate metrics that could help in the prediction of attrition. It highlighted that the number of job opportunities is directly proportional to an employee's education and experience level. It also stated that some of the most widely agreed factors that help in maintaining a workforce include good work-life balance, healthy workplace relationships, better policies, etc.
Another such paper [2] states that for an organization to maximize its profits, it should give utmost importance and value to its employees. This can be achieved by focusing on the development of opportunities and by bringing in new technologies that help maintain an employee's interest in the organization. The study also highlights that it is necessary for an organization to conduct training programs, cultural events, etc. on a regular basis. These activities help lower the barrier of communication and also facilitate interaction and growth. The main idea of this study was to explain that it is necessary for an organization to have a transparent work culture so that every person is well informed about their job and its outcome.
Artificial intelligence has driven rapid growth across many fields and has helped in finding solutions to many complex problems. Employee attrition is one such problem that is widely discussed these days. Artificial Intelligence has the ability to give organizations a robust solution to this problem. The incorporation of machine learning to predict attrition is helping companies worldwide. Similar research has been done in which various models such as Support Vector Machines, Random Forest, KNN classifier and XGBoost were tried and tested. Table 1.0 summarises some of this research.
| Sr no. | Author | Object of Study | Recommended Technique |
|---|---|---|---|
| 1. | Rahul Yedida, Rahul Reddy, Rakshit Vahi, Rahul J, Abhilash and Deepti Kulkarni [3] | Employee Attrition Prediction | KNN classifier |
| 2. | B. Sri Harsha, A Jithendra Varaprasad, L.V N Pavan Sai Sujith [4] | Early Attrition Prediction | Random Forest |
| 3. | Yue Zhao, Maciej K. Hryniewicki, Francesca Cheng, Boyang Fu and Xiaoyu Zhu [5] | Prediction of Employee Turnover using Machine Learning | XGBoost |
| 4. | Adarsh Patel, Nidhi Pardeshi, Shreya Patil, Sayali Sutar, Rajashri Sadafule and Suhasini Bhat [6] | Predictive model for Employee Turnover using Machine Learning | Random Forest |
| 5. | Ozdemir, Coskun, Gezer and Gungor [7] | Using data mining techniques to predict attrition | SVM |

Table 1.0: Survey Table

The above figure 3.1 depicts the architecture of the system. The proposed system works with different machine learning models. Each model attempts to predict attrition using the same dataset. The dataset consists of various employee records (both past and present). The input dataset is first cleaned and preprocessed by handling missing values, NaN values, etc. and removing unwanted columns. Then comes the model building phase, where various models are taken into consideration for prediction. The dataset is split into train and test subsets, and the train set is used to train each model. All the predictions are compared on the basis of evaluation metrics and the best model is suggested.
The open source dataset consists of employee related information. All non-numerical values were given a designation (A1, A2, A3, etc.) and all the unwanted parameters were discarded. Table 3.1 shows some parameters that are considered in the prediction.
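The preprocessing steps of the proposed system (dropping unwanted columns, removing records with missing or NaN values, encoding non-numerical values, and splitting into train and test sets) can be sketched as below. This is only a minimal illustration using the Python standard library: the column names, the sample records, the 80/20 split and the helper names `clean`, `encode`, `split_train_test` and `accuracy` are all assumptions made for the example and are not taken from the dataset used in this paper. Integer codes stand in for the A1, A2, A3 designations.

```python
import math
import random

def is_missing(value):
    """True for None and for float NaN, the two 'missing' markers assumed here."""
    return value is None or (isinstance(value, float) and math.isnan(value))

def clean(records, unwanted_columns):
    """Drop unwanted columns, then drop any row that still has a missing value."""
    cleaned = []
    for row in records:
        kept = {k: v for k, v in row.items() if k not in unwanted_columns}
        if not any(is_missing(v) for v in kept.values()):
            cleaned.append(kept)
    return cleaned

def encode(records):
    """Map each distinct non-numerical value in a column to an integer code 1, 2, 3, ..."""
    codes = {}  # column name -> {original value: integer code}
    encoded = []
    for row in records:
        new_row = {}
        for column, value in row.items():
            if isinstance(value, str):
                column_codes = codes.setdefault(column, {})
                value = column_codes.setdefault(value, len(column_codes) + 1)
            new_row[column] = value
        encoded.append(new_row)
    return encoded, codes

def split_train_test(records, test_fraction=0.2, seed=42):
    """Shuffle reproducibly and return (train, test)."""
    rows = list(records)
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def accuracy(predicted, actual):
    """Share of predictions that match the true labels; used to compare models."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical employee records (not from the paper's dataset).
records = [
    {"employee_id": 101, "department": "Sales", "years": 2, "promoted": "No", "attrition": "Leave"},
    {"employee_id": 102, "department": "R&D", "years": 7, "promoted": "Yes", "attrition": "Stay"},
    {"employee_id": 103, "department": "Sales", "years": None, "promoted": "No", "attrition": "Stay"},
    {"employee_id": 104, "department": "R&D", "years": 5, "promoted": "No", "attrition": "Stay"},
    {"employee_id": 105, "department": "Sales", "years": 1, "promoted": "No", "attrition": "Leave"},
    {"employee_id": 106, "department": "R&D", "years": 9, "promoted": "Yes", "attrition": "Stay"},
]

cleaned = clean(records, unwanted_columns={"employee_id"})
encoded, codes = encode(cleaned)
train, test = split_train_test(encoded)
```

Each model would then be trained on `train` and scored on `test` with the same `accuracy` function, so that the comparison between models is made on identical data.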
A. Logistic Regression
Logistic Regression is considered one of the most valuable statistical models. It is also a renowned data mining technique used by scientists and researchers for the analysis of proportional and binary kinds of datasets. One advantage that makes logistic regression special is that it can work for multiclass problems as well [8]. It is one of the most widely used algorithms for classification.
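Below is the equation for logistic regression:

$$\log\left(\frac{y}{1-y}\right) = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$

where:
y = dependent variable (the predicted probability of the positive class)
x1, x2, x3 … xn = independent variables
b0, b1, b2 … bn = constants (coefficients)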
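As a small worked illustration, a fitted logistic regression model turns the independent variables x1 … xn and the constants b0 … bn into a probability by passing their weighted sum through the sigmoid function. The coefficients, the two input features and the 0.5 decision threshold below are hypothetical values chosen for the example, not values fitted in this paper.

```python
import math

def predict_probability(x, b):
    """Return y = 1 / (1 + exp(-(b0 + b1*x1 + ... + bn*xn)))."""
    z = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical constants b0, b1, b2 and one employee's features x1, x2.
b = [-1.0, 0.8, -0.5]
x = [2.0, 1.0]

p = predict_probability(x, b)            # weighted sum z = -1.0 + 1.6 - 0.5 = 0.1
label = "Leave" if p >= 0.5 else "Stay"  # 0.5 threshold is an assumption
```

Here the probability is slightly above 0.5, so this hypothetical employee would be placed in the "Leave" class.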
B. Decision Tree
In computing, the word tree refers to a hierarchical structure. A decision tree consists of a root, branches and leaves. The root node is the parent node. Every attribute is represented by a node, and the links connecting the nodes are the branches. These branches are rules or decisions. A leaf is the output or outcome. Some of the most commonly used decision tree algorithms include CHAID, ID3 and CART [9]. This algorithm is used for classification problems and can easily work with both continuous and categorical values.
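To illustrate how a branch rule can be chosen for a continuous attribute, the sketch below scores candidate thresholds with the Gini impurity, which is the criterion used by CART. It is a simplified illustration of a single split and not the procedure used in this paper; the attribute (years at the company) and the sample values are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure node, 0.5 for an even two-class mix."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Try the midpoint between each pair of adjacent distinct values and
    return (threshold, weighted impurity) of the best split found."""
    best = (None, gini(labels))  # no split at all is the baseline
    distinct = sorted(set(values))
    for low, high in zip(distinct, distinct[1:]):
        t = (low + high) / 2
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical data: years at the company and the observed outcome.
years = [1, 2, 3, 8, 9, 10]
outcome = ["Leave", "Leave", "Leave", "Stay", "Stay", "Stay"]

threshold, impurity = best_threshold(years, outcome)
```

On this toy data the rule "years <= 5.5" separates the two outcomes perfectly, so it would become the branch at the root and both children would be leaves.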
C. K-Nearest Neighbours (KNN)
KNN is a supervised machine learning algorithm used for both classification and regression problems. The KNN algorithm uses the information about the input and predicts the output. The input is split into respective categories. The algorithm searches for the most suitable category for a new data point to lie in. The input data points are studied and the category of the new point is decided on the basis of them. Following is the algorithm used:
Step 1: Select the number of neighbours (K).
Step 2: Calculate the Euclidean distance from the new point to each existing data point.
Step 3: Identify the K nearest neighbours using the distances from Step 2.
Step 4: Count the number of these neighbours in each category.
Step 5: Assign the new point to the category with the most neighbours.
Step 6: Finish.
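The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the two-dimensional sample points, their labels and the choice K = 3 are hypothetical and are not taken from the dataset used in this paper.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, new_point, k=3):
    # Step 2: Euclidean distance from the new point to every training point.
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Step 3: keep the K nearest neighbours.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Step 4: count the neighbours in each category.
    votes = Counter(label for _, label in nearest)
    # Step 5: the category with the most neighbours wins.
    return votes.most_common(1)[0][0]

# Hypothetical training data: two well separated groups.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["Stay", "Stay", "Stay", "Leave", "Leave", "Leave"]

first = knn_predict(points, labels, (2, 2), k=3)   # falls among the "Stay" points
second = knn_predict(points, labels, (6, 5), k=3)  # falls among the "Leave" points
```

An odd K is a common choice for two-class problems because it avoids tied votes.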
D. Support Vector Machine (SVM)
SVM is another widely used supervised machine learning model. It is mainly used for classification problems but can also be used for regression. The main idea of the algorithm is to create a line or boundary that splits the space into n classes or categories. When a new data point is fed into this space, it can easily be placed in one of the created categories. The boundary that separates these classes is called a hyperplane. When a straight line is enough to separate the classes, the algorithm is a linear SVM. When a straight line is not sufficient and a curved boundary is needed instead, it is termed a non-linear SVM.
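For the linear case, the way a trained SVM places a new data point can be illustrated as follows: the hyperplane is described by a weight vector w and an offset b, and the sign of w·x + b decides the class. The sketch shows only the prediction step, not the training; the values of `w` and `b` and the class names are hypothetical.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, b, x):
    return "Leave" if decision_value(w, b, x) >= 0 else "Stay"

def distance_to_hyperplane(w, b, x):
    """Geometric distance of x from the separating hyperplane."""
    return abs(decision_value(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane x1 + x2 - 5 = 0 in a two-feature space.
w, b = [1.0, 1.0], -5.0

low_side = classify(w, b, (1.0, 1.0))    # score -3.0
high_side = classify(w, b, (4.0, 4.0))   # score +3.0
margin = distance_to_hyperplane(w, b, (4.0, 4.0))
```

Training a linear SVM amounts to choosing w and b so that the closest training points of each class lie as far from the hyperplane as possible.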
E. Random Forest
Random forest is a machine learning algorithm used for regression and classification problems. It is based on the concept of ensemble learning and builds on decision trees. The algorithm trains several trees by dividing the dataset into multiple subsets. This produces multiple results, and the final result combines all the sub-results: the average for regression, or the majority vote for classification. In general, the more trees and sub-datasets are considered, the higher the accuracy of the algorithm. Due to this behaviour it is capable of handling large datasets [10]. Following is the algorithm:
Step 1: Select K data points randomly from the train set.
Step 2: Construct a decision tree from the selected subset.
Step 3: Select N, the number of decision trees to build.
Step 4: Repeat Step 1 and Step 2 until N trees have been built.
Step 5: Assign the new data point to the category predicted by the majority of the trees.
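The steps above can be sketched as follows. To keep the example short, each "tree" is a one-level decision tree (a decision stump) and no random subset of attributes is drawn per tree, so this is a deliberately simplified stand-in for a full random forest rather than the implementation used in this paper. The sample points, N = 25 and the fixed random seed are hypothetical.

```python
import random
from collections import Counter

def train_stump(points, labels):
    """One-level tree: pick the (feature, threshold) with the fewest training errors."""
    best = None
    for f in range(len(points[0])):
        for t in sorted({p[f] for p in points}):
            left = [y for p, y in zip(points, labels) if p[f] <= t]
            right = [y for p, y in zip(points, labels) if p[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]

def stump_predict(stump, x):
    f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label

def train_forest(points, labels, n_trees=25, sample_size=None, seed=0):
    rng = random.Random(seed)
    k = sample_size or len(points)
    forest = []
    for _ in range(n_trees):                                    # Steps 3 and 4: build N trees
        idx = [rng.randrange(len(points)) for _ in range(k)]    # Step 1: random sample
        forest.append(train_stump([points[i] for i in idx],     # Step 2: fit one tree
                                  [labels[i] for i in idx]))
    return forest

def forest_predict(forest, x):
    votes = Counter(stump_predict(stump, x) for stump in forest)  # Step 5: majority vote
    return votes.most_common(1)[0][0]

# Hypothetical training data: two well separated groups.
points = [(1, 1), (2, 1), (1, 2), (2, 2), (8, 8), (9, 8), (8, 9), (9, 9)]
labels = ["Stay"] * 4 + ["Leave"] * 4

forest = train_forest(points, labels, n_trees=25)
first = forest_predict(forest, (0.5, 1.5))
second = forest_predict(forest, (9.5, 8.5))
```

Because every tree sees a different random sample, individual trees can disagree; the majority vote in Step 5 is what makes the combined prediction more stable than any single tree.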
F. Naive Bayes
Naive Bayes is a popular supervised machine learning method formulated using Bayes' Theorem. It is probabilistic in nature, so the algorithm works on the basis of the probability of an object belonging to a class. It is usually used for text classification problems but can be used for other classification problems as well [11].
Following is the formula for Bayes' Theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where P(A|B) is the posterior probability, P(B|A) is the likelihood, P(A) is the prior probability and P(B) is the marginal probability.
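A short worked example of Bayes' Theorem is given below. Let A be the event "the employee leaves" and B the event "the employee works overtime". All three input probabilities are hypothetical numbers chosen for illustration and are not estimates from the dataset used in this paper.

```python
def posterior(prior_a, likelihood_b_given_a, likelihood_b_given_not_a):
    """P(A|B) = P(B|A) P(A) / P(B), with P(B) expanded over A and not-A."""
    marginal_b = (likelihood_b_given_a * prior_a
                  + likelihood_b_given_not_a * (1.0 - prior_a))
    return likelihood_b_given_a * prior_a / marginal_b

p_leave = 0.20                 # P(A): prior probability of leaving (assumed)
p_overtime_given_leave = 0.60  # P(B|A): likelihood (assumed)
p_overtime_given_stay = 0.25   # P(B|not A) (assumed)

# P(B) = 0.60 * 0.20 + 0.25 * 0.80 = 0.32, so P(A|B) = 0.12 / 0.32
p_leave_given_overtime = posterior(p_leave, p_overtime_given_leave, p_overtime_given_stay)
```

With these numbers, knowing that an employee works overtime raises the probability of leaving from the prior 0.20 to a posterior of 0.375. A Naive Bayes classifier applies the same rule with several attributes at once, treating them as independent given the class.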
The above fig 5.1.1 is a heatmap that helps in identifying the strong and weak correlations between the attributes considered.
The above fig 5.2.1 and fig 5.2.2 represent information about some of the attributes with respect to attrition in a graphical format.
The first graph shows the relation between promotion and attrition. It is visible from the graph that an employee is more likely to stay in the organisation when there has been a promotion.
The second graph represents the effect of gender on attrition. The graph shows that male employees are more likely to stay in the organisation than female employees.
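The values shown in such a heatmap are pairwise correlation coefficients between attributes. A minimal sketch of how they can be computed is given below, assuming the Pearson correlation coefficient (the text does not state which coefficient was used). The attribute names and sample values are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def correlation_matrix(columns):
    """Return {(name_a, name_b): r} for every pair of columns."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a in columns
        for b in columns
    }

# Hypothetical attributes; attrition is coded 1 = left, 0 = stayed.
columns = {
    "years_at_company": [1, 2, 3, 4, 5],
    "monthly_income": [2, 4, 6, 8, 10],
    "attrition": [1, 1, 0, 0, 0],
}

matrix = correlation_matrix(columns)
```

Values close to +1 or -1 indicate a strong correlation and values near 0 a weak one; in this toy data, attrition is strongly negatively correlated with years at the company.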
5.3 Results:
Figure 5.3.1: Graphical result representation
The above fig 5.3.1 is a graphical representation of the results obtained when all the mentioned machine learning algorithms are applied to the dataset.
| MODEL | ACCURACY |
|---|---|
| Logistic Regression | 0.877095 |
| KNN Classifier | 0.592179 |
| Support Vector Machines | 0.865922 |
| Naive Bayes | 0.832402 |
| Decision Trees | 0.804469 |
| Random Forest | 0.832402 |
The above table shows the test accuracy of each model separately.
From the above table it is clear that Logistic Regression has performed best, as it has the highest accuracy, followed by Random Forest.
The following tables 5.3.2 and 5.3.3 give an overview of the classification reports of the two best models obtained from our dataset.
| - | Precision | Recall | F1 score | Support |
|---|---|---|---|---|
| Stay | 0.88 | 0.91 | 0.89 | 118 |
| Leave | 0.81 | 0.75 | 0.78 | 61 |

Table 5.3.2: Classification report of Random Forest Algorithm
| - | Precision | Recall | F1 score | Support |
|---|---|---|---|---|
| Stay | 0.91 | 0.90 | 0.91 | 118 |
| Leave | 0.81 | 0.84 | 0.82 | 61 |

Table 5.3.3: Classification report of Logistic Regression Algorithm
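The quantities in a classification report follow directly from the counts of correct and incorrect predictions per class. The sketch below shows the calculation for the Logistic Regression model. The paper does not list the underlying confusion counts, so the four counts used here are an assumption: they were inferred as one set of counts that agrees with the supports (118 and 61) and reproduces the rounded values in Table 5.3.3.

```python
def class_report(tp, fp, fn):
    """Precision, recall and F1 score for one class from its confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Inferred (assumed) confusion counts for the 179 test records.
stay_predicted_stay, stay_predicted_leave = 106, 12    # 118 true "Stay"
leave_predicted_leave, leave_predicted_stay = 51, 10   # 61 true "Leave"

stay = class_report(tp=stay_predicted_stay,
                    fp=leave_predicted_stay,
                    fn=stay_predicted_leave)
leave = class_report(tp=leave_predicted_leave,
                     fp=stay_predicted_leave,
                     fn=leave_predicted_stay)

total = (stay_predicted_stay + stay_predicted_leave
         + leave_predicted_leave + leave_predicted_stay)
accuracy = (stay_predicted_stay + leave_predicted_leave) / total
```

Under these assumed counts the accuracy works out to 157/179, which matches the test accuracy of 0.877095 reported for Logistic Regression.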
The aim of this paper is to determine which algorithm yields the best result for the chosen dataset when predicting the attrition of employees. A total of six machine learning algorithms were applied to an open source dataset and the output obtained was reported. It can be inferred from the output that logistic regression performs the best on the dataset, followed by the random forest algorithm. The attributes mentioned in the paper are some of the main causes of attrition, and many more parameters can be added according to an organisation's requirements. By comparing some of the most widely used machine learning models, this paper aims to help various kinds of organisations maintain their workforce and lessen the rate of employee attrition.
1. "A Survey Paper on Employee Attrition Prediction Using Machine Learning Techniques", Journal of Interdisciplinary Cycle Research, Volume XI, Issue XII, December 2019, ISSN: 0022-1945.
2. "Employee Attrition and Strategic Retention Challenges in Indian Manufacturing Industries: A Case Study", VSRD International Journal of Business and Management Research, Vol. VI, Issue VII, August 2016.
3. "Employee Attrition Prediction", https://www.researchgate.net/publication/326029536_Employee_Attrition_Prediction
4. "Early Prediction of Employee Attrition", International Journal of Scientific & Technology Research, Volume 9, Issue 03, March 2020, ISSN 2277-8616, p. 3374.
5. "Employee Turnover Prediction with Machine Learning", https://www.researchgate.net/publication/328772915_Employee_Turnover_Prediction_with_Machine_Learning_A_Reliable_Approach
6. "Employee Attrition Predictive Model Using Machine Learning", International Research Journal of Engineering and Technology (IRJET), Volume 07, Issue 05, May 2020, e-ISSN: 2395-0056, p-ISSN: 2395-0072.
7. F. Ozdemir, M. Coskun, C. Gezer and V. C. Gungor, "Assessing Employee Attrition Using Classifications Algorithms," in Proceedings of the 2020 the 4th International Conference on Information System and Data Mining, pp. 118-122, May 2020.
8. "Logistic Regression in Data Analysis: An Overview", International Journal of Data Analysis Techniques and Strategies, 3(3): 281-299, July 2011, DOI 10.1504/IJDATS.2011.041335.
9. "Study and Analysis of Decision Tree Based Classification Algorithms", JCSE International Journal of Computer Sciences and Engineering, Vol. 6, Issue 10, Oct. 2018, E-ISSN: 2347-2693.
10. "Random Forests and Decision Trees", IJCSI International Journal of Computer Science Issues, Vol. 9, Issue 5, No 3, September 2012, ISSN (Online): 1694-0814.
11. "Short Survey on Naive Bayes Algorithm", International Journal of Advance Engineering and Research, Volume 4, Issue 11, November 2017.
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal