Post Graduate Admission Prediction System

Page 1

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056

Volume: 09 Issue: 06 | Jun 2022 www.irjet.net p ISSN: 2395 0072

Post Graduate Admission Prediction System

Dr. Bindiya M K1, Abhijna S2, Abhishek Rawat3, Anushri N R4, Indudhar L Gowda5

1Associate Professor, Department of Computer Science and Engineering, SJB Institute of Technology 2,3,4,5Student, Department of Computer Science and Engineering, SJB Institute of Technology ***

Abstract - Each year, a huge number of students apply to post graduate programs allover the worldinorder toadvance their careers. In this project, we focus on the students applying for Masters programs in universities abroad, particularly the universities in the United States of America. Here, we present a new universityadmissionpredictionsystem using machine learning algorithms by recognizingthe factors that affect the likelihood of admission. We use severalmachine learning algorithms to statistically analyze the admission predictability of such factors. The objective is to predict the list of universities that a student might get into rather than the chance of admit.

Key Words: Post-graduate admission prediction, Machine Learning, Masters programs, PCA, MLR, SVM

1. INTRODUCTION

Applying to universities abroad is often difficult. Studentsfindithardtoestimateexactlywheretheystand.It involvesmanystepsandprocedurestofollow.Researching the universities and programs is itself an arduous and lengthy task. Choosing the right universities to apply to is definitely a hurdle students have to face. Many students applyfortheuniversitiesinwhichtheyhaveaslimchanceof acceptance.Universityapplicationsalonecancosthundreds of dollars. This can take a toll on the finances of students comingfromapooreconomicbackground.Studentshaveto throwawaylotsofhard earnedmoneyfornothingiftheyget rejectedbytheseuniversities.

In this project, we will be using the admission_predict dataset in CSV format to predict the universities that a student might get into based on several academic performancemeasurements.

Parametersthathelppredicttheuniversities:

1. GREScores(outof340)

2. TOEFLScores(outof120)

3. UniversityRating(outof5)

4. StatementofPurposeandLetterof RecommendationStrength(outof5)

5. UndergraduateGPA(outof10)

6. ResearchExperience(either0or1)

Thisprojectwillbeimplementedusingseveralmachine learning algorithms, Python programming language and Flaskwebframework.Toyieldthemostaccurateresult,we will be going through several steps such as data pre processing,exploratorydataanalysis,featureselection,cross validation,modelselection,trainingthemodelandsoon.

2. LITERATURE SURVEY

Asubstantialnumberofresearchprogramshavebeen carriedoutonsubjectsrelatedtouniversityadmissions.Each onehasuseddatasetsfromvarioussourcesandisspecificto acertaincourseoruniversity.Hence,thepredictionmaynot beaccurateforeverystudent.

Previousstudiesdoneinthisareaevaluatedthechance ofstudentadmissionintorespectiveuniversitiesprimarily basedonasingleparameter theGREscore.Thedownsideis thattheyonlyconsideredtheGREscoreandleftoutallthe otherfactorsthatmightcontributetoadmissionprediction.

Our base research paper has considered all these parameters, but the machine learning model has low accuracy.Anotherconisthatthedatasetusedisspecificto Indianstudentsandisnotgeneralized.

Themaindrawbackofallthepreviousresearchcarried out is that it only predicts the chance of admission of a student.Abettersystemwouldpredicttheuniversitiesthata studentmightgetinto,ratherthanthechanceofadmission. Inthisway,astudentwouldbeabletoapplytouniversities wheretheyhaveabetterchanceatgettingadmission.

3. METHODOLOGY

Problem Understanding: Wefirstneedtounderstandthe problemsfacedbystudentsduringtheuniversityapplication processandfigureoutwaystoovercomethem.Precisegoals must be set. A clear path to achieve the set goals must be apparent.

Data Understanding: Weneedtounderstandthetrendsin eachdatasetandconsiderthemostdiverseoneinorderto getageneralizedmodelthatcanbeusefulforeverystudent. Exploratorydataanalysisisperformedforthesame.During thisphase, wewill alsoget a deeperunderstandingof the data and will be able to estimate which machine learning modelmightgivegoodresults.

©
Certified Journal | Page2250
2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056

Volume: 09 Issue: 06 | Jun 2022 www.irjet.net p ISSN: 2395 0072

Data Preparation: Data should be cleaned for it to be suitableforthemachinelearningalgorithm.Inordertodo this,weneedtoperformdatapre processingsteps.Here,we willbesplittingthedatasetintoXandYcolumns,checking foranymissingvalues,categoricalvalues,outliersandsoon anddealingwiththeminaneffectivemanner.[4]

Building Models: WeneedtoexperimentwithvariousML modelsinordertoidentifythebestfitforourdataset.

Evaluation: Inthisstep,weanalyzehowwellthemodelfits our data. We check for any chances of overfitting or underfittingofthedata.Wealsostriveforagoodaccuracy andmakesuretheerrorrateisminimal.

Creation of an interface: After modeling the right ML algorithmwemoveontocreationofawebsite.Wewill be usingFlaskasa webframework withHTML forstructure, CSSforstyling,JavaScriptforinteractivity,andBootstrapfor aresponsivewebsiteandSQLiteforbackend.[5]

interpretabilityandatthesametime,minimizesthelossof information. PCA is an unsupervised algorithm and most commonlyusedasadimensionalityreductionalgorithm.

MultipleLinearregressionisastatisticaltechniquethatis used to model the linear relationship between several explanatoryvariablesandoneresponsevariable.

We first import all the required libraries, and the dataset. Data preprocessing in order to make the data suitable for machinelearningmodels.ExploratoryDataAnalysis(EDA)is done on the dataset in order to understand the data and applyappropriatemodels.FeatureScalingisperformedon the independent variables of the dataset to bring every featuretothesamefooting,withoutanyupfrontimportance of one over the other. PCA is then performed on the standardizeddata.Here,wehaveoptedforthreefeaturesas thatexplainsover85%ofthevarianceinthedata

Fig 1: Methodology

Fig 3: MLRFormula

Thedataisthensplitintotrainandtestsets.Duetosample variabilitybetweenthetrainandtestsets,machinelearning modelsmaygiveabetterpredictiononthetrainingdata,but failtogeneralizeoverthetestdata.Thisleadstoahightest error.Inordertoavoidsuchasituation,wehaveoptedforK fold Cross Validation. This method gave us an accuracy of 0.8949.UsingtheK foldCrossValidationmethodincreased theaccuracyofthemodelbyover12%.

Fig – 2: User’sperspectiveoftheworkflow

4. IMPLEMENTATION

Multiple Linear Regression with PCA

Large datasets are increasingly common in Machine Learning. While a large dataset is necessary for a good model, this introduces a great deal of complexity and difficultyininterpretation.PrincipalComponentAnalysisisa technique that is popularly used for reducing the dimensionality of such datasets. It helps in increasing

Fig 4: K foldCrossValidation

Random Forest Regression

Random Forest Regression (RFR) is another popular machine learning algorithm that uses the concept of ensemblelearning.RandomForestisasupervisedlearning method that can be used for classification as well as

Page2251
© 2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal |

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056

Volume: 09 Issue: 06 | Jun 2022 www.irjet.net p ISSN: 2395 0072

regressionproblems.RandomForestcontainsanumberof decision treeson varioussubsetsofthegiven datasetand takes the average of the results to improve the predictive accuracy of the dataset. An upside to this model is that it preventstheproblemofoverfitting.

Inthismodel,wefollowedaslightlydifferentapproachthat transforms the regression problem that we have, into a classification one. The approach here is to have a binary column called ‘Status’ that takes the value True for applicantsthathavemorethan83%ofchanceofadmit.This valuewastakenasthethresholdafteracarefulstudyofthe datadistribution.

Thisapproachgaveus anaccuracyof96%.Theconfusion matrixrevealedthat3out100studentsthatwererejected werepredictedasadmittedand1outof100ofstudentsthat wereadmittedwerepredictedasrejected.

rejected were predicted as admitted and 2 out of 100 of studentsthatwereadmittedwerepredictedasrejected.

Logistic Regression

LogisticRegressionisasupervisedlearningtechniquethatis usedtopredictacategoricaldependentvariableusingaset ofindependentvariables.Thedependentvariableisabinary value,butinsteadofgivingtheexactbinaryvalues,itgives theprobabilisticvaluesthatliesbetween0and1.Logistic regressionandlinearregressionarequitesimilar.Inlogistic regression, instead of fitting a regression line, we fit an S shaped curve or a logistic function. The curve from the logisticfunctionindicatesthelikelihoodofsomething,such asthechancesofastudentgettingadmissioninaparticular university.

Theclassificationapproachdiscussedearlierisusedhereas well.Thisgaveusanaccuracyof97%.Theconfusionmatrix revealed that 2 out 100 students that were rejected were predictedasadmittedand1outof100ofstudentsthatwere admittedwerepredictedasrejected.

Fig 5: RandomForestRegression

Support Vector Machine

SupportVectorMachineisasupervisedlearningalgorithm that can be used for classification as well as regression. Althoughitcanbeusedforregression,itisbestsuitedfor classification problems. The objective of SVM is to find a hyperplane in an N dimensional space that distinctly classifiesthedatapoints.Thedimensionofthishyperplane dependsonthenumberoffeatures.

Fig 7: LogisticRegression

Thisisthe bestaccuracythatwecouldachieve.Itgave us satisfactoryresults.

Mapping chance of admit to universities

We initially set out to collect data so as to predict the universitiesinsteadofthechanceofadmit.Duetoalackof enough data, we decided to map the chance of admit predictedinthecurrentdatasettouniversities.Todoso,we collected the cut off GRE and TOEFL scores of several universitiesandusedthisdatasetformappingpurposes.

The universities thus obtained are then categorized as ‘Dream Universities’, ‘Safe Universities’ and ‘Backup Universities’. The students can divide their resources and applyforacombinationoftheseuniversities.

5. RESULTS

Fig 6: SupportVectorMachine

Here,we’veusedthesameclassificationapproachdiscussed earlier. This gave us an accuracy of 96% as well. The confusionmatrixrevealedthat2out100studentsthatwere

As demonstrated in the previous section. The logistic regressionmodelgavethebestresults,followedbySVMand RFR,thenMLRwithPCA.

The central concept of this application is to allow the students to predict the universities that they can get

© 2022,
|
|
Certified Journal | Page2252
IRJET
Impact Factor value: 7.529
ISO 9001:2008

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056

Volume: 09 Issue: 06 | Jun 2022 www.irjet.net p ISSN: 2395 0072

admission into. This allows the student to manage their funds and save money during the university application process,byonlyapplyingtothoseuniversitiesthattheyhave a highchanceofgettinginto.Awebsitewascreatedusing HTML,CSS,Bootstrapforfrontend,andSQLitefordatabase. Flaskwasusedasawebframework.

Fig 11: AreaofExpertise

Fig 11showsthe expertisepage. Thispageletstheuser enterhis/herareaofexpertise.Oncetheselectionisdone, theuserisredirectedtoapredictionpage.

Fig 8: Comparisonofmodels

Fig 9: HomepageofPostpred

Fig 9showsthehomepageofourwebsite,PostPred.Thisis thepagethattheuserencountersonceheopensthewebsite.

Fig 12: PredictionPage

Fig 13: PredictionResults

The page shown in Fig 13 shows the prediction results basedonthepreviousscores.Theresultsarepredictedby the Machine Learning model. The universities are characterized as Dream University, Safe University and BackupUniversity.

Fig 10: UserLogin

Fig 10showstheuserloginpage.Theusercanentertheir emailIDandpasswordinordertoaccesstheiraccount.Ifthe userdoesn’talreadyhaveanaccount,he/shecancreateone usingtheSignUpbuttonbelow.

Fig

14: ApplicationPageforPurdueUniversity

© 2022, IRJET |
7.529 | ISO 9001:2008 Certified Journal | Page2253
Impact Factor value:

International Research Journal of Engineering and Technology (IRJET) e ISSN: 2395 0056

Volume: 09 Issue: 06 | Jun 2022 www.irjet.net p ISSN: 2395 0072

Fig 14 shows the application page of Purdue University. The user is redirected to the application of the respective universitieswhenhe/sheclicksontheapplybuttonnextto theuniversityasshowninFig 13.

6. CONCLUSION

The central concept of this application is to allow the students to predict the universities that they can get admission into. This allows the student to manage their funds and save money during the university application process,byonlyapplyingtothoseuniversitiesthattheyhave ahighchanceofgettinginto.

We’veusedmultiplemachinelearningmodelsforprediction ofthesame.Thebestonei.e.,LogisticRegressionisusedasa pickle file and used in the Flask web framework. A user friendlyinterfaceisthusprovidedtomaketheprocesseasier forusersfromanon technicalbackground.

REFERENCES

[1] “Prediction fora University Admission Using Machine Learning”byChitraApoorvaD.A,MalepatiChanduNath, Peta Rohith, Swaroop S, Bindushree S Blue Eyes Intelligence Engineering & Sciences Publication. International Journal of Recent Technology and Engineering(IJRTE)ISSN:2277 3878,Volume 8,Issue 6 March2020.

[2] “MachineLearningBasicswiththeK NearestNeighbors Algorithm” https://towardsdatascience.com/machine learning basics with the k nearest neighbors algorithm

[3] “Random Forest Regression” https://www.kaggle.com/dansbecker/random forests

[4] “Data pre processing & Machine Learning” https://archieve.ics.uci.edu/ml/index.php

[5] “User Interface Design” https://pidoco.com/en/help/ux/user interface design

[6] “Graduate Admission Prediction Using Machine Learning”bySaraAljasmi,AliBouNassif,IsmailShahin, Ashraf Elnagar ResearchGate Publication, December 2020.

[7] Sujay S “Supervised Machine Learning Modelling & AnalysisForGraduateAdmissionPrediction”Published in International Journal of Trend in Research and Development (IJTRD), ISSN: 2394 9333, Volume 7 | Issue 4,August2020.

2022, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified

| Page2254
©
Journal

Turn static files into dynamic content formats.

Create a flipbook