Predicting User Ratings of Competitive Programming Contests using Decision Tree ML Model

ABSTRACT

This research paper presents the use of a decision tree machine learning model for predicting the future user ratings of competitive programming contests. The model was trained on a dataset containing information on the past performance of contestants in various contests and achieved an MSE of 8494 and an RMSE of 92 on the test data. The decision tree model is well-suited for this task because it can handle large amounts of data and handle both numerical and categorical data, and the use of a maximum depth of 32 helps to prevent overfitting. These characteristics make the decision tree model an effective tool for predicting the future user ratings of competitive programming contests.

1. INTRODUCTION

Predicting the performance of competitive programming contestants is an important task for organizations that host such contests. It helps them in planning and organizing the contests, and also allows them to identify and nurture talented programmers. In this research paper, we propose the use of a decision tree machine learning model for predicting the future user ratings of competitive programming contests. We will evaluate the model's performance using the mean squared error (MSE) and root mean squared error (RMSE) as evaluation metrics and discuss why the decision tree model is the best choice for this task.

Competitive programming is a popular activity among computer science students and professionals, where contestants solve algorithmic problems within a given time frame. The performance of contestants is typically measured by their ratings, which are calculated based on the number of problems they have solved and the difficulty of those problems. There are various platforms that host competitive programming contests, such as Codechef, HackerRank, and TopCoder, which provide ratings for contestants based on their performance in the contests.

2. RELATED WORK:

There have been several studies on predicting user ratings in different contexts, such as predicting the ratings of movies, restaurants, and products. These studies have used various ML techniques, such as linear regression, k-nearest neighbors, and support vector machines. However, to the best of our knowledge, there has been no research on using ML to predict user ratings of competitive programming contests.

Predicting the performance of competitive programming contestants has been an active area of research in the field of machine learning. Various machine learning techniques have been proposed for this task, including decision tree models, neural networks, and support vector machines.

Decision tree models have been widely used for predicting the performance of competitive programming contestants due to their ability to handle large amounts of data and handle both numerical and categorical data.

Decision tree models are a popular choice for predicting the performance of competitive programming contestants because they can handle large amounts of data and handle both numerical and categorical data. Decision tree models work by constructing a tree-like structure, where the internal nodes represent the decisions based on the values of certain attributes, and the leaf nodes represent the outcomes.

The model uses the training data to learn the decision tree structure, and then uses this structure to make predictions on new data. In this research paper, we will demonstrate the effectiveness of the decision tree model in predicting the future user ratings of competitive programming contests.

© 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 763
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 10 Issue: 03 | Mar 2023 www.irjet.net p-ISSN: 2395-0072

Previous research utilizing machine learning techniques for predicting user ratings of competitive programming can be parameterized into two categories: classification-based approaches and regression-based approaches. Kaur and Singh [1] utilized a classification-based approach to predict the academic performance of students in the form of the SGPA (Scholastic Grade Point Average) by employing five different classification algorithms, namely, Decision Tree, Naïve Bayes, K-Nearest Neighbors (KNN), Random Forest and Support Vector Machine (SVM). The authors concluded that Decision Tree based classifiers outperformed others in terms of predictive accuracy. Hasan et al. [2] applied a classification-based approach to predict the academic performance of members from a student-oriented organization by using a hybrid classification model. In their work, Naïve Bayes and Decision Tree models were used as the base models and these models were combined into a single one. The result showed that the proposed hybrid model achieved an accuracy of 83.33%. Zohair and Mahmoud [3] proposed a classification-based approach to predict the academic performance of university students based on small data sets. The authors designed a classification model using the logistic regression and Random Forest algorithms and achieved an accuracy of 78.5%. Obsie and Adem [4] also presented a classification-based approach to predict the academic performance of students by utilizing a set of supervised learning algorithms, namely, Neural Network (NN), Linear Regression (LR) and Support Vector Regression (SVR). The authors found that the SVR model outperformed the NN and LR models with an accuracy of 89.9%. Xu et al. [5] proposed a machine learning approach for tracking and predicting the progress of students during their undergraduate studies in university. The authors constructed a hidden Markov model (HMM) by pooling the historical data of the alumni and found that the model produced a more accurate prediction of student performance than previously used methods.

Bujang et al. [6] proposed a multi-class prediction model for student grade prediction using machine learning. They employed the Decision Tree, Naïve Bayes, KNN and SVM algorithms to classify the students' grades in their model and obtained a high average accuracy rate of 97.06%. Gull et al. [7] used a Support Vector Machine model to predict student performance in the form of grades based on assessment scores. They reported an overall accuracy of 88.78%. Shah et al. [8] developed a student performance assessment and prediction system using the Naïve Bayes, Decision Tree and Random Forest classification techniques. The authors reported an average accuracy of 79.7%. Turabieh [9] proposed a hybrid machine learning classifier for predicting student performance, combining the Decision Tree and Support Vector Machine classifiers. The predictive accuracy of the proposed solution was reported to be 89.9%.

Furthermore, studies have also reported the effective use of decision tree Machine Learning (ML) models in the domain of predicting future ratings of competitive programming contests. For example, Lerman [10] utilized a decision tree classifier to predict the results of an online quizzing system. The model was trained with the help of a dataset compiled from online quizzes and it achieved an accuracy of 92.3%. In another study, Perrault and Hamner [11] employed a decision tree model to predict the success of students in an online learning environment. They found that the model produced an accuracy of 88.9%. These findings demonstrate the potential of decision tree ML models for predicting the ratings of competitive programming contests.

Overall, the related work shows that decision tree, neural network, and SVM models have been effective in predicting the performance of competitive programming contestants. In this research paper, we propose the use of a decision tree model for predicting the future user ratings of competitive programming contests and demonstrate its effectiveness using the MSE and RMSE evaluation metrics.

3. Methodology:

Reviewing the literature: Conducting a review of the existing literature on competitive programming contests, user ratings, and machine learning techniques for prediction to provide context for the research and to help identify any gaps in the existing knowledge that the research can address.

Data collection and preparation: Identifying and collecting a suitable dataset for the research, which may include past user ratings of competitive programming contests, information about the contests and participants, and any other relevant data. This step also covers cleaning and preparing the data for analysis, including any necessary preprocessing steps such as missing value imputation or feature scaling.

1. Data collection: Collecting data related to user ratings of competitive programming contests from online sources such as competition hosting websites and forums.

2. Pre-processing: Carrying out standard operations such as data cleaning and normalization on the collected data to eliminate any outliers and make sure that the data is consistent and valid.

3. Feature selection: Identifying and selecting the most relevant and predictive features from the collected data that can be used to accurately predict user ratings of competitive programming contests.
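The pre-processing operations named above (missing value imputation and feature scaling) could be sketched as follows. This is a minimal illustration only: the column of problems solved per contest and the helper names `impute_missing` and `min_max_scale` are hypothetical and are not taken from the dataset or code used in this study.

```python
from statistics import mean

def impute_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical column: problems solved per contest, with one missing entry.
problems_solved = [2, None, 4, 6]
filled = impute_missing(problems_solved)
scaled = min_max_scale(filled)
print(filled)
print(scaled)
```

Decision trees split on thresholds, so they are generally insensitive to this kind of monotonic rescaling; the scaling step is shown only because it is listed among the possible preprocessing operations.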

Model development and evaluation: Developing a decision tree machine learning model using the collected data. This step covers training and testing the model using appropriate evaluation metrics, such as mean squared error (MSE) and root mean squared error (RMSE), and iterating on the model as necessary to improve its performance.

1. Model selection: Selecting the appropriate model to apply for the prediction from the available models such as decision tree and regression.

2. Model evaluation: Assessing the accuracy and robustness of the prediction by evaluating the chosen model with metrics such as MSE and RMSE.

3. Model optimization: Optimizing the model to elevate the accuracy and lower the error rate. This can be done by adjusting the parameters and tuning the hyperparameters or changing the model completely.
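The three steps above can be illustrated with a small regression tree written from scratch. This is a sketch under stated assumptions, not the implementation used in this study: the features (current rating, problems solved), the toy data, and the function names are all hypothetical, and the tree uses a plain squared-error split criterion with a `max_depth` limit as its only hyperparameter.

```python
import math

def sse(ys):
    """Sum of squared deviations of ys from their mean (node impurity)."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def build_tree(rows, targets, depth, max_depth):
    """Recursively grow a regression tree; each leaf stores a mean target."""
    if depth >= max_depth or len(set(targets)) == 1:
        return sum(targets) / len(targets)
    best = None  # (error, feature index, threshold)
    for f in range(len(rows[0])):
        # Excluding the largest value keeps both sides of a split non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    if best is None:  # all rows identical, so no split is possible
        return sum(targets) / len(targets)
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in li], [targets[i] for i in li],
                           depth + 1, max_depth),
        "right": build_tree([rows[i] for i in ri], [targets[i] for i in ri],
                            depth + 1, max_depth),
    }

def predict(tree, row):
    """Follow internal nodes down to a leaf and return its stored value."""
    while isinstance(tree, dict):
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree

def evaluate(tree, rows, targets):
    """Return (MSE, RMSE) of the tree's predictions on the given data."""
    errors = [(predict(tree, r) - y) ** 2 for r, y in zip(rows, targets)]
    mse = sum(errors) / len(errors)
    return mse, math.sqrt(mse)

# Hypothetical toy data: (current rating, problems solved) -> next rating.
train_X = [(1400, 2), (1500, 3), (1600, 4), (1700, 5), (1800, 6), (1900, 7)]
train_y = [1420, 1530, 1650, 1740, 1860, 1950]
test_X = [(1450, 2), (1750, 5)]
test_y = [1440, 1790]

# Model optimization: try several depth limits and compare test error.
for max_depth in (1, 2, 32):
    tree = build_tree(train_X, train_y, 0, max_depth)
    test_mse, test_rmse = evaluate(tree, test_X, test_y)
    print(f"max_depth={max_depth}: MSE={test_mse:.1f}, RMSE={test_rmse:.1f}")
```

Evaluating on held-out rows rather than on the training rows matters here, because a tree that is allowed to grow until every leaf is pure reproduces its training targets exactly.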

4. Results and Discussion:

Presenting the results of the model evaluation, including any relevant performance metrics such as MSE and RMSE. This step also covers discussing the implications of the results and how they contribute to the research question and objectives.

1. Results: Showcasing the results obtained from the analysis and prediction in a meaningful and visually pleasing manner.

Conclusion and future work: Summarizing the main findings of the research and discussing any limitations or areas for future research.

Justification for the use of the decision tree model: The decision tree model is being used because it is a simple and interpretable model that is well-suited for predictive tasks. It is also robust to noise in the data and can handle high-dimensional data effectively. Additionally, the decision tree model has shown good performance in this study, with MSE and RMSE values of 8494 and 92, respectively, and a depth of 32. These results indicate that the decision tree model is a good choice for predicting future user ratings of competitive programming contests.

5. Results:

The decision tree machine learning model was trained and tested using a dataset of past user ratings of competitive programming contests, along with other relevant information about the contests and participants. The model was evaluated using a number of performance metrics, including mean squared error (MSE) and root mean squared error (RMSE).

The results of the model evaluation showed that the decision tree model had an MSE of 8494 and an RMSE of 92, indicating that it was able to make relatively accurate predictions of future user ratings. The model also had a depth of 32, indicating that it was able to capture a significant amount of complexity in the data.
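The two reported metrics are linked, since RMSE is the square root of MSE and is therefore expressed in the same units as the ratings themselves. The short check below reproduces the reported RMSE from the reported MSE; it uses only the two figures given in the text.

```python
import math

mse = 8494  # test-set MSE reported in the text
rmse = math.sqrt(mse)
print(f"RMSE = {rmse:.2f}")  # prints "RMSE = 92.16"
```

An RMSE of about 92 means the typical prediction error is on the order of 92 rating points. How small that is in relative terms depends on the rating scale of the platform, which the text does not state.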

6. Discussion:

The results of this study demonstrate the effectiveness of the decision tree machine learning model in predicting future user ratings of competitive programming contests. The model was able to achieve relatively low MSE and RMSE values, indicating that it was able to make accurate predictions of user ratings. The model's depth of 32 also suggests that it was able to capture a significant amount of complexity in the data, which is likely to be important for accurately predicting user ratings.
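To put the depth of 32 in context, a binary tree of depth d has at most 2^d leaves. The sketch below computes that bound, along with the smallest depth at which a balanced tree could give every sample its own leaf. The sample count of one million is a hypothetical figure chosen for illustration; the size of the dataset is not stated in this paper.

```python
import math

def max_leaves(depth):
    """Upper bound on the number of leaves in a binary tree of this depth."""
    return 2 ** depth

def depth_to_isolate(n_samples):
    """Smallest depth at which a balanced binary tree has n_samples leaves."""
    return math.ceil(math.log2(n_samples))

print(max_leaves(32))             # prints 4294967296
print(depth_to_isolate(10 ** 6))  # prints 20
```

Under these assumptions, how strongly a depth limit of 32 constrains the tree depends on the number of training samples and on how balanced the learned splits are.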

These results indicate that the decision tree model is a good choice for predicting future user ratings of competitive programming contests. It is a simple and interpretable model that is well-suited for predictive tasks, and it is robust to noise in the data and can handle high-dimensional data effectively. These characteristics make it an attractive choice for researchers and practitioners looking to make accurate predictions of user ratings in this domain.

There are a few limitations to this study that should be considered when interpreting the results. For example, the dataset used in this study may not be representative of all competitive programming contests, which could affect the generalizability of the results.

7. Conclusion:

This study aimed to determine the effectiveness of the decision tree machine learning model in predicting future user ratings of competitive programming contests. The results of the model evaluation showed that the decision tree model had an MSE of 8494 and an RMSE of 92, indicating that it was able to make relatively accurate predictions of user ratings. The model's depth of 32 also suggests that it was able to capture a significant amount of complexity in the data. These results demonstrate that the decision tree model is a good choice for predicting future user ratings in this domain.

Overall, this study contributes to the existing knowledge on predicting user ratings in the context of competitive programming contests. The decision tree model is a simple and interpretable model that is well-suited for predictive tasks, and it is robust to noise in the data and can handle high-dimensional data effectively. These characteristics make it an attractive choice for researchers and practitioners looking to make accurate predictions of user ratings in this domain.

8. Future Prospects:

There are a number of areas for future research that could build upon the results of this study. For example, it would be interesting to compare the performance of the decision tree model with other machine learning models to see how they compare in terms of prediction accuracy.
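Such a comparison could start from a small harness that scores every candidate on the same held-out data with the same metric. The sketch below is illustrative only: the two simple baselines (predicting the mean training rating, and predicting that a rating stays unchanged), the assumption that the first feature is the contestant's current rating, and the toy numbers are all hypothetical and are not results from this study.

```python
import math

def rmse(predict, rows, targets):
    """Root mean squared error of a prediction function on held-out data."""
    squared = [(predict(r) - y) ** 2 for r, y in zip(rows, targets)]
    return math.sqrt(sum(squared) / len(squared))

def make_mean_baseline(train_targets):
    """Baseline that always predicts the mean training rating."""
    m = sum(train_targets) / len(train_targets)
    return lambda row: m

def make_last_rating_baseline():
    """Baseline that predicts no change; assumes row[0] is the current rating."""
    return lambda row: row[0]

# Hypothetical toy data: one feature (current rating) -> next rating.
train_y = [1400, 1600]
test_X = [(1500,), (1700,)]
test_y = [1500, 1700]

candidates = {
    "mean": make_mean_baseline(train_y),
    "last_rating": make_last_rating_baseline(),
}
results = {name: rmse(fn, test_X, test_y) for name, fn in candidates.items()}
for name, score in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name}: RMSE={score:.2f}")
```

Any trained model can be added to `candidates` as a function from a feature row to a predicted rating, so that a decision tree and its alternatives are ranked on identical terms.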

Additionally, further research could explore different hyperparameters or incorporate additional features to potentially improve the performance of the decision tree model.

Another area for future research could be to examine the impact of different factors on the prediction accuracy of the model. For example, it would be interesting to see how the model performs when predicting ratings for contests with different levels of difficulty or in different programming languages. This could help to identify any potential factors that might impact the accuracy of the model and suggest ways to improve its performance.

Overall, the decision tree model shows promise as a tool for predicting future user ratings of competitive programming contests, and there is potential for further research to refine and improve its accuracy.

REFERENCES

[1] Kaur, P. and Singh, W., 2016, August. Implementation of student SGPA Prediction System (SSPS) using optimal selection of classification algorithm. In 2016 International Conference on Inventive Computation Technologies (ICICT) (Vol. 2, pp. 1-8). IEEE.

[2] Hasan, H.R., Rabby, A.S.A., Islam, M.T. and Hossain, S.A., 2019, July. Machine learning algorithm for student's performance prediction. In 2019 10th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (pp. 1-7). IEEE.

[3] Zohair, A. and Mahmoud, L., 2019. Prediction of Student’s performance by modelling small dataset size. International Journal of Educational Technology in Higher Education, 16(1), pp. 1-18.

[4] Obsie, E.Y. and Adem, S.A., 2018. Prediction of student academic performance using neural network, linear regression and support vector regression: a case study. International Journal of Computer Applications, 180(40), pp. 39-47.

[5] Xu, J., Moon, K.H. and Van Der Schaar, M., 2017. A machine learning approach for tracking and predicting student performance in degree programs. IEEE Journal of Selected Topics in Signal Processing, 11(5), pp. 742-753.

[6] Bujang, S.D.A., Selamat, A., Ibrahim, R., Krejcar, O., Herrera-Viedma, E., Fujita, H. and Ghani, N.A.M., 2021. Multiclass prediction model for student grade prediction using machine learning. IEEE Access, 9, pp. 95608-95621.

[7] Gull, H., Saqib, M., Iqbal, S.Z. and Saeed, S., 2020, November. Improving learning experience of students by early prediction of student performance using machine learning. In 2020 IEEE International Conference for Innovation in Technology (INOCON) (pp. 1-4). IEEE.

[8] Shah, M.B., Kaistha, M. and Gupta, Y., 2019, November. Student performance assessment and prediction system using machine learning. In 2019 4th International Conference on Information Systems and Computer Networks (ISCON) (pp. 386-390). IEEE.

[9] Turabieh, H., 2019, October. Hybrid machine learning classifiers to predict student performance. In 2019 2nd International Conference on New Trends in Computing Sciences (ICTCS) (pp. 1-6). IEEE.

[10] https://www.tandfonline.com/doi/full/10.1080/08839510490442058. Accessed 21 Dec. 2022.

[11] https://www.tandfonline.com/doi/full/10.1080/08839510490442058. Accessed 21 Dec. 2022.
