
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 p-ISSN: 2395-0072

Volume: 13 Issue: 01 | Jan 2026 www.irjet.net


Aaryan Karlapalem1
1Student pursuing Computer Science at Tomball Memorial High School, Tomball, Texas, USA
Abstract - Since the release of ChatGPT in 2022, artificial intelligence (AI) has become a prominent part of many education systems, with nearly 20% of American schools using some form of AI in the classroom [3]. While most of these schools use AI mainly for lesson planning and personalized learning, one of the most labor-intensive tasks in education, grading essays, has also begun to shift toward automation. To reduce teacher workload and improve grading efficiency, teachers and state agencies such as the Texas Education Agency have been exploring AI-based tools for evaluating student writing. These tools, called automated essay scoring (AES) systems, use natural language processing (NLP) and machine learning to mimic human judgment when reviewing student writing. This paper examines the capabilities and limitations of modern AES technologies, comparing their grading accuracy, feedback quality, and overall fairness to traditional human grading. Drawing on recent research, it explores whether AI is capable of permanently replacing human graders. Ultimately, it concludes that while current AES systems offer real benefits in scalability and efficiency, they are not yet advanced enough to fully replace human judgment in high-stakes or nuanced essay evaluation contexts.
Keywords: artificial intelligence (AI), automated essay scoring (AES), natural language processing (NLP)
During the COVID-19 pandemic, virtually every major industry experienced huge impacts because of the global outbreak, and the education industry was no exception, with thousands of schools worldwide being forced to replace physical classes with virtual school and video calls. During this time, the burden placed on teachers was immense: they had to juggle their personal health as well as their duty to provide a quality learning experience to students despite the pandemic. This quickly resulted in teachers quitting their jobs in large numbers because of increased stress and fatigue. This continues to be a major problem even today, with the teacher shortage marked as a serious community problem. Schools and universities globally have been struggling to hire qualified teachers, a problem that has only worsened because of the global pandemic. Furthermore, young people today have been showing less interest in pursuing a career in education, with Millennials and Generation Z being the least interested in becoming future teachers. All these factors together intensified the overall problem and the need to find faster ways to bring in more teachers. Additionally, according to a 2022 study by Merrimack College, the percentage of K-12 teachers who were "very satisfied" with their jobs fell from 39% in 2012 to 12% in 2022, the lowest this statistic has ever been. More teachers have also reported that they want to leave their jobs to pursue a career outside of education, further showing the growing disinterest in teaching.
Since the early 2020s, artificial intelligence has been experiencing rapid adoption across education systems around the world, and for good reason. Since the global COVID-19 pandemic, teachers have been under more stress than ever, and they desperately need to streamline their workflows to keep up with an increasing amount of work. AI technologies presented themselves as a solution, providing tools to create lesson plans, slideshows, and even grading to an extent. Initially, these AI services were used simply to automate parts of a teacher's job, such as the generation of lesson materials and assessments. However, with the recent rise of large language models (LLMs) that specialize in certain tasks, such as GPT-3.5, GPT-4o mini, and Gemini 2.5 Flash, the extent to which AI is being utilized in the classroom has only been increasing. Today, these models are being tested and deployed for more complex tasks, like essay evaluation and feedback, grading, and even tutoring to some extent. These advancements bring about a shift from simply helping teachers with their jobs to potentially replacing them in some areas, especially time-intensive ones like writing assessment. One of the emerging concepts tied to this shift is precision education, which aims to use artificial intelligence to customize instruction and grading to each student's needs [5]. This kind of system integrates deep learning, transfer learning, and learning analytics to support students individually. Within this framework, automated essay scoring (AES) systems are often used to not
only grade student essays, but also to track the progression of a student's writing over time, providing valuable insight into their writing skills. This kind of system has also caught the attention of state-level agencies such as the Texas Education Agency (TEA), which recently made the decision to use artificial intelligence systems to grade the standardized State of Texas Assessments of Academic Readiness (STAAR) tests taken by millions of students in grades 3-12 [4]. The hope is that this reduces the workload of graders who would otherwise have to spend hundreds or even thousands of hours grading essays over the summer, and this solution has reportedly been effective at reducing that burden. However, while the promise of speed and scalability is appealing, these early AI implementations raise numerous questions about whether AI can replicate the nuance, empathy, and contextual understanding that human graders have.
Automated Essay Scoring (AES) systems are AI tools specifically designed to mimic aspects of human judgement when grading student-produced writing. These tools examine the content, structure, and especially the language of essays to produce a final grade and feedback. This is powered by a combination of natural language processing (NLP) and machine learning (ML) techniques, with the primary goal of offering a fast and efficient way to evaluate human writing, something that is much needed in the education industry amid increasing teacher workloads and shortages.
AES models generally operate in one of two ways: holistic scoring or trait-specific scoring. Holistic systems analyze the entire essay and return a single value representing the essay's general quality, very similar to how a teacher would typically grade an essay. Trait-specific scoring systems, on the other hand, assess distinct components of a piece of writing, such as grammar, essay structure, and sometimes even creativity or originality, assigning individual scores for each category. Holistic systems are often faster and much simpler, but trait-specific models provide much more detailed and constructive feedback for both students and teachers. For example, the ASAP (Automated Student Assessment Prize) dataset, commonly used in AES research, includes essays evaluated along four main categories: ideas, organization, style, and conventions [10]. These traits serve as training data for machine learning models that learn from past human-graded essays and then replicate the grading process.
Earlier AES systems, such as Project Essay Grade (PEG) and e-rater, used hard-coded features, such as word count, sentence length, and spelling accuracy, to predict scores. However, these models often failed to understand the deeper meaning, structure, or logical coherence of the essays they were analyzing. More recently, deep learning has dramatically enhanced the sophistication of AES tools.
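To make this contrast concrete, the kind of hand-crafted surface features these early systems relied on can be sketched in a few lines of Python. The feature set and weights below are purely illustrative assumptions, not the actual PEG or e-rater implementations.

```python
import re

def surface_features(essay: str) -> dict:
    """Simple surface statistics of the sort early AES systems relied on."""
    words = essay.split()
    sentences = [s for s in re.split(r"[.!?]+", essay) if s.strip()]
    return {
        "word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / max(len(words), 1),
        "avg_sentence_length": len(words) / max(len(sentences), 1),
    }

# Hypothetical linear weights; systems like PEG fit such weights by
# regressing surface features against human-assigned scores.
WEIGHTS = {"word_count": 0.01, "avg_word_length": 0.5, "avg_sentence_length": 0.1}

def predict_score(essay: str) -> float:
    """Weighted sum of surface features: the core idea of feature-based AES."""
    feats = surface_features(essay)
    return sum(WEIGHTS[k] * v for k, v in feats.items())
```

Because such a scorer never looks past these counts, two essays with identical statistics but very different arguments receive the same grade, which is exactly the blind spot described above.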
Newer systems now rely on transformer-based models such as BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer), which can analyze texts at a much deeper level. These models understand word relationships, sentence context, and even document structure to produce scores that more closely mimic traditional human grading. For example, BERT can evaluate sentence-level grammar and coherence, while GPT-4 and other LLMs can analyze entire essays for logic, consistency, and originality. In a 2025 study conducted by Seßler et al., 37 teachers from Germany graded 20 German student essays based on 10 specific criteria. The same essays were then graded by LLMs such as GPT-3.5, o1, LLaMA, and Mistral using a standardized prompt. The best-performing model, o1, achieved a Spearman correlation coefficient of 0.742 in overall alignment with human raters, which indicates strong agreement with human graders [7].
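Spearman's rank correlation, the agreement measure reported in that study, compares how two raters order the same essays rather than their raw point values. A minimal pure-Python sketch (using made-up example scores, not the study's data):

```python
def _ranks(values):
    """1-based ranks, averaging ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical human and model scores for five essays: the point values
# differ, but the ordering agrees perfectly, so the correlation is 1.0.
human = [3, 5, 2, 4, 1]
model = [3.2, 4.8, 2.1, 4.1, 1.5]
```

A value of 1.0 means the model ranks essays exactly as the humans do; the 0.742 reported for o1 means its ordering agrees strongly, but not perfectly, with the teachers'.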
Interestingly, the study also found that the closed-source models (GPT-4, o1) consistently outperformed their open-source counterparts in both inter-rater reliability and alignment with human feedback, especially in language-related areas like spelling and grammar. However, even top models struggled in content-based categories, such as evaluating the logic of an argument or the effectiveness of a narrative conclusion, which require a level of understanding not yet present in even the most advanced AI models today. Still, given the rapid pace at which these LLMs are evolving, it is plausible that AI could closely approximate a human grader by the early 2030s, especially in light of the recent advances in models like o1 and LLaMA.


The models discussed so far fall under the category of single-model approaches, as they use one AI model to do all the processing. However, researchers have recently been exploring collaborative models, which divide grading tasks among multiple specialized neural networks. In 2024, one of the very first Collaborative Deep Learning Networks (CDLNs) was introduced. It combined convolutional neural networks (CNNs), recursive neural networks (RvNNs), and LSTMs to individually assess grammar, structure, and content before combining these assessments into one final grade. This new model achieved a grading accuracy of 85.5%, outperforming traditional systems and even most BERT-based models [10].
This modular approach also mimics the way humans grade essays, assessing writing from multiple angles, such as grammar, clarity, idea development, and logical flow, while allowing each neural network to specialize in one specific aspect of grading. The result is a more balanced and thorough evaluation overall.
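The merge step of such a modular pipeline can be sketched as a weighted combination of trait-level scores. The fixed weights and 0-100 scale here are hypothetical stand-ins; the actual CDLN learns how its specialized networks' outputs are combined rather than using hand-set weights.

```python
from dataclasses import dataclass

@dataclass
class TraitScores:
    grammar: float    # e.g. produced by a CNN-based grammar module
    structure: float  # e.g. produced by a recursive-network structure module
    content: float    # e.g. produced by an LSTM content module

# Hypothetical fixed weights standing in for the learned merge step.
WEIGHTS = {"grammar": 0.3, "structure": 0.3, "content": 0.4}

def final_grade(s: TraitScores) -> float:
    """Combine per-trait scores (0-100 each) into one holistic grade."""
    return (WEIGHTS["grammar"] * s.grammar
            + WEIGHTS["structure"] * s.structure
            + WEIGHTS["content"] * s.content)
```

An essay that is strong on content but weak on grammar still earns partial credit from each specialized module, rather than receiving a single opaque judgement.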
Despite all these advances, however, AES systems do come with limitations. For one, their performance is heavily influenced by their training data, which may reflect existing biases, such as favoring native speakers or formal writing styles. Moreover, AES systems struggle with creative, emotional, or unconventional writing, contexts that humans understand far better than even the best AI models. Furthermore, most systems still rely on pattern recognition as opposed to genuine understanding, which can lead to inaccurate scores, especially when students use complex or unconventional language. Ultimately, while the technical sophistication of AES systems has indeed increased, they remain best suited for low-stakes situations or as supplements to human grading, rather than being a full replacement.
The main claim behind adopting AES tools is that they can assess student writing as accurately as human educators, but with the benefit of being far more efficient and scalable. While recent advances in natural language processing (NLP) and large language models (LLMs) have brought AES systems closer to this goal, questions remain about their reliability in high-stakes grading. Numerous studies have shown that AI systems can approximate human scoring under some circumstances, but these studies also reveal discrepancies, especially when essays require subjective interpretation, where accuracy drops sharply compared with human graders.
Several recent studies have shown strong correlations between AES output and human grading, particularly when essays are scored according to structured rubrics. A comparative study in 2025 found that closed-source LLMs like GPT-4 and o1 performed exceptionally well, achieving scores that indicate a high level of consistency with human graders. In particular, the o1 model achieved a Spearman correlation of 0.742, indicating strong alignment in overall scoring [7].
However, these systems were most accurate when evaluating surface-level features, such as spelling, grammar, and basic organization, rather than deeper attributes like logic of argumentation, originality, and effectiveness of narrative endings. Similarly, a 2024 study found that hybrid deep learning models, such as its Collaborative Deep Learning Network (CDLN), scored 85.5% against human scorers, further supporting the idea that deep learning models can match human evaluators, but only on standard academic conventions [10].
While AES systems have made significant progress in consistency and language accuracy, they still have difficulty with aspects that require human interpretation, such as higher-order thinking, argument development, and emotional depth. Even advanced systems like GPT-4 often misinterpret or undervalue essays that differ from standard formats or delve into complex personal experiences [9]. The research shows that when essays presented unconventional views, subtle irony, or figurative language, human evaluators were much more likely to notice and reward that sophistication than AI models were.


This struggle to fully grasp context or intent stems from how AES systems are trained. These models depend on statistical patterns and cannot "understand" writing the way humans do. Because of this, creative or culturally rich writing, which may strongly resonate with a teacher, can receive lower scores or face penalties from AI for lacking structural conformity or for being seen as ambiguous.
Another main concern about current AES systems is the potential for them to be manipulated by students who learn what the AI "likes" and specifically craft their essays around these preferences. Recent studies have found a correlation between the amount of complex vocabulary and structured paragraphing in an essay and the score it receives, regardless of the overall quality of the essay [6]. This creates a situation in which students can game the system by adhering to these preferences to artificially boost their essay scores, despite writing what a human would consider a worse essay.
This study highlights a common phenomenon known as score inflation, also commonly referred to as grade inflation, in which students write essays around criteria that the AES "likes" to get a higher score, taking advantage of the way the AES grades essays. In many cases, AES systems graded essays as higher quality simply because they were longer, even when the writing lacked basic grammar, functionality, and structure. Such vulnerabilities risk encouraging a style-over-substance approach to writing and undermining the pedagogical goal of developing genuine writing skills.
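The length-and-vocabulary bias described in these studies is easy to reproduce with a toy scorer. The scorer below is deliberately naive and entirely hypothetical, but it shows how a padded, jargon-heavy essay can outscore a shorter, clearer one whenever a model rewards surface statistics.

```python
def naive_length_scorer(essay: str) -> float:
    """Deliberately naive scorer that rewards length and long words,
    mimicking the surface-level bias reported in the studies above."""
    words = essay.split()
    long_words = {w for w in words if len(w) > 7}
    return 0.1 * len(words) + 1.0 * len(long_words)

short_clear = "The plan failed because it ignored cost."
long_padded = ("The aforementioned multifaceted undertaking experienced "
               "unsuccessful outcomes attributable to insufficiently "
               "considered budgetary ramifications and considerations.")

# The padded essay outscores the clearer one despite communicating less.
assert naive_length_scorer(long_padded) > naive_length_scorer(short_clear)
```

A student who notices this pattern can inflate their score simply by padding sentences and swapping in longer synonyms, which is the gaming behavior the studies warn about.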
Even with all the recent advances in AES systems, bias remains one of the biggest ethical issues surrounding them. AES models are usually trained on data consisting mostly of formal, academic English writing, often written by native speakers [2]. The AES scoring will naturally reflect its training data, potentially leading to students from different linguistic or cultural backgrounds being penalized for writing that deviates from these norms, even when it is clear, correct, and compelling.
This issue is compounded when AI tools are deployed without mechanisms for detecting or correcting such biases. The study warns that certain dialects or narrative styles are systematically undervalued and likely to receive a lower score, especially in essays where personal experience intersects with cultural identity. This raises concerns not only about the fairness of scoring, but also about how AES systems might reinforce existing educational inequalities if used without human oversight.
Even though AES technologies have made encouraging advancements, both studies agree that AES systems are not yet ready to replace educators in evaluating student essays [2, 6]. While they can help in grading the lower-level aspects of essays, such as grammar, syntax, and structural coherence, they lack the more sophisticated thinking needed to evaluate complex aspects, such as argument development and emotional depth. Both accounts further conclude that the optimal grading approach is likely a hybrid model in which AI handles low-level and repetitive assessments without forfeiting educators' ability to provide the final word, especially where high stakes are involved. This enables educators to leverage the scalability and cost-effectiveness of AI without sacrificing the nuance and compassion of human judgment. As educational institutions continue to explore AI-based writing grading systems, AES models must be treated as tools for enhancement, not as substitutes for human judgment.
One of the strongest promises of AES systems is not just their ability to assign a grade, but also to provide detailed feedback that students can use to improve their writing in subsequent assignments. In theory, AI can offer quick and detailed responses on specific aspects of an essay, but while these critiques can be a useful tool, their educational value is uneven and limited.
Research has found that AI is highly effective at identifying lower-order concerns such as spelling, grammar, and sentence structure. GPT-4 can, for example, flag repeated words, clumsy phrases, or formatting errors in a matter of seconds. This ability
to deliver ultra-fast, targeted corrections can help speed up the revision process for students who are looking for quick fixes and minimize the turnaround time between submissions.
AES tools also offer a scalable form of feedback in large classroom settings or high-stakes standardized testing scenarios. Educators can use AI-generated reports as a baseline to prioritize and focus their own time and attention on more complex issues, such as content quality or argumentation. In this hybrid approach, teachers have reported a significant reduction in grading fatigue while still providing students with feedback on basic writing conventions.
5.2. Limitations in Depth and Nuance
Despite these advantages, AES feedback lacks the depth that human commentary provides. A 2024 study revealed that while AI can flag surface-level mistakes with high precision, its comments on argument logic or thematic development were mostly generic, offering vague suggestions like "improve your conclusion" without explaining how [2]. This lack of depth limits the usefulness of AI feedback for helping students improve their writing. Whereas humans can adapt their feedback to a student's past performance and personal learning style, AES tools operate without this level of personalization. As a result, feedback can feel impersonal and generic, reducing student engagement with the revision process.
5.3. Student Trust in AI Feedback
Student trust in feedback also plays a role in its effectiveness. A 2021 study revealed that many students were initially skeptical of AI comments, especially when those comments conflicted with their own understanding of their work or with prior human feedback [6]. Some students even ignored AI suggestions that seemed irrelevant or overly mechanical. Conversely, students who saw AI feedback as fair and accurate were more likely to revise their work in line with the suggestions. This highlights the importance of transparency: students need to understand why a particular suggestion was made in order to trust and apply it.
Ultimately, the goal of feedback is to improve student writing over time, not just to correct small errors, and current evidence suggests that AI alone is not enough for this purpose. While AES systems can provide fast and reliable corrections, the biggest improvements in student writing happen when AI feedback is combined with human support. This combination allows students to receive immediate corrections along with personalized advice that helps develop higher-order skills like argument development, audience awareness, and rhetorical strategy. In short, AES feedback is best seen as a first draft of commentary; it is helpful for tackling the mechanical aspects of writing but falls short without the insight, encouragement, and context that a human teacher offers.
In conclusion, artificial intelligence in education has created both opportunities and challenges. Automated Essay Scoring systems aim to tackle teacher shortages, lessen workloads, and improve grading efficiency. Research indicates that modern AES models, especially those based on large language models like GPT-4 and o1, can match human graders in consistency when assessing basic writing elements such as grammar, structure, and spelling. They also offer quick, scalable feedback, making them especially useful in large classrooms or standardized testing settings. However, accuracy drops when essays require evaluation of creativity, nuance, or cultural context. AES systems often favor formulaic responses and struggle with unconventional or emotionally nuanced writing. While their feedback is fast and frequently technically correct, it tends to be too general to promote significant long-term improvement in student writing. Moreover, issues of bias, fairness, and transparency raise important ethical concerns about using these tools in high-stakes academic situations.
Overall, these findings indicate that AES systems cannot fully replace human judgment in essay evaluation. Rather, their most effective role is in collaboration with teachers: AI can manage repetitive and simpler tasks while educators remain responsible for more complex, contextual grading. This combined approach offers a promising solution, merging the efficiency of automation with the unique insights of human educators.


As education continues to change in an AI-driven world, the challenge will be to balance innovation with fairness and trust. AES technologies are expected to become more advanced in the coming years, but until they can match the depth of human understanding, they should be used as tools to assist teachers rather than to replace them.
References
[1] A. Chevalier, J. Orzech, and P. Stankov, "RETRACTED: Man vs Machine: Can AI Grade and Give Feedback Like a Human?," IZA Institute of Labor Economics, 2024. [Online]. Available: http://www.jstor.org/stable/resrep69949
[2] K. Jonall, Artificial intelligence in academic grading: A mixed-methods study, 2024.
[3] J. Floden, "Grading exams using large language models: A comparison between human and AI grading of exams in higher education using ChatGPT," British Educational Research Journal, vol. 51, no. 1, 2024. [Online]. Available: https://doi.org/10.1002/berj.4069
[4] "Texas Education Agency using AI to grade parts of STAAR tests," AACRAO, 2024. [Online]. Available: https://www.aacrao.org/edge/emergent-news/texas-education-agency-using-ai-to-grade-parts-of-staar-tests
[5] S. J. H. Yang, "Guest Editorial: Precision Education - A New Challenge for AI in Education," Educational Technology & Society, vol. 24, no. 1, pp. 105–108, 2021. [Online]. Available: https://www.jstor.org/stable/26977860
[6] A. C. M. Yang, I. Y. L. Chen, B. Flanagan, and H. Ogata, "From Human Grading to Machine Grading: Automatic Diagnosis of e-Book Text Marking Skills in Precision Education," Educational Technology & Society, vol. 24, no. 1, pp. 164–175, 2021. [Online]. Available: https://www.jstor.org/stable/26977865
[7] K. Seßler, M. Furstenberg, B. Buhler, and E. Kasneci, "Can AI grade your essays? A comparative analysis of large language models and teacher ratings in multidimensional essay scoring," in Proc. 15th Int. Learning Analytics & Knowledge Conf., pp. 462–472, 2025. [Online]. Available: https://doi.org/10.1145/3706468.3706527
[8] E. L. Wetzler et al., "Grading the Graders: Comparing Generative AI and Human Assessment in Essay Evaluation," Teaching of Psychology, vol. 52, no. 3, pp. 298–304, 2024 (original work published 2025). [Online]. Available: https://doi.org/10.1177/00986283241282696
[9] K. Bouziane and A. Bouziane, "AI versus human effectiveness in essay evaluation," Discover Education, vol. 3, no. 1, 2024. [Online]. Available: https://doi.org/10.1007/s44217-024-00320-6
[10] M. Maliha and V. Pramanik, "Hey AI Can You Grade My Essay?: Automatic Essay Grading," arXiv, 2024. [Online]. Available: https://arxiv.org/abs/2410.09319
[11] L. Loewus, "Why teachers leave or don't: A look at the numbers," Education Week, May 4, 2021. [Online]. Available: https://www.edweek.org/teaching-learning/why-teachers-leave-or-dont-a-look-at-the-numbers/2021/05