Using Generative Large Language Models for Assertion Detection in Medical Notes
MS Computer Science Thesis Project
By: Cynthia Zhao
Advisor: Dr. Simona Doboli
The unstructured nature of electronic medical records and clinical notes makes them difficult for Natural Language Processing techniques to analyze automatically, yet it is important for healthcare professionals to accurately and efficiently understand clinical notes and determine the presence, absence, or possibility of certain medical problems that may exist in their patients. To address this issue, the task of categorizing different assertions (i.e., statements expressing facts) can be employed. This is also known as clinical assertion detection, and it has been explored using non-generative models to identify assertion classes of medical entities in unstructured clinical notes. Non-generative models are models that don’t generate text but are good at sorting data and making predictions based on patterns. This project differs in that we employ a generative LLM for assertion detection. Generative models can produce text, which makes it possible to generate explanations alongside the assertion classification, potentially offering more support for medical professionals. There are many applications for this in the healthcare realm. Primarily, it is crucial for healthcare professionals to efficiently comprehend the information contained in clinical notes. It can also be used for the automation of clinical decision support, medical risk evaluation, and much more.
Our primary aim is to utilize a clinical note along with a medical entity and its context, as provided by the 2010 i2b2 Assertion Task dataset, to enable the model to determine whether the entity is present, absent, or possible in a patient. The generative LLM that we use is called Mistral, and it has 7 billion parameters. Through a task called fine-tuning, we can change or “fine-tune” these parameters so that the model is better suited to our specific task of clinical assertion detection. Using the i2b2 dataset and the comprehensive training dataset that we generated through an iterative process, we observed a substantial improvement in the fine-tuned model’s predictions: the micro F1-score rose from 0.50 before fine-tuning to 0.88 after fine-tuning, underscoring the capability of our fine-tuned model to learn essential patterns and excel in assertion detection.
This project also demonstrates the importance of prompting formats and their impact on the model’s generated responses. We observed that utilizing one-shot prompting, or prompting that provides an example of the desired response, improved the model’s predictions a considerable amount over zero-shot prompting, or prompting without any examples in the prompt, without any additional fine-tuning. Combining one-shot prompting with other methods like multiple-choice prompting and chain-of-thought prompting enabled us to develop a prompting format that enhanced response generation, surpassing the performance of a basic zero-shot prompt.
While our results do not outperform those of the state-of-the-art BERT model, our results are promising and show the potential of the application of generative models in the healthcare field. Some areas of further research include exploration into medical question-and-answer prompt engineering to generate optimal prompts, as well as fine-tuning Mistral with other clinical datasets and fine-tuning other generative LLMs on medical clinical notes.
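To make the prompting formats and the evaluation metric mentioned above more concrete, the following is a minimal Python sketch. It is not the prompt or evaluation code used in this project: the prompt wording, the example sentence, and the names `build_prompt`, `micro_f1`, and `ONE_SHOT_EXAMPLE` are all assumptions introduced for illustration. It shows how a zero-shot prompt differs from a one-shot, multiple-choice prompt with a short reasoning step, and how a micro-averaged F1-score could be computed over the three assertion classes.

```python
from typing import List

# The three assertion classes named in the abstract.
LABELS = ["present", "absent", "possible"]

# Hypothetical worked example for one-shot prompting (invented for illustration,
# not taken from the i2b2 dataset or from the project's actual prompt).
ONE_SHOT_EXAMPLE = (
    "Sentence: The patient denies chest pain.\n"
    "Entity: chest pain\n"
    "Options: (A) present (B) absent (C) possible\n"
    "Reasoning: The word 'denies' negates the entity.\n"
    "Answer: (B) absent\n"
)


def build_prompt(sentence: str, entity: str, one_shot: bool = True) -> str:
    """Build a multiple-choice prompt; prepend one worked example if one_shot."""
    question = (
        f"Sentence: {sentence}\n"
        f"Entity: {entity}\n"
        "Options: (A) present (B) absent (C) possible\n"
        "Reasoning:"  # ending on 'Reasoning:' invites a chain-of-thought step
    )
    return ONE_SHOT_EXAMPLE + "\n" + question if one_shot else question


def micro_f1(gold: List[str], pred: List[str]) -> float:
    """Micro-averaged F1: pool TP, FP, and FN across classes, then compute F1."""
    tp = fp = fn = 0
    for label in LABELS:
        for g, p in zip(gold, pred):
            if p == label and g == label:
                tp += 1
            elif p == label and g != label:
                fp += 1
            elif g == label and p != label:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    print(build_prompt("No evidence of pneumonia.", "pneumonia"))
    gold = ["present", "absent", "possible", "present"]
    pred = ["present", "absent", "present", "present"]
    print(micro_f1(gold, pred))
```

When every prediction is exactly one of the three classes, the micro-averaged F1-score computed this way equals plain accuracy, since each misclassification counts once as a false positive and once as a false negative.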