International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 13 Issue: 01 | Jan 2026

p-ISSN: 2395-0072

www.irjet.net

SynthoMed AI: Generating Synthetic Medical Record via an AI Chat bot

SURYA G¹, MOHAMMED RASOOL R², SHRINANDA HK³, MOHAMMED FLAH LAHORI⁴, Asst. Prof. Mary Anitha T⁵

¹,²,³,⁴ Dept. of Artificial Intelligence & Machine Learning Engineering, The Oxford College of Engineering, Bommanahalli, Bengaluru-68, Karnataka, India

⁵ Dept. of AIML Engineering, The Oxford College of Engineering, Bommanahalli, Bengaluru-68, Karnataka, India

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract - Access to authentic medical records is an essential part of healthcare education, yet it is often restricted by strict patient privacy regulations and ethical concerns. This paper presents a novel AI-driven chatbot system that addresses this challenge by generating high-fidelity synthetic medical records. The project methodology encompasses a complete development lifecycle, including data pre-processing, AI model training, chatbot integration, and final deployment. By leveraging Natural Language Processing (NLP) and probabilistic sampling, the system allows students and healthcare learners to query and generate realistic patient data scenarios without risking the exposure of sensitive personal information. The results demonstrate a functional, interactive tool that democratizes access to medical data for training purposes. Furthermore, the development process highlights the efficacy of combining deep learning with conversational interfaces to solve practical challenges in health informatics and technical education.

Key Words: Synthetic, AI Chatbot, Synthetic Medical Data (SMD), Healthcare Privacy, Generative Adversarial Networks (GANs), Natural Language Toolkit (NLTK), Synthetic Data Generation (SDG), Medical Informatics, Health Insurance Portability and Accountability Act (HIPAA), Personally Identifiable Information (PII), Electronic Health Records (EHR), Natural Language Processing (NLP).

1. INTRODUCTION

1.1 Overview

The healthcare industry is undergoing rapid transformation, driven significantly by advances in data analysis and artificial intelligence (AI). High-quality medical data forms the essential foundation for numerous critical activities, such as training future healthcare professionals, supporting clinical research, and developing robust AI diagnostic tools. Electronic Health Records (EHRs) have become the digital standard, encapsulating patient histories, laboratory results, and treatment plans in a structured format. However, a persistent tension exists between the immense value of this data and the paramount need to protect patient privacy.

This paper introduces a novel approach designed to navigate this tension: an AI-powered chatbot system named SynthoMed AI, which generates realistic synthetic medical records. By generating synthetic patient profiles that replicate the statistical and linguistic patterns of real data without containing any actual personal information, the system aims to make high-quality medical data accessible for research, education, and innovation while upholding strict ethical standards of patient confidentiality.

1.2 Problem Statement

Access to real-world medical records for educational and research purposes is restricted, which creates a well-documented obstacle to advancement in healthcare. This limitation arises from rigorous ethical and legal frameworks, especially those governing the strict protection of Personally Identifiable Information (PII) through regulations such as the Health Insurance Portability and Accountability Act (HIPAA). These frameworks protect patient rights, but they have the unintended consequence of creating formidable barriers for students, researchers, and developers. These individuals require rich, realistic datasets to hone analytical skills and test new software. The result is a scarcity of practical, legally compliant learning resources. This gap not only limits practical experience but also stifles innovation, highlighting the urgent need for a solution that can reconcile the dual imperatives of privacy preservation and data accessibility.

1.3 Objectives

The main goal of this project is to develop a reliable and secure system that provides equal access to medical data for learning and experimentation. We aim to construct a functional pipeline that processes user requests, generates medically plausible data, and delivers it in a user-friendly format. The system is designed to be robust, scalable, and suitable for deployment in test and educational environments.
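
The three pipeline stages named in the objectives (process a user request, generate plausible data, deliver it in a readable format) can be illustrated with a minimal Python sketch. This is not the SynthoMed AI implementation: the function names (`parse_request`, `generate_record`, `handle_request`), the keyword-matching request parser, and the `CONDITION_PROFILES` table are all assumptions made for illustration, and the distributions are hand-written placeholders rather than values learned from any real dataset. Simple weighted sampling from the standard library stands in for the trained model.

```python
import json
import random

# Hypothetical, hand-written distributions for illustration only. They are
# NOT derived from real patient data or from the paper's trained model.
CONDITION_PROFILES = {
    "diabetes": {
        "age_range": (35, 80),
        "medications": [("metformin", 0.6), ("insulin", 0.3), ("glipizide", 0.1)],
        "lab": ("hba1c_percent", 6.5, 10.0),
    },
    "hypertension": {
        "age_range": (40, 85),
        "medications": [("lisinopril", 0.5), ("amlodipine", 0.3), ("losartan", 0.2)],
        "lab": ("systolic_bp_mmhg", 135.0, 180.0),
    },
}


def parse_request(text):
    """Stage 1: return the first supported condition named in the request, else None."""
    lowered = text.lower()
    for condition in CONDITION_PROFILES:
        if condition in lowered:
            return condition
    return None


def generate_record(condition, rng):
    """Stage 2: sample one synthetic record from the condition's profile."""
    profile = CONDITION_PROFILES[condition]
    names, weights = zip(*profile["medications"])
    lab_name, low, high = profile["lab"]
    return {
        # The identifier is random and carries no link to any real person.
        "patient_id": f"SYN-{rng.randrange(10**6):06d}",
        "age": rng.randint(*profile["age_range"]),
        "sex": rng.choice(["F", "M"]),
        "condition": condition,
        # Weighted categorical draw: the "probabilistic sampling" step.
        "medication": rng.choices(names, weights=weights, k=1)[0],
        lab_name: round(rng.uniform(low, high), 1),
    }


def handle_request(text, seed=None):
    """Stage 3: run the pipeline and return the result as formatted JSON."""
    rng = random.Random(seed)
    condition = parse_request(text)
    if condition is None:
        return json.dumps({"error": "no supported condition found in request"})
    return json.dumps(generate_record(condition, rng), indent=2)


if __name__ == "__main__":
    print(handle_request("Generate a record for a patient with diabetes", seed=7))
```

Passing a fixed `seed` makes a generated record reproducible, which may be convenient for classroom exercises. Every field is drawn from a distribution rather than copied from a source record, so the output contains no real PII by construction.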

Impact Factor value: 8.315

ISO 9001:2008 Certified Journal

Page 42