
International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 12 Issue: 10 | Oct 2025

p-ISSN: 2395-0072

www.irjet.net

HYBRID MODEL FOR DIABETES PREDICTION FEATURE ANALYSIS

S. Deepika¹, S. Amsa²

¹PG Student, Department of Computer Application, Jaya College of Arts and Science, Tiruninravur.

²Assistant Professor, Department of Computer Application, Jaya College of Arts and Science, Tiruninravur.

---------------------------------------------------------------------***---------------------------------------------------------------------

The proposed hybrid model fuses the feature-extraction strength of Deep Learning with the classification stability of Random Forest. The key novelty of our approach lies in its architecture: the DNN is used not as a classifier but as a feature-engineering layer that transforms raw input into a higher-order representation better suited to the Random Forest.

Abstract - Predicting diabetes with high accuracy remains a significant challenge in medical informatics. Existing models, whether traditional machine learning or deep learning, often hit a performance ceiling because they overfit or fail to capture the full complexity of patient data. We address this limitation by proposing a novel hybrid architecture that integrates a Deep Neural Network (DNN) with a Random Forest (RF) classifier. The DNN acts as a feature extractor, uncovering subtle, non-linear patterns in the data, and its learned features are passed to the RF for final classification. We trained and tested the model on the Pima Indians Diabetes Dataset from the UCI Machine Learning Repository [1], applying SMOTE to balance the classes. The hybrid model achieved 96.4% accuracy, outperforming both the standalone Random Forest (91.2%) and Deep Learning (93.0%) models. This suggests that combining the two algorithms yields a more reliable and accurate tool for diabetes risk assessment, with real promise for clinical decision-support systems.

To ensure rigorous, reproducible, and comparable results, this study is benchmarked on the Pima Indians Diabetes Dataset from the UCI Machine Learning Repository [1]. The results demonstrate that our hybrid model achieves superior performance, offering a promising path toward reliable, AI-powered clinical decision-support systems.
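The DNN-as-feature-extractor idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn's `MLPClassifier` as a stand-in for the DNN, a simplified SMOTE-style interpolation (the hypothetical helper `smote_like_oversample`) in place of a library SMOTE, and synthetic data shaped like the benchmark (768 samples, 8 features) instead of the real dataset. The layer sizes and tree count are arbitrary illustrative choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


def smote_like_oversample(X, y, k=5, seed=0):
    """Simplified SMOTE (assumption): create minority samples by interpolating
    between a minority point and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_new = counts.max() - counts.min()
    if n_new == 0:
        return X, y
    X_min = X[y == minority]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    neighbours = nn.kneighbors(X_min, return_distance=False)[:, 1:]  # drop self
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))
    X_new = X_min[base] + gap * (X_min[picked] - X_min[base])
    return np.vstack([X, X_new]), np.concatenate([y, np.full(n_new, minority)])


def hidden_features(mlp, X):
    """Forward pass through the hidden ReLU layers only, skipping the output layer."""
    a = X
    for W, b in zip(mlp.coefs_[:-1], mlp.intercepts_[:-1]):
        a = np.maximum(a @ W + b, 0.0)
    return a


# Synthetic stand-in for the benchmark data (NOT the real dataset).
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           n_redundant=1, weights=[0.65, 0.35], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scale, then balance the training split only, so the test split stays untouched.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)
X_bal, y_bal = smote_like_oversample(X_train_s, y_train)

# Stage 1: train the network, then reuse its last hidden layer as learned features.
mlp = MLPClassifier(hidden_layer_sizes=(32, 16), activation="relu",
                    max_iter=500, random_state=42).fit(X_bal, y_bal)

# Stage 2: the Random Forest makes the final decision on those features.
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(hidden_features(mlp, X_bal), y_bal)

hybrid_acc = rf.score(hidden_features(mlp, X_test_s), y_test)
print(f"Hybrid (MLP features + RF) test accuracy: {hybrid_acc:.3f}")
```

Oversampling is applied after the train/test split so that no synthetic point derived from test data can leak into training. Accuracy on this synthetic data says nothing about the figures reported in the paper.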

1.1 LITERATURE REVIEW

The application of machine learning to diabetes prediction has evolved significantly, beginning with traditional statistical models. Algorithms such as **Logistic Regression (LR)** and **Support Vector Machines (SVM)** were valued for their interpretability and efficiency on smaller clinical datasets [5]. While these methods provided a foundational baseline, their performance is often limited by an inherent inability to model the complex, non-linear interactions between key risk factors such as glucose, BMI, and insulin levels, which are critical for accurate diagnosis.

Key Words: Diabetes Prediction, Machine Learning, Random Forest, Deep Learning, Hybrid Model, Feature Importance, Clinical Analytics.

1. INTRODUCTION

Diabetes mellitus, a persistent metabolic disorder marked by abnormally high concentrations of sugar in the blood, represents a significant worldwide health burden. Left unmanaged, it can lead to severe complications, including heart disease, kidney failure, and neuropathy. The cornerstone of mitigating these outcomes is early and accurate detection. Traditional diagnostic methods, while effective, often rely on multiple tests and clinical visits, creating a barrier to rapid screening.

To overcome these limitations, ensemble methods like **Random Forest (RF)** became the benchmark for this task [3]. By aggregating predictions from multiple decision trees, RF reduces overfitting and captures non-linear relationships more reliably than single models, typically achieving accuracies of 85% to 90% on standard benchmarks. However, its performance can plateau, because it may struggle to infer highly complex, abstract patterns directly from the raw feature space without advanced feature engineering.
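The effect of aggregating many trees can be checked with a short comparison. The sketch below is an illustration only, assuming scikit-learn and synthetic data with 768 samples and 8 features rather than a real clinical dataset. It cross-validates a single decision tree against a Random Forest built from 200 trees.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic tabular data (assumption: a stand-in, not a clinical dataset).
X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           n_redundant=1, random_state=0)

# One fully grown tree versus an ensemble of bootstrapped trees.
tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

# 5-fold cross-validated accuracy for each model.
tree_scores = cross_val_score(tree, X, y, cv=5, scoring="accuracy")
forest_scores = cross_val_score(forest, X, y, cv=5, scoring="accuracy")

print(f"Single tree  : {tree_scores.mean():.3f} +/- {tree_scores.std():.3f}")
print(f"Random Forest: {forest_scores.mean():.3f} +/- {forest_scores.std():.3f}")
```

Comparing the mean and spread across folds, rather than a single train/test split, gives a fairer picture of how much the ensemble stabilizes predictions on a dataset of this size.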

The emergence of artificial intelligence and machine learning (AI/ML) offers a paradigm shift, enabling the analysis of complex patient datasets to identify at-risk individuals efficiently. However, single-model approaches frequently possess inherent weaknesses. For instance, Random Forest models might overlook complex feature interactions, while Deep Learning models can be data-hungry and prone to overfitting on smaller clinical datasets.

The recent shift towards **deep learning (DL)** promised a solution through automatic feature representation. Deep Neural Networks (DNNs) can learn intricate hierarchies of features, pushing accuracy boundaries further. However, their "black-box" nature and propensity to overfit on small, tabular clinical datasets remain significant hurdles. This has spurred interest in **hybrid models** that combine the strengths of different algorithms. While recent ensemble techniques like stacking have shown promise, our work introduces a novel sequential pipeline that uniquely leverages a DNN as a dedicated feature extractor for a

This research is driven by a simple yet powerful question: can we combine the strengths of different algorithms to create a more potent predictor? We answer this by developing a hybrid model that pairs a Deep Neural Network with a Random Forest.

Impact Factor value: 8.315

ISO 9001:2008 Certified Journal
