
International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 13 Issue: 01 | Jan 2026

p-ISSN: 2395-0072

www.irjet.net

Liver Disease Prediction Using Optimized Feature Selection and Data Balancing Techniques

Mr. B. Upender¹, Bugga Venu Kumar², Dendi Sai Prakash Reddy³, Allapuram Dheeraj Reddy⁴, Katiki Sandhya⁵

¹Assistant Professor, Department of Information Technology, TKR College of Engineering and Technology, Telangana, India

²,³,⁴,⁵Department of Information Technology, TKR College of Engineering and Technology, Telangana, India

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract - The liver is an essential organ in the human body that performs detoxification and metabolism. Liver disorders can be caused by various factors, including viral infections, genetic diseases, high alcohol consumption, and toxins. Symptoms vary from one individual to another; hence, early detection is not always achievable. Delayed diagnosis makes treatment difficult and can lead to severe health issues. Thus, early prediction of liver disorders is a critical requirement. Most current systems employ basic preprocessing with standard machine learning models such as Decision Trees, SVM, and Random Forest. However, imbalanced data, noisy records, and limited feature selection reduce their predictive accuracy. The proposed system addresses these challenges with advanced preprocessing, optimized feature selection using RFE, and hybrid data balancing using SMOTE-ENN. It uses powerful boosting algorithms such as LightGBM to improve the accuracy and robustness of the prediction. In addition, SHAP provides interpretability, making the model more reliable and clinically useful. The enhanced approach aims at faster, more accurate, and more efficient liver disease prediction.

Key Words: Liver Disease Prediction, Machine Learning, LightGBM, Recursive Feature Elimination (RFE), SMOTE-ENN, Explainable AI (XAI), SHAP.

1. INTRODUCTION

Liver disease is a serious global health concern; millions of people are affected every year. The liver is a vital organ that plays a crucial role in metabolism, detoxification, and nutrient regulation. Therefore, any damage to the liver can cause life-threatening complications. Detecting liver disease early is important, yet in the initial stages the symptoms are often not noticeable, which makes it difficult for medical professionals to provide an accurate diagnosis.

Machine learning has recently been recognized as a promising approach to medical diagnosis. However, in real-world medical applications, including liver disease, the class distribution is imbalanced and the data contain missing values, noise, and irrelevant features, which makes traditional machine learning models less reliable. Many existing models, such as Decision Trees, SVM, Logistic Regression, and Random Forest, are not able to provide accurate results when dealing with complex data.

To overcome these limitations, this study proposes an enhanced liver disease prediction system that combines state-of-the-art machine learning algorithms with optimized data processing. The proposed system uses Recursive Feature Elimination (RFE) to select the most important clinical features and balances the data using the hybrid SMOTE-ENN approach, which also removes noisy samples. Moreover, efficient boosting algorithms such as LightGBM are used to achieve high prediction accuracy, training speed, and robustness. To support clinical validity, Explainable AI (XAI) techniques such as SHAP are used to provide clear and interpretable explanations of the model's predictions. By leveraging optimized feature selection, hybrid data balancing, and high-performance machine learning algorithms, this system is expected to provide a more accurate, reliable, and interpretable solution for early liver disease prediction, which can help healthcare professionals make more informed decisions.

1.1 Machine Learning-Based Liver Disease Prediction

Machine learning-based liver disease prediction processes clinical and laboratory information to detect liver diseases at an early stage. Liver diseases are hard to diagnose because of their diverse symptoms, imbalanced patient information, noisy medical records, and the presence of irrelevant features. These issues make traditional diagnosis and statistical analysis less effective.

In machine learning-based prediction models, complex patterns are learned from patient data and predictions are made for new patients. Boosting models such as LightGBM are well suited to medical data because they handle non-linear patterns, missing values, and high-dimensional features efficiently. These models are typically faster to train and often more accurate than traditional classifiers.
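The class imbalance and noisy records described in this section are what the SMOTE-ENN balancing step is meant to address. As a rough illustration of the idea (not the authors' implementation, and not the imbalanced-learn library code), the sketch below uses only the Python standard library. SMOTE creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours; Edited Nearest Neighbours (ENN) then drops samples whose label disagrees with their neighbourhood. The function names `k_nearest`, `smote`, and `enn`, the majority-vote rule in ENN, and the toy data are all assumptions made for this example.

```python
import math
import random
from collections import Counter


def k_nearest(idx, X, k, pool):
    """Indices from `pool` of the k points closest to X[idx], excluding idx."""
    others = [j for j in pool if j != idx]
    others.sort(key=lambda j: math.dist(X[idx], X[j]))
    return others[:k]


def smote(X, y, minority, k=3, seed=0):
    """Oversample `minority` until it matches the largest class.

    Each synthetic point lies on the segment between a minority sample
    and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    X_new, y_new = list(X), list(y)
    for _ in range(max(counts.values()) - counts[minority]):
        i = rng.choice(min_idx)
        j = rng.choice(k_nearest(i, X, k, min_idx))
        gap = rng.random()  # position along the segment, in [0, 1)
        X_new.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
        y_new.append(minority)
    return X_new, y_new


def enn(X, y, k=3):
    """Simplified ENN: keep a sample only if the majority label among
    its k nearest neighbours matches its own label."""
    everyone = range(len(X))
    keep = []
    for i in everyone:
        votes = Counter(y[j] for j in k_nearest(i, X, k, everyone))
        if votes.most_common(1)[0][0] == y[i]:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy 2-D data (hypothetical values, not clinical measurements):
# six majority samples (label 0), one noisy label-0 sample sitting
# inside the minority region, and two minority samples (label 1).
X = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.7), (1.3, 1.1),
     (4.1, 3.9),
     (4.0, 4.0), (4.2, 3.8)]
y = [0, 0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = smote(X, y, minority=1, k=1)
X_clean, y_clean = enn(X_bal, y_bal, k=3)

print(sorted(Counter(y_bal).items()))    # [(0, 7), (1, 7)]
print(sorted(Counter(y_clean).items()))  # [(0, 6), (1, 7)]
```

In this toy run, SMOTE brings both classes to seven samples, and ENN then removes the single label-0 point that is surrounded by minority samples. A real pipeline would more likely rely on a maintained library implementation and would apply the balancing step to the training split only.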

Impact Factor value: 8.315

To make predictions more reliable, optimized feature selection methods like Recursive Feature Elimination (RFE) are used to detect the most important clinical features that

ISO 9001:2008 Certified Journal
