Glass Box Explainability in AI on Lung Cancer Prediction by IRJET Journal

International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 12 Issue: 03 | Mar 2025

p-ISSN: 2395-0072

www.irjet.net

Glass Box Explainability in AI on Lung Cancer Prediction

Vishakha Mistry
Head of Department, Department of Information Technology, 360 Research Foundation, Tumkaria, Bihar, India

Abstract - According to the WHO, lung cancer is the most common cause of cancer-related deaths globally, accounting for the highest death rates among both men and women. As a result, identifying, diagnosing, and predicting lung cancer at an early stage is critical. This paper investigates the feasibility of predicting lung cancer using machine learning black box models and interpreting the findings using the machine learning package InterpretML.

Gaoyang Liu and Bochao Sun [1] used EBM to predict the compressive strength of concrete. Concrete mix design data was collected from the UCI repository, and EBM, Random Forest, Decision Tree, and XGBoost models were trained on it. The authors evaluated the performance of all models and found that EBM outperformed the others.

The authors of [2] presented a case study on predicting which patients are most likely to be readmitted to the hospital within 30 days of discharge. They compared the AUC of Logistic Regression, Random Forest, and the Generalized Additive Model with pairwise interactions (GA2M), and showed that GA2M gives the best accuracy while maintaining intelligibility.
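The AUC comparison described above can be illustrated with a minimal sketch. It computes AUC as the fraction of positive/negative pairs that a model ranks correctly, with ties counted as half. The labels, the scores, and the names `model_a` and `model_b` are hypothetical and are not taken from [2].

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical readmission labels (1 = readmitted) and scores from two models.
labels = [1, 0, 1, 0, 1, 0]
model_a = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1]
model_b = [0.6, 0.5, 0.4, 0.7, 0.8, 0.3]

auc_a = roc_auc(labels, model_a)
auc_b = roc_auc(labels, model_b)
print(f"model_a AUC = {auc_a:.3f}, model_b AUC = {auc_b:.3f}")
```

The pairwise loop is quadratic in the number of samples, which is fine for a small illustration; library implementations use a sort-based computation instead.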

Key Words: Machine Learning, Lung cancer detection, explainable Artificial Intelligence (XAI), Explainable Boosting Machines (EBMs).

1. INTRODUCTION

Cancer is more than simply an illness. There are several varieties of cancer, which can occur anywhere in the body. Lung cancer is defined as the uncontrolled growth of abnormal cells in the lungs. It normally does not produce apparent symptoms until it has progressed across the lungs, which makes it more serious than many other types of cancer. The survival rate is determined by the extent of the cancer's spread. As a result, early detection can make a significant impact.

Senthil and B. Ayshwarya [3] have presented lung cancer prediction using Feed Forward Back Propagation Neural Networks with optimal features. Feature extraction is done by Particle Swarm Optimization (PSO). The authors compare the performance of KNN, SVM, Bayes Network, and the proposed NN-PSO. They report that the proposed method achieves high accuracy and that NN-PSO can be used effectively by lung cancer oncologists.

Machine Learning (ML) has widespread applications in the real world. The use of ML in healthcare is rising, and it benefits patients and professionals in various ways. ML enables us to use existing data to estimate future illness.

The authors of [4] applied an SVM classifier to a lung cancer dataset. According to their assessment results, SVM with two rounds of SMOTE resampling achieves the best performance: SVM reaches 98.8% accuracy, whereas the KNN method reaches 68.9%.
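The idea behind SMOTE resampling can be sketched as follows: each synthetic minority sample is an interpolation between an existing minority sample and one of its nearest minority neighbours. This is a simplified illustration written for this text, not the procedure used in [4]. The function name `smote_sketch`, its parameters, and the data points are all hypothetical.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority-class samples.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new_points = smote_sketch(minority, n_new=3)
print(new_points)
```

A second round of resampling, as mentioned above, would presumably apply the same step again to the already augmented data; the exact setup in [4] is not specified here.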

Though Machine Learning solves complicated problems, a model may also be a black box that does not explain why or how its decisions are made. This can cause confusion, especially when the consequences relate to human lives or healthcare applications. InterpretML meets these demands. It supports both interpretable glass box models and non-interpretable black box models. There are two basic categories of interpretability: global, which describes the overall behavior of the model, and local, which explains an individual prediction. The purpose of this research is to use the Explainable Boosting Machine (EBM) model from the InterpretML API to predict lung cancer at an early stage, highlighting both accuracy and transparency.
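The difference between global and local interpretability can be shown with a toy additive glass box model. The prediction is an intercept plus one contribution per feature, so each contribution can be read off directly. This sketch does not use the InterpretML API and nothing in it is learned from data: the shape-function values, the intercept, and the `age_over_60` feature are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical per-feature lookup tables (log-odds contributions), NOT learned from data.
SHAPE_FUNCTIONS = {
    "smoking": {0: -0.4, 1: 0.6},
    "wheezing": {0: -0.2, 1: 0.5},
    "age_over_60": {0: -0.1, 1: 0.3},
}
INTERCEPT = -0.5  # hypothetical baseline log-odds


def local_explanation(sample):
    """Local view: the contribution of each feature for one sample."""
    return {name: table[sample[name]] for name, table in SHAPE_FUNCTIONS.items()}


def predict_proba(sample):
    """Sum the intercept and all contributions, then apply the logistic function."""
    logit = INTERCEPT + sum(local_explanation(sample).values())
    return 1.0 / (1.0 + math.exp(-logit))


def global_importance(samples):
    """Global view: mean absolute contribution of each feature over all samples."""
    totals = {name: 0.0 for name in SHAPE_FUNCTIONS}
    for sample in samples:
        for name, value in local_explanation(sample).items():
            totals[name] += abs(value)
    return {name: total / len(samples) for name, total in totals.items()}


# Two hypothetical patients.
patients = [
    {"smoking": 1, "wheezing": 1, "age_over_60": 0},
    {"smoking": 0, "wheezing": 0, "age_over_60": 1},
]
contributions = local_explanation(patients[0])
probability = predict_proba(patients[0])
importance = global_importance(patients)
print(contributions, round(probability, 4), importance)
```

In an actual EBM the per-feature functions are learned by boosting rather than written by hand, but the additive structure is what lets both views be read directly from the model.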

The authors of [5] used an MRI dataset to predict Alzheimer's disease. After handling missing values and categorical data, they used Chi-square and L1 regularization to select features, and demonstrated that both strategies can produce superior results. The proposed model is compared against ResNet-50, RF Classifier, Deep CNN, and VGG16-LIME. Overall, EBM achieves 94.35% accuracy.
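Chi-square feature selection scores each feature by how strongly it depends on the class label. The sketch below computes the Pearson chi-square statistic for a binary feature against a binary label. It is a generic illustration and not necessarily the exact variant used in [5]; the function name and the toy data are hypothetical.

```python
def chi_square_2x2(feature, label):
    """Pearson chi-square statistic for a binary feature against a binary label."""
    n = len(feature)
    observed = [[0, 0], [0, 0]]  # observed[feature value][label value]
    for f, y in zip(feature, label):
        observed[f][y] += 1
    row_totals = [sum(observed[0]), sum(observed[1])]
    col_totals = [observed[0][0] + observed[1][0], observed[0][1] + observed[1][1]]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected in this cell if feature and label were independent.
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: one feature that tracks the label and one that does not.
label = [1, 1, 1, 1, 0, 0, 0, 0]
informative = [1, 1, 1, 0, 0, 0, 0, 1]
uninformative = [1, 0, 1, 0, 1, 0, 1, 0]

stat_informative = chi_square_2x2(informative, label)
stat_uninformative = chi_square_2x2(uninformative, label)
print(stat_informative, stat_uninformative)
```

A higher statistic indicates stronger dependence on the label, so features are typically ranked by this score and the top-ranked ones are kept.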

2. LITERATURE SURVEY

Various researchers have applied machine learning to predict lung cancer, but many have not employed interpretability techniques with an Explainable Boosting Machine. Here, I review a variety of research publications on lung cancer prediction and on interpretable machine learning.

3. LUNG CANCER DATASET DESCRIPTION AND PREPROCESSING

Impact Factor value: 8.315

The data was obtained from the Kaggle website. The dataset has 309 instances and 16 attributes such as Gender, Age, Smoking, Yellow fingers, Anxiety, Peer pressure, Chronic Disease, Fatigue, Allergy, Wheezing, Alcohol, Coughing, Lung Cancer, Chest pain, Swallowing Difficulty, Shortness of

ISO 9001:2008 Certified Journal

Page 642