A Survey on Loan Default Prediction using Machine Learning Techniques by IRJET Journal

International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 11 Issue: 11 | Nov 2024

p-ISSN: 2395-0072

www.irjet.net

A Survey on Loan Default Prediction using Machine Learning Techniques

Adwait Mandge1, Rohan Fatehchandka2, Kunal Goudani3, Tanaya Shelke4, Prof. Pramila M. Chawan5

1B. Tech Student, Dept of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra, India

2B. Tech Student, Dept of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra, India

3B. Tech Student, Dept of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra, India

4B. Tech Student, Dept of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra, India

5Associate Professor, Dept of Computer Engineering and IT, VJTI College, Mumbai, Maharashtra, India

---------------------------------------------------------------------***---------------------------------------------------------------------

Abstract - Loan default prediction is a critical challenge in the banking sector, where inaccurate assessments can lead to significant financial losses. Recent advances in machine learning, particularly in ensemble methods and deep learning, present valuable opportunities for automating and improving loan default predictions. This paper presents a novel Weighted Voting Ensemble approach that trains multiple models (Random Forest, XGBoost, and Neural Networks) and assigns weights to their predictions based on performance metrics. Neural networks capture complex, high-dimensional patterns, while traditional models like Random Forest and XGBoost handle simpler but crucial features. This hybrid method enhances prediction robustness, optimizing loan default predictions by leveraging the strengths of each model. Additionally, data imbalance is addressed using SMOTE to improve model performance.

Key Words: Machine Learning, Deep Learning, Ensemble, Loan Default Prediction

1. INTRODUCTION

In today's financial landscape, loan default is a significant risk that banks and lending institutions constantly face. The challenge of ensuring loan repayment while minimizing default rates is crucial for maintaining profitability and operational stability. Traditionally, credit assessments were performed manually, requiring extensive human intervention and time, which often resulted in inefficiencies and inaccuracies. As the volume of loan applications increased, it became evident that manual methods could no longer meet the needs of modern financial systems, particularly in dealing with large datasets.

The prevalence of Machine Learning has made it easier for banks to access advanced technologies that can automate and improve credit risk management. Machine learning models can analyze data in bulk and recognize patterns that are difficult for humans to detect through manual analysis. By leveraging these models, financial institutions can not only predict loan defaults with greater precision but also significantly reduce the resources required for credit assessments.

This study aims to examine the effectiveness of several algorithms in predicting loan defaults. Using a dataset sourced from Kaggle, we compare the performance of individual and ensemble models, with a particular focus on addressing the challenge of data imbalance. The objective is to determine which model provides the highest accuracy and reliability for lenders, thereby supporting informed decisions and loss prevention.

2. LITERATURE REVIEW

2.1 Machine Learning Algorithms

Logistic Regression estimates the probability that a given input belongs to a specific class. The output is a score in the range [0,1], which is then compared against a threshold to produce a binary forecast (yes/no, spam/not spam, etc.).

Naive Bayes is a classification algorithm derived from Bayes' Theorem that assumes the attributes are conditionally independent given the class label. It often works well even when this assumption is violated in practice, especially with high-dimensional data. The advantages of this algorithm are its speed, simplicity, and effectiveness, particularly for text classification and large datasets.

Decision Tree is a rule-based technique used for classification and regression. The data points are divided based on feature values: each internal node represents a test on an attribute, and each leaf node represents a final outcome. The algorithm is easy to interpret, supports both categorical and numerical data, and does not require data scaling. It can handle missing values and is easy to visualize, which gives insight into feature importance.
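To make two of the classifiers from Section 2.1 concrete, the following minimal sketch shows how a Logistic Regression score is thresholded into a binary forecast and how a Decision Tree routes one application from the root to a leaf. It is illustrative only: the feature names (`credit_score`, `debt_to_income`), the weights, the bias, and the hand-written tree are hypothetical values chosen for the example and are not taken from the dataset or models discussed in this paper.

```python
import math

def logistic_score(features, weights, bias):
    """Return sigmoid(w.x + b), a score strictly between 0 and 1."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def logistic_predict(features, weights, bias, threshold=0.5):
    """Compare the score with a threshold to get a binary forecast (1 = default)."""
    return int(logistic_score(features, weights, bias) >= threshold)

# Hypothetical tree: internal nodes test one feature, leaves hold the outcome.
TREE = {
    "feature": "credit_score", "threshold": 650,
    "left": {   # taken when credit_score < 650
        "feature": "debt_to_income", "threshold": 0.4,
        "left": {"label": 0},    # low debt burden: no default predicted
        "right": {"label": 1},   # high debt burden: default predicted
    },
    "right": {"label": 0},       # taken when credit_score >= 650
}

def tree_predict(node, features):
    """Walk from the root to a leaf, following the test at each node."""
    while "label" not in node:
        branch = "left" if features[node["feature"]] < node["threshold"] else "right"
        node = node[branch]
    return node["label"]

# Hypothetical applicant and model parameters (assumptions for illustration).
APPLICANT = {"credit_score": 610, "debt_to_income": 0.55}
WEIGHTS = {"credit_score": -0.01, "debt_to_income": 4.0}
BIAS = 4.0

if __name__ == "__main__":
    print("logistic score:", round(logistic_score(APPLICANT, WEIGHTS, BIAS), 3))
    print("logistic forecast:", logistic_predict(APPLICANT, WEIGHTS, BIAS))
    print("tree forecast:", tree_predict(TREE, APPLICANT))
```

Raising the `threshold` argument makes the logistic model more conservative about flagging a default, which is one simple lever a lender could adjust when missed defaults and wrongly rejected applications carry different costs.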

Impact Factor value: 8.315

Random Forest is an ensemble learning technique used for both regression and classification problems. It builds several decision trees during training and aggregates their outcomes to increase prediction accuracy.
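The build-then-aggregate idea behind Random Forest can be sketched in plain Python. This is a deliberately simplified illustration, not the implementation evaluated in the surveyed work: each "tree" is a depth-1 decision stump, every stump is trained on a bootstrap sample with a random feature subset, and the forest predicts by majority vote. The toy rows (`credit_score`, `debt_to_income`) and labels are hypothetical.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_indices):
    """Pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in feature_indices:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            if not left or not right:
                continue  # not a real split
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority label
        majority = Counter(labels).most_common(1)[0][0]
        return (0, float("-inf"), majority, majority)
    return best[1:]

def stump_predict(stump, row):
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] < t else r_lab

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    k = max(1, int(n_features ** 0.5))  # common heuristic: sqrt(number of features)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feats = rng.sample(range(n_features), k)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest

def forest_predict(forest, row):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(stump_predict(s, row) for s in forest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: [credit_score, debt_to_income], label 1 = default.
ROWS = [[580, 0.60], [600, 0.50], [620, 0.55], [640, 0.45],
        [700, 0.20], [720, 0.30], [750, 0.25], [780, 0.10]]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    forest = fit_forest(ROWS, LABELS)
    print("risky applicant:", forest_predict(forest, [560, 0.70]))
    print("safe applicant:", forest_predict(forest, [800, 0.05]))
```

Production libraries grow much deeper trees and resample the candidate features at every split; the stump version keeps only the two ingredients named in the text, namely many trees and aggregated outcomes.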

ISO 9001:2008 Certified Journal

Page 145