Predicting the Loan Default using Logistic Regression Model

Abstract

Under the direction of Dr. Giancarlo Schrementi

Predicting loan default is an important problem for managing risk at banks. Banks have been key players in the lending market since the Industrial Revolution, and they have traditionally relied on collateral to minimize their risk. Loan default matters to banks because it can lead to a bank’s insolvency and can have a broader impact on the economy. Hence, managing the risk of loan default is important for promoting financial stability and economic growth. Previous studies in this field have predicted the probability of loan default using logistic regression, machine learning models, and Python-based models. This study examines the use of a logistic regression model to predict the probability that a customer defaults on a loan. Logistic regression is a statistical method that predicts a binary outcome, such as yes or no, from prior observations in a data set. It is used here because loan default is a binary prediction problem (a loan either defaults or does not), and logistic regression is commonly used for binary prediction. The dataset is taken from Kaggle, an open dataset platform, and contains a wide assortment of features, half of them categorical and half quantitative. The class proportions are highly imbalanced, as most customers do not default. The methods include exploratory data analysis, data wrangling and cleansing, feature selection, and evaluation of the resulting model.
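The following is a minimal sketch of how a logistic regression can turn a feature into a default probability when the classes are imbalanced. It is not the study's implementation. The single feature (a hypothetical debt-to-income ratio), the toy values, the function names, the balanced class weighting, and the gradient-descent settings are all assumptions made for illustration.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, weights, bias):
    """Estimated probability of default for one feature vector x."""
    return sigmoid(bias + sum(w * xj for w, xj in zip(weights, x)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights by batch gradient descent on a class-weighted log loss.

    Assumption: each class is weighted by n / (2 * class_count), so the
    rare "default" class contributes as much to the loss as the majority.
    """
    n, d = len(X), len(X[0])
    n_pos = sum(y)
    n_neg = n - n_pos
    w_pos = n / (2.0 * n_pos)
    w_neg = n / (2.0 * n_neg)
    weights = [0.0] * d
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = predict_proba(xi, weights, bias)
            err = (w_pos if yi == 1 else w_neg) * (p - yi)
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Made-up toy data: 8 non-defaults (0) and 2 defaults (1).
X = [[0.10], [0.15], [0.20], [0.22], [0.25],
     [0.30], [0.35], [0.40], [0.80], [0.90]]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

weights, bias = fit_logistic(X, y)
low_risk = predict_proba([0.10], weights, bias)
high_risk = predict_proba([0.90], weights, bias)
```

In this sketch, `high_risk` comes out larger than `low_risk` because the fitted weight on the feature is positive. A real analysis would more likely use an established library and evaluate on held-out data.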
