Supervised and unsupervised data mining approaches in loan default prediction

Abstract

Given the importance of data mining in organizations and the possible contribution of a data-driven customer classification recommender system for loan-extending financial institutions, this study applied supervised and unsupervised data mining approaches to derive the best classifier of loan default. A total of 900 instances with determined attributes and class labels were used for training and cross-validation, while prediction used 100 new instances without class labels. In the training phase, J48 with a confidence factor of 50% attained the highest classification accuracy among the J48 variants (76.85%), k-nearest neighbors with k = 3 (k-NN 3) attained the highest among the IBk variants (78.38%), naïve Bayes attained 76.65%, and logistic regression attained 77.31%. Overall, k-NN 3 and logistic regression had the highest classification accuracies, F-measures, and kappa statistics. Applying these two classifiers to the 100 unlabeled instances yielded 48 non-defaulters and 52 defaulters under k-NN 3, and 44 non-defaulters and 56 defaulters under logistic regression. Implications are discussed in the paper.
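
For readers unfamiliar with the three evaluation measures named above, the following minimal sketch shows how classification accuracy, the F-measure, and the kappa statistic can be computed from a binary confusion matrix. It is an illustration only, not the study's procedure: the function name `binary_metrics` and the confusion-matrix counts are hypothetical and are not taken from the paper's results, and the sketch assumes that "defaulter" is treated as the positive class and that no denominator is zero.

```python
def binary_metrics(tp, fp, fn, tn):
    """Return (accuracy, F-measure, Cohen's kappa) for a 2x2 confusion matrix.

    tp/fp/fn/tn are counts of true positives, false positives,
    false negatives, and true negatives. Assumes non-zero denominators.
    """
    total = tp + fp + fn + tn

    # Accuracy: share of instances classified correctly.
    accuracy = (tp + tn) / total

    # F-measure: harmonic mean of precision and recall for the positive class.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)

    # Kappa: agreement beyond what chance alone would produce.
    # Chance agreement is the sum, over both classes, of
    # (actual class share) * (predicted class share).
    p_chance_pos = ((tp + fn) / total) * ((tp + fp) / total)
    p_chance_neg = ((fp + tn) / total) * ((fn + tn) / total)
    p_chance = p_chance_pos + p_chance_neg
    kappa = (accuracy - p_chance) / (1 - p_chance)

    return accuracy, f_measure, kappa


# Hypothetical counts chosen for easy arithmetic; NOT data from the study.
acc, f1, kappa = binary_metrics(tp=40, fp=10, fn=10, tn=40)
print(f"accuracy={acc:.2f} F-measure={f1:.2f} kappa={kappa:.2f}")
# prints: accuracy=0.80 F-measure=0.80 kappa=0.60
```

Kappa is reported alongside accuracy because it discounts agreement that would occur by chance, which matters when the defaulter and non-defaulter classes are unevenly represented.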
