1,310 research outputs found

    Credit risk prediction in an imbalanced social lending environment

    Full text link
    © 2018, the Authors. Credit risk prediction is an effective way of evaluating whether a potential borrower will repay a loan, particularly in peer-to-peer lending where class imbalance problems are prevalent. However, few credit risk prediction models for social lending consider imbalanced data and, further, the best resampling technique to use with imbalanced data is still controversial. In an attempt to address these problems, this paper presents an empirical comparison of various combinations of classifiers and resampling techniques within a novel risk assessment methodology that incorporates imbalanced data. The credit predictions from each combination are evaluated with a G-mean measure to avoid bias towards the majority class, which has not been considered in similar studies. The results reveal that combining random forest and random under-sampling may be an effective strategy for calculating the credit risk associated with loan applicants in social lending markets

    Handling Uncertainty in Social Lending Credit Risk Prediction with a Choquet Fuzzy Integral Model

    Full text link
    As one of the main business models in the financial technology field, peer-to-peer (P2P) lending has disrupted traditional financial services by providing an online platform for lending money that has remarkably reduced financial costs. However, the inherent uncertainty in P2P loans can result in huge financial losses for P2P platforms. Therefore, accurate risk prediction is critical to the success of P2P lending platforms. Indeed, even a small improvement in credit risk prediction would be of benefit to P2P lending platforms. This paper proposes an innovative credit risk prediction framework that fuses base classifiers based on a Choquet fuzzy integral. Choquet integral fusion improves creditworthiness evaluations by synthesizing the prediction results of multiple classifiers and finding the largest consistency between outcomes among conflicting and consistent results. The proposed model was validated through experimental analysis on a real- world dataset from a well-known P2P lending marketplace. The empirical results indicate that the combination of multiple classifiers based on fuzzy Choquet integrals outperforms the best base classifiers used in credit risk prediction to date. In addition, the proposed methodology is superior to some conventional combination techniques

    Bankruptcy prediction model using cost-sensitive extreme gradient boosting in the context of imbalanced datasets

    Get PDF
    In the process of bankruptcy prediction models, a class imbalanced problem has occurred which limits the performance of the models. Most prior research addressed the problem by applying resampling methods such as the synthetic minority oversampling technique (SMOTE). However, resampling methods lead to other issues, e.g., increasing noisy data and training time during the process. To improve the bankruptcy prediction model, we propose cost-sensitive extreme gradient boosting (CS-XGB) to address the class imbalanced problem without requiring any resampling method. The proposed method’s effectiveness is evaluated on six real-world datasets, i.e., the LendingClub, and five Polish companies’ bankruptcy. This research compares the performance of CS-XGB with other ensemble methods, including SMOTE-XGB which applies SMOTE to the training set before the learning process. The experimental results show that i) based on LendingClub, the CS-XGB improves the performance of XGBoost and SMOTE-XGB by more than 50% and 33% on bankruptcy detection rate (BDR) and geometric mean (GM), respectively, and ii) the CS-XGB model outperforms random forest (RF), Bagging, AdaBoost, XGBoost, and SMOTE-XGB in terms of BDR, GM, and the area under a receiver operating characteristic curve (AUC) based on the five Polish datasets. Besides, the CS-XGB model achieves good overall prediction results

    Feature selection in credit risk modeling: an international evidence

    Get PDF
    This paper aims to discover a suitable combination of contemporary feature selection techniques and robust prediction classifiers. As such, to examine the impact of the feature selection method on classifier performance, we use two Chinese and three other real-world credit scoring datasets. The utilized feature selection methods are the least absolute shrinkage and selection operator (LASSO), multivariate adaptive regression splines (MARS). In contrast, the examined classifiers are the classification and regression trees (CART), logistic regression (LR), artificial neural network (ANN), and support vector machines (SVM). Empirical findings confirm that LASSO’s feature selection method, followed by robust classifier SVM, demonstrates remarkable improvement and outperforms other competitive classifiers. Moreover, ANN also offers improved accuracy with feature selection methods; LR only can improve classification efficiency through performing feature selection via LASSO. Nonetheless, CART does not provide any indication of improvement in any combination. The proposed credit scoring modeling strategy may use to develop policy, progressive ideas, operational guidelines for effective credit risk management of lending, and other financial institutions. The finding of this study has practical value, as to date, there is no consensus about the combination of feature selection method and prediction classifiers

    Credit Risk Scoring: A Stacking Generalization Approach

    Get PDF
    Dissertation presented as the partial requirement for obtaining a Master's degree in Statistics and Information Management, specialization in Risk Analysis and ManagementCredit risk regulation has been receiving tremendous attention, as a result of the effects of the latest global financial crisis. According to the developments made in the Internal Rating Based approach, under the Basel guidelines, banks are allowed to use internal risk measures as key drivers to assess the possibility to grant a loan to an applicant. Credit scoring is a statistical approach used for evaluating potential loan applications in both financial and banking institutions. When applying for a loan, an applicant must fill out an application form detailing its characteristics (e.g., income, marital status, and loan purpose) that will serve as contributions to a credit scoring model which produces a score that is used to determine whether a loan should be granted or not. This enables faster and consistent credit approvals and the reduction of bad debt. Currently, many machine learning and statistical approaches such as logistic regression and tree-based algorithms have been used individually for credit scoring models. Newer contemporary machine learning techniques can outperform classic methods by simply combining models. This dissertation intends to be an empirical study on a publicly available bank loan dataset to study banking loan default, using ensemble-based techniques to increase model robustness and predictive power. The proposed ensemble method is based on stacking generalization an extension of various preceding studies that used different techniques to further enhance the model predictive capabilities. The results show that combining different models provides a great deal of flexibility to credit scoring models

    Basel II compliant credit risk modelling: model development for imbalanced credit scoring data sets, loss given default (LGD) and exposure at default (EAD)

    No full text
    The purpose of this thesis is to determine and to better inform industry practitioners to the most appropriate classification and regression techniques for modelling the three key credit risk components of the Basel II minimum capital requirement; probability of default (PD), loss given default (LGD), and exposure at default (EAD). The Basel II accord regulates risk and capital management requirements to ensure that a bank holds enough capital proportional to the exposed risk of its lending practices. Under the advanced internal ratings based (IRB) approach Basel II allows banks to develop their own empirical models based on historical data for each of PD, LGD and EAD.In this thesis, first the issue of imbalanced credit scoring data sets, a special case of PD modelling where the number of defaulting observations in a data set is much lower than the number of observations that do not default, is identified, and the suitability of various classification techniques are analysed and presented. As well as using traditional classification techniques this thesis also explores the suitability of gradient boosting, least square support vector machines and random forests as a form of classification. The second part of this thesis focuses on the prediction of LGD, which measures the economic loss, expressed as a percentage of the exposure, in case of default. In this thesis, various state-of-the-art regression techniques to model LGD are considered. In the final part of this thesis we investigate models for predicting the exposure at default (EAD). For off-balance-sheet items (for example credit cards) to calculate the EAD one requires the committed but unused loan amount times a credit conversion factor (CCF). Ordinary least squares (OLS), logistic and cumulative logistic regression models are analysed, as well as an OLS with Beta transformation model, with the main aim of finding the most robust and comprehensible model for the prediction of the CCF. Also a direct estimation of EAD, using an OLS model, will be analysed. All the models built and presented in this thesis have been applied to real-life data sets from major global banking institutions
    corecore