48,202 research outputs found

    Default or profit scoring credit systems? Evidence from European and US peer-to-peer lending markets

    For the emerging peer-to-peer (P2P) lending markets to survive, they need to employ credit-risk management practices such that an investor base is profitable in the long run. Traditionally, credit-risk management relies on credit scoring that predicts loans’ probability of default. In this paper, we use a profit scoring approach that is based on modeling the annualized adjusted internal rate of return of loans. To validate our profit scoring models against traditional credit scoring models, we use data from a European P2P lending market, Bondora, and also a random sample of loans from the Lending Club P2P lending market. We compare the out-of-sample accuracy and profitability of the credit and profit scoring models within several classes of statistical and machine learning models, including logistic and linear regression, lasso, ridge, elastic net, random forest, and neural networks. We found that our approach outperforms standard credit scoring models for Lending Club and Bondora loans. More specifically, compared with credit scoring models, returns across all loans are 24.0% (Bondora) and 15.5% (Lending Club) higher, and accuracy is 6.7% (Bondora) and 3.1% (Lending Club) higher, for the proposed profit scoring models. Moreover, our results are not driven by manual selection, as profit scoring models suggest investing in more loans. Finally, even if we consider data sampling bias, we found that the set of superior models consists almost exclusively of profit scoring models. Thus, our results contribute to the literature by suggesting a paradigm shift in modeling credit risk in the P2P market: preferring profit scoring over credit-risk scoring models.
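As a rough illustration of the quantity a profit scoring model targets, the sketch below computes a plain annualized internal rate of return from a loan's periodic cash flows by bisection. It is not the paper's method: the abstract does not say how its "adjusted" rate is defined, so no adjustment is applied here, and the function names, the bracketing interval and the monthly default are all assumptions made for illustration.

```python
def npv(rate, cash_flows):
    """Net present value of periodic cash flows; cash_flows[0] occurs at t = 0."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


def periodic_irr(cash_flows, lo=-0.99, hi=1.0, tol=1e-10):
    """Per-period rate at which NPV is zero, found by bisection on [lo, hi].

    Assumes (hypothetically) that the NPV changes sign exactly once on the
    interval, which holds for an outflow followed only by inflows.
    """
    f_lo, f_hi = npv(lo, cash_flows), npv(hi, cash_flows)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change in NPV on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = npv(mid, cash_flows)
        if f_lo * f_mid <= 0:
            hi = mid                # root lies in the lower half
        else:
            lo, f_lo = mid, f_mid   # root lies in the upper half
    return (lo + hi) / 2.0


def annualized_irr(cash_flows, periods_per_year=12):
    """Compound the per-period IRR to an annual rate."""
    return (1.0 + periodic_irr(cash_flows)) ** periods_per_year - 1.0


# Illustrative loan: 1000 lent, then three monthly repayments of 350.
example_return = annualized_irr([-1000.0, 350.0, 350.0, 350.0])
```

A loan that defaults partway through simply has smaller or zero later cash flows, which pushes the rate down and possibly below zero. A default probability alone does not capture how much was repaid before default.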

    Effect of training set selection when predicting defaulter SMEs with unbalanced data

    We focus on credit scoring methods to separate defaulter small and medium enterprises from non-defaulter ones. In this framework, a typical problem occurs because the proportion of defaulter firms is very close to zero, leading to a class imbalance problem. Moreover, a form of bias may affect the classification. In fact, classification models are usually based on balance sheet items of large corporations, which are not randomly selected. We investigate how different criteria of sample selection may affect the accuracy of the classification and how this problem is strongly related to the imbalance of the classes.

    Deep Generative Models for Reject Inference in Credit Scoring

    Credit scoring models based on accepted applications may be biased and their consequences can have a statistical and economic impact. Reject inference is the process of attempting to infer the creditworthiness status of the rejected applications. In this research, we use deep generative models to develop two new semi-supervised Bayesian models for reject inference in credit scoring, in which we model the data generating process to be dependent on a Gaussian mixture. The goal is to improve the classification accuracy in credit scoring models by adding rejected applications. Our proposed models infer the unknown creditworthiness of the rejected applications by exact enumeration of the two possible outcomes of the loan (default or non-default). The efficient stochastic gradient optimization technique used in deep generative models makes our models suitable for large data sets. Finally, the experiments in this research show that our proposed models perform better than classical and alternative machine learning models for reject inference in credit scoring.

    Reject Inference, Augmentation and Sample Selection

    Pre-Purchase Counseling Impacts on Mortgage Performance: Empirical Analysis of NeighborWorks America's Experience

    NeighborWorks America's (NeighborWorks) nationwide network of affiliates offers pre-purchase homebuyer counseling and education throughout the country. Using information on about 75,000 loans originated between October 2007 and September 2009, Neil Mayer and Associates together with Experian analyzed the impact of pre-purchase counseling and education, provided by NeighborWorks' network, on the performance of counseled borrowers' mortgages. The study compares mortgage performance for counseled buyers over the two years after the mortgages are originated with the mortgage performance of borrowers who receive no such services. The study's findings show that pre-purchase counseling and education works: clients receiving pre-purchase counseling and education from NeighborWorks organizations are one-third less likely to become 90+ days delinquent over the two years after receiving their loan than are borrowers who do not receive pre-purchase counseling from NeighborWorks organizations. The finding is consistent across years of loan origin, even as the mortgage market changed in a period of financial crisis. It applies equally to first-time homebuyers and repeat buyers.

    APPLICATION OF RECURSIVE PARTITIONING TO AGRICULTURAL CREDIT SCORING

    Recursive Partitioning Algorithm (RPA) is introduced as a technique for credit scoring analysis, which allows direct incorporation of misclassification costs. This study corroborates nonagricultural credit studies, which indicate that RPA outperforms logistic regression based on within-sample observations. However, validation based on more appropriate out-of-sample observations indicates that logistic regression is superior under some conditions. Incorporation of misclassification costs can influence the creditworthiness decision.
    Keywords: finance, credit scoring, misclassification, recursive partitioning algorithm, Agricultural Finance
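To make concrete how misclassification costs can change a creditworthiness decision, here is a minimal sketch of a generic expected-cost rule. It is not the Recursive Partitioning Algorithm described in the abstract. The function names and the cost values are hypothetical, and the default probability is assumed to come from some already fitted scoring model.

```python
def expected_cost_decision(p_default, cost_bad_accept, cost_good_reject):
    """Pick the action with the lower expected misclassification cost.

    p_default        : estimated probability that the applicant defaults
    cost_bad_accept  : cost of accepting an applicant who later defaults
    cost_good_reject : cost of rejecting an applicant who would have repaid
    """
    cost_if_accept = p_default * cost_bad_accept
    cost_if_reject = (1.0 - p_default) * cost_good_reject
    return "accept" if cost_if_accept <= cost_if_reject else "reject"


def cost_threshold(cost_bad_accept, cost_good_reject):
    """Default probability above which rejecting becomes the cheaper action."""
    return cost_good_reject / (cost_bad_accept + cost_good_reject)


# Same applicant, two hypothetical cost structures.
equal_costs = expected_cost_decision(0.3, cost_bad_accept=1.0, cost_good_reject=1.0)
skewed_costs = expected_cost_decision(0.3, cost_bad_accept=5.0, cost_good_reject=1.0)
```

With equal costs the cut-off sits at a default probability of 0.5. Making a bad acceptance five times as costly lowers it to 1/6, so the same applicant moves from acceptance to rejection.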

    Does segmentation always improve model performance in credit scoring?

    Credit scoring allows for the credit risk assessment of bank customers. A single scoring model (scorecard) can be developed for the entire customer population, e.g. using logistic regression. However, it is often expected that segmentation, i.e. dividing the population into several groups and building separate scorecards for them, will improve the model performance. The most common statistical methods for segmentation are the two-step approaches, where logistic regression follows Classification and Regression Trees (CART), Chi-squared Automatic Interaction Detection (CHAID) trees, etc. In this research, the two-step approaches are applied as well as a new, simultaneous method, in which both segmentation and scorecards are optimised at the same time: Logistic Trees with Unbiased Selection (LOTUS). For reference purposes, a single-scorecard model is used. The above-mentioned methods are applied to the data provided by two of the major UK banks and one of the European credit bureaus. The model performance measures are then compared to examine whether there is improvement due to the segmentation methods used. It is found that segmentation does not always improve model performance in credit scoring: for none of the analysed real-world datasets do the multi-scorecard models perform considerably better than the single-scorecard ones. Moreover, in this application, there is no difference in performance between the two-step and simultaneous approaches.

    Learning From Labeled And Unlabeled Data: An Empirical Study Across Techniques And Domains

    There has been increased interest in devising learning techniques that combine unlabeled data with labeled data, i.e. semi-supervised learning. However, to the best of our knowledge, no study has been performed across various techniques and different types and amounts of labeled and unlabeled data. Moreover, most of the published work on semi-supervised learning techniques assumes that the labeled and unlabeled data come from the same distribution. It is possible for the labeling process to be associated with a selection bias such that the distributions of data points in the labeled and unlabeled sets are different. Not correcting for such bias can result in biased function approximation with potentially poor performance. In this paper, we present an empirical study of various semi-supervised learning techniques on a variety of datasets. We attempt to answer various questions such as the effect of independence or relevance amongst features, the effect of the size of the labeled and unlabeled sets and the effect of noise. We also investigate the impact of sample-selection bias on the semi-supervised learning techniques under study and implement a bivariate probit technique particularly designed to correct for such bias.

    Does reject inference really improve the performance of application scoring models?

    The voter model on Z^d is a particle system that serves as a rough model for changes of opinions among social agents or, alternatively, competition between biological species occupying space. When d ≥ 3, the set of (extremal) stationary distributions is a family of measures μ_α, for α between 0 and 1. A configuration sampled from μ_α is a strongly correlated field of 0's and 1's on Z^d in which the density of 1's is α. We consider such a configuration as a site percolation model on Z^d. We prove that if d ≥ 5, the probability of existence of an infinite percolation cluster of 1's exhibits a phase transition in α. If the voter model is allowed to have sufficiently spread-out interactions, we prove the same result for d ≥ 3.