283 research outputs found
Deep Generative Models for Reject Inference in Credit Scoring
Credit scoring models based on accepted applications may be biased, and this
bias can have a statistical and economic impact. Reject inference is
the process of attempting to infer the creditworthiness status of the rejected
applications. In this research, we use deep generative models to develop two
new semi-supervised Bayesian models for reject inference in credit scoring, in
which we model the data generating process to be dependent on a Gaussian
mixture. The goal is to improve the classification accuracy in credit scoring
models by adding rejected applications. Our proposed models infer the unknown
creditworthiness of the rejected applications by exact enumeration of the two
possible outcomes of the loan (default or non-default). The efficient
stochastic gradient optimization technique used in deep generative models makes
our models suitable for large data sets. Finally, the experiments in this
research show that our proposed models perform better than classical and
alternative machine learning models for reject inference in credit scoring.
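The "exact enumeration of the two possible outcomes" mentioned above can be illustrated with the unlabeled-data bound commonly used in semi-supervised deep generative models. The abstract does not state the objective, so the following is only a minimal sketch under that assumption: `unlabeled_bound`, `q_default`, `elbo_nondefault` and `elbo_default` are hypothetical names, and the per-class bounds are taken as given inputs rather than computed by a real encoder/decoder.

```python
import numpy as np

def unlabeled_bound(q_default, elbo_nondefault, elbo_default, eps=1e-12):
    """Bound for rejected (unlabeled) applications, enumerating both outcomes.

    q_default       : q(y=1 | x), the classifier's default probability, shape (n,)
    elbo_nondefault : labelled bound evaluated as if y = 0 (hypothetical input)
    elbo_default    : labelled bound evaluated as if y = 1 (hypothetical input)
    """
    # Clip so that the logarithms in the entropy term stay finite.
    q1 = np.clip(np.asarray(q_default, dtype=float), eps, 1.0 - eps)
    q0 = 1.0 - q1
    # Expectation of the labelled bound over the two possible outcomes.
    expected = q0 * np.asarray(elbo_nondefault, dtype=float) \
             + q1 * np.asarray(elbo_default, dtype=float)
    # Entropy of q(y | x); it rewards not committing to a label prematurely.
    entropy = -(q0 * np.log(q0) + q1 * np.log(q1))
    return expected + entropy
```

Because the outcome is binary, the expectation over the missing label is a two-term sum, so no sampling of the label is needed.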
Towards a Better Microcredit Decision
Reject inference comprises techniques to infer the possible repayment
behavior of rejected cases. In this paper, we model credit from a new perspective
by capturing the sequential pattern of interactions among multiple stages of
loan business to make better use of the underlying causal relationship.
Specifically, we first define three stages with sequential dependence throughout
the loan process, namely credit granting (AR), withdrawal application (WS), and
repayment commitment (GB), and integrate them into a multi-task architecture.
Within each stage, an intra-stage multi-task classification is built to meet
different business goals. Then we design an Information Corridor to express
sequential dependence, leveraging the interaction information between customer
and platform from former stages via a hierarchical attention module controlling
the content and size of the information channel. In addition, a semi-supervised
loss is introduced to deal with the unobserved instances. The proposed
multi-stage interaction sequence (MSIS) method is simple yet effective, and
experimental results on a real data set from a top loan platform in China show
that it can remedy the population bias and improve model generalization.
Low-Default Portfolio/One-Class Classification: A Literature Review
Consider a bank that must decide whether to grant credit to an applicant. The bank has to assess whether the applicant will be able to repay the credit. This is done by estimating the probability that the applicant will default prior to the maturity of the credit. To estimate this probability of default, it is first necessary to identify criteria that separate good from bad creditors, such as loan amount, age, or factors concerning the income of the applicant. The question then arises of how a bank identifies a sufficient number of selective criteria that possess the necessary discriminatory power. As a solution, many traditional binary classification methods have been proposed with varying degrees of success. However, a particular problem with credit scoring is that defaults are only observed for a small subsample of applicants, so the numbers of non-defaulters and defaulters are imbalanced. This has an adverse effect on the aforementioned binary classification methods. Recently, one-class classification approaches have been proposed to address the imbalance problem. The purpose of this literature review is threefold: (i) present the reader with an overview of credit scoring; (ii) review existing binary classification approaches; and (iii) introduce and examine one-class classification approaches.
SUPPORT OF MANAGERIAL DECISION MAKING BY TRANSDUCTIVE LEARNING
Transductive inference has been introduced as a novel paradigm towards building predictive classification models from empirical data. Such models are routinely employed to support decision making in, e.g., marketing, risk management and manufacturing. To that end, the characteristics of the new philosophy are reviewed and their implications for typical decision problems are examined. The paper's objective is to explore the potential of transductive learning for corporate planning. The analysis reveals two main factors that govern the applicability of transduction in business settings, decision scope and urgency. In a similar fashion, two major drivers for its effectiveness are identified and empirical experiments are undertaken to confirm their influence. The results evidence that transductive classifiers are well superior to their inductive counterparts if their specific application requirements are fulfilled.
Credit Scoring Using Machine Learning
For financial institutions and the economy at large, the role of credit scoring in lending decisions cannot be overemphasised. An accurate and well-performing credit scorecard allows lenders to control their risk exposure through the selective allocation of credit based on the statistical analysis of historical customer data. This thesis identifies and investigates a number of specific challenges that occur during the development of credit scorecards. Four main contributions are made in this thesis. First, we examine the performance of a number of supervised classification techniques on a collection of imbalanced credit scoring datasets. Class imbalance occurs when there are significantly fewer examples in one or more classes in a dataset compared to the remaining classes. We demonstrate that oversampling the minority class leads to no overall improvement in the best-performing classifiers. We find that, in contrast, adjusting the threshold on classifier output yields, in many cases, an improvement in classification performance. Our second contribution investigates a particularly severe form of class imbalance, which, in credit scoring, is referred to as the low-default portfolio problem. To address this issue, we compare the performance of a number of semi-supervised classification algorithms with that of logistic regression. Based on the detailed comparison of classifier performance, we conclude that both approaches merit consideration when dealing with low-default portfolios. Third, we quantify the differences in classifier performance arising from various implementations of a real-world behavioural scoring dataset. Due to commercial sensitivities surrounding the use of behavioural scoring data, very few empirical studies that directly address this topic are published.
This thesis describes the quantitative comparison of a range of dataset parameters impacting classification performance, including: (i) varying durations of historical customer behaviour for model training; (ii) different lengths of time from which a borrower’s class label is defined; and (iii) using alternative approaches to define a customer’s default status in behavioural scoring. Finally, this thesis demonstrates how artificial data may be used to overcome the difficulties associated with obtaining and using real-world data. The limitations of artificial data, in terms of its usefulness in evaluating classification performance, are also highlighted. In this work, we are interested in generating artificial data, for credit scoring, in the absence of any available real-world data.
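The threshold adjustment described in the thesis abstract above can be illustrated as follows. The abstract does not say which criterion or search procedure was used, so this is only a minimal sketch under assumptions: `best_threshold` is a hypothetical helper, balanced accuracy is an assumed selection metric, and the candidate grid is arbitrary.

```python
import numpy as np

def best_threshold(y_true, scores, grid=None):
    """Pick the score cut-off that maximises balanced accuracy.

    y_true : 1 for default (minority class), 0 for non-default
    scores : classifier outputs in [0, 1], higher means more likely to default
    grid   : candidate thresholds (assumed; defaults to 0.05, 0.10, ..., 0.95)
    """
    y_true = np.asarray(y_true).astype(bool)
    scores = np.asarray(scores, dtype=float)
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)
    best_t, best_bal = 0.5, -1.0
    for t in grid:
        pred = scores >= t
        # True positive rate and true negative rate at this cut-off.
        tpr = (pred & y_true).sum() / max(y_true.sum(), 1)
        tnr = (~pred & ~y_true).sum() / max((~y_true).sum(), 1)
        bal = 0.5 * (tpr + tnr)
        if bal > best_bal:  # strict comparison keeps the lowest best threshold
            best_t, best_bal = float(t), float(bal)
    return best_t, best_bal
```

In practice the threshold would be chosen on held-out validation data rather than on the data used to report performance; with few defaulters, scores rarely reach the default 0.5 cut-off, which is why moving it can help without resampling.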
Reject Inference Methods in Credit Scoring
The granting process of all credit institutions is based on the probability that the applicant will repay his/her loan given his/her characteristics. This probability, also called the score, is learnt from a dataset in which rejected applicants are de facto excluded. This implies that the population on which the score is used will be different from the learning population. Thus, this biased learning can have consequences on the scorecard's relevance. Many methods dubbed "reject inference" have been developed in order to try to exploit the data available from the rejected applicants to build the score. However, most of these methods are considered from an empirical point of view, and there is some lack of formalization of the assumptions that are really made, and of the theoretical properties that can be expected. In order to propose a formalization of such usually hidden assumptions for some of the most common reject inference methods, we rely on the general missing data modelling paradigm. It reveals that hidden modelling is mostly incomplete, which prevents comparing existing methods within the general model selection mechanism (except by financing "non-fundable" applicants, which is rarely performed in practice). So, we are reduced to empirically assessing the performance of the methods in some controlled situations involving both some simulated data and some real data (from Crédit Agricole Consumer Finance (CACF), a major European loan issuer). Unsurprisingly, no method seems uniformly dominant. Both these theoretical and empirical results reinforce the idea that the classical reject inference methods should be used with care, and that future research should invest in designing model-based reject inference methods, which allow rigorous selection methods (without financing "non-fundable" applicants).
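One way to read the missing data framing in the abstract above, in notation that is assumed here rather than taken from the paper: let x be the applicant's characteristics, y the repayment outcome, and z an indicator of whether the application was financed (f) or not (nf), with y observed only when z = f. The two standard missingness regimes then read:

```latex
% Assumed notation: x = characteristics, y = repayment outcome,
% z = financing indicator (f = financed, nf = not financed).
\begin{align*}
\text{MAR:}  \quad & p(z \mid x, y) = p(z \mid x)
  \;\Longrightarrow\; p(y \mid x, z = \mathrm{f}) = p(y \mid x), \\
\text{MNAR:} \quad & p(z \mid x, y) \neq p(z \mid x)
  \;\Longrightarrow\; p(y \mid x, z = \mathrm{f}) \neq p(y \mid x) \text{ in general}.
\end{align*}
```

The first implication follows from Bayes' rule, since p(y | x, z) = p(z | x, y) p(y | x) / p(z | x). Under the first regime, a well-specified score fitted on financed applicants alone targets the same conditional distribution as in the full population; under the second it does not, and any correction has to rest on additional assumptions about the financing decision.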