Towards a Better Microcredit Decision
Reject inference comprises techniques to infer the possible repayment
behavior of rejected cases. In this paper, we model credit from a new
perspective by capturing the sequential pattern of interactions among multiple
stages of the loan business to make better use of the underlying causal
relationships. Specifically, we first define three stages with sequential
dependence throughout the loan process, namely credit granting (AR), withdrawal
application (WS), and repayment commitment (GB), and integrate them into a
multi-task architecture. Within each stage, an intra-stage multi-task
classification is built to meet different business goals. We then design an
Information Corridor to express the sequential dependence, leveraging the
interaction information between customer and platform from earlier stages via a
hierarchical attention module that controls the content and size of the
information channel. In addition, a semi-supervised loss is introduced to deal
with unobserved instances. The proposed multi-stage interaction sequence (MSIS)
method is simple yet effective, and experimental results on a real data set
from a top loan platform in China show its ability to remedy population bias
and improve model generalization.
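The semi-supervised handling of unobserved stages can be sketched as a masked loss: later-stage labels simply do not contribute when a customer never reached that stage. The stage layout (AR, WS, GB) follows the abstract, but the masking scheme and numbers below are illustrative, not the paper's actual MSIS loss.

```python
import numpy as np

def masked_stage_loss(probs, labels, observed):
    """Binary cross-entropy averaged over observed stage labels only.

    probs, labels, observed: arrays of shape (n_samples, n_stages), where
    observed[i, s] = 1 if stage s was actually reached for sample i.
    """
    eps = 1e-12
    ce = -(labels * np.log(probs + eps) + (1 - labels) * np.log(1 - probs + eps))
    return (ce * observed).sum() / observed.sum()

# Toy example with 3 stages (AR, WS, GB); for a customer rejected at the
# credit-granting stage, the WS and GB outcomes are never observed.
probs = np.array([[0.9, 0.8, 0.7],
                  [0.2, 0.5, 0.5]])
labels = np.array([[1, 1, 1],
                   [0, 0, 0]])
observed = np.array([[1, 1, 1],
                     [1, 0, 0]])   # rejected at AR: WS/GB masked out
loss = masked_stage_loss(probs, labels, observed)
```

Because masked entries carry zero weight, changing an unobserved label leaves the loss unchanged, which is exactly what lets the model train on rejected applicants without inventing their repayment outcomes.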
Default Predictors and Credit Scoring Models for Retail Banking
This paper develops a specification of a credit scoring model with high discriminatory power to analyze data on loans in the retail banking market. Parametric and non-parametric approaches are employed to produce three models using logistic regression (parametric) and one model using Classification and Regression Trees (CART, non-parametric). The models are compared in terms of efficiency and power to discriminate between low- and high-risk clients using data from a new European Union economy. We are able to detect the most important characteristics of default behavior: the amount of resources the client has, the level of education, marital status, the purpose of the loan, and the number of years the client has had an account with the bank. Both methods are robust: they identify similar variables as determinants. We therefore show that parametric as well as non-parametric methods can produce successful models. We obtain similar results even when excluding a key financial variable (amount of own resources). The policy conclusion is that socio-demographic variables are important in the process of granting credit and therefore should not be excluded from credit scoring model specification. Keywords: credit scoring, discrimination analysis, banking sector, pattern recognition, retail loans, CART, European Union
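The parametric-versus-non-parametric comparison described above can be sketched with off-the-shelf estimators. The synthetic data and feature names below are illustrative stand-ins for the paper's retail-loan variables, not the actual data set.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
# Features loosely mirroring the paper's predictors: own resources,
# education level, years with the bank (purely illustrative).
X = np.column_stack([
    rng.exponential(10.0, n),        # amount of own resources
    rng.integers(0, 4, n),           # level of education
    rng.integers(0, 30, n),          # years with an account
]).astype(float)
logit = -1.5 + 0.08 * X[:, 0] + 0.4 * X[:, 1] + 0.05 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)  # 1 = low risk

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
models = {
    "logistic (parametric)": LogisticRegression(max_iter=1000),
    "CART (non-parametric)": DecisionTreeClassifier(max_depth=4, random_state=0),
}
# Compare discriminatory power on held-out data, as in the paper.
aucs = {name: roc_auc_score(y_te, m.fit(X_tr, y_tr).predict_proba(X_te)[:, 1])
        for name, m in models.items()}
```

On data with a genuinely informative signal, both model families reach similar discriminatory power, which is the paper's robustness point: the two approaches recover the same determinants.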
Generic machine learning inference on heterogeneous treatment effects in randomized experiments
We propose strategies to estimate and make inference on key features of heterogeneous effects in randomized experiments. These key features include best linear predictors of the effects using machine learning proxies, average effects sorted by impact groups, and average characteristics of the most and least impacted units. The approach is valid in high-dimensional settings, where the effects are proxied by machine learning methods. We post-process these proxies into estimates of the key features. Our approach is generic: it can be used in conjunction with penalized methods, deep and shallow neural networks, canonical and new random forests, boosted trees, and ensemble methods. Our approach is agnostic and does not make unrealistic or hard-to-check assumptions; we do not require conditions for consistency of the ML methods. Estimation and inference rely on repeated data splitting to avoid overfitting and achieve validity. For inference, we take medians of p-values and medians of confidence intervals resulting from many different data splits, and then adjust their nominal level to guarantee uniform validity. This variational inference method is shown to be uniformly valid and quantifies the uncertainty coming from both parameter estimation and data splitting. The inference method could be of substantial independent interest in many machine learning applications. An empirical application to the impact of microcredit on economic development illustrates the use of the approach in randomized experiments. An additional application to the impact of gender discrimination on wages illustrates the potential use of the approach in observational studies, where machine learning methods can be used to condition flexibly on very high-dimensional controls. https://arxiv.org/abs/1712.04802 First author draft
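The split-then-aggregate inference idea can be sketched in a few lines: estimate the effect on many random half-samples, form a p-value per split, and combine splits via the median. Doubling the median p-value is one standard nominal-level adjustment for medians of p-values; the simulation below is a toy illustration, not the paper's full procedure.

```python
import math
import random
import statistics

random.seed(0)
n, true_effect, S = 400, 0.5, 21
treat = [i % 2 for i in range(n)]                       # randomized assignment
y = [true_effect * t + random.gauss(0, 1) for t in treat]

def split_pvalue(idx):
    """Two-sample z-test for the treatment effect on one half-sample."""
    y1 = [y[i] for i in idx if treat[i] == 1]
    y0 = [y[i] for i in idx if treat[i] == 0]
    diff = statistics.mean(y1) - statistics.mean(y0)
    se = math.sqrt(statistics.variance(y1) / len(y1)
                   + statistics.variance(y0) / len(y0))
    z = abs(diff) / se
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))  # two-sided

pvals = []
for _ in range(S):
    idx = random.sample(range(n), n // 2)   # a fresh random half per split
    pvals.append(split_pvalue(idx))

# Median across splits, with nominal level adjusted by a factor of two.
adjusted_p = min(1.0, 2 * statistics.median(pvals))
```

The median makes the conclusion robust to any single unlucky split, and the factor-of-two adjustment is what restores validity after aggregating dependent p-values.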
Fairness in Credit Scoring: Assessment, Implementation and Profit Implications
The rise of algorithmic decision-making has spawned much research on fair
machine learning (ML). Financial institutions use ML for building risk
scorecards that support a range of credit-related decisions. Yet, the
literature on fair ML in credit scoring is scarce. The paper makes two
contributions. First, we provide a systematic overview of algorithmic options
for incorporating fairness goals in the ML model development pipeline. In this
scope, we also consolidate the space of statistical fairness criteria and
examine their adequacy for credit scoring. Second, we perform an empirical
study of different fairness processors in a profit-oriented credit scoring
setup using seven real-world data sets. The empirical results substantiate the
evaluation of fairness measures, identify more and less suitable options to
implement fair credit scoring, and clarify the profit-fairness trade-off in
lending decisions. Specifically, we find that multiple fairness criteria can be
approximately satisfied at once and identify separation as a proper criterion
for measuring the fairness of a scorecard. We also find fair in-processors to
deliver a good balance between profit and fairness. More generally, we show
that algorithmic discrimination can be reduced to a reasonable level at a
relatively low cost. Comment: Preprint submitted to European Journal of Operational Research
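The separation criterion identified above can be checked directly: a scorecard satisfies separation (approximately) when true-positive and false-positive rates are equal across protected groups. The data, groups, and threshold below are illustrative, not from the paper's seven data sets.

```python
import numpy as np

def separation_gaps(y_true, y_pred, group):
    """Absolute TPR and FPR differences between two groups coded 0/1."""
    gaps = []
    for positive in (1, 0):                     # TPR gap first, then FPR gap
        rates = []
        for g in (0, 1):
            mask = (group == g) & (y_true == positive)
            rates.append(y_pred[mask].mean())   # acceptance rate in this cell
        gaps.append(abs(rates[0] - rates[1]))
    return tuple(gaps)  # (TPR gap, FPR gap)

rng = np.random.default_rng(1)
n = 1000
group = rng.integers(0, 2, n)                   # protected attribute
y_true = rng.integers(0, 2, n)                  # 1 = good repayment
score = 0.3 * y_true + 0.7 * rng.random(n)      # group-blind noisy score
y_pred = (score > 0.5).astype(int)              # accept/reject decision
tpr_gap, fpr_gap = separation_gaps(y_true, y_pred, group)
```

A score that ignores the group, as here, yields small gaps up to sampling noise; a fairness processor would aim to keep both gaps small even when the raw features correlate with the group.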
Differential Replication for Credit Scoring in Regulated Environments
Differential replication is a method to adapt existing machine learning solutions to the demands of highly regulated environments by reusing knowledge from one generation to the next. Copying is a technique that enables differential replication by projecting a given classifier onto a new hypothesis space, in circumstances where access to both the original solution and its training data is limited. The resulting model replicates the original decision behavior while displaying new features and characteristics. In this paper, we apply this approach to a use case in the context of credit scoring, using a private residential mortgage default dataset. We show that differential replication through copying can be exploited to adapt a given solution to the changing demands of a constrained environment such as the financial market. In particular, we show how copying can be used to replicate the decision behavior not only of a model, but of a full pipeline. As a result, we can ensure the decomposability of the attributes used to provide explanations for credit scoring models and reduce the time-to-market delivery of these solutions.
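Copying can be sketched in its simplest form: generate synthetic probe points, label them with the original black-box scorer, and fit a new model to those labels alone, with no access to the original training data. The hand-written rule below is a stand-in for the mortgage-default model, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def original_scorer(X):
    """Black-box stand-in: approve when (scaled) income outweighs debt."""
    return (X[:, 0] - 0.8 * X[:, 1] > 0).astype(int)

rng = np.random.default_rng(2)
X_synth = rng.normal(size=(5000, 2))      # synthetic probe points
y_synth = original_scorer(X_synth)        # labels come from the black box only

# The copy lives in a new, interpretable hypothesis space.
copy_model = LogisticRegression().fit(X_synth, y_synth)

# Fidelity: how often the copy agrees with the original on fresh points.
X_new = rng.normal(size=(1000, 2))
agreement = (copy_model.predict(X_new) == original_scorer(X_new)).mean()
```

The point of the projection is that the copy's hypothesis space can be chosen to satisfy new constraints (decomposable attributes, auditability) while preserving the decision behavior, which is what the agreement rate measures.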
Unbiased Decisions Reduce Regret: Adversarial Domain Adaptation for the Bank Loan Problem
In many real-world settings, binary classification decisions are made based on
limited data in near real-time, e.g. when assessing a loan application. We
focus on a class of these problems that share a common feature: the true label
is only observed when a data point is assigned a positive label by the
principal; e.g., we only find out whether an applicant defaults if we accept
their loan application. As a consequence, false rejections become
self-reinforcing and cause the labelled training set, which is continuously
updated by the model's decisions, to accumulate bias. Prior work mitigates
this effect by injecting optimism into the model; however, this comes at the
cost of an increased false acceptance rate. We introduce adversarial optimism
(AdOpt) to directly address bias in the training set using adversarial domain
adaptation. The goal of AdOpt is to learn an unbiased but informative
representation of past data by reducing the distributional shift between the
set of accepted data points and all data points seen thus far. AdOpt
significantly exceeds state-of-the-art performance on a set of challenging
benchmark problems. Our experiments also provide initial evidence that the
introduction of adversarial domain adaptation improves fairness in this
setting.
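The distributional shift that AdOpt targets is easy to exhibit in a toy simulation: when outcomes are observed only for accepted applicants, the labelled pool drifts away from the full applicant population. All numbers below are illustrative; this sketches the bias, not the AdOpt method itself.

```python
import numpy as np

rng = np.random.default_rng(3)
scores = rng.normal(size=10_000)         # model score per applicant
accepted = scores > 0.5                  # labels observed only above threshold

population_mean = scores.mean()          # all data points seen thus far
labelled_mean = scores[accepted].mean()  # the accumulating training pool
shift = labelled_mean - population_mean  # the gap adversarial adaptation shrinks
```

Because rejected applicants never generate labels, the labelled pool is truncated from below and its mean sits well above the population mean; an adversarial domain-adaptation objective pushes the learned representation to make these two sets indistinguishable.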
Generic Machine Learning Inference on Heterogeneous Treatment Effects in Randomized Experiments
We propose strategies to estimate and make inference on key features of
heterogeneous effects in randomized experiments. These key features include
best linear predictors of the effects using machine learning proxies, average
effects sorted by impact groups, and average characteristics of most and least
impacted units. The approach is valid in high dimensional settings, where the
effects are proxied by machine learning methods. We post-process these proxies
into estimates of the key features. Our approach is generic: it can be used
in conjunction with penalized methods, deep and shallow neural networks,
canonical and new random forests, boosted trees, and ensemble methods. It does
not rely on strong assumptions. In particular, we don't require conditions for
consistency of the machine learning methods. Estimation and inference rely on
repeated data splitting to avoid overfitting and achieve validity. For
inference, we take medians of p-values and medians of confidence intervals,
resulting from many different data splits, and then adjust their nominal level
to guarantee uniform validity. This variational inference method is shown to be
uniformly valid and quantifies the uncertainty coming from both parameter
estimation and data splitting. We illustrate the use of the approach with two
randomized experiments in development on the effects of microcredit and nudges
to stimulate immunization demand. Comment: 53 pages, 6 figures, 15 tables