Loan Default Prediction: A Complete Revision of LendingClub
The study aims to determine a credit default prediction model using the LendingClub database. The methodology estimates the variables that influence the prediction of paid and unpaid loans using the Random Forest algorithm. The algorithm identifies the factors with the greatest influence on payment or default, yielding a reduced model with nine predictors related to the borrower's credit history and payment history within the platform. The model achieves a macro F1 score above 90% on the evaluation sample.
Contributions of this study include using the complete available dataset of LendingClub's entire operation to obtain significant variables for the classification and prediction task, which can help estimate default in the peer-to-peer loan market. We draw two main conclusions: first, the performance metrics obtained confirm the Random Forest algorithm's capacity to handle binary classification problems; second, traditional credit scoring variables are influential in default prediction problems.
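The abstract above reports a macro F1 score, which is a different quantity from accuracy. The following is a minimal sketch of how macro F1 is computed for a paid/unpaid classifier; it is not the authors' code, and the function name `macro_f1` and the toy labels are assumptions made purely for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 0 = paid, 1 = default (imbalanced on purpose)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.3f}  macro_f1={score:.3f}")
```

On imbalanced data such as loan defaults, macro F1 weights the minority class equally with the majority class, so it can sit well below accuracy when defaults are missed.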
How to deal with extreme cases for credit risk monitoring: a case study in a credit risk data science company
The Global Financial Crisis triggered a severe hold on credit lending due to the financial
institutions’ inability to assess credit applicants’ risk levels properly. Based on U.S. data from
Lending Club, we conducted a study to evaluate the consequences of including
macroeconomic risk factors in individual credit application observations. Through historical
scenario stress testing, we find that this approach results in an increase in performance for
credit scoring models developed in a stable economic cycle and applied to a recession. The
inclusion of macroeconomic indicators reveals potential for credit institutions to better absorb
shocks derived from economic downturns.
Explainable Artificial Intelligence Methods in FinTech Applications
The increasing amount of available data and access to high-performance computing allow companies to use complex Machine Learning (ML) models, so-called “black-box” models, in their decision-making processes. These black-box models typically show higher predictive accuracy than linear models on complex data sets. However, this improved predictive accuracy comes at the expense of explanatory power. The effort to “open the black box” and make model predictions explainable is summarised under the research area of Explainable Artificial Intelligence (XAI). Using black-box models also raises practical and ethical issues, especially in critical industries such as finance. For this reason, the explainability of models is increasingly becoming a focus for regulators. Applying XAI methods to ML models makes their predictions explainable and hence enables the application of ML models in the financial industry. The application of ML models increases predictive accuracy and supports the different stakeholders in the financial industry in their decision-making processes.
This thesis consists of five chapters: a general introduction, a chapter on conclusions and future research, and three separate chapters covering the underlying papers. Chapter 1 proposes an XAI method that can be used in credit risk management, in particular in measuring the risks associated with borrowing through peer-to-peer lending platforms. The model applies correlation networks to Shapley values, and thus the model predictions are grouped according to the similarity of the underlying explanations. Chapter 2 develops an alternative XAI method based on the Lorenz Zonoid approach. The new method is statistically normalised and can therefore be used as a standard for the application of Artificial Intelligence (AI) in credit risk management. The novel “Shapley-Lorenz” approach can facilitate the validation of model results and supports the decision whether a model is sufficiently explained. In Chapter 3, an XAI method is applied to assess the impact of financial and non-financial factors on a firm’s ex-ante cost of capital, a measure that reflects investors’ perceptions of a firm’s risk appetite. A combination of two explanatory
tools, the Shapley values and the Lorenz model selection approach, enabled the identification of the most important features and the reduction of the independent features. This allowed a substantial simplification of the model without a statistically significant decrease in predictive accuracy.
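The abstract above builds on Shapley values as the explanation tool. As a hedged illustration only, the sketch below computes exact Shapley values for a tiny hypothetical model by enumerating feature coalitions. The toy model, the zero baseline, and the names `shapley_values` and `value_fn` are assumptions; this is not the thesis's Shapley-Lorenz method, and exact enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition of features."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical model f(x) = 2*x0 + 3*x1 + x0*x2, explained at x = (1, 1, 2).
x = (1.0, 1.0, 2.0)


def value_fn(coalition):
    # Features outside the coalition are replaced by a zero baseline.
    z = [x[j] if j in coalition else 0.0 for j in range(len(x))]
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


phi = shapley_values(value_fn, 3)
print(phi)
```

The attributions sum to the difference between the model output at `x` and at the baseline, which is the efficiency property that makes Shapley values attractive for explaining individual predictions.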
FinTech, RegTech and the role of alternative lending: an analysis of the P2P platform LendingClub
Unwrapping black box models: a case study in credit risk
The past two decades have witnessed the rapid development of machine learning
techniques, which have proven to be powerful tools for the construction of predictive
models, such as those used in credit risk management. A considerable volume of
published work has looked at the utility of machine learning for this purpose, the
increased predictive capacities delivered and how new types of data can be
exploited. However, these benefits come at the cost of increased complexity, which
may render the models uninterpretable. To overcome this issue a new field has
emerged under the name of explainable artificial intelligence, with numerous tools
being proposed to gain an insight into the inner workings of these models. This type
of understanding is fundamental in credit risk in order to ensure compliance with the
existing regulatory requirements and to comprehend the factors driving the
predictions and their macro-economic implications. This paper studies the
effectiveness of some of the most widely-used interpretability techniques on a neural
network trained on real data. These techniques are found to be useful for
understanding the model, even though some limitations have been encountered.
A dynamic credit scoring model based on survival gradient boosting decision tree approach
Credit scoring, which is typically framed as a classification problem, is a powerful tool for managing credit risk since it forecasts the probability of default (PD) of a loan application. However, there is a growing trend of integrating survival analysis into credit scoring to provide a dynamic prediction of PD over time and a clear treatment of censoring. A novel dynamic credit scoring model (i.e., SurvXGBoost) is proposed based on a survival gradient boosting decision tree (GBDT) approach. Our proposal, which combines survival analysis and the GBDT approach, is expected to enhance predictability relative to statistical survival models. The proposed method is compared with several common benchmark models on a real-world consumer loan dataset. The results of out-of-sample and out-of-time validation indicate that SurvXGBoost outperforms the benchmarks in terms of predictability and misclassification cost. The incorporation of macroeconomic variables can further enhance the performance of survival models. The proposed SurvXGBoost also maintains some interpretability since it provides information on feature importance.
First published online 14 December 202
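The abstract above describes a dynamic prediction of PD over time. As a rough sketch of what that means in a discrete-time survival setting, the example below turns per-period hazards into a cumulative PD curve. It is not the SurvXGBoost model; the function name `cumulative_pd` and the hazard values are hypothetical.

```python
def cumulative_pd(hazards):
    """Cumulative probability of default after each period.

    hazards[k] is the probability of defaulting in period k given
    survival up to that period, so PD(t) = 1 - prod_{k<=t} (1 - h_k).
    """
    survival = 1.0
    curve = []
    for h in hazards:
        survival *= 1.0 - h
        curve.append(1.0 - survival)
    return curve


# Hypothetical monthly hazards for one loan
monthly_hazards = [0.01, 0.02, 0.03]
pd_curve = cumulative_pd(monthly_hazards)
print([round(p, 6) for p in pd_curve])
```

A static classifier would output a single PD per loan, whereas a survival model supplies the hazards and therefore the whole curve.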
Essays in industrial organization of Peer-to-Peer online credit markets
This dissertation consists of three separate essays on Peer-to-Peer (P2P) online credit markets. The first essay presents new empirical evidence of decreases in loan demand and repayment when prices in the market are determined by competing lenders in auctions as compared to the case in which a platform directly controls all prices. The paper develops an econometric model of loan demand and repayment which is then used to predict borrower choices when they are offered prices set by lenders in a market. I find that when lenders set prices, borrowers are more likely to pick loans of shorter maturity and smaller sizes, and repay less. Aggregated at the market level, demand and repayment of credit fall by 10% and 2%, respectively.
In the second paper, I quantify the effects of implementation of finer credit scoring on credit demand, defaults and repayment in the context of a large P2P online credit platform. I exploit an exogenous change in the platform's credit scoring policy where the centralized price setting rules ensure that the one-to-one relationship between credit scores and prices remains intact unlike in a traditional credit market where it is broken. The results show that a 1% increase in interest rate due to the implementation of finer credit scoring results in an average decrease of 0.29% in the requested loan amount, an average increase of 0.01 in the fraction of borrowers who default and an average increase of 0.02 in the fraction of loan repaid. These findings contribute to a better understanding of how a reduction in information asymmetry affects borrower choices in a credit market.
The third paper explores the main drivers behind the geographic expansion in demand for credit from P2P online platforms. It uses data from the two largest platforms in the United States to conduct an empirical analysis. By exploiting heterogeneity in local credit markets before the entry of P2P online platforms, the paper estimates the effect of local credit market conditions on demand for credit from P2P platforms. The paper uses a spatial autoregressive model for the main specification. We find that P2P consumer credit expanded more in counties with poor branch networks, lower concentration of banks, and lower leverage ratios.