2,708 research outputs found

    Loan Default Prediction: A Complete Revision of LendingClub

    The study aims to determine a credit default prediction model using data from LendingClub. The model estimates the effect of the influential variables on the prediction of paid and unpaid loans. We implemented the Random Forest algorithm to identify the variables with the most significant influence on payment or default, yielding a reduced model with nine predictors related to the borrower's credit history and payment history on the platform. The model achieves an F1 Macro Score above 90% on the evaluation sample. Contributions of this study include using the complete available dataset of LendingClub's entire operation to obtain the most relevant variables for the classification and prediction task, which can be helpful for estimating default in the peer-to-peer lending market. We draw two important conclusions: first, we confirm the Random Forest algorithm's capacity to solve binary classification problems, based on the performance metrics obtained; second, we note the influence of traditional credit scoring variables on default prediction problems.
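The macro F1 score named in the abstract above is the unweighted mean of the per-class F1 scores, so the minority (default) class counts as much as the majority class. The following is a minimal sketch of that metric only; the `macro_f1` helper and the toy labels are illustrative assumptions and do not come from the study or its data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels, invented for illustration: 1 = default, 0 = fully paid
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

score = macro_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"macro F1 = {score:.3f}, accuracy = {accuracy:.3f}")
```

On imbalanced loan data the two numbers can diverge noticeably, which is why macro F1 and accuracy are best reported as separate quantities.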

    How to deal with extreme cases for credit risk monitoring: a case study in a credit risk data science company

    The Global Financial Crisis triggered a severe hold on credit lending due to financial institutions' inability to properly assess credit applicants' risk levels. Based on U.S. data from Lending Club, we conducted a study to evaluate the consequences of including macroeconomic risk factors in individual credit application observations. Through historical scenario stress testing, we find that this approach improves the performance of credit scoring models developed in a stable economic cycle and applied to a recession. The inclusion of macroeconomic indicators reveals potential for credit institutions to better absorb shocks derived from economic downturns.

    Explainable Artificial Intelligence Methods in FinTech Applications

    The increasing amount of available data and access to high-performance computing allow companies to use complex Machine Learning (ML) models, so-called "black-box" models, in their decision-making processes. These "black-box" models typically show higher predictive accuracy than linear models on complex data sets. However, this improved predictive accuracy comes at the expense of explanatory power. Opening the black box and making model predictions explainable is the subject of the research area of Explainable Artificial Intelligence (XAI). Using black-box models also raises practical and ethical issues, especially in critical industries such as finance. For this reason, the explainability of models is increasingly becoming a focus for regulators. Applying XAI methods to ML models makes their predictions explainable and hence enables the application of ML models in the financial industries. The application of ML models increases predictive accuracy and supports the different stakeholders in the financial industries in their decision-making processes. This thesis consists of five chapters: a general introduction, a chapter on conclusions and future research, and three separate chapters covering the underlying papers. Chapter 1 proposes an XAI method that can be used in credit risk management, in particular in measuring the risks associated with borrowing through peer-to-peer lending platforms. The model applies correlation networks to Shapley values, so that the model predictions are grouped according to the similarity of the underlying explanations. Chapter 2 develops an alternative XAI method based on the Lorenz Zonoid approach. The new method is statistically normalised and can therefore be used as a standard for the application of Artificial Intelligence (AI) in credit risk management. The novel "Shapley-Lorenz" approach can facilitate the validation of model results and supports the decision whether a model is sufficiently explained. In Chapter 3, an XAI method is applied to assess the impact of financial and non-financial factors on a firm's ex-ante cost of capital, a measure that reflects investors' perceptions of a firm's risk appetite. A combination of two explanatory tools, the Shapley values and the Lorenz model selection approach, enabled the identification of the most important features and a reduction in the number of independent features. This allowed a substantial simplification of the model without a statistically significant decrease in predictive accuracy.
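The Shapley values that this abstract builds on assign each feature its average marginal contribution over all orderings in which features can be added. The sketch below computes them exactly for a tiny invented value function; it illustrates the general concept only, not the thesis's correlation-network or Shapley-Lorenz methods, and the feature names and `toy_value` payoffs are hypothetical.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: totals[p] / len(orderings) for p in players}


def toy_value(coalition):
    """Hypothetical payoff of a feature subset (numbers invented for illustration)."""
    v = 0.0
    if "fico" in coalition:
        v += 0.3
    if "dti" in coalition:
        v += 0.1
    if "fico" in coalition and "dti" in coalition:
        v += 0.2  # interaction term, split equally between the two features
    return v


phi = shapley_values(["fico", "dti", "term"], toy_value)
print(phi)
```

The attributions sum to the payoff of the full feature set, and a feature that never changes the payoff (`term` here) receives zero. Exact enumeration grows factorially with the number of features, so practical tools rely on approximations.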

    Unwrapping black box models: a case study in credit risk

    The past two decades have witnessed the rapid development of machine learning techniques, which have proven to be powerful tools for the construction of predictive models, such as those used in credit risk management. A considerable volume of published work has looked at the utility of machine learning for this purpose, the increased predictive capacities delivered and how new types of data can be exploited. However, these benefits come at the cost of increased complexity, which may render the models uninterpretable. To overcome this issue, a new field has emerged under the name of explainable artificial intelligence, with numerous tools being proposed to gain insight into the inner workings of these models. This type of understanding is fundamental in credit risk in order to ensure compliance with the existing regulatory requirements and to comprehend the factors driving the predictions and their macro-economic implications. This paper studies the effectiveness of some of the most widely used interpretability techniques on a neural network trained on real data. These techniques are found to be useful for understanding the model, even though some limitations have been encountered.

    A dynamic credit scoring model based on survival gradient boosting decision tree approach

    Credit scoring, which is typically framed as a classification problem, is a powerful tool for managing credit risk since it forecasts the probability of default (PD) of a loan application. However, there is a growing trend of integrating survival analysis into credit scoring to provide a dynamic prediction of PD over time and a clear treatment of censoring. A novel dynamic credit scoring model (i.e., SurvXGBoost) is proposed based on the survival gradient boosting decision tree (GBDT) approach. Our proposal, which combines survival analysis and the GBDT approach, is expected to enhance predictability relative to statistical survival models. The proposed method is compared with several common benchmark models on a real-world consumer loan dataset. The results of out-of-sample and out-of-time validation indicate that SurvXGBoost outperforms the benchmarks in terms of predictability and misclassification cost. The incorporation of macroeconomic variables can further enhance the performance of survival models. The proposed SurvXGBoost also maintains some interpretability since it provides information on feature importance. First published online 14 December 202

    Essays in industrial organization of Peer-to-Peer online credit markets

    This dissertation consists of three separate essays on Peer-to-Peer (P2P) online credit markets. The first essay presents new empirical evidence of decreases in loan demand and repayment when prices in the market are determined by competing lenders in auctions, as compared to the case in which a platform directly controls all prices. The paper develops an econometric model of loan demand and repayment, which is then used to predict borrower choices when they are offered prices set by lenders in a market. I find that when lenders set prices, borrowers are more likely to pick loans of shorter maturity and smaller size, and repay less. Aggregated at the market level, demand and repayment of credit fall by 10% and 2%, respectively. In the second paper, I quantify the effects of implementing finer credit scoring on credit demand, defaults and repayment in the context of a large P2P online credit platform. I exploit an exogenous change in the platform's credit scoring policy, where the centralized price-setting rules ensure that the one-to-one relationship between credit scores and prices remains intact, unlike in a traditional credit market where it is broken. The results show that a 1% increase in interest rate due to the implementation of finer credit scoring results in an average decrease of 0.29% in the requested loan amount, an average increase of 0.01 in the fraction of borrowers who default and an average increase of 0.02 in the fraction of the loan repaid. These findings contribute to a better understanding of how a reduction in information asymmetry affects borrower choices in a credit market. The third paper explores the main drivers behind the geographic expansion in demand for credit from P2P online platforms. It uses data from the two largest platforms in the United States to conduct an empirical analysis. By exploiting heterogeneity in local credit markets before the entry of P2P online platforms, the paper estimates the effect of local credit market conditions on demand for credit from P2P platforms. The paper uses a spatial autoregressive model for the main specification. We find that P2P consumer credit expanded more in counties with poor branch networks, lower concentration of banks, and lower leverage ratios.