1,682 research outputs found

    Predicting Systemic Banking Crises using Extreme Gradient Boosting

    pp. 571-575. Given the strong ability of decision tree techniques to extract useful information from large databases and to handle heterogeneous variables, this paper applies Extreme Gradient Boosting to the prediction of systemic banking crises. To this end, prediction models have been constructed for different regions and for the whole world. The results show that Extreme Gradient Boosting surpasses the predictive power of the models in the previous literature and provides more explanatory information on the causes of systemic banking crises, with the demand for deposits, the level of domestic credit, and banking assets among the most significant variables.

    A Text Mining and Ensemble Learning Based Approach for Credit Risk Prediction

    Traditional credit risk prediction models rely mainly on financial data. However, technological innovation is the main driving force behind the development of enterprises in strategic emerging industries and is closely related to enterprise credit risk. In this paper, a novel prediction framework utilizing technological innovation text mining data and ensemble learning is proposed. Empirical data from Chinese listed enterprises in strategic emerging industries were used to construct prediction models with the classification and regression tree model, the random forest model, and the extreme gradient boosting model. The results show that models using the technological innovation text mining data have significant predictive ability, and that the variables measuring the top management team's attention to innovation offer the best predictive capacity. This work improves the application value of enterprise credit risk prediction models in strategic emerging industries by embedding the mining of technological innovation text information.

    Machine learning applied to banking supervision a literature review

    Guerra, P., & Castelli, M. (2021). Machine learning applied to banking supervision a literature review. Risks, 9(7), 1-24. [136]. https://doi.org/10.3390/risks9070136. Machine learning (ML) has revolutionised data analysis over the past decade. Like numerous other industries heavily reliant on accurate information, banking supervision stands to benefit greatly from this technological advance. The objective of this review is to provide a comprehensive walk-through of how the most common ML techniques have been applied to risk assessment in banking, focusing on a supervisory perspective. We searched Google Scholar, Springer Link, and ScienceDirect databases for articles including the search terms “machine learning” and (“bank” or “banking” or “supervision”). No language, date, or journal filter was applied. Papers were then screened and selected according to their relevance. The final article base consisted of 41 papers and 2 book chapters, 53% of which were published in the top quartile journals in their field. Results are presented in a timeline according to the publication date and categorised by time slots. Credit risk assessment and stress testing are highlighted topics as well as other risk perspectives, with some references to ML application surveys. The most relevant ML techniques encompass k-nearest neighbours (KNN), support vector machines (SVM), tree-based models, ensembles, boosting techniques, and artificial neural networks (ANN). Recent trends include developing early warning systems (EWS) for bankruptcy and refining stress testing. One limitation of this study is the paucity of contributions using supervisory data, which justifies the need for additional investigation in this field. However, there is increasing evidence that ML techniques can enhance data analysis and decision making in the banking industry.

    A Hybrid Technological Innovation Text Mining, Ensemble Learning and Risk Scorecard Approach for Enterprise Credit Risk Assessment

    Enterprise credit risk assessment models typically use financial information as predictor variables, relying on backward-looking historical information rather than forward-looking information for risk assessment. We propose a novel hybrid assessment of credit risk that uses technological innovation information as a predictor variable. Text mining techniques are used to extract this information for each enterprise. A combination of random forest and extreme gradient boosting is used for indicator screening, and finally a risk scorecard based on logistic regression is used for credit risk scoring. Our results show that technological innovation indicators obtained through text mining provide valuable information for credit risk assessment, and that combining ensemble learning (random forest and extreme gradient boosting) with logistic regression models outperforms other traditional methods. The best result achieved an area under the receiver operating characteristic curve of 0.9129. In addition, our approach provides meaningful scoring rules for credit risk assessment of technology innovation enterprises.

    Interpretable models of loss given default

    Master's in Applied Econometrics and Forecasting. Credit risk management is an area where regulators expect banks to have transparent and auditable risk models, which would preclude the use of more accurate black-box models. Furthermore, the opaqueness of these models may hide unknown biases that may lead to unfair lending decisions. In this study, we show that banks do not have to sacrifice predictive accuracy for model transparency to be compliant with regulatory requirements. We illustrate this by showing that the predictions of credit losses given by a black-box model can be easily explained in terms of their inputs. Because black-box models fit the data better, banks should consider the determinants of credit losses suggested by these models in lending decisions and in the pricing of credit exposures.

    Machine Learning applied to credit risk assessment: Prediction of loan defaults

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science. Due to the recent financial crisis and the regulatory concerns of Basel II, credit risk assessment is becoming a very important topic in the field of financial risk management. Financial institutions need to take great care when dealing with consumer loans in order to avoid losses and opportunity costs. For this reason, credit scoring systems have been used to make informed decisions on whether or not to grant credit to the clients who apply for it. Several credit scoring models have been proposed so far, from statistical models to more complex artificial intelligence techniques. However, most previous work focuses on single classifiers. Ensemble learning is a powerful machine learning paradigm which has proven to be of great value in solving a variety of problems. This study compares the performance of the industry standard, logistic regression, to four ensemble methods, i.e. AdaBoost, Gradient Boosting, Random Forest and Stacking, in identifying potential loan defaults. All the models were built with a real-world dataset of over one million customers from Lending Club, a financial institution based in the United States. The performance of the models was compared using the hold-out method as the evaluation design and accuracy, AUC, type I error and type II error as evaluation metrics. Experimental results reveal that the ensemble classifiers were able to outperform logistic regression on three key indicators, i.e. accuracy, type I error and type II error. AdaBoost performed better than the remaining classifiers considering a trade-off between all the metrics evaluated. The main contribution of this thesis is an experimental addition to the literature on the preferred models for predicting potential loan defaulters.

    Predicting prepayment in home loans: Modelling full and partial prepayment in the Portuguese banking sector using machine learning methods

    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. A loan prepayment occurs when the borrower repays a loan early, i.e. the borrower pays more than the contractual amount due. The repayment may cover part of the outstanding principal (partial repayment) or the total principal outstanding (full repayment). From a bank's perspective, the study of early repayment, be it full or partial, is relevant because it results in a change in the scheduled cash flows. In particular, there is a decrease in the future cash flows resulting from an unknown future event. Hence, the primary purpose of this study is the modelling of prepayment events in the mortgage loans of a large Portuguese bank through a machine learning approach, assessing its performance with measures such as the Area Under the Receiver Operating Characteristic (ROC) Curve, the Gain or Lift, and the Kolmogorov-Smirnov statistic. This allows the prepayment phenomenon to be tested in the Portuguese market using real bank data and machine learning models. Because real-life data were used, the first part of this study involved pre-processing the data to ensure that noise and data quality problems did not enter the models. The second stage involved computing the machine learning models, testing Artificial Neural Network and Random Forest models and comparing their performance using the ROC, the Gain or Lift, and the Kolmogorov-Smirnov statistic. The results reveal that both the full and partial prepayment models perform well on all three performance metrics analysed. Both models have good predictive results and discriminatory capacity. The partial repayment model is superior to the full repayment model, with a difference that is worthy of mention although not very large. This study is particularly relevant given its analysis of a Portuguese bank and its application of machine learning models to prepayment modelling, for which studies are scarce. Furthermore, there have been ongoing efforts (in the bank where this study is framed) to update the traditional models currently in force.
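
As a rough illustration of two of the metrics named in this abstract, the sketch below computes the area under the ROC curve and the Kolmogorov-Smirnov statistic for a binary classifier in plain Python. It is not taken from the dissertation: the function names `roc_auc` and `ks_statistic` and the toy labels and scores are assumptions made for this example, and the dissertation's own implementation is not described in the abstract.

```python
from typing import List, Sequence, Tuple


def _split(labels: Sequence[int], scores: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Split scores into those of positives (label 1) and negatives (label 0)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    return pos, neg


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos, neg = _split(labels, scores)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ks_statistic(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Largest gap between the empirical score CDFs of the two classes."""
    pos, neg = _split(labels, scores)
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Hypothetical toy data: 1 = prepayment event, 0 = no event.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
auc_value = roc_auc(toy_labels, toy_scores)
ks_value = ks_statistic(toy_labels, toy_scores)
```

The pairwise AUC loop is quadratic in the sample size, which is fine for a small illustration; a rank-based computation would be the usual choice for a dataset the size of a bank's mortgage portfolio.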

    A credit risk model with small sample data based on G-XGBoost

    Existing credit risk models, e.g., Scoring Card and Extreme Gradient Boosting (XGBoost), usually require a sufficiently large modeling sample. A small sample size may adversely affect the trained models, which may neither achieve the expected accuracy nor distinguish risks well. At the same time, data acquisition can be difficult and restricted due to data protection regulations. To address this dilemma, this paper applies Generative Adversarial Nets (GAN) to the construction of a credit risk model for small and micro enterprises (SMEs), and proposes a novel training method, namely G-XGBoost, based on the XGBoost model. A few batches of real data are selected to train the GAN. When the generative network reaches Nash equilibrium, the network is used to generate pseudo data with the same distribution. The pseudo data are then combined with the real data to form an amplified sample set, which is used to train XGBoost for credit risk prediction. The feasibility and advantages of the G-XGBoost model are demonstrated by comparison with the XGBoost model.