Rule Induction Methods For Credit Scoring
Credit scoring is the term used by the credit industry to describe methods for classifying applicants for credit into risk classes according to their likely repayment behavior (e.g., “default” and “non-default”). The credit industry has used methods such as logistic regression, discriminant analysis, and various machine learning techniques to distinguish more precisely between creditworthy applicants, who are granted credit, and non-creditworthy applicants, who are denied credit. Accurate classification benefits both the creditor (through increased profit or reduced loss) and the loan applicant (by avoiding overcommitment). This paper examines historical data from consumer loans issued by a financial institution to individuals whom the institution deemed qualified customers. The data set consists of the financial attributes of each customer and includes a mixture of loans that the customers paid off or defaulted on. The paper uses rule induction methods (decision trees) to predict whether a particular applicant paid off or defaulted on his/her loan. The main advantage of decision trees is their ability to generate if-then classification rules that are intuitive and easy to understand. Such rules can be explained both to business managers, who must approve their implementation, and to loan applicants, as the reason a loan was denied. The paper compares the classification accuracy rates of several decision tree algorithms with those of other data mining methods proposed in earlier works.
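To illustrate how a decision tree yields if-then rules of the kind described above, here is a minimal sketch in plain Python. It grows a small depth-limited tree using Gini impurity (a CART-style splitting criterion; the abstract does not say which tree algorithms were compared) and turns each leaf into a rule. The feature names `income` and `debt_ratio` and the six applicant records are hypothetical, not data from the paper.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows, labels, features):
    """Return the (feature, threshold) pair with the lowest weighted Gini
    impurity, or None if no split improves on the parent node."""
    best, best_score = None, gini(labels)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def induce_rules(rows, labels, features, depth=2, conditions=()):
    """Grow a depth-limited tree and return one if-then rule per leaf."""
    majority = Counter(labels).most_common(1)[0][0]
    split = best_split(rows, labels, features) if depth > 0 else None
    if split is None:
        # Leaf: the path of conditions becomes the rule's antecedent
        antecedent = " AND ".join(conditions) or "TRUE"
        return [f"IF {antecedent} THEN {majority}"]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    rules = []
    for part, text in ((left, f"{f} <= {t}"), (right, f"{f} > {t}")):
        sub_rows, sub_labels = zip(*part)
        rules += induce_rules(list(sub_rows), list(sub_labels), features,
                              depth - 1, conditions + (text,))
    return rules

# Hypothetical applicants (not the paper's data)
applicants = [
    {"income": 20, "debt_ratio": 0.6}, {"income": 25, "debt_ratio": 0.5},
    {"income": 60, "debt_ratio": 0.2}, {"income": 80, "debt_ratio": 0.1},
    {"income": 70, "debt_ratio": 0.7}, {"income": 30, "debt_ratio": 0.2},
]
outcomes = ["default", "default", "paid", "paid", "default", "paid"]

for rule in induce_rules(applicants, outcomes, ["income", "debt_ratio"]):
    print(rule)
```

On this toy data a single split on `debt_ratio` separates the two classes, so the sketch returns just two rules: `IF debt_ratio <= 0.2 THEN paid` and `IF debt_ratio > 0.2 THEN default`.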
A Comparison of Linear and Nonlinear Models in Forecasting Market Risk: The Evidence from Turkish Derivative Exchange
This paper aims to compare the volatility forecasting performance of linear and nonlinear models for the ISE-30 index futures traded on the Turkish Derivatives Exchange over the period from 04.02.2005 to 17.06.2011. Based on the analyses, we conclude that the artificial neural network (ANN) model has better forecasting performance than traditional ARCH-GARCH models. This result is important in many fields of finance, such as investment decisions, asset pricing, portfolio allocation, and risk management.
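The ARCH-GARCH benchmarks model volatility as a linear recursion in past squared shocks and past variance. As an illustration only, a GARCH(1,1) one-step-ahead variance forecast with already-estimated parameters can be sketched as below. The parameter values and returns are hypothetical, estimation (normally by maximum likelihood) is omitted, the sketch is restricted to the covariance-stationary case, and the ANN side of the comparison is not shown.

```python
def garch11_forecast(residuals, omega, alpha, beta, sigma2_0):
    """One-step-ahead conditional variance from a GARCH(1,1) recursion:
        sigma2[t+1] = omega + alpha * eps[t]**2 + beta * sigma2[t]
    `residuals` are demeaned returns eps[1..T]; `sigma2_0` is the conditional
    variance assumed for the period of the first residual."""
    if omega <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        # Assumption of this sketch: only covariance-stationary parameters
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    sigma2 = sigma2_0
    for eps in residuals:
        # Yesterday's shock and yesterday's variance drive today's variance
        sigma2 = omega + alpha * eps ** 2 + beta * sigma2
    return sigma2

# Hypothetical parameters, not estimated from ISE-30 data
omega, alpha, beta = 0.00001, 0.08, 0.90
long_run_var = omega / (1 - alpha - beta)   # unconditional variance as start value
returns = [0.012, -0.020, 0.005, 0.031, -0.008]
print(garch11_forecast(returns, omega, alpha, beta, long_run_var))
```

Because the forecast is a fixed linear function of past squared shocks and past variance, it cannot capture the nonlinear dependencies that a neural network may pick up, which is the contrast the paper tests.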
Using Memory-Based Reasoning For Predicting Default Rates On Consumer Loans
In recent years, financial institutions have struggled with high default rates in consumer lending. An ability to reliably predict the probability of consumer loan default would have a significant impact on the profitability of that lending for these institutions. In response to this need, financial institutions have employed loan analysis techniques such as logistic regression, discriminant analysis, and various machine learning techniques to improve the accuracy of detecting loan defaults. The objective of these techniques is to distinguish more precisely between creditworthy applicants, who are granted credit (thereby increasing profits), and non-creditworthy applicants, who are denied credit (thereby decreasing losses). The objective of this article is to apply an emerging data analysis technique, memory-based or case-based reasoning, to this problem and to test its accuracy in discriminating between good and bad loans. This paper examines historical data from consumer loans issued by a financial institution to individuals whom the institution considered qualified customers. The data set consists of the financial attributes of each customer and includes a mixture of loans that the customers paid off or defaulted on. The paper then compares the performance of this technique with that of other data mining techniques proposed in earlier works and analyzes the risk of default inherent in each loan for each technique.
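Memory-based reasoning classifies a new case by retrieving the most similar stored cases. A common concrete form is k-nearest-neighbour classification, sketched below under the assumptions of Euclidean distance on already-scaled numeric attributes and a simple majority vote; the article's actual similarity measure and choice of k are not stated in the abstract. The share of defaulting neighbours serves as a rough per-loan default-risk score. The case base is hypothetical.

```python
import math
from collections import Counter

def nearest_cases(case_base, query, k):
    """Return the k stored (features, label) cases closest to `query`."""
    return sorted(case_base, key=lambda case: math.dist(case[0], query))[:k]

def knn_classify(case_base, query, k=3):
    """Majority vote over the k most similar past loans."""
    votes = Counter(label for _, label in nearest_cases(case_base, query, k))
    return votes.most_common(1)[0][0]

def knn_default_rate(case_base, query, k=3):
    """Share of the k nearest past loans that defaulted (a crude risk score)."""
    neighbours = nearest_cases(case_base, query, k)
    return sum(label == "default" for _, label in neighbours) / len(neighbours)

# Hypothetical case base: (scaled attribute vector, outcome)
past_loans = [
    ((0.20, 0.10), "paid"), ((0.30, 0.20), "paid"), ((0.25, 0.15), "paid"),
    ((0.80, 0.90), "default"), ((0.90, 0.80), "default"), ((0.85, 0.85), "default"),
]
print(knn_classify(past_loans, (0.3, 0.1)))
print(knn_default_rate(past_loans, (0.5, 0.5)))
```

Nothing is "trained" here: the stored loans themselves are the model, which is why attribute scaling matters so much for the distance function.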
How Secure Are Good Loans: Validating Loan-Granting Decisions And Predicting Default Rates On Consumer Loans
The success or failure of the banking industry depends largely on the industry's ability to properly evaluate credit risk. In the consumer-lending context, the bank's goal is to maximize income by issuing as many good loans to consumers as possible while avoiding the losses associated with bad loans. Mistakes can severely affect profits because the loss on one bad loan may wipe out the income earned on many good loans. Therefore, banks carefully evaluate the financial status of each customer as well as their creditworthiness, and weigh them against the bank's internal loan-granting policies. Recognizing that even a small improvement in credit scoring accuracy translates into significant future savings, the banking industry and the scientific community have been employing various machine learning and traditional statistical techniques to improve credit risk prediction accuracy. This paper examines historical data from consumer loans issued by a financial institution to individuals whom the institution deemed qualified customers. The data consist of the financial attributes of each customer and include a mixture of loans that the customers paid off and defaulted on. The paper uses three different data mining techniques (decision trees, neural networks, logit regression) and an ensemble model, which combines the three techniques, to predict whether a particular customer defaulted on or paid off his/her loan. The paper then compares the effectiveness of each technique and analyzes the risk of default inherent in each loan and group of loans. These data mining classification techniques and analyses can enable banks to classify consumers more precisely into various credit risk groups.
Knowing which risk group a consumer falls into would allow a bank to fine-tune its lending policies by recognizing high-risk groups of consumers to whom loans should not be issued, and identifying safer loans that should be issued, on terms commensurate with the risk of default.
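The abstract does not specify how the ensemble combines the three models. One common choice, assumed here, is to average the default probabilities that the decision tree, neural network, and logit model each assign to a loan; the averaged probability can then be binned into risk groups. The member-model outputs and the cutoffs `0.1` and `0.3` below are hypothetical.

```python
import math

def logistic(z):
    """Map a logit-model linear score to a probability of default."""
    return 1.0 / (1.0 + math.exp(-z))

def ensemble_probability(model_probs):
    """Assumed combination rule: unweighted mean of the members' P(default)."""
    return sum(model_probs) / len(model_probs)

def risk_group(p_default, cutoffs=(0.1, 0.3)):
    """Bin a default probability into hypothetical low/medium/high groups."""
    low, high = cutoffs
    if p_default < low:
        return "low"
    if p_default < high:
        return "medium"
    return "high"

# Hypothetical outputs of the three member models for one loan
p_tree, p_net = 0.25, 0.15
p_logit = logistic(-1.5)          # logit model with a linear score of -1.5
p = ensemble_probability([p_tree, p_net, p_logit])
print(round(p, 3), risk_group(p))
```

Averaging tends to smooth out the idiosyncratic errors of any single member; weighted averages or majority voting on class labels are equally plausible alternatives.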
Comparison Of The Performance Of Several Data Mining Methods For Bad Debt Recovery In The Healthcare Industry
The healthcare industry, specifically hospitals and clinical organizations, is often plagued by unpaid bills and collection agency fees. These unpaid bills contribute significantly to the rising cost of healthcare. Unlike financial institutions, healthcare providers typically do not collect financial information about their patients. This lack of information makes it difficult to evaluate whether a particular patient-debtor is likely to pay his/her bill. In recent years, the industry has started to apply data mining tools to reduce bad-debt balances. This paper compares the effectiveness of five such tools (neural networks, decision trees, logistic regression, memory-based reasoning, and the ensemble model) in evaluating whether a debt is likely to be repaid. The data analysis and the evaluation of model performance are based on a fairly large, unbalanced data sample provided by a healthcare company, in which cases with recovered bad debts are underrepresented. Computer simulation shows that the neural network, logistic regression, and the combined model produced the best classification accuracy. A more thorough interpretation of the results is obtained by analyzing the lift and receiver operating characteristic (ROC) charts. We used the models to score all “unknown” cases, which had not been pursued by the company. The best model classified about 34.8% of these cases as “good” cases. To collect bad debts more effectively, we recommend that a company first deploy the models before referring unrecovered cases to a collection agency.
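Lift, one of the charts used above to interpret the results, compares the rate of positive cases among the highest-scored cases with the overall rate, which is especially informative when the positive class (here, recovered debts) is underrepresented. A minimal sketch, assuming label `1` marks a recovered debt and a higher score means a higher predicted chance of recovery; the scores and labels in the usage line are hypothetical.

```python
def cumulative_lift(scores, labels, depth=0.1):
    """Lift in the top `depth` fraction of cases ranked by score.
    labels: 1 = debt recovered, 0 = not recovered (assumed coding)."""
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("lift is undefined without any positive cases")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * depth))
    top_rate = sum(label for _, label in ranked[:n_top]) / n_top
    return top_rate / base_rate

def lift_curve(scores, labels, steps=10):
    """(depth, cumulative lift) points at 10%, 20%, ..., 100% by default."""
    return [(d / steps, cumulative_lift(scores, labels, d / steps))
            for d in range(1, steps + 1)]

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]  # hypothetical
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
for depth, lift in lift_curve(scores, labels):
    print(f"top {depth:.0%}: lift {lift:.2f}")
```

A lift of 1.0 at every depth would mean the model ranks no better than random; the curve always returns to 1.0 at 100% depth, when all cases are included.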
A hybrid XGBoost-MLP model for credit risk assessment on Digital Supply Chain Finance
Supply Chain Finance (SCF) has gradually taken on digital characteristics with the rapid development of electronic information technology. Business audit information has become more abundant and complex, which has increased both the efficiency of commercial banks and the potential risks they face, credit risk being the largest of these. Therefore, credit risk assessment based on digital SCF applications is of great importance to commercial banks’ financial decisions. This paper uses a hybrid Extreme Gradient Boosting and Multi-Layer Perceptron (XGBoost-MLP) model to assess the credit risk of Digital SCF (DSCF). The empirical sample consists of 1357 observations from 85 Chinese listed SMEs over the period 2016–2019. In the first stage, the important features for credit risk assessment in DSCF are selected automatically through XGBoost feature selection; in the second stage, credit risk is assessed by the MLP. The empirical results show that the XGBoost-MLP model performs well in credit risk assessment and that XGBoost feature selection is important to the model. From the DSCF perspective, the results show that including digital features improves the accuracy of credit risk assessment in SCF.
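The two-stage design can be sketched structurally as follows. Stage 1 keeps the highest-ranked features from a table of importance scores, standing in for the importances a trained XGBoost model would report; stage 2 feeds the selected features through a one-hidden-layer MLP with a sigmoid output read as a credit-risk probability. Training of both stages is omitted, and the feature names, importance values, and weights are hypothetical; the paper's actual selection criterion and network architecture are not given in the abstract.

```python
import math

def select_features(importances, top_k):
    """Stage 1 (sketch): names of the top_k features by importance score."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:top_k]

def relu(v):
    return max(0.0, v)

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Stage 2 (sketch): one ReLU hidden layer, sigmoid output in (0, 1)."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical stand-ins for XGBoost importances and one firm's attributes
importances = {"current_ratio": 0.31, "digital_feature_a": 0.27,
               "roa": 0.06, "firm_age": 0.02}
selected = select_features(importances, top_k=2)
firm = {"current_ratio": 1.4, "digital_feature_a": 0.7, "roa": 0.05, "firm_age": 12}
x = [firm[name] for name in selected]

# Hypothetical (untrained) MLP weights: 2 inputs -> 2 hidden units -> 1 output
w_hidden = [[0.5, -1.0], [-0.3, 0.8]]
b_hidden = [0.1, 0.0]
w_out = [1.2, -0.7]
b_out = -0.4
print(selected, mlp_forward(x, w_hidden, b_hidden, w_out, b_out))
```

Separating the stages means the MLP only ever sees the reduced feature set, which is the point of using the boosted trees as a feature filter.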
Predicting prepayment in home loans: Modelling full and partial prepayment in the Portuguese banking sector using machine learning methods
Dissertation presented as partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.
A loan prepayment occurs when the borrower repays a loan early, i.e., the borrower pays more than the contractual amount due. The prepayment may cover part of the outstanding principal (partial prepayment) or the total outstanding principal (full prepayment). From a bank's perspective, studying early repayment, whether full or partial, is relevant because it changes the scheduled cash flows. In particular, future cash flows decrease as a result of an unknown future event.
Hence, the primary purpose of this study is to model prepayment events in the mortgage loans of a large Portuguese bank using a machine learning approach, assessing model performance with the Area Under the Receiver Operating Characteristic Curve (AUC), the Gain or Lift, and the Kolmogorov-Smirnov statistic. This makes it possible to study the prepayment phenomenon in the Portuguese market using real bank data and machine learning models.
Because real-world data were used, the first part of this study consisted of pre-processing the data to ensure that noise and data quality problems did not enter the models. The second stage consisted of computing the machine learning models, testing Artificial Neural Network and Random Forest models and comparing their performance using the AUC, Gain or Lift, and the Kolmogorov-Smirnov statistic.
The results show that both the full and partial prepayment models perform well on all three performance metrics analysed, demonstrating good predictive results and discriminatory capacity. The partial prepayment model is superior to the full prepayment model, by a difference that is not very large but is worth noting.
This study is particularly relevant given its analysis of a Portuguese bank and its application of machine learning models to prepayment modelling, for which studies are scarce. Furthermore, the bank in which this study was conducted has recently been making efforts to update the traditional models currently in force.
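The two ranking metrics named above can be computed directly from model scores. In this sketch, AUC is computed as the probability that a randomly chosen positive case (label `1`, assumed here to mean the loan was prepaid) is scored above a randomly chosen negative one, with ties counted as one half, and the Kolmogorov-Smirnov statistic as the largest gap between the cumulative score distributions of the two classes. The scores and labels are hypothetical, and the quadratic pairwise loop favours clarity over speed on large portfolios.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative);
    ties count one half. labels: 1 = prepayment, 0 = none (assumed coding)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ks_statistic(scores, labels):
    """Largest vertical gap between the empirical score CDFs of the two classes."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    gap = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        gap = max(gap, abs(cdf_pos - cdf_neg))
    return gap

scores = [0.9, 0.4, 0.6, 0.1]   # hypothetical model scores
labels = [1, 1, 0, 0]
print(roc_auc(scores, labels), ks_statistic(scores, labels))
```

Both metrics depend only on how the model ranks loans, not on a particular classification cutoff, which suits comparing the neural network and random forest models on equal terms.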