22 research outputs found

    Rule Induction Methods For Credit Scoring

    Credit scoring is the term used by the credit industry to describe methods for classifying credit applicants into risk classes according to their likely repayment behavior (e.g. “default” and “non-default”). The credit industry has been using such methods as logistic regression, discriminant analysis, and various machine learning techniques to more precisely identify creditworthy applicants, who are granted credit, and non-creditworthy applicants, who are denied credit. Accurate classification benefits both the creditor (in terms of increased profit or reduced loss) and the loan applicant (by avoiding overcommitment). This paper examines historical data from consumer loans issued by a financial institution to individuals that the financial institution deemed to be qualified customers. The data set consists of the financial attributes of each customer and includes a mixture of loans that the customers paid off or defaulted upon. The paper uses rule induction methods (decision trees) to predict whether a particular applicant paid off or defaulted upon his/her loan. The main advantage of decision trees is their ability to generate if-then classification rules, which are intuitive and easy to understand. The rules can be explained to the business managers who would need to approve their implementation, as well as to loan applicants as the reason for denying a loan. The paper compares the classification accuracy rates of several decision tree algorithms with those of other data mining methods proposed in earlier works.
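As a rough illustration of how a decision tree yields an if-then rule, the following minimal sketch induces a single-split rule (a depth-one tree) from a handful of invented loan records. The attribute name `debt_to_income`, the data values, and the helper names `induce_stump` and `rule_text` are hypothetical; they are not taken from the paper, which does not describe its algorithms at this level of detail.

```python
def induce_stump(records, attribute):
    """Find the threshold on one numeric attribute that misclassifies the fewest records.

    Each record is (features_dict, label) with label "default" or "non-default".
    The rule form is: IF attribute > threshold THEN default ELSE non-default.
    Requires at least two distinct attribute values; otherwise returns (None, n + 1).
    """
    values = sorted({features[attribute] for features, _ in records})
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_threshold, best_errors = None, len(records) + 1
    for threshold in candidates:
        errors = sum(
            1
            for features, label in records
            if ("default" if features[attribute] > threshold else "non-default") != label
        )
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


def rule_text(attribute, threshold):
    """Render the induced split as a human-readable if-then rule."""
    return f"IF {attribute} > {threshold:.2f} THEN default ELSE non-default"


# Invented toy data (an assumption for illustration, not the paper's data set).
loans = [
    ({"debt_to_income": 0.10}, "non-default"),
    ({"debt_to_income": 0.20}, "non-default"),
    ({"debt_to_income": 0.30}, "non-default"),
    ({"debt_to_income": 0.50}, "default"),
    ({"debt_to_income": 0.60}, "default"),
]

threshold, errors = induce_stump(loans, "debt_to_income")
print(rule_text("debt_to_income", threshold))
```

A full decision tree repeats this kind of split search recursively on each resulting subset, and each root-to-leaf path then reads as one if-then rule.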

    A Comparison of Linear and Nonlinear Models in Forecasting Market Risk: The Evidence from Turkish Derivative Exchange

    This paper compares the volatility forecasting performance of linear and nonlinear models for the ISE-30 index future, which is traded on the Turkish Derivatives Exchange, over the period from 04.02.2005 to 17.06.2011. The analyses indicate that the ANN model forecasts better than the traditional ARCH-GARCH models. This result is important in many fields of finance, such as investment decisions, asset pricing, portfolio allocation, and risk management.

    Using Memory-Based Reasoning For Predicting Default Rates On Consumer Loans

    In recent years, financial institutions have struggled with high default rates in consumer lending. An ability to reliably predict the probability of consumer loan defaults would have a significant impact on the profitability of that lending for these institutions. In response to this need, financial institutions have employed loan analysis techniques such as logistic regression, discriminant analysis, and various machine learning techniques to improve the accuracy of detecting loan defaults. The objective of these techniques is to more precisely distinguish creditworthy applicants, who are granted credit, thereby increasing profits, from non-creditworthy applicants, who are denied credit, thereby decreasing losses. This article applies an emergent data analysis technique, memory-based (or case-based) reasoning, to this problem and tests its accuracy in discriminating between good and bad loans. The paper examines historical data from consumer loans issued by a financial institution to individuals that the financial institution considered to be qualified customers. The data set consists of the financial attributes of each customer and includes a mixture of loans that the customers paid off or defaulted upon. The paper then compares the performance of this technique to that of other data mining techniques proposed in earlier works and analyzes the risk of default inherent in each loan for each technique.
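Memory-based reasoning is commonly realized as a nearest-neighbour lookup: a new case is labelled by the stored cases most similar to it. The sketch below assumes that reading (the abstract does not specify the exact variant used), with Euclidean distance and majority voting. The function `knn_predict`, the two features, and all data values are invented for illustration.

```python
import math
from collections import Counter


def knn_predict(cases, query, k=3):
    """Classify `query` by majority vote among its k nearest stored cases.

    `cases` is a list of (feature_tuple, label); distance is Euclidean.
    """
    nearest = sorted(cases, key=lambda case: math.dist(case[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented, already-scaled features: (income, debt ratio). Not the paper's data.
stored_loans = [
    ((0.9, 0.1), "good"),
    ((0.8, 0.2), "good"),
    ((0.7, 0.3), "good"),
    ((0.2, 0.8), "bad"),
    ((0.3, 0.9), "bad"),
    ((0.1, 0.7), "bad"),
]

print(knn_predict(stored_loans, (0.75, 0.25)))
print(knn_predict(stored_loans, (0.15, 0.85)))
```

In practice the features would need to be scaled to comparable ranges first, since an unscaled attribute with large values would dominate the distance.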

    How Secure Are Good Loans: Validating Loan-Granting Decisions And Predicting Default Rates On Consumer Loans

    The failure or success of the banking industry depends largely on the industry's ability to properly evaluate credit risk. In the consumer-lending context, the bank's goal is to maximize income by issuing as many good loans to consumers as possible while avoiding losses associated with bad loans. Mistakes could severely affect profits because the losses associated with one bad loan may undermine the income earned on many good loans. Therefore, banks carefully evaluate the financial status of each customer as well as their creditworthiness and weigh them against the bank's internal loan-granting policies. Recognizing that even a small improvement in credit scoring accuracy translates into significant future savings, the banking industry and the scientific community have been employing various machine learning and traditional statistical techniques to improve credit risk prediction accuracy. This paper examines historical data from consumer loans issued by a financial institution to individuals that the financial institution deemed to be qualified customers. The data consist of the financial attributes of each customer and include a mixture of loans that the customers paid off and defaulted upon. The paper uses three different data mining techniques (decision trees, neural networks, logit regression) and an ensemble model, which combines the three techniques, to predict whether a particular customer defaulted or paid off his/her loan. The paper then compares the effectiveness of each technique and analyzes the risk of default inherent in each loan and group of loans. The data mining classification techniques and analysis can enable banks to more precisely classify consumers into various credit risk groups. Knowing which risk group a consumer falls into would allow a bank to fine-tune its lending policies by recognizing high-risk groups of consumers to whom loans should not be issued, and by identifying safer loans that should be issued on terms commensurate with the risk of default.
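The abstract does not say how the ensemble combines its three member models. One simple possibility, shown here purely as an assumption, is to average the default probabilities the models produce for each loan and apply a cutoff. The function names, the cutoff of 0.5, and the probability values are hypothetical.

```python
def ensemble_probability(model_probabilities):
    """Average the default probabilities produced by several models for one loan."""
    return sum(model_probabilities) / len(model_probabilities)


def classify(probability, cutoff=0.5):
    """Label a loan from its averaged default probability (cutoff is an assumption)."""
    return "default" if probability >= cutoff else "paid off"


# Hypothetical outputs of a decision tree, a neural network, and a logit model.
loan_scores = {
    "loan_A": [0.75, 0.5, 0.25],
    "loan_B": [0.25, 0.125, 0.0],
}

for loan_id, probabilities in loan_scores.items():
    p = ensemble_probability(probabilities)
    print(loan_id, round(p, 3), classify(p))
```

Averaging is only one design choice; majority voting over the three class labels, or weighting each model by its validation accuracy, would be equally plausible readings.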

    Comparison Of The Performance Of Several Data Mining Methods For Bad Debt Recovery In The Healthcare Industry

    The healthcare industry, specifically hospitals and clinical organizations, is often plagued by unpaid bills and collection agency fees. These unpaid bills contribute significantly to the rising cost of healthcare. Unlike financial institutions, healthcare providers typically do not collect financial information about their patients. This lack of information makes it difficult to evaluate whether a particular patient-debtor is likely to pay his/her bill. In recent years, the industry has started to apply data mining tools to reduce its bad-debt balance. This paper compares the effectiveness of five such tools (neural networks, decision trees, logistic regression, memory-based reasoning, and an ensemble model) in evaluating whether a debt is likely to be repaid. The data analysis and the evaluation of the models' performance are based on a fairly large unbalanced data sample provided by a healthcare company, in which cases with recovered bad debts are underrepresented. Computer simulation shows that the neural network, logistic regression, and the combined model produced the best classification accuracy. A more thorough interpretation of the results is obtained by analyzing the lift and receiver operating characteristic charts. We used the models to score all “unknown” cases, which were not pursued by the company. The best model classified about 34.8% of these cases as “good” cases. To collect bad debts more effectively, we recommend that a company first deploy and use the models before it refers unrecovered cases to a collection agency.
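Lift, one of the evaluation tools named in this abstract, measures how much more concentrated the positive cases are among the top-scored cases than in the sample as a whole. The following is a minimal sketch under that standard definition; the function `lift_at`, the scores, and the labels (1 = debt recovered) are invented and unrelated to the paper's data.

```python
def lift_at(scores, labels, fraction):
    """Lift = recovery rate in the top-scored `fraction` of cases / overall recovery rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:top_n]) / top_n
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Invented model scores and outcomes for ten debts (1 = recovered, 0 = not recovered).
debt_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
debt_labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

# Lift within the top-scored half of the cases.
print(lift_at(debt_scores, debt_labels, 0.5))
```

A lift above 1 means that pursuing only the top-scored cases would recover debts at a higher rate than pursuing cases at random, which is especially informative on unbalanced samples where raw accuracy can mislead.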

    A hybrid XGBoost-MLP model for credit risk assessment on Digital Supply Chain Finance

    Supply Chain Finance (SCF) has gradually taken on digital characteristics with the rapid development of electronic information technology. Business audit information has become more abundant and complex, which has increased both the efficiency and the potential risk of commercial banks, with credit risk being the biggest risk they face. Therefore, credit risk assessment based on the application of digital SCF is of great importance to commercial banks’ financial decisions. This paper uses a hybrid Extreme Gradient Boosting Multi-Layer Perceptron (XGBoost-MLP) model to assess the credit risk of Digital SCF (DSCF). In this paper, 1357 observations from 85 Chinese-listed SMEs over the period 2016–2019 are selected as the empirical sample. In the first stage, the important features for credit risk assessment in DSCF are selected automatically through the feature selection of the XGBoost model; in the second stage, credit risk is assessed through the MLP. Based on the empirical results, we find that the XGBoost-MLP model performs well in credit risk assessment, and that XGBoost feature selection is important for the credit risk assessment model. From the perspective of DSCF, the results show that the inclusion of digital features improves the accuracy of credit risk assessment in SCF.

    Predicting prepayment in home loans: Modelling full and partial prepayment in the Portuguese banking sector using machine learning methods

    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. A loan prepayment occurs when the borrower repays a loan early, i.e. the borrower pays more than the contractual amount due. The repayment may cover part of the outstanding principal (partial repayment) or the total principal outstanding (full repayment). From a bank's perspective, the study of early repayment, be it full or partial, is relevant because it changes the scheduled cash flows. In particular, there is a decrease in future cash flows resulting from an unknown future event. Hence, the primary purpose of this study is to model prepayment events in the mortgage loans of a large Portuguese bank through a machine learning approach, assessing performance with techniques such as the Area Under the Receiver Operating Characteristic Curve (ROC), the Gain or Lift, and the Kolmogorov-Smirnov statistic. This allows the prepayment phenomenon to be studied in the Portuguese market using real bank data and machine learning models. Because real-life data were used, the first part of this study involved pre-processing the data to ensure that noise and data quality problems did not enter the models. The second part involved computing the machine learning models, testing Artificial Neural Network and Random Forest models and comparing their performance using the ROC, Gain or Lift, and Kolmogorov-Smirnov statistic. The results reveal that both the full and partial prepayment models perform well on all three performance metrics analysed, showing good predictive results and discriminatory capacity. The partial repayment model is superior to the full repayment model, with a difference that is worthy of mention although not very large. This study is particularly relevant given its analysis of a Portuguese bank and its application of machine learning models to prepayment modelling, for which studies are scarce. Furthermore, the bank in which this study is framed has recently made efforts to update the traditional models currently in force.
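Two of the metrics named in this abstract can be computed directly from model scores. The sketch below uses the standard definitions: the area under the ROC curve as the share of (positive, negative) pairs ranked correctly, and the Kolmogorov-Smirnov statistic as the largest gap between the cumulative score distributions of the two classes. The function names and the toy scores and labels (1 = prepaid) are invented; the dissertation's own implementation is not described in the abstract.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def ks_statistic(scores, labels):
    """Maximum gap between the cumulative score distributions of the two classes."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for threshold in sorted(set(scores)):
        cdf_pos = sum(s <= threshold for s in positives) / len(positives)
        cdf_neg = sum(s <= threshold for s in negatives) / len(negatives)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Invented scores for six loans (1 = prepaid, 0 = not prepaid).
prepay_scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
prepay_labels = [1, 1, 0, 1, 0, 0]

print(round(auc(prepay_scores, prepay_labels), 3))
print(round(ks_statistic(prepay_scores, prepay_labels), 3))
```

Both functions assume that each class has at least one case. The pairwise AUC loop is quadratic in the sample size, so a rank-based formulation would be preferable on a real mortgage portfolio.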