Profit Maximizing Logistic Regression Modeling for Credit Scoring
Multiple classification techniques have been employed for different business applications. In the particular case of credit scoring, a classifier that maximizes the total profit is preferable. The recently proposed expected maximum profit (EMP) measure for credit scoring makes it possible to select the most profitable classifier. Taking the idea of the EMP one step further, it is desirable to integrate the measure into model construction, and thus obtain a profit-maximizing model. Therefore, in this work we propose a method based on the ProfLogit classifier, which optimizes the coefficients of a logistic regression model using a genetic algorithm. The proposed technique shows a significant improvement over regular maximum-likelihood-based logistic regression models on real-life data sets in terms of total profit, which is the ultimate goal for most businesses.
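As a rough illustration of the idea above, and not the authors' ProfLogit implementation, the sketch below evolves logistic regression coefficients with a very small genetic algorithm so that they maximize a profit function instead of the likelihood. The toy data, the per-loan gain and loss figures, the 0.5 cutoff, the plain profit function (used here in place of the EMP measure), and all function names are assumptions made for illustration.

```python
import math
import random

def predict_proba(coefs, x):
    """Logistic model: coefs[0] is the intercept, the rest are weights."""
    z = coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def total_profit(coefs, X, y, gain=1.0, loss=5.0, cutoff=0.5):
    """Assumed profit: a loan is granted when predicted default risk < cutoff.
    A granted good loan (y=0) earns `gain`; a granted defaulter (y=1) costs `loss`."""
    profit = 0.0
    for x, label in zip(X, y):
        if predict_proba(coefs, x) < cutoff:
            profit += gain if label == 0 else -loss
    return profit

def evolve(X, y, n_features, pop_size=30, generations=40, seed=0):
    """Tiny genetic algorithm whose fitness is total_profit (hypothetical setup)."""
    rng = random.Random(seed)
    dim = n_features + 1
    # Start from the all-zero model plus random candidates.
    population = [[0.0] * dim] + [
        [rng.uniform(-3, 3) for _ in range(dim)] for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=lambda c: total_profit(c, X, y), reverse=True)
        elite = population[: pop_size // 3]  # elitism: keep the best third
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            # Uniform crossover followed by Gaussian mutation.
            child = [rng.choice(pair) + rng.gauss(0, 0.3) for pair in zip(a, b)]
            children.append(child)
        population = elite + children
    return max(population, key=lambda c: total_profit(c, X, y))

# Toy data: one feature, label 1 means the borrower defaulted.
X = [[0.1], [0.2], [0.3], [0.4], [0.8], [0.9]]
y = [0, 0, 0, 0, 1, 1]
best = evolve(X, y, n_features=1)
print(total_profit(best, X, y))
```

Because the best candidates are carried over unchanged in every generation, the profit of the returned coefficients can never fall below that of the all-zero starting model. A maximum likelihood fit offers no comparable guarantee with respect to profit.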
Ensemble of Example-Dependent Cost-Sensitive Decision Trees
Several real-world classification problems are example-dependent
cost-sensitive in nature, where the costs due to misclassification vary between
examples and not only within classes. However, standard classification methods
do not take these costs into account, and assume a constant cost of
misclassification errors. In previous works, some methods that incorporate the
financial costs into the training of different algorithms have been
proposed, with the example-dependent cost-sensitive decision tree algorithm
being the one that gives the highest savings. In this paper we propose a new
framework of ensembles of example-dependent cost-sensitive decision trees. The
framework consists of creating different example-dependent cost-sensitive
decision trees on random subsamples of the training set, and then combining
them using three different combination approaches. Moreover, we propose two new
cost-sensitive combination approaches: cost-sensitive weighted voting and
cost-sensitive stacking, the latter being based on the cost-sensitive logistic
regression method. Finally, using five different databases, from four
real-world applications: credit card fraud detection, churn modeling, credit
scoring and direct marketing, we evaluate the proposed method against
state-of-the-art example-dependent cost-sensitive techniques, namely,
cost-proportionate sampling, Bayes minimum risk and cost-sensitive decision
trees. The results show that the proposed algorithms have better results for
all databases, in the sense of higher savings. Comment: 13 pages, 6 figures, Submitted for possible publicatio
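To make the notions of example-dependent cost and savings more concrete, here is a minimal sketch. It assumes that savings are measured relative to the cheaper of the two trivial strategies (predict all negative or predict all positive), and that each base classifier's vote is weighted by its non-negative savings. Both choices, along with the toy fraud data and the function names, are assumptions for illustration and are not taken from the abstract.

```python
def example_cost(y_true, y_pred, c):
    """Cost of one prediction; c holds this example's own cost per outcome."""
    if y_pred == 1:
        return c["tp"] if y_true == 1 else c["fp"]
    return c["fn"] if y_true == 1 else c["tn"]

def total_cost(y_true, y_pred, costs):
    return sum(example_cost(t, p, c) for t, p, c in zip(y_true, y_pred, costs))

def savings(y_true, y_pred, costs):
    """Relative cost reduction versus the cheaper trivial classifier
    (assumes that baseline cost is positive)."""
    n = len(y_true)
    baseline = min(total_cost(y_true, [0] * n, costs),
                   total_cost(y_true, [1] * n, costs))
    return (baseline - total_cost(y_true, y_pred, costs)) / baseline

def weighted_vote(all_preds, weights):
    """Combine base classifiers; each vote counts in proportion to its weight."""
    total = sum(weights)
    if total == 0:  # fall back to a plain majority vote
        weights = [1.0] * len(all_preds)
        total = float(len(all_preds))
    combined = []
    for i in range(len(all_preds[0])):
        score = sum(w for preds, w in zip(all_preds, weights) if preds[i] == 1)
        combined.append(1 if score > total / 2 else 0)
    return combined

# Toy fraud data: missing a fraud costs the transaction amount,
# while every alert costs a fixed review fee of 2.
y_true = [0, 0, 1, 1]
amounts = [10, 20, 100, 50]
costs = [{"fp": 2, "tp": 2, "fn": a, "tn": 0} for a in amounts]

all_preds = [[0, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1]]
weights = [max(savings(y_true, p, costs), 0.0) for p in all_preds]
print(weights, weighted_vote(all_preds, weights))
```

In this toy setup, flagging every transaction is the cheaper trivial strategy, so the second classifier, which misses one expensive fraud, ends up with negative savings and zero voting weight.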
Bankruptcy Prediction of Small and Medium Enterprises Using a Flexible Binary Generalized Extreme Value Model
We introduce a binary regression accounting-based model for bankruptcy
prediction of small and medium enterprises (SMEs). The main advantage of the
model lies in its predictive performance in identifying defaulted SMEs. Another
advantage, which is especially relevant for banks, is that the relationship
between the accounting characteristics of SMEs and response is not assumed a
priori (e.g., linear, quadratic or cubic) and can be determined from the data.
The proposed approach uses the quantile function of the generalized extreme
value distribution as link function as well as smooth functions of accounting
characteristics to flexibly model covariate effects. Therefore, the usual
assumptions in scoring models of symmetric link function and linear or
pre-specified covariate-response relationships are relaxed. Out-of-sample and
out-of-time validation on Italian data shows that our proposal outperforms the
commonly used (logistic) scoring model for different default horizons.
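The asymmetric link described above can be sketched as follows. The code assumes the standard generalized extreme value (GEV) distribution with location 0, scale 1, and shape parameter `tau`, and it treats `eta` as the value of the predictor (in the paper, a sum of smooth functions of the covariates). The names `gev_response`, `gev_link`, `tau`, and `eta` are illustrative, and the exact parameterization used by the authors may differ.

```python
import math

def gev_response(eta, tau):
    """Inverse link: GEV cdf evaluated at eta, giving a probability."""
    if abs(tau) < 1e-12:
        return math.exp(-math.exp(-eta))  # Gumbel limit as tau -> 0
    base = 1.0 + tau * eta
    if base <= 0:
        # Outside the support: the cdf is 0 when tau > 0 and 1 when tau < 0.
        return 0.0 if tau > 0 else 1.0
    return math.exp(-base ** (-1.0 / tau))

def gev_link(p, tau):
    """Link: GEV quantile function, mapping a probability in (0, 1) to eta."""
    if abs(tau) < 1e-12:
        return -math.log(-math.log(p))
    return ((-math.log(p)) ** (-tau) - 1.0) / tau

# Unlike the logistic link, the response at eta = 0 is exp(-1), not 0.5.
print(gev_response(0.0, 0.5))
# The link and the response function are inverses of each other.
print(gev_response(gev_link(0.3, -0.25), -0.25))
```

The asymmetry is the point of the construction: defaults are rare events, and a skewed response curve can approach 0 and 1 at different rates, which a symmetric logistic curve cannot.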
Support Vector Machines for Credit Scoring and discovery of significant features
The assessment of risk of default on credit is important for financial institutions. Logistic regression and discriminant analysis are techniques traditionally used in credit scoring for determining likelihood to default based on consumer application and credit reference agency data. We test support vector machines against these traditional methods on a large credit card database. We find that they are competitive and can be used as the basis of a feature selection method to discover those features that are most significant in determining risk of default.
Survival Analysis in LGD Modeling
The paper proposes an application of the survival time analysis methodology to estimations of the Loss Given Default (LGD) parameter. The main advantage of the survival analysis approach compared to classical regression methods is that it allows exploiting partial recovery data. The model is also modified in order to improve performance of the appropriate goodness of fit measures. The empirical testing shows that the Cox proportional hazards model applied to LGD modeling performs better than the linear and logistic regressions. In addition, a significant improvement is achieved with the modified "pseudo" Cox LGD model. Keywords: credit risk, recovery rate, loss given default, correlation, regulatory capital
Development and application of consumer credit scoring models using profit-based classification measures
This paper presents a new approach for consumer credit scoring, by tailoring a profit-based classification performance measure to credit risk modeling. This performance measure takes into account the expected profits and losses of credit granting and thereby better aligns the model developers' objectives with those of the lending company. It is based on the Expected Maximum Profit (EMP) measure and is used to find a trade-off between the expected losses -- driven by the exposure of the loan and the loss given default -- and the operational income given by the loan. Additionally, one of the major advantages of using the proposed measure is that it allows the optimal cutoff value, which is necessary for model implementation, to be calculated. To test the proposed approach, we use a dataset of loans granted by a government institution, and benchmark the accuracy and monetary gain of using EMP, accuracy, and the area under the ROC curve as measures for selecting model parameters and for determining the respective cutoff values. The results show that our proposed profit-based classification measure outperforms the alternative approaches in terms of both accuracy and monetary value in the test set, and that it facilitates model deployment.
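The trade-off between expected losses and operational income can be illustrated with a deliberately simplified cutoff search. This is not the EMP measure itself: it uses fixed values for the loss given default (`lgd`) and the return on a repaid loan (`rate`), whereas EMP is built on expected profits. The toy scores, the parameter values, and the function names are all assumptions.

```python
def profit_at_cutoff(scores, defaults, exposures, cutoff, lgd=0.5, rate=0.1):
    """Average profit per applicant when loans with score < cutoff are granted.
    A repaid loan earns rate * exposure; a defaulted loan loses lgd * exposure."""
    profit = 0.0
    for s, d, e in zip(scores, defaults, exposures):
        if s < cutoff:
            profit += -lgd * e if d == 1 else rate * e
    return profit / len(scores)

def best_cutoff(scores, defaults, exposures, lgd=0.5, rate=0.1):
    """Try each distinct score, plus 'grant everyone', and keep the most profitable."""
    candidates = sorted(set(scores)) + [float("inf")]
    return max(
        ((c, profit_at_cutoff(scores, defaults, exposures, c, lgd, rate))
         for c in candidates),
        key=lambda pair: pair[1],
    )

# Toy portfolio: predicted default probabilities, outcomes, and loan sizes.
scores = [0.05, 0.10, 0.30, 0.60, 0.90]
defaults = [0, 0, 1, 0, 1]
exposures = [1000, 1000, 1000, 1000, 1000]

cutoff, profit = best_cutoff(scores, defaults, exposures)
print(cutoff, profit)
```

With these numbers a single default wipes out the income from five repaid loans, so the most profitable cutoff is a conservative one, even though a higher cutoff would approve one more good borrower.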
"Can Banks Learn to Be Rational?"
Can banks learn to be rational in their lending activities? The answer depends on the institutionally bounded constraints to learning. From an evolutionary perspective the functionality (for survival) of "learning to be rational" creates strong incentives for such learning without, however, guaranteeing that each member of the particular economic species actually achieves increased fitness. I investigate this issue for a particular economic species, namely, commercial banks. The purpose of this paper is to illustrate the key issues related to learning in an economic model by proposing a new screening model for bank commercial loans that uses the neuro fuzzy technique. The technical modeling aspect is integrally connected in a rigorous way to the key conceptual and theoretical aspects of the capabilities for learning to be rational in a broad but precise sense. This paper also compares the relative predictability of loan default among three methods of prediction (discriminant analysis, logit-type regression, and neuro fuzzy) based on real data obtained from one of the banks in Taiwan. The neuro fuzzy model, in contrast with the other two, incorporates recursive learning in a real-world, imprecise linguistic environment. The empirical results show that in addition to its better screening ability, the neuro fuzzy model is superior in explaining the relationship among the variables as well. With further modifications, this model could be used by bank regulatory agencies for loan examination and by bank loan officers for loan review. The main theoretical conclusion to draw from this demonstration is that non-linear learning in a vague semantic world is both possible and useful. Therefore the search for alternatives to the full neoclassical rationality and its equivalent under uncertainty, rational expectations, is a plausible and desirable search, especially when the probability for convergence to a rational expectations equilibrium is low.
An Analysis of Accuracy using Logistic Regression and Time Series
This paper analyzes the accuracy rates for logistic regression and time series models. It also examines a relatively new performance index that takes into consideration the business assumptions of credit markets. Although prior research has focused on evaluation metrics such as AUC and the Gini index, this new measure has a more intuitive interpretation for managers and decision makers and can be applied to both logistic regression and time series models.