2,046 research outputs found
Support Vector Machines for Credit Scoring and discovery of significant features
The assessment of risk of default on credit is important for financial institutions. Logistic regression and discriminant analysis are techniques traditionally used in credit scoring for determining likelihood to default based on consumer application and credit reference agency data. We test support vector machines against these traditional methods on a large credit card database. We find that they are competitive and can be used as the basis of a feature selection method to discover those features that are most significant in determining risk of default. 1
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
defining class boundary just with the knowledge of positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
algorithms used and the application domains applied. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure
Statistical aspects of credit scoring
This thesis is concerned with statistical aspects of credit scoring, the process of determining how likely an applicant for credit is to default with repayments. In Chapters 1-4 a detailed introduction to credit scoring methodology is presented, including evaluation of previous published work on credit scoring and a review of discrimination and classification techniques.
In Chapter 5 we describe different approaches to measuring the absolute and relative performance of credit scoring models. Two significance tests are proposed for comparing the bad rate amongst the accepts (or the error rate) from two classifiers.
In Chapter 6 we consider different approaches to reject inference, the procedure of allocating class membership probabilities to the rejects. One reason for needing reject inference is to reduce the sample selection bias that results from using a sample consisting only of accepted applicants to build new scorecards. We show that the characteristic vectors for the rejects do not contain information about the parameters of the observed data likelihood, unless extra information or assumptions are included. Methods of reject inference which incorporate additional information are proposed.
In Chapter 7 we make comparisons of a range of different parametric and nonparametric classification techniques for credit scoring: linear regression, logistic regression, projection pursuit regression, Poisson regression, decision trees and decision graphs. We conclude that classifier performance is fairly insensitive to the particular technique adopted.
In Chapter 8 we describe the application of the k-NN method to credit scoring. We propose using an adjusted version of the Eucidean distance metric, which is designed to incorporate knowledge of class separation contained in the data. We evaluate properties of the k-NN classifier through empirical studies and make comparisons with existing techniques
- …