Mathematical programming models for classification problems with applications to credit scoring
Mathematical programming (MP) can be used for developing
classification models for the two-group classification problem. An MP
model can be used to generate a discriminant function that separates the
observations in a training sample of known group membership into the
specified groups optimally in terms of a group separation criterion. The
simplest models for MP discriminant analysis are linear programming
models in which the group separation measure is generally based on the
deviations of misclassified observations from the discriminant function.
MP discriminant analysis models have been tested extensively over the
last 30 years in developing classifiers for the two-group classification
problem. However, in the comparative studies that have included MP
models for classifier development, the MP discriminant analysis models
either lack appropriate normalisation constraints or do not use the
proper data transformation. In addition, these studies have generally been
based on relatively small datasets. This thesis investigates the development
of MP discriminant analysis models that incorporate appropriate
normalisation constraints and data transformations. These MP models are
tested on binary classification problems, with an emphasis on credit scoring
problems, particularly application scoring, i.e. a two-group classification
problem concerned with distinguishing between good and bad applicants for
credit based on information from application forms and other relevant data.
The performance of these MP models is compared with that of statistical
techniques and machine learning methods, and it is shown that MP
discriminant analysis models can be useful tools for developing classifiers.
Another topic covered in this thesis is feature selection. In order to make
classification models easier to understand, it is desirable to develop
parsimonious classification models with a limited number of features.
Features should ideally be selected based on their impact on classification
accuracy. Although MP discriminant analysis models can be extended for
feature selection based on classification accuracy, there are computational
difficulties in applying these models to large datasets. A new MP heuristic
for selecting features is suggested based on a feature selection MP
discriminant analysis model in which maximisation of classification
accuracy is the objective. The results of the heuristic are promising in
comparison with other feature selection methods.
Classifiers should ideally be developed from datasets with
approximately the same number of observations in each class, but in practice
classifiers must often be developed from imbalanced datasets. New MP
formulations are proposed to overcome the difficulties associated with
generating discriminant functions from imbalanced datasets. These
formulations are tested using datasets from financial institutions and the
performance of the MP-generated classifiers is compared with that of
classifiers generated by other methods. Finally, the ordinal classification problem is
considered. MP methods for the ordinal classification problem are outlined
and a new MP formulation is tested on a small dataset.
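
For orientation, the kind of linear programming model the abstract refers to can be sketched as a minimise-the-sum-of-deviations formulation. This is an illustrative sketch only and is not taken from the thesis. All symbols are assumptions introduced here: G_1 and G_2 are the two groups in the training sample, x_i is the feature vector of observation i, w is the weight vector and c the cut-off of the discriminant function, d_i is the deviation of a misclassified observation from the discriminant function, and \bar{x}_1, \bar{x}_2 are the group mean vectors. The equality constraint is one possible normalisation, assumed here, that rules out the trivial solution w = 0; the thesis may use a different one.

```latex
\begin{align*}
\min_{w,\,c,\,d}\quad & \sum_{i \in G_1 \cup G_2} d_i \\
\text{s.t.}\quad & w^\top x_i \le c + d_i, && i \in G_1, \\
& w^\top x_i \ge c - d_i, && i \in G_2, \\
& (\bar{x}_2 - \bar{x}_1)^\top w = 1, \\
& d_i \ge 0, && i \in G_1 \cup G_2,
\end{align*}
```

with w and c unrestricted in sign. Under these assumptions a new observation x would be assigned to G_1 if w^\top x \le c and to G_2 otherwise. The sketch is infeasible when the two group means coincide, which is one reason the choice of normalisation constraint matters.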