2 research outputs found

Improving Risk Predictions by Preprocessing Imbalanced Credit Data

Author: B. Tian
C. Bunkhumpornpat
C. Phua
D.L. Wilson
G.E.A.P.A. Batista
I. Brown
J. Demšar
J. Laurikkala
K. Kennedy
L.C. Thomas
N. Japkowicz
N.M. Kiefer
N.V. Chawla
P.E. Hart
S.J. Yen
V. Vinciotti
Y.M. Huang
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Imbalanced credit data sets refer to databases in which the class of defaulters is heavily under-represented in comparison to the class of non-defaulters. This is a very common situation in real-life credit scoring applications, but it has still received little attention. This paper investigates whether data resampling can be used to improve the performance of learners built from imbalanced credit data sets, and whether the effectiveness of resampling is related to the type of classifier. Experimental results demonstrate that learning with the resampled sets consistently outperforms the use of the original imbalanced credit data, independently of the classifier used.

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori Institucional de la Universitat Jaume I

On the suitability of resampling techniques for the class imbalance problem in credit scoring

Author: A I Marqués
Abrahams CR
Chawla NV
Demšar J
Hochberg Y
J S Sánchez
Japkowicz N
Pluto K
Thomas LC
V García
Vinciotti V
Yen S-J
Zar JH
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

In real-life credit scoring applications, the case in which the class of defaulters is under-represented in comparison with the class of non-defaulters is a very common situation, but it has still received little attention. The present paper investigates the suitability and performance of several resampling techniques when applied in conjunction with statistical and artificial intelligence prediction models over five real-world credit data sets, which have been artificially modified to derive different imbalance ratios (the proportion of defaulter to non-defaulter examples). Experimental results demonstrate that the use of resampling methods consistently improves the performance given by the original imbalanced data. It is also important to note that, in general, over-sampling techniques perform better than any under-sampling approach. This work has been partially supported by the Spanish Ministry of Education and Science under grant TIN2009-14205 and the Generalitat Valenciana under grant PROMETEO/2010/028.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Repositori Institucional de la Universitat Jaume I