The interaction of sampling ratio and modelling method in prediction of binary target with rare target class

Hirschowitz, Steven

Authors: Steven Hirschowitz
Publication date: 14 September 2009
Abstract

In many practical predictive data mining problems with a binary target, one of the target classes is rare. In such a situation it is common practice to decrease the ratio of common to rare class cases in the training set by under-sampling the common class. The relationship between the ratio of common to rare class cases in the training set and model performance was investigated empirically on three artificial and three real-world data sets. The results indicated that a flexible modelling method without regularisation benefits in both mean and variance of performance from a larger ratio when evaluated on a criterion sensitive to overfitting, and benefits in mean but not variance of performance when evaluated on a criterion less sensitive to overfitting. For an inflexible modelling method and a flexible method with regularisation, the effects of a larger ratio were less consistent. In no circumstances, however, was a larger ratio found to be detrimental to model performance, regardless of the criterion used to measure it.
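
The under-sampling step described in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the function name `undersample_common`, its parameters, and the toy data are all hypothetical, and it assumes the rare class is labelled 1 and that common-class cases are dropped uniformly at random.

```python
import random


def undersample_common(rows, labels, ratio, rare_label=1, seed=0):
    """Keep every rare-class case and about ratio * n_rare common-class cases.

    Hypothetical helper for illustration only; `ratio` is the desired
    number of common-class cases per rare-class case in the training set.
    """
    rng = random.Random(seed)
    rare_idx = [i for i, y in enumerate(labels) if y == rare_label]
    common_idx = [i for i, y in enumerate(labels) if y != rare_label]
    # Never ask for more common-class cases than are available.
    n_keep = min(len(common_idx), int(round(ratio * len(rare_idx))))
    kept_common = rng.sample(common_idx, n_keep)
    keep = sorted(rare_idx + kept_common)
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Toy data (assumed): 1000 cases, of which 2% belong to the rare class.
labels = [1] * 20 + [0] * 980
rows = list(range(1000))

for ratio in (1, 5, 20):
    _, y = undersample_common(rows, labels, ratio)
    print(f"ratio {ratio}: {len(y) - sum(y)} common, {sum(y)} rare")
```

A larger `ratio` keeps more of the common class, so the training set is both larger and closer to the original class distribution. That is the quantity the study varies.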

Available Versions

Wits Institutional Repository on DSPACE

oai:wiredspace.wits.ac.za:1053...

Last time updated on 14/06/2016