A performance comparison of oversampling methods for data generation in imbalanced learning tasks

Abstract

Dissertation presented as the partial requirement for obtaining a Master's degree in Statistics and Information Management, specialization in Marketing Research and CRM.

The class imbalance problem is one of the most fundamental challenges faced by the machine learning community. Imbalance refers to the number of instances in the class of interest being relatively low compared to the rest of the data. Sampling is a common technique for dealing with this problem, and a number of oversampling approaches have been applied in an attempt to balance the classes. This study provides an overview of the issue of class imbalance and examines some common oversampling approaches for dealing with it. To illustrate the differences, an experiment is conducted on multiple simulated data sets, comparing the performance of these oversampling methods across different classifiers and various evaluation criteria. In addition, the effect of different parameters, such as the number of features and the imbalance ratio, on classifier performance is also evaluated.
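
As a rough illustration of the setup the abstract describes, the sketch below simulates a two-class data set with a chosen number of features and imbalance ratio, then balances it by random oversampling (duplicating minority instances). The abstract does not name the oversampling methods or the data generator used in the study, so random oversampling, the Gaussian class clusters, and the helper names `make_imbalanced` and `random_oversample` are all assumptions made for this example only.

```python
import random
from collections import Counter


def make_imbalanced(n_samples=1000, n_features=5, imbalance_ratio=9.0, seed=0):
    """Simulate a two-class data set (hypothetical helper, not from the study).

    imbalance_ratio is the majority count divided by the minority count.
    Each class is drawn from a Gaussian cluster with its own centre.
    """
    rng = random.Random(seed)
    n_minority = round(n_samples / (1.0 + imbalance_ratio))
    n_majority = n_samples - n_minority
    X, y = [], []
    for label, count, centre in ((0, n_majority, 0.0), (1, n_minority, 1.5)):
        for _ in range(count):
            X.append([rng.gauss(centre, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many instances as the largest one (assumed method)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        indices = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            j = rng.choice(indices)
            X_out.append(list(X[j]))  # copy the row so duplicates are independent
            y_out.append(label)
    return X_out, y_out


X, y = make_imbalanced()
X_res, y_res = random_oversample(X, y)
print("before:", dict(Counter(y)))
print("after: ", dict(Counter(y_res)))
```

Varying `n_features` and `imbalance_ratio` in a loop, and training a classifier on both the original and the resampled data, would approximate the kind of comparison the abstract outlines; the classifier and evaluation metric are left open here because the abstract does not specify them.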
