2 research outputs found

CSMOUTE: Combined Synthetic Oversampling and Undersampling Technique for Imbalanced Data Classification

Author: Koziarski Michał
Publication date: 07/04/2020

In this paper we propose two novel data-level algorithms for handling data imbalance in the classification task: first, a Synthetic Majority Undersampling Technique (SMUTE), which leverages the concept of interpolation of nearby instances, previously introduced in the oversampling setting in SMOTE; and second, a Combined Synthetic Oversampling and Undersampling Technique (CSMOUTE), which integrates SMOTE oversampling with SMUTE undersampling. The results of the experimental study demonstrate the usefulness of both SMUTE and CSMOUTE, especially when combined with more complex classifiers, namely MLP and SVM, and when applied to datasets containing a large number of outliers. This leads us to conclude that the proposed approach shows promise for further extensions accommodating local data characteristics, a direction discussed in more detail in the paper.

arXiv.org e-Print Archive
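
The interpolation idea described in the CSMOUTE abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it is a reading of the abstract only, written in plain Python. The function names, the `ratio` parameter (the share of the class-size gap closed by oversampling), the neighbourhood size `k`, and the choice to merge a point with a random near neighbour are all assumptions made for illustration.

```python
import math
import random


def _nearest(points, i, k):
    """Indices of the k points closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def _interpolate(a, b, t):
    """Point on the segment from a to b at fraction t."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))


def oversample(points, n_new, k, rng):
    """SMOTE-style step: add n_new points, each interpolated between a
    random original point and one of its k nearest same-class neighbours."""
    points = list(points)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(points))
        j = rng.choice(_nearest(points, i, k))
        synthetic.append(_interpolate(points[i], points[j], rng.random()))
    return points + synthetic


def undersample(points, n_remove, k, rng):
    """SMUTE-style step (assumed from the abstract): n_remove times,
    replace a point and a near neighbour with one interpolated point,
    so the class shrinks by one point per iteration."""
    points = list(points)
    for _ in range(n_remove):
        i = rng.randrange(len(points))
        j = rng.choice(_nearest(points, i, k))
        merged = _interpolate(points[i], points[j], rng.random())
        points = [p for idx, p in enumerate(points) if idx not in (i, j)]
        points.append(merged)
    return points


def combined_resample(minority, majority, ratio=0.5, k=3, seed=0):
    """Balance two classes: close a `ratio` share (between 0 and 1) of the
    size gap by oversampling the minority class and the rest by
    undersampling the majority class. `ratio` is a hypothetical name."""
    if len(minority) < 2 or len(majority) < len(minority):
        raise ValueError("need >= 2 minority points and a larger majority")
    rng = random.Random(seed)
    gap = len(majority) - len(minority)
    n_over = round(gap * ratio)
    return (oversample(minority, n_over, k, rng),
            undersample(majority, gap - n_over, k, rng))


if __name__ == "__main__":
    data_rng = random.Random(1)
    minority = [(data_rng.random(), data_rng.random()) for _ in range(4)]
    majority = [(data_rng.random() + 2, data_rng.random() + 2) for _ in range(12)]
    new_min, new_maj = combined_resample(minority, majority)
    print(len(new_min), len(new_maj))
```

In this sketch both classes end up the same size, and every synthetic point lies on a segment between two original points of its own class. Setting `ratio` to 1 reduces it to pure oversampling, and setting it to 0 to pure undersampling.
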

A Survey of Methods for Managing the Classification and Solution of Data Imbalance Problem

Authors: Ahmed Shakil; Hasib Khan Md.; Iqbal Md. Sadiq; Mahmud Jubayer Al; Popel Mahmudul Hasan; Rahman Obaidur; Shah Faisal Muhammad; Showrov Md. Imran Hossain
Publication venue: 'Science Publications'
Publication date: 22/12/2020

Class imbalance is a widespread problem in numerous real-world applications. In such situations, nearly all of the examples are labeled as one class, called the majority class, while far fewer examples are labeled as the other, usually more important, class, called the minority class. Over the last few years, considerable research has been carried out on class imbalance, including data sampling, cost-sensitive analysis, Genetic Programming based models, bagging, boosting, etc. In this survey paper, we review 24 related studies from the years 2003, 2008, 2010, 2012, and 2014 to 2019, focusing on the architecture of single, hybrid, and ensemble method designs, to understand the current state of improving classification output in machine learning techniques that address class imbalance. The survey also includes a statistical analysis of the classification algorithms under various methods and experimental conditions, as well as of the datasets used in the different research papers.

Comment: 12 pages, 2 figures

arXiv.org e-Print Archive