CSMOUTE: Combined Synthetic Oversampling and Undersampling Technique for Imbalanced Data Classification
In this paper we propose two novel data-level algorithms for handling data
imbalance in the classification task: first of all a Synthetic Majority
Undersampling Technique (SMUTE), which leverages the concept of interpolation
of nearby instances, previously introduced in the oversampling setting in
SMOTE, and secondly a Combined Synthetic Oversampling and Undersampling
Technique (CSMOUTE), which integrates SMOTE oversampling with SMUTE
undersampling. The results of the conducted experimental study demonstrate the
usefulness of both the SMUTE and the CSMOUTE algorithms, especially when
combined with more complex classifiers, namely MLP and SVM, and when applied
to datasets containing a large number of outliers. This leads us to the
conclusion that the proposed approach shows promise for further extensions
accommodating local data characteristics, a direction discussed in more detail
in the paper.
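
The interpolation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `ratio` parameter, the neighbour count `k`, and the specific choice of merging a majority instance with one of its neighbours into a single interpolated point are all assumptions made for illustration, and only the Python standard library is used.

```python
import math
import random


def nearest_neighbors(points, index, k):
    """Indices of the k points closest to points[index], excluding itself."""
    others = [i for i in range(len(points)) if i != index]
    others.sort(key=lambda i: math.dist(points[index], points[i]))
    return others[:k]


def interpolate(a, b, t):
    """Point on the segment from a to b, at fraction t of the way from a."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))


def oversample_by_interpolation(minority, n_new, k=3, seed=0):
    """SMOTE-style step: add n_new synthetic minority points."""
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest_neighbors(minority, i, k))
        synthetic.append(interpolate(minority[i], minority[j], rng.random()))
    return list(minority) + synthetic


def undersample_by_interpolation(majority, n_remove, k=3, seed=0):
    """SMUTE-style step (assumed form): merge pairs of nearby majority points."""
    rng = random.Random(seed)
    points = list(majority)
    for _ in range(n_remove):
        if len(points) < 2:
            break
        i = rng.randrange(len(points))
        j = rng.choice(nearest_neighbors(points, i, k))
        merged = interpolate(points[i], points[j], rng.random())
        # Drop both originals and keep the merged point: size shrinks by one.
        points = [p for idx, p in enumerate(points) if idx not in (i, j)]
        points.append(merged)
    return points


def combined_resample(minority, majority, ratio=0.5, k=3, seed=0):
    """Close the class-size gap partly by oversampling, partly by undersampling.

    ratio is a hypothetical knob: the fraction of the gap closed by oversampling.
    """
    gap = len(majority) - len(minority)
    n_new = round(gap * ratio)
    n_remove = gap - n_new
    return (
        oversample_by_interpolation(minority, n_new, k, seed),
        undersample_by_interpolation(majority, n_remove, k, seed),
    )


minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
majority = [(float(i), 5.0 + (i % 3)) for i in range(10)]
new_minority, new_majority = combined_resample(minority, majority, ratio=0.5)
print(len(new_minority), len(new_majority))
```

With four minority and ten majority points and `ratio=0.5`, half of the gap of six is closed by adding three synthetic minority points and half by merging three pairs of majority points, so both classes end up with seven instances.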
A Survey of Methods for Managing the Classification and Solution of Data Imbalance Problem
The class imbalance problem arises in numerous real-world applications. In
such a situation, nearly all of the examples are labeled as one class, called
the majority class, while far fewer examples are labeled as the other class,
called the minority class, which is usually the more important one. Over the
last few years, several types of research have been carried
out on the issue of class imbalance, including data sampling, cost-sensitive
analysis, Genetic Programming based models, bagging, boosting, etc.
In this survey paper, we review 24 related studies from the years 2003, 2008,
2010, 2012, and 2014 to 2019, focusing on the architecture of single, hybrid,
and ensemble method designs, in order to understand the current state of
machine learning techniques for improving classification results under class
imbalance. This survey paper also includes a statistical analysis
of the classification algorithms under various methods and several other
experimental conditions, as well as datasets used in different research papers.
Comment: 12 pages, 2 figures