Misclassification analysis for the class imbalance problem
In classification, class imbalance normally causes the learning algorithm to be dominated by the majority classes, and the features of the minority classes are sometimes ignored. This indirectly affects how humans visualise the data. Therefore, special care is needed in the learning algorithm to enhance accuracy on the minority classes. In this study, the use of misclassification analysis is investigated for data re-distribution. Several under-sampling techniques and hybrid techniques using misclassification analysis are proposed in the paper. Benchmark data sets obtained from the University of California Irvine (UCI) machine learning repository are used to investigate the performance of the proposed techniques. The results show that the proposed hybrid technique presents the best performance in the experiment.
MEBoost: Mixing Estimators with Boosting for Imbalanced Data Classification
The class imbalance problem has been a challenging research problem in the fields
of machine learning and data mining, as most real-life datasets are imbalanced.
Several existing machine learning algorithms try to maximize classification
accuracy by correctly identifying majority class samples while ignoring
the minority class. However, the minority class instances
usually represent a higher interest than the majority class. Recently, several
cost-sensitive methods, ensemble models and sampling techniques have been used
in the literature to classify imbalanced datasets. In this paper, we
propose MEBoost, a new boosting algorithm for imbalanced datasets. MEBoost
mixes two different weak learners with boosting to improve the performance on
imbalanced datasets. MEBoost is an alternative to the existing techniques such
as SMOTEBoost, RUSBoost, AdaBoost, etc. The performance of MEBoost has been
evaluated on 12 benchmark imbalanced datasets against state-of-the-art ensemble
methods such as SMOTEBoost, RUSBoost, Easy Ensemble, EUSBoost and DataBoost.
Experimental results show significant improvement over the other methods, and it
can be concluded that MEBoost is an effective and promising algorithm for dealing
with imbalanced datasets. The Python version of the code is available here:
https://github.com/farshidrayhanuiu/
Comment: SKIMA-201
A systematic study of the class imbalance problem in convolutional neural networks
In this study, we systematically investigate the impact of class imbalance on
classification performance of convolutional neural networks (CNNs) and compare
frequently used methods to address the issue. Class imbalance is a common
problem that has been comprehensively studied in classical machine learning,
yet very limited systematic research is available in the context of deep
learning. In our study, we use three benchmark datasets of increasing
complexity, MNIST, CIFAR-10 and ImageNet, to investigate the effects of
imbalance on classification and perform an extensive comparison of several
methods to address the issue: oversampling, undersampling, two-phase training,
and thresholding that compensates for prior class probabilities. Our main
evaluation metric is area under the receiver operating characteristic curve
(ROC AUC) adjusted to multi-class tasks, since the overall accuracy metric is
associated with notable difficulties in the context of imbalanced data. Based
on results from our experiments we conclude that (i) the effect of class
imbalance on classification performance is detrimental; (ii) the method of
addressing class imbalance that emerged as dominant in almost all analyzed
scenarios was oversampling; (iii) oversampling should be applied to the level
that completely eliminates the imbalance, whereas the optimal undersampling
ratio depends on the extent of imbalance; (iv) as opposed to some classical
machine learning models, oversampling does not cause overfitting of CNNs; (v)
thresholding should be applied to compensate for prior class probabilities when
the overall number of properly classified cases is of interest.
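
Two of the methods compared in this abstract, oversampling to full balance and thresholding that compensates for prior class probabilities, can be illustrated with a minimal sketch. This is not the authors' code: the function names (`oversample_to_balance`, `class_priors`, `threshold_by_prior`) are hypothetical, the oversampling shown is plain random duplication, and the thresholding is assumed to divide each predicted class probability by the class's training-set prior and renormalise.

```python
import random
from collections import Counter


def oversample_to_balance(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until every class
    has as many samples as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw duplicates with replacement to fill the gap to the target size.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y


def class_priors(labels):
    """Estimate prior class probabilities from training label frequencies."""
    counts = Counter(labels)
    total = len(labels)
    return {c: n / total for c, n in counts.items()}


def threshold_by_prior(probs, priors):
    """Divide each predicted class probability by its training prior,
    then renormalise so the adjusted scores sum to 1 (assumed scheme)."""
    scaled = {c: p / priors[c] for c, p in probs.items()}
    z = sum(scaled.values())
    return {c: s / z for c, s in scaled.items()}


# Toy data: 9 majority samples (class 0) and 1 minority sample (class 1).
labels = [0] * 9 + [1]
samples = list(range(10))

balanced_x, balanced_y = oversample_to_balance(samples, labels)
priors = class_priors(labels)

# A prediction that favours the majority class before compensation.
raw = {0: 0.7, 1: 0.3}
adjusted = threshold_by_prior(raw, priors)
predicted = max(adjusted, key=adjusted.get)
```

In this toy case the raw prediction favours class 0, while the prior-compensated scores favour the minority class 1, which is the effect thresholding is meant to have when the training priors are skewed.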
