An empirical evaluation of imbalanced data strategies from a practitioner's point of view
This research tested the following well-known strategies for dealing with binary
imbalanced data on 82 different real-life data sets (sampled to imbalance rates
of 5%, 3%, 1%, and 0.1%): class weight, SMOTE, Underbagging, and a baseline
(just the base classifier). As base classifiers we used SVM with RBF kernel,
random forests, and gradient boosting machines and we measured the quality of
the resulting classifier using 6 different metrics (area under the curve (AUC),
accuracy, F-measure, G-mean, Matthews correlation coefficient (MCC), and balanced
accuracy). The best strategy strongly depends on the metric used to measure the
quality of the classifier. For AUC and accuracy, class weight and the baseline
perform better; for F-measure and MCC, SMOTE performs better; and for G-mean
and balanced accuracy, underbagging.
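
To make the dependence on the metric concrete, here is a minimal sketch (not taken from the paper) that computes five of the six listed metrics from binary confusion-matrix counts using their standard definitions. AUC is omitted because it needs ranked scores rather than counts. The function name `imbalance_metrics` and the example counts are hypothetical.

```python
import math

def imbalance_metrics(tp, fp, tn, fn):
    """Return threshold-based metrics from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    # MCC is conventionally set to 0 when its denominator is 0.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "f_measure": f_measure,
        "g_mean": math.sqrt(recall * specificity),
        "mcc": mcc,
        "balanced_accuracy": (recall + specificity) / 2,
    }

# Hypothetical counts: 1,000 samples at a 1% imbalance rate, scored for a
# classifier that always predicts the majority (negative) class.
always_negative = imbalance_metrics(tp=0, fp=0, tn=990, fn=10)
```

With these hypothetical counts, accuracy is 0.99 while G-mean, F-measure, and MCC are all 0 and balanced accuracy is 0.5, which illustrates how a strategy can look strong on one metric and weak on another.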
Heartbeat Anomaly Detection using Adversarial Oversampling
Cardiovascular diseases are one of the most common causes of death in the
world. Prevention, knowledge of previous cases in the family, and early
detection are the best strategies to reduce these deaths. Different machine learning
approaches to automatic diagnosis are being proposed for this task. As in most
health problems, class imbalance is predominant in
this problem and affects the performance of the automated solution. In this
paper, we address the classification of heartbeat images into different
cardiovascular diseases. We propose a two-dimensional Convolutional Neural
Network for classification after using an InfoGAN architecture to generate
synthetic images for the underrepresented classes. We call this proposal Adversarial
Oversampling and compare it with classical oversampling methods such as SMOTE,
ADASYN, and RandomOversampling. The results show that the proposed approach
improves the classifier performance for the minority classes without harming
the performance on the balanced classes.
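
For context on the classical baselines named above, the following is a minimal SMOTE-style sketch: it creates synthetic minority points by interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified illustration, not the paper's InfoGAN-based method and not a library implementation; the function name `smote_like`, its parameters, and the sample points are hypothetical.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Hypothetical 2-D minority samples (assumed data, for illustration only).
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote_like(minority, n_new=6)
```

Because each synthetic point lies on a segment between two existing minority samples, this kind of oversampling stays inside the region the minority class already occupies, whereas a generative model can in principle produce more varied samples.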