Hellinger Distance Trees for Imbalanced Streams
Classifiers trained on data sets possessing an imbalanced class distribution
are known to exhibit poor generalisation performance. This is known as the
imbalanced learning problem. The problem becomes particularly acute when we
consider incremental classifiers operating on imbalanced data streams,
especially when the learning objective is rare class identification. As
accuracy may provide a misleading impression of performance on imbalanced data,
existing stream classifiers based on accuracy can suffer poor minority class
performance on imbalanced streams, with the result being low minority class
recall rates. In this paper we address this deficiency by proposing the use of
the Hellinger distance measure as a very fast decision tree split criterion.
We demonstrate that by using Hellinger a statistically significant improvement
in recall rates on imbalanced data streams can be achieved, with an acceptable
increase in the false positive rate.
Comment: 6 Pages, 2 figures, to be published in Proceedings 22nd International
Conference on Pattern Recognition (ICPR) 201
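To make the split criterion in the abstract above concrete, here is a minimal sketch of the Hellinger distance computed over the branches of a candidate split for a two-class problem. The function name `hellinger_split` and the count-based interface are assumptions made for illustration, not the authors' streaming implementation. The score compares the fraction of each class routed to each branch, so it depends on within-class rates rather than on the class prior.

```python
import math


def hellinger_split(pos_counts, neg_counts):
    """Hellinger distance between the per-class branch distributions.

    pos_counts[j] and neg_counts[j] are the numbers of minority and
    majority examples sent to branch j by a candidate split
    (hypothetical interface, for illustration only).
    """
    total_pos = sum(pos_counts)
    total_neg = sum(neg_counts)
    if total_pos == 0 or total_neg == 0:
        return 0.0  # one class is absent, so there is nothing to separate
    return math.sqrt(sum(
        (math.sqrt(p / total_pos) - math.sqrt(n / total_neg)) ** 2
        for p, n in zip(pos_counts, neg_counts)
    ))


# A split that isolates 10 minority examples from 1000 majority examples
# scores the maximum value, sqrt(2), despite the 1:100 imbalance.
perfect = hellinger_split([10, 0], [0, 1000])
# A split that divides both classes in the same proportion scores 0.
useless = hellinger_split([5, 5], [500, 500])
```

A tree learner would evaluate this score for each candidate split and prefer the largest value; how the counts are maintained incrementally on a stream is outside this sketch.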
Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations
Deep learning has proved in recent years to be a powerful tool for image
analysis and is now widely used to segment both 2D and 3D medical images.
Deep-learning segmentation frameworks rely not only on the choice of network
architecture but also on the choice of loss function. When the segmentation
process targets rare observations, a severe class imbalance is likely to occur
between candidate labels, thus resulting in sub-optimal performance. In order
to mitigate this issue, strategies such as the weighted cross-entropy function,
the sensitivity function or the Dice loss function have been proposed. In this
work, we investigate the behavior of these loss functions and their sensitivity
to learning rate tuning in the presence of different rates of label imbalance
across 2D and 3D segmentation tasks. We also propose to use the class
re-balancing properties of the Generalized Dice overlap, a known metric for
segmentation assessment, as a robust and accurate deep-learning loss function
for unbalanced tasks.
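The re-balancing idea in the abstract above can be sketched as a loss in a few lines. This is an illustrative version that assumes each class is weighted by the inverse of its squared reference volume, a weighting commonly associated with the Generalised Dice overlap. The function name, the nested-list layout and the `eps` stabiliser are assumptions, not details taken from the abstract.

```python
def generalised_dice_loss(ref, pred, eps=1e-8):
    """Generalised Dice loss over classes l and voxels n.

    ref[l][n]  : one-hot reference label (0 or 1)
    pred[l][n] : predicted probability for class l at voxel n
    Each class is weighted by 1 / (reference volume)^2, so a rare class
    contributes as much to the score as a frequent one (assumed weighting).
    """
    numerator = 0.0
    denominator = 0.0
    for ref_l, pred_l in zip(ref, pred):
        weight = 1.0 / (sum(ref_l) ** 2 + eps)
        numerator += weight * sum(r * p for r, p in zip(ref_l, pred_l))
        denominator += weight * sum(r + p for r, p in zip(ref_l, pred_l))
    return 1.0 - 2.0 * numerator / (denominator + eps)


# Four voxels, two classes: one rare foreground voxel, three background voxels.
reference = [[1, 0, 0, 0], [0, 1, 1, 1]]
loss_perfect = generalised_dice_loss(reference, reference)
loss_inverted = generalised_dice_loss(reference, [[0, 1, 1, 1], [1, 0, 0, 0]])
```

In a training framework the same arithmetic would be written with differentiable tensor operations; plain Python is used here only to keep the example self-contained.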
CLINICAL: Targeted Active Learning for Imbalanced Medical Image Classification
Training deep learning models on medical datasets that perform well for all
classes is a challenging task. It is often the case that a suboptimal
performance is obtained on some classes due to the natural class imbalance
issue that comes with medical data. An effective way to tackle this problem is
by using targeted active learning, where we iteratively add data points to the
training data that belong to the rare classes. However, existing active
learning methods are ineffective in targeting rare classes in medical datasets.
In this work, we propose Clinical (targeted aCtive Learning for ImbalaNced
medICal imAge cLassification), a framework that uses submodular mutual
information functions as acquisition functions to mine critical data points
from rare classes. We apply our framework to a wide array of medical imaging
datasets on a variety of real-world class imbalance scenarios, namely binary
imbalance and long-tail imbalance. We show that Clinical outperforms the
state-of-the-art active learning methods by acquiring a diverse set of data
points that belong to the rare classes.
Comment: Accepted to MICCAI 2022 MILLanD Worksho
Improving traffic sign recognition by active search
We describe an iterative active-learning algorithm to recognise rare traffic
signs. A standard ResNet is trained on a training set containing only a single
sample of the rare class. We demonstrate that by sorting the samples of a
large, unlabeled set by the estimated probability of belonging to the rare
class, we can efficiently identify samples from the rare class. This works
despite the fact that this estimated probability is usually quite low. A
reliable active-learning loop is obtained by labeling these candidate samples,
including them in the training set, and iterating the procedure. Further, we
show that we get similar results starting from a single synthetic sample. Our
results are important as they indicate a straightforward way of improving
traffic-sign recognition for automated driving systems. In addition, they show
that we can make use of the information hidden in low confidence outputs, which
is usually ignored.
Comment: 6 pages, 7 Figure
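The ranking step described in the abstract above can be sketched as follows. The helper name `select_candidates` and the example probabilities are hypothetical. In the described loop the scores would come from the trained classifier, and the selected samples would be labelled and added to the training set before the next round.

```python
def select_candidates(rare_probs, k):
    """Return the indices of the k unlabeled samples with the highest
    estimated probability of belonging to the rare class.

    Only the ordering matters, so the selection still works when every
    probability is far below 0.5.
    """
    ranked = sorted(range(len(rare_probs)),
                    key=lambda i: rare_probs[i],
                    reverse=True)
    return ranked[:k]


# Hypothetical classifier outputs for four unlabeled images: all are
# low-confidence, yet the ranking still singles out samples 1 and 2.
scores = [0.01, 0.20, 0.05, 0.002]
to_label = select_candidates(scores, k=2)
```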