
    Hellinger Distance Trees for Imbalanced Streams

    Classifiers trained on data sets with an imbalanced class distribution are known to generalise poorly. This is known as the imbalanced learning problem. The problem becomes particularly acute for incremental classifiers operating on imbalanced data streams, especially when the learning objective is rare class identification. Because accuracy can give a misleading impression of performance on imbalanced data, existing stream classifiers based on accuracy can suffer from low minority class recall on imbalanced streams. In this paper we address this deficiency by proposing the use of the Hellinger distance measure as a very fast decision tree split criterion. We demonstrate that by using Hellinger distance a statistically significant improvement in recall on imbalanced data streams can be achieved, with an acceptable increase in the false positive rate. Comment: 6 Pages, 2 figures, to be published in Proceedings 22nd International Conference on Pattern Recognition (ICPR) 201
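A minimal sketch of how a Hellinger-distance split score might be computed for a two-class problem. The abstract does not state the formula, so this assumes the form commonly used for Hellinger distance decision trees: compare how the positive and negative classes distribute across the branches of a candidate split. The function name `hellinger_split_score` and the `(pos_count, neg_count)` input layout are hypothetical, and the paper's exact formulation may differ.

```python
import math


def hellinger_split_score(branches):
    """Score a candidate split; `branches` is a list of (pos_count, neg_count).

    Each class is normalised by its own total, so the score does not depend
    on the class prior. Assumed form, not taken from the paper itself.
    """
    total_pos = sum(p for p, _ in branches)
    total_neg = sum(n for _, n in branches)
    if total_pos == 0 or total_neg == 0:
        return 0.0  # only one class present: nothing to separate
    acc = 0.0
    for p, n in branches:
        # Difference between the per-class branch proportions, in sqrt space.
        acc += (math.sqrt(p / total_pos) - math.sqrt(n / total_neg)) ** 2
    return math.sqrt(acc)


# A split that isolates the 10 minority samples from 1000 majority samples.
perfect = hellinger_split_score([(10, 0), (0, 1000)])
# A split that sends both classes to each branch in equal proportion.
useless = hellinger_split_score([(5, 500), (5, 500)])
```

Because each class is normalised separately, a split that isolates a handful of minority samples scores as highly as one on balanced data, which is the property that makes the measure attractive for rare class identification.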

    Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations

    Deep learning has proved in recent years to be a powerful tool for image analysis and is now widely used to segment both 2D and 3D medical images. Deep-learning segmentation frameworks rely not only on the choice of network architecture but also on the choice of loss function. When the segmentation process targets rare observations, a severe class imbalance is likely to occur between candidate labels, resulting in sub-optimal performance. To mitigate this issue, strategies such as the weighted cross-entropy function, the sensitivity function, or the Dice loss function have been proposed. In this work, we investigate the behavior of these loss functions and their sensitivity to learning rate tuning in the presence of different rates of label imbalance across 2D and 3D segmentation tasks. We also propose to use the class re-balancing properties of the Generalized Dice overlap, a known metric for segmentation assessment, as a robust and accurate deep-learning loss function for unbalanced tasks.
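The class re-balancing idea can be illustrated with a small sketch. The abstract does not give the formula, so this assumes the usual form of the Generalized Dice overlap in which each label is weighted by the inverse of its squared reference volume. The function name, the plain-list inputs, and the `eps` smoothing term are assumptions made for illustration only.

```python
def generalized_dice_loss(ref, prob, eps=1e-6):
    """Generalized Dice loss over flattened voxels (assumed formulation).

    ref  : one list per class with one-hot reference labels (0 or 1)
    prob : one list per class with predicted probabilities in [0, 1]
    eps  : hypothetical smoothing term guarding against empty classes
    """
    numer = 0.0
    denom = 0.0
    for r_l, p_l in zip(ref, prob):
        volume = sum(r_l)
        # Inverse squared volume: small structures receive large weights.
        w_l = 1.0 / (volume * volume + eps)
        numer += w_l * sum(r * p for r, p in zip(r_l, p_l))
        denom += w_l * sum(r + p for r, p in zip(r_l, p_l))
    return 1.0 - 2.0 * numer / (denom + eps)


# Four voxels: class 0 is rare (one voxel), class 1 covers the rest.
reference = [[1, 0, 0, 0], [0, 1, 1, 1]]
inverted = [[0, 1, 1, 1], [1, 0, 0, 0]]
loss_perfect = generalized_dice_loss(reference, reference)
loss_wrong = generalized_dice_loss(reference, inverted)
```

In a real training setup the same arithmetic would be written with differentiable tensor operations; the list version here only shows how the weights shift emphasis toward the rare label.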

    CLINICAL: Targeted Active Learning for Imbalanced Medical Image Classification

    Training deep learning models that perform well for all classes on medical datasets is a challenging task. Performance on some classes is often suboptimal because of the natural class imbalance that comes with medical data. An effective way to tackle this problem is targeted active learning, where we iteratively add data points that belong to the rare classes to the training data. However, existing active learning methods are ineffective in targeting rare classes in medical datasets. In this work, we propose Clinical (targeted aCtive Learning for ImbalaNced medICal imAge cLassification), a framework that uses submodular mutual information functions as acquisition functions to mine critical data points from rare classes. We apply our framework to a wide array of medical imaging datasets on a variety of real-world class imbalance scenarios, namely binary imbalance and long-tail imbalance. We show that Clinical outperforms the state-of-the-art active learning methods by acquiring a diverse set of data points that belong to the rare classes. Comment: Accepted to MICCAI 2022 MILLanD Worksho

    Improving traffic sign recognition by active search

    We describe an iterative active-learning algorithm to recognise rare traffic signs. A standard ResNet is trained on a training set containing only a single sample of the rare class. We demonstrate that by sorting the samples of a large, unlabeled set by the estimated probability of belonging to the rare class, we can efficiently identify samples from the rare class. This works even though this estimated probability is usually quite low. A reliable active-learning loop is obtained by labeling these candidate samples, including them in the training set, and iterating the procedure. Further, we show that we get similar results starting from a single synthetic sample. Our results are important as they indicate a straightforward way of improving traffic-sign recognition for automated driving systems. In addition, they show that we can make use of the information hidden in low-confidence outputs, which is usually ignored. Comment: 6 pages, 7 Figure
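The loop described in this abstract (rank the unlabeled pool by estimated rare-class probability, label the top candidates, retrain, repeat) can be sketched as follows. This is only an illustration under stated assumptions: `active_search`, `fit_toy`, and the labeling `oracle` are hypothetical names, and the toy one-dimensional scorer stands in for the ResNet used in the paper.

```python
def active_search(fit, oracle, labeled, pool, rounds=3, k=2):
    """Generic active-search loop (sketch, not the authors' implementation).

    fit     : takes labeled (x, y) pairs, returns a scoring function x -> float
    oracle  : hypothetical stand-in for the human annotator, x -> 0 or 1
    labeled : initial (x, y) pairs; must contain at least one rare sample
    pool    : unlabeled samples
    """
    labeled = list(labeled)
    pool = list(pool)
    for _ in range(rounds):
        score = fit(labeled)                # retrain on the current labels
        pool.sort(key=score, reverse=True)  # most rare-like candidates first
        candidates, pool = pool[:k], pool[k:]
        labeled += [(x, oracle(x)) for x in candidates]
    return labeled, pool


def fit_toy(labeled):
    """Toy model: score by closeness to the mean of known rare samples."""
    rare = [x for x, y in labeled if y == 1]
    centre = sum(rare) / len(rare)
    return lambda x: 1.0 / (1.0 + abs(x - centre))


# One rare seed sample (label 1) and one common sample (label 0).
seed = [(10.0, 1), (0.0, 0)]
unlabeled = [0.1, 0.2, 9.5, 0.3, 10.5, 0.4, 9.0, 0.5]
found, remaining = active_search(
    fit_toy, lambda x: int(x > 5), seed, unlabeled, rounds=2, k=2
)
```

Only the ranking matters in this loop, not the absolute score, which is consistent with the abstract's observation that the procedure works even when the estimated rare-class probability is low.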