A systematic study of the foreground-background imbalance problem in deep learning for object detection
The class imbalance problem in deep learning has been explored in several
studies, but there has yet to be a systematic analysis of this phenomenon in
object detection. Here, we present comprehensive analyses and experiments of
the foreground-background (F-B) imbalance problem in object detection, which is
very common and caused by small, infrequent objects of interest. We
experimentally study the effects of different aspects of F-B imbalance (object
size, number of objects, dataset size, object type) on detection performance.
In addition, we compare 9 leading methods for addressing this problem,
including Faster-RCNN, SSD, OHEM, Libra-RCNN, Focal-Loss, GHM, PISA, YOLO-v3,
and GFL with a range of datasets from different imaging domains. We conclude
that (1) the F-B imbalance can indeed cause a significant drop in detection
performance, (2) the detection performance is more affected by F-B imbalance
when fewer training data are available, (3) in most cases, decreasing object
size leads to a larger performance drop than decreasing the number of objects,
given the same change in the ratio of object pixels to non-object pixels,
(4) among all selected methods, Libra-RCNN and PISA demonstrate the best
performance in addressing the issue of F-B imbalance, (5) when the training
dataset size is large, the choice of method has little impact, and (6)
soft-sampling methods, including Focal-Loss, GHM, and GFL, perform fairly well
on average but are relatively unstable.
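As context for the soft-sampling family compared above, focal loss reshapes the standard cross-entropy so that well-classified (mostly background) examples contribute little to the training signal. Below is a minimal NumPy sketch of the binary form with the commonly used defaults; this is an illustration of the technique, not the authors' experimental code:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: scales cross-entropy by (1 - p_t)^gamma so that
    abundant, easy background examples are down-weighted.
    p: predicted foreground probabilities in (0, 1); y: 0/1 labels."""
    p_t = np.where(y == 1, p, 1.0 - p)            # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return float(-(alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)).mean())
```

With gamma = 0 this reduces to alpha-weighted cross-entropy; increasing gamma shrinks the loss of confident predictions, which is what makes the method sensitive to F-B imbalance mitigation but, as noted above, sometimes unstable.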
The Intrinsic Manifolds of Radiological Images and their Role in Deep Learning
The manifold hypothesis is a core mechanism behind the success of deep
learning, so understanding the intrinsic manifold structure of image data is
central to studying how neural networks learn from the data. Intrinsic dataset
manifolds and their relationship to learning difficulty have recently begun to
be studied for the common domain of natural images, but little such research
has been attempted for radiological images. We address this here. First, we
compare the intrinsic manifold dimensionality of radiological and natural
images. We also investigate the relationship between intrinsic dimensionality
and generalization ability over a wide range of datasets. Our analysis shows
that natural image datasets generally have a higher intrinsic dimensionality
than radiological image datasets. However, the relationship between
generalization ability and intrinsic dimensionality is much stronger for
medical images, which could be explained as radiological images having
intrinsic features that are more difficult to learn. These results give a more
principled underpinning for the intuition that radiological images can be more
challenging to apply deep learning to than natural image datasets common to
machine learning research. We believe that, rather than directly applying
models developed for natural images to the radiological imaging domain, more
care should be taken to develop architectures and algorithms tailored to the
specific characteristics of this domain. The research shown in our paper,
demonstrating these characteristics and the differences from natural images, is
an important first step in this direction.
Comment: preprint version, accepted for MICCAI 2022 (25th International
Conference on Medical Image Computing and Computer Assisted Intervention). 8
pages (+ author names + references + supplementary), 4 figures. Code
available at https://github.com/mazurowski-lab/radiologyintrinsicmanifold
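To make the notion of intrinsic dimensionality concrete, here is a minimal sketch of the Levina-Bickel maximum-likelihood estimator, a common choice for this kind of manifold analysis; the estimator and settings actually used in the paper may differ:

```python
import numpy as np

def mle_intrinsic_dim(X, k=10):
    """Levina-Bickel MLE of intrinsic dimensionality, estimated from the
    ratios of k-nearest-neighbor distances and pooled over all points."""
    # full pairwise Euclidean distance matrix (fine for small n)
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    d.sort(axis=1)                 # row-wise; column 0 is the point itself
    Tk = d[:, k][:, None]          # distance to the k-th neighbor
    Tj = d[:, 1:k]                 # distances to neighbors 1 .. k-1
    inv_m = np.log(Tk / Tj).mean(axis=1)   # 1 / m_hat(x) per point
    return float(1.0 / inv_m.mean())       # pooled estimate

# sanity check: a 2-D linear manifold embedded in R^10
rng = np.random.default_rng(0)
flat = rng.normal(size=(400, 2)) @ rng.normal(size=(2, 10))
est = mle_intrinsic_dim(flat, k=12)  # close to 2, far below the ambient 10
```

The key point the abstract relies on is exactly this: the estimate tracks the dimension of the data manifold, not the ambient pixel dimension.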
Domain Generalization for Medical Image Analysis: A Survey
Medical Image Analysis (MedIA) has become an essential tool in medicine and
healthcare, aiding in disease diagnosis, prognosis, and treatment planning, and
recent successes in deep learning (DL) have made significant contributions to
its advances. However, DL models for MedIA remain challenging to deploy in
real-world situations, failing to generalize under the distributional gap
between training and testing samples, a problem known as distribution shift.
Researchers have dedicated their efforts to developing various DL methods that
adapt and perform robustly on unknown, out-of-distribution data. This paper
comprehensively reviews domain generalization studies
specifically tailored for MedIA. We provide a holistic view of how domain
generalization techniques interact within the broader MedIA system, going
beyond methodologies to consider the operational implications on the entire
MedIA workflow. Specifically, we categorize domain generalization methods into
data-level, feature-level, model-level, and analysis-level methods. We show how
those methods can be used in various stages of the MedIA workflow with DL
equipped from data acquisition to model prediction and analysis. Furthermore,
we include benchmark datasets and applications used to evaluate these
approaches and analyze the strengths and weaknesses of various methods,
unveiling future research opportunities.
Segment Anything Model for Medical Image Analysis: an Experimental Study
Training segmentation models for medical images continues to be challenging
due to the limited availability and acquisition expense of data annotations.
Segment Anything Model (SAM) is a foundation model trained on over 1 billion
annotations, predominantly for natural images, that is intended to be able to
segment the user-defined object of interest in an interactive manner. Despite
its impressive performance on natural images, it is unclear how the model is
affected when shifting to medical image domains. Here, we perform an extensive
evaluation of SAM's ability to segment medical images on a collection of 11
medical imaging datasets from various modalities and anatomies. In our
experiments, we generated point prompts using a standard method that simulates
interactive segmentation. Experimental results show that SAM's performance
based on single prompts varies widely depending on the task and the dataset,
e.g., from 0.1135 IoU for a spine MRI dataset to 0.8650 IoU for a hip X-ray
dataset. Performance appears to be high for tasks involving
well-circumscribed objects with unambiguous prompts and poorer in many other
scenarios such as segmentation of tumors. When multiple prompts are provided,
performance improves only slightly overall, but more so for datasets where the
object is not contiguous. An additional comparison to RITM showed that SAM
performs much better with a single prompt, while the two methods perform
similarly with a larger number of prompts. We conclude that SAM shows impressive
performance for some datasets given the zero-shot learning setup but poor to
moderate performance for multiple other datasets. While SAM as a model and as a
learning paradigm might be impactful in the medical imaging domain, extensive
research is needed to identify the proper ways of adapting it to this domain.
Comment: Link to our code:
https://github.com/mazurowski-lab/segment-anything-medica
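For reference, the IoU figures quoted above score a predicted binary mask against the ground-truth annotation. A minimal sketch of that metric (an illustration, not the paper's evaluation code):

```python
import numpy as np

def mask_iou(pred, gt):
    """Intersection-over-Union between two binary segmentation masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0               # both masks empty: count as full agreement
    return float(np.logical_and(pred, gt).sum() / union)

pred = np.zeros((4, 4)); pred[0:2, 0:2] = 1   # predicted 2x2 square
gt = np.zeros((4, 4)); gt[1:3, 1:3] = 1       # ground truth shifted by one pixel
score = mask_iou(pred, gt)  # 1 overlapping pixel / 7 in the union -> ~0.143
```

IoU of 1.0 means perfect overlap; the 0.1135 spine-MRI result above corresponds to prompts that mostly miss the target structure.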