9,041 research outputs found
Label-Efficient Deep Learning in Medical Image Analysis: Challenges and Future Directions
Deep learning has seen rapid growth in recent years and achieved
state-of-the-art performance in a wide range of applications. However, training
models typically requires expensive and time-consuming collection of large
quantities of labeled data. This is particularly true within the scope of
medical imaging analysis (MIA), where data are limited and labels are expensive
to be acquired. Thus, label-efficient deep learning methods are developed to
make comprehensive use of the labeled data as well as the abundance of
unlabeled and weak-labeled data. In this survey, we extensively investigated
over 300 recent papers to provide a comprehensive overview of recent progress
on label-efficient learning strategies in MIA. We first present the background
of label-efficient learning and categorize the approaches into different
schemes. Next, we examine the current state-of-the-art methods in detail
through each scheme. Specifically, we provide an in-depth investigation,
covering not only canonical semi-supervised, self-supervised, and
multi-instance learning schemes, but also recently emerged active and
annotation-efficient learning strategies. Moreover, as a comprehensive
contribution to the field, this survey not only elucidates the commonalities
and unique features of the surveyed methods but also presents a detailed
analysis of the current challenges in the field and suggests potential avenues
for future research.Comment: Update Few-shot Method
SS-CPGAN: Self-Supervised Cut-and-Pasting Generative Adversarial Network for Object Segmentation
This paper proposes a novel self-supervised based Cut-and-Paste GAN to
perform foreground object segmentation and generate realistic composite images
without manual annotations. We accomplish this goal by a simple yet effective
self-supervised approach coupled with the U-Net based discriminator. The
proposed method extends the ability of the standard discriminators to learn not
only the global data representations via classification (real/fake) but also
learn semantic and structural information through pseudo labels created using
the self-supervised task. The proposed method empowers the generator to create
meaningful masks by forcing it to learn informative per-pixel as well as global
image feedback from the discriminator. Our experiments demonstrate that our
proposed method significantly outperforms the state-of-the-art methods on the
standard benchmark datasets
Towards generalizable machine learning models for computer-aided diagnosis in medicine
Hidden stratification represents a phenomenon in which a training dataset contains unlabeled (hidden) subsets of cases that may affect machine learning model performance. Machine learning models that ignore the hidden stratification phenomenon--despite promising overall performance measured as accuracy and sensitivity--often fail at predicting the low prevalence cases, but those cases remain important. In the medical domain, patients with diseases are often less common than healthy patients, and a misdiagnosis of a patient with a disease can have significant clinical impacts. Therefore, to build a robust and trustworthy CAD system and a reliable treatment effect prediction model, we cannot only pursue machine learning models with high overall accuracy, but we also need to discover any hidden stratification in the data and evaluate the proposing machine learning models with respect to both overall performance and the performance on certain subsets (groups) of the data, such as the ‘worst group’.
In this study, I investigated three approaches for data stratification: a novel algorithmic deep learning (DL) approach that learns similarities among cases and two schema completion approaches that utilize domain expert knowledge. I further proposed an innovative way to integrate the discovered latent groups into the loss functions of DL models to allow for better model generalizability under the domain shift scenario caused by the data heterogeneity.
My results on lung nodule Computed Tomography (CT) images and breast cancer histopathology images demonstrate that learning homogeneous groups within heterogeneous data significantly improves the performance of the computer-aided diagnosis (CAD) system, particularly for low-prevalence or worst-performing cases. This study emphasizes the importance of discovering and learning the latent stratification within the data, as it is a critical step towards building ML models that are generalizable and reliable. Ultimately, this discovery can have a profound impact on clinical decision-making, particularly for low-prevalence cases
Data efficient deep learning for medical image analysis: A survey
The rapid evolution of deep learning has significantly advanced the field of
medical image analysis. However, despite these achievements, the further
enhancement of deep learning models for medical image analysis faces a
significant challenge due to the scarcity of large, well-annotated datasets. To
address this issue, recent years have witnessed a growing emphasis on the
development of data-efficient deep learning methods. This paper conducts a
thorough review of data-efficient deep learning methods for medical image
analysis. To this end, we categorize these methods based on the level of
supervision they rely on, encompassing categories such as no supervision,
inexact supervision, incomplete supervision, inaccurate supervision, and only
limited supervision. We further divide these categories into finer
subcategories. For example, we categorize inexact supervision into multiple
instance learning and learning with weak annotations. Similarly, we categorize
incomplete supervision into semi-supervised learning, active learning, and
domain-adaptive learning and so on. Furthermore, we systematically summarize
commonly used datasets for data efficient deep learning in medical image
analysis and investigate future research directions to conclude this survey.Comment: Under Revie
- …