Self-supervised Representation Learning for Ultrasound Video
Recent advances in deep learning have achieved promising performance for
medical image analysis, while in most cases ground-truth annotations from human
experts are necessary to train the deep model. In practice, such annotations
are expensive to collect and can be scarce for medical imaging applications.
Therefore, there is significant interest in learning representations from
unlabelled raw data. In this paper, we propose a self-supervised learning
approach to learn meaningful and transferable representations from medical
imaging video without any type of human annotation. We assume that in order to
learn such a representation, the model should identify anatomical structures
from the unlabelled data. Therefore we force the model to address anatomy-aware
tasks with free supervision from the data itself. Specifically, the model is
designed to correct the order of a reshuffled video clip and at the same time
predict the geometric transformation applied to the video clip. Experiments on
fetal ultrasound video show that the proposed approach can effectively learn
meaningful and strong representations, which transfer well to downstream tasks
like standard plane detection and saliency prediction.
Comment: ISBI 2020
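To make the pretext supervision concrete, here is a minimal sketch of how free labels can be generated by reshuffling a clip and applying a random rotation, with a shared encoder predicting both. The backbone, frame count, and permutation vocabulary are illustrative assumptions, not the paper's exact design.

```python
import itertools
import random

import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sketch: frame count, permutation set, and backbone are
# assumptions, not the architecture from the paper.
N_FRAMES = 4
PERMS = list(itertools.permutations(range(N_FRAMES)))  # 24 candidate orders
ROTS = [0, 90, 180, 270]                               # in-plane rotations

class PretextNet(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(   # stand-in for a video backbone
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(16, feat_dim))
        self.order_head = nn.Linear(feat_dim, len(PERMS))  # which order?
        self.rot_head = nn.Linear(feat_dim, len(ROTS))     # which rotation?

    def forward(self, clip):            # clip: (B, 1, T, H, W)
        z = self.encoder(clip)
        return self.order_head(z), self.rot_head(z)

def make_pretext_sample(clip):
    """clip: (T, H, W) tensor. Reshuffle frames and rotate; labels are free."""
    p = random.randrange(len(PERMS))
    r = random.randrange(len(ROTS))
    shuffled = clip[list(PERMS[p])]                      # permute time axis
    rotated = torch.rot90(shuffled, k=r, dims=(-2, -1))  # geometric transform
    return rotated, p, r

# The model is trained to undo both corruptions at once.
x, p, r = make_pretext_sample(torch.randn(N_FRAMES, 64, 64))
logits_o, logits_r = PretextNet()(x[None, None])         # add batch/channel dims
loss = (F.cross_entropy(logits_o, torch.tensor([p])) +
        F.cross_entropy(logits_r, torch.tensor([r])))
```

Correctly solving either task requires recognising the underlying anatomy, which is the intuition behind the anatomy-aware supervision.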
Self-supervised Contrastive Video-Speech Representation Learning for Ultrasound
In medical imaging, manual annotations can be expensive to acquire and
sometimes infeasible to access, making conventional deep learning-based models
difficult to scale. As a result, it would be beneficial if useful
representations could be derived from raw data without the need for manual
annotations. In this paper, we propose to address the problem of
self-supervised representation learning with multi-modal ultrasound
video-speech raw data. For this case, we assume that there is a high
correlation between the ultrasound video and the corresponding narrative speech
audio of the sonographer. In order to learn meaningful representations, the
model needs to identify such correlation and at the same time understand the
underlying anatomical features. We designed a framework to model the
correspondence between video and audio without any kind of human annotations.
Within this framework, we introduce cross-modal contrastive learning and an
affinity-aware self-paced learning scheme to enhance correlation modelling.
Experimental evaluations on multi-modal fetal ultrasound video and audio show
that the proposed approach is able to learn strong representations and
transfers well to downstream tasks of standard plane detection and eye-gaze
prediction.
Comment: MICCAI 2020 (early acceptance)
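A minimal sketch of the cross-modal contrastive component, assuming a symmetric InfoNCE objective over a batch of paired video/audio embeddings; the affinity-aware self-paced scheme is omitted, and the temperature value is an assumption.

```python
import torch
import torch.nn.functional as F

def cross_modal_nce(v, a, tau=0.07):
    """v, a: (B, D) video and audio embeddings; matched pairs share a row.
    Each video is pulled toward its own narration and pushed away from the
    other narrations in the batch, and vice versa. tau is an assumption."""
    v = F.normalize(v, dim=1)
    a = F.normalize(a, dim=1)
    logits = v @ a.t() / tau                         # (B, B) similarities
    targets = torch.arange(v.size(0), device=v.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```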
Show from Tell: Audio-Visual Modelling in Clinical Settings
Auditory and visual signals usually occur together and correlate with each other, not only in natural environments but also in clinical settings. However, audio-visual modelling in the latter case can be more challenging, due to the different sources of the audio/video signals and the noise (both signal-level and semantic-level) in the auditory signal, which is usually speech. In this paper, we
consider audio-visual modelling in a clinical setting, providing a solution to
learn medical representations that benefit various clinical tasks, without
human expert annotation. A simple yet effective multi-modal self-supervised
learning framework is proposed for this purpose. The proposed approach is able
to localise anatomical regions of interest during ultrasound imaging, with only
speech audio as a reference. Experimental evaluations on a large-scale clinical
multi-modal ultrasound video dataset show that the proposed self-supervised
method learns good transferable anatomical representations that boost the
performance of automated downstream clinical tasks, even outperforming
fully-supervised solutions.
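One plausible way such speech-referenced localisation can be realised (an assumption about the mechanism, not the paper's stated design) is to correlate an utterance embedding against the visual encoder's spatial feature map:

```python
import torch
import torch.nn.functional as F

def audio_guided_heatmap(video_feats, audio_emb):
    """video_feats: (B, D, H, W) spatial features from the visual encoder;
    audio_emb: (B, D) embedding of the narrating speech.
    Returns a (B, H, W) cosine-similarity map; high values mark regions
    that match the audio. This correlation scheme is an assumption."""
    v = F.normalize(video_feats, dim=1)              # normalise channels
    a = F.normalize(audio_emb, dim=1)
    return torch.einsum('bdhw,bd->bhw', v, a)        # per-location similarity
```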
Detecting Heart Disease from Multi-View Ultrasound Images via Supervised Attention Multiple Instance Learning
Aortic stenosis (AS) is a degenerative valve condition that causes
substantial morbidity and mortality. This condition is under-diagnosed and
under-treated. In clinical practice, AS is diagnosed with expert review of
transthoracic echocardiography, which produces dozens of ultrasound images of
the heart. Only some of these views show the aortic valve. To automate
screening for AS, deep networks must learn to mimic a human expert's ability to
identify views of the aortic valve and then aggregate across these relevant images to produce a study-level diagnosis. We find that previous approaches to AS detection yield insufficient accuracy because they rely on inflexible averaging across images. We further find that off-the-shelf attention-based multiple instance
learning (MIL) performs poorly. We contribute a new end-to-end MIL approach
with two key methodological innovations. First, a supervised attention
technique guides the learned attention mechanism to favor relevant views.
Second, a novel self-supervised pretraining strategy applies contrastive
learning on the representation of the whole study instead of individual images
as commonly done in prior literature. Experiments on an open-access dataset and
an external validation set show that our approach yields higher accuracy while
reducing model size.
Comment: multiple-instance learning; self-supervised learning; semi-supervised learning; medical imaging
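A hedged sketch of the first innovation, attention-based MIL pooling with an auxiliary loss that steers attention toward views labelled as relevant (layer sizes and the exact form of the supervision loss are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionMIL(nn.Module):
    """Pools per-image embeddings of one echo study into a bag embedding
    via learned attention, then classifies at the study level."""
    def __init__(self, d=256, n_classes=2):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(d, 64), nn.Tanh(), nn.Linear(64, 1))
        self.clf = nn.Linear(d, n_classes)

    def forward(self, feats):                        # feats: (N_images, d)
        w = torch.softmax(self.attn(feats).squeeze(-1), dim=0)  # (N,)
        bag = (w.unsqueeze(-1) * feats).sum(dim=0)   # attention-weighted pool
        return self.clf(bag), w

def attention_supervision(w, relevant_mask):
    """Pushes attention mass onto views known to show the aortic valve.
    Using a KL term against the normalised relevance mask is an assumption."""
    m = relevant_mask.float()
    target = m / m.sum().clamp(min=1.0)
    return F.kl_div(torch.log(w + 1e-8), target, reduction='sum')
```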
A Survey of the Impact of Self-Supervised Pretraining for Diagnostic Tasks with Radiological Images
Self-supervised pretraining has been observed to be effective at improving
feature representations for transfer learning, leveraging large amounts of
unlabelled data. This review summarizes recent research into its usage in
X-ray, computed tomography, magnetic resonance, and ultrasound imaging,
concentrating on studies that compare self-supervised pretraining to fully
supervised learning for diagnostic tasks such as classification and
segmentation. The most pertinent finding is that self-supervised pretraining
generally improves downstream task performance compared to full supervision,
most prominently when unlabelled examples greatly outnumber labelled examples.
Based on the aggregate evidence, recommendations are provided for practitioners
considering using self-supervised learning. Motivated by limitations identified
in current research, directions and practices for future study are suggested,
such as integrating clinical knowledge with theoretically justified
self-supervised learning methods, evaluating on public datasets, growing the
modest body of evidence for ultrasound, and characterizing the impact of
self-supervised pretraining on generalization.
Comment: 32 pages, 6 figures; a literature survey submitted to BMC Medical Imaging
FUSC: Fetal Ultrasound Semantic Clustering of Second Trimester Scans Using Deep Self-supervised Learning
Ultrasound is the primary imaging modality in clinical practice during
pregnancy. More than 140 million babies are born yearly, resulting in a correspondingly large number of scans.
The availability of a large volume of fetal ultrasound scans presents the
opportunity to train robust machine learning models. However, the abundance of
scans also has its challenges, as manual labeling of each image is needed for
supervised methods. Labeling is typically labor-intensive and requires
expertise to annotate the images accurately. This study presents an
unsupervised approach for automatically clustering ultrasound images into a
large range of fetal views, reducing or eliminating the need for manual
labeling. Our Fetal Ultrasound Semantic Clustering (FUSC) method is developed
using a large dataset of 88,063 images and further evaluated on an additional
unseen dataset of 8,187 images, achieving over 92% clustering purity. The results of our investigation hold the potential to significantly impact the field of
fetal ultrasound imaging and pave the way for more advanced automated labeling
solutions. Finally, we make the code and the experimental setup publicly
available to help advance the field.
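For reference, the reported purity can be computed in the standard way: assign each cluster its majority ground-truth view label and count the fraction of images that match it (a minimal sketch, assuming integer-coded labels):

```python
import numpy as np

def clustering_purity(clusters, labels):
    """clusters, labels: integer arrays of the same length.
    Purity = fraction of images whose label equals the majority
    label of their assigned cluster."""
    clusters, labels = np.asarray(clusters), np.asarray(labels)
    correct = 0
    for c in np.unique(clusters):
        members = labels[clusters == c]
        correct += np.bincount(members).max()   # size of the majority class
    return correct / len(labels)
```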
Self-supervised contrastive learning of echocardiogram videos enables label-efficient cardiac disease diagnosis
Advances in self-supervised learning (SSL) have shown that self-supervised
pretraining on medical imaging data can provide a strong initialization for
downstream supervised classification and segmentation. Given the difficulty of
obtaining expert labels for medical image recognition tasks, such an
"in-domain" SSL initialization is often desirable due to its improved label
efficiency over standard transfer learning. However, most efforts toward SSL of
medical imaging data are not adapted to video-based medical imaging modalities.
With this progress in mind, we developed a self-supervised contrastive learning
approach, EchoCLR, catered to echocardiogram videos with the goal of learning
strong representations for efficient fine-tuning on downstream cardiac disease
diagnosis. EchoCLR leverages (i) distinct videos of the same patient as
positive pairs for contrastive learning and (ii) a frame re-ordering pretext
task to enforce temporal coherence. When fine-tuned on small portions of
labeled data (as few as 51 exams), EchoCLR pretraining significantly improved
classification performance for left ventricular hypertrophy (LVH) and aortic
stenosis (AS) over other transfer learning and SSL approaches across internal
and external test sets. For example, when fine-tuning on 10% of available
training data (519 studies), an EchoCLR-pretrained model achieved 0.72 AUROC
(95% CI: [0.69, 0.75]) on LVH classification, compared to 0.61 AUROC (95% CI:
[0.57, 0.64]) with a standard transfer learning approach. Similarly, using 1%
of available training data (53 studies), EchoCLR pretraining achieved 0.82
AUROC (95% CI: [0.79, 0.84]) on severe AS classification, compared to 0.61
AUROC (95% CI: [0.58, 0.65]) with transfer learning. EchoCLR is unique in its
ability to learn representations of medical videos and demonstrates that SSL
can enable label-efficient disease classification from small, labeled datasets
- …
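A rough sketch of the two stated ingredients, same-patient positive pairs and a frame re-ordering pretext; the pair-sampling details and the binary re-ordering target are assumptions:

```python
import random
from collections import defaultdict

import torch

def sample_same_patient_pairs(videos):
    """videos: list of (patient_id, clip) tuples. Distinct clips from the
    same patient become positive pairs for the contrastive objective."""
    by_patient = defaultdict(list)
    for pid, clip in videos:
        by_patient[pid].append(clip)
    return [tuple(random.sample(clips, 2))
            for clips in by_patient.values() if len(clips) >= 2]

def reorder_pretext(clip):
    """clip: (T, H, W) tensor. Half the time, shuffle the frames; the binary
    label (shuffled or not) supervises temporal coherence. Casting the
    re-ordering task as binary is an assumption."""
    if random.random() < 0.5:
        return clip[torch.randperm(clip.size(0))], 1
    return clip, 0
```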