157 research outputs found
Learning Disentangled Representations in the Imaging Domain
Disentangled representation learning has been proposed as an approach to
learning general representations even in the absence of, or with limited,
supervision. A good general representation can be fine-tuned for new target
tasks using modest amounts of data, or used directly in unseen domains,
achieving remarkable performance in the corresponding task. This alleviation of
the data and annotation requirements offers tantalising prospects for
applications in computer vision and healthcare. In this tutorial paper, we
motivate the need for disentangled representations, present key theory, and
detail practical building blocks and criteria for learning such
representations. We discuss applications in medical imaging and computer vision,
emphasising choices made in exemplar key works. We conclude by presenting
remaining challenges and opportunities.
Comment: Submitted. This paper follows a tutorial style but also surveys a considerable number of works (more than 200 citations).
Interpretable Diabetic Retinopathy Diagnosis based on Biomarker Activation Map
Deep learning classifiers provide the most accurate means of automatically
diagnosing diabetic retinopathy (DR) based on optical coherence tomography
(OCT) and its angiography (OCTA). The power of these models is attributable in
part to the inclusion of hidden layers that provide the complexity required to
achieve a desired task. However, hidden layers also render algorithm outputs
difficult to interpret. Here we introduce a novel biomarker activation map
(BAM) framework based on generative adversarial learning that allows clinicians
to verify and understand a classifier's decision-making. A data set including 456
macular scans was graded as non-referable or referable DR based on current
clinical standards. A DR classifier that was used to evaluate our BAM was first
trained based on this data set. The BAM generation framework was designed by
combining two U-shaped generators to provide meaningful interpretability to this
classifier. The main generator was trained to take referable scans as input and
produce an output that would be classified by the classifier as non-referable.
The BAM is then constructed as the difference image between the output and
input of the main generator. To ensure that the BAM only highlights
classifier-utilized biomarkers, an assistant generator was trained to do the
opposite, producing scans that would be classified as referable by the
classifier from non-referable scans. The generated BAMs highlighted known
pathologic features including nonperfusion area and retinal fluid. A fully
interpretable classifier based on these highlights could help clinicians better
utilize and verify automated DR diagnosis.
Comment: 12 pages, 8 figures
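The core of the BAM construction described above, the difference image between a referable input scan and the main generator's "made non-referable" output, can be sketched as follows. This is an illustrative sketch, not the authors' implementation; the generator here is a hypothetical stand-in for the trained U-shaped network.

```python
import numpy as np

def biomarker_activation_map(scan, main_generator):
    """Construct a BAM as the absolute difference between a referable
    input scan and the main generator's output (which the classifier
    would call non-referable). High values mark classifier-utilized
    biomarkers. `main_generator` is a hypothetical stand-in."""
    output = main_generator(scan)
    return np.abs(scan - output)

# Toy example: pretend the generator removes a bright lesion patch.
scan = np.zeros((8, 8))
scan[2:4, 2:4] = 1.0                        # hypothetical lesion
fake_generator = lambda x: np.zeros_like(x)  # "heals" everything
bam = biomarker_activation_map(scan, fake_generator)
```

In this toy case the BAM is nonzero exactly where the lesion was, mirroring how the real framework highlights pathologic features such as nonperfusion area and retinal fluid.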
Deep generative models for medical image synthesis and strategies to utilise them
Medical imaging has revolutionised the diagnosis and treatments of diseases since the first
medical image was taken using X-rays in 1895. As medical imaging became an essential tool
in a modern healthcare system, more medical imaging techniques have been invented, such
as Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), Computed
Tomography (CT), Ultrasound, etc. With the advance of medical imaging techniques, the
demand for processing and analysing these complex medical images is increasing rapidly.
Efforts have been devoted to developing approaches that can automatically analyse medical images. With the recent success of deep learning (DL) in computer vision, researchers have
applied and proposed many DL-based methods in the field of medical image analysis. However, one problem with data-driven DL-based methods is the lack of data. Unlike natural
images, medical images are more expensive to acquire and label. One way to alleviate the
lack of medical data is medical image synthesis.
In this thesis, I start with pseudo healthy synthesis, which aims to create a ‘healthy’-looking
medical image from a pathological one. The synthesised pseudo healthy images can be used
for the detection of pathology, segmentation, etc. Several challenges exist with this task. The
first challenge is the lack of ground-truth data, as a subject cannot be healthy and diseased at
the same time. The second challenge is how to evaluate the generated images. In this thesis,
I propose a deep learning method to learn to generate pseudo healthy images with adversarial
and cycle consistency losses to overcome the lack of ground-truth data. I also propose several
metrics to evaluate the quality of synthetic ‘healthy’ images. Pseudo healthy synthesis can be
viewed as transforming images between discrete domains, e.g. from pathological domain to
healthy domain. However, there are some changes in medical data that are continuous, e.g.
brain ageing progression.
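The cycle consistency idea used above to cope with the missing ground truth can be sketched as follows, with arrays standing in for images and trivial functions standing in for the two generators; the names and toy generators are illustrative, not the thesis code.

```python
import numpy as np

def cycle_consistency_loss(x_patho, g_heal, g_disease):
    """L1 cycle loss: pathological -> pseudo-healthy -> reconstructed
    pathological should return to the input, so no paired healthy/
    diseased ground truth is needed. `g_heal` and `g_disease` are
    hypothetical generators standing in for the trained networks."""
    pseudo_healthy = g_heal(x_patho)
    reconstructed = g_disease(pseudo_healthy)
    return np.mean(np.abs(x_patho - reconstructed))

# Toy generators: "heal" subtracts a lesion offset, "disease" adds it back.
lesion = 0.5
g_heal = lambda x: x - lesion
g_disease = lambda x: x + lesion
x = np.random.rand(4, 4)
loss = cycle_consistency_loss(x, g_heal, g_disease)  # ~0 for a perfect cycle
```

In practice this reconstruction term is combined with an adversarial loss that pushes the pseudo-healthy output towards the healthy domain.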
The brain changes as age increases. With the ageing global population, research on brain ageing
has attracted increasing attention. In this thesis, I propose a deep learning method that can
simulate such brain ageing progression. However, longitudinal brain data are not easy to
acquire, and where they exist they typically cover only a few years. Thus, the proposed method focuses on
learning subject-specific brain ageing progression without training on longitudinal data. As
there are other factors, such as neurodegenerative diseases, that can affect brain ageing, the
proposed model also considers health status, i.e. the existence of Alzheimer’s Disease (AD).
Furthermore, to evaluate the quality of synthetic aged images, I define several metrics and
conduct a series of experiments.
Suppose we have a pre-trained deep generative model and a downstream task model, say
a classifier. One question is how to make the best of the generative model to improve the
performance of the classifier. In this thesis, I propose a simple procedure that can discover
the ‘weakness’ of the classifier and guide the generator to synthesise counterfactuals (synthetic
data) that are hard for the classifier. The proposed procedure constructs an adversarial
game between generative factors of the generator and the classifier. We demonstrate the effectiveness
of this proposed procedure through a series of experiments. Furthermore, we
consider the application of generative models in a continual learning context and investigate
their usefulness in alleviating spurious correlations.
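The adversarial game between the generator's factors and the classifier can be sketched as a search over a generative factor for the synthetic sample the classifier finds hardest. The grid search below is an illustrative stand-in for the gradient-based procedure; the generator, classifier, and factor are all hypothetical.

```python
import numpy as np

def find_hard_factor(generator, classifier, true_label, factor_grid):
    """Scan a generative factor and return the value whose synthetic
    sample (counterfactual) the classifier finds hardest, i.e. has the
    highest cross-entropy loss. Illustrative grid search standing in
    for the adversarial game between factors and classifier."""
    def loss(f):
        x = generator(f)
        p = classifier(x)  # predicted probability of class 1
        return -np.log(max(p if true_label == 1 else 1 - p, 1e-8))
    losses = [loss(f) for f in factor_grid]
    return factor_grid[int(np.argmax(losses))]

# Toy setup: the factor shifts the sample mean; the classifier
# thresholds the mean with a sigmoid.
generator = lambda f: np.full(10, f)
classifier = lambda x: 1.0 / (1.0 + np.exp(-4 * (x.mean() - 0.5)))
hard = find_hard_factor(generator, classifier, true_label=1,
                        factor_grid=np.linspace(0.0, 1.0, 11))
```

Samples synthesised at the discovered "weak" factor values can then be fed back as hard training data for the classifier.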
This thesis creates new avenues for further research in the area of medical image synthesis
and how to utilise the medical generative models, which we believe could be important for
future studies in medical image analysis with deep learning.
Towards generalizable machine learning models for computer-aided diagnosis in medicine
Hidden stratification represents a phenomenon in which a training dataset contains unlabeled (hidden) subsets of cases that may affect machine learning model performance. Machine learning models that ignore the hidden stratification phenomenon, despite promising overall performance measured as accuracy and sensitivity, often fail at predicting the low-prevalence cases, but those cases remain important. In the medical domain, patients with diseases are often less common than healthy patients, and a misdiagnosis of a patient with a disease can have significant clinical impacts. Therefore, to build a robust and trustworthy computer-aided diagnosis (CAD) system and a reliable treatment effect prediction model, we cannot pursue only machine learning models with high overall accuracy; we also need to discover any hidden stratification in the data and evaluate the proposed machine learning models with respect to both overall performance and the performance on certain subsets (groups) of the data, such as the ‘worst group’.
In this study, I investigated three approaches for data stratification: a novel algorithmic deep learning (DL) approach that learns similarities among cases and two schema completion approaches that utilize domain expert knowledge. I further proposed an innovative way to integrate the discovered latent groups into the loss functions of DL models to allow for better model generalizability under the domain shift scenario caused by the data heterogeneity.
My results on lung nodule Computed Tomography (CT) images and breast cancer histopathology images demonstrate that learning homogeneous groups within heterogeneous data significantly improves the performance of the CAD system, particularly for low-prevalence or worst-performing cases. This study emphasizes the importance of discovering and learning the latent stratification within the data, as it is a critical step towards building ML models that are generalizable and reliable. Ultimately, this discovery can have a profound impact on clinical decision-making, particularly for low-prevalence cases.
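The idea of integrating discovered latent groups into the loss can be sketched as a worst-group objective (in the spirit of group distributionally robust optimisation); this is a hypothetical illustration of the principle, not the study's exact loss function.

```python
import numpy as np

def worst_group_loss(sample_losses, group_ids):
    """Return the mean loss of the worst-performing latent group, so
    that training cannot hide poor performance on a low-prevalence
    stratum inside a good overall average. Illustrative objective."""
    groups = np.unique(group_ids)
    group_means = [sample_losses[group_ids == g].mean() for g in groups]
    return max(group_means)

losses = np.array([0.1, 0.2, 0.9, 1.1])  # per-sample losses
groups = np.array([0, 0, 1, 1])          # discovered latent groups
overall = losses.mean()                  # looks acceptable on average
worst = worst_group_loss(losses, groups) # exposes the hidden stratum
```

Here the overall mean (0.575) masks a group whose mean loss is 1.0, which is exactly the failure mode hidden stratification causes.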
Deep Learning in Single-Cell Analysis
Single-cell technologies are revolutionizing the entire field of biology. The
large volumes of data generated by single-cell technologies are
high-dimensional, sparse, heterogeneous, and have complicated dependency
structures, making analyses using conventional machine learning approaches
challenging and impractical. In tackling these challenges, deep learning often
demonstrates superior performance compared to traditional machine learning
methods. In this work, we give a comprehensive survey on deep learning in
single-cell analysis. We first introduce background on single-cell technologies
and their development, as well as fundamental concepts of deep learning
including the most popular deep architectures. We present an overview of the
single-cell analytic pipeline pursued in research applications while noting
divergences due to data sources or specific applications. We then review seven
popular tasks spanning different stages of the single-cell analysis
pipeline, including multimodal integration, imputation, clustering, spatial
domain identification, cell-type deconvolution, cell segmentation, and
cell-type annotation. Under each task, we describe the most recent developments
in classical and deep learning methods and discuss their advantages and
disadvantages. Deep learning tools and benchmark datasets are also summarized
for each task. Finally, we discuss the future directions and the most recent
challenges. This survey will serve as a reference for biologists and computer
scientists, encouraging collaborations.
Comment: 77 pages, 11 figures, 15 tables; keywords: deep learning, single-cell analysis
Multimodal and disentangled representation learning for medical image analysis
Automated medical image analysis is a growing research field with various applications in
modern healthcare. Furthermore, a multitude of imaging techniques (or modalities) have been
developed, such as Magnetic Resonance (MR) and Computed Tomography (CT), to accentuate
different organ characteristics. Research on image analysis is predominantly driven by deep
learning methods due to their demonstrated performance. In this thesis, we argue that their success and generalisation rely on learning good latent representations. We propose methods for
learning spatial representations that are suitable for medical image data and can combine information coming from different modalities. Specifically, we aim to improve cardiac MR segmentation, a challenging task due to varied images and limited expert annotations, by considering
complementary information present in (potentially unaligned) images of other modalities.
In order to evaluate the benefit of multimodal learning, we initially consider a synthesis task
on spatially aligned multimodal brain MR images. We propose a deep network of multiple
encoders and decoders, which we demonstrate outperforms existing approaches. The encoders
(one per input modality) map the multimodal images into modality invariant spatial feature
maps. Common and unique information is combined into a fused representation, that is robust
to missing modalities, and can be decoded into synthetic images of the target modalities. Different experimental settings demonstrate the benefit of multimodal over unimodal synthesis,
although input and output image pairs are required for training. The need for paired images can
be overcome with the cycle consistency principle, which we use in conjunction with adversarial
training to transform images from one modality (e.g. MR) to images in another (e.g. CT). This
is useful especially in cardiac datasets, where different spatial and temporal resolutions make
image pairing difficult, if not impossible.
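One simple way to combine the per-modality modality-invariant feature maps into a fused representation that tolerates missing modalities is an element-wise maximum over the available encoders. The sketch below illustrates that fusion choice under stated assumptions; it is not the thesis architecture, and the fusion operator is an assumption.

```python
import numpy as np

def fuse_modalities(feature_maps):
    """Fuse modality-invariant spatial feature maps (one per modality,
    None if the modality is missing) with an element-wise maximum, so
    the fused representation degrades gracefully when a modality is
    absent. Illustrative fusion choice, not the thesis implementation."""
    available = [f for f in feature_maps if f is not None]
    return np.maximum.reduce(available)

t1 = np.array([[0.2, 0.8], [0.1, 0.4]])  # e.g. features from a T1 encoder
t2 = np.array([[0.5, 0.3], [0.6, 0.2]])  # e.g. features from a T2 encoder
fused_both = fuse_modalities([t1, t2])
fused_t1_only = fuse_modalities([t1, None])  # missing modality tolerated
```

Because the operator is symmetric and defined for any subset of inputs, the fused representation remains well defined when a modality is dropped, which is the robustness property described above.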
Segmentation can also be considered as a form of image synthesis, if one modality consists of
semantic maps. We consider the task of extracting segmentation masks for cardiac MR images,
and aim to overcome the challenge of limited annotations by taking into account unannotated images, which are commonly ignored. We achieve this by defining suitable latent spaces,
which represent the underlying anatomies (spatial latent variable), as well as the imaging characteristics (non-spatial latent variable). Anatomical information is required for tasks such as
segmentation and regression, whereas imaging information can capture variability in intensity
characteristics for example due to different scanners. We propose two models that disentangle
cardiac images at different levels: the first extracts the myocardium from the surrounding information, whereas the second fully separates the anatomical from the imaging characteristics.
Experimental analysis confirms the utility of disentangled representations in semi-supervised
segmentation, and in regression of cardiac indices, while maintaining robustness to intensity
variations such as the ones induced by different modalities.
Finally, our prior research is aggregated into one framework that encodes multimodal images
into disentangled anatomical and imaging factors. Several challenges of multimodal cardiac
imaging, such as input misalignments and the lack of expert annotations, are successfully handled in the shared anatomy space. Furthermore, we demonstrate that this approach can be used
to combine complementary anatomical information for the purpose of multimodal segmentation. This can be achieved even when no annotations are provided for one of the modalities.
This thesis creates new avenues for further research in the area of multimodal and disentangled learning with spatial representations, which we believe are key to more generalised deep
learning solutions in healthcare.
Representation learning for generalisation in medical image analysis
To help diagnose, treat, manage, prevent and predict diseases, medical image analysis plays an
increasingly crucial role in modern health care. In particular, using machine learning (ML) and
deep learning (DL) techniques to process medical imaging data such as MRI, CT and X-ray
scans has been a hot research topic. Accurate and generalisable medical image segmentation
using ML and DL is one of the most challenging medical image analysis tasks. The challenges
mainly stem from two causes: a) the variation of data statistics across different clinical centres or hospitals, and b) the lack of extensive annotations of medical data.
To tackle the above challenges, one of the best ways is to learn disentangled representations.
Learning disentangled representations aims to separate out, or disentangle, the underlying explanatory generative factors into disjoint subsets. Importantly, disentangled representations can be efficiently learnt from raw training data with limited annotations. Although it is evident
that learning disentangled representations is well suited to these challenges, several
open problems remain in this area. First, no work has systematically studied how much disentanglement is achieved with different learning and design biases, or how different biases affect task performance for medical data. Second, the benefit of leveraging disentanglement to design models that generalise well on new data has not been well studied, especially in the medical domain. Finally, the independence prior for disentanglement is too strong an assumption that does not approximate well the true generative factors. Motivated by these problems, this thesis focuses on understanding the role of disentanglement in medical image analysis, measuring how different biases affect disentanglement and task performance, and finally using disentangled representations to improve generalisation performance and exploring better representations beyond disentanglement.
In the medical domain, content-style disentanglement is one of the most effective frameworks
to learn disentangled representations. It disentangles and encodes image “content” into a spatial
tensor, and image appearance or “style” into a vector that contains information on imaging characteristics. Based on an extensive review of disentanglement, I conclude that it is unclear how different design and learning biases affect the performance of content-style disentanglement methods. Hence, two metrics are proposed to measure the degree of content-style disentanglement by evaluating the informativeness and correlation of representations. By modifying the design and learning biases in three popular content-style disentanglement models, the degree of disentanglement and task performance of different model variants have been evaluated. A key conclusion is that there exists a sweet spot between task performance and the degree of disentanglement; achieving this sweet spot is the key to designing disentanglement models.
Generalising deep models to new data from new centres (termed here domains) remains a challenge. This is largely attributed to shifts in data statistics (domain shifts) between source and unseen domains. Building on the findings of the aforementioned disentanglement metrics study, I design two content-style disentanglement approaches for generalisation. First, I propose two data augmentation methods that improve generalisation. The Resolution Augmentation method generates more diverse data by rescaling images to different resolutions. Subsequently, the Factor-based Augmentation method generates more diverse data by projecting the original samples onto disentangled latent spaces, and combining the learned content and style factors from different domains. To learn more generalisable representations, I integrate gradient-based meta-learning in disentanglement. Gradient-based meta-learning splits the training data into meta-train and meta-test sets to simulate and handle the domain shifts during training, which has shown superior generalisation performance. Considering the limited annotations of data, I propose a novel semi-supervised meta-learning framework with disentanglement. I explicitly model the representations related to domain shifts. Disentangling the representations and combining them to reconstruct the input image allows unlabeled data to be used to better approximate the true domain shifts within a meta-learning setting.
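The meta-train/meta-test split that simulates domain shift during training can be sketched as a single gradient-based (MAML-style) episode on a toy scalar model; the model, learning rate, and domains below are illustrative, not the thesis setup.

```python
import numpy as np

def meta_episode(w, meta_train, meta_test, lr_inner=0.1):
    """One gradient-based meta-learning episode for a scalar linear
    model y = w * x with squared loss: adapt on the meta-train domain
    (inner update), then evaluate the adapted model on the held-out
    meta-test domain, simulating a domain shift. Toy illustration."""
    def grad(w, xs, ys):
        return np.mean(2 * (w * xs - ys) * xs)
    x_tr, y_tr = meta_train
    x_te, y_te = meta_test
    w_adapted = w - lr_inner * grad(w, x_tr, y_tr)       # inner update
    meta_loss = np.mean((w_adapted * x_te - y_te) ** 2)  # meta-test loss
    return w_adapted, meta_loss

# Two 'domains' share the true w* = 2 but differ in input statistics.
x_a = np.array([1.0, 2.0])
x_b = np.array([5.0, 6.0])
w_adapted, meta_loss = meta_episode(0.0, (x_a, 2 * x_a), (x_b, 2 * x_b))
```

In full meta-learning, `meta_loss` would be differentiated with respect to the initial parameters, so the model is explicitly optimised to generalise after adaptation rather than merely to fit the training domain.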
Humans can quickly learn to accurately recognise anatomy of interest from medical images
with limited guidance. Such recognition ability easily generalises to new images from different clinical centres and to new tasks in other contexts. This rapid and generalisable learning
ability is mostly due to the compositional structure of image patterns in the human brain, a property rarely incorporated in models in the medical domain. In this thesis, I explore how compositionality can be applied to learning more interpretable and generalisable representations. Overall, I propose that the ground-truth generative factors that generate medical images satisfy the compositional equivariance property. Hence, a good representation that approximates the ground-truth factors well has to be compositionally equivariant. By modelling the compositional representations with learnable von Mises-Fisher kernels, I explore how different design and learning biases can be used to enforce the representations to be more compositionally equivariant under different learning settings.
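The von Mises-Fisher kernel mechanism can be sketched as computing cosine similarities between unit-normalised feature vectors and learnable kernel means, then turning them into soft assignments over compositional components at each spatial location. This is an illustrative sketch of the mechanism under that assumption, not the thesis implementation; the kernel values and temperature are hypothetical.

```python
import numpy as np

def vmf_activations(features, kernels, temperature=0.1):
    """Per-location von Mises-Fisher style activations: cosine
    similarity between L2-normalised feature vectors and kernel mean
    directions, softmaxed into a soft assignment over compositional
    components. Illustrative sketch of the mechanism."""
    f = features / np.linalg.norm(features, axis=-1, keepdims=True)
    k = kernels / np.linalg.norm(kernels, axis=-1, keepdims=True)
    logits = f @ k.T / temperature              # (locations, components)
    logits -= logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)

# Two hypothetical learnable kernels (e.g. distinct anatomical components).
kernels = np.array([[1.0, 0.0], [0.0, 1.0]])
features = np.array([[0.9, 0.1], [0.1, 0.9]])  # two spatial locations
acts = vmf_activations(features, kernels)
```

Each location is assigned almost entirely to the kernel its feature direction matches, which is how such activations expose a compositional decomposition of the image.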
Overall, this thesis creates new avenues for further research in the area of generalisable representation learning in medical image analysis, which we believe are key to more generalised machine learning and deep learning solutions in healthcare. In particular, the proposed metrics can be used to guide future work on designing better content-style frameworks. The disentanglement-based meta-learning approach sheds light on leveraging meta-learning for better model generalisation in a low-data regime. Finally, we believe compositional representation learning will play an increasingly important role in designing more generalisable and interpretable models in the future.