157 research outputs found

    Learning Disentangled Representations in the Imaging Domain

    Full text link
    Disentangled representation learning has been proposed as an approach to learning general representations even in the absence of, or with limited, supervision. A good general representation can be fine-tuned for new target tasks using modest amounts of data, or used directly in unseen domains achieving remarkable performance in the corresponding task. This alleviation of the data and annotation requirements offers tantalising prospects for applications in computer vision and healthcare. In this tutorial paper, we motivate the need for disentangled representations, present key theory, and detail practical building blocks and criteria for learning such representations. We discuss applications in medical imaging and computer vision emphasising choices made in exemplar key works. We conclude by presenting remaining challenges and opportunities.Comment: Submitted. This paper follows a tutorial style but also surveys a considerable (more than 200 citations) number of work

    Interpretable Diabetic Retinopathy Diagnosis based on Biomarker Activation Map

    Full text link
    Deep learning classifiers provide the most accurate means of automatically diagnosing diabetic retinopathy (DR) based on optical coherence tomography (OCT) and its angiography (OCTA). The power of these models is attributable in part to the inclusion of hidden layers that provide the complexity required to achieve a desired task. However, hidden layers also render algorithm outputs difficult to interpret. Here we introduce a novel biomarker activation map (BAM) framework based on generative adversarial learning that allows clinicians to verify and understand classifiers decision-making. A data set including 456 macular scans were graded as non-referable or referable DR based on current clinical standards. A DR classifier that was used to evaluate our BAM was first trained based on this data set. The BAM generation framework was designed by combing two U-shaped generators to provide meaningful interpretability to this classifier. The main generator was trained to take referable scans as input and produce an output that would be classified by the classifier as non-referable. The BAM is then constructed as the difference image between the output and input of the main generator. To ensure that the BAM only highlights classifier-utilized biomarkers an assistant generator was trained to do the opposite, producing scans that would be classified as referable by the classifier from non-referable scans. The generated BAMs highlighted known pathologic features including nonperfusion area and retinal fluid. A fully interpretable classifier based on these highlights could help clinicians better utilize and verify automated DR diagnosis.Comment: 12 pages, 8 figure

    Deep generative models for medical image synthesis and strategies to utilise them

    Get PDF
    Medical imaging has revolutionised the diagnosis and treatments of diseases since the first medical image was taken using X-rays in 1895. As medical imaging became an essential tool in a modern healthcare system, more medical imaging techniques have been invented, such as Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), Computed Tomography (CT), Ultrasound, etc. With the advance of medical imaging techniques, the demand for processing and analysing these complex medical images is increasing rapidly. Efforts have been put on developing approaches that can automatically analyse medical images. With the recent success of deep learning (DL) in computer vision, researchers have applied and proposed many DL-based methods in the field of medical image analysis. However, one problem with data-driven DL-based methods is the lack of data. Unlike natural images, medical images are more expensive to acquire and label. One way to alleviate the lack of medical data is medical image synthesis. In this thesis, I first start with pseudo healthy synthesis, which is to create a ‘healthy’ looking medical image from a pathological one. The synthesised pseudo healthy images can be used for the detection of pathology, segmentation, etc. Several challenges exist with this task. The first challenge is the lack of ground-truth data, as a subject cannot be healthy and diseased at the same time. The second challenge is how to evaluate the generated images. In this thesis, I propose a deep learning method to learn to generate pseudo healthy images with adversarial and cycle consistency losses to overcome the lack of ground-truth data. I also propose several metrics to evaluate the quality of synthetic ‘healthy’ images. Pseudo healthy synthesis can be viewed as transforming images between discrete domains, e.g. from pathological domain to healthy domain. However, there are some changes in medical data that are continuous, e.g. brain ageing progression. Brain changes as age increases. With the ageing global population, research on brain ageing has attracted increasing attention. In this thesis, I propose a deep learning method that can simulate such brain ageing progression. Specifically, longitudinal brain data are not easy to acquire; if some exist, they only cover several years. Thus, the proposed method focuses on learning subject-specific brain ageing progression without training on longitudinal data. As there are other factors, such as neurodegenerative diseases, that can affect brain ageing, the proposed model also considers health status, i.e. the existence of Alzheimer’s Disease (AD). Furthermore, to evaluate the quality of synthetic aged images, I define several metrics and conducted a series of experiments. Suppose we have a pre-trained deep generative model and a downstream tasks model, say a classifier. One question is how to make the best of the generative model to improve the performance of the classifier. In this thesis, I propose a simple procedure that can discover the ‘weakness’ of the classifier and guide the generator to synthesise counterfactuals (synthetic data) that are hard for the classifier. The proposed procedure constructs an adversarial game between generative factors of the generator and the classifier. We demonstrate the effectiveness of this proposed procedure through a series of experiments. Furthermore, we consider the application of generative models in a continual learning context and investigate the usefulness of them to alleviate spurious correlation. This thesis creates new avenues for further research in the area of medical image synthesis and how to utilise the medical generative models, which we believe could be important for future studies in medical image analysis with deep learning

    Simulation and Synthesis for Cardiac Magnetic Resonance Image Analysis

    Get PDF

    Towards generalizable machine learning models for computer-aided diagnosis in medicine

    Get PDF
    Hidden stratification represents a phenomenon in which a training dataset contains unlabeled (hidden) subsets of cases that may affect machine learning model performance. Machine learning models that ignore the hidden stratification phenomenon--despite promising overall performance measured as accuracy and sensitivity--often fail at predicting the low prevalence cases, but those cases remain important. In the medical domain, patients with diseases are often less common than healthy patients, and a misdiagnosis of a patient with a disease can have significant clinical impacts. Therefore, to build a robust and trustworthy CAD system and a reliable treatment effect prediction model, we cannot only pursue machine learning models with high overall accuracy, but we also need to discover any hidden stratification in the data and evaluate the proposing machine learning models with respect to both overall performance and the performance on certain subsets (groups) of the data, such as the ‘worst group’. In this study, I investigated three approaches for data stratification: a novel algorithmic deep learning (DL) approach that learns similarities among cases and two schema completion approaches that utilize domain expert knowledge. I further proposed an innovative way to integrate the discovered latent groups into the loss functions of DL models to allow for better model generalizability under the domain shift scenario caused by the data heterogeneity. My results on lung nodule Computed Tomography (CT) images and breast cancer histopathology images demonstrate that learning homogeneous groups within heterogeneous data significantly improves the performance of the computer-aided diagnosis (CAD) system, particularly for low-prevalence or worst-performing cases. This study emphasizes the importance of discovering and learning the latent stratification within the data, as it is a critical step towards building ML models that are generalizable and reliable. Ultimately, this discovery can have a profound impact on clinical decision-making, particularly for low-prevalence cases

    Deep Learning in Single-Cell Analysis

    Full text link
    Single-cell technologies are revolutionizing the entire field of biology. The large volumes of data generated by single-cell technologies are high-dimensional, sparse, heterogeneous, and have complicated dependency structures, making analyses using conventional machine learning approaches challenging and impractical. In tackling these challenges, deep learning often demonstrates superior performance compared to traditional machine learning methods. In this work, we give a comprehensive survey on deep learning in single-cell analysis. We first introduce background on single-cell technologies and their development, as well as fundamental concepts of deep learning including the most popular deep architectures. We present an overview of the single-cell analytic pipeline pursued in research applications while noting divergences due to data sources or specific applications. We then review seven popular tasks spanning through different stages of the single-cell analysis pipeline, including multimodal integration, imputation, clustering, spatial domain identification, cell-type deconvolution, cell segmentation, and cell-type annotation. Under each task, we describe the most recent developments in classical and deep learning methods and discuss their advantages and disadvantages. Deep learning tools and benchmark datasets are also summarized for each task. Finally, we discuss the future directions and the most recent challenges. This survey will serve as a reference for biologists and computer scientists, encouraging collaborations.Comment: 77 pages, 11 figures, 15 tables, deep learning, single-cell analysi

    Multimodal and disentangled representation learning for medical image analysis

    Get PDF
    Automated medical image analysis is a growing research field with various applications in modern healthcare. Furthermore, a multitude of imaging techniques (or modalities) have been developed, such as Magnetic Resonance (MR) and Computed Tomography (CT), to attenuate different organ characteristics. Research on image analysis is predominately driven by deep learning methods due to their demonstrated performance. In this thesis, we argue that their success and generalisation relies on learning good latent representations. We propose methods for learning spatial representations that are suitable for medical image data, and can combine information coming from different modalities. Specifically, we aim to improve cardiac MR segmentation, a challenging task due to varied images and limited expert annotations, by considering complementary information present in (potentially unaligned) images of other modalities. In order to evaluate the benefit of multimodal learning, we initially consider a synthesis task on spatially aligned multimodal brain MR images. We propose a deep network of multiple encoders and decoders, which we demonstrate outperforms existing approaches. The encoders (one per input modality) map the multimodal images into modality invariant spatial feature maps. Common and unique information is combined into a fused representation, that is robust to missing modalities, and can be decoded into synthetic images of the target modalities. Different experimental settings demonstrate the benefit of multimodal over unimodal synthesis, although input and output image pairs are required for training. The need for paired images can be overcome with the cycle consistency principle, which we use in conjunction with adversarial training to transform images from one modality (e.g. MR) to images in another (e.g. CT). This is useful especially in cardiac datasets, where different spatial and temporal resolutions make image pairing difficult, if not impossible. Segmentation can also be considered as a form of image synthesis, if one modality consists of semantic maps. We consider the task of extracting segmentation masks for cardiac MR images, and aim to overcome the challenge of limited annotations, by taking into account unannanotated images which are commonly ignored. We achieve this by defining suitable latent spaces, which represent the underlying anatomies (spatial latent variable), as well as the imaging characteristics (non-spatial latent variable). Anatomical information is required for tasks such as segmentation and regression, whereas imaging information can capture variability in intensity characteristics for example due to different scanners. We propose two models that disentangle cardiac images at different levels: the first extracts the myocardium from the surrounding information, whereas the second fully separates the anatomical from the imaging characteristics. Experimental analysis confirms the utility of disentangled representations in semi-supervised segmentation, and in regression of cardiac indices, while maintaining robustness to intensity variations such as the ones induced by different modalities. Finally, our prior research is aggregated into one framework that encodes multimodal images into disentangled anatomical and imaging factors. Several challenges of multimodal cardiac imaging, such as input misalignments and the lack of expert annotations, are successfully handled in the shared anatomy space. Furthermore, we demonstrate that this approach can be used to combine complementary anatomical information for the purpose of multimodal segmentation. This can be achieved even when no annotations are provided for one of the modalities. This thesis creates new avenues for further research in the area of multimodal and disentangled learning with spatial representations, which we believe are key to more generalised deep learning solutions in healthcare

    Representation learning for generalisation in medical image analysis

    Get PDF
    To help diagnose, treat, manage, prevent and predict diseases, medical image analysis plays an increasingly crucial role in modern health care. In particular, using machine learning (ML) and deep learning (DL) techniques to process medical imaging data such as MRI, CT and X-Rays scans has been a research hot topic. Accurate and generalisable medical image segmentation using ML and DL is one of the most challenging medical image analysis tasks. The challenges are mainly caused by two key reasons: a) the variations of data statistics across different clinical centres or hospitals, and b) the lack of extensive annotations of medical data. To tackle the above challenges, one of the best ways is to learn disentangled representations. Learning disentangled representations aims to separate out, or disentangle, the underlying explanatory generative factors into disjoint subsets. Importantly, disentangled representations can be efficiently learnt from raw training data with limited annotations. Although, it is evident that learning disentangled representations is well suited for the challenges, there are several open problems in this area. First, there is no work to systematically study how much disentanglement is achieved with different learning and design biases and how different biases affect the task performance for medical data. Second, the benefit of leveraging disentanglement to design models that generalise well on new data has not been well studied especially in medical domain. Finally, the independence prior for disentanglement is a too strong assumption that does not approximate well the true generative factors. According to these problems, this thesis focuses on understanding the role of disentanglement in medical image analysis, measuring how different biases affect disentanglement and the task performance, and then finally using disentangled representations to improve generalisation performance and exploring better representations beyond disentanglement. In the medical domain, content-style disentanglement is one of the most effective frameworks to learn disentangled presentations. It disentangles and encodes image “content” into a spatial tensor, and image appearance or “style” into a vector that contains information on imaging characteristics. Based on an extensive review of disentanglement, I conclude that it is unclear how different design and learning biases affect the performance of content-style disentanglement methods. Hence, two metrics are proposed to measure the degree of content-style disentanglement by evaluating the informativeness and correlation of representations. By modifying the design and learning biases in three popular content-style disentanglement models, the degree of disentanglement and task performance of different model variants have been evaluated. A key conclusion is that there exists a sweet spot between task performance and the degree of disentanglement; achieving this sweet spot is the key to design disentanglement models. Generalising deep models to new data from new centres (termed here domains) remains a challenge. This is largely attributed to shifts in data statistics (domain shifts) between source and unseen domains. With the findings of aforementioned disentanglement metrics study, I design two content-style disentanglement approaches for generalisation. First, I propose two data augmentation methods that improve generalisation. The Resolution Augmentation method generates more diverse data by rescaling images to different resolutions. Subsequently, the Factor-based Augmentation method generates more diverse data by projecting the original samples onto disentangled latent spaces, and combining the learned content and style factors from different domains. To learn more generalisable representations, I integrate gradient-based meta-learning in disentanglement. Gradient-based meta-learning splits the training data into meta-train and meta-test sets to simulate and handle the domain shifts during training, which has shown superior generalisation performance. Considering limited annotations of data, I propose a novel semi-supervised meta-learning framework with disentanglement. I explicitly model the representations related to domain shifts. Disentangling the representations and combining them to reconstruct the input image, allows unlabeled data to be used to better approximate the true domain shifts within a meta-learning setting. Humans can quickly learn to accurately recognise anatomy of interest from medical images with limited guidance. Such recognition ability can easily generalise to new images from different clinical centres and new tasks in other contexts. This rapid and generalisable learning ability is mostly due to the compositional structure of image patterns in the human brain, which is less incorporated in the medical domain. In this thesis, I explore how compositionality can be applied to learning more interpretable and generalisable representations. Overall, I propose that the ground-truth generative factors that generate the medical images satisfy the compositional equivariance property. Hence, a good representation that approximates well the ground-truth factor has to be compositionally equivariant. By modelling the compositional representations with the learnable von-Mises-Fisher kernels, I explore how different design and learning biases can be used to enforce the representations to be more compositionally equivariant under different learning settings. Overall, this thesis creates new avenues for further research in the area of generalisable representation learning in medical image analysis, which we believe are key to more generalised machine learning and deep learning solutions in healthcare. In particular, the proposed metrics can be used to guide future work on designing better content-style frameworks. The disentanglement-based meta-learning approach sheds light on leveraging meta-learning for better model generalisation in a low-data regime. Finally, compositional representation learning we believe will play an increasingly important role in designing more generalisable and interpretable models in the future
    corecore