Representation learning for generalisation in medical image analysis

Abstract

Medical image analysis plays an increasingly crucial role in modern healthcare, helping to diagnose, treat, manage, prevent and predict diseases. In particular, using machine learning (ML) and deep learning (DL) techniques to process medical imaging data such as MRI, CT and X-ray scans has been an active research topic. Accurate and generalisable medical image segmentation using ML and DL is one of the most challenging medical image analysis tasks. The challenges stem mainly from two causes: a) variations in data statistics across clinical centres or hospitals, and b) the lack of extensive annotations of medical data. One of the best ways to tackle these challenges is to learn disentangled representations. Learning disentangled representations aims to separate out, or disentangle, the underlying explanatory generative factors into disjoint subsets. Importantly, disentangled representations can be efficiently learnt from raw training data with limited annotations. Although learning disentangled representations is well suited to these challenges, several open problems remain in this area. First, no work has systematically studied how much disentanglement is achieved with different learning and design biases, or how different biases affect task performance on medical data. Second, the benefit of leveraging disentanglement to design models that generalise well to new data has not been well studied, especially in the medical domain. Finally, the independence prior for disentanglement is too strong an assumption and does not approximate the true generative factors well. To address these problems, this thesis focuses on understanding the role of disentanglement in medical image analysis, measuring how different biases affect disentanglement and task performance, using disentangled representations to improve generalisation performance, and exploring better representations beyond disentanglement.
In the medical domain, content-style disentanglement is one of the most effective frameworks for learning disentangled representations. It disentangles and encodes image “content” into a spatial tensor, and image appearance or “style” into a vector that contains information on imaging characteristics. Based on an extensive review of disentanglement, I conclude that it is unclear how different design and learning biases affect the performance of content-style disentanglement methods. Hence, I propose two metrics that measure the degree of content-style disentanglement by evaluating the informativeness and correlation of representations. By modifying the design and learning biases in three popular content-style disentanglement models, I evaluate the degree of disentanglement and the task performance of different model variants. A key conclusion is that there exists a sweet spot between task performance and the degree of disentanglement; achieving this sweet spot is the key to designing disentanglement models.
Generalising deep models to new data from new centres (termed here domains) remains a challenge. This is largely attributed to shifts in data statistics (domain shifts) between source and unseen domains. Building on the findings of the disentanglement metrics study, I design two content-style disentanglement approaches for generalisation. First, I propose two data augmentation methods that improve generalisation. The Resolution Augmentation method generates more diverse data by rescaling images to different resolutions. The Factor-based Augmentation method generates more diverse data by projecting the original samples onto disentangled latent spaces and combining the learned content and style factors from different domains. Second, to learn more generalisable representations, I integrate gradient-based meta-learning with disentanglement. Gradient-based meta-learning splits the training data into meta-train and meta-test sets to simulate and handle domain shifts during training, and has shown superior generalisation performance. Considering the limited annotations of data, I propose a novel semi-supervised meta-learning framework with disentanglement. I explicitly model the representations related to domain shifts. Disentangling the representations and combining them to reconstruct the input image allows unlabelled data to be used to better approximate the true domain shifts within a meta-learning setting.
Humans can quickly learn to accurately recognise anatomy of interest in medical images with limited guidance. This recognition ability easily generalises to new images from different clinical centres and to new tasks in other contexts. This rapid and generalisable learning ability is mostly due to the compositional structure of image patterns in the human brain, which has rarely been incorporated in the medical domain. In this thesis, I explore how compositionality can be applied to learning more interpretable and generalisable representations. I propose that the ground-truth generative factors that generate medical images satisfy the compositional equivariance property. Hence, a good representation that approximates the ground-truth factors well has to be compositionally equivariant. By modelling the compositional representations with learnable von Mises-Fisher kernels, I explore how different design and learning biases can be used to make the representations more compositionally equivariant under different learning settings.
Overall, this thesis creates new avenues for further research in generalisable representation learning in medical image analysis, which I believe are key to more generalised machine learning and deep learning solutions in healthcare. In particular, the proposed metrics can be used to guide future work on designing better content-style frameworks. The disentanglement-based meta-learning approach sheds light on leveraging meta-learning for better model generalisation in a low-data regime. Finally, I believe compositional representation learning will play an increasingly important role in designing more generalisable and interpretable models in the future.
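As a rough illustration of what modelling representations with von Mises-Fisher (vMF) kernels can look like, the sketch below scores a single feature vector against a small set of kernels. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names `l2_normalise` and `vmf_activations`, the shared concentration `kappa`, the uniform prior over kernels and the toy kernel values are all hypothetical. Under those assumptions the vMF normalising constant is identical for every kernel and cancels, so the responsibility of each kernel reduces to a softmax over `kappa` times the cosine similarity.

```python
import math


def l2_normalise(v):
    """Scale a vector to unit length; vMF distributions are defined on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalise a zero vector")
    return [x / norm for x in v]


def vmf_activations(feature, kernels, kappa=10.0):
    """Return the responsibility of each kernel for one feature vector.

    Assumes a shared concentration `kappa` and a uniform prior over kernels,
    so the result is softmax(kappa * cosine_similarity).
    """
    z = l2_normalise(feature)
    logits = []
    for mu in kernels:
        mu = l2_normalise(mu)
        logits.append(kappa * sum(a * b for a, b in zip(mu, z)))
    # Subtract the maximum logit before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(logit - m) for logit in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy example: three hypothetical kernels and one feature vector in 3-D.
kernels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
feature = [0.9, 0.1, 0.0]
activations = vmf_activations(feature, kernels)
best_kernel = max(range(len(activations)), key=activations.__getitem__)
```

In a learnable setting the kernel means would be parameters updated by gradient descent, and the activations would be computed at every spatial location of a feature map; this sketch only shows the per-vector computation.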
