Multimodal and disentangled representation learning for medical image analysis
Automated medical image analysis is a growing research field with various applications in
modern healthcare. Furthermore, a multitude of imaging techniques (or modalities) have been
developed, such as Magnetic Resonance (MR) and Computed Tomography (CT), to accentuate
different organ characteristics. Research on image analysis is predominantly driven by deep
learning methods due to their demonstrated performance. In this thesis, we argue that their success and generalisation rely on learning good latent representations. We propose methods for
learning spatial representations that are suitable for medical image data, and can combine information coming from different modalities. Specifically, we aim to improve cardiac MR segmentation, a challenging task due to varied images and limited expert annotations, by considering
complementary information present in (potentially unaligned) images of other modalities.
In order to evaluate the benefit of multimodal learning, we initially consider a synthesis task
on spatially aligned multimodal brain MR images. We propose a deep network of multiple
encoders and decoders, which we demonstrate outperforms existing approaches. The encoders
(one per input modality) map the multimodal images into modality-invariant spatial feature
maps. Common and unique information is combined into a fused representation that is robust
to missing modalities and can be decoded into synthetic images of the target modalities. Different experimental settings demonstrate the benefit of multimodal over unimodal synthesis,
although input and output image pairs are required for training. The need for paired images can
be overcome with the cycle consistency principle, which we use in conjunction with adversarial
training to transform images from one modality (e.g. MR) to images in another (e.g. CT). This
is useful especially in cardiac datasets, where different spatial and temporal resolutions make
image pairing difficult, if not impossible.
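The unpaired MR-to-CT translation described above rests on the cycle-consistency principle: translating MR to CT and back should recover the original image. A minimal numpy sketch of the loss, with hypothetical linear stand-ins for the trained generators (chosen as exact inverses so the loss is numerically zero; real networks only approximate this):

```python
import numpy as np

# Toy sketch of the cycle-consistency objective used for unpaired
# MR -> CT translation. The "generators" are hypothetical linear
# stand-ins for trained networks, chosen to be exact inverses.
def g_mr2ct(x):
    return 2.0 * x + 1.0          # hypothetical MR -> CT mapping

def g_ct2mr(y):
    return (y - 1.0) / 2.0        # hypothetical CT -> MR mapping

def cycle_loss(x_mr, y_ct):
    # || G_CT2MR(G_MR2CT(x)) - x ||_1 + || G_MR2CT(G_CT2MR(y)) - y ||_1
    fwd = np.abs(g_ct2mr(g_mr2ct(x_mr)) - x_mr).mean()
    bwd = np.abs(g_mr2ct(g_ct2mr(y_ct)) - y_ct).mean()
    return fwd + bwd

x_mr = np.random.rand(4, 32, 32)  # unpaired MR slices
y_ct = np.random.rand(4, 32, 32)  # unpaired CT slices
loss = cycle_loss(x_mr, y_ct)     # ~0 for these exact-inverse toy maps
```

In the actual models this term is combined with adversarial losses from per-modality discriminators, which the sketch omits.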
Segmentation can also be considered a form of image synthesis if one modality consists of
semantic maps. We consider the task of extracting segmentation masks for cardiac MR images,
and aim to overcome the challenge of limited annotations by taking into account unannotated images, which are commonly ignored. We achieve this by defining suitable latent spaces,
which represent the underlying anatomies (spatial latent variable), as well as the imaging characteristics (non-spatial latent variable). Anatomical information is required for tasks such as
segmentation and regression, whereas imaging information can capture variability in intensity
characteristics, for example due to different scanners. We propose two models that disentangle
cardiac images at different levels: the first extracts the myocardium from the surrounding information, whereas the second fully separates the anatomical from the imaging characteristics.
Experimental analysis confirms the utility of disentangled representations in semi-supervised
segmentation, and in regression of cardiac indices, while maintaining robustness to intensity
variations such as the ones induced by different modalities.
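As a rough, hypothetical illustration of the anatomy/imaging split described above (not the thesis architecture), the toy code below factorises an image into a spatial, intensity-invariant anatomy map and a non-spatial imaging scalar, recombines them, and shows that the anatomy factor is unchanged when the scanner gain doubles:

```python
import numpy as np

# Hypothetical toy factorisation (not the thesis architecture): a spatial
# anatomy factor records *where* structures are, a non-spatial imaging
# factor records *how* they appear, and a decoder recombines the two.
def encode(img, threshold=0.5):
    anatomy = (img > threshold).astype(float)   # spatial, intensity-invariant
    imaging = img[anatomy > 0].mean() if anatomy.any() else 0.0
    return anatomy, imaging

def decode(anatomy, imaging):
    return anatomy * imaging                    # recombine the two factors

img = np.zeros((8, 8))
img[2:6, 2:6] = 0.9                             # a bright square "myocardium"
anatomy, imaging = encode(img)
rec = decode(anatomy, imaging)                  # reconstructs the input

# doubling the scanner gain changes the imaging factor, not the anatomy:
anatomy2, imaging2 = encode(np.clip(2.0 * img, 0.0, 2.0))
```

A segmentation or regression head would consume only the anatomy factor, which is why the factorisation confers robustness to intensity variation.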
Finally, our prior research is aggregated into one framework that encodes multimodal images
into disentangled anatomical and imaging factors. Several challenges of multimodal cardiac
imaging, such as input misalignments and the lack of expert annotations, are successfully handled in the shared anatomy space. Furthermore, we demonstrate that this approach can be used
to combine complementary anatomical information for the purpose of multimodal segmentation. This can be achieved even when no annotations are provided for one of the modalities.
This thesis creates new avenues for further research in the area of multimodal and disentangled learning with spatial representations, which we believe are key to more generalised deep
learning solutions in healthcare.
Data synthesis and adversarial networks: A review and meta-analysis in cancer imaging
Despite technological and medical advances, the detection, interpretation, and treatment of cancer based on imaging data continue to pose significant challenges. These include inter-observer variability, class imbalance, dataset shifts, inter- and intra-tumour heterogeneity, malignancy determination, and treatment effect uncertainty. Given the recent advancements in image synthesis, Generative Adversarial Networks (GANs), and adversarial training, we assess the potential of these technologies to address a number of key challenges of cancer imaging. We categorise these challenges into (a) data scarcity and imbalance, (b) data access and privacy, (c) data annotation and segmentation, (d) cancer detection and diagnosis, and (e) tumour profiling, treatment planning and monitoring. Based on our analysis of 164 publications that apply adversarial training techniques in the context of cancer imaging, we highlight multiple underexplored solutions with research potential. We further contribute the Synthesis Study Trustworthiness Test (SynTRUST), a meta-analysis framework for assessing the validation rigour of medical image synthesis studies. SynTRUST is based on 26 concrete measures of thoroughness, reproducibility, usefulness, scalability, and tenability. Based on SynTRUST, we analyse 16 of the most promising cancer imaging challenge solutions and observe a high validation rigour in general, but also several desirable improvements. With this work, we strive to bridge the gap between the needs of the clinical cancer imaging community and the current and prospective research on data synthesis and adversarial networks in the artificial intelligence community
Deep generative models for medical image synthesis and strategies to utilise them
Medical imaging has revolutionised the diagnosis and treatment of diseases since the first
medical image was taken using X-rays in 1895. As medical imaging became an essential tool
in a modern healthcare system, more medical imaging techniques have been invented, such
as Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), Computed
Tomography (CT), Ultrasound, etc. With the advance of medical imaging techniques, the
demand for processing and analysing these complex medical images is increasing rapidly.
Efforts have been devoted to developing approaches that can automatically analyse medical images. With the recent success of deep learning (DL) in computer vision, researchers have
applied and proposed many DL-based methods in the field of medical image analysis. However, one problem with data-driven DL-based methods is the lack of data. Compared with natural
images, medical images are more expensive to acquire and label. One way to alleviate the
lack of medical data is medical image synthesis.
In this thesis, I first consider pseudo healthy synthesis: creating a ‘healthy’-looking
medical image from a pathological one. The synthesised pseudo healthy images can be used
for the detection of pathology, segmentation, etc. Several challenges exist with this task. The
first challenge is the lack of ground-truth data, as a subject cannot be healthy and diseased at
the same time. The second challenge is how to evaluate the generated images. In this thesis,
I propose a deep learning method to learn to generate pseudo healthy images with adversarial
and cycle consistency losses to overcome the lack of ground-truth data. I also propose several
metrics to evaluate the quality of synthetic ‘healthy’ images. Pseudo healthy synthesis can be
viewed as transforming images between discrete domains, e.g. from the pathological domain to
the healthy domain. However, some changes in medical data are continuous, e.g.
brain ageing progression.
The brain changes as age increases. With the ageing global population, research on brain ageing
has attracted increasing attention. In this thesis, I propose a deep learning method that can
simulate such brain ageing progression. However, longitudinal brain data are not easy to
acquire, and where they exist they only cover several years. Thus, the proposed method focuses on
learning subject-specific brain ageing progression without training on longitudinal data. As
there are other factors, such as neurodegenerative diseases, that can affect brain ageing, the
proposed model also considers health status, i.e. the existence of Alzheimer’s Disease (AD).
Furthermore, to evaluate the quality of synthetic aged images, I define several metrics and
conduct a series of experiments.
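A minimal sketch of the conditioning interface described above; the generator body is a hypothetical toy intensity change standing in for the proposed network, but it shows how target age and AD status enter as inputs:

```python
import numpy as np

# Hypothetical sketch of conditioning on target age and health status (AD).
# The generator body is a toy intensity change, not the proposed network.
def age_generator(brain, age_now, age_target, has_ad):
    gap = (age_target - age_now) / 100.0
    atrophy = gap * (1.5 if has_ad else 1.0)   # AD accelerates the change (toy)
    return brain * (1.0 - atrophy)

brain = np.full((4, 4), 0.8)                   # toy baseline scan at age 60
aged_healthy = age_generator(brain, 60, 80, has_ad=False)
aged_ad = age_generator(brain, 60, 80, has_ad=True)
```

Because the subject-specific input image is an argument rather than a training pair, no longitudinal data is needed to run the sketch, mirroring the training setup described in the abstract.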
Suppose we have a pre-trained deep generative model and a downstream task model, say
a classifier. One question is how to make the best use of the generative model to improve the
performance of the classifier. In this thesis, I propose a simple procedure that can discover
the ‘weakness’ of the classifier and guide the generator to synthesise counterfactuals (synthetic
data) that are hard for the classifier. The proposed procedure constructs an adversarial
game between generative factors of the generator and the classifier. We demonstrate the effectiveness
of this proposed procedure through a series of experiments. Furthermore, we
consider the application of generative models in a continual learning context and investigate
their usefulness in alleviating spurious correlations.
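The adversarial game between generative factors and the classifier can be caricatured as follows; generator and classifier are hypothetical one-dimensional stand-ins, and the search simply picks the generative factor whose sample maximises the classifier's loss:

```python
import numpy as np

# Caricature of the described procedure: search the generator's factor
# space for counterfactuals that are hard for the classifier.
def generator(z):                   # maps a generative factor z to a sample
    return 2.0 * z

def classifier_prob(x):             # toy fixed classifier: P(class=1 | x)
    return 1.0 / (1.0 + np.exp(-(x - 1.0)))

def find_hard_counterfactual(true_label, candidates):
    # choose the factor whose generated sample maximises classifier loss
    losses = []
    for z in candidates:
        p = classifier_prob(generator(z))
        p_true = p if true_label == 1 else 1.0 - p
        losses.append(-np.log(p_true + 1e-12))
    return float(candidates[int(np.argmax(losses))])

zs = np.linspace(-2.0, 2.0, 41)
z_hard = find_hard_counterfactual(true_label=1, candidates=zs)
# the hardest class-1 sample sits at the far negative end of the range
```

In the real procedure the search is driven by gradients through the generator rather than this exhaustive scan, and the discovered counterfactuals are fed back as training data.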
This thesis creates new avenues for further research in the area of medical image synthesis
and how to utilise medical generative models, which we believe could be important for
future studies in medical image analysis with deep learning.
Generating virtual patients of high-resolution MR angiography from non-angiographic multi-contrast MRIs for In-silico trials
Despite success on multi-contrast MR image synthesis, generating specific modalities remains challenging. These include Magnetic Resonance Angiography (MRA), which highlights details of vascular anatomy using specialised imaging sequences that emphasise the inflow effect. This work proposes an end-to-end generative adversarial network that can synthesise anatomically plausible, high-resolution 3D MRA images using commonly acquired multi-contrast MR images (e.g. T1/T2/PD-weighted MR images) for the same subject whilst preserving the continuity of vascular anatomy. A reliable technique for MRA synthesis would unleash the research potential of the very few population databases with imaging modalities (such as MRA) that enable quantitative characterisation of whole-brain vasculature. Our work is motivated by the need to generate digital twins and virtual patients of cerebrovascular anatomy for in-silico studies and/or in-silico trials. We propose a dedicated generator and discriminator that leverage the shared and complementary features of multi-source images. We design a composite loss function for emphasising vascular properties by minimising the statistical difference between the feature representations of the target images and the synthesised outputs in both 3D volumetric and 2D projection domains. Experimental results show that the proposed method can synthesise high-quality MRA images and outperform the state-of-the-art generative models both qualitatively and quantitatively. The importance assessment reveals that T2 and PD-weighted images are better predictors of MRA images than T1, and PD-weighted images contribute to better visibility of small vessel branches towards the peripheral regions. In addition, the proposed approach can generalise to unseen data acquired at different imaging centres with different scanners, whilst synthesising MRAs and vascular geometries that maintain vessel continuity.
The results show the potential for use of the proposed approach to generate digital twin cohorts of cerebrovascular anatomy at scale from structural MR images typically acquired in population imaging initiatives.
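One way to realise a projection-domain loss term of the kind described is to compare maximum-intensity projections (MIPs) of the target and synthesised volumes alongside a 3D volumetric term; the weighting and exact statistics below are assumptions, not the paper's implementation:

```python
import numpy as np

# Sketch of a composite volumetric + projection loss for MRA synthesis.
def mip(vol, axis):
    return vol.max(axis=axis)                  # angiography-style 2-D projection

def composite_loss(target, synth, lam=0.5):
    vol_term = np.abs(target - synth).mean()   # 3-D volumetric term
    proj_term = np.mean([np.abs(mip(target, a) - mip(synth, a)).mean()
                         for a in range(3)])   # 2-D projection term
    return vol_term + lam * proj_term

target = np.random.rand(8, 8, 8)               # toy MRA volume
loss_same = composite_loss(target, target)
loss_diff = composite_loss(target, np.zeros_like(target))
```

MIPs are a natural choice for vessels because thin bright structures that barely register in a voxel-wise mean dominate the projection, so errors on small branches are amplified.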
A systematic literature review: deep learning techniques for synthetic medical image generation and their applications in radiotherapy
The aim of this systematic review is to determine whether Deep Learning (DL) algorithms can provide a clinically feasible alternative to classic algorithms for synthetic Computer Tomography (sCT). The following categories are presented in this study: ∙ MR-based treatment planning and synthetic CT generation techniques. ∙ Generation of synthetic CT images based on Cone Beam CT images. ∙ Low-dose CT to High-dose CT generation. ∙ Attenuation correction for PET images. To perform appropriate database searches, we reviewed journal articles published between January 2018 and June 2023. Current methodology, study strategies, and results with relevant clinical applications were analyzed as we outlined the state-of-the-art of deep learning based approaches to inter-modality and intra-modality image synthesis. This was accomplished by contrasting the provided methodologies with traditional research approaches. The key contributions of each category were highlighted, specific challenges were identified, and accomplishments were summarized. As a final step, the statistics of all the cited works from various aspects were analyzed, which revealed that DL-based sCTs have achieved considerable popularity, while also showing the potential of this technology. In order to assess the clinical readiness of the presented methods, we examined the current status of DL-based sCT generation
Ea-GANs: Edge-Aware Generative Adversarial Networks for Cross-Modality MR Image Synthesis
Magnetic resonance (MR) imaging is a widely used medical imaging protocol that can be configured to provide different contrasts between the tissues in the human body. By setting different scanning parameters, each MR imaging modality reflects the unique visual characteristics of the scanned body part, benefiting the subsequent analysis from multiple perspectives. To utilize the complementary information from multiple imaging modalities, cross-modality MR image synthesis has attracted increasing research interest recently. However, most existing methods focus only on minimizing pixel/voxel-wise intensity differences and ignore the textural details of image content structure, which affects the quality of synthesized images. In this paper, we propose edge-aware generative adversarial networks (Ea-GANs) for cross-modality MR image synthesis. Specifically, we integrate edge information, which reflects the textural structure of image content and depicts the boundaries of different objects in images, to address this limitation. Corresponding to different learning strategies, two frameworks are proposed, i.e., a generator-induced Ea-GAN (gEa-GAN) and a discriminator-induced Ea-GAN (dEa-GAN). The gEa-GAN incorporates the edge information via its generator, while the dEa-GAN does so in both the generator and the discriminator, so that edge similarity is also adversarially learned. In addition, the proposed Ea-GANs are 3D-based and utilize hierarchical features to capture contextual information. The experimental results demonstrate that the proposed Ea-GANs, especially the dEa-GAN, outperform multiple state-of-the-art methods for cross-modality MR image synthesis in both qualitative and quantitative measures. Moreover, the dEa-GAN also shows excellent generality on generic image synthesis tasks using benchmark datasets of facades, maps, and cityscapes.
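The edge-aware term can be sketched as follows, using Sobel edge maps in 2D for brevity (the paper's networks are 3D, and this is an illustrative reconstruction, not the authors' code):

```python
import numpy as np

# Illustrative edge-aware loss: penalise differences between Sobel edge
# maps of real and synthesised images alongside the usual intensity loss.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def conv2d(img, kernel):
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):                     # naive valid convolution
        for j in range(w - 2):
            out[i, j] = (img[i:i + 3, j:j + 3] * kernel).sum()
    return out

def edge_map(img):
    return np.hypot(conv2d(img, SOBEL_X), conv2d(img, SOBEL_Y))

def edge_aware_loss(real, fake, lam=1.0):
    intensity = np.abs(real - fake).mean()
    edge = np.abs(edge_map(real) - edge_map(fake)).mean()
    return intensity + lam * edge

real = np.zeros((8, 8)); real[:, 4:] = 1.0     # sharp vertical boundary
fake = np.zeros((8, 8)); fake[:, 5:] = 1.0     # boundary shifted by one pixel
loss_same = edge_aware_loss(real, real)
loss_shifted = edge_aware_loss(real, fake)     # penalised for the moved edge
```

In the dEa-GAN variant the discriminator also receives the edge maps, so edge fidelity is learned adversarially rather than only through this fixed penalty.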
Generative AI for computational pathology
This thesis investigates generative AI algorithms for enhancing diagnostic accuracy and research in computational histopathology, covering image generation techniques including high-resolution tissue images, synthetic histology data, and challenges in large-scale image manipulation. In the field of computational pathology, where histology images are large in size and visual context is crucial, synthesis of large high-resolution images via generative modeling is a challenging task due to memory and computational constraints. To address this challenge, we propose a novel framework called SAFRON (Stitching Across the FROntiers Network). We use the proposed framework for generating, to the best of our knowledge, the largest-sized synthetic histology images to date (up to 11K×8K pixels). We quantitatively assess these images through Frechet Inception Distance and pathologist realism scores. We also propose SynCLay, an interactive framework that generates histology images from cellular layouts, aiding the study of cell roles present in the tumor microenvironment. Coupled with a parametric model, it generates colon images and cellular counts based on the grade of differentiation and cellularities of different cells. We also show that augmenting limited real data with the synthetic data generated by our framework can significantly boost prediction performance of the cellular composition prediction task. Next, we propose TIGGLay (Tissue Image Generation from Glandular Layout) framework that generates realistic colorectal cancer histology images and masks from glandular layouts. We show the appearance of glands can be controlled by user inputs such as number of glands, their locations and sizes. Moreover, the thesis provides a methodology for constructing realistic glandular masks using vector quantized variational autoencoder based frameworks such as latent diffusion models. Finally, we discuss limitations and future research directions. 
These involve extending the frameworks to generate synthetic images for varied carcinoma types and tissues, as well as generating whole-slide images based on known parameters
Multimodal Adversarial Learning
Deep Convolutional Neural Networks (DCNNs) have proven to be an exceptional tool for object recognition, generative modelling, and multi-modal learning in various computer vision applications. However, recent findings have shown that such state-of-the-art models can be easily deceived by inserting slight imperceptible perturbations at key pixels in the input. A good target detection system can accurately identify targets by localizing their coordinates on the input image of interest. This is ideally achieved by labeling each pixel in an image as a background or a potential target pixel. However, prior research confirms that such state-of-the-art target models remain susceptible to adversarial attacks. In the case of generative models, facial sketches drawn by artists, mostly used by law enforcement agencies, depend on the ability of the artist to clearly replicate all the key facial features that aid in capturing the true identity of a subject. Recent works have attempted to synthesize these sketches into plausible visual images to improve visual recognition and identification. However, synthesizing photo-realistic images from sketches proves to be an even more challenging task, especially for sensitive applications such as suspect identification. The incorporation of hybrid discriminators that perform attribute classification of multiple target attributes, together with a quality-guided encoder that minimizes the perceptual dissimilarity between the latent-space embeddings of the synthesized and real images at different layers of the network, has been shown to be a powerful tool for better multi-modal learning. In general, our overall approach was aimed at improving target detection systems and the visual appeal of synthesized images while incorporating multiple attribute assignment to the generator without compromising the identity of the synthesized image.
We synthesized sketches using the XDoG filter for the CelebA, Multi-modal and CelebA-HQ datasets, and from an auxiliary generator trained on sketches from the CUHK, IIT-D and FERET datasets. Overall, our results across different model applications compare favourably with the current state of the art.
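XDoG itself is a classical filter: an extended difference of Gaussians followed by a soft threshold. A compact sketch, with illustrative parameter values rather than those used in the work:

```python
import numpy as np

# Compact sketch of the XDoG operator (extended difference of Gaussians
# with a soft threshold). Parameter values are illustrative only.
def gauss1d(sigma):
    r = int(3 * sigma) + 1
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    k = gauss1d(sigma)                          # separable Gaussian blur
    tmp = np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 1, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 0, tmp)

def xdog(img, sigma=1.0, k=1.6, p=20.0, eps=0.1, phi=10.0):
    d = (1 + p) * blur(img, sigma) - p * blur(img, k * sigma)  # sharpened DoG
    # white where the response clears the threshold, soft dark ramp elsewhere
    return np.where(d >= eps, 1.0, 1.0 + np.tanh(phi * (d - eps)))

img = np.zeros((16, 16)); img[:, 8:] = 1.0      # toy image with a step edge
sketch = xdog(img)                              # dark strokes along the edge
```

Running a filter like this over photographs yields paired photo/sketch training data without requiring hand-drawn sketches for every image.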
Deep Learning for Music Information Retrieval in Limited Data Scenarios
PhD Thesis. While deep learning (DL) models have achieved impressive results in settings where large amounts of annotated training data are available, overfitting often degrades performance when data is more limited. To improve the generalisation of DL models, we investigate “data-driven priors” that exploit additional unlabelled data or labelled data from related tasks. Unlike techniques such as data augmentation, these priors are applicable across a range of machine listening tasks, since their design does not rely on problem-specific knowledge.
We first consider scenarios in which parts of samples can be missing, aiming to make more datasets available for model training. In an initial study focusing on audio source separation (ASS), we exploit additionally available unlabelled music and solo source recordings by using generative adversarial networks (GANs), resulting in higher separation quality. We then present a fully adversarial framework for learning generative models with missing data. Our discriminator consists of separately trainable components that can be combined to train the generator with the same objective as in the original GAN framework. We apply our framework to image generation, image segmentation and ASS, demonstrating superior performance compared to the original GAN.
To improve performance on any given MIR task, we also aim to leverage datasets which are annotated for similar tasks. We use multi-task learning (MTL) to perform singing voice detection and singing voice separation with one model, improving performance on both tasks. Furthermore, we employ meta-learning on a diverse collection of ten MIR tasks to find a weight initialisation for a “universal MIR model”, so that training the model on any MIR task with this initialisation quickly leads to good performance.
Since our data-driven priors encode knowledge shared across tasks and datasets, they are suited for high-dimensional, end-to-end models, instead of small models relying on task-specific feature engineering, such as fixed spectrogram representations of audio commonly used in machine listening. To this end, we propose “Wave-U-Net”, an adaptation of the U-Net, which can perform ASS directly on the raw waveform while performing favourably compared to its spectrogram-based counterpart. Finally, we derive “Seq-U-Net” as a causal variant of Wave-U-Net, which performs comparably to Wavenet and Temporal Convolutional Networks (TCN) on a variety of sequence modelling tasks, while being more computationally efficient.
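The Wave-U-Net idea of processing the raw waveform through a downsampling path, an upsampling path, and skip connections can be caricatured in a few lines; the learned 1D convolutions are replaced by fixed averaging and repetition so only the shapes and connectivity are shown:

```python
import numpy as np

# Numpy-only caricature of the Wave-U-Net data flow on a raw waveform.
# Learned 1-D convolutions are replaced by fixed averaging/repetition.
def down(x):
    return x.reshape(-1, 2).mean(axis=1)      # halve the time resolution

def up(x):
    return np.repeat(x, 2)                    # double the time resolution

def tiny_wave_unet(wave, depth=3):
    skips, x = [], wave
    for _ in range(depth):                    # contracting path
        skips.append(x)
        x = down(x)
    for skip in reversed(skips):              # expanding path with skips
        x = up(x) + skip
    return x

wave = np.sin(np.linspace(0.0, 8.0 * np.pi, 64))   # raw input waveform
out = tiny_wave_unet(wave)                         # same length as the input
```

The skip connections reinject full-resolution detail at each upsampling stage, which is what lets the output stay sample-aligned with the input waveform.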
Image Data Augmentation from Small Training Datasets Using Generative Adversarial Networks (GANs)
The scarcity of labelled data is a serious problem since deep models generally require a large amount of training data to achieve desired performance. Data augmentation is widely adopted to enhance the diversity of original datasets and further improve the performance of deep learning models. Learning-based methods, compared to traditional techniques, are specialized in feature extraction, which enhances the effectiveness of data augmentation.
Generative adversarial networks (GANs), one of the learning-based generative models, have made remarkable advances in data synthesis. However, GANs still face many challenges in generating high-quality augmented images from small datasets because learning-based generative methods are difficult to create reliable outcomes without sufficient training data. This difficulty deteriorates the data augmentation applications using learning-based methods. In this thesis, to tackle the problem of labelled data scarcity and the training difficulty of augmenting image data from small datasets, three novel GAN models suitable for training with a small number of training samples have been proposed based on three different mapping relationships between the input and output images, including one-to-many mapping, one-to-one mapping, and many-to-many mapping. The proposed GANs employ limited training data, such as a small number of images and limited conditional features, and the synthetic images generated by the proposed GANs are expected to generate images of not only high generative quality but also desirable data diversity.
To evaluate the effectiveness of the augmented images generated by the proposed models, inception distances and human perception methods are adopted. Additionally, different image classification tasks were carried out, and accuracies obtained using the original datasets and the augmented datasets were compared. Experimental results illustrate that the image classification performance of convolutional neural networks, i.e., AlexNet, GoogLeNet, ResNet and VGGNet, is comprehensively enhanced, and the scale of improvement is significant when a small number of training samples is involved.
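The evaluation protocol described here, training the same classifier with and without synthetic data and comparing test accuracy, can be sketched as follows; the "GAN" is replaced by simple jittering of real samples, and the CNN by a nearest-centroid classifier, so the example stays self-contained and runnable:

```python
import numpy as np

# Sketch of the augmentation evaluation protocol: same classifier, small
# real set vs. real set plus synthetic samples, compared on held-out data.
rng = np.random.default_rng(0)

def sample_class(mean, n):
    return rng.normal(mean, 1.0, size=(n, 2))

def nearest_centroid_acc(train_x, train_y, test_x, test_y):
    c0 = train_x[train_y == 0].mean(axis=0)   # toy stand-in for a trained CNN
    c1 = train_x[train_y == 1].mean(axis=0)
    pred = (np.linalg.norm(test_x - c1, axis=1)
            < np.linalg.norm(test_x - c0, axis=1)).astype(int)
    return float((pred == test_y).mean())

train_x = np.vstack([sample_class(-2.0, 3), sample_class(2.0, 3)])  # tiny set
train_y = np.array([0] * 3 + [1] * 3)
test_x = np.vstack([sample_class(-2.0, 200), sample_class(2.0, 200)])
test_y = np.array([0] * 200 + [1] * 200)

# "augmented" data: synthetic samples scattered around the real ones
aug_x = np.vstack([train_x + rng.normal(0.0, 0.5, train_x.shape)
                   for _ in range(10)])
aug_y = np.tile(train_y, 10)

acc_small = nearest_centroid_acc(train_x, train_y, test_x, test_y)
acc_aug = nearest_centroid_acc(np.vstack([train_x, aug_x]),
                               np.concatenate([train_y, aug_y]),
                               test_x, test_y)
```

A GAN-based augmenter differs from this jittering in that it can produce genuinely novel, diverse samples, which is precisely the property the thesis evaluates with inception distances and human perception studies.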