A Style-Based Generator Architecture for Generative Adversarial Networks
We propose an alternative generator architecture for generative adversarial
networks, borrowing from style transfer literature. The new architecture leads
to an automatically learned, unsupervised separation of high-level attributes
(e.g., pose and identity when trained on human faces) and stochastic variation
in the generated images (e.g., freckles, hair), and it enables intuitive,
scale-specific control of the synthesis. The new generator improves the
state-of-the-art in terms of traditional distribution quality metrics, leads to
demonstrably better interpolation properties, and also better disentangles the
latent factors of variation. To quantify interpolation quality and
disentanglement, we propose two new, automated methods that are applicable to
any generator architecture. Finally, we introduce a new, highly varied and
high-quality dataset of human faces.
Comment: CVPR 2019 final version
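The scale-specific style control described above rests on adaptive instance normalization (AdaIN), the style-transfer operation that the style-based generator injects at each resolution. A minimal numpy sketch of AdaIN (shapes, names, and the toy inputs are illustrative):

```python
import numpy as np

def adain(content, style_scale, style_bias, eps=1e-5):
    """Adaptive instance normalization: normalise each feature map of
    `content` (C, H, W) to zero mean / unit variance, then rescale and
    shift it with per-channel parameters derived from a style code."""
    mean = content.mean(axis=(1, 2), keepdims=True)
    std = content.std(axis=(1, 2), keepdims=True)
    normalized = (content - mean) / (std + eps)
    return style_scale[:, None, None] * normalized + style_bias[:, None, None]

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 4))    # one layer's feature maps (C=8)
scale = rng.normal(size=8) + 1.0     # per-channel style "scale"
bias = rng.normal(size=8)            # per-channel style "shift"
out = adain(feat, scale, bias)
# after AdaIN, each channel carries the requested per-channel statistics
print(np.allclose(out.mean(axis=(1, 2)), bias, atol=1e-6))
```

In the full architecture, the scale and bias pairs are produced by learned affine transformations of an intermediate latent code, one pair per layer, which is what gives each resolution its own style control.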
Learning Disentangled Representations in the Imaging Domain
Disentangled representation learning has been proposed as an approach to
learning general representations even in the absence of, or with limited,
supervision. A good general representation can be fine-tuned for new target
tasks using modest amounts of data, or used directly in unseen domains
achieving remarkable performance in the corresponding task. This alleviation of
the data and annotation requirements offers tantalising prospects for
applications in computer vision and healthcare. In this tutorial paper, we
motivate the need for disentangled representations, present key theory, and
detail practical building blocks and criteria for learning such
representations. We discuss applications in medical imaging and computer vision
emphasising choices made in exemplar key works. We conclude by presenting
remaining challenges and opportunities.
Comment: Submitted. This paper follows a tutorial style but also surveys a considerable number of works (more than 200 citations).
Representation learning for generalisation in medical image analysis
To help diagnose, treat, manage, prevent and predict diseases, medical image analysis plays an
increasingly crucial role in modern healthcare. In particular, using machine learning (ML) and
deep learning (DL) techniques to process medical imaging data such as MRI, CT and X-ray
scans has been a hot research topic. Accurate and generalisable medical image segmentation
using ML and DL is one of the most challenging medical image analysis tasks. The challenges
are mainly caused by two key factors: a) variations in data statistics across different clinical centres or hospitals, and b) the lack of extensive annotations of medical data.
To tackle these challenges, one of the most promising approaches is to learn disentangled representations.
Learning disentangled representations aims to separate out, or disentangle, the underlying explanatory generative factors into disjoint subsets. Importantly, disentangled representations can be learnt efficiently from raw training data with limited annotations. Although it is evident
that learning disentangled representations is well suited to these challenges, several
open problems remain in this area. First, no work has systematically studied how much disentanglement is achieved with different learning and design biases, and how different biases affect task performance for medical data. Second, the benefit of leveraging disentanglement to design models that generalise well to new data has not been well studied, especially in the medical domain. Finally, the independence prior for disentanglement is too strong an assumption that does not approximate the true generative factors well. Motivated by these problems, this thesis focuses on understanding the role of disentanglement in medical image analysis, measuring how different biases affect disentanglement and task performance, and finally using disentangled representations to improve generalisation performance and exploring better representations beyond disentanglement.
In the medical domain, content-style disentanglement is one of the most effective frameworks
for learning disentangled representations. It disentangles and encodes image 'content' into a spatial
tensor, and image appearance or 'style' into a vector that contains information on imaging characteristics. Based on an extensive review of disentanglement, I conclude that it is unclear how different design and learning biases affect the performance of content-style disentanglement methods. Hence, two metrics are proposed to measure the degree of content-style disentanglement by evaluating the informativeness and correlation of representations. By modifying the design and learning biases in three popular content-style disentanglement models, the degree of disentanglement and task performance of different model variants are evaluated. A key conclusion is that there exists a sweet spot between task performance and the degree of disentanglement; achieving this sweet spot is the key to designing disentanglement models.
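As an illustration of the correlation side of such metrics, the toy numpy probe below scores how strongly a content representation and a style representation co-vary. It is a simplified stand-in of my own devising, not the thesis's proposed metrics:

```python
import numpy as np

def representation_correlation(content_codes, style_codes):
    """Crude entanglement probe (illustrative only): mean absolute Pearson
    correlation between every content dimension and every style dimension.
    Lower values suggest the two representations share less information."""
    c = content_codes - content_codes.mean(axis=0)
    s = style_codes - style_codes.mean(axis=0)
    c /= c.std(axis=0) + 1e-8
    s /= s.std(axis=0) + 1e-8
    corr = (c.T @ s) / len(c)        # (C_dims, S_dims) correlation matrix
    return np.abs(corr).mean()

rng = np.random.default_rng(0)
# independent content/style codes should score near zero...
independent = representation_correlation(rng.normal(size=(500, 8)),
                                         rng.normal(size=(500, 4)))
# ...while codes that duplicate the same factors score much higher
shared = rng.normal(size=(500, 4))
entangled = representation_correlation(np.hstack([shared, shared]), shared)
print(independent < entangled)
```

The thesis's actual metrics also assess informativeness (how predictive each representation is of its intended factor), which this sketch does not attempt.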
Generalising deep models to new data from new centres (termed here domains) remains a challenge. This is largely attributed to shifts in data statistics (domain shifts) between the source and unseen domains. Building on the findings of the aforementioned disentanglement metrics study, I design two content-style disentanglement approaches for generalisation. First, I propose two data augmentation methods that improve generalisation. The Resolution Augmentation method generates more diverse data by rescaling images to different resolutions. The Factor-based Augmentation method generates more diverse data by projecting the original samples onto disentangled latent spaces and combining the learned content and style factors from different domains. Second, to learn more generalisable representations, I integrate gradient-based meta-learning into disentanglement. Gradient-based meta-learning splits the training data into meta-train and meta-test sets to simulate and handle domain shifts during training, which has shown superior generalisation performance. Considering the limited annotations of data, I propose a novel semi-supervised meta-learning framework with disentanglement. I explicitly model the representations related to domain shifts. Disentangling the representations and combining them to reconstruct the input image allows unlabeled data to be used to better approximate the true domain shifts within a meta-learning setting.
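The gradient-based meta-learning idea can be sketched on a toy linear model. This is a first-order, MLDG-style update under illustrative assumptions (toy domains, a quadratic loss), not the thesis's actual framework: each step adapts on the meta-train domains, then adds the meta-test gradient taken at the adapted parameters, so every update also improves behaviour after a simulated domain shift:

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of mean squared error for a linear model."""
    return 2 * X.T @ (X @ w - y) / len(y)

def mldg_step(w, train_data, test_data, inner_lr=0.05, outer_lr=0.05, beta=1.0):
    g_train = sum(mse_grad(w, X, y) for X, y in train_data)
    w_adapted = w - inner_lr * g_train               # simulated adaptation
    g_test = sum(mse_grad(w_adapted, X, y) for X, y in test_data)
    return w - outer_lr * (g_train + beta * g_test)  # combined meta-update

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])

def make_domain(shift):                              # toy "centre" with a shift
    X = rng.normal(size=(64, 2)) + shift
    return X, X @ w_true + 0.01 * rng.normal(size=64)

domains = [make_domain(s) for s in (0.0, 0.5, 1.0)]
w = np.zeros(2)
for _ in range(200):                                 # meta-train on 2 domains,
    w = mldg_step(w, domains[:2], domains[2:])       # meta-test on the third
print(np.allclose(w, w_true, atol=0.05))
```

In practice the meta-train/meta-test partition is re-drawn every iteration and the model is a deep network; the fixed split here just keeps the sketch short.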
Humans can quickly learn to accurately recognise anatomy of interest from medical images
with limited guidance. Such recognition ability easily generalises to new images from different clinical centres and to new tasks in other contexts. This rapid and generalisable learning
ability is largely due to the compositional structure of image patterns in the human brain, which is rarely incorporated in models for the medical domain. In this thesis, I explore how compositionality can be applied to learning more interpretable and generalisable representations. Overall, I propose that the ground-truth generative factors that generate the medical images satisfy the compositional equivariance property. Hence, a good representation that approximates the ground-truth factors well has to be compositionally equivariant. By modelling the compositional representations with learnable von Mises-Fisher (vMF) kernels, I explore how different design and learning biases can be used to enforce the representations to be more compositionally equivariant under different learning settings.
Overall, this thesis creates new avenues for further research in the area of generalisable representation learning in medical image analysis, which we believe are key to more generalised machine learning and deep learning solutions in healthcare. In particular, the proposed metrics can be used to guide future work on designing better content-style frameworks. The disentanglement-based meta-learning approach sheds light on leveraging meta-learning for better model generalisation in a low-data regime. Finally, we believe compositional representation learning will play an increasingly important role in designing more generalisable and interpretable models in the future.
Unsupervised Controllable Generation with Self-Training
Recent generative adversarial networks (GANs) are able to generate impressive
photo-realistic images. However, controllable generation with GANs remains a
challenging research problem. Achieving controllable generation requires
semantically interpretable and disentangled factors of variation. It is
challenging to achieve this goal using simple fixed distributions such as
the Gaussian distribution. Instead, we propose an unsupervised framework to learn a
distribution of latent codes that control the generator through self-training.
Self-training provides iterative feedback in the GAN training, from the
discriminator to the generator, and progressively improves the proposal of the
latent codes as training proceeds. The latent codes are sampled from a latent
variable model that is learned in the feature space of the discriminator. We
consider a normalized independent component analysis model and learn its
parameters through tensor factorization of the higher-order moments. Our
framework exhibits better disentanglement compared to other variants such as
the variational autoencoder, and is able to discover semantically meaningful
latent codes without any supervision. We demonstrate empirically on both cars
and faces datasets that each group of elements in the learned code controls a
mode of variation with a semantic meaning, e.g. pose or background change. We
also demonstrate with quantitative metrics that our method generates better
results compared to other approaches.
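The self-training feedback can be caricatured as iteratively refitting the latent proposal distribution on the codes the critic scores most highly. The toy loop below is a cross-entropy-method-style illustration of that feedback idea only; the paper's actual proposal fits a normalized ICA model in the discriminator's feature space via tensor factorization, and the stand-in "discriminator" here is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
target = np.array([2.0, -1.0])             # the "good" latent region

def discriminator_score(z):
    """Toy stand-in critic: higher score for codes nearer the good region."""
    return -np.linalg.norm(z - target, axis=1)

mean, cov = np.zeros(2), np.eye(2)         # initial latent proposal
for _ in range(30):
    z = rng.multivariate_normal(mean, cov, size=256)
    # keep the 64 best-scored codes and refit the proposal on them
    keep = z[np.argsort(discriminator_score(z))[-64:]]
    mean, cov = keep.mean(axis=0), np.cov(keep.T) + 1e-4 * np.eye(2)

# the proposal has concentrated on the region the critic prefers
print(np.allclose(mean, target, atol=0.2))
```

The structural point matches the abstract: the discriminator's judgement feeds back into how latent codes are proposed, and the proposal improves as training proceeds.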
S2-Flow: Joint Semantic and Style Editing of Facial Images
The high-quality images yielded by generative adversarial networks (GANs)
have motivated investigations into their application for image editing.
However, GANs are often limited in the control they provide for performing
specific edits. One of the principal challenges is the entangled latent space
of GANs, which is not directly suitable for performing independent and detailed
edits. Recent editing methods allow for either controlled style edits or
controlled semantic edits. In addition, methods that use semantic masks to edit
images have difficulty preserving the identity and are unable to perform
controlled style edits. We propose a method to disentangle a GAN's
latent space into semantic and style spaces, enabling controlled semantic and
style edits for face images independently within the same framework. To achieve
this, we design an encoder-decoder based network architecture (S2-Flow),
which incorporates two proposed inductive biases. We show the suitability of
S2-Flow quantitatively and qualitatively by performing various semantic and
style edits.
Comment: Accepted to BMVC 202
Compositionally Equivariant Representation Learning
Deep learning models often need sufficient supervision (i.e. labelled data)
in order to be trained effectively. By contrast, humans can swiftly learn to
identify important anatomy in medical images like MRI and CT scans, with
minimal guidance. This recognition capability easily generalises to new images
from different medical facilities and to new tasks in different settings. This
rapid and generalisable learning ability is largely due to the compositional
structure of image patterns in the human brain, which are not well represented
in current medical models. In this paper, we study the utilisation of
compositionality in learning more interpretable and generalisable
representations for medical image segmentation. Overall, we propose that the
underlying generative factors that are used to generate the medical images
satisfy the compositional equivariance property, where each factor is compositional
(e.g. corresponds to the structures in human anatomy) and also equivariant to
the task. Hence, a good representation that approximates well the ground truth
factor has to be compositionally equivariant. By modelling the compositional
representations with learnable von Mises-Fisher (vMF) kernels, we explore how
different design and learning biases can be used to enforce the representations
to be more compositionally equivariant under un-, weakly-, and semi-supervised
settings. Extensive results show that our methods achieve the best performance
over several strong baselines on the task of semi-supervised domain-generalised
medical image segmentation. Code will be made publicly available upon
acceptance at https://github.com/vios-s.
Comment: Submitted. 10 pages. arXiv admin note: text overlap with arXiv:2206.1453