Challenges in Disentangling Independent Factors of Variation
We study the problem of building models that disentangle independent factors
of variation. Such models could be used to encode features that can efficiently
be used for classification and to transfer attributes between different images
in image synthesis. As data we use a weakly labeled training set. Our weak
labels indicate what single factor has changed between two data samples,
although the relative value of the change is unknown. This labeling is of
particular interest as it may be readily available without annotation costs. To
make use of weak labels we introduce an autoencoder model and train it through
constraints on image pairs and triplets. We formally prove that without
additional knowledge there is no guarantee that two images with the same factor
of variation will be mapped to the same feature. We call this issue the
reference ambiguity. Moreover, we show the role of the feature dimensionality
and adversarial training. We demonstrate experimentally that the proposed model
can successfully transfer attributes on several datasets, but also show cases
where the reference ambiguity occurs.
Comment: Submitted to ICLR 201
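The reference ambiguity described in this abstract can be illustrated with a toy example. The sketch below is not the paper's model or proof: the two discrete factors `a` and `b`, the constant `K`, and the functions `encode`, `decode`, and `transfer_b` are all hypothetical names chosen for illustration. It shows an encoder/decoder pair that reconstructs perfectly and transfers correctly on pairs where only `b` changed, yet maps images with the same value of `b` to different features.

```python
from itertools import product

K = 3  # number of values factor b can take (hypothetical toy setting)

def encode(image):
    """Toy encoder: the b-feature is shifted by a, so its meaning depends on a."""
    a, b = image
    return a, (b + a) % K

def decode(feat_a, feat_b):
    """Toy decoder that undoes the a-dependent shift."""
    return feat_a, (feat_b - feat_a) % K

def transfer_b(src, dst):
    """Decode dst's a-feature together with src's b-feature."""
    feat_a_dst, _ = encode(dst)
    _, feat_b_src = encode(src)
    return decode(feat_a_dst, feat_b_src)

images = list(product(range(2), range(K)))  # all (a, b) combinations

# Every image is reconstructed exactly.
reconstructs = all(decode(*encode(x)) == x for x in images)

# On pairs where only b changed (a is shared), transfer gives the right image.
pairs_ok = all(
    transfer_b((a, b2), (a, b1)) == (a, b2)
    for a in range(2) for b1 in range(K) for b2 in range(K)
)

# Yet two images with the same b but different a get different b-features.
same_b_features = {encode((a, 1))[1] for a in range(2)}

# Transfer across different a therefore yields the wrong b.
cross = transfer_b((1, 1), (0, 0))

print(reconstructs, pairs_ok, same_b_features, cross)
```

In this toy setting, constraints checked only on pairs that share `a` cannot distinguish this encoder from one that encodes `b` consistently, which is the kind of failure the abstract names.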
Disentangling Factors of Variation by Mixing Them
We propose an approach to learn image representations that consist of
disentangled factors of variation without exploiting any manual labeling or
data domain knowledge. A factor of variation corresponds to an image attribute
that can be discerned consistently across a set of images, such as the pose or
color of objects. Our disentangled representation consists of a concatenation
of feature chunks, each chunk representing a factor of variation. It supports
applications such as transferring attributes from one image to another, by
simply mixing and unmixing feature chunks, and classification or retrieval
based on one or several attributes, by considering a user-specified subset of
feature chunks. We learn our representation without any labeling or knowledge
of the data domain, using an autoencoder architecture with two novel training
objectives: first, we propose an invariance objective that encourages the
encoding of each attribute and the decoding of each chunk to be invariant to
changes in other attributes and chunks, respectively; second, we include a
classification objective, which ensures that each chunk corresponds to a
consistently discernible attribute in the represented image, hence avoiding
degenerate feature mappings where some chunks are completely ignored. We
demonstrate the effectiveness of our approach on the MNIST, Sprites, and CelebA
datasets.
Comment: CVPR 201
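The idea of transferring attributes by mixing feature chunks can be sketched in a few lines. This is only an illustration under stated assumptions: it treats a feature as a flat list of numbers split into equal-sized chunks, and the helper names `split_chunks` and `mix_chunks` are hypothetical. The encoder, decoder, and training objectives from the abstract are not shown.

```python
from typing import List, Sequence, Set

def split_chunks(feature: Sequence[float], chunk_size: int) -> List[List[float]]:
    """Split a flat feature vector into consecutive equal-sized chunks."""
    if len(feature) % chunk_size != 0:
        raise ValueError("feature length must be a multiple of chunk_size")
    return [list(feature[i:i + chunk_size])
            for i in range(0, len(feature), chunk_size)]

def mix_chunks(f_a: Sequence[float], f_b: Sequence[float],
               chunk_size: int, take_from_b: Set[int]) -> List[float]:
    """Return a copy of f_a whose chunks indexed in take_from_b come from f_b."""
    chunks_a = split_chunks(f_a, chunk_size)
    chunks_b = split_chunks(f_b, chunk_size)
    if len(chunks_a) != len(chunks_b):
        raise ValueError("features must contain the same number of chunks")
    mixed = [chunks_b[i] if i in take_from_b else chunks_a[i]
             for i in range(len(chunks_a))]
    # Concatenate the chunks back into one flat feature vector.
    return [value for chunk in mixed for value in chunk]

# Two made-up features with three chunks of size 2 each.
f_a = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
f_b = [7.0, 7.0, 8.0, 8.0, 9.0, 9.0]

# Take chunk 1 from f_b and keep the rest of f_a.
mixed = mix_chunks(f_a, f_b, chunk_size=2, take_from_b={1})
print(mixed)  # [1.0, 1.0, 8.0, 8.0, 3.0, 3.0]
```

In a full pipeline, the mixed feature would be passed to a decoder to synthesize an image that carries the attribute of the swapped chunk. Selecting a subset of chunks in the same way gives the attribute-specific feature used for classification or retrieval.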
Disentangling Factors of Variation with Cycle-Consistent Variational Auto-Encoders
Generative models that learn disentangled representations for different
factors of variation in an image can be very useful for targeted data
augmentation. By sampling from the disentangled latent subspace of interest, we
can efficiently generate new data necessary for a particular task. Learning
disentangled representations is a challenging problem, especially when certain
factors of variation are difficult to label. In this paper, we introduce a
novel architecture that disentangles the latent space into two complementary
subspaces by using only weak supervision in the form of pairwise similarity labels.
Inspired by the recent success of cycle-consistent adversarial architectures,
we use cycle-consistency in a variational auto-encoder framework. Our
non-adversarial approach is in contrast with the recent works that combine
adversarial training with auto-encoders to disentangle representations. We show
compelling results of disentangled latent subspaces on three datasets and
compare with recent works that leverage adversarial training.
Learning Disentangled Representations with Reference-Based Variational Autoencoders
Learning disentangled representations from visual data, where different
high-level generative factors are independently encoded, is of importance for
many computer vision tasks. Solving this problem, however, typically requires
explicitly labeling all the factors of interest in training images. To
alleviate the annotation cost, we introduce a learning setting which we refer
to as "reference-based disentangling". Given a pool of unlabeled images, the
goal is to learn a representation where a set of target factors are
disentangled from others. The only supervision comes from an auxiliary
"reference set" containing images where the factors of interest are constant.
In order to address this problem, we propose reference-based variational
autoencoders, a novel deep generative model designed to exploit the
weak supervision provided by the reference set. By addressing tasks such as
feature learning, conditional image generation or attribute transfer, we
validate the ability of the proposed model to learn disentangled
representations from this minimal form of supervision.
Factorised spatial representation learning: application in semi-supervised myocardial segmentation
The success and generalisation of deep learning algorithms heavily depend on
learning good feature representations. In medical imaging this entails
representing anatomical information, as well as properties related to the
specific imaging setting. Anatomical information is required to perform further
analysis, whereas imaging information is key to disentangle scanner variability
and potential artefacts. The ability to factorise these would allow for
training algorithms only on the relevant information according to the task. To
date, such factorisation has not been attempted. In this paper, we propose a
methodology of latent space factorisation relying on the cycle-consistency
principle. As an example application, we consider cardiac MR segmentation,
where we separate information related to the myocardium from other features
related to imaging and surrounding substructures. We demonstrate the proposed
method's utility in a semi-supervised setting: we use very few labelled images
together with many unlabelled images to train a myocardium segmentation neural
network. Specifically, we achieve comparable performance to fully supervised
networks using a fraction of labelled images in experiments on ACDC and a
dataset from Edinburgh Imaging Facility QMRI. Code will be made available at
https://github.com/agis85/spatial_factorisation.
Comment: Accepted in MICCAI 201