Disentangling Factors of Variation by Mixing Them
We propose an approach to learn image representations that consist of
disentangled factors of variation without exploiting any manual labeling or
data domain knowledge. A factor of variation corresponds to an image attribute
that can be discerned consistently across a set of images, such as the pose or
color of objects. Our disentangled representation consists of a concatenation
of feature chunks, each chunk representing a factor of variation. It supports
applications such as transferring attributes from one image to another, by
simply mixing and unmixing feature chunks, and classification or retrieval
based on one or several attributes, by considering a user-specified subset of
feature chunks. We learn our representation without any labeling or knowledge
of the data domain, using an autoencoder architecture with two novel training
objectives: first, we propose an invariance objective to encourage that
encoding of each attribute, and decoding of each chunk, are invariant to
changes in other attributes and chunks, respectively; second, we include a
classification objective, which ensures that each chunk corresponds to a
consistently discernible attribute in the represented image, hence avoiding
degenerate feature mappings where some chunks are completely ignored. We
demonstrate the effectiveness of our approach on the MNIST, Sprites, and CelebA
datasets. Comment: CVPR 201
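The chunk mixing described in this abstract can be illustrated with a minimal sketch. It assumes a feature vector is a flat list split into equally sized chunks; the helper names `split_chunks`, `mix_chunks`, and `chunk_distance` are hypothetical and are not taken from the paper, and the encoder and decoder that would produce and consume these features are omitted.

```python
from typing import Iterable, List, Sequence


def split_chunks(feature: Sequence[float], num_chunks: int) -> List[List[float]]:
    """Split a flat feature vector into equally sized chunks (assumed layout)."""
    if num_chunks <= 0 or len(feature) % num_chunks != 0:
        raise ValueError("feature length must be divisible by num_chunks")
    size = len(feature) // num_chunks
    return [list(feature[i * size:(i + 1) * size]) for i in range(num_chunks)]


def mix_chunks(feat_a: Sequence[float], feat_b: Sequence[float],
               num_chunks: int, take_from_b: Iterable[int]) -> List[float]:
    """Copy feat_a, but take the chunks listed in take_from_b from feat_b."""
    chunks_a = split_chunks(feat_a, num_chunks)
    chunks_b = split_chunks(feat_b, num_chunks)
    wanted = set(take_from_b)
    mixed: List[float] = []
    for idx in range(num_chunks):
        source = chunks_b if idx in wanted else chunks_a
        mixed.extend(source[idx])
    return mixed


def chunk_distance(feat_a: Sequence[float], feat_b: Sequence[float],
                   num_chunks: int, selected: Iterable[int]) -> float:
    """Squared Euclidean distance restricted to a user-specified chunk subset."""
    chunks_a = split_chunks(feat_a, num_chunks)
    chunks_b = split_chunks(feat_b, num_chunks)
    total = 0.0
    for idx in set(selected):
        total += sum((x - y) ** 2 for x, y in zip(chunks_a[idx], chunks_b[idx]))
    return total


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6]
    b = [10, 20, 30, 40, 50, 60]
    # Transfer only the middle chunk (one hypothetical attribute) from b to a.
    print(mix_chunks(a, b, num_chunks=3, take_from_b=[1]))
```

Decoding the mixed vector would, under the abstract's claims, yield the first image with one attribute taken from the second; `chunk_distance` shows how retrieval could compare images on a chosen subset of attributes only.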
Geometry-Aware Latent Representation Learning for Modeling Disease Progression of Barrett's Esophagus
Barrett's Esophagus (BE) is the only precursor known to Esophageal
Adenocarcinoma (EAC), a type of esophageal cancer with poor prognosis upon
diagnosis. Therefore, diagnosing BE is crucial in preventing and treating
esophageal cancer. While supervised machine learning supports BE diagnosis,
high interobserver variability in histopathological training data limits these
methods. Unsupervised representation learning via Variational Autoencoders
(VAEs) shows promise, as they map input data to a lower-dimensional manifold
with only useful features, characterizing BE progression for improved
downstream tasks and insights. However, the VAE's Euclidean latent space
distorts point relationships, hindering disease progression modeling. Geometric
VAEs provide additional geometric structure to the latent space, with RHVAE
assuming a Riemannian manifold and S-VAE a hyperspherical manifold.
Our study shows that S-VAE outperforms vanilla VAE with better
reconstruction losses, representation classification accuracies, and
higher-quality generated images and interpolations in lower-dimensional
settings. By disentangling rotation information from the latent space, we
improve results further using a group-based architecture. Additionally, we take
initial steps towards S-AE, a novel autoencoder model generating
qualitative images without a variational framework, but retaining benefits of
autoencoders such as stability and reconstruction quality.
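To make the point about Euclidean versus hyperspherical latent geometry concrete, here is a small sketch that compares straight-line interpolation with spherical linear interpolation (slerp) between two unit-norm latent codes. This is a generic illustration and is not taken from the paper; the function names are hypothetical and the latent codes are plain Python lists.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Project a nonzero vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def lerp(p: Sequence[float], q: Sequence[float], t: float) -> List[float]:
    """Straight-line (Euclidean) interpolation; generally leaves the sphere."""
    return [(1.0 - t) * a + t * b for a, b in zip(p, q)]


def slerp(p: Sequence[float], q: Sequence[float], t: float) -> List[float]:
    """Interpolate along the great-circle arc between two points on the sphere."""
    p, q = normalize(p), normalize(q)
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q))))
    omega = math.acos(dot)  # angle between the two latent codes
    if omega < 1e-8:
        return list(p)  # the points coincide, nothing to interpolate
    sin_omega = math.sin(omega)
    if sin_omega < 1e-8:
        raise ValueError("antipodal points have no unique geodesic")
    w_p = math.sin((1.0 - t) * omega) / sin_omega
    w_q = math.sin(t * omega) / sin_omega
    return [w_p * a + w_q * b for a, b in zip(p, q)]


if __name__ == "__main__":
    z0, z1 = [1.0, 0.0], [0.0, 1.0]
    mid_lin = lerp(z0, z1, 0.5)
    mid_sph = slerp(z0, z1, 0.5)
    print(math.hypot(*mid_lin))  # shorter than 1: the midpoint left the sphere
    print(math.hypot(*mid_sph))  # stays on the unit sphere up to rounding
```

The linear midpoint falls inside the sphere, away from where a hyperspherical model places its codes, whereas the slerp midpoint remains on the manifold. This is one simple way to see why interpolation quality can depend on respecting the latent geometry.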
Neural Fourier Transform: A General Approach to Equivariant Representation Learning
Symmetry learning has proven to be an effective approach for extracting the
hidden structure of data, with the concept of equivariance relation playing the
central role. However, most of the current studies are built on architectural
theory and corresponding assumptions on the form of data. We propose Neural
Fourier Transform (NFT), a general framework of learning the latent linear
action of the group without assuming explicit knowledge of how the group acts
on data. We present the theoretical foundations of NFT and show that the
existence of a linear equivariant feature, which has been assumed ubiquitously
in equivariance learning, is equivalent to the existence of a group invariant
kernel on the dataspace. We also provide experimental results to demonstrate
the application of NFT in typical scenarios with varying levels of knowledge
about the acting group.
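The equivariance relation mentioned in this abstract can be checked numerically in the classical case that gives the framework its name. The sketch below uses the ordinary discrete Fourier transform and the group of cyclic shifts: shifting a signal corresponds to multiplying each Fourier coefficient by a phase, which is a linear (diagonal) action in feature space. It illustrates only this textbook fact, not the learned transform proposed in the paper, and all function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive discrete Fourier transform: X_k = sum_j x_j * exp(-2*pi*i*k*j/n)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]


def cyclic_shift(x: Sequence[complex], s: int) -> List[complex]:
    """Group action on data: y_j = x_{(j - s) mod n}."""
    n = len(x)
    return [x[(j - s) % n] for j in range(n)]


def shift_in_frequency(coeffs: Sequence[complex], s: int) -> List[complex]:
    """Linear action on features: multiply coefficient k by exp(-2*pi*i*k*s/n)."""
    n = len(coeffs)
    return [cmath.exp(-2j * cmath.pi * k * s / n) * c for k, c in enumerate(coeffs)]


def max_equivariance_error(x: Sequence[complex], s: int) -> float:
    """Largest gap between dft(shift(x)) and the phase-multiplied dft(x)."""
    lhs = dft(cyclic_shift(x, s))
    rhs = shift_in_frequency(dft(x), s)
    return max(abs(a - b) for a, b in zip(lhs, rhs))


if __name__ == "__main__":
    signal = [1.0, 2.0, 0.5, -1.0, 3.0]
    # Should be zero up to floating-point rounding.
    print(max_equivariance_error(signal, s=2))
```

Here the feature map and the group action are both known in closed form; the setting described in the abstract is the harder one in which neither is assumed in advance.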
Learning Geometric Representations of Objects via Interaction
We address the problem of learning representations from observations of a
scene involving an agent and an external object the agent interacts with. To
this end, we propose a representation learning framework extracting the
location in physical space of both the agent and the object from unstructured
observations of arbitrary nature. Our framework relies on the actions performed
by the agent as the only source of supervision, while assuming that the object
is displaced by the agent via unknown dynamics. We provide a theoretical
foundation and formally prove that an ideal learner is guaranteed to infer an
isometric representation, disentangling the agent from the object and correctly
extracting their locations. We empirically evaluate our framework on a variety
of scenarios, showing that it outperforms vision-based approaches such as a
state-of-the-art keypoint extractor. We moreover demonstrate how the extracted
representations enable the agent to solve downstream tasks via reinforcement
learning in an efficient manner.
Boosting Deep Neural Networks with Geometrical Prior Knowledge: A Survey
While Deep Neural Networks (DNNs) achieve state-of-the-art results in many
different problem settings, they are affected by some crucial weaknesses. On
the one hand, DNNs depend on exploiting a vast amount of training data, whose
labeling process is time-consuming and expensive. On the other hand, DNNs are
often treated as black box systems, which complicates their evaluation and
validation. Both problems can be mitigated by incorporating prior knowledge
into the DNN.
One promising field, inspired by the success of convolutional neural networks
(CNNs) in computer vision tasks, is to incorporate knowledge about symmetric
geometrical transformations of the problem at hand. This promises increased
data efficiency and filter responses that are easier to interpret. In
this survey, we give a concise overview of different approaches for
incorporating geometrical prior knowledge into DNNs. Additionally, we try to
connect those methods to the field of 3D object detection for autonomous
driving, where we expect promising results from applying those methods. Comment: Survey Paper