289 research outputs found

    Disentangling Factors of Variation by Mixing Them

    Full text link
    We propose an approach to learn image representations that consist of disentangled factors of variation without exploiting any manual labeling or data domain knowledge. A factor of variation corresponds to an image attribute that can be discerned consistently across a set of images, such as the pose or color of objects. Our disentangled representation consists of a concatenation of feature chunks, each chunk representing a factor of variation. It supports applications such as transferring attributes from one image to another, by simply mixing and unmixing feature chunks, and classification or retrieval based on one or several attributes, by considering a user-specified subset of feature chunks. We learn our representation without any labeling or knowledge of the data domain, using an autoencoder architecture with two novel training objectives: first, we propose an invariance objective to encourage that encoding of each attribute, and decoding of each chunk, are invariant to changes in other attributes and chunks, respectively; second, we include a classification objective, which ensures that each chunk corresponds to a consistently discernible attribute in the represented image, hence avoiding degenerate feature mappings where some chunks are completely ignored. We demonstrate the effectiveness of our approach on the MNIST, Sprites, and CelebA datasets.Comment: CVPR 201

    Geometry-Aware Latent Representation Learning for Modeling Disease Progression of Barrett's Esophagus

    Full text link
    Barrett's Esophagus (BE) is the only precursor known to Esophageal Adenocarcinoma (EAC), a type of esophageal cancer with poor prognosis upon diagnosis. Therefore, diagnosing BE is crucial in preventing and treating esophageal cancer. While supervised machine learning supports BE diagnosis, high interobserver variability in histopathological training data limits these methods. Unsupervised representation learning via Variational Autoencoders (VAEs) shows promise, as they map input data to a lower-dimensional manifold with only useful features, characterizing BE progression for improved downstream tasks and insights. However, the VAE's Euclidean latent space distorts point relationships, hindering disease progression modeling. Geometric VAEs provide additional geometric structure to the latent space, with RHVAE assuming a Riemannian manifold and S\mathcal{S}-VAE a hyperspherical manifold. Our study shows that S\mathcal{S}-VAE outperforms vanilla VAE with better reconstruction losses, representation classification accuracies, and higher-quality generated images and interpolations in lower-dimensional settings. By disentangling rotation information from the latent space, we improve results further using a group-based architecture. Additionally, we take initial steps towards S\mathcal{S}-AE, a novel autoencoder model generating qualitative images without a variational framework, but retaining benefits of autoencoders such as stability and reconstruction quality

    Neural Fourier Transform: A General Approach to Equivariant Representation Learning

    Full text link
    Symmetry learning has proven to be an effective approach for extracting the hidden structure of data, with the concept of equivariance relation playing the central role. However, most of the current studies are built on architectural theory and corresponding assumptions on the form of data. We propose Neural Fourier Transform (NFT), a general framework of learning the latent linear action of the group without assuming explicit knowledge of how the group acts on data. We present the theoretical foundations of NFT and show that the existence of a linear equivariant feature, which has been assumed ubiquitously in equivariance learning, is equivalent to the existence of a group invariant kernel on the dataspace. We also provide experimental results to demonstrate the application of NFT in typical scenarios with varying levels of knowledge about the acting group

    Learning Geometric Representations of Objects via Interaction

    Full text link
    We address the problem of learning representations from observations of a scene involving an agent and an external object the agent interacts with. To this end, we propose a representation learning framework extracting the location in physical space of both the agent and the object from unstructured observations of arbitrary nature. Our framework relies on the actions performed by the agent as the only source of supervision, while assuming that the object is displaced by the agent via unknown dynamics. We provide a theoretical foundation and formally prove that an ideal learner is guaranteed to infer an isometric representation, disentangling the agent from the object and correctly extracting their locations. We evaluate empirically our framework on a variety of scenarios, showing that it outperforms vision-based approaches such as a state-of-the-art keypoint extractor. We moreover demonstrate how the extracted representations enable the agent to solve downstream tasks via reinforcement learning in an efficient manner

    Boosting Deep Neural Networks with Geometrical Prior Knowledge: A Survey

    Full text link
    While Deep Neural Networks (DNNs) achieve state-of-the-art results in many different problem settings, they are affected by some crucial weaknesses. On the one hand, DNNs depend on exploiting a vast amount of training data, whose labeling process is time-consuming and expensive. On the other hand, DNNs are often treated as black box systems, which complicates their evaluation and validation. Both problems can be mitigated by incorporating prior knowledge into the DNN. One promising field, inspired by the success of convolutional neural networks (CNNs) in computer vision tasks, is to incorporate knowledge about symmetric geometrical transformations of the problem to solve. This promises an increased data-efficiency and filter responses that are interpretable more easily. In this survey, we try to give a concise overview about different approaches to incorporate geometrical prior knowledge into DNNs. Additionally, we try to connect those methods to the field of 3D object detection for autonomous driving, where we expect promising results applying those methods.Comment: Survey Pape
    corecore