
    Structured Disentangling Networks for Learning Deformation Invariant Latent Spaces

    abstract: Disentangling latent spaces is an important research direction for the interpretability of unsupervised machine learning. Several recent works using deep learning are effective at producing disentangled representations. However, in the unsupervised setting, there is no way to pre-specify which part of the latent space captures specific factors of variation. While this is generally a hard problem, because no analytical expressions exist to capture these variations, certain factors, such as geometric transforms, can be expressed analytically. Furthermore, in existing frameworks, the disentangled values are not interpretable. The focus of this work is to disentangle these geometric factors of variation (which turn out to be nuisance factors for many applications) from the semantic content of the signal in an interpretable manner, which in turn makes the features more discriminative. Experiments are designed to show the modularity of the approach with respect to other disentangling strategies, as well as on multiple one-dimensional (1D) and two-dimensional (2D) datasets, clearly indicating the efficacy of the proposed approach. Dissertation/Thesis: Masters Thesis, Electrical Engineering
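
    The thesis ships no code here, so the sketch below is only an illustration of the core idea under assumed names and shapes: a geometric nuisance factor (a horizontal shift) is predicted by a small network, expressed analytically as an affine warp, and undone before a content autoencoder sees the image, so the predicted shift stays interpretable in pixels while the content code becomes shift-invariant.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TransformEncoder(nn.Module):
        """Predicts an interpretable geometric factor: a horizontal shift."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64),
                                     nn.ReLU(), nn.Linear(64, 1), nn.Tanh())

        def forward(self, x):
            return self.net(x)  # shift in [-1, 1], in fractions of image width

    def unshift(x, shift):
        """Undo the predicted translation with a differentiable affine warp."""
        n = x.size(0)
        theta = torch.zeros(n, 2, 3, device=x.device)
        theta[:, 0, 0] = theta[:, 1, 1] = 1.0
        theta[:, 0, 2] = shift.squeeze(1)  # analytic form: pure x-translation
        grid = F.affine_grid(theta, list(x.size()), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

    # The content autoencoder only ever sees the canonicalized image, so its
    # latent code is free of the nuisance translation.
    content = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 32), nn.ReLU(),
                            nn.Linear(32, 28 * 28), nn.Unflatten(1, (1, 28, 28)))
    tenc = TransformEncoder()

    x = torch.randn(8, 1, 28, 28)   # batch of shifted inputs
    shift = tenc(x)                 # interpretable: read the shift off directly
    canonical = unshift(x, shift)
    loss = F.mse_loss(content(canonical), canonical)  # reconstruct canonical frame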

    Learning disentangled representations of satellite image time series in a weakly supervised manner

    This work focuses on learning representations of satellite image time series via unsupervised learning. The main goal is to force the representation to capture the information from the time series that is most relevant for other satellite imagery applications. However, extracting information from satellite data involves many challenges: models need to deal with the massive volumes of images provided by Earth observation satellites, and it is impossible for human operators to manually label such a volume of images for each individual task (e.g. classification, segmentation, change detection, etc.). Therefore, the supervised learning framework, which achieves state-of-the-art results in many tasks, cannot be applied. To address this problem, unsupervised learning algorithms have been proposed to learn the structure of the data instead of performing a specific task. Unsupervised learning is a powerful approach since no labels are required during training, and the knowledge acquired can be transferred to other tasks, enabling faster learning with fewer labels.
    In this work, we investigate the problem of learning disentangled representations of satellite image time series, where a shared representation captures the spatial information common to the images of the time series and an exclusive representation captures the temporal information specific to each image. We present the benefits of disentangling the spatio-temporal information of time series: for example, the spatial information is useful for time-invariant image classification or segmentation, while the temporal information is useful for change detection. To accomplish this, we analyze some of the most prevalent unsupervised learning models, such as the variational autoencoder (VAE) and generative adversarial networks (GANs), as well as extensions of these models for representation disentanglement. Encouraged by the successful results achieved by generative and reconstructive models, we propose a novel framework to learn spatio-temporal representations of satellite data. We show that the learned disentangled representations can be used to perform several computer vision tasks, such as classification, segmentation, information retrieval, and change detection, outperforming other state-of-the-art models. Nevertheless, our experiments suggest that generative and reconstructive models present drawbacks related to the dimensionality of the representation, the complexity of the architecture, and the lack of disentanglement guarantees. To overcome these limitations, we explore a recent method based on mutual information estimation and maximization for representation learning, without relying on image reconstruction or generation. We propose a new model that extends the mutual information maximization principle to split the representation domain into two parts. In addition to the experiments performed on satellite data, we show that our model can handle different kinds of datasets, outperforming state-of-the-art methods based on GANs and VAEs, while being less computationally demanding yet more effective.
    Finally, we show that our model is useful for creating a representation that captures only the class information shared by two images belonging to the same category. Disentangling the class or category of an image from other factors of variation provides a powerful tool for computing pixel similarity and performing image segmentation in a weakly supervised manner.
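
    As a rough illustration of the shared/exclusive split described above (not the thesis's published model; every module name and size here is an assumption), an encoder can emit two chunks and train the shared one with an InfoNCE-style mutual-information objective between two images of the same series, with no reconstruction or generation involved:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SplitEncoder(nn.Module):
        """Splits the representation into a shared and an exclusive part."""
        def __init__(self, dim=64):
            super().__init__()
            self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 256),
                                          nn.ReLU(), nn.Linear(256, 2 * dim))
            self.dim = dim

        def forward(self, x):
            h = self.backbone(x)
            return h[:, :self.dim], h[:, self.dim:]  # (shared, exclusive)

    def info_nce(a, b, temperature=0.1):
        """Maximize MI between shared codes of two images of the same series."""
        a, b = F.normalize(a, dim=1), F.normalize(b, dim=1)
        logits = a @ b.t() / temperature  # (N, N) similarity matrix
        targets = torch.arange(a.size(0), device=a.device)
        return F.cross_entropy(logits, targets)  # positives on the diagonal

    enc = SplitEncoder()
    x_t0 = torch.randn(16, 1, 64, 64)  # image at time t0
    x_t1 = torch.randn(16, 1, 64, 64)  # same scene at time t1
    s0, e0 = enc(x_t0)
    s1, e1 = enc(x_t1)
    loss = info_nce(s0, s1)  # shared code must agree across time; the
                             # exclusive code is left free to absorb change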

    Physics and Chemistry from Parsimonious Representations: Image Analysis via Invariant Variational Autoencoders

    Electron, optical, and scanning probe microscopy methods are generating ever-increasing volumes of image data containing information on atomic and mesoscale structures and functionalities. This necessitates the development of machine learning methods for discovering physical and chemical phenomena in the data, such as manifestations of symmetry breaking in electron and scanning tunneling microscopy images, or variability among nanoparticles. Variational autoencoders (VAEs) are emerging as a powerful paradigm for unsupervised data analysis, making it possible to disentangle the factors of variability and discover optimal parsimonious representations. Here, we summarize recent developments in VAEs, covering their basic principles and the intuition behind them. Invariant VAEs are introduced as an approach that accommodates the scale and translation invariances present in imaging data and separates known factors of variation from those still to be discovered. We further describe the opportunities enabled by control over the VAE architecture, including conditional, semi-supervised, and joint VAEs. Several case studies of VAE applications to toy models and experimental data sets in scanning transmission electron microscopy are discussed, emphasizing the deep connection between VAEs and basic physical principles. All the code used here is available at https://github.com/saimani5/VAE-tutorials, and this article can serve as a guide for applying these methods to one's own data sets. Comment: 55 pages, 16 figures
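
    A minimal sketch of the invariant-VAE ingredient described above (the linked tutorials contain the authors' actual implementations; the names and sizes below are illustrative assumptions): a few latent dimensions are reserved for a known geometric factor, here a 2D shift, and applied in closed form to a coordinate-based decoder's grid, so the remaining latents are left to describe content only.

    import torch
    import torch.nn as nn

    class CoordDecoder(nn.Module):
        """Decodes pixel intensities from (shifted) coordinates plus content z."""
        def __init__(self, zdim=8):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(2 + zdim, 128), nn.Tanh(),
                                     nn.Linear(128, 1))

        def forward(self, coords, z):
            n, p, _ = coords.shape  # (batch, pixels, 2)
            zz = z.unsqueeze(1).expand(n, p, z.size(1))
            return self.net(torch.cat([coords, zz], dim=-1)).squeeze(-1)

    side = 28
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, side),
                            torch.linspace(-1, 1, side), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1).reshape(1, -1, 2)  # canonical grid

    dec = CoordDecoder()
    z_content = torch.randn(4, 8)      # factors left to be discovered
    z_shift = 0.1 * torch.randn(4, 2)  # known factor: shift, applied analytically
    img = dec(grid + z_shift.unsqueeze(1), z_content).reshape(4, side, side)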

    Reconstructing continuous distributions of 3D protein structure from cryo-EM images

    Cryo-electron microscopy (cryo-EM) is a powerful technique for determining the structure of proteins and other macromolecular complexes at near-atomic resolution. In single particle cryo-EM, the central problem is to reconstruct the three-dimensional structure of a macromolecule from 10^4 to 10^7 noisy and randomly oriented two-dimensional projections. However, the imaged protein complexes may exhibit structural variability, which complicates reconstruction and is typically addressed using discrete clustering approaches that fail to capture the full range of protein dynamics. Here, we introduce a novel method for cryo-EM reconstruction that extends naturally to modeling continuous generative factors of structural heterogeneity. This method encodes structures in Fourier space using coordinate-based deep neural networks, and trains these networks from unlabeled 2D cryo-EM images by combining exact inference over image orientation with variational inference for structural heterogeneity. We demonstrate that the proposed method, termed cryoDRGN, can perform ab initio reconstruction of 3D protein complexes from simulated and real 2D cryo-EM image data. To our knowledge, cryoDRGN is the first neural network-based approach for cryo-EM reconstruction and the first end-to-end method for directly reconstructing continuous ensembles of protein structures from cryo-EM images.
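
    This is not cryoDRGN's actual code, but a minimal sketch of its central ingredient as described above: a coordinate-based MLP that maps a positionally encoded 3D Fourier-space coordinate plus a per-particle latent z to a volume value, queried on a central slice determined by the particle's pose (projection-slice theorem). All names and sizes are assumptions.

    import torch
    import torch.nn as nn

    def positional_encoding(coords, n_freqs=6):
        """Fourier features: sin/cos of coords at geometrically spaced frequencies."""
        freqs = 2.0 ** torch.arange(n_freqs) * torch.pi
        angles = coords.unsqueeze(-1) * freqs  # (..., 3, n_freqs)
        enc = torch.cat([angles.sin(), angles.cos()], dim=-1)
        return enc.flatten(-2)                 # (..., 3 * 2 * n_freqs)

    class VolumeNet(nn.Module):
        """Maps (encoded Fourier coordinate, heterogeneity latent) to a value."""
        def __init__(self, zdim=8, n_freqs=6):
            super().__init__()
            in_dim = 3 * 2 * n_freqs + zdim
            self.mlp = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                     nn.Linear(256, 256), nn.ReLU(),
                                     nn.Linear(256, 1))

        def forward(self, coords, z):
            feats = positional_encoding(coords)
            z = z.expand(coords.shape[0], -1)
            return self.mlp(torch.cat([feats, z], dim=-1))

    # One particle: a 2D image is a central slice of the volume in Fourier
    # space, so we query a plane of coordinates rotated by the particle's pose R.
    net = VolumeNet()
    R = torch.eye(3)                      # pose, found by exact inference in the paper
    plane = torch.randn(1024, 3) @ R.t()  # slice coordinates
    z = torch.randn(1, 8)                 # heterogeneity latent (variational inference)
    slice_vals = net(plane, z)            # predicted Fourier slice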

    Disentangling Adversarial Robustness and Generalization

    Obtaining deep networks that are robust against adversarial examples and generalize well is an open problem. A recent hypothesis even states that both robust and accurate models are impossible, i.e., that adversarial robustness and generalization are conflicting goals. In an effort to clarify the relationship between robustness and generalization, we assume an underlying, low-dimensional data manifold and show that: (1) regular adversarial examples leave the manifold; (2) adversarial examples constrained to the manifold, i.e., on-manifold adversarial examples, exist; (3) on-manifold adversarial examples are generalization errors, and on-manifold adversarial training boosts generalization; (4) regular robustness and generalization are not necessarily contradicting goals. These findings imply that both robust and accurate models are possible. However, different models (architectures, training strategies, etc.) can exhibit different robustness and generalization characteristics. To confirm our claims, we present extensive experiments on synthetic data (with known manifold) as well as on EMNIST, Fashion-MNIST, and CelebA. Comment: Conference on Computer Vision and Pattern Recognition 2019
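
    As a hedged illustration of the on-manifold construction (not the authors' code; the generator and classifier below are untrained stand-ins), an adversarial example can be kept on the learned manifold by perturbing the latent code of a generator rather than the pixels:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    g = nn.Sequential(nn.Linear(16, 256), nn.ReLU(),
                      nn.Linear(256, 28 * 28), nn.Sigmoid())  # stand-in decoder
    clf = nn.Sequential(nn.Linear(28 * 28, 10))               # stand-in classifier

    z = torch.randn(1, 16)    # latent code of a clean example
    y = torch.tensor([3])     # its true label
    delta = torch.zeros_like(z, requires_grad=True)

    for _ in range(10):       # projected gradient ascent in latent space
        loss = F.cross_entropy(clf(g(z + delta)), y)
        loss.backward()
        with torch.no_grad():
            delta += 0.05 * delta.grad.sign()
            delta.clamp_(-0.3, 0.3)  # keep the latent perturbation small
            delta.grad.zero_()

    x_adv = g(z + delta)  # on-manifold by construction: it is the decoder's
                          # output, not an arbitrary pixel perturbation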