
    Learning disentangled representations of satellite image time series in a weakly supervised manner

    This work focuses on learning data representations of satellite image time series via an unsupervised learning approach. The main goal is to enforce the data representation to capture the relevant information from the time series in order to perform other satellite imagery applications. However, extracting information from satellite data involves many challenges, since models need to deal with the massive volumes of images provided by Earth observation satellites. Additionally, it is impossible for human operators to manually label such an amount of images for each individual task (e.g. classification, segmentation, change detection, etc.). Therefore, we cannot use the supervised learning framework, which achieves state-of-the-art results in many tasks.
To address this problem, unsupervised learning algorithms have been proposed to learn the data structure instead of performing a specific task. Unsupervised learning is a powerful approach since no labels are required during training, and the knowledge acquired can be transferred to other tasks, enabling faster learning with few labels.
In this work, we investigate the problem of learning disentangled representations of satellite image time series, where a shared representation captures the spatial information across the images of the time series and an exclusive representation captures the temporal information specific to each image. We present the benefits of disentangling the spatio-temporal information of time series: for example, the spatial information is useful for time-invariant image classification or segmentation, while knowledge about the temporal information is useful for change detection.
To accomplish this, we analyze some of the most prevalent unsupervised learning models, such as the variational autoencoder (VAE) and generative adversarial networks (GANs), as well as extensions of these models that perform representation disentanglement. Encouraged by the successful results achieved by generative and reconstructive models, we propose a novel framework to learn spatio-temporal representations of satellite data. We show that the learned disentangled representations can be used to perform several computer vision tasks, such as classification, segmentation, information retrieval and change detection, outperforming other state-of-the-art models. Nevertheless, our experiments suggest that generative and reconstructive models present some drawbacks related to the dimensionality of the data representation, architecture complexity and the lack of disentanglement guarantees. To overcome these limitations, we explore a recent method based on mutual information estimation and maximization for representation learning, without relying on image reconstruction or image generation. We propose a new model that extends the mutual information maximization principle to disentangle the representation domain into two parts. In addition to the experiments performed on satellite data, we show that our model is able to deal with different kinds of datasets, outperforming state-of-the-art methods based on GANs and VAEs. Furthermore, we show that our mutual information based model is less computationally demanding yet more effective. Finally, we show that our model is useful for creating a data representation that captures only the class information shared between two images belonging to the same category. Disentangling the class or category of an image from other factors of variation provides a powerful tool to compute the similarity between pixels and perform image segmentation in a weakly supervised manner.
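The shared/exclusive split described above can be illustrated with a minimal sketch. The loss below is a toy stand-in, not the thesis's actual objective: it splits two latent codes from the same time series into shared (spatial) and exclusive (temporal) parts, pulls the shared parts together, and pushes the exclusive parts apart with a hinge term. The function name, the split by leading dimensions, and the hinge margin are all assumptions for illustration.

```python
import numpy as np

def disentangled_loss(z1, z2, d_shared):
    """Toy loss over a shared/exclusive split of two latent codes.

    z1, z2   : latent vectors of two images from the same time series.
    d_shared : number of leading dimensions treated as the shared
               (spatial) part; the remainder is exclusive (temporal).
    """
    s1, e1 = z1[:d_shared], z1[d_shared:]
    s2, e2 = z2[:d_shared], z2[d_shared:]
    # Shared parts of the same series should agree ...
    shared_term = np.mean((s1 - s2) ** 2)
    # ... while exclusive parts are pushed apart (hinge with margin 1).
    excl_term = max(0.0, 1.0 - np.mean((e1 - e2) ** 2))
    return shared_term + excl_term

# Identical shared halves, well-separated exclusive halves: loss is zero.
z1 = np.array([1.0, 2.0, 0.5, -0.3])
z2 = np.array([1.0, 2.0, -0.8, 0.9])
loss = disentangled_loss(z1, z2, d_shared=2)
```

In a real model such a term would be combined with a reconstruction or mutual-information objective and optimized by gradient descent.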

    Deep Nuisance Disentanglement for Robust Object Detection from Unmanned Aerial Vehicles

    Object detection from images captured by Unmanned Aerial Vehicles (UAVs) is becoming increasingly useful. Despite the great success of generic object detection methods trained on ground-to-ground images, a huge performance drop is observed when these methods are applied directly to images captured by UAVs. The unsatisfactory performance is owing to many UAV-specific nuisances, such as varying flight altitudes, adverse weather conditions and dynamically changing viewing angles, which constitute a large number of fine-grained domains across which the detection model has to stay robust. Fortunately, UAVs record meta-data corresponding to the same varying attributes, which is either freely available along with the UAV images or easily obtained. We propose to utilize this free meta-data in conjunction with the associated UAV images to learn domain-robust features via an adversarial training framework. This model, dubbed Nuisance Disentangled Feature Transforms (NDFT), addresses the specific challenging problem of object detection in UAV images and achieves a substantial gain in robustness to these nuisances. This work demonstrates the effectiveness of our proposed algorithm by showing both quantitative improvements on two existing UAV-based object detection benchmarks and qualitative improvements on self-collected UAV imagery. Reprinted with permission from the Abstract section of Deep Nuisance Disentanglement for Robust Object Detection from Unmanned Aerial Vehicles by Zhenyu Wu†, Karthik Suresh†, Priya Narayanan, Hongyu Xu, Heesung Kwon, Zhangyang Wang, 2019, International Conference on Computer Vision (ICCV 2019) Proceedings (Under Review). † indicates equal contribution.
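The adversarial idea in the abstract can be sketched as a single scalar objective. This is only an illustration of the general gradient-reversal pattern, not the paper's actual NDFT loss: the feature extractor minimizes the detection loss while maximizing the losses of auxiliary heads that predict the nuisance meta-data (altitude, weather, view angle), which the subtraction below expresses. The function name and the weight `lam` are assumptions.

```python
import numpy as np

def adversarial_objective(det_loss, nuisance_losses, lam=0.1):
    """Toy adversarial objective in the spirit of nuisance
    disentanglement (a sketch, not the paper's formulation).

    Minimizing this value drives the detection loss down while
    driving the nuisance-prediction losses up, i.e. the learned
    features become uninformative about the nuisance attributes.
    """
    return det_loss - lam * float(np.sum(nuisance_losses))

# Detection loss 1.2, three nuisance heads (altitude, weather, angle).
obj = adversarial_objective(1.2, [0.5, 0.3, 0.2], lam=0.1)
```

In practice the same effect is obtained with a gradient-reversal layer between the shared feature extractor and the nuisance heads.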

    Generative feature extraction from sentinel 1 and 2 data for prediction of forest aboveground biomass in the Italian Alps

    Aboveground biomass (AGB) is an important forest attribute directly linked to the forest carbon pool. The use of satellite remote sensing (RS) data for AGB prediction has increased due to its large footprint and low-cost availability. However, such data have been limited by a saturation effect that leads to low prediction precision. In this article, we propose an innovative and dynamic architecture based on a generative neural network that extracts target-oriented generative features for predicting forest AGB using satellite RS data. These features are more resilient to mixed forest types and geographical conditions than traditional features and models. The effectiveness of the proposed features was assessed by experiments performed using multispectral, synthetic aperture radar, and combined dual-source datasets. The proposed model achieved the best performance in terms of precision, model agreement, and overfitting compared to the other conventional models for all analyzed datasets. The t-distributed stochastic neighbor embedding scatterplots of the generative features clearly show one dimension of the feature space associated with the target AGB. Feature importance analysis indicated that the produced generative features were more significant than the conventional analytical features. The model also provided a robust framework for homogeneous fusion of multisensor features from satellite RS data for predicting AGB.
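The homogeneous multisensor fusion mentioned above can be sketched in its simplest form: concatenating per-pixel features from the two sensors into one design matrix and fitting a regressor. This is a baseline illustration on synthetic data, not the paper's generative architecture; the feature counts and the linear model are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-pixel features (synthetic, for illustration):
# Sentinel-2 multispectral summaries and Sentinel-1 SAR backscatter.
ms = rng.normal(size=(100, 4))   # multispectral features
sar = rng.normal(size=(100, 2))  # SAR features

# Synthetic AGB target depending on both sensors, plus small noise.
agb = (ms @ np.array([1.0, 0.5, 0.0, 0.0])
       + sar @ np.array([2.0, -1.0])
       + rng.normal(scale=0.01, size=100))

# Homogeneous fusion: concatenate sensor features into one matrix,
# then fit a simple least-squares regressor as a baseline.
X = np.hstack([ms, sar])
coef, *_ = np.linalg.lstsq(X, agb, rcond=None)
pred = X @ coef
```

The generative model in the paper replaces this hand-fused linear baseline with learned features, but the fusion-by-concatenation pattern is the same.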

    DAugNet: Unsupervised, Multi-source, Multi-target, and Life-long Domain Adaptation for Semantic Segmentation of Satellite Images

    The domain adaptation of satellite images has recently gained increasing attention as a way to overcome the limited generalization abilities of machine learning models when segmenting large-scale satellite images. Most of the existing approaches seek to adapt the model from one domain to another. However, such a single-source and single-target setting prevents these methods from being scalable solutions, since multiple source and target domains with different data distributions are nowadays usually available. Besides, the continuous proliferation of satellite images requires classifiers to adapt to continuously growing data. We propose a novel approach, coined DAugNet, for unsupervised, multi-source, multi-target, and life-long domain adaptation of satellite images. It consists of a classifier and a data augmentor. The data augmentor, which is a shallow network, is able to perform style transfer between multiple satellite images in an unsupervised manner, even when new data are added over time. In each training iteration, it provides the classifier with diversified data, which makes the classifier robust to large data distribution differences between the domains. Our extensive experiments show that DAugNet generalizes to new geographic locations significantly better than the existing approaches.
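The style-transfer role of the augmentor can be illustrated with the simplest possible stand-in: re-coloring an image with another domain's per-channel statistics. This is a sketch, not DAugNet's shallow network; the function name and the scalar target statistics are assumptions.

```python
import numpy as np

def recolor(img, target_mean, target_std):
    """Whiten-and-recolor stand-in for a style-transfer augmentor
    (a sketch, not the paper's network): standardize each channel,
    then rescale to another domain's per-channel statistics."""
    mu = img.mean(axis=(0, 1), keepdims=True)
    sigma = img.std(axis=(0, 1), keepdims=True) + 1e-8
    return (img - mu) / sigma * target_std + target_mean

rng = np.random.default_rng(0)
# Source-domain tile with bright, high-contrast statistics.
source = rng.normal(loc=5.0, scale=3.0, size=(32, 32, 3))
# Restyle it toward a hypothetical darker, low-contrast target domain.
styled = recolor(source, target_mean=0.2, target_std=0.05)
```

Feeding a classifier such restyled versions of each tile in every iteration is what makes it robust to distribution shift between geographic domains.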

    SPARLING: Learning Latent Representations with Extremely Sparse Activations

    Real-world processes often contain intermediate state that can be modeled as an extremely sparse tensor. We introduce Sparling, a technique for learning models whose intermediate layers match this state from only end-to-end labeled examples (i.e., no supervision on the intermediate state). Sparling uses a new kind of informational bottleneck that enforces levels of activation sparsity unachievable with other techniques. We find that extreme sparsity is necessary to achieve good intermediate state modeling. On our synthetic DigitCircle domain, as well as the LaTeX-OCR and Audio-MNIST-Sequence domains, we are able to precisely localize the intermediate states up to feature permutation with >90% accuracy, even though we only train end-to-end.
Comment: 10 pages, 6 figures
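The notion of an extreme-sparsity bottleneck can be sketched with a top-k activation: keep only a tiny fraction of the largest-magnitude activations and zero the rest. This is a stand-in for illustration, not Sparling's actual mechanism; the function name and the `density` parameter are assumptions.

```python
import numpy as np

def extreme_sparse(x, density=0.05):
    """Top-k stand-in (a sketch) for an extreme-sparsity bottleneck:
    keep only the `density` fraction of largest-magnitude activations
    in the layer and zero out everything else."""
    k = max(1, int(round(density * x.size)))
    # Magnitude of the k-th largest activation becomes the threshold.
    thresh = np.partition(np.abs(x).ravel(), -k)[-k]
    return np.where(np.abs(x) >= thresh, x, 0.0)

rng = np.random.default_rng(0)
acts = rng.normal(size=(10, 10))            # dense intermediate layer
sparse = extreme_sparse(acts, density=0.05)  # keeps 5 of 100 activations
```

Enforcing such sparsity on an intermediate layer, while training only on end-to-end labels, is what pressures the surviving activations to align with the true sparse intermediate state.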