74,510 research outputs found
Human-controllable and structured deep generative models
Deep generative models are a class of probabilistic models that attempts to learn the underlying data distribution. These models are usually trained in an unsupervised way and thus, do not require any labels. Generative models such as Variational Autoencoders and Generative Adversarial Networks have made astounding progress over the last years. These models have several benefits: eased sampling and evaluation, efficient learning of low-dimensional representations for downstream tasks, and better understanding through interpretable representations. However, even though the quality of these models has improved immensely, the ability to control their style and structure is limited. Structured and human-controllable representations of generative models are essential for human-machine interaction and other applications, including fairness, creativity, and entertainment. This thesis investigates learning human-controllable and structured representations with deep generative models. In particular, we focus on generative modelling of 2D images. For the first part, we focus on learning clustered representations. We propose semi-parametric hierarchical variational autoencoders to estimate the intensity of facial action units. The semi-parametric model forms a hybrid generative-discriminative model and leverages both parametric Variational Autoencoder and non-parametric Gaussian Process autoencoder. We show superior performance in comparison with existing facial action unit estimation approaches. Based on the results and analysis of the learned representation, we focus on learning Mixture-of-Gaussians representations in an autoencoding framework. We deviate from the conventional autoencoding framework and consider a regularized objective with the Cauchy-Schwarz divergence. The Cauchy-Schwarz divergence allows a closed-form solution for Mixture-of-Gaussian distributions and, thus, efficiently optimizing the autoencoding objective. We show that our model outperforms existing Variational Autoencoders in density estimation, clustering, and semi-supervised facial action detection. We focus on learning disentangled representations for conditional generation and fair facial attribute classification for the second part. Conditional image generation relies on the accessibility to large-scale annotated datasets. Nevertheless, the geometry of visual objects, such as in faces, cannot be learned implicitly and deteriorate image fidelity. We propose incorporating facial landmarks with a statistical shape model and a differentiable piecewise affine transformation to separate the representation for appearance and shape. The goal of incorporating facial landmarks is that generation is controlled and can separate different appearances and geometries. In our last work, we use weak supervision for disentangling groups of variations. Works on learning disentangled representation have been done in an unsupervised fashion. However, recent works have shown that learning disentangled representations is not identifiable without any inductive biases. Since then, there has been a shift towards weakly-supervised disentanglement learning. We investigate using regularization based on the Kullback-Leiber divergence to disentangle groups of variations. The goal is to have consistent and separated subspaces for different groups, e.g., for content-style learning. Our evaluation shows increased disentanglement abilities and competitive performance for image clustering and fair facial attribute classification with weak supervision compared to supervised and semi-supervised approaches.Open Acces
Auto-Encoding Sequential Monte Carlo
We build on auto-encoding sequential Monte Carlo (AESMC): a method for model
and proposal learning based on maximizing the lower bound to the log marginal
likelihood in a broad family of structured probabilistic models. Our approach
relies on the efficiency of sequential Monte Carlo (SMC) for performing
inference in structured probabilistic models and the flexibility of deep neural
networks to model complex conditional probability distributions. We develop
additional theoretical insights and introduce a new training procedure which
improves both model and proposal learning. We demonstrate that our approach
provides a fast, easy-to-implement and scalable means for simultaneous model
learning and proposal adaptation in deep generative models
High fidelity image counterfactuals with probabilistic causal models
We present a general causal generative modelling framework for accurate estimation of high fidelity image counterfactuals with deep structural causal models. Estimation of interventional and counterfactual queries for high-dimensional structured variables, such as images, remains a challenging task. We leverage ideas from causal mediation analysis and advances in generative modelling to design new deep causal mechanisms for structured variables in causal models. Our experiments demonstrate that our proposed mechanisms are capable of accurate abduction and estimation of direct, indirect and total effects as measured by axiomatic soundness of counterfactuals
A Comprehensive Survey on Generative Diffusion Models for Structured Data
In recent years, generative diffusion models have achieved a rapid paradigm
shift in deep generative models by showing groundbreaking performance across
various applications. Meanwhile, structured data, encompassing tabular and time
series data, has been received comparatively limited attention from the deep
learning research community, despite its omnipresence and extensive
applications. Thus, there is still a lack of literature and its reviews on
structured data modelling via diffusion models, compared to other data
modalities such as visual and textual data. To address this gap, we present a
comprehensive review of recently proposed diffusion models in the field of
structured data. First, this survey provides a concise overview of the
score-based diffusion model theory, subsequently proceeding to the technical
descriptions of the majority of pioneering works that used structured data in
both data-driven general tasks and domain-specific applications. Thereafter, we
analyse and discuss the limitations and challenges shown in existing works and
suggest potential research directions. We hope this review serves as a catalyst
for the research community, promoting developments in generative diffusion
models for structured data.Comment: 20 pages, 1 figure, 2 table
Deep Generative Modeling of LiDAR Data
Building models capable of generating structured output is a key challenge
for AI and robotics. While generative models have been explored on many types
of data, little work has been done on synthesizing lidar scans, which play a
key role in robot mapping and localization. In this work, we show that one can
adapt deep generative models for this task by unravelling lidar scans into a 2D
point map. Our approach can generate high quality samples, while simultaneously
learning a meaningful latent representation of the data. We demonstrate
significant improvements against state-of-the-art point cloud generation
methods. Furthermore, we propose a novel data representation that augments the
2D signal with absolute positional information. We show that this helps
robustness to noisy and imputed input; the learned model can recover the
underlying lidar scan from seemingly uninformative dataComment: Presented at IROS 201
- …