The Incomplete Rosetta Stone Problem: Identifiability Results for Multi-View Nonlinear ICA
We consider the problem of recovering a common latent source with independent
components from multiple views. This applies to settings in which a variable is
measured with multiple experimental modalities, and where the goal is to
synthesize the disparate measurements into a single unified representation. We
consider the case that the observed views are a nonlinear mixing of
component-wise corruptions of the sources. When the views are considered
separately, this reduces to nonlinear Independent Component Analysis (ICA) for
which it is provably impossible to undo the mixing. We present novel
identifiability proofs that the mixing can theoretically be undone when the
multiple views are considered jointly, using function approximators such as
deep neural networks. In contrast to known
identifiability results for nonlinear ICA, we prove that independent latent
sources with arbitrary mixing can be recovered as long as multiple,
sufficiently different noisy views are available.
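The generative model described above can be sketched as follows. This is a minimal illustrative simulation, not the authors' code: the latent dimension, the Laplace source distribution, the noise scale, and the toy two-layer mixing network are all assumptions chosen for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 3

# Common latent source with independent, non-Gaussian components.
s = rng.laplace(size=(n, d))

def toy_mixing(x, seed):
    """A hypothetical smooth nonlinear mixing: linear map, tanh, linear map."""
    r = np.random.default_rng(seed)
    W1, W2 = r.normal(size=(d, d)), r.normal(size=(d, d))
    return np.tanh(x @ W1) @ W2

# Each view is a (different) nonlinear mixing of a component-wise
# corrupted copy of the same sources.
views = [toy_mixing(s + rng.normal(scale=0.1, size=s.shape), seed=v)
         for v in range(2)]
```

Considered separately, each `views[v]` is an instance of unidentifiable nonlinear ICA; the identifiability result concerns recovering `s` from the two views jointly.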
Nonlinear Independent Component Analysis for Principled Disentanglement in Unsupervised Deep Learning
A central problem in unsupervised deep learning is how to find useful
representations of high-dimensional data, sometimes called "disentanglement".
Most approaches are heuristic and lack a proper theoretical foundation. In
linear representation learning, independent component analysis (ICA) has been
successful in many application areas, and it is principled, i.e. based on a
well-defined probabilistic model. However, extension of ICA to the nonlinear
case has been problematic due to the lack of identifiability, i.e. uniqueness
of the representation. Recently, nonlinear extensions that utilize temporal
structure or some auxiliary information have been proposed. Such models are in
fact identifiable, and consequently, an increasing number of algorithms have
been developed. In particular, some self-supervised algorithms can be shown to
estimate nonlinear ICA, even though they have initially been proposed from
heuristic perspectives. This paper reviews the state-of-the-art of nonlinear
ICA theory and algorithms.
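The auxiliary-variable mechanism mentioned above can be illustrated with a small simulation in the style of time-contrastive learning: an observed segment label modulates the source variances, which is the kind of structure that restores identifiability. All names and parameters here are illustrative assumptions, not code from the review.

```python
import numpy as np

rng = np.random.default_rng(0)
segments, per_seg, d = 5, 200, 3

# Auxiliary variable u: the segment index for each observation.
u = np.repeat(np.arange(segments), per_seg)

# u modulates the variance of each independent source component,
# making the sources nonstationary across segments.
scales = rng.uniform(0.5, 2.0, size=(segments, d))
s = rng.normal(size=(segments * per_seg, d)) * scales[u]

# An arbitrary smooth nonlinear mixing stands in for the unknown mixing.
W = rng.normal(size=(d, d))
x = np.tanh(s @ W)
```

A self-supervised classifier trained to predict `u` from `x` can, under the theory reviewed here, recover the independent components in its hidden layer.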
Advances in identifiability of nonlinear probabilistic models
Identifiability is a highly prized property of statistical models. This thesis investigates this property in nonlinear models encountered in two fields of statistics: representation learning and causal discovery. In representation learning, identifiability leads to learning interpretable and reproducible representations, while in causal discovery, it is necessary for the estimation of correct causal directions.
We begin by leveraging recent advances in nonlinear ICA to show that the latent space of a VAE is identifiable up to a permutation and pointwise nonlinear transformations of its components. A factorized prior distribution over the latent variables conditioned on an auxiliary observed variable, such as a class label or nearly any other observation, is required for our result. We also extend previous identifiability results in nonlinear ICA to the case of noisy or undercomplete observations, and incorporate them into a maximum likelihood framework.
Our second contribution is to develop the Independently Modulated Component Analysis (IMCA) framework, a generalization of nonlinear ICA to non-independent latent variables. We show that we can drop the independence assumption in ICA while maintaining identifiability, resulting in a very flexible and generic framework for principled disentangled representation learning. This finding is predicated on the existence of an auxiliary variable that modulates the joint distribution of the latent variables in a factorizable manner.
As a third contribution, we extend the identifiability theory to a broad family of conditional energy-based models (EBMs). This novel model generalizes earlier results by removing any distributional assumptions on the representations, which are ubiquitous in the latent variable setting. The conditional EBM can learn identifiable overcomplete representations and has universal approximation capabilities.
Finally, we investigate a connection between the framework of autoregressive normalizing flow models and causal discovery. Causal models derived from affine autoregressive flows are shown to be identifiable, generalizing the well-known additive noise model. Using normalizing flows, we can compute the exact likelihood of the causal model, which is subsequently used to derive a likelihood ratio measure for causal discovery. The flows are also invertible, making them well suited to causal inference tasks such as interventions and counterfactuals.
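The affine autoregressive causal model and its exact likelihood can be sketched as follows. The particular location and scale functions are hypothetical stand-ins chosen for illustration; the point is the change-of-variables likelihood that the thesis uses for the likelihood ratio test.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Affine autoregressive causal model x1 -> x2, generalizing the
# additive noise model: x2 = t(x1) + s(x1) * z2, with z2 ~ N(0, 1).
x1 = rng.normal(size=n)
t = np.sin(x1)              # hypothetical location function t(x1)
scale = 0.5 + 0.1 * x1**2   # hypothetical positive scale function s(x1)
z2 = rng.normal(size=n)
x2 = t + scale * z2

# Exact conditional log-likelihood via change of variables:
# log p(x2 | x1) = log N((x2 - t)/s; 0, 1) - log s.
loglik = np.sum(-0.5 * ((x2 - t) / scale) ** 2
                - 0.5 * np.log(2 * np.pi) - np.log(scale))
```

Computing the analogous quantity for the reverse direction x2 -> x1 and comparing the two exact likelihoods gives the likelihood ratio measure for causal discovery; setting `scale` to a constant recovers the additive noise model.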
Learning Linear Causal Representations from Interventions under General Nonlinear Mixing
We study the problem of learning causal representations from unknown, latent
interventions in a general setting, where the latent distribution is Gaussian
but the mixing function is completely general. We prove strong identifiability
results given unknown single-node interventions, i.e., without having access to
the intervention targets. This generalizes prior works which have focused on
weaker classes, such as linear maps or paired counterfactual data. This is also
the first instance of causal identifiability from non-paired interventions for
deep neural network embeddings. Our proof relies on carefully uncovering the
high-dimensional geometric structure present in the data distribution after a
non-linear density transformation, which we capture by analyzing quadratic
forms of precision matrices of the latent distributions. Finally, we propose a
contrastive algorithm to identify the latent variables in practice and evaluate
its performance on various tasks.