Disentanglement via Latent Quantization
In disentangled representation learning, a model is asked to tease apart a
dataset's underlying sources of variation and represent them independently of
one another. Since the model is provided with no ground truth information about
these sources, inductive biases play a paramount role in enabling
disentanglement. In this work, we construct an inductive bias towards
compositionally encoding and decoding data by enforcing a harsh communication
bottleneck. Concretely, we do this by (i) quantizing the latent space into
learnable discrete codes with a separate scalar codebook per dimension and (ii)
applying strong model regularization via an unusually high weight decay.
Intuitively, the quantization forces the encoder to use a small number of
latent values across many datapoints, which in turn enables the decoder to
assign a consistent meaning to each value. Regularization then serves to drive
the model towards this parsimonious strategy. We demonstrate the broad
applicability of this approach by adding it to both basic data-reconstructing
(vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models. In
order to reliably assess these models, we also propose InfoMEC, new metrics for
disentanglement that are cohesively grounded in information theory and fix
well-established shortcomings in previous metrics. Together with
regularization, latent quantization dramatically improves the modularity and
explicitness of learned representations on a representative suite of benchmark
datasets. In particular, our quantized-latent autoencoder (QLAE) consistently
outperforms strong methods from prior work in these key disentanglement
properties without compromising data reconstruction.Comment: 20 pages, 8 figures, code available at
https://github.com/kylehkhsu/disentangl
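The central mechanism here, a separate learnable scalar codebook per latent
dimension combined with strong weight decay, is compact enough to sketch.
Below is a minimal PyTorch sketch of latent quantization as described in the
abstract; the class name, codebook initialization, straight-through gradient
trick, and weight-decay value are illustrative assumptions rather than the
authors' exact implementation (see the linked repository for that).

    import torch
    import torch.nn as nn

    class ScalarLatentQuantizer(nn.Module):
        """Quantize each latent dimension to the nearest entry of its own
        learnable scalar codebook (illustrative sketch, not the paper's code)."""

        def __init__(self, num_dims: int, codebook_size: int):
            super().__init__()
            # One small scalar codebook per latent dimension: (num_dims, codebook_size).
            init = torch.linspace(-1.0, 1.0, codebook_size).repeat(num_dims, 1)
            self.codebooks = nn.Parameter(init)

        def forward(self, z: torch.Tensor) -> torch.Tensor:
            # z: (batch, num_dims). Distance of each latent to every code value.
            dist = (z.unsqueeze(-1) - self.codebooks.unsqueeze(0)).abs()  # (B, D, K)
            idx = dist.argmin(dim=-1)                                     # (B, D)
            codes = self.codebooks.expand(z.shape[0], -1, -1)             # (B, D, K)
            z_q = torch.gather(codes, -1, idx.unsqueeze(-1)).squeeze(-1)  # (B, D)
            # Straight-through estimator: forward the quantized values,
            # backward the identity gradient.
            return z + (z_q - z).detach()

Plugged between an encoder and a decoder, this would be paired with unusually
high weight decay, e.g. torch.optim.AdamW(model.parameters(), weight_decay=0.1)
(the value 0.1 is an assumption), to push the model toward the parsimonious
strategy the abstract describes.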
On the Transfer of Disentangled Representations in Realistic Settings
Learning meaningful representations that disentangle the underlying structure
of the data generating process is considered to be of key importance in machine
learning. While disentangled representations were found to be useful for
diverse tasks such as abstract reasoning and fair classification, their
scalability and real-world impact remain questionable. We introduce a new
high-resolution dataset with 1M simulated images and over 1,800 annotated
real-world images of the same setup. In contrast to previous work, this new
dataset exhibits correlations and a complex underlying structure, and allows
evaluation of transfer to unseen simulated and real-world settings where the encoder
i) remains in distribution or ii) is out of distribution. We propose new
architectures in order to scale disentangled representation learning to
realistic high-resolution settings and conduct a large-scale empirical study of
disentangled representations on this dataset. We observe that disentanglement
is a good predictor of out-of-distribution (OOD) task performance.
Comment: Published at ICLR 202
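The headline claim, that disentanglement predicts OOD task performance, implies
a simple evaluation recipe: score a collection of models with a disentanglement
metric, measure the same models' downstream task performance out of
distribution, and check the rank correlation. A minimal sketch follows; the
function name and the choice of Spearman correlation are assumptions for
illustration, not necessarily the paper's exact protocol.

    import numpy as np
    from scipy.stats import spearmanr

    def predictiveness(disentanglement_scores: np.ndarray,
                       ood_task_scores: np.ndarray) -> tuple[float, float]:
        """Rank correlation between per-model disentanglement scores and
        per-model OOD task performance. A high correlation supports the
        'disentanglement is a good predictor' claim (illustrative sketch)."""
        rho, p_value = spearmanr(disentanglement_scores, ood_task_scores)
        return rho, p_value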
CausalVAE: Disentangled Representation Learning via Neural Structural Causal Models
Learning disentanglement aims at finding a low dimensional representation
which consists of multiple explanatory and generative factors of the
observational data. The framework of variational autoencoder (VAE) is commonly
used to disentangle independent factors from observations. However, in real
scenarios, factors with semantics are not necessarily independent. Instead,
there might be an underlying causal structure which renders these factors
dependent. We thus propose a new VAE-based framework named CausalVAE, which
includes a Causal Layer to transform independent exogenous factors into causal
endogenous ones that correspond to causally related concepts in data. We
further analyze the model identifiability, showing that the proposed model
learned from observations recovers the true one up to a certain degree.
Experiments are conducted on various datasets, including synthetic data and the
real-world benchmark CelebA. Results show that the causal representations learned by
CausalVAE are semantically interpretable, and their causal relationship as a
Directed Acyclic Graph (DAG) is identified with good accuracy. Furthermore, we
demonstrate that the proposed CausalVAE model is able to generate
counterfactual data by applying the "do-operation" to the causal factors.
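The Causal Layer has a compact linear reading: with A the learned adjacency
matrix of the DAG over d concepts, independent exogenous factors eps are mapped
to causally dependent endogenous factors z satisfying z = A^T z + eps, i.e.
z = (I - A^T)^{-1} eps. Below is a minimal PyTorch sketch under that
assumption; the class name and the NOTEARS-style acyclicity penalty are
illustrative choices, not necessarily the paper's exact formulation.

    import torch
    import torch.nn as nn

    class CausalLayer(nn.Module):
        """Linear structural causal layer: z = A^T z + eps, solved in closed
        form as z = (I - A^T)^{-1} eps (illustrative sketch)."""

        def __init__(self, num_factors: int):
            super().__init__()
            # Learnable weighted adjacency matrix of the causal DAG.
            self.A = nn.Parameter(torch.zeros(num_factors, num_factors))

        def forward(self, eps: torch.Tensor) -> torch.Tensor:
            # For batched row vectors, z = eps @ (I - A)^{-1} is equivalent
            # to the column-vector form z = (I - A^T)^{-1} eps.
            eye = torch.eye(self.A.shape[0], device=eps.device)
            return eps @ torch.inverse(eye - self.A)

        def dag_penalty(self) -> torch.Tensor:
            # NOTEARS-style acyclicity penalty: tr(exp(A * A)) - d is zero
            # exactly when A encodes a DAG; add it to the training loss.
            d = self.A.shape[0]
            return torch.trace(torch.matrix_exp(self.A * self.A)) - d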
IB-UQ: Information bottleneck based uncertainty quantification for neural function regression and neural operator learning
We propose a novel framework for uncertainty quantification via information
bottleneck (IB-UQ) for scientific machine learning tasks, including deep neural
network (DNN) regression and neural operator learning (DeepONet). Specifically,
we implement the bottleneck with a confidence-aware encoder, which encodes
inputs into latent representations according to the confidence that the input
lies in the region covered by the training data, and we use a Gaussian
decoder to predict the means and variances of outputs conditioned on the
representation variables. Furthermore, we propose a data-augmentation-based
information bottleneck objective that improves the quantification of
extrapolation uncertainty; both the encoder and the decoder can be trained by
minimizing a tractable variational bound on the objective. In
comparison to uncertainty quantification (UQ) methods for scientific learning
tasks that rely on Bayesian neural networks with Hamiltonian Monte Carlo
posterior estimators, the model we propose is computationally efficient,
particularly when dealing with large-scale data sets. The effectiveness of the
IB-UQ model has been demonstrated through several representative examples, such
as regression for discontinuous functions, real-world data set regression,
learning nonlinear operators for partial differential equations, and a
large-scale climate model. The experimental results indicate that the IB-UQ
model can handle noisy data, generate robust predictions, and provide confident
uncertainty evaluation for out-of-distribution data.
Comment: 27 pages, 22 figures
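The decoder side is easy to make concrete: a Gaussian head predicts a mean and
a log-variance per output, and the negative log-likelihood it is trained with
serves as the reconstruction term of the variational bound mentioned above.
The sketch below is a plain PyTorch rendering; the layer sizes, activation,
and names are assumptions for illustration, not the paper's architecture.

    import torch
    import torch.nn as nn

    class GaussianDecoder(nn.Module):
        """Predict the mean and log-variance of outputs given latent
        representations (illustrative sketch)."""

        def __init__(self, latent_dim: int, out_dim: int, hidden: int = 64):
            super().__init__()
            self.body = nn.Sequential(nn.Linear(latent_dim, hidden), nn.Tanh())
            self.mean_head = nn.Linear(hidden, out_dim)
            self.logvar_head = nn.Linear(hidden, out_dim)

        def forward(self, z: torch.Tensor):
            h = self.body(z)
            return self.mean_head(h), self.logvar_head(h)

    def gaussian_nll(mean, logvar, y):
        # Negative log-likelihood of y under N(mean, exp(logvar)), up to an
        # additive constant; large predicted variance flags low confidence.
        return 0.5 * (logvar + (y - mean) ** 2 / logvar.exp()).mean()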