The Information Autoencoding Family: A Lagrangian Perspective on Latent Variable Generative Models
A large number of objectives have been proposed to train latent variable
generative models. We show that many of them are Lagrangian dual functions of
the same primal optimization problem. The primal problem optimizes the mutual
information between latent and visible variables, subject to the constraints of
accurately modeling the data distribution and performing correct amortized
inference. By choosing to maximize or minimize mutual information, and choosing
different Lagrange multipliers, we obtain different objectives including
InfoGAN, ALI/BiGAN, ALICE, CycleGAN, beta-VAE, adversarial autoencoders, AVB,
AS-VAE and InfoVAE. Based on this observation, we provide an exhaustive
characterization of the statistical and computational trade-offs made by all
the training objectives in this class of Lagrangian duals. Next, we propose a
dual optimization method where we optimize model parameters as well as the
Lagrange multipliers. This method achieves Pareto optimal solutions in terms of
optimizing information and satisfying the constraints.
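The role of the Lagrange multiplier can be illustrated with a minimal NumPy sketch (hypothetical function names, Gaussian encoder assumed): a single multiplier on the rate term interpolates between members of this objective family, with a weight of 1 recovering the standard ELBO and other weights giving beta-VAE-style objectives.

```python
import numpy as np

def gaussian_kl(mu, logvar):
    # KL( N(mu, exp(logvar)) || N(0, I) ), summed over latent dimensions
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def lagrangian_vae_loss(x, x_recon, mu, logvar, lam=1.0):
    """Distortion (data-modeling constraint) plus a multiplier-weighted
    rate (mutual-information) term; lam = 1 recovers the standard ELBO."""
    distortion = np.sum((x - x_recon) ** 2, axis=-1)
    rate = gaussian_kl(mu, logvar)
    return np.mean(distortion + lam * rate)
```

Sweeping `lam` traces out the rate-distortion trade-offs that the paper characterizes, and the proposed dual method optimizes `lam` jointly with the model parameters.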
Semantic Compression of Episodic Memories
Storing knowledge of an agent's environment in the form of a probabilistic
generative model has been established as a crucial ingredient in a multitude of
cognitive tasks. Perception has been formalised as probabilistic inference over
the state of latent variables, whereas in decision making the model of the
environment is used to predict likely consequences of actions. Such generative
models have earlier been proposed to underlie semantic memory but it remained
unclear if this model also underlies the efficient storage of experiences in
episodic memory. We formalise the compression of episodes in the normative
framework of information theory and argue that semantic memory provides the
distortion function for compression of experiences. Recent advances and
insights from machine learning allow us to approximate semantic compression in
naturalistic domains and contrast the resulting deviations in compressed
episodes with memory errors observed in the experimental literature on human
memory.
Comment: CogSci201
Online Forecasting Matrix Factorization
In this paper the problem of forecasting high dimensional time series is
considered. Such time series can be modeled as matrices where each column
denotes a measurement. In addition, when missing values are present, low rank
matrix factorization approaches are suitable for predicting future values. This
paper formally defines and analyzes the forecasting problem in the online
setting, i.e. where the data arrives as a stream and only a single pass is
allowed. We present and analyze novel matrix factorization techniques which can
learn low-dimensional embeddings effectively in an online manner. Based on
these embeddings a recursive minimum mean square error estimator is derived,
which learns an autoregressive model on them. Experiments with two real
datasets with tens of millions of measurements show the benefits of the
proposed approach.
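The single-pass setting described above can be sketched as follows; this is an illustrative NumPy fragment under simplifying assumptions (a ridge solve for each new column's embedding, one gradient step on the shared factor), not the paper's estimator.

```python
import numpy as np

def online_mf_step(U, y, lr=0.05, reg=0.1, mask=None):
    """Streaming update for one incoming column y of the time-series matrix:
    solve for its low-dimensional embedding v given the current factor U
    (observed entries only, so missing values are handled), then take one
    gradient step on U."""
    if mask is None:
        mask = np.ones(y.shape[0], dtype=bool)
    Uo, yo = U[mask], y[mask]
    k = U.shape[1]
    # ridge least squares for the new column's embedding
    v = np.linalg.solve(Uo.T @ Uo + reg * np.eye(k), Uo.T @ yo)
    # one stochastic gradient step on the shared factor
    resid = Uo @ v - yo
    U[mask] -= lr * (np.outer(resid, v) + reg * Uo)
    return U, v
```

In the full method, an autoregressive model fitted on the stream of embeddings `v` then yields the recursive minimum mean square error forecast.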
DAG-GNN: DAG Structure Learning with Graph Neural Networks
Learning a faithful directed acyclic graph (DAG) from samples of a joint
distribution is a challenging combinatorial problem, owing to the intractable
search space superexponential in the number of graph nodes. A recent
breakthrough formulates the problem as a continuous optimization with a
structural constraint that ensures acyclicity (Zheng et al., 2018). The authors
apply the approach to the linear structural equation model (SEM) and the
least-squares loss function that are statistically well justified but
nevertheless limited. Motivated by the widespread success of deep learning that
is capable of capturing complex nonlinear mappings, in this work we propose a
deep generative model and apply a variant of the structural constraint to learn
the DAG. At the heart of the generative model is a variational autoencoder
parameterized by a novel graph neural network architecture, which we coin
DAG-GNN. In addition to the richer capacity, an advantage of the proposed model
is that it naturally handles discrete variables as well as vector-valued ones.
We demonstrate that on synthetic data sets, the proposed method learns more
accurate graphs for nonlinearly generated samples; and on benchmark data sets
with discrete variables, the learned graphs are reasonably close to the global
optima. The code is available at \url{https://github.com/fishmoon1234/DAG-GNN}.
Comment: ICML 2019
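The continuous acyclicity constraint of Zheng et al. (2018) that this line of work builds on can be written down directly; the sketch below shows the trace-of-matrix-exponential form (DAG-GNN itself uses a polynomial variant of the same idea).

```python
import numpy as np
from scipy.linalg import expm

def acyclicity(W):
    """h(W) = tr(exp(W * W)) - d, which is zero if and only if the weighted
    adjacency matrix W corresponds to a DAG (Zheng et al., 2018); W * W is
    the elementwise square."""
    d = W.shape[0]
    return np.trace(expm(W * W)) - d
```

Because h(W) is differentiable, it can be folded into the training objective as an augmented-Lagrangian penalty, turning the combinatorial search over DAGs into continuous optimization.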
Neural Joint Source-Channel Coding
For reliable transmission across a noisy communication channel, classical
results from information theory show that it is asymptotically optimal to
separate out the source and channel coding processes. However, this
decomposition can fall short in the finite bit-length regime, as it requires
non-trivial tuning of hand-crafted codes and assumes infinite computational
power for decoding. In this work, we propose to jointly learn the encoding and
decoding processes using a new discrete variational autoencoder model. By
adding noise into the latent codes to simulate the channel during training, we
learn to both compress and error-correct given a fixed bit-length and
computational budget. We obtain codes that are not only competitive against
several separation schemes, but also learn useful robust representations of the
data for downstream tasks such as classification. Finally, inference
amortization yields an extremely fast neural decoder, almost an order of
magnitude faster compared to standard decoding methods based on iterative
belief propagation.
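The training trick described above, injecting channel noise into the latent codes, can be sketched for a binary symmetric channel; the function name and fixed flip probability here are illustrative, not the paper's architecture.

```python
import numpy as np

def binary_symmetric_channel(code, flip_prob, rng):
    """Simulate the noisy channel during training: each latent bit is
    flipped independently with probability flip_prob, so the decoder must
    learn error correction jointly with decompression."""
    flips = rng.random(code.shape) < flip_prob
    return np.where(flips, 1 - code, code)
```

Passing the corrupted code to the decoder at every training step is what lets a fixed bit-length autoencoder absorb both the source-coding and channel-coding roles.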
Monge-Amp\`ere Flow for Generative Modeling
We present a deep generative model, named Monge-Amp\`ere flow, which builds
on continuous-time gradient flow arising from the Monge-Amp\`ere equation in
optimal transport theory. The generative map from the latent space to the data
space follows a dynamical system, where a learnable potential function guides a
compressible fluid to flow towards the target density distribution. Training of
the model amounts to solving an optimal control problem. The Monge-Amp\`ere
flow has tractable likelihoods and supports efficient sampling and inference.
One can easily impose symmetry constraints in the generative model by designing
suitable scalar potential functions. We apply the approach to unsupervised
density estimation of the MNIST dataset and variational calculation of the
two-dimensional Ising model at the critical point. This approach brings
insights and techniques from the Monge-Amp\`ere equation, optimal transport, and
fluid dynamics into reversible flow-based generative models.
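The dynamical-system view can be illustrated with a toy Euler integrator: samples follow dz/dt = grad(phi)(z) while the log-density evolves as d(log p)/dt = -Laplacian(phi)(z), the instantaneous change-of-variables formula. The quadratic potential in the test is a hypothetical stand-in for the learned scalar potential.

```python
import numpy as np

def integrate_flow(z0, grad_phi, laplace_phi, t_end=1.0, n_steps=100):
    """Euler-integrate the gradient flow dz/dt = grad_phi(z) and accumulate
    the log-density change d(log p)/dt = -laplace_phi(z), as in
    continuous-time flow models driven by a scalar potential."""
    z, dlogp = z0.copy(), np.zeros_like(z0)
    dt = t_end / n_steps
    for _ in range(n_steps):
        dlogp -= dt * laplace_phi(z)
        z = z + dt * grad_phi(z)
    return z, dlogp
```

Tracking `dlogp` alongside the samples is what gives the model tractable likelihoods, and symmetry constraints amount to choosing an invariant potential `phi`.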
Variational Information Bottleneck on Vector Quantized Autoencoders
In this paper, we provide an information-theoretic interpretation of the
Vector Quantized-Variational Autoencoder (VQ-VAE). We show that the loss
function of the original VQ-VAE can be derived from the variational
deterministic information bottleneck (VDIB) principle. On the other hand, the
VQ-VAE trained by the Expectation Maximization (EM) algorithm can be viewed as
an approximation to the variational information bottleneck (VIB) principle.
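The deterministic bottleneck in question is the VQ-VAE's nearest-neighbour quantization step, sketched below in minimal NumPy; in the real model the straight-through gradient and stop-gradient placement of the codebook and commitment terms matter, which this illustration omits.

```python
import numpy as np

def vector_quantize(z_e, codebook):
    """Map each encoder output z_e[i] to its nearest codebook vector.
    The squared quantization error supplies the codebook and commitment
    terms of the VQ-VAE loss (with different stop-gradient placements)."""
    d = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)
    z_q = codebook[idx]
    quant_err = ((z_q - z_e) ** 2).sum(axis=-1).mean()
    return z_q, idx, quant_err
```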
Deep Component Analysis via Alternating Direction Neural Networks
Despite a lack of theoretical understanding, deep neural networks have
achieved unparalleled performance in a wide range of applications. On the other
hand, shallow representation learning with component analysis is associated
with rich intuition and theory, but smaller capacity often limits its
usefulness. To bridge this gap, we introduce Deep Component Analysis (DeepCA),
an expressive multilayer model formulation that enforces hierarchical structure
through constraints on latent variables in each layer. For inference, we
propose a differentiable optimization algorithm implemented using recurrent
Alternating Direction Neural Networks (ADNNs) that enable parameter learning
using standard backpropagation. By interpreting feed-forward networks as
single-iteration approximations of inference in our model, we provide both a
novel theoretical perspective for understanding them and a practical technique
for constraining predictions with prior knowledge. Experimentally, we
demonstrate performance improvements on a variety of tasks, including
single-image depth prediction with sparse output constraints.
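The "feed-forward network as single-iteration inference" interpretation can be made concrete with a proximal-gradient sketch for one non-negativity-constrained layer; this is an illustrative fragment, not the paper's ADNN, but the first loop iteration reduces to a standard ReLU layer and later iterations refine its output.

```python
import numpy as np

def prox_grad_inference(x, W, n_iter=10, lam=0.1):
    """Infer a non-negative sparse code z minimizing
    0.5 * ||W z - x||^2 + lam * sum(z).  The proximal operator of the
    non-negativity constraint is exactly a ReLU."""
    step = 1.0 / np.linalg.norm(W.T @ W, 2)  # 1 / Lipschitz constant
    z = np.zeros(W.shape[1])
    for _ in range(n_iter):
        grad = W.T @ (W @ z - x) + lam
        z = np.maximum(0.0, z - step * grad)  # ReLU as proximal step
    return z
```

Unrolling such iterations as a recurrent network, and backpropagating through them, is the mechanism that lets prior-knowledge constraints shape the predictions.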
Discriminative Nonlinear Analysis Operator Learning: When Cosparse Model Meets Image Classification
The linear-synthesis-model-based dictionary learning framework has achieved
remarkable performance in image classification over the last decade. As a
generative feature model, however, it suffers from some intrinsic
deficiencies. In this paper, we propose a novel parametric nonlinear analysis
cosparse model (NACM) with which a unique feature vector can be extracted much
more efficiently. Additionally, we demonstrate that NACM is capable of
simultaneously learning the task-adapted feature transformation and a
regularization that encodes our preferences, domain prior knowledge, and
task-oriented supervised information into the features. The proposed NACM
serves the classification task as a discriminative feature model and yields a
novel discriminative nonlinear analysis operator learning framework (DNAOL).
Theoretical analysis and experimental results clearly demonstrate that DNAOL
not only achieves better or at least competitive classification accuracies
compared with state-of-the-art algorithms but also dramatically reduces the
time complexities of both the training and testing phases.
Comment: IEEE TIP Accepted
Exact Rate-Distortion in Autoencoders via Echo Noise
Compression is at the heart of effective representation learning. However,
lossy compression is typically achieved through simple parametric models like
Gaussian noise to preserve analytic tractability, and the limitations this
imposes on learning are largely unexplored. Further, the Gaussian prior
assumptions in models such as variational autoencoders (VAEs) provide only an
upper bound on the compression rate in general. We introduce a new noise
channel, \emph{Echo noise}, that admits a simple, exact expression for mutual
information for arbitrary input distributions. The noise is constructed in a
data-driven fashion that does not require restrictive distributional
assumptions. With its complex encoding mechanism and exact rate regularization,
Echo leads to improved bounds on log-likelihood and dominates beta-VAEs
across the achievable range of rate-distortion trade-offs. Further, we show
that Echo noise can outperform flow-based methods without the need to train
additional distributional transformations.
Comment: NeurIPS 2019; updated Gaussian baseline results, added disentanglement