On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models
This study investigates the effects of Markov chain Monte Carlo (MCMC)
sampling in unsupervised Maximum Likelihood (ML) learning. Our attention is
restricted to the family of unnormalized probability densities for which the
negative log density (or energy function) is a ConvNet. We find that many of
the techniques used to stabilize training in previous studies are not
necessary. ML learning with a ConvNet potential requires only a few
hyper-parameters and no regularization. Using this minimal framework, we
identify a variety of ML learning outcomes that depend solely on the
implementation of MCMC sampling.
On one hand, we show that it is easy to train an energy-based model which can
sample realistic images with short-run Langevin. ML can be effective and stable
even when MCMC samples have much higher energy than true steady-state samples
throughout training. Based on this insight, we introduce an ML method with
purely noise-initialized MCMC, high-quality short-run synthesis, and the same
budget as ML with informative MCMC initialization such as contrastive divergence
(CD) or persistent contrastive divergence (PCD). Unlike
previous models, our energy model can obtain realistic high-diversity samples
from a noise signal after training.
On the other hand, ConvNet potentials learned with non-convergent MCMC do not
have a valid steady-state and cannot be considered approximate unnormalized
densities of the training data because long-run MCMC samples differ greatly
from observed images. We show that it is much harder to train a ConvNet
potential to learn a steady-state over realistic images. To our knowledge,
long-run MCMC samples of all previous models lose the realism of short-run
samples. With correct tuning of Langevin noise, we train the first ConvNet
potentials for which long-run and steady-state MCMC samples are realistic
images.
Comment: Code available at: https://github.com/point0bar1/ebm-anatom
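For readers who want the minimal recipe spelled out, the sketch below is a schematic PyTorch rendering of maximum-likelihood learning with purely noise-initialized short-run Langevin sampling; the architecture of energy_net, the chain length, the step size, and the noise scale are placeholder assumptions rather than the paper's settings.

```python
import torch

def langevin_sample(energy_net, x, n_steps=100, step_size=1.0, noise_scale=0.01):
    """Short-run Langevin dynamics on images x (noise-initialized negative samples)."""
    x = x.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        energy = energy_net(x).sum()
        grad, = torch.autograd.grad(energy, x)
        # gradient descent on the energy plus injected Gaussian noise
        x = x - 0.5 * step_size * grad + noise_scale * torch.randn_like(x)
        x = x.detach().requires_grad_(True)
    return x.detach()

def ml_update(energy_net, optimizer, data_batch):
    """One maximum-likelihood step: lower the energy of data, raise the energy of samples."""
    noise_init = torch.rand_like(data_batch)          # purely noise-initialized chains
    samples = langevin_sample(energy_net, noise_init)
    loss = energy_net(data_batch).mean() - energy_net(samples).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Minimizing this loss lowers the energy of observed images and raises the energy of synthesized samples, which is the usual Monte Carlo estimate of the maximum-likelihood gradient; whether the chains approach the steady state is governed entirely by the number of steps and the noise scale, which is the distinction the abstract draws.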
Learning Energy-Based Model with Variational Auto-Encoder as Amortized Sampler
Due to the intractable partition function, training energy-based models
(EBMs) by maximum likelihood requires Markov chain Monte Carlo (MCMC) sampling
to approximate the gradient of the Kullback-Leibler divergence between data and
model distributions. However, it is non-trivial to sample from an EBM because
of the difficulty of mixing between modes. In this paper, we propose to learn a
variational auto-encoder (VAE) to initialize the finite-step MCMC, such as
Langevin dynamics derived from the energy function, for efficient
amortized sampling of the EBM. With these amortized MCMC samples, the EBM can
be trained by maximum likelihood, which follows an "analysis by synthesis"
scheme, while the variational auto-encoder learns from these MCMC samples via
variational Bayes. We call this joint training algorithm variational MCMC
teaching, in which the VAE chases the EBM toward the data distribution. We
interpret the learning algorithm as a dynamic alternating projection in the
context of information geometry. Our proposed models can generate samples
comparable to GANs and EBMs. Additionally, we demonstrate in supervised
conditional learning experiments that our models can learn effective
probabilistic distributions.
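A schematic PyTorch sketch of one round of this joint training is given below; it illustrates the idea rather than the authors' implementation, and the assumed vae interface (a latent_dim attribute, a decode method, and a forward pass returning the reconstruction, posterior mean, and log-variance) is a placeholder.

```python
import torch
import torch.nn.functional as F

def joint_step(vae, ebm, vae_opt, ebm_opt, x, n_langevin=30, step=0.01, noise=0.005):
    """One round of 'variational MCMC teaching' (schematic; interfaces are placeholders)."""
    # 1) The VAE proposes initial samples (amortized initialization of MCMC).
    z = torch.randn(x.size(0), vae.latent_dim, device=x.device)
    x_init = vae.decode(z)

    # 2) Finite-step Langevin dynamics driven by the EBM refines the proposals.
    x_ref = x_init.clone().detach().requires_grad_(True)
    for _ in range(n_langevin):
        grad, = torch.autograd.grad(ebm(x_ref).sum(), x_ref)
        x_ref = (x_ref - 0.5 * step * grad + noise * torch.randn_like(x_ref)
                 ).detach().requires_grad_(True)
    x_ref = x_ref.detach()

    # 3) EBM: maximum-likelihood update using the refined (amortized) MCMC samples.
    ebm_loss = ebm(x).mean() - ebm(x_ref).mean()
    ebm_opt.zero_grad(); ebm_loss.backward(); ebm_opt.step()

    # 4) VAE learns from the MCMC samples via variational Bayes (it "chases" the EBM).
    recon, mu, logvar = vae(x_ref)
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.size(0)
    vae_loss = F.mse_loss(recon, x_ref) + kl
    vae_opt.zero_grad(); vae_loss.backward(); vae_opt.step()
    return ebm_loss.item(), vae_loss.item()
```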
A Tale of Two Latent Flows: Learning Latent Space Normalizing Flow with Short-run Langevin Flow for Approximate Inference
We study a normalizing flow in the latent space of a top-down generator
model, in which the normalizing flow model plays the role of the informative
prior model of the generator. We propose to jointly learn the latent space
normalizing flow prior model and the top-down generator model by a Markov chain
Monte Carlo (MCMC)-based maximum likelihood algorithm, where a short-run
Langevin sampling from the intractable posterior distribution is performed to
infer the latent variables for each observed example, so that the parameters of
the normalizing flow prior and the generator can be updated with the inferred
latent variables. We show that, under the scenario of non-convergent short-run
MCMC, the finite-step Langevin dynamics acts as a flow-like approximate inference
model and the learning objective actually follows a perturbation of
maximum likelihood estimation (MLE). We further point out that the learning
framework seeks to (i) match the latent space normalizing flow and the
aggregated posterior produced by the short-run Langevin flow, and (ii) bias the
model from MLE such that the short-run Langevin flow inference is close to the
true posterior. Empirical results of extensive experiments validate the
effectiveness of the proposed latent space normalizing flow model in the tasks
of image generation, image reconstruction, anomaly detection, supervised image
inpainting, and unsupervised image recovery.
Comment: The Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI) 2023
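The sketch below illustrates the two updates described above; it is a simplified PyTorch rendering, not the authors' code. It assumes generator maps latent codes to images, flow_prior exposes a log_prob method (as in common normalizing-flow libraries), and the observation noise sigma, step size, and chain length are illustrative choices.

```python
import torch

def infer_z(generator, flow_prior, x, z0, n_steps=20, step=0.1, noise=0.05, sigma=0.3):
    """Short-run Langevin inference of the latent z for an observed x (schematic)."""
    z = z0.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        recon = generator(z)
        # log posterior up to a constant: log p(x|z) + log p_flow(z)
        log_post = (-((x - recon) ** 2).sum() / (2 * sigma ** 2)
                    + flow_prior.log_prob(z).sum())
        grad, = torch.autograd.grad(log_post, z)
        z = (z + 0.5 * step * grad + noise * torch.randn_like(z)
             ).detach().requires_grad_(True)
    return z.detach()

def mle_step(generator, flow_prior, gen_opt, flow_opt, x, z_dim, sigma=0.3):
    """Update generator and flow prior with latents inferred by short-run Langevin."""
    z0 = torch.randn(x.size(0), z_dim, device=x.device)
    z = infer_z(generator, flow_prior, x, z0, sigma=sigma)
    gen_loss = ((x - generator(z)) ** 2).sum() / (2 * sigma ** 2) / x.size(0)
    flow_loss = -flow_prior.log_prob(z).mean()   # fit the flow to the aggregated posterior
    gen_opt.zero_grad(); gen_loss.backward(); gen_opt.step()
    flow_opt.zero_grad(); flow_loss.backward(); flow_opt.step()
    return gen_loss.item(), flow_loss.item()
```

The flow update corresponds to point (i) above: the flow prior is fit to the latents produced by the short-run Langevin flow, i.e. to the aggregated posterior.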
Recovery from Linear Measurements with Complexity-Matching Universal Signal Estimation
We study the compressed sensing (CS) signal estimation problem where an input
signal is measured via a linear matrix multiplication under additive noise.
While this setup usually assumes sparsity or compressibility in the input
signal during recovery, the signal structure that can be leveraged is often not
known a priori. In this paper, we consider universal CS recovery, where the
statistics of a stationary ergodic signal source are estimated simultaneously
with the signal itself. Inspired by Kolmogorov complexity and minimum
description length, we focus on a maximum a posteriori (MAP) estimation
framework that leverages universal priors to match the complexity of the
source. Our framework can also be applied to general linear inverse problems
where more measurements than in CS might be needed. We provide theoretical
results that support the algorithmic feasibility of universal MAP estimation
using a Markov chain Monte Carlo implementation, which is computationally
challenging. We incorporate techniques that accelerate the algorithm while
providing reconstruction quality comparable to, and in many cases better than,
that of existing algorithms. Experimental results show the promise of universality in
CS, particularly for low-complexity sources that do not exhibit standard
sparsity or compressibility.
Comment: 29 pages, 8 figures
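To convey the flavor of universal MAP estimation via MCMC, here is a toy sketch (not the paper's algorithm) that scores a quantized candidate signal by its data misfit plus an MDL-style complexity proxy, namely the empirical entropy of its symbols, and explores candidates with a single-site Metropolis sampler; the alphabet levels, penalty weight lam, and noise level sigma are illustrative assumptions.

```python
import numpy as np

def empirical_entropy_bits(x_q):
    """MDL-style complexity proxy: total empirical entropy (in bits) of a quantized signal."""
    _, counts = np.unique(x_q, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum() * len(x_q)

def universal_map_mcmc(y, A, levels, sigma=0.1, lam=1.0, n_iter=20000, rng=None):
    """Toy Metropolis sampler targeting exp(-||y - Ax||^2 / (2*sigma^2) - lam*complexity(x))."""
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    x = rng.choice(levels, size=n)

    def neg_log_post(x):
        return np.sum((y - A @ x) ** 2) / (2 * sigma ** 2) + lam * empirical_entropy_bits(x)

    cost = neg_log_post(x)
    best, best_cost = x.copy(), cost
    for _ in range(n_iter):
        i = rng.integers(n)
        prop = x.copy()
        prop[i] = rng.choice(levels)                     # resample one coordinate from the alphabet
        c = neg_log_post(prop)
        if c < cost or rng.random() < np.exp(cost - c):  # Metropolis accept/reject
            x, cost = prop, c
            if cost < best_cost:
                best, best_cost = x.copy(), cost
    return best
```

For instance, universal_map_mcmc(y, A, levels=np.linspace(-1.0, 1.0, 9)) searches over signals quantized to nine amplitude levels, returning the lowest-cost state visited as a crude MAP estimate.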
Explaining the effects of non-convergent sampling in the training of Energy-Based Models
In this paper, we quantify the impact of using non-convergent Markov chains
to train Energy-Based models (EBMs). In particular, we show analytically that
EBMs trained with non-persistent short runs to estimate the gradient can
perfectly reproduce a set of empirical statistics of the data, not at the level
of the equilibrium measure, but through a precise dynamical process. Our
results provide a first-principles explanation for the observations of recent
works proposing the strategy of using short runs starting from random initial
conditions as an efficient way to generate high-quality samples in EBMs, and
lay the groundwork for using EBMs as diffusion models. After explaining this
effect in generic EBMs, we analyze two solvable models in which the effect of
non-convergent sampling on the trained parameters can be described in
detail. Finally, we test these predictions numerically on the Boltzmann
machine.
Comment: 13 pages, 3 figures
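To make the non-convergent regime concrete, the sketch below (an illustration under simplifying assumptions, not the authors' experiments) trains a fully visible Boltzmann machine on +/-1 data while estimating the model statistics with short, non-persistent Gibbs chains restarted from random configurations at every update; the chain length k, the learning rate, and the number of epochs are arbitrary choices.

```python
import numpy as np

def gibbs_sweeps(J, h, s, k, rng):
    """k Gibbs sweeps over +/-1 spins for the energy E(s) = -0.5 s'Js - h's (J symmetric, zero diagonal)."""
    n = len(h)
    for _ in range(k):
        for i in rng.permutation(n):
            field = J[i] @ s + h[i]
            p_up = 1.0 / (1.0 + np.exp(-2.0 * field))
            s[i] = 1.0 if rng.random() < p_up else -1.0
    return s

def train_nonconvergent(data, k=10, lr=0.01, n_epochs=100, seed=0):
    """Train a fully visible Boltzmann machine on +/-1 data (shape: samples x spins)
    using short, non-persistent Gibbs chains started from random configurations."""
    rng = np.random.default_rng(seed)
    n = data.shape[1]
    J = np.zeros((n, n)); h = np.zeros(n)
    data_JJ = data.T @ data / len(data)          # empirical pair correlations
    data_h = data.mean(axis=0)                   # empirical magnetizations
    for _ in range(n_epochs):
        # fresh random initial conditions at every update: non-persistent, short-run chains
        chains = rng.choice([-1.0, 1.0], size=data.shape)
        chains = np.array([gibbs_sweeps(J, h, s, k, rng) for s in chains])
        model_JJ = chains.T @ chains / len(chains)
        model_h = chains.mean(axis=0)
        J += lr * (data_JJ - model_JJ); np.fill_diagonal(J, 0.0)
        h += lr * (data_h - model_h)
    return J, h
```

Because the chains are restarted from noise and run for only k sweeps, the statistics being matched describe the short-run dynamical process rather than the equilibrium measure of the learned energy, which is precisely the regime the abstract analyzes.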