Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks
Effective training of deep neural networks suffers from two main issues. The
first is that the parameter spaces of these models exhibit pathological
curvature. Recent methods address this problem by using adaptive
preconditioning for Stochastic Gradient Descent (SGD). These methods improve
convergence by adapting to the local geometry of parameter space. A second
issue is overfitting, which is typically addressed by early stopping. However,
recent work has demonstrated that Bayesian model averaging mitigates this
problem. The posterior can be sampled by using Stochastic Gradient Langevin
Dynamics (SGLD). However, the rapidly changing curvature renders default SGLD
methods inefficient. Here, we propose combining adaptive preconditioners with
SGLD. In support of this idea, we establish theoretical results on asymptotic
convergence and predictive risk. We also provide empirical results for Logistic
Regression, Feedforward Neural Nets, and Convolutional Neural Nets,
demonstrating that our preconditioned SGLD method achieves state-of-the-art
performance on these models.

Comment: AAAI 201
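The core idea is to pair an adaptive diagonal preconditioner with the usual SGLD update so that both the gradient step and the injected Gaussian noise are rescaled by the local geometry. The snippet below is a minimal sketch of one such update, assuming an RMSProp-style preconditioner; the function and parameter names are illustrative, and any correction term accounting for the parameter dependence of the preconditioner is omitted for brevity.

```python
import numpy as np

def psgld_step(theta, grad_log_post, v, step_size=1e-3, alpha=0.99,
               lam=1e-5, rng=None):
    """One preconditioned SGLD update (illustrative sketch).

    theta:          current parameter vector
    grad_log_post:  stochastic gradient of the log posterior at theta
    v:              running average of squared gradients (RMSProp state)
    """
    rng = np.random.default_rng() if rng is None else rng

    # Update the RMSProp statistic and form a diagonal preconditioner
    # that adapts to the local curvature of the parameter space.
    v = alpha * v + (1.0 - alpha) * grad_log_post ** 2
    G = 1.0 / (lam + np.sqrt(v))

    # Preconditioned gradient step plus Gaussian noise whose variance is
    # scaled by the same preconditioner, so the chain still explores the
    # posterior rather than collapsing to a point estimate.
    noise = rng.normal(size=theta.shape) * np.sqrt(step_size * G)
    theta = theta + 0.5 * step_size * G * grad_log_post + noise
    return theta, v
```

In this sketch the only state beyond the parameters is the vector v, the same statistic RMSProp maintains, so each sampling step costs little more than a plain adaptive SGD step.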
On the Anatomy of MCMC-Based Maximum Likelihood Learning of Energy-Based Models
This study investigates the effects of Markov chain Monte Carlo (MCMC)
sampling in unsupervised Maximum Likelihood (ML) learning. Our attention is
restricted to the family of unnormalized probability densities for which the
negative log density (or energy function) is a ConvNet. We find that many of
the techniques used to stabilize training in previous studies are not
necessary. ML learning with a ConvNet potential requires only a few
hyper-parameters and no regularization. Using this minimal framework, we
identify a variety of ML learning outcomes that depend solely on the
implementation of MCMC sampling.
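Concretely, for an unnormalized density p(x) proportional to exp(-E(x)) with a ConvNet energy E, the ML gradient contrasts the energy gradient on observed data with the energy gradient on MCMC samples from the current model. The snippet below is a minimal PyTorch-style sketch of this standard estimator; energy_net and the other names are placeholders rather than the authors' code.

```python
import torch

def ml_surrogate_loss(energy_net, data_batch, mcmc_samples):
    """Surrogate loss whose gradient matches the ML gradient (sketch).

    For p(x) proportional to exp(-E(x)), the gradient of the negative
    log-likelihood w.r.t. the network parameters is
        grad E(data) - grad E(model samples),
    so minimizing mean E(data) - mean E(samples) performs ML learning
    when the samples come from MCMC on the current model.
    """
    pos_energy = energy_net(data_batch).mean()              # lower energy of data
    neg_energy = energy_net(mcmc_samples.detach()).mean()   # raise energy of samples
    return pos_energy - neg_energy
```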
On the one hand, we show that it is easy to train an energy-based model that can
sample realistic images with short-run Langevin dynamics. ML learning can be
effective and stable even when MCMC samples have much higher energy than true
steady-state samples throughout training. Based on this insight, we introduce an
ML method with purely noise-initialized MCMC and high-quality short-run synthesis
at the same computational budget as ML with informative MCMC initialization such
as Contrastive Divergence (CD) or Persistent Contrastive Divergence (PCD). Unlike
previous models, our energy model can obtain realistic high-diversity samples
from a noise signal after training.
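The noise-initialized sampler referenced above can be sketched as a short chain of Langevin updates starting from Gaussian noise. The step size, noise scale, and number of steps below are illustrative placeholders, not the paper's settings.

```python
import torch

def short_run_langevin(energy_net, batch_shape, n_steps=100,
                       step_size=1.0, noise_scale=1.0):
    """Short-run Langevin sampling from a pure-noise initialization (sketch).

    Each step follows x <- x - (step_size / 2) * grad_x E(x) + noise,
    with Gaussian noise of standard deviation noise_scale * sqrt(step_size).
    """
    x = torch.randn(batch_shape)  # chain starts from noise; no data needed
    for _ in range(n_steps):
        x = x.detach().requires_grad_(True)
        grad = torch.autograd.grad(energy_net(x).sum(), x)[0]
        x = (x - 0.5 * step_size * grad
             + noise_scale * (step_size ** 0.5) * torch.randn_like(x))
    return x.detach()
```

In this setup the same short chains supply the negative-phase samples during training and are reused to synthesize images afterward, which is what allows generation directly from a noise signal without informative initialization.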
On the other hand, ConvNet potentials learned with non-convergent MCMC do not
have a valid steady-state and cannot be considered approximate unnormalized
densities of the training data because long-run MCMC samples differ greatly
from observed images. We show that it is much harder to train a ConvNet
potential to learn a steady-state over realistic images. To our knowledge,
long-run MCMC samples of all previous models lose the realism of short-run
samples. With correct tuning of Langevin noise, we train the first ConvNet
potentials for which long-run and steady-state MCMC samples are realistic
images.

Comment: Code available at: https://github.com/point0bar1/ebm-anatom