4,090 research outputs found
Patterns of Scalable Bayesian Inference
Datasets are growing not just in size but in complexity, creating a demand
for rich models and quantification of uncertainty. Bayesian methods are an
excellent fit for this demand, but scaling Bayesian inference is a challenge.
In response to this challenge, there has been considerable recent work based on
varying assumptions about model structure, underlying computational resources,
and the importance of asymptotic correctness. As a result, there is a zoo of
ideas with few clear overarching principles.
In this paper, we seek to identify unifying principles, patterns, and
intuitions for scaling Bayesian inference. We review existing work on utilizing
modern computing resources with both MCMC and variational approximation
techniques. From this taxonomy of ideas, we characterize the general principles
that have proven successful for designing scalable inference procedures and
comment on the path forward
Copula-like Variational Inference
This paper considers a new family of variational distributions motivated by
Sklar's theorem. This family is based on new copula-like densities on the
hypercube with non-uniform marginals which can be sampled efficiently, i.e.
with a complexity linear in the dimension of state space. Then, the proposed
variational densities that we suggest can be seen as arising from these
copula-like densities used as base distributions on the hypercube with Gaussian
quantile functions and sparse rotation matrices as normalizing flows. The
latter correspond to a rotation of the marginals with complexity . We provide some empirical evidence that such a variational family can
also approximate non-Gaussian posteriors and can be beneficial compared to
Gaussian approximations. Our method performs largely comparably to
state-of-the-art variational approximations on standard regression and
classification benchmarks for Bayesian Neural Networks.Comment: 33rd Conference on Neural Information Processing Systems (NeurIPS
2019), Vancouver, Canad
Hierarchical Implicit Models and Likelihood-Free Variational Inference
Implicit probabilistic models are a flexible class of models defined by a
simulation process for data. They form the basis for theories which encompass
our understanding of the physical world. Despite this fundamental nature, the
use of implicit models remains limited due to challenges in specifying complex
latent structure in them, and in performing inferences in such models with
large data sets. In this paper, we first introduce hierarchical implicit models
(HIMs). HIMs combine the idea of implicit densities with hierarchical Bayesian
modeling, thereby defining models via simulators of data with rich hidden
structure. Next, we develop likelihood-free variational inference (LFVI), a
scalable variational inference algorithm for HIMs. Key to LFVI is specifying a
variational family that is also implicit. This matches the model's flexibility
and allows for accurate approximation of the posterior. We demonstrate diverse
applications: a large-scale physical simulator for predator-prey populations in
ecology; a Bayesian generative adversarial network for discrete data; and a
deep implicit model for text generation.Comment: Appears in Neural Information Processing Systems, 201
The Deep Weight Prior
Bayesian inference is known to provide a general framework for incorporating
prior knowledge or specific properties into machine learning models via
carefully choosing a prior distribution. In this work, we propose a new type of
prior distributions for convolutional neural networks, deep weight prior (DWP),
that exploit generative models to encourage a specific structure of trained
convolutional filters e.g., spatial correlations of weights. We define DWP in
the form of an implicit distribution and propose a method for variational
inference with such type of implicit priors. In experiments, we show that DWP
improves the performance of Bayesian neural networks when training data are
limited, and initialization of weights with samples from DWP accelerates
training of conventional convolutional neural networks.Comment: TL;DR: The deep weight prior learns a generative model for kernels of
convolutional neural networks, that acts as a prior distribution while
training on new dataset
- …