Approximate Decentralized Bayesian Inference
This paper presents an approximate method for performing Bayesian inference
in models with conditional independence over a decentralized network of
learning agents. The method first employs variational inference on each
individual learning agent to generate a local approximate posterior; the agents
then transmit their local posteriors to other agents in the network; and finally
each agent combines its set of received local posteriors. The key insight in
this work is that, for many Bayesian models, approximate inference schemes
destroy symmetry and dependencies in the model that are crucial to the correct
application of Bayes' rule when combining the local posteriors. The proposed
method addresses this issue by including an additional optimization step in the
combination procedure that accounts for these broken dependencies. Experiments
on synthetic and real data demonstrate that the decentralized method provides
advantages in computational performance and predictive test likelihood over
previous batch and distributed methods.
Comment: This paper was presented at UAI 2014. Please use the following BibTeX
citation: @inproceedings{Campbell14_UAI, Author = {Trevor Campbell and
Jonathan P. How}, Title = {Approximate Decentralized Bayesian Inference},
Booktitle = {Uncertainty in Artificial Intelligence (UAI)}, Year = {2014}
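As an illustration of the combination step the abstract describes, the following is a minimal sketch of the naive Bayes-rule combination of local posteriors. It assumes a conjugate Gaussian mean model in which each agent's local posterior is exact, so no symmetry is broken; it does not implement the paper's additional optimization step. The function names and the model are hypothetical choices made for this example.

```python
import numpy as np

def gaussian_posterior(y, prior_mean, prior_var, noise_var):
    """Exact posterior (mean, var) for theta, given y_i ~ N(theta, noise_var)
    and prior theta ~ N(prior_mean, prior_var)."""
    prec = 1.0 / prior_var + len(y) / noise_var
    mean = (prior_mean / prior_var + np.sum(y) / noise_var) / prec
    return mean, 1.0 / prec

def combine_local_posteriors(local, prior_mean, prior_var):
    """Naive combination: p(theta | all data) is proportional to
    prior^(1 - n) times the product of the n local posteriors.
    For Gaussians this is a sum of natural parameters minus (n - 1) priors."""
    n = len(local)
    prec = sum(1.0 / v for _, v in local) - (n - 1) / prior_var
    prec_mean = sum(m / v for m, v in local) - (n - 1) * prior_mean / prior_var
    return prec_mean / prec, 1.0 / prec

# Assumed toy setup: three agents, each holding one shard of the data.
rng = np.random.default_rng(0)
y = rng.normal(1.5, 1.0, size=90)
shards = np.array_split(y, 3)
local = [gaussian_posterior(s, 0.0, 10.0, 1.0) for s in shards]

combined = combine_local_posteriors(local, 0.0, 10.0)
batch = gaussian_posterior(y, 0.0, 10.0, 1.0)  # centralized reference
```

In this conjugate case the combined posterior matches the batch posterior. The abstract's point is that this agreement can fail when the local posteriors are approximate and the model has symmetries.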
Practical bounds on the error of Bayesian posterior approximations: A nonasymptotic approach
Bayesian inference typically requires the computation of an approximation to
the posterior distribution. An important requirement for an approximate
Bayesian inference algorithm is to output high-accuracy posterior mean and
uncertainty estimates. Classical Monte Carlo methods, particularly Markov Chain
Monte Carlo, remain the gold standard for approximate Bayesian inference
because they have a robust finite-sample theory and reliable convergence
diagnostics. However, alternative methods, which are more scalable or apply to
problems where Markov Chain Monte Carlo cannot be used, lack the same
finite-data approximation theory and tools for evaluating their accuracy. In
this work, we develop a flexible new approach to bounding the error of mean and
uncertainty estimates of scalable inference algorithms. Our strategy is to
control the estimation errors in terms of Wasserstein distance, then bound the
Wasserstein distance via a generalized notion of Fisher distance. Unlike
computing the Wasserstein distance, which requires access to the normalized
posterior distribution, the Fisher distance is tractable to compute because it
requires access only to the gradient of the log posterior density. We
demonstrate the usefulness of our Fisher distance approach by deriving bounds
on the Wasserstein error of the Laplace approximation and Hilbert coresets. We
anticipate that our approach will be applicable to many other approximate
inference methods such as the integrated Laplace approximation, variational
inference, and approximate Bayesian computation.
Comment: 22 pages, 2 figures
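To make concrete why a Fisher-type distance needs only gradients of the log density, here is a minimal sketch. It assumes the plain L2 form, the square root of the expected squared norm of the score difference under a reference distribution, which may differ from the generalized definition used in the paper. The helper names and the Gaussian test case are hypothetical.

```python
import numpy as np

def fisher_distance(score_p, score_q, samples):
    """Monte Carlo estimate of sqrt(E ||grad log p(x) - grad log q(x)||^2),
    with the expectation taken over the rows of `samples`."""
    diff = score_p(samples) - score_q(samples)
    return np.sqrt(np.mean(np.sum(diff ** 2, axis=-1)))

def gaussian_score(mean, var):
    """Score (gradient of the log density) of an isotropic Gaussian.
    The normalizing constant never appears, so an unnormalized density suffices."""
    return lambda x: -(x - mean) / var

# Assumed example: target N(0.3, 1), approximation N(0, 1),
# samples drawn from the approximation.
rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=(5000, 1))
d = fisher_distance(gaussian_score(0.3, 1.0), gaussian_score(0.0, 1.0), samples)
```

With equal variances the score difference is constant, so the estimate equals the mean shift divided by the variance regardless of which samples are drawn.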
Truncated Random Measures
Completely random measures (CRMs) and their normalizations are a rich source
of Bayesian nonparametric priors. Examples include the beta, gamma, and
Dirichlet processes. In this paper we detail two major classes of sequential
CRM representations---series representations and superposition
representations---within which we organize both novel and existing sequential
representations that can be used for simulation and posterior inference. These
two classes and their constituent representations subsume existing ones that
have previously been developed in an ad hoc manner for specific processes.
Since a complete infinite-dimensional CRM cannot be used explicitly for
computation, sequential representations are often truncated for tractability.
We provide truncation error analyses for each type of sequential
representation, as well as their normalized versions, thereby generalizing and
improving upon existing truncation error bounds in the literature. We analyze
the computational complexity of the sequential representations, which in
conjunction with our error bounds allows us to directly compare representations
and discuss their relative efficiency. We include numerous applications of our
theoretical results to commonly-used (normalized) CRMs, demonstrating that our
results enable a straightforward representation and analysis of CRMs that has
not previously been available in a Bayesian nonparametric context.
Comment: To appear in Bernoulli; 58 pages, 3 figures
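As one familiar instance of truncating a sequential representation, the sketch below draws the first K weights of a Dirichlet process via stick-breaking and reports the mass left out by the truncation. Stick-breaking is used here as an assumed illustrative example; the abstract does not say which of its two classes each representation belongs to, and the function name is hypothetical.

```python
import numpy as np

def truncated_stick_breaking(alpha, K, rng):
    """First K stick-breaking weights of a Dirichlet process with
    concentration alpha, plus the leftover mass dropped by truncation."""
    v = rng.beta(1.0, alpha, size=K)            # stick proportions
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)))
    weights = v * remaining[:-1]                # w_k = v_k * prod_{j<k} (1 - v_j)
    return weights, remaining[-1]

rng = np.random.default_rng(0)
alpha, K = 2.0, 50
weights, leftover = truncated_stick_breaking(alpha, K, rng)

# E[1 - v_k] = alpha / (1 + alpha), so the expected leftover mass
# decays geometrically in the truncation level K.
expected_leftover = (alpha / (1.0 + alpha)) ** K
```

The leftover mass is a simple proxy for truncation error: it is the total weight of the atoms that the truncated representation ignores.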
The computational asymptotics of Gaussian variational inference
Variational inference is a popular alternative to Markov chain Monte Carlo
methods that constructs a Bayesian posterior approximation by minimizing a
discrepancy to the true posterior within a pre-specified family. This converts
Bayesian inference into an optimization problem, enabling the use of simple and
scalable stochastic optimization algorithms. However, a key limitation of
variational inference is that the optimal approximation is typically not
tractable to compute; even in simple settings the problem is nonconvex. Thus,
recently developed statistical guarantees -- which all involve the (data)
asymptotic properties of the optimal variational distribution -- are not
reliably obtained in practice. In this work, we provide two major
contributions: a theoretical analysis of the asymptotic convexity properties of
variational inference in the popular setting with a Gaussian family; and
consistent stochastic variational inference (CSVI), an algorithm that exploits
these properties to find the optimal approximation in the asymptotic regime.
CSVI consists of a tractable initialization procedure that finds the local
basin of the optimal solution, and a scaled gradient descent algorithm that
stays locally confined to that basin. Experiments on nonconvex synthetic and
real-data examples show that compared with standard stochastic gradient
descent, CSVI improves the likelihood of obtaining the globally optimal
posterior approximation.
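For context on the baseline the abstract compares against, here is a minimal sketch of Gaussian variational inference fitted by plain stochastic gradient ascent on the ELBO with reparameterization gradients. It is not CSVI: it has neither the initialization procedure nor the scaled gradient step. The one-dimensional Gaussian target, the step-size schedule, and all names are assumptions chosen so that the optimum is known.

```python
import numpy as np

def target_score(x):
    """Gradient of the log density of N(2, 0.5^2), a stand-in 'posterior'."""
    return -(x - 2.0) / 0.25

def gaussian_vi(score, steps=5000, batch=32, seed=0):
    """Fit q = N(mu, sigma^2) by stochastic gradient ascent on the ELBO.
    Uses x = mu + sigma * eps, so only the target's score is needed."""
    rng = np.random.default_rng(seed)
    mu, log_sigma = 0.0, 0.0
    for t in range(1, steps + 1):
        eps = rng.standard_normal(batch)
        sigma = np.exp(log_sigma)
        g = score(mu + sigma * eps)
        grad_mu = np.mean(g)
        # The +1 is the gradient of the Gaussian entropy w.r.t. log sigma.
        grad_log_sigma = np.mean(g * eps) * sigma + 1.0
        lr = 0.05 / np.sqrt(t)                  # decaying step size (assumed)
        mu += lr * grad_mu
        log_sigma += lr * grad_log_sigma
    return mu, np.exp(log_sigma)

mu_hat, sigma_hat = gaussian_vi(target_score)
```

Because this target is itself Gaussian, the objective is well behaved and plain SGD recovers it; the nonconvexity the abstract discusses arises for non-Gaussian posteriors, where the starting point matters.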