Channel-Driven Monte Carlo Sampling for Bayesian Distributed Learning in Wireless Data Centers
Conventional frequentist learning, as assumed by existing federated learning
protocols, is limited in its ability to quantify uncertainty, incorporate prior
knowledge, guide active learning, and enable continual learning. Bayesian
learning provides a principled approach to address all these limitations, at
the cost of an increase in computational complexity. This paper studies
distributed Bayesian learning in a wireless data center setting encompassing a
central server and multiple distributed workers. Prior work on wireless
distributed learning has focused exclusively on frequentist learning, and has
introduced the idea of leveraging uncoded transmission to enable "over-the-air"
computing. Unlike frequentist learning, Bayesian learning aims at evaluating
approximations or samples from a global posterior distribution in the model
parameter space. This work investigates for the first time the design of
distributed one-shot, or "embarrassingly parallel", Bayesian learning protocols
in wireless data centers via consensus Monte Carlo (CMC). Uncoded transmission
is introduced not only as a way to implement "over-the-air" computing, but also
as a mechanism to deploy channel-driven MC sampling: Rather than treating
channel noise as a nuisance to be mitigated, channel-driven sampling utilizes
channel noise as an integral part of the MC sampling process. A simple wireless
CMC scheme is first proposed that is asymptotically optimal under Gaussian
local posteriors. Then, for arbitrary local posteriors, a variational
optimization strategy is introduced. Simulation results demonstrate that, if
properly accounted for, channel noise can indeed contribute to MC sampling and
does not necessarily decrease the accuracy level.
Comment: Under Revision
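To make the idea of channel noise contributing to sampling concrete, here is a minimal toy sketch for scalar Gaussian local posteriors. It is not the scheme from the paper: the shrink factor `rho`, the function names, and the unit-gain additive Gaussian channel without power constraints are all assumptions made for illustration. Each worker draws from a narrowed local posterior, the channel superposes the precision-weighted samples, and the noise supplies the variance that was held back.

```python
import math
import random

def global_gaussian(means, variances):
    """Product of Gaussian local posteriors N(m_k, v_k): returns (mean, variance)."""
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    mean = sum(w * m for w, m in zip(precisions, means)) / total
    return mean, 1.0 / total

def shrink_factor(variances, noise_var):
    """Hypothetical factor rho: workers sample from N(m_k, rho * v_k) so that
    channel noise of variance noise_var supplies the remaining spread."""
    total = sum(1.0 / v for v in variances)
    rho = 1.0 - noise_var / total
    if rho < 0.0:
        raise ValueError("channel noise too strong for this toy scheme")
    return rho

def channel_driven_sample(means, variances, noise_var, rng):
    """One server-side sample built from the superposed uncoded transmissions."""
    rho = shrink_factor(variances, noise_var)
    total = sum(1.0 / v for v in variances)
    # Each worker transmits its precision-weighted local sample; the channel adds them.
    superposed = sum(
        rng.gauss(m, math.sqrt(rho * v)) / v for m, v in zip(means, variances)
    )
    # Additive Gaussian channel noise (assumed unit channel gains).
    received = superposed + rng.gauss(0.0, math.sqrt(noise_var))
    return received / total
```

Under these assumptions the server output has the same mean and variance as the product of the local Gaussians, so the noise acts as part of the sampler rather than as an error term. When the noise variance exceeds the total precision, this toy construction has no valid `rho` and raises an error.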
Practical bounds on the error of Bayesian posterior approximations: A nonasymptotic approach
Bayesian inference typically requires the computation of an approximation to
the posterior distribution. An important requirement for an approximate
Bayesian inference algorithm is to output high-accuracy posterior mean and
uncertainty estimates. Classical Monte Carlo methods, particularly Markov Chain
Monte Carlo, remain the gold standard for approximate Bayesian inference
because they have a robust finite-sample theory and reliable convergence
diagnostics. However, alternative methods, which are more scalable or apply to
problems where Markov Chain Monte Carlo cannot be used, lack the same
finite-data approximation theory and tools for evaluating their accuracy. In
this work, we develop a flexible new approach to bounding the error of mean and
uncertainty estimates of scalable inference algorithms. Our strategy is to
control the estimation errors in terms of Wasserstein distance, then bound the
Wasserstein distance via a generalized notion of Fisher distance. Unlike
computing the Wasserstein distance, which requires access to the normalized
posterior distribution, the Fisher distance is tractable to compute because it
requires access only to the gradient of the log posterior density. We
demonstrate the usefulness of our Fisher distance approach by deriving bounds
on the Wasserstein error of the Laplace approximation and Hilbert coresets. We
anticipate that our approach will be applicable to many other approximate
inference methods such as the integrated Laplace approximation, variational
inference, and approximate Bayesian computation.
Comment: 22 pages, 2 figures
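The tractability claim can be illustrated with a short sketch that estimates a Fisher-type distance from score functions alone, so no normalizing constant is ever needed. The definition used here (root mean square of the score difference over a set of draws) is a simplified assumption; the paper's generalized notion, its reference distribution, and the constants linking it to Wasserstein distance are not reproduced. The function names are hypothetical.

```python
import math
import random

def gaussian_score(mean, var):
    """Return the score function x -> d/dx log N(x; mean, var)."""
    return lambda x: -(x - mean) / var

def fisher_distance(score_p, score_q, draws):
    """Monte Carlo estimate of the L2 norm of the score difference over `draws`."""
    sq = [(score_p(x) - score_q(x)) ** 2 for x in draws]
    return math.sqrt(sum(sq) / len(sq))

rng = random.Random(0)
draws = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # reference draws (assumed)
target = gaussian_score(0.0, 1.0)   # stands in for the exact posterior
approx = gaussian_score(0.5, 1.0)   # stands in for the approximation
d = fisher_distance(target, approx, draws)
```

For two unit-variance Gaussians the score difference is constant and equal to the mean shift, so `d` is 0.5 up to rounding, which coincides with the 2-Wasserstein distance between these two distributions.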
PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference
Generalized linear models (GLMs) -- such as logistic regression, Poisson
regression, and robust regression -- provide interpretable models for diverse
data types. Probabilistic approaches, particularly Bayesian ones, allow
coherent estimates of uncertainty, incorporation of prior information, and
sharing of power across experiments via hierarchical models. In practice,
however, the approximate Bayesian methods necessary for inference have either
failed to scale to large data sets or failed to provide theoretical guarantees
on the quality of inference. We propose a new approach based on constructing
polynomial approximate sufficient statistics for GLMs (PASS-GLM). We
demonstrate that our method admits a simple algorithm as well as trivial
streaming and distributed extensions that do not compound error across
computations. We provide theoretical guarantees on the quality of point (MAP)
estimates, the approximate posterior, and posterior mean and uncertainty
estimates. We validate our approach empirically in the case of logistic
regression using a quadratic approximation and show competitive performance
with stochastic gradient descent, MCMC, and the Laplace approximation in terms
of speed and multiple measures of accuracy -- including on an advertising data
set with 40 million data points and 20,000 covariates.
Comment: In Proceedings of the 31st Annual Conference on Neural Information
Processing Systems (NIPS 2017). v3: corrected typos in Appendix
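The streaming and distributed claims follow from the fact that a polynomial surrogate of the log-likelihood depends on the data only through sums. A minimal sketch for logistic regression with labels in {-1, +1} and a quadratic surrogate is given below. The coefficients are a second-order Taylor expansion around zero, chosen here purely for simplicity; the abstract does not specify how the paper picks its polynomial, and the function names are hypothetical.

```python
import math

# Second-order Taylor coefficients of phi(s) = -log(1 + exp(-s)) around s = 0.
B0, B1, B2 = -math.log(2.0), 0.5, -0.125

def summarize(xs, ys):
    """Sufficient statistics of the quadratic surrogate: count, sum y*x, sum x x^T."""
    d = len(xs[0])
    n, sum_yx = 0, [0.0] * d
    sum_xx = [[0.0] * d for _ in range(d)]
    for x, y in zip(xs, ys):  # labels y are assumed to be in {-1, +1}
        n += 1
        for i in range(d):
            sum_yx[i] += y * x[i]
            for j in range(d):
                sum_xx[i][j] += x[i] * x[j]
    return n, sum_yx, sum_xx

def merge(a, b):
    """Combine statistics from two shards or stream batches by addition."""
    n = a[0] + b[0]
    sum_yx = [u + v for u, v in zip(a[1], b[1])]
    sum_xx = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[2], b[2])]
    return n, sum_yx, sum_xx

def approx_loglik(stats, theta):
    """Quadratic surrogate log-likelihood evaluated from the statistics alone."""
    n, sum_yx, sum_xx = stats
    d = len(theta)
    lin = sum(t * s for t, s in zip(theta, sum_yx))
    quad = sum(theta[i] * sum_xx[i][j] * theta[j] for i in range(d) for j in range(d))
    return B0 * n + B1 * lin + B2 * quad
```

Because `merge` is plain addition, combining shards introduces no error beyond that of the polynomial approximation itself, and the raw data can be discarded once summarized.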
Global consensus Monte Carlo
To conduct Bayesian inference with large data sets, it is often convenient or
necessary to distribute the data across multiple machines. We consider a
likelihood function expressed as a product of terms, each associated with a
subset of the data. Inspired by global variable consensus optimisation, we
introduce an instrumental hierarchical model associating auxiliary statistical
parameters with each term, which are conditionally independent given the
top-level parameters. One of these top-level parameters controls the
unconditional strength of association between the auxiliary parameters. This
model leads to a distributed MCMC algorithm on an extended state space yielding
approximations of posterior expectations. A trade-off between computational
tractability and fidelity to the original model can be controlled by changing
the association strength in the instrumental model. We further propose the use
of an SMC sampler with a sequence of association strengths, allowing both the
automatic determination of appropriate strengths and the application of a bias
correction technique. In contrast to similar distributed Monte Carlo
algorithms, this approach requires few distributional assumptions. The
performance of the algorithms is illustrated with a number of simulated
examples.
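A toy sketch of sampling on the extended state space may help fix ideas. It assumes a scalar parameter, a Gaussian prior, Gaussian per-shard terms, and a Gaussian link of variance `lam` between the top-level parameter and each auxiliary parameter; all of these choices, the Gibbs updates, and the function names are illustrative assumptions rather than the algorithm as specified in the paper.

```python
import math
import random

def gibbs_instrumental(local_means, local_vars, lam, prior_var, n_iter, rng):
    """Gibbs sampler for theta and auxiliary z_j in a toy Gaussian instrumental model.

    Assumed model: theta ~ N(0, prior_var), z_j | theta ~ N(theta, lam), and
    shard j contributes a Gaussian term N(z_j; local_means[j], local_vars[j]).
    """
    b = len(local_means)
    theta, draws = 0.0, []
    for _ in range(n_iter):
        # Each z_j depends only on theta and shard j, so it could run on its own machine.
        zs = []
        for m, v in zip(local_means, local_vars):
            prec = 1.0 / v + 1.0 / lam
            zs.append(rng.gauss((m / v + theta / lam) / prec, math.sqrt(1.0 / prec)))
        # Central update of theta given all auxiliary parameters.
        prec = 1.0 / prior_var + b / lam
        theta = rng.gauss((sum(zs) / lam) / prec, math.sqrt(1.0 / prec))
        draws.append(theta)
    return draws

def instrumental_marginal(local_means, local_vars, lam, prior_var):
    """Closed-form mean and variance of theta under the toy instrumental model."""
    prec = 1.0 / prior_var + sum(1.0 / (v + lam) for v in local_vars)
    mean = sum(m / (v + lam) for m, v in zip(local_means, local_vars)) / prec
    return mean, 1.0 / prec
```

In this toy setting, shrinking `lam` moves the marginal of theta toward the original posterior, while the auxiliary parameters become more tightly coupled to theta and the chain mixes more slowly, which mirrors the trade-off between fidelity and tractability described in the abstract.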