    Federated Stochastic Gradient Langevin Dynamics

    Publisher Copyright: © 2021 37th Conference on Uncertainty in Artificial Intelligence, UAI 2021. All Rights Reserved. Stochastic gradient MCMC methods, such as stochastic gradient Langevin dynamics (SGLD), employ fast but noisy gradient estimates to enable large-scale posterior sampling. Although SGLD extends easily to distributed settings, it suffers from two issues when applied to federated non-IID data. First, the variance of these estimates increases significantly. Second, delaying communication causes the Markov chains to diverge from the true posterior even for very simple models. To alleviate both of these problems, we propose conducive gradients, a simple mechanism that combines local likelihood approximations to correct gradient updates. Notably, conducive gradients are easy to compute, and since we only calculate the approximations once, they incur negligible overhead. We apply conducive gradients to distributed stochastic gradient Langevin dynamics (DSGLD) and call the resulting method federated stochastic gradient Langevin dynamics (FSGLD). We demonstrate that our approach can handle delayed communication rounds, converging to the target posterior in cases where DSGLD fails. We also show that FSGLD outperforms DSGLD for non-IID federated data with experiments on metric learning and neural networks. Peer reviewed
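
    For readers unfamiliar with the base algorithm, the following is a minimal sketch of plain single-machine SGLD, not of DSGLD, FSGLD, or conducive gradients. The toy model (Gaussian data with unknown mean and a Gaussian prior), the function name `sgld_gaussian_mean`, and all parameter values are assumptions chosen for illustration and do not come from the paper.

```python
import random


def sgld_gaussian_mean(data, n_steps=5000, batch_size=10, step_size=1e-3,
                       prior_var=100.0, seed=0):
    """Plain SGLD for the mean of unit-variance Gaussian data (toy example).

    Assumed model: x_i ~ N(theta, 1), prior theta ~ N(0, prior_var).
    """
    rng = random.Random(seed)
    n = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        # Noisy gradient: a minibatch estimate of the full-data log-likelihood
        # gradient, rescaled by n / batch_size so that it is unbiased.
        batch = rng.sample(data, batch_size)
        grad_prior = -theta / prior_var
        grad_lik = (n / batch_size) * sum(x - theta for x in batch)
        # Langevin step: half a step along the gradient plus Gaussian noise
        # whose variance equals the step size.
        noise = rng.gauss(0.0, step_size ** 0.5)
        theta += 0.5 * step_size * (grad_prior + grad_lik) + noise
        samples.append(theta)
    return samples
```

    In this conjugate toy model the exact posterior mean is sum(data) / (len(data) + 1 / prior_var), so the average of the later samples can be compared against it directly.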

    Enabling scalable stochastic gradient-based inference for Gaussian processes by employing the Unbiased LInear System SolvEr (ULISSE)

    In applications of Gaussian processes where quantification of uncertainty is of primary interest, it is necessary to accurately characterize the posterior distribution over covariance parameters. This paper proposes an adaptation of the Stochastic Gradient Langevin Dynamics algorithm to draw samples from the posterior distribution over covariance parameters with negligible bias and without the need to compute the marginal likelihood. In Gaussian process regression, this has the enormous advantage that stochastic gradients can be computed by solving linear systems only. A novel unbiased linear systems solver based on parallelizable covariance matrix-vector products is developed to accelerate the unbiased estimation of gradients. The results demonstrate that scalable and exact (in a Monte Carlo sense) quantification of uncertainty in Gaussian processes is possible without imposing any special structure on the covariance or reducing the number of input vectors. Comment: 10 pages - paper accepted at ICML 201

    Kinetic energy choice in Hamiltonian/hybrid Monte Carlo

    We consider how different choices of kinetic energy in Hamiltonian Monte Carlo affect algorithm performance. To this end, we introduce two quantities which can be easily evaluated, the composite gradient and the implicit noise. Results are established on integrator stability and geometric convergence, and we show that choices of kinetic energy that result in heavy-tailed momentum distributions can exhibit an undesirable negligible moves property, which we define. A general efficiency-robustness trade-off is outlined, and implementations which rely on approximate gradients are also discussed. Two numerical studies illustrate our theoretical findings, showing that the standard choice which results in a Gaussian momentum distribution is not always optimal in terms of either robustness or efficiency. Comment: 15 pages (+7 page supplement, included here as an appendix), 2 figures (+1 in supplement
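
    To make concrete where the kinetic energy enters the algorithm, here is a minimal one-dimensional leapfrog sketch in which the position update uses the gradient of a user-supplied kinetic energy K(p). The function names and the two example kinetic energies (the standard Gaussian choice and a relativistic-style choice with heavier-than-Gaussian momentum tails) are illustrative assumptions, not the quantities or experiments of the paper.

```python
import math


def leapfrog(x, p, grad_potential, grad_kinetic, step_size, n_steps):
    """Leapfrog integrator for a separable Hamiltonian H(x, p) = U(x) + K(p)."""
    p -= 0.5 * step_size * grad_potential(x)       # initial momentum half step
    for i in range(n_steps):
        x += step_size * grad_kinetic(p)           # position step uses dK/dp
        if i < n_steps - 1:
            p -= step_size * grad_potential(x)     # full momentum step
    p -= 0.5 * step_size * grad_potential(x)       # final momentum half step
    return x, p


def grad_quadratic_potential(x):
    return x                                       # U(x) = x^2 / 2


def grad_gaussian_kinetic(p):
    return p                                       # K(p) = p^2 / 2


def grad_relativistic_kinetic(p):
    # K(p) = sqrt(1 + p^2); the velocity dK/dp is bounded by 1 in magnitude.
    return p / math.sqrt(1.0 + p * p)
```

    With the Gaussian choice the velocity grows linearly in the momentum, whereas with the second choice each position update is at most one step size in magnitude, however large the momentum is.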

    Advanced Bayesian Monte Carlo Methods for Inference and Control

    Monte Carlo methods are a ubiquitous tool in modern statistics. Under the Bayesian paradigm, they are used to estimate otherwise intractable integrals that arise when integrating a function h with respect to a posterior distribution π. This thesis discusses several aspects of such Monte Carlo methods. The first discussion revolves around the problem of sampling from distributions that are differentiable only almost everywhere, a class of distributions which includes all log-concave posteriors. A new sampling method based on a second-order diffusion process is proposed, new theoretical results are proved, and extensive numerical illustrations elucidate the benefits and weaknesses of various methods applicable in these settings. In high-dimensional settings, one can exploit local structures of inverse problems to parallelise computations. This is explored both in fully localisable problems and in problems where conditional independence of variables given some others holds only approximately. This thesis proposes two algorithms using parallelisation techniques, and shows their empirical performance on two localisable imaging problems. Another problem arises when defining function space priors over high-dimensional domains. The commonly used Karhunen-Loève priors suffer from poor dimensional scaling: they require an orthogonal basis of the function space, which can often be obtained as a product of one-dimensional basis functions. This leads to the number of parameters growing exponentially in the dimension d of the function domain. The trace-class neural network prior, proposed in this thesis, scales more favourably in the dimension of the function's domain. This prior is a Bayesian neural network prior, where each weight and bias has an independent Gaussian prior, but with a key difference from existing Bayesian neural network priors: the variances decrease in the width of the network, such that the variances form a summable sequence and the infinite width limit neural network is well defined. As is shown in this thesis, the resulting posterior of the unknown function is amenable to sampling using Hilbert space Markov chain Monte Carlo methods. These sampling methods are favoured because they are stable under mesh-refinement, in the sense that the acceptance probability does not shrink to 0 as more parameters are introduced to better approximate the well-defined infinite limit. Both numerical illustrations and theoretical results show that these priors are competitive and have distinct advantages over other function space priors. These different function space priors are then used in stochastic control. To this end, a suitable likelihood for continuous value functions in a Bayesian approach to reinforcement learning is defined. This thesis proves that it can be used in conjunction with both the classical Karhunen-Loève prior and the proposed trace-class neural network prior. Numerical examples compare the resulting posteriors, and illustrate the new prior's performance and dimension robustness. Cantab Capital Institute for the Mathematics of Information