14 research outputs found
Bayesian Inference of Log Determinants
The log-determinant of a kernel matrix appears in a variety of machine
learning problems, ranging from determinantal point processes and generalized
Markov random fields, through to the training of Gaussian processes. Exact
calculation of this term is often intractable when the size of the kernel
matrix exceeds a few thousand. In the spirit of probabilistic numerics, we
reinterpret the problem of computing the log-determinant as a Bayesian
inference problem. In particular, we combine prior knowledge in the form of
bounds from matrix theory and evidence derived from stochastic trace estimation
to obtain probabilistic estimates for the log-determinant and its associated
uncertainty within a given computational budget. Beyond its novelty and
theoretic appeal, the performance of our proposal is competitive with
state-of-the-art approaches to approximating the log-determinant, while also
quantifying the uncertainty due to budget-constrained evidence.Comment: 12 pages, 3 figure
Enabling scalable stochastic gradient-based inference for Gaussian processes by employing the Unbiased LInear System SolvEr (ULISSE)
In applications of Gaussian processes where quantification of uncertainty is
of primary interest, it is necessary to accurately characterize the posterior
distribution over covariance parameters. This paper proposes an adaptation of
the Stochastic Gradient Langevin Dynamics algorithm to draw samples from the
posterior distribution over covariance parameters with negligible bias and
without the need to compute the marginal likelihood. In Gaussian process
regression, this has the enormous advantage that stochastic gradients can be
computed by solving linear systems only. A novel unbiased linear systems solver
based on parallelizable covariance matrix-vector products is developed to
accelerate the unbiased estimation of gradients. The results demonstrate the
possibility to enable scalable and exact (in a Monte Carlo sense)
quantification of uncertainty in Gaussian processes without imposing any
special structure on the covariance or reducing the number of input vectors.Comment: 10 pages - paper accepted at ICML 201
Preconditioning Kernel Matrices
The computational and storage complexity of kernel machines presents the
primary barrier to their scaling to large, modern, datasets. A common way to
tackle the scalability issue is to use the conjugate gradient algorithm, which
relieves the constraints on both storage (the kernel matrix need not be stored)
and computation (both stochastic gradients and parallelization can be used).
Even so, conjugate gradient is not without its own issues: the conditioning of
kernel matrices is often such that conjugate gradients will have poor
convergence in practice. Preconditioning is a common approach to alleviating
this issue. Here we propose preconditioned conjugate gradients for kernel
machines, and develop a broad range of preconditioners particularly useful for
kernel matrices. We describe a scalable approach to both solving kernel
machines and learning their hyperparameters. We show this approach is exact in
the limit of iterations and outperforms state-of-the-art approximations for a
given computational budget