Bayesian Inference of Log Determinants
The log-determinant of a kernel matrix appears in a variety of machine
learning problems, ranging from determinantal point processes and generalized
Markov random fields, through to the training of Gaussian processes. Exact
calculation of this term is often intractable when the size of the kernel
matrix exceeds a few thousand. In the spirit of probabilistic numerics, we
reinterpret the problem of computing the log-determinant as a Bayesian
inference problem. In particular, we combine prior knowledge in the form of
bounds from matrix theory and evidence derived from stochastic trace estimation
to obtain probabilistic estimates for the log-determinant and its associated
uncertainty within a given computational budget. Beyond its novelty and
theoretical appeal, the performance of our proposal is competitive with
state-of-the-art approaches to approximating the log-determinant, while also
quantifying the uncertainty due to budget-constrained evidence.
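As a hedged illustration of the evidence-gathering step, the sketch below estimates $\log\det(K) = \mathrm{tr}(\log K)$ by Hutchinson-style stochastic trace estimation with a Chebyshev approximation of the logarithm. The function name, the probe and degree defaults, and the crude spectral bounds are assumptions made for the sketch; the paper's actual method additionally places a Bayesian model over such noisy evidence and fuses it with bounds from matrix theory.

```python
import numpy as np

def hutchinson_logdet(K, num_probes=50, degree=30, rng=None):
    """Hutchinson-style estimate of log det(K) = tr(log K) for SPD K.

    Illustrative sketch only: the spectral bounds below are crude
    assumptions, and the paper's Bayesian treatment combines estimates
    like this one with matrix-theoretic bounds on the log-determinant.
    """
    rng = np.random.default_rng(rng)
    n = K.shape[0]
    b = np.linalg.norm(K, 1)   # upper bound on the largest eigenvalue (SPD)
    a = 1e-6 * b               # assumed lower bound on the spectrum
    # Chebyshev fit of log on [a, b], expressed on the reference interval [-1, 1].
    f = lambda t: np.log(0.5 * (b - a) * t + 0.5 * (b + a))
    c = np.polynomial.chebyshev.Chebyshev.interpolate(f, degree, domain=[-1, 1]).coef
    B = (2.0 * K - (a + b) * np.eye(n)) / (b - a)   # spectrum mapped into [-1, 1]
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)          # Rademacher probe vector
        t_prev, t_curr = z, B @ z                    # T_0(B) z and T_1(B) z
        acc = c[0] * (z @ z) + c[1] * (z @ t_curr)
        for k in range(2, degree + 1):               # three-term recurrence
            t_prev, t_curr = t_curr, 2.0 * (B @ t_curr) - t_prev
            acc += c[k] * (z @ t_curr)
        total += acc                                  # one sample of z^T log(K) z
    return total / num_probes
```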
Reducing the Variance of Gaussian Process Hyperparameter Optimization with Preconditioning
Gaussian processes remain popular as a flexible and expressive model class,
but the computational cost of kernel hyperparameter optimization stands as a
major limiting factor to their scaling and broader adoption. Recent work has
made great strides combining stochastic estimation with iterative numerical
techniques, essentially boiling down GP inference to the cost of (many)
matrix-vector multiplies. Preconditioning -- a highly effective step for any
iterative method involving matrix-vector multiplication -- can be used to
accelerate convergence and thus reduce bias in hyperparameter optimization.
Here, we prove that preconditioning has an additional benefit that has been
previously unexplored. It not only reduces the bias of the $\log$-marginal
likelihood estimator and its derivatives, but can simultaneously reduce
variance at essentially negligible cost. We leverage this result to derive
sample-efficient algorithms for GP hyperparameter optimization requiring as few
as $\mathcal{O}(\log(\varepsilon^{-1}))$ instead of $\mathcal{O}(\varepsilon^{-2})$
samples to achieve error $\varepsilon$. Our
theoretical results enable provably efficient and scalable optimization of
kernel hyperparameters, which we validate empirically on a set of large-scale
benchmark problems. There, variance reduction via preconditioning results in an
order of magnitude speedup in hyperparameter optimization of exact GPs.
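To make the variance-reduction mechanism concrete, here is a minimal sketch of the splitting idea under simplifying assumptions: a Jacobi (diagonal) preconditioner and a dense matrix logarithm stand in for the partial-Cholesky preconditioners and Lanczos-based matrix-vector routines used for GPs in practice, and the function name and defaults are invented for illustration.

```python
import numpy as np
from scipy.linalg import logm

def preconditioned_logdet(K, num_probes=20, rng=None):
    """Variance-reduced stochastic estimate of log det(K) for SPD K.

    Splits log det K = log det P + log det(P^{-1/2} K P^{-1/2}) with a
    preconditioner P, computes the first term exactly, and applies
    Hutchinson estimation only to the residual, whose spectrum clusters
    near 1 when P approximates K well; that clustering is the source of
    the reduced variance. Sketch only, with a Jacobi (diagonal) P.
    """
    rng = np.random.default_rng(rng)
    n = K.shape[0]
    d = np.diag(K)
    logdet_P = np.sum(np.log(d))                     # exact part, O(n)
    M = K / np.sqrt(np.outer(d, d))                  # P^{-1/2} K P^{-1/2}
    L = logm(M).real                                 # dense for clarity only
    est = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)          # Rademacher probe
        est += z @ (L @ z)                           # sample of tr(log M)
    return logdet_P + est / num_probes
```

The same splitting applies to the trace terms in the gradient of the log-marginal likelihood, which is where the sample-efficiency result above comes into play.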
Suboptimal subspace construction for log-determinant approximation
Variance reduction is a crucial idea in Monte Carlo simulation, and the
stochastic Lanczos quadrature method is a dedicated approach for approximating
the trace of a matrix function. Inspired by their advantages, we combine these two
techniques to approximate the log-determinant of large-scale symmetric positive
definite matrices. Key questions for such a method are how to construct or
choose an appropriate projection subspace and how to derive a guaranteed
theoretical analysis. This paper applies probabilistic approaches, including
the projection-cost-preserving sketch and matrix concentration inequalities,
to construct a suboptimal subspace. Furthermore, we provide
insights on choosing design parameters in the underlying algorithm by deriving
the corresponding approximation-error and probabilistic-error estimates.
Numerical experiments demonstrate our method's effectiveness and illustrate the
quality of the derived error bounds.
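The following sketch illustrates the general deflation pattern that motivates subspace construction, with a plain randomized range finder standing in for the projection-cost-preserving sketch analyzed in the paper; the names, defaults, and dense matrix logarithm are assumptions of the sketch.

```python
import numpy as np
from scipy.linalg import logm, qr

def deflated_logdet(K, rank=20, num_probes=20, rng=None):
    """Stochastic estimate of log det(K) = tr(log K) with subspace deflation.

    Uses tr(f(K)) = tr(Q^T f(K) Q) + tr((I - QQ^T) f(K) (I - QQ^T)) for
    orthonormal Q: the dominant part is computed exactly and Hutchinson
    estimation handles only the deflated remainder, which carries much
    less variance. Q here comes from a simple randomized range finder.
    """
    rng = np.random.default_rng(rng)
    n = K.shape[0]
    Omega = rng.standard_normal((n, rank))
    Q, _ = qr(K @ Omega, mode='economic')  # orthonormal basis for a sketch of range(K)
    L = logm(K).real                       # dense for clarity; SLQ via matvecs in practice
    exact_part = np.trace(Q.T @ L @ Q)     # contribution of the captured subspace
    est = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)
        w = z - Q @ (Q.T @ z)              # project the probe out of the subspace
        est += w @ (L @ w)
    return exact_part + est / num_probes
```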
An analysis on stochastic Lanczos quadrature with asymmetric quadrature nodes
The stochastic Lanczos quadrature method has garnered significant attention
recently. Upon examination of the error analyses given by Ubaru, Chen and Saad
and Cortinovis and Kressner, certain notable inconsistencies arise. It turns
out that the former's results are valid for cases with symmetric quadrature
nodes and may not be adequate for many practical cases, such as estimating the
log-determinant of matrices. This paper analyzes the probabilistic error bound of the
stochastic Lanczos quadrature method for cases with asymmetric quadrature
nodes. In addition, an optimized error allocation technique is employed to minimize
the overall number of matrix-vector multiplications required by the stochastic
Lanczos quadrature method.
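For reference, a minimal stochastic Lanczos quadrature sketch follows: each probe vector drives a short Lanczos run, and the eigenvalues of the resulting tridiagonal matrix serve as quadrature nodes, with weights given by the squared first components of its eigenvectors. Function names and defaults are illustrative, and breakdown handling as well as the paper's error-allocation step are omitted.

```python
import numpy as np
from scipy.linalg import eigh_tridiagonal

def lanczos(A, z, steps):
    """Symmetric Lanczos with full reorthogonalization (no breakdown handling)."""
    n = len(z)
    Q = np.zeros((n, steps))
    alpha, beta = np.zeros(steps), np.zeros(steps - 1)
    q = z / np.linalg.norm(z)
    for j in range(steps):
        Q[:, j] = q
        w = A @ q - (beta[j - 1] * Q[:, j - 1] if j > 0 else 0.0)
        alpha[j] = q @ w
        w -= alpha[j] * q
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)   # reorthogonalize
        if j < steps - 1:
            beta[j] = np.linalg.norm(w)
            q = w / beta[j]
    return alpha, beta

def slq_trace(A, f, num_probes=10, steps=30, rng=None):
    """Stochastic Lanczos quadrature estimate of tr(f(A)) for symmetric A."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    total = 0.0
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)       # Rademacher probe, ||z||^2 = n
        alpha, beta = lanczos(A, z, steps)
        theta, V = eigh_tridiagonal(alpha, beta)  # quadrature nodes (Ritz values)
        weights = V[0, :] ** 2                    # Gauss quadrature weights
        total += n * np.sum(weights * f(theta))   # one sample of z^T f(A) z
    return total / num_probes

# e.g. log-determinant of an SPD matrix A: slq_trace(A, np.log)
```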
Randomized matrix-free quadrature for spectrum and spectral sum approximation
We study randomized matrix-free quadrature algorithms for spectrum and
spectral sum approximation. The algorithms studied are characterized by the use
of a Krylov subspace method to approximate independent and identically
distributed samples of $\mathbf{v}^{\mathsf{H}} f(\mathbf{A})\, \mathbf{v}$, where
$\mathbf{v}$ is an isotropic random vector, $\mathbf{A}$ is a Hermitian matrix,
and $f$ is a matrix function. This class of algorithms includes the
kernel polynomial method and stochastic Lanczos quadrature, two widely used
methods for approximating spectra and spectral sums. Our analysis, discussion,
and numerical examples provide a unified framework for understanding randomized
matrix-free quadrature and shed light on the commonalities and tradeoffs
between them. Moreover, this framework provides new insights into the practical
implementation and use of these algorithms, particularly with regard to
parameter selection in the kernel polynomial method.
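As a companion to the Lanczos sketch above, here is a minimal kernel polynomial method building block: stochastic Chebyshev moments with Jackson damping, which is exactly the kind of parameter choice (expansion degree, damping kernel) such a framework addresses. The spectrum of B is assumed to be pre-scaled into [-1, 1], and the damping formula follows one common convention (Weisse et al., 2006); names and defaults are illustrative.

```python
import numpy as np

def jackson_damping(degree):
    """Jackson damping factors g_k (one common form, following Weisse et al. 2006);
    they suppress the Gibbs oscillations of a truncated Chebyshev expansion."""
    N = degree + 1
    k = np.arange(N)
    return ((N - k + 1) * np.cos(np.pi * k / (N + 1))
            + np.sin(np.pi * k / (N + 1)) / np.tan(np.pi / (N + 1))) / (N + 1)

def kpm_moments(B, num_probes=20, degree=50, rng=None):
    """Stochastic Chebyshev moments mu_k ~ tr(T_k(B)) for symmetric B whose
    spectrum is assumed pre-scaled into [-1, 1]. The kernel polynomial
    method combines these damped moments to reconstruct a smoothed
    spectral density or a spectral sum."""
    rng = np.random.default_rng(rng)
    n = B.shape[0]
    mu = np.zeros(degree + 1)
    for _ in range(num_probes):
        z = rng.choice([-1.0, 1.0], size=n)       # isotropic (Rademacher) probe
        t_prev, t_curr = z, B @ z                 # T_0(B) z, T_1(B) z
        mu[0] += z @ z
        mu[1] += z @ t_curr
        for k in range(2, degree + 1):            # Chebyshev three-term recurrence
            t_prev, t_curr = t_curr, 2.0 * (B @ t_curr) - t_prev
            mu[k] += z @ t_curr
    return jackson_damping(degree) * mu / num_probes
```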