Bayesian Gaussian Process Models: PAC-Bayesian Generalisation Error Bounds and Sparse Approximations
Institute for Adaptive and Neural Computation

Non-parametric models and techniques enjoy growing popularity in the field of
machine learning, and among these Bayesian inference for Gaussian process (GP)
models has recently received significant attention. We feel that GP priors should
be part of the standard toolbox for constructing models relevant to machine
learning in the same way as parametric linear models are, and the results in this
thesis help to remove some obstacles on the way towards this goal.
In the first main chapter, we provide a distribution-free finite-sample bound
on the difference between generalisation and empirical (training) error for GP
classification methods. While the general theorem (the PAC-Bayesian bound)
is not new, we give a much simplified and somewhat generalised derivation and
make the underlying core technique (convex duality) explicit. Furthermore,
the application to GP models is novel (to our knowledge). A central feature of
this bound is that its quality depends crucially on task knowledge being encoded
faithfully in the model and prior distributions, so a sharp theoretical guarantee
and empirically well-established statistical practice reinforce each other.
Extensive simulations on real-world classification tasks indicate that the bound
is impressively tight, whereas many previous bounds for related kernel machines
fail to give non-trivial guarantees in this practically relevant regime.
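For orientation, the theorem in question is commonly stated in the following
form (the notation here is ours, and constants differ slightly between versions
in the literature): with probability at least 1 - delta over the draw of the n
training examples, simultaneously for all posteriors Q over classifiers,

\[
\mathrm{kl}\!\left( \hat{e}(Q) \,\middle\|\, e(Q) \right) \;\le\;
\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{n+1}{\delta}}{n},
\]

where P is the prior, \hat{e}(Q) and e(Q) are the empirical and true Gibbs
errors, and kl denotes the relative entropy between Bernoulli distributions.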
In the second main chapter, sparse approximations are developed to address
the unfavourable scaling of most GP techniques with large training sets. Owing
to its high practical importance, this problem has recently received a lot of
attention. We demonstrate the tractability and usefulness of simple greedy
forward selection with information-theoretic criteria previously used in active
learning (or sequential design), and we develop generic schemes for automatic
model selection with many (hyper)parameters. We suggest two new generic schemes
and evaluate some of their variants on large real-world classification and
regression tasks. These schemes and their underlying principles (which are
clearly stated and analysed) can be applied to obtain sparse approximations
for a wide range of GP models, far beyond the special cases studied here.
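To illustrate the general idea of greedy forward selection with an
information-theoretic score (a minimal sketch under simplifying assumptions,
not the specific algorithms of the chapter: a fixed squared-exponential
kernel, a variance-based information-gain criterion, and explicit O(n^2)
covariance bookkeeping kept for clarity):

    import numpy as np

    def rbf_kernel(X, lengthscale=1.0):
        # Squared-exponential kernel matrix (an assumed default; the thesis
        # treats a much wider class of covariance functions).
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale ** 2)

    def greedy_active_set(X, d_max, noise_var=0.01, lengthscale=1.0):
        # Greedily grow an active set of d_max points. Each step picks the
        # point whose noisy observation would yield the largest information
        # gain 0.5 * log(1 + var / noise_var) under the current posterior;
        # since this gain is monotone in the variance, argmax over the
        # posterior variances suffices.
        C = rbf_kernel(X, lengthscale)  # posterior covariance; starts at prior
        active = []
        for _ in range(d_max):
            var = np.diag(C).copy()
            var[active] = -np.inf       # never re-select a chosen point
            j = int(np.argmax(var))
            active.append(j)
            # Condition the Gaussian on a noisy observation at point j:
            # a rank-one downdate of the full covariance.
            cj = C[:, j].copy()
            C -= np.outer(cj, cj) / (C[j, j] + noise_var)
        return active

    # Example: pick 20 active points among 500 random 2-D inputs.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))
    print(greedy_active_set(X, d_max=20))

The point of the sketch is only the selection principle; the chapter's
contribution lies in making such selection and the subsequent model fit
scale to large training sets, which this toy deliberately ignores.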