Preconditioning Kernel Matrices
The computational and storage complexity of kernel machines presents the
primary barrier to their scaling to large, modern datasets. A common way to
tackle the scalability issue is to use the conjugate gradient algorithm, which
relieves the constraints on both storage (the kernel matrix need not be stored)
and computation (both stochastic gradients and parallelization can be used).
Even so, conjugate gradient is not without its own issues: the conditioning of
kernel matrices is often such that conjugate gradients will have poor
convergence in practice. Preconditioning is a common approach to alleviating
this issue. Here we propose preconditioned conjugate gradients for kernel
machines, and develop a broad range of preconditioners particularly useful for
kernel matrices. We describe a scalable approach to both solving kernel
machines and learning their hyperparameters. We show this approach is exact in
the limit of iterations and outperforms state-of-the-art approximations for a
given computational budget.
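As a rough illustration of the idea, the sketch below solves a regularized
kernel system with SciPy's conjugate gradient routine and a Nyström-style
preconditioner applied through the Woodbury identity. The kernel, data,
landmark count, and preconditioner are illustrative assumptions, not the
paper's exact constructions.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))               # toy inputs (assumed)
y = rng.normal(size=500)                    # toy targets (assumed)

def rbf(A, B):                              # squared-exponential kernel
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2)

noise = 1e-2
K = rbf(X, X) + noise * np.eye(len(X))      # regularized kernel matrix

# Nystrom-style preconditioner from m landmark points, applied via the
# Woodbury identity so each application costs O(nm) after setup:
# (noise*I + Knm Kmm^{-1} Knm^T)^{-1}
#   = (I - Knm (noise*Kmm + Knm^T Knm)^{-1} Knm^T) / noise
m = 50
idx = rng.choice(len(X), m, replace=False)
Knm = rbf(X, X[idx])
Kmm = rbf(X[idx], X[idx]) + 1e-6 * np.eye(m)
inner = np.linalg.inv(noise * Kmm + Knm.T @ Knm)
M = LinearOperator(K.shape,
                   matvec=lambda v: (v - Knm @ (inner @ (Knm.T @ v))) / noise)

alpha, info = cg(K, y, M=M)                 # preconditioned CG solve
print("converged" if info == 0 else f"info={info}")
```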
Variational Bayesian multinomial probit regression with Gaussian process priors
It is well known in the statistics literature that augmenting binary and polychotomous response models with Gaussian latent variables enables exact Bayesian analysis via Gibbs sampling from the parameter posterior. By adopting such a data augmentation strategy, dispensing with priors over regression coefficients in favour of Gaussian Process (GP) priors over functions, and employing variational approximations to the full posterior, we obtain efficient computational methods for Gaussian Process classification in the multi-class setting. The model augmentation with additional latent variables ensures full a posteriori class coupling whilst retaining the simple a priori independent GP covariance structure, from which sparse approximations such as the multi-class Informative Vector Machine (IVM) emerge in a very natural and straightforward manner. This is the first time that a fully variational Bayesian treatment of multi-class GP classification has been developed without having to resort to additional explicit approximations to the non-Gaussian likelihood term. Empirical comparisons with exact analysis via MCMC and with Laplace approximations illustrate the utility of the variational approximation as a computationally economic alternative to full MCMC, and show it to be more accurate than the Laplace approximation.
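For the binary case, the data-augmentation idea the abstract builds on
(Gaussian latent variables that make every conditional tractable) can be
sketched as a small Gibbs sampler. This is an illustrative Albert-and-Chib
style sketch with a GP prior, not the paper's multi-class variational scheme;
the data and kernel are synthetic assumptions.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)
n = 100
X = np.sort(rng.uniform(-3, 3, n))                       # 1-D toy inputs
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2)        # GP prior covariance
y = (X + 0.3 * rng.normal(size=n) > 0).astype(int)       # synthetic labels

# Model: f ~ N(0, K), z_i ~ N(f_i, 1), y_i = 1{z_i > 0}.
# Conditional of f given z is exactly Gaussian (the point of augmentation):
A = K @ np.linalg.inv(K + np.eye(n))                     # posterior-mean map
L = np.linalg.cholesky(K - A @ K + 1e-8 * np.eye(n))     # posterior cov factor

f = np.zeros(n)
samples = []
for it in range(200):
    # 1) z | f, y: unit-variance normal around f_i, truncated to the
    #    positive axis if y_i = 1 and the negative axis otherwise
    lo = np.where(y == 1, 0.0, -np.inf) - f
    hi = np.where(y == 1, np.inf, 0.0) - f
    z = f + truncnorm.rvs(lo, hi, random_state=rng)
    # 2) f | z: exact Gaussian conditional draw
    f = A @ z + L @ rng.normal(size=n)
    samples.append(f)
```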
Forecasting of commercial sales with large scale Gaussian Processes
This paper argues that applications of Gaussian Processes to the fast-moving
consumer goods industry have received too little discussion. Yet the technique
can be valuable: it provides, for example, automatic relevance determination of
features, and its posterior mean can unlock insights into the data. Significant
challenges are the large size and high dimensionality of commercial
point-of-sale data. The study reviews approaches to Gaussian Process modeling
for large data sets, evaluates their performance on commercial sales, and shows
the value of this type of model as a decision-making tool for management.
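As a minimal illustration of automatic relevance determination with a GP (not
the large-scale methods the paper reviews), an anisotropic RBF kernel with one
lengthscale per feature lets marginal-likelihood optimization push the
lengthscales of irrelevant features toward large values. The data and feature
roles below are synthetic placeholders.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))        # e.g. price, promo flag, seasonality
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)   # only feature 0 matters

# One lengthscale per input dimension makes the RBF kernel an ARD kernel
kernel = RBF(length_scale=[1.0, 1.0, 1.0]) + WhiteKernel(1e-2)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

# Large fitted lengthscales flag features the model found irrelevant
print(gp.kernel_.k1.length_scale)
```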
Randomized Sketches of Convex Programs with Sharp Guarantees
Random projection (RP) is a classical technique for reducing storage and
computational costs. We analyze RP-based approximations of convex programs, in
which the original optimization problem is approximated by the solution of a
lower-dimensional problem. Such dimensionality reduction is essential in
computation-limited settings, since the complexity of general convex
programming can be quite high (e.g., cubic for quadratic programs, and
substantially higher for semidefinite programs). In addition to computational
savings, random projection is also useful for reducing memory usage, and has
useful properties for privacy-sensitive optimization. We prove that the
approximation ratio of this procedure can be bounded in terms of the geometry
of the constraint set. For a broad class of random projections, including those
based on various sub-Gaussian distributions as well as randomized Hadamard and
Fourier transforms, the data matrix defining the cost function can be projected
down to the statistical dimension of the tangent cone of the constraints at the
original solution, which is often substantially smaller than the original
dimension. We illustrate consequences of our theory for various cases,
including unconstrained and ℓ1-constrained least squares, support vector
machines, and low-rank matrix estimation, and discuss implications for
privacy-sensitive optimization and some connections with de-noising and
compressed sensing.
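For the unconstrained least-squares case, the sketch-and-solve recipe is
short: project the tall data matrix with a random sketch and solve the smaller
problem. The Gaussian sketch and dimensions below are illustrative; the paper
also covers sub-Gaussian and randomized Hadamard/Fourier sketches.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 10_000, 50, 400          # sketch down to m rows, m << n
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=n)

# Gaussian random projection, scaled so E[S^T S] = I
S = rng.normal(size=(m, n)) / np.sqrt(m)

# Solve the m x d sketched problem instead of the n x d original
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))
```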
A Noise-Robust Fast Sparse Bayesian Learning Model
This paper incorporates the hierarchical model structure of the Bayesian Lasso
into the Sparse Bayesian Learning process to develop a new type of
probabilistic supervised learning approach. The hierarchical structure in this
Bayesian framework is designed so that the priors not only penalize unnecessary
model complexity but are also conditioned on the variance of the random noise
in the data. The hyperparameters of the model are estimated by the Fast
Marginal Likelihood Maximization algorithm, which achieves sparsity, low
computational cost, and a fast learning process. We compare our methodology
with two other popular learning models: the Relevance Vector Machine and the
Bayesian Lasso. We test our model on both simulated and empirical data, and
the results show that this approach has several performance advantages: it is
fast, sparse, and robust to the variance of the random noise. In addition, our
method gives a more stable estimate of the noise variance than the other
methods in the study.
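The paper's model is not available in standard libraries. As a rough point of
reference only, a related evidence-based sparse Bayesian linear model in the
RVM family can be run with scikit-learn's ARDRegression; everything below,
including the synthetic data, is an assumption for illustration, not the
paper's noise-conditioned hierarchical Lasso prior.

```python
import numpy as np
from sklearn.linear_model import ARDRegression

rng = np.random.default_rng(0)
n, d = 200, 30
X = rng.normal(size=(n, d))
w = np.zeros(d)
w[:3] = [2.0, -1.5, 1.0]                 # only 3 truly relevant weights
y = X @ w + 0.5 * rng.normal(size=n)     # noise std 0.5, variance 0.25

model = ARDRegression().fit(X, y)
# Sparsity: most coefficients should be shrunk toward zero
print("retained weights:", np.flatnonzero(np.abs(model.coef_) > 0.1))
# alpha_ is the estimated noise precision, so 1/alpha_ estimates variance
print("estimated noise variance:", 1.0 / model.alpha_)
```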