Search CORE


Parallelizable sparse inverse formulation Gaussian processes (SpInGP)

Author: Grigorievskiy Alexander
Lawrence Neil
Särkkä Simo
Publication date: 27/09/2017

We propose a parallelizable sparse inverse formulation Gaussian process (SpInGP) for temporal models. It uses a sparse precision GP formulation and sparse matrix routines to speed up the computations. Due to the state-space formulation used in the algorithm, the time complexity of the basic SpInGP is linear, and because all the computations are parallelizable, the parallel form of the algorithm is sublinear in the number of data points. We provide example algorithms to implement the sparse matrix routines and experimentally test the method using both simulated and real data. Comment: Presented at Machine Learning in Signal Processing (MLSP2017)

arXiv.org e-Print Archive

Crossref
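As a rough illustration of why a sparse precision formulation gives linear-time inference, the Python sketch below uses the simplest state-space GP, the Ornstein-Uhlenbeck (Matérn-1/2) kernel, whose precision matrix at sorted time points is tridiagonal. The kernel choice and the function names `ou_precision_tridiag`, `thomas_solve` and `ou_posterior_mean` are assumptions for illustration only. The sketch is sequential, O(n) in time and memory; it does not reproduce the parallel SpInGP routines or the block-tridiagonal precision of higher-order state-space models.

```python
import numpy as np

def ou_precision_tridiag(t, sigma2, ell):
    """Tridiagonal precision Q of an Ornstein-Uhlenbeck (Matern-1/2) GP with
    covariance sigma2 * exp(-|t - t'| / ell), at sorted, distinct times t.
    Returns (diag, off) where off[k] = Q[k, k+1] = Q[k+1, k]."""
    t = np.asarray(t, dtype=float)
    a = np.exp(-np.diff(t) / ell)          # AR(1) transition coefficients
    d = np.empty(t.size)                   # conditional (innovation) variances
    d[0] = sigma2
    d[1:] = sigma2 * (1.0 - a**2)
    diag = 1.0 / d                         # Q = L^T D^{-1} L, L unit lower bidiagonal
    diag[:-1] += a**2 / d[1:]
    off = -a / d[1:]
    return diag, off

def thomas_solve(off, diag, rhs):
    """Solve a symmetric tridiagonal system in O(n) (Thomas algorithm,
    no pivoting; adequate for symmetric positive definite matrices)."""
    n = diag.size
    cp = np.zeros(max(n - 1, 0))
    dp = np.zeros(n)
    dp[0] = rhs[0] / diag[0]
    if n > 1:
        cp[0] = off[0] / diag[0]
    for i in range(1, n):                  # forward elimination
        denom = diag[i] - off[i - 1] * cp[i - 1]
        if i < n - 1:
            cp[i] = off[i] / denom
        dp[i] = (rhs[i] - off[i - 1] * dp[i - 1]) / denom
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):         # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def ou_posterior_mean(t, y, sigma2, ell, noise):
    """GP posterior mean (Q + I/noise)^{-1} y / noise, computed in O(n)."""
    diag, off = ou_precision_tridiag(t, sigma2, ell)
    return thomas_solve(off, diag + 1.0 / noise, np.asarray(y, float) / noise)
```

The identity (Q + I/noise)^{-1} y / noise = K (K + noise I)^{-1} y means the sparse result matches the usual dense GP formula, but only the sparse version avoids forming and factorizing the dense n-by-n covariance K.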

Absolute Moments of Generalized Hyperbolic Distributions and Approximate Scaling of Normal Inverse Gaussian Lévy-Processes

Author: Barndorff-Nielsen Ole Eiler
Stelzer Robert
Publication date: 01/01/2004

Expressions for (absolute) moments of generalized hyperbolic (GH) and normal inverse Gaussian (NIG) laws are given in terms of moments of the corresponding symmetric laws. For the (absolute) moments centered at the location parameter μ, explicit expressions as series containing Bessel functions are provided. Furthermore, the derivatives of the logarithms of (absolute) μ-centered moments with respect to the logarithm of time are calculated explicitly for NIG Lévy processes. Computer implementation of the formulae obtained is briefly discussed. Finally, some further insight into the apparent scaling behaviour of NIG Lévy processes (previously discussed in Barndorff-Nielsen and Prause (2001)) is gained.

CiteSeerX

Open Access LMU
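For the symmetric case (skewness parameter β = 0), the NIG law is a normal variance mixture X = μ + sqrt(Z) ε with ε ~ N(0, 1) independent of an inverse Gaussian Z, so E|X - μ|^r = E[Z^{r/2}] E|ε|^r, where E[Z^q] = (δ/α)^q K_{q-1/2}(δα) / K_{1/2}(δα) involves modified Bessel functions K. The Python sketch below evaluates this; it relies on that standard mixture identity rather than the series expressions of the paper, computes K_ν by plain trapezoidal quadrature (an assumption that is adequate only for moderate ν and arguments that are not close to zero), and the function names are hypothetical.

```python
import math
import numpy as np

def bessel_k(nu, x, t_max=20.0, num=200001):
    """Modified Bessel function K_nu(x) for x > 0 via the integral
    K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt (trapezoidal rule)."""
    t = np.linspace(0.0, t_max, num)
    f = np.exp(-x * np.cosh(t)) * np.cosh(nu * t)
    h = t[1] - t[0]
    return h * (f.sum() - 0.5 * (f[0] + f[-1]))

def symmetric_nig_abs_moment(r, alpha, delta):
    """E|X - mu|^r for a symmetric NIG(alpha, beta=0, mu, delta) law."""
    x = delta * alpha
    # E[Z^(r/2)] for the inverse Gaussian mixing variable Z
    ez = (delta / alpha) ** (r / 2) * bessel_k(r / 2 - 0.5, x) / bessel_k(0.5, x)
    # E|eps|^r for a standard normal eps
    e_abs_normal = 2 ** (r / 2) * math.gamma((r + 1) / 2) / math.sqrt(math.pi)
    return ez * e_abs_normal
```

Two sanity checks follow from closed forms: for r = 2 the result is the variance δ/α, and for r = 4 it is 3 (δ/α)^2 (1 + 1/(δα)), because K_{3/2}(x)/K_{1/2}(x) = 1 + 1/x.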

Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm

Author: Geiger Mario
Spigler Stefano
Wyart Matthieu
Publication venue: IOP Publishing
Publication date: 18/08/2020

How many training data are needed to learn a supervised task? It is often observed that the generalization error decreases as $n^{-\beta}$, where $n$ is the number of training examples and $\beta$ is an exponent that depends on both the data and the algorithm. In this work we measure $\beta$ when applying kernel methods to real datasets. For MNIST we find $\beta\approx 0.4$ and for CIFAR10 $\beta\approx 0.1$, for both regression and classification tasks, and for Gaussian or Laplace kernels. To rationalize the existence of non-trivial exponents that can be independent of the specific kernel used, we study the Teacher-Student framework for kernels. In this scheme, a Teacher generates data according to a Gaussian random field, and a Student learns them via kernel regression. With a simplifying assumption, namely that the data are sampled from a regular lattice, we derive $\beta$ analytically for translation-invariant kernels, using previous results from the kriging literature. Provided that the Student is not too sensitive to high frequencies, $\beta$ depends only on the smoothness and dimension of the training data. We confirm numerically that these predictions hold when the training points are sampled at random on a hypersphere. Overall, the test error is found to be controlled by the magnitude of the projection of the true function on the kernel eigenvectors whose rank is larger than $n$. Using this idea we relate the exponent $\beta$ to an exponent $a$ describing how the coefficients of the true function in the eigenbasis of the kernel decay with rank. We extract $a$ from real data by performing kernel PCA, leading to $\beta\approx 0.36$ for MNIST and $\beta\approx 0.07$ for CIFAR10, in good agreement with observations. We argue that these rather large exponents are possible due to the small effective dimension of the data. Comment: We added (i) the prediction of the exponent $\beta$ for real data using kernel PCA; (ii) the generalization of our results to non-Gaussian data from reference [11] (Bordelon et al., "Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks")

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne
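The exponent β is typically read off as the negative slope of a log-log learning curve. The Python sketch below shows that fit, together with a toy error proxy built from the idea above that the test error is controlled by the components of the true function beyond rank n. The power-law decay c_k^2 = k^(-a) of the squared coefficients, the truncation at 10^6 modes, and the function names are illustrative assumptions. Under that toy assumption the tail sum behaves like n^(-(a-1)), which is not necessarily the precise β-a relation derived in the paper.

```python
import numpy as np

def fit_learning_curve_exponent(n, err):
    """Fit log(err) = c - beta * log(n) by least squares and return beta."""
    slope, _intercept = np.polyfit(np.log(n), np.log(err), 1)
    return -slope

def tail_error_proxy(a, n_values, n_modes=10**6):
    """Toy proxy for the test error after n examples: the sum of squared
    coefficients c_k^2 = k**(-a) over ranks k > n (assumed power-law decay)."""
    k = np.arange(1, n_modes + 1, dtype=float)
    c2 = k ** (-a)
    tail = np.cumsum(c2[::-1])[::-1]   # tail[j] = sum of c2[j:], i.e. ranks k >= j + 1
    return tail[np.asarray(n_values)]  # ranks k > n start at index n

n_values = np.unique(np.logspace(1, 3, 15).astype(int))
err = tail_error_proxy(2.0, n_values)
beta_hat = fit_learning_curve_exponent(n_values, err)
```

With a = 2 the fitted exponent comes out close to 1, slightly below it because of the finite range of n.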

Student-t Processes as Alternatives to Gaussian Processes

Author: Ghahramani Zoubin
Shah Amar
Wilson Andrew Gordon
Publication date: 19/02/2014

We investigate the Student-t process as an alternative to the Gaussian process as a nonparametric prior over functions. We derive closed form expressions for the marginal likelihood and predictive distribution of a Student-t process, by integrating away an inverse Wishart process prior over the covariance kernel of a Gaussian process model. We show surprising equivalences between different hierarchical Gaussian process models leading to Student-t processes, and derive a new sampling scheme for the inverse Wishart process, which helps elucidate these equivalences. Overall, we show that a Student-t process can retain the attractive properties of a Gaussian process (a nonparametric representation, analytic marginal and predictive distributions, and easy model selection through covariance kernels) but has enhanced flexibility, and predictive covariances that, unlike a Gaussian process, explicitly depend on the values of training observations. We verify empirically that a Student-t process is especially useful in situations where there are changes in covariance structure, or in applications like Bayesian optimization, where accurate predictive covariances are critical for good performance. These advantages come at no additional computational cost over Gaussian processes. Comment: 13 pages, 6 figures, 1 table. To appear in "The Seventeenth International Conference on Artificial Intelligence and Statistics (AISTATS), 2014."

arXiv.org e-Print Archive

CiteSeerX
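The closed-form marginal likelihood and predictive distribution can be sketched in Python. The formulas below are the standard density and conditional of a multivariate Student-t with ν > 2 degrees of freedom, written with K as the covariance (not the scale) matrix and a zero mean function; the RBF kernel, the noise level and the function names are illustrative assumptions. The predictive mean equals the GP mean, while the predictive covariance is the GP covariance rescaled by (ν + β1 - 2)/(ν + n1 - 2), where β1 = y1^T K11^{-1} y1 depends on the observed values.

```python
import math
import numpy as np

def rbf(xa, xb, ell=1.0, sf2=1.0):
    """Squared-exponential kernel on 1-D inputs (illustrative choice)."""
    d = xa[:, None] - xb[None, :]
    return sf2 * np.exp(-0.5 * (d / ell) ** 2)

def tp_log_marginal(y, K, nu):
    """Log density of a zero-mean multivariate Student-t with nu > 2
    degrees of freedom and covariance matrix K."""
    n = y.size
    L = np.linalg.cholesky(K)
    w = np.linalg.solve(L, y)                  # L^{-1} y
    beta = w @ w                               # y^T K^{-1} y
    logdet = 2.0 * np.sum(np.log(np.diag(L)))  # log |K|
    return (math.lgamma((nu + n) / 2) - math.lgamma(nu / 2)
            - 0.5 * n * math.log((nu - 2) * math.pi) - 0.5 * logdet
            - 0.5 * (nu + n) * math.log1p(beta / (nu - 2)))

def tp_predict(K11, K12, K22, y1, nu):
    """Predictive mean, covariance and degrees of freedom at test points."""
    L = np.linalg.cholesky(K11)
    A = np.linalg.solve(L, K12)                # L^{-1} K12
    w = np.linalg.solve(L, y1)                 # L^{-1} y1
    mean = A.T @ w                             # K21 K11^{-1} y1 (same as the GP)
    gp_cov = K22 - A.T @ A                     # GP predictive covariance
    beta1 = w @ w                              # y1^T K11^{-1} y1
    n1 = y1.size
    scale = (nu + beta1 - 2) / (nu + n1 - 2)   # data-dependent rescaling
    return mean, scale * gp_cov, nu + n1
```

As ν grows, the rescaling factor tends to 1 and the log density tends to the Gaussian one, so the GP is recovered as a limiting case; for finite ν, scaling the observations y1 changes the predictive covariance, which a GP does not do.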