    Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm

    How many training data are needed to learn a supervised task? It is often observed that the generalization error decreases as $n^{-\beta}$, where $n$ is the number of training examples and $\beta$ an exponent that depends on both data and algorithm. In this work we measure $\beta$ when applying kernel methods to real datasets. For MNIST we find $\beta \approx 0.4$ and for CIFAR10 $\beta \approx 0.1$, for both regression and classification tasks, and for Gaussian or Laplace kernels. To rationalize the existence of non-trivial exponents that can be independent of the specific kernel used, we study the Teacher-Student framework for kernels. In this scheme, a Teacher generates data according to a Gaussian random field, and a Student learns them via kernel regression. With a simplifying assumption -- namely that the data are sampled from a regular lattice -- we derive $\beta$ analytically for translation-invariant kernels, using previous results from the kriging literature. Provided that the Student is not too sensitive to high frequencies, $\beta$ depends only on the smoothness and dimension of the training data. We confirm numerically that these predictions hold when the training points are sampled at random on a hypersphere. Overall, the test error is found to be controlled by the magnitude of the projection of the true function on the kernel eigenvectors whose rank is larger than $n$. Using this idea we relate the exponent $\beta$ to an exponent $a$ describing how the coefficients of the true function in the eigenbasis of the kernel decay with rank. We extract $a$ from real data by performing kernel PCA, leading to $\beta \approx 0.36$ for MNIST and $\beta \approx 0.07$ for CIFAR10, in good agreement with observations. We argue that these rather large exponents are possible due to the small effective dimension of the data. Comment: We added (i) the prediction of the exponent $\beta$ for real data using kernel PCA; (ii) the generalization of our results to non-Gaussian data from reference [11] (Bordelon et al., "Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks").
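
    As a concrete illustration of how such an exponent can be measured, the sketch below fits log test error against log $n$ for kernel ridge regression with a Laplace kernel and reads off the slope. The synthetic Teacher function, the kernel bandwidth, and the range of $n$ are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch (assumed setup): estimate the learning-curve exponent beta
# from test error ~ n^(-beta) using kernel ridge regression with a Laplace kernel.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import laplacian_kernel

rng = np.random.default_rng(0)
d = 10                                   # input dimension (assumed)

def teacher(X):
    # Stand-in "Teacher" function; in the paper it is a Gaussian random field.
    return np.sin(X[:, 0]) * np.cos(X[:, 1])

X_test = rng.standard_normal((2000, d))
y_test = teacher(X_test)

ns, errs = [128, 256, 512, 1024, 2048], []
for n in ns:
    X = rng.standard_normal((n, d))
    K = laplacian_kernel(X, X, gamma=1.0 / d)
    model = KernelRidge(alpha=1e-8, kernel="precomputed").fit(K, teacher(X))
    pred = model.predict(laplacian_kernel(X_test, X, gamma=1.0 / d))
    errs.append(np.mean((pred - y_test) ** 2))

# Test error ~ n^(-beta), so the slope of the log-log fit is -beta.
beta = -np.polyfit(np.log(ns), np.log(errs), 1)[0]
print(f"estimated beta ~ {beta:.2f}")
```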

    Kernel-Based Just-In-Time Learning for Passing Expectation Propagation Messages

    We propose an efficient nonparametric strategy for learning a message operator in expectation propagation (EP), which takes as input the set of incoming messages to a factor node, and produces an outgoing message as output. This learned operator replaces the multivariate integral required in classical EP, which may not have an analytic expression. We use kernel-based regression, which is trained on a set of probability distributions representing the incoming messages and the associated outgoing messages. The kernel approach has two main advantages: first, it is fast, as it is implemented using a novel two-layer random feature representation of the input message distributions; second, it has principled uncertainty estimates, and can be cheaply updated online, meaning it can request and incorporate new training data when it encounters inputs on which it is uncertain. In experiments, our approach is able to solve learning problems where a single message operator is required for multiple, substantially different data sets (logistic regression for a variety of classification problems), where it is essential to accurately assess uncertainty and to efficiently and robustly update the message operator. Comment: accepted to UAI 2015. Corrected typos. Added more content to the appendix. Main results unchanged.
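
    The sketch below conveys the flavour of the approach under simplifying assumptions: a single layer of random Fourier features stands in for the paper's two-layer representation, incoming messages are reduced to toy parameter vectors, and Bayesian linear regression in the feature space supplies both the predicted outgoing message and an uncertainty score that could trigger a request for new training data.

```python
# Minimal sketch, not the paper's two-layer construction: random Fourier
# features approximate a Gaussian kernel; Bayesian linear regression in that
# feature space maps toy incoming-message parameters to outgoing-message
# parameters and returns a predictive-variance score for online updates.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, D = 4, 2, 300           # message-parameter sizes, #features (assumed)
W = rng.standard_normal((D, d_in))   # random frequencies, unit bandwidth
b = rng.uniform(0.0, 2.0 * np.pi, D)

def features(X):
    """Random Fourier features approximating a Gaussian kernel."""
    return np.sqrt(2.0 / D) * np.cos(X @ W.T + b)

# Toy training pairs: incoming-message parameters -> outgoing-message parameters.
X_train = rng.standard_normal((500, d_in))
Y_train = np.tanh(X_train[:, :d_out])      # stand-in for the true EP update

lam = 1e-2                                 # ridge / prior precision
Phi = features(X_train)
A_inv = np.linalg.inv(Phi.T @ Phi + lam * np.eye(D))
weights = A_inv @ Phi.T @ Y_train

def predict(x):
    """Predicted outgoing message plus an uncertainty score."""
    f = features(x[None, :])
    mean = (f @ weights).ravel()
    score = float(f @ A_inv @ f.T)         # high score -> request more training data
    return mean, score

print(predict(rng.standard_normal(d_in)))
```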

    Learning with Algebraic Invariances, and the Invariant Kernel Trick

    When solving data analysis problems it is important to integrate prior knowledge and/or structural invariances. This paper contributes a novel framework for incorporating algebraic invariance structure into kernels. In particular, we show that algebraic properties such as sign symmetries in the data, phase independence, scaling, etc. can be included easily by essentially performing the kernel trick twice. We demonstrate the usefulness of our theory in simulations on selected applications such as sign-invariant spectral clustering and underdetermined ICA.
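
    A minimal sketch of the goal (though not of the paper's double kernel trick): one simple way to obtain a sign-invariant kernel is to average a base kernel over the group {+1, -1}, after which a point and its flipped copy are indistinguishable to any kernel method, e.g. spectral clustering. The RBF base kernel and the toy clustering data are assumptions for illustration.

```python
# Minimal sketch of a sign-invariant kernel via group averaging over {+1, -1};
# not the paper's "kernel trick twice" construction. With k_inv, x and -x are
# indistinguishable to kernel methods such as spectral clustering.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import rbf_kernel

def sign_invariant_kernel(X, Y, gamma=1.0):
    """k_inv(x, y) = 0.5 * (k(x, y) + k(x, -y)); invariant under x -> -x."""
    return 0.5 * (rbf_kernel(X, Y, gamma=gamma) + rbf_kernel(X, -Y, gamma=gamma))

rng = np.random.default_rng(0)
centers = rng.standard_normal((3, 5))                    # 3 clusters in 5 dimensions
X = np.vstack([c + 0.1 * rng.standard_normal((50, 5)) for c in centers])
X *= rng.choice([-1.0, 1.0], size=(len(X), 1))           # random global sign flips

K = sign_invariant_kernel(X, X)
labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            random_state=0).fit_predict(K)
print(np.bincount(labels))                               # cluster sizes survive the flips
```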

    Kernelized Hashcode Representations for Relation Extraction

    Kernel methods have produced state-of-the-art results for a number of NLP tasks such as relation extraction, but suffer from poor scalability due to the high cost of computing kernel similarities between natural language structures. A recently proposed technique, kernelized locality-sensitive hashing (KLSH), can significantly reduce the computational cost, but is only applicable to classifiers operating on kNN graphs. Here we propose to use random subspaces of KLSH codes to efficiently construct an explicit representation of NLP structures suitable for general classification methods. Further, we propose an approach for optimizing the KLSH model for classification problems by maximizing an approximation of the mutual information between the KLSH codes (feature vectors) and the class labels. We evaluate the proposed approach on biomedical relation extraction datasets, and observe significant and robust improvements in accuracy w.r.t. state-of-the-art classifiers, along with drastic (orders-of-magnitude) speedups compared to conventional kernel methods. Comment: to appear in the proceedings of AAAI-19.
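
    A minimal sketch of the KLSH idea under illustrative assumptions (RBF kernel, toy vector data, fixed code length): each example is described by its kernel similarities to a small reference set, those similarities are projected onto random hyperplanes, and the signs give a binary code that a standard classifier can consume. The mutual-information optimization and the paper's explicit random-subspace construction are omitted; a random forest loosely plays the latter role, since each tree looks at a random subset of code bits.

```python
# Minimal sketch of KLSH codes used as explicit classification features
# (assumed RBF kernel, toy vector data, 32-bit codes).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)

def klsh_codes(X, reference, hyperplanes, gamma=0.5):
    """Binary codes from kernel similarities to a small reference set."""
    K = rbf_kernel(X, reference, gamma=gamma)        # implicit feature-space coordinates
    K = K - K.mean(axis=0, keepdims=True)            # rough centering
    return (K @ hyperplanes > 0).astype(np.uint8)    # one bit per random hyperplane

# Toy data standing in for vectorized NLP structures.
X = rng.standard_normal((400, 20))
y = (X[:, 0] * X[:, 1] > 0).astype(int)
reference = X[rng.choice(len(X), size=64, replace=False)]
hyperplanes = rng.standard_normal((64, 32))          # 64 reference points -> 32 bits

codes = klsh_codes(X, reference, hyperplanes)
clf = RandomForestClassifier(random_state=0).fit(codes[:300], y[:300])
print("held-out accuracy:", clf.score(codes[300:], y[300:]))
```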