Lepskii Principle in Supervised Learning
In the setting of supervised learning using reproducing kernel methods, we
propose a data-dependent regularization parameter selection rule that is
adaptive to the unknown regularity of the target function and is optimal both
for the least-squares (prediction) error and for the reproducing kernel Hilbert
space (reconstruction) norm error. It is based on a modified Lepskii balancing
principle using a varying family of norms.
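The balancing idea behind a Lepskii-type rule can be sketched in a few lines. The sketch below is a minimal illustration, not the paper's modified rule: it uses a single fixed norm rather than the varying family of norms, and the constant kappa and the noise bounds are placeholder inputs that a real analysis would derive from high-probability error estimates.

```python
import numpy as np

def lepskii_select(estimates, noise_levels, kappa=2.0):
    """Classical Lepskii balancing over a grid of candidate estimators.

    estimates    : list of prediction vectors, ordered from strongest to
                   weakest regularization (so bias decreases along the list)
    noise_levels : placeholder high-probability bounds on the stochastic
                   error of each candidate, increasing along the list
    Picks the least-regularized candidate that stays within the combined
    noise bounds of every more-regularized one.
    """
    selected = 0
    for j in range(1, len(estimates)):
        balanced = all(
            np.linalg.norm(estimates[j] - estimates[i])
            <= kappa * (noise_levels[i] + noise_levels[j])
            for i in range(j)
        )
        if not balanced:
            break
        selected = j
    return selected
```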
Optimal Rates for Spectral Algorithms with Least-Squares Regression over Hilbert Spaces
In this paper, we study regression problems over a separable Hilbert space
with the square loss, covering non-parametric regression over a reproducing
kernel Hilbert space. We investigate a class of spectral-regularized
algorithms, including ridge regression, principal component analysis, and
gradient methods. We prove optimal, high-probability convergence results in
terms of variants of norms for the studied algorithms, considering a capacity
assumption on the hypothesis space and a general source condition on the target
function. Consequently, we obtain almost sure convergence results with optimal
rates. Our results improve on and generalize previous work, filling a
theoretical gap for the non-attainable cases.
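To make the notion of a spectral algorithm concrete, here is a minimal numpy sketch of the spectral-filter view on a kernel matrix; the function names are ours, not the paper's, and the ridge and cut-off filters are standard textbook examples of the filter functions the class covers.

```python
import numpy as np

def spectral_fit(K, y, filter_fn):
    """Fit kernel expansion coefficients via a spectral filter.

    A spectral algorithm replaces the inverse (K/n)^{-1} by g_lambda(K/n)
    for a filter function g_lambda, giving alpha = g_lambda(K/n) y / n.
    """
    n = len(y)
    evals, evecs = np.linalg.eigh(K / n)
    filtered = filter_fn(np.clip(evals, 0.0, None))  # filter the spectrum
    return evecs @ (filtered * (evecs.T @ y)) / n

lam = 1e-2
ridge = lambda s: 1.0 / (s + lam)                    # Tikhonov / ridge regression
cutoff = lambda s: np.where(s > lam, 1.0 / np.maximum(s, lam), 0.0)  # spectral cut-off (PCA)
```

Predictions at the training points are then K @ alpha; gradient methods correspond to yet another family of polynomial filters.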
Convergence rates of Kernel Conjugate Gradient for random design regression
We prove statistical rates of convergence for kernel-based least squares
regression from i.i.d. data using a conjugate gradient algorithm, where
regularization against overfitting is obtained by early stopping. This method
is related to Kernel Partial Least Squares, a regression method that combines
supervised dimensionality reduction with least squares projection. Following
the setting introduced in earlier related literature, we study so-called "fast
convergence rates" depending on the regularity of the target regression
function (measured by a source condition in terms of the kernel integral
operator) and on the effective dimensionality of the data mapped into the
kernel space. We obtain upper bounds, essentially matching known minimax lower
bounds, for the $L^2$ (prediction) norm as well as for the stronger
Hilbert norm, if the true regression function belongs to the reproducing kernel
Hilbert space. If the latter assumption is not fulfilled, we obtain similar
convergence rates for appropriate norms, provided additional unlabeled data are
available.
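A minimal sketch of the core iteration (our own simplification; the paper's algorithm and stopping rule are more refined): conjugate gradient is run on the normal equation K alpha = y, and the number of iterations plays the role of the regularization parameter.

```python
import numpy as np

def kernel_cg(K, y, n_iter):
    """Conjugate gradient on K @ alpha = y, stopped after n_iter steps.

    Early stopping regularizes: few iterations give smooth, heavily
    regularized fits; many iterations approach the interpolating solution.
    """
    alpha = np.zeros(len(y))
    r = y.copy()        # residual y - K @ alpha
    p = r.copy()        # search direction
    rs = r @ r
    for _ in range(n_iter):
        Kp = K @ p
        step = rs / (p @ Kp)
        alpha += step * p
        r -= step * Kp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return alpha
```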
Statistical Optimality of Stochastic Gradient Descent on Hard Learning Problems through Multiple Passes
We consider stochastic gradient descent (SGD) for least-squares regression
with potentially several passes over the data. While several passes have been
widely reported to give better predictive performance on unseen data in
practice, the existing theoretical analysis of SGD suggests
that a single pass is statistically optimal. While this is true for
low-dimensional easy problems, we show that for hard problems, multiple passes
lead to statistically optimal predictions while a single pass does not; we also
show that in these hard models, the optimal number of passes over the data
increases with sample size. In order to define the notion of hardness and show
that our predictive performances are optimal, we consider potentially
infinite-dimensional models and notions typically associated with kernel methods,
namely, the decay of eigenvalues of the covariance matrix of the features and
the complexity of the optimal predictor as measured through the covariance
matrix. We illustrate our results on synthetic experiments with non-linear
kernel methods and on a classical benchmark with a linear model.
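A minimal sketch of the multi-pass setup (the step size, sampling scheme, and iterate averaging below are placeholder choices; the paper's schedules are tuned to the hardness of the problem):

```python
import numpy as np

def multipass_sgd(X, y, n_passes, lr=0.01, seed=0):
    """SGD for least squares with several passes and averaged iterates."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_bar = np.zeros(d)
    t = 0
    for _ in range(n_passes):
        for i in rng.permutation(n):
            grad = (X[i] @ w - y[i]) * X[i]   # gradient of the square loss at one sample
            w -= lr * grad
            t += 1
            w_bar += (w - w_bar) / t          # running (Polyak-Ruppert) average
    return w_bar
```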
Kernel Conjugate Gradient Methods with Random Projections
We propose and study kernel conjugate gradient methods (KCGM) with random
projections for least-squares regression over a separable Hilbert space.
Considering two types of random projections generated by randomized sketches
and Nyström subsampling, we prove optimal statistical results with respect
to variants of norms for the algorithms under a suitable stopping rule.
In particular, our results show that if the projection dimension is proportional
to the effective dimension of the problem, KCGM with randomized sketches can
generalize optimally, while achieving a computational advantage. As a
corollary, we derive optimal rates for classic KCGM in the case that the target
function may not be in the hypothesis space, filling a theoretical gap.
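As a rough illustration of the Nyström variant (the landmark choice, the projection dimension m, and the fixed iteration count are placeholders; the paper's analysis ties the projection dimension to the effective dimension and uses a principled stopping rule):

```python
import numpy as np

def nystrom_cg(K, y, landmarks, n_iter):
    """Early-stopped CG on a Nystrom-projected least-squares problem.

    landmarks : indices of the m subsampled points defining the projection.
    Returns fitted values on the training inputs.
    """
    K_nm = K[:, landmarks]
    K_mm = K[np.ix_(landmarks, landmarks)]
    evals, evecs = np.linalg.eigh(K_mm)
    inv_sqrt = evecs @ np.diag(np.clip(evals, 1e-12, None) ** -0.5) @ evecs.T
    Phi = K_nm @ inv_sqrt                     # n x m Nystrom feature map

    A, b = Phi.T @ Phi, Phi.T @ y             # projected normal equations
    w = np.zeros(len(b))
    r = b.copy()
    p = r.copy()
    rs = r @ r
    for _ in range(n_iter):                   # iteration count regularizes
        Ap = A @ p
        step = rs / (p @ Ap)
        w += step * p
        r -= step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return Phi @ w
```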