Fast learning rates for plug-in classifiers
It has been recently shown that, under the margin (or low noise) assumption,
there exist classifiers attaining fast rates of convergence of the excess Bayes
risk, that is, rates faster than $n^{-1/2}$. The work on this subject has
suggested the following two conjectures: (i) the best achievable fast rate is
of the order $n^{-1}$, and (ii) plug-in classifiers generally converge more
slowly than classifiers based on empirical risk minimization. We show that
both conjectures are incorrect. In particular, we construct plug-in
classifiers that can achieve not only fast, but also super-fast rates, that is,
rates faster than $n^{-1}$. We establish minimax lower bounds showing that the
obtained rates cannot be improved.
Comment: Published at http://dx.doi.org/10.1214/009053606000001217 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
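For intuition about the plug-in principle referred to above (the general recipe only, not the specific estimators constructed in the paper), here is a minimal sketch in Python, assuming synthetic data and a k-nearest-neighbour estimate of the regression function eta(x) = P(Y = 1 | X = x); the estimate is simply plugged into the Bayes rule by thresholding at 1/2.

```python
import numpy as np

def knn_plug_in_classifier(X_train, y_train, X_test, k):
    """Plug-in classifier: estimate eta(x) = P(Y=1|X=x) by a k-NN average
    of the labels, then predict 1 whenever the estimate exceeds 1/2."""
    preds = np.empty(len(X_test), dtype=int)
    for i, x in enumerate(X_test):
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dists)[:k]
        eta_hat = y_train[nearest].mean()   # local estimate of eta(x)
        preds[i] = int(eta_hat >= 0.5)      # plug the estimate into the Bayes rule
    return preds

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    X = rng.uniform(-1.0, 1.0, size=(n, 2))
    eta = 1.0 / (1.0 + np.exp(-4.0 * X[:, 0]))        # assumed true regression function
    y = (rng.uniform(size=n) < eta).astype(int)
    X_test = rng.uniform(-1.0, 1.0, size=(500, 2))
    y_hat = knn_plug_in_classifier(X, y, X_test, k=int(np.sqrt(n)))
    print("predicted positive fraction:", y_hat.mean())
```

Under a margin condition, eta(X) is rarely close to 1/2, so a moderately accurate estimate of eta seldom lands on the wrong side of the threshold; this is the mechanism behind the fast rates discussed above.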
Fast learning rates for plug-in classifiers under the margin condition
It has been recently shown that, under the margin (or low noise) assumption,
there exist classifiers attaining fast rates of convergence of the excess Bayes
risk, i.e., rates faster than $n^{-1/2}$. The works on this subject
suggested the following two conjectures: (i) the best achievable fast rate is
of the order $n^{-1}$, and (ii) plug-in classifiers generally converge more
slowly than classifiers based on empirical risk minimization. We show that
both conjectures are incorrect. In particular, we construct plug-in
classifiers that can achieve not only fast, but also super-fast rates, i.e.,
rates faster than $n^{-1}$. We establish minimax lower bounds showing that the
obtained rates cannot be improved.
Comment: 36 pages
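Both entries above invoke the margin (or low noise) assumption; for reference, the standard Tsybakov-type formulation of that assumption, with a margin exponent alpha >= 0 and a constant C_0 (notation assumed here for illustration, not quoted from the paper), reads:

```latex
% Margin (low-noise) condition with exponent \alpha \ge 0: the posterior
% probability \eta(X) = P(Y = 1 \mid X) is rarely close to the critical level 1/2.
\[
  \mathbb{P}\bigl( 0 < \lvert \eta(X) - \tfrac{1}{2} \rvert \le t \bigr)
  \;\le\; C_0\, t^{\alpha}
  \qquad \text{for all } t > 0 .
\]
```

Larger alpha puts less probability mass near the decision boundary, which is what allows rates faster than $n^{-1/2}$.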
Exponential convergence of testing error for stochastic gradient methods
We consider binary classification problems with positive definite kernels and
square loss, and study the convergence rates of stochastic gradient methods. We
show that while the excess testing loss (squared loss) converges slowly to zero
as the number of observations (and thus iterations) goes to infinity, the
testing error (classification error) converges exponentially fast if low-noise
conditions are assumed.
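To make the setting concrete, here is a minimal sketch, assuming toy data, a Gaussian kernel, and a simple step-size schedule (an illustration of the setting, not the authors' experiments): single-pass stochastic gradient descent for the (halved) square loss in an RKHS with labels in {-1, +1}, reporting the test classification error whose fast decay the result above concerns.

```python
import numpy as np

def gauss_kernel(a, B, sigma=0.5):
    # k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for one point a against the rows of B
    d = B - a
    return np.exp(-np.sum(d * d, axis=1) / (2.0 * sigma ** 2))

def kernel_sgd_square_loss(X, y, X_test, y_test, step=1.0):
    """Single-pass SGD in the RKHS for the loss (f(x) - y)^2 / 2:
    f_{t+1} = f_t - gamma_t * (f_t(x_t) - y_t) * k(x_t, .)."""
    support, coefs = [], []
    for t, (x, label) in enumerate(zip(X, y), start=1):
        f_x = float(np.dot(coefs, gauss_kernel(x, np.array(support)))) if support else 0.0
        gamma = step / np.sqrt(t)              # assumed step-size schedule
        support.append(x)
        coefs.append(-gamma * (f_x - label))   # coefficient of the new kernel expansion term
    S, c = np.array(support), np.array(coefs)
    preds = np.sign([np.dot(c, gauss_kernel(x, S)) for x in X_test])
    return float(np.mean(preds != y_test))     # test (classification) error

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, size=(500, 2))
    y = np.where(X[:, 0] + 0.3 * X[:, 1] > 0, 1.0, -1.0)      # well-separated labels: low-noise regime
    X_test = rng.uniform(-1, 1, size=(300, 2))
    y_test = np.where(X_test[:, 0] + 0.3 * X_test[:, 1] > 0, 1.0, -1.0)
    print("test classification error:", kernel_sgd_square_loss(X, y, X_test, y_test))
```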
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors
We consider stochastic gradient descent and its averaging variant for binary
classification problems in a reproducing kernel Hilbert space. In the
traditional analysis using a consistency property of loss functions, it is
known that the expected classification error converges more slowly than the
expected risk even when assuming a low-noise condition on the conditional label
probabilities. Consequently, the resulting rate is sublinear. Therefore, it is
important to consider whether much faster convergence of the expected
classification error can be achieved. In recent research, an exponential
convergence rate for stochastic gradient descent was shown under a strong
low-noise condition, but the provided theoretical analysis was limited to the
squared loss function, which is somewhat inadequate for binary classification
tasks. In this paper, we show an exponential convergence of the expected
classification error in the final phase of the stochastic gradient descent for
a wide class of differentiable convex loss functions under similar assumptions.
As for the averaged stochastic gradient descent, we show that the same
convergence rate holds from the early phase of training. In experiments, we
verify our analyses on $L_2$-regularized logistic regression.
Comment: 15 pages, 2 figures
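As a small illustration of the averaged variant discussed above, the sketch below runs SGD on an L2-regularized logistic loss with a toy linear model (the data, step-size schedule, and regularization strength are assumptions for the example, not the paper's setup) and compares the classification error of the last iterate with that of the Polyak-Ruppert average.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def averaged_sgd_logreg(X, y, lam=1e-3, step=1.0, epochs=5, seed=0):
    """SGD on the L2-regularized logistic loss with labels in {-1, +1}.
    Returns the last iterate and the Polyak-Ruppert average of the iterates."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, w_avg, t = np.zeros(d), np.zeros(d), 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            margin = y[i] * X[i].dot(w)
            grad = -y[i] * sigmoid(-margin) * X[i] + lam * w   # per-sample gradient
            w = w - (step / np.sqrt(t)) * grad                 # assumed step-size schedule
            w_avg += (w - w_avg) / t                           # running average of the iterates
    return w, w_avg

def classification_error(w, X, y):
    return float(np.mean(np.sign(X.dot(w)) != y))

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n, d = 2000, 5
    X = rng.normal(size=(n, d))
    w_star = rng.normal(size=d)
    y = np.sign(X.dot(w_star) + 0.1 * rng.normal(size=n))      # weak label noise
    w_last, w_bar = averaged_sgd_logreg(X[:1500], y[:1500])
    print("last-iterate error:", classification_error(w_last, X[1500:], y[1500:]))
    print("averaged-iterate error:", classification_error(w_bar, X[1500:], y[1500:]))
```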
Fast Convergence on Perfect Classification for Functional Data
In this study, we investigate whether perfect classification of functional data
can be approached with finite samples. The seminal work of Delaigle and Hall
(2012) showed that it is easier to define a perfect classifier for functional
data than for finite-dimensional data. This result is based on their finding
that a sufficient condition for the existence of a perfect classifier, named
the Delaigle--Hall (DH) condition, is available only for functional data.
However, even when the DH condition holds, a large sample size may be required
to achieve perfect classification, because the convergence of misclassification
errors for functional data is significantly slow. Specifically, the minimax
rate of convergence of errors for functional data is of logarithmic order in
the sample size. This study resolves this complication by proving that the DH
condition also yields fast convergence of the misclassification error in the
sample size. To this end, we study a classifier based on empirical risk
minimization over a reproducing kernel Hilbert space (RKHS) and analyse its
convergence rate under the DH condition. The result shows that the
misclassification error of the RKHS classifier converges at an exponential rate
in the sample size. Technically, the proof is based on two points: (i)
connecting the DH condition to a margin condition for classifiers, and (ii)
handling the metric entropy of functional data. Experimentally, we validate
that the DH condition and the associated margin condition have a certain impact
on the convergence rate of the RKHS classifier. We also find that some other
classifiers for functional data have a similar property.
Comment: 26 pages
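As a rough sketch of the kind of RKHS classifier analysed above, assuming a Gaussian kernel on curves discretised over a common grid and a squared-loss (kernel ridge) form of empirical risk minimization rather than the paper's exact estimator:

```python
import numpy as np

def rkhs_erm_classifier(curves, labels, new_curves, sigma=1.0, lam=1e-3):
    """Regularised empirical risk minimisation in an RKHS over functional data.
    Each row of `curves` is one function observed on a common grid; the Gaussian
    kernel uses the Euclidean distance between discretised curves, and the
    squared-loss ERM solution solves the kernel ridge system (K + n*lam*I) a = y."""
    def gram(A, B):
        d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
        return np.exp(-d2 / (2.0 * sigma**2))
    n = len(curves)
    alpha = np.linalg.solve(gram(curves, curves) + n * lam * np.eye(n), labels.astype(float))
    return np.sign(gram(new_curves, curves) @ alpha)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    grid = np.linspace(0.0, 1.0, 50)

    def sample(n, shift):
        # toy functional data: phase-shifted sine curves plus noise (an assumption)
        return np.sin(2 * np.pi * (grid[None, :] + shift)) + 0.2 * rng.normal(size=(n, len(grid)))

    X = np.vstack([sample(100, 0.0), sample(100, 0.15)])
    y = np.concatenate([np.ones(100), -np.ones(100)])
    X_new = np.vstack([sample(50, 0.0), sample(50, 0.15)])
    y_new = np.concatenate([np.ones(50), -np.ones(50)])
    preds = rkhs_erm_classifier(X, y, X_new)
    print("misclassification rate:", float(np.mean(preds != y_new)))
```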