14 research outputs found

    Fast learning rates for plug-in classifiers

    It has been recently shown that, under the margin (or low-noise) assumption, there exist classifiers attaining fast rates of convergence of the excess Bayes risk, that is, rates faster than $n^{-1/2}$. The work on this subject has suggested the following two conjectures: (i) the best achievable fast rate is of the order $n^{-1}$, and (ii) plug-in classifiers generally converge more slowly than classifiers based on empirical risk minimization. We show that both conjectures are false. In particular, we construct plug-in classifiers that can achieve not only fast but also super-fast rates, that is, rates faster than $n^{-1}$. We establish minimax lower bounds showing that the obtained rates cannot be improved. Comment: Published at http://dx.doi.org/10.1214/009053606000001217 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
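
    For context, the display below sketches the two standard quantities these statements refer to: the excess Bayes risk and the margin (low-noise) condition with exponent $\alpha$. The notation ($\eta$, $C$, $\alpha$) is generic rather than taken from the paper.

```latex
% Excess Bayes risk of a classifier f: X -> {0,1}, with regression function
% eta(x) = P(Y = 1 | X = x) and Bayes classifier f*(x) = 1{eta(x) >= 1/2}:
\[
  \mathcal{E}(f) \;=\; R(f) - R(f^{*}), \qquad R(f) = \mathbb{P}\bigl(f(X) \neq Y\bigr).
\]
% Margin (low-noise) condition with exponent alpha >= 0: for some C > 0 and all t > 0,
\[
  \mathbb{P}\bigl(0 < \bigl|\eta(X) - \tfrac{1}{2}\bigr| \le t\bigr) \;\le\; C\, t^{\alpha}.
\]
% Larger alpha means eta rarely hovers near 1/2, which is what makes rates
% faster than n^{-1/2} (and, as shown here, even faster than n^{-1}) possible.
```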

    Fast learning rates for plug-in classifiers under the margin condition

    It has been recently shown that, under the margin (or low-noise) assumption, there exist classifiers attaining fast rates of convergence of the excess Bayes risk, i.e., rates faster than $n^{-1/2}$. The work on this subject has suggested the following two conjectures: (i) the best achievable fast rate is of the order $n^{-1}$, and (ii) plug-in classifiers generally converge more slowly than classifiers based on empirical risk minimization. We show that both conjectures are false. In particular, we construct plug-in classifiers that can achieve not only fast but also super-fast rates, i.e., rates faster than $n^{-1}$. We establish minimax lower bounds showing that the obtained rates cannot be improved. Comment: 36 pages
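
    Complementing the definitions sketched above, the display below writes out what a plug-in classifier is in this setting; the notation ($\hat{\eta}_n$, $\hat{f}_n$) is again generic, not the paper's own.

```latex
% A plug-in classifier substitutes an estimate \hat{eta}_n of the regression
% function eta(x) = P(Y = 1 | X = x), built from the sample
% (X_1, Y_1), ..., (X_n, Y_n), into the Bayes rule:
\[
  \hat{f}_n(x) \;=\; \mathbf{1}\bigl\{\hat{\eta}_n(x) \ge \tfrac{1}{2}\bigr\}.
\]
% The point of the paper is that when \hat{eta}_n converges fast enough and the
% margin condition holds with a large exponent, the excess risk of \hat{f}_n can
% decay faster than n^{-1}, contradicting conjectures (i) and (ii).
```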

    Exponential convergence of testing error for stochastic gradient methods

    We consider binary classification problems with positive definite kernels and the square loss, and study the convergence rates of stochastic gradient methods. We show that while the excess testing loss (squared loss) converges slowly to zero as the number of observations (and thus iterations) goes to infinity, the testing error (classification error) converges exponentially fast if low-noise conditions are assumed.
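
    A minimal runnable sketch of the kind of procedure this abstract analyses: single-pass stochastic gradient descent on the squared loss in a Gaussian-kernel RKHS, with the final classifier taken as the sign of the iterate. The kernel, step size, and toy data are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def gaussian_kernel(x, z, bandwidth=1.0):
    """RBF kernel k(x, z) = exp(-||x - z||^2 / (2 * bandwidth^2))."""
    d = x - z
    return np.exp(-np.dot(d, d) / (2.0 * bandwidth ** 2))

def kernel_sgd_squared_loss(X, y, lr=0.5, bandwidth=1.0):
    """One pass of SGD on the squared loss in the RKHS of the Gaussian kernel.

    The iterate is kept in dual form f(x) = sum_i c_i k(x_i, x), so each
    step appends one coefficient.  Labels are assumed to be in {-1, +1}.
    """
    centers, coefs = [], []
    for x_t, y_t in zip(X, y):
        # current prediction f_{t-1}(x_t)
        pred = sum(c * gaussian_kernel(x_c, x_t, bandwidth)
                   for x_c, c in zip(centers, coefs))
        # gradient of 0.5 * (f(x_t) - y_t)^2 w.r.t. f is (f(x_t) - y_t) k(x_t, .),
        # so the update adds a single kernel term
        centers.append(x_t)
        coefs.append(-lr * (pred - y_t))
    def classifier(x):
        score = sum(c * gaussian_kernel(x_c, x, bandwidth)
                    for x_c, c in zip(centers, coefs))
        return 1 if score >= 0.0 else -1   # sign of the regression estimate
    return classifier

# toy usage: two well-separated Gaussian blobs (a "low-noise" situation)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (200, 2)), rng.normal(2, 1, (200, 2))])
y = np.array([-1] * 200 + [1] * 200)
perm = rng.permutation(len(y))
clf = kernel_sgd_squared_loss(X[perm], y[perm])
err = np.mean([clf(x) != t for x, t in zip(X, y)])
print(f"training-set classification error: {err:.3f}")
```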

    Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors

    We consider stochastic gradient descent and its averaging variant for binary classification problems in a reproducing kernel Hilbert space. In the traditional analysis using a consistency property of loss functions, it is known that the expected classification error converges more slowly than the expected risk, even under a low-noise condition on the conditional label probabilities; consequently, the resulting rate is sublinear. It is therefore important to ask whether much faster convergence of the expected classification error can be achieved. In recent research, an exponential convergence rate for stochastic gradient descent was shown under a strong low-noise condition, but the theoretical analysis was limited to the squared loss function, which is somewhat inadequate for binary classification tasks. In this paper, we show exponential convergence of the expected classification error in the final phase of stochastic gradient descent for a wide class of differentiable convex loss functions under similar assumptions. For averaged stochastic gradient descent, we show that the same convergence rate holds from the early phase of training. In experiments, we verify our analyses on $L_2$-regularized logistic regression. Comment: 15 pages, 2 figures
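
    A minimal sketch of averaged stochastic gradient descent on $L_2$-regularized logistic regression, the model mentioned in the experiments; the step-size schedule, regularization strength, and toy data below are assumptions for illustration only.

```python
import numpy as np

def averaged_sgd_logistic(X, y, lam=1e-3, lr=0.5, epochs=5, seed=0):
    """Averaged SGD for L2-regularized logistic regression.

    Minimizes (1/n) sum_i log(1 + exp(-y_i w.x_i)) + (lam/2) ||w||^2 with
    labels y_i in {-1, +1}.  Returns the last iterate and the
    Polyak-Ruppert average of the iterates.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            margin = y[i] * X[i].dot(w)
            # gradient of the regularized logistic loss at (X[i], y[i])
            grad = -y[i] * X[i] / (1.0 + np.exp(margin)) + lam * w
            w = w - (lr / np.sqrt(t)) * grad
            w_avg += (w - w_avg) / t          # running average of iterates
    return w, w_avg

# toy usage: two well-separated classes (a strong low-noise situation)
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.5, 1, (300, 5)), rng.normal(1.5, 1, (300, 5))])
y = np.array([-1] * 300 + [1] * 300)
w_last, w_bar = averaged_sgd_logistic(X, y)
err_last = np.mean(np.sign(X @ w_last) != y)
err_avg = np.mean(np.sign(X @ w_bar) != y)
print(f"classification error, last iterate: {err_last:.3f}, average: {err_avg:.3f}")
```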

    Fast Convergence on Perfect Classification for Functional Data

    In this study, we investigate whether perfect classification of functional data is attainable with finite samples. The seminal work of Delaigle and Hall (2012) showed that a perfect classifier is easier to obtain for functional data than for finite-dimensional data. This result rests on their finding that a sufficient condition for the existence of a perfect classifier, here called the Delaigle--Hall (DH) condition, is available only for functional data. However, even when the DH condition holds, a large sample size may be required to approach perfect classification, because the convergence of misclassification errors for functional data is very slow: the minimax rate of convergence is logarithmic in the sample size. This study resolves this complication by proving that the DH condition also yields fast convergence of the misclassification error in the sample size. Specifically, we study a classifier based on empirical risk minimization over a reproducing kernel Hilbert space (RKHS) and analyse its convergence rate under the DH condition. The result shows that the misclassification error of the RKHS classifier converges at an exponential rate in the sample size. Technically, the proof rests on two points: (i) connecting the DH condition to a margin condition for classifiers, and (ii) handling the metric entropy of functional data. Experimentally, we validate that the DH condition and the associated margin condition have a clear impact on the convergence rate of the RKHS classifier, and we find that some other classifiers for functional data exhibit a similar property. Comment: 26 pages
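
    As a rough illustration of the setting (not the paper's estimator), the sketch below discretizes functional observations on a common grid and runs regularized empirical risk minimization with the squared loss in a Gaussian-kernel RKHS, using the sign of the fitted function as the classifier. The data-generating process, kernel, loss, and all parameter values are assumptions made for this sketch.

```python
import numpy as np

def functional_rbf_kernel(F, G, bandwidth=1.0):
    """Gaussian kernel between curves observed on a common grid.

    Each row of F and G is a function sampled at the same design points,
    so the L2 distance between curves is approximated by the Euclidean
    distance between the sampled values.
    """
    sq = ((F[:, None, :] - G[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def rkhs_erm_classifier(F_train, y_train, lam=1e-2, bandwidth=1.0):
    """Regularized ERM with the squared loss in the RKHS (kernel ridge
    regression used as a plug-in rule); labels are in {-1, +1}."""
    n = len(y_train)
    K = functional_rbf_kernel(F_train, F_train, bandwidth)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y_train)
    def predict(F_new):
        scores = functional_rbf_kernel(F_new, F_train, bandwidth) @ alpha
        return np.where(scores >= 0.0, 1, -1)   # sign of the fitted function
    return predict

# toy usage: two classes of noisy random curves whose mean functions differ,
# observed on a grid of 50 points
rng = np.random.default_rng(2)
grid = np.linspace(0.0, 1.0, 50)
def sample_curves(n, mean_fn):
    return mean_fn(grid) + rng.normal(scale=0.3, size=(n, grid.size))
F = np.vstack([sample_curves(100, np.sin), sample_curves(100, np.cos)])
y = np.array([-1] * 100 + [1] * 100)
predict = rkhs_erm_classifier(F, y, bandwidth=3.0)
print("training error:", np.mean(predict(F) != y))
```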