Fast learning rates for plug-in classifiers
It has been recently shown that, under the margin (or low noise) assumption,
there exist classifiers attaining fast rates of convergence of the excess Bayes
risk, that is, rates faster than $n^{-1/2}$. The work on this subject has
suggested the following two conjectures: (i) the best achievable fast rate is
of the order $n^{-1}$, and (ii) plug-in classifiers generally converge more
slowly than classifiers based on empirical risk minimization. We show that
both conjectures are incorrect. In particular, we construct plug-in
classifiers that can achieve not only fast, but also super-fast rates, that is,
rates faster than $n^{-1}$. We establish minimax lower bounds showing that the
obtained rates cannot be improved.
Comment: Published at http://dx.doi.org/10.1214/009053606000001217 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
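For intuition about the plug-in principle referred to above (the general recipe only, not the specific estimators constructed in the paper), here is a minimal sketch in Python, assuming synthetic data and a k-nearest-neighbour estimate of the regression function eta(x) = P(Y = 1 | X = x); the estimate is simply plugged into the Bayes rule by thresholding at 1/2.

```python
import numpy as np

def knn_plug_in_classifier(X_train, y_train, X_test, k):
    """Plug-in classifier: estimate eta(x) = P(Y=1|X=x) by a k-NN average
    of the labels, then predict 1 whenever the estimate exceeds 1/2."""
    preds = np.empty(len(X_test), dtype=int)
    for i, x in enumerate(X_test):
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dists)[:k]
        eta_hat = y_train[nearest].mean()   # local estimate of eta(x)
        preds[i] = int(eta_hat >= 0.5)      # plug the estimate into the Bayes rule
    return preds

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 2000
    X = rng.uniform(-1.0, 1.0, size=(n, 2))
    eta = 1.0 / (1.0 + np.exp(-4.0 * X[:, 0]))        # assumed true regression function
    y = (rng.uniform(size=n) < eta).astype(int)
    X_test = rng.uniform(-1.0, 1.0, size=(500, 2))
    y_hat = knn_plug_in_classifier(X, y, X_test, k=int(np.sqrt(n)))
    print("predicted positive fraction:", y_hat.mean())
```

Under a margin condition, eta(X) is rarely close to 1/2, so a moderately accurate estimate of eta seldom lands on the wrong side of the threshold; this is the mechanism behind the fast rates discussed above.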
Fast learning rates for plug-in classifiers under the margin condition
It has been recently shown that, under the margin (or low noise) assumption,
there exist classifiers attaining fast rates of convergence of the excess Bayes
risk, i.e., rates faster than $n^{-1/2}$. The works on this subject
suggested the following two conjectures: (i) the best achievable fast rate is
of the order $n^{-1}$, and (ii) plug-in classifiers generally converge more
slowly than classifiers based on empirical risk minimization. We show that
both conjectures are incorrect. In particular, we construct plug-in
classifiers that can achieve not only fast, but also super-fast rates, i.e.,
rates faster than $n^{-1}$. We establish minimax lower bounds showing that the
obtained rates cannot be improved.
Comment: 36 pages
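Both entries above invoke the margin (or low noise) assumption; for reference, the standard Tsybakov-type formulation of that assumption, with a margin exponent alpha >= 0 and a constant C_0 (notation assumed here for illustration, not quoted from the paper), reads:

```latex
% Margin (low-noise) condition with exponent \alpha \ge 0: the posterior
% probability \eta(X) = P(Y = 1 \mid X) is rarely close to the critical level 1/2.
\[
  \mathbb{P}\bigl( 0 < \lvert \eta(X) - \tfrac{1}{2} \rvert \le t \bigr)
  \;\le\; C_0\, t^{\alpha}
  \qquad \text{for all } t > 0 .
\]
```

Larger alpha puts less probability mass near the decision boundary, which is what allows rates faster than $n^{-1/2}$.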
Exponential convergence of testing error for stochastic gradient methods
We consider binary classification problems with positive definite kernels and
square loss, and study the convergence rates of stochastic gradient methods. We
show that while the excess testing loss (squared loss) converges slowly to zero
as the number of observations (and thus iterations) goes to infinity, the
testing error (classification error) converges exponentially fast if low-noise
conditions are assumed.
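To make the setting concrete, here is a minimal sketch, assuming toy data, a Gaussian kernel, and a simple step-size schedule (an illustration of the setting, not the authors' experiments): single-pass stochastic gradient descent for the (halved) square loss in an RKHS with labels in {-1, +1}, reporting the test classification error whose fast decay the result above concerns.

```python
import numpy as np

def gauss_kernel(a, B, sigma=0.5):
    # k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for one point a against the rows of B
    d = B - a
    return np.exp(-np.sum(d * d, axis=1) / (2.0 * sigma ** 2))

def kernel_sgd_square_loss(X, y, X_test, y_test, step=1.0):
    """Single-pass SGD in the RKHS for the loss (f(x) - y)^2 / 2:
    f_{t+1} = f_t - gamma_t * (f_t(x_t) - y_t) * k(x_t, .)."""
    support, coefs = [], []
    for t, (x, label) in enumerate(zip(X, y), start=1):
        f_x = float(np.dot(coefs, gauss_kernel(x, np.array(support)))) if support else 0.0
        gamma = step / np.sqrt(t)              # assumed step-size schedule
        support.append(x)
        coefs.append(-gamma * (f_x - label))   # coefficient of the new kernel expansion term
    S, c = np.array(support), np.array(coefs)
    preds = np.sign([np.dot(c, gauss_kernel(x, S)) for x in X_test])
    return float(np.mean(preds != y_test))     # test (classification) error

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(-1, 1, size=(500, 2))
    y = np.where(X[:, 0] + 0.3 * X[:, 1] > 0, 1.0, -1.0)      # well-separated labels: low-noise regime
    X_test = rng.uniform(-1, 1, size=(300, 2))
    y_test = np.where(X_test[:, 0] + 0.3 * X_test[:, 1] > 0, 1.0, -1.0)
    print("test classification error:", kernel_sgd_square_loss(X, y, X_test, y_test))
```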
Stochastic Gradient Descent with Exponential Convergence Rates of Expected Classification Errors
We consider stochastic gradient descent and its averaging variant for binary
classification problems in a reproducing kernel Hilbert space. In the
traditional analysis using a consistency property of loss functions, it is
known that the expected classification error converges more slowly than the
expected risk even when assuming a low-noise condition on the conditional label
probabilities. Consequently, the resulting rate is sublinear. Therefore, it is
important to consider whether much faster convergence of the expected
classification error can be achieved. In recent research, an exponential
convergence rate for stochastic gradient descent was shown under a strong
low-noise condition, but the provided theoretical analysis was limited to the
squared loss function, which is somewhat inadequate for binary classification
tasks. In this paper, we show an exponential convergence of the expected
classification error in the final phase of the stochastic gradient descent for
a wide class of differentiable convex loss functions under similar assumptions.
As for the averaged stochastic gradient descent, we show that the same
convergence rate holds from the early phase of training. In experiments, we
verify our analyses on $L_2$-regularized logistic regression.
Comment: 15 pages, 2 figures
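As a small illustration of the averaged variant discussed above, the sketch below runs SGD on an L2-regularized logistic loss with a toy linear model (the data, step-size schedule, and regularization strength are assumptions for the example, not the paper's setup) and compares the classification error of the last iterate with that of the Polyak-Ruppert average.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def averaged_sgd_logreg(X, y, lam=1e-3, step=1.0, epochs=5, seed=0):
    """SGD on the L2-regularized logistic loss with labels in {-1, +1}.
    Returns the last iterate and the Polyak-Ruppert average of the iterates."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, w_avg, t = np.zeros(d), np.zeros(d), 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            margin = y[i] * X[i].dot(w)
            grad = -y[i] * sigmoid(-margin) * X[i] + lam * w   # per-sample gradient
            w = w - (step / np.sqrt(t)) * grad                 # assumed step-size schedule
            w_avg += (w - w_avg) / t                           # running average of the iterates
    return w, w_avg

def classification_error(w, X, y):
    return float(np.mean(np.sign(X.dot(w)) != y))

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n, d = 2000, 5
    X = rng.normal(size=(n, d))
    w_star = rng.normal(size=d)
    y = np.sign(X.dot(w_star) + 0.1 * rng.normal(size=n))      # weak label noise
    w_last, w_bar = averaged_sgd_logreg(X[:1500], y[:1500])
    print("last-iterate error:", classification_error(w_last, X[1500:], y[1500:]))
    print("averaged-iterate error:", classification_error(w_bar, X[1500:], y[1500:]))
```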
Fast Convergence on Perfect Classification for Functional Data
In this study, we investigate whether perfect classification of functional data
can be approached with finite samples. The seminal work of Delaigle and Hall
(2012) showed that it is easier to define a perfect classifier for functional
data than for finite-dimensional data. This result is based on their finding
that a sufficient condition for the existence of a perfect classifier, named
the Delaigle--Hall (DH) condition, is available only for functional data.
However, even when the DH condition holds, a large sample size may be required
to achieve perfect classification, because the convergence of misclassification
errors for functional data is significantly slow. Specifically, the minimax
rate of convergence of errors for functional data is of logarithmic order in
the sample size. This study resolves this complication by proving that the DH
condition also yields fast convergence of the misclassification error in the
sample size. To this end, we study a classifier based on empirical risk
minimization over a reproducing kernel Hilbert space (RKHS) and analyse its
convergence rate under the DH condition. The result shows that the
misclassification error of the RKHS classifier converges at an exponential rate
in the sample size. Technically, the proof is based on two points: (i)
connecting the DH condition to a margin condition for classifiers, and (ii)
handling the metric entropy of functional data. Experimentally, we validate
that the DH condition and the associated margin condition have a certain impact
on the convergence rate of the RKHS classifier. We also find that some other
classifiers for functional data have a similar property.
Comment: 26 pages
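As a rough sketch of the kind of RKHS classifier analysed above, assuming a Gaussian kernel on curves discretised over a common grid and a squared-loss (kernel ridge) form of empirical risk minimization rather than the paper's exact estimator:

```python
import numpy as np

def rkhs_erm_classifier(curves, labels, new_curves, sigma=1.0, lam=1e-3):
    """Regularised empirical risk minimisation in an RKHS over functional data.
    Each row of `curves` is one function observed on a common grid; the Gaussian
    kernel uses the Euclidean distance between discretised curves, and the
    squared-loss ERM solution solves the kernel ridge system (K + n*lam*I) a = y."""
    def gram(A, B):
        d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
        return np.exp(-d2 / (2.0 * sigma**2))
    n = len(curves)
    alpha = np.linalg.solve(gram(curves, curves) + n * lam * np.eye(n), labels.astype(float))
    return np.sign(gram(new_curves, curves) @ alpha)

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    grid = np.linspace(0.0, 1.0, 50)

    def sample(n, shift):
        # toy functional data: phase-shifted sine curves plus noise (an assumption)
        return np.sin(2 * np.pi * (grid[None, :] + shift)) + 0.2 * rng.normal(size=(n, len(grid)))

    X = np.vstack([sample(100, 0.0), sample(100, 0.15)])
    y = np.concatenate([np.ones(100), -np.ones(100)])
    X_new = np.vstack([sample(50, 0.0), sample(50, 0.15)])
    y_new = np.concatenate([np.ones(50), -np.ones(50)])
    preds = rkhs_erm_classifier(X, y, X_new)
    print("misclassification rate:", float(np.mean(preds != y_new)))
```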