9,373 research outputs found
Fast learning rates for plug-in classifiers
It has been recently shown that, under the margin (or low noise) assumption,
there exist classifiers attaining fast rates of convergence of the excess Bayes
risk, that is, rates faster than . The work on this subject has
suggested the following two conjectures: (i) the best achievable fast rate is
of the order , and (ii) the plug-in classifiers generally converge more
slowly than the classifiers based on empirical risk minimization. We show that
both conjectures are not correct. In particular, we construct plug-in
classifiers that can achieve not only fast, but also super-fast rates, that is,
rates faster than . We establish minimax lower bounds showing that the
obtained rates cannot be improved.Comment: Published at http://dx.doi.org/10.1214/009053606000001217 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
Fast learning rates for plug-in classifiers under the margin condition
It has been recently shown that, under the margin (or low noise) assumption,
there exist classifiers attaining fast rates of convergence of the excess Bayes
risk, i.e., the rates faster than . The works on this subject
suggested the following two conjectures: (i) the best achievable fast rate is
of the order , and (ii) the plug-in classifiers generally converge
slower than the classifiers based on empirical risk minimization. We show that
both conjectures are not correct. In particular, we construct plug-in
classifiers that can achieve not only the fast, but also the {\it super-fast}
rates, i.e., the rates faster than . We establish minimax lower bounds
showing that the obtained rates cannot be improved.Comment: 36 page
Generalization bounds for averaged classifiers
We study a simple learning algorithm for binary classification. Instead of
predicting with the best hypothesis in the hypothesis class, that is, the
hypothesis that minimizes the training error, our algorithm predicts with a
weighted average of all hypotheses, weighted exponentially with respect to
their training error. We show that the prediction of this algorithm is much
more stable than the prediction of an algorithm that predicts with the best
hypothesis. By allowing the algorithm to abstain from predicting on some
examples, we show that the predictions it makes when it does not abstain are
very reliable. Finally, we show that the probability that the algorithm
abstains is comparable to the generalization error of the best hypothesis in
the class.Comment: Published by the Institute of Mathematical Statistics
(http://www.imstat.org) in the Annals of Statistics
(http://www.imstat.org/aos/) at http://dx.doi.org/10.1214/00905360400000005
Multiclass Learning with Simplex Coding
In this paper we discuss a novel framework for multiclass learning, defined
by a suitable coding/decoding strategy, namely the simplex coding, that allows
to generalize to multiple classes a relaxation approach commonly used in binary
classification. In this framework, a relaxation error analysis can be developed
avoiding constraints on the considered hypotheses class. Moreover, we show that
in this setting it is possible to derive the first provably consistent
regularized method with training/tuning complexity which is independent to the
number of classes. Tools from convex analysis are introduced that can be used
beyond the scope of this paper
On the Sample Complexity of Predictive Sparse Coding
The goal of predictive sparse coding is to learn a representation of examples
as sparse linear combinations of elements from a dictionary, such that a
learned hypothesis linear in the new representation performs well on a
predictive task. Predictive sparse coding algorithms recently have demonstrated
impressive performance on a variety of supervised tasks, but their
generalization properties have not been studied. We establish the first
generalization error bounds for predictive sparse coding, covering two
settings: 1) the overcomplete setting, where the number of features k exceeds
the original dimensionality d; and 2) the high or infinite-dimensional setting,
where only dimension-free bounds are useful. Both learning bounds intimately
depend on stability properties of the learned sparse encoder, as measured on
the training sample. Consequently, we first present a fundamental stability
result for the LASSO, a result characterizing the stability of the sparse codes
with respect to perturbations to the dictionary. In the overcomplete setting,
we present an estimation error bound that decays as \tilde{O}(sqrt(d k/m)) with
respect to d and k. In the high or infinite-dimensional setting, we show a
dimension-free bound that is \tilde{O}(sqrt(k^2 s / m)) with respect to k and
s, where s is an upper bound on the number of non-zeros in the sparse code for
any training data point.Comment: Sparse Coding Stability Theorem from version 1 has been relaxed
considerably using a new notion of coding margin. Old Sparse Coding Stability
Theorem still in new version, now as Theorem 2. Presentation of all proofs
simplified/improved considerably. Paper reorganized. Empirical analysis
showing new coding margin is non-trivial on real dataset
- …