Fast learning rates for plug-in classifiers
It has been recently shown that, under the margin (or low noise) assumption, there exist classifiers attaining fast rates of convergence of the excess Bayes risk, that is, rates faster than $n^{-1/2}$. The work on this subject has suggested the following two conjectures: (i) the best achievable fast rate is of the order $n^{-1}$, and (ii) plug-in classifiers generally converge more slowly than classifiers based on empirical risk minimization. We show that neither conjecture is correct. In particular, we construct plug-in classifiers that can achieve not only fast, but also super-fast rates, that is, rates faster than $n^{-1}$. We establish minimax lower bounds showing that the obtained rates cannot be improved.
Comment: Published at http://dx.doi.org/10.1214/009053606000001217 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
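To make the plug-in principle concrete: estimate the regression function $\eta(x) = P(Y = 1 \mid X = x)$ nonparametrically, then classify by thresholding the estimate at 1/2. The paper obtains its rates with local polynomial estimators of $\eta$; the sketch below substitutes a simple Nadaraya-Watson kernel regressor purely for illustration, with labels assumed to lie in {0, 1} and all names and parameters ours, not the paper's.

    import numpy as np

    def plug_in_predict(X_train, y_train, x, bandwidth=0.5):
        """Plug-in rule: estimate eta(x) = P(Y=1 | X=x) with a
        Nadaraya-Watson kernel regressor, then mimic the Bayes
        classifier by thresholding the estimate at 1/2."""
        sq_dists = np.sum((X_train - x) ** 2, axis=1)
        weights = np.exp(-sq_dists / (2 * bandwidth ** 2))  # Gaussian kernel
        eta_hat = weights @ y_train / weights.sum()         # estimate of eta(x)
        return int(eta_hat >= 0.5)                          # plug-in decision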
Anisotropic oracle inequalities in noisy quantization
The effect of errors in variables in quantization is investigated. We prove
general exact and non-exact oracle inequalities with fast rates for an
empirical minimization based on a noisy sample $Z_i = X_i + \epsilon_i$, $i = 1, \ldots, n$, where the $X_i$ are i.i.d. with density $f$ and the $\epsilon_i$ are i.i.d. with density $\eta$. These rates depend on the geometry of the density $f$ and the asymptotic behaviour of the characteristic function of $\eta$.
This general study can be applied to the problem of $k$-means clustering with noisy data. For this purpose, we introduce a deconvolution $k$-means stochastic minimization which reaches fast rates of convergence under Pollard's standard regularity assumptions.
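The setting is easy to reproduce. The sketch below only sets up the observation model $Z_i = X_i + \epsilon_i$ and runs a plain Lloyd iteration on the noisy sample, i.e. the naive baseline whose bias the deconvolution $k$-means procedure is designed to correct; the mixture, the Laplace noise and all parameters are illustrative assumptions, not the paper's.

    import numpy as np

    rng = np.random.default_rng(0)

    # Observation model from the abstract: Z_i = X_i + eps_i, where the
    # X_i have density f and the eps_i have density eta (Laplace here).
    n, k = 500, 2
    X = np.concatenate([rng.normal(-2.0, 0.5, n // 2),
                        rng.normal(+2.0, 0.5, n // 2)])[:, None]
    Z = X + rng.laplace(scale=0.8, size=(n, 1))

    def lloyd(Z, k, n_iter=50):
        """Plain Lloyd (k-means) iteration on the noisy sample Z: the
        naive approach that ignores the noise and clusters Z directly."""
        centers = Z[rng.choice(len(Z), k, replace=False)]
        for _ in range(n_iter):
            labels = np.argmin(np.linalg.norm(Z[:, None] - centers[None], axis=2), axis=1)
            centers = np.stack([Z[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        return centers

    print(lloyd(Z, k))  # centroids of Z, biased relative to those of X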
Gibbs Max-margin Topic Models with Data Augmentation
Max-margin learning is a powerful approach to building classifiers and
structured output predictors. Recent work on max-margin supervised topic models
has successfully integrated it with Bayesian topic models to discover
discriminative latent semantic structures and make accurate predictions for
unseen testing data. However, the resulting learning problems are usually hard
to solve because of the non-smoothness of the margin loss. Existing approaches
to building max-margin supervised topic models rely on an iterative procedure
to solve multiple latent SVM subproblems with additional mean-field assumptions
on the desired posterior distributions. This paper presents an alternative
approach by defining a new max-margin loss. Namely, we present Gibbs max-margin
supervised topic models, a latent variable Gibbs classifier to discover hidden
topic representations for various tasks, including classification, regression
and multi-task learning. Gibbs max-margin supervised topic models minimize an expected margin loss, which is an upper bound on the margin loss of the expected prediction rule. By introducing augmented variables
and integrating out the Dirichlet variables analytically by conjugacy, we
develop simple Gibbs sampling algorithms with no restrictive assumptions and no
need to solve SVM subproblems. Furthermore, each step of the
"augment-and-collapse" Gibbs sampling algorithms has an analytical conditional
distribution, from which samples can be easily drawn. Experimental results
demonstrate significant improvements on time efficiency. The classification
performance is also significantly improved over competitors on binary,
multi-class and multi-label classification tasks.
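The pivotal observation is that the margin (hinge) loss is convex, so by Jensen's inequality the expected margin loss dominates the margin loss of the expected prediction rule; minimizing the former therefore minimizes an upper bound on the latter. A tiny numerical check of that inequality, with the posterior draws of the discriminant score simulated rather than taken from the paper's model:

    import numpy as np

    rng = np.random.default_rng(1)

    # Stand-in posterior draws of the discriminant score for one document
    # with label y = +1 (illustrative values, not the paper's model).
    y = 1.0
    scores = rng.normal(loc=0.3, scale=1.0, size=10_000)

    def hinge(margin):
        return np.maximum(0.0, 1.0 - margin)

    expected_loss = hinge(y * scores).mean()   # E[max(0, 1 - y * f)]
    loss_of_mean = hinge(y * scores.mean())    # max(0, 1 - y * E[f])

    # Jensen: expected margin loss >= margin loss of the expected rule.
    assert expected_loss >= loss_of_mean
    print(expected_loss, loss_of_mean)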
Faster Rates for Policy Learning
This article improves the existing proven rates of regret decay in optimal
policy estimation. We give a margin-free result showing that the regret decay
for estimating a within-class optimal policy is second-order for empirical risk
minimizers over Donsker classes, with regret decaying at a faster rate than the
standard error of an efficient estimator of the value of an optimal policy. We
also give a result from the classification literature that shows that faster
regret decay is possible via plug-in estimation provided a margin condition
holds. Four examples are considered. In these examples, the regret is expressed
in terms of either the mean value or the median value; the number of possible
actions is either two or finitely many; and the sampling scheme is either
independent and identically distributed or sequential, where the latter
represents a contextual bandit sampling scheme.
Generalization error for multi-class margin classification
In this article, we study rates of convergence of the generalization error of
multi-class margin classifiers. In particular, we develop an upper bound theory
quantifying the generalization error of various large margin classifiers. The
theory permits a treatment of general margin losses, convex or nonconvex, in the presence or absence of a dominating class. Three main results are established.
First, for any fixed margin loss, there may be a trade-off between the ideal
and actual generalization performances with respect to the choice of the class
of candidate decision functions, which is governed by the trade-off between the
approximation and estimation errors. In fact, different margin losses lead to
different ideal or actual performances in specific cases. Second, we
demonstrate, in a problem of linear learning, that the convergence rate can be arbitrarily fast in the sample size $n$ depending on the joint distribution of the input/output pair. This goes beyond the anticipated rate of $O(n^{-1})$. Third, we establish rates of convergence of several margin classifiers in feature selection with the number of candidate variables $p$ allowed to greatly exceed the sample size $n$ but no faster than $\exp(n)$.
Comment: Published at http://dx.doi.org/10.1214/07-EJS069 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org).
Fast rates in statistical and online learning
The speed with which a learning algorithm converges as it is presented with
more data is a central problem in machine learning: a fast rate of convergence means less data is needed for the same level of performance. The
pursuit of fast rates in online and statistical learning has led to the
discovery of many conditions in learning theory under which fast learning is
possible. We show that most of these conditions are special cases of a single, unifying condition that comes in two forms: the central condition for 'proper'
learning algorithms that always output a hypothesis in the given model, and
stochastic mixability for online algorithms that may make predictions outside
of the model. We show that under surprisingly weak assumptions both conditions
are, in a certain sense, equivalent. The central condition has a
re-interpretation in terms of convexity of a set of pseudoprobabilities,
linking it to density estimation under misspecification. For bounded losses, we
show how the central condition enables a direct proof of fast rates and we
prove its equivalence to the Bernstein condition, itself a generalization of
the Tsybakov margin condition, both of which have played a central role in
obtaining fast rates in statistical learning. Yet, while the Bernstein
condition is two-sided, the central condition is one-sided, making it more
suitable to deal with unbounded losses. In its stochastic mixability form, our
condition generalizes both a stochastic exp-concavity condition identified by Juditsky, Rigollet and Tsybakov, and Vovk's notion of mixability. Our unifying
conditions thus provide a substantial step towards a characterization of fast
rates in statistical learning, similar to how classical mixability
characterizes constant regret in the sequential prediction with expert advice
setting.
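For reference, the Bernstein condition mentioned above is commonly stated as follows (a standard form; $\beta = 1$ is the classical case): for some constants $B > 0$ and $\beta \in (0, 1]$,

$$\mathbb{E}\bigl[(\ell_f - \ell_{f^*})^2\bigr] \;\le\; B\,\bigl(\mathbb{E}[\ell_f - \ell_{f^*}]\bigr)^{\beta} \qquad \text{for all } f \in \mathcal{F},$$

where $\ell_f$ denotes the loss of hypothesis $f$ and $f^*$ minimizes risk over the model $\mathcal{F}$. The squared term controls deviations on both sides of the mean, which is the two-sidedness contrasted with the one-sided central condition above.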
Robust classification via MOM minimization
We present an extension of Vapnik's classical empirical risk minimizer (ERM)
where the empirical risk is replaced by a median-of-means (MOM) estimator; the resulting estimators are called MOM minimizers. While ERM is sensitive to corruption of the dataset for many classical loss functions used in classification, we show that MOM minimizers behave well in theory, in the sense that they achieve Vapnik's (slow) rates of convergence under weak assumptions: the data are only required to have a finite second moment, and some outliers may have corrupted the dataset.
We propose an algorithm inspired by MOM minimizers. It can be analyzed using arguments quite similar to those used for stochastic block gradient descent. As a proof of concept, we show how to modify a proof of consistency for a descent algorithm to prove consistency of its MOM version. As MOM algorithms perform a smart subsampling, our procedure can also substantially reduce computation time and memory requirements when applied to nonlinear algorithms. These empirical performances are illustrated on both simulated and real datasets.
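The median-of-means device at the core of these estimators is simple to state: split the sample into blocks, average within each block, and return the median of the block averages. A minimal numpy sketch follows; the block count and the data are illustrative, and in a MOM minimizer this estimator replaces the empirical mean of the losses of each candidate classifier.

    import numpy as np

    def median_of_means(values, n_blocks=11):
        """Median-of-means: partition the sample into blocks, average
        each block, return the median of the block means. A minority
        of corrupted blocks cannot move the median far, unlike the
        plain empirical mean."""
        values = np.asarray(values, dtype=float)
        perm = np.random.default_rng(0).permutation(len(values))
        blocks = np.array_split(values[perm], n_blocks)
        return np.median([block.mean() for block in blocks])

    # Heavy-tailed sample (finite second moment) with gross outliers.
    rng = np.random.default_rng(42)
    x = rng.standard_t(df=2.5, size=1000)
    x[:5] = 1e6                               # corrupted observations
    print(np.mean(x), median_of_means(x))     # MOM stays near 0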
Classification with the nearest neighbor rule in general finite dimensional spaces: necessary and sufficient conditions
Given an $n$-sample $(X_i, Y_i)_{1 \le i \le n}$ of random vectors whose joint law is unknown, the long-standing problem of supervised classification aims to \textit{optimally} predict the label $Y$ of a given new observation $X$. In this context, the nearest neighbor rule is a popular, flexible and intuitive method in non-parametric situations.
Even if this algorithm is commonly used in the machine learning and
statistics communities, less is known about its prediction ability in general
finite dimensional spaces, especially when the support of the density of the observations is $\mathbb{R}^d$. This paper is devoted to the study of the
statistical properties of the nearest neighbor rule in various situations. In
particular, attention is paid to the marginal law of $X$, as well as the smoothness and margin properties of the \textit{regression function} $\eta(x) = \mathbb{E}[Y \mid X = x]$. We identify two necessary and sufficient conditions to obtain uniform consistency rates of classification and to derive sharp estimates in the case of the nearest neighbor rule. Some numerical experiments are proposed at the end of the paper to help illustrate the discussion.
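For concreteness, here is a minimal sketch of the rule under study, assuming labels in {0, 1} and the Euclidean distance; the paper's contribution concerns when and how fast this rule is uniformly consistent, notably when the support of $X$ is all of $\mathbb{R}^d$.

    import numpy as np

    def nearest_neighbor_classify(X_train, y_train, x, k=5):
        """k-nearest neighbor rule: majority vote among the labels of
        the k training points closest to the query point x."""
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dists)[:k]
        return int(y_train[nearest].mean() >= 0.5)  # majority vote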