65,345 research outputs found

    Fast learning rates for plug-in classifiers

    It has been recently shown that, under the margin (or low noise) assumption, there exist classifiers attaining fast rates of convergence of the excess Bayes risk, that is, rates faster than $n^{-1/2}$. The work on this subject has suggested the following two conjectures: (i) the best achievable fast rate is of the order $n^{-1}$, and (ii) the plug-in classifiers generally converge more slowly than the classifiers based on empirical risk minimization. We show that both conjectures are not correct. In particular, we construct plug-in classifiers that can achieve not only fast, but also super-fast rates, that is, rates faster than $n^{-1}$. We establish minimax lower bounds showing that the obtained rates cannot be improved. Comment: Published at http://dx.doi.org/10.1214/009053606000001217 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
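
As a rough illustration of what a plug-in classifier is (this is not the estimator constructed in the paper), the sketch below first estimates the regression function, the conditional probability that the label is 1 given the input, and then predicts 1 whenever that estimate is at least 1/2. The local-average estimator, the `bandwidth` parameter, the function names, and the toy data are all assumptions made for this example.

```python
def local_average_eta(xs, ys, x, bandwidth):
    """Estimate P(Y = 1 | X = x) by averaging the labels of training
    points whose input lies within `bandwidth` of x (hypothetical estimator)."""
    neighbors = [y for xi, y in zip(xs, ys) if abs(xi - x) <= bandwidth]
    if not neighbors:
        return 0.5  # no nearby data: fall back to an uninformative estimate
    return sum(neighbors) / len(neighbors)


def plug_in_classify(xs, ys, x, bandwidth=0.25):
    """Plug-in rule: threshold the estimated regression function at 1/2."""
    return 1 if local_average_eta(xs, ys, x, bandwidth) >= 0.5 else 0


# Toy one-dimensional sample (made up for illustration), labels in {0, 1}.
xs = [0.1, 0.2, 0.3, 0.9, 1.0, 1.1]
ys = [0, 0, 1, 1, 1, 0]

label_left = plug_in_classify(xs, ys, 0.2)   # labels nearby: 0, 0, 1
label_right = plug_in_classify(xs, ys, 1.0)  # labels nearby: 1, 1, 0
```

Any other estimator of the regression function could be substituted for the local average; only the thresholding step is specific to the plug-in idea.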

    Fast learning rates for plug-in classifiers under the margin condition

    It has been recently shown that, under the margin (or low noise) assumption, there exist classifiers attaining fast rates of convergence of the excess Bayes risk, i.e., rates faster than $n^{-1/2}$. Work on this subject has suggested the following two conjectures: (i) the best achievable fast rate is of the order $n^{-1}$, and (ii) the plug-in classifiers generally converge more slowly than the classifiers based on empirical risk minimization. We show that both conjectures are not correct. In particular, we construct plug-in classifiers that can achieve not only the fast, but also the {\it super-fast} rates, i.e., rates faster than $n^{-1}$. We establish minimax lower bounds showing that the obtained rates cannot be improved. Comment: 36 pages

    Anisotropic oracle inequalities in noisy quantization

    The effect of errors in variables in quantization is investigated. We prove general exact and non-exact oracle inequalities with fast rates for an empirical minimization based on a noisy sample $Z_i=X_i+\epsilon_i$, $i=1,\ldots,n$, where the $X_i$ are i.i.d. with density $f$ and the $\epsilon_i$ are i.i.d. with density $\eta$. These rates depend on the geometry of the density $f$ and the asymptotic behaviour of the characteristic function of $\eta$. This general study can be applied to the problem of $k$-means clustering with noisy data. For this purpose, we introduce a deconvolution $k$-means stochastic minimization which reaches fast rates of convergence under standard Pollard's regularity assumptions. Comment: 30 pages. arXiv admin note: text overlap with arXiv:1205.141

    Gibbs Max-margin Topic Models with Data Augmentation

    Max-margin learning is a powerful approach to building classifiers and structured output predictors. Recent work on max-margin supervised topic models has successfully integrated it with Bayesian topic models to discover discriminative latent semantic structures and make accurate predictions for unseen testing data. However, the resulting learning problems are usually hard to solve because of the non-smoothness of the margin loss. Existing approaches to building max-margin supervised topic models rely on an iterative procedure to solve multiple latent SVM subproblems with additional mean-field assumptions on the desired posterior distributions. This paper presents an alternative approach by defining a new max-margin loss. Namely, we present Gibbs max-margin supervised topic models, a latent variable Gibbs classifier to discover hidden topic representations for various tasks, including classification, regression and multi-task learning. Gibbs max-margin supervised topic models minimize an expected margin loss, which is an upper bound of the existing margin loss derived from an expected prediction rule. By introducing augmented variables and integrating out the Dirichlet variables analytically by conjugacy, we develop simple Gibbs sampling algorithms with no restricting assumptions and no need to solve SVM subproblems. Furthermore, each step of the "augment-and-collapse" Gibbs sampling algorithms has an analytical conditional distribution, from which samples can be easily drawn. Experimental results demonstrate significant improvements in time efficiency. The classification performance is also significantly improved over competitors on binary, multi-class and multi-label classification tasks. Comment: 35 pages

    Faster Rates for Policy Learning

    This article improves the existing proven rates of regret decay in optimal policy estimation. We give a margin-free result showing that the regret decay for estimating a within-class optimal policy is second-order for empirical risk minimizers over Donsker classes, with regret decaying at a faster rate than the standard error of an efficient estimator of the value of an optimal policy. We also give a result from the classification literature that shows that faster regret decay is possible via plug-in estimation provided a margin condition holds. Four examples are considered. In these examples, the regret is expressed in terms of either the mean value or the median value; the number of possible actions is either two or finitely many; and the sampling scheme is either independent and identically distributed or sequential, where the latter represents a contextual bandit sampling scheme.

    Generalization error for multi-class margin classification

    In this article, we study rates of convergence of the generalization error of multi-class margin classifiers. In particular, we develop an upper bound theory quantifying the generalization error of various large margin classifiers. The theory permits a treatment of general margin losses, convex or nonconvex, in presence or absence of a dominating class. Three main results are established. First, for any fixed margin loss, there may be a trade-off between the ideal and actual generalization performances with respect to the choice of the class of candidate decision functions, which is governed by the trade-off between the approximation and estimation errors. In fact, different margin losses lead to different ideal or actual performances in specific cases. Second, we demonstrate, in a problem of linear learning, that the convergence rate can be arbitrarily fast in the sample size $n$ depending on the joint distribution of the input/output pair. This goes beyond the anticipated rate $O(n^{-1})$. Third, we establish rates of convergence of several margin classifiers in feature selection with the number of candidate variables $p$ allowed to greatly exceed the sample size $n$ but no faster than $\exp(n)$. Comment: Published at http://dx.doi.org/10.1214/07-EJS069 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Fast rates in statistical and online learning

    The speed with which a learning algorithm converges as it is presented with more data is a central problem in machine learning: a fast rate of convergence means less data is needed for the same level of performance. The pursuit of fast rates in online and statistical learning has led to the discovery of many conditions in learning theory under which fast learning is possible. We show that most of these conditions are special cases of a single, unifying condition that comes in two forms: the central condition for 'proper' learning algorithms that always output a hypothesis in the given model, and stochastic mixability for online algorithms that may make predictions outside of the model. We show that under surprisingly weak assumptions both conditions are, in a certain sense, equivalent. The central condition has a re-interpretation in terms of convexity of a set of pseudoprobabilities, linking it to density estimation under misspecification. For bounded losses, we show how the central condition enables a direct proof of fast rates and we prove its equivalence to the Bernstein condition, itself a generalization of the Tsybakov margin condition, both of which have played a central role in obtaining fast rates in statistical learning. Yet, while the Bernstein condition is two-sided, the central condition is one-sided, making it more suitable to deal with unbounded losses. In its stochastic mixability form, our condition generalizes both a stochastic exp-concavity condition identified by Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying conditions thus provide a substantial step towards a characterization of fast rates in statistical learning, similar to how classical mixability characterizes constant regret in the sequential prediction with expert advice setting. Comment: 69 pages, 3 figures

    Robust classification via MOM minimization

    We present an extension of Vapnik's classical empirical risk minimizer (ERM) where the empirical risk is replaced by a median-of-means (MOM) estimator; the new estimators are called MOM minimizers. While ERM is sensitive to corruption of the dataset for many classical loss functions used in classification, we show that MOM minimizers behave well in theory, in the sense that they achieve Vapnik's (slow) rates of convergence under weak assumptions: data are only required to have a finite second moment and some outliers may also have corrupted the dataset. We propose algorithms inspired by MOM minimizers. These algorithms can be analyzed using arguments quite similar to those used for Stochastic Block Gradient descent. As a proof of concept, we show how to modify a proof of consistency for a descent algorithm to prove consistency of its MOM version. As MOM algorithms perform a smart subsampling, our procedure can also help to substantially reduce computation time and memory resources when applied to non-linear algorithms. These empirical performances are illustrated on both simulated and real datasets.
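
To make the median-of-means idea concrete, here is a minimal sketch of the basic MOM estimate of a mean: split the values into blocks, average each block, and take the median of the block averages. It is not the paper's algorithm. The function name, the strided block split, and the toy loss values are assumptions chosen for illustration.

```python
import statistics


def median_of_means(values, n_blocks):
    """Median-of-means estimate of the mean of `values`.

    The values are split into `n_blocks` disjoint blocks (here by striding,
    an arbitrary choice), each block is averaged, and the median of the
    block averages is returned.
    """
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must be between 1 and len(values)")
    blocks = [values[i::n_blocks] for i in range(n_blocks)]
    block_means = [sum(block) / len(block) for block in blocks]
    return statistics.median(block_means)


# Hypothetical per-sample losses: 99 ordinary values and one corrupted one.
losses = [1.0] * 99 + [1000.0]

empirical_mean = sum(losses) / len(losses)    # pulled up by the outlier
mom_estimate = median_of_means(losses, n_blocks=5)  # outlier hits one block only
```

With five blocks, the corrupted value affects a single block average, so the median of the five averages stays at the typical loss, while the plain average is inflated roughly tenfold.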

    Classification with the nearest neighbor rule in general finite dimensional spaces: necessary and sufficient conditions

    Given an $n$-sample of random vectors $(X_i,Y_i)_{1 \leq i \leq n}$ whose joint law is unknown, the long-standing problem of supervised classification aims to \textit{optimally} predict the label $Y$ of a given new observation $X$. In this context, the nearest neighbor rule is a popular flexible and intuitive method in non-parametric situations. Even if this algorithm is commonly used in the machine learning and statistics communities, less is known about its prediction ability in general finite dimensional spaces, especially when the support of the density of the observations is $\mathbb{R}^d$. This paper is devoted to the study of the statistical properties of the nearest neighbor rule in various situations. In particular, attention is paid to the marginal law of $X$, as well as the smoothness and margin properties of the \textit{regression function} $\eta(X) = \mathbb{E}[Y | X]$. We identify two necessary and sufficient conditions to obtain uniform consistency rates of classification and to derive sharp estimates in the case of the nearest neighbor rule. Some numerical experiments are proposed at the end of the paper to help illustrate the discussion. Comment: 53 pages, 3 figures
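
For readers unfamiliar with the nearest neighbor rule, the sketch below shows its simplest form: a new observation receives the label of the closest training point. The Euclidean distance, the function name, and the toy sample are assumptions for this example; the paper's own setting and variants may differ.

```python
def nearest_neighbor_predict(train_points, train_labels, query):
    """Return the label of the training point closest to `query`
    in squared Euclidean distance (ties go to the earliest point)."""
    def squared_distance(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    closest = min(
        range(len(train_points)),
        key=lambda i: squared_distance(train_points[i], query),
    )
    return train_labels[closest]


# Toy two-dimensional sample (made up for illustration).
train_points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
train_labels = [0, 0, 1, 1]

near_origin = nearest_neighbor_predict(train_points, train_labels, (0.5, 0.2))
far_corner = nearest_neighbor_predict(train_points, train_labels, (5.5, 4.0))
```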