    Sharper lower bounds on the performance of the empirical risk minimization algorithm

    We present an argument based on the multidimensional and uniform central limit theorems, proving that, under some geometric assumptions relating the target function $T$ and the learning class $F$, the excess risk of the empirical risk minimization algorithm is lower bounded by $\frac{\mathbb{E}\sup_{q\in Q}G_q}{\sqrt{n}}\,\delta$, where $(G_q)_{q\in Q}$ is a canonical Gaussian process associated with $Q$ (a well-chosen subset of $F$) and $\delta$ is a parameter governing the oscillations of the empirical excess risk function over a small ball in $F$.
    Comment: Published at http://dx.doi.org/10.3150/09-BEJ225 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
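    For a finite $Q$, the quantity in this bound can be estimated numerically. The Python sketch below is an illustration only: it assumes $Q$ is finite and that the covariance matrix `cov` of the canonical Gaussian process is already known, and the function names and the Monte Carlo approach are hypothetical, not taken from the paper. It approximates $\mathbb{E}\sup_{q\in Q}G_q$ by averaging the maximum coordinate of sampled Gaussian vectors.

```python
import numpy as np


def expected_sup_gaussian(cov, n_draws=100_000, seed=0):
    """Monte Carlo estimate of E[max_q G_q] for a centred Gaussian vector
    (G_q)_{q in Q} with covariance matrix `cov`, assuming Q is finite."""
    cov = np.asarray(cov, dtype=float)
    rng = np.random.default_rng(seed)
    # Each row is one draw of the whole process indexed by the finite set Q.
    draws = rng.multivariate_normal(np.zeros(cov.shape[0]), cov, size=n_draws)
    # Supremum over Q in each draw, then average over draws.
    return draws.max(axis=1).mean()


def lower_bound_term(cov, n, delta, **kwargs):
    """Evaluate E[sup_q G_q] / sqrt(n) * delta for a user-supplied
    sample size n and oscillation parameter delta."""
    return expected_sup_gaussian(cov, **kwargs) / np.sqrt(n) * delta


# Two independent standard Gaussians: E[max] = 1/sqrt(pi), about 0.564.
print(expected_sup_gaussian(np.eye(2)))
print(lower_bound_term(np.eye(2), n=100, delta=0.5))
```

    For two independent standard Gaussians, $\mathbb{E}\max(G_1, G_2) = 1/\sqrt{\pi} \approx 0.564$, so the first estimate should land close to that value.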

    General nonexact oracle inequalities for classes with a subexponential envelope

    We show that empirical risk minimization procedures and regularized empirical risk minimization procedures satisfy nonexact oracle inequalities in an unbounded framework, under the assumption that the class has a subexponential envelope function. The main novelty, in addition to the setup being free of boundedness assumptions, is that these inequalities can yield fast rates even in situations in which exact oracle inequalities only hold with slower rates. We apply these results to show that procedures based on $\ell_1$ and nuclear norm regularization functions satisfy oracle inequalities with a residual term that decreases like $1/n$ for every $L_q$-loss function ($q\geq 2$), while only assuming that the tails of the input and output variables are well behaved. In particular, no RIP-type assumption or "incoherence condition" is needed to obtain fast residual terms in these setups. We also apply these results to the problems of convex aggregation and model selection.
    Comment: Published at http://dx.doi.org/10.1214/11-AOS965 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
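    As one concrete instance of an $\ell_1$-regularized ERM procedure, for the $L_2$ (squared) loss, the following Python sketch minimizes $\frac{1}{n}\|y - Xw\|_2^2 + \lambda\|w\|_1$ by proximal gradient descent (ISTA). The solver, the function names, and the choice of squared loss are assumptions made for illustration; the paper's results concern the statistical behavior of such estimators, not this particular optimizer.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each coordinate towards 0 by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def l1_regularized_erm(X, y, lam, n_iter=500):
    """Minimise (1/n) * ||y - X w||^2 + lam * ||w||_1 with ISTA."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    # Lipschitz constant of the gradient of the empirical squared loss.
    lipschitz = 2.0 * np.linalg.norm(X, 2) ** 2 / n
    step = 1.0 / lipschitz
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = -2.0 / n * X.T @ (y - X @ w)              # gradient of the empirical risk
        w = soft_threshold(w - step * grad, step * lam)  # proximal step for the l1 penalty
    return w


# Orthogonal design with X^T X = n I: the solution is soft_threshold(X^T y / n, lam / 2).
n = 3
X = np.sqrt(n) * np.eye(n)
y = np.sqrt(n) * np.array([2.0, 0.1, -1.0])
print(l1_regularized_erm(X, y, lam=0.4))  # close to [1.8, 0, -0.8]
```

    With an orthogonal design satisfying $X^\top X = nI$, the minimizer is the soft-thresholding of $X^\top y / n$ at level $\lambda/2$, which gives a quick sanity check on the solver.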

    From Stochastic Mixability to Fast Rates

    Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems in which the data are generated according to some unknown distribution $\mathsf{P}$; it returns a hypothesis $f$ chosen from a fixed class $\mathcal{F}$ with small loss $\ell$. In the parametric setting, depending upon $(\ell, \mathcal{F}, \mathsf{P})$, ERM can have slow ($1/\sqrt{n}$) or fast ($1/n$) rates of convergence of the excess risk as a function of the sample size $n$. Several results give sufficient conditions for fast rates in terms of joint properties of $\ell$, $\mathcal{F}$, and $\mathsf{P}$, such as the margin condition and the Bernstein condition. In the non-statistical setting of prediction with expert advice, there is an analogous slow and fast rate phenomenon, and it is entirely characterized by the mixability of the loss $\ell$ (there being no role there for $\mathcal{F}$ or $\mathsf{P}$). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case. This paper presents a direct proof of fast rates for ERM in terms of stochastic mixability of $(\ell, \mathcal{F}, \mathsf{P})$, and in so doing provides new insight into the fast-rates phenomenon. The proof exploits an old result of Kemperman on the solution to the general moment problem. We also show a partial converse that suggests a characterization of fast rates for ERM in terms of stochastic mixability is possible.
    Comment: 21 pages, accepted to NIPS 201
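    To make stochastic mixability concrete, the Python sketch below checks it for a finite class and a finitely supported $\mathsf{P}$. It assumes the definition used in this line of work, namely that $(\ell, \mathcal{F}, \mathsf{P})$ is $\eta$-stochastically mixable if some $f^* \in \mathcal{F}$ satisfies $\mathbb{E}_{\mathsf{P}}[e^{-\eta(\ell(f,Z) - \ell(f^*,Z))}] \le 1$ for every $f \in \mathcal{F}$. That definition is not stated in the abstract, and the brute-force search and the function name are hypothetical.

```python
import numpy as np


def stochastic_mixability_witness(losses, probs, eta, tol=1e-12):
    """Return the index of some f* witnessing eta-stochastic mixability, or None.

    losses[i, j]: loss of hypothesis i on outcome z_j (finite class, finite outcomes).
    probs[j]:     probability of outcome z_j under P.
    """
    losses = np.asarray(losses, dtype=float)
    probs = np.asarray(probs, dtype=float)
    for star in range(losses.shape[0]):
        # E_P[exp(-eta * (loss(f, Z) - loss(f*, Z)))] for every f in the class.
        moments = np.exp(-eta * (losses - losses[star])) @ probs
        if np.all(moments <= 1.0 + tol):
            return star
    return None


# Squared loss, constant predictions {0, 0.5, 1}, fair coin Z in {0, 1}.
preds = np.array([0.0, 0.5, 1.0])
outcomes = np.array([0.0, 1.0])
losses = (preds[:, None] - outcomes[None, :]) ** 2
probs = np.array([0.5, 0.5])
print(stochastic_mixability_witness(losses, probs, eta=1.0))   # 1 (f* = 0.5)
print(stochastic_mixability_witness(losses, probs, eta=10.0))  # None
```

    In this toy example the condition holds with $f^* = 0.5$ at $\eta = 1$ but fails for every candidate at $\eta = 10$, illustrating that stochastic mixability depends on $\eta$ as well as on $(\ell, \mathcal{F}, \mathsf{P})$.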

    A Stochastic View of Optimal Regret through Minimax Duality

    We study the regret of optimal strategies for online convex optimization games. Using von Neumann's minimax theorem, we show that the optimal regret in this adversarial setting is closely related to the behavior of the empirical minimization algorithm in a stochastic process setting: it is equal to the maximum, over joint distributions of the adversary's action sequence, of the difference between a sum of minimal expected losses and the minimal empirical loss. We show that the optimal regret has a natural geometric interpretation, since it can be viewed as the gap in Jensen's inequality for a concave functional (the minimal expected loss over the player's actions) defined on a set of probability distributions. We use this expression to obtain upper and lower bounds on the regret of an optimal strategy for a variety of online learning problems. Our method provides upper bounds without the need to construct a learning algorithm; the lower bounds provide explicit optimal strategies for the adversary.
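    Since the optimal regret equals a maximum over joint distributions of the adversary's sequence, evaluating the inner expression at any single distribution yields a lower bound on it. The Python sketch below does this exactly for a hypothetical game that is not taken from the paper: the player chooses $a \in [-1, 1]$, the loss is $(a - z)^2$, and the adversary draws $z_t \in \{-1, +1\}$ i.i.d. uniformly. It enumerates all $2^n$ sequences and uses exact rational arithmetic; the function name is illustrative.

```python
from fractions import Fraction
from itertools import product


def iid_regret_lower_bound(n):
    """Exact value of  sum_t min_a E[(a - Z_t)^2 | past]  -  E[min_a sum_t (a - Z_t)^2]
    for the hypothetical game: player a in [-1, 1], Z_t i.i.d. uniform on {-1, +1}."""
    # Under i.i.d. Z_t the past is uninformative; min_a E[(a - Z)^2] = Var(Z) = 1, at a = 0.
    sum_min_expected = Fraction(n)
    total = Fraction(0)
    for z in product((-1, 1), repeat=n):
        # The empirical minimiser is the sample mean, which lies in [-1, 1].
        mean = Fraction(sum(z), n)
        total += sum((zt - mean) ** 2 for zt in z)
    expected_min_empirical = total / 2 ** n
    return sum_min_expected - expected_min_empirical


for n in range(1, 7):
    print(n, iid_regret_lower_bound(n))  # second column is 1 for every n
```

    Each conditional minimal expected loss equals $\operatorname{Var}(Z) = 1$, while the expected minimal empirical loss is $n - 1$, so this particular distribution certifies a regret lower bound of 1 for every $n$.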

    Complementary Labels Learning with Augmented Classes

    Complementary Labels Learning (CLL) arises in many real-world tasks, such as private-question classification and online learning, and aims to reduce annotation cost compared with standard supervised learning. Unfortunately, most previous CLL algorithms were designed for a stable environment rather than an open and dynamic one, in which data from augmented classes unseen during training may emerge in the testing phase. In this paper, we propose a novel problem setting called Complementary Labels Learning with Augmented Classes (CLLAC), which poses the challenge that classifiers trained with complementary labels should not only classify instances from the observed classes accurately, but also recognize instances from the augmented classes in the testing phase. Specifically, by using unlabeled data, we propose an unbiased estimator of the classification risk for CLLAC, which is provably consistent. Moreover, we provide a generalization error bound for the proposed method, which shows that the estimation error achieves the optimal parametric convergence rate. Finally, experimental results on several benchmark datasets verify the effectiveness of the proposed method.
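    The abstract does not spell out its CLLAC estimator, which additionally uses unlabeled data to handle augmented classes. As background only, the Python sketch below checks the unbiasedness identity behind a widely used complementary-label risk rewrite for ordinary CLL, assuming each complementary label is drawn uniformly from the $K - 1$ classes other than the true one: replacing the loss by $-(K-1)\,\ell(f(x), \bar{y}) + \sum_{j=1}^{K} \ell(f(x), j)$ recovers the ordinary expected loss in expectation. This is not the paper's method, and the function names are hypothetical.

```python
import numpy as np


def rewritten_loss(losses, comp_label):
    """Loss rewrite for one example with complementary label `comp_label`,
    assuming uniform complementary labels over K classes:
        -(K - 1) * losses[comp_label] + sum_j losses[j]
    losses[j] is the loss of the classifier's output at x against class j."""
    K = losses.shape[0]
    return -(K - 1) * losses[comp_label] + losses.sum()


def check_unbiasedness(class_probs, losses):
    """Return (expected rewritten loss over complementary labels,
    ordinary expected loss over true labels) at a single input x."""
    class_probs = np.asarray(class_probs, dtype=float)
    losses = np.asarray(losses, dtype=float)
    K = class_probs.shape[0]
    # Uniform complementary label: p(ybar = k | x) = (1 - p(y = k | x)) / (K - 1).
    comp_probs = (1.0 - class_probs) / (K - 1)
    via_complementary = sum(comp_probs[k] * rewritten_loss(losses, k) for k in range(K))
    ordinary = float(class_probs @ losses)
    return float(via_complementary), ordinary


print(check_unbiasedness([0.7, 0.2, 0.1], [0.3, 1.5, 2.0]))  # both about 0.71
```

    Averaging the rewritten loss over complementary-labeled samples therefore gives an unbiased estimate of the ordinary classification risk under the uniform assumption; extending this to unseen augmented classes is what the CLLAC setting adds.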