Sharper lower bounds on the performance of the empirical risk minimization algorithm
We present an argument based on the multidimensional and the uniform central
limit theorems, proving that, under some geometrical assumptions between the
target function and the learning class $F$, the excess risk of the empirical
risk minimization algorithm is lower bounded by
$$\frac{\mathbb{E}\sup_{q\in Q}G_q}{\sqrt{n}}\,\delta,$$
where $(G_q)_{q\in Q}$ is a canonical Gaussian process associated with $Q$ (a
well chosen subset of $F$) and $\delta$ is a parameter governing the
oscillations of the empirical excess risk function over a small ball in $F$.
Comment: Published at http://dx.doi.org/10.3150/09-BEJ225 in Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
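The quantity driving this lower bound, $\mathbb{E}\sup_{q\in Q}G_q$, is easy to approximate numerically. A minimal sketch, assuming a hypothetical finite index set $Q\subset\mathbb{R}^d$ (not taken from the paper), where the canonical Gaussian process is $G_q=\langle g,q\rangle$ with $g\sim N(0,I_d)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite index set Q: m points in R^d, standing in for the
# "well chosen subset of F" from the abstract.
d, m = 20, 50
Q = rng.standard_normal((m, d))

# Canonical Gaussian process on Q: G_q = <g, q> with g ~ N(0, I_d).
# Estimate E sup_{q in Q} G_q by Monte Carlo.
n_mc = 10_000
g = rng.standard_normal((n_mc, d))
sup_G = (g @ Q.T).max(axis=1)  # sup over Q for each Gaussian draw
print("estimated E sup_q G_q:", sup_G.mean())
```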
General nonexact oracle inequalities for classes with a subexponential envelope
We show that empirical risk minimization procedures and regularized empirical
risk minimization procedures satisfy nonexact oracle inequalities in an
unbounded framework, under the assumption that the class has a subexponential
envelope function. The main novelty, beyond the boundedness-assumption-free
setup, is that these inequalities can yield fast rates even in situations in
which exact oracle inequalities hold only with slower rates. We apply these
results to show that procedures based on $\ell_1$ and nuclear norm
regularization functions satisfy oracle inequalities with a residual term that
decreases like $1/n$ for every $L_p$-loss function, while only assuming that
the tail behavior of the input and output variables is well behaved. In
particular, no RIP-type assumption or "incoherence condition" is needed to
obtain fast residual terms in those setups. We also apply these results to the
problems of convex aggregation and model selection.
Comment: Published at http://dx.doi.org/10.1214/11-AOS965 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
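For concreteness, here is a minimal sketch of an $\ell_1$-regularized ERM procedure of the kind these inequalities cover, using scikit-learn's Lasso on synthetic data; the problem sizes, the noise level, and the $\sqrt{\log(d)/n}$ scaling of the regularization parameter are illustrative assumptions, not choices made in the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic sparse regression problem (illustrative sizes only).
n, d, s = 200, 500, 5
X = rng.standard_normal((n, d))
beta = np.zeros(d)
beta[:s] = 1.0                      # s-sparse target vector
y = X @ beta + 0.1 * rng.standard_normal(n)

# Regularized ERM: minimize (1/2n)||y - Xb||_2^2 + alpha * ||b||_1.
alpha = np.sqrt(np.log(d) / n)      # standard theory-driven scaling
erm = Lasso(alpha=alpha).fit(X, y)
print("support recovered:", np.flatnonzero(erm.coef_))
```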
From Stochastic Mixability to Fast Rates
Empirical risk minimization (ERM) is a fundamental learning rule for
statistical learning problems where the data is generated according to some
unknown distribution $P$ and returns a hypothesis $f$ chosen from a fixed
class $\mathcal{F}$ with small loss $\ell$. In the parametric setting,
depending upon $(\ell, \mathcal{F}, P)$, ERM can have slow ($1/\sqrt{n}$) or
fast ($1/n$) rates of convergence of the excess risk as a function of the
sample size $n$. There exist several results that give sufficient conditions
for fast rates in terms of joint properties of $\ell$, $\mathcal{F}$, and $P$,
such as the margin condition and the Bernstein condition. In the
non-statistical prediction with expert advice setting, there is an analogous
slow and fast rate phenomenon, and it is entirely characterized in terms of
the mixability of the loss $\ell$ (there being no role there for $\mathcal{F}$
or $P$). The notion of stochastic mixability builds a bridge between these two
models of learning, reducing to classical mixability in a special case. The
present paper presents a direct proof of fast rates for ERM in terms of
stochastic mixability of $(\ell, \mathcal{F}, P)$, and in so doing provides
new insight into the fast-rates phenomenon. The proof exploits an old result
of Kemperman on the solution to the general moment problem. We also show a
partial converse that suggests a characterization of fast rates for ERM in
terms of stochastic mixability is possible.
Comment: 21 pages, accepted to NIPS 2014.
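A toy numerical check makes the bridged condition concrete. The sketch below tests an inequality of the form $\mathbb{E}[\exp(\eta(\ell_{f^*}-\ell_f))]\le 1$ for all $f$ in a small finite class under squared loss, with $f^*$ the risk minimizer; the data-generating setup and the finite class are assumptions for illustration, not the paper's definitions verbatim.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: Y = 0.5*X + noise, and a finite class of linear predictors.
n_mc = 200_000
X = rng.uniform(-1, 1, n_mc)
Y = 0.5 * X + 0.1 * rng.standard_normal(n_mc)

slopes = np.array([0.0, 0.25, 0.5, 0.75])
losses = np.stack([(a * X - Y) ** 2 for a in slopes])  # squared loss
f_star = losses.mean(axis=1).argmin()                  # risk minimizer

# Check E[exp(eta * (loss(f*) - loss(f)))] <= 1 for all f in the class.
for eta in (0.5, 1.0, 2.0):
    vals = np.exp(eta * (losses[f_star] - losses)).mean(axis=1)
    status = "holds" if vals.max() <= 1.0 + 1e-3 else "fails"
    print(f"eta={eta}: max_f E[exp(...)] = {vals.max():.4f} ({status})")
```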
A Stochastic View of Optimal Regret through Minimax Duality
We study the regret of optimal strategies for online convex optimization
games. Using von Neumann's minimax theorem, we show that the optimal regret in
this adversarial setting is closely related to the behavior of the empirical
minimization algorithm in a stochastic process setting: it is equal to the
maximum, over joint distributions of the adversary's action sequence, of the
difference between a sum of minimal expected losses and the minimal empirical
loss. We show that the optimal regret has a natural geometric interpretation,
since it can be viewed as the gap in Jensen's inequality for a concave
functional (the minimum over the player's actions of the expected loss)
defined on a set of probability distributions. We use this expression to
obtain upper and lower bounds on the regret of an optimal strategy for a
variety of online learning problems. Our method provides upper bounds without
the need to construct a learning algorithm; the lower bounds provide explicit
optimal strategies for the adversary.
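The Jensen-gap picture is simple to reproduce in a toy finite game; the loss matrix and distributions below are arbitrary illustrative choices, not an example from the paper. The functional $\Phi(p)=\min_a\langle L_a,p\rangle$ (minimal expected loss over actions) is concave, so $\Phi$ at the mean outcome distribution dominates the mean of $\Phi$ over empirical distributions, and the gap shrinks with the sample size:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy finite game: 3 actions, 4 outcomes (illustrative values only).
L = rng.uniform(size=(3, 4))     # loss matrix L[action, outcome]
p = rng.dirichlet(np.ones(4))    # true outcome distribution

def phi(q):
    """Concave functional: minimal expected loss under distribution q."""
    return (L @ q).min()

# Compare phi(E[p_hat]) = phi(p) with E[phi(p_hat)] over empirical
# distributions p_hat of n i.i.d. outcomes; concavity makes the gap >= 0.
for n in (10, 100, 1000):
    counts = rng.multinomial(n, p, size=10_000)
    gap = phi(p) - np.mean([phi(c / n) for c in counts])
    print(f"n={n:5d}: Jensen gap = {gap:.4f}")
```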
Complementary Labels Learning with Augmented Classes
Complementary Labels Learning (CLL), which aims to alleviate annotation cost
compared with standard supervised learning, arises in many real-world tasks
such as private question classification and online learning. Unfortunately,
most previous CLL algorithms assume a stable environment rather than open and
dynamic scenarios, where instances from augmented classes unseen during
training may emerge in the testing phase. In this paper, we propose a novel
problem setting called Complementary Labels Learning with Augmented Classes
(CLLAC), which brings the challenge that classifiers trained on complementary
labels should not only classify instances from observed classes accurately,
but also recognize instances from augmented classes in the testing phase.
Specifically, by using unlabeled data, we propose an unbiased estimator of the
classification risk for CLLAC, which is provably consistent. Moreover, we
provide a generalization error bound for the proposed method, which shows that
the optimal parametric convergence rate is achieved for the estimation error.
Finally, experimental results on several benchmark datasets verify the
effectiveness of the proposed method.
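To make training with complementary labels concrete, here is a minimal sketch of the uniform transition-matrix ("forward") complementary loss, a standard CLL baseline; it is an assumed illustration, not necessarily the unbiased estimator proposed in this paper, and the class count and inputs are hypothetical.

```python
import numpy as np

def forward_cll_loss(logits, comp_labels, K):
    """Cross-entropy against the complementary label after pushing the
    model's class probabilities through the uniform transition matrix
    Q[y, ybar] = 1/(K-1) for ybar != y (a common CLL baseline)."""
    z = logits - logits.max(axis=1, keepdims=True)        # stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    Q = (np.ones((K, K)) - np.eye(K)) / (K - 1)
    p_bar = p @ Q                                         # P(ybar | x)
    n = len(comp_labels)
    return -np.log(p_bar[np.arange(n), comp_labels] + 1e-12).mean()

# Tiny usage example with random logits and complementary labels.
rng = np.random.default_rng(0)
K, n = 5, 8
loss = forward_cll_loss(rng.standard_normal((n, K)),
                        rng.integers(0, K, n), K)
print("forward complementary loss:", loss)
```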