From Stochastic Mixability to Fast Rates
Empirical risk minimization (ERM) is a fundamental learning rule for
statistical learning problems where the data is generated according to some
unknown distribution $P$ and returns a hypothesis $f$ chosen from a
fixed class $\mathcal{F}$ with small loss $\ell$. In the parametric setting,
depending upon $(\ell, \mathcal{F}, P)$, ERM can have slow ($1/\sqrt{n}$)
or fast ($1/n$) rates of convergence of the excess risk as a
function of the sample size $n$. There exist several results that give
sufficient conditions for fast rates in terms of joint properties of $\ell$,
$\mathcal{F}$, and $P$, such as the margin condition and the Bernstein
condition. In the non-statistical prediction with expert advice setting, there
is an analogous slow and fast rate phenomenon, and it is entirely characterized
in terms of the mixability of the loss $\ell$ (there being no role there for
$\mathcal{F}$ or $P$). The notion of stochastic mixability builds a
bridge between these two models of learning, reducing to classical mixability
in a special case. The present paper presents a direct proof of fast rates for
ERM in terms of stochastic mixability of $(\ell, \mathcal{F}, P)$, and
in so doing provides new insight into the fast-rates phenomenon. The proof
exploits an old result of Kemperman on the solution to the general moment
problem. We also show a partial converse that suggests a characterization of
fast rates for ERM in terms of stochastic mixability is possible.
Comment: 21 pages, accepted to NIPS 201
From Stochastic Mixability to Fast Rates
Empirical risk minimization (ERM) is a fundamental algorithm for statistical learning problems where the data is generated according to some unknown distribution $P$ and returns a hypothesis $f$ chosen from a fixed class $\mathcal{F}$ with small loss $\ell$. In the parametric setting, depending upon $(\ell, \mathcal{F}, P)$, ERM can have slow ($1/\sqrt{n}$) or fast ($1/n$) rates of convergence of the excess risk as a function of the sample size $n$. There exist several results that give sufficient conditions for fast rates in terms of joint properties of $\ell$, $\mathcal{F}$, and $P$, such as the margin condition and the Bernstein condition. In the non-statistical prediction with experts setting, there is an analogous slow and fast rate phenomenon, and it is entirely characterized in terms of the mixability of the loss $\ell$ (there being no role there for $\mathcal{F}$ or $P$). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case. The present paper presents a direct proof of fast rates for ERM in terms of stochastic mixability of $(\ell, \mathcal{F}, P)$, and in so doing provides new insight into the fast-rates phenomenon. The proof exploits an old result of Kemperman on the solution to the generalized moment problem. We also show a partial converse that suggests a characterization of fast rates for ERM in terms of stochastic mixability is possible.
General nonexact oracle inequalities for classes with a subexponential envelope
We show that empirical risk minimization procedures and regularized empirical
risk minimization procedures satisfy nonexact oracle inequalities in an
unbounded framework, under the assumption that the class has a subexponential
envelope function. The main novelty, in addition to the boundedness-assumption-free
setup, is that those inequalities can yield fast rates even in situations
in which exact oracle inequalities only hold with slower rates. We apply these
results to show that procedures based on $\ell_1$ and nuclear norm
regularization functions satisfy oracle inequalities with a residual term that
decreases like $1/n$ for every $L_q$-loss function ($q \geq 2$), while only
assuming that the tail behavior of the input and output variables is well
behaved. In particular, no RIP type of assumption or "incoherence condition"
is needed to obtain fast residual terms in those setups. We also apply these
results to the problems of convex aggregation and model selection.
Comment: Published at http://dx.doi.org/10.1214/11-AOS965 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org
Regularization in kernel learning
Under mild assumptions on the kernel, we obtain the best known error rates in
a regularized learning scenario taking place in the corresponding reproducing
kernel Hilbert space (RKHS). The main novelty in the analysis is a proof that
one can use a regularization term that grows significantly slower than the
standard quadratic growth in the RKHS norm.
Comment: Published at http://dx.doi.org/10.1214/09-AOS728 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org