From Stochastic Mixability to Fast Rates

Author: Mehta Nishant A.
Williamson Robert C.
Publication venue
Publication date: 22/11/2014
Field of study

Empirical risk minimization (ERM) is a fundamental learning rule for statistical learning problems where the data is generated according to some unknown distribution $\mathsf{P}$ and returns a hypothesis $f$ chosen from a fixed class $\mathcal{F}$ with small loss $\ell$. In the parametric setting, depending upon $(\ell, \mathcal{F}, \mathsf{P})$, ERM can have slow $(1/\sqrt{n})$ or fast $(1/n)$ rates of convergence of the excess risk as a function of the sample size $n$. There exist several results that give sufficient conditions for fast rates in terms of joint properties of $\ell$, $\mathcal{F}$, and $\mathsf{P}$, such as the margin condition and the Bernstein condition. In the non-statistical prediction with expert advice setting, there is an analogous slow and fast rate phenomenon, and it is entirely characterized in terms of the mixability of the loss $\ell$ (there being no role there for $\mathcal{F}$ or $\mathsf{P}$). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case. The present paper presents a direct proof of fast rates for ERM in terms of stochastic mixability of $(\ell, \mathcal{F}, \mathsf{P})$, and in so doing provides new insight into the fast-rates phenomenon. The proof exploits an old result of Kemperman on the solution to the general moment problem. We also show a partial converse that suggests a characterization of fast rates for ERM in terms of stochastic mixability is possible. Comment: 21 pages, accepted to NIPS 201
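
For orientation, the quantities named in this abstract can be written out as follows. This is a sketch in standard notation, not text from the listing: the symbols $\hat{f}_n$, $f^*$, $Z_i$ and $\eta$ are assumptions introduced here, and the last line is one common way to state stochastic mixability, which may differ in detail from the paper's own definition.

```latex
% ERM over the class F on an i.i.d. sample Z_1, ..., Z_n drawn from P (assumed notation)
\hat{f}_n \in \operatorname*{arg\,min}_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \ell(f, Z_i),
\qquad
f^* \in \operatorname*{arg\,min}_{f \in \mathcal{F}} \mathbb{E}_{Z \sim \mathsf{P}}\, \ell(f, Z)

% Excess risk of ERM; "slow" means order 1/\sqrt{n}, "fast" means order 1/n
\mathbb{E}_{Z \sim \mathsf{P}}\, \ell(\hat{f}_n, Z) - \mathbb{E}_{Z \sim \mathsf{P}}\, \ell(f^*, Z)

% One common statement of \eta-stochastic mixability of (\ell, F, P), for some \eta > 0
\mathbb{E}_{Z \sim \mathsf{P}} \exp\!\bigl( -\eta \, ( \ell(f, Z) - \ell(f^*, Z) ) \bigr) \le 1
\quad \text{for all } f \in \mathcal{F}
```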

arXiv.org e-Print Archive

CiteSeerX

From Stochastic Mixability to Fast Rates

Author: Mehta Nishant A
Williamson Robert
Publication venue: Neural Information Processing Systems Foundation
Publication date: 14/06/2016
Field of study

Empirical risk minimization (ERM) is a fundamental algorithm for statistical learning problems where the data is generated according to some unknown distribution P and returns a hypothesis f chosen from a fixed class F with small loss ℓ. In the parametric setting, depending upon (ℓ, F, P), ERM can have slow (1/√n) or fast (1/n) rates of convergence of the excess risk as a function of the sample size n. There exist several results that give sufficient conditions for fast rates in terms of joint properties of ℓ, F, and P, such as the margin condition and the Bernstein condition. In the non-statistical prediction with experts setting, there is an analogous slow and fast rate phenomenon, and it is entirely characterized in terms of the mixability of the loss ℓ (there being no role there for F or P). The notion of stochastic mixability builds a bridge between these two models of learning, reducing to classical mixability in a special case. The present paper presents a direct proof of fast rates for ERM in terms of stochastic mixability of (ℓ, F, P), and in so doing provides new insight into the fast-rates phenomenon. The proof exploits an old result of Kemperman on the solution to the generalized moment problem. We also show a partial converse that suggests a characterization of fast rates for ERM in terms of stochastic mixability is possible.

CiteSeerX

The Australian National University

General nonexact oracle inequalities for classes with a subexponential envelope

Author: Lecué Guillaume
Mendelson Shahar
Publication venue: Institute of Mathematical Statistics
Publication date: 05/06/2012
Field of study

We show that empirical risk minimization procedures and regularized empirical risk minimization procedures satisfy nonexact oracle inequalities in an unbounded framework, under the assumption that the class has a subexponential envelope function. The main novelty, in addition to the boundedness-assumption-free setup, is that those inequalities can yield fast rates even in situations in which exact oracle inequalities only hold with slower rates. We apply these results to show that procedures based on $\ell_1$ and nuclear norm regularization functions satisfy oracle inequalities with a residual term that decreases like $1/n$ for every $L_q$-loss function ($q \geq 2$), while only assuming that the tail behavior of the input and output variables is well behaved. In particular, no RIP type of assumption or "incoherence condition" is needed to obtain fast residual terms in those setups. We also apply these results to the problems of convex aggregation and model selection. Comment: Published in at http://dx.doi.org/10.1214/11-AOS965 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

Crossref

The Australian National University

HAL - UPEC / UPEM

On the optimality of the empirical risk minimization procedure for the convex aggregation problem

Author: Lecué Guillaume
Mendelson Shahar
Publication venue
Publication date: 01/01/2013
Field of study

Numérisation de Documents Anciens Mathématiques

Regularization in kernel learning

Author: Mendelson Shahar
Neeman Joseph
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2008
Field of study

Under mild assumptions on the kernel, we obtain the best known error rates in a regularized learning scenario taking place in the corresponding reproducing kernel Hilbert space (RKHS). The main novelty in the analysis is a proof that one can use a regularization term that grows significantly slower than the standard quadratic growth in the RKHS norm. Comment: Published in at http://dx.doi.org/10.1214/09-AOS728 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

CiteSeerX

Crossref

The Australian National University