From Stochastic Mixability to Fast Rates
Empirical risk minimization (ERM) is a fundamental learning rule for
statistical learning problems where the data is generated according to some
unknown distribution P and returns a hypothesis f chosen from a fixed class
F with small loss ℓ. In the parametric setting, depending upon (ℓ, F, P),
ERM can have slow (1/√n) or fast (1/n) rates of convergence of the excess
risk as a function of the sample size n. There exist several results that
give sufficient conditions for fast rates in terms of joint properties of
ℓ, F, and P, such as the margin condition and the Bernstein condition. In
the non-statistical prediction with expert advice setting, there is an
analogous slow and fast rate phenomenon, and it is entirely characterized
in terms of the mixability of the loss ℓ (there being no role there for F
or P). The notion of stochastic mixability builds a bridge between these
two models of learning, reducing to classical mixability in a special case.
The present paper presents a direct proof of fast rates for ERM in terms of
stochastic mixability of (ℓ, F, P), and in so doing provides new insight
into the fast-rates phenomenon. The proof exploits an old result of
Kemperman on the solution to the general moment problem. We also show a
partial converse that suggests a characterization of fast rates for ERM in
terms of stochastic mixability is possible.
Comment: 21 pages, accepted to NIPS 2014
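The learning rule discussed in the abstract can be sketched in a few lines. The finite class of threshold classifiers and the 0-1 loss below are illustrative choices of mine, not from the paper:

```python
import random

def erm(hypotheses, loss, sample):
    """Empirical risk minimization over a finite class F.

    hypotheses: candidate predictors f(x) -> prediction.
    loss: loss(prediction, y) -> float  (the loss ell).
    sample: (x, y) pairs drawn from the unknown distribution P.
    Returns the hypothesis with the smallest average loss on the sample.
    """
    def empirical_risk(f):
        return sum(loss(f(x), y) for x, y in sample) / len(sample)
    return min(hypotheses, key=empirical_risk)

# Toy use: threshold classifiers on [0, 1] under 0-1 loss; the true
# labelling threshold 0.35 is not in F, so ERM can only approximate it.
random.seed(1)
F = [lambda x, t=t / 10: int(x > t) for t in range(10)]
zero_one = lambda p, y: int(p != y)
data = [(x, int(x > 0.35)) for x in (random.random() for _ in range(200))]
f_hat = erm(F, zero_one, data)
```

How quickly the excess risk of `f_hat` shrinks with the sample size is exactly the slow-rate versus fast-rate question the paper addresses.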
Fast rates in statistical and online learning
The speed with which a learning algorithm converges as it is presented with
more data is a central problem in machine learning --- a fast rate of
convergence means less data is needed for the same level of performance. The
pursuit of fast rates in online and statistical learning has led to the
discovery of many conditions in learning theory under which fast learning is
possible. We show that most of these conditions are special cases of a single
unifying condition that comes in two forms: the central condition for 'proper'
learning algorithms that always output a hypothesis in the given model, and
stochastic mixability for online algorithms that may make predictions outside
of the model. We show that under surprisingly weak assumptions both conditions
are, in a certain sense, equivalent. The central condition has a
re-interpretation in terms of convexity of a set of pseudoprobabilities,
linking it to density estimation under misspecification. For bounded losses, we
show how the central condition enables a direct proof of fast rates and we
prove its equivalence to the Bernstein condition, itself a generalization of
the Tsybakov margin condition, both of which have played a central role in
obtaining fast rates in statistical learning. Yet, while the Bernstein
condition is two-sided, the central condition is one-sided, making it more
suitable to deal with unbounded losses. In its stochastic mixability form, our
condition generalizes both a stochastic exp-concavity condition identified by
Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying
conditions thus provide a substantial step towards a characterization of fast
rates in statistical learning, similar to how classical mixability
characterizes constant regret in the sequential prediction with expert advice
setting.
Comment: 69 pages, 3 figures
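For reference, the two conditions compared in this abstract are usually stated roughly as follows (a sketch in standard notation, with η > 0, constant B, and exponent β ∈ (0, 1] as condition parameters):

```latex
% The eta-central condition: there exists f^* \in F such that, for all f \in F,
\mathbb{E}_{Z \sim P}\!\left[ e^{\eta\,(\ell_{f^*}(Z) - \ell_f(Z))} \right] \le 1 .

% The Bernstein condition with exponent \beta \in (0,1] and constant B:
\mathbb{E}\!\left[ \bigl(\ell_f(Z) - \ell_{f^*}(Z)\bigr)^2 \right]
  \;\le\; B \left( \mathbb{E}\!\left[ \ell_f(Z) - \ell_{f^*}(Z) \right] \right)^{\beta} .
```

The contrast in the abstract is visible here: the Bernstein condition bounds a squared (hence two-sided) deviation of the excess loss, while the central condition constrains only one tail through an exponential moment, which is why it remains usable for unbounded losses.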
Generalized Mixability via Entropic Duality
Mixability is a property of a loss which characterizes when fast convergence
is possible in the game of prediction with expert advice. We show that a key
property of mixability generalizes, and the exp and log operations present in
the usual theory are not as special as one might have thought. In doing this we
introduce a more general notion of Φ-mixability where Φ is a general
entropy (i.e., any convex function on probabilities). We show how a property
shared by the convex dual of any such entropy yields a natural algorithm (the
minimizer of a regret bound) which, analogous to the classical aggregating
algorithm, is guaranteed a constant regret when used with Φ-mixable
losses. We characterize precisely which Φ have Φ-mixable losses and
put forward a number of conjectures about the optimality and relationships
between different choices of entropy.
Comment: 20 pages, 1 figure. Supersedes the work in arXiv:1403.2433 [cs.LG]
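The classical notion being generalized here is Vovk's η-mixability, which can be stated roughly as follows: a loss ℓ is η-mixable if any weighted set of expert predictions can be merged into a single prediction that matches the exponentially mixed loss on every outcome.

```latex
% A loss \ell is \eta-mixable if, for every finite set of predictions
% p_1, \dots, p_N and every weight vector (w_1, \dots, w_N), there exists
% a single prediction p^* such that
\ell(p^*, y) \;\le\; -\frac{1}{\eta}
  \log \sum_{i=1}^{N} w_i \, e^{-\eta\, \ell(p_i, y)}
  \qquad \text{for all outcomes } y .
```

The paper's Φ-mixability replaces the exp and log appearing here by the dual structure of a general convex entropy Φ.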
Generalised Mixability, Constant Regret, and Bayesian Updating
Mixability of a loss is known to characterise when constant regret bounds are
achievable in games of prediction with expert advice through the use of Vovk's
aggregating algorithm. We provide a new interpretation of mixability via convex
analysis that highlights the role of the Kullback-Leibler divergence in its
definition. This naturally generalises to what we call Φ-mixability where
the Bregman divergence D_Φ replaces the KL divergence. We prove that
losses that are Φ-mixable also enjoy constant regret bounds via a
generalised aggregating algorithm that is similar to mirror descent.
Comment: 12 pages
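For the special case of binary log loss (which is 1-mixable, with KL as the relevant divergence), the aggregating algorithm reduces to Bayesian mixture updating, and the constant regret bound is ln N. The following toy sketch is my own illustration, not code from the paper:

```python
import math
import random

def aggregating_algorithm_log_loss(expert_preds, outcomes):
    """Vovk's aggregating algorithm for binary log loss with eta = 1.

    For log loss the AA prediction is just the weight-averaged expert
    prediction (the Bayesian mixture), and the algorithm's total loss
    is within ln(N) of the best expert's total loss.

    expert_preds: per round, a list of N probabilities of outcome 1.
    outcomes: list of 0/1 outcomes.
    Returns (total mixture loss, per-expert log-weights, i.e. minus
    each expert's cumulative log loss).
    """
    log_w = [0.0] * len(expert_preds[0])  # uniform prior over experts
    total_loss = 0.0
    for preds, y in zip(expert_preds, outcomes):
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]  # stable unnormalized weights
        p = sum(wi * pi for wi, pi in zip(w, preds)) / sum(w)  # mixture
        total_loss += -math.log(p if y == 1 else 1.0 - p)
        for i, pi in enumerate(preds):
            # Bayesian update: subtract this round's log loss of expert i
            log_w[i] += math.log(pi if y == 1 else 1.0 - pi)
    return total_loss, log_w

# Demo: 5 random experts over 50 rounds.
random.seed(0)
T, N = 50, 5
preds = [[min(max(random.random(), 0.05), 0.95) for _ in range(N)]
         for _ in range(T)]
outcomes = [random.randint(0, 1) for _ in range(T)]
total, log_w = aggregating_algorithm_log_loss(preds, outcomes)
best = -max(log_w)  # best single expert's cumulative log loss
```

The telescoping of the mixture losses gives total = -log((1/N) Σᵢ exp(-Lᵢ)), which sits between the best expert's loss and that loss plus ln N.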