A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity
We present a novel notion of complexity that interpolates between and
generalizes some classic existing complexity notions in learning theory: for
estimators like empirical risk minimization (ERM) with arbitrary bounded
losses, it is upper bounded in terms of data-independent Rademacher complexity;
for generalized Bayesian estimators, it is upper bounded by the data-dependent
information complexity (also known as stochastic or PAC-Bayesian
complexity). For
(penalized) ERM, the new complexity reduces to (generalized) normalized maximum
likelihood (NML) complexity, i.e. a minimax log-loss individual-sequence
regret. Our first main result bounds excess risk in terms of the new
complexity. Our second main result links the new complexity via Rademacher
complexity to entropy, thereby generalizing earlier results of Opper,
Haussler, Lugosi, and Cesa-Bianchi, who treated the log-loss case.
Together, these results recover optimal bounds for VC- and large (polynomial
entropy) classes, replacing localized Rademacher complexity by a simpler
analysis which almost completely separates the two aspects that determine the
achievable rates: 'easiness' (Bernstein) conditions and model complexity.
Comment: 38 pages
Universality of Bayesian mixture predictors
The problem is that of sequential probability forecasting for finite-valued
time series. The data is generated by an unknown probability distribution over
the space of all one-way infinite sequences. It is known that this measure
belongs to a given set C, but the latter is completely arbitrary (uncountably
infinite, without any structure given). The performance is measured with
asymptotic average log loss. In this work it is shown that the minimax
asymptotic performance is always attainable, and it is attained by a convex
combination of countably many measures from the set C (a Bayesian mixture).
This was previously only known for the case when the best achievable asymptotic
error is 0. This also contrasts with previous results showing that in the
non-realizable case all Bayesian mixtures may be suboptimal, while there is a
predictor that achieves the optimal performance.
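As a rough illustration of what a Bayesian mixture predictor does under log loss, here is a minimal sketch. It is not the construction from the paper: it assumes a finite set of i.i.d. Bernoulli measures (the abstract allows an arbitrary set C and a countable mixture), and the names `mixture_log_loss`, `component_log_loss`, and `biases` are hypothetical.

```python
import math

def component_log_loss(sequence, p):
    """Cumulative log loss (nats) of a single i.i.d. Bernoulli(p) measure."""
    return sum(-math.log(p if x == 1 else 1.0 - p) for x in sequence)

def mixture_log_loss(sequence, biases, prior=None):
    """Cumulative log loss (nats) of a Bayesian mixture over Bernoulli measures.

    Assumption: every bias lies strictly between 0 and 1, so no component
    assigns probability zero to an observed symbol.
    """
    k = len(biases)
    weights = list(prior) if prior is not None else [1.0 / k] * k
    total = 0.0
    for x in sequence:
        # Probability each component assigns to the observed symbol.
        likes = [p if x == 1 else 1.0 - p for p in biases]
        # Mixture predictive probability of the observed symbol.
        pred = sum(w * l for w, l in zip(weights, likes))
        total += -math.log(pred)
        # Bayes update of the mixture weights.
        weights = [w * l / pred for w, l in zip(weights, likes)]
    return total, weights
```

With a uniform prior over k components, the mixture's cumulative log loss exceeds that of the best component by at most log k, which is the finite-class analogue of the guarantee discussed above.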
Relax and Localize: From Value to Algorithms
We show a principled way of deriving online learning algorithms from a
minimax analysis. Various upper bounds on the minimax value, previously thought
to be non-constructive, are shown to yield algorithms. This allows us to
seamlessly recover known methods and to derive new ones. Our framework also
captures such "unorthodox" methods as Follow the Perturbed Leader and the R^2
forecaster. We emphasize that understanding the inherent complexity of the
learning problem leads to the development of algorithms.
We define local sequential Rademacher complexities and associated algorithms
that allow us to obtain faster rates in online learning, similarly to
statistical learning theory. Based on these localized complexities we build a
general adaptive method that can take advantage of the suboptimality of the
observed sequence.
We present a number of new algorithms, including a family of randomized
methods that use the idea of a "random playout". Several new versions of the
Follow-the-Perturbed-Leader algorithm are presented, as well as methods based
on the Littlestone dimension, efficient methods for matrix completion with
trace norm, and algorithms for the problems of transductive learning and
prediction with static experts.
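For readers unfamiliar with Follow the Perturbed Leader, the following is a minimal sketch of the classic experts-setting version with exponential perturbations. It is not one of the new variants the abstract refers to, and the function names and the `eta` and `seed` parameters are hypothetical choices made for illustration.

```python
import random

def ftpl_choose(cum_losses, eta, rng):
    """Pick the expert whose perturbed cumulative loss is smallest."""
    # Subtract fresh exponential noise (mean 1/eta) from each cumulative loss.
    perturbed = [loss - rng.expovariate(eta) for loss in cum_losses]
    return min(range(len(cum_losses)), key=lambda i: perturbed[i])

def run_ftpl(loss_rounds, eta=1.0, seed=0):
    """Run FTPL over a list of per-round loss vectors (one loss per expert).

    Returns the learner's total loss and the experts' cumulative losses.
    """
    rng = random.Random(seed)
    n_experts = len(loss_rounds[0])
    cum = [0.0] * n_experts
    total = 0.0
    for losses in loss_rounds:
        # Choose before the current round's losses are revealed.
        chosen = ftpl_choose(cum, eta, rng)
        total += losses[chosen]
        cum = [c + l for c, l in zip(cum, losses)]
    return total, cum
```

A larger `eta` means smaller perturbations, so the learner behaves more like plain follow-the-leader; a smaller `eta` randomizes more heavily.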
Online Nonparametric Regression
We establish optimal rates for online regression for arbitrary classes of
regression functions in terms of the sequential entropy introduced in (Rakhlin,
Sridharan, Tewari, 2010). The optimal rates are shown to exhibit a phase
transition analogous to the i.i.d./statistical learning case, studied in
(Rakhlin, Sridharan, Tsybakov 2013). In the frequently encountered situation
when sequential entropy and i.i.d. empirical entropy match, our results point
to the interesting phenomenon that the rates for statistical learning with
squared loss and online nonparametric regression are the same.
In addition to a non-algorithmic study of minimax regret, we exhibit a
generic forecaster that enjoys the established optimal rates. We also provide a
recipe for designing online regression algorithms that can be computationally
efficient. We illustrate the techniques by deriving existing and new
forecasters for the case of finite experts and for online linear regression.
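To make the finite-experts case concrete, here is a minimal sketch of the standard exponentially weighted average forecaster under squared loss. It is a well-known baseline rather than the forecaster derived in the paper, and it assumes that predictions and outcomes lie in [0, 1]; the name `exp_weights_regression` and the default `eta=0.5` are illustrative choices.

```python
import math

def exp_weights_regression(expert_preds, outcomes, eta=0.5):
    """Exponentially weighted average forecaster for squared loss.

    expert_preds: list of rounds, each a list with one prediction per expert.
    outcomes: list of observed values, one per round.
    Assumption: all predictions and outcomes lie in [0, 1].
    Returns the learner's cumulative loss and each expert's cumulative loss.
    """
    n_experts = len(expert_preds[0])
    cum = [0.0] * n_experts
    learner_loss = 0.0
    for preds, y in zip(expert_preds, outcomes):
        # Shift by the minimum cumulative loss for numerical stability.
        base = min(cum)
        weights = [math.exp(-eta * (c - base)) for c in cum]
        norm = sum(weights)
        # Predict with the weighted average of the experts' predictions.
        p = sum(w * f for w, f in zip(weights, preds)) / norm
        learner_loss += (p - y) ** 2
        cum = [c + (f - y) ** 2 for c, f in zip(cum, preds)]
    return learner_loss, cum
```

Under the stated [0, 1] assumption, squared loss is exp-concave for `eta` up to 1/2, so with `eta=0.5` the regret against the best of N experts is at most 2 ln N, independent of the number of rounds.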
Efficient Transductive Online Learning via Randomized Rounding
Most traditional online learning algorithms are based on variants of mirror
descent or follow-the-leader. In this paper, we present an online algorithm
based on a completely different approach, tailored for transductive settings,
which combines "random playout" and randomized rounding of loss subgradients.
As an application of our approach, we present the first computationally
efficient online algorithm for collaborative filtering with trace-norm
constrained matrices. As a second application, we solve an open question
linking batch learning and transductive online learning.
Comment: To appear in a Festschrift in honor of V.N. Vapnik. Preliminary
version presented in NIPS 201