97 research outputs found
Optimization, Learning, and Games with Predictable Sequences
We provide several applications of Optimistic Mirror Descent, an online
learning algorithm based on the idea of predictable sequences. First, we
recover the Mirror Prox algorithm for offline optimization, prove an extension
to Hölder-smooth functions, and apply the results to saddle-point type
problems. Next, we prove that a version of Optimistic Mirror Descent (which has
a close relation to the Exponential Weights algorithm) can be used by two
strongly-uncoupled players in a finite zero-sum matrix game to converge to the
minimax equilibrium at the rate of O((log T)/T). This addresses a question of
Daskalakis et al. (2011). Further, we consider a partial information version of
the problem. We then apply the results to convex programming and exhibit a
simple algorithm for the approximate Max Flow problem.
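To make the zero-sum game setting above concrete, here is a minimal sketch of two players running an optimistic variant of Exponential Weights against each other, each using its previous loss vector as the prediction of the next one. This is a simplified illustration and not the algorithm analyzed in the paper: the fixed step size `eta`, the function names, and the example payoff matrix are all assumptions made for this sketch.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    w = np.exp(z)
    return w / w.sum()

def optimistic_hedge_selfplay(A, T=2000, eta=0.1):
    """Self-play on the matrix game min_x max_y x^T A y.

    Both players use exponential weights on (cumulative loss + last loss),
    i.e. the last loss vector serves as the guess for the next one.
    Returns the time-averaged strategies of both players.
    """
    n, m = A.shape
    cum_x, cum_y = np.zeros(n), np.zeros(m)    # cumulative loss vectors
    last_x, last_y = np.zeros(n), np.zeros(m)  # predictions (previous losses)
    x_avg, y_avg = np.zeros(n), np.zeros(m)
    for _ in range(T):
        x = softmax(-eta * (cum_x + last_x))
        y = softmax(-eta * (cum_y + last_y))
        loss_x = A @ y          # row player minimizes x^T A y
        loss_y = -(A.T @ x)     # column player maximizes x^T A y
        cum_x += loss_x
        cum_y += loss_y
        last_x, last_y = loss_x, loss_y
        x_avg += x
        y_avg += y
    return x_avg / T, y_avg / T

def duality_gap(A, x, y):
    """max_y' x^T A y' - min_x' x'^T A y; zero exactly at a minimax pair."""
    return np.max(A.T @ x) - np.min(A @ y)

if __name__ == "__main__":
    A = np.array([[2.0, -1.0], [-1.0, 1.0]])  # hypothetical payoff matrix
    x, y = optimistic_hedge_selfplay(A)
    print(x, y, duality_gap(A, x, y))
```

The duality gap of the averaged strategies is a convenient way to monitor how close the pair is to the minimax equilibrium; the sketch makes no claim about matching the O((log T)/T) rate stated above.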
Online Nonparametric Regression
We establish optimal rates for online regression for arbitrary classes of
regression functions in terms of the sequential entropy introduced in (Rakhlin,
Sridharan, Tewari, 2010). The optimal rates are shown to exhibit a phase
transition analogous to the i.i.d./statistical learning case, studied in
(Rakhlin, Sridharan, Tsybakov 2013). In the frequently encountered situation
when sequential entropy and i.i.d. empirical entropy match, our results point
to the interesting phenomenon that the rates for statistical learning with
squared loss and online nonparametric regression are the same.
In addition to a non-algorithmic study of minimax regret, we exhibit a
generic forecaster that enjoys the established optimal rates. We also provide a
recipe for designing online regression algorithms that can be computationally
efficient. We illustrate the techniques by deriving existing and new
forecasters for the case of finite experts and for online linear regression.
Hierarchies of Relaxations for Online Prediction Problems with Evolving Constraints
We study online prediction where regret of the algorithm is measured against
a benchmark defined via evolving constraints. This framework captures online
prediction on graphs, as well as other prediction problems with combinatorial
structure. A key aspect here is that finding the optimal benchmark predictor
(even in hindsight, given all the data) might be computationally hard due to
the combinatorial nature of the constraints. Despite this, we provide
polynomial-time prediction algorithms that achieve low regret against
combinatorial benchmark sets. We do so by building improper learning algorithms
based on two ideas that work together. The first is to alleviate part of the
computational burden through random playout, and the second is to employ
Lasserre semidefinite hierarchies to approximate the resulting integer program.
Interestingly, for our prediction algorithms, we only need to compute the
values of the semidefinite programs and not the rounded solutions. However, the
integrality gap for the Lasserre hierarchy does enter the generic regret
bound in terms of Rademacher complexity of the benchmark set. This establishes
a trade-off between the computation time and the regret bound of the algorithm.
Competing With Strategies
We study the problem of online learning with a notion of regret defined with
respect to a set of strategies. We develop tools for analyzing the minimax
rates and for deriving regret-minimization algorithms in this scenario. While
the standard methods for minimizing the usual notion of regret fail, through
our analysis we demonstrate the existence of regret-minimization methods that
compete with such sets of strategies as autoregressive algorithms, strategies
based on statistical models, regularized least squares, and
Follow-the-Regularized-Leader strategies. In several cases we also derive
efficient learning algorithms.
Online Learning: Beyond Regret
We study online learnability of a wide class of problems, extending the
results of (Rakhlin, Sridharan, Tewari, 2010) to general notions of performance
measure well beyond external regret. Our framework simultaneously captures such
well-known notions as internal and general Phi-regret, learning with
non-additive global cost functions, Blackwell's approachability, calibration of
forecasters, adaptive regret, and more. We show that learnability in all these
situations is due to control of the same three quantities: a martingale
convergence term, a term describing the ability to perform well if the future is
known, and a generalization of sequential Rademacher complexity, studied in
(Rakhlin, Sridharan, Tewari, 2010). Since we directly study complexity of the
problem instead of focusing on efficient algorithms, we are able to improve and
extend many known results which have been previously derived via an algorithmic
construction.
Relax and Localize: From Value to Algorithms
We show a principled way of deriving online learning algorithms from a
minimax analysis. Various upper bounds on the minimax value, previously thought
to be non-constructive, are shown to yield algorithms. This allows us to
seamlessly recover known methods and to derive new ones. Our framework also
captures such "unorthodox" methods as Follow the Perturbed Leader and the R^2
forecaster. We emphasize that understanding the inherent complexity of the
learning problem leads to the development of algorithms.
We define local sequential Rademacher complexities and associated algorithms
that allow us to obtain faster rates in online learning, similarly to
statistical learning theory. Based on these localized complexities we build a
general adaptive method that can take advantage of the suboptimality of the
observed sequence.
We present a number of new algorithms, including a family of randomized
methods that use the idea of a "random playout". Several new versions of the
Follow-the-Perturbed-Leader algorithm are presented, as well as methods based
on the Littlestone dimension, efficient methods for matrix completion with
trace norm, and algorithms for the problems of transductive learning and
prediction with static experts.
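For readers unfamiliar with Follow the Perturbed Leader, the following minimal sketch shows the classical version for a finite set of experts: at each round, pick the expert whose cumulative loss, reduced by fresh random noise, is smallest. It is not one of the new variants presented in the paper; the exponential noise distribution, the parameter `eta`, and the function name are assumptions chosen for illustration.

```python
import numpy as np

def ftpl_experts(losses, eta=0.1, seed=0):
    """Classical Follow the Perturbed Leader over n experts.

    losses: array of shape (T, n), losses[t, i] is expert i's loss at round t.
    eta:    inverse scale of the exponential perturbation (assumed tuning knob).
    Returns (learner's total loss, best expert's total loss in hindsight).
    """
    rng = np.random.default_rng(seed)
    T, n = losses.shape
    cumulative = np.zeros(n)   # losses observed so far, per expert
    total = 0.0
    for t in range(T):
        # Draw fresh noise each round and follow the perturbed leader.
        noise = rng.exponential(scale=1.0 / eta, size=n)
        choice = int(np.argmin(cumulative - noise))
        total += losses[t, choice]
        cumulative += losses[t]   # full-information feedback
    return total, float(cumulative.min())

if __name__ == "__main__":
    T, n = 500, 5
    losses = np.ones((T, n))
    losses[:, 0] = 0.0  # hypothetical data: expert 0 is always right
    total, best = ftpl_experts(losses, eta=np.sqrt(np.log(n) / T))
    print("regret:", total - best)
```

A common heuristic is to take `eta` on the order of sqrt(log(n) / T), which trades off the cost of the perturbation against how quickly the learner locks onto the leader.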