2,482 research outputs found
Adaptive Online Prediction by Following the Perturbed Leader
When applying aggregating strategies to Prediction with Expert Advice, the
learning rate must be adaptively tuned. The natural choice of
sqrt(complexity/current loss) renders the analysis of Weighted Majority
derivatives quite complicated. In particular, for arbitrary weights there have
been no results proven so far. The analysis of the alternative "Follow the
Perturbed Leader" (FPL) algorithm from Kalai & Vempala (2003) (based on
Hannan's algorithm) is easier. We derive loss bounds for adaptive learning rate
and both finite expert classes with uniform weights and countable expert
classes with arbitrary weights. For the former setup, our loss bounds match the
best known results so far, while for the latter our results are new.Comment: 25 page
Online Learning in Case of Unbounded Losses Using the Follow Perturbed Leader Algorithm
In this paper the sequential prediction problem with expert advice is
considered for the case where losses of experts suffered at each step cannot be
bounded in advance. We present some modification of Kalai and Vempala algorithm
of following the perturbed leader where weights depend on past losses of the
experts. New notions of a volume and a scaled fluctuation of a game are
introduced. We present a probabilistic algorithm protected from unrestrictedly
large one-step losses. This algorithm has the optimal performance in the case
when the scaled fluctuations of one-step losses of experts of the pool tend to
zero.Comment: 31 pages, 3 figure
Relax and Localize: From Value to Algorithms
We show a principled way of deriving online learning algorithms from a
minimax analysis. Various upper bounds on the minimax value, previously thought
to be non-constructive, are shown to yield algorithms. This allows us to
seamlessly recover known methods and to derive new ones. Our framework also
captures such "unorthodox" methods as Follow the Perturbed Leader and the R^2
forecaster. We emphasize that understanding the inherent complexity of the
learning problem leads to the development of algorithms.
We define local sequential Rademacher complexities and associated algorithms
that allow us to obtain faster rates in online learning, similarly to
statistical learning theory. Based on these localized complexities we build a
general adaptive method that can take advantage of the suboptimality of the
observed sequence.
We present a number of new algorithms, including a family of randomized
methods that use the idea of a "random playout". Several new versions of the
Follow-the-Perturbed-Leader algorithms are presented, as well as methods based
on the Littlestone's dimension, efficient methods for matrix completion with
trace norm, and algorithms for the problems of transductive learning and
prediction with static experts
Fighting Bandits with a New Kind of Smoothness
We define a novel family of algorithms for the adversarial multi-armed bandit
problem, and provide a simple analysis technique based on convex smoothing. We
prove two main results. First, we show that regularization via the
\emph{Tsallis entropy}, which includes EXP3 as a special case, achieves the
minimax regret. Second, we show that a wide class of
perturbation methods achieve a near-optimal regret as low as if the perturbation distribution has a bounded hazard rate. For example,
the Gumbel, Weibull, Frechet, Pareto, and Gamma distributions all satisfy this
key property.Comment: In Proceedings of NIPS, 201
First-order regret bounds for combinatorial semi-bandits
We consider the problem of online combinatorial optimization under
semi-bandit feedback, where a learner has to repeatedly pick actions from a
combinatorial decision set in order to minimize the total losses associated
with its decisions. After making each decision, the learner observes the losses
associated with its action, but not other losses. For this problem, there are
several learning algorithms that guarantee that the learner's expected regret
grows as with the number of rounds . In this
paper, we propose an algorithm that improves this scaling to
, where is the total loss of the best
action. Our algorithm is among the first to achieve such guarantees in a
partial-feedback scheme, and the first one to do so in a combinatorial setting.Comment: To appear at COLT 201
- …