2,397 research outputs found

Adaptive Online Prediction by Following the Perturbed Leader

Authors: Hutter, Marcus; Poland, Jan
Publication date: 01/01/2005

When applying aggregating strategies to Prediction with Expert Advice, the learning rate must be adaptively tuned. The natural choice of sqrt(complexity/current loss) renders the analysis of Weighted Majority derivatives quite complicated. In particular, for arbitrary weights there have been no results proven so far. The analysis of the alternative "Follow the Perturbed Leader" (FPL) algorithm from Kalai & Vempala (2003) (based on Hannan's algorithm) is easier. We derive loss bounds for an adaptive learning rate and for both finite expert classes with uniform weights and countable expert classes with arbitrary weights. For the former setup, our loss bounds match the best known results so far, while for the latter our results are new. Comment: 25 pages
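The FPL idea described in this abstract can be illustrated with a minimal sketch for a finite expert class with uniform weights. This is not the authors' exact algorithm: the exponential perturbations, the time-based learning rate `eta = sqrt(ln(n) / t)`, and the function names `fpl_predict` and `run_fpl` are all assumptions chosen for illustration.

```python
import math
import random


def fpl_predict(cum_losses, eta, rng):
    """Pick the expert whose perturbed cumulative loss is smallest."""
    # Assumption: i.i.d. Exp(1) perturbations, scaled by 1/eta and subtracted.
    perturbed = [loss - rng.expovariate(1.0) / eta for loss in cum_losses]
    return min(range(len(perturbed)), key=perturbed.__getitem__)


def run_fpl(loss_rounds, seed=0):
    """Run FPL over a list of per-round loss vectors (one entry per expert).

    Returns the learner's total loss and the experts' cumulative losses.
    """
    rng = random.Random(seed)
    n = len(loss_rounds[0])
    cum_losses = [0.0] * n
    total = 0.0
    for t, losses in enumerate(loss_rounds, start=1):
        # Assumption: a simple decreasing rate; the abstract's adaptive choice
        # depends on the current loss rather than on t.
        eta = math.sqrt(max(math.log(n), 1.0) / t)
        chosen = fpl_predict(cum_losses, eta, rng)
        total += losses[chosen]
        # Full-information setting: all experts' losses are revealed.
        for i, loss in enumerate(losses):
            cum_losses[i] += loss
    return total, cum_losses


if __name__ == "__main__":
    rounds = [[0.0, 1.0, 1.0]] * 200  # expert 0 is always right
    learner_loss, expert_losses = run_fpl(rounds)
    print("learner:", learner_loss, "experts:", expert_losses)
```

Because the perturbation scale `1/eta` grows only like the square root of `t` while the loss gap between experts grows linearly, the learner in this toy run settles on the best expert after a few early mistakes.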

arXiv.org e-Print Archive

CiteSeerX

The Australian National University

Online Learning in Case of Unbounded Losses Using the Follow Perturbed Leader Algorithm

Author: V'yugin, Vladimir V.
Publication date: 01/01/2010

In this paper the sequential prediction problem with expert advice is considered for the case where the losses suffered by experts at each step cannot be bounded in advance. We present a modification of the Kalai and Vempala algorithm of following the perturbed leader, in which the weights depend on the past losses of the experts. New notions of a volume and a scaled fluctuation of a game are introduced. We present a probabilistic algorithm protected from unrestrictedly large one-step losses. This algorithm has optimal performance when the scaled fluctuations of the one-step losses of the experts in the pool tend to zero. Comment: 31 pages, 3 figures

arXiv.org e-Print Archive

CiteSeerX

Relax and Localize: From Value to Algorithms

Authors: Rakhlin, Alexander; Shamir, Ohad; Sridharan, Karthik
Publication date: 01/01/2012

We show a principled way of deriving online learning algorithms from a minimax analysis. Various upper bounds on the minimax value, previously thought to be non-constructive, are shown to yield algorithms. This allows us to seamlessly recover known methods and to derive new ones. Our framework also captures such "unorthodox" methods as Follow the Perturbed Leader and the R^2 forecaster. We emphasize that understanding the inherent complexity of the learning problem leads to the development of algorithms. We define local sequential Rademacher complexities and associated algorithms that allow us to obtain faster rates in online learning, similarly to statistical learning theory. Based on these localized complexities we build a general adaptive method that can take advantage of the suboptimality of the observed sequence. We present a number of new algorithms, including a family of randomized methods that use the idea of a "random playout". Several new versions of the Follow-the-Perturbed-Leader algorithm are presented, as well as methods based on the Littlestone dimension, efficient methods for matrix completion with trace norm, and algorithms for the problems of transductive learning and prediction with static experts.

arXiv.org e-Print Archive

CiteSeerX

Fighting Bandits with a New Kind of Smoothness

Authors: Abernethy, Jacob; Lee, Chansoo; Tewari, Ambuj
Publication date: 13/12/2015

We define a novel family of algorithms for the adversarial multi-armed bandit problem, and provide a simple analysis technique based on convex smoothing. We prove two main results. First, we show that regularization via the Tsallis entropy, which includes EXP3 as a special case, achieves the $\Theta(\sqrt{TN})$ minimax regret. Second, we show that a wide class of perturbation methods achieve a near-optimal regret as low as $O(\sqrt{TN \log N})$ if the perturbation distribution has a bounded hazard rate. For example, the Gumbel, Weibull, Frechet, Pareto, and Gamma distributions all satisfy this key property. Comment: In Proceedings of NIPS, 201
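One way to see why perturbation methods and exponential-weights methods such as EXP3 are closely related is the well-known Gumbel-max trick: adding independent standard Gumbel noise to a vector of scores and taking the argmax samples an arm from the softmax of those scores. The sketch below checks this empirically. It is not the algorithm from the paper; it covers only the arm-selection step and omits bandit loss estimation, and the names `gumbel_perturbed_arm`, `softmax`, and `empirical_frequencies` are hypothetical.

```python
import math
import random


def gumbel(rng):
    """Draw a standard Gumbel sample as -log(E) with E ~ Exp(1)."""
    e = max(rng.expovariate(1.0), 1e-300)  # guard against log(0)
    return -math.log(e)


def gumbel_perturbed_arm(scores, rng):
    """Perturbed-leader step: argmax of score plus Gumbel noise."""
    noisy = [s + gumbel(rng) for s in scores]
    return max(range(len(noisy)), key=noisy.__getitem__)


def softmax(scores):
    """Exponential-weights distribution over the same scores."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


def empirical_frequencies(scores, n_draws, seed=0):
    """Estimate how often each arm is the perturbed leader."""
    rng = random.Random(seed)
    counts = [0] * len(scores)
    for _ in range(n_draws):
        counts[gumbel_perturbed_arm(scores, rng)] += 1
    return [c / n_draws for c in counts]


if __name__ == "__main__":
    # Scores stand in for -eta * (estimated cumulative loss) of each arm.
    scores = [0.0, -1.0, -2.0]
    print("perturbed leader:", empirical_frequencies(scores, 20000))
    print("softmax:         ", softmax(scores))
```

The two printed vectors should agree up to sampling noise. Other perturbation distributions named in the abstract do not reduce to a closed-form softmax in this way.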

arXiv.org e-Print Archive

CiteSeerX

First-order regret bounds for combinatorial semi-bandits

Author: Neu, Gergely
Publication date: 10/06/2015

We consider the problem of online combinatorial optimization under semi-bandit feedback, where a learner has to repeatedly pick actions from a combinatorial decision set in order to minimize the total losses associated with its decisions. After making each decision, the learner observes the losses associated with its action, but not other losses. For this problem, there are several learning algorithms that guarantee that the learner's expected regret grows as $\widetilde{O}(\sqrt{T})$ with the number of rounds $T$. In this paper, we propose an algorithm that improves this scaling to $\widetilde{O}(\sqrt{L_T^*})$, where $L_T^*$ is the total loss of the best action. Our algorithm is among the first to achieve such guarantees in a partial-feedback scheme, and the first one to do so in a combinatorial setting. Comment: To appear at COLT 201
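The semi-bandit feedback model and the quantity $L_T^*$ from this abstract can be made concrete with a small sketch. It illustrates only the protocol (what the learner suffers and observes) and the total loss of the best fixed action, not the proposed algorithm; the function names and the toy decision set of all 2-element subsets of 4 components are assumptions.

```python
from itertools import combinations


def semi_bandit_round(action, losses):
    """Play `action` (a tuple of component indices) against a loss vector.

    Returns the loss suffered and the component losses the learner observes:
    only the components belonging to the chosen action are revealed.
    """
    observed = {i: losses[i] for i in action}
    return sum(observed.values()), observed


def best_action_loss(decision_set, loss_rounds):
    """Total loss of the best fixed action in hindsight (L_T^* above)."""
    return min(
        sum(losses[i] for losses in loss_rounds for i in action)
        for action in decision_set
    )


if __name__ == "__main__":
    decision_set = list(combinations(range(4), 2))
    loss_rounds = [[0.0, 1.0, 0.5, 1.0], [0.0, 0.0, 0.5, 1.0]]
    played = [(1, 3), (0, 1)]  # an arbitrary sequence of learner choices

    learner_total = 0.0
    for action, losses in zip(played, loss_rounds):
        suffered, observed = semi_bandit_round(action, losses)
        learner_total += suffered
        print("played", action, "suffered", suffered, "observed", observed)

    l_star = best_action_loss(decision_set, loss_rounds)
    print("regret:", learner_total - l_star)
```

A first-order bound of order $\sqrt{L_T^*}$ is most useful when the best action accumulates little loss, since then $L_T^*$ can be far smaller than $T$.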

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot