Fighting Bandits with a New Kind of Smoothness

Abernethy, Jacob; Lee, Chansoo; Tewari, Ambuj

Authors: Jacob Abernethy, Chansoo Lee, Ambuj Tewari
Publication date: 13 December 2015

Abstract

We define a novel family of algorithms for the adversarial multi-armed bandit problem, and provide a simple analysis technique based on convex smoothing. We prove two main results. First, we show that regularization via the Tsallis entropy, which includes EXP3 as a special case, achieves the $\Theta(\sqrt{TN})$ minimax regret. Second, we show that a wide class of perturbation methods achieve a near-optimal regret as low as $O(\sqrt{TN \log N})$ if the perturbation distribution has a bounded hazard rate. For example, the Gumbel, Weibull, Fréchet, Pareto, and Gamma distributions all satisfy this key property.

Comment: In Proceedings of NIPS, 201
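The hazard-rate condition and the link between Gumbel perturbations and EXP3 can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes the gain estimates are already given and omits how they are built from bandit feedback. The function names (`gumbel_hazard`, `perturbed_leader`, `softmax`) and the step size `eta` are hypothetical. The sketch relies on two standard facts: the hazard rate of a distribution with density f and CDF F is f(x) / (1 - F(x)), and adding independent standard Gumbel noise before taking an argmax selects each arm with exponential-weights (softmax) probability.

```python
import math
import random


def gumbel_hazard(x):
    """Hazard rate f(x) / (1 - F(x)) of the standard Gumbel distribution.

    With u = exp(-x): f(x) = u * exp(-u) and 1 - F(x) = 1 - exp(-u).
    Intended for moderate x; u underflows to 0 for very large x.
    """
    u = math.exp(-x)
    return u * math.exp(-u) / -math.expm1(-u)


def sample_gumbel(rng):
    """Draw standard Gumbel noise by inverse-CDF sampling."""
    u = rng.random()
    while u == 0.0:  # avoid log(0); rng.random() lies in [0, 1)
        u = rng.random()
    return -math.log(-math.log(u))


def perturbed_leader(gain_estimates, eta, rng):
    """Pick the arm maximizing eta * estimate + Gumbel noise (assumed setup)."""
    scores = [eta * g + sample_gumbel(rng) for g in gain_estimates]
    return max(range(len(scores)), key=scores.__getitem__)


def softmax(gain_estimates, eta):
    """Exponential-weights distribution over arms, computed stably."""
    m = max(gain_estimates)
    weights = [math.exp(eta * (g - m)) for g in gain_estimates]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    rng = random.Random(0)
    gains, eta, n = [0.0, 1.0, 2.5], 0.7, 20000
    counts = [0] * len(gains)
    for _ in range(n):
        counts[perturbed_leader(gains, eta, rng)] += 1
    print("empirical:", [c / n for c in counts])
    print("softmax:  ", softmax(gains, eta))
    print("hazard at x = 0, 5, 20:", [gumbel_hazard(x) for x in (0.0, 5.0, 20.0)])
```

With u = exp(-x), the standard Gumbel hazard rate simplifies to u / (e^u - 1), which never exceeds 1; this is the sense in which the Gumbel distribution has a bounded hazard rate. Running the script compares the empirical arm frequencies under Gumbel perturbation with the softmax probabilities.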

Available Versions

CiteSeerX

oai:CiteSeerX.psu:10.1.1.716.7...

Last time updated on 30/10/2017