High-probability regret bounds for bandit online linear optimization

Abstract

We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting, which with high probability has regret at most $O^*(\sqrt{T})$ against an adaptive adversary. This improves on the previous algorithm [8], whose regret is bounded in expectation against an oblivious adversary. We obtain the same dependence on the dimension ($n^{3/2}$) as that exhibited by Dani et al. The results of this paper rest firmly on those of [8] and on the remarkable technique of Auer et al. [2] for obtaining high-probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information and bandit settings.
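For orientation, here is a minimal sketch of the quantity being bounded, assuming the standard bandit online linear optimization setup; the symbols $K$ (the decision set in $\mathbb{R}^n$), $x_t$ (the learner's play), and $\ell_t$ (the adversary's loss vector) are our notation, not necessarily the paper's:

\[
R_T \;=\; \sum_{t=1}^{T} \langle x_t, \ell_t \rangle \;-\; \min_{x \in K} \sum_{t=1}^{T} \langle x, \ell_t \rangle ,
\]

where in the bandit setting only the scalar loss $\langle x_t, \ell_t \rangle$ is revealed to the learner each round, rather than the full vector $\ell_t$. Read together, the abstract's two statements assert a bound of the form $R_T = O^*(n^{3/2}\sqrt{T})$ holding with high probability, where $O^*$ hides polylogarithmic factors.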
