
High-probability regret bounds for bandit online linear optimization

By Peter L. Bartlett, Varsha Dani, Thomas Hayes, Sham Kakade, Alexander Rakhlin and Ambuj Tewari

Abstract

We present a modification of the algorithm of Dani et al. [8] for the online linear optimization problem in the bandit setting which, with high probability, has regret at most O*(√T) against an adaptive adversary. This improves on the previous algorithm [8], whose regret is bounded only in expectation and only against an oblivious adversary. We obtain the same dependence on the dimension (n^{3/2}) as that exhibited by Dani et al. The results of this paper rest firmly on those of [8] and on the remarkable technique of Auer et al. [2] for obtaining high-probability bounds via optimistic estimates. This paper answers an open question: it eliminates the gap between the high-probability bounds obtained in the full-information and bandit settings.
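The setting the abstract refers to can be illustrated with a minimal simulation of the bandit online linear optimization protocol and the regret quantity being bounded. This sketch is not the paper's algorithm (which achieves O*(√T) regret via careful exploration and optimistic estimates); it uses a naive random learner purely to show the protocol: the decision set (here assumed to be the unit Euclidean ball), the bandit feedback (only the scalar loss of the played point is observed), and the definition of regret against the best fixed point in hindsight. The dimension `n`, horizon `T`, and loss distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 3, 1000  # dimension and horizon (illustrative values)


def project_to_ball(x):
    """Project onto the unit Euclidean ball, an example decision set K."""
    norm = np.linalg.norm(x)
    return x if norm <= 1.0 else x / norm


# Oblivious adversary for simplicity: a fixed sequence of loss vectors,
# chosen before play begins. (The paper handles adaptive adversaries.)
losses = rng.uniform(-1.0, 1.0, size=(T, n))

total_loss = 0.0
for t in range(T):
    # Naive learner: play a random point of K each round.
    x_t = project_to_ball(rng.normal(size=n))
    # Bandit feedback: only the scalar <loss_t, x_t> is revealed,
    # never the full loss vector loss_t.
    observed = float(losses[t] @ x_t)
    total_loss += observed

# Regret compares to the best *fixed* point in K in hindsight.
# For the unit ball, min_{x in K} <sum_t loss_t, x> = -||sum_t loss_t||.
best_fixed_loss = -np.linalg.norm(losses.sum(axis=0))
regret = total_loss - best_fixed_loss
print(f"regret of the naive learner after T={T} rounds: {regret:.1f}")
```

A learner that ignores the feedback, as above, typically incurs regret growing linearly with T; the point of the paper is an algorithm whose regret is O*(√T) not just in expectation but with high probability.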

Topics: 080600 INFORMATION SYSTEMS, algorithm, linear optimization, high probability
Year: 2008
OAI identifier: oai:eprints.qut.edu.au:45706
