3,410 research outputs found
Further Optimal Regret Bounds for Thompson Sampling
Thompson Sampling is one of the oldest heuristics for multi-armed bandit
problems. It is a randomized algorithm based on Bayesian ideas, and has
recently generated significant interest after several studies demonstrated it
to have better empirical performance compared to the state of the art methods.
In this paper, we provide a novel regret analysis for Thompson Sampling that
simultaneously proves both the optimal problem-dependent bound of
and the
first near-optimal problem-independent bound of on the
expected regret of this algorithm. Our near-optimal problem-independent bound
solves a COLT 2012 open problem of Chapelle and Li. The optimal
problem-dependent regret bound for this problem was first proven recently by
Kaufmann et al. [ALT 2012]. Our novel martingale-based analysis techniques are
conceptually simple, easily extend to distributions other than the Beta
distribution, and also extend to the more general contextual bandits setting
[Manuscript, Agrawal and Goyal, 2012].Comment: arXiv admin note: substantial text overlap with arXiv:1111.179
An Information-Theoretic Analysis of Thompson Sampling
We provide an information-theoretic analysis of Thompson sampling that
applies across a broad range of online optimization problems in which a
decision-maker must learn from partial feedback. This analysis inherits the
simplicity and elegance of information theory and leads to regret bounds that
scale with the entropy of the optimal-action distribution. This strengthens
preexisting results and yields new insight into how information improves
performance
- β¦