We derive the first finite-time logarithmic Bayes regret upper bounds for
Bayesian bandits. In a multi-armed bandit, we obtain $O(c_\Delta \log n)$ and
$O(c_h \log^2 n)$ upper bounds for an upper confidence bound algorithm, where
$c_h$ and $c_\Delta$ are constants depending on the prior distribution and the
gaps of bandit instances sampled from it, respectively. The latter bound
asymptotically matches the lower bound of Lai (1987). Our proofs are a major
technical departure from prior works, while being simple and general. To show
the generality of our techniques, we apply them to linear bandits. Our results
provide insights into the value of the prior in the Bayesian setting, both in the
objective and as side information given to the learner. They significantly
improve upon existing $\tilde{O}(\sqrt{n})$ bounds, which have become standard
in the literature despite the logarithmic lower bound of Lai (1987).
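
To make the setting concrete, the following is a minimal sketch of a Bayesian upper confidence bound algorithm in a Gaussian multi-armed bandit, with the Bayes regret estimated by averaging over instances sampled from the prior. It is an illustration under stated assumptions, not necessarily the exact algorithm analyzed here: the Gaussian prior and noise model, the confidence width $\sqrt{2 \hat{\sigma}^2 \log(1/\delta)}$ with $\delta = 1/n$, and the function names `bayes_ucb_regret` and `estimate_bayes_regret` are all choices made for this example.

```python
import math
import random


def bayes_ucb_regret(n, num_arms=5, prior_mean=0.0, prior_var=1.0,
                     noise_var=1.0, delta=None, rng=None):
    """Run one n-round episode on an instance drawn from the prior.

    Returns the cumulative regret on that instance. The Gaussian model and
    the confidence width are assumptions made for this sketch.
    """
    if rng is None:
        rng = random.Random()
    if delta is None:
        delta = 1.0 / n  # assumed confidence level
    # Sample the bandit instance (true arm means) from the prior.
    means = [rng.gauss(prior_mean, math.sqrt(prior_var))
             for _ in range(num_arms)]
    best = max(means)
    pulls = [0] * num_arms
    reward_sums = [0.0] * num_arms
    regret = 0.0
    for _ in range(n):
        ucbs = []
        for a in range(num_arms):
            # Conjugate Gaussian posterior of arm a given its observed rewards.
            post_var = 1.0 / (1.0 / prior_var + pulls[a] / noise_var)
            post_mean = post_var * (prior_mean / prior_var
                                    + reward_sums[a] / noise_var)
            width = math.sqrt(2.0 * post_var * math.log(1.0 / delta))
            ucbs.append(post_mean + width)
        # Pull the arm with the highest upper confidence bound.
        arm = max(range(num_arms), key=lambda a: ucbs[a])
        reward = rng.gauss(means[arm], math.sqrt(noise_var))
        pulls[arm] += 1
        reward_sums[arm] += reward
        regret += best - means[arm]
    return regret


def estimate_bayes_regret(n, runs=200, seed=0):
    """Monte Carlo estimate of the Bayes regret: the average over prior draws."""
    rng = random.Random(seed)
    return sum(bayes_ucb_regret(n, rng=rng) for _ in range(runs)) / runs
```

Calling `estimate_bayes_regret(n)` for increasing horizons `n` gives a rough empirical view of how the regret grows. Averaging over instances sampled from the prior is what distinguishes the Bayes regret from the regret on a single fixed instance.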