This paper is devoted to regret lower bounds in the classical model of the
stochastic multi-armed bandit. A well-known result of Lai and Robbins, later
extended by Burnetas and Katehakis, established a logarithmic lower bound
on the regret of all consistent policies. We relax the notion of
consistency and exhibit a generalisation of the logarithmic bound. We also
show that no logarithmic bound exists in the general case of Hannan
consistency. To get these results, we study variants of popular Upper
Confidence Bound (UCB) policies. As a by-product, we prove that it is
impossible to design an adaptive policy that would select the best of two
algorithms by taking advantage of the properties of the environment.