This paper studies the one-shot behavior of no-regret algorithms for
stochastic bandits. Although many algorithms are known to be asymptotically
optimal with respect to the expected regret, over a single run, their
pseudo-regret seems to follow one of two tendencies: it is either smooth or
bumpy. To measure this tendency, we introduce a new notion: the sliding regret,
that measures the worst pseudo-regret over a time-window of fixed length
sliding to infinity. We show that randomized methods (e.g. Thompson Sampling
and MED) have optimal sliding regret, while index policies, although possibly
asymptotically optimal for the expected regret, have the worst possible sliding
regret under regularity conditions on their index (e.g. UCB, UCB-V, KL-UCB,
MOSS, IMED etc.). We further analyze the average bumpiness of the pseudo-regret
of index policies via the regret of exploration, that we show to be suboptimal
as well.Comment: 31 page