The Sliding Regret in Stochastic Bandits: Discriminating Index and Randomized Policies

Boone, Victor

Authors: Victor Boone
Publication date: 30 November 2023
Abstract

This paper studies the one-shot behavior of no-regret algorithms for stochastic bandits. Although many algorithms are known to be asymptotically optimal with respect to the expected regret, over a single run their pseudo-regret seems to follow one of two tendencies: it is either smooth or bumpy. To measure this tendency, we introduce a new notion, the sliding regret, which measures the worst pseudo-regret over a time window of fixed length sliding to infinity. We show that randomized methods (e.g., Thompson Sampling and MED) have optimal sliding regret, while index policies (e.g., UCB, UCB-V, KL-UCB, MOSS, IMED), although possibly asymptotically optimal for the expected regret, have the worst possible sliding regret under regularity conditions on their index. We further analyze the average bumpiness of the pseudo-regret of index policies via the regret of exploration, which we show to be suboptimal as well.

Comment: 31 pages
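The sliding regret described in the abstract is an asymptotic quantity (the worst windowed pseudo-regret as the window slides to infinity), so it cannot be read off a finite run. As a rough illustration only, the sketch below computes a finite-horizon proxy: the pseudo-regret accumulated on every window of a fixed length, and the largest such value among windows starting after a burn-in. The function names, the `burn_in` parameter, and the input convention (a list holding the gap of the arm played at each step) are assumptions made for this example and are not taken from the paper.

```python
from itertools import accumulate


def window_pseudo_regret(gaps_played, window):
    """Pseudo-regret accumulated on each window [t, t + window) of a finite run.

    gaps_played[t] is assumed to be the gap (best mean minus played arm's mean)
    of the arm pulled at step t; this input convention is hypothetical.
    """
    gaps = [float(g) for g in gaps_played]
    if not 1 <= window <= len(gaps):
        raise ValueError("window must satisfy 1 <= window <= len(gaps_played)")
    # Prefix sums turn each window sum into a single difference.
    prefix = [0.0] + list(accumulate(gaps))
    return [prefix[t + window] - prefix[t] for t in range(len(gaps) - window + 1)]


def empirical_sliding_regret(gaps_played, window, burn_in=0):
    """Finite-horizon proxy: worst window pseudo-regret for windows starting at or after burn_in."""
    sums = window_pseudo_regret(gaps_played, window)
    if not 0 <= burn_in < len(sums):
        raise ValueError("burn_in leaves no complete window")
    return max(sums[burn_in:])


# Toy trajectory: two consecutive suboptimal pulls (gap 0.5), then one late pull.
gaps = [0, 0, 0.5, 0.5, 0, 0, 0, 0.5]
print(empirical_sliding_regret(gaps, window=2))             # 1.0
print(empirical_sliding_regret(gaps, window=2, burn_in=4))  # 0.5
```

A "bumpy" run in the abstract's sense would keep producing windows filled with suboptimal pulls however late the window starts, whereas a "smooth" run would see this proxy shrink as `burn_in` grows. The proxy only hints at that behavior on a finite trajectory.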

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2311.18437

Last time updated on 10/05/2024