We study the problem of designing adaptive multi-armed bandit algorithms that
perform optimally in both the stochastic setting and the adversarial setting
simultaneously (often known as a best-of-both-worlds guarantee). A line of
recent works shows that when configured and analyzed properly, the
Follow-the-Regularized-Leader (FTRL) algorithm, originally designed for the
adversarial setting, can in fact optimally adapt to the stochastic setting as
well. Such results, however, critically rely on the assumption that there
exists a unique optimal arm. Recently, Ito (2021) took the first step to remove such
an undesirable uniqueness assumption for one particular FTRL algorithm with the
1/2-Tsallis entropy regularizer. In this work, we significantly
improve and generalize this result, showing that uniqueness is unnecessary for
FTRL with a broad family of regularizers and a new learning rate schedule. For
some regularizers, our regret bounds also improve upon prior results even when
uniqueness holds. We further provide an application of our results to the
decoupled exploration and exploitation problem, demonstrating that our
techniques are broadly applicable.
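
As an illustrative sketch (not the exact configuration analyzed in this work), one standard form of the FTRL update for a $K$-armed bandit with the 1/2-Tsallis entropy regularizer is
\[
  p_t \;=\; \operatorname*{arg\,min}_{p \in \Delta_K}
  \Big\{ \big\langle \textstyle\sum_{s<t} \widehat{\ell}_s,\, p \big\rangle
  \;+\; \tfrac{1}{\eta_t}\,\psi(p) \Big\},
  \qquad
  \psi(p) \;=\; -\sum_{i=1}^{K} \sqrt{p_i},
\]
where $\Delta_K$ is the probability simplex over the $K$ arms, $\widehat{\ell}_s$ is an importance-weighted estimate of the loss vector at round $s$, and $\eta_t > 0$ is the learning rate; the broader family of regularizers $\psi$ and the new learning rate schedule studied in this work are not reproduced here.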