We consider a stochastic bandit problem with a possibly infinite number of
arms. We write p∗ for the proportion of optimal arms and Δ for the
minimal mean-gap between optimal and sub-optimal arms. We characterize the
optimal learning rates in both the cumulative regret setting and the
best-arm identification setting, in terms of the problem parameters T (the
budget), p∗ and Δ. For the objective of minimizing the cumulative
regret, we provide a lower bound of order Ω(log(T)/(p∗Δ)) and a
UCB-style algorithm with a matching upper bound up to a factor of
log(1/Δ). Our algorithm needs to know p∗ to calibrate its parameters, and we
prove that this knowledge is necessary: adapting to p∗ in this setting
is impossible. For best-arm identification, we also provide a lower bound of
order Ω(exp(−cTΔ²p∗)) on the probability of outputting a
sub-optimal arm, where c>0 is an absolute constant. We also provide an
elimination algorithm with an upper bound matching the lower bound up to a
factor of order log(T) in the exponential, and that does not need p∗ or
Δ as parameters. Our results apply directly to the three related problems
of competing against the j-th best arm, identifying an ϵ-good arm,
and finding an arm with mean larger than a quantile of a known order.
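
For concreteness, the sketch below (in Python) illustrates the generic "subsample, then run UCB" strategy for a reservoir with an optimal-arm proportion p∗; it is not the paper's algorithm, and the Bernoulli reservoir, the choice K = ⌈log(T)/p∗⌉, and the UCB1 index are illustrative assumptions. Drawing K such arms leaves at least one optimal arm in the subsample with probability at least 1 − 1/T, since (1 − p∗)^K ≤ exp(−K·p∗) ≤ 1/T.

# Minimal sketch (not the paper's exact algorithm): subsample K arms from the
# reservoir, then run a standard UCB1 index on the subsample.
import math
import numpy as np

rng = np.random.default_rng(0)

def draw_arm_mean(p_star: float) -> float:
    """Toy reservoir: an arm is optimal (mean 1.0) w.p. p_star, else mean 0.5."""
    return 1.0 if rng.random() < p_star else 0.5

def subsample_then_ucb(T: int, p_star: float) -> float:
    """Run UCB1 on K = ceil(log T / p_star) sampled arms; return realized regret."""
    K = max(1, math.ceil(math.log(T) / p_star))  # enough arms to include an optimal one w.h.p.
    means = np.array([draw_arm_mean(p_star) for _ in range(K)])
    counts = np.zeros(K)
    sums = np.zeros(K)
    regret = 0.0
    for t in range(1, T + 1):
        if t <= K:                  # play each sampled arm once
            i = t - 1
        else:                       # then pick the arm with the largest UCB1 index
            ucb = sums / counts + np.sqrt(2.0 * math.log(t) / counts)
            i = int(np.argmax(ucb))
        reward = float(rng.random() < means[i])   # Bernoulli reward
        counts[i] += 1
        sums[i] += reward
        regret += 1.0 - means[i]    # regret against the best possible mean (1.0)
    return regret

print(subsample_then_ucb(T=10_000, p_star=0.05))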