We investigate the fixed-budget best-arm identification (BAI) problem for
linear bandits in a potentially non-stationary environment. Given a finite arm set $\mathcal{X} \subset \mathbb{R}^d$, a fixed budget $T$, and an unpredictable sequence of parameters $\{\theta_t\}_{t=1}^{T}$, an algorithm aims to correctly identify the best arm $x^* := \arg\max_{x \in \mathcal{X}} x^\top \sum_{t=1}^{T} \theta_t$ with probability as
high as possible. Prior work has addressed the stationary setting where
$\theta_t = \theta_1$ for all $t$ and demonstrated that the error probability decreases as $\exp(-T/\rho^*)$ for a problem-dependent constant $\rho^*$. But
in many real-world A/B/n multivariate testing scenarios that motivate our
work, the environment is non-stationary and an algorithm expecting a stationary
setting can easily fail. For robust identification, it is well known that if
arms are chosen randomly and non-adaptively from a G-optimal design over
$\mathcal{X}$ at each time, then the error probability decreases as $\exp(-T\Delta_{(1)}^2/d)$, where $\Delta_{(1)} = \min_{x \neq x^*} (x^* - x)^\top \frac{1}{T}\sum_{t=1}^{T} \theta_t$. As there exist environments where $\Delta_{(1)}^2/d \ll 1/\rho^*$, we are motivated to propose a novel
algorithm P1-RAGE that aims to obtain the best of both
worlds: robustness to non-stationarity and fast rates of identification in
benign settings. We characterize the error probability of
P1-RAGE and demonstrate empirically that the algorithm
indeed never performs worse than G-optimal design but compares favorably to the
best algorithms in the stationary setting.

Comment: 25 pages, 6 figures
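To make the non-adaptive baseline concrete, here is a minimal NumPy sketch (illustrative only; the function names are our own, and this is the G-optimal-design baseline, not the paper's P1-RAGE algorithm): a Frank-Wolfe approximation of the G-optimal design over the arm set, followed by i.i.d. sampling from that design and a least-squares estimate of the (average) parameter.

```python
import numpy as np

def g_optimal_design(X, iters=1000):
    """Approximate the G-optimal design over the rows of X (n arms in R^d)
    with Frank-Wolfe; by Kiefer-Wolfowitz the optimal max variance is d."""
    n, d = X.shape
    lam = np.full(n, 1.0 / n)                     # start from the uniform design
    for _ in range(iters):
        A = X.T @ (lam[:, None] * X)              # information matrix sum_i lam_i x_i x_i^T
        Ainv = np.linalg.inv(A)
        g = np.einsum('ij,jk,ik->i', X, Ainv, X)  # x_i^T A^{-1} x_i for each arm
        i = np.argmax(g)                          # arm with largest predicted variance
        step = (g[i] / d - 1.0) / (g[i] - 1.0)    # closed-form line-search step
        lam = (1.0 - step) * lam
        lam[i] += step
    return lam

def static_bai(X, theta_seq, rng):
    """Non-adaptive baseline: sample arms i.i.d. from the G-optimal design,
    observe x_t^T theta_t plus Gaussian noise, and return the empirical best arm."""
    T = len(theta_seq)
    lam = g_optimal_design(X)
    idx = rng.choice(len(X), size=T, p=lam / lam.sum())
    rewards = np.array([X[i] @ th + rng.normal() for i, th in zip(idx, theta_seq)])
    A = X[idx].T @ X[idx]
    theta_hat = np.linalg.pinv(A) @ (X[idx].T @ rewards)  # least-squares estimate
    return int(np.argmax(X @ theta_hat))
```

Because the sampling distribution is fixed in advance, the procedure is oblivious to how $\theta_t$ drifts over time, which is the source of its robustness, and also of its slower rate in benign stationary instances.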