A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity

Abstract

We investigate the fixed-budget best-arm identification (BAI) problem for linear bandits in a potentially non-stationary environment. Given a finite arm set $\mathcal{X}\subset\mathbb{R}^d$, a fixed budget $T$, and an unpredictable sequence of parameters $\{\theta_t\}_{t=1}^{T}$, an algorithm aims to correctly identify the best arm $x^* := \arg\max_{x\in\mathcal{X}} x^\top\sum_{t=1}^{T}\theta_t$ with probability as high as possible. Prior work has addressed the stationary setting where $\theta_t = \theta_1$ for all $t$ and demonstrated that the error probability decreases as $\exp(-T/\rho^*)$ for a problem-dependent constant $\rho^*$. But in many real-world A/B/n multivariate testing scenarios that motivate our work, the environment is non-stationary, and an algorithm expecting a stationary setting can easily fail. For robust identification, it is well known that if arms are chosen randomly and non-adaptively from a G-optimal design over $\mathcal{X}$ at each time, then the error probability decreases as $\exp(-T\Delta_{(1)}^2/d)$, where $\Delta_{(1)} = \min_{x \neq x^*} (x^* - x)^\top \frac{1}{T}\sum_{t=1}^T \theta_t$. As there exist environments in which $\Delta_{(1)}^2/d \ll 1/\rho^*$, we are motivated to propose a novel algorithm $\mathsf{P1}$-$\mathsf{RAGE}$ that aims to obtain the best of both worlds: robustness to non-stationarity and fast rates of identification in benign settings. We characterize the error probability of $\mathsf{P1}$-$\mathsf{RAGE}$ and demonstrate empirically that the algorithm indeed never performs worse than G-optimal design but compares favorably to the best algorithms in the stationary setting.

Comments: 25 pages, 6 figures
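The non-adaptive G-optimal-design baseline described in the abstract can be made concrete with a small sketch. The Python code below is a minimal illustration under our own assumptions, not the paper's $\mathsf{P1}$-$\mathsf{RAGE}$ algorithm: the function names `g_optimal_design` and `g_optimal_baseline`, the simple Frank-Wolfe step size $2/(t+2)$, and the Gaussian noise model are all illustrative choices.

```python
import numpy as np

def g_optimal_design(X, iters=1000):
    """Approximate a G-optimal design over the arm set X (n x d) with Frank-Wolfe.

    Returns a probability vector lam over arms that (approximately) minimizes
    max_x x^T A(lam)^{-1} x, where A(lam) = sum_i lam_i x_i x_i^T. By the
    Kiefer-Wolfowitz theorem the optimal value of this objective is d.
    """
    n, d = X.shape
    lam = np.ones(n) / n
    for t in range(iters):
        A = X.T @ (lam[:, None] * X)                 # A(lam) = sum_i lam_i x_i x_i^T
        A_inv = np.linalg.pinv(A)
        g = np.einsum("ij,jk,ik->i", X, A_inv, X)    # x_i^T A(lam)^{-1} x_i per arm
        j = int(np.argmax(g))                        # arm that is currently least covered
        gamma = 2.0 / (t + 2)                        # simple Frank-Wolfe step size (illustrative)
        lam = (1.0 - gamma) * lam
        lam[j] += gamma
    return lam / lam.sum()

def g_optimal_baseline(X, theta_seq, noise_sd=1.0, rng=None):
    """Non-adaptive baseline: sample arms i.i.d. from the G-optimal design for
    T rounds, fit ordinary least squares, and return the index of the arm that
    looks best under the estimate of (1/T) * sum_t theta_t.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    T = len(theta_seq)
    lam = g_optimal_design(X)
    pulls = rng.choice(n, size=T, p=lam)
    # Observe noisy rewards x_t^T theta_t + noise (Gaussian noise is an assumption here).
    rewards = np.array([X[i] @ theta + noise_sd * rng.standard_normal()
                        for i, theta in zip(pulls, theta_seq)])
    V = X[pulls].T @ X[pulls]                        # empirical design matrix
    theta_hat = np.linalg.pinv(V) @ (X[pulls].T @ rewards)
    return int(np.argmax(X @ theta_hat))
```

For a given instance, the true best arm is $\arg\max_{x\in\mathcal{X}} x^\top\sum_{t=1}^T\theta_t$, and the gap $\Delta_{(1)}$ defined in the abstract governs the $\exp(-T\Delta_{(1)}^2/d)$ guarantee quoted for this kind of non-adaptive baseline.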
