Skyline Identification in Multi-Armed Bandits

Abstract

We introduce a variant of the classical PAC multi-armed bandit problem. There is an ordered set of $n$ arms $A[1],\dots,A[n]$, each with some stochastic reward drawn from some unknown bounded distribution. The goal is to identify the skyline of the set $A$, consisting of all arms $A[i]$ such that $A[i]$ has larger expected reward than all lower-numbered arms $A[1],\dots,A[i-1]$. We define a natural notion of an $\varepsilon$-approximate skyline and prove matching upper and lower bounds for identifying an $\varepsilon$-skyline. Specifically, we show that in order to identify an $\varepsilon$-skyline from among $n$ arms with probability $1-\delta$, $$\Theta\bigg(\frac{n}{\varepsilon^2} \cdot \min\bigg\{ \log\bigg(\frac{1}{\varepsilon \delta}\bigg), \log\bigg(\frac{n}{\delta}\bigg) \bigg\} \bigg)$$ samples are necessary and sufficient. When $\varepsilon \gg 1/n$, our results improve over the naive algorithm, which draws enough samples to approximate the expected reward of every arm; the algorithm of (Auer et al., AISTATS'16) for Pareto-optimal arm identification is likewise superseded. Our results show that the sample complexity of the skyline problem lies strictly in between that of best arm identification (Even-Dar et al., COLT'02) and that of approximating the expected reward of every arm.

Comment: 18 pages, 2 Figures; an ALT'18/ISIT'18 submission
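As an illustration of the skyline definition in the abstract, the following minimal sketch computes the exact skyline when the expected rewards are already known, and estimates how many samples per arm the naive approach would draw. It is not the paper's algorithm. The function names `skyline` and `naive_samples_per_arm` are hypothetical, and the sample count assumes rewards in $[0,1]$ and uses a standard Hoeffding bound with a union bound over the $n$ arms so that every empirical mean is within $\varepsilon/2$ of its true value with probability at least $1-\delta$.

```python
import math


def skyline(means):
    """Return the 1-based indices i whose mean strictly exceeds every earlier mean.

    Assumption: `means` holds the exact expected rewards, which a real
    bandit algorithm would not have access to.
    """
    result = []
    best = -math.inf  # largest mean among lower-numbered arms
    for i, mu in enumerate(means, start=1):
        if mu > best:
            result.append(i)
            best = mu
    return result


def naive_samples_per_arm(n, eps, delta):
    """Per-arm sample count for the naive approach (assumed rewards in [0, 1]).

    Hoeffding: P(|estimate - mean| >= eps/2) <= 2 * exp(-m * eps**2 / 2).
    Requiring this to be at most delta / n gives m >= (2 / eps**2) * ln(2n / delta).
    """
    return math.ceil((2.0 / eps**2) * math.log(2.0 * n / delta))


print(skyline([0.2, 0.5, 0.4, 0.5, 0.9]))  # prints [1, 2, 5]
```

Multiplying the per-arm count by $n$ gives a total on the order of $\frac{n}{\varepsilon^2}\log\frac{n}{\delta}$, which corresponds to the second term inside the minimum in the bound above.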

    Last time updated on 10/08/2021