Optimal Best Arm Identification with Fixed Confidence
We give a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the `Track-and-Stop' strategy, which we prove to be asymptotically optimal. It consists of a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and a stopping rule named after Chernoff, for which we give a new analysis.
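The two ingredients above (a sampling rule plus a Chernoff-style stopping rule) can be sketched in code. The sketch below uses Gaussian unit-variance arms; the sampling rule is a deliberately simplified heuristic (forced exploration plus alternating between the empirical best arm and its closest challenger), not the paper's tracking rule, and the stopping threshold is a common heuristic choice rather than the exact quantity analyzed in the paper:

```python
import numpy as np

def chernoff_glr(mu_hat, counts, best):
    """Gaussian (unit-variance) generalized-likelihood-ratio statistic:
    min over challengers b of (mu_best - mu_b)^2 / (2 (1/N_best + 1/N_b))."""
    z = np.inf
    for b in range(len(mu_hat)):
        if b == best:
            continue
        gap = max(mu_hat[best] - mu_hat[b], 0.0)
        z = min(z, gap ** 2 / (2.0 * (1.0 / counts[best] + 1.0 / counts[b])))
    return z

def track_and_stop_sketch(means, delta=0.05, max_steps=200_000, seed=0):
    """Simplified fixed-confidence best-arm identification loop.

    The sampling rule here is a heuristic stand-in for the paper's
    tracking rule; the stopping rule is the Chernoff/GLR test with a
    heuristic threshold beta(t, delta) = log((1 + log t) K / delta).
    """
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.zeros(K)
    sums = np.zeros(K)
    for a in range(K):                       # pull every arm once
        sums[a] += means[a] + rng.normal()
        counts[a] += 1
    for t in range(K, max_steps):
        mu_hat = sums / counts
        best = int(np.argmax(mu_hat))
        if chernoff_glr(mu_hat, counts, best) > np.log((1 + np.log(t)) * K / delta):
            return best, t                   # stop: confident in `best`
        under = np.flatnonzero(counts < np.sqrt(t))   # forced exploration
        if under.size:
            arm = int(under[np.argmin(counts[under])])
        else:
            # alternate between the empirical best and its closest challenger
            challenger = int(np.argsort(mu_hat)[-2])
            arm = best if t % 2 == 0 else challenger
        sums[arm] += means[arm] + rng.normal()
        counts[arm] += 1
    return int(np.argmax(sums / counts)), max_steps
```

On a well-separated instance the loop stops long before the budget and returns the true best arm with probability at least 1 − δ (up to the heuristic threshold).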
Best-Arm Identification in Linear Bandits
We study the best-arm identification problem in linear bandits, where the
rewards of the arms depend linearly on an unknown parameter and the
objective is to return the arm with the largest reward. We characterize the
complexity of the problem and introduce sample allocation strategies that pull
arms to identify the best arm with a fixed confidence, while minimizing the
sample budget. In particular, we show the importance of exploiting the global
linear structure to improve the estimate of the reward of near-optimal arms. We
analyze the proposed strategies and compare their empirical performance.
Finally, as a by-product of our analysis, we point out the connection to the
G-optimality criterion used in optimal experimental design. Comment: In Advances in Neural Information Processing Systems 27 (NIPS), 2014.
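The G-optimality criterion mentioned above admits a compact numerical sketch: by the Kiefer-Wolfowitz equivalence theorem, a G-optimal design over a finite set of arm vectors minimizes the worst-case leverage max_x x^T A(w)^{-1} x, whose optimal value equals the dimension d, and it can be approximated with a Frank-Wolfe iteration. The function name, step size, and iteration count below are illustrative choices, not taken from the paper:

```python
import numpy as np

def g_optimal_design(X, n_iters=2000):
    """Frank-Wolfe sketch for the G-optimal design over the rows of X.

    Minimizes max_x x^T A(w)^{-1} x with A(w) = sum_i w_i x_i x_i^T;
    by Kiefer-Wolfowitz the optimum equals d = X.shape[1].
    """
    n, d = X.shape
    w = np.full(n, 1.0 / n)                 # start from the uniform design
    for t in range(n_iters):
        A = X.T @ (w[:, None] * X)          # information matrix A(w)
        A_inv = np.linalg.inv(A)
        # leverage x_i^T A^{-1} x_i for every arm vector
        leverage = np.einsum("ij,jk,ik->i", X, A_inv, X)
        i = int(np.argmax(leverage))        # most uncertain direction
        gamma = 2.0 / (t + 3)               # diminishing step, keeps w > 0
        w = (1 - gamma) * w                 # mix toward the vertex e_i
        w[i] += gamma
    return w
```

After enough iterations the maximum leverage under the returned weights approaches d, which is exactly the property that lets near-optimal arms be estimated accurately with few pulls.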
Gamification of Pure Exploration for Linear Bandits
We investigate an active pure-exploration setting, that includes best-arm
identification, in the context of linear stochastic bandits. While
asymptotically optimal algorithms exist for standard multi-armed bandits, the
existence of such algorithms for the best-arm identification in linear bandits
has been elusive despite several attempts to address it. First, we provide a
thorough comparison of, and new insights into, different notions of optimality in the
linear case, including G-optimality, transductive optimality from optimal
experimental design and asymptotic optimality. Second, we design the first
asymptotically optimal algorithm for fixed-confidence pure exploration in
linear bandits. As a consequence, our algorithm naturally bypasses the pitfall
caused by a simple but difficult instance, that most prior algorithms had to be
engineered to deal with explicitly. Finally, we avoid the need to fully solve
an optimal design problem by providing an approach that entails an efficient
implementation. Comment: 11+25 pages. To be published in the proceedings of ICML 2020.
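The "gamification" in the title refers to treating the lower-bound maximin problem as a two-player game between an allocation player and "nature", who proposes the hardest confusing alternative instance. A minimal sketch of that idea for Gaussian unit-variance best-arm identification follows; the specific updates, step size, and iteration count are illustrative assumptions, not the paper's algorithm:

```python
import numpy as np

def glr_value(w, mu, best):
    """Transportation costs for Gaussian unit-variance best-arm identification:
    f_b(w) = (w_best * w_b / (w_best + w_b)) * (mu_best - mu_b)^2 / 2."""
    vals = {}
    for b in range(len(mu)):
        if b == best:
            continue
        vals[b] = (w[best] * w[b] / (w[best] + w[b])) * (mu[best] - mu[b]) ** 2 / 2
    return vals

def optimal_allocation_game(mu, n_iters=2000, step=0.5):
    """Approximate the optimal sampling proportions max_w min_b f_b(w).

    The allocation player runs exponentiated-gradient ascent on the simplex;
    nature best-responds with the hardest confusing arm (the argmin cost).
    Returns the best iterate seen and its objective value.
    """
    mu = np.asarray(mu, dtype=float)
    K = len(mu)
    best = int(np.argmax(mu))
    w = np.full(K, 1.0 / K)
    best_w, best_val = w.copy(), min(glr_value(w, mu, best).values())
    for _ in range(n_iters):
        vals = glr_value(w, mu, best)
        b = min(vals, key=vals.get)               # nature's best response
        d = (mu[best] - mu[b]) ** 2 / 2
        g = np.zeros(K)                           # subgradient of f_b at w
        g[best] = d * w[b] ** 2 / (w[best] + w[b]) ** 2
        g[b] = d * w[best] ** 2 / (w[best] + w[b]) ** 2
        w = w * np.exp(step * g)                  # exponentiated-gradient step
        w /= w.sum()
        val = min(glr_value(w, mu, best).values())
        if val > best_val:
            best_w, best_val = w.copy(), val
    return best_w, best_val
```

The returned proportions are the ones an asymptotically optimal sampling rule would track; the objective value is the inverse of the leading constant in the sample-complexity lower bound.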
On the Sample Complexity of Representation Learning in Multi-task Bandits with Global and Local structure
We investigate the sample complexity of learning the optimal arm for
multi-task bandit problems. Arms consist of two components: one that is shared
across tasks (that we call representation) and one that is task-specific (that
we call predictor). The objective is to learn the optimal (representation,
predictor)-pair for each task, under the assumption that the optimal
representation is common to all tasks. Within this framework, efficient
learning algorithms should transfer knowledge across tasks. We consider the
best-arm identification problem for a fixed confidence, where, in each round,
the learner actively selects both a task, and an arm, and observes the
corresponding reward. We derive instance-specific sample complexity lower
bounds satisfied by any (δ_G, δ_H)-PAC algorithm (such an algorithm
identifies the best representation with probability at least 1 − δ_G, and
the best predictor for a task with probability at least 1 − δ_H). We
devise an algorithm OSRL-SC whose sample complexity approaches the lower bound,
and scales at most as H(G log(1/δ_G) + X log(1/δ_H)), with X, G, H
being, respectively, the number of tasks, representations and predictors. By
comparison, this scaling is significantly better than the classical best-arm
identification algorithm that scales as HGX log(1/δ). Comment: Accepted at the Thirty-Seventh AAAI Conference on Artificial
Intelligence (AAAI-23).
Dual-Directed Algorithm Design for Efficient Pure Exploration
We consider pure-exploration problems in the context of stochastic sequential
adaptive experiments with a finite set of alternative options. The goal of the
decision-maker is to accurately answer a query question regarding the
alternatives with high confidence while using minimal measurement effort. A typical
query question is to identify the alternative with the best performance,
leading to ranking and selection problems, or best-arm identification in the
machine learning literature. We focus on the fixed-precision setting and derive
a sufficient condition for optimality in terms of a notion of strong
convergence to the optimal allocation of samples. Using dual variables, we
characterize the necessary and sufficient conditions for an allocation to be
optimal. The use of dual variables allows us to bypass the combinatorial
structure of optimality conditions that rely solely on primal variables.
Remarkably, these optimality conditions enable an extension of the top-two
algorithm design principle, initially proposed for best-arm identification.
Furthermore, our optimality conditions give rise to a straightforward yet
efficient selection rule, termed information-directed selection, which
adaptively picks from a candidate set based on information gain of the
candidates. We outline the broad contexts where our algorithmic approach can be
implemented. We establish that, paired with information-directed selection,
top-two Thompson sampling is (asymptotically) optimal for Gaussian best-arm
identification, solving a glaring open problem in the pure exploration
literature. Our algorithm is optimal for ε-best-arm identification and
thresholding bandit problems. Our analysis also leads to a general principle to
guide adaptations of Thompson sampling for pure-exploration problems. Numerical
experiments highlight the exceptional efficiency of our proposed algorithms
relative to existing ones. Comment: An earlier version of this paper appeared as an extended abstract in
the Proceedings of the 36th Annual Conference on Learning Theory, COLT'23,
with the title "Information-Directed Selection for Top-Two Algorithms."
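The top-two principle paired with a selection rule can be sketched as follows: sample from the posterior to pick a leader, resample to pick a challenger, then choose which of the two to pull. The leader-vs-challenger choice below is a crude count-balancing stand-in for the paper's information-directed selection (the actual rule is based on the information gain of the candidates and is not reproduced here); Gaussian unit-variance arms with a flat prior are assumed:

```python
import numpy as np

def ttts_ids_sketch(means, budget=3000, seed=0):
    """Top-two Thompson sampling sketch (Gaussian arms, unit variance,
    flat prior). The selection between leader and challenger is a
    count-balancing heuristic, NOT the paper's information-directed rule."""
    rng = np.random.default_rng(seed)
    K = len(means)
    counts = np.ones(K)
    sums = np.array([means[a] + rng.normal() for a in range(K)])  # one pull each
    for _ in range(budget - K):
        mu_hat = sums / counts
        # posterior for arm a is N(mu_hat[a], 1 / counts[a])
        theta = rng.normal(mu_hat, 1.0 / np.sqrt(counts))
        leader = int(np.argmax(theta))
        challenger = leader
        for _ in range(100):            # resample until a different arm tops
            theta = rng.normal(mu_hat, 1.0 / np.sqrt(counts))
            if int(np.argmax(theta)) != leader:
                challenger = int(np.argmax(theta))
                break
        if challenger == leader:        # posterior very concentrated: fall back
            challenger = int(np.argsort(mu_hat)[-2])
        # count-balancing stand-in for information-directed selection:
        # pull whichever of the two candidates currently has fewer samples
        arm = leader if counts[leader] <= counts[challenger] else challenger
        sums[arm] += means[arm] + rng.normal()
        counts[arm] += 1
    return int(np.argmax(sums / counts))
```

Swapping the count-balancing line for a selection based on the candidates' information gain is exactly where the paper's information-directed selection would plug in.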