Gamification of Pure Exploration for Linear Bandits
We investigate an active pure-exploration setting, which includes best-arm
identification, in the context of linear stochastic bandits. While
asymptotically optimal algorithms exist for standard multi-armed bandits, the
existence of such algorithms for best-arm identification in linear bandits
has remained elusive despite several attempts to address it. First, we provide
a thorough comparison of, and new insights into, different notions of
optimality in the linear case, including G-optimality, transductive optimality
from optimal experimental design, and asymptotic optimality. Second, we design
the first asymptotically optimal algorithm for fixed-confidence pure
exploration in linear bandits. As a consequence, our algorithm naturally
bypasses the pitfall caused by a simple but difficult instance that most prior
algorithms had to be engineered to handle explicitly. Finally, we avoid the
need to fully solve an optimal design problem by providing an approach that
admits an efficient implementation.
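The abstract compares several optimal-design criteria; for concreteness, here is a minimal sketch of the classical Frank-Wolfe (Fedorov-Wynn) iteration for computing a G-optimal design over a finite arm set. This is standard experimental-design machinery, not the paper's algorithm; the function name, iteration budget, and tolerance are illustrative assumptions.

```python
import numpy as np

def g_optimal_design(X, iters=1000):
    """Frank-Wolfe for the G-optimal design problem: find a distribution
    lam over the rows of X minimizing the worst-case leverage
    max_x x^T A(lam)^{-1} x, where A(lam) = sum_i lam_i x_i x_i^T."""
    n, d = X.shape
    lam = np.full(n, 1.0 / n)              # start from the uniform design
    for _ in range(iters):
        A = X.T @ (lam[:, None] * X)       # information matrix A(lam)
        A_inv = np.linalg.pinv(A)
        leverages = np.einsum("ij,jk,ik->i", X, A_inv, X)
        j = np.argmax(leverages)           # most under-covered arm
        g = leverages[j]
        if g <= d * (1 + 1e-6):            # Kiefer-Wolfowitz: optimum value is d
            break
        gamma = (g / d - 1.0) / (g - 1.0)  # Fedorov-Wynn line-search step
        lam = (1.0 - gamma) * lam
        lam[j] += gamma
    return lam
```

Fully re-solving such a design problem at every round is exactly the cost that the abstract's efficient implementation is designed to avoid.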
Price of Safety in Linear Best Arm Identification
We introduce the safe best-arm identification framework with linear feedback,
where the agent is subject to a stage-wise safety constraint that depends
linearly on an unknown parameter vector. The agent must act conservatively so
as to ensure that the safety constraint is not violated with high probability
at each round. Ways of leveraging the linear structure to ensure safety have
been studied for regret minimization, but, to the best of our knowledge, not
for best-arm identification. We propose a gap-based algorithm that achieves
meaningful sample complexity while ensuring stage-wise safety. We show that we
pay an extra term in the sample complexity due to the forced-exploration phase
incurred by the additional safety constraint. Experimental illustrations are
provided to justify the design of our algorithm.
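To make the stage-wise constraint concrete, here is a minimal sketch of the kind of safety filter a conservative agent can apply before acting, assuming a linear constraint a^T theta_safe <= c and a standard ellipsoidal confidence set. The function name and the confidence width beta are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def certified_safe_arms(arms, V, theta_hat, c, beta):
    """Keep only arms a whose worst-case safety value over the ellipsoid
    {theta : ||theta - theta_hat||_V <= sqrt(beta)} satisfies a^T theta <= c."""
    V_inv = np.linalg.inv(V)
    widths = np.sqrt(beta * np.einsum("ij,jk,ik->i", arms, V_inv, arms))
    ucb = arms @ theta_hat + widths  # high-probability upper bound on a^T theta_safe
    return arms[ucb <= c]            # actions certified safe at this round
```

Early on, few arms can be certified safe, which is what forces the conservative exploration phase and the extra sample-complexity term the abstract mentions.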
Optimal Exploration is no harder than Thompson Sampling
Given a set of arms $\mathcal{Z} \subset \mathbb{R}^d$ and an unknown
parameter vector $\theta_* \in \mathbb{R}^d$, the pure exploration linear
bandit problem aims to return $\arg\max_{z \in \mathcal{Z}} z^\top \theta_*$,
with high probability through noisy measurements of $x^\top \theta_*$
with $x \in \mathcal{X} \subset \mathbb{R}^d$. Existing
(asymptotically) optimal methods require either a) potentially costly
projections for each arm $z \in \mathcal{Z}$ or b) explicitly maintaining a
subset of $\mathcal{Z}$ under consideration at each time. This complexity is at
odds with the popular and simple Thompson Sampling algorithm for regret
minimization, which just requires access to a posterior sampling and argmax
oracle, and does not need to enumerate $\mathcal{Z}$ at any point.
Unfortunately, Thompson sampling is known to be sub-optimal for pure
exploration. In this work, we pose a natural question: is there an algorithm
that can explore optimally and only needs the same computational primitives as
Thompson Sampling? We answer the question in the affirmative. We provide an
algorithm that leverages only sampling and argmax oracles and achieves an
exponential convergence rate, with the exponent being asymptotically optimal
among all possible allocations. In addition, we show that our algorithm is
easy to implement and performs empirically as well as existing asymptotically
optimal methods.
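As an illustration of how far those two primitives go, here is a minimal sketch of a top-two-style exploration loop that touches the arm set only through posterior sampling and argmax calls. It is a toy under assumed Gaussian noise and a Gaussian prior, not the paper's algorithm; the resampling cap and the parameter beta are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior_sample(V, b, sigma=1.0):
    # Draw theta from the Gaussian posterior N(V^{-1} b, sigma^2 V^{-1}).
    V_inv = np.linalg.inv(V)
    return rng.multivariate_normal(V_inv @ b, sigma**2 * V_inv)

def explore(Z, theta_true, T=2000, beta=0.5, sigma=1.0):
    """Top-two-style pure exploration using only a posterior sampler and an
    argmax oracle; Z is the (n, d) arm matrix, theta_true drives simulation."""
    d = Z.shape[1]
    V, b = np.eye(d), np.zeros(d)          # prior: theta ~ N(0, sigma^2 I)
    for _ in range(T):
        leader = np.argmax(Z @ posterior_sample(V, b, sigma))
        challenger = leader
        for _ in range(100):               # resample until a distinct arm wins
            challenger = np.argmax(Z @ posterior_sample(V, b, sigma))
            if challenger != leader:
                break
        i = leader if rng.random() < beta else challenger
        y = Z[i] @ theta_true + sigma * rng.standard_normal()  # simulated pull
        V += np.outer(Z[i], Z[i])          # rank-one information update
        b += y * Z[i]
    return np.argmax(Z @ np.linalg.solve(V, b))  # recommend posterior-mean best
```

Neither step maintains per-arm statistics outside the argmax oracle, which matches the computational profile of Thompson Sampling that the abstract highlights.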