3,090 research outputs found

    Best-Arm Identification in Linear Bandits

    Get PDF
    We study the best-arm identification problem in linear bandits, where the rewards of the arms depend linearly on an unknown parameter $\theta^*$ and the objective is to return the arm with the largest reward. We characterize the complexity of the problem and introduce sample allocation strategies that pull arms to identify the best arm with a fixed confidence, while minimizing the sample budget. In particular, we show the importance of exploiting the global linear structure to improve the estimate of the reward of near-optimal arms. We analyze the proposed strategies and compare their empirical performance. Finally, as a by-product of our analysis, we point out the connection to the G-optimality criterion used in optimal experimental design.
    Comment: In Advances in Neural Information Processing Systems 27 (NIPS), 2014
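
    The G-optimality connection mentioned in this abstract can be made concrete with the standard Frank-Wolfe (Kiefer-Wolfowitz) iteration for approximating a G-optimal design over the arm features. The sketch below is that textbook procedure, not the paper's exact allocation strategy; the step size and iteration count are illustrative choices.

```python
import numpy as np

def g_optimal_design(X, n_iter=1000):
    """Approximate the G-optimal design over arm features X (K x d)
    via Frank-Wolfe iterations: the design w (approximately) minimizes
    max_a x_a^T A(w)^{-1} x_a, where A(w) = sum_a w_a x_a x_a^T."""
    K, d = X.shape
    w = np.full(K, 1.0 / K)            # start from the uniform design
    for t in range(n_iter):
        A = X.T @ (w[:, None] * X)     # information matrix A(w)
        A_inv = np.linalg.pinv(A)
        # predictive variance of each arm's reward under the design
        var = np.einsum('kd,de,ke->k', X, A_inv, X)
        a = int(np.argmax(var))        # most poorly estimated arm
        gamma = 2.0 / (t + 2)          # standard Frank-Wolfe step size
        w = (1 - gamma) * w
        w[a] += gamma
    return w
```

    By the Kiefer-Wolfowitz equivalence theorem, the optimal value of the maximum variance equals the dimension d, so the loop can also be stopped once max(var) is within a factor (1 + eps) of d.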

    Gamification of Pure Exploration for Linear Bandits

    Get PDF
    We investigate an active pure-exploration setting, which includes best-arm identification, in the context of linear stochastic bandits. While asymptotically optimal algorithms exist for standard multi-armed bandits, the existence of such algorithms for best-arm identification in linear bandits has been elusive despite several attempts to address it. First, we provide a thorough comparison of, and new insights into, the different notions of optimality in the linear case, including G-optimality, transductive optimality from optimal experimental design, and asymptotic optimality. Second, we design the first asymptotically optimal algorithm for fixed-confidence pure exploration in linear bandits. As a consequence, our algorithm naturally bypasses the pitfall caused by a simple but difficult instance that most prior algorithms had to be engineered to deal with explicitly. Finally, we avoid the need to fully solve an optimal design problem by providing an approach that admits an efficient implementation.
    Comment: 11+25 pages. To be published in the proceedings of ICML 2020
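
    Fixed-confidence algorithms in this family decide when to stop via a generalized likelihood ratio test. Below is a minimal sketch of that stopping statistic for a linear bandit with unit-variance Gaussian noise; the threshold beta(t, delta) it would be compared against is a placeholder for a properly calibrated function, and this is not the paper's full gamified algorithm.

```python
import numpy as np

def glr_stopping_statistic(X, V, theta_hat):
    """Generalized likelihood ratio stopping statistic for
    fixed-confidence best-arm identification in a linear bandit
    with unit-variance Gaussian noise.

    X         : (K, d) arm features
    V         : (d, d) design matrix  sum_t x_{a_t} x_{a_t}^T
    theta_hat : (d,)   least-squares estimate of theta^*
    """
    V_inv = np.linalg.pinv(V)
    rewards = X @ theta_hat
    best = int(np.argmax(rewards))
    stat = np.inf
    for a in range(len(X)):
        if a == best:
            continue
        gap = rewards[best] - rewards[a]   # estimated reward gap (>= 0)
        y = X[best] - X[a]                 # direction discriminating the two arms
        var = y @ V_inv @ y                # ||x_best - x_a||^2 in the V^{-1} norm
        stat = min(stat, gap ** 2 / max(2 * var, 1e-12))
    return best, stat
```

    A track-and-stop style loop would steer pulls toward the allocation that maximizes this statistic and stop as soon as it exceeds beta(t, delta), recommending arm `best`.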

    Multi-task Representation Learning for Pure Exploration in Linear Bandits

    Full text link
    Despite the recent success of representation learning in sequential decision making, the study of the pure exploration scenario (i.e., identify the best option while minimizing the sample complexity) is still limited. In this paper, we study multi-task representation learning for best arm identification in linear bandits (RepBAI-LB) and best policy identification in contextual linear bandits (RepBPI-CLB), two popular pure exploration settings with wide applications, e.g., clinical trials and web content optimization. In these two problems, all tasks share a common low-dimensional linear representation, and our goal is to leverage this feature to accelerate the best arm (policy) identification process for all tasks. For these problems, we design computationally and sample-efficient algorithms DouExpDes and C-DouExpDes, which perform double experimental designs to plan optimal sample allocations for learning the global representation. We show that by learning the common representation among tasks, our sample complexity is significantly better than that of the naive approach which solves the tasks independently. To the best of our knowledge, this is the first work to demonstrate the benefits of representation learning for multi-task pure exploration.
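
    The benefit of a shared representation can be illustrated with a deliberately simplified two-step estimator: fit each task independently, recover a common k-dimensional subspace by SVD of the stacked per-task estimates, then re-fit each task inside that subspace. This is only an illustration of the shared-subspace idea, not the paper's DouExpDes/C-DouExpDes algorithms, which plan sample allocations via double experimental designs.

```python
import numpy as np

def estimate_shared_representation(task_data, k, lam=1.0):
    """Simplified multi-task illustration:
    (1) ridge-regress each task's parameter independently,
    (2) recover a shared k-dimensional subspace via SVD,
    (3) re-fit each task inside that subspace.

    task_data : list of (X_m, y_m) with X_m (n_m, d), y_m (n_m,)
    Returns B_hat (d, k) and the per-task low-dimensional weights.
    """
    d = task_data[0][0].shape[1]
    thetas = []
    for X, y in task_data:
        theta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
        thetas.append(theta)
    Theta = np.stack(thetas, axis=1)        # (d, M) stacked estimates
    U, _, _ = np.linalg.svd(Theta, full_matrices=False)
    B_hat = U[:, :k]                        # shared subspace basis
    weights = []
    for X, y in task_data:
        Z = X @ B_hat                       # k-dimensional features
        w = np.linalg.solve(Z.T @ Z + lam * np.eye(k), Z.T @ y)
        weights.append(w)
    return B_hat, weights
```

    Once B_hat is learned, each task only needs to estimate a k-dimensional weight vector instead of a d-dimensional one, which is the source of the sample-complexity gain the abstract describes.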

    Optimal Thresholding Linear Bandit

    Full text link
    We study a novel pure exploration problem: the $\epsilon$-Thresholding Bandit Problem (TBP) with fixed confidence in stochastic linear bandits. We prove a lower bound on the sample complexity and extend an algorithm designed for best-arm identification in the linear case to TBP, obtaining an asymptotically optimal method.
    Comment: arXiv admin note: substantial text overlap with arXiv:2006.16073 by other authors
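
    A fixed-confidence thresholding rule of the kind studied here can be sketched with per-arm confidence intervals in the $V^{-1}$ norm. The sketch below is a generic decision rule under an assumed confidence-radius parameter beta, not the paper's specific algorithm.

```python
import numpy as np

def classify_arms(X, V, theta_hat, tau, eps, beta):
    """Sketch of a fixed-confidence epsilon-thresholding rule in a
    linear bandit: declare an arm ABOVE the threshold tau once its
    lower confidence bound clears tau - eps, BELOW once its upper
    bound falls under tau + eps, and UNDECIDED otherwise.

    beta is a placeholder confidence-radius parameter that would be
    calibrated as a function of t and the risk delta."""
    V_inv = np.linalg.pinv(V)
    labels = []
    for x in X:
        mean = x @ theta_hat
        width = np.sqrt(beta * (x @ V_inv @ x))  # ||x||_{V^{-1}} scaling
        if mean - width >= tau - eps:
            labels.append("above")
        elif mean + width <= tau + eps:
            labels.append("below")
        else:
            labels.append("undecided")
    return labels
```

    Sampling continues, directed at the undecided arms, until every arm is classified; the lower bound in the paper characterizes how many samples this must take.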

    Fixed-Budget Best-Arm Identification in Contextual Bandits: A Static-Adaptive Algorithm

    Full text link
    We study the problem of best-arm identification (BAI) in contextual bandits in the fixed-budget setting. We propose a general successive elimination algorithm that proceeds in stages and eliminates a fixed fraction of suboptimal arms in each stage. This design combines the strengths of static and adaptive allocations. We analyze the algorithm in linear models and obtain a better error bound than prior work. We also apply it to generalized linear models (GLMs) and bound its error. This is the first BAI algorithm for GLMs in the fixed-budget setting. Our extensive numerical experiments show that our algorithm outperforms the state of the art.
    Comment: 23 pages
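
    The stagewise structure described in this abstract is easy to sketch. The code below is a bare-bones fixed-budget eliminator for the linear case; it uses a round-robin allocation within each stage where the paper's static-adaptive design would use an optimal-design allocation, and the stage count and keep fraction are illustrative.

```python
import numpy as np

def successive_elimination(X, pull, budget, n_stages=4, keep=0.5):
    """Sketch of fixed-budget best-arm identification by stagewise
    elimination in a linear model. Each stage spends an equal slice of
    the budget on the surviving arms with a static round-robin
    allocation, fits least squares, and keeps the top `keep` fraction.

    X    : (K, d) arm features
    pull : callable a -> noisy reward of arm a
    """
    active = list(range(len(X)))
    per_stage = budget // n_stages
    for _ in range(n_stages):
        if len(active) == 1:
            break
        feats, ys = [], []
        for i in range(per_stage):
            a = active[i % len(active)]     # static within-stage allocation
            feats.append(X[a])
            ys.append(pull(a))
        theta = np.linalg.lstsq(np.array(feats), np.array(ys), rcond=None)[0]
        est = X[active] @ theta             # estimated rewards of survivors
        order = np.argsort(est)[::-1]       # best first
        n_keep = max(1, int(np.ceil(keep * len(active))))
        active = [active[i] for i in order[:n_keep]]
    return active[0]
```

    With keep = 0.5, roughly log2(K) stages reduce K arms to a single recommendation; choosing n_stages accordingly ensures the loop ends with one survivor.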