Gamification of Pure Exploration for Linear Bandits
We investigate an active pure-exploration setting, which includes best-arm
identification, in the context of linear stochastic bandits. While
asymptotically optimal algorithms exist for standard multi-armed bandits, the
existence of such algorithms for best-arm identification in linear bandits
has remained elusive despite several attempts to address it. First, we provide
a thorough comparison of, and new insights into, different notions of
optimality in the linear case, including G-optimality, transductive optimality
from optimal experimental design, and asymptotic optimality. Second, we design
the first asymptotically optimal algorithm for fixed-confidence pure
exploration in linear bandits. As a consequence, our algorithm naturally
bypasses the pitfall caused by a simple but difficult instance that most prior
algorithms had to be engineered to handle explicitly. Finally, we avoid the
need to fully solve an optimal design problem by providing an approach that
admits an efficient implementation.
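The abstract compares several optimal-design criteria; for concreteness, here is a minimal sketch of the classical Frank-Wolfe (Fedorov-Wynn) iteration for computing a G-optimal design over a finite arm set. This is standard experimental-design machinery, not the paper's algorithm; the function name, iteration budget, and tolerance are illustrative assumptions.

```python
import numpy as np

def g_optimal_design(X, iters=1000):
    """Frank-Wolfe for the G-optimal design problem: find a distribution
    lam over the rows of X minimizing the worst-case leverage
    max_x x^T A(lam)^{-1} x, where A(lam) = sum_i lam_i x_i x_i^T."""
    n, d = X.shape
    lam = np.full(n, 1.0 / n)              # start from the uniform design
    for _ in range(iters):
        A = X.T @ (lam[:, None] * X)       # information matrix A(lam)
        A_inv = np.linalg.pinv(A)
        leverages = np.einsum("ij,jk,ik->i", X, A_inv, X)
        j = np.argmax(leverages)           # most under-covered arm
        g = leverages[j]
        if g <= d * (1 + 1e-6):            # Kiefer-Wolfowitz: optimum value is d
            break
        gamma = (g / d - 1.0) / (g - 1.0)  # Fedorov-Wynn line-search step
        lam = (1.0 - gamma) * lam
        lam[j] += gamma
    return lam
```

Fully re-solving such a design problem at every round is exactly the cost that the abstract's efficient implementation is designed to avoid.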
Price of Safety in Linear Best Arm Identification
We introduce the safe best-arm identification framework with linear feedback,
where the agent is subject to a stage-wise safety constraint that depends
linearly on an unknown parameter vector. The agent must act conservatively so
as to ensure that the safety constraint is not violated with high probability
at each round. Ways of leveraging the linear structure to ensure safety have
been studied for regret minimization, but, to the best of our knowledge, not
for best-arm identification. We propose a gap-based algorithm that achieves
meaningful sample complexity while ensuring stage-wise safety. We show that we
pay an extra term in the sample complexity due to the forced-exploration phase
incurred by the additional safety constraint. Experimental illustrations are
provided to justify the design of our algorithm.
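To make the stage-wise constraint concrete, here is a minimal sketch of the kind of safety filter a conservative agent can apply before acting, assuming a linear constraint a^T theta_safe <= c and a standard ellipsoidal confidence set. The function name and the confidence width beta are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def certified_safe_arms(arms, V, theta_hat, c, beta):
    """Keep only arms a whose worst-case safety value over the ellipsoid
    {theta : ||theta - theta_hat||_V <= sqrt(beta)} satisfies a^T theta <= c."""
    V_inv = np.linalg.inv(V)
    widths = np.sqrt(beta * np.einsum("ij,jk,ik->i", arms, V_inv, arms))
    ucb = arms @ theta_hat + widths  # high-probability upper bound on a^T theta_safe
    return arms[ucb <= c]            # actions certified safe at this round
```

Early on, few arms can be certified safe, which is what forces the conservative exploration phase and the extra sample-complexity term the abstract mentions.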
Optimal Exploration is no harder than Thompson Sampling
Given a set of arms $\mathcal{Z} \subset \mathbb{R}^d$ and an unknown
parameter vector $\theta_* \in \mathbb{R}^d$, the pure exploration linear
bandit problem aims to return $\arg\max_{z \in \mathcal{Z}} z^\top \theta_*$,
with high probability through noisy measurements of $x^\top \theta_*$
with $x \in \mathcal{X} \subset \mathbb{R}^d$. Existing
(asymptotically) optimal methods require either a) potentially costly
projections for each arm $z \in \mathcal{Z}$ or b) explicitly maintaining a
subset of $\mathcal{Z}$ under consideration at each time. This complexity is at
odds with the popular and simple Thompson Sampling algorithm for regret
minimization, which just requires access to a posterior sampling and argmax
oracle, and does not need to enumerate $\mathcal{Z}$ at any point.
Unfortunately, Thompson sampling is known to be sub-optimal for pure
exploration. In this work, we pose a natural question: is there an algorithm
that can explore optimally and only needs the same computational primitives as
Thompson Sampling? We answer the question in the affirmative. We provide an
algorithm that leverages only sampling and argmax oracles and achieves an
exponential convergence rate, with the exponent being asymptotically optimal
among all possible allocations. In addition, we show that our algorithm is
easy to implement and performs empirically as well as existing asymptotically
optimal methods.
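As an illustration of how far those two primitives go, here is a minimal sketch of a top-two-style exploration loop that touches the arm set only through posterior sampling and argmax calls. It is a toy under assumed Gaussian noise and a Gaussian prior, not the paper's algorithm; the resampling cap and the parameter beta are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior_sample(V, b, sigma=1.0):
    # Draw theta from the Gaussian posterior N(V^{-1} b, sigma^2 V^{-1}).
    V_inv = np.linalg.inv(V)
    return rng.multivariate_normal(V_inv @ b, sigma**2 * V_inv)

def explore(Z, theta_true, T=2000, beta=0.5, sigma=1.0):
    """Top-two-style pure exploration using only a posterior sampler and an
    argmax oracle; Z is the (n, d) arm matrix, theta_true drives simulation."""
    d = Z.shape[1]
    V, b = np.eye(d), np.zeros(d)          # prior: theta ~ N(0, sigma^2 I)
    for _ in range(T):
        leader = np.argmax(Z @ posterior_sample(V, b, sigma))
        challenger = leader
        for _ in range(100):               # resample until a distinct arm wins
            challenger = np.argmax(Z @ posterior_sample(V, b, sigma))
            if challenger != leader:
                break
        i = leader if rng.random() < beta else challenger
        y = Z[i] @ theta_true + sigma * rng.standard_normal()  # simulated pull
        V += np.outer(Z[i], Z[i])          # rank-one information update
        b += y * Z[i]
    return np.argmax(Z @ np.linalg.solve(V, b))  # recommend posterior-mean best
```

Neither step maintains per-arm statistics outside the argmax oracle, which matches the computational profile of Thompson Sampling that the abstract highlights.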