178 research outputs found

    Gamification of Pure Exploration for Linear Bandits

    We investigate an active pure-exploration setting, which includes best-arm identification, in the context of linear stochastic bandits. While asymptotically optimal algorithms exist for standard multi-armed bandits, the existence of such algorithms for best-arm identification in linear bandits has remained elusive despite several attempts to address it. First, we provide a thorough comparison of, and new insights into, different notions of optimality in the linear case, including G-optimality, transductive optimality from optimal experimental design, and asymptotic optimality. Second, we design the first asymptotically optimal algorithm for fixed-confidence pure exploration in linear bandits. As a consequence, our algorithm naturally bypasses the pitfall caused by a simple but difficult instance that most prior algorithms had to be engineered to deal with explicitly. Finally, we avoid the need to fully solve an optimal design problem by providing an approach that admits an efficient implementation. Comment: 11+25 pages. To be published in the proceedings of ICML 2020.
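    To make the notion of asymptotic optimality concrete: in the fixed-confidence literature it is usually phrased through a characteristic time. The following is the standard formulation from this line of work (a sketch using common notation, not text taken from this paper), where $w$ is a sampling allocation over the measurement arms $\mathcal{X}$ and $z^*(\theta)$ is the best arm:

```latex
% Characteristic time for fixed-confidence best-arm identification
% in linear bandits (standard form; notation assumed, not the paper's).
T^*(\theta)^{-1} = \max_{w \in \Delta_{\mathcal{X}}}\;
    \min_{z \neq z^*(\theta)}
    \frac{\big(\theta^\top (z^*(\theta) - z)\big)^2}
         {2\,\lVert z^*(\theta) - z \rVert^2_{A(w)^{-1}}},
\qquad
A(w) = \sum_{x \in \mathcal{X}} w_x\, x x^\top .
```

    Any $\delta$-correct strategy then satisfies $\mathbb{E}_\theta[\tau_\delta] \ge T^*(\theta)\,\mathrm{kl}(\delta, 1-\delta)$, where $\mathrm{kl}$ is the binary relative entropy, and an algorithm is asymptotically optimal when it matches this bound as $\delta \to 0$.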

    Price of Safety in Linear Best Arm Identification

    We introduce the safe best-arm identification framework with linear feedback, where the agent is subject to a stage-wise safety constraint that depends linearly on an unknown parameter vector. The agent must act conservatively so as to ensure that the safety constraint is not violated with high probability at each round. Ways of leveraging the linear structure to ensure safety have been studied for regret minimization, but, to the best of our knowledge, not for best-arm identification. We propose a gap-based algorithm that achieves meaningful sample complexity while ensuring stage-wise safety. We show that we pay an extra term in the sample complexity due to the forced-exploration phase incurred by the additional safety constraint. Experimental illustrations are provided to justify the design of our algorithm. Comment: 20 pages, 1 figure.
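    A common way to enforce such a stage-wise constraint is to act only on actions certified safe under an ellipsoidal confidence set around the estimated safety parameter. Below is a minimal Python sketch of that idea, assuming Gaussian-style confidence widths; the names (`is_safe`, `beta`) and the exact radius are illustrative assumptions, not this paper's construction:

```python
import numpy as np

def is_safe(x, mu_hat, V, c, beta):
    """Conservatively certify the stage-wise constraint x^T mu <= c.

    mu_hat -- least-squares estimate of the unknown safety parameter mu
    V      -- regularized design (Gram) matrix of past observations
    beta   -- confidence radius: mu lies in {m : ||m - mu_hat||_V <= beta}
              with high probability (its exact form is an assumption here)
    """
    # Worst case of x^T m over the confidence ellipsoid is
    # x^T mu_hat + beta * ||x||_{V^{-1}}.
    width = beta * np.sqrt(x @ np.linalg.solve(V, x))
    return x @ mu_hat + width <= c
```

    An action is played only when `is_safe` returns True; otherwise the agent falls back to actions already certified safe, which is exactly the conservative behavior that forces the extra exploration term in the sample complexity.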

    Optimal Exploration is no harder than Thompson Sampling

    Given a set of arms $\mathcal{Z}\subset\mathbb{R}^d$ and an unknown parameter vector $\theta_\ast\in\mathbb{R}^d$, the pure-exploration linear bandit problem aims to return $\arg\max_{z\in\mathcal{Z}} z^{\top}\theta_{\ast}$ with high probability through noisy measurements of $x^{\top}\theta_{\ast}$ with $x\in\mathcal{X}\subset\mathbb{R}^d$. Existing (asymptotically) optimal methods require either (a) potentially costly projections for each arm $z\in\mathcal{Z}$ or (b) explicitly maintaining a subset of $\mathcal{Z}$ under consideration at each time. This complexity is at odds with the popular and simple Thompson Sampling algorithm for regret minimization, which requires only access to posterior sampling and argmax oracles and does not need to enumerate $\mathcal{Z}$ at any point. Unfortunately, Thompson Sampling is known to be sub-optimal for pure exploration. In this work, we pose a natural question: is there an algorithm that can explore optimally and only needs the same computational primitives as Thompson Sampling? We answer the question in the affirmative. We provide an algorithm that leverages only sampling and argmax oracles and achieves an exponential convergence rate, with the exponent being asymptotically optimal among all possible allocations. In addition, we show that our algorithm can be easily implemented and performs empirically as well as existing asymptotically optimal methods.
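    For concreteness, here is a minimal Python sketch of the two primitives the abstract refers to, with an assumed Gaussian posterior; the function names are hypothetical, and the brute-force `max` is only a stand-in for a structured argmax oracle that would avoid enumerating $\mathcal{Z}$:

```python
import numpy as np

def posterior_sample(theta_hat, cov, rng):
    """Sampling oracle: one draw from an (assumed) Gaussian posterior."""
    return rng.multivariate_normal(theta_hat, cov)

def argmax_oracle(theta, arms):
    """Argmax oracle: returns argmax_{z in Z} z^T theta. The brute-force
    max below is a stand-in; for structured Z this would be a linear
    optimization call that never enumerates Z."""
    return max(arms, key=lambda z: float(z @ theta))

def thompson_step(theta_hat, cov, arms, rng):
    """One Thompson Sampling round built from only the two primitives."""
    theta_tilde = posterior_sample(theta_hat, cov, rng)
    return argmax_oracle(theta_tilde, arms)

# Example: d = 2, three arms.
rng = np.random.default_rng(0)
arms = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.7, 0.7])]
arm = thompson_step(np.zeros(2), np.eye(2), arms, rng)
```

    The point of the paper's question is that this per-round cost (one posterior draw, one argmax call) can also suffice for asymptotically optimal pure exploration, not just for regret minimization.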
