64,925 research outputs found

    Optimal Best Arm Identification with Fixed Confidence

    We give a complete characterization of the complexity of best-arm identification in one-parameter bandit problems. We prove a new, tight lower bound on the sample complexity. We propose the `Track-and-Stop' strategy, which we prove to be asymptotically optimal. It consists of a new sampling rule (which tracks the optimal proportions of arm draws highlighted by the lower bound) and a stopping rule named after Chernoff, for which we give a new analysis.
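
    The stopping side of Track-and-Stop can be made concrete with a short sketch. The code below computes the Chernoff generalized-likelihood-ratio statistic for Bernoulli arms, testing the empirical best arm against each challenger through their weighted mixture mean; the threshold in the usage example is a common heuristic form, not the paper's exact calibration, and the function names are my own.

```python
import numpy as np

def kl_bernoulli(p, q, eps=1e-12):
    """Bernoulli KL divergence d(p, q), clipped away from {0, 1}."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def chernoff_statistic(mu_hat, counts):
    """GLR stopping statistic Z(t): the empirical best arm a is tested
    against every challenger b through their weighted mixture mean."""
    a = int(np.argmax(mu_hat))
    z = np.inf
    for b in range(len(mu_hat)):
        if b == a:
            continue
        m = (counts[a] * mu_hat[a] + counts[b] * mu_hat[b]) / (counts[a] + counts[b])
        z_ab = counts[a] * kl_bernoulli(mu_hat[a], m) + counts[b] * kl_bernoulli(mu_hat[b], m)
        z = min(z, z_ab)
    return z

# Usage: stop when Z(t) clears a threshold; log((1 + log(t)) / delta) is one
# common heuristic form, not the paper's exact calibration.
mu_hat, counts = np.array([0.6, 0.5, 0.4]), np.array([120, 100, 80])
print(chernoff_statistic(mu_hat, counts) > np.log((1 + np.log(counts.sum())) / 0.05))
```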

    Best-Arm Identification in Linear Bandits

    We study the best-arm identification problem in linear bandits, where the rewards of the arms depend linearly on an unknown parameter $\theta^*$ and the objective is to return the arm with the largest reward. We characterize the complexity of the problem and introduce sample allocation strategies that pull arms to identify the best arm with a fixed confidence, while minimizing the sample budget. In particular, we show the importance of exploiting the global linear structure to improve the estimate of the reward of near-optimal arms. We analyze the proposed strategies and compare their empirical performance. Finally, as a by-product of our analysis, we point out the connection to the $G$-optimality criterion used in optimal experimental design. Comment: In Advances in Neural Information Processing Systems 27 (NIPS), 2014.
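
    Since the abstract highlights the connection to the G-optimality criterion, here is a minimal sketch of how such a design can be approximated over a finite arm set with Frank-Wolfe, relying on the Kiefer-Wolfowitz equivalence between G- and D-optimality. It assumes the arms span $\mathbb{R}^d$ and illustrates the criterion, not the allocation strategy proposed in the paper.

```python
import numpy as np

def g_optimal_design(X, n_iter=500):
    """Frank-Wolfe approximation of a G-optimal design over the rows of X.

    Minimizes max_x x^T A(w)^{-1} x with A(w) = sum_i w_i x_i x_i^T; by the
    Kiefer-Wolfowitz theorem this coincides with D-optimality and the optimal
    value equals d. Assumes the rows of X span R^d."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        A_inv = np.linalg.inv(X.T @ (w[:, None] * X))
        scores = np.einsum('ij,jk,ik->i', X, A_inv, X)  # x^T A^{-1} x per row
        i = int(np.argmax(scores))
        s = scores[i]
        gamma = (s - d) / (d * (s - 1))                 # exact line-search step
        w = (1 - gamma) * w
        w[i] += gamma
    return w

# Usage: weights concentrate on the most informative directions.
rng = np.random.default_rng(0)
w = g_optimal_design(rng.normal(size=(8, 3)))
print(w.round(3), w.sum())
```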

    Gamification of Pure Exploration for Linear Bandits

    We investigate an active pure-exploration setting, which includes best-arm identification, in the context of linear stochastic bandits. While asymptotically optimal algorithms exist for standard multi-armed bandits, the existence of such algorithms for best-arm identification in linear bandits has been elusive despite several attempts to address it. First, we provide a thorough comparison of, and new insight into, different notions of optimality in the linear case, including G-optimality, transductive optimality from optimal experimental design, and asymptotic optimality. Second, we design the first asymptotically optimal algorithm for fixed-confidence pure exploration in linear bandits. As a consequence, our algorithm naturally bypasses the pitfall caused by a simple but difficult instance that most prior algorithms had to be engineered to deal with explicitly. Finally, we avoid the need to fully solve an optimal design problem by providing an approach that admits an efficient implementation. Comment: 11+25 pages. To be published in the proceedings of ICML 2020.
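
    The game viewpoint behind this line of work can be illustrated with a simplified saddle-point iteration: a design player updates the sampling weights by exponentiated gradient while an adversary best-responds with the hardest challenger arm. This is a sketch under assumed unit-variance Gaussian noise, with an illustrative learning rate and ridge term; the paper's actual algorithm uses calibrated online learners rather than this bare heuristic.

```python
import numpy as np

def allocation_game(X, theta, n_iter=2000, lr=0.2):
    """Sketch of the game view of the fixed-confidence lower bound for
    Gaussian linear bandits with unit variance.

    Objective: max_w min_{b != a*} (theta^T y_b)^2 / (2 ||y_b||^2_{A(w)^{-1}}),
    where y_b = x_{a*} - x_b and A(w) = sum_i w_i x_i x_i^T. The w-player runs
    exponentiated-gradient ascent; the adversary best-responds with the
    hardest challenger b."""
    n, d = X.shape
    astar = int(np.argmax(X @ theta))
    w = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        A_inv = np.linalg.inv(X.T @ (w[:, None] * X) + 1e-9 * np.eye(d))
        vals, grads = [], []
        for b in range(n):
            if b == astar:
                continue
            y = X[astar] - X[b]
            q = A_inv @ y
            vals.append((theta @ y) ** 2 / (2 * (y @ q)))
            grads.append((theta @ y) ** 2 / (2 * (y @ q) ** 2) * (X @ q) ** 2)
        j = int(np.argmin(vals))        # adversary picks the hardest challenger
        w = w * np.exp(lr * grads[j])   # exponentiated-gradient step on w
        w /= w.sum()
    return w, min(vals)                 # weights and the attained game value
```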

    On the Sample Complexity of Representation Learning in Multi-task Bandits with Global and Local structure

    We investigate the sample complexity of learning the optimal arm in multi-task bandit problems. Arms consist of two components: one that is shared across tasks (which we call the representation) and one that is task-specific (which we call the predictor). The objective is to learn the optimal (representation, predictor) pair for each task, under the assumption that the optimal representation is common to all tasks. Within this framework, efficient learning algorithms should transfer knowledge across tasks. We consider the best-arm identification problem for a fixed confidence, where, in each round, the learner actively selects both a task and an arm, and observes the corresponding reward. We derive instance-specific sample complexity lower bounds satisfied by any $(\delta_G,\delta_H)$-PAC algorithm (such an algorithm identifies the best representation with probability at least $1-\delta_G$, and the best predictor for a task with probability at least $1-\delta_H$). We devise an algorithm, OSRL-SC, whose sample complexity approaches the lower bound and scales at most as $H(G\log(1/\delta_G)+X\log(1/\delta_H))$, where $X$, $G$, and $H$ are, respectively, the number of tasks, representations, and predictors. By comparison, this scaling is significantly better than that of the classical best-arm identification algorithm, which scales as $HGX\log(1/\delta)$. Comment: Accepted at the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023).
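
    To see how much the structured bound buys, here is a worked comparison with assumed illustrative values ($X=20$ tasks, $G=10$ representations, $H=5$ predictors, $\delta_G=\delta_H=\delta=0.01$):

```latex
% Assumed illustrative values: X = 20, G = 10, H = 5, delta = 0.01.
H\bigl(G\log(1/\delta_G) + X\log(1/\delta_H)\bigr)
    = 5\,(10 + 20)\log 100 \approx 691,
\qquad
HGX\log(1/\delta) = 5 \cdot 10 \cdot 20 \cdot \log 100 \approx 4605.
```

    The classical bound multiplies the three cardinalities, whereas the structured bound pays for representations and predictors additively, which is where the order-of-magnitude gap comes from.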

    Dual-Directed Algorithm Design for Efficient Pure Exploration

    We consider pure-exploration problems in the context of stochastic sequential adaptive experiments with a finite set of alternative options. The goal of the decision-maker is to accurately answer a query about the alternatives with high confidence and with minimal measurement effort. A typical query is to identify the alternative with the best performance, leading to ranking-and-selection problems, or best-arm identification in the machine learning literature. We focus on the fixed-precision setting and derive a sufficient condition for optimality in terms of a notion of strong convergence to the optimal allocation of samples. Using dual variables, we characterize the necessary and sufficient conditions for an allocation to be optimal. The use of dual variables allows us to bypass the combinatorial structure of the optimality conditions that arises when they are expressed solely in terms of primal variables. Remarkably, these optimality conditions enable an extension of the top-two algorithm design principle, initially proposed for best-arm identification. Furthermore, our optimality conditions give rise to a straightforward yet efficient selection rule, termed information-directed selection, which adaptively picks from a candidate set based on the information gain of the candidates. We outline the broad contexts where our algorithmic approach can be implemented. We establish that, paired with information-directed selection, top-two Thompson sampling is (asymptotically) optimal for Gaussian best-arm identification, solving a glaring open problem in the pure-exploration literature. Our algorithm is also optimal for $\epsilon$-best-arm identification and thresholding bandit problems. Our analysis further leads to a general principle for guiding adaptations of Thompson sampling to pure-exploration problems. Numerical experiments highlight the exceptional efficiency of our proposed algorithms relative to existing ones. Comment: An earlier version of this paper appeared as an extended abstract in the Proceedings of the 36th Annual Conference on Learning Theory (COLT 2023), under the title "Information-Directed Selection for Top-Two Algorithms."
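
    A minimal sketch of the flavor of algorithm analyzed here: top-two Thompson sampling for Gaussian best-arm identification with a pairwise GLR stopping rule. For simplicity the choice between leader and challenger uses a fixed tunable $\beta$, the classical variant, rather than the paper's information-directed selection rule; the stopping threshold is likewise a common heuristic form.

```python
import numpy as np

rng = np.random.default_rng(0)

def top_two_ts(mu, sigma=1.0, beta=0.5, delta=0.01, max_t=100_000):
    """Top-two Thompson sampling (fixed-beta variant) with a pairwise GLR
    stopping rule, for Gaussian arms with known variance and a flat prior.
    Returns the recommended arm and the total number of samples used."""
    K = len(mu)
    n = np.ones(K)                                      # one pull per arm to start
    s = np.array([rng.normal(m, sigma) for m in mu])    # reward sums
    for t in range(K, max_t):
        mean, std = s / n, sigma / np.sqrt(n)
        leader = int(np.argmax(rng.normal(mean, std)))  # posterior sample
        while True:                                     # re-sample a distinct challenger
            challenger = int(np.argmax(rng.normal(mean, std)))
            if challenger != leader:
                break
        a = leader if rng.random() < beta else challenger
        s[a] += rng.normal(mu[a], sigma)                # pull the chosen arm
        n[a] += 1
        best = int(np.argmax(s / n))
        glr = min((s[best] / n[best] - s[b] / n[b]) ** 2
                  / (2 * sigma ** 2 * (1 / n[best] + 1 / n[b]))
                  for b in range(K) if b != best)
        if glr > np.log((1 + np.log(t)) / delta):       # heuristic threshold
            return best, int(n.sum())
    return int(np.argmax(s / n)), int(n.sum())

print(top_two_ts(np.array([0.0, 0.3, 1.0])))
```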
