Search CORE

87 research outputs found

Bandit Algorithms for Tree Search

Author: Coquelin Pierre-Arnaud
Munos Rémi
Publication venue
Publication date: 01/01/2007
Field of study

Bandit based methods for tree search have recently gained popularity when applied to huge trees, e.g. in the game of go (Gelly et al., 2006). The UCT algorithm (Kocsis and Szepesvari, 2006), a tree search method based on Upper Confidence Bounds (UCB) (Auer et al., 2002), is believed to adapt locally to the effective smoothness of the tree. However, we show that UCT is too ``optimistic'' in some cases, leading to a regret O(exp(exp(D))) where D is the depth of the tree. We propose alternative bandit algorithms for tree search. First, a modification of UCT using a confidence sequence that scales exponentially with the horizon depth is proven to have a regret O(2^D \sqrt{n}), but does not adapt to possible smoothness in the tree. We then analyze Flat-UCB performed on the leaves and provide a finite regret bound with high probability. Then, we introduce a UCB-based Bandit Algorithm for Smooth Trees which takes into account actual smoothness of the rewards for performing efficient ``cuts'' of sub-optimal branches with high confidence. Finally, we present an incremental tree search version which applies when the full tree is too big (possibly infinite) to be entirely represented and show that with high probability, essentially only the optimal branches is indefinitely developed. We illustrate these methods on a global optimization problem of a Lipschitz function, given noisy data

arXiv.org e-Print Archive

CiteSeerX

HAL - Lille 3

INRIA a CCSD electronic archive server

HAL-Polytechnique

HAL UVSQ

Bootstrapping Monte Carlo Tree Search with an Imperfect Heuristic

Author: G. Chaslot
L. Kocsis
M. Kearns
P. Auer
R. Bellman
R. Coulom
S. Gelly
Publication venue
Publication date: 01/01/2012
Field of study

We consider the problem of using a heuristic policy to improve the value approximation by the Upper Confidence Bound applied in Trees (UCT) algorithm in non-adversarial settings such as planning with large-state space Markov Decision Processes. Current improvements to UCT focus on either changing the action selection formula at the internal nodes or the rollout policy at the leaf nodes of the search tree. In this work, we propose to add an auxiliary arm to each of the internal nodes, and always use the heuristic policy to roll out simulations at the auxiliary arms. The method aims to get fast convergence to optimal values at states where the heuristic policy is optimal, while retaining similar approximation as the original UCT in other states. We show that bootstrapping with the proposed method in the new algorithm, UCT-Aux, performs better compared to the original UCT algorithm and its variants in two benchmark experiment settings. We also examine conditions under which UCT-Aux works well.Comment: 16 pages, accepted for presentation at ECML'1

arXiv.org e-Print Archive

Crossref

Institutional Knowledge at Singapore Management University

ScholarBank@NUS

Exploration vs Exploitation vs Safety: Risk-averse Multi-Armed Bandits

Author: Galichet Nicolas
Sebag Michèle
Teytaud Olivier
Publication venue
Publication date: 13/11/2013
Field of study

Motivated by applications in energy management, this paper presents the Multi-Armed Risk-Aware Bandit (MARAB) algorithm. With the goal of limiting the exploration of risky arms, MARAB takes as arm quality its conditional value at risk. When the user-supplied risk level goes to 0, the arm quality tends toward the essential infimum of the arm distribution density, and MARAB tends toward the MIN multi-armed bandit algorithm, aimed at the arm with maximal minimal value. As a first contribution, this paper presents a theoretical analysis of the MIN algorithm under mild assumptions, establishing its robustness comparatively to UCB. The analysis is supported by extensive experimental validation of MIN and MARAB compared to UCB and state-of-art risk-aware MAB algorithms on artificial and real-world problems.Comment: 16 page

arXiv.org e-Print Archive

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Regret lower bounds and extended Upper Confidence Bounds policies in stochastic multi-armed bandit problem

Author: Alaoui Issam El
Audibert Jean-Yves
Salomon Antoine
Publication venue
Publication date: 16/12/2011
Field of study

This paper is devoted to regret lower bounds in the classical model of stochastic multi-armed bandit. A well-known result of Lai and Robbins, which has then been extended by Burnetas and Katehakis, has established the presence of a logarithmic bound for all consistent policies. We relax the notion of consistence, and exhibit a generalisation of the logarithmic bound. We also show the non existence of logarithmic bound in the general case of Hannan consistency. To get these results, we study variants of popular Upper Confidence Bounds (ucb) policies. As a by-product, we prove that it is impossible to design an adaptive policy that would select the best of two algorithms by taking advantage of the properties of the environment

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-Ecole des Ponts ParisTech

HAL - UPEC / UPEM

Practical Open-Loop Optimistic Planning

Author: D Silver
D Silver
D Silver
J-F Hren
L Buşoniu
O Cappé
R Bellman
R Coulom
Publication venue
Publication date: 09/04/2019
Field of study

We consider the problem of online planning in a Markov Decision Process when given only access to a generative model, restricted to open-loop policies - i.e. sequences of actions - and under budget constraint. In this setting, the Open-Loop Optimistic Planning (OLOP) algorithm enjoys good theoretical guarantees but is overly conservative in practice, as we show in numerical experiments. We propose a modified version of the algorithm with tighter upper-confidence bounds, KLOLOP, that leads to better practical performances while retaining the sample complexity bound. Finally, we propose an efficient implementation that significantly improves the time complexity of both algorithms

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot