12,005 research outputs found

    A Survey of Monte Carlo Tree Search Methods

    Get PDF
    Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work

    The 2014 International Planning Competition: Progress and Trends

    Get PDF
    We review the 2014 International Planning Competition (IPC-2014), the eighth in a series of competitions starting in 1998. IPC-2014 was held in three separate parts to assess state-of-the-art in three prominent areas of planning research: the deterministic (classical) part (IPCD), the learning part (IPCL), and the probabilistic part (IPPC). Each part evaluated planning systems in ways that pushed the edge of existing planner performance by introducing new challenges, novel tasks, or both. The competition surpassed again the number of competitors than its predecessor, highlighting the competition’s central role in shaping the landscape of ongoing developments in evaluating planning systems

    Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

    Full text link
    Bayesian model-based reinforcement learning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, finding the resulting Bayes-optimal policies is notoriously taxing, since the search space becomes enormous. In this paper we introduce a tractable, sample-based method for approximate Bayes-optimal planning which exploits Monte-Carlo tree search. Our approach outperformed prior Bayesian model-based RL algorithms by a significant margin on several well-known benchmark problems -- because it avoids expensive applications of Bayes rule within the search tree by lazily sampling models from the current beliefs. We illustrate the advantages of our approach by showing it working in an infinite state space domain which is qualitatively out of reach of almost all previous work in Bayesian exploration.Comment: 14 pages, 7 figures, includes supplementary material. Advances in Neural Information Processing Systems (NIPS) 201

    Maximum a Posteriori Estimation by Search in Probabilistic Programs

    Full text link
    We introduce an approximate search algorithm for fast maximum a posteriori probability estimation in probabilistic programs, which we call Bayesian ascent Monte Carlo (BaMC). Probabilistic programs represent probabilistic models with varying number of mutually dependent finite, countable, and continuous random variables. BaMC is an anytime MAP search algorithm applicable to any combination of random variables and dependencies. We compare BaMC to other MAP estimation algorithms and show that BaMC is faster and more robust on a range of probabilistic models.Comment: To appear in proceedings of SOCS1

    Thinking Fast and Slow with Deep Learning and Tree Search

    Get PDF
    Sequential decision making problems, such as structured prediction, robotic control, and game playing, require a combination of planning policies and generalisation of those plans. In this paper, we present Expert Iteration (ExIt), a novel reinforcement learning algorithm which decomposes the problem into separate planning and generalisation tasks. Planning new policies is performed by tree search, while a deep neural network generalises those plans. Subsequently, tree search is improved by using the neural network policy to guide search, increasing the strength of new plans. In contrast, standard deep Reinforcement Learning algorithms rely on a neural network not only to generalise plans, but to discover them too. We show that ExIt outperforms REINFORCE for training a neural network to play the board game Hex, and our final tree search agent, trained tabula rasa, defeats MoHex 1.0, the most recent Olympiad Champion player to be publicly released.Comment: v1 to v2: - Add a value function in MCTS - Some MCTS hyper-parameters changed - Repetition of experiments: improved accuracy and errors shown. (note the reduction in effect size for the tpt/cat experiment) - Results from a longer training run, including changes in expert strength in training - Comparison to MoHex. v3: clarify independence of ExIt and AG0. v4: see appendix
    • …
    corecore