Search CORE

12,005 research outputs found

A Survey of Monte Carlo Tree Search Methods

Author: Browne Cameron B
Colton Simon
Cowling Peter I
Lucas Simon M
Perez Diego
Powley Edward
Rohlfshagen Philipp
Samothrakis Spyridon
Tavener Stephen
Whitehouse Daniel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012
Field of study

Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work

University of Essex Research Repository

CiteSeerX

Maastricht University Research Portal

Crossref

The 2014 International Planning Competition: Progress and Trends

Author: Chrpa Lukas
Editor Managing
Grześ Marek
McCluskey Thomas Leo
Roberts Mark
Sanner Scott
Vallati Mauro
Publication venue: 'Association for the Advancement of Artificial Intelligence (AAAI)'
Publication date: 01/03/2015
Field of study

We review the 2014 International Planning Competition (IPC-2014), the eighth in a series of competitions starting in 1998. IPC-2014 was held in three separate parts to assess state-of-the-art in three prominent areas of planning research: the deterministic (classical) part (IPCD), the learning part (IPCL), and the probabilistic part (IPPC). Each part evaluated planning systems in ways that pushed the edge of existing planner performance by introducing new challenges, novel tasks, or both. The competition surpassed again the number of competitors than its predecessor, highlighting the competition’s central role in shaping the landscape of ongoing developments in evaluating planning systems

Crossref

Kent Academic Repository

Association for the Advancement of Artificial Intelligence: AAAI Publications

University of Huddersfield Repository

Huddersfield Research Portal

Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

Author: Dayan Peter
Guez Arthur
Silver David
Publication venue
Publication date: 01/01/2012
Field of study

Bayesian model-based reinforcement learning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, finding the resulting Bayes-optimal policies is notoriously taxing, since the search space becomes enormous. In this paper we introduce a tractable, sample-based method for approximate Bayes-optimal planning which exploits Monte-Carlo tree search. Our approach outperformed prior Bayesian model-based RL algorithms by a significant margin on several well-known benchmark problems -- because it avoids expensive applications of Bayes rule within the search tree by lazily sampling models from the current beliefs. We illustrate the advantages of our approach by showing it working in an infinite state space domain which is qualitatively out of reach of almost all previous work in Bayesian exploration.Comment: 14 pages, 7 figures, includes supplementary material. Advances in Neural Information Processing Systems (NIPS) 201

arXiv.org e-Print Archive

CiteSeerX

UCL Discovery

Maximum a Posteriori Estimation by Search in Probabilistic Programs

Author: Tolpin David
Wood Frank
Publication venue
Publication date: 01/01/2015
Field of study

We introduce an approximate search algorithm for fast maximum a posteriori probability estimation in probabilistic programs, which we call Bayesian ascent Monte Carlo (BaMC). Probabilistic programs represent probabilistic models with varying number of mutually dependent finite, countable, and continuous random variables. BaMC is an anytime MAP search algorithm applicable to any combination of random variables and dependencies. We compare BaMC to other MAP estimation algorithms and show that BaMC is faster and more robust on a range of probabilistic models.Comment: To appear in proceedings of SOCS1

arXiv.org e-Print Archive

Oxford University Research Archive

Thinking Fast and Slow with Deep Learning and Tree Search

Author: Anthony Thomas
Barber David
Tian Zheng
Publication venue
Publication date: 01/11/2017
Field of study

Sequential decision making problems, such as structured prediction, robotic control, and game playing, require a combination of planning policies and generalisation of those plans. In this paper, we present Expert Iteration (ExIt), a novel reinforcement learning algorithm which decomposes the problem into separate planning and generalisation tasks. Planning new policies is performed by tree search, while a deep neural network generalises those plans. Subsequently, tree search is improved by using the neural network policy to guide search, increasing the strength of new plans. In contrast, standard deep Reinforcement Learning algorithms rely on a neural network not only to generalise plans, but to discover them too. We show that ExIt outperforms REINFORCE for training a neural network to play the board game Hex, and our final tree search agent, trained tabula rasa, defeats MoHex 1.0, the most recent Olympiad Champion player to be publicly released.Comment: v1 to v2: - Add a value function in MCTS - Some MCTS hyper-parameters changed - Repetition of experiments: improved accuracy and errors shown. (note the reduction in effect size for the tpt/cat experiment) - Results from a longer training run, including changes in expert strength in training - Comparison to MoHex. v3: clarify independence of ExIt and AG0. v4: see appendix

arXiv.org e-Print Archive

UCL Discovery