Search CORE

21,237 research outputs found

A Survey of Monte Carlo Tree Search Methods

Author: Browne Cameron B
Colton Simon
Cowling Peter I
Lucas Simon M
Perez Diego
Powley Edward
Rohlfshagen Philipp
Samothrakis Spyridon
Tavener Stephen
Whitehouse Daniel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012
Field of study

Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work

University of Essex Research Repository

CiteSeerX

Maastricht University Research Portal

Crossref

Online algorithms for POMDPs with continuous state, action, and observation spaces

Author: Kochenderfer Mykel
Sunberg Zachary
Publication venue
Publication date: 15/06/2018
Field of study

Online solvers for partially observable Markov decision processes have been applied to problems with large discrete state spaces, but continuous state, action, and observation spaces remain a challenge. This paper begins by investigating double progressive widening (DPW) as a solution to this challenge. However, we prove that this modification alone is not sufficient because the belief representations in the search tree collapse to a single particle causing the algorithm to converge to a policy that is suboptimal regardless of the computation time. This paper proposes and evaluates two new algorithms, POMCPOW and PFT-DPW, that overcome this deficiency by using weighted particle filtering. Simulation results show that these modifications allow the algorithms to be successful where previous approaches fail.Comment: Added Multilane sectio

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Reinforcement Learning via AIXI Approximation

Author: Hutter Marcus
Ng Kee Siong
Silver David
Veness Joel
Publication venue
Publication date: 01/01/2010
Field of study

This paper introduces a principled approach for the design of a scalable general reinforcement learning agent. This approach is based on a direct approximation of AIXI, a Bayesian optimality notion for general reinforcement learning agents. Previously, it has been unclear whether the theory of AIXI could motivate the design of practical algorithms. We answer this hitherto open question in the affirmative, by providing the first computationally feasible approximation to the AIXI agent. To develop our approximation, we introduce a Monte Carlo Tree Search algorithm along with an agent-specific extension of the Context Tree Weighting algorithm. Empirically, we present a set of encouraging results on a number of stochastic, unknown, and partially observable domains.Comment: 8 LaTeX pages, 1 figur

arXiv.org e-Print Archive

CiteSeerX

UCL Discovery

The Australian National University

Association for the Advancement of Artificial Intelligence: AAAI Publications

Memory Bounded Open-Loop Planning in Large POMDPs using Thompson Sampling

Author: Belzner Lenz
Friedrich Markus
Kiermeier Marie
Linnhoff-Popien Claudia
Phan Thomy
Schmid Kyrill
Publication venue
Publication date: 10/05/2019
Field of study

State-of-the-art approaches to partially observable planning like POMCP are based on stochastic tree search. While these approaches are computationally efficient, they may still construct search trees of considerable size, which could limit the performance due to restricted memory resources. In this paper, we propose Partially Observable Stacked Thompson Sampling (POSTS), a memory bounded approach to open-loop planning in large POMDPs, which optimizes a fixed size stack of Thompson Sampling bandits. We empirically evaluate POSTS in four large benchmark problems and compare its performance with different tree-based approaches. We show that POSTS achieves competitive performance compared to tree-based open-loop planning and offers a performance-memory tradeoff, making it suitable for partially observable planning with highly restricted computational and memory resources.Comment: Presented at AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Fast Approximate Max-n Monte Carlo Tree Search for Ms Pac-Man

Author: Lucas Simon
Robles David
Samothrakis Spyridon
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/04/2011
Field of study

We present an application of Monte Carlo tree search (MCTS) for the game of Ms Pac-Man. Contrary to most applications of MCTS to date, Ms Pac-Man requires almost real-time decision making and does not have a natural end state. We approached the problem by performing Monte Carlo tree searches on a five player maxn tree representation of the game with limited tree search depth. We performed a number of experiments using both the MCTS game agents (for pacman and ghosts) and agents used in previous work (for ghosts). Performance-wise, our approach gets excellent scores, outperforming previous non-MCTS opponent approaches to the game by up to two orders of magnitude. © 2011 IEEE

University of Essex Research Repository

Crossref

Scalable Planning and Learning for Multiagent POMDPs: Extended Version

Author: Amato Christopher
Oliehoek Frans A.
Publication venue
Publication date: 19/12/2014
Field of study

Online, sample-based planning algorithms for POMDPs have shown great promise in scaling to problems with large state spaces, but they become intractable for large action and observation spaces. This is particularly problematic in multiagent POMDPs where the action and observation space grows exponentially with the number of agents. To combat this intractability, we propose a novel scalable approach based on sample-based planning and factored value functions that exploits structure present in many multiagent settings. This approach applies not only in the planning case, but also in the Bayesian reinforcement learning setting. Experimental results show that we are able to provide high quality solutions to large multiagent planning and learning problems

arXiv.org e-Print Archive

University of Liverpool Repository

CiteSeerX

International Migration, Integration and Social Cohesion online publications

Association for the Advancement of Artificial Intelligence: AAAI Publications