11,167 research outputs found
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work
Learning in Real-Time Search: A Unifying Framework
Real-time search methods are suited for tasks in which the agent is
interacting with an initially unknown environment in real time. In such
simultaneous planning and learning problems, the agent has to select its
actions in a limited amount of time, while sensing only a local part of the
environment centered at the agents current location. Real-time heuristic search
agents select actions using a limited lookahead search and evaluating the
frontier states with a heuristic function. Over repeated experiences, they
refine heuristic values of states to avoid infinite loops and to converge to
better solutions. The wide spread of such settings in autonomous software and
hardware agents has led to an explosion of real-time search algorithms over the
last two decades. Not only is a potential user confronted with a hodgepodge of
algorithms, but he also faces the choice of control parameters they use. In
this paper we address both problems. The first contribution is an introduction
of a simple three-parameter framework (named LRTS) which extracts the core
ideas behind many existing algorithms. We then prove that LRTA*, epsilon-LRTA*,
SLA*, and gamma-Trap algorithms are special cases of our framework. Thus, they
are unified and extended with additional features. Second, we prove
completeness and convergence of any algorithm covered by the LRTS framework.
Third, we prove several upper-bounds relating the control parameters and
solution quality. Finally, we analyze the influence of the three control
parameters empirically in the realistic scalable domains of real-time
navigation on initially unknown maps from a commercial role-playing game as
well as routing in ad hoc sensor networks
Eligibility Propagation to Speed up Time Hopping for Reinforcement Learning
A mechanism called Eligibility Propagation is proposed to speed up the Time
Hopping technique used for faster Reinforcement Learning in simulations.
Eligibility Propagation provides for Time Hopping similar abilities to what
eligibility traces provide for conventional Reinforcement Learning. It
propagates values from one state to all of its temporal predecessors using a
state transitions graph. Experiments on a simulated biped crawling robot
confirm that Eligibility Propagation accelerates the learning process more than
3 times.Comment: 7 page
Stochastic Shortest Path with Energy Constraints in POMDPs
We consider partially observable Markov decision processes (POMDPs) with a
set of target states and positive integer costs associated with every
transition. The traditional optimization objective (stochastic shortest path)
asks to minimize the expected total cost until the target set is reached. We
extend the traditional framework of POMDPs to model energy consumption, which
represents a hard constraint. The energy levels may increase and decrease with
transitions, and the hard constraint requires that the energy level must remain
positive in all steps till the target is reached. First, we present a novel
algorithm for solving POMDPs with energy levels, developing on existing POMDP
solvers and using RTDP as its main method. Our second contribution is related
to policy representation. For larger POMDP instances the policies computed by
existing solvers are too large to be understandable. We present an automated
procedure based on machine learning techniques that automatically extracts
important decisions of the policy allowing us to compute succinct human
readable policies. Finally, we show experimentally that our algorithm performs
well and computes succinct policies on a number of POMDP instances from the
literature that were naturally enhanced with energy levels.Comment: Technical report accompanying a paper published in proceedings of
AAMAS 201
- …