Simple Regret Optimization in Online Planning for Markov Decision Processes
We consider online planning in Markov decision processes (MDPs). In online
planning, the agent focuses on its current state only, deliberates about the
set of possible policies from that state onwards and, when interrupted, uses
the outcome of that exploratory deliberation to choose what action to perform
next. The performance of algorithms for online planning is assessed in terms of
simple regret, which is the agent's expected performance loss when the chosen
action, rather than an optimal one, is followed.
To date, state-of-the-art algorithms for online planning in general MDPs are
either best effort, or guarantee only polynomial-rate reduction of simple
regret over time. Here we introduce a new Monte-Carlo tree search algorithm,
BRUE, that guarantees exponential-rate reduction of simple regret and error
probability. This algorithm is based on a simple yet non-standard state-space
sampling scheme, MCTS2e, in which different parts of each sample are dedicated
to different exploratory objectives. Our empirical evaluation shows that BRUE
not only provides superior performance guarantees, but is also very effective
in practice and compares favorably with the state of the art. We then extend BRUE
with a variant of "learning by forgetting." The resulting set of algorithms,
BRUE(alpha), generalizes BRUE, improves the exponential factor in the upper
bound on its reduction rate, and exhibits even more attractive empirical
performance.
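The notion of simple regret described in the abstract above can be written out as follows. This is a sketch under assumed notation, not the paper's own: s_0 is the agent's current state, \hat{a}_n is the action recommended after n samples, Q^* is the optimal action-value function, and C, \beta, \lambda are unspecified positive constants used only to contrast the two rates.

```latex
% Simple regret after n samples: expected loss from following the
% recommended action \hat{a}_n instead of an optimal one at s_0.
\Delta_n \;=\; \mathbb{E}\!\left[\, \max_{a} Q^{*}(s_0, a) \;-\; Q^{*}(s_0, \hat{a}_n) \,\right]

% Polynomial-rate reduction (assumed generic form):
\Delta_n \;\le\; C\, n^{-\beta}

% Exponential-rate reduction (assumed generic form):
\Delta_n \;\le\; C\, e^{-\lambda n}
```

The generic bounds only illustrate the difference between the two kinds of guarantee; the exact constants and conditions are those given in the paper itself.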
Decentralized Cooperative Planning for Automated Vehicles with Hierarchical Monte Carlo Tree Search
Today's automated vehicles lack the ability to cooperate implicitly with
others. This work presents a Monte Carlo Tree Search (MCTS) based approach for
decentralized cooperative planning using macro-actions for automated vehicles
in heterogeneous environments. Based on cooperative modeling of other agents
and Decoupled-UCT (a variant of MCTS), the algorithm evaluates the
state-action-values of each agent in a cooperative and decentralized manner,
explicitly modeling the interdependence of actions between traffic
participants. Macro-actions allow for temporal extension over multiple time
steps and increase the effective search depth, requiring fewer iterations to
plan over longer horizons. Without predefined policies for macro-actions, the
algorithm simultaneously learns policies over and within macro-actions. The
proposed method is evaluated under several conflict scenarios, showing that the
algorithm can achieve effective cooperative planning with learned macro-actions
in heterogeneous environments.
Planning spatial networks with Monte Carlo tree search
We tackle the problem of goal-directed graph construction: given a starting graph, finding a set of edges whose addition maximally improves a global objective function. This problem emerges in many transportation and infrastructure networks that are of critical importance to society. We identify two significant shortcomings of present reinforcement learning methods: their exclusive focus on topology to the detriment of spatial characteristics (which are known to influence the growth and density of links), as well as the rapid growth in the action spaces and costs of model training. Our formulation as a deterministic Markov decision process allows us to adopt the Monte Carlo tree search framework, an artificial intelligence decision-time planning method. We propose improvements over the standard upper confidence bounds for trees (UCT) algorithm for this family of problems that address their single-agent nature, the trade-off between the cost of edges and their contribution to the objective, and an action space linear in the number of nodes. Our approach yields substantial improvements over UCT for increasing the efficiency and attack resilience of synthetic networks and real-world Internet backbone and metro systems, while using a wall clock time budget similar to other search-based algorithms. We also demonstrate that our approach scales to significantly larger networks than previous reinforcement learning methods, since it does not require training a model.
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
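To illustrate how MCTS combines tree search with random sampling, here is a minimal sketch of the common four-phase loop (selection, expansion, simulation, backpropagation) with UCB1 selection, as in UCT. It is not taken from the survey: the toy domain, the `reward` function, `DEPTH`, and all other names are hypothetical and chosen only to keep the example self-contained.

```python
import math
import random

DEPTH = 4          # hypothetical horizon of the toy problem
ACTIONS = (0, 1)   # two actions available at every step


def reward(path):
    """Toy terminal reward in [0, 1]: the first choice is worth 0.6,
    the remaining choices share the other 0.4 (an assumption for the demo)."""
    rest = path[1:]
    return 0.6 * path[0] + 0.4 * sum(rest) / len(rest)


class Node:
    def __init__(self, path=(), parent=None):
        self.path = path        # actions taken from the root to this node
        self.parent = parent
        self.children = {}      # action -> Node
        self.visits = 0
        self.total = 0.0        # sum of rewards backed up through this node

    def is_terminal(self):
        return len(self.path) == DEPTH

    def untried(self):
        return [a for a in ACTIONS if a not in self.children]


def ucb1(child, parent_visits, c=math.sqrt(2)):
    """Mean reward plus an exploration bonus."""
    mean = child.total / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts(iterations, seed=0):
    rng = random.Random(seed)
    root = Node()
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes using UCB1.
        while not node.is_terminal() and not node.untried():
            node = max(node.children.values(),
                       key=lambda ch: ucb1(ch, node.visits))
        # 2. Expansion: add one untried child, if the node is not terminal.
        if not node.is_terminal():
            action = rng.choice(node.untried())
            child = Node(node.path + (action,), parent=node)
            node.children[action] = child
            node = child
        # 3. Simulation: finish the episode with uniformly random actions.
        path = node.path
        while len(path) < DEPTH:
            path += (rng.choice(ACTIONS),)
        value = reward(path)
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total += value
            node = node.parent
    # Recommend the most-visited root action.
    return max(root.children, key=lambda a: root.children[a].visits)


if __name__ == "__main__":
    print(mcts(2000))
```

In this toy domain the first action dominates the reward, so the search concentrates its visits on root action 1. Recommending the most-visited root action is one common choice; other recommendation rules exist.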
Planning search and rescue missions for UAV teams
The coordination of multiple Unmanned Aerial Vehicles (UAVs) to carry out aerial surveys is a major challenge for emergency responders. In particular, UAVs have to fly over kilometre-scale areas while trying to discover casualties as quickly as possible. To aid in this process, it is desirable to exploit the increasing availability of data about a disaster from sources such as crowd reports, satellite remote sensing, or manned reconnaissance. In particular, such information can be a valuable resource to drive the planning of UAV flight paths over a space in order to discover people who are in danger. However, challenges of computational tractability remain when planning over the very large action spaces that result. To overcome these, we introduce the survivor discovery problem and present, as our solution, the first example of a continuous factored coordinated Monte Carlo tree search algorithm. Our evaluation against state-of-the-art benchmarks shows that our algorithm, Co-CMCTS, is able to localise more casualties faster than standard approaches by 7% or more on simulations with real-world data.