20 research outputs found
On Monte Carlo Tree Search and Reinforcement Learning
Fuelled by successes in Computer Go, Monte Carlo tree search (MCTS) has achieved widespread
adoption within the games community. Its links to traditional reinforcement learning (RL)
methods have been outlined in the past; however, the use of RL techniques within tree search has
not been thoroughly studied yet. In this paper we re-examine in depth this close relation between
the two fields; our goal is to improve the cross-awareness between the two communities. We show
that a straightforward adaptation of RL semantics within tree search can lead to a wealth of new
algorithms, of which traditional MCTS is only one variant. We confirm that planning
methods inspired by RL in conjunction with online search demonstrate encouraging results on
several classic board games and in arcade video game competitions, where our algorithm recently
ranked first. Our study promotes a unified view of learning, planning, and search.
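The core move this abstract describes, swapping MCTS's Monte Carlo averaging backup for an RL-style bootstrapping backup, can be sketched in a few lines. The sketch below is illustrative only; the Node fields, step sizes, and the offline lambda-return formulation are assumptions, not the paper's exact algorithm:

    def mc_backup(path, ret):
        # classic UCT backpropagation: every node on the simulated path
        # keeps an incremental average of the final Monte Carlo return
        for node in path:
            node.visits += 1
            node.value += (ret - node.value) / node.visits

    def td_lambda_backup(path, rewards, lam=0.8, alpha=0.05, gamma=1.0):
        # RL-flavoured alternative: back up an offline lambda-return that
        # bootstraps from stored node values; lam -> 1 (with alpha set to
        # 1/visits) recovers the Monte Carlo backup above
        g = 0.0       # lambda-return accumulated from the terminal state
        v_next = 0.0  # successor state's (pre-update) value estimate
        for node, r in zip(reversed(path), reversed(rewards)):
            v_current = node.value
            g = r + gamma * ((1.0 - lam) * v_next + lam * g)
            node.visits += 1
            node.value += alpha * (g - v_current)
            v_next = v_current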
Decentralized Cooperative Planning for Automated Vehicles with Continuous Monte Carlo Tree Search
Urban traffic scenarios often require a high degree of cooperation between
traffic participants to ensure safety and efficiency. By observing the behavior
of others, humans infer whether they are cooperating. This work aims to
extend the capabilities of automated vehicles, enabling them to cooperate
implicitly in heterogeneous environments. Continuous actions allow for
arbitrary trajectories and hence are applicable to a much wider class of
problems than existing cooperative approaches with discrete action spaces.
Based on cooperative modeling of other agents, Monte Carlo Tree Search (MCTS)
in conjunction with Decoupled-UCT evaluates the action-values of each agent in
a cooperative and decentralized way, respecting the interdependence of actions
among traffic participants. The extension to continuous action spaces is
addressed by incorporating novel MCTS-specific enhancements for efficient
search space exploration. The proposed algorithm is evaluated under different
scenarios, showing that the algorithm is able to achieve effective cooperative
planning and generate solutions that egocentric planning fails to identify.
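Decoupled-UCT, as named in the abstract, keeps separate statistics per agent and lets each agent run its own bandit selection; the joint action is just the tuple of individual picks. A minimal discrete-action sketch follows; the data layout and exploration constant are assumptions, and the paper's continuous-action enhancements are omitted:

    import math

    def decoupled_select(node, c=1.4):
        # node.agents: one dict per agent mapping action -> (visits, mean_value)
        joint = []
        for agent_stats in node.agents:
            total = sum(n for n, _ in agent_stats.values())

            def ucb(item):
                _, (n, q) = item
                if n == 0:
                    return float("inf")  # try unvisited actions first
                return q + c * math.sqrt(math.log(total) / n)

            best_action, _ = max(agent_stats.items(), key=ucb)
            joint.append(best_action)
        return tuple(joint)  # joint action, one entry per agent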
Learning Policies from Self-Play with Policy Gradients and MCTS Value Estimates
In recent years, state-of-the-art game-playing agents often involve policies
that are trained in self-play processes where Monte Carlo tree search (MCTS)
algorithms and trained policies iteratively improve each other. The strongest
results have been obtained when policies are trained to mimic the search
behaviour of MCTS by minimising a cross-entropy loss. Because MCTS, by design,
includes an element of exploration, policies trained in this manner are also
likely to exhibit a similar extent of exploration. In this paper, we are
interested in learning policies for a project whose future goals include
extracting interpretable strategies, rather than achieving state-of-the-art
game-playing performance. For these goals, we argue that such an extent of
exploration is undesirable, and we propose a novel objective function for
training policies that are not exploratory. We derive a policy gradient
expression for maximising this objective function, which can be estimated using
MCTS value estimates, rather than MCTS visit counts. We empirically evaluate
various properties of the resulting policies in a variety of board games. (Comment: accepted at the IEEE Conference on Games (CoG) 2019.)
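The abstract's objective can be made concrete with a small loss function: rather than minimising cross-entropy against MCTS visit counts, the policy's expected MCTS value estimate at the root is maximised directly. A hedged PyTorch sketch, where the tensor shapes and the detach placement are assumptions (see the paper for the formal derivation):

    import torch

    def value_weighted_policy_loss(logits, mcts_q):
        # logits: (num_actions,) raw policy outputs for one root state
        # mcts_q: (num_actions,) value estimates read off the MCTS root
        probs = torch.softmax(logits, dim=-1)
        # maximise E_{a ~ pi}[Q_hat(s, a)]; minimising the negation yields
        # the gradient -sum_a Q_hat(s, a) * grad pi(a | s)
        return -(probs * mcts_q.detach()).sum()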
Exploring Adaptive MCTS with TD Learning in miniXCOM
In recent years, Monte Carlo tree search (MCTS) has achieved widespread
adoption within the game community. Its use in conjunction with deep
reinforcement learning has produced success stories in many applications. While
these approaches have been implemented in various games, from simple board
games to more complicated video games such as StarCraft, the use of deep neural
networks requires a substantial training period. In this work, we explore
online adaptivity in MCTS without requiring pre-training. We present MCTS-TD,
an adaptive MCTS algorithm improved with temporal difference learning. We
demonstrate our new approach on the game miniXCOM, a simplified version of
XCOM, a popular commercial franchise consisting of several turn-based tactical
games, and show how adaptivity in MCTS-TD allows for improved performance
against opponents. (Comment: 7 pages.)
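One simple way to realise the adaptivity the abstract describes, offered here purely as an illustration (the tabular value representation and hyperparameters are assumptions, not the paper's specification), is to maintain a TD(0)-learned value table that MCTS consults at leaf evaluation instead of running a full random rollout:

    from collections import defaultdict

    class TDEvaluator:
        def __init__(self, alpha=0.1, gamma=0.99):
            self.v = defaultdict(float)  # state -> learned value estimate
            self.alpha, self.gamma = alpha, gamma

        def update(self, state, reward, next_state, terminal):
            # one-step TD(0) update from an observed transition; can run
            # online, between searches, with no pre-training phase
            target = reward + (0.0 if terminal else self.gamma * self.v[next_state])
            self.v[state] += self.alpha * (target - self.v[state])

        def evaluate(self, state):
            # consulted by MCTS in place of (or blended with) a rollout
            return self.v[state]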
Action Guidance with MCTS for Deep Reinforcement Learning
Deep reinforcement learning has achieved great successes in recent years;
however, one main challenge is sample inefficiency. In this paper, we focus
on how to use action guidance by means of a non-expert demonstrator to improve
sample efficiency in a domain with sparse, delayed, and possibly deceptive
rewards: the recently-proposed multi-agent benchmark of Pommerman. We propose a
new framework where even a non-expert simulated demonstrator, e.g., planning
algorithms such as Monte Carlo tree search with a small number of rollouts, can be
integrated within asynchronous distributed deep reinforcement learning methods.
Compared to a vanilla deep RL algorithm, our proposed methods both learn faster
and converge to better policies on a two-player mini version of the Pommerman
game. (Comment: AAAI Conference on Artificial Intelligence and Interactive
Digital Entertainment (AIIDE'19). arXiv admin note: substantial text overlap
with arXiv:1904.05759, arXiv:1812.0004)
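A common way to integrate such a demonstrator, sketched here under explicit assumptions (the loss weighting and head shapes are illustrative, not the paper's exact formulation), is to add an auxiliary imitation term toward the MCTS demonstrator's actions on top of a standard actor-critic loss:

    import torch
    import torch.nn.functional as F

    def guided_actor_critic_loss(logits, values, actions, returns,
                                 demo_actions, beta=0.5):
        # logits: (T, A) policy head, values: (T,) value head
        # actions/returns: sampled actions and empirical returns
        # demo_actions: (T,) actions picked by the shallow-MCTS demonstrator
        advantages = returns - values.detach()
        log_probs = F.log_softmax(logits, dim=-1)
        taken = log_probs[torch.arange(len(actions)), actions]
        pg_loss = -(taken * advantages).mean()
        value_loss = F.mse_loss(values, returns)
        # auxiliary planner-imitation term: cross-entropy toward MCTS actions
        imitation = F.cross_entropy(logits, demo_actions)
        return pg_loss + 0.5 * value_loss + beta * imitation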