139 research outputs found
Biasing MCTS with Features for General Games
This paper proposes using a linear function approximator, rather than a deep neural network (DNN), to bias a Monte Carlo tree search (MCTS) player for general games. This is unlikely to match the potential raw playing strength of DNNs, but has advantages in generality, interpretability, and the resources (time and hardware) required for training. Features describing local patterns are used as inputs. The features are formulated in such a way that they are easily interpretable, applicable to a wide range of general games, and may encode simple local strategies. We gradually create new features during the same self-play training process used to learn the feature weights. We evaluate the playing strength of an MCTS player biased by learnt features against a standard upper confidence bounds for trees (UCT) player in multiple different board games, and demonstrate significantly improved playing strength in the majority of them after a small number of self-play training games.
Comment: Accepted at IEEE CEC 2019, Special Session on Games. Copyright of final version held by IEEE.
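A minimal sketch of the core idea, assuming a PUCT-style selection rule, binary local-pattern features, and a softmax over the linear scores (the node fields and mixing scheme below are illustrative assumptions, not the paper's exact formulation):

    import math

    def softmax(xs):
        m = max(xs)
        es = [math.exp(x - m) for x in xs]
        s = sum(es)
        return [e / s for e in es]

    def linear_score(features, weights):
        # Linear function approximator standing in for a DNN policy head:
        # a dot product of learnt weights with binary pattern features.
        return sum(w * f for w, f in zip(weights, features))

    def select_child(node, weights, c_puct=2.0):
        # PUCT-style selection biased by the linear evaluator's output.
        priors = softmax([linear_score(ch.features, weights)
                          for ch in node.children])

        def score(ch, p):
            q = ch.value_sum / ch.visits if ch.visits else 0.0
            u = c_puct * p * math.sqrt(node.visits) / (1 + ch.visits)
            return q + u

        return max(zip(node.children, priors),
                   key=lambda cp: score(*cp))[0]

Here `node` is assumed to carry `visits`, `value_sum`, `children`, and per-child `features`; the learnt weights are shared across the whole tree.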
Fast Evolutionary Adaptation for Monte Carlo Tree Search
This paper describes a new adaptive Monte Carlo Tree Search (MCTS) algorithm that uses evolution to rapidly optimise its performance. An evolutionary algorithm serves as a source of control parameters that modify the behaviour of each iteration (i.e. each simulation or roll-out) of the MCTS algorithm; in this paper we largely restrict this to modifying the behaviour of the random default policy, though it can also be applied to modify the tree policy.
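A rough sketch of such a per-iteration loop, assuming a rollout policy parameterised by a weight vector and a simple (1+1)-style evolution strategy as the source of control parameters (the mutation scheme and function names are illustrative assumptions):

    import random

    def mutate(weights, sigma=0.1):
        # The EA proposes new rollout-policy parameters by Gaussian perturbation.
        return [w + random.gauss(0.0, sigma) for w in weights]

    def evolve_rollout_policy(root, simulate, weights, iterations=1000):
        # `simulate` runs one MCTS iteration whose random default policy is
        # replaced by a policy parameterised by the candidate weights, and
        # returns the simulation result from the root player's perspective.
        best, best_fit = weights, float("-inf")
        for _ in range(iterations):
            candidate = mutate(best)
            # The simulation result doubles as the candidate's fitness,
            # so adaptation happens online, inside the search itself.
            fit = simulate(root, candidate)
            if fit > best_fit:
                best, best_fit = candidate, fit
        return best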
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
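For reference, the core selection rule the surveyed algorithm family builds on is UCB1 applied at every tree node (UCT); a minimal sketch, with the exploration constant left as a tunable parameter:

    import math

    def uct_value(child_value_sum, child_visits, parent_visits, c=1.414):
        # UCB1: mean reward plus an exploration bonus that shrinks as the
        # child is visited more often relative to its parent.
        if child_visits == 0:
            return float("inf")  # unvisited children are tried first
        mean = child_value_sum / child_visits
        return mean + c * math.sqrt(math.log(parent_visits) / child_visits)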
Warm-Start AlphaZero Self-Play Search Enhancements
Recently, AlphaZero has achieved landmark results in deep reinforcement learning by providing a single self-play architecture that learned three different games at superhuman level. AlphaZero is a large and complicated system with many parameters, and success requires much compute power and fine-tuning. Reproducing results in other games is a challenge, and many researchers are looking for ways to improve results while reducing computational demands. AlphaZero's design is purely based on self-play and makes no use of labeled expert data or domain-specific enhancements; it is designed to learn from scratch. We propose a novel approach to deal with this cold-start problem by employing simple search enhancements at the beginning phase of self-play training, namely Rollout, Rapid Action Value Estimate (RAVE), dynamically weighted combinations of these with the neural network, and Rolling Horizon Evolutionary Algorithms (RHEA). Our experiments indicate that most of these enhancements improve the performance of their baseline player in three different (small) board games, with the RAVE-based variants in particular playing strongly.
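A minimal sketch of the warm-start idea, assuming a linear decay from the classic estimators (Rollout/RAVE) toward the network's value as training progresses; the schedule and names are illustrative assumptions, not the paper's exact weighting:

    def warm_start_value(enhancement_value, network_value, iteration, warmup=50):
        # Early in self-play training the untrained network is unreliable,
        # so the classic estimator carries most of the weight; the mix
        # shifts linearly toward the network over the first `warmup`
        # training iterations.
        w = max(0.0, 1.0 - iteration / warmup)
        return w * enhancement_value + (1.0 - w) * network_value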
The 2016 Two-Player GVGAI Competition
This paper showcases the setting and results of the first Two-Player General Video Game AI competition, which ran in 2016 at the IEEE World Congress on Computational Intelligence and the IEEE Conference on Computational Intelligence and Games. The challenges for the general game AI agents are expanded in this track from the single-player version, looking at direct player interaction in both competitive and cooperative environments of various types and degrees of difficulty. The focus is on the agents not only handling multiple problems, but also having to account for another intelligent entity in the game, which is expected to work towards its own goal (winning the game) and will possibly interact with the first agent in a more engaging way than the environment or any non-player character would. The top competition entries are analyzed in detail and the performance of all agents is compared across the four sets of games. The results validate the competition system in assessing generality, as well as showing Monte Carlo Tree Search continuing to dominate by winning the overall Championship. However, this approach is closely followed by Rolling Horizon Evolutionary Algorithms, employed by the winner of the second leg of the contest.
On Monte Carlo Tree Search and Reinforcement Learning
Fuelled by successes in Computer Go, Monte Carlo tree search (MCTS) has achieved widespread adoption within the games community. Its links to traditional reinforcement learning (RL) methods have been outlined in the past; however, the use of RL techniques within tree search has not yet been thoroughly studied. In this paper we re-examine in depth this close relation between the two fields; our goal is to improve the cross-awareness between the two communities. We show that a straightforward adaptation of RL semantics within tree search can lead to a wealth of new algorithms, of which traditional MCTS is only one variant. We confirm that planning methods inspired by RL, in conjunction with online search, demonstrate encouraging results on several classic board games and in arcade video game competitions, where our algorithm recently ranked first. Our study promotes a unified view of learning, planning, and search.
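One concrete instance of importing RL semantics into tree search is replacing the Monte Carlo averaging backup with a bootstrapped, TD(lambda)-style update along the visited path; a minimal sketch, assuming node objects with `visits` and `value` fields (the step-size schedule is an illustrative assumption):

    def td_backup(path, reward, gamma=1.0, lam=0.9):
        # `path` lists the nodes visited this simulation, root first.
        # The return is propagated from the leaf back towards the root,
        # partially bootstrapping from each node's current value estimate
        # instead of using the raw outcome alone (lam=1, gamma=1 recovers
        # the plain Monte Carlo backup of standard MCTS).
        g = reward
        for node in reversed(path):
            node.visits += 1
            alpha = 1.0 / node.visits            # MC-style step size
            node.value += alpha * (g - node.value)
            # Interpolate between the bootstrapped estimate and the return.
            g = (1.0 - lam) * node.value + lam * gamma * g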
Action Guidance with MCTS for Deep Reinforcement Learning
Deep reinforcement learning has achieved great successes in recent years; however, one main challenge is sample inefficiency. In this paper, we focus on how to use action guidance by means of a non-expert demonstrator to improve sample efficiency in a domain with sparse, delayed, and possibly deceptive rewards: the recently proposed multi-agent benchmark of Pommerman. We propose a new framework in which even a non-expert simulated demonstrator, e.g., a planning algorithm such as Monte Carlo tree search with a small number of rollouts, can be integrated within asynchronous distributed deep reinforcement learning methods. Compared to a vanilla deep RL algorithm, our proposed methods both learn faster and converge to better policies on a two-player mini version of the Pommerman game.
Comment: AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE'19). arXiv admin note: substantial text overlap with arXiv:1904.05759, arXiv:1812.0004
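A minimal sketch of one way such guidance can enter the learner's update, assuming an auxiliary cross-entropy term toward the demonstrator's action added to the policy-gradient loss (the weighting and names are illustrative assumptions, not the paper's exact framework):

    import math

    def guided_policy_loss(action_probs, taken_action, advantage,
                           demonstrator_action, aux_weight=0.5):
        # Standard policy-gradient term for the action the agent itself took.
        pg_loss = -math.log(action_probs[taken_action]) * advantage
        # Auxiliary imitation term: cross-entropy toward the action chosen
        # by the (possibly non-expert) shallow-MCTS demonstrator in the
        # same state.
        imitation_loss = -math.log(action_probs[demonstrator_action])
        return pg_loss + aux_weight * imitation_loss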