Beyond Monte Carlo Tree Search: Playing Go with Deep Alternative Neural Network and Long-Term Evaluation
Monte Carlo tree search (MCTS), which determines each action through an
enormous number of simulations in a broad and deep search tree, is extremely
popular in computer Go. However, human experts select most actions by pattern
analysis and careful evaluation rather than by brute-force search of millions
of future interactions. In this
paper, we propose a computer Go system that follows experts' way of thinking and
playing. Our system consists of two parts. The first part is a novel deep
alternative neural network (DANN) used to generate candidates for the next move.
Compared with the existing deep convolutional neural network (DCNN), DANN inserts
a recurrent layer after each convolutional layer and stacks them in an
alternating manner. We show that this setting preserves more context of local
features and their evolution, which is beneficial for move prediction. The
second part is a long-term evaluation (LTE) module used to provide a reliable
evaluation of candidates rather than a single probability from move predictor.
This is consistent with human experts' nature of playing since they can foresee
tens of steps to give an accurate estimation of candidates. In our system, for
each candidate, LTE calculates a cumulative reward after several future
interactions when local variations are settled. Combining criteria from the two
parts, our system determines the optimal choice of next move. For more
comprehensive experiments, we introduce a new professional Go dataset (PGD),
consisting of 253233 professional records. Experiments on GoGoD and PGD
datasets show the DANN can substantially improve performance of move prediction
over pure DCNN. When combined with LTE, our system outperforms most relevant
approaches and open engines based on MCTS. Comment: AAAI 201
A hybridisation technique for game playing using the upper confidence for trees algorithm with artificial neural networks
In the domain of strategic game playing, the use of statistical techniques such as the Upper Confidence for Trees (UCT) algorithm has become the norm, as they offer many benefits over classical algorithms. These benefits include requiring no game-specific strategic knowledge and time-scalable performance. UCT does not incorporate any strategic information specific to the game considered, but instead uses repeated sampling to effectively brute-force search through the game tree or search space. The lack of game-specific knowledge in UCT is thus both a benefit and a strategic disadvantage. Pattern recognition techniques, specifically Neural Networks (NN), were identified as a means of addressing the lack of game-specific knowledge in UCT. Through a novel hybridisation technique which combines UCT and trained NNs for pruning, the UCT-NN algorithm was derived. The NN component of UCT-NN was trained using a UCT self-play scheme to generate game-specific knowledge without the need to construct and manage game databases for training purposes. The UCT-NN algorithm is outlined for pruning in the game of Go-Moku as a candidate case-study for this research. The UCT-NN algorithm contained three major parameters which emerged from the UCT algorithm, the use of NNs and the pruning schemes considered. Suitable methods for finding candidate values for these three parameters were outlined and applied to the game of Go-Moku on a 5 by 5 board. An empirical investigation of the playing performance of UCT-NN was conducted in comparison to UCT through three benchmarks. The benchmarks comprise a common randomly moving opponent, a common UCTmax player which is given a large amount of playing time, and a pair-wise tournament between UCT-NN and UCT. The results of the performance evaluation for 5 by 5 Go-Moku were promising, which prompted an evaluation of a larger 9 by 9 Go-Moku board.
The results of both evaluations indicate that the time allocated to the UCT-NN algorithm directly affects its performance when compared to UCT. The UCT-NN algorithm generally performs better than UCT in games with very limited time constraints in all benchmarks considered except when playing against a randomly moving player in 9 by 9 Go-Moku. In real-time and near-real-time Go-Moku games, UCT-NN provides statistically significant improvements compared to UCT. The findings of this research contribute to the realisation of applying game-specific knowledge to the UCT algorithm.
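The abstract above does not state the selection formula used by UCT. As a minimal sketch, assuming the commonly used UCB1 rule (mean reward plus an exploration bonus), child selection could look like the following. The function names, the `(total_reward, visits)` layout and the default constant `c` are illustrative assumptions, and the neural-network pruning step of UCT-NN is not modelled here.

```python
import math


def ucb1_score(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 value of one child: mean reward plus an exploration bonus."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    mean = total_reward / visits
    return mean + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    `children` is a list of (total_reward, visits) pairs; this layout is
    a hypothetical choice made for the sketch.
    """
    parent_visits = sum(visits for _, visits in children)
    scores = [ucb1_score(r, v, parent_visits, c) for r, v in children]
    return max(range(len(children)), key=scores.__getitem__)
```

For example, `select_child([(5, 10), (0, 0)])` picks the second child because it has not been visited yet, while two children with equal visit counts are ranked by their mean reward alone.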
A Dynamical Systems Approach for Static Evaluation in Go
This paper argues that the concept of static evaluation has the
potential to be a useful extension to Monte Carlo tree search. A new concept of
modeling static evaluation through a dynamical system is introduced and
strengths and weaknesses are discussed. The general suitability of this
approach is demonstrated. Comment: IEEE Transactions on Computational Intelligence and AI in Games, vol
3 (2011), no
The effect of simulation bias on action selection in Monte Carlo Tree Search
A dissertation submitted to the Faculty of Science, University of the Witwatersrand,
in fulfilment of the requirements for the degree of Master of Science. August 2016.
Monte Carlo Tree Search (MCTS) is a family of directed search algorithms that has gained widespread
attention in recent years. It combines a traditional tree-search approach with Monte Carlo
simulations, using the outcome of these simulations (also known as playouts or rollouts) to evaluate
states in a look-ahead tree. That MCTS does not require an evaluation function makes it particularly
well-suited to the game of Go — seen by many to be chess’s successor as a grand challenge of
artificial intelligence — with MCTS-based agents recently able to achieve expert-level play on
19×19 boards. Furthermore, its domain-independent nature also makes it a focus in a variety of
other fields, such as Bayesian reinforcement learning and general game-playing.
Despite the vast amount of research into MCTS, the dynamics of the algorithm are not
yet fully understood. In particular, the effect of using knowledge-heavy or biased simulations in
MCTS remains unknown, with interesting results indicating that better-informed rollouts do
not necessarily result in stronger agents. This research provides support for the notion that MCTS
is well-suited to a class of domains possessing a smoothness property. In these domains, biased
rollouts are more likely to produce strong agents. Conversely, any error due to incorrect bias
is compounded in non-smooth domains, and in particular for low-variance simulations. This is
demonstrated empirically in a number of single-agent domains. LG201
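To make the notion of evaluating a state by simulation concrete, here is a minimal sketch of a rollout-based value estimate in a toy single-agent domain. The chain domain, the function names and the policies are all assumptions made for illustration; they are not the domains or the method studied in the dissertation. The estimate is the average outcome of playouts, so it depends directly on the rollout policy supplied.

```python
import random

GOAL = 5  # hypothetical chain: states 0..GOAL, both ends are terminal


def step(state, action):
    """Move one cell left (-1) or right (+1) along the chain."""
    return state + action


def is_terminal(state):
    return state <= 0 or state >= GOAL


def reward(state):
    """Outcome of a finished playout: 1 at the goal end, 0 otherwise."""
    return 1.0 if state >= GOAL else 0.0


def uniform_policy(state, rng):
    """Uninformed rollout policy: left or right with equal probability."""
    return rng.choice((-1, 1))


def rollout_value(state, policy, n_rollouts=100, seed=0):
    """Estimate the value of `state` as the mean outcome of playouts."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rollouts):
        s = state
        while not is_terminal(s):
            s = step(s, policy(s, rng))
        total += reward(s)
    return total / n_rollouts
```

A policy that always moves right, `lambda s, rng: 1`, yields an estimate of exactly 1.0 from any interior state, while the uniform policy yields an estimate strictly between the two extremes; swapping the policy is the only change needed to compare differently biased rollouts.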
Enhancing automated red teaming with Monte Carlo Tree Search
This study has investigated novel Automated Red Teaming methods that support replanning. Traditional Automated Red Teaming (ART) approaches usually use evolutionary computing methods for evolving plans using simulations. A drawback of this method is the inability to change a team’s strategy part way through a simulation. This study focussed on a Monte-Carlo Tree Search (MCTS) method in an ART environment that supports re-planning to lead to better strategy decisions and a higher average score
Solving the Physical Traveling Salesman Problem: Tree Search and Macro Actions
This paper presents a number of approaches for solving a real-time game consisting of a ship that must visit a number of waypoints scattered around a 2-D maze full of obstacles. The game, the Physical Traveling Salesman Problem (PTSP), which featured in two IEEE conference competitions during 2012, provides a good balance between long-term planning (finding the optimal sequence of waypoints to visit), and short-term planning (driving the ship in the maze). This paper focuses on the algorithm that won both PTSP competitions: it takes advantage of the physics of the game to calculate the optimal order of waypoints, and it employs Monte Carlo tree search (MCTS) to drive the ship. The algorithm uses repetitions of actions (macro actions) to reduce the search space for navigation. Variations of this algorithm are presented and analyzed, in order to understand the strength of each one of its constituents and to comprehend what makes such an approach the best controller found so far for the PTSP. © 2009-2012 IEEE
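The effect of macro actions on the search space can be illustrated with a short sketch. It assumes a macro action is simply one primitive action held for a fixed number of game ticks; the function names and the numbers in the usage note are hypothetical and are not taken from the paper.

```python
def expand_macro_actions(macro_plan, repeat):
    """Turn a plan of macro actions into primitive actions by repetition."""
    return [action for action in macro_plan for _ in range(repeat)]


def sequence_count(num_actions, horizon, repeat=1):
    """Count distinct action sequences covering `horizon` game ticks
    when each chosen action is held for `repeat` ticks."""
    decisions = -(-horizon // repeat)  # ceiling division
    return num_actions ** decisions
```

With, say, 6 primitive actions and a 30-tick horizon, `sequence_count(6, 30)` is 6 to the power 30, whereas holding each action for 15 ticks leaves only `sequence_count(6, 30, repeat=15)`, that is 36, sequences to search. The price is coarser control, since the ship cannot change its action inside a macro action.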
Semi-Analytical Solution of the Theory on an Lattice
Investigating the cutoff dependence of the Higgs mass triviality bound, the
theory is formulated on an lattice which preserves Lorentz
invariance to a higher degree than the commonly used hypercubic lattice. I
solve this model non-perturbatively by evaluating the high temperature
expansion through 13th order following the approach of Lüscher and Weisz. The
results are continued across the transition line into the broken phase by
integrating the perturbative RG equations. In the broken phase, the
renormalized coupling never exceeds 2/3 of the tree level unitarity bound when
. The results confirm recent Monte Carlo data and I obtain
as an upper bound for the Higgs mass at . Comment: 36 pages, CU-TP-603, Latex, 3 PS figures included, uuencoded
compressed shar file