13,843 research outputs found

    Beyond Monte Carlo Tree Search: Playing Go with Deep Alternative Neural Network and Long-Term Evaluation

    Full text link
    Monte Carlo tree search (MCTS) is extremely popular in computer Go which determines each action by enormous simulations in a broad and deep search tree. However, human experts select most actions by pattern analysis and careful evaluation rather than brute search of millions of future nteractions. In this paper, we propose a computer Go system that follows experts way of thinking and playing. Our system consists of two parts. The first part is a novel deep alternative neural network (DANN) used to generate candidates of next move. Compared with existing deep convolutional neural network (DCNN), DANN inserts recurrent layer after each convolutional layer and stacks them in an alternative manner. We show such setting can preserve more contexts of local features and its evolutions which are beneficial for move prediction. The second part is a long-term evaluation (LTE) module used to provide a reliable evaluation of candidates rather than a single probability from move predictor. This is consistent with human experts nature of playing since they can foresee tens of steps to give an accurate estimation of candidates. In our system, for each candidate, LTE calculates a cumulative reward after several future interactions when local variations are settled. Combining criteria from the two parts, our system determines the optimal choice of next move. For more comprehensive experiments, we introduce a new professional Go dataset (PGD), consisting of 253233 professional records. Experiments on GoGoD and PGD datasets show the DANN can substantially improve performance of move prediction over pure DCNN. When combining LTE, our system outperforms most relevant approaches and open engines based on MCTS.Comment: AAAI 201

    A hybridisation technique for game playing using the upper confidence for trees algorithm with artificial neural networks

    Get PDF
    In the domain of strategic game playing, the use of statistical techniques such as the Upper Confidence for Trees (UCT) algorithm, has become the norm as they offer many benefits over classical algorithms. These benefits include requiring no game-specific strategic knowledge and time-scalable performance. UCT does not incorporate any strategic information specific to the game considered, but instead uses repeated sampling to effectively brute-force search through the game tree or search space. The lack of game-specific knowledge in UCT is thus both a benefit but also a strategic disadvantage. Pattern recognition techniques, specifically Neural Networks (NN), were identified as a means of addressing the lack of game-specific knowledge in UCT. Through a novel hybridisation technique which combines UCT and trained NNs for pruning, the UCTNN algorithm was derived. The NN component of UCT-NN was trained using a UCT self-play scheme to generate game-specific knowledge without the need to construct and manage game databases for training purposes. The UCT-NN algorithm is outlined for pruning in the game of Go-Moku as a candidate case-study for this research. The UCT-NN algorithm contained three major parameters which emerged from the UCT algorithm, the use of NNs and the pruning schemes considered. Suitable methods for finding candidate values for these three parameters were outlined and applied to the game of Go-Moku on a 5 by 5 board. An empirical investigation of the playing performance of UCT-NN was conducted in comparison to UCT through three benchmarks. The benchmarks comprise a common randomly moving opponent, a common UCTmax player which is given a large amount of playing time, and a pair-wise tournament between UCT-NN and UCT. The results of the performance evaluation for 5 by 5 Go-Moku were promising, which prompted an evaluation of a larger 9 by 9 Go-Moku board. The results of both evaluations indicate that the time allocated to the UCT-NN algorithm directly affects its performance when compared to UCT. The UCT-NN algorithm generally performs better than UCT in games with very limited time-constraints in all benchmarks considered except when playing against a randomly moving player in 9 by 9 Go-Moku. In real-time and near-real-time Go-Moku games, UCT-NN provides statistically significant improvements compared to UCT. The findings of this research contribute to the realisation of applying game-specific knowledge to the UCT algorithm

    A Dynamical Systems Approach for Static Evaluation in Go

    Full text link
    In the paper arguments are given why the concept of static evaluation has the potential to be a useful extension to Monte Carlo tree search. A new concept of modeling static evaluation through a dynamical system is introduced and strengths and weaknesses are discussed. The general suitability of this approach is demonstrated.Comment: IEEE Transactions on Computational Intelligence and AI in Games, vol 3 (2011), no

    The effect of simulation bias on action selection in Monte Carlo Tree Search

    Get PDF
    A dissertation submitted to the Faculty of Science, University of the Witwatersrand, in fulfilment of the requirements for the degree of Master of Science. August 2016.Monte Carlo Tree Search (MCTS) is a family of directed search algorithms that has gained widespread attention in recent years. It combines a traditional tree-search approach with Monte Carlo simulations, using the outcome of these simulations (also known as playouts or rollouts) to evaluate states in a look-ahead tree. That MCTS does not require an evaluation function makes it particularly well-suited to the game of Go — seen by many to be chess’s successor as a grand challenge of artificial intelligence — with MCTS-based agents recently able to achieve expert-level play on 19×19 boards. Furthermore, its domain-independent nature also makes it a focus in a variety of other fields, such as Bayesian reinforcement learning and general game-playing. Despite the vast amount of research into MCTS, the dynamics of the algorithm are still not yet fully understood. In particular, the effect of using knowledge-heavy or biased simulations in MCTS still remains unknown, with interesting results indicating that better-informed rollouts do not necessarily result in stronger agents. This research provides support for the notion that MCTS is well-suited to a class of domain possessing a smoothness property. In these domains, biased rollouts are more likely to produce strong agents. Conversely, any error due to incorrect bias is compounded in non-smooth domains, and in particular for low-variance simulations. This is demonstrated empirically in a number of single-agent domains.LG201

    Enhancing automated red teaming with Monte Carlo Tree Search

    Get PDF
    This study has investigated novel Automated Red Teaming methods that support replanning. Traditional Automated Red Teaming (ART) approaches usually use evolutionary computing methods for evolving plans using simulations. A drawback of this method is the inability to change a team’s strategy part way through a simulation. This study focussed on a Monte-Carlo Tree Search (MCTS) method in an ART environment that supports re-planning to lead to better strategy decisions and a higher average scor

    Solving the Physical Traveling Salesman Problem: Tree Search and Macro Actions

    Get PDF
    This paper presents a number of approaches for solving a real-time game consisting of a ship that must visit a number of waypoints scattered around a 2-D maze full of obstacles. The game, the Physical Traveling Salesman Problem (PTSP), which featured in two IEEE conference competitions during 2012, provides a good balance between long-term planning (finding the optimal sequence of waypoints to visit), and short-term planning (driving the ship in the maze). This paper focuses on the algorithm that won both PTSP competitions: it takes advantage of the physics of the game to calculate the optimal order of waypoints, and it employs Monte Carlo tree search (MCTS) to drive the ship. The algorithm uses repetitions of actions (macro actions) to reduce the search space for navigation. Variations of this algorithm are presented and analyzed, in order to understand the strength of each one of its constituents and to comprehend what makes such an approach the best controller found so far for the PTSP. © 2009-2012 IEEE

    Semi-Analytical Solution of the Ï•4\phi^4 Theory on an F4F_4 Lattice

    Full text link
    Investigating the cutoff dependence of the Higgs mass triviality bound, the ϕ4\phi^4 theory is formulated on an F4F_4 lattice which preserves Lorentz invariance to a higher degree than the commonly used hypercubic lattice. I solve this model non-perturbatively by evaluating the high temperature expansion through 13th order following the approach of L\"uscher and Weisz. The results are continued across the transition line into the broken phase by integrating the perturbative RG equations. In the broken phase, the renormalized coupling never exceeds 2/3 of the tree level unitarity bound when Λ/mR≥2\Lambda/m_R \geq 2. The results confirm recent Monte Carlo data and I obtain as an upper bound for the Higgs mass mR/fπ≤2.46±0.02HTE±0.08PTm_R/f_\pi \leq 2.46 \pm 0.02_{\rm HTE} \pm 0.08_{\rm PT} at Λ/mR=2\Lambda/m_R=2.Comment: 36 pages, CU-TP-603, Latex, 3 PS figures included, uuencoded compressed shar fil
    • …
    corecore