30 research outputs found

    Distributed Nested Rollout Policy for Same Game

    Nested Rollout Policy Adaptation (NRPA) is a Monte Carlo search heuristic for puzzles and other optimization problems. It achieves state-of-the-art performance on several games, including SameGame. In this paper, we design several parallel and distributed NRPA-based search techniques, and we provide a number of experimental insights about their execution. Finally, we use our best implementation to discover improved scores for 15 of the 20 standard SameGame boards.
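
    To make the nesting-plus-adaptation structure concrete, here is a minimal single-threaded NRPA sketch (not the paper's parallel or distributed variants). It assumes a SameGame-like `State` class with clone(), terminal(), legal_moves(), play(), score(), and a hashable code(move) key for (state, move) pairs; all of these names are illustrative.

```python
import math
import random

# Minimal NRPA sketch (single-threaded). `State` is an assumed interface:
# clone(), terminal(), legal_moves(), play(move), score(), code(move).

def playout(state, policy):
    """One rollout: sample each move with softmax weights from `policy`."""
    sequence = []
    while not state.terminal():
        moves = state.legal_moves()
        weights = [math.exp(policy.get(state.code(m), 0.0)) for m in moves]
        move = random.choices(moves, weights=weights)[0]
        sequence.append(move)
        state.play(move)
    return state.score(), sequence

def adapt(policy, root, sequence, alpha=1.0):
    """Shift the policy toward the best sequence found so far."""
    state = root.clone()
    for best in sequence:
        moves = state.legal_moves()
        z = sum(math.exp(policy.get(state.code(m), 0.0)) for m in moves)
        for m in moves:  # subtract the softmax probability of each move
            g = math.exp(policy.get(state.code(m), 0.0)) / z
            policy[state.code(m)] = policy.get(state.code(m), 0.0) - alpha * g
        policy[state.code(best)] = policy.get(state.code(best), 0.0) + alpha
        state.play(best)

def nrpa(level, root, policy, iterations=100):
    """Each level runs `iterations` searches of the level below and adapts."""
    if level == 0:
        return playout(root.clone(), policy)
    best_score, best_seq = float("-inf"), []
    pol = dict(policy)  # each level adapts its own copy of the policy
    for _ in range(iterations):
        score, seq = nrpa(level - 1, root, pol, iterations)
        if score > best_score:
            best_score, best_seq = score, seq
        adapt(pol, root, best_seq)
    return best_score, best_seq
```

    Parallel variants distribute the iterations of a level across workers; how playouts and policy updates are then shared is the kind of design space the paper explores.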

    Investigating the Limits of Monte-Carlo Tree Search Methods in Computer Go

    Monte-Carlo Tree Search for the Physical Travelling Salesman Problem

    The significant success of MCTS in recent years, particularly in the game of Go, has led to the application of MCTS to numerous other domains. In an ongoing effort to better understand the performance of MCTS in open-ended real-time video games, we apply MCTS to the Physical Travelling Salesman Problem (PTSP). We discuss different approaches to tailor MCTS to this particular problem domain and subsequently identify and attempt to overcome some of the apparent shortcomings. Results show that suitable heuristics can boost the performance of MCTS significantly in this domain. However, visualisations of the search indicate that MCTS currently seeks solutions in a rather greedy manner, and coercing it to balance short-term and long-term constraints for the PTSP remains an open problem.
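
    For context, the tree-descent step of most MCTS agents, including typical PTSP players, is UCB1/UCT selection; below is a generic sketch, not the paper's agent. `visits`, `total_reward`, and `children` are assumed node attributes.

```python
import math

def ucb1_select(node, c=math.sqrt(2)):
    """Pick the child maximizing the UCB1 score: mean reward + exploration."""
    def ucb(child):
        if child.visits == 0:
            return float("inf")  # try every child at least once
        exploit = child.total_reward / child.visits
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        return exploit + explore
    return max(node.children, key=ucb)
```

    One reading of the greediness the authors observe is that the exploitation term dominates under tight real-time budgets; domain heuristics enter through the reward that updates total_reward.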

    Learning a Move-Generator for Upper Confidence Trees

    We experiment with introducing machine learning tools to improve Monte-Carlo Tree Search. More precisely, we propose using Direct Policy Search, a classical reinforcement learning paradigm, to learn the Monte-Carlo move generator. We evaluate our algorithm on different forms of the unit commitment problem, including experiments on a problem with both macro-level and micro-level decisions.
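
    As a rough sketch of the idea, the move generator can be a softmax over linear move features, with Direct Policy Search optimizing the feature weights against simulated reward. Everything below is illustrative: `evaluate(weights)` is a hypothetical function running Monte-Carlo playouts with the parameterized generator, and the (1+1)-style local search merely stands in for whichever optimizer the paper actually uses.

```python
import math
import random

def softmax_sample(weights, features_per_move):
    """Sample a move index from a softmax over linear feature scores."""
    scores = [sum(w * f for w, f in zip(weights, feats))
              for feats in features_per_move]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]  # shift for numerical stability
    return random.choices(range(len(exps)), weights=exps)[0]

def direct_policy_search(evaluate, dim, iters=200, sigma=0.1):
    """(1+1)-style search directly in policy-parameter space."""
    best = [0.0] * dim
    best_val = evaluate(best)  # hypothetical: mean reward over playouts
    for _ in range(iters):
        cand = [w + random.gauss(0.0, sigma) for w in best]
        val = evaluate(cand)
        if val >= best_val:
            best, best_val = cand, val
    return best, best_val
```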

    Challenging Established Move Ordering Strategies with Adaptive Data Structures

    The field of game playing is a particularly well-studied area within the context of AI, leading to the development of powerful techniques, such as alpha-beta search, capable of achieving competitive game play against an intelligent opponent. It is well known that tree pruning strategies, such as alpha-beta, benefit strongly from proper move ordering, that is, searching the best element first. Inspired by the formerly unrelated field of Adaptive Data Structures (ADSs), we previously introduced the History-ADS technique, which employs an adaptive list to achieve effective and dynamic move ordering in a domain-independent fashion, and found that it performs well in a wide range of cases. However, previous work did not compare the performance of the History-ADS heuristic to any established move ordering strategy. To address this, we present here a comparison to two well-known, acclaimed strategies that operate on a philosophy similar to the History-ADS: the History Heuristic and the Killer Moves technique. We find that, in a wide range of two-player and multi-player games, at various points in the game's progression, the History-ADS performs at least as well as these strategies and, in fact, outperforms them in the majority of cases.
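
    A minimal sketch of the adaptive-list idea behind History-ADS: moves that cause alpha-beta cutoffs are promoted (here with the classic move-to-front update) and searched first on later visits. The paper's exact update rule and data structure may differ.

```python
class AdaptiveMoveList:
    """Domain-independent move ordering via an adaptive list (sketch)."""

    def __init__(self):
        self.order = []  # most recently promoted moves first

    def rank(self, moves):
        """Order `moves`: previously promoted moves first, rest unchanged."""
        known = [m for m in self.order if m in moves]
        rest = [m for m in moves if m not in self.order]
        return known + rest

    def promote(self, move):
        """Move-to-front on an alpha-beta cutoff."""
        if move in self.order:
            self.order.remove(move)
        self.order.insert(0, move)
```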

    On Semeai Detection in Monte-Carlo Go

    Minimizing Simple and Cumulative Regret in Monte-Carlo Tree Search

    Regret minimization is important in both the Multi-Armed Bandit problem and Monte-Carlo Tree Search (MCTS). Recently, simple regret, i.e., the regret of not recommending the best action, has been proposed as an alternative to cumulative regret in MCTS, i.e., regret accumulated over time. Each type of regret is appropriate in different contexts. Although the majority of MCTS research applies the UCT selection policy for minimizing cumulative regret in the tree, this paper introduces a new MCTS variant, Hybrid MCTS (H-MCTS), which minimizes both types of regret in different parts of the tree. H-MCTS uses SHOT, a recursive version of Sequential Halving, to minimize simple regret near the root, and UCT to minimize cumulative regret when descending further down the tree. We discuss the motivation for this new search technique, and show the performance of H-MCTS in six distinct…
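
    A minimal sketch of Sequential Halving at a single node, the pure-exploration routine H-MCTS builds on near the root (SHOT applies it recursively down the tree). `sample(arm)` is an assumed playout call returning a reward; the budget split follows the standard halving schedule.

```python
import math

def sequential_halving(arms, sample, budget):
    """Return the empirically best arm after budget-limited halving rounds."""
    stats = {a: [0.0, 0] for a in arms}  # arm -> [total reward, pulls]
    survivors = list(arms)
    rounds = max(1, math.ceil(math.log2(len(arms))))
    while len(survivors) > 1:
        per_arm = max(1, budget // (len(survivors) * rounds))
        for a in survivors:
            for _ in range(per_arm):
                stats[a][0] += sample(a)  # assumed Monte-Carlo playout
                stats[a][1] += 1
        # keep the better-scoring half of the surviving arms
        survivors.sort(key=lambda a: stats[a][0] / stats[a][1], reverse=True)
        survivors = survivors[: max(1, len(survivors) // 2)]
    return survivors[0]
```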