
    A Survey of Monte Carlo Tree Search Methods

    Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
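    Since the abstract sketches the core loop only at a high level, the following is a minimal, generic UCT-style illustration of that loop (selection by UCB1, single-node expansion, a uniformly random rollout, and backpropagation), demonstrated on a toy take-away game. The Nim class, the exploration constant, and the iteration budget are assumptions made for this sketch, not details drawn from the survey.

        import math
        import random


        class Node:
            """One node of the search tree."""
            def __init__(self, parent=None, move=None, player=None, untried=()):
                self.parent, self.move, self.player = parent, move, player
                self.children = []
                self.untried = list(untried)   # moves not yet expanded
                self.visits = 0
                self.wins = 0.0

            def ucb1(self, c):
                # UCB1: average reward plus an exploration bonus for rarely tried children.
                return self.wins / self.visits + c * math.sqrt(
                    math.log(self.parent.visits) / self.visits)


        class Nim:
            """Toy demo game: players alternately take 1-3 stones; taking the last stone wins."""
            def __init__(self, stones=15, to_move=1):
                self.stones, self.to_move = stones, to_move

            def clone(self):
                return Nim(self.stones, self.to_move)

            def legal_moves(self):
                return list(range(1, min(3, self.stones) + 1))

            def apply(self, move):
                self.stones -= move
                self.to_move = -self.to_move

            def is_terminal(self):
                return self.stones == 0

            def result(self, player):
                # 1.0 if `player` took the last stone, else 0.0.
                return 1.0 if player == -self.to_move else 0.0


        def mcts(root_state, iterations=2000, c=1.4):
            root = Node(untried=root_state.legal_moves())
            for _ in range(iterations):
                node, state = root, root_state.clone()
                # 1. Selection: descend through fully expanded nodes by UCB1.
                while not node.untried and node.children:
                    node = max(node.children, key=lambda n: n.ucb1(c))
                    state.apply(node.move)
                # 2. Expansion: add one child for an untried move.
                if node.untried:
                    move = node.untried.pop()
                    mover = state.to_move
                    state.apply(move)
                    child = Node(parent=node, move=move, player=mover,
                                 untried=state.legal_moves())
                    node.children.append(child)
                    node = child
                # 3. Simulation: uniformly random playout to the end of the game.
                while not state.is_terminal():
                    state.apply(random.choice(state.legal_moves()))
                # 4. Backpropagation: credit every node on the path with the outcome.
                while node is not None:
                    node.visits += 1
                    if node.player is not None:
                        node.wins += state.result(node.player)
                    node = node.parent
            # Recommend the most visited move at the root.
            return max(root.children, key=lambda n: n.visits).move


        print(mcts(Nim(15)))   # the optimal opening here is to take 3, leaving a multiple of 4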

    Thinking Fast and Slow with Deep Learning and Tree Search

    Sequential decision making problems, such as structured prediction, robotic control, and game playing, require a combination of planning policies and generalisation of those plans. In this paper, we present Expert Iteration (ExIt), a novel reinforcement learning algorithm which decomposes the problem into separate planning and generalisation tasks. Planning new policies is performed by tree search, while a deep neural network generalises those plans. Subsequently, tree search is improved by using the neural network policy to guide search, increasing the strength of new plans. In contrast, standard deep Reinforcement Learning algorithms rely on a neural network not only to generalise plans, but to discover them too. We show that ExIt outperforms REINFORCE for training a neural network to play the board game Hex, and our final tree search agent, trained tabula rasa, defeats MoHex 1.0, the most recent Olympiad Champion player to be publicly released.
    Comment: v1 to v2: a value function is added to MCTS; some MCTS hyper-parameters are changed; experiments are repeated, with improved accuracy and errors shown (note the reduction in effect size for the tpt/cat experiment); results from a longer training run are reported, including changes in expert strength during training; a comparison to MoHex is added. v3: clarifies the independence of ExIt and AG0. v4: see appendix.
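    As a rough illustration of the decomposition the abstract describes (the tree-search "expert" plans, the neural-network "apprentice" generalises, and the apprentice in turn guides the next round of search), here is a schematic sketch of the outer loop. The callables sample_states, expert_search, and fit are placeholders standing in for the paper's self-play data collection, network-guided tree search, and network training step.

        # Schematic sketch of an Expert Iteration style loop; the callables are
        # placeholders, not the paper's actual components.
        def expert_iteration(policy, sample_states, expert_search, fit, rounds=10):
            """Alternate between expert planning and apprentice generalisation."""
            for _ in range(rounds):
                # 1. Self-play: collect positions using the current apprentice policy.
                states = sample_states(policy)
                # 2. Expert improvement: tree search, guided by the apprentice,
                #    produces a stronger move choice for each position.
                targets = [expert_search(s, guide=policy) for s in states]
                # 3. Imitation: the apprentice is trained to reproduce the expert's
                #    plans, which strengthens the guidance for the next round.
                policy = fit(policy, states, targets)
            return policy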

    Using Cultural Coevolution for Learning in General Game Playing

    Traditionally, the construction of game playing agents relies on using pre-programmed heuristics and architectures tailored for a specific game. General Game Playing (GGP) provides a challenging alternative to this approach, with the aim being to construct players that are able to play any game, given just the rules. This thesis describes the construction of a General Game Player that is able to learn and build knowledge about the game in a multi-agent setup using cultural coevolution and reinforcement learning. We also describe how this knowledge can be used to complement UCT search, a Monte-Carlo tree search algorithm that has already been used successfully in GGP. Experiments are conducted to test the effectiveness of the knowledge by playing several games between our player and a player using random moves, and also a player using standard UCT search. The results show a marked improvement in performance when using the knowledge.

    Random Search Algorithms

    In this project we designed and developed improvements for the random search algorithm UCT, with a focus on improving performance with directed acyclic graphs and groupings. We then performed experiments in order to quantify performance gains with both artificial game trees and computer Go. Finally, we analyzed the outcome of the experiments and presented our findings. Overall, this project represents original work in the area of random search algorithms on directed acyclic graphs and provides several opportunities for further research.
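    The abstract does not spell out its DAG handling, but one common way to let a UCT-style search operate on a directed acyclic graph is to key node statistics by a hash of the position, so that transpositions reached through different move orders share visit counts and values. The sketch below is a generic illustration under that assumption, not the project's implementation.

        import math


        class DagStats:
            """UCT statistics shared across transpositions via a state-keyed table."""
            def __init__(self):
                self.table = {}   # state_key -> (visits, total_reward)

            def uct_value(self, state_key, parent_visits, c=1.4):
                visits, total = self.table.get(state_key, (0, 0.0))
                if visits == 0:
                    return float("inf")   # try unvisited positions first
                return total / visits + c * math.sqrt(math.log(parent_visits) / visits)

            def update(self, state_key, reward):
                visits, total = self.table.get(state_key, (0, 0.0))
                self.table[state_key] = (visits + 1, total + reward)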

    Adding expert knowledge and exploration in Monte-Carlo Tree Search

    We present a new exploration term, more efficient than classical UCT-like exploration terms, that efficiently combines expert rules, patterns extracted from datasets, All-Moves-As-First values, and classical online values. As this improved bandit formula does not solve several important situations (semeais, nakade) in computer Go, we present three other important improvements which are central to the recent progress of our program MoGo: (i) an expert-based improvement of the Monte-Carlo simulations for nakade situations, of which we also emphasize some limitations; (ii) a technique which preserves diversity in the Monte-Carlo simulations and greatly improves results on 19x19 boards; (iii) a new exploration term which is highly efficient in MoGo, whereas the UCB-based exploration term is not. MoGo recently won a game with handicap 7 against a 9-dan professional player, Zhou JunXun, winner of the LG Cup 2007, and a game with handicap 6 against a 1-dan professional player, Li-Chen Chien.
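    To make the kind of blended move value the abstract refers to more concrete, the sketch below mixes online Monte-Carlo statistics, All-Moves-As-First (AMAF) statistics, and a prior from expert rules or learned patterns, with the AMAF and prior influence fading as real simulations accumulate. This is a generic RAVE/progressive-bias style interpolation for illustration only; the weights and functional form are assumptions, not MoGo's actual formula.

        def blended_value(wins, visits, amaf_wins, amaf_visits, prior,
                          amaf_equiv=1000.0, bias_weight=10.0):
            """Move value mixing online stats, AMAF stats, and an expert/pattern prior."""
            online = wins / visits if visits else 0.0
            amaf = amaf_wins / amaf_visits if amaf_visits else 0.0
            beta = amaf_equiv / (amaf_equiv + visits)      # AMAF weight fades with visits
            # Progressive-bias style term: the prior's influence decays as data accrues.
            return (1 - beta) * online + beta * amaf + bias_weight * prior / (1 + visits)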

    Adaptive Neural Network Usage in Computer Go

    For decades, computer scientists have worked to develop an artificial intelligence for the game of Go intelligent enough to beat skilled human players. In 2016, Google accomplished just that with their program, AlphaGo. AlphaGo was a huge leap forward in artificial intelligence, but required quite a lot of computational power to run. The goal of our project was to take some of the techniques that make AlphaGo so powerful and integrate them with a less resource-intensive artificial intelligence. Specifically, we expanded on the work of last year's MQP (Major Qualifying Project) of integrating a neural network into an existing Go AI, Pachi, and rigorously tested the resultant program's performance. We also used SPSA training to determine an adaptive value function so as to make the best use of the neural network.
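    The abstract mentions SPSA training of an adaptive value function; as a reminder of how simultaneous perturbation stochastic approximation works, here is a minimal single-step sketch. The objective callable, the gain schedules, and the minimisation sign are placeholders for illustration (to maximise a win rate, negate the objective); none of these details are taken from the project.

        import random


        def spsa_step(theta, objective, k, a=0.1, c=0.1, alpha=0.602, gamma=0.101):
            """One SPSA update of parameter vector `theta` to reduce `objective`."""
            a_k = a / (k + 1) ** alpha      # step-size schedule
            c_k = c / (k + 1) ** gamma      # perturbation-size schedule
            delta = [random.choice((-1.0, 1.0)) for _ in theta]   # random +/-1 directions
            plus = [t + c_k * d for t, d in zip(theta, delta)]
            minus = [t - c_k * d for t, d in zip(theta, delta)]
            # Two evaluations estimate the gradient along every coordinate at once.
            diff = objective(plus) - objective(minus)
            grad = [diff / (2.0 * c_k * d) for d in delta]
            return [t - a_k * g for t, g in zip(theta, grad)]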

    Complexity, Heuristic, and Search Analysis for the Games of Crossings and Epaminondas

    Games provide fertile research domains for algorithmic research. Often, game research helps solve real-world problems through the testing and refinement of search algorithms in game domains. Other times, game research finds limits for certain algorithms. For example, the game of Go proved intractable for the Min-Max with Alpha-Beta pruning algorithm, leading to the popularity of Monte-Carlo based search algorithms. Although effective in Go, and in game domains once ruled by Alpha-Beta such as Lines of Action, Monte-Carlo methods appear to have limits too, as they fall short in tactical domains such as Hex and Chess. In a continuation of this type of research, two new games, Crossings and Epaminondas, are presented, analyzed, and used to test two Monte-Carlo based algorithms: Upper Confidence Bounds applied to Trees (UCT) and Heuristic Guided UCT (HUCT). Results indicate that heuristic knowledge can positively affect UCT's performance in the lower complexity domain of Crossings. However, both agents perform worse in the higher complexity domain of Epaminondas. This identifies Epaminondas as another domain that poses difficulties for Monte Carlo agents.
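    As a rough illustration of how heuristic knowledge can be folded into UCT selection, the sketch below adds a heuristic bonus that decays with visit count to the standard UCB1 value. This is a generic form chosen for illustration; the thesis's HUCT formula may differ.

        import math


        def huct_value(wins, visits, parent_visits, heuristic, c=1.4, h_weight=1.0):
            """UCB1 plus a decaying heuristic guidance term."""
            if visits == 0:
                return float("inf")                     # always try unvisited children first
            exploit = wins / visits
            explore = c * math.sqrt(math.log(parent_visits) / visits)
            guidance = h_weight * heuristic / visits    # heuristic influence fades with data
            return exploit + explore + guidance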