29 research outputs found

    Learning to Search in Reinforcement Learning

    Get PDF
    In this thesis, we investigate the use of search based algorithms with deep neural networks to tackle a wide range of problems ranging from board games to video games and beyond. Drawing inspiration from AlphaGo, the first computer program to achieve superhuman performance in the game of Go, we developed a new algorithm AlphaZero. AlphaZero is a general reinforcement learning algorithm that combines deep neural networks with a Monte Carlo Tree search for planning and learning. Starting completely from scratch, without any prior human knowledge beyond the basic rules of the game, AlphaZero managed to achieve superhuman performance in Go, chess and shogi. Subsequently, building upon the success of AlphaZero, we investigated ways to extend our methods to problems in which the rules are not known or cannot be hand-coded. This line of work led to the development of MuZero, a model-based reinforcement learning agent that builds a deterministic internal model of the world and uses it to construct plans in its imagination. We applied our method to Go, chess, shogi and the classic Atari suite of video-games, achieving superhuman performance. MuZero is the first RL algorithm to master a variety of both canonical challenges for high performance planning and visually complex problems using the same principles. Finally, we describe Stochastic MuZero, a general agent that extends the applicability of MuZero to highly stochastic environments. We show that our method achieves superhuman performance in stochastic domains such as backgammon and the classic game of 2048 while matching the performance of MuZero in deterministic ones like Go

    Expert iteration

    Get PDF
    In this thesis, we study how reinforcement learning algorithms can tackle classical board games without recourse to human knowledge. Specifically, we develop a framework and algorithms which learn to play the board game Hex starting from random play. We first describe Expert Iteration (ExIt), a novel reinforcement learning framework which extends Modified Policy Iteration. ExIt explicitly decomposes the reinforcement learning problem into two parts: planning and generalisation. A planning algorithm explores possible move sequences starting from a particular position to find good strategies from that position, while a parametric function approximator is trained to predict those plans, generalising to states not yet seen. Subsequently, planning is improved by using the approximated policy to guide search, increasing the strength of new plans. This decomposition allows ExIt to combine the benefits of both planning methods and function approximation methods. We demonstrate the effectiveness of the ExIt paradigm by implementing ExIt with two different planning algorithms. First, we develop a version based on Monte Carlo Tree Search (MCTS), a search algorithm which has been successful both in specific games, such as Go, Hex and Havannah, and in general game playing competitions. We then develop a new planning algorithm, Policy Gradient Search (PGS), which uses a model-free reinforcement learning algorithm for online planning. Unlike MCTS, PGS does not require an explicit search tree. Instead PGS uses function approximation within a single search, allowing it to be applied to problems with larger branching factors. Both MCTS-ExIt and PGS-ExIt defeated MoHex 2.0 - the most recent Hex Olympiad winner to be open sourced - in 9 × 9 Hex. More importantly, whereas MoHex makes use of many Hex-specific improvements and knowledge, all our programs were trained tabula rasa using general reinforcement learning methods. This bodes well for ExIt’s applicability to both other games and real world decision making problems

    Advances in Grid Computing

    Get PDF
    This book approaches the grid computing with a perspective on the latest achievements in the field, providing an insight into the current research trends and advances, and presenting a large range of innovative research papers. The topics covered in this book include resource and data management, grid architectures and development, and grid-enabled applications. New ideas employing heuristic methods from swarm intelligence or genetic algorithm and quantum encryption are considered in order to explain two main aspects of grid computing: resource management and data management. The book addresses also some aspects of grid computing that regard architecture and development, and includes a diverse range of applications for grid computing, including possible human grid computing system, simulation of the fusion reaction, ubiquitous healthcare service provisioning and complex water systems

    The dark side of the board: advances in chess Kriegspiel

    Get PDF
    While imperfect information games are an excellent model of real-world problems and tasks, they are often difficult for computer programs to play at a high level of proficiency, especially if they involve major uncertainty and a very large state space. Kriegspiel, a variant of chess making it similar to a wargame, is a perfect example: while the game was studied for decades from a game-theoretical viewpoint, it was only very recently that the first practical algorithms for playing it began to appear. This thesis presents, documents and tests a multi-sided effort towards making a strong Kriegspiel player, using heuristic searching, retrograde analysis and Monte Carlo tree search algorithms to achieve increasingly higher levels of play. The resulting program is currently the strongest computer player in the world and plays at an above-average human level

    Selective search in games of different complexity

    Get PDF

    Monte-Carlo tree search enhancements for one-player and two-player domains

    Get PDF

    Proceedings of the 18th Irish Conference on Artificial Intelligence and Cognitive Science

    Get PDF
    These proceedings contain the papers that were accepted for publication at AICS-2007, the 18th Annual Conference on Artificial Intelligence and Cognitive Science, which was held in the Technological University Dublin; Dublin, Ireland; on the 29th to the 31st August 2007. AICS is the annual conference of the Artificial Intelligence Association of Ireland (AIAI)
    corecore