24 research outputs found
Thinking Fast and Slow with Deep Learning and Tree Search
Sequential decision making problems, such as structured prediction, robotic
control, and game playing, require a combination of planning policies and
generalisation of those plans. In this paper, we present Expert Iteration
(ExIt), a novel reinforcement learning algorithm which decomposes the problem
into separate planning and generalisation tasks. Planning new policies is
performed by tree search, while a deep neural network generalises those plans.
Subsequently, tree search is improved by using the neural network policy to
guide search, increasing the strength of new plans. In contrast, standard deep
reinforcement learning algorithms rely on a neural network not only to
generalise plans, but to discover them too. We show that ExIt outperforms
REINFORCE for training a neural network to play the board game Hex, and our
final tree search agent, trained tabula rasa, defeats MoHex 1.0, the most
recent Olympiad Champion player to be publicly released.
Comment: v1 to v2: added a value function in MCTS; changed some MCTS hyper-parameters; repeated experiments, with improved accuracy and errors shown (note the reduction in effect size for the tpt/cat experiment); results from a longer training run, including changes in expert strength during training; comparison to MoHex. v3: clarified the independence of ExIt and AG0. v4: see appendix.
Expert iteration
In this thesis, we study how reinforcement learning algorithms can tackle classical board games without recourse to human knowledge. Specifically, we develop a framework and algorithms which learn to play the board game Hex starting from random play. We first describe Expert Iteration (ExIt), a novel reinforcement learning framework which extends Modified Policy Iteration. ExIt explicitly decomposes the reinforcement learning problem into two parts: planning and generalisation. A planning algorithm explores possible move sequences starting from a particular position to find good strategies from that position, while a parametric function approximator is trained to predict those plans, generalising to states not yet seen. Subsequently, planning is improved by using the approximated policy to guide search, increasing the strength of new plans. This decomposition allows ExIt to combine the benefits of both planning methods and function approximation methods.

We demonstrate the effectiveness of the ExIt paradigm by implementing ExIt with two different planning algorithms. First, we develop a version based on Monte Carlo Tree Search (MCTS), a search algorithm which has been successful both in specific games, such as Go, Hex and Havannah, and in general game playing competitions. We then develop a new planning algorithm, Policy Gradient Search (PGS), which uses a model-free reinforcement learning algorithm for online planning. Unlike MCTS, PGS does not require an explicit search tree. Instead, PGS uses function approximation within a single search, allowing it to be applied to problems with larger branching factors.

Both MCTS-ExIt and PGS-ExIt defeated MoHex 2.0 - the most recent Hex Olympiad winner to be open-sourced - in 9 × 9 Hex. More importantly, whereas MoHex makes use of many Hex-specific improvements and knowledge, all our programs were trained tabula rasa using general reinforcement learning methods.
This bodes well for ExIt’s applicability to both other games and real-world decision-making problems.
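The planning/generalisation decomposition that both abstracts above describe can be sketched as a small training loop. The sketch below uses a deliberately toy game (walk from 0 to N in steps of 1 or 2), a one-ply lookahead as a stand-in "expert", and a lookup table as the "apprentice"; none of this is the actual ExIt code, only an illustration of the loop structure under those assumptions.

```python
# Toy environment: states 0..N; moves +1/+2; reaching exactly N wins.
# A deliberately trivial stand-in for a real game such as Hex.
N = 10

def moves(s):
    return [m for m in (1, 2) if s + m <= N]

def expert_plan(s, policy):
    # "Expert": a one-ply lookahead search, nudged by the apprentice's
    # current policy (an illustrative stand-in for MCTS / PGS guidance).
    best = None
    for m in moves(s):
        score = (1.0 if s + m == N else 0.0) + 0.1 * policy.get((s, m), 0.0)
        if best is None or score > best[0]:
            best = (score, m)
    return best[1]

def expert_iteration(n_iters=5, games=20):
    policy = {}  # "apprentice": table mapping (state, move) -> preference
    for _ in range(n_iters):
        data = []
        for _ in range(games):
            s = 0
            while s < N:
                m = expert_plan(s, policy)  # expert improves on apprentice
                data.append((s, m))         # record plan as imitation target
                s += m
        # Generalisation step: the apprentice imitates the expert's plans.
        for s, m in data:
            policy[(s, m)] = policy.get((s, m), 0.0) + 1.0
    return policy
```

In real ExIt the table is a deep network and the expert is tree search guided by the network's policy prior, but the alternation (expert generates plans, apprentice imitates, stronger apprentice yields a stronger expert) is the same.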
Logic-based AI for Interpretable Board Game Winner Prediction with Tsetlin Machine
Hex is a turn-based two-player connection game with a high branching factor,
making the game arbitrarily complex with increasing board sizes. As such,
top-performing algorithms for playing Hex rely on accurate evaluation of board
positions using neural networks. However, the limited interpretability of
neural networks is problematic when the user wants to understand the reasoning
behind the predictions made. In this paper, we propose to use propositional
logic expressions to describe winning and losing board game positions,
facilitating precise visual interpretation. We employ a Tsetlin Machine (TM) to
learn these expressions from previously played games, describing where pieces
must be located or not located for a board position to be strong. Extensive
experiments on boards compare our TM-based solution with popular
machine learning algorithms like XGBoost, InterpretML, decision trees, and
neural networks, considering various board configurations with to
moves played. On average, the TM testing accuracy is , outperforming
all the other evaluated algorithms. We further demonstrate the global
interpretation of the logical expressions and map them down to particular board
game configurations to investigate local interpretability. We believe the
resulting interpretability establishes building blocks for accurate assistive
AI and human-AI collaboration, also for more complex prediction tasks.
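The interpretable representation described above, conjunctive clauses over "piece must / must not be at this cell" literals, can be evaluated with very little machinery. This sketch shows only how such clauses are read and voted on, not how a Tsetlin Machine learns them; the board encoding and example clause are made up for illustration.

```python
def clause_fires(board, clause):
    """board: dict cell -> 'X', 'O', or absent (empty).
    clause: list of (cell, piece, want) literals; want=True means the cell
    must contain piece, want=False means it must not. A clause is the
    conjunction of its literals."""
    return all((board.get(cell) == piece) == want for cell, piece, want in clause)

def score_position(board, win_clauses, loss_clauses):
    """TM-style vote: clauses describing wins add +1 when they fire,
    clauses describing losses add -1; the sign of the sum is the
    predicted outcome for the position."""
    return (sum(clause_fires(board, c) for c in win_clauses)
            - sum(clause_fires(board, c) for c in loss_clauses))
```

Because each clause is a plain propositional expression, a fired clause can be mapped straight back onto the board (highlight the cells its literals mention), which is the local interpretability the paper exploits.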
Depth, balancing, and limits of the Elo model
Much work has been devoted to the computational complexity of games. However, such results are not necessarily relevant for estimating a game's complexity in human terms. Therefore, human-centered measures have been proposed, e.g. the depth. This paper discusses the depth of various games and extends it to a continuous measure. We provide new depth results and present tools (given-first-move, pie rule, size extension) for increasing it. We also use these measures for analyzing games and opening moves in Y, NoGo, Killall Go, and the effect of pie rules.
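Depth measures of this kind build on the standard Elo model, under which a rating gap translates into an expected score via a logistic curve (scale 400); depth is then, roughly, how many distinguishable skill classes fit between a beginner and the best player. The formula below is the textbook Elo expectation, not code from the paper.

```python
def elo_expected(r_a, r_b):
    """Expected score of player A versus player B under the standard
    Elo model: a logistic curve with scale factor 400, so a 400-point
    edge gives roughly a 10:1 expected odds ratio."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))
```

For example, equal ratings give an expectation of 0.5, and a 200-point edge gives roughly 0.76.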
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
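The balance between tree-search precision and random-sampling generality mentioned above is realised in the selection step, most commonly via the UCB1/UCT rule. A minimal sketch, with node statistics represented as plain (total_reward, visits) tuples and the parent visit count approximated as the sum of child visits (both simplifying assumptions):

```python
import math

def uct_select(children, c=1.41):
    """UCB1 selection at the heart of MCTS: pick the child index that
    maximises exploitation (mean reward w/n) plus an exploration bonus
    that shrinks as a child is visited more. `children` is a list of
    (total_reward, visits) pairs. Unvisited children are taken first."""
    parent_visits = sum(n for _, n in children)

    def ucb(stats):
        w, n = stats
        if n == 0:
            return float('inf')  # ensure every child is tried at least once
        return w / n + c * math.sqrt(math.log(parent_visits) / n)

    return max(range(len(children)), key=lambda i: ucb(children[i]))
```

A full MCTS iteration then descends the tree with this rule, expands a leaf, runs a random simulation, and backs the result up along the visited path.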
Implementation of Game Strategies for HEX Game
This bachelor thesis introduces the game of Hex and its playing strategies. The aim of the work is to create a custom implementation of the game, including an environment that allows the game to be simulated. The environment must support various game modes, including player vs. player and player vs. computer play. The custom implementation must employ various game strategies, to be used by the computer when playing against the user. The thesis also surveys freely available implementations of the game with AI and compares those implementations with the new, custom one.