Acquisition of Chess Knowledge in AlphaZero
What is learned by sophisticated neural network agents such as AlphaZero?
This question is of both scientific and practical interest. If the
representations of strong neural networks bear no resemblance to human
concepts, our ability to understand faithful explanations of their decisions
will be restricted, ultimately limiting what we can achieve with neural network
interpretability. In this work we provide evidence that human knowledge is
acquired by the AlphaZero neural network as it trains on the game of chess. By
probing for a broad range of human chess concepts we show when and where these
concepts are represented in the AlphaZero network. We also provide a
behavioural analysis focusing on opening play, including qualitative analysis
from chess Grandmaster Vladimir Kramnik. Finally, we carry out a preliminary
investigation looking at the low-level details of AlphaZero's representations,
and make the resulting behavioural and representational analyses available
online.
Comment: 69 pages, 44 figures
From Images to Connections: Can DQN with GNNs learn the Strategic Game of Hex?
The gameplay of strategic board games such as chess, Go and Hex is often
characterized by combinatorial, relational structures -- capturing distinct
interactions and non-local patterns -- and not just images. Nonetheless, most
common self-play reinforcement learning (RL) approaches simply approximate
policy and value functions using convolutional neural networks (CNN). A key
feature of CNNs is their relational inductive bias towards locality and
translational invariance. In contrast, graph neural networks (GNN) can encode
more complicated and distinct relational structures. Hence, we investigate the
crucial question: Can GNNs, with their ability to encode complex connections,
replace CNNs in self-play reinforcement learning? To this end, we do a
comparison with Hex -- an abstract yet strategically rich board game -- serving
as our experimental platform. Our findings reveal that GNNs excel at handling
long-range dependencies in game states and are less prone to overfitting, but
are also less proficient at discerning local patterns. This suggests a
potential paradigm shift, signaling the use of game-specific structures to
reshape self-play reinforcement learning.
Learning Personalized Models of Human Behavior in Chess
Even when machine learning systems surpass human ability in a domain, there
are many reasons why AI systems that capture human-like behavior would be
desirable: humans may want to learn from them, they may need to collaborate
with them, or they may expect them to serve as partners in an extended
interaction. Motivated by this goal of human-like AI systems, the problem of
predicting human actions -- as opposed to predicting optimal actions -- has
become an increasingly useful task. We extend this line of work by developing
highly accurate personalized models of human behavior in the context of chess.
Chess is a rich domain for exploring these questions, since it combines a set
of appealing features: AI systems have achieved superhuman performance but
still interact closely with human chess players both as opponents and
preparation tools, and there is an enormous amount of recorded data on
individual players. Starting with an open-source version of AlphaZero trained
on a population of human players, we demonstrate that we can significantly
improve prediction of a particular player's moves by applying a series of
fine-tuning adjustments. Furthermore, we can accurately perform stylometry --
predicting who made a given set of actions -- indicating that our personalized
models capture human decision-making at an individual level.
Comment: The current version of the paper corrects data processing problems
present in the previous version. 21 pages, 13 figures, 7 tables (one very
long
AlphaZe∗∗: AlphaZero-like baselines for imperfect information games are surprisingly strong
In recent years, deep neural networks for strategy games have made significant progress. AlphaZero-like frameworks which combine Monte-Carlo tree search with reinforcement learning have been successfully applied to numerous games with perfect information. However, they have not been developed for domains where uncertainty and unknowns abound, and are therefore often considered unsuitable due to imperfect observations. Here, we challenge this view and argue that they are a viable alternative for games with imperfect information — a domain currently dominated by heuristic approaches or methods explicitly designed for hidden information, such as oracle-based techniques. To this end, we introduce a novel algorithm based solely on reinforcement learning, called AlphaZe∗∗, which is an AlphaZero-based framework for games with imperfect information. We examine its learning convergence on the games Stratego and DarkHex and show that it is a surprisingly strong baseline, while using a model-based approach: it achieves similar win rates against other Stratego bots like Pipeline Policy Space Response Oracle (P2SRO), while not winning in direct comparison against P2SRO or reaching the much stronger numbers of DeepNash. Compared to heuristics and oracle-based approaches, AlphaZe∗∗ can easily deal with rule changes, e.g., when more information than usual is given, and drastically outperforms other approaches in this respect
From Analog to Digital Computing: Is Homo sapiens’ Brain on Its Way to Become a Turing Machine?
The abstract basis of modern computation is the formal description of a finite state machine, the Universal Turing Machine, based on manipulation of integers and logic symbols. In this contribution to the discourse on the computer-brain analogy, we discuss the extent to which analog computing, as performed by the mammalian brain, is like and unlike the digital computing of Universal Turing Machines. We begin with ordinary reality being a permanent dialog between continuous and discontinuous worlds. So it is with computing, which can be analog or digital, and is often mixed. The theory behind computers is essentially digital, but efficient simulations of phenomena can be performed by analog devices; indeed, any physical calculation requires implementation in the physical world and is therefore analog to some extent, despite being based on abstract logic and arithmetic. The mammalian brain, comprised of neuronal networks, functions as an analog device and has given rise to artificial neural networks that are implemented as digital algorithms but function as analog models would. Analog constructs compute with the implementation of a variety of feedback and feedforward loops. In contrast, digital algorithms allow the implementation of recursive processes that enable them to generate unparalleled emergent properties. We briefly illustrate how the cortical organization of neurons can integrate signals and make predictions analogically. While we conclude that brains are not digital computers, we speculate on the recent implementation of human writing in the brain as a possible digital path that slowly evolves the brain into a genuine (slow) Turing machine
Learning to Play the Chess Variant Crazyhouse Above World Champion Level With Deep Neural Networks and Human Data
Deep neural networks have been successfully applied in learning the board games Go,
chess, and shogi without prior knowledge by making use of reinforcement learning.
Although starting from zero knowledge has been shown to yield impressive results, it
is associated with high computational costs, especially for complex games. With this
paper, we present CrazyAra, a neural network based engine trained solely
in a supervised manner for the chess variant crazyhouse. Crazyhouse is a game with
a higher branching factor than chess and there is only limited data of lower quality
available compared to AlphaGo. Therefore, we focus on improving efficiency in multiple
aspects while relying on low computational resources. These improvements include
modifications in the neural network design and training configuration, the introduction of a
data normalization step and a more sample efficient Monte-Carlo tree search which has a
lower chance to blunder. After training on 569537 human games for 1.5 days we achieve
a move prediction accuracy of 60.4%. During development, versions of CrazyAra played
professional human players. Most notably, CrazyAra achieved a four to one win over 2017
crazyhouse world champion Justin Tan (aka LM Jann Lee) who is more than 400 Elo
higher rated compared to the average player in our training set. Furthermore, we test the
playing strength of CrazyAra on CPU against all participants of the second Crazyhouse
Computer Championships 2017, winning against twelve of the thirteen participants.
Finally, for CrazyAraFish we continue training our model on generated engine games.
In 10 long-time control matches playing Stockfish 10, CrazyAraFish wins three games
and draws one out of 10 matches.