27 research outputs found
Accelerating and Improving AlphaZero Using Population Based Training
AlphaZero has been very successful in many games. Unfortunately, it still
consumes a huge amount of computing resources, the majority of which is spent
in self-play. Hyperparameter tuning exacerbates the training cost, since each
hyperparameter configuration requires its own training run, during which it
generates its own self-play records; multiple runs are therefore usually
needed to compare different hyperparameter configurations. This paper
proposes using population based training (PBT) to help tune hyperparameters
dynamically and to improve strength during training. Another significant
advantage is that this method requires only a single run, at a small
additional time cost: the time for generating self-play records remains
unchanged, and only the optimization time increases relative to the standard
AlphaZero training algorithm. In our experiments on 9x9 Go, the PBT method
achieves a higher win rate than the baselines, each of which uses its own
hyperparameter configuration and is trained individually. For 19x19 Go, with
PBT, we are able to obtain improvements in playing strength. Specifically, the
PBT agent obtains a win rate of up to 74% against ELF OpenGo, an open-source
state-of-the-art AlphaZero program using a neural network of a comparable
capacity. This is compared to a saturated non-PBT agent, which achieves a win
rate of 47% against ELF OpenGo under the same circumstances.
Comment: accepted by AAAI 2020 as an oral presentation. In this version,
supplementary materials are added.
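The exploit-and-explore loop at the heart of PBT can be sketched as follows. This is a minimal illustration, not the paper's implementation: the population of dicts, the single hyperparameter "lr", and the stand-in "score" (which here also stands in for copying network weights) are all assumptions.

```python
import random

def pbt_step(population, perturb=0.8):
    """Bottom-quartile agents copy a top-quartile agent, then perturb."""
    ranked = sorted(population, key=lambda a: a["score"])
    n = max(1, len(ranked) // 4)
    for loser, winner in zip(ranked[:n], ranked[-n:]):
        loser["lr"] = winner["lr"]        # exploit: copy hyperparameters
        loser["score"] = winner["score"]  # (stands in for copying weights)
        loser["lr"] *= random.choice([perturb, 1 / perturb])  # explore
    return population

random.seed(0)
pop = [{"lr": 10 ** random.uniform(-4, -2), "score": random.random()}
       for _ in range(8)]
pop = pbt_step(pop)
print(len(pop), min(a["lr"] for a in pop) > 0)
```

Because every population member trains within the same run, tuning happens online rather than across separate runs, which is the source of the cost saving the abstract describes.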
Warm-Start AlphaZero Self-Play Search Enhancements
Recently, AlphaZero has achieved landmark results in deep reinforcement
learning, by providing a single self-play architecture that learned three
different games at superhuman level. AlphaZero is a large and complicated
system with many parameters, and success requires much compute power and
fine-tuning. Reproducing results in other games is a challenge, and many
researchers are looking for ways to improve results while reducing
computational demands. AlphaZero's design is purely based on self-play and
makes no use of labeled expert data or domain-specific enhancements; it is
designed to learn from scratch. We propose a novel approach to deal with this
cold-start problem by employing simple search enhancements at the beginning
phase of self-play training, namely Rollout, Rapid Action Value Estimate (RAVE)
and dynamically weighted combinations of these with the neural network, and
Rolling Horizon Evolutionary Algorithms (RHEA). Our experiments indicate that
most of these enhancements improve the performance of their baseline player in
three different (small) board games, with RAVE-based variants in particular
playing strongly.
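As an illustration of the RAVE-style weighting the abstract refers to, the following sketch shows the classic dynamically weighted blend from MCTS (Gelly and Silver's formulation); the equivalence parameter k and the toy values are assumptions, not settings from this paper, whose variants additionally blend in the neural network value.

```python
import math

def rave_value(q_mc, q_rave, n_visits, k=1000.0):
    """Blend the Monte-Carlo value q_mc with the all-moves-as-first
    (RAVE) value q_rave. beta -> 1 for few visits (trust RAVE early),
    beta -> 0 as visit counts grow (trust the MC value late)."""
    beta = math.sqrt(k / (3 * n_visits + k))
    return (1 - beta) * q_mc + beta * q_rave

# Early in search the RAVE estimate dominates; later the MC value takes over.
print(rave_value(0.6, 0.4, n_visits=1))      # close to 0.4
print(rave_value(0.6, 0.4, n_visits=10**6))  # close to 0.6
```

This decaying weight is what makes such enhancements natural warm-start aids: they contribute most exactly when statistics (and the untrained network) are least reliable.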
Searching by learning: Exploring artificial general intelligence on small board games by deep reinforcement learning
In deep reinforcement learning, searching and learning techniques are two important components. They can be used independently and in combination to deal with different problems in AI, and these results have inspired research into artificial general intelligence (AGI).

We study table-based classic Q-learning on the General Game Playing (GGP) system, showing that classic Q-learning works on GGP, although convergence is slow and learning complex games is computationally expensive.

This dissertation uses an AlphaZero-like self-play framework to explore AGI on small games. By tuning different hyperparameters, the role, effects, and contributions of searching and learning are studied. A further experiment shows that search techniques can contribute as experts to generate better training examples, speeding up the start phase of training.

In order to extend the AlphaZero-like self-play approach to complex single-player games, the game Morpion Solitaire is implemented in combination with the Ranked Reward method. Our first AlphaZero-based approach is able to achieve a score close to the human best record.

Overall, in this thesis, both searching and learning techniques are studied, by themselves and in combination, in GGP and AlphaZero-like self-play systems. We do so for the purpose of making steps towards artificial general intelligence: towards systems that exhibit intelligent behavior in more than one domain.

China Scholarship Council; Algorithms and the Foundations of Software Technology
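A minimal sketch of the table-based classic Q-learning studied in the first part of the dissertation; the one-dimensional chain environment (move left or right, reward at the rightmost state) is a toy stand-in for a GGP game, not an environment from the thesis.

```python
import random

def q_learning(n_states=6, episodes=500, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning on a 1-D chain; goal is the rightmost state."""
    q = {(s, a): 0.0 for s in range(n_states) for a in (-1, +1)}
    for _ in range(episodes):
        s = 0
        while s != n_states - 1:
            if random.random() < eps:                 # explore
                a = random.choice((-1, +1))
            else:                                     # greedy, random ties
                a = max((-1, +1), key=lambda b: (q[(s, b)], random.random()))
            s2 = min(max(s + a, 0), n_states - 1)
            r = 1.0 if s2 == n_states - 1 else 0.0
            target = r + gamma * max(q[(s2, b)] for b in (-1, +1))
            q[(s, a)] += alpha * (target - q[(s, a)])  # Q-learning update
            s = s2
    return q

random.seed(1)
q = q_learning()
policy = ["R" if q[(s, +1)] >= q[(s, -1)] else "L" for s in range(5)]
print("".join(policy))
```

Even on this trivial chain, the value table needs hundreds of episodes to settle, which gives a feel for why the thesis finds convergence slow on genuinely complex GGP games.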
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task
Language models show a surprising range of capabilities, but the source of
their apparent competence is unclear. Do these networks just memorize a
collection of surface statistics, or do they rely on internal representations
of the process that generates the sequences they see? We investigate this
question by applying a variant of the GPT model to the task of predicting legal
moves in a simple board game, Othello. Although the network has no a priori
knowledge of the game or its rules, we uncover evidence of an emergent
nonlinear internal representation of the board state. Interventional
experiments indicate this representation can be used to control the output of
the network and create "latent saliency maps" that can help explain predictions
in human terms.
Comment: ICLR 2023 oral (notable-top-5%):
https://openreview.net/forum?id=DeG07_TcZvT; code:
https://github.com/likenneth/othello_worl
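The probing methodology can be illustrated with a small sketch: train a classifier to read a board-state property out of a model's hidden activations. Everything here is a stand-in assumption: the "activations" are synthetic (a linear signal plus Gaussian noise) rather than Othello-GPT hidden states, and the probe is linear for brevity, whereas the paper's emergent representation is nonlinear and is probed with small MLPs.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 32, 2000
true_dir = rng.normal(size=d)          # direction encoding the board feature
labels = rng.integers(0, 2, size=n)    # e.g. "this square is occupied" yes/no
# Synthetic "hidden states": noise plus a signed copy of the feature direction.
acts = rng.normal(size=(n, d)) + np.outer(2 * labels - 1, true_dir)

w = np.zeros(d)                        # logistic-regression probe weights
for _ in range(200):                   # plain full-batch gradient descent
    p = 1 / (1 + np.exp(-acts @ w))
    w -= 0.1 * acts.T @ (p - labels) / n

acc = ((acts @ w > 0) == labels.astype(bool)).mean()
print(f"probe accuracy: {acc:.2f}")
```

High probe accuracy is evidence that the feature is encoded in the activations; the paper's interventional experiments go further, editing the represented board state and checking that the model's predicted moves change accordingly.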