
6 research outputs found

Accelerating and Improving AlphaZero Using Population Based Training

Authors: Wei Ting-Han, Wu I-Chen, Wu Ti-Rong
Publication date: 13/03/2020

AlphaZero has been very successful in many games. Unfortunately, it still consumes a huge amount of computing resources, most of which is spent on self-play. Hyperparameter tuning exacerbates the training cost, since each hyperparameter configuration requires its own training run, during which it generates its own self-play records. As a result, multiple runs are usually needed to cover different hyperparameter configurations. This paper proposes using population based training (PBT) to tune hyperparameters dynamically and improve playing strength during training. Another significant advantage is that this method requires only a single run and incurs only a small additional time cost: the time for generating self-play records remains unchanged, and only the optimization time, which follows the AlphaZero training algorithm, increases. In our experiments on 9x9 Go, the PBT method achieves a higher win rate than the baselines, each of which has its own hyperparameter configuration and is trained individually. For 19x19 Go, PBT also yields improvements in playing strength. Specifically, the PBT agent can obtain up to a 74% win rate against ELF OpenGo, an open-source state-of-the-art AlphaZero program using a neural network of comparable capacity, compared with a 47% win rate for a saturated non-PBT agent under the same conditions.

Comment: accepted by AAAI 2020 as an oral presentation. In this version, supplementary materials are added.

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
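The first abstract describes population based training only at a high level. As a rough illustration, the sketch below shows one exploit-and-explore step of the generic PBT recipe (Jaderberg et al., 2017); it is not the authors' implementation. The `Member` class, the hyperparameter names `learning_rate` and `value_loss_weight`, the 25% truncation fraction, and the 0.8/1.2 perturbation factors are all assumptions chosen for illustration. Because the abstract states that self-play time stays unchanged, the sketch assumes all members train on one shared self-play buffer, so only optimization cost grows with population size.

```python
import copy
import random
from dataclasses import dataclass


@dataclass
class Member:
    """One population member (hypothetical stand-in for an AlphaZero training agent)."""
    name: str
    weights: dict       # placeholder for network parameters
    hparams: dict       # assumed names, e.g. {"learning_rate": ..., "value_loss_weight": ...}
    score: float = 0.0  # assumed fitness, e.g. win rate in evaluation games


def exploit_and_explore(population, frac=0.25, factors=(0.8, 1.2), rng=None):
    """Run one generic PBT step in place.

    The bottom `frac` of members (by score) copy the weights of a randomly
    chosen top member (exploit) and inherit its hyperparameters, each
    multiplied by a random factor from `factors` (explore).
    Returns a list of (replaced_name, source_name) pairs.
    """
    if len(population) < 2:
        raise ValueError("PBT needs at least two members")
    rng = rng or random.Random()
    ranked = sorted(population, key=lambda m: m.score, reverse=True)
    # Size of the top/bottom groups, capped so the two groups never overlap.
    n = min(max(1, int(len(ranked) * frac)), len(ranked) // 2)
    top, bottom = ranked[:n], ranked[-n:]
    replaced = []
    for loser in bottom:
        winner = rng.choice(top)
        # Exploit: continue from the stronger member instead of restarting training.
        loser.weights = copy.deepcopy(winner.weights)
        # Explore: perturb the inherited hyperparameters.
        loser.hparams = {k: v * rng.choice(factors) for k, v in winner.hparams.items()}
        replaced.append((loser.name, winner.name))
    return replaced


if __name__ == "__main__":
    pop = [
        Member(f"agent{i}", {"w": [float(i)]}, {"learning_rate": 0.01, "value_loss_weight": 1.0})
        for i in range(4)
    ]
    for member, s in zip(pop, [0.40, 0.55, 0.60, 0.70]):
        member.score = s
    print(exploit_and_explore(pop, rng=random.Random(0)))  # prints [('agent0', 'agent3')]
    print(pop[0].hparams)
```

Copying weights rather than restarting is what allows hyperparameters to change within a single run. In this sketch the step would be called periodically between optimization phases, with `score` refreshed by evaluation games; that schedule is likewise an assumption.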

SAI: A sensible artificial intelligence that plays with handicap and targets high scores in 9x9 Go

Authors: Amato G., Fantozzi M., Gini R., Metta C., Morandin F., Parton M.
Publication venue: IOS Press
Publication date: 01/01/2020

We develop a new framework for the game of Go that targets a high score, and thus perfect play. We integrate this framework into the Monte Carlo tree search and policy iteration learning pipeline introduced by Google DeepMind with AlphaGo. Training on 9×9 Go produces a superhuman Go player, thus proving that this framework is stable and robust. We show that this player can be used to play effectively with both positional and score handicap. We develop a family of agents that can target high scores against any opponent, recover from very severe disadvantage against weak opponents, and avoid suboptimal moves.

Archivio istituzionale della Ricerca - Università degli Studi di Parma

Algorithms for Adaptive Game-playing Agents

Author: Justesen Niels Orsleff
Publication venue: IT-Universitetet i København
Publication date: 01/01/2019

The IT University of Copenhagen's Repository

Balancing MCTS by Dynamically Adjusting the Komi Value

Publication venue: IOS Press

Crossref