189,873 research outputs found
SAI, a Sensible Artificial Intelligence that plays Go
We propose a multiple-komi modification of the AlphaGo Zero/Leela Zero
paradigm. The winrate as a function of the komi is modeled with a
two-parameters sigmoid function, so that the neural network must predict just
one more variable to assess the winrate for all komi values. A second novel
feature is that training is based on self-play games that occasionally branch
-- with changed komi -- when the position is uneven. With this setting,
reinforcement learning is showed to work on 7x7 Go, obtaining very strong
playing agents. As a useful byproduct, the sigmoid parameters given by the
network allow to estimate the score difference on the board, and to evaluate
how much the game is decided.Comment: Updated for IJCNN 2019 conferenc
GAMES: A new Scenario for Software and Knowledge Reuse
Games are a well-known test bed for testing search algorithms and learning methods, and many authors have presented numerous reasons for the research in this area. Nevertheless, they have not received the attention they deserve as software projects.
In this paper, we analyze the applicability of software
and knowledge reuse in the games domain. In spite of the
need to find a good evaluation function, search algorithms
and interface design can be said to be the primary concerns.
In addition, we will discuss the current state of the main
statistical learning methods and how they can be addressed
from a software engineering point of view. So, this paper
proposes a reliable environment and adequate tools, necessary in order to achieve high levels of reuse in the games domain
On Reinforcement Learning for Full-length Game of StarCraft
StarCraft II poses a grand challenge for reinforcement learning. The main
difficulties of it include huge state and action space and a long-time horizon.
In this paper, we investigate a hierarchical reinforcement learning approach
for StarCraft II. The hierarchy involves two levels of abstraction. One is the
macro-action automatically extracted from expert's trajectories, which reduces
the action space in an order of magnitude yet remains effective. The other is a
two-layer hierarchical architecture which is modular and easy to scale,
enabling a curriculum transferring from simpler tasks to more complex tasks.
The reinforcement training algorithm for this architecture is also
investigated. On a 64x64 map and using restrictive units, we achieve a winning
rate of more than 99\% against the difficulty level-1 built-in AI. Through the
curriculum transfer learning algorithm and a mixture of combat model, we can
achieve over 93\% winning rate of Protoss against the most difficult
non-cheating built-in AI (level-7) of Terran, training within two days using a
single machine with only 48 CPU cores and 8 K40 GPUs. It also shows strong
generalization performance, when tested against never seen opponents including
cheating levels built-in AI and all levels of Zerg and Protoss built-in AI. We
hope this study could shed some light on the future research of large-scale
reinforcement learning.Comment: Appeared in AAAI 201
Beyond Monte Carlo Tree Search: Playing Go with Deep Alternative Neural Network and Long-Term Evaluation
Monte Carlo tree search (MCTS) is extremely popular in computer Go which
determines each action by enormous simulations in a broad and deep search tree.
However, human experts select most actions by pattern analysis and careful
evaluation rather than brute search of millions of future nteractions. In this
paper, we propose a computer Go system that follows experts way of thinking and
playing. Our system consists of two parts. The first part is a novel deep
alternative neural network (DANN) used to generate candidates of next move.
Compared with existing deep convolutional neural network (DCNN), DANN inserts
recurrent layer after each convolutional layer and stacks them in an
alternative manner. We show such setting can preserve more contexts of local
features and its evolutions which are beneficial for move prediction. The
second part is a long-term evaluation (LTE) module used to provide a reliable
evaluation of candidates rather than a single probability from move predictor.
This is consistent with human experts nature of playing since they can foresee
tens of steps to give an accurate estimation of candidates. In our system, for
each candidate, LTE calculates a cumulative reward after several future
interactions when local variations are settled. Combining criteria from the two
parts, our system determines the optimal choice of next move. For more
comprehensive experiments, we introduce a new professional Go dataset (PGD),
consisting of 253233 professional records. Experiments on GoGoD and PGD
datasets show the DANN can substantially improve performance of move prediction
over pure DCNN. When combining LTE, our system outperforms most relevant
approaches and open engines based on MCTS.Comment: AAAI 201
- …