On Reinforcement Learning for Full-length Game of StarCraft
StarCraft II poses a grand challenge for reinforcement learning. Its main difficulties include a huge state and action space and a long time horizon. In this paper, we investigate a hierarchical reinforcement learning approach for StarCraft II. The hierarchy involves two levels of abstraction. One is the macro-actions automatically extracted from expert trajectories, which reduce the action space by an order of magnitude while remaining effective. The other is a two-layer hierarchical architecture, which is modular and easy to scale, enabling curriculum transfer from simpler tasks to more complex ones. The reinforcement learning algorithm for this architecture is also investigated. On a 64x64 map and with a restricted set of units, we achieve a win rate of more than 99% against the difficulty level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat models, we achieve an over-93% win rate playing Protoss against the most difficult non-cheating built-in Terran AI (level-7), training within two days on a single machine with only 48 CPU cores and 8 K40 GPUs. The agent also shows strong generalization when tested against previously unseen opponents, including the cheating-level built-in AIs and all levels of the Zerg and Protoss built-in AIs. We hope this study sheds some light on future research in large-scale reinforcement learning.
Comment: Appeared in AAAI 2019
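As a concrete illustration of the macro-action idea in this abstract, below is a minimal Python sketch that mines frequent short action n-grams from expert replay logs and keeps the top-K as macro-actions. The mining rule and all names here are illustrative assumptions; the paper's exact extraction procedure may differ.

```python
# Hypothetical sketch: extract macro-actions as the most frequent short
# action subsequences (n-grams) seen in expert trajectories. This is an
# assumed stand-in for the paper's automatic extraction step.
from collections import Counter
from typing import List, Tuple

def extract_macro_actions(trajectories: List[List[int]],
                          max_len: int = 4,
                          top_k: int = 50) -> List[Tuple[int, ...]]:
    """Return the top_k most frequent action n-grams of length 2..max_len."""
    counts: Counter = Counter()
    for traj in trajectories:
        for n in range(2, max_len + 1):
            for i in range(len(traj) - n + 1):
                counts[tuple(traj[i:i + n])] += 1
    return [ngram for ngram, _ in counts.most_common(top_k)]

# Usage with toy replay data (actions as integer ids):
replays = [[0, 1, 2, 0, 1, 2, 3], [0, 1, 2, 3, 0, 1]]
macros = extract_macro_actions(replays, max_len=3, top_k=5)
print(macros)  # e.g. [(0, 1), (1, 2), (0, 1, 2), ...]
```

Replacing thousands of primitive actions with a few dozen mined sequences is what yields the order-of-magnitude reduction in the action space the abstract describes.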
On Efficient Reinforcement Learning for Full-length Game of StarCraft II
StarCraft II (SC2) poses a grand challenge for reinforcement learning (RL); its main difficulties include a huge state space, a varying action space, and a long time horizon. In this work, we investigate a set of RL techniques for the full-length game of StarCraft II: a hierarchical RL approach involving extracted macro-actions and a hierarchical architecture of neural networks, together with a curriculum transfer training procedure that trains the agent on a single machine with 4 GPUs and 48 CPU threads. On a 64x64 map and with a restricted set of units, we achieve a win rate of 99% against the level-1 built-in AI. Through the curriculum transfer learning algorithm and a mixture of combat models, we achieve a 93% win rate against the most difficult non-cheating built-in AI (level-7). In this extended version of the paper, we improve our architecture to train the agent against the cheating-level AIs, achieving win rates of 96%, 97%, and 94% against the level-8, level-9, and level-10 AIs, respectively. Our code is at https://github.com/liuruoze/HierNet-SC2. To provide an AlphaStar-style baseline for our work as well as for the research and open-source community, we reproduce a scaled-down version of AlphaStar, mini-AlphaStar (mAS). The latest version of mAS is 1.07; it can be trained on the raw action space, which has 564 actions, and is designed to run training on a single common machine by making the hyper-parameters adjustable. We then compare our work with mAS using the same resources and show that our method is more effective. The code of mini-AlphaStar is at https://github.com/liuruoze/mini-AlphaStar. We hope our study sheds some light on future research into efficient reinforcement learning on SC2 and other large-scale games.
Comment: 48 pages, 21 figures
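To make the two-level hierarchy of neural networks concrete, here is a minimal PyTorch sketch in which a high-level controller picks a sub-policy and the chosen sub-policy picks a macro-action. The network sizes, sampling scheme, and interface are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch of a two-level hierarchical policy: controller
# selects a sub-policy; the sub-policy selects a macro-action.
import torch
import torch.nn as nn

class HierarchicalPolicy(nn.Module):
    def __init__(self, obs_dim: int, n_subpolicies: int, n_macro_actions: int):
        super().__init__()
        self.controller = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(),
            nn.Linear(128, n_subpolicies))
        self.subpolicies = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                          nn.Linear(128, n_macro_actions))
            for _ in range(n_subpolicies)])

    def forward(self, obs: torch.Tensor):
        # High level: choose which sub-policy is active this step.
        sub_idx = torch.distributions.Categorical(
            logits=self.controller(obs)).sample()
        # Low level: the chosen sub-policy selects a macro-action.
        logits = self.subpolicies[sub_idx.item()](obs)
        action = torch.distributions.Categorical(logits=logits).sample()
        return sub_idx.item(), action.item()

policy = HierarchicalPolicy(obs_dim=32, n_subpolicies=3, n_macro_actions=20)
sub, act = policy(torch.randn(32))
```

Because each sub-policy is a separate module, the design is modular: sub-policies can be pretrained on simpler tasks and reused, which is the kind of curriculum transfer the abstract describes.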
Macro action selection with deep reinforcement learning in StarCraft
StarCraft (SC) is one of the most popular and successful Real Time Strategy (RTS) games. In recent years, SC has also been widely accepted as a challenging testbed for AI research because of its enormous state space, partially observed information, multi-agent collaboration, and so on. With the help of the annual AIIDE and CIG competitions, a growing number of SC bots have been proposed and continuously improved. However, a large gap remains between the top-level bots and professional human players. One vital reason is that current SC bots mainly rely on predefined rules to select macro actions during their games. These rules are not scalable or efficient enough to cope with the enormous yet partially observed state space of the game. In this paper, we propose a deep reinforcement learning (DRL) framework to improve the selection of macro actions. Our framework combines Ape-X DQN with a Long Short-Term Memory (LSTM) network. We use this framework to build our bot, named LastOrder. Our evaluation, based on training against all bots from the AIIDE 2017 StarCraft AI competition set, shows that LastOrder achieves an 83% win rate, outperforming 26 of the 28 entrants.
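As a sketch of the Ape-X-DQN-plus-LSTM combination, the PyTorch snippet below uses an LSTM to summarize the partially observed game history and a Q-head to score macro actions. The dimensions, class names, and the greedy selection at the end are illustrative assumptions, not the paper's exact model.

```python
# Hypothetical sketch: recurrent Q-network for macro-action selection.
# The LSTM carries memory across partially observed game states.
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    def __init__(self, obs_dim: int, hidden: int, n_macros: int):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.q_head = nn.Linear(hidden, n_macros)

    def forward(self, obs_seq: torch.Tensor, state=None):
        # obs_seq: (batch, time, obs_dim)
        out, state = self.lstm(obs_seq, state)
        return self.q_head(out[:, -1]), state  # Q-values at the last step

qnet = RecurrentQNet(obs_dim=64, hidden=128, n_macros=10)
q_values, hidden = qnet(torch.randn(1, 8, 64))
macro = q_values.argmax(dim=-1).item()  # greedy macro-action selection
```

In the full Ape-X setup, many such networks would act in parallel workers feeding a shared prioritized replay buffer; the sketch shows only the recurrent value function at the core of that design.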
Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games
Many artificial intelligence (AI) applications require multiple intelligent agents to work in a collaborative effort. Efficient learning for inter-agent communication and coordination is an indispensable step towards general AI. In this paper, we take the StarCraft combat game as a case study, where the task is to coordinate multiple agents as a team to defeat their enemies. To maintain a scalable yet effective communication protocol, we introduce a Multiagent Bidirectionally-Coordinated Network (BiCNet ['bIknet]) with a vectorised extension of the actor-critic formulation. We show that BiCNet can handle different types of combat with arbitrary numbers of AI agents on both sides. Our analysis demonstrates that, without any supervision such as human demonstrations or labelled data, BiCNet can learn various advanced coordination strategies commonly used by experienced game players. In our experiments, we evaluate our approach against multiple baselines under different scenarios; it shows state-of-the-art performance and possesses potential value for large-scale real-world applications.
Comment: 10 pages, 10 figures. Previously titled "Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games", Mar 2017