Macro action selection with deep reinforcement learning in StarCraft
StarCraft (SC) is one of the most popular and successful Real Time Strategy
(RTS) games. In recent years, SC is also widely accepted as a challenging
testbed for AI research because of its enormous state space, partially observed
information, multi-agent collaboration, and so on. With the help of annual
AIIDE and CIG competitions, a growing number of SC bots have been proposed and
continuously improved. However, a large gap remains between the top-level bots
and professional human players. One vital reason is that current SC bots
mainly rely on predefined rules to select macro actions during their games.
These rules are not scalable and efficient enough to cope with the enormous yet
partially observed state space in the game. In this paper, we propose a deep
reinforcement learning (DRL) framework to improve the selection of macro
actions. Our framework is based on the combination of the Ape-X DQN and the
Long Short-Term Memory (LSTM). We use this framework to build our bot, named
LastOrder. Our evaluation, based on training against all bots from the AIIDE
2017 StarCraft AI competition set, shows that LastOrder achieves an 83% win
rate, outperforming 26 of the 28 entrants in total.
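The Ape-X component of the framework above is, at its core, distributed prioritized experience replay. Below is a minimal single-process sketch of proportional prioritized replay; it is an illustration only (a plain ring-buffer list instead of Ape-X's sharded actors and sum-tree, with names of our own choosing, not the paper's):

```python
import random

class PrioritizedReplay:
    """Proportional prioritized experience replay, the mechanism at the
    core of Ape-X. A single-process sketch: real Ape-X shards the buffer
    across many actor processes and samples via a sum-tree in O(log n);
    here a plain ring-buffer list is used for clarity."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha      # how strongly TD-error skews sampling
        self.items = []         # (priority, transition) pairs
        self.pos = 0            # next slot to overwrite once full

    def add(self, transition, td_error):
        priority = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.items) < self.capacity:
            self.items.append((priority, transition))
        else:
            self.items[self.pos] = (priority, transition)
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Draw transitions with probability proportional to priority.
        total = sum(p for p, _ in self.items)
        picks = []
        for _ in range(batch_size):
            r = random.uniform(0, total)
            acc = 0.0
            for priority, transition in self.items:
                acc += priority
                if acc >= r:
                    picks.append(transition)
                    break
        return picks
```

High-TD-error transitions are replayed more often; a full implementation would also apply importance-sampling weights to correct the bias this sampling introduces.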
On Reinforcement Learning for Full-length Game of StarCraft
StarCraft II poses a grand challenge for reinforcement learning. Its main
difficulties include a huge state and action space and a long time horizon.
In this paper, we investigate a hierarchical reinforcement learning approach
for StarCraft II. The hierarchy involves two levels of abstraction. One is the
macro-action automatically extracted from expert trajectories, which reduces
the action space by an order of magnitude yet remains effective. The other is a
two-layer hierarchical architecture which is modular and easy to scale,
enabling a curriculum transferring from simpler tasks to more complex tasks.
The reinforcement training algorithm for this architecture is also
investigated. On a 64x64 map and using restrictive units, we achieve a winning
rate of more than 99\% against the difficulty level-1 built-in AI. Through the
curriculum transfer learning algorithm and a mixture of combat models, we
achieve an over-93\% winning rate as Protoss against the most difficult
non-cheating built-in AI (level-7) of Terran, training within two days using a
single machine with only 48 CPU cores and 8 K40 GPUs. The agent also shows
strong generalization performance when tested against never-seen opponents,
including cheating-level built-in AIs and all levels of the Zerg and Protoss
built-in AI. We hope this study can shed some light on future research into
large-scale reinforcement learning. Comment: Appeared in AAAI 201
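The two-layer hierarchy described above can be sketched as a high-level controller that picks a macro-action and a sub-policy that executes it for a fixed number of primitive steps. This is an illustrative simplification, not the paper's implementation; the controller, sub-policies, and environment interface here are placeholders:

```python
class HierarchicalAgent:
    """Two-level hierarchy: a high-level controller picks a macro-action,
    and the corresponding low-level sub-policy issues primitive actions
    for up to `macro_len` steps. Controller, sub-policies, and the
    environment interface are placeholders, not the paper's networks."""

    def __init__(self, controller, sub_policies, macro_len=8):
        self.controller = controller      # obs -> macro-action index
        self.sub_policies = sub_policies  # one primitive policy per macro-action
        self.macro_len = macro_len        # primitive steps per macro decision

    def act(self, env, obs):
        macro = self.controller(obs)      # high-level decision
        total_reward = 0.0
        for _ in range(self.macro_len):   # low-level rollout of that macro
            action = self.sub_policies[macro](obs)
            obs, reward, done = env.step(action)
            total_reward += reward
            if done:
                break
        return obs, total_reward, macro
```

Because the controller only acts once per macro decision, its effective horizon shrinks by a factor of `macro_len`, which is what makes curriculum transfer across tasks tractable.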
Deep learning for video game playing
In this article, we review recent Deep Learning advances in the context of
how they have been applied to play different types of video games such as
first-person shooters, arcade games, and real-time strategy games. We analyze
the unique requirements that different game genres pose to a deep learning
system and highlight important open challenges in the context of applying these
machine learning methods to video games, such as general game playing, dealing
with extremely large decision spaces, and coping with sparse rewards.
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration in deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. We then review how deep RL has improved upon classical RL and summarize six categories of recent exploration methods for deep RL, in order of increasing use of prior information. We then examine representative works in three of these categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents playing StarCraft II. Finally, we conclude that exploration guided by prior knowledge is a promising research direction and suggest topics of potential impact.
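The second category above, count-based exploration via hashing, can be illustrated with a SimHash-style sketch: project the state through a fixed random matrix, binarize the result into a hash code, and pay a bonus that decays with the square root of that code's visit count. The dimensions and the bonus coefficient below are illustrative, not from the thesis:

```python
import numpy as np

class HashCountBonus:
    """Count-based exploration via state hashing (SimHash-style sketch):
    states that fall into the same binary code share a visit count, and
    the exploration bonus beta / sqrt(count) shrinks as a code becomes
    familiar. code_bits and beta are illustrative choices."""

    def __init__(self, state_dim, code_bits=16, beta=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.A = rng.standard_normal((code_bits, state_dim))  # fixed projection
        self.beta = beta
        self.counts = {}  # hash code -> number of visits

    def bonus(self, state):
        # Binarize the random projection to get the state's hash code.
        code = tuple((self.A @ np.asarray(state, dtype=float)) > 0)
        n = self.counts.get(code, 0) + 1
        self.counts[code] = n
        return self.beta / n ** 0.5  # rarely seen codes earn larger bonuses
```

Fewer `code_bits` means coarser buckets and more generalization across similar states; more bits approach exact state counting.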
A Study of AI Population Dynamics with Million-agent Reinforcement Learning
We conduct an empirical study of the ordered collective dynamics exhibited by
a population of intelligent agents driven by million-agent reinforcement
learning. Our intention is to put intelligent agents into a
simulated natural context and verify if the principles developed in the real
world could also be used in understanding an artificially-created intelligent
population. To achieve this, we simulate a large-scale predator-prey world,
where the laws of the world are designed using only findings, or their logical
equivalents, that have been discovered in nature. We endow the agents with
intelligence based on deep reinforcement learning (DRL). To scale the
population up to millions of agents, we propose a large-scale DRL training
platform with a redesigned experience buffer. Our results show that the
population dynamics of the AI agents, driven only by each agent's individual
self-interest, reveal an ordered pattern similar to the Lotka-Volterra model studied
in population biology. We further discover emergent collective-adaptation
behaviors by studying how the agents' grouping behavior changes with the
environmental resources. Both findings can be explained by the theory of
self-organization in nature. Comment: Full version of the paper presented at
AAMAS 2018 (International Conference on Autonomous Agents and Multiagent
Systems).
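The Lotka-Volterra model that the observed dynamics are compared against is a pair of coupled ODEs; a simple semi-implicit Euler simulation reproduces the characteristic predator-prey oscillation. Parameter values here are illustrative, not taken from the paper:

```python
def lotka_volterra(prey, pred, alpha=1.1, beta=0.4, delta=0.1, gamma=0.4,
                   dt=0.01, steps=2000):
    """Semi-implicit Euler integration of the Lotka-Volterra equations:
        d(prey)/dt = alpha*prey - beta*prey*pred
        d(pred)/dt = delta*prey*pred - gamma*pred
    Returns the (prey, pred) trajectory. All parameter values are
    illustrative, not from the paper."""
    history = [(prey, pred)]
    for _ in range(steps):
        prey += dt * (alpha * prey - beta * prey * pred)   # growth minus predation
        pred += dt * (delta * prey * pred - gamma * pred)  # predation gain minus decay
        history.append((prey, pred))
    return history
```

Both populations rise and fall out of phase with each other; the paper's claim is that million-agent DRL populations, with no such equations built in, trace out a qualitatively similar cycle.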
On Efficient Reinforcement Learning for Full-length Game of StarCraft II
StarCraft II (SC2) poses a grand challenge for reinforcement learning (RL);
its main difficulties include a huge state space, a varying action space,
and a long time horizon. In this work, we investigate a set of RL techniques
for the full-length game of StarCraft II. We investigate a hierarchical RL
approach involving extracted macro-actions and a hierarchical architecture of
neural networks. We investigate a curriculum transfer training procedure and
train the agent on a single machine with 4 GPUs and 48 CPU threads. On a 64x64
map and using restrictive units, we achieve a win rate of 99% against the
level-1 built-in AI. Through the curriculum transfer learning algorithm and a
mixture of combat models, we achieve a 93% win rate against the most difficult
non-cheating level built-in AI (level-7). In this extended version of the
paper, we improve our architecture to train the agent against the
cheating-level AIs, achieving win rates of 96%, 97%, and 94% against the
level-8, level-9, and level-10 AIs, respectively. Our code is at
https://github.com/liuruoze/HierNet-SC2. To provide a baseline referencing
AlphaStar for our work as well as for the research and open-source community,
we reproduce a scaled-down version of it, mini-AlphaStar (mAS). The latest
version of mAS is 1.07, which can be trained on the raw action space of 564
actions. It is designed to run training on a single common machine by making
the hyper-parameters adjustable. We then compare our work with mAS using the
same resources and show that our method is more effective. The code of
mini-AlphaStar is at https://github.com/liuruoze/mini-AlphaStar. We hope our
study can shed some light on future research into efficient reinforcement
learning on SC2 and other large-scale games. Comment: 48 pages, 21 figures
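One way to mimic the macro-action extraction this hierarchical approach relies on is simple n-gram mining over expert action sequences: keep the most frequent fixed-length windows as macro-actions. This is a toy sketch under that assumption; the paper's actual extraction procedure may differ:

```python
from collections import Counter

def extract_macros(trajectories, length=3, top_k=5):
    """Toy frequency-based macro-action mining: slide a fixed-length
    window over expert action sequences and keep the most common
    n-grams as macro-actions. Names and parameters are illustrative,
    not the paper's actual extraction procedure."""
    counts = Counter()
    for actions in trajectories:
        for i in range(len(actions) - length + 1):
            counts[tuple(actions[i:i + length])] += 1
    return [gram for gram, _ in counts.most_common(top_k)]
```

Replacing hundreds of primitive actions with a handful of mined macros is one concrete way an action space can shrink by an order of magnitude while staying effective.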