Deep learning for video game playing
In this article, we review recent Deep Learning advances in the context of
how they have been applied to play different types of video games such as
first-person shooters, arcade games, and real-time strategy games. We analyze
the unique requirements that different game genres pose to a deep learning
system and highlight important open challenges in the context of applying these
machine learning methods to video games, such as general game playing, dealing
with extremely large decision spaces and sparse rewards.
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration in deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. Then we review how deep RL has improved upon classical RL and summarize six categories of the latest exploration methods for deep RL, in order of increasing use of prior information. We then explore representative works in three categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents that play StarCraft II. Finally, we conclude that exploration by prior knowledge is a promising research direction and suggest topics of potential impact.
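The count-based category described in this abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything here is an assumption: the class name `HashCountBonus`, the SimHash-style random projection used as the hash, and the bonus form beta / sqrt(count) are one common formulation of hash-based counting, not necessarily the thesis's own.

```python
import math
import random
from collections import defaultdict


class HashCountBonus:
    """Count-based exploration bonus over hashed states (illustrative only)."""

    def __init__(self, state_dim, n_bits=16, beta=0.1, seed=0):
        rng = random.Random(seed)
        # Fixed random Gaussian projection: one row per bit of the hash code.
        self.projection = [
            [rng.gauss(0.0, 1.0) for _ in range(state_dim)]
            for _ in range(n_bits)
        ]
        self.beta = beta  # assumed bonus scale
        self.counts = defaultdict(int)

    def hash_state(self, state):
        # The sign of each random projection yields one bit of the code,
        # so nearby states tend to fall into the same bucket.
        return tuple(
            1 if sum(w * x for w, x in zip(row, state)) >= 0.0 else 0
            for row in self.projection
        )

    def bonus(self, state):
        # Increment the visit count of the state's bucket, then return a
        # bonus that shrinks as the bucket is visited more often.
        code = self.hash_state(state)
        self.counts[code] += 1
        return self.beta / math.sqrt(self.counts[code])


explorer = HashCountBonus(state_dim=4)
first = explorer.bonus([0.5, -1.0, 0.2, 0.0])
second = explorer.bonus([0.5, -1.0, 0.2, 0.0])  # same bucket, smaller bonus
```

In an RL loop, the returned bonus would typically be added to the environment reward, so that rarely visited hash buckets look more attractive to the agent.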
Neural-encoding Human Experts' Domain Knowledge to Warm Start Reinforcement Learning
Deep reinforcement learning has been successful in a variety of tasks, such
as game playing and robotic manipulation. However, attempting to learn
tabula rasa disregards the logical structure of many domains as well
as the wealth of readily available knowledge from domain experts that could
help "warm start" the learning process. We present a novel reinforcement
learning technique that allows for intelligent initialization of a neural
network's weights and architecture. Our approach permits encoding domain
knowledge directly into a neural decision tree, and improves upon that
knowledge with policy gradient updates. We empirically validate our approach on
two OpenAI Gym tasks and two modified StarCraft 2 tasks, showing that our novel
architecture outperforms multilayer-perceptron and recurrent architectures. Our
knowledge-based framework finds superior policies compared to imitation
learning-based and prior knowledge-based approaches. Importantly, we
demonstrate that our approach can be used by untrained humans to initially
provide >80% increase in expected reward relative to baselines prior to
training (p 60% increase in expected reward after
policy optimization (p = 0.011)
Learning a Behavioral Repertoire from Demonstrations
Imitation Learning (IL) is a machine learning approach to learn a policy from a set of demonstrations. IL can be useful to kick-start learning before applying reinforcement learning (RL) but it can also be useful on its own, e.g. to learn to imitate human players in video games. Despite the success of systems that use IL and RL, how such systems can adapt in-between game rounds is a neglected area of study but an important aspect of many strategy games. In this paper, we present a new approach called Behavioral Repertoire Imitation Learning (BRIL) that learns a repertoire of behaviors from a set of demonstrations by augmenting the state-action pairs with behavioral descriptions. The outcome of this approach is a single neural network policy conditioned on a behavior description that can be precisely modulated. We apply this approach to train a policy on 7,777 human demonstrations for the build-order planning task in StarCraft II. Dimensionality reduction is applied to construct a low-dimensional behavioral space from a high-dimensional description of the army unit composition of each human replay. The results demonstrate that the learned policy can be effectively manipulated to express distinct behaviors. Additionally, by applying the UCB1 algorithm, the policy can adapt its behavior in between games to reach a performance beyond that of the traditional IL baseline approach.
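As a rough illustration of the UCB1 step mentioned in this abstract, the sketch below treats each candidate behavior description as a bandit arm and the game outcome as the reward. The standard UCB1 rule (empirical mean plus sqrt(2 ln t / n)) is used; the class name, the three arms, and the win probabilities are hypothetical and do not come from the paper.

```python
import math
import random


class UCB1:
    """Standard UCB1 bandit over a fixed set of arms."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms   # times each arm was played
        self.means = [0.0] * n_arms  # running mean reward per arm

    def select(self):
        # Play every arm once before using the confidence bound.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        total = sum(self.counts)
        scores = [
            mean + math.sqrt(2.0 * math.log(total) / n)
            for mean, n in zip(self.means, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental update of the arm's mean reward.
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


# Toy usage: three hypothetical behaviors with made-up win probabilities.
rng = random.Random(0)
win_prob = [0.2, 0.5, 0.8]
bandit = UCB1(len(win_prob))
for _ in range(500):
    arm = bandit.select()  # behavior to condition the policy on this game
    won = rng.random() < win_prob[arm]
    bandit.update(arm, 1.0 if won else 0.0)
```

Between games, the selected arm would be mapped to a point in the behavioral space and fed to the behavior-conditioned policy; that mapping is outside the scope of this sketch.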
AI and Wargaming
Recent progress in Game AI has demonstrated that given enough data from human
gameplay, or experience gained via simulations, machines can rival or surpass
the most skilled human players in classic games such as Go, or commercial
computer games such as StarCraft. We review the current state of the art
through the lens of wargaming, and ask firstly what features of wargames
distinguish them from the usual AI testbeds, and secondly which recent AI
advances are best suited to address these wargame-specific features.
TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game
StarCraft, one of the most difficult esport games with a long-standing history
of professional tournaments, has attracted generations of players and fans, as
well as intense attention in artificial intelligence research. Recently,
Google's DeepMind announced AlphaStar, a grandmaster level AI in StarCraft II.
In this paper, we introduce a new AI agent, named TStarBot-X, that is trained
under limited computation resources and can play competitively with expert
human players. TStarBot-X takes advantage of important techniques introduced in
AlphaStar, and also benefits from substantial innovations including new league
training methods, novel multi-agent roles, rule-guided policy search,
lightweight neural network architecture, and importance sampling in imitation
learning. We show that with limited computation resources, a faithful
reimplementation of AlphaStar cannot succeed and the proposed techniques are
necessary to ensure TStarBot-X's competitive performance. We reveal all
technical details that are complementary to those mentioned in AlphaStar,
showing the most sensitive parts in league training, reinforcement learning and
imitation learning that affect the performance of the agents. Most importantly,
this is an open-source study: all code and resources, including the
trained model parameters, are publicly accessible
(https://github.com/tencent-ailab/tleague_projpage). We expect this study to be
beneficial for both academic and industrial future research in solving complex
problems like StarCraft, and it might also provide a sparring partner for all
StarCraft II players and other AI agents. Comment: 26 pages
Evolving Effective Micro Behaviors for Real-Time Strategy Games
Real-Time Strategy games have become a new frontier of artificial intelligence research. Advances in real-time strategy game AI, as with chess and checkers before, will significantly advance the state of the art in AI research. This thesis investigates the use of heuristic search algorithms to generate effective micro behaviors in combat scenarios for real-time strategy games. Macro and micro management are two key aspects of real-time strategy games. While good macro helps a player collect more resources and build more units, good micro helps a player win skirmishes against equal numbers of opponent units or win even when outnumbered. In this research, we use influence maps and potential fields as a base representation to evolve micro behaviors. We first compare genetic algorithms against two types of hill climbers for generating competitive unit micro management. Second, we investigate the use of case-injected genetic algorithms to quickly and reliably generate high-quality micro behaviors. Then we compactly encode micro behaviors, including influence maps, potential fields, and reactive control, into fourteen parameters and use genetic algorithms to search for a complete micro bot, ECSLBot. We compare the performance of our ECSLBot with two state-of-the-art bots, UAlbertaBot and Nova, on several skirmish scenarios in the popular real-time strategy game StarCraft. The results show that the ECSLBot tuned by genetic algorithms outperforms UAlbertaBot and Nova in kiting efficiency, target selection, and fleeing. In addition, the same approach works to create competitive micro behaviors in another game, SeaCraft. Using parallelized genetic algorithms to evolve parameters in SeaCraft, we are able to speed up the evolutionary process from twenty-one hours to nine minutes. We believe this work provides evidence that genetic algorithms and our representation may be a viable approach to creating effective micro behaviors for winning skirmishes in real-time strategy games.
Independent Generative Adversarial Self-Imitation Learning in Cooperative Multiagent Systems
Many tasks in practice require the collaboration of multiple agents through
reinforcement learning. In general, cooperative multiagent reinforcement
learning algorithms can be classified into two paradigms: Joint Action Learners
(JALs) and Independent Learners (ILs). In many practical applications, agents
are unable to observe other agents' actions and rewards, making JALs
inapplicable. In this work, we focus on the independent learning paradigm, in which
each agent makes decisions based on its local observations only. However,
learning is challenging in independent settings because each agent, with only
a local viewpoint, perceives the world as a non-stationary environment due to
its concurrently exploring teammates. In this paper, we propose a novel framework
called Independent Generative Adversarial Self-Imitation Learning (IGASIL) to
address the coordination problems in fully cooperative multiagent environments.
To the best of our knowledge, we are the first to combine self-imitation
learning with generative adversarial imitation learning (GAIL) and apply it to
cooperative multiagent systems. In addition, we put forward a Sub-Curriculum
Experience Replay mechanism that picks out as many beneficial past experiences
as possible and accelerates the self-imitation learning process. Evaluations
conducted in the testbed of StarCraft unit micromanagement and a commonly
adopted benchmark show that our IGASIL produces state-of-the-art results and
even outperforms JALs in terms of both convergence speed and final performance. Comment: accepted as a full paper by AAMAS 201
StarCraft Micromanagement with Reinforcement Learning and Curriculum Transfer Learning
Real-time strategy games have been an important field of game artificial
intelligence in recent years. This paper presents a reinforcement learning and
curriculum transfer learning method to control multiple units in StarCraft
micromanagement. We define an efficient state representation, which breaks down
the complexity caused by the large state space in the game environment. Then a
parameter sharing multi-agent gradient-descent Sarsa(λ) (PS-MAGDS)
algorithm is proposed to train the units. The learning policy is shared among
our units to encourage cooperative behaviors. We use a neural network as a
function approximator to estimate the action-value function, and propose a
reward function to help units balance their move and attack. In addition, a
transfer learning method is used to extend our model to more difficult
scenarios, which accelerates the training process and improves the learning
performance. In small-scale scenarios, our units successfully learn to fight
and defeat the built-in AI with 100% win rates. In large-scale scenarios, the
curriculum transfer learning method is used to progressively train a group of
units, and it shows superior performance over some baseline methods in target
scenarios. With reinforcement learning and curriculum transfer learning, our
units are able to learn appropriate strategies in StarCraft micromanagement
scenarios. Comment: 12 pages, 14 figures, accepted to IEEE Transactions on Emerging
Topics in Computational Intelligence
A Study of AI Population Dynamics with Million-agent Reinforcement Learning
We conduct an empirical study on discovering the ordered collective dynamics
obtained by a population of intelligent agents, driven by million-agent
reinforcement learning. Our intention is to put intelligent agents into a
simulated natural context and verify if the principles developed in the real
world could also be used in understanding an artificially-created intelligent
population. To achieve this, we simulate a large-scale predator-prey world,
where the laws of the world are designed using only findings, or their logical
equivalents, that have been discovered in nature. We endow the agents with
intelligence based on deep reinforcement learning (DRL). In order to scale the
population size up to millions of agents, a large-scale DRL training platform with
a redesigned experience buffer is proposed. Our results show that the population
dynamics of AI agents, driven only by each agent's individual self-interest,
reveal an ordered pattern that is similar to the Lotka-Volterra model studied
in population biology. We further discover the emergent behaviors of collective
adaptations in studying how the agents' grouping behaviors will change with the
environmental resources. Both findings could be explained by the
self-organization theory in nature. Comment: Full version of the paper presented at AAMAS 2018 (International
Conference on Autonomous Agents and Multiagent Systems)
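For reference, the Lotka-Volterra model that this abstract compares against is usually written as the pair of equations below. This is the textbook form, not a formula taken from the paper; the symbols are assumptions: x and y are the prey and predator population sizes, and α, β, γ, δ are positive rate constants.

```latex
\begin{aligned}
\frac{dx}{dt} &= \alpha x - \beta x y, \\
\frac{dy}{dt} &= \delta x y - \gamma y.
\end{aligned}
```

Here αx is prey growth, βxy is loss of prey to predation, δxy is predator growth from predation, and γy is predator mortality. Solutions of this system oscillate, with predator peaks lagging prey peaks, which is the kind of ordered pattern the abstract refers to.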