Deep learning for video game playing

Author: Bontrager Philip
Justesen Niels
Risi Sebastian
Togelius Julian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

In this article, we review recent Deep Learning advances in the context of how they have been applied to play different types of video games such as first-person shooters, arcade games, and real-time strategy games. We analyze the unique requirements that different game genres pose to a deep learning system and highlight important open challenges in the context of applying these machine learning methods to video games, such as general game playing, dealing with extremely large decision spaces, and sparse rewards.

arXiv.org e-Print Archive

Towards Informed Exploration for Deep Reinforcement Learning

Author: Tang Haoran
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

In this thesis, we discuss various techniques for improving exploration for deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. Then we review how deep RL has improved upon classical RL and summarize six categories of the latest exploration methods for deep RL, in order of increasing usage of prior information. We then explore representative works in three categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration bonuses to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents to play StarCraft II. Finally, we conclude that exploration by prior knowledge is a promising research direction and suggest topics of potential impact.

eScholarship - University of California
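
The count-based idea in the abstract above (hash each state, count visits per hash code, reward rarely seen codes) can be sketched roughly as follows. This is an illustrative sketch only, not the thesis's implementation: the sign-of-random-projection hash, the `beta / sqrt(count)` bonus form, and every name here (`HashCountBonus`, `n_bits`, `beta`) are assumptions chosen for the example.

```python
import math
import random
from collections import defaultdict


class HashCountBonus:
    """Count-based exploration bonus over hashed states (illustrative sketch)."""

    def __init__(self, state_dim, n_bits=8, beta=1.0, seed=0):
        rng = random.Random(seed)
        # Assumed hash: a fixed random Gaussian projection; the sign pattern
        # of the projected state serves as the hash code.
        self.projection = [
            [rng.gauss(0.0, 1.0) for _ in range(state_dim)]
            for _ in range(n_bits)
        ]
        self.beta = beta
        self.counts = defaultdict(int)

    def hash_code(self, state):
        """Map a real-valued state vector to a tuple of n_bits binary digits."""
        return tuple(
            1 if sum(w * s for w, s in zip(row, state)) >= 0 else 0
            for row in self.projection
        )

    def bonus(self, state):
        """Record one visit and return beta / sqrt(visit count of the code)."""
        code = self.hash_code(state)
        self.counts[code] += 1
        return self.beta / math.sqrt(self.counts[code])


explorer = HashCountBonus(state_dim=3)
first = explorer.bonus([0.5, -1.0, 2.0])   # first visit to this hash code
second = explorer.bonus([0.5, -1.0, 2.0])  # same code again, smaller bonus
```

In a training loop, the returned bonus would typically be added to the environment reward, so states whose hash codes have been seen less often look more attractive to the agent.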

Neural-encoding Human Experts' Domain Knowledge to Warm Start Reinforcement Learning

Author: Gombolay Matthew
Silva Andrew
Publication date: 23/09/2020

Deep reinforcement learning has been successful in a variety of tasks, such as game playing and robotic manipulation. However, attempting to learn tabula rasa disregards the logical structure of many domains as well as the wealth of readily available knowledge from domain experts that could help "warm start" the learning process. We present a novel reinforcement learning technique that allows for intelligent initialization of a neural network's weights and architecture. Our approach permits the encoding of domain knowledge directly into a neural decision tree, and improves upon that knowledge with policy gradient updates. We empirically validate our approach on two OpenAI Gym tasks and two modified StarCraft 2 tasks, showing that our novel architecture outperforms multilayer-perceptron and recurrent architectures. Our knowledge-based framework finds superior policies compared to imitation learning-based and prior knowledge-based approaches. Importantly, we demonstrate that our approach can be used by untrained humans to initially provide >80% increase in expected reward relative to baselines prior to training (p 60% increase in expected reward after policy optimization (p = 0.011).

arXiv.org e-Print Archive

Learning a Behavioral Repertoire from Demonstrations

Author: Cabarcas Daniel
González-Duque Miguel
Justesen Niels
Mouret Jean-Baptiste
Risi Sebastian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/07/2019

Imitation Learning (IL) is a machine learning approach to learn a policy from a set of demonstrations. IL can be useful to kick-start learning before applying reinforcement learning (RL) but it can also be useful on its own, e.g. to learn to imitate human players in video games. Despite the success of systems that use IL and RL, how such systems can adapt in between game rounds is a neglected area of study but an important aspect of many strategy games. In this paper, we present a new approach called Behavioral Repertoire Imitation Learning (BRIL) that learns a repertoire of behaviors from a set of demonstrations by augmenting the state-action pairs with behavioral descriptions. The outcome of this approach is a single neural network policy conditioned on a behavior description that can be precisely modulated. We apply this approach to train a policy on 7,777 human demonstrations for the build-order planning task in StarCraft II. Dimensionality reduction is applied to construct a low-dimensional behavioral space from a high-dimensional description of the army unit composition of each human replay. The results demonstrate that the learned policy can be effectively manipulated to express distinct behaviors. Additionally, by applying the UCB1 algorithm, the policy can adapt its behavior, in between games, to reach a performance beyond that of the traditional IL baseline approach.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server
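
The abstract above mentions UCB1 for adapting behavior between games. As a rough sketch of how the standard UCB1 bandit rule could choose among a handful of discrete behaviors, consider the following. It is not the paper's code: the three-behavior setup, the fixed `HYPOTHETICAL_WIN_RATES`, and the class and variable names are assumptions made purely for illustration.

```python
import math


class UCB1:
    """Standard UCB1 bandit: try every arm once, then maximize mean + bonus."""

    def __init__(self, n_arms):
        self.counts = [0] * n_arms    # number of times each arm was played
        self.totals = [0.0] * n_arms  # summed reward per arm

    def select(self):
        # Play each arm once before applying the confidence-bound rule.
        for arm, n in enumerate(self.counts):
            if n == 0:
                return arm
        t = sum(self.counts)

        def score(arm):
            mean = self.totals[arm] / self.counts[arm]
            return mean + math.sqrt(2.0 * math.log(t) / self.counts[arm])

        return max(range(len(self.counts)), key=score)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.totals[arm] += reward


# Hypothetical setup: each arm stands for one behavior; the reward is a
# fixed, made-up win rate so that the run is deterministic.
HYPOTHETICAL_WIN_RATES = [0.2, 0.8, 0.5]
bandit = UCB1(n_arms=3)
history = []
for _ in range(200):
    arm = bandit.select()
    history.append(arm)
    bandit.update(arm, HYPOTHETICAL_WIN_RATES[arm])
```

In a real match loop the reward would be the observed game outcome (for example 1 for a win and 0 for a loss), and the selected arm would determine the behavior description fed to the policy for the next game.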

AI and Wargaming

Author: Goodman James
Lucas Simon
Risi Sebastian
Publication date: 25/09/2020

Recent progress in Game AI has demonstrated that given enough data from human gameplay, or experience gained via simulations, machines can rival or surpass the most skilled human players in classic games such as Go, or commercial computer games such as StarCraft. We review the current state-of-the-art through the lens of wargaming, and ask firstly what features of wargames distinguish them from the usual AI testbeds, and secondly which recent AI advances are best suited to address these wargame-specific features.

arXiv.org e-Print Archive

TStarBot-X: An Open-Sourced and Comprehensive Study for Efficient League Training in StarCraft II Full Game

Author: Chen Qiaobo
Fang Meng
Guo Qingwei
Han Lei
Shi Tengfei
Sun Peng
Sun Xinghai
Xiong Jiechao
Yu Hongsheng
Zhang Zhengyou
Publication date: 27/11/2020

StarCraft, one of the most difficult esport games with a long-standing history of professional tournaments, has attracted generations of players and fans, as well as intense attention in artificial intelligence research. Recently, Google's DeepMind announced AlphaStar, a grandmaster-level AI in StarCraft II. In this paper, we introduce a new AI agent, named TStarBot-X, that is trained under limited computation resources and can play competitively with expert human players. TStarBot-X takes advantage of important techniques introduced in AlphaStar, and also benefits from substantial innovations including new league training methods, novel multi-agent roles, rule-guided policy search, a lightweight neural network architecture, and importance sampling in imitation learning. We show that with limited computation resources, a faithful reimplementation of AlphaStar cannot succeed and the proposed techniques are necessary to ensure TStarBot-X's competitive performance. We reveal all technical details that are complementary to those mentioned in AlphaStar, showing the most sensitive parts in league training, reinforcement learning and imitation learning that affect the performance of the agents. Most importantly, this is an open-sourced study in which all code and resources (including the trained model parameters) are publicly accessible via https://github.com/tencent-ailab/tleague_projpage. We expect this study could be beneficial for both academic and industrial future research in solving complex problems like StarCraft, and might also provide a sparring partner for all StarCraft II players and other AI agents. Comment: 26 pages

arXiv.org e-Print Archive

Evolving Effective Micro Behaviors for Real-Time Strategy Games

Author: Liu Siming
Publication date: 03/11/2017

Real-Time Strategy games have become a new frontier of artificial intelligence research. Advances in real-time strategy game AI, like with chess and checkers before, will significantly advance the state of the art in AI research. This thesis aims to investigate using heuristic search algorithms to generate effective micro behaviors in combat scenarios for real-time strategy games. Macro and micro management are two key aspects of real-time strategy games. While good macro helps a player collect more resources and build more units, good micro helps a player win skirmishes against equal numbers of opponent units or win even when outnumbered. In this research, we use influence maps and potential fields as a basis representation to evolve micro behaviors. We first compare genetic algorithms against two types of hill climbers for generating competitive unit micro management. Second, we investigate the use of case-injected genetic algorithms to quickly and reliably generate high-quality micro behaviors. Then we compactly encode micro behaviors including influence maps, potential fields, and reactive control into fourteen parameters and use genetic algorithms to search for a complete micro bot, ECSLBot. We compare the performance of our ECSLBot with two state-of-the-art bots, UAlbertaBot and Nova, on several skirmish scenarios in the popular real-time strategy game StarCraft. The results show that the ECSLBot tuned by genetic algorithms outperforms UAlbertaBot and Nova in kiting efficiency, target selection, and fleeing. In addition, the same approach works to create competitive micro behaviors in another game, SeaCraft. Using parallelized genetic algorithms to evolve parameters in SeaCraft, we are able to speed up the evolutionary process from twenty-one hours to nine minutes. We believe this work provides evidence that genetic algorithms and our representation may be a viable approach to creating effective micro behaviors for winning skirmishes in real-time strategy games.

Independent Generative Adversarial Self-Imitation Learning in Cooperative Multiagent Systems

Author: Hao Jianye
Hao Xiaotian
Wang Weixun
Yang Yaodong
Publication date: 25/09/2019

Many tasks in practice require the collaboration of multiple agents through reinforcement learning. In general, cooperative multiagent reinforcement learning algorithms can be classified into two paradigms: Joint Action Learners (JALs) and Independent Learners (ILs). In many practical applications, agents are unable to observe other agents' actions and rewards, making JALs inapplicable. In this work, we focus on the independent learning paradigm, in which each agent makes decisions based on its local observations only. However, learning is challenging in independent settings due to the local viewpoints of all agents, which perceive the world as a non-stationary environment due to the concurrently exploring teammates. In this paper, we propose a novel framework called Independent Generative Adversarial Self-Imitation Learning (IGASIL) to address the coordination problems in fully cooperative multiagent environments. To the best of our knowledge, we are the first to combine self-imitation learning with generative adversarial imitation learning (GAIL) and apply it to cooperative multiagent systems. In addition, we put forward a Sub-Curriculum Experience Replay mechanism to pick out the past beneficial experiences as much as possible and accelerate the self-imitation learning process. Evaluations conducted in the testbed of StarCraft unit micromanagement and a commonly adopted benchmark show that our IGASIL produces state-of-the-art results and even outperforms JALs in terms of both convergence speed and final performance. Comment: accepted as a full paper by AAMAS 201

arXiv.org e-Print Archive

StarCraft Micromanagement with Reinforcement Learning and Curriculum Transfer Learning

Author: Shao Kun
Zhao Dongbin
Zhu Yuanheng
Publication date: 02/04/2018

Real-time strategy games have been an important field of game artificial intelligence in recent years. This paper presents a reinforcement learning and curriculum transfer learning method to control multiple units in StarCraft micromanagement. We define an efficient state representation, which breaks down the complexity caused by the large state space in the game environment. Then a parameter sharing multi-agent gradient-descent Sarsa(λ) (PS-MAGDS) algorithm is proposed to train the units. The learning policy is shared among our units to encourage cooperative behaviors. We use a neural network as a function approximator to estimate the action-value function, and propose a reward function to help units balance their move and attack. In addition, a transfer learning method is used to extend our model to more difficult scenarios, which accelerates the training process and improves the learning performance. In small-scale scenarios, our units successfully learn to combat and defeat the built-in AI with 100% win rates. In large-scale scenarios, a curriculum transfer learning method is used to progressively train a group of units, and shows superior performance over some baseline methods in target scenarios. With reinforcement learning and curriculum transfer learning, our units are able to learn appropriate strategies in StarCraft micromanagement scenarios. Comment: 12 pages, 14 figures, accepted to IEEE Transactions on Emerging Topics in Computational Intelligence

arXiv.org e-Print Archive
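
For readers unfamiliar with gradient-descent Sarsa(λ), which the abstract above builds on, a single update step can be sketched as below. This is a generic textbook-style sketch, not the paper's PS-MAGDS algorithm: it swaps the paper's neural network for a linear approximator `Q(s, a) = w · phi(s, a)`, and the function name, feature vectors, and hyperparameter values are all assumptions for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def sarsa_lambda_step(w, e, phi, phi_next, reward,
                      alpha=0.1, gamma=0.9, lam=0.8, terminal=False):
    """One gradient-descent Sarsa(lambda) update with accumulating traces.

    w        : weight vector of the (assumed) linear Q-function
    e        : eligibility trace vector, same length as w
    phi      : features of the current state-action pair
    phi_next : features of the next state-action pair
    Returns the updated (w, e) and the TD error delta.
    """
    q = dot(w, phi)
    q_next = 0.0 if terminal else dot(w, phi_next)
    delta = reward + gamma * q_next - q
    # For a linear Q-function the gradient with respect to w is just phi.
    e = [gamma * lam * e_i + phi_i for e_i, phi_i in zip(e, phi)]
    w = [w_i + alpha * delta * e_i for w_i, e_i in zip(w, e)]
    return w, e, delta


# Two hand-made transitions with one-hot features (hypothetical values).
w, e = [0.0, 0.0], [0.0, 0.0]
w, e, delta1 = sarsa_lambda_step(w, e, [1.0, 0.0], [0.0, 1.0], reward=1.0)
w, e, delta2 = sarsa_lambda_step(w, e, [0.0, 1.0], [1.0, 0.0], reward=0.0)
```

Under parameter sharing, as the abstract describes it, every unit would apply updates of this kind to one common weight vector rather than keeping its own.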

A Study of AI Population Dynamics with Million-agent Reinforcement Learning

Author: Bai Yiwei
Wang Jun
Wen Ying
Yang Yaodong
Yu Lantao
Yu Yong
Zhang Weinan
Publication date: 14/05/2018

We conduct an empirical study on discovering the ordered collective dynamics obtained by a population of intelligent agents, driven by million-agent reinforcement learning. Our intention is to put intelligent agents into a simulated natural context and verify whether the principles developed in the real world could also be used in understanding an artificially created intelligent population. To achieve this, we simulate a large-scale predator-prey world, where the laws of the world are designed using only the findings, or their logical equivalents, that have been discovered in nature. We endow the agents with intelligence based on deep reinforcement learning (DRL). In order to scale the population size up to millions of agents, a large-scale DRL training platform with a redesigned experience buffer is proposed. Our results show that the population dynamics of AI agents, driven only by each agent's individual self-interest, reveal an ordered pattern that is similar to the Lotka-Volterra model studied in population biology. We further discover the emergent behaviors of collective adaptations in studying how the agents' grouping behaviors will change with the environmental resources. Both findings could be explained by the self-organization theory in nature. Comment: Full version of the paper presented at AAMAS 2018 (International Conference on Autonomous Agents and Multiagent Systems)

arXiv.org e-Print Archive
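
For reference, the Lotka-Volterra model named in the abstract above is usually written as a pair of coupled differential equations. The symbols below follow the common textbook convention and are not taken from the paper: x is the prey population, y the predator population, and α, β, γ, δ are positive rate constants.

```latex
\begin{align}
  \frac{dx}{dt} &= \alpha x - \beta x y, \\
  \frac{dy}{dt} &= \delta x y - \gamma y.
\end{align}
```

Prey grow at rate α and are consumed in proportion to predator-prey encounters (βxy), while predators grow through those encounters (δxy) and die off at rate γ, which produces the characteristic out-of-phase oscillations of the two populations.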