Macro action selection with deep reinforcement learning in StarCraft
StarCraft (SC) is one of the most popular and successful Real Time Strategy
(RTS) games. In recent years, SC has also become widely accepted as a
challenging testbed for AI research because of its enormous state space,
partially observed information, multi-agent collaboration, and so on. With the
help of the annual AIIDE and CIG competitions, a growing number of SC bots have
been proposed and continuously improved. However, a large gap remains between
top-level bots and professional human players. One vital reason is that current
SC bots
mainly rely on predefined rules to select macro actions during their games.
These rules are neither scalable nor efficient enough to cope with the
enormous, only partially observed state space of the game. In this paper, we
propose a deep
reinforcement learning (DRL) framework to improve the selection of macro
actions. Our framework combines the Ape-X DQN with a Long Short-Term Memory
(LSTM) network. We use this framework to build our bot, named LastOrder. Our
evaluation, based on training against all bots from the AIIDE 2017 StarCraft AI
competition set, shows that LastOrder achieves an 83% win rate, outperforming
26 of the 28 entrants.
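The abstract ships no code, but the architecture it describes, an Ape-X-style
DQN with an LSTM over partially observed state, can be sketched. The following
is a minimal, hypothetical PyTorch illustration; the feature dimension, hidden
size, and macro-action count are assumptions, not LastOrder's actual
configuration.

```python
import torch
import torch.nn as nn

class MacroActionQNet(nn.Module):
    """Recurrent Q-network over macro actions (illustrative sizes).

    An LSTM summarises the partially observed game state over time;
    a linear head outputs one Q-value per predefined macro action.
    """
    def __init__(self, obs_dim=128, hidden_dim=256, n_macro_actions=30):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, n_macro_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim) feature vectors, not raw frames.
        x = self.encoder(obs_seq)
        out, hidden = self.lstm(x, hidden)
        return self.q_head(out), hidden  # Q-values: (batch, time, n_actions)

net = MacroActionQNet()
q_values, h = net(torch.randn(1, 8, 128))
action = q_values[0, -1].argmax().item()  # greedy macro action at latest step
```

In an Ape-X-style setup, many such actors would generate experience in
parallel and feed a shared prioritised replay buffer for a central learner.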
The Art of War: Beyond Memory-one Strategies in Population Games
We define a new strategy for population games based on techniques from
machine learning and statistical inference that is essentially uninvadable and
can successfully invade (significantly more likely than a neutral mutant)
essentially all known memory-one strategies for the prisoner's dilemma and
other population games, including ALLC (always cooperate), ALLD (always
defect), tit-for-tat (TFT), win-stay-lose-shift (WSLS), and zero determinant
(ZD) strategies, including extortionate and generous strategies. We will refer
to a player using this strategy as an "information player" and the specific
implementation as . Such players use the history of play to identify their
opponents' strategies and respond accordingly, and naturally learn to cooperate
with each other.
Comment: 16 pages, 4 figures
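The abstract's core loop, inferring the opponent's memory-one strategy from
the history of play and responding accordingly, can be sketched. Below is a
minimal, hypothetical maximum-likelihood identifier; the candidate table,
noise level, and encoding are illustrative assumptions, not the authors'
method.

```python
import numpy as np

# Candidate memory-one strategies: P(cooperate | last outcome), indexed by
# (own_prev, opp_prev) as (CC, CD, DC, DD) with 1 = cooperate. Small noise
# keeps likelihoods finite. All values are illustrative assumptions.
CANDIDATES = {
    "ALLC": [0.99, 0.99, 0.99, 0.99],
    "ALLD": [0.01, 0.01, 0.01, 0.01],
    "TFT":  [0.99, 0.01, 0.99, 0.01],
    "WSLS": [0.99, 0.01, 0.01, 0.99],
}

def identify_opponent(history):
    """Max-likelihood guess of the opponent's memory-one strategy.

    history: list of (my_move, opp_move) pairs, 1 = cooperate, 0 = defect.
    The opponent conditions on the previous round seen from *their* side,
    i.e. (their_move, my_move) -> index over (CC, CD, DC, DD).
    """
    scores = {}
    for name, probs in CANDIDATES.items():
        ll = 0.0
        for (m_prev, o_prev), (_, o_next) in zip(history, history[1:]):
            idx = 2 * (1 - o_prev) + (1 - m_prev)  # opponent's view of last round
            pc = probs[idx]
            ll += np.log(pc if o_next == 1 else 1 - pc)
        scores[name] = ll
    return max(scores, key=scores.get)

# A tit-for-tat opponent echoes our previous move:
history = [(1, 1), (0, 1), (1, 0), (1, 1), (0, 1), (1, 0)]
print(identify_opponent(history))  # -> "TFT"
```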
Infinite majesty: disabled and athletic métis in David Foster Wallace’s tennis writing
As John Jeremiah Sullivan remarks in his introduction to String Theory, a collection of David Foster Wallace’s essays on tennis, tennis “may be [Wallace’s] most consistent theme at the surface level.” As a onetime elite junior player himself, Wallace reflects on and writes from his own involvement in the sport, with the conditioning, strategy, and body-mind training that go into it. In other essays in String Theory, Wallace reaches beyond his personal playing experience to observe professional players and their incredible grace, and in Infinite Jest he creates his own tennis-playing students. Throughout these fictional and nonfictional accounts, he conceptualizes what such eminent athleticism entails. This paper will show that celebrated athleticism in Wallace’s work exhibits an embodimental métis, or an acute, crafty body-mind knowledge of its movement through space. Beyond only characterizing athletic movement, however, this paper argues that the same concept of métis extends to people with disabilities, including characters with disabilities in Infinite Jest. The same hyperawareness of corporeality, versatile methods of adjusting to oppositional contexts, and extraordinary complexity are shared by both groups. Using rhetorical scholarship on métis and disability theories of embodiment and social representation, this paper will draw parallels between the moving body-minds of athletic and disabled bodies and trace the implications of this analogy for Wallace’s work and disability studies.
Learning with Opponent-Learning Awareness
Multi-agent settings are quickly gathering importance in machine learning.
This includes a plethora of recent work on deep multi-agent reinforcement
learning, but can also be extended to hierarchical RL, generative adversarial
networks, and decentralised optimisation. In all these settings the presence of
multiple learning agents renders the training problem non-stationary and often
leads to unstable training or undesired final results. We present Learning with
Opponent-Learning Awareness (LOLA), a method in which each agent shapes the
anticipated learning of the other agents in the environment. The LOLA learning
rule includes a term that accounts for the impact of one agent's policy on the
anticipated parameter update of the other agents. Results show that the
encounter of two LOLA agents leads to the emergence of tit-for-tat and
therefore cooperation in the iterated prisoners' dilemma, while independent
learning does not. In this domain, LOLA also receives higher payouts compared
to a naive learner, and is robust against exploitation by higher-order
gradient-based methods. Applied to repeated matching pennies, LOLA agents
converge to the Nash equilibrium. In a round-robin tournament we show that LOLA
agents successfully shape the learning of a range of multi-agent learning
algorithms from the literature, resulting in the highest average returns on the
IPD. We also show that the LOLA update rule can be efficiently calculated using
an extension of the policy gradient estimator, making the method suitable for
model-free RL. The method thus scales to large parameter and input spaces and
nonlinear function approximators. We apply LOLA to a grid world task with an
embedded social dilemma using recurrent policies and opponent modelling. By
explicitly considering the learning of the other agent, LOLA agents learn to
cooperate out of self-interest. The code is available at github.com/alshedivat/lola.
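The matching-pennies claim above can be reproduced in closed form. Below is a
minimal sketch, not the authors' implementation: a first-order LOLA update for
two sigmoid-parameterised players, with illustrative learning rates, where
each gradient step adds a correction term for the opponent's anticipated
update.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lola_matching_pennies(steps=2000, lr=0.1, opp_lr=1.0):
    """First-order LOLA on matching pennies (illustrative, scalar params).

    V1(p, q) = 4pq - 2p - 2q + 1 is player 1's expected payoff when the
    players play heads with probabilities p and q; V2 = -V1 (zero-sum).
    """
    th1, th2 = 0.5, -0.5                       # arbitrary non-Nash start
    for _ in range(steps):
        p, q = sigmoid(th1), sigmoid(th2)
        dp, dq = p * (1 - p), q * (1 - q)      # sigmoid derivatives
        dV1_d1 = (4 * q - 2) * dp              # dV1/dtheta1
        dV1_d2 = (4 * p - 2) * dq              # dV1/dtheta2
        dV2_d1, dV2_d2 = -dV1_d1, -dV1_d2      # zero-sum game
        d2V2_d12 = -4 * dp * dq                # d^2 V2 / dtheta1 dtheta2
        d2V1_d12 = 4 * dp * dq                 # d^2 V1 / dtheta1 dtheta2
        # LOLA: differentiate through the opponent's anticipated update.
        g1 = dV1_d1 + opp_lr * dV1_d2 * d2V2_d12
        g2 = dV2_d2 + opp_lr * dV2_d1 * d2V1_d12
        th1 += lr * g1
        th2 += lr * g2
    return sigmoid(th1), sigmoid(th2)

print(lola_matching_pennies())  # both probabilities approach 0.5 (Nash)
```

Dropping the correction terms recovers naive independent learning, whose
dynamics cycle around the equilibrium instead of converging to it.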
Using a theory of mind to find best responses to memory-one strategies
Memory-one strategies are a set of Iterated Prisoner's Dilemma strategies
that have been praised for their mathematical tractability and performance
against single opponents. This manuscript investigates best-response memory-one
strategies with a theory of mind for their opponents. The results add to the
literature that has shown that extortionate play is not always optimal by
showing that optimal play is often not extortionate. They also provide evidence
that memory-one strategies suffer from their limited memory in multi-agent
interactions and can be outperformed by optimised strategies with longer
memory. We have developed a theory that has allowed us to explore the entire
space of memory-one strategies. The framework presented is suitable for
studying memory-one strategies in the Prisoner's Dilemma, but also in
evolutionary processes such as the Moran process. Furthermore, results on the
stability of defection in populations of memory-one strategies are also
obtained.
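The entire space of memory-one strategies is tractable because a pair of such
strategies induces a four-state Markov chain over joint outcomes (the standard
Press-Dyson construction). A minimal sketch of that long-run payoff
computation follows; the strategy vectors, noise level, and payoff values are
conventional choices, not the paper's exact setup.

```python
import numpy as np

# Standard Prisoner's Dilemma payoffs to player 1 in states (CC, CD, DC, DD).
R, S, T, P = 3, 0, 5, 1

def stationary_payoffs(p, q):
    """Long-run payoffs when two memory-one strategies meet.

    p, q: probabilities of cooperating after outcomes (CC, CD, DC, DD),
    each seen from that player's own perspective. Assumes the chain is
    ergodic (probabilities strictly inside (0, 1)).
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    swap = [0, 2, 1, 3]              # CD and DC swap roles for player 2
    q2 = q[swap]
    # 4x4 Markov chain over joint outcomes.
    M = np.array([[pi * qi, pi * (1 - qi), (1 - pi) * qi, (1 - pi) * (1 - qi)]
                  for pi, qi in zip(p, q2)])
    # Stationary distribution: left eigenvector of M for eigenvalue 1.
    w, v = np.linalg.eig(M.T)
    stat = np.real(v[:, np.argmin(np.abs(w - 1))])
    stat /= stat.sum()
    return stat @ np.array([R, S, T, P]), stat @ np.array([R, T, S, P])

# Example: noisy tit-for-tat against noisy win-stay-lose-shift.
tft  = [0.99, 0.01, 0.99, 0.01]
wsls = [0.99, 0.01, 0.01, 0.99]
print(stationary_payoffs(tft, wsls))
```

A best-response search then reduces to optimising the first returned payoff
over the four entries of p against a fixed (or inferred) opponent q.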
ViZDoom Competitions: Playing Doom from Pixels
This paper presents the first two editions of Visual Doom AI Competition,
held in 2016 and 2017. The challenge was to create bots that compete in a
multi-player deathmatch in a first-person shooter (FPS) game, Doom. The bots
had to make their decisions based solely on visual information, i.e., a raw
screen buffer. To play well, the bots needed to understand their surroundings,
navigate, explore, and handle the opponents at the same time. These aspects,
together with the competitive multi-agent aspect of the game, make the
competition a unique platform for evaluating state-of-the-art reinforcement
learning algorithms. The paper discusses the rules, solutions, results, and
statistics that give insight into the agents' behaviors. Best-performing agents
are described in more detail. The results of the competition lead to the
conclusion that, although reinforcement learning can produce capable Doom
bots, they are not yet able to successfully compete against humans in this
game. The paper also revisits the ViZDoom environment, a flexible,
easy-to-use, and efficient 3D platform for research on vision-based
reinforcement learning, based on the well-recognized first-person perspective
game Doom.
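For readers who want to try the platform, the basic interaction loop of
ViZDoom's Python API looks roughly like the sketch below; the config path and
the three-button action set follow the bundled basic scenario and may differ
per installation.

```python
import random
import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("scenarios/basic.cfg")  # ships with ViZDoom; path may vary
game.init()

# One-hot actions over the buttons declared in the config
# (basic.cfg: MOVE_LEFT, MOVE_RIGHT, ATTACK).
actions = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

for _ in range(3):
    game.new_episode()
    while not game.is_episode_finished():
        state = game.get_state()
        frame = state.screen_buffer            # raw pixels the agent sees
        reward = game.make_action(random.choice(actions))
    print("episode reward:", game.get_total_reward())
game.close()
```

A learning agent would replace the random choice with a policy over the raw
screen buffer, which is exactly the constraint the competition imposed.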
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Many real-world applications can be described as large-scale games of
imperfect information. To deal with these challenging domains, prior work has
focused on computing Nash equilibria in a handcrafted abstraction of the
domain. In this paper we introduce the first scalable end-to-end approach to
learning approximate Nash equilibria without prior domain knowledge. Our method
combines fictitious self-play with deep reinforcement learning. When applied to
Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium,
whereas common reinforcement learning methods diverged. In Limit Texas Hold'em,
a poker game of real-world scale, NFSP learnt a strategy that approached the
performance of state-of-the-art, superhuman algorithms based on significant
domain expertise.
Comment: updated version, incorporating conference feedback
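NFSP's key moving part is how each agent mixes its two policies when acting.
A skeletal sketch of that selection logic is given below; the class,
parameter values, and memory handling are illustrative assumptions, with the
actual networks and training loops omitted.

```python
import random

class NFSPAgent:
    """Skeletal NFSP action selection (illustrative; networks omitted).

    q_policy:   greedy policy from a DQN-style best-response network.
    avg_policy: supervised network imitating the agent's own past
                best-response behaviour (the average strategy).
    eta:        anticipatory parameter mixing the two modes.
    """
    def __init__(self, q_policy, avg_policy, eta=0.1, epsilon=0.06):
        self.q_policy, self.avg_policy = q_policy, avg_policy
        self.eta, self.epsilon = eta, epsilon
        self.sl_memory = []              # reservoir of (state, action) pairs

    def act(self, state, legal_actions):
        if random.random() < self.eta:
            # Best-response mode: epsilon-greedy on Q-values ...
            if random.random() < self.epsilon:
                action = random.choice(legal_actions)
            else:
                action = self.q_policy(state, legal_actions)
            # ... recording the choice so the average policy can imitate it.
            self.sl_memory.append((state, action))
        else:
            # Average-policy mode: play the learned average strategy.
            action = self.avg_policy(state, legal_actions)
        return action
```

The fictitious-play intuition is that the average policy converges toward an
approximate Nash strategy while the Q-network keeps tracking a best response
to it.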