Macro action selection with deep reinforcement learning in StarCraft
StarCraft (SC) is one of the most popular and successful Real Time Strategy
(RTS) games. In recent years, SC has also become widely accepted as a
challenging testbed for AI research because of its enormous state space,
partially observed information, multi-agent collaboration, and so on. With the
help of the annual AIIDE and CIG competitions, a growing number of SC bots have
been proposed and continuously improved. However, a large gap remains between
top-level bots and professional human players. One vital reason is that current
SC bots
mainly rely on predefined rules to select macro actions during their games.
These rules are neither scalable nor efficient enough to cope with the
enormous, only partially observed state space of the game. In this paper, we
propose a deep
reinforcement learning (DRL) framework to improve the selection of macro
actions. Our framework combines the Ape-X DQN with a Long Short-Term Memory
(LSTM) network. We use this framework to build our bot, named LastOrder. Our
evaluation, based on training against all bots from the AIIDE 2017 StarCraft AI
competition set, shows that LastOrder achieves an 83% win rate, outperforming
26 of the 28 entrants.
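The abstract ships no code, but the architecture it describes, an Ape-X-style
DQN with an LSTM over partially observed state, can be sketched. The following
is a minimal, hypothetical PyTorch illustration; the feature dimension, hidden
size, and macro-action count are assumptions, not LastOrder's actual
configuration.

```python
import torch
import torch.nn as nn

class MacroActionQNet(nn.Module):
    """Recurrent Q-network over macro actions (illustrative sizes).

    An LSTM summarises the partially observed game state over time;
    a linear head outputs one Q-value per predefined macro action.
    """
    def __init__(self, obs_dim=128, hidden_dim=256, n_macro_actions=30):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, n_macro_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim) feature vectors, not raw frames.
        x = self.encoder(obs_seq)
        out, hidden = self.lstm(x, hidden)
        return self.q_head(out), hidden  # Q-values: (batch, time, n_actions)

net = MacroActionQNet()
q_values, h = net(torch.randn(1, 8, 128))
action = q_values[0, -1].argmax().item()  # greedy macro action at latest step
```

In an Ape-X-style setup, many such actors would generate experience in
parallel and feed a shared prioritised replay buffer for a central learner.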
The Art of War: Beyond Memory-one Strategies in Population Games
We define a new strategy for population games based on techniques from
machine learning and statistical inference that is essentially uninvadable and
can successfully invade (significantly more likely than a neutral mutant)
essentially all known memory-one strategies for the prisoner's dilemma and
other population games, including ALLC (always cooperate), ALLD (always
defect), tit-for-tat (TFT), win-stay-lose-shift (WSLS), and zero determinant
(ZD) strategies, including extortionate and generous strategies. We will refer
to a player using this strategy as an "information player" and the specific
implementation as . Such players use the history of play to identify their
opponents' strategies and respond accordingly, and naturally learn to cooperate
with each other.
Comment: 16 pages, 4 figures
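The abstract's core loop, inferring the opponent's memory-one strategy from
the history of play and responding accordingly, can be sketched. Below is a
minimal, hypothetical maximum-likelihood identifier; the candidate table,
noise level, and encoding are illustrative assumptions, not the authors'
method.

```python
import numpy as np

# Candidate memory-one strategies: P(cooperate | last outcome), indexed by
# (own_prev, opp_prev) as (CC, CD, DC, DD) with 1 = cooperate. Small noise
# keeps likelihoods finite. All values are illustrative assumptions.
CANDIDATES = {
    "ALLC": [0.99, 0.99, 0.99, 0.99],
    "ALLD": [0.01, 0.01, 0.01, 0.01],
    "TFT":  [0.99, 0.01, 0.99, 0.01],
    "WSLS": [0.99, 0.01, 0.01, 0.99],
}

def identify_opponent(history):
    """Max-likelihood guess of the opponent's memory-one strategy.

    history: list of (my_move, opp_move) pairs, 1 = cooperate, 0 = defect.
    The opponent conditions on the previous round seen from *their* side,
    i.e. (their_move, my_move) -> index over (CC, CD, DC, DD).
    """
    scores = {}
    for name, probs in CANDIDATES.items():
        ll = 0.0
        for (m_prev, o_prev), (_, o_next) in zip(history, history[1:]):
            idx = 2 * (1 - o_prev) + (1 - m_prev)  # opponent's view of last round
            pc = probs[idx]
            ll += np.log(pc if o_next == 1 else 1 - pc)
        scores[name] = ll
    return max(scores, key=scores.get)

# A tit-for-tat opponent echoes our previous move:
history = [(1, 1), (0, 1), (1, 0), (1, 1), (0, 1), (1, 0)]
print(identify_opponent(history))  # -> "TFT"
```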
Infinite majesty: disabled and athletic métis in David Foster Wallace’s tennis writing
As John Jeremiah Sullivan remarks in his introduction to String Theory, a collection of David Foster Wallace’s essays on tennis, tennis “may be [Wallace’s] most consistent theme at the surface level.” As a onetime elite junior player himself, Wallace reflects on and writes from his own involvement in the sport, with the conditioning, strategy, and body-mind training that go into it. In other essays in String Theory, Wallace reaches beyond his personal playing experience to observe professional players and their incredible grace, and in Infinite Jest he creates his own tennis-playing students. Throughout these fictional and nonfictional accounts, he conceptualizes what such eminent athleticism entails. This paper will show that celebrated athleticism in Wallace’s work exhibits an embodimental métis, or an acute, crafty body-mind knowledge of its movement through space. Beyond only characterizing athletic movement, however, this paper argues that the same concept of métis extends to people with disabilities, including characters with disabilities in Infinite Jest. The same hyperawareness of corporeality, versatile methods of adjusting to oppositional contexts, and extraordinary complexity are shared by both groups. Using rhetorical scholarship on métis and disability theories of embodiment and social representation, this paper will draw parallels between the moving body-minds of athletic and disabled bodies and trace the implications of this analogy for Wallace’s work and disability studies.
Learning with Opponent-Learning Awareness
Multi-agent settings are quickly gathering importance in machine learning.
This includes a plethora of recent work on deep multi-agent reinforcement
learning, but can also be extended to hierarchical RL, generative adversarial
networks, and decentralised optimisation. In all these settings the presence of
multiple learning agents renders the training problem non-stationary and often
leads to unstable training or undesired final results. We present Learning with
Opponent-Learning Awareness (LOLA), a method in which each agent shapes the
anticipated learning of the other agents in the environment. The LOLA learning
rule includes a term that accounts for the impact of one agent's policy on the
anticipated parameter update of the other agents. Results show that the
encounter of two LOLA agents leads to the emergence of tit-for-tat and
therefore cooperation in the iterated prisoners' dilemma, while independent
learning does not. In this domain, LOLA also receives higher payouts compared
to a naive learner, and is robust against exploitation by higher-order
gradient-based methods. Applied to repeated matching pennies, LOLA agents
converge to the Nash equilibrium. In a round-robin tournament we show that LOLA
agents successfully shape the learning of a range of multi-agent learning
algorithms from the literature, resulting in the highest average returns on the
IPD. We also show that the LOLA update rule can be efficiently calculated using
an extension of the policy gradient estimator, making the method suitable for
model-free RL. The method thus scales to large parameter and input spaces and
nonlinear function approximators. We apply LOLA to a grid world task with an
embedded social dilemma using recurrent policies and opponent modelling. By
explicitly considering the learning of the other agent, LOLA agents learn to
cooperate out of self-interest. The code is available at github.com/alshedivat/lola.
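The matching-pennies claim above can be reproduced in closed form. Below is a
minimal sketch, not the authors' implementation: a first-order LOLA update for
two sigmoid-parameterised players, with illustrative learning rates, where
each gradient step adds a correction term for the opponent's anticipated
update.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lola_matching_pennies(steps=2000, lr=0.1, opp_lr=1.0):
    """First-order LOLA on matching pennies (illustrative, scalar params).

    V1(p, q) = 4pq - 2p - 2q + 1 is player 1's expected payoff when the
    players play heads with probabilities p and q; V2 = -V1 (zero-sum).
    """
    th1, th2 = 0.5, -0.5                       # arbitrary non-Nash start
    for _ in range(steps):
        p, q = sigmoid(th1), sigmoid(th2)
        dp, dq = p * (1 - p), q * (1 - q)      # sigmoid derivatives
        dV1_d1 = (4 * q - 2) * dp              # dV1/dtheta1
        dV1_d2 = (4 * p - 2) * dq              # dV1/dtheta2
        dV2_d1, dV2_d2 = -dV1_d1, -dV1_d2      # zero-sum game
        d2V2_d12 = -4 * dp * dq                # d^2 V2 / dtheta1 dtheta2
        d2V1_d12 = 4 * dp * dq                 # d^2 V1 / dtheta1 dtheta2
        # LOLA: differentiate through the opponent's anticipated update.
        g1 = dV1_d1 + opp_lr * dV1_d2 * d2V2_d12
        g2 = dV2_d2 + opp_lr * dV2_d1 * d2V1_d12
        th1 += lr * g1
        th2 += lr * g2
    return sigmoid(th1), sigmoid(th2)

print(lola_matching_pennies())  # both probabilities approach 0.5 (Nash)
```

Dropping the correction terms recovers naive independent learning, whose
dynamics cycle around the equilibrium instead of converging to it.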
Using a theory of mind to find best responses to memory-one strategies
Memory-one strategies are a set of Iterated Prisoner's Dilemma strategies
that have been praised for their mathematical tractability and performance
against single opponents. This manuscript investigates best-response memory-one
strategies with a theory of mind for their opponents. The results add to the
literature that has shown that extortionate play is not always optimal by
showing that optimal play is often not extortionate. They also provide evidence
that memory-one strategies suffer from their limited memory in multi-agent
interactions and can be outperformed by optimised strategies with longer
memory. We have developed a theory that has allowed us to explore the entire
space of memory-one strategies. The framework presented is suitable for
studying memory-one strategies in the Prisoner's Dilemma, but also in
evolutionary processes such as the Moran process. Furthermore, results on the
stability of defection in populations of memory-one strategies are also
obtained.
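The entire space of memory-one strategies is tractable because a pair of such
strategies induces a four-state Markov chain over joint outcomes (the standard
Press-Dyson construction). A minimal sketch of that long-run payoff
computation follows; the strategy vectors, noise level, and payoff values are
conventional choices, not the paper's exact setup.

```python
import numpy as np

# Standard Prisoner's Dilemma payoffs to player 1 in states (CC, CD, DC, DD).
R, S, T, P = 3, 0, 5, 1

def stationary_payoffs(p, q):
    """Long-run payoffs when two memory-one strategies meet.

    p, q: probabilities of cooperating after outcomes (CC, CD, DC, DD),
    each seen from that player's own perspective. Assumes the chain is
    ergodic (probabilities strictly inside (0, 1)).
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    swap = [0, 2, 1, 3]              # CD and DC swap roles for player 2
    q2 = q[swap]
    # 4x4 Markov chain over joint outcomes.
    M = np.array([[pi * qi, pi * (1 - qi), (1 - pi) * qi, (1 - pi) * (1 - qi)]
                  for pi, qi in zip(p, q2)])
    # Stationary distribution: left eigenvector of M for eigenvalue 1.
    w, v = np.linalg.eig(M.T)
    stat = np.real(v[:, np.argmin(np.abs(w - 1))])
    stat /= stat.sum()
    return stat @ np.array([R, S, T, P]), stat @ np.array([R, T, S, P])

# Example: noisy tit-for-tat against noisy win-stay-lose-shift.
tft  = [0.99, 0.01, 0.99, 0.01]
wsls = [0.99, 0.01, 0.01, 0.99]
print(stationary_payoffs(tft, wsls))
```

A best-response search then reduces to optimising the first returned payoff
over the four entries of p against a fixed (or inferred) opponent q.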
ViZDoom Competitions: Playing Doom from Pixels
This paper presents the first two editions of Visual Doom AI Competition,
held in 2016 and 2017. The challenge was to create bots that compete in a
multi-player deathmatch in a first-person shooter (FPS) game, Doom. The bots
had to make their decisions based solely on visual information, i.e., a raw
screen buffer. To play well, the bots needed to understand their surroundings,
navigate, explore, and handle the opponents at the same time. These aspects,
together with the competitive multi-agent aspect of the game, make the
competition a unique platform for evaluating state-of-the-art reinforcement
learning algorithms. The paper discusses the rules, solutions, results, and
statistics that give insight into the agents' behaviors. Best-performing agents
are described in more detail. The results of the competition lead to the
conclusion that, although reinforcement learning can produce capable Doom
bots, they are not yet able to successfully compete against humans in this
game. The paper also revisits the ViZDoom environment, a flexible,
easy-to-use, and efficient 3D platform for research on vision-based
reinforcement learning, based on the well-recognized first-person perspective
game Doom.
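For readers who want to try the platform, the basic interaction loop of
ViZDoom's Python API looks roughly like the sketch below; the config path and
the three-button action set follow the bundled basic scenario and may differ
per installation.

```python
import random
import vizdoom as vzd

game = vzd.DoomGame()
game.load_config("scenarios/basic.cfg")  # ships with ViZDoom; path may vary
game.init()

# One-hot actions over the buttons declared in the config
# (basic.cfg: MOVE_LEFT, MOVE_RIGHT, ATTACK).
actions = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

for _ in range(3):
    game.new_episode()
    while not game.is_episode_finished():
        state = game.get_state()
        frame = state.screen_buffer            # raw pixels the agent sees
        reward = game.make_action(random.choice(actions))
    print("episode reward:", game.get_total_reward())
game.close()
```

A learning agent would replace the random choice with a policy over the raw
screen buffer, which is exactly the constraint the competition imposed.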
Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Many real-world applications can be described as large-scale games of
imperfect information. To deal with these challenging domains, prior work has
focused on computing Nash equilibria in a handcrafted abstraction of the
domain. In this paper we introduce the first scalable end-to-end approach to
learning approximate Nash equilibria without prior domain knowledge. Our method
combines fictitious self-play with deep reinforcement learning. When applied to
Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium,
whereas common reinforcement learning methods diverged. In Limit Texas Hold'em,
a poker game of real-world scale, NFSP learnt a strategy that approached the
performance of state-of-the-art, superhuman algorithms based on significant
domain expertise.
Comment: updated version, incorporating conference feedback
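NFSP's key moving part is how each agent mixes its two policies when acting.
A skeletal sketch of that selection logic is given below; the class,
parameter values, and memory handling are illustrative assumptions, with the
actual networks and training loops omitted.

```python
import random

class NFSPAgent:
    """Skeletal NFSP action selection (illustrative; networks omitted).

    q_policy:   greedy policy from a DQN-style best-response network.
    avg_policy: supervised network imitating the agent's own past
                best-response behaviour (the average strategy).
    eta:        anticipatory parameter mixing the two modes.
    """
    def __init__(self, q_policy, avg_policy, eta=0.1, epsilon=0.06):
        self.q_policy, self.avg_policy = q_policy, avg_policy
        self.eta, self.epsilon = eta, epsilon
        self.sl_memory = []              # reservoir of (state, action) pairs

    def act(self, state, legal_actions):
        if random.random() < self.eta:
            # Best-response mode: epsilon-greedy on Q-values ...
            if random.random() < self.epsilon:
                action = random.choice(legal_actions)
            else:
                action = self.q_policy(state, legal_actions)
            # ... recording the choice so the average policy can imitate it.
            self.sl_memory.append((state, action))
        else:
            # Average-policy mode: play the learned average strategy.
            action = self.avg_policy(state, legal_actions)
        return action
```

The fictitious-play intuition is that the average policy converges toward an
approximate Nash strategy while the Q-network keeps tracking a best response
to it.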