On the Utility of Learning about Humans for Human-AI Coordination
While we would like agents that can coordinate with humans, current
algorithms such as self-play and population-based training create agents that
can coordinate with themselves. Agents that assume their partner to be optimal
or similar to them can converge to coordination protocols that fail to
understand and be understood by humans. To demonstrate this, we introduce a
simple environment that requires challenging coordination, based on the popular
game Overcooked, and learn a simple model that mimics human play. We evaluate
the performance of agents trained via self-play and population-based training.
These agents perform very well when paired with themselves, but when paired
with our human model, they are significantly worse than agents designed to play
with the human model. An experiment with a planning algorithm yields the same
conclusion, though only when the human-aware planner is given the exact human
model that it is playing with. A user study with real humans shows this pattern
as well, though less strongly. Qualitatively, we find that the gains come from
having the agent adapt to the human's gameplay. Given this result, we suggest
several approaches for designing agents that learn about humans in order to
better coordinate with them. Code is available at
https://github.com/HumanCompatibleAI/overcooked_ai.
Comment: Published at NeurIPS 2019
(http://papers.nips.cc/paper/8760-on-the-utility-of-learning-about-humans-for-human-ai-coordination
Heterogeneous Multi-agent Zero-Shot Coordination by Coevolution
Generating agents that can achieve zero-shot coordination (ZSC) with unseen
partners is a new challenge in cooperative multi-agent reinforcement learning
(MARL). Recently, some studies have made progress in ZSC by exposing the agents
to diverse partners during the training process. They usually involve self-play
when training the partners, implicitly assuming that the tasks are homogeneous.
However, many real-world tasks are heterogeneous, and hence previous methods
may be inefficient. In this paper, we study the heterogeneous ZSC problem for
the first time and propose a general method based on coevolution, which
coevolves two populations of agents and partners through three sub-processes:
pairing, updating and selection. Experimental results on various heterogeneous
tasks highlight the necessity of considering the heterogeneous setting and
demonstrate that our proposed method is a promising solution for heterogeneous
ZSC tasks.
Generating and Adapting to Diverse Ad-Hoc Cooperation Agents in Hanabi
Hanabi is a cooperative game that brings the problem of modeling other
players to the forefront. In this game, coordinated groups of players can
leverage pre-established conventions to great effect, but playing in an ad-hoc
setting requires agents to adapt to their partners' strategies with no previous
coordination. Evaluating an agent in this setting requires a diverse population
of potential partners, but so far, the behavioral diversity of agents has not
been considered in a systematic way. This paper proposes Quality Diversity
algorithms as a promising class of algorithms to generate diverse populations
for this purpose, and generates a population of diverse Hanabi agents using
MAP-Elites. We also postulate that agents can benefit from a diverse population
during training and implement a simple "meta-strategy" for adapting to an
agent's perceived behavioral niche. We show this meta-strategy can work better
than generalist strategies even outside the population it was trained with if
its partner's behavioral niche can be correctly inferred, but in practice a
partner's behavior depends on and interferes with the meta-agent's own behavior,
suggesting an avenue for future research in characterizing another agent's
behavior during gameplay.
Comment: arXiv admin note: text overlap with arXiv:1907.0384
K-level Reasoning for Zero-Shot Coordination in Hanabi
The standard problem setting in cooperative multi-agent settings is self-play
(SP), where the goal is to train a team of agents that works well together.
However, optimal SP policies commonly contain arbitrary conventions
("handshakes") and are not compatible with other, independently trained agents
or humans. This latter desideratum was recently formalized by Hu et al. (2020) as
the zero-shot coordination (ZSC) setting and partially addressed with their
Other-Play (OP) algorithm, which showed improved ZSC and human-AI performance
in the card game Hanabi. OP assumes access to the symmetries of the environment
and prevents agents from breaking these in a mutually incompatible way during
training. However, as the authors point out, discovering symmetries for a given
environment is a computationally hard problem. Instead, we show that through a
simple adaptation of k-level reasoning (KLR; Costa Gomes et al. 2006),
synchronously training all levels, we can obtain competitive ZSC and ad-hoc
teamplay performance in Hanabi, including when paired with a human-like proxy
bot. We also introduce a new method, synchronous-k-level reasoning with a best
response (SyKLRBR), which further improves performance on our synchronous KLR
by co-training a best response.
Comment: NeurIPS 2021. 15 pages. 2 figure
PECAN: Leveraging Policy Ensemble for Context-Aware Zero-Shot Human-AI Coordination
Zero-shot human-AI coordination holds the promise of collaborating with humans without human data. Prevailing methods try to train the ego agent with a population of partners via self-play. However, these methods suffer from two problems: 1) the diversity of a population with finite partners is limited, thereby limiting the capacity of the trained ego agent to collaborate with a novel human; 2) current methods only provide a common best response for every partner in the population, which may result in poor zero-shot coordination performance with a novel partner or humans. To address these issues, we first propose the policy ensemble method to increase the diversity of partners in the population, and then develop a context-aware method enabling the ego agent to analyze and identify the partner's potential policy primitives so that it can take different actions accordingly. In this way, the ego agent is able to learn more universal cooperative behaviors for collaborating with diverse partners. We conduct experiments on the Overcooked environment, and evaluate the zero-shot human-AI coordination performance of our method with both behavior-cloned human proxies and real humans. The results demonstrate that our method significantly increases the diversity of partners and enables ego agents to learn more diverse behaviors than baselines, thus achieving state-of-the-art performance in all scenarios. We also open-source a human-AI coordination study framework on Overcooked for the convenience of future studies. Code and demo videos are available at https://sites.google.com/view/pecan-overcooked
Multiparty Dynamics and Failure Modes for Machine Learning and Artificial Intelligence
An important challenge for safety in machine learning and artificial
intelligence systems is a set of related failures involving specification
gaming, reward hacking, fragility to distributional shifts, and Goodhart's or
Campbell's law. This paper presents additional failure modes for interactions
within multi-agent systems that are closely related. These multi-agent failure
modes are more complex, more problematic, and less well understood than the
single-agent case, and are also already occurring, largely unnoticed. After
motivating the discussion with examples from poker-playing artificial
intelligence (AI), the paper explains why these failure modes are in some
senses unavoidable. Following this, the paper categorizes failure modes,
provides definitions, and cites examples for each of the modes: accidental
steering, coordination failures, adversarial misalignment, input spoofing and
filtering, and goal co-option or direct hacking. The paper then discusses how
extant literature on multi-agent AI fails to address these failure modes, and
identifies work which may be useful for the mitigation of these failure modes.
Comment: 12 Pages, This version re-submitted to Big Data and Cognitive
Computing, Special Issue "Artificial Superintelligence: Coordination &
Strategy