Learning Existing Social Conventions via Observationally Augmented Self-Play
In order for artificial agents to coordinate effectively with people, they
must act consistently with existing conventions (e.g. how to navigate in
traffic, which language to speak, or how to coordinate with teammates). A
group's conventions can be viewed as a choice of equilibrium in a coordination
game. We consider the problem of an agent learning a policy for a coordination
game in a simulated environment and then using this policy when it enters an
existing group. When there are multiple possible conventions we show that
learning a policy via multi-agent reinforcement learning (MARL) is likely to
find policies which achieve high payoffs at training time but fail to
coordinate with the real group into which the agent enters. We assume access to
a small number of samples of behavior from the true convention and show that we
can augment the MARL objective to help it find policies consistent with the
real group's convention. In three environments from the literature - traffic,
communication, and team coordination - we observe that augmenting MARL with a
small amount of imitation learning greatly increases the probability that the
strategy found by MARL fits well with the existing social convention. We show
that this works even in an environment where standard training methods very
rarely find the true convention of the agent's partners.
Comment: Published in AAAI/AIES 2019 - Best Paper
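The augmented objective described above can be sketched as a weighted sum of the usual MARL loss and an imitation term: a negative log-likelihood of the small set of observed convention samples under the agent's policy. This is a minimal illustration under assumptions, not the paper's implementation; the softmax policy, the weight `lam`, and the function names are made up for the sketch.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over action logits.
    z = np.exp(logits - logits.max())
    return z / z.sum()

def augmented_loss(logits, rl_loss, observed_actions, lam=0.5):
    """Combine a (precomputed) MARL loss with an imitation term:
    the mean negative log-likelihood of actions sampled from the
    group's true convention, weighted by lam."""
    probs = softmax(logits)
    nll = -np.mean([np.log(probs[a]) for a in observed_actions])
    return rl_loss + lam * nll
```

With uniform logits over four actions, each observed action contributes log 4 to the imitation term, so the combined loss is the RL loss plus `lam * log(4)`; the imitation term pulls training toward equilibria consistent with the observed behaviour.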
Symmetry-Breaking Augmentations for Ad Hoc Teamwork
In many collaborative settings, artificial intelligence (AI) agents must be
able to adapt to new teammates that use unknown or previously unobserved
strategies. While often simple for humans, this can be challenging for AI
agents. For example, if an AI agent learns to drive alongside others (a
training set) that only drive on one side of the road, it may struggle to adapt
this experience to coordinate with drivers on the opposite side, even if their
behaviours are simply flipped along the left-right symmetry. To address this, we
introduce symmetry-breaking augmentations (SBA), which increase diversity in
the behaviour of training teammates by applying a symmetry-flipping operation.
By learning a best-response to the augmented set of teammates, our agent is
exposed to a wider range of behavioural conventions, improving performance when
deployed with novel teammates. We demonstrate this experimentally in two
settings, and show that our approach improves upon previous ad hoc teamwork
results in the challenging card game Hanabi. We also propose a general metric
for estimating symmetry-dependency amongst a given set of policies.
Comment: Currently in review for ICML 2024. 16 pages (including references and
appendix), 9 figures, 11 tables
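The symmetry-flipping operation described above can be sketched as relabelling each teammate trajectory under the symmetry: mirror the observations and swap the symmetry-paired actions, then train a best response against both the original and the flipped teammates. The action labels and encoding below are assumptions for a driving-style example, not the paper's actual representation.

```python
import numpy as np

# Hypothetical action labels for a left/right-symmetric task
# (an assumption for illustration, not the paper's encoding).
LEFT, RIGHT, STRAIGHT = 0, 1, 2
FLIP = {LEFT: RIGHT, RIGHT: LEFT, STRAIGHT: STRAIGHT}

def flip_trajectory(traj):
    """Apply the left-right symmetry to one teammate trajectory:
    mirror each observation along its last axis and swap the
    symmetry-paired actions."""
    return [(np.flip(obs, axis=-1), FLIP[act]) for obs, act in traj]

def augment(teammate_trajs):
    """SBA-style augmentation: keep the original trajectories and
    add their flipped copies, doubling behavioural diversity along
    the chosen symmetry."""
    return teammate_trajs + [flip_trajectory(t) for t in teammate_trajs]
```

A best response trained on `augment(...)` sees both conventions (e.g. drive-on-left and drive-on-right teammates), so it cannot overfit to one side of the symmetry.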
Towards Few-shot Coordination: Revisiting Ad-hoc Teamplay Challenge In the Game of Hanabi
Cooperative Multi-agent Reinforcement Learning (MARL) algorithms with
Zero-Shot Coordination (ZSC) have gained significant attention in recent years.
ZSC refers to the ability of agents to coordinate zero-shot (without additional
interaction experience) with independently trained agents. While ZSC is crucial
for cooperative MARL agents, it might not be possible for complex tasks and
changing environments. Agents also need to adapt and improve their performance
with minimal interaction with other agents. In this work, we show empirically
that state-of-the-art ZSC algorithms have poor performance when paired with
agents trained with different learning methods, and they require millions of
interaction samples to adapt to these new partners. To investigate this issue,
we formally define a framework based on the popular cooperative multi-agent game
called Hanabi to evaluate the adaptability of MARL methods. In particular, we
create a diverse set of pre-trained agents and define a new metric called
adaptation regret that measures the agent's ability to efficiently adapt and
improve its coordination performance when paired with some held-out pool of
partners on top of its ZSC performance. After evaluating several SOTA
algorithms using our framework, our experiments reveal that naive Independent
Q-Learning (IQL) agents in most cases adapt as quickly as the SOTA ZSC
algorithm Off-Belief Learning (OBL). This finding raises an interesting
research question: how can we design MARL algorithms that combine high ZSC
performance with fast adaptation to unseen partners? As a first step, we study
the role of different hyper-parameters and design choices on the adaptability
of current MARL algorithms. Our experiments show that two categories of
hyper-parameters controlling the training data diversity and optimization
process have a significant impact on the adaptability of Hanabi agents.
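One plausible minimal form of the adaptation-regret metric described above is the cumulative gap between a reference score (e.g. a best response to the held-out partner) and the agent's per-episode score while it adapts. This is a hedged sketch of the idea, not the paper's exact definition; the function name and the choice of reference are assumptions.

```python
def adaptation_regret(scores, reference):
    """Cumulative shortfall of per-episode scores relative to a
    reference score while adapting to a held-out partner.
    Lower is better; zero means instant adaptation."""
    return sum(reference - s for s in scores)
```

For example, an agent scoring 10, 15, then 20 against a partner whose best-response reference is 20 accumulates a regret of 15, while an agent that coordinates perfectly from the first episode accumulates zero.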
RE:Learning
This project aims to visualize possible future scenarios for higher education and how learning will be transformed through ubiquitous computing. The project draws on theories of learning, a brief history of higher education, elements of ubiquitous computing, and current trends in education to build a foundation
for possible changes in learning. The project generated three scenarios that depict parameters from a
morphological analysis. These scenarios take readers to 2035 and give them a creative view of alternative learning landscapes. Three fictional personas are introduced who live within each scenario. Readers are then exposed to possible curricula that encapsulate the changes and the ubiquity of computing in learning and
higher education. The aim is to view learning as a lifelong experience and the currency by which we survive
- …