The Hanabi Challenge: A New Frontier for AI Research
From the early days of computing, games have been important testbeds for
studying how well machines can do sophisticated decision making. In recent
years, machine learning has made dramatic advances with artificial agents
reaching superhuman performance in challenge domains like Go, Atari, and some
variants of poker. As with their predecessors, chess, checkers, and
backgammon, these game domains have driven research by providing sophisticated
yet well-defined challenges for artificial intelligence practitioners. We
continue this tradition by proposing the game of Hanabi as a new challenge
domain with novel problems that arise from its combination of purely
cooperative gameplay with two to five players and imperfect information. In
particular, we argue that Hanabi elevates reasoning about the beliefs and
intentions of other agents to the foreground. We believe developing novel
techniques for such theory of mind reasoning will not only be crucial for
success in Hanabi, but also in broader collaborative efforts, especially those
with human partners. To facilitate future research, we introduce the
open-source Hanabi Learning Environment, propose an experimental framework for
the research community to evaluate algorithmic advances, and assess the
performance of current state-of-the-art techniques.
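To make the proposed testbed concrete, here is a minimal random-agent interaction loop with the open-source Hanabi Learning Environment, sketched from the package's rl_env interface; the exact observation keys and call signatures are assumptions from one version of the library and may differ across releases.

```python
# Minimal sketch of an episode loop in the Hanabi Learning Environment
# (github.com/deepmind/hanabi-learning-environment). The trivial agent
# simply plays the first legal move; keys/signatures assumed per rl_env.
from hanabi_learning_environment import rl_env

env = rl_env.make(environment_name="Hanabi-Full", num_players=2)
observations = env.reset()
done = False
total_reward = 0.0
while not done:
    current = observations["current_player"]
    obs = observations["player_observations"][current]
    action = obs["legal_moves"][0]          # trivial agent: first legal move
    observations, reward, done, _ = env.step(action)
    total_reward += reward
print("episode score:", total_reward)
```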
Can bounded and self-interested agents be teammates? Application to planning in ad hoc teams
Planning for ad hoc teamwork is challenging because it involves agents collaborating without any prior coordination or communication. The focus is on principled methods for a single agent to cooperate with others. This motivates investigating the ad hoc teamwork problem in the context of self-interested decision-making frameworks. Agents engaged in individual decision making in multiagent settings must reason about other agents' actions, which may in turn require reasoning about the reasoning of others. An established approximation that operationalizes this approach is to bound the infinite nesting from below by introducing level 0 models. For the purposes of this study, individual, self-interested decision making in multiagent settings is modeled using interactive dynamic influence diagrams (I-DIDs). These are graphical models with the benefit that they naturally offer a factored representation of the problem, allowing agents to ascribe dynamic models to others and reason about them. We demonstrate that a consequence of this bounded, finitely-nested reasoning is that a self-interested agent may fail to obtain optimal team solutions when it is part of a cooperative team. We address this limitation by including models at level 0 whose solutions involve reinforcement learning. We show how the learning is integrated into planning in the context of I-DIDs. This facilitates optimal teammate behavior, and we demonstrate its applicability to ad hoc teamwork on several problem domains and configurations.
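To illustrate why bounding the nesting matters, the toy sketch below uses a two-action cooperative matrix game of our own construction (the paper itself works with I-DIDs over sequential domains): a static level-0 teammate model leads the best-responding agent to a suboptimal team solution, while a level-0 model whose solution involves learning recovers the team optimum.

```python
import numpy as np

# Toy illustration (our construction, not the paper's I-DID domains).
payoff = np.array([[10.0, 0.0],    # shared payoff[i, j]: my action i,
                   [ 8.0, 8.0]])   # teammate action j

def static_level0():
    # Conventional level-0 model: uniform over the teammate's actions.
    return np.array([0.5, 0.5])

def learned_level0(episodes=500, lr=0.2, eps=0.1, seed=0):
    # Level-0 model whose solution involves learning: simple
    # action-value updates, assuming the modeled teammate observes the
    # shared reward when I best-respond to its action.
    rng = np.random.default_rng(seed)
    q = np.zeros(2)
    for _ in range(episodes):
        j = rng.integers(2) if rng.random() < eps else int(np.argmax(q))
        i = int(np.argmax(payoff[:, j]))     # my best response to j
        q[j] += lr * (payoff[i, j] - q[j])
    policy = np.zeros(2)
    policy[int(np.argmax(q))] = 1.0
    return policy

def best_response(teammate_policy):
    return int(np.argmax(payoff @ teammate_policy))

print(best_response(static_level0()))    # -> 1: settles for the safe 8
print(best_response(learned_level0()))   # -> 0: reaches the optimum 10
```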
Team behavior in interactive dynamic influence diagrams with applications to ad hoc teams
Planning for ad hoc teamwork is challenging because it involves agents
collaborating without any prior coordination or communication. The focus is on
principled methods for a single agent to cooperate with others. This motivates
investigating the ad hoc teamwork problem in the context of individual decision
making frameworks. However, individual decision making in multiagent settings
faces the task of having to reason about other agents' actions, which in turn
involves reasoning about others. An established approximation that
operationalizes this approach is to bound the infinite nesting from below by
introducing level 0 models. We show that a consequence of the finitely-nested
modeling is that we may not obtain optimal team solutions in cooperative
settings. We address this limitation by including models at level 0 whose
solutions involve learning. We demonstrate that the learning integrated into
planning in the context of interactive dynamic influence diagrams facilitates
optimal team behavior, and is applicable to ad hoc teamwork.
Making friends on the fly: advances in ad hoc teamwork
Given the continuing improvements in design and manufacturing processes in addition to improvements in artificial intelligence, robots are being deployed in an increasing variety of environments for longer periods of time. As the number of robots grows, it is expected that they will encounter and interact with other robots. Additionally, the number of companies and research laboratories producing these robots is increasing, leading to the situation where these robots may not share a common communication or coordination protocol. While standards for coordination and communication may be created, we expect that any standards will lag behind the state-of-the-art protocols and robots will need to additionally reason intelligently about their teammates with limited information. This problem motivates the area of ad hoc teamwork in which an agent may potentially cooperate with a variety of teammates in order to achieve a shared goal. We argue that agents that effectively reason about ad hoc teamwork need to exhibit three capabilities: 1) robustness to teammate variety, 2) robustness to diverse tasks, and 3) fast adaptation. This thesis focuses on addressing all three of these challenges. In particular, this thesis introduces algorithms for quickly adapting to unknown teammates that enable agents to react to new teammates without extensive observations.
The majority of existing multiagent algorithms focus on scenarios where all agents share coordination and communication protocols. While previous research on ad hoc teamwork considers some of these three challenges, this thesis introduces a new algorithm, PLASTIC, that is the first to address all three challenges in a single algorithm. PLASTIC adapts quickly to unknown teammates by reusing knowledge it learns about previous teammates and exploiting any expert knowledge available. Given this knowledge, PLASTIC selects which previous teammates are most similar to the current ones online and uses this information to adapt to their behaviors. This thesis introduces two instantiations of PLASTIC. The first is a model-based approach, PLASTIC-Model, that builds models of previous teammates' behaviors and plans online to determine the best course of action. The second uses a policy-based approach, PLASTIC-Policy, in which it learns policies for cooperating with past teammates and selects from among these policies online. Furthermore, we introduce a new transfer learning algorithm, TwoStageTransfer, that allows transferring knowledge from many past teammates while considering how similar each teammate is to the current ones.
We theoretically analyze the computational tractability of PLASTIC-Model in a number of scenarios with unknown teammates. Additionally, we empirically evaluate PLASTIC in three domains that cover a spread of possible settings. Our evaluations show that PLASTIC can learn to communicate with unknown teammates using a limited set of messages, coordinate with externally-created teammates that do not reason about ad hoc teams, and act intelligently in domains with continuous states and actions. Furthermore, these evaluations show that TwoStageTransfer outperforms existing transfer learning algorithms and enables PLASTIC to adapt even better to new teammates. We also identify three dimensions that we argue best describe ad hoc teamwork scenarios. We hypothesize that these dimensions are useful for analyzing similarities among domains and determining which can be tackled by similar algorithms in addition to identifying avenues for future research. The work presented in this thesis represents an important step towards enabling agents to adapt to unknown teammates in the real world. PLASTIC significantly broadens the robustness of robots to their teammates and allows them to quickly adapt to new teammates by reusing previously learned knowledge.
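The thesis's code is not reproduced here, but the abstract's core online step of PLASTIC can be sketched as follows: maintain beliefs over models of previously encountered teammates, discount each model by how poorly it predicts the teammate behavior observed so far, and select the most similar prior teammate. The loss-based update rule, the models, and all names below are illustrative stand-ins rather than the thesis's implementation.

```python
import numpy as np

class TeammateModel:
    """Hypothetical model of a previously seen teammate: P(action | state)."""
    def __init__(self, name, action_probs):
        self.name = name
        self.action_probs = action_probs   # dict: state -> prob. vector

    def prob(self, state, action):
        return self.action_probs[state][action]

def update_beliefs(beliefs, models, state, observed_action, eta=0.2):
    # Discount each model by its prediction loss on the observed
    # teammate action, then renormalize (a polynomial-weights style
    # update; illustrative, not the thesis's exact rule).
    for i, m in enumerate(models):
        loss = 1.0 - m.prob(state, observed_action)
        beliefs[i] *= (1.0 - eta * loss)
    return beliefs / beliefs.sum()

models = [
    TeammateModel("passer", {"s0": np.array([0.9, 0.1])}),
    TeammateModel("shooter", {"s0": np.array([0.2, 0.8])}),
]
beliefs = np.ones(len(models)) / len(models)

for observed in [1, 1, 0, 1]:      # teammate actions observed online
    beliefs = update_beliefs(beliefs, models, "s0", observed)

print("most similar prior teammate:", models[int(np.argmax(beliefs))].name)
```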
Learning to Coordinate with Anyone
In open multi-agent environments, agents may encounter unexpected
teammates. Classical multi-agent learning approaches train agents that can
coordinate only with teammates seen during training. Recent studies have
attempted to generate diverse teammates to improve generalizable
coordination, but remain restricted by pre-defined teammate sets. In this
work, we aim to train agents with strong coordination ability by generating
teammates that fully cover the teammate policy space, so that agents can
coordinate with any teammate. Since the teammate policy space is far too
large to enumerate, we generate only dissimilar teammates that are
incompatible with the controllable agents, which greatly reduces the number
of teammates that must be trained against. However, the number of such
incompatible teammates is hard to determine beforehand. We therefore
introduce a continual multi-agent learning process, in which the agent
learns to coordinate with successive teammates until no more incompatible
teammates can be found. This idea is implemented in the proposed Macop
(Multi-agent compatible policy learning) algorithm. We conduct experiments
in 8 scenarios from 4 environments with distinct coordination patterns.
Experiments show that Macop generates training teammates with much lower
compatibility than previous methods. As a result, Macop achieves the best
overall coordination ability across all scenarios while never performing
significantly worse than the baselines, demonstrating strong generalization.
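The continual loop described above can be summarized in a runnable toy. In the sketch below, the teammate policy space is collapsed to a handful of discrete conventions and all functions are stand-ins; Macop itself generates and trains neural teammate policies, but the outer structure, alternating between finding an incompatible teammate and extending the agent to cover it until none remain, is the same.

```python
import numpy as np

K = 5                                    # tiny stand-in teammate policy space
rng = np.random.default_rng(0)
covered = set()                          # conventions the agent can handle

def find_incompatible_teammate(covered):
    # Stand-in for Macop's teammate generation: return a convention
    # the current agent cannot yet coordinate with, if any exists.
    uncovered = [c for c in range(K) if c not in covered]
    return int(rng.choice(uncovered)) if uncovered else None

while True:
    teammate = find_incompatible_teammate(covered)
    if teammate is None:                 # no incompatible teammates remain
        break
    # Continual learning step: extend the agent to cover the new
    # teammate while retaining compatibility with earlier ones.
    covered.add(teammate)

print("agent now coordinates with conventions:", sorted(covered))
```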
A General Learning Framework for Open Ad Hoc Teamwork Using Graph-based Policy Learning
Open ad hoc teamwork is the problem of training a single agent to efficiently
collaborate with an unknown group of teammates whose composition may change
over time. A variable team composition creates challenges for the agent, such
as the need to adapt to new team dynamics and to handle changing state vector
sizes. These challenges are aggravated in real-world applications
in which the controlled agent only has a partial view of the environment. In
this work, we develop a class of solutions for open ad hoc teamwork under full
and partial observability. We start by developing a solution for the fully
observable case that leverages graph neural network architectures to obtain an
optimal policy based on reinforcement learning. We then extend this solution to
partially observable scenarios by proposing different methodologies that
maintain belief estimates over the latent environment states and team
composition. These belief estimates are combined with our solution for the
fully observable case to compute an agent's optimal policy under partial
observability in open ad hoc teamwork. Empirical results demonstrate that our
solution can learn efficient policies in open ad hoc teamwork in fully and
partially observable cases. Further analysis demonstrates that our methods'
success is a result of effectively learning the effects of teammates' actions
while also inferring the inherent state of the environment under partial
observability.
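The core idea that lets a single policy cope with a changing team composition is a permutation-invariant aggregation over per-teammate embeddings. The sketch below shows only this mechanism with fixed random weights; the paper's actual models are trained graph neural networks combined with belief estimates for partial observability, and all shapes and names here are illustrative.

```python
import numpy as np

# Size-invariant policy input: embed each teammate with a shared
# encoder, pool, then map to action logits. Works for any team size.
rng = np.random.default_rng(0)
D_IN, D_EMB, N_ACT = 6, 16, 4
W_embed = rng.normal(0, 0.1, (D_IN, D_EMB))   # shared per-teammate encoder
W_pi = rng.normal(0, 0.1, (D_EMB, N_ACT))     # policy head

def act_logits(teammate_feats):
    # teammate_feats: (num_teammates, D_IN); num_teammates may change
    # from step to step as agents enter and leave the environment.
    h = np.tanh(teammate_feats @ W_embed)      # embed each teammate
    pooled = h.mean(axis=0)                    # permutation-invariant pooling
    return pooled @ W_pi

print(act_logits(rng.normal(size=(2, D_IN))).shape)  # team of 2 -> (4,)
print(act_logits(rng.normal(size=(5, D_IN))).shape)  # team of 5 -> (4,)
```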