Making friends on the fly: advances in ad hoc teamwork
Given the continuing improvements in design and manufacturing processes in addition to improvements in artificial intelligence, robots are being deployed in an increasing variety of environments for longer periods of time. As the number of robots grows, it is expected that they will encounter and interact with other robots. Additionally, the number of companies and research laboratories producing these robots is increasing, leading to the situation where these robots may not share a common communication or coordination protocol. While standards for coordination and communication may be created, we expect that any standards will lag behind the state-of-the-art protocols and robots will need to additionally reason intelligently about their teammates with limited information. This problem motivates the area of ad hoc teamwork in which an agent may potentially cooperate with a variety of teammates in order to achieve a shared goal. We argue that agents that effectively reason about ad hoc teamwork need to exhibit three capabilities: 1) robustness to teammate variety, 2) robustness to diverse tasks, and 3) fast adaptation. This thesis focuses on addressing all three of these challenges. In particular, this thesis introduces algorithms for quickly adapting to unknown teammates that enable agents to react to new teammates without extensive observations.
The majority of existing multiagent algorithms focus on scenarios where all agents share coordination and communication protocols. While previous research on ad hoc teamwork considers some of these three challenges, this thesis introduces a new algorithm, PLASTIC, that is the first to address all three challenges in a single algorithm. PLASTIC adapts quickly to unknown teammates by reusing knowledge it learns about previous teammates and exploiting any expert knowledge available. Given this knowledge, PLASTIC selects which previous teammates are most similar to the current ones online and uses this information to adapt to their behaviors. This thesis introduces two instantiations of PLASTIC. The first is a model-based approach, PLASTIC-Model, that builds models of previous teammates' behaviors and plans online to determine the best course of action. The second uses a policy-based approach, PLASTIC-Policy, in which it learns policies for cooperating with past teammates and selects from among these policies online. Furthermore, we introduce a new transfer learning algorithm, TwoStageTransfer, that allows transferring knowledge from many past teammates while considering how similar each teammate is to the current ones.
We theoretically analyze the computational tractability of PLASTIC-Model in a number of scenarios with unknown teammates. Additionally, we empirically evaluate PLASTIC in three domains that cover a spread of possible settings. Our evaluations show that PLASTIC can learn to communicate with unknown teammates using a limited set of messages, coordinate with externally-created teammates that do not reason about ad hoc teams, and act intelligently in domains with continuous states and actions. Furthermore, these evaluations show that TwoStageTransfer outperforms existing transfer learning algorithms and enables PLASTIC to adapt even better to new teammates. We also identify three dimensions that we argue best describe ad hoc teamwork scenarios. We hypothesize that these dimensions are useful for analyzing similarities among domains and determining which can be tackled by similar algorithms, in addition to identifying avenues for future research. The work presented in this thesis represents an important step towards enabling agents to adapt to unknown teammates in the real world. PLASTIC significantly broadens the robustness of robots to their teammates and allows them to quickly adapt to new teammates by reusing previously learned knowledge.
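PLASTIC's core mechanism, as the abstract describes it, is selecting on-line which previously learned teammate model best matches the current teammates. A minimal sketch of that idea in Python, assuming a loss-proportional (polynomial-weights-style) belief update over hand-written candidate models; the model names, the learning rate, and the toy domain are all illustrative, not the thesis's implementation:

```python
from typing import Callable, Dict

# Hypothetical sketch of PLASTIC-style belief updating over prior teammate
# models. Each model maps an observed (state, action) pair to the
# probability it assigns that action; beliefs are down-weighted in
# proportion to each model's prediction loss, and the agent then plans
# against the highest-belief model.

def update_beliefs(beliefs: Dict[str, float],
                   models: Dict[str, Callable[[str, str], float]],
                   state: str, action: str, eta: float = 0.2) -> Dict[str, float]:
    """Polynomial-weights-style update: penalize models by prediction loss."""
    new = {}
    for name, belief in beliefs.items():
        loss = 1.0 - models[name](state, action)   # loss in [0, 1]
        new[name] = belief * (1.0 - eta * loss)
    total = sum(new.values())
    return {name: b / total for name, b in new.items()}

# Toy example: two candidate teammate models in a one-state world.
models = {
    "aggressive": lambda s, a: 0.9 if a == "push" else 0.1,
    "cautious":   lambda s, a: 0.2 if a == "push" else 0.8,
}
beliefs = {"aggressive": 0.5, "cautious": 0.5}
for _ in range(5):                      # teammate repeatedly plays "push"
    beliefs = update_beliefs(beliefs, models, "s0", "push")
best = max(beliefs, key=beliefs.get)    # plan against the most likely model
print(best)
```

Under an update of this shape, models that consistently predict the teammate's observed actions accumulate belief, so the agent can commit to a likely prior teammate after only a handful of observations.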
On-line Estimators for Ad-hoc Task Allocation: Extended Abstract
It is essential for agents to work together with previously unknown team-mates to accomplish common missions, a challenge known as ad-hoc teamwork. In these systems, an agent estimates the algorithm and parameters of others in an on-line manner, in order to decide its own actions for effective teamwork. Meanwhile, agents often must coordinate in a decentralised fashion to complete tasks that are distributed across an environment (e.g., in foraging, demining, rescue, or fire control), where each member autonomously chooses which task to perform. In such settings, better estimates of team-mates' behaviour translate directly into better team performance. Hence, we present On-line Estimators for Ad-hoc Task Allocation, a novel algorithm for estimating team-mates' types and parameters in decentralised task allocation. In experiments in the level-based foraging domain, we obtain lower error in parameter and type estimation than previous approaches, and significantly better performance in completing all tasks.
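As a rough illustration of the type-and-parameter estimation idea above (not the paper's actual estimator), one can score candidate (type, parameter) hypotheses by whether they predict the task a team-mate is observed to choose. The two types, the visibility-radius parameter, and the counting-based score below are all invented for the example:

```python
import itertools

# Hypothetical on-line estimator for a 1-D foraging-style setting: each
# hypothesis is a (type, radius) pair, and a hypothesis predicts that the
# team-mate picks a task within its visibility radius.

def predicted_task(type_name, radius, agent_pos, tasks):
    reachable = [t for t in tasks if abs(t - agent_pos) <= radius]
    if not reachable:
        return None
    # "greedy" types take the nearest reachable task, "lazy" the farthest
    dist = lambda t: abs(t - agent_pos)
    return min(reachable, key=dist) if type_name == "greedy" else max(reachable, key=dist)

def update_scores(scores, agent_pos, tasks, observed_task):
    """Reward every hypothesis whose prediction matched the observation."""
    for (type_name, radius) in scores:
        if predicted_task(type_name, radius, agent_pos, tasks) == observed_task:
            scores[(type_name, radius)] += 1
    return scores

hypotheses = list(itertools.product(["greedy", "lazy"], [2, 5]))
scores = {h: 0 for h in hypotheses}
# Team-mate at position 0 repeatedly chooses task 4 among tasks {1, 4}.
for _ in range(3):
    scores = update_scores(scores, 0, [1, 4], 4)
best_type, best_radius = max(scores, key=scores.get)
print(best_type, best_radius)
```

Here only the ("lazy", 5) hypothesis explains the observations, so the estimate converges to it after a few steps; the real estimator must of course handle noise and many more parameters.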
Combining a Meta-Policy and Monte-Carlo Planning for Scalable Type-Based Reasoning in Partially Observable Environments
The design of autonomous agents that can interact effectively with other
agents without prior coordination is a core problem in multi-agent systems.
Type-based reasoning methods achieve this by maintaining a belief over a set of
potential behaviours for the other agents. However, current methods are limited
in that they assume full observability of the state and actions of the other
agent or do not scale efficiently to larger problems with longer planning
horizons. Addressing these limitations, we propose Partially Observable
Type-based Meta Monte-Carlo Planning (POTMMCP) - an online Monte-Carlo Tree
Search based planning method for type-based reasoning in large partially
observable environments. POTMMCP incorporates a novel meta-policy for guiding
search and evaluating beliefs, allowing it to search more effectively to longer
horizons using less planning time. We show that our method converges to the
optimal solution in the limit and empirically demonstrate that it effectively
adapts online to diverse sets of other agents across a range of environments.
Comparisons with the state-of-the-art method on problems with up to
states and observations indicate that POTMMCP is able to compute better
solutions significantly faster.

Comment: 24 pages
A Survey of Ad Hoc Teamwork: Definitions, Methods, and Open Problems
Ad hoc teamwork is the well-established research problem of designing agents
that can collaborate with new teammates without prior coordination. This survey
makes a two-fold contribution. First, it provides a structured description of
the different facets of the ad hoc teamwork problem. Second, it discusses the
progress that has been made in the field so far, and identifies the immediate
and long-term open problems that need to be addressed in the field of ad hoc
teamwork.
Team behavior in interactive dynamic influence diagrams with applications to ad hoc teams
Planning for ad hoc teamwork is challenging because it involves agents
collaborating without any prior coordination or communication. The focus is on
principled methods for a single agent to cooperate with others. This motivates
investigating the ad hoc teamwork problem in the context of individual decision
making frameworks. However, individual decision making in multiagent settings
faces the task of reasoning about other agents' actions, which in turn
requires reasoning about how those agents reason about others, and so on
without end. An established approximation that
operationalizes this approach is to bound the infinite nesting from below by
introducing level 0 models. We show that a consequence of the finitely-nested
modeling is that we may not obtain optimal team solutions in cooperative
settings. We address this limitation by including models at level 0 whose
solutions involve learning. We demonstrate that the learning integrated into
planning in the context of interactive dynamic influence diagrams facilitates
optimal team behavior, and is applicable to ad hoc teamwork.

Comment: 8 pages, Appeared in the MSDM Workshop at AAMAS 2014, Extended
Abstract version appeared at AAMAS 2014, France
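Why a fixed level-0 model can block optimal teamwork is visible in a toy coordination game (this example is ours, not the paper's I-DID construction): a level-1 agent best-responds to its level-0 model of the teammate, so a wrong fixed model locks in the inferior equilibrium, whereas a level-0 model learned from observed actions lets the team reach the better one.

```python
# Two-action coordination game: (A, A) is the good equilibrium.
PAYOFF = {("A", "A"): 2, ("B", "B"): 1, ("A", "B"): 0, ("B", "A"): 0}

def best_response(model):
    """Level-1 action maximizing expected payoff under the level-0 model."""
    return max("AB", key=lambda my: sum(p * PAYOFF[(my, other)]
                                        for other, p in model.items()))

# Fixed level-0 model: teammate assumed to always play B.
fixed = {"A": 0.0, "B": 1.0}

# Learning level-0 model: empirical frequencies after observing the
# teammate actually play A three times.
observed = ["A", "A", "A"]
learned = {a: observed.count(a) / len(observed) for a in "AB"}

print(best_response(fixed), best_response(learned))
```

Against the fixed model the agent settles for the payoff-1 equilibrium; with the learned model it coordinates on the payoff-2 one, mirroring the abstract's point that learning at level 0 enables optimal team behavior.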
Ad-hoc teamwork with behavior-switching agents
As autonomous AI agents proliferate in the real world, they will increasingly need to cooperate with each other to achieve complex goals without always being able to coordinate in advance. This kind of cooperation, in which agents have to learn to cooperate on the fly, is called ad hoc teamwork. Many previous works investigating this setting assumed that teammates behave according to one of many predefined types that is fixed throughout the task. This assumption of stationary behavior is strong and cannot be guaranteed in many real-world settings. In this work, we relax this assumption and investigate settings in which teammates can change their types during the course of the task. This adds complexity to the planning problem: an agent now needs to recognize that a change has occurred, in addition to identifying the new type of the teammate it is interacting with. In this paper, we present a novel Convolutional-Neural-Network-based Change Point Detection (CPD) algorithm for ad hoc teamwork. When evaluating our algorithm on the modified predator-prey domain, we find that it outperforms existing Bayesian CPD algorithms.
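The paper's detector is CNN-based; as a much simpler illustration of detecting a teammate's type switch from its action stream, the sketch below runs a CUSUM test on the log-likelihood ratio between two assumed types. The type names, action probabilities, and threshold are all invented for the example:

```python
import math

# CUSUM change-point sketch: the statistic accumulates evidence that
# recent actions are better explained by the post-switch type, and a
# threshold crossing flags the change.

P_OLD = {"chase": 0.9, "wait": 0.1}   # assumed pre-switch teammate type
P_NEW = {"chase": 0.1, "wait": 0.9}   # assumed post-switch teammate type

def detect_change(actions, threshold=3.0):
    """Return the index at which the CUSUM statistic crosses the
    threshold, or None if no change is detected."""
    s = 0.0
    for i, a in enumerate(actions):
        s = max(0.0, s + math.log(P_NEW[a] / P_OLD[a]))
        if s > threshold:
            return i
    return None

# Teammate behaves as the old type for 5 steps, then switches.
stream = ["chase"] * 5 + ["wait"] * 5
print(detect_change(stream))   # -> 6
```

A Bayesian or CNN-based detector plays the same role as this threshold test, but can handle unknown post-switch types and richer observation features.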