Learning to Coordinate Efficiently: A Model-based Approach
In common-interest stochastic games all players receive an identical payoff.
Players participating in such games must learn to coordinate with each other in
order to receive the highest possible value. A number of reinforcement learning
algorithms have been proposed for this problem, and some have been shown to
converge to good solutions in the limit. In this paper we show that using very
simple model-based algorithms, much better (i.e., polynomial) convergence rates
can be attained. Moreover, our model-based algorithms are guaranteed to
converge to the optimal value, unlike many of the existing algorithms.
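The model-based idea described in this abstract can be sketched minimally under illustrative assumptions (a 3x3 common-interest matrix game, Gaussian payoff noise, and a fixed sample budget, none of which are taken from the paper): both players estimate the shared payoff of every joint action from samples, then play the greedy joint action of the learned model under a common deterministic tie-break.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3x3 common-interest game: both players receive the same payoff.
true_payoff = np.array([[1.0, 0.0, 0.0],
                        [0.0, 0.5, 0.0],
                        [0.0, 0.0, 0.9]])

# Model-based approach: estimate the mean payoff of every joint action
# from noisy samples, then act greedily on the shared model.
counts = np.zeros_like(true_payoff)
sums = np.zeros_like(true_payoff)

# Explore each joint action a fixed number of times (payoffs are noisy).
for a in range(3):
    for b in range(3):
        for _ in range(50):
            r = true_payoff[a, b] + rng.normal(0, 0.1)
            counts[a, b] += 1
            sums[a, b] += r

model = sums / counts
# A shared deterministic tie-break (lexicographic argmax) lets both players
# commit to the same joint action without any communication.
a_star, b_star = np.unravel_index(np.argmax(model), model.shape)
print(a_star, b_star)  # greedy joint action under the learned model
```

Because both players build the same model from the same joint-action statistics and break ties identically, they coordinate on one optimal joint action rather than miscoordinating across several.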
Learning to coordinate in a complex and non-stationary world
We study analytically and by computer simulations a complex system of
adaptive agents with finite memory. Borrowing the framework of the Minority
Game and using the replica formalism we show the existence of an equilibrium
phase transition as a function of the ratio between the memory and
the learning rates of the agents. We show that, starting from a random
configuration, a dynamic phase transition also exists, which prevents the
system from reaching any Nash equilibria. Furthermore, in a non-stationary
environment, we show by numerical simulations that agents with infinite memory
play worse than others with less memory and that the dynamic transition
naturally arises independently of the initial conditions. Comment: 4 pages, 3 figures
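The Minority Game referenced above has a standard minimal form that can be simulated directly. The sketch below uses illustrative parameters (101 agents, 3 memory bits, 2 strategies per agent) and is not the replica-formalism analysis of the paper; it only reproduces the basic inductive dynamics.

```python
import numpy as np

rng = np.random.default_rng(1)

N, m, S, T = 101, 3, 2, 2000   # agents, memory bits, strategies/agent, rounds
P = 2 ** m                     # number of distinct m-bit histories

# Each strategy maps every possible history to an action in {-1, +1}.
strategies = rng.choice([-1, 1], size=(N, S, P))
scores = np.zeros((N, S))      # virtual scores of each agent's strategies
history = int(rng.integers(P)) # current m-bit history encoded as an integer

attendance = []
for _ in range(T):
    # Each agent plays its currently best-scoring strategy.
    best = scores.argmax(axis=1)
    actions = strategies[np.arange(N), best, history]
    A = actions.sum()          # never zero, since N is odd
    attendance.append(int(A))
    # The minority side wins: strategies predicting -sign(A) gain a point.
    scores += -np.sign(A) * strategies[:, :, history]
    # Shift the winning (minority) outcome into the history.
    history = (2 * history + int(A < 0)) % P

# Volatility sigma^2 / N is the usual measure of coordination quality:
# lower than 1 means the agents coordinate better than random play.
sigma2 = np.var(attendance) / N
print(round(sigma2, 2))
```

Sweeping `P / N` (the ratio of history space to agent number) in this toy setup traces out the well-known volatility curve whose minimum marks the phase transition discussed in the abstract.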
Learning to Coordinate with Anyone
In open multi-agent environments, the agents may encounter unexpected
teammates. Classical multi-agent learning approaches train agents that can only
coordinate with seen teammates. Recent studies attempted to generate diverse
teammates to enhance the generalizable coordination ability, but were
restricted by pre-defined teammates. In this work, our aim is to train agents
with strong coordination ability by generating teammates that fully cover the
teammate policy space, so that agents can coordinate with any teammates. Since
the teammate policy space is too large to enumerate, we seek only dissimilar
teammates that are incompatible with controllable agents, which greatly reduces
the number of teammates that need to be trained with. However, it is hard to
determine the number of such incompatible teammates beforehand. We therefore
introduce a continual multi-agent learning process, in which the agent learns
to coordinate with different teammates until no more incompatible teammates can
be found. The above idea is implemented in the proposed Macop (Multi-agent
compatible policy learning) algorithm. We conduct experiments in 8 scenarios
from 4 environments that have distinct coordination patterns. Experiments show
that Macop generates training teammates with much lower compatibility than
previous methods. As a result, in all scenarios Macop achieves the best overall
coordination ability while never being significantly worse than the baselines,
demonstrating strong generalization.
Stabilize to Act: Learning to Coordinate for Bimanual Manipulation
Key to rich, dexterous manipulation in the real world is the ability to
coordinate control across two hands. However, while the promise afforded by
bimanual robotic systems is immense, constructing control policies for dual arm
autonomous systems brings inherent difficulties. One such difficulty is the
high-dimensionality of the bimanual action space, which adds complexity to both
model-based and data-driven methods. We counteract this challenge by drawing
inspiration from humans to propose a novel role assignment framework: a
stabilizing arm holds an object in place to simplify the environment while an
acting arm executes the task. We instantiate this framework with BimanUal
Dexterity from Stabilization (BUDS), which uses a learned restabilizing
classifier to alternate between updating a learned stabilization position to
keep the environment unchanged, and accomplishing the task with an acting
policy learned from demonstrations. We evaluate BUDS on four bimanual tasks of
varying complexities on real-world robots, such as zipping jackets and cutting
vegetables. Given only 20 demonstrations, BUDS achieves 76.9% task success
across our task suite, and generalizes to out-of-distribution objects within a
class with a 52.7% success rate. Owing to the precision these complex tasks
require, BUDS is 56.0% more successful than an unstructured baseline that
instead learns a BC stabilizing policy. Supplementary material and videos
can be found at https://sites.google.com/view/stabilizetoact. Comment: Conference on Robot Learning, 202
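The stabilize/act alternation that BUDS describes can be illustrated with a toy 1-D control loop. Everything here is a hypothetical stand-in, not the paper's method: the restabilizing classifier becomes a drift threshold, the stabilizing arm a reset to a fixed pose, and the acting policy a step of task progress that perturbs the object.

```python
import numpy as np

rng = np.random.default_rng(3)

object_pos = 0.0       # 1-D object pose (stand-in for the real object state)
task_progress = 0.0    # accumulated task completion
STABLE_TOL = 0.05      # classifier threshold: "environment unchanged"

def restabilize_needed(pos, target=0.0, tol=STABLE_TOL):
    # Toy restabilizing classifier: fire when the object has drifted.
    return abs(pos - target) > tol

for step in range(200):
    if restabilize_needed(object_pos):
        # Stabilizing arm: pin the object back at the stabilization pose,
        # simplifying the environment for the acting policy.
        object_pos = 0.0
    else:
        # Acting arm: make task progress, which also perturbs the object.
        task_progress += 0.01
        object_pos += rng.normal(0, 0.03)

print(round(task_progress, 2))
```

The point of the structure is visible even in this sketch: the acting policy only ever runs against a near-stationary environment, which is what makes learning it from few demonstrations tractable.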
Learning to coordinate in sparse asymmetric multiagent systems
Multiagent learning offers a rich framework to address challenging real-world problems such as remote exploration and healthcare coordination, which require autonomous agents to express elaborate interactions. To be effective in such systems, agents must collectively reason about and pursue high-level, long-term, and possibly nebulous objectives while adapting their strategy to changing environments, inter-agent relationships, and team dynamics.
This work introduces six contributions that address this multifaceted problem through the lens of two distinct perspectives: reward structures for high-level objectives that allow agents to consider behaviors before pursuing them, and diversity structures that incentivize asymmetric agents (agents with distinct capabilities and egocentric objectives) to discover complementary specializations required for robust teamwork. The first contribution, Asymmetric D++, distills sparse team feedback into dense informative rewards by encouraging agents to create asymmetric counterfactuals based on their likelihood to cooperate. The second contribution introduces an uncertainty-aware reward approximation that enables the application of Asymmetric D++ for exploration and learning in sparse reward settings. The third contribution, Behavior Refinement, presents a hierarchical framework that shifts focus from optimizing a single behavior to learning a repertoire of diverse behaviors required to complete variegated tasks. Behavior Refinement allows systematic exploration of the policy space via a combination of diversity search and team-objective maximization. The fourth contribution introduces the Island Model, a computational framework that builds on Behavior Refinement for informed behavior space exploration and team balancing for asymmetric agents. The final two contributions expand upon the Island Model to develop an asynchronous learning framework that allows asymmetric agents to explore diverse environment-agnostic inter-agent relationships to balance multiple potentially conflicting objectives.
Together, these contributions enable asymmetric agents to learn diverse specializations, express complex trade-offs, and discover the robust inter-agent relationships required to solve challenging coordination problems. Additionally, the techniques introduced in this work aid in investigating the rich tapestry of agent synergies that evolve in response to changes in the environment and team objectives. Keywords: Multiagent Reinforcement Learning, Multiagent Coordination, Asymmetric Multiagent Systems, Multiagent Evolutionary Learning
Communicative Bottlenecks Lead to Maximal Information Transfer
This paper presents a new analytic and numerical analysis of signalling games that give rise to informational bottlenecks (that is, signalling games with more state/act pairs than available signals to communicate information about the world). I show via simulation that agents learning to coordinate tend to favour partitions of nature which provide maximal information transfer. This is true despite the fact that nothing in an initial analysis of the stability properties of the underlying signalling game suggests that this should be the case. As a first pass at explaining this, I note that the underlying structure of our model favours maximal information transfer in virtue of the simple combinatorial properties of how the agents might partition nature into kinds. However, I suggest that this does not perfectly capture the empirical results; thus, several open questions remain.
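A bottlenecked signalling game of this kind can be simulated with simple Roth-Erev reinforcement, a standard learning rule for signalling games (the paper does not specify its learning dynamics, so this is only an illustrative setup: 4 states, 2 signals, success-based urn reinforcement).

```python
import numpy as np

rng = np.random.default_rng(2)

n_states, n_signals = 4, 2        # bottleneck: fewer signals than states
n_acts = n_states                 # exactly one correct act per state

# Roth-Erev reinforcement: choose proportionally to accumulated rewards.
sender = np.ones((n_states, n_signals))   # sender urns
receiver = np.ones((n_signals, n_acts))   # receiver urns

payoffs = []
for t in range(20000):
    s = int(rng.integers(n_states))
    sig = rng.choice(n_signals, p=sender[s] / sender[s].sum())
    act = rng.choice(n_acts, p=receiver[sig] / receiver[sig].sum())
    r = 1.0 if act == s else 0.0  # success only on the matching act
    sender[s, sig] += r
    receiver[sig, act] += r
    payoffs.append(r)

# With 2 signals for 4 uniformly distributed states, expected success is
# capped at 0.5; random play yields 0.25. Partitions of the states into
# equal-sized kinds realize maximal information transfer.
print(round(float(np.mean(payoffs[-2000:])), 2))
```

Inspecting the learned `sender` urns after a run shows which partition of nature the agents settled on, which is the quantity the abstract's simulations track.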
Measuring collaborative emergent behavior in multi-agent reinforcement learning
Multi-agent reinforcement learning (RL) has important implications for the
future of human-agent teaming. We show that improved performance with
multi-agent RL is not a guarantee of the collaborative behavior thought to be
important for solving multi-agent tasks. To address this, we present a novel
approach for quantitatively assessing collaboration in continuous spatial tasks
with multi-agent RL. Such a metric is useful for measuring collaboration
between computational agents and may serve as a training signal for
collaboration in future RL paradigms involving humans. Comment: 1st International Conference on Human Systems Engineering and Design,
6 pages, 2 figures, 1 table
- …