Simple Coalitional Games with Beliefs
We introduce coalitional games with beliefs (CGBs), a natural generalization of coalitional games to environments where agents possess private beliefs regarding the capabilities (or types) of others. We put forward a model to capture such agent-type uncertainty, and study coalitional stability in this setting. Specifically, we introduce a notion of the core for CGBs, both with and without coalition structures. For simple games without coalition structures, we then provide a characterization of the core that matches the one for the full-information case, and use it to derive a polynomial-time algorithm to check core nonemptiness. In contrast, we demonstrate that in games with coalition structures, allowing beliefs increases the computational complexity of stability-related problems. In doing so, we introduce and analyze weighted voting games with beliefs, which may be of independent interest. Finally, we discuss connections between our model and other classes of coalitional games.
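The polynomial-time core-nonemptiness test the abstract mentions can be illustrated on a weighted voting game. A minimal sketch, relying on the standard full-information result that the core of a simple game is nonempty exactly when a veto player exists (an agent present in every winning coalition); the function names and the example instances are illustrative, not from the paper:

```python
def veto_players(quota, weights):
    """Agents whose removal makes the grand coalition lose.

    In the weighted voting game [quota; w1, ..., wn], agent i is a
    veto player iff the remaining weight sum(w) - w[i] is below quota.
    """
    total = sum(weights)
    return [i for i, w in enumerate(weights) if total - w < quota]

def core_nonempty(quota, weights):
    # Polynomial-time test: the core of a simple game is nonempty
    # exactly when at least one veto player exists.
    return len(veto_players(quota, weights)) > 0

# [5; 3, 2, 1]: agents 0 and 1 are veto players (6-3 < 5, 6-2 < 5),
# so the core is nonempty; [2; 1, 1, 1] has no veto player.
```

Any division of the payoff among the veto players is then a core outcome, which is what makes the check this simple.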
TransfQMix: transformers for leveraging the graph structure of multi-agent reinforcement learning problems
Coordination is one of the most difficult aspects of multi-agent reinforcement learning (MARL). One reason is that agents normally choose their actions independently of one another. In order for coordination strategies to emerge from the combination of independent policies, recent research has focused on the use of a centralized function (CF) that learns each agent's contribution to the team reward. However, the structure in which the environment is presented to the agents and to the CF is typically overlooked. We have observed that the features used to describe the coordination problem can be represented as vertex features of a latent graph structure. Here, we present TransfQMix, a new approach that uses transformers to leverage this latent structure and learn better coordination policies. Our transformer agents perform graph reasoning over the state of the observable entities. Our transformer Q-mixer learns a monotonic mixing function from a larger graph that includes the internal and external states of the agents. TransfQMix is designed to be entirely transferable, meaning that the same parameters can be used to control and train larger or smaller teams of agents. This enables the deployment of promising approaches for saving training time and deriving general policies in MARL, such as transfer learning, zero-shot transfer, and curriculum learning. We report TransfQMix's performance in the Spread and StarCraft II environments. In both settings, it outperforms state-of-the-art Q-learning models, and it demonstrates effectiveness in solving problems that other methods cannot solve. This project has received funding from the EU's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 893089. This work acknowledges the "Severo Ochoa Centre of Excellence" accreditation (CEX2019-000928-S). We gratefully acknowledge the David and Lucile Packard Foundation.
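The monotonic mixing function mentioned above can be sketched in the QMIX family's standard way: per-agent utilities are combined with non-negative weights, so the joint value is monotone in every agent's Q-value and greedy per-agent action selection remains consistent with the joint greedy action. This is a minimal illustration only; TransfQMix's transformer hypernetwork that produces these weights is omitted, and the weights below are fixed placeholders:

```python
import numpy as np

def monotonic_mix(agent_qs, w, b):
    # Taking the absolute value enforces non-negative mixing weights,
    # the standard trick for guaranteeing monotonicity of the joint
    # value in each agent's individual Q-value.
    return float(np.abs(w) @ agent_qs + b)

qs_low  = np.array([1.0, 2.0, 3.0])
qs_high = np.array([1.5, 2.0, 3.0])   # agent 0's utility increased
w, b = np.array([-0.5, 0.2, 0.1]), 0.3
assert monotonic_mix(qs_high, w, b) >= monotonic_mix(qs_low, w, b)
```

Because raising any single agent's Q-value can never lower the mixed value, each agent can act greedily on its own utility without breaking joint optimality.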
Scalable Multi-Agent Reinforcement Learning through Intelligent Information Aggregation
We consider the problem of multi-agent navigation and collision avoidance
when observations are limited to the local neighborhood of each agent. We
propose InforMARL, a novel architecture for multi-agent reinforcement learning
(MARL) which uses local information intelligently to compute paths for all the
agents in a decentralized manner. Specifically, InforMARL aggregates
information about the local neighborhood of agents for both the actor and the
critic using a graph neural network and can be used in conjunction with any
standard MARL algorithm. We show that (1) in training, InforMARL has better
sample efficiency and performance than baseline approaches, despite using less
information, and (2) in testing, it scales well to environments with arbitrary
numbers of agents and obstacles.
Comment: 11 pages, 5 figures, 2 tables, 3-page appendix. Code: https://github.com/nsidn98/InforMAR
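The kind of local-neighborhood aggregation described above can be sketched simply: each agent summarizes the features of entities within its sensing radius into a fixed-size vector, so the input dimension does not depend on how many agents or obstacles the environment contains. InforMARL uses a learned graph neural network for this; the mean pooling and all names below are an illustrative stand-in:

```python
import numpy as np

def aggregate_local(agent_pos, entities, radius):
    """Mean feature vector of entities within `radius` of the agent.

    `entities` is a list of (position, feature) pairs; the output has
    a fixed size regardless of how many entities are observed.
    """
    feats = [feat for pos, feat in entities
             if np.linalg.norm(pos - agent_pos) <= radius]
    if not feats:                      # nothing observed locally
        return np.zeros_like(entities[0][1])
    return np.mean(feats, axis=0)

# Two entities: an obstacle nearby and a goal far away; only the
# nearby one falls inside the sensing radius.
entities = [(np.array([0.0, 1.0]), np.array([1.0, 0.0])),
            (np.array([5.0, 5.0]), np.array([0.0, 1.0]))]
obs = aggregate_local(np.zeros(2), entities, radius=2.0)
```

Feeding this aggregate to both the actor and the critic is what lets the same policy scale to arbitrary numbers of agents and obstacles at test time.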
Learning Transferable Cooperative Behavior in Multi-Agent Teams
While multi-agent interactions can be naturally modeled as a graph, the environment has traditionally been treated as a black box. To better utilize the inherent structure of the environment, we propose a shared agent-entity graph, in which agents and environmental entities form vertices and edges connect the vertices that can communicate with each other. This allows agents to selectively attend to different parts of the environment, while also introducing invariance to the number of agents or entities present in the system, as well as permutation invariance. We present state-of-the-art results on coverage, formation, and line-control tasks for multi-agent teams in a fully decentralized execution framework.
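The permutation invariance claimed above can be demonstrated with a small sketch: pooling attention-weighted entity features yields the same embedding no matter how the entities are ordered, since reordering them permutes scores and features consistently. A softmax attention over a query vector stands in for the paper's learned graph attention; all shapes and weights here are illustrative:

```python
import numpy as np

def attend(query, entity_feats):
    # Score each entity against the query, normalize with a softmax,
    # and pool; reordering the rows of entity_feats permutes scores
    # and features together, so the pooled output is unchanged.
    scores = entity_feats @ query
    w = np.exp(scores - scores.max())
    w = w / w.sum()
    return w @ entity_feats            # order-independent summary

rng = np.random.default_rng(0)
E = rng.normal(size=(4, 3))            # 4 entities, 3-dim features
q = rng.normal(size=3)
perm = rng.permutation(4)
assert np.allclose(attend(q, E), attend(q, E[perm]))
```

The same mechanism gives invariance to the number of entities: adding or removing rows of `E` changes only the softmax support, not the output dimension.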
Cooperative Games with Overlapping Coalitions
In the usual models of cooperative game theory, the outcome of a coalition
formation process is either the grand coalition or a coalition structure that
consists of disjoint coalitions. However, in many domains where coalitions are
associated with tasks, an agent may be involved in executing more than one
task, and thus may distribute his resources among several coalitions. To tackle
such scenarios, we introduce a model for cooperative games with overlapping
coalitions--or overlapping coalition formation (OCF) games. We then explore the
issue of stability in this setting. In particular, we introduce a notion of the
core, which generalizes the corresponding notion in the traditional
(non-overlapping) scenario. Then, under some quite general conditions, we
characterize the elements of the core, and show that any element of the core
maximizes the social welfare. We also introduce a concept of balancedness for
overlapping coalitional games, and use it to characterize coalition structures
that can be extended to elements of the core. Finally, we generalize the notion
of convexity to our setting, and show that under some natural assumptions
convex games have a non-empty core. Moreover, we introduce two alternative
notions of stability in OCF that allow a wider range of deviations, and explore
the relationships among the corresponding definitions of the core, as well as
the classic (non-overlapping) core and the Aubin core. We illustrate the
general properties of the three cores, and also study them from a computational
perspective, thus obtaining additional insights into their fundamental
structure
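A tiny example makes the overlapping-coalition setting concrete: each agent splits one unit of resource across several tasks, a partial coalition's value depends on the resources contributed to it, and the social welfare is the sum over tasks. The data structures and the linear value function below are illustrative, not the paper's model:

```python
def social_welfare(allocation, task_value):
    """allocation[agent][task] = fraction of that agent's resource.

    Agents may contribute to several (overlapping) coalitions, but
    each agent's fractions must not exceed its total resource of 1.
    """
    for fractions in allocation.values():
        assert sum(fractions.values()) <= 1 + 1e-9
    tasks = {t for fr in allocation.values() for t in fr}
    return sum(task_value(t, sum(fr.get(t, 0.0)
                                 for fr in allocation.values()))
               for t in tasks)

# Agent a splits 60/40 between tasks t1 and t2; agent b works only
# on t1, so the coalitions for t1 and t2 overlap in agent a.
alloc = {"a": {"t1": 0.6, "t2": 0.4}, "b": {"t1": 1.0}}
welfare = social_welfare(alloc, lambda t, r: 2.0 * r)  # linear values
```

An outcome in an OCF game is such an allocation plus a division of each coalition's value, and the core question is whether any set of agents can profitably redirect their fractions; the abstract's result is that core outcomes maximize exactly this welfare quantity.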
Embodied Evolution in Collective Robotics: A Review
This paper provides an overview of evolutionary robotics techniques applied
to on-line distributed evolution for robot collectives -- namely, embodied
evolution. It provides a definition of embodied evolution as well as a thorough
description of the underlying concepts and mechanisms. The paper also presents
a comprehensive summary of research published in the field since its inception
(1999-2017), providing various perspectives to identify the major trends. In
particular, we identify a shift from considering embodied evolution as a
parallel search method within small robot collectives (fewer than 10 robots) to
embodied evolution as an on-line distributed learning method for designing
collective behaviours in swarm-like collectives. The paper concludes with a
discussion of applications and open questions, providing a milestone for past
and an inspiration for future research.
Comment: 23 pages, 1 figure, 1 table
Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning
Multi-agent settings in the real world often involve tasks with varying types
and quantities of agents and non-agent entities; however, common patterns of
behavior often emerge among these agents/entities. Our method aims to leverage
these commonalities by asking the question: "What is the expected utility of
each agent when only considering a randomly selected sub-group of its observed
entities?" By posing this counterfactual question, we can recognize
state-action trajectories within sub-groups of entities that we may have
encountered in another task and use what we learned in that task to inform our
prediction in the current one. We then reconstruct a prediction of the full
returns as a combination of factors considering these disjoint groups of
entities and train this "randomly factorized" value function as an auxiliary
objective for value-based multi-agent reinforcement learning. By doing so, our
model can recognize and leverage similarities across tasks to improve learning
efficiency in a multi-task setting. Our approach, Randomized Entity-wise
Factorization for Imagined Learning (REFIL), outperforms all strong baselines
by a significant margin in challenging multi-task StarCraft micromanagement
settings.
Comment: ICML 2021 Camera Ready
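The counterfactual question REFIL poses can be sketched in a few lines: randomly partition the observed entities into disjoint sub-groups, score each sub-group with a per-group utility, and treat the sum of the factors as a reconstruction of the full return. The stand-in utility below replaces the paper's attention-masked Q-network, and all names are illustrative:

```python
import random

def randomly_factorized_value(entities, utility, rng):
    # Randomly split the entities into two disjoint sub-groups.
    group_a = [e for e in entities if rng.random() < 0.5]
    group_b = [e for e in entities if e not in group_a]
    # Auxiliary target: the utilities of disjoint entity sub-groups
    # should add up to a prediction of the full return.
    return utility(group_a) + utility(group_b)

rng = random.Random(0)
ents = ["ally1", "ally2", "enemy1"]
# With an additive utility, the factorization recovers the full
# value exactly, whichever random split is drawn.
val = randomly_factorized_value(ents, lambda g: float(len(g)), rng)
```

Training this factorized estimate as an auxiliary objective is what lets value estimates learned for a sub-group in one task transfer to other tasks where the same sub-group pattern recurs.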