Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems
In the framework of fully cooperative multi-agent systems, independent (non-communicative) agents that learn by reinforcement must overcome several difficulties in order to coordinate. This paper identifies several challenges responsible for the non-coordination of independent agents: Pareto-selection, non-stationarity, stochasticity, alter-exploration and shadowed equilibria. A selection of multi-agent domains is classified according to those challenges: matrix games, Boutilier's coordination game, predator-pursuit domains and a special multi-state game. Moreover, the performance of a range of algorithms for independent reinforcement learners is evaluated empirically. Those algorithms are Q-learning variants: decentralized Q-learning, distributed Q-learning, hysteretic Q-learning, recursive FMQ and WoLF PHC. An overview of the learning algorithms' strengths and weaknesses against each challenge concludes the paper and can serve as a basis for choosing the appropriate algorithm for a new domain. Furthermore, the distilled challenges may assist in the design of new learning algorithms that overcome these problems and achieve higher performance in multi-agent applications.
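Of the Q-learning variants surveyed above, hysteretic Q-learning illustrates the optimism that helps independent learners coordinate: it applies positive temporal-difference errors with a larger learning rate than negative ones, so an agent is slow to lower a value estimate that was dragged down by a teammate's exploration. A minimal tabular sketch (the rates, discount and state/action sizes here are illustrative assumptions, not the survey's exact configuration):

```python
import numpy as np

def hysteretic_q_update(Q, s, a, r, s_next, alpha=0.1, beta=0.01, gamma=0.95):
    """One hysteretic Q-learning step: a two-rate, optimistic update.

    The TD error is applied with the larger rate `alpha` when positive
    and the smaller rate `beta` when negative, making an independent
    learner reluctant to punish an action for teammates' exploration.
    """
    delta = r + gamma * np.max(Q[s_next]) - Q[s, a]
    rate = alpha if delta >= 0 else beta
    Q[s, a] += rate * delta
    return Q

Q = np.zeros((2, 2))  # 2 states, 2 actions
Q = hysteretic_q_update(Q, s=0, a=1, r=1.0, s_next=1)
```

Setting `beta = alpha` recovers decentralized Q-learning, while `beta = 0` approaches the fully optimistic distributed Q-learning, which only ever raises estimates.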
Coordination of independent learners in cooperative Markov games
In the framework of fully cooperative multi-agent systems, independent agents learning by reinforcement must overcome several difficulties, such as coordination and the impact of exploration. The study of these issues first allows us to synthesize the characteristics of existing decentralized reinforcement learning methods for independent learners in cooperative Markov games. Then, given the difficulties encountered by these approaches, we focus on two main capabilities: optimistic agents, which manage coordination in deterministic environments, and the detection of a game's stochasticity. Indeed, the key difficulty in stochastic environments is to distinguish between the various causes of noise. We therefore introduce the SOoN algorithm, standing for "Swing between Optimistic or Neutral", in which independent learners adapt automatically to the stochasticity of the environment. Empirical results on various cooperative Markov games notably show that SOoN overcomes the main factors of non-coordination and is robust to the exploration of other agents.
Counterfactual Multi-Agent Policy Gradients
Cooperative multi-agent systems can be naturally used to model many real
world problems, such as network packet routing and the coordination of
autonomous vehicles. There is a great need for new reinforcement learning
methods that can efficiently learn decentralised policies for such systems. To
this end, we propose a new multi-agent actor-critic method called
counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised
critic to estimate the Q-function and decentralised actors to optimise the
agents' policies. In addition, to address the challenges of multi-agent credit
assignment, it uses a counterfactual baseline that marginalises out a single
agent's action, while keeping the other agents' actions fixed. COMA also uses a
critic representation that allows the counterfactual baseline to be computed
efficiently in a single forward pass. We evaluate COMA in the testbed of
StarCraft unit micromanagement, using a decentralised variant with significant
partial observability. COMA significantly improves average performance over
other multi-agent actor-critic methods in this setting, and the best performing
agents are competitive with state-of-the-art centralised controllers that get
access to the full state.
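The counterfactual baseline described above can be illustrated with a small tabular sketch: the advantage for one agent is the critic's value of the joint action taken, minus the expectation over that agent's alternative actions with the other agents' actions held fixed. This is an illustrative reimplementation of the idea only (COMA itself uses a learned centralised critic and computes the baseline in one forward pass; the names and fixed-state Q array here are assumptions):

```python
import numpy as np

def coma_advantage(q_joint, policies, joint_action, agent):
    """Counterfactual advantage for one agent, COMA-style.

    q_joint: array indexed by the joint action, i.e. Q(s, u) for a fixed state.
    policies: per-agent action distributions pi^a(. | tau^a).
    The baseline marginalises out only `agent`'s action while the other
    agents' actions stay fixed at `joint_action`.
    """
    q_taken = q_joint[joint_action]
    baseline = 0.0
    for alt in range(len(policies[agent])):
        u = list(joint_action)
        u[agent] = alt                      # swap in a counterfactual action
        baseline += policies[agent][alt] * q_joint[tuple(u)]
    return q_taken - baseline

# two agents, two actions each; Q(s, (u0, u1)) for one state
q = np.array([[0.0, 1.0],
              [2.0, 3.0]])
pi = [np.array([0.5, 0.5]), np.array([0.5, 0.5])]
adv = coma_advantage(q, pi, joint_action=(1, 0), agent=0)
```

Because only one agent's action varies in the baseline, the difference isolates that agent's contribution, addressing the credit-assignment problem the abstract mentions.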
Learning to Collaborate by Grouping: a Consensus-oriented Strategy for Multi-agent Reinforcement Learning
Multi-agent systems require effective coordination between groups and
individuals to achieve common goals. However, current multi-agent reinforcement
learning (MARL) methods primarily focus on improving individual policies and do
not adequately address group-level policies, which leads to weak cooperation.
To address this issue, we propose a novel Consensus-oriented Strategy (CoS)
that emphasizes group and individual policies simultaneously. Specifically, CoS
comprises two main components: (a) the vector quantized group consensus module,
which extracts discrete latent embeddings that represent the stable and
discriminative group consensus, and (b) the group consensus-oriented strategy,
which integrates the group policy using a hypernet and the individual policies
using the group consensus, thereby promoting coordination at both the group and
individual levels. Through empirical experiments on cooperative navigation
tasks with both discrete and continuous spaces, as well as Google research
football, we demonstrate that CoS outperforms state-of-the-art MARL algorithms
and achieves better collaboration, thus providing a promising solution for
achieving effective coordination in multi-agent systems.
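The "vector quantized group consensus module" above rests on the standard vector-quantization step: each continuous embedding is snapped to its nearest entry in a learned codebook, yielding the discrete latent the abstract refers to. A minimal sketch of that lookup (shapes, names, and the fixed codebook are illustrative assumptions; in CoS the codebook would be trained):

```python
import numpy as np

def vq_lookup(z, codebook):
    """Nearest-neighbour codebook lookup (vector quantization).

    z: (batch, dim) continuous embeddings; codebook: (K, dim) code vectors.
    Returns the index of, and the value of, the nearest code for each
    embedding -- the discretisation a VQ consensus module relies on.
    """
    # squared distance between every embedding and every code vector
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return idx, codebook[idx]

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
z = np.array([[0.1, -0.1], [0.9, 1.2]])
idx, codes = vq_lookup(z, codebook)
```

Discretising in this way is what makes the group consensus "stable and discriminative": many nearby observations collapse to the same code, so all agents can condition on a shared, low-cardinality summary.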
Learning in Cooperative Multiagent Systems Using Cognitive and Machine Models
Developing effective Multi-Agent Systems (MAS) is critical for many
applications requiring collaboration and coordination with humans. Despite the
rapid advance of Multi-Agent Deep Reinforcement Learning (MADRL) in cooperative
MAS, one major challenge is the simultaneous learning and interaction of
independent agents in dynamic environments in the presence of stochastic
rewards. State-of-the-art MADRL models struggle to perform well in Coordinated
Multi-agent Object Transportation Problems (CMOTPs), wherein agents must
coordinate with each other and learn from stochastic rewards. In contrast,
humans often learn rapidly to adapt to nonstationary environments that require
coordination among people. In this paper, motivated by the demonstrated ability
of cognitive models based on Instance-Based Learning Theory (IBLT) to capture
human decisions in many dynamic decision making tasks, we propose three
variants of Multi-Agent IBL models (MAIBL). The idea of these MAIBL algorithms
is to combine the cognitive mechanisms of IBLT and the techniques of MADRL
models to deal with coordination MAS in stochastic environments from the
perspective of independent learners. We demonstrate that the MAIBL models
exhibit faster learning and achieve better coordination in a dynamic CMOTP task
with various settings of stochastic rewards compared to current MADRL models.
We discuss the benefits of integrating cognitive insights into MADRL models. (Comment: 22 pages, 5 figures, 2 tables)
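The IBLT mechanism underlying the MAIBL models estimates an option's value by "blending" past instances: each stored outcome gets an activation that decays with time since it occurred, activations pass through a Boltzmann softmax to give retrieval probabilities, and the value is the probability-weighted mean outcome. A sketch following the standard IBLT equations (activation noise is omitted for brevity, and the decay and temperature values are illustrative; the paper's MAIBL variants layer MADRL techniques on top of this):

```python
import numpy as np

def blended_value(outcomes, timestamps, t_now, d=0.5, tau=0.25):
    """Blended value of past instances, Instance-Based Learning style.

    outcomes: one observed outcome per instance.
    timestamps: for each instance, the times it was observed.
    Activation decays with recency (power law with decay d); retrieval
    probabilities are a softmax over activations with temperature tau;
    the blended value is the retrieval-weighted mean outcome.
    """
    acts = np.array([np.log(sum((t_now - t) ** (-d) for t in ts))
                     for ts in timestamps])
    p = np.exp(acts / tau)
    p /= p.sum()
    return float(np.dot(p, outcomes))

# an outcome of 1.0 seen recently dominates a 0.0 seen long ago
v = blended_value(outcomes=[1.0, 0.0], timestamps=[[9], [1]], t_now=10)
```

The recency-driven weighting is what lets such models track nonstationary, stochastic rewards faster than a fixed-learning-rate update, which is the behaviour the abstract attributes to human-like adaptation.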
Off-the-Grid MARL: Datasets with Baselines for Offline Multi-Agent Reinforcement Learning
Being able to harness the power of large datasets for developing cooperative
multi-agent controllers promises to unlock enormous value for real-world
applications. Many important industrial systems are multi-agent in nature and
are difficult to model using bespoke simulators. However, in industry,
distributed processes can often be recorded during operation, and large
quantities of demonstrative data stored. Offline multi-agent reinforcement
learning (MARL) provides a promising paradigm for building effective
decentralised controllers from such datasets. However, offline MARL is still in
its infancy and therefore lacks standardised benchmark datasets and baselines
typically found in more mature subfields of reinforcement learning (RL). These
deficiencies make it difficult for the community to sensibly measure progress.
In this work, we aim to fill this gap by releasing off-the-grid MARL (OG-MARL):
a growing repository of high-quality datasets with baselines for cooperative
offline MARL research. Our datasets provide settings that are characteristic of
real-world systems, including complex environment dynamics, heterogeneous
agents, non-stationarity, many agents, partial observability, suboptimality,
sparse rewards and demonstrated coordination. For each setting, we provide a
range of different dataset types (e.g. Good, Medium, Poor, and Replay) and
profile the composition of experiences for each dataset. We hope that OG-MARL
will serve the community as a reliable source of datasets and help drive
progress, while also providing an accessible entry point for researchers new to
the field. (Comment: Extended Abstract at the Autonomous Agents and Multi-Agent Systems Conference 202