Approximate Multi-Agent Fitted Q Iteration
We formulate an efficient approximation for multi-agent batch reinforcement
learning, the approximate multi-agent fitted Q iteration (AMAFQI). We present a
detailed derivation of our approach. We propose an iterative policy search and
show that it yields a greedy policy with respect to multiple approximations of
the centralized, standard Q-function. In each iteration and policy evaluation,
AMAFQI requires a number of computations that scales linearly with the number
of agents, whereas the analogous number of computations increases exponentially
for the fitted Q iteration (FQI), one of the most commonly used approaches in
batch reinforcement learning. This property of AMAFQI is fundamental for the
design of a tractable multi-agent approach. We evaluate the performance of
AMAFQI and compare it to FQI in numerical simulations. Numerical examples
illustrate the significant computation time reduction when using AMAFQI instead
of FQI in multi-agent problems and corroborate the similar decision-making
performance of both approaches.
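To make the scaling claim concrete, here is a minimal sketch (not the paper's implementation) contrasting the greedy step of standard FQI, which enumerates every joint action, with a per-agent approximation in the spirit of AMAFQI, in which each agent maximizes its own Q-function. The regressor choice, toy data, and all names are assumptions.

```python
# Illustrative sketch only: contrasts the cost of the greedy step in standard
# FQI (exponential in agents) with a per-agent approximation (linear in agents).
import itertools
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
n_agents, n_actions, n_samples, state_dim = 3, 4, 256, 5
gamma = 0.99

# Toy batch of transitions (states, joint actions, rewards, next states).
s = rng.normal(size=(n_samples, state_dim))
a = rng.integers(n_actions, size=(n_samples, n_agents))
r = rng.normal(size=n_samples)
s_next = rng.normal(size=(n_samples, state_dim))

# Standard FQI: a single Q over joint actions; the greedy step enumerates
# n_actions ** n_agents joint actions (64 here, exponential in the agents).
q_joint = ExtraTreesRegressor(n_estimators=20, random_state=0)
q_joint.fit(np.hstack([s, a]), r)
joint_actions = list(itertools.product(range(n_actions), repeat=n_agents))
q_next = np.column_stack([
    q_joint.predict(np.hstack([s_next, np.tile(ja, (n_samples, 1))]))
    for ja in joint_actions
])
fqi_targets = r + gamma * q_next.max(axis=1)

# Per-agent approximation: one Q per agent over its own action; the greedy
# step costs n_agents * n_actions evaluations (12 here, linear in the agents).
per_agent_targets = []
for i in range(n_agents):
    q_i = ExtraTreesRegressor(n_estimators=20, random_state=i)
    q_i.fit(np.hstack([s, a[:, i:i + 1]]), r)
    q_next_i = np.column_stack([
        q_i.predict(np.hstack([s_next, np.full((n_samples, 1), u)]))
        for u in range(n_actions)
    ])
    per_agent_targets.append(r + gamma * q_next_i.max(axis=1))
```

With 3 agents and 4 actions the joint maximization evaluates 64 action combinations per sample while the per-agent version evaluates 12; the gap widens exponentially as agents are added, which is the tractability property the abstract highlights.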
RODE: Learning Roles to Decompose Multi-Agent Tasks
Role-based learning holds the promise of achieving scalable multi-agent
learning by decomposing complex tasks using roles. However, it is largely
unclear how to efficiently discover such a set of roles. To solve this problem,
we propose to first decompose joint action spaces into restricted role action
spaces by clustering actions according to their effects on the environment and
other agents. Learning a role selector based on action effects makes role
discovery much easier because it forms a bi-level learning hierarchy -- the
role selector searches in a smaller role space and at a lower temporal
resolution, while role policies learn in significantly reduced primitive
action-observation spaces. We further integrate information about action
effects into the role policies to boost learning efficiency and policy
generalization. By virtue of these advances, our method (1) outperforms the
current state-of-the-art MARL algorithms on 10 of the 14 scenarios that
comprise the challenging StarCraft II micromanagement benchmark and (2)
achieves rapid transfer to new environments with three times the number of
agents. Demonstrative videos are available at
https://sites.google.com/view/rode-marl.
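As a rough illustration of the role-discovery step described above (not the RODE implementation), the sketch below clusters actions by stand-in effect representations so that each role operates in a restricted slice of the primitive action space. The effect vectors, cluster count, and names are assumptions; RODE learns its action representations from a predictive model rather than sampling them at random.

```python
# Illustrative sketch only: decompose a joint action space into restricted
# role action spaces by clustering actions according to their effects.
import numpy as np
from sklearn.cluster import KMeans

n_actions, effect_dim, n_roles = 14, 8, 3
rng = np.random.default_rng(0)

# Stand-in for learned action representations (effects on the environment
# and other agents); RODE would learn these from experience.
action_effects = rng.normal(size=(n_actions, effect_dim))

labels = KMeans(n_clusters=n_roles, n_init=10, random_state=0).fit_predict(action_effects)
role_action_spaces = {role: np.flatnonzero(labels == role) for role in range(n_roles)}

# Bi-level hierarchy: a role selector picks a role at a lower temporal
# resolution; the role policy then chooses only among that role's actions.
for role, actions in role_action_spaces.items():
    print(f"role {role}: actions {actions.tolist()}")
```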
Reinforcement Learning versus Swarm Intelligence for Autonomous Multi-HAPS Coordination
This work analyses the performance of Reinforcement Learning
(RL) versus Swarm Intelligence (SI) for coordinating multiple unmanned High Altitude Platform Stations (HAPS) for communications area coverage. It builds upon previous work which looked at various elements of both algorithms. The
main aim of this paper is to address the continuous state space challenge by using partitioning to manage the high-dimensionality problem. This enables a comparison of the classical cases of both RL and SI, establishing a baseline for future comparisons of improved versions. In previous work, SI was observed to perform better across various key performance indicators. However, even after tuning parameters and empirically choosing a suitable partitioning ratio for the RL state space, the SI algorithm maintained superior coordination capability, achieving higher mean overall user coverage (about 20% better than the RL algorithm) as well as faster convergence rates. Though the RL technique showed better average peak user coverage, its unpredictable coverage dips were a key weakness, making SI the more suitable algorithm within the context of this work.
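A minimal sketch of the partitioning idea used to tame the continuous state space: each state dimension is split into a fixed number of bins (the partitioning ratio), after which classical tabular Q-learning applies. Bounds, bin counts, and all names here are illustrative assumptions, not the paper's configuration.

```python
# Illustrative sketch only: partition a continuous 2-D state space into a
# grid so that classical tabular Q-learning becomes applicable.
import numpy as np

state_low = np.array([0.0, 0.0])       # assumed platform position bounds
state_high = np.array([100.0, 100.0])
bins_per_dim = 10                      # the empirically chosen partitioning ratio
n_actions = 5

def discretize(state):
    """Map a continuous state to a tuple of bin indices."""
    ratio = (state - state_low) / (state_high - state_low)
    idx = np.clip((ratio * bins_per_dim).astype(int), 0, bins_per_dim - 1)
    return tuple(idx)

q_table = np.zeros((bins_per_dim, bins_per_dim, n_actions))

def q_update(s, action, reward, s_next, alpha=0.1, gamma=0.95):
    """Standard tabular Q-learning update on the partitioned state space."""
    si, sj = discretize(s)
    ni, nj = discretize(s_next)
    td_target = reward + gamma * q_table[ni, nj].max()
    q_table[si, sj, action] += alpha * (td_target - q_table[si, sj, action])

# Example: one update for a platform at (42.0, 73.5) taking action 2.
q_update(np.array([42.0, 73.5]), action=2, reward=1.0, s_next=np.array([43.0, 72.0]))
```

The choice of bins_per_dim is the tuning knob the abstract refers to: too coarse a partition aliases distinct states together, while too fine a partition blows up the table and slows convergence.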
Context-Aware Sparse Deep Coordination Graphs
Learning sparse coordination graphs adaptive to the coordination dynamics
among agents is a long-standing problem in cooperative multi-agent learning.
This paper studies this problem and proposes a novel method using the variance
of payoff functions to construct context-aware sparse coordination topologies.
We theoretically ground our method by proving that the smaller the
variance of a payoff function is, the less likely the action selection is to
change after removing the corresponding edge. Moreover, we propose to learn action
representations to effectively reduce the influence of payoff functions'
estimation errors on graph construction. To empirically evaluate our method, we
present the Multi-Agent COordination (MACO) benchmark by collecting classic
coordination problems in the literature, increasing their difficulty, and
classifying them into different types. We carry out a case study and
experiments on the MACO and StarCraft II micromanagement benchmarks to
demonstrate the dynamics of sparse graph learning, the influence of graph
sparseness, and the learning performance of our method.
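As a rough sketch of the variance-based edge selection described above (not the authors' code), the snippet below ranks the pairwise payoff functions of a coordination graph by their variance and keeps only the highest-variance edges, following the intuition that low-variance payoffs barely affect the greedy joint action. The payoff tensors and the sparseness level are stand-in assumptions.

```python
# Illustrative sketch only: build a context-aware sparse coordination graph
# by keeping the edges whose pairwise payoff functions vary the most.
import numpy as np

n_agents, n_actions, keep_fraction = 6, 5, 0.4
rng = np.random.default_rng(0)

edges = [(i, j) for i in range(n_agents) for j in range(i + 1, n_agents)]
# Stand-in for the learned pairwise payoff q_ij(a_i, a_j) at the current state;
# in practice these would come from the deep coordination graph's payoff heads.
payoffs = {e: rng.normal(size=(n_actions, n_actions)) for e in edges}

# Rank edges by payoff variance and keep only the most context-relevant ones.
variances = {e: payoffs[e].var() for e in edges}
n_keep = max(1, int(keep_fraction * len(edges)))
sparse_graph = sorted(edges, key=lambda e: variances[e], reverse=True)[:n_keep]
print("kept edges:", sparse_graph)
```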