Stochastic partial observability poses a major challenge for decentralized
coordination in multi-agent reinforcement learning but is largely neglected in
state-of-the-art research due to a strong focus on state-based centralized
training for decentralized execution (CTDE) and benchmarks that lack sufficient
stochasticity, such as the StarCraft Multi-Agent Challenge (SMAC). In this paper, we
propose Attention-based Embeddings of Recurrence In multi-Agent Learning
(AERIAL) to approximate value functions under stochastic partial observability.
AERIAL replaces the true state with a learned representation of multi-agent
recurrence, considering more accurate information about decentralized agent
decisions than state-based CTDE. We then introduce MessySMAC, a modified
version of SMAC with stochastic observations and higher variance in initial
states, to provide a more general and configurable benchmark for
stochastic partial observability. We evaluate AERIAL in Dec-Tiger as well as in
a variety of SMAC and MessySMAC maps, and compare the results with state-based
CTDE. Furthermore, we evaluate the robustness of AERIAL and state-based CTDE
against various stochasticity configurations in MessySMAC.
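To make the core idea concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of a centralized value function that conditions on an attention-based embedding of the agents' recurrent states rather than on the true state. The per-agent GRU encoder, the multi-head self-attention layer, and all dimensions are illustrative assumptions.

```python
# Hypothetical sketch: value estimation from attention over per-agent recurrence,
# replacing the global state used by state-based CTDE. Not the authors' code.
import torch
import torch.nn as nn

class RecurrenceAttentionValue(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden_dim=64, n_heads=4):
        super().__init__()
        # Each agent encodes its own action-observation history with a GRU.
        self.encoder = nn.GRU(obs_dim + act_dim, hidden_dim, batch_first=True)
        # Self-attention over the agents' hidden states stands in for the true state.
        self.attn = nn.MultiheadAttention(hidden_dim, n_heads, batch_first=True)
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, histories):
        # histories: (batch, n_agents, time, obs_dim + act_dim)
        b, n, t, d = histories.shape
        _, h = self.encoder(histories.reshape(b * n, t, d))  # h: (1, b*n, hidden)
        h = h.squeeze(0).reshape(b, n, -1)                   # (batch, n_agents, hidden)
        z, _ = self.attn(h, h, h)                            # attention-based embedding
        return self.value_head(z.mean(dim=1))                # (batch, 1) joint value

# Toy usage: batch of 3, 2 agents, 8-step histories
v = RecurrenceAttentionValue(obs_dim=10, act_dim=4)
x = torch.randn(3, 2, 8, 14)
print(v(x).shape)  # torch.Size([3, 1])
```

The design choice this illustrates is that the embedding is computed from the agents' decentralized recurrent representations, so the value estimate reflects the information the agents actually act on under stochastic partial observability, rather than a privileged global state.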