Discrete Message via Online Clustering Labels in Decentralized POMDP

Chen, Jingdi; Lan, Tian

Authors: Jingdi Chen, Tian Lan
Publication date: 14 August 2023

Abstract

Communication is crucial for solving cooperative Multi-Agent Reinforcement Learning tasks in Partially Observable Markov Decision Processes. Existing works often rely on black-box methods to encode local information or features into messages shared with other agents. However, such black-box approaches cannot provide any quantitative guarantee on the expected return, and they often produce continuous messages with high communication overhead and poor interpretability. In this paper, we establish an upper bound on the return gap between an ideal policy with full observability and an optimal partially observable policy with discrete communication. This result enables us to recast multi-agent communication as a novel online clustering problem over the local observations at each agent, with messages as cluster labels and the upper bound on the return gap as the clustering loss. By minimizing this upper bound, we arrive at a surprisingly simple design for the message-generation functions in multi-agent communication and integrate it with reinforcement learning through a Regularized Information Maximization loss function. Evaluations show that the proposed discrete communication significantly outperforms state-of-the-art multi-agent communication baselines and achieves nearly optimal returns with few-bit messages that are naturally interpretable.
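To make the idea of "messages as cluster labels" concrete, here is a minimal sketch, assuming a hypothetical linear-softmax message function over one agent's local observation vectors and the generic form of a Regularized Information Maximization (RIM) objective: the entropy of the label marginal minus the mean per-observation label entropy, plus a weight penalty. The names `message_probs`, `rim_loss`, `discrete_messages`, and `bits_per_message`, the toy data, and the weights are all illustrative assumptions. This is not the authors' implementation, which derives its clustering loss from the return-gap bound and trains jointly with reinforcement learning.

```python
import math

import numpy as np


def message_probs(obs, W, b):
    """Soft cluster assignment p(m | o) for each local observation (row of obs).

    Hypothetical linear-softmax message function; the paper's model may differ.
    """
    logits = obs @ W + b                         # shape (N, K)
    logits -= logits.max(axis=1, keepdims=True)  # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats along the given axis."""
    return -(p * np.log(p + eps)).sum(axis=axis)


def rim_loss(obs, W, b, lam=1e-3):
    """Negative generic RIM objective: -(H(mean p) - mean H(p)) + lam * ||W||^2.

    H(mean p) is large when all K labels are used evenly (balanced clusters);
    mean H(p) is small when each observation gets a confident label.
    """
    p = message_probs(obs, W, b)
    marginal = p.mean(axis=0)
    mutual_info = entropy(marginal) - entropy(p, axis=1).mean()
    return -mutual_info + lam * np.sum(W ** 2)


def discrete_messages(obs, W, b):
    """Hard cluster labels, used here as the discrete messages to other agents."""
    return message_probs(obs, W, b).argmax(axis=1)


def bits_per_message(num_labels):
    """Bits needed to send one of num_labels (>= 2) cluster labels."""
    return math.ceil(math.log2(num_labels))


# Toy usage: two well-separated groups of 2-D observations (synthetic data).
rng = np.random.default_rng(0)
obs = np.vstack([rng.normal(-2.0, 0.3, (50, 2)), rng.normal(2.0, 0.3, (50, 2))])
W = np.array([[-3.0, 3.0], [-3.0, 3.0]])  # hand-picked weights, not learned
b = np.zeros(2)
labels = discrete_messages(obs, W, b)
```

In this toy setting, the first group maps to label 0 and the second to label 1, so each message fits in a single bit. An uninformative message function (all-zero weights) scores a RIM loss of about zero, while the separating weights score a negative loss. This illustrates why the objective favors confident, balanced labels.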

Available version: arXiv.org e-Print Archive, arXiv:2308.03358