Deep cooperative multi-agent reinforcement learning has demonstrated
remarkable success across a wide spectrum of complex control tasks. However,
recent advances in multi-agent learning mainly focus on value decomposition
while leaving entity interactions intertwined, which easily leads to
overfitting on noisy interactions between entities. In this work, we introduce
a novel interactiOn Pattern disenTangling (OPT) method that disentangles not only
the joint value function into agent-wise value functions for decentralized
execution, but also the entity interactions into interaction prototypes, each
of which represents an underlying interaction pattern within a subgroup of the
entities. OPT facilitates filtering the noisy interactions between irrelevant
entities and thus significantly improves generalizability as well as
interpretability. Specifically, OPT introduces a sparse disagreement mechanism
to encourage sparsity and diversity among discovered interaction prototypes.
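The abstract does not spell out the exact form of this mechanism, so the
following is a minimal sketch of one plausible realization, assuming each
prototype attends over entities with a normalized attention distribution: an
entropy term encourages sparsity and a pairwise-overlap term encourages
disagreement. Function and tensor names are illustrative, not taken from the
released code.

```python
import torch

def sparse_disagreement_loss(attn, eps=1e-8):
    """Illustrative regularizer over interaction prototypes (an assumption,
    not the paper's exact loss).

    attn: (batch, n_prototypes, n_entities) tensor whose last axis is a
          probability distribution (each prototype's attention over entities).
    Returns a scalar that is small when each prototype attends to few
    entities (sparsity) and different prototypes attend to different
    entities (disagreement / diversity).
    """
    # Sparsity: low entropy means each prototype focuses on a few entities.
    entropy = -(attn * (attn + eps).log()).sum(dim=-1).mean()

    # Disagreement: penalize overlap between the attention distributions
    # of different prototypes (off-diagonal pairwise dot products).
    overlap = torch.einsum('bpe,bqe->bpq', attn, attn)
    diag = torch.diag_embed(torch.diagonal(overlap, dim1=-2, dim2=-1))
    n_proto = attn.size(1)
    disagreement = (overlap - diag).sum(dim=(-2, -1)).mean() / max(n_proto * (n_proto - 1), 1)

    return entropy + disagreement
```

Such a regularizer would typically be added to the main value-learning loss
with a small coefficient.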
The model then selectively restructures these prototypes into a compact
interaction pattern via an aggregator with learnable weights. To alleviate the
training instability caused by partial observability, we propose to maximize
the mutual information between the aggregation weights and the historical
behaviors of each agent.
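The abstract states the objective but not its estimator. Below is a minimal
sketch of one standard way to maximize such a term: a variational lower bound
in which an auxiliary network predicts the aggregation weights from each
agent's trajectory encoding. All class, function, and dimension names are
illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class WeightPosterior(nn.Module):
    """Variational posterior q_phi(w | tau): predicts an agent's aggregation
    weights from its trajectory (history) encoding. Illustrative sketch."""

    def __init__(self, history_dim, n_prototypes, hidden_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(history_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, n_prototypes),
        )

    def forward(self, history_emb):
        # A fixed-variance Gaussian over the aggregation weight vector.
        return torch.distributions.Normal(self.net(history_emb), 1.0)


def mi_auxiliary_loss(posterior, history_emb, agg_weights):
    """Variational bound I(w; tau) >= H(w) + E[log q_phi(w | tau)].
    Minimizing the negative log-likelihood below tightens the bound with
    respect to q_phi and, when backpropagated into the aggregator, keeps
    the weights predictable from each agent's own history.

    history_emb: (batch, history_dim) per-agent trajectory encodings.
    agg_weights: (batch, n_prototypes) aggregation weights produced by the
                 aggregator for the same transitions.
    """
    return -posterior(history_emb).log_prob(agg_weights).sum(dim=-1).mean()
```

The intent is to tie the aggregation weights to information each agent can
recover from its own partial observation history, matching the motivation
stated above.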
Experiments on both single-task and multi-task benchmarks demonstrate that
the proposed method yields results superior to those of
state-of-the-art counterparts. Our code is available at
https://github.com/liushunyu/OPT