The necessity for cooperation among intelligent machines has popularised
cooperative multi-agent reinforcement learning (MARL) in the artificial
intelligence (AI) research community. However, much research effort has
focused on developing practical MARL algorithms whose effectiveness has been
studied only empirically and which therefore lack theoretical guarantees. As
recent studies have revealed, MARL methods often deliver performance that is
unstable, failing to improve the joint reward monotonically, or that is
suboptimal at convergence. To
resolve these issues, in this paper, we introduce a novel framework named
Heterogeneous-Agent Mirror Learning (HAML) that provides a general template for
MARL algorithm design. We prove that algorithms derived from the HAML
template satisfy the desired properties of monotonic improvement of the
joint reward and convergence to a Nash equilibrium. We verify the
practicality of HAML by proving that the current state-of-the-art cooperative
MARL algorithms, HATRPO and HAPPO, are in fact HAML instances. Next, as a
natural outcome of our theory, we propose HAML extensions of two well-known RL
algorithms, HAA2C (extending A2C) and HADDPG (extending DDPG), and demonstrate their
effectiveness against strong baselines on StarCraft II and Multi-Agent MuJoCo
tasks.