Multi-agent reinforcement learning (MARL) has become effective in tackling
discrete cooperative game scenarios. However, MARL has yet to penetrate
settings beyond those modelled by team and zero-sum games, confining it to a
small subset of multi-agent systems. In this paper, we introduce a new
generation of MARL learners that can handle nonzero-sum payoff structures and
continuous settings. In particular, we study the MARL problem in a class of
games known as stochastic potential games (SPGs) with continuous state-action
spaces. Unlike cooperative games, in which all agents share a common reward,
SPGs are capable of modelling real-world scenarios where agents seek to fulfil
their individual goals. We prove theoretically our learning method, SPot-AC,
enables independent agents to learn Nash equilibrium strategies in polynomial
time. We demonstrate our framework tackles previously unsolvable tasks such as
Coordination Navigation and large selfish routing games and that it outperforms
the state of the art MARL baselines such as MADDPG and COMIX in such scenarios.Comment: ICML 202