Adaptive human-agent and agent-agent cooperation are becoming increasingly
critical in multi-agent reinforcement learning (MARL) research, where
remarkable progress has been made with the help of deep neural networks.
However, many established algorithms perform well only within the learning
paradigm and generalize poorly when cooperating with unseen partners.
Personality theory in cognitive psychology suggests that humans handle this
cooperation challenge by first predicting others' personalities and then their
complex actions. Inspired by this two-step psychological theory, we propose a
biologically plausible mixture of personality (MoP) improved spiking actor
network (SAN), in which a determinantal point process simulates the complex
formation and integration of different personality types in the MoP, and
dynamic and spiking neurons are incorporated into the SAN for efficient
reinforcement learning. The Overcooked benchmark task, which imposes a strong
requirement for cooperative cooking, is selected to test the proposed MoP-SAN.
The experimental results show that MoP-SAN achieves high performance not only
in the learning paradigm but also in the generalization test (i.e., cooperation
with unseen agents), where most counterpart deep actor networks fail. Ablation
experiments and visualization analyses were conducted to explain why MoP and
SAN are effective in multi-agent reinforcement learning scenarios while the DNN
performs poorly in the generalization test.