The effect of different types of internal rewards in distributed multi-agent deep reinforcement learning

Abstract

Distributed multi-agent reinforcement learning in a shared environment is prohibitively hard because of the difficulty of assigning credit to the individual actions of an agent, especially when the agent is a member of a team. Meanwhile, the sparse, delayed team-level reward from the environment, such as winning, makes learning even more challenging. To address the credit assignment and sparse delayed reward problems that are common in multi-agent reinforcement learning, researchers usually construct or learn an internal reward signal that acts as a proxy for winning and provides denser rewards to individual agents. To improve learning on a typical multi-agent task, we constructed three types of internal rewards for the members of a multi-agent team and evaluated their effect. The results show that not all internal rewards improve the learning performance of multi-agent reinforcement learning: when the task is not very complex and the time the team needs to finish it is not very long, the sparse reward, such as winning, yields the best learning performance, and the other two forms of reward do not perform as well as the simple sparse reward. To some extent, our results can serve as a reference for designing reward functions in applications of distributed multi-agent reinforcement learning.
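
The abstract contrasts a sparse team reward such as winning with denser internal rewards given to individual agents. As a rough illustration only, the sketch below shows one way such per-agent reward signals could be shaped; the function names, the distance-based progress term, and the mixing coefficient `beta` are hypothetical assumptions for illustration, not the reward definitions used in the paper.

```python
import numpy as np

# Hypothetical per-agent reward schemes for a cooperative team task.
# These are illustrative assumptions, not the paper's exact internal rewards.

def sparse_team_reward(won: bool, n_agents: int) -> np.ndarray:
    """Every agent receives the same sparse reward, given only when the team wins."""
    return np.full(n_agents, 1.0 if won else 0.0)

def dense_progress_reward(prev_dist: np.ndarray, curr_dist: np.ndarray) -> np.ndarray:
    """Internal reward: each agent is rewarded for reducing its own distance to the goal."""
    return prev_dist - curr_dist  # positive when the agent moved closer this step

def mixed_reward(won: bool, prev_dist: np.ndarray, curr_dist: np.ndarray,
                 beta: float = 0.1) -> np.ndarray:
    """Sparse team reward plus a scaled dense individual progress term."""
    return (sparse_team_reward(won, len(curr_dist))
            + beta * dense_progress_reward(prev_dist, curr_dist))

if __name__ == "__main__":
    prev = np.array([5.0, 3.0, 4.0])   # previous distances of 3 agents to their goals
    curr = np.array([4.5, 3.2, 2.0])   # current distances
    print(sparse_team_reward(False, 3))        # [0. 0. 0.] -> no signal until the team wins
    print(dense_progress_reward(prev, curr))   # per-step feedback for each individual agent
    print(mixed_reward(True, prev, curr))      # sparse win bonus plus shaped term
```

A scheme like `mixed_reward` trades the clean team-level objective for denser individual feedback; the paper's finding is that for short, simple tasks the plain sparse reward can still learn best.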
