Multi-Agent Deep Reinforcement Learning with Adaptive Policies
We propose a novel approach to address one aspect of the non-stationarity
problem in multi-agent reinforcement learning (RL), where the other agents may
alter their policies due to environment changes during execution. This violates
the Markov assumption that governs most single-agent RL methods and is one of
the key challenges in multi-agent RL. To tackle this, we propose to train
multiple policies for each agent and defer the selection of the best policy
until execution time. Specifically, we model the environment non-stationarity with
a finite set of scenarios and train policies fitting each scenario. In addition
to multiple policies, each agent also learns a policy predictor to determine
which policy is best given its local information. By doing so, each agent is
able to adapt its policy when the environment changes and, consequently, the
other agents alter their policies during execution. We empirically evaluate
our method on a variety of common benchmark problems proposed for multi-agent
deep RL in the literature. Our experimental results show that the agents
trained by our algorithm adapt better to changing environments and
outperform state-of-the-art methods in all the tested environments.

Comment: arXiv admin note: text overlap with arXiv:1706.02275 by other authors
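The execution-time selection mechanism described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the class, the toy threshold predictor, and the scenario names are all hypothetical stand-ins for the learned deep policies and policy predictor.

```python
class AdaptiveAgent:
    """Sketch of an agent holding one policy per scenario plus a
    policy predictor that selects among them at execution time.

    All names are illustrative assumptions; the paper's networks and
    training procedure are not reproduced here.
    """

    def __init__(self, policies, predictor):
        # policies: mapping scenario id -> policy (observation -> action)
        self.policies = policies
        # predictor: observation -> scenario id, learned from local info
        self.predictor = predictor

    def act(self, obs):
        # At execution time, predict which scenario the current
        # (possibly changed) environment matches, then act with the
        # policy trained for that scenario.
        scenario = self.predictor(obs)
        return self.policies[scenario](obs)


# Toy usage: two scenario policies and a threshold-based predictor.
policies = {
    "calm": lambda obs: 0,      # conservative action
    "volatile": lambda obs: 1,  # aggressive action
}
predictor = lambda obs: "volatile" if obs > 0.5 else "calm"

agent = AdaptiveAgent(policies, predictor)
print(agent.act(0.2))  # calm scenario -> action 0
print(agent.act(0.9))  # volatile scenario -> action 1
```

Keeping the predictor separate from the policies is what lets each agent switch behavior online without retraining when the other agents change theirs.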