CLEAN learning to improve coordination and scalability in multiagent systems
Recent advances in multiagent learning have led to exciting new capabilities spanning fields as diverse as planetary exploration, air traffic control, military reconnaissance, and airport security. Such algorithms provide a tangible benefit over traditional control algorithms in that they allow fast responses, adapt to dynamic environments, and generally scale well. Unfortunately, because many existing multiagent learning methods are extensions of single-agent approaches, they are inhibited by three key issues: i) they treat the actions of other agents as "environmental noise" in an attempt to reduce problem complexity, ii) they are slow to converge in large systems because the joint action space grows exponentially with the number of agents, and iii) they frequently rely upon an accurate system model being readily available. This work addresses these three issues sequentially. First, we improve overall learning performance compared to existing state-of-the-art techniques by embracing exploration in learning rather than ignoring it or approximating it away. Within multiagent systems, exploration by individual agents significantly alters the dynamics of the environment in which all agents learn. To address this, we introduce the concept of "private" exploration, which enables each agent to present a stationary baseline policy to the rest of the system so that the other agents can learn more efficiently. In particular, we introduce Coordinated Learning without Exploratory Action Noise (CLEAN) rewards, which improve coordination and performance by using private exploration to remove the negative impact of traditional "public" exploration strategies on learning in multiagent systems. Next, we leverage the properties of CLEAN rewards that enable private exploration to let agents explore multiple potential actions concurrently in a "batch mode," significantly improving learning speed over the state of the art. Finally, we improve the real-world applicability of the proposed techniques by reducing their requirements. Specifically, the CLEAN rewards developed here require an accurate partial model of the system (i.e., a model of the system objective) in order to be computed. Unfortunately, many real-world systems are too complex to be modeled or are not known in advance, so an accurate system model is not available a priori. We address this shortcoming by employing model-based reinforcement learning techniques that enable agents to construct their own approximate model of the system objective from their observations and use this approximate model to calculate their CLEAN rewards.
Keywords: Multiagent Coordination, Multiagent Learning, UAV Communication Network, Fractionated Satellites, UAV Swarms, Distributed Control, Multiagent Scalability, Learning-based Control, Reward Shaping, CubeSats, Multiagent Systems, Solar Power UAVs, Satellite Constellation
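To make the private-exploration idea concrete, here is a minimal Python sketch of the counterfactual CLEAN reward computation; the function name clean_reward and the stateless setting are illustrative assumptions, not the thesis's implementation. Every agent executes its greedy action, and an exploratory action is evaluated only inside a model G of the system objective, so it never perturbs what the other agents observe:

    def clean_reward(G, executed, i, c_i):
        # G        : model of the system objective (exact, or a learned approximation)
        # executed : joint action actually taken -- every agent acted greedily
        # i, c_i   : the learning agent and its privately explored, never-executed action
        counterfactual = list(executed)
        counterfactual[i] = c_i  # swap the private action in, offline
        return G(counterfactual) - G(executed)

The "batch mode" described above would extend this by evaluating several candidate actions c_i per step against the same executed baseline, since none of the candidates is ever taken in the real system.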
CLEANing the Reward: Counterfactual Actions to Remove Exploratory Action Noise in Multiagent Learning
Learning in multiagent systems can be slow because agents must learn both how to behave in a complex environment and how to account for the actions of other agents. The inability of an agent to distinguish between the true environmental dynamics and those caused by the stochastic exploratory actions of other agents creates noise in each agent's reward signal. This learning noise can have unforeseen and often undesirable effects on the resultant system performance. We define such noise as exploratory action noise, demonstrate the critical impact it can have on the learning process in multiagent settings, and introduce a reward structure to effectively remove such noise from each agent's reward signal. In particular, we introduce Coordinated Learning without Exploratory Action Noise (CLEAN) rewards and empirically demonstrate their benefits.
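A minimal usage sketch in Python, continuing the hypothetical clean_reward helper above (the stateless value table Q and the single sampled candidate per step are assumptions for illustration): each agent publicly executes its greedy action and updates a privately sampled candidate with its CLEAN reward, so no exploratory action noise enters any agent's reward signal.

    import random

    def clean_step(G, Q, clean_reward, alpha=0.1):
        # Q[i][a]: agent i's value estimate for action a (stateless-bandit form)
        greedy = [max(Q[i], key=Q[i].get) for i in range(len(Q))]  # executed jointly
        for i in range(len(Q)):
            c_i = random.choice(list(Q[i]))       # private exploratory candidate
            r = clean_reward(G, greedy, i, c_i)   # counterfactual CLEAN reward
            Q[i][c_i] += alpha * (r - Q[i][c_i])  # learn about c_i without executing it
        return greedy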
Factorized Q-Learning for Large-Scale Multi-Agent Systems
Deep Q-learning has achieved significant success in single-agent decision-making tasks. However, it is challenging to extend Q-learning to large-scale multi-agent scenarios due to the explosion of the joint action space resulting from the complex dynamics between the environment and the agents. In this paper, we propose to make the computation of multi-agent Q-learning tractable by treating the Q-function (w.r.t. state and joint action) as a high-order, high-dimensional tensor and then approximating it with factorized pairwise interactions. Furthermore, we utilize a composite deep neural network architecture for computing the factorized Q-function, share the model parameters among all the agents within the same group, and estimate the agents' optimal joint actions through a coordinate-descent-type algorithm. All these simplifications greatly reduce the model complexity and accelerate the learning process. Extensive experiments on two different multi-agent problems demonstrate the performance gain of our proposed approach in comparison with strong baselines, particularly when there are a large number of agents.
Comment: 7 pages, 5 figures, DAI 201
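The pairwise factorization and the coordinate-descent action selection can be illustrated with a small self-contained Python sketch; the tabular pairwise terms q[i, j] below stand in for the paper's composite deep network with shared parameters, and all names and sizes are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    N, A = 4, 3  # 4 agents, 3 actions each (toy sizes)
    q = rng.standard_normal((N, N, A, A))  # q[i, j, a_i, a_j]: pairwise utilities

    def joint_value(actions):
        # Factorized estimate: Q(s, a) ~= sum over pairs i < j of q_ij(a_i, a_j)
        return sum(q[i, j, actions[i], actions[j]]
                   for i in range(N) for j in range(i + 1, N))

    def coordinate_descent(actions, sweeps=10):
        # Re-optimize one agent's action at a time, holding the others fixed
        actions = list(actions)
        for _ in range(sweeps):
            changed = False
            for i in range(N):
                best = max(range(A), key=lambda a:
                           joint_value(actions[:i] + [a] + actions[i + 1:]))
                if best != actions[i]:
                    actions[i], changed = best, True
            if not changed:  # no agent can improve: a local optimum
                break
        return actions

    print(coordinate_descent([0] * N))

Because each sweep only ever evaluates one agent's A candidate actions at a time, a sweep costs O(N * A) joint-value queries rather than the A**N evaluations needed for exhaustive maximization over joint actions.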