7 research outputs found

    CLEANing the Reward: Counterfactual Actions to Remove Exploratory Action Noise in Multiagent Learning

    Learning in multiagent systems can be slow because agents must learn both how to behave in a complex environment and how to account for the actions of other agents. The inability of an agent to distinguish between the true environmental dynamics and those caused by the stochastic exploratory actions of other agents creates noise in each agent's reward signal. This learning noise can have unforeseen and often undesirable effects on the resultant system performance. We define such noise as exploratory action noise, demonstrate the critical impact it can have on the learning process in multiagent settings, and introduce a reward structure to effectively remove such noise from each agent's reward signal. In particular, we introduce Coordinated Learning without Exploratory Action Noise (CLEAN) rewards and empirically demonstrate their benefit.
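    For intuition, a minimal Python sketch of the counterfactual-reward idea described above: agents act greedily in the environment so no agent's exploration pollutes the shared reward, while each agent privately scores a counterfactual exploratory action against a global reward function. The global reward G, the agent indexing, and every other name here are illustrative assumptions, not the paper's exact CLEAN formulation.

```python
def clean_style_reward(G, joint_greedy_actions, agent_idx, counterfactual_action):
    """Counterfactual learning signal for one agent (hypothetical sketch).

    G                     -- assumed global reward function over a joint action
    joint_greedy_actions  -- actions all agents actually took (greedy, no exploration)
    agent_idx             -- the agent computing its private learning signal
    counterfactual_action -- exploratory action evaluated only offline, never executed
    """
    joint_cf = list(joint_greedy_actions)
    joint_cf[agent_idx] = counterfactual_action
    # Reward the agent for the marginal effect of its counterfactual action,
    # leaving the actually executed joint action free of exploration noise.
    return G(joint_cf) - G(joint_greedy_actions)


# Toy usage with a made-up global reward (number of distinct actions chosen):
G = lambda joint: len(set(joint))
print(clean_style_reward(G, [0, 1, 1], agent_idx=2, counterfactual_action=2))  # -> 1
```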

    Factorized Q-Learning for Large-Scale Multi-Agent Systems

    Deep Q-learning has achieved significant success in single-agent decision-making tasks. However, it is challenging to extend Q-learning to large-scale multi-agent scenarios, due to the explosion of the action space resulting from the complex dynamics between the environment and the agents. In this paper, we propose to make the computation of multi-agent Q-learning tractable by treating the Q-function (w.r.t. state and joint action) as a high-order, high-dimensional tensor and then approximating it with factorized pairwise interactions. Furthermore, we utilize a composite deep neural network architecture for computing the factorized Q-function, share the model parameters among all the agents within the same group, and estimate the agents' optimal joint actions through a coordinate-descent-type algorithm. All these simplifications greatly reduce the model complexity and accelerate the learning process. Extensive experiments on two different multi-agent problems demonstrate the performance gain of our proposed approach in comparison with strong baselines, particularly when there are a large number of agents.
    Comment: 7 pages, 5 figures, DAI 201
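    To make the factorization concrete, a minimal Python/NumPy sketch: the joint Q-value is approximated as a sum of pairwise dot products between per-agent action embeddings, and a greedy joint action is found by coordinate descent, optimizing one agent's action at a time with the others held fixed. The shapes, the random embeddings, and all names are assumptions for illustration, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions, dim = 4, 5, 8

# f[i, a] stands in for a learned embedding of agent i taking action a
# (random here purely for illustration).
f = rng.normal(size=(n_agents, n_actions, dim))

def joint_q(actions):
    """Q(s, a_1..a_n) approximated by the sum of pairwise interactions."""
    total = 0.0
    for i in range(n_agents):
        for j in range(i + 1, n_agents):
            total += f[i, actions[i]] @ f[j, actions[j]]
    return total

def coordinate_descent(n_iters=10):
    """Greedy joint action: repeatedly re-optimize each agent's action."""
    actions = rng.integers(n_actions, size=n_agents)
    for _ in range(n_iters):
        for i in range(n_agents):
            # Score every candidate action for agent i, others held fixed.
            scores = [joint_q([*actions[:i], a, *actions[i + 1:]])
                      for a in range(n_actions)]
            actions[i] = int(np.argmax(scores))
    return actions, joint_q(actions)

best_actions, best_q = coordinate_descent()
print(best_actions, best_q)
```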