Lenient multi-agent deep reinforcement learning
Much of the success of single-agent deep reinforcement learning (DRL) in recent years can be attributed to the use of experience replay memories (ERM), which allow Deep Q-Networks (DQNs) to be trained efficiently through sampling stored state transitions. However, care is required when using ERMs for multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become outdated because agents update their policies in parallel [11]. In this work we apply leniency [23] to MA-DRL. Lenient agents map state-action pairs to decaying temperature values that control the amount of leniency applied towards negative policy updates that are sampled from the ERM. This introduces optimism into the value-function update, and has been shown to facilitate cooperation in tabular fully-cooperative multi-agent reinforcement learning problems. We evaluate our Lenient-DQN (LDQN) empirically against the related Hysteretic-DQN (HDQN) algorithm [22], as well as a modified version we call scheduled-HDQN, which applies average reward learning near terminal states. Evaluations take place in extended variations of the Coordinated Multi-Agent Object Transportation Problem (CMOTP) [8], which include fully-cooperative sub-tasks and stochastic rewards. We find that LDQN agents are more likely to converge to the optimal policy in a stochastic-reward CMOTP than standard and scheduled-HDQN agents.
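To make the leniency mechanism concrete, the following is a minimal tabular sketch: each (state, action) pair carries a decaying temperature, and updates with a negative TD error are forgiven with a probability that shrinks as the temperature cools. The constants ALPHA, GAMMA, K, BETA and the 1 - exp(-K*T) leniency schedule follow the common formulation of lenient learning; they are illustrative assumptions, not the exact LDQN configuration.

```python
import math
import random
from collections import defaultdict

# Illustrative constants (assumptions, not the paper's hyperparameters).
ALPHA, GAMMA = 0.1, 0.95   # learning rate, discount factor
K, BETA = 2.0, 0.99        # leniency moderation factor, temperature decay

Q = defaultdict(float)          # Q-values keyed by (state, action)
T = defaultdict(lambda: 1.0)    # per-(state, action) temperature

def lenient_update(s, a, r, s_next, actions):
    """Q-update that probabilistically ignores negative TD errors."""
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    delta = target - Q[(s, a)]
    leniency = 1.0 - math.exp(-K * T[(s, a)])
    # Positive updates always apply; negative ones are forgiven with
    # probability `leniency`, which shrinks as the temperature decays,
    # so agents grow less optimistic about frequently visited pairs.
    if delta > 0 or random.random() > leniency:
        Q[(s, a)] += ALPHA * delta
    T[(s, a)] *= BETA  # cool the visited state-action pair
```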
Dynamic Controller Assignment in Software Defined Internet of Vehicles through Multi-Agent Deep Reinforcement Learning
In this paper, we introduce a novel dynamic controller assignment algorithm targeting connected-vehicle services and applications, also known as the Internet of Vehicles (IoV). The proposed approach considers a hierarchically distributed control plane, decoupled from the data plane, and uses vehicle location and control-traffic load to perform controller assignment dynamically. We model the dynamic controller assignment problem as a multi-agent Markov game and solve it with cooperative multi-agent deep reinforcement learning. Simulation results using real-world vehicle mobility traces show that the proposed approach outperforms existing ones by reducing control delay as well as packet loss.
Index Terms: Internet of Vehicles (IoV), Software Defined Networking (SDN), multi-agent deep reinforcement learning, controller assignment
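As a rough, hypothetical illustration of the Markov-game framing, the sketch below shows one plausible shape for a per-agent observation (vehicle-location features plus control-traffic load) and a shared cooperative reward that penalizes controller overload. All names, dimensions, and the quadratic penalty are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

# Hypothetical shapes: N_REGIONS agents (e.g., one per road segment), each
# choosing one of N_CONTROLLERS for its region at every decision epoch.
N_REGIONS, N_CONTROLLERS = 4, 3

def observe(vehicle_features, controller_loads):
    """Per-agent observation: local vehicle-location features plus the
    current control-traffic load of each candidate controller."""
    return np.concatenate([vehicle_features, controller_loads])

def team_reward(assignments, base_loads):
    """Shared cooperative reward: each agent's chosen controller absorbs
    its region's control traffic; a convex (quadratic) penalty on the
    resulting loads stands in for delay/packet loss under overload."""
    loads = base_loads.copy()
    for c in assignments:
        loads[c] += 1.0
    return -float(np.sum(loads ** 2))  # balanced loads maximize reward

# Usage: all agents picking controller 0 scores worse than spreading out.
print(team_reward([0, 0, 0, 0], np.zeros(N_CONTROLLERS)))  # -16.0
print(team_reward([0, 1, 2, 0], np.zeros(N_CONTROLLERS)))  # -6.0
```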
Deep Reinforcement Learning for Swarm Systems
Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents, as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable, and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions, and a neural network learned end-to-end. We evaluate the representation on two well-known problems from the swarm literature (rendezvous and pursuit evasion), in both a globally and a locally observable setup. For the local setup we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents, facilitating the development of more complex collective strategies.
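To illustrate the mean-embedding idea, here is a minimal sketch using radial-basis features: each neighbor's local state is passed through a feature map, and the empirical mean of those feature vectors serves as the permutation-invariant policy input. The helper names, dimensions, and the RBF parameterization are illustrative assumptions; the paper also considers histogram features and a feature map learned end-to-end.

```python
import numpy as np

def rbf_features(x, centers, gamma=1.0):
    """Radial-basis feature map phi(x) for one neighbor's local state."""
    return np.exp(-gamma * np.sum((centers - x) ** 2, axis=1))

def mean_embedding(neighbor_states, centers):
    """Permutation-invariant policy input: average phi over neighbors.
    Invariance to agent ordering and (approximately) to swarm size
    follows from treating neighbors as samples of a distribution."""
    feats = np.stack([rbf_features(s, centers) for s in neighbor_states])
    return feats.mean(axis=0)

# Usage: embed 5 neighbors with 2-D local states into 16 RBF features.
centers = np.random.randn(16, 2)
neighbors = [np.random.randn(2) for _ in range(5)]
policy_input = mean_embedding(neighbors, centers)  # shape (16,)
```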