Multi-Agent Quantum Reinforcement Learning using Evolutionary Optimization
Multi-Agent Reinforcement Learning is becoming increasingly important in the
era of autonomous driving and other smart industrial applications.
At the same time, a promising new approach to Reinforcement Learning is emerging
that exploits the inherent properties of quantum mechanics, significantly
reducing the trainable parameters of a model. However, gradient-based Multi-Agent
Quantum Reinforcement Learning methods often struggle with barren plateaus,
which hold them back from matching the performance of classical approaches. We
build upon an existing approach for gradient-free Quantum Reinforcement
Learning and propose three genetic variations with Variational Quantum Circuits
for Multi-Agent Reinforcement Learning using evolutionary optimization. We
evaluate our genetic variations in the Coin Game environment and also compare
them to classical approaches. We show that our Variational Quantum Circuit
approaches perform significantly better than a neural network with a
similar number of trainable parameters. Compared to the larger neural network,
our approaches achieve similar results using fewer parameters.
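For a concrete picture of the gradient-free setup described above, the sketch below shows a generic evolutionary loop over the rotation angles of a variational circuit. The fitness function `evaluate_vqc_policy` is a hypothetical stand-in for rolling out the circuit as a policy in the Coin Game; it, the population size, and the mutation scale are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluate_vqc_policy(params: np.ndarray) -> float:
    """Hypothetical fitness: the return obtained by rolling out a VQC policy
    whose rotation angles are `params` (a toy surrogate objective stands in here)."""
    return -float(np.sum((params - 0.5) ** 2))

def evolve(num_params=24, pop_size=20, generations=50,
           elite_frac=0.25, sigma=0.1):
    # Population of candidate parameter vectors (circuit rotation angles).
    pop = rng.uniform(-np.pi, np.pi, size=(pop_size, num_params))
    n_elite = max(1, int(elite_frac * pop_size))
    for _ in range(generations):
        fitness = np.array([evaluate_vqc_policy(p) for p in pop])
        elite = pop[np.argsort(fitness)[-n_elite:]]                  # selection
        parents = elite[rng.integers(0, n_elite, size=pop_size)]
        pop = parents + sigma * rng.standard_normal(parents.shape)   # mutation
        pop[:n_elite] = elite                                        # elitism
    fitness = np.array([evaluate_vqc_policy(p) for p in pop])
    return pop[np.argmax(fitness)]

best_params = evolve()
```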
Load Frequency Control: A Deep Multi-Agent Reinforcement Learning Approach
The paradigm shift in energy generation towards microgrid-based architectures is heavily changing the landscape of the control structure in distribution systems. More specifically, distributed generation is deployed in the network, demanding decentralised control mechanisms to ensure reliable power system operation. In this work, a Multi-Agent Reinforcement Learning approach is proposed to deliver an agent-based solution that implements load frequency control without the need for a centralised authority. Multi-Agent Deep Deterministic Policy Gradient is used to approximate the frequency control at the primary and secondary levels. Each generation unit is represented as an agent that is modelled by a Recurrent Neural Network. Agents learn the optimal way of acting and interacting with the environment to maximise their long-term performance and to balance generation and load, thus restoring frequency. Using three test systems with two, four and eight generators, we demonstrate that our Multi-Agent Reinforcement Learning approach can efficiently be used to perform frequency control in a decentralised way.
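As a rough illustration of the architecture sketched in the abstract, the snippet below pairs a recurrent, decentralised actor (one per generation unit) with a centralised critic in the usual MADDPG style. The class names, observation and action dimensions, and hidden sizes are assumptions for illustration, not details from the paper.

```python
import torch
import torch.nn as nn

class RNNActor(nn.Module):
    """Decentralised actor: each generation unit sees only its local
    frequency/power observations and outputs a bounded control action."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, obs_seq, h=None):
        out, h = self.rnn(obs_seq, h)
        return self.head(out[:, -1]), h

class CentralCritic(nn.Module):
    """Centralised critic used during training only: it scores the joint
    observations and actions of all agents, as in MADDPG."""
    def __init__(self, joint_obs_dim: int, joint_act_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_obs_dim + joint_act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, joint_obs, joint_act):
        return self.net(torch.cat([joint_obs, joint_act], dim=-1))
```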
Message-Dropout: An Efficient Training Method for Multi-Agent Deep Reinforcement Learning
In this paper, we propose a new learning technique named message-dropout to
improve the performance of multi-agent deep reinforcement learning under two
application scenarios: 1) classical multi-agent reinforcement learning with
direct message communication among agents and 2) centralized training with
decentralized execution. In the first application scenario of multi-agent
systems in which direct message communication among agents is allowed, the
message-dropout technique drops out the received messages from other agents in
a block-wise manner with a certain probability in the training phase and
compensates for this effect by multiplying the weights of the dropped-out block
units with a correction probability. The applied message-dropout technique
effectively handles the increased input dimension in multi-agent reinforcement
learning with communication and makes learning robust against communication
errors in the execution phase. In the second application scenario of
centralized training with decentralized execution, we particularly consider the
application of the proposed message-dropout to Multi-Agent Deep Deterministic
Policy Gradient (MADDPG), which uses a centralized critic to train a
decentralized actor for each agent. We evaluate the proposed message-dropout
technique for several games, and numerical results show that the proposed
message-dropout technique with a proper dropout rate significantly improves the
reinforcement learning performance in terms of training speed and steady-state
performance in the execution phase.
Comment: The 33rd AAAI Conference on Artificial Intelligence (AAAI), 2019
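The core mechanism is easy to state in code. The sketch below applies block-wise dropout to the messages an agent receives from its peers; inverted scaling of the kept blocks is used here as one common way to realise the correction the abstract mentions, so the details and the function name are assumptions rather than the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def message_dropout(messages, drop_prob=0.5, training=True):
    """Block-wise dropout over received messages.

    `messages` is a list of per-sender message vectors. During training,
    each sender's whole message block is zeroed with probability
    `drop_prob`; the kept blocks are rescaled so the expected input to the
    policy network is unchanged. At execution time all messages pass
    through untouched.
    """
    if not training:
        return np.concatenate(messages)
    kept = []
    for m in messages:
        if rng.random() < drop_prob:
            kept.append(np.zeros_like(m))        # drop the whole block
        else:
            kept.append(m / (1.0 - drop_prob))   # compensate kept blocks
    return np.concatenate(kept)

# Example: three peers each send a 4-dimensional message.
x = message_dropout([np.ones(4), 2 * np.ones(4), 3 * np.ones(4)])
```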
Learning with Opponent-Learning Awareness
Multi-agent settings are quickly gathering importance in machine learning.
This includes a plethora of recent work on deep multi-agent reinforcement
learning, but also can be extended to hierarchical RL, generative adversarial
networks and decentralised optimisation. In all these settings the presence of
multiple learning agents renders the training problem non-stationary and often
leads to unstable training or undesired final results. We present Learning with
Opponent-Learning Awareness (LOLA), a method in which each agent shapes the
anticipated learning of the other agents in the environment. The LOLA learning
rule includes a term that accounts for the impact of one agent's policy on the
anticipated parameter update of the other agents. Results show that the
encounter of two LOLA agents leads to the emergence of tit-for-tat and
therefore cooperation in the iterated prisoners' dilemma, while independent
learning does not. In this domain, LOLA also receives higher payouts compared
to a naive learner, and is robust against exploitation by higher order
gradient-based methods. Applied to repeated matching pennies, LOLA agents
converge to the Nash equilibrium. In a round-robin tournament we show that LOLA
agents successfully shape the learning of a range of multi-agent learning
algorithms from the literature, resulting in the highest average returns on the
IPD. We also show that the LOLA update rule can be efficiently calculated using
an extension of the policy gradient estimator, making the method suitable for
model-free RL. The method thus scales to large parameter and input spaces and
nonlinear function approximators. We apply LOLA to a grid world task with an
embedded social dilemma using recurrent policies and opponent modelling. By
explicitly considering the learning of the other agent, LOLA agents learn to
cooperate out of self-interest. The code is at github.com/alshedivat/lola
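To make the look-ahead term concrete, here is a small numerical re-creation of the repeated matching pennies result: each agent differentiates its value through the opponent's anticipated naive gradient step. It uses exact value functions and finite-difference gradients on mixed strategies rather than the paper's policy gradient estimator, and the step sizes are assumptions.

```python
import numpy as np

# Matching pennies with mixed strategies: p = P(agent 1 plays heads),
# q = P(agent 2 plays heads). Agent 1 wins on a match, agent 2 on a mismatch.
def V1(p, q):
    return (2 * p - 1) * (2 * q - 1)

def V2(p, q):
    return -V1(p, q)

def grad(f, x, y, wrt, eps=1e-5):
    """Central finite-difference derivative of f(x, y) w.r.t. one argument."""
    if wrt == 0:
        return (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
    return (f(x, y + eps) - f(x, y - eps)) / (2 * eps)

def lola_step(p, q, alpha=0.05, eta=0.5):
    # LOLA look-ahead: each agent optimises its value *after* the opponent's
    # anticipated one-step naive gradient update.
    v1_after = lambda p_, q_: V1(p_, q_ + eta * grad(V2, p_, q_, wrt=1))
    v2_after = lambda p_, q_: V2(p_ + eta * grad(V1, p_, q_, wrt=0), q_)
    p_new = np.clip(p + alpha * grad(v1_after, p, q, wrt=0), 0.0, 1.0)
    q_new = np.clip(q + alpha * grad(v2_after, p, q, wrt=1), 0.0, 1.0)
    return p_new, q_new

p, q = 0.8, 0.3
for _ in range(500):
    p, q = lola_step(p, q)
print(p, q)  # both drift towards the mixed Nash equilibrium (0.5, 0.5)
```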
Effective Multi-Agent Deep Reinforcement Learning Control with Relative Entropy Regularization
In this paper, a novel Multi-agent Reinforcement Learning (MARL) approach,
Multi-Agent Continuous Dynamic Policy Gradient (MACDPP), is proposed to tackle
the issues of limited capability and sample efficiency in various scenarios
controlled by multiple agents. It alleviates the inconsistency of multiple
agents' policy updates by introducing relative entropy regularization into
the Centralized Training with Decentralized Execution (CTDE) framework with an
Actor-Critic (AC) structure. Evaluated on multi-agent cooperation and
competition tasks as well as traditional control tasks, including OpenAI
benchmarks and robot arm manipulation, MACDPP demonstrates significant
superiority in learning capability and sample efficiency compared with both
related multi-agent and widely implemented single-agent baselines, and therefore
expands the potential of MARL in effectively learning challenging control scenarios.
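The abstract does not spell out the form of the regularizer, but the general idea of penalising each agent's policy update by its relative entropy (KL divergence) to the previous policy can be sketched as follows; the Gaussian policy parameterisation, the coefficient `beta`, and the function name are illustrative assumptions.

```python
import torch
import torch.distributions as D

def kl_regularized_actor_loss(q_value, new_mean, new_std,
                              old_mean, old_std, beta=0.1):
    """Actor objective with a relative-entropy penalty.

    Maximises the centralized critic's value estimate `q_value` while keeping
    the agent's updated Gaussian policy close to its previous one, which damps
    the non-stationarity caused by all agents updating at the same time.
    """
    new_pi = D.Normal(new_mean, new_std)
    old_pi = D.Normal(old_mean, old_std)
    kl = D.kl_divergence(new_pi, old_pi).sum(dim=-1)   # per-sample KL
    return (-q_value + beta * kl).mean()
```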