A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning
This paper extends off-policy reinforcement learning to the multi-agent case
in which a set of networked agents communicating with their neighbors according
to a time-varying graph collaboratively evaluates and improves a target policy
while following a distinct behavior policy. To this end, the paper develops a
multi-agent version of emphatic temporal difference learning for off-policy
policy evaluation, and proves convergence under linear function approximation.
The paper then leverages this result, in conjunction with a novel multi-agent
off-policy policy gradient theorem and recent work in both multi-agent
on-policy and single-agent off-policy actor-critic methods, to develop and give
convergence guarantees for a new multi-agent off-policy actor-critic algorithm.
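The core building block, emphatic TD(0) under linear function approximation, can be sketched in its single-agent form (the multi-agent, networked version adds communication between neighbors and is not reproduced here; the function name and the constant interest weight are our assumptions):

```python
import numpy as np

def etd0_update(w, x, x_next, r, rho, F_prev, alpha=0.1, gamma=0.9):
    """One emphatic TD(0) step under linear function approximation.

    w      : weight vector of the value estimate v(s) ~= w @ x
    x      : feature vector of the current state
    x_next : feature vector of the next state
    r      : observed reward
    rho    : importance-sampling ratio pi(a|s) / b(a|s)
    F_prev : follow-on trace rho_{t-1} * F_{t-1} from the previous step
    """
    F = gamma * F_prev + 1.0                 # follow-on trace, interest i_t = 1
    M = F                                    # emphasis (lambda = 0 case)
    delta = r + gamma * w @ x_next - w @ x   # off-policy TD error
    w = w + alpha * rho * M * delta * x      # emphasis-weighted update
    return w, rho * F                        # carry rho_t * F_t forward
```

The emphasis weighting is what restores convergence under off-policy sampling, which plain off-policy TD(0) with function approximation does not guarantee.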
Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus
In this paper, we propose a distributed off-policy actor-critic method to
solve multi-agent reinforcement learning problems. Specifically, we assume that
all agents keep local estimates of the global optimal policy parameter and
update their local value function estimates independently. Then, we introduce
an additional consensus step to let all the agents asymptotically achieve
agreement on the global optimal policy function. The convergence analysis of
the proposed algorithm is provided and the effectiveness of the proposed
algorithm is validated using a distributed resource allocation example.
Compared to relevant distributed actor-critic methods, here the agents do not
share information about their local tasks, but instead they coordinate to
estimate the global policy function.
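The scheme described above, independent local updates followed by a consensus step, can be sketched as follows (the doubly-stochastic weight matrix W and the learning rate are illustrative assumptions, not the paper's exact protocol):

```python
import numpy as np

def local_then_consensus(thetas, grads, W, lr=0.01):
    """One round of the sketched scheme.

    thetas : (n_agents, dim) local estimates of the global policy parameter
    grads  : (n_agents, dim) each agent's local policy-gradient estimate
    W      : (n_agents, n_agents) doubly-stochastic consensus weights
             respecting the communication graph
    """
    thetas = thetas + lr * grads   # independent local updates
    return W @ thetas              # consensus: mix neighbours' estimates
```

With grads held at zero, repeated application drives all rows of `thetas` to their common average, which is the agreement property the consensus step provides.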
A Review of Reinforcement Learning for Autonomous Building Energy Management
The area of building energy management has received a significant amount of
interest in recent years. This area is concerned with combining advancements in
sensor technologies, communications and advanced control algorithms to optimize
energy utilization. Reinforcement learning is one of the most prominent machine
learning algorithms used for control problems and has had many successful
applications in the area of building energy management. This research gives a
comprehensive review of the literature relating to the application of
reinforcement learning to developing autonomous building energy management
systems. The main directions for future research and challenges in reinforcement
learning are also outlined.
Comment: 17 pages, 3 figures
Asynchronous Methods for Deep Reinforcement Learning
We propose a conceptually simple and lightweight framework for deep
reinforcement learning that uses asynchronous gradient descent for optimization
of deep neural network controllers. We present asynchronous variants of four
standard reinforcement learning algorithms and show that parallel
actor-learners have a stabilizing effect on training, allowing all four methods
to successfully train neural network controllers. The best performing method,
an asynchronous variant of actor-critic, surpasses the current state-of-the-art
on the Atari domain while training for half the time on a single multi-core CPU
instead of a GPU. Furthermore, we show that asynchronous actor-critic succeeds
on a wide variety of continuous motor control problems as well as on a new task
of navigating random 3D mazes using a visual input.
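The asynchronous, lock-free shared-parameter pattern underlying these methods can be illustrated on a toy regression problem rather than a full RL controller (the target vector, learning rate, and step counts here are arbitrary stand-ins):

```python
import threading
import numpy as np

def actor_learner(w, seed, steps=500, lr=0.05):
    """One asynchronous worker: computes gradients on its own stream of
    data and applies them to the shared parameter vector w, lock-free."""
    rng = np.random.default_rng(seed)
    target = np.array([1.0, -1.0, 0.5, 2.0])  # toy regression target
    for _ in range(steps):
        x = rng.normal(size=4)
        grad = (w @ x - target @ x) * x       # squared-error gradient
        w -= lr * grad                        # in-place update to shared memory

shared_w = np.zeros(4)                        # parameters shared by all workers
threads = [threading.Thread(target=actor_learner, args=(shared_w, s))
           for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# shared_w ends up close to the common target despite unsynchronized updates
```

In the actual framework each worker runs its own environment instance and computes RL gradients, but the stabilizing mechanism is the same: many decorrelated actor-learners writing into one shared model.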
Meta reinforcement learning as task inference
Humans achieve efficient learning by relying on prior knowledge about the
structure of naturally occurring tasks. There is considerable interest in
designing reinforcement learning (RL) algorithms with similar properties. This
includes proposals to learn the learning algorithm itself, an idea also known
as meta learning. One formal interpretation of this idea is as a partially
observable multi-task RL problem in which task information is hidden from the
agent. Such unknown task problems can be reduced to Markov decision processes
(MDPs) by augmenting an agent's observations with an estimate of the belief
about the task based on past experience. However, estimating the belief state is
intractable in most partially-observed MDPs. We propose a method that
separately learns the policy and the task belief by taking advantage of various
kinds of privileged information. Our approach can be very effective at solving
standard meta-RL environments, as well as a complex continuous control
environment with sparse rewards that requires long-term memory.
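The belief-augmentation idea, reducing the unknown-task problem to an MDP, can be sketched with a discrete task set (the likelihood model is a hypothetical learned component, not part of the paper's stated method):

```python
import numpy as np

def update_belief(belief, likelihood):
    """Bayes update of the belief over K candidate tasks.

    belief     : prior probability vector over the K tasks
    likelihood : likelihood[k] = p(latest observation | task k), assumed
                 to come from a learned model
    """
    posterior = belief * likelihood
    return posterior / posterior.sum()

def augment(obs, belief):
    # the policy conditions on (observation, belief); in the augmented
    # state the unknown-task POMDP behaves like an ordinary MDP
    return np.concatenate([obs, belief])
```

The paper's contribution is learning the policy and the task belief separately, with privileged information supervising the belief; the augmentation step itself is the standard reduction sketched here.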
Two-stage Deep Reinforcement Learning for Inverter-based Volt-VAR Control in Active Distribution Networks
Model-based Volt/VAR optimization methods are widely used to eliminate voltage
violations and reduce network losses. However, the parameters of active
distribution networks (ADNs) are often not identified on-site, so significant
errors may be present in the model, rendering model-based methods infeasible. To
cope with this critical issue, we propose a novel two-stage deep reinforcement
learning (DRL) method to improve the voltage profile by regulating
inverter-based energy resources; it consists of an offline stage and an online
stage. In the offline stage, a highly efficient adversarial reinforcement
learning algorithm is developed to train an offline agent robust to the model
mismatch. In the subsequent online stage, we safely transfer the offline agent
to serve as the online agent, performing continuous online learning and control
with significantly improved safety and efficiency. Numerical simulations on IEEE
test cases not only demonstrate that the proposed adversarial reinforcement
learning algorithm outperforms the state-of-the-art algorithm, but also show
that our proposed two-stage method achieves much better performance than
existing DRL-based methods in the online application.
Comment: 8 pages
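The offline stage's robustness goal, training against worst-case model mismatch, boils down to a minimax criterion, sketched here in a deliberately tiny form (the scenario set and quadratic cost are illustrative stand-ins, not the paper's ADN model or its DRL algorithm):

```python
import numpy as np

def robust_action(candidate_actions, scenarios, cost):
    """Pick the action minimising the worst-case cost over adversarially
    chosen model-parameter scenarios (a minimax objective)."""
    worst = [max(cost(a, s) for s in scenarios) for a in candidate_actions]
    return candidate_actions[int(np.argmin(worst))]

# toy usage: squared deviation from an uncertain nominal voltage;
# the robust choice hedges between the two scenarios
best = robust_action([0.8, 1.0, 1.2], [0.9, 1.1], lambda a, s: (a - s) ** 2)
```

In the paper this minimax game is played between a learning agent and an adversary perturbing the model during offline training, rather than by enumeration as above.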
A Survey and Critique of Multiagent Deep Reinforcement Learning
Deep reinforcement learning (RL) has achieved outstanding results in recent
years. This has led to a dramatic increase in the number of applications and
methods. Recent works have explored learning beyond single-agent scenarios and
have considered multiagent learning (MAL) scenarios. Initial results report
successes in complex multiagent domains, although there are several challenges
to be addressed. The primary goal of this article is to provide a clear
overview of current multiagent deep reinforcement learning (MDRL) literature.
Additionally, we complement the overview with a broader analysis: (i) we
revisit previous key components, originally presented in MAL and RL, and
highlight how they have been adapted to multiagent deep reinforcement learning
settings. (ii) We provide general guidelines to new practitioners in the area:
describing lessons learned from MDRL works, pointing to recent benchmarks, and
outlining open avenues of research. (iii) We take a more critical tone raising
practical challenges of MDRL (e.g., implementation and computational demands).
We expect this article will help unify and motivate future research to take
advantage of the abundant literature that exists (e.g., RL and MAL) in a joint
effort to promote fruitful research in the multiagent community.
Comment: Under review since Oct 2018. Earlier versions of this work had the
title: "Is multiagent deep reinforcement learning the answer or the question?
A brief survey"
Learning to Schedule Communication in Multi-agent Reinforcement Learning
Many real-world reinforcement learning tasks require multiple interacting
agents to make sequential decisions, where well-coordinated actions among the
agents are crucial to achieving the target goal. One way to accelerate the
coordination effect is to enable multiple
agents to communicate with each other in a distributed manner and behave as a
group. In this paper, we study a practical scenario when (i) the communication
bandwidth is limited and (ii) the agents share the communication medium so that
only a restricted number of agents are able to simultaneously use the medium,
as in the state-of-the-art wireless networking standards. This calls for a
certain form of communication scheduling. In that regard, we propose a
multi-agent deep reinforcement learning framework, called SchedNet, in which
agents learn how to schedule themselves, how to encode the messages, and how to
select actions based on received messages. SchedNet is capable of deciding
which agents should be entitled to broadcast their (encoded) messages, by
learning the importance of each agent's partially observed information. We
evaluate SchedNet against multiple baselines under two different applications,
namely, cooperative communication and navigation, and predator-prey. Our
experiments show a non-negligible performance gap between SchedNet and other
mechanisms such as the ones without communication and with vanilla scheduling
methods, e.g., round robin, ranging from 32% to 43%.
Comment: Accepted in ICLR 201
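The scheduling decision itself, entitling the most important agents to broadcast under the shared-medium constraint, amounts to a top-k selection over importance weights (in SchedNet these weights are learned end-to-end; here they are simply given):

```python
import numpy as np

def schedule(importance, k):
    """Entitle the k agents with the highest importance weights to
    broadcast their encoded messages this step.

    importance : per-agent scalar weights (learned in SchedNet)
    k          : number of agents the medium can carry simultaneously
    returns    : boolean mask, True for agents allowed to broadcast
    """
    order = np.argsort(importance)[::-1]       # agents, most important first
    chosen = np.zeros(len(importance), dtype=bool)
    chosen[order[:k]] = True
    return chosen
```

A hard top-k like this is non-differentiable, which is one reason learning the weights jointly with encoders and action selectors is the nontrivial part of the framework.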
A Brief Survey of Deep Reinforcement Learning
Deep reinforcement learning is poised to revolutionise the field of AI and
represents a step towards building autonomous systems with a higher level
understanding of the visual world. Currently, deep learning is enabling
reinforcement learning to scale to problems that were previously intractable,
such as learning to play video games directly from pixels. Deep reinforcement
learning algorithms are also applied to robotics, allowing control policies for
robots to be learned directly from camera inputs in the real world. In this
survey, we begin with an introduction to the general field of reinforcement
learning, then progress to the main streams of value-based and policy-based
methods. Our survey will cover central algorithms in deep reinforcement
learning, including the deep Q-network, trust region policy optimisation, and
asynchronous advantage actor-critic. In parallel, we highlight the unique
advantages of deep neural networks, focusing on visual understanding via
reinforcement learning. To conclude, we describe several current areas of
research within the field.
Comment: IEEE Signal Processing Magazine, Special Issue on Deep Learning for
Image Understanding (arXiv extended version)
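As one concrete example among the covered algorithms, the deep Q-network regresses toward the target r + γ max_a' Q_target(s', a'), computed with a separate, slowly updated target network; a sketch of the target computation (batch layout and names are our assumptions):

```python
import numpy as np

def dqn_td_targets(q_target, batch, gamma=0.99):
    """Compute DQN regression targets r + gamma * max_a' Q_target(s', a'),
    with bootstrapping cut at terminal transitions.

    q_target : maps a batch of next-states to (B, n_actions) Q-values,
               produced by the frozen target network
    batch    : dict with 'r' (B,), 's_next' (B, d), 'done' (B,) arrays
    """
    next_q = q_target(batch["s_next"]).max(axis=1)  # greedy bootstrap value
    return batch["r"] + gamma * (1.0 - batch["done"]) * next_q
```

The online network is then trained to minimise the squared error between its Q-values for the taken actions and these targets, with the target network refreshed only periodically to stabilise learning.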
Applications of Deep Reinforcement Learning in Communications and Networking: A Survey
This paper presents a comprehensive literature review on applications of deep
reinforcement learning in communications and networking. Modern networks, e.g.,
Internet of Things (IoT) and Unmanned Aerial Vehicle (UAV) networks, become
more decentralized and autonomous. In such networks, network entities need to
make decisions locally to maximize the network performance under uncertainty of
the network environment. Reinforcement learning has been used effectively to
enable the network entities to obtain the optimal policy, i.e., the decisions
or actions to take given their states, when the state and action spaces are
small.
However, in complex and large-scale networks, the state and action spaces are
usually large, and the reinforcement learning may not be able to find the
optimal policy in reasonable time. Therefore, deep reinforcement learning, a
combination of reinforcement learning with deep learning, has been developed to
overcome the shortcomings. In this survey, we first give a tutorial of deep
reinforcement learning from fundamental concepts to advanced models. Then, we
review deep reinforcement learning approaches proposed to address emerging
issues in communications and networking. The issues include dynamic network
access, data rate control, wireless caching, data offloading, network security,
and connectivity preservation, all of which are important to next-generation
networks such as 5G and beyond. Furthermore, we present applications of deep
reinforcement learning for traffic routing, resource sharing, and data
collection. Finally, we highlight important challenges, open issues, and future
research directions of applying deep reinforcement learning.
Comment: 37 pages, 13 figures, 6 tables, 174 reference papers
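A minimal instance of the dynamic network access problem mentioned above can be sketched as stateless Q-learning over channels (a bandit-style simplification; realistic formulations are stateful MDPs, and the channel probabilities here are toy values):

```python
import numpy as np

def q_learning_channel_access(channel_free_prob, episodes=5000,
                              lr=0.1, eps=0.1, seed=0):
    """A network entity learns which channel to transmit on.
    Reward is 1 on a successful transmission (channel free), else 0."""
    rng = np.random.default_rng(seed)
    q = np.zeros(len(channel_free_prob))       # one Q-value per channel
    for _ in range(episodes):
        # epsilon-greedy channel choice
        a = rng.integers(len(q)) if rng.random() < eps else int(np.argmax(q))
        r = float(rng.random() < channel_free_prob[a])
        q[a] += lr * (r - q[a])                # incremental Q update
    return q
```

When the number of entities, channels, and observed network states grows, the Q-table becomes infeasible, which is exactly the gap the surveyed deep RL approaches fill by approximating Q with a neural network.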