Accelerating Online Reinforcement Learning with Offline Datasets
Reinforcement learning (RL) provides an appealing formalism for learning
control policies from experience. However, the classic active formulation of RL
necessitates a lengthy active exploration process for each behavior, making it
difficult to apply in real-world settings such as robotic control. If we can
instead allow RL algorithms to effectively use previously collected data to aid
the online learning process, such applications could be made substantially more
practical: the prior data would provide a starting point that mitigates
challenges due to exploration and sample complexity, while the online training
enables the agent to perfect the desired skill. Such prior data could either
constitute expert demonstrations or, more generally, sub-optimal prior data
that illustrates potentially useful transitions. But it remains difficult to
train a policy with potentially sub-optimal offline data and improve it further
with online RL. In this paper we systematically analyze why this problem is so
challenging, and propose an algorithm that combines sample-efficient dynamic
programming with maximum likelihood policy updates, providing a simple and
effective framework that is able to leverage large amounts of offline data and
then quickly perform online fine-tuning of RL policies. We show that our
method, advantage weighted actor critic (AWAC), enables rapid learning of
skills with a combination of prior demonstration data and online experience.
Comment: 17 pages. Website: https://awacrl.github.io
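The actor update behind the "advantage weighted" name can be illustrated with a small sketch. Assumption: each logged action is weighted by exp(A(s, a) / lam), where lam is a temperature, and the policy is fit by weighted maximum likelihood. For a tabular policy over discrete actions that fit has a closed form. The transition format, the function names, and lam are hypothetical, advantages are given directly rather than produced by a learned critic, and this is not the authors' implementation.

```python
import math
from collections import defaultdict


def advantage_weights(advantages, lam=1.0):
    """Unnormalized weights exp(A / lam), one per transition (lam: assumed temperature)."""
    return [math.exp(a / lam) for a in advantages]


def weighted_mle_tabular_policy(transitions, lam=1.0):
    """Fit pi(a|s) by advantage-weighted maximum likelihood.

    transitions: list of (state, action, advantage) tuples, e.g. pooled
    from offline data and online experience (hypothetical format).
    For a tabular categorical policy, maximizing sum_i w_i * log pi(a_i | s_i)
    is solved by pi(a|s) proportional to the total weight seen for (s, a).
    """
    weights = advantage_weights([adv for _, _, adv in transitions], lam)
    totals = defaultdict(lambda: defaultdict(float))
    for (state, action, _), w in zip(transitions, weights):
        totals[state][action] += w
    policy = {}
    for state, action_weights in totals.items():
        z = sum(action_weights.values())
        policy[state] = {a: w / z for a, w in action_weights.items()}
    return policy
```

With a large lam every weight approaches 1 and the fit reduces to copying the data's action frequencies (behavior cloning); a small lam concentrates probability on high-advantage actions.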
Variational Adaptive-Newton Method for Explorative Learning
We present the Variational Adaptive Newton (VAN) method which is a black-box
optimization method especially suitable for explorative-learning tasks such as
active learning and reinforcement learning. Similar to Bayesian methods, VAN
estimates a distribution that can be used for exploration, but requires
computations that are similar to continuous optimization methods. Our
theoretical contribution reveals that VAN is a second-order method that unifies
existing methods in distinct fields of continuous optimization, variational
inference, and evolution strategies. Our experimental results show that VAN
performs well on a wide variety of learning tasks. This work presents a
general-purpose explorative-learning method that has the potential to improve
learning in areas such as active learning and reinforcement learning.
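A one-dimensional sketch conveys the flavor of a variational Newton-style update: keep a Gaussian q(theta) = N(mu, 1/precision), estimate the expected gradient and Hessian of f under q by Monte Carlo, move the precision toward the expected Hessian, and take a Newton-like step on mu. This is an assumption-laden illustration, not the paper's exact VAN update. The step sizes rho and beta, the sample count, and all names are hypothetical, and f is assumed to have positive curvature so that the precision stays positive.

```python
import random


def van_style_minimize(grad, hess, mu, precision, steps=200, n_samples=10,
                       rho=0.1, beta=0.1, seed=0):
    """Minimize f with a 1-D Gaussian q(theta) = N(mu, 1/precision).

    Each step (assumed form, not the paper's exact rule):
        precision <- (1 - beta) * precision + beta * E_q[f'']
        mu        <- mu - rho * E_q[f'] / precision
    Expectations are Monte Carlo averages over samples drawn from q.
    Assumes f'' > 0 so precision stays positive.
    """
    rng = random.Random(seed)
    for _ in range(steps):
        std = precision ** -0.5
        samples = [rng.gauss(mu, std) for _ in range(n_samples)]
        g = sum(grad(t) for t in samples) / n_samples   # estimate of E_q[f']
        h = sum(hess(t) for t in samples) / n_samples   # estimate of E_q[f'']
        precision = (1 - beta) * precision + beta * h
        mu = mu - rho * g / precision                   # Newton-like step
    return mu, precision
```

For f(theta) = (theta - 3)^2, calling van_style_minimize(lambda t: 2.0 * (t - 3.0), lambda t: 2.0, mu=0.0, precision=1.0) drives mu toward 3 while the precision settles at the curvature 2; the remaining spread of q is what an explorative learner could use to pick where to look next.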
Planning to Explore via Self-Supervised World Models
Reinforcement learning allows solving complex tasks; however, learning
tends to be task-specific, and sample efficiency remains a challenge. We
present Plan2Explore, a self-supervised reinforcement learning agent that
tackles both these challenges through a new approach to self-supervised
exploration and fast adaptation to new tasks, which need not be known during
exploration. During exploration, unlike prior methods which retrospectively
compute the novelty of observations after the agent has already reached them,
our agent acts efficiently by leveraging planning to seek out expected future
novelty. After exploration, the agent quickly adapts to multiple downstream
tasks in a zero-shot or few-shot manner. We evaluate on challenging control tasks
from high-dimensional image inputs. Without any training supervision or
task-specific interaction, Plan2Explore outperforms prior self-supervised
exploration methods and, in fact, almost matches the performance of an oracle
that has access to rewards. Videos and code at
https://ramanans1.github.io/plan2explore
Comment: Accepted at ICML 2020.
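Seeking out expected future novelty can be illustrated with ensemble disagreement. As an assumption for this sketch, K one-step predictors each forecast the next state's features, and the variance of their forecasts serves as an intrinsic reward: where the models disagree, the outcome is still novel. The real agent plans over imagined trajectories in a learned world model, whereas this toy only scores candidate actions one step ahead. Function names and the data layout are hypothetical.

```python
import statistics


def disagreement_reward(ensemble_predictions):
    """Intrinsic reward: mean per-dimension variance across ensemble members.

    ensemble_predictions: list of K predicted next-state feature vectors
    (lists of floats of equal length), one per ensemble member.
    """
    dims = len(ensemble_predictions[0])
    variances = [
        statistics.pvariance([pred[d] for pred in ensemble_predictions])
        for d in range(dims)
    ]
    return sum(variances) / dims


def pick_action(candidates):
    """Choose the action whose predicted outcomes the ensemble disagrees on most.

    candidates: dict mapping action -> list of ensemble predictions.
    """
    return max(candidates, key=lambda a: disagreement_reward(candidates[a]))
```

An action whose outcome every member predicts identically earns zero reward, so the agent is steered toward states the models have not yet learned, before it ever visits them.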
Active Deep Q-learning with Demonstration
Recent research has shown that although Reinforcement Learning (RL) can
benefit from expert demonstrations, it usually takes considerable effort to
obtain enough demonstrations. This effort hinders training capable RL agents with
expert demonstrations in practice. In this work, we propose Active Reinforcement
Learning with Demonstration (ARLD), a new framework to streamline RL in terms
of demonstration effort by allowing the RL agent to actively query for
demonstrations during training. Under the framework, we propose Active Deep
Q-Network, a novel query strategy that adapts to the dynamically changing
distributions during the RL training process by estimating the uncertainty of
recent states. The expert demonstration data within Active DQN are then
utilized by optimizing a supervised max-margin loss in addition to the
temporal-difference loss of usual DQN training. We propose two methods of estimating
the uncertainty based on two state-of-the-art DQN models, namely the divergence
of bootstrapped DQN and the variance of noisy DQN. The empirical results
validate that both methods not only learn faster than other passive expert
demonstration methods with the same amount of demonstration but also reach
super-expert level of performance across four different tasks.
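Two pieces of Active DQN can be sketched on plain Q-value lists: a supervised max-margin loss on demonstrated states and an uncertainty test that decides when to query the expert. Assumptions: the margin loss follows the common large-margin form max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E) with a constant margin, and the query test uses the mean standard deviation of Q-values across bootstrapped heads as a simple stand-in for the paper's divergence measure. The margin value, the threshold, and the function names are hypothetical.

```python
import statistics


def large_margin_loss(q_values, expert_action, margin=0.8):
    """Supervised max-margin loss for one demonstrated state.

    q_values: list of Q(s, a) for each discrete action a.
    Loss = max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E), where l is `margin`
    for a != a_E and 0 for the expert action. The loss is zero only when
    the expert action's value beats every other action by at least `margin`.
    """
    augmented = [
        q + (0.0 if a == expert_action else margin)
        for a, q in enumerate(q_values)
    ]
    return max(augmented) - q_values[expert_action]


def head_disagreement(head_q_values):
    """Mean (over actions) standard deviation of Q-values across heads.

    head_q_values: one list of Q-values per bootstrapped head, same state.
    """
    n_actions = len(head_q_values[0])
    spreads = [
        statistics.pstdev([head[a] for head in head_q_values])
        for a in range(n_actions)
    ]
    return sum(spreads) / n_actions


def should_query(head_q_values, threshold):
    """Ask the expert for a demonstration when the heads disagree enough."""
    return head_disagreement(head_q_values) > threshold
```

In training, this margin term would be added to the usual TD loss on states where a demonstration was obtained, and should_query gates how often the expert is asked.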
A Short Survey on Probabilistic Reinforcement Learning
A reinforcement learning agent tries to maximize its cumulative payoff by
interacting with an unknown environment. It is important for the agent to explore
suboptimal actions as well as to pick actions with the highest known rewards. Yet,
in sensitive domains, collecting more data through exploration is not always
possible, and it is important to find a policy with a certain performance
guarantee. In this paper, we present a brief survey of methods available in the
literature for balancing the exploration-exploitation trade-off and computing
robust solutions from fixed samples in reinforcement learning.
Comment: 7 pages, originally written as a literature survey for PhD candidacy
exam
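As one classic example of the probabilistic exploration methods such a survey discusses (not taken from the paper itself), Thompson sampling keeps a posterior over each action's payoff and acts greedily with respect to a posterior sample, so uncertain actions are still tried occasionally. The sketch assumes a simulated Bernoulli bandit whose true_probs are used only to generate rewards; names and parameters are hypothetical.

```python
import random


def thompson_sampling(true_probs, rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling on a simulated multi-armed bandit.

    true_probs: success probability of each arm (used only to simulate rewards).
    Each arm keeps a Beta(successes + 1, failures + 1) posterior, i.e. a
    uniform prior updated by observed outcomes. Every round one value is
    sampled from each posterior and the arm with the largest sample is pulled.
    Returns how many times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k
    failures = [0] * k
    for _ in range(rounds):
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        if rng.random() < true_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [successes[i] + failures[i] for i in range(k)]
```

Calling thompson_sampling([0.2, 0.5, 0.8]) concentrates most pulls on the third arm while the posteriors of the weaker arms are still uncertain enough to be sampled early on.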
Reinforcement Learning for Robotics and Control with Active Uncertainty Reduction
Model-free reinforcement learning methods such as Proximal Policy
Optimization or Q-learning typically require thousands of interactions with
the environment to approximate the optimal controller, which may not always be
feasible in robotics because of safety concerns and time cost. Model-based methods
such as PILCO or BlackDrops, while data-efficient, provide solutions with
limited robustness and complexity. To address this tradeoff, we introduce
active uncertainty reduction-based virtual environments, which are formed
through limited trials conducted in the original environment. We provide an
efficient method for uncertainty management, which is used as a metric for
self-improvement by identifying the points with maximum expected
improvement through adaptive sampling. Capturing the uncertainty also allows
for better mimicking of the reward responses of the original system. Our
approach enables the use of complex policy structures and reward functions
through a unique combination of model-based and model-free methods, while still
retaining the data efficiency. We demonstrate the validity of our method on
several classic reinforcement learning problems in OpenAI gym. We prove that
our approach offers better modeling capacity for complex system dynamics than
established methods.
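The adaptive-sampling step picks points with maximum expected improvement. A minimal sketch of the standard Gaussian expected-improvement criterion follows, assuming some surrogate model (for example a Gaussian process, not shown) already supplies a predictive mean and standard deviation for each candidate. The candidate format and function names are hypothetical, and this is not the authors' code.

```python
import math


def expected_improvement(mean, std, best, xi=0.0):
    """Expected improvement over `best` (maximization) for a Gaussian prediction.

    EI = (mean - best - xi) * Phi(z) + std * phi(z), z = (mean - best - xi) / std,
    where Phi and phi are the standard normal CDF and PDF.
    """
    improvement = mean - best - xi
    if std <= 0.0:
        return max(improvement, 0.0)  # no uncertainty: plain improvement
    z = improvement / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return improvement * cdf + std * pdf


def next_sample_point(candidates, best):
    """candidates: list of (x, predicted_mean, predicted_std) tuples."""
    return max(candidates, key=lambda c: expected_improvement(c[1], c[2], best))[0]
```

With best = 1.0, a candidate predicted at mean 0.9 with std 0.5 beats one predicted at mean 1.0 with std 0.01, because EI rewards uncertainty; querying such points is what reduces the model's uncertainty where it matters.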
Mobile Edge Computation Offloading Using Game Theory and Reinforcement Learning
Due to the ever-increasing popularity of resource-hungry and
delay-constrained mobile applications, the computation and storage capabilities
of the remote cloud have partially migrated towards the mobile edge, giving rise to
the concept known as Mobile Edge Computing (MEC). While MEC servers benefit from
close proximity to end users, providing services at reduced latency and
lower energy costs, they suffer from limitations in computational and radio
resources, which calls for fair and efficient resource management in the MEC
servers. The problem is, however, challenging due to the ultra-high density,
distributed nature, and intrinsic randomness of next generation wireless
networks. In this article, we focus on the application of game theory and
reinforcement learning for efficient distributed resource management in MEC, in
particular, for computation offloading. We briefly review the cutting-edge
research and discuss future challenges. Furthermore, we develop a
game-theoretical model for energy-efficient distributed edge server activation
and study several learning techniques. Numerical results are provided to
illustrate the performance of these distributed learning techniques. Also, open
research issues in the context of resource management in MEC servers are
discussed.
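The flavor of game-theoretic distributed decision making can be shown with a toy computation-offloading game, a hypothetical stand-in rather than the article's edge-server-activation model. Each user independently chooses local computation or offloading to a shared edge server whose cost grows with congestion, and users take turns playing best responses until nobody wants to switch, which is a pure Nash equilibrium. The linear cost model and all names are assumptions.

```python
def best_response_offloading(local_costs, edge_base, congestion, max_rounds=100):
    """Distributed offloading via sequential best-response dynamics.

    User i pays local_costs[i] when computing locally, or
    edge_base + congestion * (number of offloading users) when offloading.
    This is a congestion game, hence a potential game, so the dynamics
    stop at a pure Nash equilibrium.
    Returns a list of booleans: True means the user offloads.
    """
    n = len(local_costs)
    offload = [False] * n
    for _ in range(max_rounds):
        changed = False
        for i in range(n):
            others = sum(offload) - offload[i]          # other users offloading
            edge_cost = edge_base + congestion * (others + 1)
            want = edge_cost < local_costs[i]           # best response for user i
            if want != offload[i]:
                offload[i] = want
                changed = True
        if not changed:                                 # nobody deviates
            return offload
    return offload
```

With local_costs [1.0, 2.0, 3.0, 4.0], edge_base 0.5 and congestion 1.0, the dynamics settle with only the two users facing the highest local cost offloading; each user decides using only the current congestion level, which is what makes the scheme distributed.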
Exploration in Interactive Personalized Music Recommendation: A Reinforcement Learning Approach
Current music recommender systems typically act in a greedy fashion by
recommending songs with the highest user ratings. Greedy recommendation,
however, is suboptimal over the long term: it does not actively gather
information on user preferences and fails to recommend novel songs that are
potentially interesting. A successful recommender system must balance the needs
to explore user preferences and to exploit this information for recommendation.
This paper presents a new approach to music recommendation by formulating this
exploration-exploitation trade-off as a reinforcement learning task called the
multi-armed bandit. To learn user preferences, it uses a Bayesian model, which
accounts for both audio content and the novelty of recommendations. A
piecewise-linear approximation to the model and a variational inference
algorithm are employed to speed up Bayesian inference. One additional benefit
of our approach is a single unified model for both music recommendation and
playlist generation. Both simulation results and a user study indicate strong
potential for the new approach.
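The abstract does not give the model's equations, so the following is a loosely labeled assumption rather than the paper's model: a song's expected rating is its content affinity times a novelty factor 1 - exp(-t / s) that recovers with the time t since the song was last played, and a UCB-style exploration bonus stands in for the paper's Bayesian inference. All names, the recovery constant s, and the bonus form are hypothetical.

```python
import math


def novelty(minutes_since_played, recovery=60.0):
    """Assumed novelty factor in [0, 1): 0 right after a play, approaching 1 over time."""
    return 1.0 - math.exp(-minutes_since_played / recovery)


def recommend(songs, total_plays, bonus_scale=0.5):
    """Pick the song with the highest optimistic score (hypothetical scoring).

    songs: dict name -> (content_affinity, minutes_since_played, times_recommended)
    score = affinity * novelty + bonus_scale * sqrt(log(total_plays + 1) / (times_recommended + 1))
    The first term exploits known preferences; the second favors rarely tried songs.
    """
    def score(item):
        affinity, minutes, count = item[1]
        exploit = affinity * novelty(minutes)
        explore = bonus_scale * math.sqrt(math.log(total_plays + 1) / (count + 1))
        return exploit + explore

    return max(songs.items(), key=score)[0]
```

A greedy recommender would keep replaying the top-rated song; here a favourite played a minute ago scores low on novelty, so an unplayed song with a decent affinity wins instead.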
Deep Reinforcement Learning based Optimal Control of Hot Water Systems
Energy consumption for hot water production is a major draw in
high-efficiency buildings. Optimizing this has typically been approached from a
thermodynamics perspective, decoupled from occupant influence. Furthermore,
optimization usually presupposes existence of a detailed dynamics model for the
hot water system. These assumptions lead to suboptimal energy efficiency in the
real world. In this paper, we present a novel reinforcement learning based
methodology which optimizes hot water production. The proposed methodology is
completely generalizable, and does not require an offline step or human domain
knowledge to build a model for the hot water vessel or the heating element.
Occupant preferences too are learnt on the fly. The proposed system is applied
to a set of 32 houses in the Netherlands where it reduces energy consumption
for hot water production by roughly 20% with no loss of occupant comfort.
Extrapolating, this translates to absolute savings of roughly 200 kWh for a
single household on an annual basis. This performance can be replicated for any
domestic hot water system and optimization objective, given that the fairly
minimal requirements on sensor data are met. With millions of hot water systems
operational worldwide, the proposed framework has the potential to reduce
energy consumption in existing and new systems on a multi-gigawatt-hour scale
in the years to come.
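The savings figures can be checked with back-of-the-envelope arithmetic under stated assumptions: the roughly 20% reduction and the roughly 200 kWh annual saving describe the same household, and a fleet of 10^6 systems is an illustrative stand-in for "millions". The baseline consumption below is inferred, not reported in the abstract.

```latex
\begin{aligned}
E_{\text{baseline}} &\approx \frac{200\ \text{kWh/yr}}{0.20} = 1000\ \text{kWh/yr per household},\\
E_{\text{fleet savings}} &\approx 10^{6} \times 200\ \text{kWh/yr} = 2 \times 10^{8}\ \text{kWh/yr} = 200\ \text{GWh/yr}.
\end{aligned}
```

Under these assumptions, a million systems already reach hundreds of gigawatt-hours per year, in line with the multi-gigawatt-hour claim.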
A Brief Survey of Deep Reinforcement Learning
Deep reinforcement learning is poised to revolutionise the field of AI and
represents a step towards building autonomous systems with a higher-level
understanding of the visual world. Currently, deep learning is enabling
reinforcement learning to scale to problems that were previously intractable,
such as learning to play video games directly from pixels. Deep reinforcement
learning algorithms are also applied to robotics, allowing control policies for
robots to be learned directly from camera inputs in the real world. In this
survey, we begin with an introduction to the general field of reinforcement
learning, then progress to the main streams of value-based and policy-based
methods. Our survey will cover central algorithms in deep reinforcement
learning, including the deep Q-network, trust region policy optimisation, and
asynchronous advantage actor-critic. In parallel, we highlight the unique
advantages of deep neural networks, focusing on visual understanding via
reinforcement learning. To conclude, we describe several current areas of
research within the field.
Comment: IEEE Signal Processing Magazine, Special Issue on Deep Learning for
Image Understanding (arXiv extended version)
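The core of the deep Q-network update is a regression toward bootstrapped targets. The sketch below computes one-step targets y = r + gamma * max_a Q_target(s', a) for a batch, with no bootstrapping on terminal transitions, plus the mean squared TD error. It operates on precomputed Q-value lists instead of a neural network, and the function names and batch layout are hypothetical.

```python
def dqn_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step DQN regression targets for a batch.

    rewards: list of r_t
    next_q_values: list of lists, Q_target(s_{t+1}, a) for every action a
    dones: list of bools; terminal transitions do not bootstrap
    y_t = r_t + gamma * max_a Q_target(s_{t+1}, a)   (just r_t if terminal)
    """
    return [
        r + (0.0 if done else gamma * max(q_next))
        for r, q_next, done in zip(rewards, next_q_values, dones)
    ]


def td_loss(q_taken, targets):
    """Mean squared TD error between Q(s_t, a_t) and the targets."""
    return sum((q - y) ** 2 for q, y in zip(q_taken, targets)) / len(targets)
```

Keeping the max over a separate, slowly updated target network rather than the online network is the standard DQN choice for stabilizing these targets.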