Distributional Reinforcement Learning for Efficient Exploration
In distributional reinforcement learning (RL), the estimated distribution of
the value function models both parametric and intrinsic uncertainty. We
propose a novel and efficient exploration method for deep RL with two
components. The first is a decaying schedule to suppress the intrinsic
uncertainty. The second is an exploration bonus calculated from the upper
quantiles of the learned distribution. On Atari 2600 games, our method
outperforms QR-DQN in 12 out of 14 hard games (achieving a 483% average gain
in cumulative rewards over QR-DQN across 49 games, with a big win in Venture).
We also compared our algorithm with QR-DQN in a challenging 3D driving
simulator (CARLA). Results show that our algorithm achieves near-optimal
safety rewards twice as fast as QR-DQN.
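As a rough illustration of the idea, the following numpy sketch scores actions by their mean return plus a decaying bonus built from the spread of the upper quantiles; the exact bonus and schedule in the paper may differ, and all names here are illustrative.

```python
import numpy as np

def explore_scores(quantiles, t, c=0.5):
    """Score actions by mean return plus a decaying optimism bonus
    derived from the upper quantiles (illustrative variant).

    quantiles : (num_actions, N) estimated quantile values theta_i(s, a).
    t         : timestep; drives the decaying schedule that suppresses
                intrinsic (aleatoric) uncertainty over time.
    """
    mean = quantiles.mean(axis=1)                      # expected return per action
    median = np.median(quantiles, axis=1, keepdims=True)
    upper = quantiles[:, quantiles.shape[1] // 2:]     # upper half of the quantiles
    bonus = ((upper - median) ** 2).mean(axis=1)       # spread of the upper tail
    schedule = c * np.sqrt(np.log(t + 1) / (t + 1))    # decays as t grows
    return mean + schedule * bonus

rng = np.random.default_rng(0)
q = np.sort(rng.normal(size=(4, 32)), axis=1)  # fake quantiles for 4 actions
action = int(np.argmax(explore_scores(q, t=100)))
```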
Estimating Risk and Uncertainty in Deep Reinforcement Learning
Reinforcement learning agents are faced with two types of uncertainty.
Epistemic uncertainty stems from limited data and is useful for exploration,
whereas aleatoric uncertainty arises from stochastic environments and must be
accounted for in risk-sensitive applications. We highlight the challenges
involved in simultaneously estimating both of them, and propose a framework for
disentangling and estimating these uncertainties on learned Q-values. We derive
unbiased estimators of these uncertainties and introduce an uncertainty-aware
DQN algorithm, which we show exhibits safe learning behavior and outperforms
other DQN variants on the MinAtar testbed.
Comment: Work presented at the ICML 2020 Workshop on Uncertainty and Robustness in Deep Learning
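A minimal sketch of one common way to separate the two uncertainties with an ensemble of distributional heads: disagreement across members proxies epistemic uncertainty, while the average predicted return variance proxies aleatoric uncertainty. This is a generic scheme, not necessarily the paper's exact estimator.

```python
import numpy as np

def split_uncertainties(member_means, member_vars):
    """Generic decomposition over an ensemble of distributional heads.

    member_means : (K, num_actions) per-member Q estimates.
    member_vars  : (K, num_actions) per-member predicted return variances.
    """
    epistemic = member_means.var(axis=0)   # disagreement: shrinks with more data
    aleatoric = member_vars.mean(axis=0)   # predicted noise: persists under stochasticity
    return epistemic, aleatoric

rng = np.random.default_rng(1)
epi, ale = split_uncertainties(rng.normal(size=(5, 3)),
                               rng.uniform(0.1, 1.0, size=(5, 3)))
# Use `epi` for an exploration bonus, `ale` for risk-sensitive penalties.
```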
Randomized Prior Functions for Deep Reinforcement Learning
Dealing with uncertainty is essential for efficient reinforcement learning.
There is a growing literature on uncertainty estimation for deep learning from
fixed datasets, but many of the most popular approaches are poorly-suited to
sequential decision problems. Other methods, such as bootstrap sampling, have
no mechanism for uncertainty that does not come from the observed data. We
highlight why this can be a crucial shortcoming and propose a simple remedy
through the addition of a randomized untrainable 'prior' network to each ensemble
member. We prove that this approach is efficient with linear representations,
provide simple illustrations of its efficacy with nonlinear representations and
show that this approach scales to large-scale problems far better than previous
attempts.
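The core construction is simple enough to sketch directly: each ensemble member is the sum of a trainable network and a fixed, randomly initialized prior network. Layer sizes and the prior scale beta below are illustrative choices.

```python
import torch
import torch.nn as nn

class PriorMember(nn.Module):
    """One ensemble member: trainable network plus a fixed random prior.

    Gradients flow only through the trainable part, so the prior keeps
    injecting state-dependent uncertainty that does not come from data.
    """
    def __init__(self, obs_dim, n_actions, beta=3.0):
        super().__init__()
        self.trainable = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
        self.prior = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
        for p in self.prior.parameters():
            p.requires_grad_(False)          # the prior is never trained
        self.beta = beta                     # prior scale (illustrative value)

    def forward(self, obs):
        return self.trainable(obs) + self.beta * self.prior(obs)

ensemble = [PriorMember(obs_dim=8, n_actions=4) for _ in range(10)]
q = ensemble[0](torch.randn(1, 8))  # per episode: sample a member, act greedily
```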
DSAC: Distributional Soft Actor Critic for Risk-Sensitive Reinforcement Learning
In this paper, we present a new reinforcement learning (RL) algorithm called
Distributional Soft Actor Critic (DSAC), which exploits the distributional
information of accumulated rewards to achieve better performance. Seamlessly
integrating SAC (which uses entropy to encourage exploration) with a principled
distributional view of the underlying objective, DSAC takes into consideration
the randomness in both actions and rewards, and beats the state-of-the-art
baselines in several continuous control benchmarks. Moreover, with the
distributional information of rewards, we propose a unified framework for
risk-sensitive learning, one that goes beyond maximizing only expected
accumulated rewards. Under this framework we discuss three specific
risk-related metrics: percentile, mean-variance and distorted expectation. Our
extensive experiments demonstrate that with distribution modeling in RL, the
agent performs better in both risk-averse and risk-seeking control tasks.
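The three metrics are easy to illustrate on a quantile approximation of the return distribution; the formulas below are standard textbook versions and not necessarily DSAC's exact parameterizations.

```python
import numpy as np

def risk_metrics(quantiles, alpha=0.25, lam=1.0):
    """Risk measures on a sorted quantile approximation of the return.

    quantiles : sorted 1-D array, F^{-1}(tau_i) at evenly spaced tau_i.
    """
    n = len(quantiles)
    percentile = quantiles[int(alpha * n)]                  # VaR-style percentile
    mean_variance = quantiles.mean() - lam * quantiles.var()
    # Distorted expectation E_g[Z] = integral of F^{-1}(tau) g'(tau) dtau,
    # here with the (risk-seeking) distortion g(tau) = tau^2 as an example.
    taus = (np.arange(n) + 0.5) / n
    distorted = (2 * taus * quantiles).mean()               # g'(tau) = 2 tau
    return percentile, mean_variance, distorted

z = np.sort(np.random.default_rng(2).normal(size=64))
var25, mv, de = risk_metrics(z)
```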
The Potential of the Return Distribution for Exploration in RL
This paper studies the potential of the return distribution for exploration
in deterministic reinforcement learning (RL) environments. We study network
losses and propagation mechanisms for Gaussian, Categorical and Gaussian
mixture distributions. Combined with exploration policies that leverage this
return distribution, we solve, for example, a randomized Chain task of length
100, which has not been reported before when learning with neural networks.
Comment: Published at the Exploration in Reinforcement Learning Workshop at the 35th International Conference on Machine Learning, Stockholm, Sweden
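For the Gaussian case, the propagation mechanism can be sketched in a few lines: the return mean and standard deviation are both pushed through the Bellman equation, and exploration samples from the resulting distribution. The loss and distribution family the paper studies may differ in detail.

```python
import numpy as np

def gaussian_bellman_target(reward, gamma, next_mu, next_sigma, done):
    """Bellman propagation of a Gaussian return distribution: the mean
    follows the usual target, the standard deviation shrinks by gamma
    (so the variance shrinks by gamma^2) and collapses at terminals."""
    mu = reward + (0.0 if done else gamma * next_mu)
    sigma = 0.0 if done else gamma * next_sigma
    return mu, sigma

# Exploration policy: Thompson-style sampling from per-action returns.
mus, sigmas = np.array([1.0, 1.2]), np.array([0.1, 0.8])
action = int(np.argmax(np.random.default_rng(3).normal(mus, sigmas)))
```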
QUOTA: The Quantile Option Architecture for Reinforcement Learning
In this paper, we propose the Quantile Option Architecture (QUOTA) for
exploration based on recent advances in distributional reinforcement learning
(RL). In QUOTA, decision making is based on quantiles of a value distribution,
not only the mean. QUOTA provides a new dimension for exploration by making
use of both the optimism and the pessimism of a value distribution. We
demonstrate the performance advantage of QUOTA in both challenging video games
and physical robot simulators.
Comment: AAAI 2019
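A heavily simplified sketch of the action-selection step: each option corresponds to acting greedily with respect to one quantile of the learned distribution (the full method additionally learns a high-level policy over these options).

```python
import numpy as np

def quota_act(quantiles, option):
    """Act greedily with respect to a single quantile of the value
    distribution; `option` selects which quantile to trust.

    quantiles : (num_actions, N) quantile estimates for the current state.
    """
    return int(np.argmax(quantiles[:, option]))

q = np.sort(np.random.default_rng(4).normal(size=(3, 10)), axis=1)
a_optimistic = quota_act(q, option=8)   # high quantile: optimistic exploration
a_pessimistic = quota_act(q, option=1)  # low quantile: pessimistic/cautious
```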
Efficient exploration with Double Uncertain Value Networks
This paper studies directed exploration for reinforcement learning agents by
tracking uncertainty about the value of each available action. We identify two
sources of uncertainty that are relevant for exploration. The first originates
from limited data (parametric uncertainty), while the second originates from
the distribution of the returns (return uncertainty). We identify methods to
learn these distributions with deep neural networks, where we estimate
parametric uncertainty with Bayesian dropout, while return uncertainty is
propagated through the Bellman equation as a Gaussian distribution. Then, we
identify that both can be jointly estimated in one network, which we call the
Double Uncertain Value Network. The policy is directly derived from the learned
distributions based on Thompson sampling. Experimental results show that both
types of uncertainty may vastly improve learning in domains with a strong
exploration challenge.
Comment: Deep Reinforcement Learning Symposium @ Conference on Neural Information Processing Systems (NIPS) 2017
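A compact sketch of how one network might carry both uncertainties, under assumed layer sizes and a simplified combination rule: dropout is kept active at decision time to sample parametric uncertainty, and the head outputs a Gaussian over returns from which a Thompson sample is drawn.

```python
import torch
import torch.nn as nn

class DoubleUncertainHead(nn.Module):
    """Single network emitting both uncertainties (sketch): dropout left
    active at act-time samples the parametric uncertainty, and (mu, sigma)
    parameterize a Gaussian over returns."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                  nn.Dropout(p=0.1))
        self.mu = nn.Linear(64, n_actions)
        self.log_sigma = nn.Linear(64, n_actions)

    def act(self, obs):
        h = self.body(obs)                          # one MC-dropout sample
        mu, sigma = self.mu(h), self.log_sigma(h).exp()
        sample = mu + sigma * torch.randn_like(mu)  # Thompson sample of the return
        return int(sample.argmax(dim=-1))

agent = DoubleUncertainHead(obs_dim=4, n_actions=2)
agent.train()   # keep dropout stochastic while acting
action = agent.act(torch.randn(1, 4))
```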
Rainbow: Combining Improvements in Deep Reinforcement Learning
The deep reinforcement learning community has made several independent
improvements to the DQN algorithm. However, it is unclear which of these
extensions are complementary and can be fruitfully combined. This paper
examines six extensions to the DQN algorithm and empirically studies their
combination. Our experiments show that the combination provides
state-of-the-art performance on the Atari 2600 benchmark, both in terms of data
efficiency and final performance. We also provide results from a detailed
ablation study that shows the contribution of each component to overall
performance.
Comment: Under review as a conference paper at AAAI 2018
Towards Better Interpretability in Deep Q-Networks
Deep reinforcement learning techniques have demonstrated superior performance
in a wide variety of environments. While improvements in training algorithms
continue at a brisk pace, theoretical and empirical studies of what these
networks actually learn lag far behind. In this paper we propose an
interpretable neural network architecture for Q-learning which provides a
global explanation of the model's behavior using key-value memories, attention
and reconstructible embeddings. With a directed exploration strategy, our model
can reach training rewards comparable to the state-of-the-art deep Q-learning
models. However, results suggest that the features extracted by the neural
network are extremely shallow and subsequent testing using out-of-sample
examples shows that the agent can easily overfit to trajectories seen during
training.
Comment: Accepted at AAAI-19 (16 pages, 18 figures)
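A hypothetical sketch of a key-value, attention-based Q readout in the spirit of the abstract; the paper's actual architecture (including the reconstructible embeddings) is richer than this.

```python
import torch
import torch.nn.functional as F

def key_value_q(state_emb, keys, values):
    """Attend over learned memory keys and mix their stored Q rows.

    state_emb : (d,) embedding of the current state.
    keys      : (M, d) memory keys.
    values    : (M, num_actions) Q-values stored per key.
    """
    attn = F.softmax(keys @ state_emb, dim=0)  # which memories explain this state
    return attn @ values                       # globally inspectable Q estimate

q = key_value_q(torch.randn(16), torch.randn(8, 16), torch.randn(8, 3))
```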
Implicit Quantile Networks for Distributional Reinforcement Learning
In this work, we build on recent advances in distributional reinforcement
learning to give a generally applicable, flexible, and state-of-the-art
distributional variant of DQN. We achieve this by using quantile regression to
approximate the full quantile function for the state-action return
distribution. By reparameterizing a distribution over the sample space, this
yields an implicitly defined return distribution and gives rise to a large
class of risk-sensitive policies. We demonstrate improved performance on the 57
Atari 2600 games in the ALE, and use our algorithm's implicitly defined
distributions to study the effects of risk-sensitive policies in Atari games.
Comment: ICML 2018
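The central construction, a cosine embedding of sampled quantile fractions combined multiplicatively with the state embedding, follows the paper closely, though the sizes below are illustrative:

```python
import torch
import torch.nn as nn

class IQNHead(nn.Module):
    """Quantile head: embed sampled fractions tau with cosine features,
    then modulate the state embedding with them."""
    def __init__(self, emb_dim=64, n_actions=4, n_cos=64):
        super().__init__()
        self.register_buffer("i_pi", torch.arange(1, n_cos + 1) * torch.pi)
        self.tau_embed = nn.Linear(n_cos, emb_dim)
        self.out = nn.Linear(emb_dim, n_actions)

    def forward(self, state_emb, num_tau=8):
        # state_emb: (B, emb_dim); sample tau ~ Uniform(0, 1).
        tau = torch.rand(state_emb.size(0), num_tau, 1,
                         device=state_emb.device)
        phi = torch.relu(self.tau_embed(torch.cos(tau * self.i_pi)))
        z = self.out(state_emb.unsqueeze(1) * phi)  # (B, num_tau, n_actions)
        return z, tau                               # quantile values at each tau

head = IQNHead()
z, tau = head(torch.randn(2, 64))
q_values = z.mean(dim=1)  # risk-neutral Q: average over sampled quantiles
```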