A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation
Marginalized importance sampling (MIS), which measures the density ratio
between the state-action occupancy of a target policy and that of a sampling
distribution, is a promising approach for off-policy evaluation. However,
current state-of-the-art MIS methods rely on complex optimization tricks and
succeed mostly on simple toy problems. We bridge the gap between MIS and deep
reinforcement learning by observing that the density ratio can be computed from
the successor representation of the target policy. The successor representation
can be trained through deep reinforcement learning methodology and decouples
the reward optimization from the dynamics of the environment, making the
resulting algorithm stable and applicable to high-dimensional domains. We
evaluate the empirical performance of our approach on a variety of challenging
Atari and MuJoCo environments.
Comment: ICML 202
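The abstract's central observation — that the target policy's occupancy, and hence the MIS density ratio, can be read off its successor representation — can be sketched in closed form in the tabular case. The 3-state MDP, transition matrix, start distribution, and sampling distribution below are hypothetical illustrations, not taken from the paper:

```python
import numpy as np

# Toy 3-state MDP (hypothetical): P_pi is the state-transition matrix
# induced by the target policy, d0 the start-state distribution.
gamma = 0.9
P_pi = np.array([[0.1, 0.8, 0.1],
                 [0.0, 0.5, 0.5],
                 [0.3, 0.0, 0.7]])
d0 = np.array([1.0, 0.0, 0.0])

# Successor representation: M[s, s'] = expected discounted number of
# visits to s' starting from s, i.e. M = (I - gamma * P_pi)^{-1}.
M = np.linalg.inv(np.eye(3) - gamma * P_pi)

# Normalized state occupancy of the target policy, recovered from the SR.
d_pi = (1 - gamma) * d0 @ M

# MIS density ratio against an off-policy sampling distribution d_D.
d_D = np.full(3, 1 / 3)
w = d_pi / d_D
```

In the deep RL setting the paper targets, the SR would be learned by temporal-difference methods rather than computed by matrix inversion, but the same ratio is the quantity of interest.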
An introduction to reinforcement learning for neuroscience
Reinforcement learning has a rich history in neuroscience, from early work on
dopamine as a reward prediction error signal for temporal difference learning
(Schultz et al., 1997) to recent work suggesting that dopamine could implement
a form of 'distributional reinforcement learning' popularized in deep learning
(Dabney et al., 2020). Throughout this literature, there has been a tight link
between theoretical advances in reinforcement learning and neuroscientific
experiments and findings. As a result, the theories describing our experimental
data have become increasingly complex and difficult to navigate. In this
review, we cover the basic theory underlying classical work in reinforcement
learning and build up to an introductory overview of methods used in modern
deep reinforcement learning that have found applications in systems
neuroscience. We start with an overview of the reinforcement learning problem
and classical temporal difference algorithms, followed by a discussion of
'model-free' and 'model-based' reinforcement learning together with methods
such as DYNA and successor representations that fall in between these two
categories. Throughout these sections, we highlight the close parallels between
the machine learning methods and related work in both experimental and
theoretical neuroscience. We then provide an introduction to deep reinforcement
learning with examples of how these methods have been used to model different
learning phenomena in the systems neuroscience literature, such as
meta-reinforcement learning (Wang et al., 2018) and distributional
reinforcement learning (Dabney et al., 2020). Code that implements the methods
discussed in this work and generates the figures is also provided.
Comment: Code available at:
https://colab.research.google.com/drive/1kWOz2Uxn0cf2c4YizqIXQKWyxeYd6wvL?usp=sharin
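The classical temporal-difference machinery this review builds on can be illustrated with a minimal tabular TD(0) sketch. The chain task and constants here are illustrative assumptions, not the review's own code; the `delta` term is the reward prediction error that Schultz et al. (1997) linked to dopamine signaling:

```python
import numpy as np

# TD(0) on a hypothetical 5-state chain: states 0..4, deterministic
# rightward moves, reward 1.0 on entering the terminal state 4.
n_states, gamma, alpha = 5, 0.9, 0.1
V = np.zeros(n_states)  # state-value estimates; V[4] stays 0 (terminal)

for episode in range(500):
    s = 0
    while s < n_states - 1:
        s_next = s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Reward prediction error: the TD analog of the dopamine signal.
        delta = r + gamma * V[s_next] - V[s]
        V[s] += alpha * delta
        s = s_next
```

After training, the values approach the discounted distance to reward: V[3] ≈ 1, V[2] ≈ 0.9, V[0] ≈ 0.729.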
Count-Based Exploration with the Successor Representation
In this paper we introduce a simple approach for exploration in reinforcement
learning (RL) that allows us to develop theoretically justified algorithms in
the tabular case but that is also extendable to settings where function
approximation is required. Our approach is based on the successor
representation (SR), which was originally introduced as a representation
defining state generalization by the similarity of successor states. Here we
show that the norm of the SR, while it is being learned, can be used as a
reward bonus to incentivize exploration. In order to better understand this
transient behavior of the norm of the SR we introduce the substochastic
successor representation (SSR) and we show that it implicitly counts the number
of times each state (or feature) has been observed. We use this result to
introduce an algorithm that performs as well as some theoretically
sample-efficient approaches. Finally, we extend these ideas to a deep RL
algorithm and show that it achieves state-of-the-art performance in Atari 2600
games in a low sample-complexity regime.
Comment: This paper appears in the Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI 2020)
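The key idea — using the norm of a still-learning successor representation as an exploration bonus — can be sketched in the tabular case. The chain environment, step size, and bonus scale `beta` below are illustrative assumptions, not the paper's exact algorithm:

```python
import numpy as np

# Hypothetical 4-state setting; psi holds the SR estimate, one row per state.
n, gamma, alpha, beta = 4, 0.9, 0.5, 0.1
psi = np.zeros((n, n))

def sr_step(s, s_next):
    """TD update of the SR row for s; return an exploration bonus for s."""
    target = np.eye(n)[s] + gamma * psi[s_next]
    psi[s] += alpha * (target - psi[s])
    # While the SR is still being learned, rarely visited states have
    # small rows, so 1 / ||psi[s]||_1 acts as a count-like novelty bonus.
    return beta / np.linalg.norm(psi[s], 1)

b_first = sr_step(0, 1)        # state 0 barely seen: large bonus
for _ in range(50):
    sr_step(0, 1)
    sr_step(1, 2)
b_later = sr_step(0, 1)        # bonus shrinks as ||psi[0]||_1 grows
```

As visits accumulate, each row's L1 norm grows toward 1/(1-gamma), so the bonus decays roughly like an inverse visit count — the behavior the paper formalizes with the substochastic successor representation.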