A Deep Reinforcement Learning Approach to Marginalized Importance Sampling with the Successor Representation
Marginalized importance sampling (MIS), which measures the density ratio
between the state-action occupancy of a target policy and that of a sampling
distribution, is a promising approach for off-policy evaluation. However,
current state-of-the-art MIS methods rely on complex optimization tricks and
succeed mostly on simple toy problems. We bridge the gap between MIS and deep
reinforcement learning by observing that the density ratio can be computed from
the successor representation of the target policy. The successor representation
can be trained with deep reinforcement learning methods and decouples
the reward optimization from the dynamics of the environment, making the
resulting algorithm stable and applicable to high-dimensional domains. We
evaluate the empirical performance of our approach on a variety of challenging
Atari and MuJoCo environments.
Comment: ICML 202
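The central observation of the abstract, that the MIS density ratio can be computed from the target policy's successor representation, can be checked directly in the tabular case. The sketch below is a minimal illustration under stated assumptions, not the paper's deep RL algorithm. It assumes a small known MDP with a transition tensor P and computes the state-action SR in closed form as Psi = (I - gamma * P_pi)^-1. From that it derives the normalized discounted occupancy d_pi = (1 - gamma) * mu0^T Psi and divides by a sampling distribution d_D to obtain the ratio w. It then confirms that the MIS estimate E_{d_D}[w * r] / (1 - gamma) equals the true value of the target policy. All function and variable names are hypothetical.

```python
import numpy as np

def successor_representation(P, pi, gamma):
    """Closed-form state-action SR, Psi = (I - gamma * P_pi)^-1.

    P:  transitions, shape (S, A, S), P[s, a, s'] = Pr(s' | s, a)
    pi: target policy, shape (S, A), pi[s, a] = Pr(a | s)
    Rows and columns of Psi are (s, a) pairs flattened in row-major order.
    """
    S, A, _ = P.shape
    # P_pi[(s, a), (s', a')] = P(s' | s, a) * pi(a' | s')
    P_pi = (P[:, :, :, None] * pi[None, None, :, :]).reshape(S * A, S * A)
    return np.linalg.inv(np.eye(S * A) - gamma * P_pi)

def density_ratio(P, pi, gamma, d0, d_D):
    """w(s, a) = d_pi(s, a) / d_D(s, a), with d_pi obtained from the SR.

    d0:  initial state distribution, shape (S,)
    d_D: sampling distribution over flattened (s, a) pairs, shape (S * A,), all > 0
    """
    S, A, _ = P.shape
    # Initial state-action distribution: s0 ~ d0, a0 ~ pi(. | s0)
    mu0 = (d0[:, None] * pi).reshape(S * A)
    # Normalized discounted occupancy of the target policy (sums to 1)
    d_pi = (1 - gamma) * mu0 @ successor_representation(P, pi, gamma)
    return d_pi / d_D

# Hypothetical random MDP and sampling distribution for illustration.
rng = np.random.default_rng(0)
S, A, gamma = 4, 2, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
pi = rng.random((S, A))
pi /= pi.sum(axis=1, keepdims=True)
r = rng.random(S * A)              # reward r(s, a), flattened like Psi
d0 = np.full(S, 1.0 / S)
d_D = rng.random(S * A) + 0.1      # strictly positive, then normalized
d_D /= d_D.sum()

w = density_ratio(P, pi, gamma, d0, d_D)
# MIS estimate of the discounted return: E_{(s,a) ~ d_D}[w(s, a) r(s, a)] / (1 - gamma)
mis_estimate = np.sum(d_D * w * r) / (1 - gamma)
# Ground truth from the SR: V = mu0^T Psi r
mu0 = (d0[:, None] * pi).reshape(S * A)
true_value = mu0 @ successor_representation(P, pi, gamma) @ r
```

Here the expectation over d_D is computed exactly. With logged data, the same quantity would be a sample average of w(s, a) * r(s, a) over state-action pairs drawn from d_D, which is what makes the ratio useful for off-policy evaluation.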
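The abstract states that the successor representation can be trained with deep reinforcement learning methods. The underlying reason is that the SR, like a value function, satisfies a Bellman equation, Psi = I + gamma * P_pi Psi, so it can be learned by bootstrapped updates. Because Psi does not involve the reward, any reward vector r then gives Q = Psi r. The sketch below is tabular, and its names are hypothetical. It applies the expected (full-model) form of that update repeatedly and checks the result against the closed-form inverse. A deep RL implementation would instead regress a network onto sampled targets of the form onehot(s, a) + gamma * psi(s', a'). This sketch does not attempt to reproduce that.

```python
import numpy as np

def policy_transition_matrix(P, pi):
    """P_pi[(s,a), (s',a')] = P(s'|s,a) * pi(a'|s'), flattened in row-major (s, a) order."""
    S, A, _ = P.shape
    return (P[:, :, :, None] * pi[None, None, :, :]).reshape(S * A, S * A)

def sr_by_bellman_iteration(P, pi, gamma, num_iters=500):
    """Repeatedly apply the SR Bellman operator Psi <- I + gamma * P_pi @ Psi.

    After k steps Psi equals sum_{t<k} (gamma * P_pi)^t, which converges to
    (I - gamma * P_pi)^-1 for gamma < 1.
    """
    P_pi = policy_transition_matrix(P, pi)
    n = P_pi.shape[0]
    psi = np.zeros((n, n))
    for _ in range(num_iters):
        psi = np.eye(n) + gamma * P_pi @ psi
    return psi

# Hypothetical random MDP for illustration.
rng = np.random.default_rng(1)
S, A, gamma = 3, 2, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)
pi = rng.random((S, A))
pi /= pi.sum(axis=1, keepdims=True)

psi_iter = sr_by_bellman_iteration(P, pi, gamma)
psi_exact = np.linalg.inv(np.eye(S * A) - gamma * policy_transition_matrix(P, pi))

# The SR is reward-independent: any reward vector r gives Q = Psi @ r.
r1, r2 = rng.random(S * A), rng.random(S * A)
q1, q2 = psi_iter @ r1, psi_iter @ r2
```

Reusing one learned Psi for several reward vectors is the decoupling of reward from dynamics that the abstract refers to.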