Deep reinforcement learning (DRL) promises that an agent can learn a good
policy from high-dimensional information, while representation learning
removes irrelevant and redundant information and retains what is pertinent.
In this work, we demonstrate that the learned representation of the Q-network
and its target Q-network should, in theory, satisfy a favorable
distinguishable representation property. Specifically, there exists an upper
bound on the representation similarity of the value functions of two adjacent
time steps in a typical DRL setting. However, through illustrative experiments,
we show that a learned DRL agent may violate this property, leading to a
sub-optimal policy. Therefore, we propose a simple yet effective regularizer
called Policy Evaluation with Easy Regularization on Representation (PEER),
which aims to maintain the distinguishable representation property via explicit
regularization on internal representations. We also provide a convergence-rate
guarantee for PEER. Implementing PEER requires only one line of code. Our
experiments demonstrate that incorporating PEER into DRL can significantly
improve performance and sample efficiency. Comprehensive experiments show that
PEER achieves state-of-the-art performance on all 4 environments on PyBullet, 9
out of 12 tasks on DMControl, and 19 out of 26 games on Atari. To the best of
our knowledge, PEER is the first work to study the inherent representation
property of the Q-network and its target. Our code is available at
https://sites.google.com/view/peer-cvpr2023/.
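For concreteness, below is a minimal PyTorch-style sketch of how a PEER-style regularizer could be attached to a standard TD critic update. It is an illustration based only on the description above, not the paper's exact implementation: it assumes the penalized quantity is the inner product between the Q-network's penultimate-layer representation and the target network's representation at the next time step, and names such as QNetwork.features, peer_critic_loss, and the coefficient beta are hypothetical.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class QNetwork(nn.Module):
        def __init__(self, obs_dim, act_dim, hidden=256):
            super().__init__()
            self.body = nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            self.head = nn.Linear(hidden, 1)

        def features(self, obs, act):
            # Internal (penultimate-layer) representation of a state-action pair.
            return self.body(torch.cat([obs, act], dim=-1))

        def forward(self, obs, act):
            return self.head(self.features(obs, act))

    def peer_critic_loss(q_net, target_q_net, batch, gamma=0.99, beta=5e-4):
        # batch holds tensors for one transition minibatch; shapes (B, dim) or (B, 1).
        obs, act, rew, next_obs, next_act, done = batch

        # Standard one-step TD loss computed against the target network.
        with torch.no_grad():
            target_q = rew + gamma * (1.0 - done) * target_q_net(next_obs, next_act)
        td_loss = F.mse_loss(q_net(obs, act), target_q)

        # PEER-style term (assumed form): penalize the inner product between the
        # Q-network's representation at the current step and the target network's
        # representation at the next step, keeping the two representations
        # distinguishable as the theory above suggests they should be.
        phi = q_net.features(obs, act)
        with torch.no_grad():
            phi_target = target_q_net.features(next_obs, next_act)
        peer_term = (phi * phi_target).sum(dim=-1).mean()

        return td_loss + beta * peer_term

In this sketch the only change relative to an ordinary critic update is the single line adding beta * peer_term to the TD loss, which is the sense in which such a regularizer can be implemented in one line of code.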