Deep reinforcement learning (DRL) promises that an agent can learn a good
policy from high-dimensional information, while representation learning
removes irrelevant and redundant information and retains what is pertinent.
In this work, we demonstrate that the learned representation of the Q-network
and its target Q-network should, in theory, satisfy a favorable
distinguishable representation property. Specifically, there exists an upper
bound on the representation similarity of the value functions of two adjacent
time steps in a typical DRL setting. However, through illustrative experiments,
we show that a learned DRL agent may violate this property, leading to a
sub-optimal policy. Therefore, we propose a simple yet effective regularizer
called Policy Evaluation with Easy Regularization on Representation (PEER),
which aims to maintain the distinguishable representation property via explicit
regularization on internal representations. We also provide a convergence-rate
guarantee for PEER. Implementing PEER requires only one line of code. Our
experiments demonstrate that incorporating PEER into DRL can significantly
improve performance and sample efficiency. Comprehensive experiments show that
PEER achieves state-of-the-art performance on all 4 environments on PyBullet, 9
out of 12 tasks on DMControl, and 19 out of 26 games on Atari. To the best of
our knowledge, PEER is the first work to study the inherent representation
property of the Q-network and its target. Our code is available at
https://sites.google.com/view/peer-cvpr2023/.
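For concreteness, below is a minimal PyTorch-style sketch of how a PEER-style regularizer could be attached to a standard TD critic update. It is an illustration based only on the description above, not the paper's exact implementation: it assumes the penalized quantity is the inner product between the Q-network's penultimate-layer representation and the target network's representation at the next time step, and names such as QNetwork.features, peer_critic_loss, and the coefficient beta are hypothetical.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class QNetwork(nn.Module):
        def __init__(self, obs_dim, act_dim, hidden=256):
            super().__init__()
            self.body = nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
            )
            self.head = nn.Linear(hidden, 1)

        def features(self, obs, act):
            # Internal (penultimate-layer) representation of a state-action pair.
            return self.body(torch.cat([obs, act], dim=-1))

        def forward(self, obs, act):
            return self.head(self.features(obs, act))

    def peer_critic_loss(q_net, target_q_net, batch, gamma=0.99, beta=5e-4):
        # batch holds tensors for one transition minibatch; shapes (B, dim) or (B, 1).
        obs, act, rew, next_obs, next_act, done = batch

        # Standard one-step TD loss computed against the target network.
        with torch.no_grad():
            target_q = rew + gamma * (1.0 - done) * target_q_net(next_obs, next_act)
        td_loss = F.mse_loss(q_net(obs, act), target_q)

        # PEER-style term (assumed form): penalize the inner product between the
        # Q-network's representation at the current step and the target network's
        # representation at the next step, keeping the two representations
        # distinguishable as the theory above suggests they should be.
        phi = q_net.features(obs, act)
        with torch.no_grad():
            phi_target = target_q_net.features(next_obs, next_act)
        peer_term = (phi * phi_target).sum(dim=-1).mean()

        return td_loss + beta * peer_term

In this sketch the only change relative to an ordinary critic update is the single line adding beta * peer_term to the TD loss, which is the sense in which such a regularizer can be implemented in one line of code.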