CrossNorm: Normalization for Off-Policy TD Reinforcement Learning

Authors: Amiranashvili Artemij, Argus Max, Bhatt Aditya, Brox Thomas
Publication date: 17/10/2019

Off-policy temporal difference (TD) methods are a powerful class of reinforcement learning (RL) algorithms. Intriguingly, deep off-policy TD algorithms are not commonly used in combination with feature normalization techniques, despite the positive effects of normalization in other domains. We show that naive application of existing normalization techniques is indeed not effective, but that well-designed normalization improves optimization stability and removes the need for target networks. In particular, we introduce a normalization based on a mixture of on- and off-policy transitions, which we call cross-normalization. It can be regarded as an extension of batch normalization that re-centers data for two different distributions, as present in off-policy learning. Applied to DDPG and TD3, cross-normalization improves over the state of the art across a range of MuJoCo benchmark tasks.
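
The idea of normalizing with statistics taken from a mixture of on- and off-policy transitions can be illustrated with a minimal NumPy sketch. This is one possible reading of the abstract, not the authors' implementation: the function name `cross_normalize`, the equal weighting of the two batches, the `eps` constant, and the synthetic feature arrays are all assumptions made for illustration.

```python
import numpy as np

def cross_normalize(off_policy_feats, on_policy_feats, eps=1e-5):
    """Normalize two feature batches with shared statistics (illustrative only).

    Assumption: both batches have shape (batch, features) and are weighted
    equally by simple concatenation; the paper may weight them differently.
    """
    # Pool both distributions so a single mean/variance describes the mixture.
    mixed = np.concatenate([off_policy_feats, on_policy_feats], axis=0)
    mean = mixed.mean(axis=0, keepdims=True)
    var = mixed.var(axis=0, keepdims=True)
    scale = np.sqrt(var + eps)
    # Apply the same re-centering and re-scaling to each batch.
    return (off_policy_feats - mean) / scale, (on_policy_feats - mean) / scale

# Synthetic stand-ins for critic features of replay-buffer (off-policy)
# transitions and of current-policy (on-policy) transitions.
rng = np.random.default_rng(0)
off = rng.normal(loc=2.0, scale=1.0, size=(256, 4))
on = rng.normal(loc=-1.0, scale=3.0, size=(256, 4))

off_n, on_n = cross_normalize(off, on)
mixed_n = np.concatenate([off_n, on_n], axis=0)
```

After this step the pooled features have approximately zero mean and unit variance per dimension, while each individual batch keeps its own offset relative to the shared center. Standard batch normalization applied to each batch separately would discard that offset.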

arXiv.org e-Print Archive