Offline reinforcement learning enables learning from a fixed dataset, without
further interaction with the environment. The lack of environmental
interaction makes policy training vulnerable to state-action pairs far
from the training dataset and prone to missing rewarding actions. To train
more effective agents, we propose a framework that supports learning a flexible
yet well-regularized, fully implicit policy. We further propose a simple
modification to classical policy-matching methods that enables regularization
with respect to the dual form of the Jensen--Shannon divergence and to
integral probability metrics. We theoretically establish the correctness of
the policy-matching approach, as well as the correctness and a favorable
finite-sample property of our modification. We provide an effective
instantiation of our framework through the GAN structure, together with
techniques that explicitly smooth the state-action mapping for robust
generalization beyond the static dataset.
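For concreteness, the dual forms referenced above admit standard expressions;
the sketch below uses our own illustrative notation ($\pi_b$ for the behavior
policy, $\pi$ for the learned policy, $D$ for a discriminator over
state-action pairs, and $\mathcal{F}$ for a critic function class) rather
than the exact objectives of our method. At a state $s$, the GAN-style dual
form of the Jensen--Shannon divergence reads
\[
D_{\mathrm{JS}}\bigl(\pi_b(\cdot \mid s) \,\|\, \pi(\cdot \mid s)\bigr)
= \frac{1}{2}\,\max_{D}\,\Bigl\{
\mathbb{E}_{a \sim \pi_b(\cdot \mid s)}\bigl[\log D(s,a)\bigr]
+ \mathbb{E}_{a \sim \pi(\cdot \mid s)}\bigl[\log\bigl(1 - D(s,a)\bigr)\bigr]
\Bigr\} + \log 2,
\]
while an integral probability metric induced by $\mathcal{F}$ takes the form
\[
d_{\mathcal{F}}\bigl(\pi_b(\cdot \mid s),\, \pi(\cdot \mid s)\bigr)
= \sup_{f \in \mathcal{F}}\,
\bigl|\,\mathbb{E}_{a \sim \pi_b(\cdot \mid s)}\bigl[f(s,a)\bigr]
- \mathbb{E}_{a \sim \pi(\cdot \mid s)}\bigl[f(s,a)\bigr]\,\bigr|.
\]
In a GAN instantiation, the discriminator $D$ (or critic $f$) maximizes the
inner objective while the implicit policy is trained to minimize it.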
Extensive experiments and ablation studies on the D4RL benchmark validate our
framework and the effectiveness of our algorithmic designs.