Dual RL: Unification and New Methods for Reinforcement and Imitation Learning

Niekum, Scott; Sikchi, Harshit; Zhang, Amy; Zheng, Qinqing

Publication date: 22 June 2023

Abstract

The goal of reinforcement learning (RL) is to maximize the expected cumulative return. It has been shown that this objective can be represented as an optimization problem over the state-action visitation distribution under linear constraints. The dual problem of this formulation, which we refer to as dual RL, is unconstrained and easier to optimize. We show that several state-of-the-art off-policy deep RL algorithms, in both online and offline settings and for both RL and imitation learning (IL), can be viewed as dual RL approaches in a unified framework. This unification provides common ground to study and identify the components that contribute to the success of these methods, and it reveals shortcomings shared across methods, offering new insights for improvement. Our analysis shows that prior off-policy imitation learning methods rely on an unrealistic coverage assumption and minimize a particular f-divergence between the visitation distributions of the learned policy and the expert policy. We propose a new method, based on a simple modification to the dual RL framework, that enables performant imitation learning from arbitrary off-policy data and obtains near-expert performance without learning a discriminator. Further, by framing a recent state-of-the-art offline RL method, XQL, in the dual RL framework, we propose alternatives to the Gumbel regression loss that achieve improved performance and resolve the training instability of XQL. Project code and details can be found at https://hari-sikchi.github.io/dual-rl.

Comment: 46 pages. Under review.
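The abstract's starting point is that expected return is a linear objective over the state-action visitation distribution d(s, a), subject to the Bellman flow constraints sum_a d(s, a) = (1 - γ) d0(s) + γ sum_{s', a'} p(s | s', a') d(s', a'). As an illustration only (a standard Lagrangian derivation, not the authors' code or their exact formulation), adding a regularizer α D_f(d ‖ d^O) against an off-policy distribution d^O and maximizing out d yields an unconstrained problem over a value function V: minimize (1 - γ) E_{d0}[V(s)] + α E_{d^O}[f*(δ_V(s, a) / α)], where δ_V(s, a) = r(s, a) + γ E_{s' ~ p(·|s,a)}[V(s')] - V(s) and f* is the convex conjugate of f. The sketch below assumes a small tabular MDP and the chi-square choice f(x) = (x - 1)^2 / 2, whose conjugate f*(y) = y + y^2 / 2 makes the dual quadratic and exactly solvable; for simplicity it drops the constraint d ≥ 0. The names solve_chi2_dual_rl and flow_residual are hypothetical.

```python
import numpy as np


def solve_chi2_dual_rl(P, r, d0, dO, gamma=0.9, alpha=1.0):
    """Exactly minimize a chi-square-regularized dual RL objective (tabular sketch).

    P:  (S*nA, S) transitions; row s*nA + a holds p(. | s, a)
    r:  (S*nA,) rewards r(s, a)
    d0: (S,) initial-state distribution
    dO: (S*nA,) strictly positive off-policy distribution d^O(s, a)
    Returns the dual variable V (S,) and the recovered visitation d (S*nA,).
    Assumes f(x) = (x - 1)^2 / 2 and ignores the constraint d >= 0.
    """
    SA, S = P.shape
    nA = SA // S
    # E copies V(s) onto every row (s, a), so E @ V lists V(s) per state-action pair.
    E = np.repeat(np.eye(S), nA, axis=0)
    M = gamma * P - E  # Bellman residual: delta_V = r + M @ V
    D = np.diag(dO)
    # The objective (1-gamma) d0.V + sum dO * (delta + delta^2 / (2 alpha)) is quadratic
    # in V; its gradient (1-gamma) d0 + M^T [dO * (1 + delta/alpha)] = 0 is linear in V.
    lhs = M.T @ D @ M
    rhs = -alpha * ((1 - gamma) * d0 + M.T @ dO) - M.T @ D @ r
    V = np.linalg.solve(lhs, rhs)
    delta = r + M @ V
    # f*'(y) = 1 + y gives the optimal ratio d / d^O at y = delta / alpha.
    d = dO * (1.0 + delta / alpha)
    return V, d


def flow_residual(P, d, d0, gamma):
    """sum_a d(s,a) - (1-gamma) d0(s) - gamma * sum_{s',a'} p(s|s',a') d(s',a')."""
    S = P.shape[1]
    nA = P.shape[0] // S
    state_visits = d.reshape(S, nA).sum(axis=1)
    inflow = (1 - gamma) * d0 + gamma * (P.T @ d)
    return state_visits - inflow


# Usage on a random 4-state, 2-action MDP.
rng = np.random.default_rng(0)
S, nA, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(S), size=S * nA)  # each row is a distribution p(. | s, a)
r = rng.uniform(size=S * nA)
d0 = np.full(S, 1.0 / S)
dO = rng.dirichlet(np.ones(S * nA))         # strictly positive off-policy distribution
V, d = solve_chi2_dual_rl(P, r, d0, dO, gamma=gamma, alpha=1.0)
print(np.abs(flow_residual(P, d, d0, gamma)).max())  # close to 0
print(d.sum())                                       # close to 1.0
```

Setting the gradient with respect to V(s) to zero reproduces exactly the flow constraint for d = d^O · f*'(δ_V / α), which is why the recovered d satisfies the constraints and sums to one (up to floating-point error) even though the problem in V has no constraints. Because d ≥ 0 is dropped here, individual entries of d can come out negative.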

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2302.08560

Last updated on 16/03/2023