Interpreting Distributional Reinforcement Learning: A Regularization Perspective

Abstract

Distributional reinforcement learning~(RL) is a class of state-of-the-art algorithms that estimate the whole distribution of the total return rather than only its expectation. Despite the remarkable performance of distributional RL, a theoretical understanding of its advantages over expectation-based RL remains elusive. In this paper, we attribute the superiority of distributional RL to its regularization effect stemming from the value distribution information beyond the expectation. Firstly, by leveraging a variant of the gross error model in robust statistics, we decompose the value distribution into its expectation and the remaining distribution part. As such, the extra benefit of distributional RL compared with expectation-based RL is mainly interpreted as the impact of a \textit{risk-sensitive entropy regularization} within the Neural Fitted Z-Iteration framework. Meanwhile, we establish a bridge between the risk-sensitive entropy regularization of distributional RL and the vanilla entropy regularization in maximum entropy RL, focusing specifically on actor-critic algorithms. This connection reveals that distributional RL induces a corrected reward function and thus promotes risk-sensitive exploration against the intrinsic uncertainty of the environment. Finally, extensive experiments corroborate the role of the regularization effect of distributional RL and uncover the mutual impacts of these different entropy regularizations. Our research paves the way towards better interpreting the efficacy of distributional RL algorithms, especially through the lens of regularization.
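
As a hedged illustration of the decomposition described above (using the standard contamination form of the gross error model; the notation $Z^{\pi}$, $\mu^{\pi}$, and $\epsilon$ is illustrative, not necessarily the paper's), the value distribution can be written as a mixture of a point mass at its expectation and a remainder:
\[
F_{Z^{\pi}(s,a)}(x) \;=\; (1-\epsilon)\,\mathbf{1}\!\left\{x \ge \mathbb{E}\!\left[Z^{\pi}(s,a)\right]\right\} \;+\; \epsilon\, F_{\mu^{\pi}(s,a)}(x), \qquad \epsilon \in [0,1],
\]
where the indicator term is the cumulative distribution function of the Dirac mass at the expectation targeted by expectation-based RL, $F_{\mu^{\pi}(s,a)}$ carries the remaining distribution information, and $\epsilon$ weights how strongly that remainder acts as the regularization term; setting $\epsilon = 0$ recovers purely expectation-based RL.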
