Distributional reinforcement learning~(RL) is a class of state-of-the-art
algorithms that estimate the whole distribution of the total return rather than
only its expectation. Despite the remarkable performance of distributional RL,
a theoretical understanding of its advantages over expectation-based RL remains
elusive. In this paper, we attribute the superiority of distributional RL to
a regularization effect arising from the value distribution information
beyond its expectation. First, by leveraging a variant of the gross
error model from robust statistics, we decompose the value distribution into its
expectation and the remaining distribution part. As such, the extra benefit of
distributional RL compared with expectation-based RL is mainly interpreted as
the impact of a \textit{risk-sensitive entropy regularization} within the
Neural Fitted Z-Iteration framework. Meanwhile, we establish a bridge between
the risk-sensitive entropy regularization of distributional RL and the vanilla
entropy regularization in maximum entropy RL, focusing specifically on actor-critic
algorithms. This connection reveals that distributional RL induces a corrected reward
function and thus promotes risk-sensitive exploration against the intrinsic
uncertainty of the environment. Finally, extensive experiments corroborate the
role of the regularization effect of distributional RL and uncover the mutual
impacts of different entropy regularizations. Our research paves the way towards
a better interpretation of the efficacy of distributional RL algorithms, especially
through the lens of regularization.