Since their introduction a year ago, distributional approaches to
reinforcement learning (distributional RL) have produced strong results
relative to the standard approach, which models expected values (expected RL).
However, aside from convergence guarantees, there have been few theoretical
results investigating the reasons behind the improvements distributional RL
provides. In this paper we begin the investigation into this fundamental
question by analyzing the differences between the two approaches in the tabular, linear approximation, and
non-linear approximation settings. We prove that in many realizations of the
tabular and linear approximation settings, distributional RL behaves exactly
the same as expected RL. In the cases where the two methods do behave
differently, distributional RL can in fact hurt performance. We then
continue with an empirical analysis comparing
distributional and expected RL methods in control settings with non-linear
approximators to tease apart where the improvements from distributional RL
methods are coming from.

Comment: To appear in the Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence.
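To make the flavor of the tabular equivalence claim concrete, below is a minimal numerical sketch (our own illustration, not code from the paper; the paper's actual results concern specific distributional update rules and metrics). It represents a state's return distribution as atoms with probabilities, applies a mixture-style distributional TD update, and checks that the mean of the resulting distribution coincides with the classical expected-RL TD update on the value. All numbers (atoms, probabilities, reward, discount, step size) are arbitrary illustrative choices.

```python
import numpy as np

# Current return distribution at some state: atoms z_i with probabilities p_i.
atoms = np.array([0.0, 1.0, 2.0])
probs = np.array([0.2, 0.5, 0.3])

r, gamma, alpha = 1.0, 0.9, 0.1  # reward, discount, step size (illustrative)

# Distributional Bellman target: shift and scale the atoms, keep probabilities.
target_atoms = r + gamma * atoms

# Mixture-style distributional TD update: Z <- (1 - alpha) Z + alpha (r + gamma Z).
mixed_atoms = np.concatenate([atoms, target_atoms])
mixed_probs = np.concatenate([(1 - alpha) * probs, alpha * probs])

# Classical expected-RL TD update on the mean value.
v = probs @ atoms
v_td = v + alpha * (r + gamma * v - v)

# By linearity of expectation, the mixture update has the same mean.
v_dist = mixed_probs @ mixed_atoms
assert np.isclose(v_dist, v_td)
print(f"expected-RL update: {v_td:.4f}, distributional update mean: {v_dist:.4f}")
```

In this simplified setting the agreement follows directly from linearity of expectation; the paper's theoretical analysis characterizes when such coincidences do and do not hold for actual distributional RL algorithms.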