In real scenarios, the state observations that an agent receives may contain
measurement errors or adversarial noise, misleading the agent into taking
suboptimal actions or even causing training to collapse. In this paper, we study the
training robustness of distributional Reinforcement Learning~(RL), a class of
state-of-the-art methods that estimate the whole distribution of the total
return rather than only its expectation. First, we validate the contraction
of distributional Bellman operators in the State-Noisy Markov Decision
Process~(SN-MDP), a typical tabular setting that incorporates both random and
adversarial state observation noise. In the noisy setting with either linear
or nonlinear function approximation, we then analyze the vulnerability of the
least squares loss in expectation-based RL. By
contrast, we theoretically characterize the bounded gradient norm of the
distributional RL loss based on the categorical parameterization equipped with
the Kullback-Leibler~(KL) divergence. The resulting stable gradients during
optimization account for the better training robustness of distributional RL
against state observation noise. Finally, extensive experiments on a suite
of environments verify that distributional RL is less vulnerable to both
random and adversarial noisy state observations than its expectation-based
counterpart.
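
As a minimal sketch of the bounded-gradient claim (the notation below is
introduced for illustration and is not fixed by the abstract): with a
categorical parameterization $p_{\theta}=\mathrm{softmax}(f_{\theta})$ over
$K$ fixed atoms and a projected target distribution $q$, minimizing the KL
divergence reduces to the cross-entropy loss
\[
\mathcal{L}(\theta) = -\sum_{k=1}^{K} q_k \log p_{\theta,k},
\qquad
\frac{\partial \mathcal{L}}{\partial f_{\theta,k}} = p_{\theta,k} - q_k \in [-1,1],
\]
so the gradient with respect to each logit is bounded, whereas the gradient of
the least squares loss in expectation-based RL scales with the unbounded
temporal-difference error.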