Recent years saw a plethora of work on explaining complex intelligent agents.
One example is the development of several algorithms that generate saliency
maps that show how much each pixel contributed to the agents' decisions.
However, most evaluations of such saliency maps focus on image classification
tasks. To the best of our knowledge, no prior work thoroughly compares different
saliency maps for Deep Reinforcement Learning agents. This paper compares four
perturbation-based approaches to create saliency maps for Deep Reinforcement
Learning agents trained on four different Atari 2600 games. All four approaches
work by perturbing parts of the input and measuring how much this affects the
agent's output. The approaches are compared using three computational metrics:
dependence on the learned parameters of the agent (sanity checks), faithfulness
to the agent's reasoning (input degradation), and run-time.

Comment: Presented at the Explainable Agency in Artificial Intelligence
Workshop during the 35th AAAI Conference on Artificial Intelligence
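
To make the shared idea behind the perturbation-based approaches concrete, the following is a minimal sketch of one generic variant (patch occlusion), not any of the four specific approaches compared in the paper. The names occlusion_saliency and toy_policy, the patch size, and the fill value are all hypothetical choices made for illustration, and toy_policy is only a stand-in for a trained agent.

```python
import math


def occlusion_saliency(policy, frame, patch=4, fill=0.0):
    """Score each patch by how much occluding it changes the policy output.

    `policy` is assumed to map a 2D frame (list of rows) to a list of numbers,
    e.g. action values. This is an illustrative assumption, not a fixed API.
    """
    baseline = policy(frame)
    height, width = len(frame), len(frame[0])
    saliency = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            # Perturb a copy of the input by overwriting one patch.
            perturbed = [row[:] for row in frame]
            for r in rows:
                for c in cols:
                    perturbed[r][c] = fill
            # Measure how far the output moved (Euclidean distance).
            output = policy(perturbed)
            change = math.sqrt(
                sum((o - b) ** 2 for o, b in zip(output, baseline))
            )
            for r in rows:
                for c in cols:
                    saliency[r][c] = change
    return saliency


def toy_policy(frame):
    # Hypothetical stand-in for an agent: its two outputs depend only on
    # the top-left 4x4 region of the frame.
    total = sum(sum(row[:4]) for row in frame[:4])
    return [total, -total]


frame = [[1.0] * 8 for _ in range(8)]
saliency_map = occlusion_saliency(toy_policy, frame)
```

In this toy setting only the top-left patch receives a non-zero score, because it is the only region the stand-in policy reads. A real agent would replace toy_policy, and the choice of perturbation (blur, noise, occlusion) and of the output distance is exactly where such approaches tend to differ.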