Despite the impressive capabilities of Deep Reinforcement Learning (DRL)
agents in many challenging scenarios, their black-box decision-making process
significantly limits their deployment in safety-sensitive domains. Several
previous self-interpretable approaches focus on revealing the states that are
critical to the agent's decisions. However, they cannot pinpoint the
error-prone states. To
address this issue, we propose a novel self-interpretable structure, named
Backbone Extract Tree (BET), to better explain the agent's behavior by
identifying the error-prone states. At a high level, BET hypothesizes that states in which
the agent consistently executes uniform decisions exhibit a reduced propensity
for errors. To model this phenomenon, BET groups such states into
neighborhoods, each defined by a curated set of representative states. States
that lie farther from these representatives are therefore considered more
prone to error. We evaluate BET in various popular RL
environments and show its superiority over existing self-interpretable models
in terms of explanation fidelity. Furthermore, we demonstrate a use case for
providing explanations for the agents in StarCraft II, a sophisticated
multi-agent cooperative game. To the best of our knowledge, we are the first to
explain such a complex scenario using a fully transparent structure.

Comment: This is an early version of a paper submitted to IJCAI 2024; 8 pages, 4 figures, and 1 table.