As we deploy autonomous agents in safety-critical domains, it becomes
important to develop an understanding of their internal mechanisms and
representations. We outline an approach to imitation learning for
reverse-engineering black box agent policies in MDP environments, yielding
simplified, interpretable models in the form of decision trees. As part of this
process, we explicitly model and learn agents' latent state representations by
selecting from a large space of candidate features constructed from the Markov
state. We present promising initial results from an implementation in a
multi-agent traffic environment.

Comment: 6 pages, 3 figures; under review for the 1st TAILOR Workshop, due to
take place 29-30 August 2020 in Santiago de Compostela