In recent years, reinforcement learning (RL) has become increasingly
successful in its application to science and the process of scientific
discovery in general. However, while RL algorithms learn to solve increasingly
complex problems, interpreting the solutions they provide becomes ever more
challenging. In this work, we gain insights into an RL agent's learned behavior
through a post-hoc analysis based on sequence mining and clustering.
Specifically, frequent and compact subroutines used by the agent to solve a
given task are distilled as gadgets and then grouped according to various
metrics. This process of gadget discovery proceeds in three stages: first, we
use an RL agent to generate data; then, we employ a mining algorithm to
extract gadgets; and finally, the obtained gadgets are grouped by a
density-based clustering algorithm. We demonstrate our method by applying it
to two quantum-inspired RL
environments. First, we consider simulated quantum optics experiments for the
design of high-dimensional multipartite entangled states where the algorithm
finds gadgets that correspond to modern interferometer setups. Second, we
consider a circuit-based quantum computing environment where the algorithm
discovers various gadgets for quantum information processing, such as quantum
teleportation. This approach to analyzing the policy of a trained agent is
agent- and environment-agnostic and can yield interesting insights into the
learned behavior of any agent.
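
The three-stage pipeline described above (agent-generated sequences, frequent-pattern mining, density-based clustering) can be sketched in miniature as follows. This is an illustrative toy, not the paper's actual implementation: the episode strings stand in for agent action sequences, the support threshold, the (length, support) feature vectors, and all DBSCAN parameters are assumptions chosen for the example.

```python
from collections import Counter

# Stage 1 (assumed data): toy action sequences ("episodes") from a
# hypothetical trained agent; the real environments are far richer.
episodes = ["ABCDAB", "XABCDY", "ABCD", "ZABZAB", "CDABCD"]

def mine_gadgets(episodes, min_len=2, min_support=3):
    """Stage 2: count every contiguous subsequence once per episode
    (i.e., measure support) and keep the frequent ones as gadgets."""
    counts = Counter()
    for ep in episodes:
        seen = set()
        for i in range(len(ep)):
            for j in range(i + min_len, len(ep) + 1):
                seen.add(ep[i:j])
        counts.update(seen)
    return {g: c for g, c in counts.items() if c >= min_support}

def dbscan(points, eps=1.5, min_pts=2):
    """Stage 3: a minimal DBSCAN; returns one cluster label per point,
    with -1 marking noise."""
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    def neighbors(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise (may later be relabeled as a border point)
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise point becomes a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbors(j)
            if len(jn) >= min_pts:  # core point: expand the cluster
                queue.extend(jn)
    return labels

gadgets = mine_gadgets(episodes)
# Cluster gadgets by a simple (length, support) feature vector; in practice
# richer similarity metrics between gadgets would be used.
points = [(len(g), c) for g, c in gadgets.items()]
labels = dbscan(points)
```

On this toy data the miner keeps the six gadgets AB, BC, CD, ABC, BCD, and ABCD, which all fall into a single cluster under the chosen features; swapping in a task-specific distance between gadgets is where domain knowledge would enter.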