23 research outputs found
Explainability in Deep Reinforcement Learning
A large set of the explainable Artificial Intelligence (XAI) literature is
emerging on feature relevance techniques to explain a deep neural network (DNN)
output or explaining models that ingest image source data. However, assessing
how XAI techniques can help understand models beyond classification tasks, e.g.
for reinforcement learning (RL), has not been extensively studied. We review
recent works in the direction to attain Explainable Reinforcement Learning
(XRL), a relatively new subfield of Explainable Artificial Intelligence,
intended to be used in general public applications, with diverse audiences,
requiring ethical, responsible and trustable algorithms. In critical situations
where it is essential to justify and explain the agent's behaviour, better
explainability and interpretability of RL models could help gain scientific
insight on the inner workings of what is still considered a black box. We
evaluate mainly studies directly linking explainability to RL, and split these
into two categories according to the way the explanations are generated:
transparent algorithms and post-hoc explainaility. We also review the most
prominent XAI works from the lenses of how they could potentially enlighten the
further deployment of the latest advances in RL, in the demanding present and
future of everyday problems.Comment: Article accepted at Knowledge-Based System
IxDRL: A Novel Explainable Deep Reinforcement Learning Toolkit based on Analyses of Interestingness
In recent years, advances in deep learning have resulted in a plethora of
successes in the use of reinforcement learning (RL) to solve complex sequential
decision tasks with high-dimensional inputs. However, existing systems lack the
necessary mechanisms to provide humans with a holistic view of their
competence, presenting an impediment to their adoption, particularly in
critical applications where the decisions an agent makes can have significant
consequences. Yet, existing RL-based systems are essentially competency-unaware
in that they lack the necessary interpretation mechanisms to allow human
operators to have an insightful, holistic view of their competency. Towards
more explainable Deep RL (xDRL), we propose a new framework based on analyses
of interestingness. Our tool provides various measures of RL agent competence
stemming from interestingness analysis and is applicable to a wide range of RL
algorithms, natively supporting the popular RLLib toolkit. We showcase the use
of our framework by applying the proposed pipeline in a set of scenarios of
varying complexity. We empirically assess the capability of the approach in
identifying agent behavior patterns and competency-controlling conditions, and
the task elements mostly responsible for an agent's competence, based on global
and local analyses of interestingness. Overall, we show that our framework can
provide agent designers with insights about RL agent competence, both their
capabilities and limitations, enabling more informed decisions about
interventions, additional training, and other interactions in collaborative
human-machine settings.Comment: To be published in the Proceedings of the 1st World Conference on
eXplainable Artificial Intelligence (xAI 2023). arXiv admin note: substantial
text overlap with arXiv:2211.0637
Explainability in Deep Reinforcement Learning
International audienceA large set of the explainable Artificial Intelligence (XAI) literature is emerging on feature relevance techniques to explain a deep neural network (DNN) output or explaining models that ingest image source data. However, assessing how XAI techniques can help understand models beyond classification tasks, e.g. for reinforcement learning (RL), has not been extensively studied. We review recent works in the direction to attain Explainable Reinforcement Learning (XRL), a relatively new subfield of Explainable Artificial Intelligence, intended to be used in general public applications, with diverse audiences, requiring ethical, responsible and trustable algorithms. In critical situations where it is essential to justify and explain the agent's behaviour, better explainability and interpretability of RL models could help gain scientific insight on the inner workings of what is still considered a black box. We evaluate mainly studies directly linking explainability to RL, and split these into two categories according to the way the explanations are generated: transparent algorithms and post-hoc explainaility. We also review the most prominent XAI works from the lenses of how they could potentially enlighten the further deployment of the latest advances in RL, in the demanding present and future of everyday problems
Outcome-Guided Counterfactuals for Reinforcement Learning Agents from a Jointly Trained Generative Latent Space
We present a novel generative method for producing unseen and plausible
counterfactual examples for reinforcement learning (RL) agents based upon
outcome variables that characterize agent behavior. Our approach uses a
variational autoencoder to train a latent space that jointly encodes
information about the observations and outcome variables pertaining to an
agent's behavior. Counterfactuals are generated using traversals in this latent
space, via gradient-driven updates as well as latent interpolations against
cases drawn from a pool of examples. These include updates to raise the
likelihood of generated examples, which improves the plausibility of generated
counterfactuals. From experiments in three RL environments, we show that these
methods produce counterfactuals that are more plausible and proximal to their
queries compared to purely outcome-driven or case-based baselines. Finally, we
show that a latent jointly trained to reconstruct both the input observations
and behavioral outcome variables produces higher-quality counterfactuals over
latents trained solely to reconstruct the observation inputs
Reinforcement Learning Your Way : Agent Characterization through Policy Regularization
The increased complexity of state-of-the-art reinforcement learning (RL) algorithms has resulted in an opacity that inhibits explainability and understanding. This has led to the development of several post hoc explainability methods that aim to extract information from learned policies, thus aiding explainability. These methods rely on empirical observations of the policy, and thus aim to generalize a characterization of agents’ behaviour. In this study, we have instead developed a method to imbue agents’ policies with a characteristic behaviour through regularization of their objective functions. Our method guides the agents’ behaviour during learning, which results in an intrinsic characterization; it connects the learning process with model explanation. We provide a formal argument and empirical evidence for the viability of our method. In future work, we intend to employ it to develop agents that optimize individual financial customers’ investment portfolios based on their spending personalities.publishedVersio
Notions of explainability and evaluation approaches for explainable artificial intelligence
Explainable Artificial Intelligence (XAI) has experienced a significant growth over the last few years. This is due to the widespread application of machine learning, particularly deep learning, that has led to the development of highly accurate models that lack explainability and interpretability. A plethora of methods to tackle this problem have been proposed, developed and tested, coupled with several studies attempting to define the concept of explainability and its evaluation. This systematic review contributes to the body of knowledge by clustering all the scientific studies via a hierarchical system that classifies theories and notions related to the concept of explainability and the evaluation approaches for XAI methods. The structure of this hierarchy builds on top of an exhaustive analysis of existing taxonomies and peer-reviewed scientific material. Findings suggest that scholars have identified numerous notions and requirements that an explanation should meet in order to be easily understandable by end-users and to provide actionable information that can inform decision making. They have also suggested various approaches to assess to what degree machine-generated explanations meet these demands. Overall, these approaches can be clustered into human-centred evaluations and evaluations with more objective metrics. However, despite the vast body of knowledge developed around the concept of explainability, there is not a general consensus among scholars on how an explanation should be defined, and how its validity and reliability assessed. Eventually, this review concludes by critically discussing these gaps and limitations, and it defines future research directions with explainability as the starting component of any artificial intelligent system
Local and Global Explanations of Agent Behavior: Integrating Strategy Summaries with Saliency Maps
With advances in reinforcement learning (RL), agents are now being developed
in high-stakes application domains such as healthcare and transportation.
Explaining the behavior of these agents is challenging, as the environments in
which they act have large state spaces, and their decision-making can be
affected by delayed rewards, making it difficult to analyze their behavior. To
address this problem, several approaches have been developed. Some approaches
attempt to convey the behavior of the agent, describing the
actions it takes in different states. Other approaches devised
explanations which provide information regarding the agent's decision-making in
a particular state. In this paper, we combine global and local explanation
methods, and evaluate their joint and separate contributions, providing (to the
best of our knowledge) the first user study of combined local and global
explanations for RL agents. Specifically, we augment strategy summaries that
extract important trajectories of states from simulations of the agent with
saliency maps which show what information the agent attends to. Our results
show that the choice of what states to include in the summary (global
information) strongly affects people's understanding of agents: participants
shown summaries that included important states significantly outperformed
participants who were presented with agent behavior in a randomly set of chosen
world-states. We find mixed results with respect to augmenting demonstrations
with saliency maps (local information), as the addition of saliency maps did
not significantly improve performance in most cases. However, we do find some
evidence that saliency maps can help users better understand what information
the agent relies on in its decision making, suggesting avenues for future work
that can further improve explanations of RL agents
Affinity-Based Reinforcement Learning : A New Paradigm for Agent Interpretability
The steady increase in complexity of reinforcement learning (RL) algorithms is accompanied by a corresponding increase in opacity that obfuscates insights into their devised strategies. Methods in explainable artificial intelligence seek to mitigate this opacity by either creating transparent algorithms or extracting explanations post hoc. A third category exists that allows the developer to affect what agents learn: constrained RL has been used in safety-critical applications and prohibits agents from visiting certain states; preference-based RL agents have been used in robotics applications and learn state-action preferences instead of traditional reward functions. We propose a new affinity-based RL paradigm in which agents learn strategies that are partially decoupled from reward functions. Unlike entropy regularisation, we regularise the objective function with a distinct action distribution that represents a desired behaviour; we encourage the agent to act according to a prior while learning to maximise rewards. The result is an inherently interpretable agent that solves problems with an intrinsic affinity for certain actions. We demonstrate the utility of our method in a financial application: we learn continuous time-variant compositions of prototypical policies, each interpretable by its action affinities, that are globally interpretable according to customers’ financial personalities.
Our method combines advantages from both constrained RL and preferencebased RL: it retains the reward function but generalises the policy to match a defined behaviour, thus avoiding problems such as reward shaping and hacking. Unlike Boolean task composition, our method is a fuzzy superposition of different prototypical strategies to arrive at a more complex, yet interpretable, strategy.publishedVersio