Local and Global Explanations of Agent Behavior: Integrating Strategy Summaries with Saliency Maps
With advances in reinforcement learning (RL), agents are now being developed
in high-stakes application domains such as healthcare and transportation.
Explaining the behavior of these agents is challenging, as the environments in
which they act have large state spaces, and their decision-making can be
affected by delayed rewards, making it difficult to analyze their behavior. To
address this problem, several approaches have been developed. Some approaches
attempt to convey the global behavior of the agent, describing the actions it
takes in different states. Other approaches devised local explanations which
provide information regarding the agent's decision-making in
a particular state. In this paper, we combine global and local explanation
methods, and evaluate their joint and separate contributions, providing (to the
best of our knowledge) the first user study of combined local and global
explanations for RL agents. Specifically, we augment strategy summaries that
extract important trajectories of states from simulations of the agent with
saliency maps which show what information the agent attends to. Our results
show that the choice of what states to include in the summary (global
information) strongly affects people's understanding of agents: participants
shown summaries that included important states significantly outperformed
participants who were presented with agent behavior in a randomly chosen set of
world-states. We find mixed results with respect to augmenting demonstrations
with saliency maps (local information), as the addition of saliency maps did
not significantly improve performance in most cases. However, we do find some
evidence that saliency maps can help users better understand what information
the agent relies on in its decision making, suggesting avenues for future work
that can further improve explanations of RL agents.
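(Illustrative sketch, not the authors' code: one common way to realize such strategy summaries is to score each visited state by the spread of the agent's Q-values and keep short windows around the top-scoring states. The `q_function` interface and the (state, action) trajectory format below are assumptions.)

```python
import numpy as np

def state_importance(q_values: np.ndarray) -> float:
    """Score a state by the spread between its best and worst action
    values: a large spread means the choice of action matters there."""
    return float(q_values.max() - q_values.min())

def build_summary(trajectory, q_function, budget=5, context=2):
    """Return short windows around the `budget` most important states
    in a trajectory of (state, action) pairs."""
    scores = [state_importance(q_function(s)) for s, _ in trajectory]
    top = sorted(np.argsort(scores)[-budget:])
    windows = []
    for i in top:
        lo, hi = max(0, i - context), min(len(trajectory), i + context + 1)
        windows.append(trajectory[lo:hi])
    return windows
```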
Explainability in Deep Reinforcement Learning
A large portion of the explainable Artificial Intelligence (XAI) literature is
emerging on feature relevance techniques that explain a deep neural network
(DNN) output or on explaining models that ingest image source data. However, assessing
how XAI techniques can help understand models beyond classification tasks, e.g.
for reinforcement learning (RL), has not been extensively studied. We review
recent works aiming to attain Explainable Reinforcement Learning
(XRL), a relatively new subfield of Explainable Artificial Intelligence,
intended to be used in general public applications, with diverse audiences,
requiring ethical, responsible and trustable algorithms. In critical situations
where it is essential to justify and explain the agent's behaviour, better
explainability and interpretability of RL models could help gain scientific
insight on the inner workings of what is still considered a black box. We
evaluate mainly studies directly linking explainability to RL, and split these
into two categories according to the way the explanations are generated:
transparent algorithms and post-hoc explainability. We also review the most
prominent XAI works through the lens of how they could potentially inform the
further deployment of the latest advances in RL, in the demanding present and
future of everyday problems.
Testing reinforcement learning explainability methods in a multi-agent cooperative environment
The adoption of algorithms based on Artificial Intelligence (AI) has been increasing rapidly in recent years. However, some aspects of AI techniques are under heavy scrutiny. For instance, in many cases it is not clear whether the decisions of an algorithm are well-informed and reliable. Having an answer to these concerns is crucial in many domains, such as those in which humans and intelligent agents must cooperate in a shared environment. In this paper, we introduce an application of an explainability method based on the creation of a Policy Graph (PG) built from discrete predicates that represent and explain a trained agent's behaviour in a multi-agent cooperative environment. We also present a method to measure the similarity between the explanations obtained and the agent's behaviour, by building an agent with a policy based on the PG and comparing the behaviour of the two agents. This work has been partially supported by the H2020 knowlEdge European project (Grant agreement ID: 957331).
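(Illustrative sketch, not the paper's implementation: a Policy Graph of this kind can be approximated by counting, per discrete predicate-state, the actions a trained agent takes, then measuring how often a surrogate acting greedily on those counts agrees with the original agent. The `discretize` predicate mapping is an assumption.)

```python
from collections import Counter, defaultdict

def build_policy_graph(episodes, discretize):
    """Count, for each discrete predicate-state, how often the trained
    agent took each action. `discretize` maps a raw observation to a
    tuple of predicates (assumed interface)."""
    graph = defaultdict(Counter)
    for episode in episodes:
        for obs, action in episode:
            graph[discretize(obs)][action] += 1
    return graph

def pg_policy(graph, obs, discretize, default_action=0):
    """Surrogate agent: take the most frequent action recorded for the
    current predicate-state; fall back when the state was never seen."""
    counts = graph.get(discretize(obs))
    return counts.most_common(1)[0][0] if counts else default_action

def agreement(episodes, graph, discretize):
    """Fidelity proxy: fraction of steps on which the PG-based surrogate
    picks the same action the original agent did."""
    steps = [(obs, a) for ep in episodes for obs, a in ep]
    same = sum(pg_policy(graph, obs, discretize) == a for obs, a in steps)
    return same / len(steps)
```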
Reinforcement Learning Your Way: Agent Characterization through Policy Regularization
The increased complexity of state-of-the-art reinforcement learning (RL) algorithms has resulted in an opacity that inhibits explainability and understanding. This has led to the development of several post hoc explainability methods that aim to extract information from learned policies, thus aiding explainability. These methods rely on empirical observations of the policy, and thus aim to generalize a characterization of agents' behaviour. In this study, we have instead developed a method to imbue agents' policies with a characteristic behaviour through regularization of their objective functions. Our method guides the agents' behaviour during learning, which results in an intrinsic characterization; it connects the learning process with model explanation. We provide a formal argument and empirical evidence for the viability of our method. In future work, we intend to employ it to develop agents that optimize individual financial customers' investment portfolios based on their spending personalities.
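(Illustrative sketch, not the paper's formulation: one way to imbue a policy with a characteristic behaviour during learning is to add a weighted KL penalty toward a behaviour prior on top of a standard policy-gradient loss. Tensor names and the penalty form below are assumptions.)

```python
import torch

def regularized_policy_loss(log_probs, advantages, action_probs,
                            prior_probs, lam=0.1):
    """Policy-gradient loss plus a KL penalty pulling the learned policy
    toward a 'characteristic' behaviour prior (all torch tensors;
    `action_probs` and `prior_probs` are per-action distributions)."""
    pg_loss = -(log_probs * advantages).mean()  # standard REINFORCE term
    kl = (action_probs * (action_probs / prior_probs).log()).sum(-1).mean()
    return pg_loss + lam * kl  # lam trades expected reward for character
```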
Experiential Explanations for Reinforcement Learning
Reinforcement Learning (RL) approaches are becoming increasingly popular in
various key disciplines, including robotics and healthcare. However, many of
these systems are complex and non-interpretable, making it challenging for
non-AI experts to understand or intervene. One of the challenges of explaining
RL agent behavior is that, when learning to predict future expected reward,
agents discard contextual information about their experiences when training in
an environment and rely solely on expected utility. We propose a technique,
Experiential Explanations, for generating local counterfactual explanations
that can answer users' why-not questions by explaining qualitatively the
effects of the various environmental rewards on the agent's behavior. We
achieve this by training additional modules alongside the policy. These models,
called influence predictors, model how different reward sources influence the
agent's policy, thus restoring lost contextual information about how the policy
reflects the environment. To generate explanations, we use these models in
addition to the policy to contrast between the agent's intended behavior
trajectory and a counterfactual trajectory suggested by the user.
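(Illustrative sketch, not the authors' architecture: influence predictors of this kind could be small per-reward-source value heads trained alongside the policy, each regressed on the return attributable to its own reward source. Module names and shapes below are assumptions.)

```python
import torch.nn as nn

class InfluencePredictor(nn.Module):
    """One auxiliary head per reward source, trained alongside the policy
    to estimate that source's contribution to expected return."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, state):
        return self.net(state)

def influence_loss(predictors, states, per_source_returns):
    """Regress each predictor on the discounted return attributable to
    its own reward source, retaining contextual information a single
    scalar value function would discard."""
    return sum(
        nn.functional.mse_loss(p(states).squeeze(-1), per_source_returns[k])
        for k, p in predictors.items())
```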
Inherently Explainable Reinforcement Learning in Natural Language
We focus on the task of creating a reinforcement learning agent that is
inherently explainable -- with the ability to produce immediate local
explanations by thinking out loud while performing a task and analyzing entire
trajectories post-hoc to produce causal explanations. This Hierarchically
Explainable Reinforcement Learning agent (HEX-RL) operates in Interactive
Fictions, text-based game environments in which an agent perceives and acts
upon the world using textual natural language. These games are usually
structured as puzzles or quests with long-term dependencies in which an agent
must complete a sequence of actions to succeed -- providing ideal environments
in which to test an agent's ability to explain its actions. Our agent is
designed to treat explainability as a first-class citizen, using an extracted
symbolic knowledge graph-based state representation coupled with a Hierarchical
Graph Attention mechanism that points to the facts in the internal graph
representation that most influenced the choice of actions. Experiments show
that this agent provides significantly improved explanations over strong
baselines, as rated by human participants generally unfamiliar with the
environment, while also matching state-of-the-art task performance.
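(Illustrative sketch, not HEX-RL's hierarchical mechanism: at its simplest, attention over knowledge-graph fact embeddings yields weights that both summarize the state for the policy and point to the facts that most influenced the action.)

```python
import torch
import torch.nn as nn

class FactAttention(nn.Module):
    """Attend over knowledge-graph fact embeddings; the attention weights
    double as a pointer to the facts that most influenced the action."""
    def __init__(self, dim):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))

    def forward(self, fact_embeddings):        # (num_facts, dim)
        scores = fact_embeddings @ self.query  # one score per fact
        weights = torch.softmax(scores, dim=0) # explanation weights
        summary = weights @ fact_embeddings    # state summary for the policy
        return summary, weights                # top weights name the facts
```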
Explaining classifiers' outputs with causal models and argumentation
We introduce a conceptualisation for generating argumentation frameworks (AFs) from causal models for the purpose of forging explanations for models' outputs. The conceptualisation is based on reinterpreting properties of semantics of AFs as explanation moulds, which are means for characterising argumentative relations. We demonstrate our methodology by reinterpreting the property of bi-variate reinforcement in bipolar AFs, showing how the extracted bipolar AFs may be used as relation-based explanations for the outputs of causal models. We then evaluate our method empirically when the causal models represent (Bayesian and neural network) machine learning models for classification. The results show advantages over a popular approach from the literature, both in highlighting specific relationships between feature and classification variables and in generating counterfactual explanations with respect to a commonly used metric.
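(Illustrative sketch, a drastic simplification of the paper's causal-model construction: for a linear model, positive feature weights can be read as supports of the output argument and negative weights as attacks, yielding a toy bipolar AF. All names below are hypothetical.)

```python
def bipolar_af_from_linear(weights, feature_names, target="output", eps=0.0):
    """Toy extraction of a bipolar argumentation framework from a linear
    model: positive weights become supports of the output argument,
    negative weights become attacks."""
    supports = [(f, target) for f, w in zip(feature_names, weights) if w > eps]
    attacks = [(f, target) for f, w in zip(feature_names, weights) if w < -eps]
    return {"arguments": list(feature_names) + [target],
            "supports": supports, "attacks": attacks}
```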
- …