Explainability in Deep Reinforcement Learning
A large part of the explainable Artificial Intelligence (XAI) literature is
emerging on feature relevance techniques to explain a deep neural network (DNN)
output or to explain models that ingest image source data. However, assessing
how XAI techniques can help understand models beyond classification tasks, e.g.
for reinforcement learning (RL), has not been extensively studied. We review
recent works aiming to attain Explainable Reinforcement Learning
(XRL), a relatively new subfield of Explainable Artificial Intelligence,
intended to be used in general public applications, with diverse audiences,
requiring ethical, responsible and trustable algorithms. In critical situations
where it is essential to justify and explain the agent's behaviour, better
explainability and interpretability of RL models could help gain scientific
insight on the inner workings of what is still considered a black box. We
evaluate mainly studies directly linking explainability to RL, and split these
into two categories according to the way the explanations are generated:
transparent algorithms and post-hoc explainability. We also review the most
prominent XAI works through the lens of how they could potentially inform the
further deployment of the latest advances in RL in the demanding present and
future of everyday problems.
Comment: Article accepted at Knowledge-Based Systems
Acquisition of Chess Knowledge in AlphaZero
What is learned by sophisticated neural network agents such as AlphaZero?
This question is of both scientific and practical interest. If the
representations of strong neural networks bear no resemblance to human
concepts, our ability to understand faithful explanations of their decisions
will be restricted, ultimately limiting what we can achieve with neural network
interpretability. In this work we provide evidence that human knowledge is
acquired by the AlphaZero neural network as it trains on the game of chess. By
probing for a broad range of human chess concepts we show when and where these
concepts are represented in the AlphaZero network. We also provide a
behavioural analysis focusing on opening play, including qualitative analysis
from chess Grandmaster Vladimir Kramnik. Finally, we carry out a preliminary
investigation looking at the low-level details of AlphaZero's representations,
and make the resulting behavioural and representational analyses available
online.
Comment: 69 pages, 44 figures
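The abstract only names the probing methodology, but probing of this kind is commonly implemented as a simple regression from internal activations to a concept label. A minimal sketch with synthetic data (all names, dimensions, and numbers are illustrative placeholders, nothing here comes from the AlphaZero paper):

```python
import numpy as np

# Synthetic stand-ins: "activations" from one network layer and a scalar
# human concept (e.g. material balance) that happens to be a noisy linear
# function of them. In real probing, the activations come from the trained
# network and the concept labels from annotated positions.
rng = np.random.default_rng(0)
d = 8
w_true = rng.normal(size=d)
acts = rng.normal(size=(200, d))
concept = acts @ w_true + 0.01 * rng.normal(size=200)

# Linear probe: least-squares fit from activations to the concept.
w, *_ = np.linalg.lstsq(acts, concept, rcond=None)
pred = acts @ w

# R^2 close to 1 means the concept is linearly decodable from this layer;
# repeating this per layer and per training checkpoint gives the
# "when and where" picture the abstract describes.
r2 = 1 - np.sum((concept - pred) ** 2) / np.sum((concept - concept.mean()) ** 2)
print(round(r2, 4))
```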
Learning structured task related abstractions
As robots and autonomous agents are to assist people with more tasks in various
domains, they need the ability to quickly gain contextual awareness in unseen
environments and learn new tasks. Current state-of-the-art methods rely
predominantly on statistical learning techniques, which tend to overfit to
sensory signals and often fail to extract structured task related abstractions.
The resulting environment and task models are typically represented as black box
objects that cannot easily be updated or inspected, and they provide limited
generalisation capabilities.
We address the aforementioned shortcomings of current methods by explicitly
studying the problem of learning structured task related abstractions. In particular, we
are interested in extracting symbolic representations of the environment from sensory
signals and encoding the task to be executed as a computer program. We consider the
standard problem of learning to solve a task by mapping sensory signals to actions
and propose the decomposition of such a mapping into two stages: i) perceiving
symbols from sensory data and ii) using a program to manipulate those symbols in
order to make decisions. This thesis studies the bidirectional interactions between the
agent's capabilities to perceive symbols and the programs it can execute in order to
solve a task.
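The proposed two-stage decomposition can be caricatured in a few lines of code. The perceiver and program below are hand-written placeholders (in the thesis both are learned or induced), and every name and threshold is illustrative:

```python
def perceive(signal):
    """Stage i: map raw sensory readings to discrete symbols
    (here a crude threshold; in the thesis this stage is learned)."""
    return ["on" if x > 0.5 else "off" for x in signal]

def task_program(symbols):
    """Stage ii: a program that manipulates the symbols to pick an action."""
    return "grasp" if symbols.count("on") >= 2 else "wait"

def policy(signal):
    """The full sensing-to-action mapping is the composition of the two."""
    return task_program(perceive(signal))

print(policy([0.9, 0.7, 0.1]))  # -> grasp
print(policy([0.2, 0.1, 0.3]))  # -> wait
```

The "bidirectional interactions" the thesis studies then amount to how the choice of `task_program` shapes which symbols `perceive` must produce, and vice versa.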
In the first part of the thesis we demonstrate that access to a programmatic
description of the task provides a strong inductive bias which facilitates the learning
of structured task related representations of the environment. In order to do so, we first
consider a collaborative human-robot interaction setup and propose a framework for
Grounding and Learning Instances through Demonstration and Eye tracking (GLIDE)
which enables robots to learn symbolic representations of the environment from few
demonstrations. To relax the constraints on the task encoding program that
GLIDE assumes, we introduce the perceptor gradients algorithm and prove that it
can be applied with any task encoding program.
In the second part of the thesis we investigate the complementary problem of
inducing task encoding programs, assuming that a symbolic representation of the
environment is available. To this end, we propose the p-machine, a novel program
induction framework which combines standard enumerative search techniques with a
stochastic gradient descent optimiser in order to obtain an efficient program synthesiser.
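As a rough illustration of that idea (not the thesis implementation): enumerate a small space of program structures and fit each structure's continuous parameters by gradient descent, keeping the best-scoring candidate. The templates, data, and learning rate below are all made up:

```python
def fit(template, data, lr=0.01, steps=2000):
    """Fit parameters (a, b) of one program template to (x, y) data by
    numeric gradient descent on the squared error."""
    def loss(a, b):
        return sum((template(x, a, b) - y) ** 2 for x, y in data)
    a = b = 0.0
    eps = 1e-5
    for _ in range(steps):
        ga = (loss(a + eps, b) - loss(a - eps, b)) / (2 * eps)
        gb = (loss(a, b + eps) - loss(a, b - eps)) / (2 * eps)
        a -= lr * ga
        b -= lr * gb
    return (a, b), loss(a, b)

def synthesise(data):
    """Enumerative search over program structures, gradient fit of each."""
    templates = {
        "linear":    lambda x, a, b: a * x + b,
        "quadratic": lambda x, a, b: a * x * x + b,
    }
    results = {name: fit(t, data) for name, t in templates.items()}
    return min(results.items(), key=lambda kv: kv[1][1])

# Data from the toy "physics law" y = 2*x^2 + 1; the quadratic structure
# with a ~= 2, b ~= 1 should win the search.
data = [(x, 2 * x * x + 1) for x in (-2, -1, 0, 1, 2)]
name, ((a, b), err) = synthesise(data)
print(name)  # -> quadratic
```

This mirrors the split described in the abstract: discrete structure is found by enumeration, while the continuous parameters inside each candidate are handled by the gradient optimiser.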
We show that the induction of task encoding programs is applicable to various
problems such as learning physics laws, inspecting neural networks, and learning
in human-robot interaction setups.