545 research outputs found
Can Differentiable Decision Trees Learn Interpretable Reward Functions?
There is an increasing interest in learning reward functions that model human
intent and human preferences. However, many frameworks use blackbox learning
methods that, while expressive, are difficult to interpret. We propose and
evaluate a novel approach for learning expressive and interpretable reward
functions from preferences using Differentiable Decision Trees (DDTs) for both
low- and high-dimensional state inputs. We explore and discuss the viability of
learning interpretable reward functions using DDTs by evaluating our algorithm
on Cartpole, Visual Gridworld environments, and Atari games. We provide
evidence that that the tree structure of our learned reward function is useful
in determining the extent to which a reward function is aligned with human
preferences. We visualize the learned reward DDTs and find that they are
capable of learning interpretable reward functions but that the discrete nature
of the trees hurts the performance of reinforcement learning at test time.
However, we also show evidence that using soft outputs (averaged over all leaf
nodes) results in competitive performance when compared with larger capacity
deep neural network reward functions
Specifying and Interpreting Reinforcement Learning Policies through Simulatable Machine Learning
Human-AI collaborative policy synthesis is a procedure in which (1) a human
initializes an autonomous agent's behavior, (2) Reinforcement Learning improves
the human specified behavior, and (3) the agent can explain the final optimized
policy to the user. This paradigm leverages human expertise and facilitates a
greater insight into the learned behaviors of an agent. Existing approaches to
enabling collaborative policy specification involve black box methods which are
unintelligible and are not catered towards non-expert end-users. In this paper,
we develop a novel collaborative framework to enable humans to initialize and
interpret an autonomous agent's behavior, rooted in principles of
human-centered design. Through our framework, we enable humans to specify an
initial behavior model in the form of unstructured, natural language, which we
then convert to lexical decision trees. Next, we are able to leverage these
human-specified policies, to warm-start reinforcement learning and further
allow the agent to optimize the policies through reinforcement learning.
Finally, to close the loop on human-specification, we produce explanations of
the final learned policy, in multiple modalities, to provide the user a final
depiction about the learned policy of the agent. We validate our approach by
showing that our model can produce >80% accuracy, and that human-initialized
policies are able to successfully warm-start RL. We then conduct a novel
human-subjects study quantifying the relative subjective and objective benefits
of varying XAI modalities(e.g., Tree, Language, and Program) for explaining
learned policies to end-users, in terms of usability and interpretability and
identify the circumstances that influence these measures. Our findings
emphasize the need for personalized explainable systems that can facilitate
user-centric policy explanations for a variety of end-users
TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning
Combining deep model-free reinforcement learning with on-line planning is a
promising approach to building on the successes of deep RL. On-line planning
with look-ahead trees has proven successful in environments where transition
models are known a priori. However, in complex environments where transition
models need to be learned from data, the deficiencies of learned models have
limited their utility for planning. To address these challenges, we propose
TreeQN, a differentiable, recursive, tree-structured model that serves as a
drop-in replacement for any value function network in deep RL with discrete
actions. TreeQN dynamically constructs a tree by recursively applying a
transition model in a learned abstract state space and then aggregating
predicted rewards and state-values using a tree backup to estimate Q-values. We
also propose ATreeC, an actor-critic variant that augments TreeQN with a
softmax layer to form a stochastic policy network. Both approaches are trained
end-to-end, such that the learned model is optimised for its actual use in the
tree. We show that TreeQN and ATreeC outperform n-step DQN and A2C on a
box-pushing task, as well as n-step DQN and value prediction networks (Oh et
al. 2017) on multiple Atari games. Furthermore, we present ablation studies
that demonstrate the effect of different auxiliary losses on learning
transition models
- …