4 research outputs found
A Notation for Markov Decision Processes
This paper specifies a notation for Markov decision processes.
Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines
We show how an action-dependent baseline can be used with the policy gradient
theorem under function approximation, which was originally presented with
action-independent baselines by Sutton et al. (2000).
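The baseline idea the abstract refers to can be sketched in a few lines: the policy gradient estimate multiplies the score function by the return, minus a baseline that reduces variance. The function names below are hypothetical, and this sketch simply subtracts a (possibly action-dependent) baseline b(s, a); the paper's contribution is showing how such a baseline can be used within the function-approximation setting.

```python
import numpy as np

def pg_estimate(states, actions, returns, grad_log_pi, baseline):
    """REINFORCE-style gradient estimate with a baseline subtracted.

    Illustrative sketch only: grad_log_pi(s, a) returns the gradient of
    log pi(a|s) w.r.t. the policy parameters, and baseline(s, a) is the
    (possibly action-dependent) baseline b(s, a).
    """
    terms = [grad_log_pi(s, a) * (G - baseline(s, a))
             for s, a, G in zip(states, actions, returns)]
    # Average the per-sample terms to form the Monte Carlo gradient estimate.
    return np.mean(terms, axis=0)
```

With an action-independent baseline the subtraction leaves the estimate unbiased while lowering its variance; extending this to action-dependent baselines is precisely what the paper addresses.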
TripleTree: A Versatile Interpretable Representation of Black Box Agents and their Environments
In explainable artificial intelligence, there is increasing interest in
understanding the behaviour of autonomous agents to build trust and validate
performance. Modern agent architectures, such as those trained by deep
reinforcement learning, are currently so lacking in interpretable structure as
to effectively be black boxes, but insights may still be gained from an
external, behaviourist perspective. Inspired by conceptual spaces theory, we
suggest that a versatile first step towards general understanding is to
discretise the state space into convex regions, jointly capturing similarities
over the agent's action, value function and temporal dynamics within a dataset
of observations. We create such a representation using a novel variant of the
CART decision tree algorithm, and demonstrate how it facilitates practical
understanding of black box agents through prediction, visualisation and
rule-based explanation.
Comment: 12 pages (incl. references and appendices), 15 figures. Pre-print,
under review.
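The core CART mechanic behind such a representation is a greedy axis-aligned split of the state space. The sketch below is a minimal single-split illustration scored only on action impurity (Gini); the paper's variant is multi-objective, also capturing value-function and temporal-dynamics similarity, and all names here are illustrative rather than the paper's API.

```python
import numpy as np

def best_split(states, actions):
    """Find the axis-aligned (feature, threshold) split of the state space
    that best separates the agent's actions, by weighted Gini impurity.

    states: (n, d) array of observed states; actions: (n,) discrete actions.
    """
    def gini(y):
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    best = (None, None, np.inf)  # (feature, threshold, weighted impurity)
    n = len(actions)
    for f in range(states.shape[1]):
        for thr in np.unique(states[:, f]):
            left = states[:, f] <= thr
            if left.all() or not left.any():
                continue  # degenerate split; skip
            score = (left.sum() * gini(actions[left])
                     + (~left).sum() * gini(actions[~left])) / n
            if score < best[2]:
                best = (f, thr, score)
    return best
```

Applied recursively, such splits yield the convex (axis-aligned) regions the abstract describes, each summarising locally similar agent behaviour.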
Data-Efficient Off-Policy Policy Evaluation for Reinforcement Learning
In this paper we present a new way of predicting the performance of a
reinforcement learning policy given historical data that may have been
generated by a different policy. The ability to evaluate a policy from
historical data is important for applications where the deployment of a bad
policy can be dangerous or costly. We show empirically that our algorithm
produces estimates that often have orders of magnitude lower mean squared error
than existing methods; it makes more efficient use of the available data. Our
new estimator is based on two advances: an extension of the doubly robust
estimator (Jiang and Li, 2015), and a new way to mix between model-based
estimates and importance-sampling-based estimates.
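The doubly robust estimator the abstract builds on can be sketched recursively: at each step it takes the model-based value estimate and adds an importance-weighted correction using the observed reward. This is a minimal sketch under assumed interfaces (the argument names and shapes are illustrative, not the paper's code), following the recursive form of the sequential doubly robust estimator.

```python
def doubly_robust_ope(trajectory, rho, q_hat, v_hat, gamma=0.99):
    """Recursive doubly robust off-policy value estimate for one trajectory.

    trajectory: list of (state, action, reward) tuples.
    rho[t]: per-step importance ratio pi_e(a_t|s_t) / pi_b(a_t|s_t).
    q_hat(s, a), v_hat(s): model-based action-value and state-value estimates.
    """
    v_dr = 0.0
    # Work backwards so each step can use the estimate for its successor.
    for t in reversed(range(len(trajectory))):
        s, a, r = trajectory[t]
        # Model estimate v_hat(s), corrected by the importance-weighted
        # difference between the observed outcome and the model's q_hat.
        v_dr = v_hat(s) + rho[t] * (r + gamma * v_dr - q_hat(s, a))
    return v_dr
```

When the model estimates q_hat and v_hat are zero, the estimator reduces to ordinary per-step importance sampling; when the importance ratios are uninformative, it falls back on the model, which is the mixing behaviour the abstract highlights.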