Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning
In the field of reinforcement learning there has been recent progress towards
safety and high-confidence bounds on policy performance. However, to our
knowledge, no practical methods exist for determining high-confidence policy
performance bounds in the inverse reinforcement learning setting---where the
true reward function is unknown and only samples of expert behavior are given.
We propose a sampling method based on Bayesian inverse reinforcement learning
that uses demonstrations to determine practical high-confidence upper bounds on
the α-worst-case difference in expected return between any evaluation
policy and the optimal policy under the expert's unknown reward function. We
evaluate our proposed bound on both a standard grid navigation task and a
simulated driving task and achieve tighter and more accurate bounds than a
feature count-based baseline. We also give examples of how our proposed bound
can be utilized to perform risk-aware policy selection and risk-aware policy
improvement. Because our proposed bound requires several orders of magnitude
fewer demonstrations than existing high-confidence bounds, it is the first
practical method that allows agents that learn from demonstration to express
confidence in the quality of their learned policy.
Comment: In proceedings of AAAI-18
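To make the bound concrete, here is a minimal Python sketch, not the authors' implementation: it assumes the Bayesian IRL posterior is already represented by reward samples and that the expected return of the evaluation policy and of the per-sample optimal policy have been computed under each sampled reward; the function name and toy numbers are illustrative.

```python
import numpy as np

def alpha_worst_case_bound(eval_returns, opt_returns, alpha=0.95):
    """Hypothetical sketch: given per-sample expected returns of the evaluation
    policy and of the optimal policy under reward functions drawn from a
    Bayesian IRL posterior, return the empirical alpha-quantile of the policy
    loss. This is the kind of probabilistic upper bound on the loss under the
    expert's unknown reward that the abstract describes."""
    losses = np.asarray(opt_returns) - np.asarray(eval_returns)  # loss under each sampled reward
    return np.quantile(losses, alpha)

# Toy usage with made-up numbers: 1000 posterior samples of expected return.
rng = np.random.default_rng(0)
opt = rng.normal(10.0, 1.0, size=1000)           # return of the optimal policy per sampled reward
ev = opt - np.abs(rng.normal(0.5, 0.3, 1000))    # return of the evaluation policy per sampled reward
print(alpha_worst_case_bound(ev, opt, alpha=0.95))
```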
Kevlar: A Mapping-Free Framework for Accurate Discovery of De Novo Variants.
De novo genetic variants are an important source of causative variation in complex genetic disorders. Many methods for variant discovery rely on mapping reads to a reference genome, detecting numerous inherited variants irrelevant to the phenotype of interest. To distinguish between inherited and de novo variation, sequencing of families (parents and siblings) is commonly pursued. However, standard mapping-based approaches tend to have a high false-discovery rate for de novo variant prediction. Kevlar is a mapping-free method for de novo variant discovery, based on direct comparison of sequences between related individuals. Kevlar identifies high-abundance k-mers unique to the individual of interest. Reads containing these k-mers are partitioned into disjoint sets by shared k-mer content for variant calling, and preliminary variant predictions are sorted using a probabilistic score. We evaluated Kevlar on simulated and real datasets, demonstrating its ability to detect both de novo single-nucleotide variants and indels with high accuracy.
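A minimal Python sketch of the core k-mer idea, not Kevlar's actual code: k-mers that are abundant in the proband's reads but essentially absent from both parents are flagged as candidate signatures of de novo variation, and reads containing them are collected for downstream calling. Function names, k, and thresholds are illustrative.

```python
from collections import Counter

def kmers(read, k=31):
    return (read[i:i + k] for i in range(len(read) - k + 1))

def count_kmers(reads, k=31):
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts

def novel_kmers(proband_reads, mother_reads, father_reads,
                k=31, min_abundance=5, max_parent=1):
    """Illustrative only: keep k-mers seen often in the proband but
    (nearly) never in either parent."""
    child = count_kmers(proband_reads, k)
    mom = count_kmers(mother_reads, k)
    dad = count_kmers(father_reads, k)
    return {km for km, c in child.items()
            if c >= min_abundance and mom[km] <= max_parent and dad[km] <= max_parent}

def collect_candidate_reads(reads, novel, k=31):
    """Simplified: gather reads containing novel k-mers. Kevlar further
    partitions such reads into disjoint sets by shared k-mer content
    before variant calling and scoring."""
    return [(read, set(kmers(read, k)) & novel)
            for read in reads if set(kmers(read, k)) & novel]
```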
Can Differentiable Decision Trees Learn Interpretable Reward Functions?
There is an increasing interest in learning reward functions that model human
intent and human preferences. However, many frameworks use blackbox learning
methods that, while expressive, are difficult to interpret. We propose and
evaluate a novel approach for learning expressive and interpretable reward
functions from preferences using Differentiable Decision Trees (DDTs) for both
low- and high-dimensional state inputs. We explore and discuss the viability of
learning interpretable reward functions using DDTs by evaluating our algorithm
on Cartpole, Visual Gridworld environments, and Atari games. We provide
evidence that the tree structure of our learned reward function is useful
in determining the extent to which a reward function is aligned with human
preferences. We visualize the learned reward DDTs and find that they are
capable of learning interpretable reward functions but that the discrete nature
of the trees hurts the performance of reinforcement learning at test time.
However, we also show evidence that using soft outputs (averaged over all leaf
nodes) results in competitive performance when compared with larger capacity
deep neural network reward functions.
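A minimal PyTorch sketch of a soft differentiable decision tree used as a reward function, where the "soft output" is the routing-probability-weighted average over leaf values; the architecture and initialization are illustrative rather than the paper's exact model.

```python
import torch
import torch.nn as nn

class SoftDecisionTreeReward(nn.Module):
    """Illustrative differentiable decision tree reward: each internal node
    applies a sigmoid gate over a linear function of the state, and the
    reward is the average of leaf values weighted by the probability of
    reaching each leaf (the soft output discussed above)."""

    def __init__(self, state_dim, depth=2):
        super().__init__()
        self.depth = depth
        n_internal = 2 ** depth - 1
        n_leaves = 2 ** depth
        self.gates = nn.Linear(state_dim, n_internal)          # one routing logit per internal node
        self.leaf_values = nn.Parameter(0.1 * torch.randn(n_leaves))

    def forward(self, state):
        # state: (batch, state_dim) -> reward: (batch,)
        p_right = torch.sigmoid(self.gates(state))             # (batch, n_internal)
        batch = state.shape[0]
        path_prob = torch.ones(batch, 1, device=state.device)
        node = 0
        for d in range(self.depth):
            n_level = 2 ** d
            g = p_right[:, node:node + n_level]                # gates at this depth
            # each node's probability splits into left (1 - g) and right (g) children
            path_prob = torch.stack(
                [path_prob * (1 - g), path_prob * g], dim=2
            ).reshape(batch, 2 * n_level)
            node += n_level
        return path_prob @ self.leaf_values                    # expected leaf value per state

# Toy usage: score a batch of 4-dimensional states.
reward_fn = SoftDecisionTreeReward(state_dim=4, depth=2)
print(reward_fn(torch.randn(8, 4)).shape)  # torch.Size([8])
```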
Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications
Inverse reinforcement learning (IRL) infers a reward function from
demonstrations, allowing for policy improvement and generalization. However,
despite much recent interest in IRL, little work has been done to understand
the minimum set of demonstrations needed to teach a specific sequential
decision-making task. We formalize the problem of finding maximally informative
demonstrations for IRL as a machine teaching problem where the goal is to find
the minimum number of demonstrations needed to specify the reward equivalence
class of the demonstrator. We extend previous work on algorithmic teaching for
sequential decision-making tasks by showing a reduction to the set cover
problem which enables an efficient approximation algorithm for determining the
set of maximally-informative demonstrations. We apply our proposed machine
teaching algorithm to two novel applications: providing a lower bound on the
number of queries needed to learn a policy using active IRL and developing a
novel IRL algorithm that can learn more efficiently from informative
demonstrations than a standard IRL approach.
Comment: In proceedings of the AAAI Conference on Artificial Intelligence, 2019
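A minimal Python sketch of the greedy set-cover step described above: each candidate demonstration covers some set of reward constraints, and demonstrations are selected greedily until the constraints defining the demonstrator's reward equivalence class are covered. The constraint encoding and demonstration representation here are abstract placeholders, not the paper's exact formulation.

```python
def greedy_demo_selection(candidate_demos, constraints_of):
    """Greedy set cover: repeatedly pick the candidate demonstration that
    covers the most not-yet-covered reward constraints. `constraints_of`
    maps a demonstration to the hashable constraints it induces; both the
    candidates and the constraint encoding are assumed given by the caller."""
    universe = set()
    for demo in candidate_demos:
        universe |= constraints_of(demo)
    uncovered, chosen = set(universe), []
    while uncovered:
        best = max(candidate_demos, key=lambda d: len(constraints_of(d) & uncovered))
        gained = constraints_of(best) & uncovered
        if not gained:          # remaining constraints cannot be covered
            break
        chosen.append(best)
        uncovered -= gained
    return chosen

# Toy usage: demonstrations are labels; constraints are abstract ids.
demos = ["demo_A", "demo_B", "demo_C"]
cover = {"demo_A": {1, 2}, "demo_B": {2, 3, 4}, "demo_C": {4}}
print(greedy_demo_selection(demos, cover.__getitem__))  # ['demo_B', 'demo_A']
```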