Nonparametric Bayesian Policy Priors for Reinforcement Learning

Doshi-Velez, Finale P.; Roy, Nicholas; Tenenbaum, Joshua B.; Wingate, David

Authors: Finale P. Doshi-Velez, Nicholas Roy, Joshua B. Tenenbaum, David Wingate
Publication date: 1 January 2010
Publisher: Neural Information Processing Systems Foundation

Abstract

We consider reinforcement learning in partially observable domains where the agent can query an expert for demonstrations. Our nonparametric Bayesian approach combines model knowledge, inferred from expert information and independent exploration, with policy knowledge inferred from expert trajectories. We introduce priors that bias the agent towards models with both simple representations and simple policies, resulting in improved policy and model learning.

