Nonparametric Bayesian Policy Priors for Reinforcement Learning
We consider reinforcement learning in partially observable domains where the agent can query an expert for
demonstrations. Our nonparametric Bayesian approach combines model knowledge, inferred from expert information and independent exploration, with policy knowledge inferred from expert trajectories. We introduce priors that bias the agent towards models with both simple representations and simple policies, resulting in improved policy and model learning.
Stick-Breaking Policy Learning in Dec-POMDPs
Expectation maximization (EM) has recently been shown to be an efficient
algorithm for learning finite-state controllers (FSCs) in large decentralized
POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often
converge to maxima that are far from optimal. This paper considers a
variable-size FSC to represent the local policy of each agent. These
variable-size FSCs are constructed using a stick-breaking prior, leading to a
new framework called decentralized stick-breaking policy representation
(Dec-SBPR). This approach learns the controller parameters with a variational
Bayesian algorithm without having to assume that the Dec-POMDP model is
available. The performance of Dec-SBPR is demonstrated on several benchmark
problems, showing that the algorithm scales to large problems while
outperforming other state-of-the-art methods.
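The abstract does not spell out how the stick-breaking prior is built, so the following is only a minimal sketch of the generic truncated stick-breaking construction, not the Dec-SBPR algorithm itself (which fits controller parameters with variational Bayes). The function name, the concentration parameter `alpha`, and the truncation level `num_sticks` are illustrative assumptions.

```python
import random


def stick_breaking_weights(alpha, num_sticks, rng):
    """Draw truncated stick-breaking weights (generic construction, assumed here).

    Each fraction v_k ~ Beta(1, alpha) breaks off part of the remaining stick:
    w_k = v_k * prod_{j<k} (1 - v_j).
    Returns the weights and the unassigned leftover mass.
    """
    weights = []
    remaining = 1.0  # length of stick not yet assigned
    for _ in range(num_sticks):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


# Hypothetical usage: weights over at most 10 controller nodes.
rng = random.Random(0)
weights, leftover = stick_breaking_weights(alpha=2.0, num_sticks=10, rng=rng)
```

Because the weights shrink as more of the stick is used, most probability mass falls on the first few nodes. This is one way a prior can favour small controllers while still allowing larger ones.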
Universal Reinforcement Learning Algorithms: Survey and Experiments
Many state-of-the-art reinforcement learning (RL) algorithms typically assume
that the environment is an ergodic Markov Decision Process (MDP). In contrast,
the field of universal reinforcement learning (URL) is concerned with
algorithms that make as few assumptions as possible about the environment. The
universal Bayesian agent AIXI and a family of related URL algorithms have been
developed in this setting. While numerous theoretical optimality results have
been proven for these agents, there has been no empirical investigation of
their behavior to date. We present a short and accessible survey of these URL
algorithms under a unified notation and framework, along with results of some
experiments that qualitatively illustrate some properties of the resulting
policies, and their relative performance on partially-observable gridworld
environments. We also present an open-source reference implementation of the
algorithms which we hope will facilitate further understanding of, and
experimentation with, these ideas.
Comment: 8 pages, 6 figures, Twenty-sixth International Joint Conference on
Artificial Intelligence (IJCAI-17)
Learning Models of Sequential Decision-Making without Complete State Specification using Bayesian Nonparametric Inference and Active Querying
Learning models of decision-making behavior during sequential tasks is useful across a variety of applications, including human-machine interaction. In this paper, we present an approach to learning such models within Markovian domains based on observing and querying a decision-making agent. In contrast to classical approaches to behavior learning, we do not assume complete knowledge of the state features that impact an agent's decisions. Using tools from Bayesian nonparametric inference and time series of agents' decisions, we first provide an inference algorithm to identify the presence of any unmodeled state features that impact decision making, as well as likely candidate models. In order to identify the best model among these candidates, we next provide an active querying approach that resolves model ambiguity by querying the decision maker. Results from our evaluations demonstrate that, using the proposed algorithms, an observer can identify the presence of latent state features, recover their dynamics, and estimate their impact on decisions during sequential tasks.
A Tutorial on Bayesian Optimization of Expensive Cost Functions, with Application to Active User Modeling and Hierarchical Reinforcement Learning
We present a tutorial on Bayesian optimization, a method of finding the
maximum of expensive cost functions. Bayesian optimization employs the Bayesian
technique of setting a prior over the objective function and combining it with
evidence to get a posterior function. This permits a utility-based selection of
the next observation to make on the objective function, which must take into
account both exploration (sampling from areas of high uncertainty) and
exploitation (sampling areas likely to offer improvement over the current best
observation). We also present two detailed extensions of Bayesian optimization,
with experiments---active user modelling with preferences, and hierarchical
reinforcement learning---and a discussion of the pros and cons of Bayesian
optimization based on our experiences.
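The exploration and exploitation trade-off described above can be illustrated with expected improvement, a common acquisition function for maximization. The abstract does not say which utility the tutorial uses, so this is a sketch under that assumption. The inputs `mu` and `sigma` stand for the posterior mean and standard deviation at a candidate point, and `best` is the current best observation; all three names are illustrative.

```python
import math


def expected_improvement(mu, sigma, best):
    """Expected improvement over `best` for a Gaussian posterior N(mu, sigma^2).

    EI = (mu - best) * Phi(z) + sigma * phi(z), with z = (mu - best) / sigma.
    With no posterior uncertainty it reduces to max(mu - best, 0).
    """
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return (mu - best) * cdf + sigma * pdf


# Two candidates with the same mean below the current best:
# the more uncertain one scores higher, which rewards exploration.
ei_certain = expected_improvement(mu=0.0, sigma=1.0, best=0.5)
ei_uncertain = expected_improvement(mu=0.0, sigma=2.0, best=0.5)
```

The first term rewards points whose mean already exceeds the best observation (exploitation), and the second rewards points with high posterior uncertainty (exploration).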
Probabilistic inverse reinforcement learning in unknown environments
We consider the problem of learning by demonstration from agents acting in
unknown stochastic Markov environments or games. Our aim is to estimate agent
preferences in order to construct improved policies for the same task that the
agents are trying to solve. To do so, we extend previous probabilistic
approaches for inverse reinforcement learning in known MDPs to the case of
unknown dynamics or opponents. We do this by deriving two simplified
probabilistic models of the demonstrator's policy and utility. For
tractability, we use maximum a posteriori estimation rather than full Bayesian
inference. Under a flat prior, this results in a convex optimisation problem.
We find that the resulting algorithms are highly competitive against a variety
of other methods for inverse reinforcement learning that do have knowledge of
the dynamics.
Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty
in Artificial Intelligence (UAI2013)
Bayesian nonparametric multivariate convex regression
In many applications, such as economics, operations research and
reinforcement learning, one often needs to estimate a multivariate regression
function f subject to a convexity constraint. For example, in sequential
decision processes the value of a state under optimal subsequent decisions may
be known to be convex or concave. We propose a new Bayesian nonparametric
multivariate approach based on characterizing the unknown regression function
as the max of a random collection of unknown hyperplanes. This specification
induces a prior with large support in a Kullback-Leibler sense on the space of
convex functions, while also leading to strong posterior consistency. Although
we assume that f is defined over R^p, we show that this model has a convergence
rate of log(n)^{-1} n^{-1/(d+2)} under the empirical L2 norm when f actually
maps a d dimensional linear subspace to R. We design an efficient reversible
jump MCMC algorithm for posterior computation and demonstrate the methods
through application to value function approximation.
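The representation of the regression function as the max of a collection of hyperplanes can be sketched as below. The sketch only evaluates a fixed set of hyperplanes and spot-checks convexity at one midpoint. It does not include the prior or the reversible jump MCMC sampler, and the hyperplane values and the name `max_affine` are made up for illustration.

```python
def max_affine(x, hyperplanes):
    """Evaluate f(x) = max_k (a_k + b_k . x) for hyperplanes given as (a_k, b_k)."""
    return max(
        a + sum(b_i * x_i for b_i, x_i in zip(b, x))
        for a, b in hyperplanes
    )


# Hypothetical hyperplanes in two dimensions: (intercept, slope vector).
hyperplanes = [
    (0.0, (1.0, 0.0)),
    (0.5, (-1.0, 2.0)),
    (-1.0, (0.0, 0.0)),
]

x = (3.0, 0.0)
y = (-3.0, 0.5)
midpoint = tuple((x_i + y_i) / 2.0 for x_i, y_i in zip(x, y))

# Midpoint convexity: f((x + y) / 2) <= (f(x) + f(y)) / 2.
is_convex_here = max_affine(midpoint, hyperplanes) <= (
    max_affine(x, hyperplanes) + max_affine(y, hyperplanes)
) / 2.0
```

A pointwise maximum of affine functions is always convex, so any draw from such a model satisfies the convexity constraint by construction.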
Bibliographic Analysis on Research Publications using Authors, Categorical Labels and the Citation Network
Bibliographic analysis considers the author's research areas, the citation
network and the paper content among other things. In this paper, we combine
these three in a topic model that produces a bibliographic model of authors,
topics and documents, using a nonparametric extension of a combination of the
Poisson mixed-topic link model and the author-topic model. This gives rise to
the Citation Network Topic Model (CNTM). We propose a novel and efficient
inference algorithm for the CNTM to explore subsets of research publications
from CiteSeerX. The publication datasets are organised into three corpora,
totalling about 168k publications with about 62k authors. The queried
datasets are made available online. In three publicly available corpora in
addition to the queried datasets, our proposed model demonstrates an improved
performance in both model fitting and document clustering, compared to several
baselines. Moreover, our model allows extraction of additional useful knowledge
from the corpora, such as the visualisation of the author-topics network.
Additionally, we propose a simple method to incorporate supervision into topic
modelling to achieve further improvement on the clustering task.
Comment: Preprint for Journal Machine Learning