8 research outputs found
Bayesian multitask inverse reinforcement learning
We generalise the problem of inverse reinforcement learning to multiple
tasks, learnt from multiple demonstrations. Each demonstration may represent
one expert trying to solve a different task, or different experts trying to
solve the same task. Our main contribution is to formalise the problem as
statistical preference elicitation, via a number of structured priors, whose
form captures our biases about the relatedness of different tasks or expert
policies. In doing so, we introduce a prior on policy optimality, which is
more natural to specify. We show that our framework allows us not only to
learn efficiently from multiple experts but also to effectively differentiate
between the goals of each. Possible applications include analysing the
intrinsic motivations of subjects in behavioural experiments and learning from
multiple teachers.

Comment: Corrected version. 13 pages, 8 figures.
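The "prior on policy optimality" idea can be illustrated with a toy sketch: place a prior over reward weights and make demonstrations exponentially more likely under rewards that give them higher return. This is only a minimal illustration, not the paper's algorithm; the feature counts, the inverse temperature `c`, and the importance-sampling representation are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_candidates = 4, 5000

# Feature counts of one expert's demonstrations (hypothetical numbers).
demo_features = np.array([1.0, 0.2, 0.0, 0.5])

# Standard normal prior over reward weights, represented by samples.
w = rng.normal(0.0, 1.0, size=(n_candidates, n_features))

# A soft "policy optimality" likelihood: demonstrations are exponentially
# more probable under reward weights that give them higher return.  The
# inverse temperature c is an assumed hyperparameter, not from the paper.
c = 2.0
log_lik = c * (w @ demo_features)
weights = np.exp(log_lik - log_lik.max())
weights /= weights.sum()

# Importance-weighted posterior mean of the reward weights.
posterior_mean_w = weights @ w
print("posterior mean reward weights:", posterior_mean_w)
```

Under this tilted-Gaussian posterior, the inferred reward weights lean towards the features the expert actually visited, which is the basic mechanism a multitask prior would then tie together across experts.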
Epistemic risk-sensitive reinforcement learning
We develop a framework for risk-sensitive behaviour in reinforcement learning (RL) due to uncertainty about the environment dynamics by leveraging utility-based definitions of risk sensitivity. In this framework, the preference for risk can be tuned by varying the utility function, for which we develop dynamic programming (DP) and policy gradient-based algorithms. The risk-averse behaviour is compared with the behaviour of the risk-neutral policy in environments with epistemic risk.
Epistemic Risk-Sensitive Reinforcement Learning
We develop a framework for interacting with uncertain environments in
reinforcement learning (RL) by leveraging preferences in the form of utility
functions. We claim that there is value in considering different risk measures
during learning. In this framework, the preference for risk can be tuned by
varying a risk-sensitivity parameter, and the resulting behavior can be
risk-averse, risk-neutral or risk-taking depending on the parameter choice. We
evaluate our framework for learning problems with model uncertainty. We measure
and control for \emph{epistemic} risk using dynamic programming (DP) and policy
gradient-based algorithms. The risk-averse behavior is then compared with the
behavior of the optimal risk-neutral policy in environments with epistemic
risk.

Comment: 8 pages, 2 figures.
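The role of the utility function can be sketched with an exponential utility, a standard choice for tuning risk sensitivity (the specific utility, the parameter `beta`, and the sampled returns below are assumptions for illustration, not the paper's exact setup). Here epistemic risk appears as spread in an action's return across plausible models of the environment.

```python
import numpy as np

rng = np.random.default_rng(0)

def exponential_utility(returns, beta):
    """Certainty equivalent under exponential utility: beta > 0 is
    risk-averse, beta < 0 risk-seeking, and beta -> 0 recovers the
    risk-neutral expectation."""
    if beta == 0:
        return returns.mean()
    return -np.log(np.mean(np.exp(-beta * returns))) / beta

# Epistemic uncertainty: an action's return evaluated under many plausible
# models of the environment (hypothetical numbers).
safe_action = rng.normal(loc=1.0, scale=0.1, size=10_000)   # low model spread
risky_action = rng.normal(loc=1.2, scale=2.0, size=10_000)  # high model spread

for beta in (0.0, 1.0):
    u_safe = exponential_utility(safe_action, beta)
    u_risky = exponential_utility(risky_action, beta)
    choice = "safe" if u_safe > u_risky else "risky"
    print(f"beta={beta}: picks {choice} action")
```

The risk-neutral setting (`beta = 0`) prefers the higher-mean but model-uncertain action, while the risk-averse setting penalises the epistemic spread and switches to the safer one, which is the kind of behaviour contrast the abstract describes.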
Cover Tree Bayesian Reinforcement Learning
This paper proposes an online tree-based Bayesian approach for reinforcement
learning. For inference, we employ a generalised context tree model. This
defines a distribution on multivariate Gaussian piecewise-linear models, which
can be updated in closed form. The tree structure itself is constructed using
the cover tree method, which remains efficient in high-dimensional spaces. We
combine the model with Thompson sampling and approximate dynamic programming to
obtain effective exploration policies in unknown environments. The flexibility
and computational simplicity of the model render it suitable for many
reinforcement learning problems in continuous state spaces. We demonstrate this
in an experimental comparison with least squares policy iteration.
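Setting the tree model aside, the Thompson-sampling exploration loop the paper combines it with can be sketched in its simplest Bayesian setting, a Bernoulli bandit with Beta posteriors (the arm probabilities and horizon below are hypothetical; the paper's actual model is a generalised context tree over continuous states).

```python
import numpy as np

rng = np.random.default_rng(1)

# True arm success probabilities (hypothetical; unknown to the agent).
true_p = np.array([0.3, 0.5, 0.8])
n_arms = len(true_p)

# Beta(1, 1) priors: alpha counts successes, beta counts failures.
alpha = np.ones(n_arms)
beta = np.ones(n_arms)

for t in range(2000):
    # Thompson sampling: draw one model from the posterior and act
    # greedily with respect to that sample.
    theta = rng.beta(alpha, beta)
    arm = int(np.argmax(theta))
    reward = rng.random() < true_p[arm]
    alpha[arm] += reward
    beta[arm] += 1 - reward

best = int(np.argmax(alpha / (alpha + beta)))
print("estimated best arm:", best)
```

Sampling a full model and acting greedily with respect to it randomises exploration in proportion to posterior uncertainty; the paper applies the same principle with sampled piecewise-linear dynamics models and approximate dynamic programming instead of Beta-Bernoulli arms.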
Sample Efficient Bayesian Reinforcement Learning
Artificial Intelligence (AI) has been an active field of research for over half a century now. The research field of AI may be grouped into various tasks that are expected of an intelligent agent, two major ones being learning & inference and planning. The act of storing new knowledge is known as learning, while inference refers to the act of extracting conclusions from the agent's limited knowledge base. The two are tightly knit through the design of the knowledge base. The process of deciding long-term actions or plans given the agent's current knowledge is called planning.

Reinforcement Learning (RL) brings together these two tasks by posing a seemingly benign question: "How to act optimally in an unknown environment?" This requires the agent to learn about its environment as well as plan actions given its current knowledge about it. In RL, the environment can be represented by a mathematical model, and we associate an intrinsic value with the actions that the agent may choose.

In this thesis, we present a novel Bayesian algorithm for the problem of RL. Bayesian RL is a widely explored area of research but is constrained by scalability and performance issues. We provide first steps towards a rigorous analysis of these types of algorithms. Bayesian algorithms are characterised by the belief that they maintain over their unknowns, which is updated based on the collected evidence. This differs from the traditional approach in RL in terms of problem formulation and formal guarantees. Our novel algorithm combines aspects of planning and learning due to its inherent Bayesian formulation. It does so in a more scalable fashion, with formal PAC guarantees. We also give insights on the application of the Bayesian framework to the estimation of model and value, in joint work on Bayesian backward induction for RL.
Inferential Induction: A Novel Framework for Bayesian Reinforcement Learning
Bayesian reinforcement learning (BRL) offers a decision-theoretic solution
for reinforcement learning. While "model-based" BRL algorithms have focused
either on maintaining a posterior distribution on models or value functions and
combining this with approximate dynamic programming or tree search, previous
Bayesian "model-free" value function distribution approaches implicitly make
strong assumptions or approximations. We describe a novel Bayesian framework,
Inferential Induction, for correctly inferring value function distributions
from data, which leads to the development of a new class of BRL algorithms. We
design an algorithm, Bayesian Backwards Induction, with this framework. We
experimentally demonstrate that the proposed algorithm is competitive with
respect to the state of the art.

Comment: 28 pages, 12 figures.
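The object these methods reason about, a distribution over value functions induced by posterior uncertainty over the environment, can be sketched with a simple Monte Carlo baseline: sample transition models from a Dirichlet posterior and run finite-horizon backward induction under each. This is only the naive model-based approximation, not the paper's Inferential Induction framework or its Bayesian Backwards Induction algorithm; the state/action sizes, pseudo-counts, and known rewards are all assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

n_states, n_actions, horizon, n_samples = 3, 2, 10, 500
gamma = 0.95

# Dirichlet posterior over transitions: pseudo-counts standing in for
# (hypothetical) observed data plus a uniform prior.
counts = rng.integers(1, 10, size=(n_states, n_actions, n_states)).astype(float)
reward = rng.random((n_states, n_actions))  # rewards assumed known for simplicity

def backward_induction(P):
    """Finite-horizon optimal values under one sampled transition model."""
    V = np.zeros(n_states)
    for _ in range(horizon):
        Q = reward + gamma * P @ V          # shape (n_states, n_actions)
        V = Q.max(axis=1)
    return V

# Monte Carlo approximation of the value-function distribution:
# one backward induction per transition model drawn from the posterior.
V_samples = np.array([
    backward_induction(np.array([[rng.dirichlet(counts[s, a])
                                  for a in range(n_actions)]
                                 for s in range(n_states)]))
    for _ in range(n_samples)
])

print("posterior mean V:", V_samples.mean(axis=0))
print("posterior std  V:", V_samples.std(axis=0))
```

The spread of `V_samples` is exactly the quantity "model-free" value-distribution approaches try to infer without sampling whole models, which is where the strong assumptions criticised in the abstract enter.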
Applications of Probabilistic Inference to Planning & Reinforcement Learning
Optimal control is a profound and fascinating subject that regularly attracts interest from numerous scientific disciplines, including both pure and applied Mathematics, Computer Science, Artificial Intelligence, Psychology, Neuroscience and Economics. In 1960 Rudolf Kalman discovered that there exists a duality between the problems of filtering and optimal control in linear systems [84]. This is now regarded as a seminal piece of work, and it has since motivated a large amount of research into the discovery of similar dualities between optimal control and statistical inference. This is especially true of recent years, in which there has been much research into recasting problems of optimal control as problems of statistical/approximate inference. Broadly speaking, this is the perspective that we take in this work, and in particular we present various applications of methods from the fields of statistical/approximate inference to optimal control, planning and Reinforcement Learning. Some of the methods would be more accurately described as originating from other fields of research, such as the dual decomposition techniques used in chapter 5, which originate from convex optimisation. However, the original motivation for the application of these techniques came from the field of approximate inference. The study of dualities between optimal control and statistical inference has been a subject of research for over 50 years, and we do not claim to encompass the entire subject. Instead, we present what we consider to be a range of interesting and novel applications from this field of research.