Optimal control as a graphical model inference problem
We reformulate a class of non-linear stochastic optimal control problems
introduced by Todorov (2007) as a Kullback-Leibler (KL) minimization problem.
As a result, the optimal control computation reduces to an inference
computation, and approximate inference methods can be applied to efficiently
compute approximate optimal controls. We show how this KL control theory
contains the path integral control method as a special case. We provide
examples of a block stacking task and a multi-agent cooperative game where we
demonstrate how approximate inference can be successfully applied to instances
that are too complex for exact computation. We discuss the relation of the KL
control approach to other inference approaches to control.
Comment: 26 pages, 12 figures; Machine Learning Journal (2012)
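As a sketch of the underlying formulation (notation mine, condensed from the standard KL control setting rather than quoted from the paper): the controlled path distribution p over trajectories \(\tau\) is chosen to trade off expected cost against a KL penalty towards the uncontrolled dynamics q, and the minimiser is available in closed form,
\[
\mathcal{C}(p) = \mathbb{E}_{p}[V(\tau)] + \mathrm{KL}(p \,\|\, q),
\qquad
p^{*}(\tau) = \frac{q(\tau)\, e^{-V(\tau)}}{Z},
\qquad
Z = \mathbb{E}_{q}\!\left[e^{-V(\tau)}\right],
\]
with optimal cost \(\mathcal{C}(p^{*}) = -\log Z\). Computing the normaliser Z, an expectation under the passive dynamics, is exactly the inference problem to which approximate methods are applied.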
On probabilistic inference approaches to stochastic optimal control
While stochastic optimal control, together with associated formulations such as Reinforcement
Learning, provides a formal approach to, amongst others, motor control,
it remains computationally challenging for most practical problems. This thesis
is concerned with the study of relations between stochastic optimal control and
probabilistic inference. Such dualities, exemplified by the classical Kalman duality
between the Linear-Quadratic-Gaussian control problem and the filtering
problem in linear Gaussian dynamical systems, make it possible to exploit advances
made within the separate fields. In this context, the emphasis in this work
lies on the utilisation of approximate inference methods for the control problem.
Rather than concentrating on special cases which yield analytically tractable inference
problems, we propose a novel interpretation of stochastic optimal control in the
general case in terms of minimisation of certain Kullback-Leibler divergences. Although
these minimisations remain analytically intractable, we show that natural
relaxations of the exact dual lead to new practical approaches. We introduce two
general iterative methods: ψ-Learning, which has global convergence
guarantees and provides a unifying perspective on several previously proposed
algorithms, and Posterior Policy Iteration, which allows direct application of inference
methods. From these, practical algorithms for Reinforcement Learning,
based on a Monte Carlo approximation to ψ-Learning, and for model-based stochastic
optimal control, using a variational approximation of Posterior Policy Iteration,
are derived.
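To make the Monte Carlo flavour of such methods concrete, below is a minimal sketch in the generic KL/path-integral setting (the toy system, function names, and the simple importance-weighting scheme are illustrative assumptions, not the thesis's ψ-Learning update as stated): rollouts are sampled under the passive dynamics, each is weighted by its exponentiated negative cost, and the first action is estimated as the weighted mean.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_cost(x0, u0, horizon=20, noise=0.3):
    """Accumulated quadratic state cost of a 1-D toy system that takes
    initial action u0 and then follows passive (noise-only) dynamics."""
    x, u, cost = x0, u0, 0.0
    for _ in range(horizon):
        x = x + 0.1 * u + noise * rng.standard_normal()
        cost += x ** 2
        u = noise * rng.standard_normal()  # passive: no further control
    return cost

def kl_control_action(x0, n_samples=500, temperature=1.0):
    """Estimate the optimal first action at x0: sample actions from the
    passive distribution and average them under exp(-cost) weights."""
    u0 = rng.standard_normal(n_samples)
    costs = np.array([rollout_cost(x0, u) for u in u0])
    w = np.exp(-(costs - costs.min()) / temperature)  # stabilised weights
    return float(np.sum(w * u0) / np.sum(w))

print(kl_control_action(x0=1.0))  # negative: drives the state towards 0
```

Subtracting the minimum cost before exponentiation is the usual numerical stabilisation; it leaves the normalised weights unchanged.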
In order to overcome the inherent limitations of parametric variational approximations,
we furthermore introduce a new approach for non-parametric approximate
stochastic optimal control based on a reproducing kernel Hilbert space
embedding of the control problem.
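As a rough, generic illustration of the kernel-embedding idea (a standard conditional mean embedding in numpy; the thesis embeds the control problem itself, which this sketch does not reproduce): expectations under the next-state distribution can be estimated from transition samples alone, with no parametric model of the dynamics.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(a, b, gamma=2.0):
    """RBF kernel matrix between 1-D sample vectors a and b."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

# Transition samples from an unknown system: x' = 0.8 x + noise.
n = 200
x = rng.uniform(-2.0, 2.0, n)
x_next = 0.8 * x + 0.1 * rng.standard_normal(n)

K = rbf(x, x)
K_inv = np.linalg.solve(K + n * 1e-3 * np.eye(n), np.eye(n))  # ridge-regularised

def embedded_expectation(f, x_query):
    """Estimate E[f(X') | X = x_query] via the conditional mean embedding:
    a kernel-weighted average of f evaluated at the sampled next states."""
    weights = K_inv @ rbf(x, np.atleast_1d(float(x_query)))
    return float(f(x_next) @ weights[:, 0])

# Expected one-step quadratic cost from x = 1.0; roughly 0.8**2 + 0.1**2.
print(embedded_expectation(lambda z: z ** 2, 1.0))
```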
Finally, we address the general problem of temporal optimisation, i.e., the joint
optimisation of controls and of temporal aspects of the task, such as its duration. Specifically, we introduce a formulation of temporal optimisation based on a generalised
form of the finite horizon problem. Importantly, we show that the generalised
problem has a dual finite horizon problem of the standard form, thus bringing
temporal optimisation within the reach of most commonly used algorithms.
Throughout, problems from the area of motor control of robotic systems are
used to evaluate the proposed methods and demonstrate their practical utility.