Nonparametric Infinite Horizon Kullback-Leibler Stochastic Control
We present two nonparametric approaches to Kullback-Leibler (KL) control, or
the linearly-solvable Markov decision problem (LMDP), based on Gaussian processes
(GP) and the Nyström approximation. Compared to recently developed parametric
methods, the proposed data-driven frameworks feature accurate function
approximation and efficient on-line operations. Theoretically, we derive the
mathematical connection of KL control based on dynamic programming with earlier
work in control theory which relies on information theoretic dualities for the
infinite time horizon case. Algorithmically, we give explicit optimal control
policies in nonparametric forms, and propose on-line update schemes with
budgeted computational costs. Numerical results demonstrate the effectiveness
and usefulness of the proposed frameworks.
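
As a rough illustration of what "linearly solvable" means, the sketch below uses the standard discrete-state, average-cost LMDP formulation rather than the GP or Nyström methods of the paper. It assumes that the desirability vector z is the principal eigenvector of diag(exp(-q)) P and that the optimal policy reweights the passive transitions by z. The transition matrix `P`, the state costs `q`, and both function names are hypothetical values and names chosen for the example.

```python
import math

def lmdp_desirability(P, q, iters=200):
    """Power iteration for the principal eigenpair of diag(exp(-q)) @ P."""
    n = len(q)
    z = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        new = [math.exp(-q[i]) * sum(P[i][j] * z[j] for j in range(n))
               for i in range(n)]
        lam = max(new)               # eigenvalue estimate (z is scaled so max(z) == 1)
        z = [v / lam for v in new]   # renormalize to avoid underflow
    return z, lam

def lmdp_policy(P, z):
    """Controlled transitions u(x'|x) proportional to p(x'|x) * z(x')."""
    policy = []
    for row in P:
        unnorm = [p * zj for p, zj in zip(row, z)]
        total = sum(unnorm)
        policy.append([u / total for u in unnorm])
    return policy

# Hypothetical 3-state chain: passive dynamics P and state costs q (assumptions).
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
q = [0.0, 1.0, 2.0]

z, lam = lmdp_desirability(P, q)
policy = lmdp_policy(P, z)
```

In this toy chain the controlled policy shifts probability mass toward the cheap state 0 while leaving transitions that are impossible under the passive dynamics at zero.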
Differential Dynamic Programming for time-delayed systems
Trajectory optimization considers the problem of deciding how to control a
dynamical system to move along a trajectory which minimizes some cost function.
Differential Dynamic Programming (DDP) is an optimal control method which
utilizes a second-order approximation of the problem to find the control. It is
fast enough to allow real-time control and has been shown to work well for
trajectory optimization in robotic systems. Here we extend classic DDP to
systems with multiple time-delays in the state. Being able to find optimal
trajectories for time-delayed systems with DDP opens up the possibility to use
richer models for system identification and control, including recurrent neural
networks with multiple timesteps in the state. We demonstrate the algorithm on
a two-tank continuous stirred tank reactor. We also demonstrate the algorithm
on a recurrent neural network trained to model an inverted pendulum with
position information only.
Comment: 7 pages, 6 figures, conference, Decision and Control (CDC), 2016 IEEE
55th Conference o
Information-Theoretic Stochastic Optimal Control via Incremental Sampling-based Algorithms
This paper considers optimal control of dynamical systems which are
represented by nonlinear stochastic differential equations. It is well-known
that the optimal control policy for this problem can be obtained as a function
of a value function that satisfies a nonlinear partial differential equation,
namely, the Hamilton-Jacobi-Bellman equation. This nonlinear PDE must be solved
backwards in time, and this computation is intractable for large scale systems.
Under certain assumptions, and after applying a logarithmic transformation, an
alternative characterization of the optimal policy can be given in terms of a
path integral. Path Integral (PI) based control methods have recently been
shown to provide elegant solutions to a broad class of stochastic optimal
control problems. One of the implementation challenges with this formalism is
the computation of the expectation of a cost functional over the trajectories
of the unforced dynamics. Computing such an expectation over trajectories that are
sampled uniformly may induce numerical instabilities due to the exponentiation
of the cost. Therefore, sampling of low-cost trajectories is essential for the
practical implementation of PI-based methods. In this paper, we use incremental
sampling-based algorithms to sample useful trajectories from the unforced
system dynamics, and make a novel connection between Rapidly-exploring Random
Trees (RRTs) and information-theoretic stochastic optimal control. We show the
results from the numerical implementation of the proposed approach to several
examples.
Comment: 18 page
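
The numerical issue with exponentiated costs can be sketched in a few lines. This is a generic illustration and not the RRT-based sampler the abstract proposes. It assumes trajectory weights of the form exp(-S_i / lam), where the temperature `lam`, the cost values, and the function names are hypothetical. Subtracting the minimum cost before exponentiating leaves the normalized weights unchanged but prevents every term from underflowing to zero.

```python
import math

def path_integral_weights(costs, lam=1.0):
    """Normalized weights proportional to exp(-S_i / lam), computed stably."""
    # Shifting by the minimum cost keeps the largest term at exp(0) = 1,
    # so the normalizing sum is at least 1 and cannot underflow to 0.
    s_min = min(costs)
    unnorm = [math.exp(-(s - s_min) / lam) for s in costs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def effective_sample_size(weights):
    """1 / sum(w_i^2): near 1 when a single trajectory dominates."""
    return 1.0 / sum(w * w for w in weights)

# Hypothetical trajectory costs, large enough that a naive exp(-S) underflows.
costs = [1000.0, 1001.0, 1010.0]
naive = [math.exp(-s) for s in costs]   # every entry underflows to 0.0
weights = path_integral_weights(costs)
ess = effective_sample_size(weights)
```

A low effective sample size indicates that only a few sampled trajectories carry the estimate, which is one way to see why sampling low-cost trajectories matters in practice.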
Safe Learning of Quadrotor Dynamics Using Barrier Certificates
To effectively control complex dynamical systems, accurate nonlinear models
are typically needed. However, these models are not always known. In this
paper, we present a data-driven approach based on Gaussian processes that
learns models of quadrotors operating in partially unknown environments. What
makes this challenging is that if the learning process is not carefully
controlled, the system will become unstable, i.e., the quadcopter will crash. To
this end, barrier certificates are employed for safe learning. The barrier
certificates establish a non-conservative forward invariant safe region, in
which high probability safety guarantees are provided based on the statistics
of the Gaussian Process. A learning controller is designed to efficiently
explore those uncertain states and expand the barrier certified safe region
based on an adaptive sampling scheme. In addition, a recursive Gaussian Process
prediction method is developed to learn the complex quadrotor dynamics in
real-time. Simulation results are provided to demonstrate the effectiveness of
the proposed approach.
Comment: Submitted to ICRA 2018, 8 page