Search CORE

5,655 research outputs found

Nonparametric Infinite Horizon Kullback-Leibler Stochastic Control

Author: Pan Yunpeng
Theodorou Evangelos
Publication venue
Publication date: 15/06/2016
Field of study

We present two nonparametric approaches to Kullback-Leibler (KL) control, or linearly-solvable Markov decision problem (LMDP) based on Gaussian processes (GP) and Nystr\"{o}m approximation. Compared to recently developed parametric methods, the proposed data-driven frameworks feature accurate function approximation and efficient on-line operations. Theoretically, we derive the mathematical connection of KL control based on dynamic programming with earlier work in control theory which relies on information theoretic dualities for the infinite time horizon case. Algorithmically, we give explicit optimal control policies in nonparametric forms, and propose on-line update schemes with budgeted computational costs. Numerical results demonstrate the effectiveness and usefulness of the proposed frameworks

arXiv.org e-Print Archive

CiteSeerX

Differential Dynamic Programming for time-delayed systems

Author: Fan David D.
Theodorou Evangelos A.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 07/01/2017
Field of study

Trajectory optimization considers the problem of deciding how to control a dynamical system to move along a trajectory which minimizes some cost function. Differential Dynamic Programming (DDP) is an optimal control method which utilizes a second-order approximation of the problem to find the control. It is fast enough to allow real-time control and has been shown to work well for trajectory optimization in robotic systems. Here we extend classic DDP to systems with multiple time-delays in the state. Being able to find optimal trajectories for time-delayed systems with DDP opens up the possibility to use richer models for system identification and control, including recurrent neural networks with multiple timesteps in the state. We demonstrate the algorithm on a two-tank continuous stirred tank reactor. We also demonstrate the algorithm on a recurrent neural network trained to model an inverted pendulum with position information only.Comment: 7 pages, 6 figures, conference, Decision and Control (CDC), 2016 IEEE 55th Conference o

arXiv.org e-Print Archive

Crossref

An investigation of the problems in a channelled electron image intensifier

Author: Theodorou Dimitri
Theodorou Dimitri
Publication venue
Publication date: 01/01/1963
Field of study

Imperial Users onl

Spiral - Imperial College Digital Repository

Information-Theoretic Stochastic Optimal Control via Incremental Sampling-based Algorithms

Author: Arslan Oktay
Theodorou Evangelos
Tsiotras Panagiotis
Publication venue
Publication date: 28/05/2014
Field of study

This paper considers optimal control of dynamical systems which are represented by nonlinear stochastic differential equations. It is well-known that the optimal control policy for this problem can be obtained as a function of a value function that satisfies a nonlinear partial differential equation, namely, the Hamilton-Jacobi-Bellman equation. This nonlinear PDE must be solved backwards in time, and this computation is intractable for large scale systems. Under certain assumptions, and after applying a logarithmic transformation, an alternative characterization of the optimal policy can be given in terms of a path integral. Path Integral (PI) based control methods have recently been shown to provide elegant solutions to a broad class of stochastic optimal control problems. One of the implementation challenges with this formalism is the computation of the expectation of a cost functional over the trajectories of the unforced dynamics. Computing such expectation over trajectories that are sampled uniformly may induce numerical instabilities due to the exponentiation of the cost. Therefore, sampling of low-cost trajectories is essential for the practical implementation of PI-based methods. In this paper, we use incremental sampling-based algorithms to sample useful trajectories from the unforced system dynamics, and make a novel connection between Rapidly-exploring Random Trees (RRTs) and information-theoretic stochastic optimal control. We show the results from the numerical implementation of the proposed approach to several examples.Comment: 18 page

arXiv.org e-Print Archive

Scholarly Materials And Research @ Georgia Tech

Crossref