
    Direct Policy Optimization using Deterministic Sampling and Collocation

    We present an approach for approximately solving discrete-time stochastic optimal-control problems by combining direct trajectory optimization, deterministic sampling, and policy optimization. Our feedback motion-planning algorithm uses a quasi-Newton method to simultaneously optimize a reference trajectory, a set of deterministically chosen sample trajectories, and a parameterized policy. We demonstrate that this approach exactly recovers LQR policies in the case of linear dynamics, a quadratic objective, and Gaussian disturbances. We also demonstrate the algorithm on several nonlinear, underactuated robotic systems to highlight its performance and its ability to handle control limits, safely avoid obstacles, and generate robust plans in the presence of unmodeled dynamics. Comment: revisions for RA-L 202
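    As a point of reference for the exact-recovery claim above, here is a minimal sketch (not the authors' implementation) of the finite-horizon discrete-time LQR policy that the method is said to reproduce under linear dynamics, a quadratic objective, and Gaussian disturbances; the system matrices below are illustrative placeholders.

    ```python
    import numpy as np

    def finite_horizon_lqr(A, B, Q, R, Qf, N):
        """Backward Riccati recursion for x_{k+1} = A x_k + B u_k.

        Returns time-varying gains K_k such that u_k = -K_k x_k minimizes
        sum_k (x_k' Q x_k + u_k' R u_k) + x_N' Qf x_N. Additive Gaussian
        disturbances leave these gains unchanged (certainty equivalence),
        which is why an exact-recovery test against LQR is meaningful.
        """
        P = Qf
        gains = []
        for _ in range(N):
            K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
            P = Q + A.T @ P @ (A - B @ K)
            gains.append(K)
        return gains[::-1]  # K_0, ..., K_{N-1}

    # Illustrative double-integrator example (placeholder values).
    dt = 0.1
    A = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.array([[0.5 * dt**2], [dt]])
    Q = np.eye(2)
    R = np.array([[0.1]])
    K = finite_horizon_lqr(A, B, Q, R, Qf=10 * np.eye(2), N=50)
    print("K_0 =", K[0])
    ```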

    Optimal PMU Placement for Power System Dynamic State Estimation by Using Empirical Observability Gramian

    In this paper, the empirical observability Gramian calculated around the operating region of a power system is used to quantify the degree of observability of the system states under a specific phasor measurement unit (PMU) placement. An optimal PMU placement method for power system dynamic state estimation is then formulated as an optimization problem that maximizes the determinant of the empirical observability Gramian and is efficiently solved by the NOMAD solver, which implements the Mesh Adaptive Direct Search (MADS) algorithm. The implementation and validation of the proposed method, as well as its robustness to load fluctuations and contingencies, are carefully discussed. The proposed method is tested on the WSCC 3-machine 9-bus system and the NPCC 48-machine 140-bus system by performing dynamic state estimation with a square-root unscented Kalman filter. The simulation results show that the optimal PMU placements determined by the proposed method guarantee good observability of the system states, which in turn leads to smaller estimation errors and a larger number of convergent states for dynamic state estimation compared with random PMU placements. Under optimal PMU placements an obvious observability transition can be observed. The proposed method is also shown to be very robust to both load fluctuations and contingencies. Comment: Accepted by IEEE Transactions on Power Systems
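    To make the central quantity concrete, the following sketch computes an empirical observability Gramian for a toy discrete-time nonlinear system by perturbing each initial state and accumulating the resulting output deviations; the dynamics, measurement model, and perturbation size are illustrative assumptions, not the power-system models used in the paper.

    ```python
    import numpy as np

    def empirical_obs_gramian(f, h, x0, eps, steps):
        """Empirical observability Gramian around nominal initial state x0.

        For each state direction i, simulate with x0 +/- eps*e_i, record the
        output sequences, and form
            W[i, j] = 1/(4 eps^2) * sum_k dy_i(k)' dy_j(k),
        where dy_i(k) = y(k; x0 + eps e_i) - y(k; x0 - eps e_i).
        A PMU-placement objective would then maximize, e.g., det(W).
        """
        n = len(x0)

        def output_traj(x):
            ys = []
            for _ in range(steps):
                ys.append(h(x))
                x = f(x)
            return np.array(ys)

        dy = []
        for i in range(n):
            e = np.zeros(n); e[i] = eps
            dy.append(output_traj(x0 + e) - output_traj(x0 - e))

        W = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                W[i, j] = np.sum(dy[i] * dy[j]) / (4 * eps**2)
        return W

    # Toy example: a lightly damped pendulum with angle-only measurement.
    dt = 0.01
    f = lambda x: x + dt * np.array([x[1], -np.sin(x[0]) - 0.1 * x[1]])
    h = lambda x: np.array([x[0]])
    W = empirical_obs_gramian(f, h, x0=np.array([0.5, 0.0]), eps=1e-3, steps=500)
    print("det(W) =", np.linalg.det(W))
    ```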

    Analysis, Design, and Optimization of Robust Trajectories in Cislunar Environment for Limited-Capability Spacecraft

    Nowadays, space exploration is moving toward exploiting small platforms to obtain high scientific return at significantly lower cost. However, miniaturized spacecraft pose different challenges from both the technological and the mission analysis points of view. While the former is in constant evolution driven by manufacturers, the latter remains an open point, since it is still based on a traditional approach that cannot cope with the new platforms' peculiarities. In this work, a revised preliminary mission analysis approach, merging nominal trajectory optimization with a complete navigation assessment, is formulated in a general form, and the three main blocks composing it are identified. The integrated approach is then specialized for a cislunar test case, the transfer trajectory of the CubeSat LUMIO from a low lunar orbit to a halo orbit, and each block is modeled mathematically. Finally, optimal solutions minimizing the total cost are sought, showing the benefits of the integrated approach.

    Computational guidance using sparse Gauss-Hermite quadrature differential dynamic programming

    This paper proposes a new computational guidance algorithm based on differential dynamic programming and the sparse Gauss-Hermite quadrature rule. By applying the sparse Gauss-Hermite quadrature rule, numerical differentiation in the calculation of the Hessian matrices and gradients of differential dynamic programming is avoided. Based on the new differential dynamic programming approach, a three-dimensional computational guidance algorithm is proposed to control the impact angle and impact time of an air-to-surface interceptor. Extensive numerical simulations are performed to show the effectiveness of the proposed approach.
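    To make the quadrature idea concrete, here is a small sketch (an illustrative stand-in, not the paper's sparse multi-dimensional rule) of how a Gauss-Hermite rule evaluates a Gaussian expectation without numerical differentiation; the test function and noise statistics are assumptions for demonstration.

    ```python
    import numpy as np

    def gauss_hermite_expectation(f, mu, sigma, order=5):
        """Approximate E[f(X)] for X ~ N(mu, sigma^2) with a Gauss-Hermite rule.

        numpy's hermgauss targets the weight exp(-t^2), so the change of
        variables x = mu + sqrt(2)*sigma*t gives
            E[f(X)] ~= (1/sqrt(pi)) * sum_i w_i f(mu + sqrt(2)*sigma*t_i).
        Sparse-grid versions extend the same idea to higher dimensions with
        far fewer points than a full tensor grid.
        """
        t, w = np.polynomial.hermite.hermgauss(order)
        x = mu + np.sqrt(2.0) * sigma * t
        return np.sum(w * f(x)) / np.sqrt(np.pi)

    # Sanity check on a case with a known answer: E[X^2] = mu^2 + sigma^2.
    mu, sigma = 1.0, 0.5
    approx = gauss_hermite_expectation(lambda x: x**2, mu, sigma)
    print(approx, "vs exact", mu**2 + sigma**2)
    ```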

    Recent Advances in Path Integral Control for Trajectory Optimization: An Overview in Theoretical and Algorithmic Perspectives

    This paper presents a tutorial overview of path integral (PI) control approaches for stochastic optimal control and trajectory optimization. We concisely summarize the theoretical development of path integral control for computing solutions to stochastic optimal control problems, and we provide algorithmic descriptions of the cross-entropy (CE) method, an open-loop controller using the receding-horizon scheme known as model predictive path integral (MPPI) control, and a parameterized state-feedback controller based on path integral control theory. We discuss policy search methods based on path integral control, efficient and stable sampling strategies, extensions to multi-agent decision-making, and MPPI for trajectory optimization on manifolds. For tutorial demonstrations, several PI-based controllers are implemented in MATLAB and ROS2/Gazebo simulations for trajectory optimization. The simulation frameworks and source code are publicly available at https://github.com/INHA-Autonomous-Systems-Laboratory-ASL/An-Overview-on-Recent-Advances-in-Path-Integral-Control. Comment: 16 pages, 9 figures
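    For readers who want a concrete picture of MPPI before diving into the tutorial, the following is a minimal, generic sketch of one receding-horizon MPPI update; the dynamics, cost, and hyperparameters are illustrative assumptions, and the paper's own MATLAB/ROS2 implementations live at the linked repository.

    ```python
    import numpy as np

    def mppi_step(x0, u_nom, dynamics, cost, lam=1.0, sigma=0.5, K=256, rng=None):
        """One MPPI update of a nominal control sequence u_nom (shape T x m).

        Sample K perturbed control sequences, roll each out from x0, score
        the trajectories, and re-weight the perturbations with
        exp(-cost / lambda) (the path-integral / information-theoretic weights).
        """
        rng = np.random.default_rng() if rng is None else rng
        T, m = u_nom.shape
        noise = rng.normal(scale=sigma, size=(K, T, m))
        costs = np.zeros(K)
        for k in range(K):
            x = x0
            for t in range(T):
                u = u_nom[t] + noise[k, t]
                costs[k] += cost(x, u)
                x = dynamics(x, u)
        costs -= costs.min()                      # for numerical stability
        w = np.exp(-costs / lam)
        w /= w.sum()
        return u_nom + np.einsum("k,ktm->tm", w, noise)

    # Toy example: drive a 1D double integrator to the origin.
    dt = 0.1
    dyn = lambda x, u: x + dt * np.array([x[1], u[0]])
    cost = lambda x, u: x @ x + 0.01 * u @ u
    u = np.zeros((20, 1))
    for _ in range(50):                            # iterate from a fixed initial state
        u = mppi_step(np.array([2.0, 0.0]), u, dyn, cost)
    print("first control:", u[0])
    ```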

    Probabilistic models for data efficient reinforcement learning

    Trial-and-error-based reinforcement learning (RL) has seen rapid advances in recent times, especially with the advent of deep neural networks. However, standard deep learning methods often overlook the progress made in control theory by treating systems as black boxes. We propose a model-based RL framework based on probabilistic model predictive control (MPC). In particular, we propose to learn a probabilistic transition model using Gaussian processes (GPs) to incorporate model uncertainty into long-term predictions, thereby reducing the impact of model errors. We provide theoretical guarantees for first-order optimality in the GP-based transition models with deterministic approximate inference for long-term planning. We demonstrate that our approach not only achieves state-of-the-art data efficiency but is also a principled way to do RL in constrained environments.

    When the true state of the dynamical system cannot be fully observed, standard model-based methods cannot be applied directly, and an additional state estimation step is needed. We propose distributed message passing for state estimation in nonlinear dynamical systems. In particular, we propose to use expectation propagation (EP) to iteratively refine the state estimate, i.e., the Gaussian posterior distribution over the latent state. We show two things: (a) classical Rauch-Tung-Striebel (RTS) smoothers, such as the extended Kalman smoother (EKS) or the unscented Kalman smoother (UKS), are special cases of our message passing scheme; and (b) running the message passing scheme more than once can lead to significant improvements over the classical RTS smoothers. We make the connection between message passing with EP and well-known RTS smoothers explicit and provide a practical implementation of the suggested algorithm. Furthermore, we address convergence issues of EP by generalising this framework to damped updates and general α-divergences.

    Probabilistic models can also be used to generate synthetic data. In model-based RL we use 'synthetic' data as a proxy for real environments in order to achieve high data efficiency. The ability to generate high-fidelity synthetic data is crucial when available (real) data is limited, as in RL, or where privacy and data protection standards allow only limited use of the given data, e.g., in medical and financial data-sets. Current state-of-the-art methods for synthetic data generation are based on generative models such as generative adversarial networks (GANs). Even though GANs have achieved remarkable results in synthetic data generation, they are often challenging to interpret, and GAN-based methods can suffer when used with mixed real-valued and categorical variables. Moreover, the loss function (discriminator loss) design is itself problem specific, i.e., the generative model may not be useful for tasks it was not explicitly trained for. We propose instead to use a probabilistic model as a synthetic data generator. Learning the probabilistic model of the data is equivalent to estimating its density. Based on copula theory, we divide the density estimation task into two parts: estimating the univariate marginals and estimating the multivariate copula density over the univariate marginals. We use normalising flows to learn both the copula density and the univariate marginals. We benchmark our method on both simulated and real data-sets in terms of density estimation as well as the ability to generate high-fidelity synthetic data.
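    As a simplified illustration of the copula-based generator described in the last part of the abstract, the sketch below uses a Gaussian copula with empirical marginals in place of the normalising-flow copula and flow marginals used in the work itself; the data and dimensions are placeholders.

    ```python
    import numpy as np
    from scipy import stats

    def fit_gaussian_copula(data):
        """Fit empirical marginals + a Gaussian copula to data (shape N x d)."""
        # Probability-integral transform each column via its empirical CDF,
        # map to standard-normal scores, and estimate their correlation.
        n, d = data.shape
        u = (np.argsort(np.argsort(data, axis=0), axis=0) + 0.5) / n
        z = stats.norm.ppf(u)
        corr = np.corrcoef(z, rowvar=False)
        return {"data": np.sort(data, axis=0), "corr": corr}

    def sample_gaussian_copula(model, n_samples, rng=None):
        """Draw synthetic rows: sample correlated normals, push through the
        normal CDF, then invert each empirical marginal by quantile lookup."""
        rng = np.random.default_rng() if rng is None else rng
        d = model["corr"].shape[0]
        z = rng.multivariate_normal(np.zeros(d), model["corr"], size=n_samples)
        u = stats.norm.cdf(z)
        synth = np.empty_like(u)
        for j in range(d):
            synth[:, j] = np.quantile(model["data"][:, j], u[:, j])
        return synth

    # Placeholder data: two correlated columns with different marginals.
    rng = np.random.default_rng(0)
    x1 = rng.gamma(2.0, size=2000)
    x2 = 0.5 * x1 + rng.normal(size=2000)
    model = fit_gaussian_copula(np.column_stack([x1, x2]))
    print(sample_gaussian_copula(model, 5, rng))
    ```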

    FORESEE: Prediction with Expansion-Compression Unscented Transform for Online Policy Optimization

    Propagating state distributions through a generic, uncertain nonlinear dynamical model is known to be intractable and usually requires numerical or analytical approximations. We introduce a method for state prediction, called the Expansion-Compression Unscented Transform, and use it to solve a class of online policy optimization problems. Our proposed algorithm propagates a finite number of sigma points through a state-dependent distribution, which dictates an increase in the number of sigma points at each time step to represent the resulting distribution; this is what we call the expansion operation. To keep the algorithm scalable, we augment the expansion operation with a compression operation based on moment matching, thereby keeping the number of sigma points constant across predictions over multiple time steps. Its performance is empirically shown to be comparable to Monte Carlo sampling at a much lower computational cost. Under state and control input constraints, the state prediction is used in tandem with a proposed variant of constrained gradient descent for online updates of the policy parameters in a receding-horizon fashion. The framework is implemented as a differentiable computational graph for policy training. We showcase our framework on a quadrotor stabilization task as part of a benchmark comparison in safe-control-gym and for optimizing the parameters of a Control Barrier Function-based controller in a leader-follower problem.
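    To give a flavour of the expansion-compression idea, here is a hedged sketch (not the authors' code) that propagates unscented-transform sigma points through dynamics with state-dependent noise: each propagated point is expanded into its own sigma-point set for the noise, and the whole cloud is then compressed back to 2n+1 points by moment matching. The dynamics and noise model are illustrative assumptions.

    ```python
    import numpy as np

    def sigma_points(mu, cov, kappa=1.0):
        """Standard 2n+1 unscented sigma points and weights."""
        n = len(mu)
        L = np.linalg.cholesky((n + kappa) * cov)
        pts = [mu] + [mu + L[:, i] for i in range(n)] + [mu - L[:, i] for i in range(n)]
        w = np.full(2 * n + 1, 0.5 / (n + kappa))
        w[0] = kappa / (n + kappa)
        return np.array(pts), w

    def expand_compress_step(mu, cov, f, noise_cov):
        """One prediction step: expand each sigma point with sigma points of
        the (state-dependent) noise, then compress by moment matching."""
        X, wx = sigma_points(mu, cov)
        pts, wts = [], []
        for x, wi in zip(X, wx):
            Qx = noise_cov(x)                       # state-dependent noise
            W, ww = sigma_points(np.zeros(len(mu)), Qx)
            for wpt, wj in zip(W, ww):
                pts.append(f(x) + wpt)              # expansion
                wts.append(wi * wj)
        pts, wts = np.array(pts), np.array(wts)
        new_mu = wts @ pts                          # compression: moment matching
        diff = pts - new_mu
        new_cov = (wts[:, None] * diff).T @ diff
        return new_mu, new_cov

    # Toy example: 2D pendulum-like system with noise that grows with speed.
    dt = 0.1
    f = lambda x: x + dt * np.array([x[1], -np.sin(x[0])])
    noise_cov = lambda x: np.diag([1e-4, 1e-3 * (1.0 + x[1] ** 2)])
    mu, cov = np.array([0.3, 0.0]), 1e-2 * np.eye(2)
    for _ in range(20):
        mu, cov = expand_compress_step(mu, cov, f, noise_cov)
    print("mean:", mu, "\ncov:\n", cov)
    ```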