Direct Policy Optimization using Deterministic Sampling and Collocation
We present an approach for approximately solving discrete-time stochastic
optimal-control problems by combining direct trajectory optimization,
deterministic sampling, and policy optimization. Our feedback motion-planning
algorithm uses a quasi-Newton method to simultaneously optimize a reference
trajectory, a set of deterministically chosen sample trajectories, and a
parameterized policy. We demonstrate that this approach exactly recovers LQR
policies in the case of linear dynamics, quadratic objective, and Gaussian
disturbances. We also demonstrate the algorithm on several nonlinear,
underactuated robotic systems to highlight its performance and ability to
handle control limits, safely avoid obstacles, and generate robust plans in the
presence of unmodeled dynamics.
Comment: revisions for RA-L 202
Optimal PMU Placement for Power System Dynamic State Estimation by Using Empirical Observability Gramian
In this paper the empirical observability Gramian calculated around the
operating region of a power system is used to quantify the degree of
observability of the system states under specific phasor measurement unit (PMU)
placement. An optimal PMU placement method for power system dynamic state
estimation is further formulated as an optimization problem which maximizes the
determinant of the empirical observability Gramian and is efficiently solved by
the NOMAD solver, which implements the Mesh Adaptive Direct Search (MADS)
algorithm. The implementation, validation, and also the robustness to load
fluctuations and contingencies of the proposed method are carefully discussed.
The proposed method is tested on the WSCC 3-machine 9-bus system and the NPCC
48-machine 140-bus system by performing dynamic state estimation with a
square-root unscented Kalman filter. The simulation results show that the
optimal PMU placements determined by the proposed method can guarantee good
observability of the system states, which in turn leads to smaller estimation
errors and a larger number of convergent states for dynamic state estimation
compared with random PMU placements. Under optimal PMU placements an obvious
observability transition can be observed. The proposed method is also validated
to be very robust to both load fluctuations and contingencies.
Comment: Accepted by IEEE Transactions on Power Systems
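The determinant-maximization idea in this abstract can be illustrated with a minimal sketch. It rests on loud assumptions: a small linear discrete-time system stands in for the power-system model, the ordinary finite-horizon observability Gramian stands in for the empirical one, and brute-force enumeration replaces the NOMAD/MADS solver. The names `observability_gramian` and `best_placement` are hypothetical.

```python
import itertools
import numpy as np

def observability_gramian(A, C, horizon):
    """Finite-horizon Gramian: sum over k of (C A^k)^T (C A^k)."""
    n = A.shape[0]
    W = np.zeros((n, n))
    Ak = np.eye(n)
    for _ in range(horizon):
        CAk = C @ Ak
        W += CAk.T @ CAk
        Ak = A @ Ak
    return W

def best_placement(A, n_sensors, horizon=20):
    """Pick the measured states that maximize log det of the Gramian.

    Assumption: each sensor measures exactly one state directly, and the
    candidate set is small enough to enumerate exhaustively.
    """
    n = A.shape[0]
    best_subset, best_score = None, -np.inf
    for subset in itertools.combinations(range(n), n_sensors):
        C = np.eye(n)[list(subset)]
        sign, logdet = np.linalg.slogdet(observability_gramian(A, C, horizon))
        score = logdet if sign > 0 else -np.inf  # singular => unobservable
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score

# Toy chain: state 0 is driven by state 1, which is driven by state 2.
A = np.array([[0.9, 0.5, 0.0],
              [0.0, 0.9, 0.5],
              [0.0, 0.0, 0.9]])
subset, score = best_placement(A, n_sensors=1)
```

In this toy chain only a sensor on state 0 sees the effect of every other state, so it is the only single-sensor placement with a nonsingular Gramian.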
Analysis, Design, and Optimization of Robust Trajectories in Cislunar Environment for Limited-Capability Spacecraft
Space exploration is increasingly turning to small platforms to obtain high scientific return at significantly lower cost. However, miniaturized spacecraft pose challenges from both the technological and the mission-analysis points of view. While the technology is constantly evolving thanks to manufacturers, mission analysis remains an open issue, since it still relies on a traditional approach that cannot cope with the peculiarities of the new platforms. In this work, a revised preliminary mission analysis approach, merging nominal trajectory optimization with a complete navigation assessment, is formulated in general form, and the three main blocks composing it are identified. The integrated approach is then specialized for a cislunar test-case scenario, the transfer of the CubeSat LUMIO from a low lunar orbit to a halo orbit, and each block is modeled mathematically. Finally, optimal solutions minimizing the total cost are sought, showing the benefits of an integrated approach.
Computational guidance using sparse Gauss-Hermite quadrature differential dynamic programming
This paper proposes a new computational guidance algorithm using differential dynamic programming and a sparse Gauss-Hermite quadrature rule. Applying the sparse Gauss-Hermite quadrature rule avoids numerical differentiation when calculating the Hessian matrices and gradients in differential dynamic programming. Based on this new differential dynamic programming approach, a three-dimensional computational algorithm is proposed to control the impact angle and impact time of an air-to-surface interceptor. Extensive numerical simulations are performed to show the effectiveness of the proposed approach.
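As background for the quadrature rule named in this abstract, the following minimal sketch shows how a Gauss-Hermite rule approximates a Gaussian expectation. It is an assumption-laden illustration: it uses the full one-dimensional rule from NumPy, not the sparse multidimensional rule of the paper, and the function name `gauss_hermite_expectation` is hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss

def gauss_hermite_expectation(f, mean, std, order=10):
    """Approximate E[f(X)] for X ~ N(mean, std^2).

    hermgauss returns nodes and weights for the weight function exp(-t^2),
    so the change of variables x = mean + sqrt(2) * std * t is applied and
    the sum is normalized by sqrt(pi).
    """
    nodes, weights = hermgauss(order)
    x = mean + np.sqrt(2.0) * std * nodes
    return float(np.sum(weights * f(x)) / np.sqrt(np.pi))

# Second moment of N(1, 2^2): mean^2 + std^2 = 5.
second_moment = gauss_hermite_expectation(lambda x: x**2, mean=1.0, std=2.0)
# E[exp(X)] for a standard normal equals exp(1/2).
exp_moment = gauss_hermite_expectation(np.exp, mean=0.0, std=1.0)
```

An n-point rule is exact for polynomials up to degree 2n - 1, which is why the second moment is recovered to rounding error.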
Recent Advances in Path Integral Control for Trajectory Optimization: An Overview in Theoretical and Algorithmic Perspectives
This paper presents a tutorial overview of path integral (PI) control
approaches for stochastic optimal control and trajectory optimization. We
concisely summarize the theoretical development of path integral control to
compute a solution for stochastic optimal control and provide algorithmic
descriptions of the cross-entropy (CE) method, an open-loop controller using
the receding horizon scheme known as the model predictive path integral (MPPI),
and a parameterized state feedback controller based on the path integral
control theory. We discuss policy search methods based on path integral
control, efficient and stable sampling strategies, extensions to multi-agent
decision-making, and MPPI for trajectory optimization on manifolds. For
tutorial demonstrations, some PI-based controllers are implemented in MATLAB
and ROS2/Gazebo simulations for trajectory optimization. The simulation
frameworks and source codes are publicly available at
https://github.com/INHA-Autonomous-Systems-Laboratory-ASL/An-Overview-on-Recent-Advances-in-Path-Integral-Control.
Comment: 16 pages, 9 figures
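The MPPI controller mentioned in this abstract can be sketched in a few lines. This is a simplified illustration and not the code from the linked repository: it assumes a scalar control, omits the control-cost correction terms found in full MPPI derivations, and uses hypothetical names (`mppi_step`, `dynamics`, `cost`).

```python
import numpy as np

def mppi_step(x0, u_nominal, dynamics, cost,
              n_samples=256, sigma=0.5, lam=1.0, rng=None):
    """One simplified MPPI update of a nominal control sequence.

    Samples Gaussian perturbations of the controls, rolls each one out,
    and averages the perturbations with weights exp(-cost / lam).
    """
    rng = np.random.default_rng() if rng is None else rng
    horizon = len(u_nominal)
    noise = rng.normal(0.0, sigma, size=(n_samples, horizon))
    costs = np.zeros(n_samples)
    for k in range(n_samples):
        x = x0
        for t in range(horizon):
            u = u_nominal[t] + noise[k, t]
            x = dynamics(x, u)
            costs[k] += cost(x, u)
    # Subtract the minimum cost for numerical stability before exponentiating.
    weights = np.exp(-(costs - costs.min()) / lam)
    weights /= weights.sum()
    return u_nominal + weights @ noise

# Toy problem: single-step integrator that should reach x = 1.
u_new = mppi_step(
    x0=0.0,
    u_nominal=np.zeros(1),
    dynamics=lambda x, u: x + u,
    cost=lambda x, u: (x - 1.0) ** 2,
    n_samples=1000, sigma=1.0, lam=1.0,
    rng=np.random.default_rng(0),
)
```

In a receding-horizon loop the first element of the returned sequence would be applied, the sequence shifted by one step, and the update repeated.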
Probabilistic models for data efficient reinforcement learning
Trial-and-error based reinforcement learning (RL) has seen rapid advancements
in recent times, especially with the advent of deep neural networks. However,
standard deep learning methods often overlook the progress made in control theory
by treating systems as black boxes. We propose a model-based RL framework based
on probabilistic Model Predictive Control (MPC). In particular, we propose to learn
a probabilistic transition model using Gaussian Processes (GPs) to incorporate model
uncertainty into long-term predictions, thereby reducing the impact of model errors. We
provide theoretical guarantees for first-order optimality in the GP-based transition models
with deterministic approximate inference for long-term planning. We demonstrate that
our approach not only achieves state-of-the-art data efficiency but is also a principled
approach to RL in constrained environments.
When the true state of the dynamical system cannot be fully observed, the standard
model-based methods cannot be directly applied. For these systems an additional step of
state estimation is needed. We propose distributed message passing for state estimation in
non-linear dynamical systems. In particular, we propose to use expectation propagation
(EP) to iteratively refine the state estimate, i.e., the Gaussian posterior distribution on the
latent state. We show two things: (a) Classical Rauch-Tung-Striebel (RTS) smoothers,
such as the extended Kalman smoother (EKS) or the unscented Kalman smoother (UKS),
are special cases of our message passing scheme; (b) running the message passing
scheme more than once can lead to significant improvements over the classical RTS
smoothers. We show the explicit connection between message passing with EP and
well-known RTS smoothers and provide a practical implementation of the suggested
algorithm. Furthermore, we address convergence issues of EP by generalising this
framework to damped updates and the consideration of general α-divergences.
Probabilistic models can also be used to generate synthetic data. In model-based RL
we use 'synthetic' data as a proxy for real environments in order to achieve high data
efficiency. The ability to generate high-fidelity synthetic data is crucial when available
(real) data is limited as in RL or where privacy and data protection standards allow
only for limited use of the given data, e.g., in medical and financial data-sets. Current
state-of-the-art methods for synthetic data generation are based on generative models,
such as Generative Adversarial Networks (GANs). Even though GANs have achieved
remarkable results in synthetic data generation, they are often challenging to interpret.
Furthermore, GAN-based methods can suffer when used with mixed real and categorical
variables. Moreover, the loss function (discriminator loss) design itself is problem
specific, i.e., the generative model may not be useful for tasks it was not explicitly trained
for. In this paper, we propose to use a probabilistic model as a synthetic data generator.
Learning the probabilistic model for the data is equivalent to estimating the density of
the data. Based on the copula theory, we divide the density estimation task into two parts,
i.e., estimating univariate marginals and estimating the multivariate copula density over
the univariate marginals. We use normalising flows to learn both the copula density and
univariate marginals. We benchmark our method on both simulated and real data-sets in
terms of density estimation as well as the ability to generate high-fidelity synthetic data.
Open Access
FORESEE: Prediction with Expansion-Compression Unscented Transform for Online Policy Optimization
Propagating state distributions through a generic, uncertain nonlinear
dynamical model is known to be intractable and usually begets numerical or
analytical approximations. We introduce a method for state prediction, called
the Expansion-Compression Unscented Transform, and use it to solve a class of
online policy optimization problems. Our proposed algorithm propagates a finite
number of sigma points through a state-dependent distribution, which dictates
an increase in the number of sigma points at each time step to represent the
resulting distribution; this is what we call the expansion operation. To keep
the algorithm scalable, we augment the expansion operation with a compression
operation based on moment matching, thereby keeping the number of sigma points
constant across predictions over multiple time steps. Its performance is
empirically shown to be comparable to Monte Carlo but at a much lower
computational cost. Under state and control input constraints, the state
prediction is subsequently used in tandem with a proposed variant of
constrained gradient-descent for online update of policy parameters in a
receding horizon fashion. The framework is implemented as a differentiable
computational graph for policy training. We showcase our framework for a
quadrotor stabilization task as part of a benchmark comparison in
safe-control-gym and for optimizing the parameters of a Control Barrier
Function based controller in a leader-follower problem.
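For readers unfamiliar with sigma-point prediction, the following is a minimal sketch of the standard unscented transform with moment matching. It is not the Expansion-Compression variant introduced in the paper; the names `sigma_points` and `unscented_transform` and the scaling parameter `kappa` are assumptions made for illustration.

```python
import numpy as np

def sigma_points(mean, cov, kappa=1.0):
    """Return 2n + 1 symmetric sigma points and their weights."""
    n = mean.size
    # Columns of L are the scaled square-root directions of the covariance.
    L = np.linalg.cholesky((n + kappa) * cov)
    pts = np.vstack([mean, mean + L.T, mean - L.T])
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)
    return pts, w

def unscented_transform(f, mean, cov, kappa=1.0):
    """Propagate sigma points through f and match the first two moments.

    f must map an array of shape (m, n) to an array of shape (m, p).
    """
    pts, w = sigma_points(mean, cov, kappa)
    y = f(pts)
    y_mean = w @ y
    d = y - y_mean
    y_cov = (w[:, None] * d).T @ d
    return y_mean, y_cov

# For a linear map the transform reproduces the exact mean and covariance.
A = np.array([[1.0, 0.5], [0.0, 2.0]])
b = np.array([0.3, -0.1])
m0 = np.array([1.0, -2.0])
P0 = np.array([[0.5, 0.1], [0.1, 0.3]])
m1, P1 = unscented_transform(lambda X: X @ A.T + b, m0, P0)
```

The number of sigma points stays at 2n + 1 here because the map is deterministic; the abstract's expansion step arises when the dynamics themselves are uncertain and state dependent.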