Sparse Wide-Area Control of Power Systems using Data-driven Reinforcement Learning

Author: Chakrabortty Aranya
Dizche Amirhassan Fallah
Duel-Hallen Alexandra
Publication date: 28/09/2018

In this paper we present an online wide-area oscillation damping control (WAC) design for uncertain models of power systems using ideas from reinforcement learning. We assume that the exact small-signal model of the power system at the onset of a contingency is not known to the operator, and we use the nominal model together with online measurements of the generator states and control inputs to rapidly converge to a state-feedback controller that minimizes a given quadratic energy cost. However, unlike conventional linear quadratic regulators (LQR), we intend our controller to be sparse, so that its implementation reduces communication costs. We therefore employ the gradient support pursuit (GraSP) optimization algorithm to impose sparsity constraints on the control gain matrix during learning. The sparse controller is thereafter implemented using distributed communication. Using the IEEE 39-bus power system model with 1149 unknown parameters, it is demonstrated that the proposed learning method provides reliable LQR performance, while the controller matched to the nominal model becomes unstable for severely uncertain systems.

Comment: Submitted to IEEE ACC 2019. 8 pages, 4 figures
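The sparsity constraint on the gain matrix described in the abstract above can be illustrated with a small sketch. It shows only a hard-thresholding step (keep the s largest-magnitude entries of a gain matrix and zero the rest), which is the kind of pruning that greedy sparsity-constrained methods such as GraSP use; it is not the paper's learning algorithm. The names `keep_top_s` and `nonzero_count` and the example matrix are hypothetical, and the thresholded gain is not guaranteed to be stabilizing.

```python
def keep_top_s(K, s):
    """Zero all but the s largest-magnitude entries of K (a list of rows).

    Hypothetical helper: a plain projection onto s-sparse matrices,
    not the full GraSP iteration.
    """
    # Collect (magnitude, row, column) for every entry of the matrix.
    entries = [(abs(v), i, j) for i, row in enumerate(K) for j, v in enumerate(row)]
    # Sort by decreasing magnitude; the sort is stable, so ties keep row-major order.
    entries.sort(key=lambda t: -t[0])
    keep = {(i, j) for _, i, j in entries[:s]}
    return [
        [v if (i, j) in keep else 0.0 for j, v in enumerate(row)]
        for i, row in enumerate(K)
    ]


def nonzero_count(K):
    """Count nonzero gain entries, a rough proxy for required feedback links."""
    return sum(1 for row in K for v in row if v != 0.0)


# Made-up 2x3 gain matrix (2 inputs, 3 states), for illustration only.
K_dense = [[1.0, -0.1, 0.0],
           [0.2, 3.0, -0.05]]
K_sparse = keep_top_s(K_dense, 3)
```

Here `nonzero_count(K_sparse)` is at most 3, so fewer state measurements have to be communicated to each input than with the dense gain.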

arXiv.org e-Print Archive

Crossref

Optimal control of nonlinear partially-unknown systems with unsymmetrical input constraints and its applications to the optimal UAV circumnavigation problem

Author: Shen Lincheng
Sun Zhiyong
Wang Xiangke
Yu Yangguang
Publication date: 27/05/2020

To solve the optimal control problem for nonlinear systems with unsymmetrical input constraints, we present an online adaptive approach for systems with partially unknown dynamics. The designed algorithm converges online to the optimal control solution without knowledge of the internal system dynamics. The optimality of the obtained control policy and the stability of the closed-loop dynamics are proved theoretically. The proposed method greatly relaxes the assumptions on the form of the internal dynamics and input constraints made in previous works. In addition, the control design framework proposed in this paper offers a new approach to solving the optimal circumnavigation problem involving a moving target for a fixed-wing unmanned aerial vehicle (UAV). The control performance of our method is compared with that of an existing circumnavigation control law in a numerical simulation, and the simulation results validate the effectiveness of our algorithm.

arXiv.org e-Print Archive

Path integral policy improvement with differential dynamic programming

Author: Crevecoeur Guillaume
Lefebvre Tom
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2019

Path Integral Policy Improvement with Covariance Matrix Adaptation (PI2-CMA) is a step-based, model-free reinforcement learning approach that combines statistical estimation techniques with fundamental results from Stochastic Optimal Control. In essence, a policy distribution is improved iteratively using reward-weighted averaging of the corresponding rollouts. It was assumed that PI2-CMA somehow exploited gradient information contained in the reward-weighted statistics. To our knowledge, we are the first to expose the principle of this gradient extraction rigorously. Our findings reveal that PI2-CMA essentially obtains gradient information similar to the forward and backward passes in the Differential Dynamic Programming (DDP) method. It is then straightforward to extend the analogy with DDP by introducing a feedback term in the policy update. This suggests a novel algorithm, which we coin Path Integral Policy Improvement with Differential Dynamic Programming (PI2-DDP). The resulting algorithm is similar to the previously proposed Sampled Differential Dynamic Programming (SaDDP), but we derive the method independently as a generalization of the framework of PI2-CMA. Our derivations suggest implementing some small variations to SaDDP in order to increase performance. We validated our claims on a robot trajectory learning task.
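The reward-weighted averaging mentioned in the abstract above can be sketched as follows. This is a minimal illustration of the averaging step only, with a fixed exploration noise, no covariance adaptation and no DDP feedback term, so it is neither PI2-CMA nor PI2-DDP as published. The exponential weighting with min-max cost normalization, the temperature `h`, the quadratic `toy_cost` and all function names are assumptions made for the example.

```python
import math
import random


def reward_weighted_mean(samples, costs, h=10.0):
    """Average parameter samples with weights that favor low-cost rollouts.

    Assumed weighting: exp(-h * normalized_cost), a form commonly used in
    path-integral style updates; h is a hypothetical temperature setting.
    """
    lo, hi = min(costs), max(costs)
    span = hi - lo
    if span == 0.0:
        # All rollouts cost the same, so fall back to a plain average.
        weights = [1.0] * len(costs)
    else:
        weights = [math.exp(-h * (c - lo) / span) for c in costs]
    total = sum(weights)
    probs = [w / total for w in weights]
    dim = len(samples[0])
    return [sum(p * s[i] for p, s in zip(probs, samples)) for i in range(dim)]


def toy_cost(theta):
    """Hypothetical rollout cost with its minimum at theta = (1, -2)."""
    return (theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2


def improve(mean, sigma=0.5, n_rollouts=50, n_iters=30, seed=0):
    """Iterate: sample perturbed parameters, evaluate, re-average the mean."""
    rng = random.Random(seed)
    for _ in range(n_iters):
        samples = [[m + rng.gauss(0.0, sigma) for m in mean]
                   for _ in range(n_rollouts)]
        costs = [toy_cost(s) for s in samples]
        mean = reward_weighted_mean(samples, costs, h=10.0)
    return mean


final_mean = improve([0.0, 0.0])
```

No gradient of `toy_cost` is ever computed; the mean moves toward lower cost only through the weighted statistics of the sampled rollouts, which is the mechanism whose implicit gradient content the abstract discusses.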

Crossref

Ghent University Academic Bibliography