
    Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies

    Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been a renewed interest in studying the theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments on policy optimization, a gradient-based iterative approach for feedback control synthesis popularized by the successes of reinforcement learning. We take an interdisciplinary perspective in our exposition that connects control theory, reinforcement learning, and large-scale optimization. We review a number of recently developed theoretical results on the optimization landscape, global convergence, and sample complexity of gradient-based methods for various continuous control problems such as the linear quadratic regulator (LQR), H∞ control, risk-sensitive control, linear quadratic Gaussian (LQG) control, and output feedback synthesis. In conjunction with these optimization results, we also discuss how direct policy optimization handles stability and robustness concerns in learning-based control, two main desiderata in control engineering. We conclude the survey by pointing out several challenges and opportunities at the intersection of learning and control.
    Comment: To appear in Annual Review of Control, Robotics, and Autonomous Systems
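    The LQR case surveyed above can be illustrated concretely. The sketch below (a hedged illustration: the two-state system, initial gain, and iteration count are invented here, not taken from the survey) runs a Gauss-Newton-style policy update on a discrete-time LQR cost and compares the resulting gain with the Riccati solution:

    ```python
    import numpy as np
    from scipy.linalg import solve_discrete_are, solve_discrete_lyapunov

    # Illustrative discrete-time system (all values invented for this sketch).
    A = np.array([[1.0, 0.1],
                  [0.0, 1.0]])
    B = np.array([[0.0],
                  [0.1]])
    Q = np.eye(2)
    R = np.array([[1.0]])

    K = np.array([[1.0, 1.0]])  # any stabilizing initial gain works here
    for _ in range(50):
        Acl = A - B @ K
        # Value matrix of the current policy: P = Q + K'RK + Acl' P Acl.
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
        # Policy-gradient factor E_K; the exact gradient is 2 E_K Sigma_K.
        E = (R + B.T @ P @ B) @ K - B.T @ P @ A
        # Gauss-Newton step with step size 1/2.
        K = K - np.linalg.solve(R + B.T @ P @ B, E)

    # The iterates approach the Riccati-optimal gain.
    P_star = solve_discrete_are(A, B, Q, R)
    K_star = np.linalg.solve(R + B.T @ P_star @ B, B.T @ P_star @ A)
    ```

    With step size 1/2 this Gauss-Newton update coincides with classical policy iteration, which is one intuition for the fast global convergence from any stabilizing initial gain discussed in this literature.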

    Gradient Methods for Large-Scale and Distributed Linear Quadratic Control

    This thesis considers methods for the synthesis of linear quadratic controllers for large-scale, interconnected systems. Conventional methods that solve the linear quadratic control problem are only applicable to systems of moderate size, due to the rapid growth in both computational time and memory requirements as the system size increases. The methods presented in this thesis show a much slower growth in these requirements when applied to system matrices with a sparse structure. Hence, they are useful for control design for systems of large order, which usually have sparse system matrices. An equally important feature of the methods is that the controllers are restricted to have a distributed nature, meaning that they respect a potential interconnection structure of the system. The controllers considered in the thesis have the same structure as the centralized LQG solution; that is, they consist of a state predictor and feedback from the estimated states. Strategies are suggested for determining the feedback matrix and the predictor matrix separately. The strategies use gradient directions of the cost function to iteratively approach a locally optimal solution to either problem. A scheme is presented that determines bounds on the degree of suboptimality of the partial solution at every iteration. It is also shown that these bounds can be combined to give a bound on the degree of suboptimality of the full output-feedback controller. Another method, which treats the synthesis of the feedback matrix and the predictor matrix simultaneously, is also presented. The functionality of the developed methods is illustrated by an application in which they are used to compute controllers for a large deformable mirror, found in telescopes to compensate for atmospheric disturbances. The model of the mirror is obtained by discretizing a partial differential equation. This yields a linear, sparse representation of the mirror with a very large state space, which is well suited to the methods presented in the thesis. The performance of the controllers is evaluated using performance measures from the adaptive optics community.
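    A minimal sketch of structure-respecting gradient synthesis in the spirit of the abstract above (the chain system, step sizes, and tridiagonal pattern are invented for illustration; the thesis's own algorithms differ in detail): the gradient of the LQ cost is projected onto a prescribed sparsity pattern, so every iterate remains a distributed feedback matrix.

    ```python
    import numpy as np
    from scipy.linalg import solve_discrete_lyapunov

    n = 4
    # Hypothetical chain of four weakly coupled, stable subsystems.
    A = 0.9 * np.eye(n) + 0.04 * (np.eye(n, k=1) + np.eye(n, k=-1))
    B = np.eye(n)
    Q, R = np.eye(n), np.eye(n)
    # Distributed structure: each node may only feed back its own state
    # and its immediate neighbours' states (tridiagonal K).
    structure = np.abs(np.subtract.outer(np.arange(n), np.arange(n))) <= 1

    def cost(K):
        P = solve_discrete_lyapunov((A - B @ K).T, Q + K.T @ R @ K)
        return np.trace(P)

    def grad(K):
        Acl = A - B @ K
        P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
        Sigma = solve_discrete_lyapunov(Acl, np.eye(n))
        return 2 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma

    K = np.zeros((n, n))  # A alone is stable here, so K = 0 is feasible
    for _ in range(100):
        g = np.where(structure, grad(K), 0.0)  # project onto the structure
        c0, step = cost(K), 0.1
        # Backtrack until the closed loop stays stable and the cost drops.
        while step > 1e-12:
            K_new = K - step * g
            if (np.max(np.abs(np.linalg.eigvals(A - B @ K_new))) < 1.0
                    and cost(K_new) < c0):
                K = K_new
                break
            step /= 2
    ```

    The projection keeps the off-pattern entries of K identically zero throughout, so the interconnection structure is respected at every iteration, at the price of converging only to a locally optimal structured gain.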

    On Control and Estimation of Large and Uncertain Systems

    This thesis contains an introduction and six papers on the control and estimation of large and uncertain systems. The first paper poses and solves a deterministic version of the multiple-model estimation problem for finite sets of linear systems. The estimate is an interpolation of Kalman filter estimates, and it achieves a prescribed energy gain bound from disturbances to the point-wise estimation error, provided that the gain bound is feasible. The second paper shows how to compute upper and lower bounds on the smallest feasible gain bound; the bounds are computed via Riccati recursions. The third paper proves that it is sufficient to consider observer-based feedback in the output-feedback control of linear systems whose uncertain parameters belong to a finite set. The paper also contains an example of a discrete-time integrator with unknown gain. The fourth paper argues that current methods for analyzing the robustness of large systems with structured uncertainty do not distinguish between sparse and dense perturbations, and it proposes and thoroughly analyzes a new robustness measure that captures sparsity. In particular, it proposes an upper bound on this measure that is amenable to distributed computation and valuable for control design. The fifth paper solves the problem of localized state-feedback L2 control with communication delay for large discrete-time systems. The synthesis procedure can be performed for each node in parallel. The paper combines the localized state-feedback controller with a localized Kalman filter to synthesize a localized output-feedback controller that stabilizes the closed loop subject to communication constraints. The sixth paper concerns optimal linear-quadratic team-decision problems where the team does not have access to the model; instead, the players must learn optimal policies by interacting with the environment. The paper contains algorithms and regret bounds for first- and zeroth-order information feedback.
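    For intuition on the multiple-model estimation problem in the first paper, here is a hedged sketch of its classical stochastic counterpart (all models and constants are invented; the paper itself treats a deterministic, energy-gain formulation rather than the likelihood weighting used below): a bank of scalar Kalman filters, one per candidate model, whose estimates are interpolated with likelihood-based weights.

    ```python
    import numpy as np

    # Two candidate scalar models; the true dynamics are unknown to the filter.
    a_models = np.array([0.5, 0.9])
    a_true = 0.9
    q, r = 0.01, 0.01      # assumed process / measurement noise variances

    xhat = np.zeros(2)     # per-model state estimates
    pcov = np.ones(2)      # per-model error variances
    loglike = np.zeros(2)  # per-model accumulated log-likelihoods

    x = 1.0
    for _ in range(30):
        y = x  # noise-free measurement of the true state (for illustration)
        # Predict with each candidate model.
        xhat = a_models * xhat
        pcov = a_models**2 * pcov + q
        # Measurement update, scoring each model by its innovation likelihood.
        e = y - xhat
        s = pcov + r
        loglike += -0.5 * (np.log(2 * np.pi * s) + e**2 / s)
        gain = pcov / s
        xhat = xhat + gain * e
        pcov = (1 - gain) * pcov
        x = a_true * x

    # Interpolate the per-model estimates with normalized likelihood weights.
    w = np.exp(loglike - loglike.max())
    w /= w.sum()
    x_blend = w @ xhat
    ```

    The mismatched model accumulates large innovations, so its weight collapses and the blended estimate tracks the correct model; the paper's deterministic formulation replaces this stochastic weighting with an interpolation that guarantees an energy gain bound.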