Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon
We explore reinforcement learning methods for finding the optimal policy in
the linear quadratic regulator (LQR) problem. In particular, we consider the
convergence of policy gradient methods in the setting of known and unknown
parameters. We establish a global linear convergence guarantee for this approach in the setting of a finite time horizon and stochastic state dynamics, under weak assumptions. We also prove convergence of a projected policy gradient method in order to handle constrained problems. We
illustrate the performance of the algorithm with two examples. The first
example is the optimal liquidation of a holding in an asset. We show results
both for the case where we assume a model for the underlying dynamics and for the case where we apply the method to the data directly. The empirical evidence suggests that the policy gradient method can learn the globally optimal solution for a larger class of stochastic systems containing the LQR framework, and that it is more robust to model misspecification than a model-based approach. The second example is an LQR system in a higher-dimensional setting with synthetic data.
Comment: 49 pages, 9 figures
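For concreteness, here is a minimal model-based sketch of the gradient recursion that this kind of finite-horizon noisy LQR analysis rests on: time-varying gains u_t = -K_t x_t, a backward pass for the cost-to-go matrices, a forward pass for the state covariances, and an exact gradient step. All names, step sizes, and iteration counts are illustrative placeholders, and the paper's model-free setting would replace the exact gradient with a sample-based estimate.

```python
import numpy as np

def lqr_policy_gradient(A, B, Q, R, Qf, W, Sigma0, T, lr=1e-3, iters=500):
    """Gradient descent on time-varying gains K_t for the noisy
    finite-horizon LQR with policy u_t = -K_t x_t (illustrative sketch;
    hyperparameters are placeholders, not tuned values)."""
    n, m = B.shape[0], B.shape[1]
    K = [np.zeros((m, n)) for _ in range(T)]
    for _ in range(iters):
        # Backward pass: cost-to-go matrices P_t under the current policy.
        P = [None] * (T + 1)
        P[T] = Qf
        for t in range(T - 1, -1, -1):
            Acl = A - B @ K[t]
            P[t] = Q + K[t].T @ R @ K[t] + Acl.T @ P[t + 1] @ Acl
        # Forward pass: state covariances Sigma_t driven by process noise W.
        Sigma = [None] * T
        Sigma[0] = Sigma0
        for t in range(T - 1):
            Acl = A - B @ K[t]
            Sigma[t + 1] = Acl @ Sigma[t] @ Acl.T + W
        # Exact gradient: 2[(R + B'P_{t+1}B)K_t - B'P_{t+1}A] Sigma_t.
        for t in range(T):
            E = (R + B.T @ P[t + 1] @ B) @ K[t] - B.T @ P[t + 1] @ A
            K[t] = K[t] - lr * 2.0 * E @ Sigma[t]
    return K
```

Global linear convergence results of the kind claimed above hinge on this cost satisfying a gradient dominance property, despite the non-convexity of the problem in the gains K_t.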
Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies
Gradient-based methods have been widely used for system design and
optimization in diverse application domains. Recently, there has been a renewed
interest in studying theoretical properties of these methods in the context of
control and reinforcement learning. This article surveys some of the recent
developments on policy optimization, a gradient-based iterative approach for
feedback control synthesis, popularized by successes of reinforcement learning.
We take an interdisciplinary perspective in our exposition that connects
control theory, reinforcement learning, and large-scale optimization. We review
a number of recently-developed theoretical results on the optimization
landscape, global convergence, and sample complexity of gradient-based methods
for various continuous control problems such as the linear quadratic regulator (LQR), H-infinity control, risk-sensitive control, linear quadratic Gaussian (LQG) control, and output feedback synthesis. In conjunction with
these optimization results, we also discuss how direct policy optimization
handles stability and robustness concerns in learning-based control, two main
desiderata in control engineering. We conclude the survey by pointing out
several challenges and opportunities at the intersection of learning and
control.
Comment: To appear in Annual Review of Control, Robotics, and Autonomous Systems
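As a concrete anchor for the landscape results surveyed above, the exact policy gradient of the LQR cost and the gradient dominance inequality behind global convergence can be written as follows (conventions as in Fazel et al., 2018; the constant lambda is problem-dependent and its precise expression is omitted here):

```latex
% Exact gradient of the LQR cost C(K) under u_t = -K x_t, where P_K solves the
% closed-loop Lyapunov equation and \Sigma_K aggregates the state covariance:
\[
  \nabla C(K) \;=\; 2\bigl[(R + B^{\top} P_K B)K - B^{\top} P_K A\bigr]\,\Sigma_K,
  \qquad
  \Sigma_K \;=\; \textstyle\sum_{t \ge 0} \mathbb{E}\bigl[x_t x_t^{\top}\bigr].
\]
% Gradient dominance (Polyak-Lojasiewicz-type) inequality: for a
% problem-dependent constant \lambda > 0,
\[
  C(K) - C(K^{\star}) \;\le\; \lambda\,\bigl\|\nabla C(K)\bigr\|_F^{2},
\]
% which is what lets gradient methods converge globally, at a linear rate,
% on this non-convex landscape.
```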
Data-enabled Policy Optimization for the Linear Quadratic Regulator
Policy optimization (PO), an essential reinforcement learning approach for a broad range of system classes, requires significantly more system data than
indirect (identification-followed-by-control) methods or behavioral-based
direct methods even in the simplest linear quadratic regulator (LQR) problem.
In this paper, we take an initial step towards bridging this gap by proposing the data-enabled policy optimization (DeePO) method, which requires only a finite amount of sufficiently exciting data to solve the LQR iteratively via PO. Based on a data-driven closed-loop parameterization, we directly compute the policy gradient from a batch of persistently exciting data.
Next, we show that the nonconvex PO problem satisfies a projected gradient
dominance property by relating it to an equivalent convex program, leading to
the global convergence of DeePO. Moreover, we apply regularization methods to enhance the certainty-equivalence and robustness of the resulting controller, and we show an implicit regularization property. Finally, we perform simulations to validate our results.
Comment: Submitted to IEEE CDC 202
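To make the data-driven closed-loop parameterization concrete, here is a hedged sketch of a DeePO-style projected gradient iteration under simplifying assumptions (noise-free data, a stabilizing initial policy, unit initial-state covariance). It illustrates the idea rather than reproducing the authors' exact algorithm; the function name and constants are ours.

```python
import numpy as np

def deepo_sketch(X0, X1, U0, Q, R, eta=1e-3, iters=200):
    """DeePO-style iteration (illustrative): the gain u = K x is
    parameterized through data as K = U0 @ V with X0 @ V = I, so the
    closed loop is A + B K = X1 @ V, and PO runs over V."""
    n, T = X0.shape
    # Right pseudo-inverse of X0; requires sufficiently exciting data
    # (X0 of full row rank).
    X0p = X0.T @ np.linalg.solve(X0 @ X0.T, np.eye(n))
    Pker = np.eye(T) - X0p @ X0      # projector onto ker(X0)
    V = X0p                          # feasible start: X0 @ V = I
    for _ in range(iters):
        K, L = U0 @ V, X1 @ V        # gain and data-based closed loop
        # Cost-to-go P and state covariance S from Lyapunov fixed-point
        # iterations (assumes the closed loop L stays stable).
        P, S = np.eye(n), np.eye(n)
        for _ in range(500):
            P = Q + K.T @ R @ K + L.T @ P @ L
            S = np.eye(n) + L @ S @ L.T
        grad = 2.0 * (U0.T @ R @ K + X1.T @ P @ L) @ S
        V = V - eta * Pker @ grad    # step tangent to X0 @ V = I
    return U0 @ V                    # the resulting feedback gain
```

Projecting the gradient onto the kernel of X0 keeps every iterate consistent with the constraint X0 @ V = I, which is what makes the update a projected gradient step compatible with the gradient dominance property mentioned above.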
Global Convergence of Policy Gradient Primal-dual Methods for Risk-constrained LQRs
While the techniques in optimal control theory are often model-based, the
policy optimization (PO) approach can directly optimize the performance metric
of interest without explicit dynamical models, and is an essential approach for
reinforcement learning problems. However, it usually leads to a non-convex optimization problem, for which there is little theoretical understanding of performance. In this paper, we focus on the
risk-constrained Linear Quadratic Regulator (LQR) problem with noisy input via
the PO approach, which results in a challenging non-convex problem. To this
end, we first build on our earlier result that the optimal policy has an affine
structure to show that the associated Lagrangian function is locally gradient
dominated with respect to the policy, based on which we establish strong
duality. Then, we design policy gradient primal-dual methods with global
convergence guarantees to find an optimal policy-multiplier pair in both
model-based and sample-based settings. Finally, we use samples of system
trajectories in simulations to validate our policy gradient primal-dual
methods.
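A generic primal-dual loop of the kind described above can be sketched as follows; grad_J, grad_Jc, and Jc are hypothetical oracle placeholders (exact in the model-based setting, trajectory-based estimates in the sample-based one), and the step sizes are illustrative.

```python
import numpy as np

def primal_dual_pg(K0, grad_J, grad_Jc, Jc, budget,
                   eta=1e-3, alpha=1e-2, iters=1000):
    """Primal-dual policy gradient for min_K J(K) s.t. Jc(K) <= budget,
    via the Lagrangian L(K, lam) = J(K) + lam * (Jc(K) - budget).
    The oracles grad_J, grad_Jc, Jc are user-supplied placeholders."""
    K, lam = np.array(K0, dtype=float), 0.0
    for _ in range(iters):
        # Primal step: gradient descent on the Lagrangian in K.
        K = K - eta * (grad_J(K) + lam * grad_Jc(K))
        # Dual step: gradient ascent in lam, projected onto lam >= 0.
        lam = max(0.0, lam + alpha * (Jc(K) - budget))
    return K, lam
```

Under the local gradient dominance and strong duality established above, iterations of this form can be shown to converge globally to an optimal policy-multiplier pair.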