Data-enabled Policy Optimization for the Linear Quadratic Regulator
Policy optimization (PO), an essential approach in reinforcement learning for
a broad range of system classes, requires significantly more system data than
indirect (identification-followed-by-control) methods or behavioral-based
direct methods even in the simplest linear quadratic regulator (LQR) problem.
In this paper, we take an initial step towards bridging this gap by proposing
the data-enabled policy optimization (DeePO) method, which requires only a
finite number of sufficiently exciting data to iteratively solve the LQR via
PO. Based on a data-driven closed-loop parameterization, we can directly
compute the policy gradient from a batch of persistently exciting data.
Next, we show that the nonconvex PO problem satisfies a projected gradient
dominance property by relating it to an equivalent convex program, leading to
the global convergence of DeePO. Moreover, we apply regularization methods to
enhance certainty-equivalence and robustness of the resulting controller and
show an implicit regularization property. Finally, we perform simulations to
validate our results.

Comment: Submitted to IEEE CDC 202
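To make the workflow concrete, below is a minimal numerical sketch, not the paper's exact algorithm. It builds the data-driven closed-loop parameterization A + BK = X1 V (with U0 V = K and X0 V = I) from a single persistently exciting trajectory, then runs plain gradient descent on the gain K, with a finite-difference gradient standing in for the exact data-based policy gradient derived in the paper. The system matrices, dimensions, and step sizes are all illustrative.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Illustrative setup: unknown system x_{t+1} = A x_t + B u_t with quadratic
# cost weights Q, R. The algorithm only sees the collected data (U0, X0, X1).
rng = np.random.default_rng(0)
n, m, T = 3, 2, 20                       # state dim, input dim, data length
A = rng.normal(scale=0.3, size=(n, n))   # unknown to the algorithm
B = rng.normal(size=(n, m))
Q, R = np.eye(n), np.eye(m)

# Collect one trajectory with random inputs (persistently exciting with
# high probability for this generic system).
X = np.zeros((n, T + 1))
X[:, 0] = rng.normal(size=n)
U = rng.normal(size=(m, T))
for t in range(T):
    X[:, t + 1] = A @ X[:, t] + B @ U[:, t]
X0, X1 = X[:, :-1], X[:, 1:]
D = np.vstack([U, X0])                   # stacked data matrix [U0; X0]

def lqr_cost(K):
    """LQR cost tr(P) of gain K, computed from data only:
    solve D V = [K; I], so that X1 V = A X0 V + B U0 V = A + B K."""
    V = np.linalg.pinv(D) @ np.vstack([K, np.eye(n)])
    Acl = X1 @ V                         # data-based closed-loop matrix
    if max(abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf                    # K not stabilizing
    # P solves Acl' P Acl - P + Q + K' R K = 0
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    return np.trace(P)

# Gradient descent on K; finite differences approximate the policy gradient.
K = np.zeros((m, n))
eta, eps = 1e-2, 1e-6
for _ in range(300):
    base = lqr_cost(K)
    if not np.isfinite(base):
        break
    G = np.zeros_like(K)
    for i in range(m):
        for j in range(n):
            Kp = K.copy()
            Kp[i, j] += eps
            G[i, j] = (lqr_cost(Kp) - base) / eps
    K -= eta * G

print("final cost:", lqr_cost(K))
```

The key step is that `X1 @ V` reproduces the closed-loop matrix A + BK without ever identifying A or B, which is what lets the gradient be evaluated from a finite batch of exciting data.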