Inverse linear-quadratic discrete-time finite-horizon optimal control for indistinguishable homogeneous agents: A convex optimization approach
The inverse linear-quadratic optimal control problem is a system identification problem whose aim is to recover the quadratic cost function, and hence the closed-loop system matrices, from observations of optimal trajectories. In this paper, the discrete-time, finite-horizon case is considered, where the agents are additionally assumed to be homogeneous and indistinguishable. The latter means that all agents have the same dynamics and objective functions, and that the observations are "snapshots" of all agents at different time instants; what is not known is "which agent moved where" between consecutive observations. This absence of linked optimal trajectories makes the problem challenging. We first show that the problem is globally identifiable. Then, for the case of noiseless observations, we show that the true cost matrix, and hence the closed-loop system matrices, can be recovered as the unique global optimal solution to a convex optimization problem. Next, for the case of noisy observations, we formulate an estimator as the unique global optimal solution to a modified convex optimization problem, and we show the statistical consistency of this estimator. Finally, the performance of the proposed method is demonstrated on a number of numerical examples.
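For context, the forward problem underlying this abstract can be sketched in a few lines: a discrete-time, finite-horizon LQR solved by the backward Riccati recursion. The inverse problem above observes the resulting optimal trajectories (unlinked across agents) and recovers the cost matrix; the sketch below only shows the forward direction, and all system matrices are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, N):
    """Backward Riccati recursion; returns time-varying gains K_0..K_{N-1}
    for the cost sum_t (x'Qx + u'Ru) with terminal cost x_N' Q x_N."""
    P = Q.copy()                                   # P_N = Q (terminal cost)
    gains = []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # K_t
        P = Q + A.T @ P @ (A - B @ K)                        # P_t
        gains.append(K)
    return gains[::-1]                             # reorder to t = 0..N-1

# Illustrative double-integrator-like system (assumed, not from the paper).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)
gains = finite_horizon_lqr(A, B, Q, R, 20)
```

The optimal trajectories the paper observes are generated by applying u_t = -K_t x_t; the inverse problem asks for the Q that produced those gains.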
Inverse optimal control for averaged cost per stage linear quadratic regulators
Inverse Optimal Control (IOC) is a powerful framework for learning a behaviour from observations of experts. The framework aims to identify the underlying cost function with respect to which the observed optimal trajectories (the experts' behaviour) are optimal. In this work, we consider the case of identifying the cost and the feedback law from observed trajectories generated by an "average cost per stage" linear quadratic regulator. We show that identifying the cost is in general an ill-posed problem, and we give necessary and sufficient conditions for non-identifiability. Moreover, despite the fact that the problem is in general ill-posed, we construct an estimator for the cost function and show that the control gain corresponding to this estimator is a statistically consistent estimator for the true underlying control gain. In fact, the constructed estimator is based on convex optimization, so the proved statistical consistency is also observed in practice. We illustrate the latter by applying the method to a simulation example from rehabilitation robotics.
Comment: 10 pages, 2 figures
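The consistency claim above can be illustrated with a much simpler stand-in than the paper's estimator: if one observes noisy state-input pairs from a stationary feedback law u_t = -K x_t, an ordinary least-squares fit of the gain is consistent as the number of samples grows. This is a minimal sketch under assumed system matrices and noise levels, not the convex-optimization estimator the paper constructs (which recovers the cost, not just the gain).

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2], [0.0, 0.8]])   # assumed open-loop dynamics
B = np.array([[0.0], [1.0]])
K_true = np.array([[0.3, 0.4]])          # hypothetical stationary gain

# Simulate closed-loop data; process noise excites the state and small
# observation noise corrupts the recorded inputs.
T = 5000
X = np.zeros((T, 2))
U = np.zeros((T, 1))
x = np.array([1.0, -1.0])
for t in range(T):
    u = -K_true @ x
    X[t] = x
    U[t] = u + 0.01 * rng.standard_normal(1)
    x = A @ x + B @ u + 0.05 * rng.standard_normal(2)

# Least-squares fit of u_t ~ -K x_t: consistent as T grows.
K_hat = -np.linalg.lstsq(X, U, rcond=None)[0].T
```

With T = 5000 samples the fitted gain matches the true gain to within a few percent, mirroring (in a toy setting) the statistical consistency shown in the paper.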
Global Convergence of Receding-Horizon Policy Search in Learning Estimator Designs
We introduce the receding-horizon policy gradient (RHPG) algorithm, the first PG algorithm with provable global convergence in learning the optimal linear estimator design, i.e., the Kalman filter (KF). Notably, the RHPG algorithm does not require any prior knowledge of the system for initialization and does not require the target system to be open-loop stable. The key idea of RHPG is to integrate vanilla PG (or any other policy search direction) into a dynamic-programming outer loop, which iteratively decomposes the infinite-horizon KF problem, constrained and non-convex in the policy parameter, into a sequence of static estimation problems that are unconstrained and strongly convex, thus enabling global convergence. We further provide fine-grained analyses of the optimization landscape under RHPG and detail the convergence and sample-complexity guarantees of the algorithm. This work serves as an initial attempt to develop reinforcement learning algorithms with performance guarantees specifically for control applications, by utilizing classic control theory in both algorithmic design and theoretical analysis. Lastly, we validate our theory by deploying the RHPG algorithm to learn the Kalman filter design of a large-scale convection-diffusion model. We open-source the code repository at \url{https://github.com/xiangyuan-zhang/LearningKF}.
Comment: arXiv admin note: text overlap with arXiv:2301.1262
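The mechanism the abstract relies on, that each static estimation problem is unconstrained and strongly convex so plain gradient descent converges globally, can be sketched on a single such subproblem: estimate x from y = x + v with a linear gain L by minimizing the mean-squared error. Dimensions, noise levels, and step size below are illustrative assumptions; this is one toy subproblem, not the full RHPG loop.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
sigma_x, sigma_v = 1.0, 0.5
X = sigma_x * rng.standard_normal((10000, n))        # latent states
Y = X + sigma_v * rng.standard_normal((10000, n))    # noisy measurements

# Minimize the empirical mean of ||x - L y||^2 over the gain L. The
# objective is strongly convex in L, so gradient descent from any start
# (here zero, i.e., no prior knowledge) converges to the global optimum.
L = np.zeros((n, n))
lr = 0.2
for _ in range(200):
    grad = 2 * (L @ Y.T - X.T) @ Y / len(Y)          # gradient of the MSE
    L -= lr * grad

# The population optimum is Sigma_x (Sigma_x + Sigma_v)^{-1} = 0.8 * I here.
```

RHPG chains such strongly-convex subproblems along the horizon, which is what lets a simple policy-search direction inherit global convergence.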
- …