45 research outputs found

Online Control for Linear Dynamics: A Data-Driven Approach

Authors: Chen Yongxin, Liu Zishun
Publication date: 16/08/2023

This paper considers an online control problem over a linear time-invariant system with unknown dynamics, bounded disturbance, and adversarial cost. We propose a data-driven strategy to reduce the regret of the controller. Unlike model-based methods, our algorithm does not identify the system model. Instead, it leverages a single noise-free trajectory to calculate the accumulation of disturbance and makes decisions using the accumulated disturbance action controller we design, whose parameters are updated by online gradient descent. We prove that the regret of our algorithm is $\mathcal{O}(\sqrt{T})$ under mild assumptions, suggesting that its performance is on par with model-based methods.
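The abstract above mentions parameters updated by online gradient descent and an $\mathcal{O}(\sqrt{T})$ regret. As a rough illustration only, the sketch below runs generic projected online gradient descent on scalar quadratic losses; it is not the paper's accumulated disturbance action controller. The loss family, the target sequence, and the names `projected_ogd` and `best_fixed_loss` are assumptions made for this example.

```python
import math

def projected_ogd(targets, radius=1.0):
    """Projected online gradient descent on losses f_t(x) = (x - z_t)^2.

    Hypothetical toy setting: decisions and targets lie in [-radius, radius].
    """
    diameter = 2.0 * radius      # D: width of the feasible interval
    grad_bound = 4.0 * radius    # G: |2 (x - z)| <= 4 * radius on that interval
    x = 0.0
    total_loss = 0.0
    for t, z in enumerate(targets, start=1):
        total_loss += (x - z) ** 2          # loss is revealed after x is played
        grad = 2.0 * (x - z)
        step = diameter / (grad_bound * math.sqrt(t))  # decaying step size
        x = max(-radius, min(radius, x - step * grad)) # project back onto the interval
    return total_loss

def best_fixed_loss(targets):
    """Loss of the best single decision in hindsight (the mean of the targets)."""
    mean = sum(targets) / len(targets)
    return sum((mean - z) ** 2 for z in targets)

T = 1000
targets = [math.sin(0.1 * t) for t in range(1, T + 1)]
regret = projected_ogd(targets) - best_fixed_loss(targets)
bound = 1.5 * 2.0 * 4.0 * math.sqrt(T)      # standard (3/2) * G * D * sqrt(T) guarantee
print(f"regret = {regret:.2f}, bound = {bound:.2f}")
```

With step sizes proportional to $1/\sqrt{t}$, the textbook analysis of online gradient descent for convex losses gives static regret of at most $\tfrac{3}{2} G D \sqrt{T}$, which is the quantity the script compares against.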

arXiv.org e-Print Archive

Regret-Optimal LQR Control

Authors: Goel Gautam, Hassibi Babak, Lale Sahin, Sabag Oron
Publication date: 13/04/2023

We consider the infinite-horizon LQR control problem. Motivated by competitive analysis in online learning, as a criterion for controller design we introduce the dynamic regret, defined as the difference between the LQR cost of a causal controller (that has only access to past disturbances) and the LQR cost of the unique clairvoyant one (that has also access to future disturbances) that is known to dominate all other controllers. The regret itself is a function of the disturbances, and we propose to find a causal controller that minimizes the worst-case regret over all bounded energy disturbances. The resulting controller has the interpretation of guaranteeing the smallest regret compared to the best non-causal controller that can see the future. We derive explicit formulas for the optimal regret and for the regret-optimal controller for the state-space setting. These explicit solutions are obtained by showing that the regret-optimal control problem can be reduced to a Nehari extension problem that can be solved explicitly. The regret-optimal controller is shown to be linear and can be expressed as the sum of the classical $H_2$ state-feedback law and an $n$-th order controller ($n$ is the state dimension), and its construction simply requires a solution to the standard LQR Riccati equation and two Lyapunov equations. Simulations over a range of plants demonstrate that the regret-optimal controller interpolates nicely between the $H_2$ and the $H_\infty$ optimal controllers, and generally has $H_2$ and $H_\infty$ costs that are simultaneously close to their optimal values. The regret-optimal controller thus presents itself as a viable option for control systems design.
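The abstract above says the construction needs a solution to the standard LQR Riccati equation. The sketch below covers only that ingredient, for a scalar discrete-time system $x_{t+1} = a x_t + b u_t + w_t$ with stage cost $q x_t^2 + r u_t^2$; it does not build the regret-optimal controller or solve the two Lyapunov equations. The discrete-time scalar setting, the function name `scalar_lqr`, and the numerical values are assumptions made for this example.

```python
def scalar_lqr(a, b, q, r, tol=1e-12, max_iter=10_000):
    """Solve the scalar discrete-time Riccati equation by fixed-point iteration.

    Returns (p, k), where p solves
        p = q + a^2 p - (a b p)^2 / (r + b^2 p)
    and u = -k x is the resulting LQR state-feedback law.
    """
    p = q
    for _ in range(max_iter):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    k = a * b * p / (r + b * b * p)
    return p, k

# Hypothetical plant: a = b = q = r = 1.
p, k = scalar_lqr(1.0, 1.0, 1.0, 1.0)
closed_loop = 1.0 - 1.0 * k   # a - b k; stable when its magnitude is below 1
print(p, k, closed_loop)
```

For this choice of parameters the Riccati equation reduces to $p^2 - p - 1 = 0$, so the iteration should approach $p = (1 + \sqrt{5})/2$. For matrix-valued systems a library solver would normally replace the hand-written loop.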

arXiv.org e-Print Archive

Online Optimization with Memory and Competitive Control

Authors: Chung Soon-Jo, Lin Yiheng, Shi Guanya, Wierman Adam, Yue Yisong
Publication date: 13/02/2020

This paper presents competitive algorithms for a novel class of online optimization problems with memory. We consider a setting where the learner seeks to minimize the sum of a hitting cost and a switching cost that depends on the previous p decisions. This setting generalizes Smoothed Online Convex Optimization. The proposed approach, Optimistic Regularized Online Balanced Descent, achieves a constant, dimension-free competitive ratio. Further, we show a connection between online optimization with memory and online control with adversarial disturbances. This connection, in turn, leads to a new constant-competitive policy for a rich class of online control problems
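To make the objective in the abstract above concrete, the sketch below evaluates the sum of a hitting cost and a switching cost that depends on the previous p decisions for a given decision sequence. The specific costs (a squared distance to a target, and half the squared distance to the average of the last p decisions) and the name `total_cost` are illustrative assumptions, not the paper's definitions, and the code does not implement Optimistic Regularized Online Balanced Descent.

```python
def total_cost(decisions, targets, p, initial=0.0):
    """Sum of hitting and switching costs for scalar decisions.

    Hypothetical costs for illustration:
      hitting_t   = (y_t - v_t)^2
      switching_t = 0.5 * (y_t - mean(y_{t-1}, ..., y_{t-p}))^2
    Decisions before the first round are taken to be `initial`.
    """
    history = [initial] * p              # the p most recent decisions, oldest first
    total = 0.0
    for y, v in zip(decisions, targets):
        hitting = (y - v) ** 2
        anchor = sum(history) / p        # summary of the previous p decisions
        switching = 0.5 * (y - anchor) ** 2
        total += hitting + switching
        history = history[1:] + [y]      # slide the memory window forward
    return total

# Jumping straight to the target costs nothing to hit but pays for switching.
cost = total_cost(decisions=[1.0, 1.0], targets=[1.0, 1.0], p=2)
print(cost)
```

With p = 1 this toy objective reduces to a hitting cost plus a squared movement penalty between consecutive decisions, which is the shape of objective studied in Smoothed Online Convex Optimization.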

arXiv.org e-Print Archive

Caltech Authors

Non-Stochastic Control with Bandit Feedback

Authors: Gradu Paula, Hallman John, Hazan Elad
Publication date: 01/01/2020

We study the problem of controlling a linear dynamical system with adversarial perturbations where the only feedback available to the controller is the scalar loss, and the loss function itself is unknown. For this problem, with either a known or unknown system, we give an efficient sublinear regret algorithm. The main algorithmic difficulty is the dependence of the loss on past controls. To overcome this issue, we propose an efficient algorithm for the general setting of bandit convex optimization for loss functions with memory, which may be of independent interest

arXiv.org e-Print Archive

Princeton University Open Access Repository

Regret-optimal control in dynamic environments

Authors: Goel Gautam, Hassibi Babak
Publication date: 20/10/2020

We consider control in linear time-varying dynamical systems from the perspective of regret minimization. Unlike most prior work in this area, we focus on the problem of designing an online controller which minimizes regret against the best dynamic sequence of control actions selected in hindsight (dynamic regret), instead of the best fixed controller in some specific class of controllers (static regret). This formulation is attractive when the environment changes over time and no single controller achieves good performance over the entire time horizon. We derive the state-space structure of the regret-optimal controller via a novel reduction to $H_{\infty}$ control and present a tight data-dependent bound on its regret in terms of the energy of the disturbance. Our results easily extend to the model-predictive setting where the controller can anticipate future disturbances and to settings where the controller only affects the system dynamics after a fixed delay. We present numerical experiments which show that our regret-optimal controller interpolates between the performance of the $H_2$-optimal and $H_{\infty}$-optimal controllers across stochastic and adversarial environments.
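The distinction between static and dynamic regret drawn in the abstract above can be written out as follows. The notation is an assumption made for this listing and does not come from the paper: $J_T(u; w)$ denotes the cumulative cost over horizon $T$ of a control sequence $u$ under disturbance sequence $w$, $u^{\mathrm{alg}}$ is the sequence produced by the online controller, and $\Pi$ is a fixed class of controllers.

```latex
% Static regret: compare against the best fixed controller in the class \Pi.
\[
  R_T^{\mathrm{static}}(w)
    = J_T\bigl(u^{\mathrm{alg}}; w\bigr)
    - \min_{\pi \in \Pi} J_T\bigl(u^{\pi}; w\bigr)
\]

% Dynamic regret: compare against the best control sequence chosen in hindsight.
\[
  R_T^{\mathrm{dynamic}}(w)
    = J_T\bigl(u^{\mathrm{alg}}; w\bigr)
    - \min_{u_1, \dots, u_T} J_T\bigl(u; w\bigr)
\]
```

Because the hindsight sequence is unconstrained, the second benchmark is never worse than the first for controller classes whose actions are admissible sequences, so under this notation dynamic regret is at least as large as static regret.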

arXiv.org e-Print Archive

Caltech Authors

Linear Quadratic Reinforcement Learning: Sublinear Regret in the Episodic Continuous-Time Framework

Authors: Basei Matteo, Guo Xin, Hu Anran
Publication date: 10/11/2020

In this paper we study a continuous-time linear quadratic reinforcement learning problem in an episodic setting. We first show that naïve discretization and piecewise approximation with discrete-time RL algorithms yield a linear regret with respect to the number of learning episodes $N$. We then propose an algorithm with continuous-time controls based on a regularized least-squares estimation, and establish a sublinear regret bound in the order of $\tilde{O}(\sqrt{N})$. The analysis consists of two parts: parameter estimation error, which relies on properties of sub-exponential random variables and double stochastic integrals; and perturbation analysis, which establishes the robustness of the associated continuous-time Riccati equation by exploiting its regularity property.

arXiv.org e-Print Archive