Online Control for Linear Dynamics: A Data-Driven Approach
This paper considers an online control problem over a linear time-invariant
system with unknown dynamics, bounded disturbance, and adversarial cost. We
propose a data-driven strategy to reduce the regret of the controller. Unlike
model-based methods, our algorithm does not identify the system model; instead,
it leverages a single noise-free trajectory to calculate the accumulation of
disturbance and makes decisions using the accumulated disturbance action
controller we design, whose parameters are updated by online gradient descent.
We prove that the regret of our algorithm is $\mathcal{O}(\sqrt{T})$ under mild
assumptions, suggesting that its performance is on par with model-based
methods.
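As a rough illustration of the online gradient descent update mentioned above, the sketch below runs projected gradient descent on a sequence of quadratic losses. It is a generic example, not the accumulated disturbance action controller from the abstract; the loss, the step size `step`, and the projection radius `radius` are all assumptions made for illustration.

```python
import numpy as np

def projected_ogd(targets, step=0.1, radius=1.0):
    """Run projected online gradient descent on f_t(m) = 0.5 * ||m - targets[t]||^2.

    Hypothetical setup: `targets`, `step`, and `radius` are illustrative,
    not quantities taken from the abstract.
    """
    m = np.zeros(targets.shape[1])      # parameter vector, started at zero
    iterates = []
    for theta in targets:
        iterates.append(m.copy())       # the decision is fixed before the loss is revealed
        grad = m - theta                # gradient of 0.5 * ||m - theta||^2
        m = m - step * grad             # gradient step
        norm = np.linalg.norm(m)
        if norm > radius:               # project back onto the ball of the given radius
            m = m * (radius / norm)
    return np.array(iterates)

# A constant loss sequence: the iterates should approach the minimizer.
targets = np.tile(np.array([0.5, -0.25]), (200, 1))
iterates = projected_ogd(targets)
```

With a constant target inside the feasible ball, the iterates contract toward the minimizer at a geometric rate set by the step size.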
Regret-Optimal LQR Control
We consider the infinite-horizon LQR control problem. Motivated by
competitive analysis in online learning, as a criterion for controller design
we introduce the dynamic regret, defined as the difference between the LQR cost
of a causal controller (that has access only to past disturbances) and the LQR
cost of the unique clairvoyant one (that also has access to future
disturbances) that is known to dominate all other controllers. The regret
itself is a function of the disturbances, and we propose to find a causal
controller that minimizes the worst-case regret over all bounded energy
disturbances. The resulting controller has the interpretation of guaranteeing
the smallest regret compared to the best non-causal controller that can see the
future. We derive explicit formulas for the optimal regret and for the
regret-optimal controller for the state-space setting. These explicit solutions
are obtained by showing that the regret-optimal control problem can be reduced
to a Nehari extension problem that can be solved explicitly. The regret-optimal
controller is shown to be linear and can be expressed as the sum of the
classical state-feedback law and an $n$-th order controller ($n$ is the
state dimension), and its construction simply requires a solution to the
standard LQR Riccati equation and two Lyapunov equations. Simulations over a
range of plants demonstrate that the regret-optimal controller interpolates
nicely between the $H_2$ and the $H_\infty$ optimal controllers, and generally
has $H_2$ and $H_\infty$ costs that are simultaneously close to their optimal
values. The regret-optimal controller thus presents itself as a viable option
for control systems design.
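The abstract notes that the construction needs a solution to the standard LQR Riccati equation. As a minimal sketch of that one ingredient only, the code below iterates the Riccati recursion and forms the classical state-feedback gain. It assumes a discrete-time plant, which the abstract does not specify; the matrices `A`, `B`, `Q`, `R` are hypothetical, and the regret-optimal correction term is not implemented here.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=500):
    """Iterate the discrete-time Riccati recursion to a fixed point; return (P, K)."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)   # K = (R + B'PB)^{-1} B'PA
        P = Q + A.T @ P @ (A - B @ K)               # P = Q + A'PA - A'PB K
    return P, K

# Hypothetical double-integrator plant, chosen only for illustration.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

P, K = lqr_gain(A, B, Q, R)
closed_loop = A - B @ K                              # state-feedback law u_t = -K x_t
spectral_radius = max(abs(np.linalg.eigvals(closed_loop)))
```

A spectral radius below one indicates that the state-feedback law stabilizes this example plant.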
Online Optimization with Memory and Competitive Control
This paper presents competitive algorithms for a novel class of online optimization problems with memory. We consider a setting where the learner seeks to minimize the sum of a hitting cost and a switching cost that depends on the previous p decisions. This setting generalizes Smoothed Online Convex Optimization. The proposed approach, Optimistic Regularized Online Balanced Descent, achieves a constant, dimension-free competitive ratio. Further, we show a connection between online optimization with memory and online control with adversarial disturbances. This connection, in turn, leads to a new constant-competitive policy for a rich class of online control problems.
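To make the objective concrete, the sketch below totals a hitting cost and a switching cost that depends on the previous p decisions. The quadratic cost shapes, the scalar decisions, and the names `targets` and `weights` are assumptions made for illustration; the abstract does not state the exact cost functions.

```python
def total_cost(decisions, targets, weights):
    """Sum hitting and switching costs for a scalar decision sequence.

    Hypothetical cost shapes (assumptions, not taken from the abstract):
      hitting cost   0.5 * (y_t - targets[t])**2
      switching cost 0.5 * (y_t - sum_i weights[i] * y_{t-1-i})**2
    Decisions before time 0 are treated as zero.
    """
    cost = 0.0
    for t, y in enumerate(decisions):
        hitting = 0.5 * (y - targets[t]) ** 2
        # Weighted combination of the previous p = len(weights) decisions.
        memory = sum(
            w * decisions[t - 1 - i]
            for i, w in enumerate(weights)
            if t - 1 - i >= 0
        )
        switching = 0.5 * (y - memory) ** 2
        cost += hitting + switching
    return cost

# With p = 1 and weight 1.0 the switching cost reduces to 0.5 * (y_t - y_{t-1})**2,
# a one-step movement penalty.
example = total_cost([1.0, 1.0, 2.0], targets=[1.0, 2.0, 2.0], weights=[1.0])
```

Passing a longer `weights` list makes the switching cost depend on more past decisions, which is the "memory" in the problem name.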
Non-Stochastic Control with Bandit Feedback
We study the problem of controlling a linear dynamical system with
adversarial perturbations where the only feedback available to the controller
is the scalar loss, and the loss function itself is unknown. For this problem,
with either a known or unknown system, we give an efficient sublinear regret
algorithm. The main algorithmic difficulty is the dependence of the loss on
past controls. To overcome this issue, we propose an efficient algorithm for
the general setting of bandit convex optimization for loss functions with
memory, which may be of independent interest.
Regret-optimal control in dynamic environments
We consider control in linear time-varying dynamical systems from the
perspective of regret minimization. Unlike most prior work in this area, we
focus on the problem of designing an online controller which minimizes regret
against the best dynamic sequence of control actions selected in hindsight
(dynamic regret), instead of the best fixed controller in some specific class
of controllers (static regret). This formulation is attractive when the
environment changes over time and no single controller achieves good
performance over the entire time horizon. We derive the state-space structure
of the regret-optimal controller via a novel reduction to $H_\infty$ control
and present a tight data-dependent bound on its regret in terms of the energy
of the disturbance. Our results easily extend to the model-predictive setting
where the controller can anticipate future disturbances and to settings where
the controller only affects the system dynamics after a fixed delay. We present
numerical experiments which show that our regret-optimal controller
interpolates between the performance of the $H_2$-optimal and
$H_\infty$-optimal controllers across stochastic and adversarial
environments.
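The difference between static and dynamic regret can be seen in a small worked example. The sketch below uses stateless scalar losses, a deliberate simplification of the control setting in the abstract; the loss shape and the `targets` sequence are assumptions made for illustration.

```python
import numpy as np

def regrets(actions, targets):
    """Return (static_regret, dynamic_regret) for losses f_t(u) = (u - targets[t])**2.

    Hypothetical scalar losses, used only to contrast the two benchmarks.
    """
    actions = np.asarray(actions, dtype=float)
    targets = np.asarray(targets, dtype=float)
    incurred = np.sum((actions - targets) ** 2)
    # Static benchmark: the single best fixed action in hindsight (the mean target).
    best_fixed = targets.mean()
    static_cost = np.sum((best_fixed - targets) ** 2)
    # Dynamic benchmark: the best action at every step, which incurs zero loss here.
    dynamic_cost = 0.0
    return incurred - static_cost, incurred - dynamic_cost

# The environment flips halfway through, so no fixed action performs well.
targets = [1.0] * 5 + [-1.0] * 5
static_regret, dynamic_regret = regrets(np.zeros(10), targets)
```

Here the all-zero action sequence matches the best fixed action, so its static regret is zero even though its dynamic regret is large.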
Linear Quadratic Reinforcement Learning: Sublinear Regret in the Episodic Continuous-Time Framework
In this paper we study a continuous-time linear quadratic reinforcement
learning problem in an episodic setting. We first show that naive
discretization and piecewise approximation with discrete-time RL algorithms
yield a linear regret with respect to the number of learning episodes $N$. We
then propose an algorithm with continuous-time controls based on a regularized
least-squares estimation, and establish a sublinear regret bound in the order
of $\tilde{O}(N^{9/10})$. The analysis consists of two parts: parameter
estimation error, which relies on properties of sub-exponential random
variables and double stochastic integrals; and perturbation analysis, which
establishes the robustness of the associated continuous-time Riccati equation
by exploiting its regularity property.