Best of Both Worlds in Online Control: Competitive Ratio and Policy Regret
We consider the fundamental problem of online control of a linear dynamical
system from two different viewpoints: regret minimization and competitive
analysis. We prove that the optimal competitive policy is well-approximated by
a convex parameterized policy class known as disturbance-action control
(DAC) policies. Using this structural result, we show that several recently
proposed online control algorithms achieve the best of both worlds: sublinear
regret vs. the best DAC policy selected in hindsight, and optimal competitive
ratio, up to an additive correction which grows sublinearly in the time
horizon. We further conclude that sublinear regret vs. the optimal competitive
policy is attainable when the linear dynamical system is unknown, and even when
a stabilizing controller for the dynamics is not available a priori.
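The abstract does not spell out the form of a DAC policy. A minimal sketch, assuming the form commonly used in the online control literature, u_t = -K x_t + sum_{i=1..H} M_i w_{t-i}, with K a fixed stabilizing gain and the matrices M_i as the learnable parameters. The system matrices, gain, horizon H, and noise below are hypothetical and chosen only for illustration. The sketch checks one reason the class is called convex: for fixed K and a fixed disturbance sequence, the state trajectory is affine in the parameters M.

```python
import numpy as np

def dac_action(K, M, x, past_w):
    """Assumed DAC form: u_t = -K x_t + sum_i M[i-1] w_{t-i}; past_w[0] is w_{t-1}."""
    u = -K @ x
    for M_i, w in zip(M, past_w):
        u = u + M_i @ w
    return u

def rollout(A, B, K, M, disturbances):
    """Simulate x_{t+1} = A x_t + B u_t + w_t from x_0 = 0 under a DAC policy."""
    n, H = A.shape[0], len(M)
    x = np.zeros(n)
    past_w = [np.zeros(n) for _ in range(H)]  # disturbances before t = 0 taken as zero
    states = [x]
    for w in disturbances:
        u = dac_action(K, M, x, past_w)
        x = A @ x + B @ u + w
        past_w = ([w] + past_w)[:H]  # keep only the H most recent disturbances
        states.append(x)
    return np.array(states)

# Hypothetical 2-state, 1-input system (not taken from the paper).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
K = np.array([[1.0, 2.0]])  # A - B K has both eigenvalues at 0.9, so K is stabilizing

rng = np.random.default_rng(0)
disturbances = rng.normal(size=(50, 2))
H = 3
M1 = [rng.normal(size=(1, 2)) for _ in range(H)]
M2 = [rng.normal(size=(1, 2)) for _ in range(H)]
M_mid = [0.5 * (a + b) for a, b in zip(M1, M2)]

x1 = rollout(A, B, K, M1, disturbances)
x2 = rollout(A, B, K, M2, disturbances)
x_mid = rollout(A, B, K, M_mid, disturbances)

# Affine dependence on M: the midpoint parameters give the midpoint trajectory.
affine_in_M = np.allclose(x_mid, 0.5 * (x1 + x2))
print("trajectory affine in M:", affine_in_M)
```

Because states (and likewise controls) are affine in M, any convex per-step cost of state and control is convex in M, which is what makes online convex optimization methods applicable to this policy class.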
Certainty equivalence and model uncertainty
Simon’s and Theil’s certainty equivalence property justifies a convenient algorithm for solving dynamic programming problems with quadratic objectives and linear transition laws: first, optimize under perfect foresight, then substitute optimal forecasts for unknown future values. A similar decomposition into separate optimization and forecasting steps prevails when a decision maker wants a decision rule that is robust to model misspecification. Concerns about model misspecification leave the first step of the algorithm intact and affect only the second step of forecasting the future. The decision maker attains robustness by making forecasts with a distorted model that twists probabilities relative to his approximating model. The appropriate twisting emerges from a two-player zero-sum dynamic game.
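The two-step recipe in the abstract can be illustrated on a toy problem. The sketch below is a static, scalar simplification (an assumption made for brevity, not the dynamic setting of the paper): choose u to minimize the expected value of (u - theta)^2 + r*u^2, where theta is unknown when u is chosen. The cost function, the weight r, and the distribution of theta are all hypothetical. Step one solves the problem as if theta were known; step two substitutes a forecast of theta. A brute-force minimization of the sample-average cost serves as a check.

```python
import numpy as np

def perfect_foresight_decision(theta, r):
    """Minimizer of (u - theta)^2 + r*u^2 over u when theta is known."""
    return theta / (1.0 + r)

def certainty_equivalent_decision(theta_samples, r):
    """Step 1 rule applied to the forecast (here, the sample mean) of theta."""
    forecast = float(np.mean(theta_samples))
    return perfect_foresight_decision(forecast, r)

def brute_force_decision(theta_samples, r, grid):
    """Grid point minimizing the sample-average cost, with no use of the shortcut."""
    costs = [np.mean((u - theta_samples) ** 2) + r * u ** 2 for u in grid]
    return float(grid[int(np.argmin(costs))])

rng = np.random.default_rng(0)
theta_samples = rng.normal(loc=2.0, scale=1.5, size=5000)  # hypothetical uncertainty
r = 0.5
grid = np.linspace(-5.0, 5.0, 2001)  # grid spacing 0.005

u_ce = certainty_equivalent_decision(theta_samples, r)
u_bf = brute_force_decision(theta_samples, r, grid)
print("certainty-equivalent decision:", u_ce)
print("brute-force decision:         ", u_bf)
```

The two decisions should agree to within the grid spacing, and changing the spread of theta leaves the decision rule unchanged: with a quadratic objective, only the forecast matters. The robust variant described in the abstract would keep step one and replace the forecast with one computed under a distorted model; that step is not sketched here.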
Learning over All Stabilizing Nonlinear Controllers for a Partially-Observed Linear System
This paper proposes a nonlinear policy architecture for control of
partially-observed linear dynamical systems providing built-in closed-loop
stability guarantees. The policy is based on a nonlinear version of the Youla
parameterization, and augments a known stabilizing linear controller with a
nonlinear operator from a recently developed class of dynamic neural network
models called the recurrent equilibrium network (REN). We prove that RENs are
universal approximators of contracting and Lipschitz nonlinear systems, and
subsequently show that the proposed Youla-REN architecture is a universal
approximator of stabilizing nonlinear controllers. The REN architecture
simplifies learning since unconstrained optimization can be applied, and we
consider both a model-based case where exact gradients are available and
reinforcement learning using random search with zeroth-order oracles. In
simulation examples, our method converges faster to better controllers and is
more scalable than existing methods, while guaranteeing stability during
learning transients.
Online Agnostic Boosting via Regret Minimization
Boosting is a widely used machine learning approach based on the idea of
aggregating weak learning rules. While in statistical learning numerous
boosting methods exist both in the realizable and agnostic settings, in online
learning they exist only in the realizable case. In this work we provide the
first agnostic online boosting algorithm; that is, given a weak learner with
only marginally-better-than-trivial regret guarantees, our algorithm boosts it
to a strong learner with sublinear regret.
Our algorithm is based on an abstract (and simple) reduction to online convex
optimization, which efficiently converts an arbitrary online convex optimizer
to an online booster.
Moreover, this reduction extends to the statistical as well as the online
realizable settings, thus unifying the four cases of statistical/online and
agnostic/realizable boosting.