46 research outputs found

    Best of Both Worlds in Online Control: Competitive Ratio and Policy Regret

    Full text link
    We consider the fundamental problem of online control of a linear dynamical system from two different viewpoints: regret minimization and competitive analysis. We prove that the optimal competitive policy is well-approximated by a convex parameterized policy class, known as a disturbance-action control (DAC) policies. Using this structural result, we show that several recently proposed online control algorithms achieve the best of both worlds: sublinear regret vs. the best DAC policy selected in hindsight, and optimal competitive ratio, up to an additive correction which grows sublinearly in the time horizon. We further conclude that sublinear regret vs. the optimal competitive policy is attainable when the linear dynamical system is unknown, and even when a stabilizing controller for the dynamics is not available a priori

    Certainty equivalence and model uncertainty

    Get PDF
    Simon’s and Theil’s certainty equivalence property justifies a convenient algorithm for solving dynamic programming problems with quadratic objectives and linear transition laws: first, optimize under perfect foresight, then substitute optimal forecasts for unknown future values. A similar decomposition into separate optimization and forecasting steps prevails when a decision maker wants a decision rule that is robust to model misspecification. Concerns about model misspecification leave the first step of the algorithm intact and affect only the second step of forecasting the future. The decision maker attains robustness by making forecasts with a distorted model that twists probabilities relative to his approximating model. The appropriate twisting emerges from a two-player zero-sum dynamic game.

    Learning over All Stabilizing Nonlinear Controllers for a Partially-Observed Linear System

    Full text link
    This paper proposes a nonlinear policy architecture for control of partially-observed linear dynamical systems providing built-in closed-loop stability guarantees. The policy is based on a nonlinear version of the Youla parameterization, and augments a known stabilizing linear controller with a nonlinear operator from a recently developed class of dynamic neural network models called the recurrent equilibrium network (REN). We prove that RENs are universal approximators of contracting and Lipschitz nonlinear systems, and subsequently show that the the proposed Youla-REN architecture is a universal approximator of stabilizing nonlinear controllers. The REN architecture simplifies learning since unconstrained optimization can be applied, and we consider both a model-based case where exact gradients are available and reinforcement learning using random search with zeroth-order oracles. In simulation examples our method converges faster to better controllers and is more scalable than existing methods, while guaranteeing stability during learning transients

    Online Agnostic Boosting via Regret Minimization

    Full text link
    Boosting is a widely used machine learning approach based on the idea of aggregating weak learning rules. While in statistical learning numerous boosting methods exist both in the realizable and agnostic settings, in online learning they exist only in the realizable case. In this work we provide the first agnostic online boosting algorithm; that is, given a weak learner with only marginally-better-than-trivial regret guarantees, our algorithm boosts it to a strong learner with sublinear regret. Our algorithm is based on an abstract (and simple) reduction to online convex optimization, which efficiently converts an arbitrary online convex optimizer to an online booster. Moreover, this reduction extends to the statistical as well as the online realizable settings, thus unifying the 4 cases of statistical/online and agnostic/realizable boosting
    corecore