Non-stationary Online Learning with Memory and Non-stochastic Control

Wang, Yu-Xiang; Zhao, Peng; Zhou, Zhi-Hua

Authors: Yu-Xiang Wang, Peng Zhao, Zhi-Hua Zhou
Publication date: 25 June 2021

Abstract

We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions and thus captures the temporal effects of learning problems. In this paper, we introduce dynamic policy regret as the performance measure for designing algorithms robust to non-stationary environments; it compares the algorithm's decisions against a sequence of changing comparators. We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret. The key technical challenge is controlling the switching cost, i.e., the cumulative movement of the player's decisions, which is addressed by a novel decomposition of dynamic policy regret and an appropriate meta-expert structure. Furthermore, we apply the results to the problem of online non-stochastic control, i.e., controlling a linear dynamical system with adversarial disturbances and convex loss functions. We derive a novel gradient-based controller with dynamic policy regret guarantees, which is the first controller competitive with a sequence of changing policies.
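
For orientation, the quantities named in the abstract can be written out as follows. This is a sketch in assumed notation, not taken from this page: m is the memory length, x_t the player's decision, f_t a convex loss over the last m+1 decisions, v_1, ..., v_T the changing comparators, and s_t, u_t, d_t the state, control input, and adversarial disturbance of a linear system with matrices A and B. The exact definitions in the paper may differ in indexing or norm.

```latex
% Assumed notation; requires amsmath.
\begin{align*}
  % Dynamic policy regret: learner vs. a changing comparator sequence
  \text{D-Regret}_T(v_1,\dots,v_T)
    &= \sum_{t=m+1}^{T} f_t(x_{t-m},\dots,x_t)
     - \sum_{t=m+1}^{T} f_t(v_{t-m},\dots,v_t), \\
  % Switching cost: cumulative movement of the player's decisions
  \text{Switching cost}
    &= \sum_{t=2}^{T} \lVert x_t - x_{t-1} \rVert_2, \\
  % Linear dynamical system with adversarial disturbance d_t
  s_{t+1} &= A s_t + B u_t + d_t .
\end{align*}
```

Taking a fixed comparator, v_1 = ... = v_T, recovers the usual static policy regret as a special case.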

Available Versions

arXiv.org e-Print Archive

oai:arXiv.org:2102.03758

Last time updated on 02/03/2021