Non-stationary Online Learning with Memory and Non-stochastic Control
We study the problem of Online Convex Optimization (OCO) with memory, which
allows loss functions to depend on past decisions and thus captures temporal
effects of learning problems. In this paper, we introduce dynamic policy regret
as the performance measure to design algorithms robust to non-stationary
environments, which compares the algorithm's decisions with a sequence of changing
comparators. We propose a novel algorithm for OCO with memory that provably
enjoys an optimal dynamic policy regret. The key technical challenge is how to
control the switching cost, the cumulative movement of the player's decisions,
which is neatly addressed by a novel decomposition of dynamic policy regret and
an appropriate meta-expert structure. Furthermore, we apply the results to the
problem of online non-stochastic control, i.e., controlling a linear dynamical
system with adversarial disturbance and convex loss functions. We derive a
novel gradient-based controller with dynamic policy regret guarantees, which is
the first controller competitive with a sequence of changing policies.
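
To make the quantities in the abstract concrete, the following is a minimal sketch and not the paper's algorithm. It runs plain projected online gradient descent on a scalar decision in [-1, 1], evaluates a loss that depends on the last m + 1 decisions, and tracks the switching cost and the dynamic policy regret against a changing comparator sequence. The quadratic loss, the drifting targets, and all function names are assumptions chosen for illustration; the meta-expert structure described above is not implemented.

```python
import math

def project(x, lo=-1.0, hi=1.0):
    """Project a scalar decision onto the feasible interval [lo, hi]."""
    return max(lo, min(hi, x))

def memory_loss(window, target):
    """Illustrative loss with memory: mean squared distance of the
    last m + 1 decisions to the current target (an assumed loss)."""
    return sum((x - target) ** 2 for x in window) / len(window)

def run_ogd_with_memory(targets, m=2, eta=0.2, x0=0.0):
    """Plain projected OGD on the unary loss (x - target)^2, which is the
    memory loss evaluated at a constant window (x, ..., x).
    Returns (cumulative memory loss, switching cost)."""
    window = [x0] * (m + 1)          # past decisions, padded with x0
    total_loss, switching_cost = 0.0, 0.0
    x = x0
    for target in targets:
        window = window[1:] + [x]    # slide the window to include x_t
        total_loss += memory_loss(window, target)
        grad = 2.0 * (x - target)    # gradient of the unary loss at x_t
        x_next = project(x - eta * grad)
        switching_cost += abs(x_next - x)
        x = x_next
    return total_loss, switching_cost

def comparator_loss(comparators, targets, m=2):
    """Cumulative memory loss of a (possibly changing) comparator sequence."""
    window = [comparators[0]] * (m + 1)
    total = 0.0
    for v, target in zip(comparators, targets):
        window = window[1:] + [v]
        total += memory_loss(window, target)
    return total

if __name__ == "__main__":
    T = 200
    # Slowly drifting targets model a non-stationary environment.
    targets = [0.5 * math.sin(2 * math.pi * t / T) for t in range(T)]
    loss, switch = run_ogd_with_memory(targets)
    # Dynamic policy regret against the comparators v_t = target_t.
    regret = loss - comparator_loss(targets, targets)
    print(f"loss={loss:.4f} switching_cost={switch:.4f} regret={regret:.4f}")
```

The gap between the memory loss and the unary loss at the current decision is governed by how far recent decisions have moved, which is why the switching cost appears as a separate term to control.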