On fixed convex combinations of no-regret learners

Abstract

No-regret algorithms for online convex optimization are potent online learning tools and have been demonstrated to be successful in a wide-ranging number of applications. Considering affine and external regret, we investigate what happens when a set of no-regret learners (voters) merge their respective decisions in each learning iteration to a single, common one in form of a convex combination. We show that an agent (or algorithm) that executes this merged decision in each iteration of the online learning process and each time feeds back a copy of its own reward function to the voters, incurs sublinear regret itself. As a by-product, we obtain a simple method that allows us to construct new no-regret algorithms out of known ones. © 2009 Springer Berlin Heidelberg

    Similar works