14 research outputs found
Online learning with graph-structured feedback against adaptive adversaries
We derive upper and lower bounds for the policy regret of -round online
learning problems with graph-structured feedback, where the adversary is
nonoblivious but assumed to have a bounded memory. We obtain upper bounds of
and for strongly-observable and
weakly-observable graphs, respectively, based on analyzing a variant of the
Exp3 algorithm. When the adversary is allowed a bounded memory of size 1, we
show that a matching lower bound of is achieved in
the case of full-information feedback. We also study the particular loss
structure of an oblivious adversary with switching costs, and show that in such
a setting, non-revealing strongly-observable feedback graphs achieve a lower
bound of , as well.Comment: This paper has been accepted to ISIT 201