    Online learning with graph-structured feedback against adaptive adversaries

    We derive upper and lower bounds for the policy regret of $T$-round online learning problems with graph-structured feedback, where the adversary is nonoblivious but assumed to have a bounded memory. We obtain upper bounds of $\widetilde O(T^{2/3})$ and $\widetilde O(T^{3/4})$ for strongly-observable and weakly-observable graphs, respectively, based on analyzing a variant of the Exp3 algorithm. When the adversary is allowed a bounded memory of size 1, we show that a matching lower bound of $\widetilde\Omega(T^{2/3})$ is achieved in the case of full-information feedback. We also study the particular loss structure of an oblivious adversary with switching costs, and show that in such a setting, non-revealing strongly-observable feedback graphs achieve a lower bound of $\widetilde\Omega(T^{2/3})$ as well.
    Comment: This paper has been accepted to ISIT 201
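The abstract refers to a variant of Exp3 under graph-structured feedback but does not spell the variant out. As a rough illustration only, the following is a minimal sketch of generic exponential weights with graph feedback: playing an action reveals the losses of its out-neighbours, and each revealed loss is importance-weighted by the probability of observing it. The function name `exp3_graph`, the parameters `eta` and `gamma`, and the `observes` representation are all assumptions made for this example; this is not the paper's algorithm and it does not handle the bounded-memory adversary.

```python
import math
import random


def exp3_graph(losses, observes, eta, gamma=0.0, rng=None):
    """Exponential weights with graph feedback (illustrative sketch).

    losses:   list of T rounds, each a list of K losses in [0, 1]
    observes: observes[i] is the set of actions whose loss is revealed
              when action i is played (hypothetical representation)
    eta:      learning rate; gamma: uniform exploration rate
    Returns the total loss incurred by the learner.
    """
    rng = rng or random.Random(0)
    num_actions = len(observes)
    log_w = [0.0] * num_actions  # log-weights, for numerical stability
    total_loss = 0.0

    for loss_t in losses:
        # Turn log-weights into a sampling distribution with exploration.
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        s = sum(w)
        p = [(1.0 - gamma) * wi / s + gamma / num_actions for wi in w]

        # Sample and play an action.
        a = rng.choices(range(num_actions), weights=p)[0]
        total_loss += loss_t[a]

        # Update every action j whose loss was revealed by playing a.
        for j in observes[a]:
            # Probability that j's loss is observed this round.
            q_j = sum(p[i] for i in range(num_actions) if j in observes[i])
            log_w[j] -= eta * loss_t[j] / q_j

    return total_loss
```

With `observes[i] = {0, ..., K-1}` for every `i`, this reduces to full-information exponential weights; with `observes[i] = {i}` it reduces to standard bandit Exp3.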