On the Minimax Regret for Online Learning with Feedback Graphs
In this work, we improve on the upper and lower bounds for the regret of
online learning with strongly observable undirected feedback graphs. The best
known upper bound for this problem is $O\bigl(\sqrt{\alpha T \ln K}\bigr)$, where $K$ is the
number of actions, $\alpha$ is the independence number of the graph, and $T$ is the
time horizon. The $\sqrt{\ln K}$ factor is known to be necessary when $\alpha = 1$ (the
experts case). On the other hand, when $\alpha = K$ (the bandits case), the minimax
rate is known to be $\Theta\bigl(\sqrt{KT}\bigr)$, and a lower bound of $\Omega\bigl(\sqrt{\alpha T}\bigr)$ is known to
hold for any $\alpha$. Our improved upper bound $O\bigl(\sqrt{\alpha T\,(1 + \ln(K/\alpha))}\bigr)$
holds for any $\alpha$ and matches the lower bounds for bandits and experts, while
interpolating intermediate cases. To prove this result, we use FTRL with
$q$-Tsallis entropy for a carefully chosen value of $q \in [1/2, 1)$ that
varies with $\alpha$. The analysis of this algorithm requires a new bound on
the variance term in the regret. We also show how to extend our techniques to
time-varying graphs, without requiring prior knowledge of their independence
numbers. Our upper bound is complemented by an improved
$\Omega\bigl(\sqrt{\alpha T (\ln K)/\ln \alpha}\bigr)$ lower bound for all $\alpha > 1$, whose analysis
relies on a novel reduction to multitask learning. This shows that a
logarithmic factor is necessary as soon as $\alpha < K$.
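
As a quick endpoint check (not part of the original abstract), the interpolating bound $O\bigl(\sqrt{\alpha T\,(1+\ln(K/\alpha))}\bigr)$ recovers the two known rates:
$$
\alpha = 1:\quad \sqrt{T\,(1+\ln(K/1))} = O\!\bigl(\sqrt{T\ln K}\bigr),
\qquad
\alpha = K:\quad \sqrt{KT\,(1+\ln(K/K))} = O\!\bigl(\sqrt{KT}\bigr).
$$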
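
For intuition, here is a minimal Python sketch (not from the paper) of one FTRL step with the $q$-Tsallis regularizer. Solving the FTRL optimization over the simplex gives weights of the form $p_i \propto \bigl(\tfrac{1-q}{q}(\eta L_i + \lambda)\bigr)^{-1/(1-q)}$, with the normalizing $\lambda$ found by one-dimensional bisection. The function name, the bisection routine, and the toy loop are illustrative assumptions; the paper's actual algorithm additionally builds graph-based importance-weighted loss estimates and ties $q$ to $\alpha$, both omitted here.

```python
import numpy as np

def tsallis_ftrl_distribution(cum_loss, eta, q, tol=1e-10):
    """FTRL weights under the q-Tsallis entropy regularizer, 0 < q < 1:
    p_i = [((1-q)/q) * (eta * L_i + lam)]^{-1/(1-q)},
    where lam is chosen by bisection so the weights sum to 1.
    For q = 1/2 this reduces to the familiar Tsallis-INF form
    p_i = (eta * L_i + lam)^{-2}."""
    c = (1.0 - q) / q

    def total(lam):
        # Sum of weights; strictly decreasing in lam.
        return np.sum((c * (eta * cum_loss + lam)) ** (-1.0 / (1.0 - q)))

    # Positivity requires lam > -eta * min(L); at that edge the sum blows up,
    # so the normalizing lam lies to its right.
    lo = -eta * np.min(cum_loss) + 1e-12
    hi = lo + 1.0
    while total(hi) > 1.0:          # grow the bracket until the sum dips below 1
        hi = lo + 2.0 * (hi - lo)
    while hi - lo > tol:            # bisect for the normalizing lam
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    p = (c * (eta * cum_loss + lam)) ** (-1.0 / (1.0 - q))
    return p / p.sum()              # guard against residual bisection error

# Toy demo (illustrative only): full-information losses stand in for the
# graph-based loss estimates used by the actual algorithm.
rng = np.random.default_rng(0)
K, eta, q = 10, 0.1, 0.7            # in the paper, q is tuned as a function of alpha
L = np.zeros(K)
for t in range(1000):
    p = tsallis_ftrl_distribution(L, eta, q)
    arm = rng.choice(K, p=p)        # play an action drawn from the FTRL weights
    L += rng.random(K)              # accumulate (toy) losses
```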