114,937 research outputs found
Online Learning with Feedback Graphs: Beyond Bandits
We study a general class of online learning problems where the feedback is
specified by a graph. This class includes online prediction with expert advice
and the multi-armed bandit problem, but also several learning problems where
the online player does not necessarily observe his own loss. We analyze how the
structure of the feedback graph controls the inherent difficulty of the induced
$T$-round learning problem. Specifically, we show that any feedback graph
belongs to one of three classes: strongly observable graphs, weakly observable
graphs, and unobservable graphs. We prove that the first class induces learning
problems with minimax regret $\widetilde{\Theta}(\sqrt{\alpha T})$, where $\alpha$
is the independence number of the underlying graph; the second class
induces problems with minimax regret $\widetilde{\Theta}(\delta^{1/3} T^{2/3})$,
where $\delta$ is the domination number of a certain portion of the graph; and
the third class induces problems with linear minimax regret. Our results
subsume much of the previous work on learning with feedback graphs and reveal
new connections to partial monitoring games. We also show how the regret is
affected if the graphs are allowed to vary with time.
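The three-way classification described above can be sketched in code. The following is a minimal illustration, assuming the standard definitions: a vertex is observable if it has at least one in-neighbor (possibly itself via a self-loop), strongly observable if it has a self-loop or is observed by every other vertex, and the graph inherits the weakest label among its vertices. The function name and graph representation are chosen here for illustration.

```python
def classify_feedback_graph(out_edges):
    """Classify a directed feedback graph (sketch).

    out_edges maps each vertex to the set of vertices whose losses are
    revealed when that vertex is played; a self-loop means an action
    reveals its own loss.
    """
    vertices = set(out_edges)
    # In-neighborhood of each vertex: the actions that reveal it.
    in_nbrs = {v: {u for u in vertices if v in out_edges[u]} for v in vertices}

    # A vertex with no in-neighbor can never be observed.
    if not all(in_nbrs[v] for v in vertices):
        return "unobservable"

    # Strongly observable vertex: has a self-loop, or is revealed by
    # every other vertex.
    strongly = all(
        v in out_edges[v] or in_nbrs[v] >= (vertices - {v})
        for v in vertices
    )
    return "strongly observable" if strongly else "weakly observable"
```

Under these definitions, both the experts setting (every action reveals all losses) and the bandit setting (only self-loops) come out strongly observable, matching the abstract's claim that both are special cases of the first class.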
On the Minimax Regret for Online Learning with Feedback Graphs
In this work, we improve on the upper and lower bounds for the regret of
online learning with strongly observable undirected feedback graphs. The best
known upper bound for this problem is $O(\sqrt{\alpha T \ln K})$, where $K$ is the number of actions, $\alpha$ is the independence
number of the graph, and $T$ is the time horizon. The $\sqrt{\ln K}$ factor is
known to be necessary when $\alpha = 1$ (the experts case). On the other hand,
when $\alpha = K$ (the bandits case), the minimax rate is known to be
$\Theta(\sqrt{KT})$, and an $\Omega(\sqrt{\alpha T})$ lower bound is known to hold for any $\alpha$. Our improved upper bound
$O\bigl(\sqrt{\alpha T (1 + \ln(K/\alpha))}\bigr)$ holds for any $\alpha$
and matches the lower bounds for bandits and experts, while
interpolating intermediate cases. To prove this result, we use FTRL with
$q$-Tsallis entropy for a carefully chosen value of $q$ that
varies with $\alpha$. The analysis of this algorithm requires a new bound on
the variance term in the regret. We also show how to extend our techniques to
time-varying graphs, without requiring prior knowledge of their independence
numbers. Our upper bound is complemented by an improved
$\Omega\bigl(\sqrt{\alpha T (\ln K)/\ln \alpha}\bigr)$ lower bound for all
$\alpha > 1$, whose analysis relies on a novel reduction to multitask learning.
This shows that a logarithmic factor is necessary as soon as $\alpha < K$.
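The core computational step in FTRL with a $q$-Tsallis entropy regularizer is solving for the probability vector over actions at each round. The sketch below, with an illustrative fixed $q = 1/2$ and learning rate (the paper's actual schedule for $q$ as a function of $\alpha$ is not reproduced here), solves the first-order conditions by bisecting on the Lagrange multiplier of the simplex constraint:

```python
def tsallis_ftrl_weights(cum_losses, q=0.5, eta=0.1):
    """FTRL weights with the q-Tsallis entropy regularizer (sketch).

    Solves  min_x  eta * <L, x> + (1 - sum_i x_i^q) / (1 - q)
    over the probability simplex. The first-order conditions give
        x_i = ( q / ((1 - q) * (eta * L_i + lam)) )^(1 / (1 - q)),
    with lam chosen so the weights sum to 1 (found by bisection,
    since the sum is decreasing in lam).
    """
    assert 0.0 < q < 1.0
    L = cum_losses

    def weights(lam):
        return [(q / ((1.0 - q) * (eta * l + lam))) ** (1.0 / (1.0 - q))
                for l in L]

    # Bracket the root: weights blow up as lam approaches -eta*min(L).
    lo = -eta * min(L) + 1e-12
    hi = lo + 1.0
    while sum(weights(hi)) > 1.0:
        hi = lo + 2 * (hi - lo)   # double the bracket width
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(weights(mid)) > 1.0:
            lo = mid
        else:
            hi = mid
    return weights(0.5 * (lo + hi))
```

In the graph-feedback setting, the cumulative losses fed to this routine would be importance-weighted estimates built from the losses observed along the feedback graph's edges; that estimation step, and the paper's variance analysis, are omitted from this sketch.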
Stochastic Online Learning with Probabilistic Graph Feedback
We consider a problem of stochastic online learning with general
probabilistic graph feedback, where each directed edge $(i,j)$ in the feedback graph
has probability $p_{ij}$. Two cases are covered. (a) The one-step case, where
after playing arm $i$ the learner observes a sample reward feedback of arm $j$
with independent probability $p_{ij}$. (b) The cascade case, where after playing
arm $i$ the learner observes feedback of all arms in a probabilistic
cascade starting from $i$ -- for each edge $(j,k)$ with probability $p_{jk}$, if arm $j$
is played or observed, then a reward sample of arm $k$ would be observed
with independent probability $p_{jk}$. Previous works mainly focus on
deterministic graphs, which correspond to the one-step case with $p_{ij} \in \{0, 1\}$, an adversarial sequence of graphs with certain topology guarantees,
or a specific type of random graphs. We analyze the asymptotic lower bounds and
design algorithms in both cases. The regret upper bounds of the algorithms
match the lower bounds with high probability.
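The cascade feedback model described in case (b) can be simulated directly: observation spreads from the played arm along the graph's edges, each edge firing independently with its probability. A minimal sketch, with a hypothetical function name and an edge-probability dictionary as the graph representation:

```python
import random

def cascade_observations(played, edge_probs, rng=random):
    """Simulate one round of probabilistic cascade feedback (sketch).

    edge_probs maps a directed edge (j, k) to its probability p_jk.
    Whenever an arm j is played or observed, each out-edge (j, k)
    independently reveals a reward sample of arm k with probability
    p_jk. Returns the set of arms whose rewards are observed.
    """
    observed = {played}
    frontier = [played]
    while frontier:
        j = frontier.pop()
        for (u, k), p in edge_probs.items():
            if u == j and k not in observed and rng.random() < p:
                observed.add(k)
                frontier.append(k)
    return observed
```

The one-step case (a) is the restriction of this process to the out-edges of the played arm only, and setting every $p_{jk} \in \{0, 1\}$ recovers deterministic graph feedback, as noted in the abstract.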