    Online Learning with Feedback Graphs: Beyond Bandits

    We study a general class of online learning problems where the feedback is specified by a graph. This class includes online prediction with expert advice and the multi-armed bandit problem, but also several learning problems where the online player does not necessarily observe his own loss. We analyze how the structure of the feedback graph controls the inherent difficulty of the induced $T$-round learning problem. Specifically, we show that any feedback graph belongs to one of three classes: strongly observable graphs, weakly observable graphs, and unobservable graphs. We prove that the first class induces learning problems with $\widetilde\Theta(\alpha^{1/2} T^{1/2})$ minimax regret, where $\alpha$ is the independence number of the underlying graph; the second class induces problems with $\widetilde\Theta(\delta^{1/3} T^{2/3})$ minimax regret, where $\delta$ is the domination number of a certain portion of the graph; and the third class induces problems with linear minimax regret. Our results subsume much of the previous work on learning with feedback graphs and reveal new connections to partial monitoring games. We also show how the regret is affected if the graphs are allowed to vary with time.
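
    The trichotomy above rests on two structural properties of the feedback graph. As a rough illustration, the sketch below classifies a directed feedback graph using the standard definitions of observability; the abstract does not spell these out, so they are stated here as assumptions: a vertex is observable if it has at least one in-neighbour (possibly itself via a self-loop), and strongly observable if it has a self-loop or is observed by every other vertex. The function name `classify_feedback_graph` is purely illustrative.

```python
# Minimal sketch: classify a directed feedback graph into the three classes
# described in the abstract. The observability definitions below are
# assumptions stated in the lead-in, not quoted from the abstract itself.

def classify_feedback_graph(n_vertices, edges):
    """edges: set of directed pairs (u, v); self-loops allowed."""
    in_neighbours = {v: set() for v in range(n_vertices)}
    for u, v in edges:
        in_neighbours[v].add(u)

    def strongly_observable(v):
        # Self-loop, or observed by every other vertex.
        return v in in_neighbours[v] or \
            all(u in in_neighbours[v] for u in range(n_vertices) if u != v)

    def observable(v):
        return len(in_neighbours[v]) > 0

    if all(strongly_observable(v) for v in range(n_vertices)):
        return "strongly observable"   # ~sqrt(alpha * T) minimax regret
    if all(observable(v) for v in range(n_vertices)):
        return "weakly observable"     # ~T^(2/3) minimax regret
    return "unobservable"              # linear minimax regret

# Toy usage: full feedback (experts) and self-loops only (bandits) are both
# strongly observable.
experts = {(u, v) for u in range(3) for v in range(3)}
bandits = {(v, v) for v in range(3)}
print(classify_feedback_graph(3, experts))  # strongly observable
print(classify_feedback_graph(3, bandits))  # strongly observable
```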

    On the Minimax Regret for Online Learning with Feedback Graphs

    In this work, we improve on the upper and lower bounds for the regret of online learning with strongly observable undirected feedback graphs. The best known upper bound for this problem is $\mathcal{O}\bigl(\sqrt{\alpha T\ln K}\bigr)$, where $K$ is the number of actions, $\alpha$ is the independence number of the graph, and $T$ is the time horizon. The $\sqrt{\ln K}$ factor is known to be necessary when $\alpha = 1$ (the experts case). On the other hand, when $\alpha = K$ (the bandits case), the minimax rate is known to be $\Theta\bigl(\sqrt{KT}\bigr)$, and a lower bound $\Omega\bigl(\sqrt{\alpha T}\bigr)$ is known to hold for any $\alpha$. Our improved upper bound $\mathcal{O}\bigl(\sqrt{\alpha T(1+\ln(K/\alpha))}\bigr)$ holds for any $\alpha$ and matches the lower bounds for bandits and experts, while interpolating intermediate cases. To prove this result, we use FTRL with $q$-Tsallis entropy for a carefully chosen value of $q \in [1/2, 1)$ that varies with $\alpha$. The analysis of this algorithm requires a new bound on the variance term in the regret. We also show how to extend our techniques to time-varying graphs, without requiring prior knowledge of their independence numbers. Our upper bound is complemented by an improved $\Omega\bigl(\sqrt{\alpha T(\ln K)/(\ln\alpha)}\bigr)$ lower bound for all $\alpha > 1$, whose analysis relies on a novel reduction to multitask learning. This shows that a logarithmic factor is necessary as soon as $\alpha < K$.
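
    The algorithmic ingredient named in the abstract is FTRL with a $q$-Tsallis entropy regularizer. Below is a minimal numerical sketch of a single such update, not the authors' exact algorithm: given cumulative loss estimates, it computes the FTRL distribution by binary-searching the Lagrange multiplier of the simplex constraint. The learning rate `eta`, the toy losses, and the particular value of `q` are illustrative assumptions; in the paper, $q$ is chosen carefully as a function of $\alpha$.

```python
# Hedged sketch of one FTRL step with a q-Tsallis entropy regularizer.
import numpy as np

def tsallis_ftrl_distribution(loss_estimates, eta, q):
    """Solve p = argmin_p eta*<p, L> + (1 - sum_i p_i^q) / (1 - q) over the simplex.

    Stationarity gives p_i = ((1 - q) * (eta * L_i + lam) / q) ** (1 / (q - 1))
    for a Lagrange multiplier lam chosen so that the p_i sum to one.
    """
    c = eta * np.asarray(loss_estimates, dtype=float)

    def prob(lam):
        return ((1.0 - q) * (c + lam) / q) ** (1.0 / (q - 1.0))

    lo = -c.min() + 1e-12          # below this, some p_i would be invalid
    hi = lo + 1.0
    while prob(hi).sum() > 1.0:    # grow the upper bracket until total mass < 1
        hi = lo + 2.0 * (hi - lo)
    for _ in range(100):           # bisect on the multiplier
        mid = 0.5 * (lo + hi)
        if prob(mid).sum() > 1.0:
            lo = mid
        else:
            hi = mid
    p = prob(0.5 * (lo + hi))
    return p / p.sum()

# Toy usage: three arms with cumulative loss estimates; q = 1/2 is the
# bandit-like end of the range [1/2, 1) mentioned in the abstract.
print(tsallis_ftrl_distribution([2.0, 1.0, 3.0], eta=0.3, q=0.5))
```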

    Stochastic Online Learning with Probabilistic Graph Feedback

    We consider a problem of stochastic online learning with general probabilistic graph feedback, where each directed edge $(i,j)$ in the feedback graph has probability $p_{ij}$. Two cases are covered. (a) The one-step case: after playing arm $i$, the learner observes a sample reward feedback of arm $j$ with independent probability $p_{ij}$. (b) The cascade case: after playing arm $i$, the learner observes feedback of all arms $j$ in a probabilistic cascade starting from $i$ -- for each edge $(i,j)$, if arm $i$ is played or observed, then a reward sample of arm $j$ is observed with independent probability $p_{ij}$. Previous works mainly focus on deterministic graphs, which correspond to the one-step case with $p_{ij} \in \{0,1\}$, on an adversarial sequence of graphs with certain topology guarantees, or on a specific type of random graphs. We analyze the asymptotic lower bounds and design algorithms in both cases. The regret upper bounds of the algorithms match the lower bounds with high probability.
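
    The difference between the two feedback models is easiest to see in simulation. The sketch below draws one round of feedback under each model on a toy three-arm graph; the graph, the edge probabilities, and the convention that the played arm observes its own reward are illustrative assumptions, not taken from the paper.

```python
# Minimal simulation sketch of the two probabilistic feedback models.
import random

def one_step_feedback(i, p):
    """After playing arm i, arm j's reward is observed w.p. p[i][j], independently."""
    return {j for j, pij in enumerate(p[i]) if random.random() < pij}

def cascade_feedback(i, p):
    """Percolation from i: whenever an arm is played or observed, each outgoing
    edge independently reveals its endpoint with the edge probability."""
    observed, frontier = {i}, [i]   # the played arm is treated as observed here
    while frontier:
        u = frontier.pop()
        for j, puj in enumerate(p[u]):
            if j not in observed and random.random() < puj:
                observed.add(j)
                frontier.append(j)
    return observed

# Toy 3-arm example with a chain of likely edges 0 -> 1 -> 2.
p = [[1.0, 0.9, 0.0],
     [0.0, 1.0, 0.9],
     [0.0, 0.0, 1.0]]
print(one_step_feedback(0, p))   # usually {0, 1}: no chaining in the one-step case
print(cascade_feedback(0, p))    # often {0, 1, 2}: observations propagate
```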