No Internal Regret via Neighborhood Watch
We present an algorithm which attains O(\sqrt{T}) internal (and thus
external) regret for finite games with partial monitoring under the local
observability condition. Bartok, Pal, and Szepesvari (2011) recently showed that
this condition implies the O(\sqrt{T}) rate for partial monitoring games against
an i.i.d. opponent, and conjectured that the same holds for non-stochastic
adversaries. We answer this conjecture in the affirmative, which completes the
characterization of possible rates for finite partial-monitoring games, an open
question stated by Cesa-Bianchi, Lugosi, and Stoltz (2006). Our
regret guarantees also hold for the more general model of partial monitoring
with random signals.
Online Learning with Feedback Graphs: Beyond Bandits
We study a general class of online learning problems where the feedback is
specified by a graph. This class includes online prediction with expert advice
and the multi-armed bandit problem, but also several learning problems where
the online player does not necessarily observe his own loss. We analyze how the
structure of the feedback graph controls the inherent difficulty of the induced
T-round learning problem. Specifically, we show that any feedback graph
belongs to one of three classes: strongly observable graphs, weakly observable
graphs, and unobservable graphs. We prove that the first class induces learning
problems with \tilde{\Theta}(\alpha^{1/2} T^{1/2}) minimax regret, where \alpha
is the independence number of the underlying graph; the second class
induces problems with \tilde{\Theta}(\delta^{1/3} T^{2/3}) minimax regret,
where \delta is the domination number of a certain portion of the graph; and
the third class induces problems with linear minimax regret. Our results
subsume much of the previous work on learning with feedback graphs and reveal
new connections to partial monitoring games. We also show how the regret is
affected if the graphs are allowed to vary with time.
Tit-For-Tat Equilibria in Discounted Repeated Games with Private Monitoring
We investigate infinitely repeated games with imperfect private monitoring. We focus on a class of games where the payoff functions are additively separable and the signal for monitoring a player's action does not depend on the other player's action. Tit-for-tat strategies, under which each player's action in each period depends only on the signal for the opponent's action one period before, perform very well in this class. With almost perfect monitoring, we show that even when the discount factors are fixed at low values, efficiency is approximated by a tit-for-tat Nash equilibrium payoff vector.