No Internal Regret via Neighborhood Watch
We present an algorithm which attains O(\sqrt{T}) internal (and thus
external) regret for finite games with partial monitoring under the local
observability condition. Recently, this condition has been shown by (Bartok,
Pal, and Szepesvari, 2011) to imply the O(\sqrt{T}) rate for partial monitoring
games against an i.i.d. opponent, and the authors conjectured that the same
holds for non-stochastic adversaries. Our result is in the affirmative, and it
completes the characterization of possible rates for finite partial-monitoring
games, an open question stated by (Cesa-Bianchi, Lugosi, and Stoltz, 2006). Our
regret guarantees also hold for the more general model of partial monitoring
with random signals.
Online learning with graph-structured feedback against adaptive adversaries
We derive upper and lower bounds for the policy regret of T-round online
learning problems with graph-structured feedback, where the adversary is
nonoblivious but assumed to have a bounded memory. We obtain upper bounds of
\widetilde{O}(T^{2/3}) and \widetilde{O}(T^{3/4}) for strongly-observable and
weakly-observable graphs, respectively, based on analyzing a variant of the
Exp3 algorithm. When the adversary is allowed a bounded memory of size 1, we
show that a matching lower bound of \widetilde{\Omega}(T^{2/3}) is achieved in
the case of full-information feedback. We also study the particular loss
structure of an oblivious adversary with switching costs, and show that in such
a setting, non-revealing strongly-observable feedback graphs achieve a lower
bound of \widetilde{\Omega}(T^{2/3}), as well. Comment: This paper has been accepted to ISIT 201
Editors' Introduction to [Algorithmic Learning Theory: 21st International Conference, ALT 2010, Canberra, Australia, October 6-8, 2010. Proceedings]
Learning theory is an active research area that incorporates ideas,
problems, and techniques from a wide range of disciplines including
statistics, artificial intelligence, information theory, pattern
recognition, and theoretical computer science. The research reported
at the 21st International Conference on Algorithmic Learning Theory
(ALT 2010) ranges over areas such as query models, online learning,
inductive inference, boosting, kernel methods, complexity and
learning, reinforcement learning, unsupervised learning, grammatical
inference, and algorithmic forecasting. In this introduction we give
an overview of the five invited talks and the regular contributions
of ALT 2010.
Online Learning with Feedback Graphs: Beyond Bandits
We study a general class of online learning problems where the feedback is
specified by a graph. This class includes online prediction with expert advice
and the multi-armed bandit problem, but also several learning problems where
the online player does not necessarily observe his own loss. We analyze how the
structure of the feedback graph controls the inherent difficulty of the induced
T-round learning problem. Specifically, we show that any feedback graph
belongs to one of three classes: strongly observable graphs, weakly observable
graphs, and unobservable graphs. We prove that the first class induces learning
problems with \widetilde{\Theta}(\alpha^{1/2} T^{1/2}) minimax regret, where \alpha
is the independence number of the underlying graph; the second class
induces problems with \widetilde{\Theta}(\delta^{1/3} T^{2/3}) minimax regret,
where \delta is the domination number of a certain portion of the graph; and
the third class induces problems with linear minimax regret. Our results
subsume much of the previous work on learning with feedback graphs and reveal
new connections to partial monitoring games. We also show how the regret is
affected if the graphs are allowed to vary with time.
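
The three-way classification of feedback graphs can be illustrated with a small sketch. The abstract does not state the definitions, so the ones used here are an assumption based on the usual formulation: a vertex is observable if at least one action (possibly itself) reveals its loss, and strongly observable if it has a self-loop or is revealed by every other action. The function name and the adjacency-dictionary encoding are hypothetical, chosen only for illustration.

```python
from typing import Dict, Set


def classify_feedback_graph(out_edges: Dict[int, Set[int]]) -> str:
    """Classify a directed feedback graph.

    out_edges[i] is the set of actions whose loss is revealed when
    action i is played (include i itself for a self-loop).
    Definitions are assumed, not taken from the abstract.
    """
    vertices = set(out_edges)
    # in_nbrs[j]: actions that reveal the loss of j when played.
    in_nbrs = {j: {i for i in vertices if j in out_edges[i]} for j in vertices}

    # Some action is never revealed by anything: unobservable graph.
    if any(not in_nbrs[j] for j in vertices):
        return "unobservable"

    def strongly_observable(j: int) -> bool:
        # Self-loop, or revealed by every other action.
        return j in in_nbrs[j] or (vertices - {j}) <= in_nbrs[j]

    if all(strongly_observable(j) for j in vertices):
        return "strongly observable"
    return "weakly observable"


# Multi-armed bandit: each action reveals only its own loss.
bandit = {0: {0}, 1: {1}, 2: {2}}
# One revealing action: action 0 shows every loss, the others show nothing.
revealing = {0: {0, 1, 2}, 1: set(), 2: set()}
# Action 1 is never revealed by any action.
hidden = {0: {0}, 1: set()}

print(classify_feedback_graph(bandit))
print(classify_feedback_graph(revealing))
print(classify_feedback_graph(hidden))
```

Under these assumed definitions, the bandit graph falls in the first class, the single revealing action gives a graph in the second class, and the graph with a never-revealed action falls in the third.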