26 research outputs found

No Internal Regret via Neighborhood Watch

Authors: Foster Dean; Rakhlin Alexander
Publication date: 30/08/2011

We present an algorithm which attains O(\sqrt{T}) internal (and thus external) regret for finite games with partial monitoring under the local observability condition. Recently, this condition was shown by Bartok, Pal, and Szepesvari (2011) to imply the O(\sqrt{T}) rate for partial monitoring games against an i.i.d. opponent, and the authors conjectured that the same holds for non-stochastic adversaries. Our result answers this conjecture in the affirmative, and it completes the characterization of possible rates for finite partial-monitoring games, an open question stated by Cesa-Bianchi, Lugosi, and Stoltz (2006). Our regret guarantees also hold for the more general model of partial monitoring with random signals.

arXiv.org e-Print Archive

CiteSeerX

ScholarlyCommons@Penn

Online learning with graph-structured feedback against adaptive adversaries

Authors: Feng Zhili; Loh Po-Ling
Publication date: 01/04/2018

We derive upper and lower bounds for the policy regret of T-round online learning problems with graph-structured feedback, where the adversary is nonoblivious but assumed to have a bounded memory. We obtain upper bounds of \widetilde O(T^{2/3}) and \widetilde O(T^{3/4}) for strongly-observable and weakly-observable graphs, respectively, based on analyzing a variant of the Exp3 algorithm. When the adversary is allowed a bounded memory of size 1, we show that a matching lower bound of \widetilde\Omega(T^{2/3}) is achieved in the case of full-information feedback. We also study the particular loss structure of an oblivious adversary with switching costs, and show that in such a setting, non-revealing strongly-observable feedback graphs achieve a lower bound of \widetilde\Omega(T^{2/3}), as well.

Comment: This paper has been accepted to ISIT 201

arXiv.org e-Print Archive

Crossref

Editors' Introduction to [Algorithmic Learning Theory: 21st International Conference, ALT 2010, Canberra, Australia, October 6-8, 2010. Proceedings]

Authors: Hutter Marcus; Stephan Frank; Vovk Vladimir; Zeugmann Thomas
Publication venue: Springer Science and Business Media LLC
Publication date: 01/10/2010

Learning theory is an active research area that incorporates ideas, problems, and techniques from a wide range of disciplines including statistics, artificial intelligence, information theory, pattern recognition, and theoretical computer science. The research reported at the 21st International Conference on Algorithmic Learning Theory (ALT 2010) ranges over areas such as query models, online learning, inductive inference, boosting, kernel methods, complexity and learning, reinforcement learning, unsupervised learning, grammatical inference, and algorithmic forecasting. In this introduction we give an overview of the five invited talks and the regular contributions of ALT 2010.

The Australian National University

Online Learning with Feedback Graphs: Beyond Bandits

Authors: Alon Noga; Cesa-Bianchi Nicolò; Dekel Ofer; Koren Tomer
Publication date: 01/01/2015

We study a general class of online learning problems where the feedback is specified by a graph. This class includes online prediction with expert advice and the multi-armed bandit problem, but also several learning problems where the online player does not necessarily observe his own loss. We analyze how the structure of the feedback graph controls the inherent difficulty of the induced T-round learning problem. Specifically, we show that any feedback graph belongs to one of three classes: strongly observable graphs, weakly observable graphs, and unobservable graphs. We prove that the first class induces learning problems with \widetilde\Theta(\alpha^{1/2} T^{1/2}) minimax regret, where \alpha is the independence number of the underlying graph; the second class induces problems with \widetilde\Theta(\delta^{1/3}T^{2/3}) minimax regret, where \delta is the domination number of a certain portion of the graph; and the third class induces problems with linear minimax regret. Our results subsume much of the previous work on learning with feedback graphs and reveal new connections to partial monitoring games. We also show how the regret is affected if the graphs are allowed to vary with time.
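
The abstract above names three classes of feedback graph but does not define them. As a rough illustration only, the sketch below uses the definitions commonly given for this setting, which are an assumption here and not part of the record: a vertex is observable if it has at least one in-neighbor, and strongly observable if it has a self-loop or is observed by every other vertex. The function name `classify_feedback_graph` and the `in_neighbors` representation are hypothetical.

```python
from typing import Dict, Set


def classify_feedback_graph(in_neighbors: Dict[int, Set[int]]) -> str:
    """Classify a directed feedback graph (illustrative sketch).

    in_neighbors[v] is the set of actions u whose play reveals the loss
    of action v (an edge u -> v); it may contain v itself (a self-loop).
    The definitions used here are assumed, not taken from the abstract.
    """
    vertices = set(in_neighbors)
    all_strong = True
    for v, preds in in_neighbors.items():
        if not preds:
            # No action ever reveals the loss of v.
            return "unobservable"
        has_self_loop = v in preds
        seen_by_all_others = (vertices - {v}) <= preds
        if not (has_self_loop or seen_by_all_others):
            all_strong = False
    return "strongly observable" if all_strong else "weakly observable"


# Bandit feedback: each action reveals only its own loss.
bandit = {0: {0}, 1: {1}, 2: {2}}
# A single revealing action 0 that shows every loss; actions 1 and 2 show nothing.
revealing = {0: {0}, 1: {0}, 2: {0}}
# Loopless clique: each action reveals all losses except its own.
loopless_clique = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
# Action 1 is never observed.
hidden = {0: {0}, 1: set()}

for name, g in [("bandit", bandit), ("revealing", revealing),
                ("loopless clique", loopless_clique), ("hidden", hidden)]:
    print(name, "->", classify_feedback_graph(g))
```

The loopless clique is an instance of the case the abstract mentions, where the player does not observe his own loss, yet under the assumed definition it still falls in the strongly observable class.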

arXiv.org e-Print Archive

AIR Universita degli studi di Milano