583 research outputs found
On the Impossibility of Regret Minimization in Repeated Games
Regret minimizing strategies for repeated games have been receiving increasing attention in the literature. These are simple adaptive behavior rules that exhibit nice convergence properties. If all players follow regret minimizing strategies, their average joint play converges to the set of correlated equilibria or to the Hannan set (depending on the notion of regret in use), or even to Nash equilibrium on certain classes of games. In this note we raise the question of validity of the regret minimization objective. By example we show that regret minimization can lead to unrealistic behavior, since it fails to take into account the effect of one's actions on subsequent behavior of the opponents. An amended notion of regret that corrects this defect is not very useful either, since achieving a no-regret objective is not guaranteed in that case.
Keywords: Repeated games, Regret minimization, No-regret strategy
Optimization, Learning, and Games with Predictable Sequences
We provide several applications of Optimistic Mirror Descent, an online
learning algorithm based on the idea of predictable sequences. First, we
recover the Mirror Prox algorithm for offline optimization, prove an extension
to Hölder-smooth functions, and apply the results to saddle-point type
problems. Next, we prove that a version of Optimistic Mirror Descent (which has
a close relation to the Exponential Weights algorithm) can be used by two
strongly-uncoupled players in a finite zero-sum matrix game to converge to the
minimax equilibrium at the rate of O((log T)/T). This addresses a question of
Daskalakis et al. (2011). Further, we consider a partial information version of
the problem. We then apply the results to convex programming and exhibit a
simple algorithm for the approximate Max Flow problem.
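To make the zero-sum result above concrete, here is a minimal sketch of two players running optimistic exponential weights (often called optimistic Hedge) against each other on a small matrix game. It is an illustration under stated assumptions, not the algorithm analyzed in the paper: the function names, the fixed step size `eta`, the number of rounds, and the 2x2 payoff matrix are all hypothetical choices, and no claim is made that this simplified variant attains the O((log T)/T) rate. Each player uses the most recent loss vector as the prediction of the next one.

```python
import math


def _exp_weights(scores, eta):
    """Return the distribution proportional to exp(-eta * score)."""
    low = min(scores)  # shift so every exponent is <= 0 (numerical stability)
    weights = [math.exp(-eta * (s - low)) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


def optimistic_hedge_selfplay(payoff, rounds=5000, eta=0.1):
    """Self-play on a zero-sum matrix game (hypothetical helper).

    payoff[i][j] is the loss of the row player and the gain of the
    column player. Returns the time-averaged mixed strategies.
    """
    n, m = len(payoff), len(payoff[0])
    cum_x, cum_y = [0.0] * n, [0.0] * m    # cumulative observed losses
    pred_x, pred_y = [0.0] * n, [0.0] * m  # predictions: last observed losses
    avg_x, avg_y = [0.0] * n, [0.0] * m
    for _ in range(rounds):
        # Optimistic step: play against cumulative loss plus the prediction.
        x = _exp_weights([c + p for c, p in zip(cum_x, pred_x)], eta)
        y = _exp_weights([c + p for c, p in zip(cum_y, pred_y)], eta)
        # Each player only sees its own loss vector for this round.
        loss_x = [sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n)]
        loss_y = [-sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m)]
        cum_x = [c + l for c, l in zip(cum_x, loss_x)]
        cum_y = [c + l for c, l in zip(cum_y, loss_y)]
        pred_x, pred_y = loss_x, loss_y
        avg_x = [a + v / rounds for a, v in zip(avg_x, x)]
        avg_y = [a + v / rounds for a, v in zip(avg_y, y)]
    return avg_x, avg_y


def duality_gap(payoff, x, y):
    """Best-response gap of (x, y); it is zero exactly at a minimax equilibrium."""
    n, m = len(payoff), len(payoff[0])
    col_best = max(sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m))
    row_best = min(sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n))
    return col_best - row_best


if __name__ == "__main__":
    game = [[2.0, -1.0], [-1.0, 1.0]]  # illustrative game, not from the paper
    x_bar, y_bar = optimistic_hedge_selfplay(game)
    print(x_bar, y_bar, duality_gap(game, x_bar, y_bar))
```

For the illustrative game used here, the unique minimax equilibrium has both players putting probability 2/5 on their first action, so the averaged strategies can be checked against that value and the duality gap should be close to zero.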
Online learning with graph-structured feedback against adaptive adversaries
We derive upper and lower bounds for the policy regret of T-round online
learning problems with graph-structured feedback, where the adversary is
nonoblivious but assumed to have a bounded memory. We obtain separate upper
bounds for strongly-observable and weakly-observable graphs, based on
analyzing a variant of the
Exp3 algorithm. When the adversary is allowed a bounded memory of size 1, we
show that a matching lower bound is achieved in
the case of full-information feedback. We also study the particular loss
structure of an oblivious adversary with switching costs, and show that in such
a setting, non-revealing strongly-observable feedback graphs achieve a lower
bound of the same order as well.
Comment: This paper has been accepted to ISIT 201