Online Convex Optimization for Sequential Decision Processes and Extensive-Form Games
Regret minimization is a powerful tool for solving large-scale extensive-form
games. State-of-the-art methods rely on minimizing regret locally at each
decision point. In this work we derive a new framework for regret minimization
on sequential decision problems and extensive-form games with general compact
convex sets at each decision point and general convex losses, as opposed to
prior work, which has been restricted to simplex decision points and linear losses. We
call our framework laminar regret decomposition. It generalizes the CFR
algorithm to this more general setting. Furthermore, our framework enables a
new proof of CFR even in the known setting, which is derived from a perspective
of decomposing polytope regret, thereby leading to an arguably simpler
interpretation of the algorithm. Our generalization to convex compact sets and
convex losses allows us to develop new algorithms for several problems:
regularized sequential decision making, regularized Nash equilibria in
extensive-form games, and computing approximate extensive-form perfect
equilibria. Our generalization also leads to the first regret-minimization
algorithm for computing reduced-normal-form quantal response equilibria based
on minimizing local regrets. Experiments show that our framework leads to
algorithms that scale at a rate comparable to the fastest variants of
counterfactual regret minimization for computing Nash equilibrium, and
therefore our approach leads to the first algorithm for computing quantal
response equilibria in extremely large games. Finally, we show that our
framework enables a new kind of scalable opponent exploitation approach.
Competing With Strategies
We study the problem of online learning with a notion of regret defined with
respect to a set of strategies. We develop tools for analyzing the minimax
rates and for deriving regret-minimization algorithms in this scenario. While
the standard methods for minimizing the usual notion of regret fail, through
our analysis we demonstrate the existence of regret-minimization methods that
compete with such sets of strategies as: autoregressive algorithms, strategies
based on statistical models, regularized least squares, and follow the
regularized leader strategies. In several cases we also derive efficient
learning algorithms.
On the Impossibility of Regret Minimization in Repeated Games
Regret-minimizing strategies for repeated games have been receiving increasing attention in the literature. These are simple adaptive behavior rules that exhibit nice convergence properties. If all players follow regret-minimizing strategies, their average joint play converges to the set of correlated equilibria or to the Hannan set (depending on the notion of regret in use), or even to Nash equilibrium on certain classes of games. In this note we raise the question of the validity of the regret minimization objective. By example we show that regret minimization can lead to unrealistic behavior, since it fails to take into account the effect of one's actions on the subsequent behavior of the opponents. An amended notion of regret that corrects this defect is not very useful either, since achieving a no-regret objective is not guaranteed in that case.
Keywords: Repeated games, Regret minimization, No-regret strategy
Iterated Regret Minimization in Game Graphs
Iterated regret minimization was recently introduced by J.Y. Halpern and
R. Pass for classical strategic games. For many games of interest, this new
solution concept provides solutions that are judged more reasonable than
those offered by traditional game concepts such as Nash equilibrium.
Although computing iterated regret on explicit matrix games is conceptually and
computationally easy, nothing is known about computing iterated regret on
games whose matrices are defined implicitly using game trees, game DAGs or, more
generally, game graphs. In this paper, we investigate iterated regret
minimization for infinite-duration two-player quantitative non-zero-sum games
played on graphs.
We consider reachability objectives that are not necessarily antagonistic.
Edges are weighted by integers, one for each player, and the payoffs are
defined by the sum of the weights along the paths. Depending on the class of
graphs, we give either polynomial or pseudo-polynomial time algorithms to
compute a strategy that minimizes the regret for a fixed player. We finally
give algorithms to compute the strategies of the two players that minimize the
iterated regret for trees, and for graphs with strictly positive weights only.
Comment: 19 pages. Bug in introductory example fixed
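For the explicit matrix-game case that the abstract above calls computationally easy, a single round of regret minimization for one fixed player can be sketched as follows. This assumes pure strategies and a payoff matrix `payoff[i][j]` for the row player (row action `i`, column action `j`). It does not cover the iterated deletion step or the graph-game algorithms of the paper, and the function names are hypothetical.

```python
def action_regrets(payoff):
    """Regret of each row action: the worst-case gap to the best response.

    For row i, regret = max over columns j of (max_k payoff[k][j] - payoff[i][j]).
    """
    n_rows = len(payoff)
    n_cols = len(payoff[0])
    # Best achievable payoff for the row player against each column action.
    best = [max(payoff[i][j] for i in range(n_rows)) for j in range(n_cols)]
    return [
        max(best[j] - payoff[i][j] for j in range(n_cols))
        for i in range(n_rows)
    ]


def regret_minimizing_actions(payoff):
    """Indices of the row actions with the smallest regret."""
    regrets = action_regrets(payoff)
    smallest = min(regrets)
    return [i for i, r in enumerate(regrets) if r == smallest]


if __name__ == "__main__":
    example = [[3, 0], [2, 2]]  # illustrative payoffs, not taken from the paper
    print(action_regrets(example))
    print(regret_minimizing_actions(example))
```

The cost is linear in the size of the matrix, which is why the difficulty the abstract points to lies in games where the matrix is only given implicitly by a tree, DAG, or graph.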