Smoothing Method for Approximate Extensive-Form Perfect Equilibrium
Nash equilibrium is a popular solution concept for solving
imperfect-information games in practice. However, it has a major drawback: it
does not preclude suboptimal play in branches of the game tree that are not
reached in equilibrium. Equilibrium refinements can mend this issue, but have
experienced little practical adoption. This is largely due to a lack of
scalable algorithms.
Sparse iterative methods, in particular first-order methods, are known to be
among the most effective algorithms for computing Nash equilibria in
large-scale two-player zero-sum extensive-form games. In this paper, we
provide, to our knowledge, the first extension of these methods to equilibrium
refinements. We develop a smoothing approach for behavioral perturbations of
the convex polytope that encompasses the strategy spaces of players in an
extensive-form game. This enables one to compute an approximate variant of
extensive-form perfect equilibria. Experiments show that our smoothing approach
leads to solutions with dramatically stronger strategies at information sets
that are reached with low probability in approximate Nash equilibria, while
retaining the overall convergence rate associated with fast algorithms for Nash
equilibrium. This has benefits both in approximate equilibrium finding (such
approximation is necessary in practice in large games) where some probabilities
are low while possibly heading toward zero in the limit, and exact equilibrium
computation where the low probabilities are actually zero. Comment: Published at IJCAI 1
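As a concrete illustration of behavioral perturbations, the sketch below forces every action probability at an information set to be at least eps and redistributes the remaining mass proportionally. This is a simplified stand-in (a proportional redistribution, not the paper's smoothing scheme, and not an exact Euclidean projection); the function name and example are hypothetical.

```python
import numpy as np

def perturb_behavioral(p, eps):
    """Map a probability vector onto the eps-perturbed simplex
    {q : q_i >= eps, sum_i q_i = 1} by clamping and proportionally
    redistributing the leftover mass. Hypothetical helper; not the
    paper's actual smoothing construction."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    assert n * eps <= 1.0, "eps too large for this simplex"
    residual = np.clip(p - eps, 0.0, None)
    if residual.sum() == 0.0:          # fully degenerate input
        residual = np.ones(n)
    return eps + residual / residual.sum() * (1.0 - n * eps)

# A near-deterministic strategy keeps mass eps on every action,
# so no branch of the game tree is played with probability zero.
q = perturb_behavioral([0.999, 0.001, 0.0], 0.05)
```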
Model and Reinforcement Learning for Markov Games with Risk Preferences
We motivate and propose a new model for non-cooperative Markov games that
considers the interactions of risk-aware players. This model characterizes the
time-consistent dynamic "risk" from both stochastic state transitions (inherent
to the game) and randomized mixed strategies (due to all other players). An
appropriate risk-aware equilibrium concept is proposed and the existence of
such equilibria is demonstrated in stationary strategies by an application of
Kakutani's fixed point theorem. We further propose a simulation-based
Q-learning type algorithm for risk-aware equilibrium computation. This
algorithm works with a special form of minimax risk measures which can
naturally be written as saddle-point stochastic optimization problems, and
covers many widely investigated risk measures. Finally, the almost sure
convergence of this simulation-based algorithm to an equilibrium is
demonstrated under some mild conditions. Our numerical experiments on a
two-player queuing game validate the properties of our model and algorithm, and
demonstrate their worth and applicability in real-life competitive
decision-making. Comment: 38 pages, 6 tables, 5 figures
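A minimal sketch of a simulation-based Q-learning loop for a two-player zero-sum Markov game is shown below. It illustrates only the risk-neutral skeleton: the stage-game value is approximated by a maximin over pure strategies rather than the paper's saddle-point minimax risk measure, and the toy dynamics are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tabular Q-values for a tiny two-player zero-sum Markov game:
# Q[s, a, b] is player 1's value when actions (a, b) are played in state s.
n_states, n_a, n_b = 2, 2, 2
Q = np.zeros((n_states, n_a, n_b))

def step(s, a, b):
    """Hypothetical toy dynamics: player 1 is rewarded for matching."""
    r = 1.0 if a == b else -1.0
    return r, int(rng.integers(n_states))

gamma, alpha = 0.9, 0.1
s = 0
for _ in range(2000):
    a, b = int(rng.integers(n_a)), int(rng.integers(n_b))
    r, s2 = step(s, a, b)
    # Pure-strategy maximin as a stand-in for the mixed saddle-point value.
    stage_value = np.max(np.min(Q[s2], axis=1))
    Q[s, a, b] += alpha * (r + gamma * stage_value - Q[s, a, b])
    s = s2
```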
Private monitoring with infinite histories
This paper develops new recursive methods for studying stationary sequential equilibria in games with private monitoring. We first consider games where play has occurred forever into the past and develop methods for analyzing a large class of stationary strategies, where the main restriction is that the strategy can be represented as a finite automaton. For a subset of this class, strategies which depend only on the players’ signals in the last k periods, these methods allow the construction of all pure strategy equilibria. We then show that each sequential equilibrium in a game with infinite histories defines a correlated equilibrium for a game with a start date and derive simple necessary and sufficient conditions for determining if an arbitrary correlation device yields a correlated equilibrium. This allows, for games with a start date, the construction of all pure strategy sequential equilibria in this subclass.
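The k-period-memory strategies described above can be represented as finite automata whose states are k-tuples of recent private signals. A minimal sketch, with a hypothetical action table:

```python
from collections import deque

class KMemoryStrategy:
    """Pure strategy that conditions only on the last k private signals,
    i.e. a finite automaton whose states are signal k-tuples. A sketch of
    the strategy class in the abstract; the action table is hypothetical."""
    def __init__(self, k, action_table, initial_signals):
        self.memory = deque(initial_signals, maxlen=k)
        self.action_table = action_table  # maps k-tuple of signals -> action

    def act(self):
        return self.action_table[tuple(self.memory)]

    def observe(self, signal):
        self.memory.append(signal)  # oldest signal is forgotten

# 1-period memory: cooperate after a 'good' signal, defect after 'bad'.
table = {('good',): 'C', ('bad',): 'D'}
s = KMemoryStrategy(1, table, ['good'])
s.observe('bad')
```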
An Approximate Subgame-Perfect Equilibrium Computation Technique for Repeated Games
This paper presents a technique for approximating, up to any precision, the
set of subgame-perfect equilibria (SPE) in discounted repeated games. The
process starts with a single hypercube approximation of the set of SPE. Then
the initial hypercube is gradually partitioned into a set of smaller adjacent
hypercubes, while hypercubes that cannot contain any point belonging to the set
of SPE are simultaneously discarded.
Whether a given hypercube can contain an equilibrium point is verified by an
appropriate mathematical program. Three different formulations of the algorithm
for both approximately computing the set of SPE payoffs and extracting players'
strategies are then proposed: the first two do not assume any external
coordination between players, while the third assumes a certain level of
coordination during game play for convexifying the set of continuation payoffs
after any repeated-game history.
Special attention is paid to the question of extracting players' strategies
and their representability in the form of finite automata, an important feature
for artificial agent systems. Comment: 26 pages, 13 figures, 1 table
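The partition-and-discard loop can be sketched as follows; the feasibility test here is a simple geometric stand-in for the mathematical program the paper uses to decide whether a hypercube can contain an SPE point.

```python
import itertools

def refine(cubes, can_contain, depth):
    """Gradually partition hypercubes (given as (lo, hi) coordinate lists),
    discarding those certified not to contain any target point.
    `can_contain` must be conservative: it may keep false positives but
    never discards a cube holding a true point."""
    for _ in range(depth):
        next_cubes = []
        for lo, hi in cubes:
            mid = [(l + h) / 2 for l, h in zip(lo, hi)]
            # Split into 2^d children, one per corner selection.
            for corner in itertools.product((0, 1), repeat=len(lo)):
                c_lo = [m if c else l for c, l, m in zip(corner, lo, mid)]
                c_hi = [h if c else m for c, h, m in zip(corner, hi, mid)]
                if can_contain(c_lo, c_hi):
                    next_cubes.append((c_lo, c_hi))
        cubes = next_cubes
    return cubes

# Toy target set: the single point (0.5, 0.5) inside the unit square.
contains_target = lambda lo, hi: all(l <= 0.5 <= h for l, h in zip(lo, hi))
kept = refine([([0.0, 0.0], [1.0, 1.0])], contains_target, 3)
```

After three refinement levels only the four side-1/8 cubes touching the target survive; in the paper the surviving cubes over-approximate the SPE payoff set to the desired precision.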
On the Existence of Pure Strategy Nash Equilibria in Integer-Splittable Weighted Congestion Games
We study the existence of pure strategy Nash equilibria (PSNE) in integer-splittable weighted congestion games (ISWCGs), where agents can strategically assign different amounts of demand to different resources, but must distribute this demand in fixed-size parts. Such scenarios arise in a wide range of application domains, including job scheduling and network routing, where agents have to allocate multiple tasks and can assign a number of tasks to a particular selected resource. Specifically, in an ISWCG, an agent has a certain total demand (aka weight) that it needs to satisfy, and can do so by requesting one or more integer units of each resource from an element of a given collection of feasible subsets. Each resource is associated with a unit-cost function of its level of congestion; as such, the cost to an agent for using a particular resource is the product of the resource unit-cost and the number of units the agent requests.

While general ISWCGs do not admit PSNE (Rosenthal, 1973b), the restricted subclass of these games with linear unit-cost functions has been shown to possess a potential function (Meyers, 2006), and hence PSNE. However, the linearity of costs may not be necessary for the existence of equilibria in pure strategies. Thus, in this paper we prove that PSNE always exist for a larger class of convex and monotonically increasing unit-costs. On the other hand, our result is accompanied by a limiting assumption on the structure of agents' strategy sets: specifically, each agent is associated with its set of accessible resources, and can distribute its demand across any subset of these resources. Importantly, we show that neither monotonicity nor convexity on its own guarantees this result. Moreover, we give a counterexample with monotone and semi-convex cost functions, thus distinguishing ISWCGs from the class of infinitely-splittable congestion games for which the conditions of monotonicity and semi-convexity have been shown to be sufficient for PSNE existence (Rosen, 1965). Furthermore, we demonstrate that the finite improvement path property (FIP) does not hold for convex increasing ISWCGs. Thus, in contrast to the case with linear costs, a potential function argument cannot be used to prove our result. Instead, we provide a procedure that converges to an equilibrium from an arbitrary initial strategy profile, and in doing so show that ISWCGs with convex increasing unit-cost functions are weakly acyclic.
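To make the model concrete, the sketch below runs best-response dynamics in a toy two-agent, two-resource ISWCG with the convex increasing unit cost c(x) = x^2. Every name here is illustrative, and convergence in this small instance is not a substitute for the paper's weak-acyclicity argument.

```python
def unit_cost(load):
    # Convex, increasing unit-cost function (illustrative choice).
    return load * load

def splits(weight, n_resources):
    """Enumerate all ways to split an integer weight across resources."""
    if n_resources == 1:
        yield (weight,)
        return
    for x in range(weight + 1):
        for rest in splits(weight - x, n_resources - 1):
            yield (x,) + rest

def best_response(i, profile, weights):
    """Cheapest integer split for agent i against the others' loads."""
    n_res = len(profile[i])
    others = [sum(profile[j][r] for j in range(len(profile)) if j != i)
              for r in range(n_res)]
    def cost(alloc):
        # Cost on each resource: units requested times the unit cost
        # evaluated at the total congestion level.
        return sum(x * unit_cost(o + x) for x, o in zip(alloc, others))
    return min(splits(weights[i], n_res), key=cost)

weights = [2, 3]
profile = [(2, 0), (3, 0)]          # everyone piles onto resource 0
for _ in range(10):                  # an improvement path; converges here
    for i in range(len(weights)):
        profile[i] = best_response(i, profile, weights)
```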
Open-ended Learning in Symmetric Zero-sum Games
Zero-sum games such as chess and poker are, abstractly, functions that
evaluate pairs of agents, for example labeling them `winner' and `loser'. If
the game is approximately transitive, then self-play generates sequences of
agents of increasing strength. However, nontransitive games, such as
rock-paper-scissors, can exhibit strategic cycles, and there is no longer a
clear objective -- we want agents to increase in strength, but against whom is
unclear. In this paper, we introduce a geometric framework for formulating
agent objectives in zero-sum games, in order to construct adaptive sequences of
objectives that yield open-ended learning. The framework allows us to reason
about population performance in nontransitive games, and enables the
development of a new algorithm (rectified Nash response, PSRO_rN) that uses
game-theoretic niching to construct diverse populations of effective agents,
producing a stronger set of agents than existing algorithms. We apply PSRO_rN
to two highly nontransitive resource allocation games and find that it
consistently outperforms the existing alternatives. Comment: ICML 2019, final version
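The rectified-Nash idea can be illustrated on rock-paper-scissors: compute an approximate Nash distribution over the current population, then weight each opponent by its Nash mass while keeping only positive payoffs, so agents amplify strengths rather than patch weaknesses. The fictitious-play solver below is a stand-in for the exact solver used in practice, and the helper names are hypothetical.

```python
import numpy as np

def fictitious_play_nash(A, iters=2000):
    """Approximate Nash of the symmetric zero-sum game with antisymmetric
    payoff matrix A, via fictitious play (sketch; exact LP in practice)."""
    n = len(A)
    counts = np.ones(n)
    for _ in range(iters):
        p = counts / counts.sum()
        counts[np.argmax(A @ p)] += 1  # best response to empirical play
    return counts / counts.sum()

def rectified_objective(A, p):
    """Weight each opponent j by its Nash mass p_j, counting only the
    games the row agent already wins (payoffs clipped at zero)."""
    return np.maximum(A, 0.0) @ p

# Rock-paper-scissors: a fully cyclic game, uniform Nash distribution.
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])
p = fictitious_play_nash(A)
obj = rectified_objective(A, p)
```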