
    Smoothing Method for Approximate Extensive-Form Perfect Equilibrium

    Nash equilibrium is a popular solution concept for solving imperfect-information games in practice. However, it has a major drawback: it does not preclude suboptimal play in branches of the game tree that are not reached in equilibrium. Equilibrium refinements can mend this issue, but have experienced little practical adoption. This is largely due to a lack of scalable algorithms. Sparse iterative methods, in particular first-order methods, are known to be among the most effective algorithms for computing Nash equilibria in large-scale two-player zero-sum extensive-form games. In this paper, we provide, to our knowledge, the first extension of these methods to equilibrium refinements. We develop a smoothing approach for behavioral perturbations of the convex polytope that encompasses the strategy spaces of players in an extensive-form game. This enables one to compute an approximate variant of extensive-form perfect equilibria. Experiments show that our smoothing approach leads to solutions with dramatically stronger strategies at information sets that are reached with low probability in approximate Nash equilibria, while retaining the overall convergence rate associated with fast algorithms for Nash equilibrium. This has benefits both in approximate equilibrium finding (such approximation is necessary in practice in large games), where some probabilities are low while possibly heading toward zero in the limit, and in exact equilibrium computation, where the low probabilities are actually zero. Comment: Published at IJCAI 1

    Model and Reinforcement Learning for Markov Games with Risk Preferences

    We motivate and propose a new model for non-cooperative Markov games that considers the interactions of risk-aware players. This model characterizes the time-consistent dynamic "risk" from both stochastic state transitions (inherent to the game) and randomized mixed strategies (due to all other players). An appropriate risk-aware equilibrium concept is proposed, and the existence of such equilibria in stationary strategies is demonstrated by an application of Kakutani's fixed point theorem. We further propose a simulation-based Q-learning type algorithm for risk-aware equilibrium computation. This algorithm works with a special form of minimax risk measures, which can naturally be written as saddle-point stochastic optimization problems and which covers many widely investigated risk measures. Finally, the almost sure convergence of this simulation-based algorithm to an equilibrium is demonstrated under some mild conditions. Our numerical experiments on a two-player queuing game validate the properties of our model and algorithm, and demonstrate their worth and applicability in real-life competitive decision-making. Comment: 38 pages, 6 tables, 5 figures

    Private monitoring with infinite histories

    This paper develops new recursive methods for studying stationary sequential equilibria in games with private monitoring. We first consider games where play has occurred forever into the past and develop methods for analyzing a large class of stationary strategies, where the main restriction is that the strategy can be represented as a finite automaton. For a subset of this class, strategies which depend only on the players’ signals in the last k periods, these methods allow the construction of all pure strategy equilibria. We then show that each sequential equilibrium in a game with infinite histories defines a correlated equilibrium for a game with a start date and derive simple necessary and sufficient conditions for determining if an arbitrary correlation device yields a correlated equilibrium. This allows, for games with a start date, the construction of all pure strategy sequential equilibria in this subclass.

    An Approximate Subgame-Perfect Equilibrium Computation Technique for Repeated Games

    This paper presents a technique for approximating, up to any precision, the set of subgame-perfect equilibria (SPE) in discounted repeated games. The process starts with a single hypercube approximation of the set of SPE. The initial hypercube is then gradually partitioned into a set of smaller adjacent hypercubes, while those hypercubes that cannot contain any point belonging to the set of SPE are simultaneously withdrawn. Whether a given hypercube can contain an equilibrium point is verified by an appropriate mathematical program. Three different formulations of the algorithm, for both approximately computing the set of SPE payoffs and extracting players' strategies, are then proposed: the first two do not assume the presence of external coordination between players, and the third assumes a certain level of coordination during game play for convexifying the set of continuation payoffs after any repeated game history. Special attention is paid to the question of extracting players' strategies and their representability in the form of finite automata, an important feature for artificial agent systems. Comment: 26 pages, 13 figures, 1 table
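
    The partition-and-withdraw loop described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the names `split`, `refine`, and `touches_unit_disk` are hypothetical, and the `may_contain` predicate is only a stand-in for the mathematical program that the paper uses to decide whether a hypercube can contain an SPE payoff. Here a toy geometric test (does the cube touch the unit disk?) plays that role.

```python
from itertools import product


def split(cube):
    """Split a hypercube (lows, highs) into 2^d equal sub-cubes."""
    lows, highs = cube
    halves = []
    for lo, hi in zip(lows, highs):
        mid = (lo + hi) / 2
        halves.append(((lo, mid), (mid, hi)))
    for choice in product(*halves):
        yield tuple(c[0] for c in choice), tuple(c[1] for c in choice)


def refine(initial, may_contain, rounds):
    """Repeatedly partition the surviving cubes and withdraw the hopeless ones.

    `may_contain(cube)` is an assumed oracle: it must return False only when
    the cube provably contains no point of the target set.
    """
    cubes = [initial]
    for _ in range(rounds):
        cubes = [sub for cube in cubes for sub in split(cube) if may_contain(sub)]
    return cubes


def touches_unit_disk(cube):
    """Toy oracle: does the cube intersect the closed unit disk at the origin?"""
    lows, highs = cube
    # Squared distance from the origin to the nearest point of the cube.
    nearest_sq = sum(min(max(0.0, lo), hi) ** 2 for lo, hi in zip(lows, highs))
    return nearest_sq <= 1.0


start = ((-2.0, -2.0), (2.0, 2.0))
survivors = refine(start, touches_unit_disk, rounds=2)
```

    With each round the surviving cubes shrink and their union tightens around the target set, which is the sense in which such a scheme can approximate a set to any precision, provided the oracle never discards a cube that contains a point of the set.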

    On the Existence of Pure Strategy Nash Equilibria in Integer-Splittable Weighted Congestion Games

    We study the existence of pure strategy Nash equilibria (PSNE) in integer-splittable weighted congestion games (ISWCGs), where agents can strategically assign different amounts of demand to different resources, but must distribute this demand in fixed-size parts. Such scenarios arise in a wide range of application domains, including job scheduling and network routing, where agents have to allocate multiple tasks and can assign a number of tasks to a particular selected resource. Specifically, in an ISWCG, an agent has a certain total demand (aka weight) that it needs to satisfy, and can do so by requesting one or more integer units of each resource from an element of a given collection of feasible subsets. Each resource is associated with a unit-cost function of its level of congestion; as such, the cost to an agent for using a particular resource is the product of the resource unit-cost and the number of units the agent requests. While general ISWCGs do not admit PSNE [(Rosenthal, 1973b)], the restricted subclass of these games with linear unit-cost functions has been shown to possess a potential function [(Meyers, 2006)], and hence, PSNE. However, the linearity of costs may not be necessary for the existence of equilibria in pure strategies. Thus, in this paper we prove that PSNE always exist for a larger class of convex and monotonically increasing unit-costs. On the other hand, our result is accompanied by a limiting assumption on the structure of agents' strategy sets: specifically, each agent is associated with its set of accessible resources, and can distribute its demand across any subset of these resources. Importantly, we show that neither monotonicity nor convexity on its own guarantees this result. Moreover, we give a counterexample with monotone and semi-convex cost functions, thus distinguishing ISWCGs from the class of infinitely-splittable congestion games for which the conditions of monotonicity and semi-convexity have been shown to be sufficient for PSNE existence [(Rosen, 1965)]. Furthermore, we demonstrate that the finite improvement path property (FIP) does not hold for convex increasing ISWCGs. Thus, in contrast to the case with linear costs, a potential function argument cannot be used to prove our result. Instead, we provide a procedure that converges to an equilibrium from an arbitrary initial strategy profile, and in doing so show that ISWCGs with convex increasing unit-cost functions are weakly acyclic.
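
    The cost model in this abstract (an agent pays, per resource, the unit-cost at that resource's congestion times the units it requests) can be made concrete with a brute-force sketch on a tiny instance. This is an illustration only, not the convergence procedure from the paper. The function names, the two-agent instance, and the choice of x² as a convex increasing unit-cost are all assumptions, as is reading "level of congestion" as the total number of units requested on a resource.

```python
from itertools import product


def allocations(demand, resources):
    """Yield every way to split `demand` integer units across `resources`."""
    if len(resources) == 1:
        yield {resources[0]: demand}
        return
    head, tail = resources[0], resources[1:]
    for units in range(demand + 1):
        for rest in allocations(demand - units, tail):
            yield {head: units, **rest}


def agent_cost(agent, profile, unit_cost):
    """Sum over resources of (units requested) * unit_cost(total load)."""
    load = {}
    for alloc in profile:
        for resource, units in alloc.items():
            load[resource] = load.get(resource, 0) + units
    return sum(units * unit_cost(load[resource])
               for resource, units in profile[agent].items())


def pure_nash_equilibria(demands, accessible, unit_cost):
    """Enumerate all profiles and keep those with no profitable deviation."""
    strategy_sets = [list(allocations(d, res))
                     for d, res in zip(demands, accessible)]
    equilibria = []
    for profile in product(*strategy_sets):
        stable = all(
            agent_cost(i, profile, unit_cost)
            <= agent_cost(i, profile[:i] + (alt,) + profile[i + 1:], unit_cost)
            for i, options in enumerate(strategy_sets)
            for alt in options
        )
        if stable:
            equilibria.append(profile)
    return equilibria


# Hypothetical instance: agent 0 needs 2 units from {a, b}, agent 1 needs 1 from {b, c}.
found = pure_nash_equilibria(
    demands=[2, 1],
    accessible=[["a", "b"], ["b", "c"]],
    unit_cost=lambda x: x * x,  # convex and increasing on nonnegative loads
)
```

    Exhaustive enumeration grows exponentially in the number of agents and units, so it is useful only for checking small examples against the existence result stated above.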

    Open-ended Learning in Symmetric Zero-sum Games

    Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them 'winner' and 'loser'. If the game is approximately transitive, then self-play generates sequences of agents of increasing strength. However, nontransitive games, such as rock-paper-scissors, can exhibit strategic cycles, and there is no longer a clear objective: we want agents to increase in strength, but against whom is unclear. In this paper, we introduce a geometric framework for formulating agent objectives in zero-sum games, in order to construct adaptive sequences of objectives that yield open-ended learning. The framework allows us to reason about population performance in nontransitive games, and enables the development of a new algorithm (rectified Nash response, PSRO_rN) that uses game-theoretic niching to construct diverse populations of effective agents, producing a stronger set of agents than existing algorithms. We apply PSRO_rN to two highly nontransitive resource allocation games and find that PSRO_rN consistently outperforms the existing alternatives. Comment: ICML 2019, final version
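
    The strategic cycle mentioned in this abstract is easy to see in a small sketch. The code below is not PSRO_rN; it only treats rock-paper-scissors as a function that evaluates pairs of agents, via an antisymmetric payoff matrix, and checks that no agent is unbeaten while every agent scores zero against the uniform mixture. All names are illustrative assumptions.

```python
from fractions import Fraction

NAMES = ["rock", "paper", "scissors"]

# PAYOFF[i][j] is the payoff to agent i when evaluated against agent j.
PAYOFF = [
    [0, -1, 1],   # rock loses to paper, beats scissors
    [1, 0, -1],   # paper beats rock, loses to scissors
    [-1, 1, 0],   # scissors loses to rock, beats paper
]


def is_antisymmetric(payoff):
    """A symmetric zero-sum evaluation satisfies payoff[i][j] == -payoff[j][i]."""
    n = len(payoff)
    return all(payoff[i][j] == -payoff[j][i] for i in range(n) for j in range(n))


def beaten_by(payoff, i):
    """Indices of the agents that agent i defeats."""
    return [j for j in range(len(payoff)) if payoff[i][j] > 0]


def unbeaten_agents(payoff):
    """Agents that lose to nobody; empty when the game is a pure cycle."""
    n = len(payoff)
    return [i for i in range(n) if all(payoff[i][j] >= 0 for j in range(n))]


def payoff_vs_mixture(payoff, i, mixture):
    """Expected payoff of agent i against a probability mixture of agents."""
    return sum(p * payoff[i][j] for j, p in enumerate(mixture))


uniform = [Fraction(1, 3)] * 3
scores = [payoff_vs_mixture(PAYOFF, i, uniform) for i in range(3)]
```

    Because every agent is beaten by some other agent, "increase in strength" has no single target here, which is the difficulty the abstract attributes to nontransitive games.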