31 research outputs found

    Simplifying optimal strategies in limsup and liminf stochastic games

    We consider two-player zero-sum stochastic games with the limsup and with the liminf payoffs. For the limsup payoff, we prove that the existence of an optimal strategy implies the existence of a stationary optimal strategy. Our construction does not require the knowledge of an optimal strategy, only its existence. The main technique of the proof is to analyze the game with specific restricted action spaces. For the liminf payoff, we prove that the existence of a subgame-optimal strategy (i.e., a strategy that is optimal in every subgame) implies the existence of a subgame-optimal strategy under which the prescribed mixed actions only depend on the current state and on the state and the actions chosen at the previous period. In particular, such a strategy requires only finite memory. The proof relies on techniques that originate in gambling theory. © 2018 Elsevier B.V. All rights reserved.

    Analysis of Hannan Consistent Selection for Monte Carlo Tree Search in Simultaneous Move Games

    Hannan consistency, or no external regret, is a key concept for learning in games. An action selection algorithm is Hannan consistent (HC) if its performance is eventually as good as selecting the best fixed action in hindsight. If both players in a zero-sum normal form game use a Hannan consistent algorithm, their average behavior converges to a Nash equilibrium (NE) of the game. A similar result is known about extensive form games, but the played strategies need to be Hannan consistent with respect to the counterfactual values, which are often difficult to obtain. We study zero-sum extensive form games with simultaneous moves, but otherwise perfect information. These games generalize normal form games and they are a special case of extensive form games. We study whether applying HC algorithms in each decision point of these games directly to the observed payoffs leads to convergence to a Nash equilibrium. This learning process corresponds to a class of Monte Carlo Tree Search algorithms, which are popular for playing simultaneous-move games but do not have any known performance guarantees. We show that using HC algorithms directly on the observed payoffs is not sufficient to guarantee convergence. With an additional averaging over joint actions, convergence is guaranteed, but empirically slower. We further define an additional property of HC algorithms, which is sufficient to guarantee convergence without the averaging, and we empirically show that commonly used HC algorithms have this property. Comment: arXiv admin note: substantial text overlap with arXiv:1509.0014
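
    The normal form claim in this abstract (two Hannan consistent learners in a zero-sum game have average behavior that converges to a Nash equilibrium) can be illustrated with a minimal sketch. It is not the paper's tree search setting or its algorithms. It assumes regret matching as the Hannan consistent rule, run in a full-information variant where each player sees the expected payoff of every action against the opponent's current mixed strategy. The payoff matrix `PAYOFF` and the names `self_play` and `exploitability` are hypothetical and chosen only for this example.

```python
# Illustrative 2x2 zero-sum game (an assumption, not taken from the paper).
# Entries are payoffs to the row player; the column player receives the negative.
PAYOFF = [[2.0, -1.0],
          [-1.0, 1.0]]


def regret_matching_strategy(cum_regret):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret: fall back to the uniform strategy.
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


def self_play(payoff, iterations):
    """Run regret matching for both players; return their average strategies."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_regret, col_regret = [0.0] * n_rows, [0.0] * n_cols
    row_sum, col_sum = [0.0] * n_rows, [0.0] * n_cols
    for _ in range(iterations):
        x = regret_matching_strategy(row_regret)
        y = regret_matching_strategy(col_regret)
        # Expected payoff of each pure action against the opponent's current mix.
        row_u = [sum(payoff[i][j] * y[j] for j in range(n_cols))
                 for i in range(n_rows)]
        col_u = [-sum(payoff[i][j] * x[i] for i in range(n_rows))
                 for j in range(n_cols)]
        row_value = sum(x[i] * row_u[i] for i in range(n_rows))
        col_value = sum(y[j] * col_u[j] for j in range(n_cols))
        # Accumulate regret for not having played each pure action.
        for i in range(n_rows):
            row_regret[i] += row_u[i] - row_value
            row_sum[i] += x[i]
        for j in range(n_cols):
            col_regret[j] += col_u[j] - col_value
            col_sum[j] += y[j]
    return ([s / iterations for s in row_sum],
            [s / iterations for s in col_sum])


def exploitability(payoff, x, y):
    """Gap between the two players' best-response payoffs; zero at equilibrium."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[i][j] * y[j] for j in range(n_cols))
                   for i in range(n_rows))
    best_col = min(sum(payoff[i][j] * x[i] for i in range(n_rows))
                   for j in range(n_cols))
    return best_row - best_col


if __name__ == "__main__":
    avg_x, avg_y = self_play(PAYOFF, 50000)
    print(avg_x, avg_y, exploitability(PAYOFF, avg_x, avg_y))
```

    In this example game the unique equilibrium has both players choosing their first action with probability 2/5, so the averaged strategies should approach that point while the exploitability shrinks toward zero. The sketch works from expected payoffs rather than sampled ones, which is a simplification of the observed-payoff setting the abstract studies.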

    Characterization and simplification of optimal strategies in positive stochastic games

    We consider positive zero-sum stochastic games with countable state and action spaces. For each player, we provide a characterization of those strategies that are optimal in every subgame. These characterizations are used to prove two simplification results. We show that if player 2 has an optimal strategy, then he/she also has a stationary optimal strategy, and we prove the same for player 1 under the assumption that the state space and player 2's action space are finite.

    Optimal pricing in a free market wireless network

    We consider an ad-hoc wireless network operating within a free market economic model. Users send data over a choice of paths, and scheduling and routing decisions are updated dynamically based on time-varying channel conditions, user mobility, and current network prices charged by intermediate nodes. Each node sets its own price for relaying services, with the goal of earning revenue that exceeds its time-average reception and transmission expenses. We first develop a greedy pricing strategy that maximizes social welfare while ensuring all participants make non-negative profit. We then construct a (non-greedy) policy that balances profits more evenly by optimizing a profit fairness metric. Both algorithms operate in a distributed manner and do not require knowledge of traffic rates or channel statistics. This work demonstrates that individuals can benefit from carrying wireless devices even if they are not interested in their own personal communication.

    Absorption paths and equilibria in quitting games

    We study quitting games and introduce an alternative notion of strategy profiles: absorption paths. An absorption path is parametrized by the total probability of absorption in past play rather than by time, and it accommodates both discrete-time aspects and continuous-time aspects. We then define the concept of sequentially 0-perfect absorption paths, which are shown to be limits of ε-equilibrium strategy profiles as ε goes to 0. We establish that all quitting games that do not have simple equilibria (that is, an equilibrium where the game terminates in the first period or one where the game never terminates) have a sequentially 0-perfect absorption path. Finally, we prove the existence of sequentially 0-perfect absorption paths in a new class of quitting games.

    A Simple and General Axiomatization of Average Utility Maximization for Infinite Streams

    This paper provides, first, the most general preference axiomatization of average utility (AU) maximization over infinite sequences presently available, reaching almost complete generality (only restriction: all periodic sequences should be contained in the domain). Here, infinite sequences may designate intertemporal outcome streams where AU models patience, or welfare allocations where AU models fairness, or decision under ambiguity where AU models complete ignorance. Second, as a methodological contribution, this paper shows that infinite-dimensional representations can be simpler, rather than more complex, than finite-dimensional ones: infinite dimensions provide a richness that is convenient rather than cumbersome. In particular, (empirically problematic) continuity assumptions are not needed. Continuity is optional.

    A Bayesian Model of Voting in Juries

    We take a game-theoretic approach to the analysis of juries by modelling voting as a game of incomplete information. Rather than the usual assumption of two possible signals (one indicating guilt, the other innocence), we allow jurors to perceive a full spectrum of signals. Given any voting rule requiring a fixed fraction of votes to convict, we characterize the unique symmetric equilibrium of the game, and we consider the possibility of asymmetric equilibria: we give a condition under which no asymmetric equilibria exist and show that, without it, asymmetric equilibria may exist. We offer a condition under which unanimity rule exhibits a bias toward convicting the innocent, regardless of the size of the jury, and we exhibit an example showing this bias can be reversed. Finally, we prove a "jury theorem" for our general model: as the size of the jury increases, the probability of a mistaken judgment goes to zero for every voting rule, except unanimity rule; for unanimity rule, we give a condition under which the probability of a mistake is bounded strictly above zero, and we show that, without this condition, the probability of a mistake may go to zero.