Simplifying optimal strategies in limsup and liminf stochastic games
We consider two-player zero-sum stochastic games with the limsup and with the liminf payoffs. For the limsup payoff, we prove that the existence of an optimal strategy implies the existence of a stationary optimal strategy. Our construction does not require the knowledge of an optimal strategy, only its existence. The main technique of the proof is to analyze the game with specific restricted action spaces. For the liminf payoff, we prove that the existence of a subgame-optimal strategy (i.e., a strategy that is optimal in every subgame) implies the existence of a subgame-optimal strategy under which the prescribed mixed actions only depend on the current state and on the state and the actions chosen at the previous period. In particular, such a strategy requires only finite memory. The proof relies on techniques that originate in gambling theory. © 2018 Elsevier B.V. All rights reserved.
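A minimal Python sketch of the two kinds of strategy mentioned above, assuming states and actions are encoded as hashable values and a history is a list of past (state, action of player 1, action of player 2) triples. The helpers `stationary` and `one_step_memory` are hypothetical; the sketch only shows what information each kind of strategy is allowed to read, not how the paper constructs an optimal one.

```python
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable
MixedAction = Dict[Action, float]              # action -> probability
History = List[Tuple[State, Action, Action]]   # past (state, a1, a2) triples
Strategy = Callable[[History, State], MixedAction]


def stationary(table: Dict[State, MixedAction]) -> Strategy:
    """Mixed action depends only on the current state; the history is ignored."""
    def strategy(history: History, state: State) -> MixedAction:
        return table[state]
    return strategy


def one_step_memory(
    initial: Dict[State, MixedAction],
    table: Dict[Tuple[State, Action, Action, State], MixedAction],
) -> Strategy:
    """Mixed action depends on the current state and on the previous period's
    state and action pair; everything earlier in the history is ignored."""
    def strategy(history: History, state: State) -> MixedAction:
        if not history:
            # First period: there is no previous state or action pair yet.
            return initial[state]
        prev_state, prev_a1, prev_a2 = history[-1]
        return table[(prev_state, prev_a1, prev_a2, state)]
    return strategy
```

Since `one_step_memory` keeps one table entry per (previous state, previous action pair, current state), it needs only finite memory whenever the state and action spaces are finite.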
Analysis of Hannan Consistent Selection for Monte Carlo Tree Search in Simultaneous Move Games
Hannan consistency, or no external regret, is a key concept for learning in games. An action selection algorithm is Hannan consistent (HC) if its performance is eventually as good as selecting the best fixed action in hindsight. If both players in a zero-sum normal form game use a Hannan consistent algorithm, their average behavior converges to a Nash equilibrium (NE) of the game. A similar result is known about extensive form games, but the played strategies need to be Hannan consistent with respect to the counterfactual values, which are often difficult to obtain. We study zero-sum extensive form games with simultaneous moves, but otherwise perfect information. These games generalize normal form games and are a special case of extensive form games. We study whether applying HC algorithms at each decision point of these games directly to the observed payoffs leads to convergence to a Nash equilibrium. This learning process corresponds to a class of Monte Carlo Tree Search algorithms, which are popular for playing simultaneous-move games but have no known performance guarantees. We show that using HC algorithms directly on the observed payoffs is not sufficient to guarantee convergence. With additional averaging over joint actions, convergence is guaranteed, but empirically slower. We further define an additional property of HC algorithms that is sufficient to guarantee convergence without the averaging, and we empirically show that commonly used HC algorithms have this property.
Comment: arXiv admin note: substantial text overlap with arXiv:1509.0014
Characterization and simplification of optimal strategies in positive stochastic games
We consider positive zero-sum stochastic games with countable state and action spaces. For each player, we provide a characterization of those strategies that are optimal in every subgame. These characterizations are used to prove two simplification results. We show that if player 2 has an optimal strategy then he/she also has a stationary optimal strategy, and prove the same for player 1 under the assumption that the state space and player 2's action space are finite.
Optimal pricing in a free market wireless network
We consider an ad-hoc wireless network operating within a free market economic model. Users send data over a choice of paths, and scheduling and routing decisions are updated dynamically based on time-varying channel conditions, user mobility, and current network prices charged by intermediate nodes. Each node sets its own price for relaying services, with the goal of earning revenue that exceeds its time-average reception and transmission expenses. We first develop a greedy pricing strategy that maximizes social welfare while ensuring all participants make non-negative profit. We then construct a (non-greedy) policy that balances profits more evenly by optimizing a profit fairness metric. Both algorithms operate in a distributed manner and do not require knowledge of traffic rates or channel statistics. This work demonstrates that individuals can benefit from carrying wireless devices even if they are not interested in their own personal communication.
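As a much-simplified, static illustration of price-based routing (not the dynamic pricing or routing algorithm described above), the sketch below assumes a user simply picks the path whose intermediate relays charge the lowest total price. The graph, the prices, and the `cheapest_path` helper are hypothetical. Because relay prices are non-negative, Dijkstra's algorithm applies, with the cost of entering a node equal to its relay price and no charge for reaching the destination.

```python
import heapq
from typing import Dict, List, Optional, Tuple


def cheapest_path(
    adj: Dict[str, List[str]],
    price: Dict[str, float],
    src: str,
    dst: str,
) -> Tuple[float, Optional[List[str]]]:
    """Path from src to dst minimizing the sum of intermediate relay prices."""
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break  # Dijkstra: dst's cost is final once it is popped
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v in adj.get(u, []):
            # Entering a relay costs its price; the destination charges nothing.
            nd = d + (0.0 if v == dst else price[v])
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return float("inf"), None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return dist[dst], path[::-1]
```

For example, with relays `A` (price 3) and `B` (price 1) between `S` and `D`, the sketch routes through `B` at a total price of 1.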
Absorption paths and equilibria in quitting games
We study quitting games and introduce an alternative notion of strategy profiles: absorption paths. An absorption path is parametrized by the total probability of absorption in past play rather than by time, and it accommodates both discrete-time aspects and continuous-time aspects. We then define the concept of sequentially 0-perfect absorption paths, which are shown to be limits of ε-equilibrium strategy profiles as ε goes to 0. We establish that all quitting games that do not have simple equilibria (that is, an equilibrium where the game terminates in the first period or one where the game never terminates) have a sequentially 0-perfect absorption path. Finally, we prove the existence of sequentially 0-perfect absorption paths in a new class of quitting games.
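A sketch of the quantity used to parametrize an absorption path, assuming the standard quitting-game setup in which each player independently quits with some probability in each period and play is absorbed as soon as at least one player quits. The helper `absorption_profile` is hypothetical; it maps each period to the total probability of absorption by the end of that period, which is the "clock" that replaces calendar time.

```python
from typing import List, Sequence


def absorption_profile(quit_probs: Sequence[Sequence[float]]) -> List[float]:
    """quit_probs[t][i] is the probability that player i quits in period t.

    Returns, for each period t, the probability that the game has been
    absorbed (someone has quit) by the end of period t.
    """
    survive = 1.0  # probability that nobody has quit so far
    cumulative: List[float] = []
    for period in quit_probs:
        nobody_quits = 1.0
        for q in period:
            nobody_quits *= 1.0 - q  # players randomize independently
        survive *= nobody_quits
        cumulative.append(1.0 - survive)
    return cumulative
```

For instance, `absorption_profile([[0.5, 0.5], [0.5, 0.5]])` gives `[0.75, 0.9375]`, since the game survives each period with probability 0.25. Two profiles that reach the same absorption levels at different speeds look different in time but traverse the same range of this parameter.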
A Simple and General Axiomatization of Average Utility Maximization for Infinite Streams
This paper provides, first, the most general preference axiomatization of average utility (AU) maximization over infinite sequences presently available, reaching almost complete generality (only restriction: all periodic sequences should be contained in the domain). Here, infinite sequences may designate intertemporal outcome streams where AU models patience, or welfare allocations where AU models fairness, or decision under ambiguity where AU models complete ignorance. Second, as a methodological contribution, this paper shows that infinite-dimensional representations can be simpler, rather than more complex, than finite-dimensional ones: infinite dimensions provide a richness that is convenient rather than cumbersome. In particular, (empirically problematic) continuity assumptions are not needed. Continuity is optional.
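For a periodic stream, the long-run average of utilities exists and equals the mean utility over one period. A small sketch, assuming the stream is already given as per-period utilities u(x_t) and using exact fractions; the helpers `periodic_average_utility` and `prefix_average` are hypothetical.

```python
from fractions import Fraction
from typing import Sequence


def periodic_average_utility(period: Sequence[Fraction]) -> Fraction:
    """AU of the stream that repeats `period` forever: the mean over one period."""
    return sum(period, Fraction(0)) / len(period)


def prefix_average(period: Sequence[Fraction], T: int) -> Fraction:
    """Average utility over the first T periods of the repeated stream."""
    total = sum((period[t % len(period)] for t in range(T)), Fraction(0))
    return total / T
```

With the period (1, 0, 2), the average utility is 1; the prefix average after 5 periods is 4/5 and equals 1 again whenever T is a multiple of 3, so the prefix averages converge to the one-period mean.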
A Bayesian Model of Voting in Juries
We take a game-theoretic approach to the analysis of juries by modelling voting as a game of incomplete information. Rather than the usual assumption of two possible signals (one indicating guilt, the other innocence), we allow jurors to perceive a full spectrum of signals. Given any voting rule requiring a fixed fraction of votes to convict, we characterize the unique symmetric equilibrium of the game, and we consider the possibility of asymmetric equilibria: we give a condition under which no asymmetric equilibria exist and show that, without it, asymmetric equilibria may exist. We offer a condition under which unanimity rule exhibits a bias toward convicting the innocent, regardless of the size of the jury, and we exhibit an example showing this bias can be reversed. And we prove a "jury theorem" for our general model: as the size of the jury increases, the probability of a mistaken judgment goes to zero for every voting rule except unanimity rule; for unanimity rule, we give a condition under which the probability of a mistake is bounded strictly above zero, and we show that, without this condition, the probability of a mistake may go to zero.
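A sketch of the conviction probability under a rule that convicts when at least a fraction α of the n jurors vote to convict, assuming each juror votes to convict independently with the same probability p given the defendant's true status. In the model above, that probability is determined by equilibrium voting behavior; here p is simply an input, and `prob_convict` is a hypothetical helper.

```python
from fractions import Fraction
from math import ceil, comb


def prob_convict(n: int, alpha: Fraction, p: Fraction) -> Fraction:
    """P(at least ceil(alpha * n) of n independent convict votes), each w.p. p."""
    k_min = ceil(alpha * n)  # smallest number of convict votes that convicts
    return sum(
        (comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1)),
        Fraction(0),
    )
```

With n = 3, α = 2/3 and p = 1/2, two convict votes are needed and `prob_convict` returns 1/2; under unanimity (α = 1) it returns 1/8. Evaluating it with the convict-vote probabilities for a guilty and an innocent defendant gives the two error rates whose large-jury behavior the jury theorem describes.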