
    Linear algebraic structure of zero-determinant strategies in repeated games

    Zero-determinant (ZD) strategies, a recently discovered class of strategies in repeated games, have attracted much attention in evolutionary game theory. A ZD strategy unilaterally enforces a linear relation between the players' average payoffs. Although the existence and evolutionary stability of ZD strategies have been studied in simple games, their mathematical properties are not yet well understood. For example, what happens when more than one player employs ZD strategies has not been clarified. In this paper, we provide a general linear-algebraic framework for investigating situations where more than one player employs a ZD strategy. First, we prove that a set of linear payoff relations enforced by ZD strategies always has a solution, which implies that incompatible linear relations are impossible. Second, we prove that the linear payoff relations are independent of each other under some conditions. These results hold for general games with public monitoring, including perfect-monitoring games. Furthermore, we provide a simple example of a two-player game in which one player can simultaneously enforce two linear relations, that is, simultaneously control her own and her opponent's average payoffs. All of these results elucidate general mathematical properties of ZD strategies. Comment: 19 pages, 2 figures
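The linear payoff relation a ZD strategy enforces can be checked numerically. Below is a minimal sketch (not code from the paper) of an "equalizer" ZD strategy in the iterated prisoner's dilemma: it pins the opponent's average payoff to a chosen target `c` regardless of which memory-one strategy the opponent plays. The payoff values (T=5, R=3, P=1, S=0), the target `c`, and the scaling factor `phi` are illustrative choices.

```python
# Illustrative sketch of a zero-determinant "equalizer" strategy in the
# iterated prisoner's dilemma with payoffs T=5, R=3, P=1, S=0.
# The equalizer fixes the opponent's long-run average payoff at a target c,
# whatever memory-one strategy the opponent uses.

def stationary(p, q, iters=20000):
    """Stationary distribution over states (CC, CD, DC, DD) when player X
    uses cooperation probabilities p and player Y uses q (both memory-one,
    each indexed by the previous outcome from that player's perspective)."""
    qy = [q[0], q[2], q[1], q[3]]  # Y sees CD/DC swapped relative to X
    m = [[p[i] * qy[i], p[i] * (1 - qy[i]),
          (1 - p[i]) * qy[i], (1 - p[i]) * (1 - qy[i])] for i in range(4)]
    v = [0.25] * 4
    for _ in range(iters):  # power iteration; converges for ergodic chains
        v = [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]
    return v

S_Y = [3, 5, 0, 1]       # opponent's payoff in states CC, CD, DC, DD
c, phi = 2.0, -0.2       # target payoff and a feasible scaling factor
# ZD recipe: p_tilde = phi * (S_Y - c*1); p = p_tilde + (1, 1, 0, 0)
p = [1 + phi * (S_Y[0] - c), 1 + phi * (S_Y[1] - c),
     phi * (S_Y[2] - c), phi * (S_Y[3] - c)]  # approximately (0.8, 0.4, 0.4, 0.2)

for q in ([0.7, 0.2, 0.9, 0.4], [0.5, 0.5, 0.5, 0.5]):
    v = stationary(p, q)
    s_y = sum(vi * si for vi, si in zip(v, S_Y))
    print(round(s_y, 6))  # 2.0 for either opponent
```

Both prints give the target value 2.0: the opponent's average payoff is controlled unilaterally, which is the kind of linear relation the framework above analyzes in general.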

    An open reproducible framework for the study of the iterated prisoner's dilemma

    The Axelrod library is an open source Python package that allows for reproducible game-theoretic research into the Iterated Prisoner's Dilemma. This area of research began in the 1980s but suffers from a lack of documentation and test code. The goal of the library is to provide such a resource, with facilities for the design of new strategies and interactions between them, as well as for conducting tournaments and ecological simulations for populations of strategies. With a growing collection of 139 strategies, the library is also a platform for an original tournament that, in itself, is of interest to the game-theoretic community. This paper describes the Iterated Prisoner's Dilemma, the Axelrod library and its development, and insights gained from some novel research. Comment: 11 pages, Journal of Open Research Software 4.1 (2016)
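As a library-independent illustration of the basic interaction the Axelrod library automates, the sketch below plays a single iterated match between two hand-coded strategies. The library itself provides higher-level `Match` and `Tournament` objects with many more features; the code here is a minimal stand-in, not the library's API.

```python
# Minimal iterated prisoner's dilemma match in plain Python, the basic
# interaction that the Axelrod library automates at scale.

PAYOFFS = {('C', 'C'): (3, 3), ('C', 'D'): (0, 5),
           ('D', 'C'): (5, 0), ('D', 'D'): (1, 1)}

def tit_for_tat(history_self, history_opp):
    """Cooperate first, then copy the opponent's previous move."""
    return history_opp[-1] if history_opp else 'C'

def defector(history_self, history_opp):
    """Always defect."""
    return 'D'

def play_match(s1, s2, turns):
    """Play s1 against s2 for a fixed number of turns; return total scores."""
    h1, h2, score1, score2 = [], [], 0, 0
    for _ in range(turns):
        a1, a2 = s1(h1, h2), s2(h2, h1)
        p1, p2 = PAYOFFS[(a1, a2)]
        score1, score2 = score1 + p1, score2 + p2
        h1.append(a1)
        h2.append(a2)
    return score1, score2

print(play_match(tit_for_tat, defector, 10))  # (9, 14)
```

A tournament, as run by the library, is then just a round-robin of such matches over a collection of strategies, with scores aggregated per strategy.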

    Reinforcement Learning with Perturbed Rewards

    Recent studies have shown that reinforcement learning (RL) models are vulnerable in various noisy scenarios. For instance, the observed reward channel is often subject to noise in practice (e.g., when rewards are collected through sensors) and is therefore not reliable. In addition, for applications such as robotics, a deep reinforcement learning (DRL) algorithm can be manipulated to produce arbitrary errors by receiving corrupted rewards. In this paper, we consider noisy RL problems with perturbed rewards, where the perturbation can be modeled by a confusion matrix. We develop a robust RL framework that enables agents to learn in noisy environments where only perturbed rewards are observed. Our framework builds on existing RL/DRL algorithms and is the first to address the biased noisy reward setting without any assumptions on the true noise distribution (e.g., the zero-mean Gaussian noise assumed in previous works). The core ideas of our solution are to estimate a reward confusion matrix and to define a set of unbiased surrogate rewards. We prove the convergence and sample complexity of our approach. Extensive experiments on different DRL platforms show that policies trained with our estimated surrogate rewards achieve higher expected returns and converge faster than existing baselines. For instance, the state-of-the-art PPO algorithm obtains 84.6% and 80.8% improvements in average score over five Atari games, at error rates of 10% and 30% respectively. Comment: AAAI 2020 (Spotlight)
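The surrogate-reward idea can be sketched for the simplest case of two reward levels: inverting a known confusion matrix yields surrogate rewards whose average over the noise equals the true reward. The flip probabilities and reward values below are illustrative choices, not numbers from the paper.

```python
# Sketch of unbiased surrogate rewards for two reward levels r0, r1.
# If the observed reward is the true one passed through a known confusion
# matrix C (C[i][j] = P(observe r_j | true reward r_i)), then the surrogate
# rewards r_hat = C^{-1} @ (r0, r1) satisfy
#     E[surrogate | true reward r_i] = r_i,
# so learning from surrogates is unbiased despite the noise.
# The flip probabilities below are illustrative.

e_minus, e_plus = 0.2, 0.1          # P(r0 flipped to r1), P(r1 flipped to r0)
C = [[1 - e_minus, e_minus],
     [e_plus, 1 - e_plus]]
r0, r1 = 0.0, 1.0

# 2x2 matrix inverse applied to the true reward vector.
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
r_hat = [(C[1][1] * r0 - C[0][1] * r1) / det,
         (-C[1][0] * r0 + C[0][0] * r1) / det]
print(r_hat)                        # surrogates lie outside [r0, r1]

# Unbiasedness check: averaging the surrogate over the noise recovers r_i.
for i, r_true in enumerate((r0, r1)):
    recovered = C[i][0] * r_hat[0] + C[i][1] * r_hat[1]
    assert abs(recovered - r_true) < 1e-9
```

In the paper's setting the confusion matrix is not given but estimated from data; the sketch only shows why the inversion step removes the bias once the matrix is known.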

    The Evolution of Extortion in Iterated Prisoner's Dilemma Games

    Iterated games are a fundamental component of economic and evolutionary game theory. They describe situations where two players interact repeatedly and can use conditional strategies that depend on the outcome of previous interactions. In the context of the evolution of cooperation, repeated games represent the mechanism of reciprocation. Recently a new class of strategies has been proposed, the so-called 'zero-determinant strategies'. These strategies enforce a fixed linear relationship between one's own payoff and that of the other player. A subset of these strategies are 'extortioners', which ensure that any increase in one's own payoff exceeds that of the other player by a fixed percentage. Here we analyze the evolutionary performance of this new class of strategies. We show that in reasonably large populations they can act as catalysts for the evolution of cooperation, similar to tit-for-tat, but they are not the stable outcome of natural selection. In very small populations, however, relative payoff differences between two players in a contest matter, and extortioners hold their ground. Extortion strategies do particularly well in co-evolutionary arms races between two distinct populations: significantly, they benefit the population which evolves at the slower rate, an instance of the so-called Red King effect. This may affect the evolution of interactions between host species and their endosymbionts. Comment: contains 4 figures
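The fixed payoff relationship an extortioner enforces can be verified directly. The sketch below (with illustrative payoff values and an arbitrary feasible scaling factor `phi`) builds a chi = 2 extortion strategy for the iterated prisoner's dilemma and checks numerically that X's surplus over the mutual-defection payoff is twice Y's, regardless of the opponent's memory-one strategy.

```python
# Sketch of an "extortion" ZD strategy (chi = 2) in the iterated prisoner's
# dilemma with payoffs T=5, R=3, P=1, S=0.  The strategy guarantees
#     s_X - P = chi * (s_Y - P),
# i.e. X's surplus over P is twice Y's, whatever memory-one strategy Y plays.
# phi = 0.1 is an illustrative feasible choice.

S_X = [3, 0, 5, 1]   # X's payoffs in states CC, CD, DC, DD
S_Y = [3, 5, 0, 1]   # Y's payoffs in the same states
P, chi, phi = 1, 2, 0.1

# ZD recipe: p_tilde = phi*((S_X - P) - chi*(S_Y - P)); p = p_tilde + (1,1,0,0)
p = [1 + phi * ((S_X[0] - P) - chi * (S_Y[0] - P)),
     1 + phi * ((S_X[1] - P) - chi * (S_Y[1] - P)),
     phi * ((S_X[2] - P) - chi * (S_Y[2] - P)),
     phi * ((S_X[3] - P) - chi * (S_Y[3] - P))]  # approx (0.8, 0.1, 0.6, 0.0)

def payoffs_against(q, iters=20000):
    """Long-run payoffs (s_X, s_Y) when Y plays memory-one strategy q."""
    qy = [q[0], q[2], q[1], q[3]]   # Y sees CD/DC swapped relative to X
    m = [[p[i] * qy[i], p[i] * (1 - qy[i]),
          (1 - p[i]) * qy[i], (1 - p[i]) * (1 - qy[i])] for i in range(4)]
    v = [0.25] * 4
    for _ in range(iters):          # power iteration to the stationary distribution
        v = [sum(v[i] * m[i][j] for i in range(4)) for j in range(4)]
    return (sum(vi * x for vi, x in zip(v, S_X)),
            sum(vi * y for vi, y in zip(v, S_Y)))

s_x, s_y = payoffs_against([0.7, 0.2, 0.9, 0.4])
print(s_x - P, chi * (s_y - P))     # the two quantities coincide
```

The opponent can raise its own payoff only by also raising X's by twice as much, which is exactly the leverage that makes extortioners effective in the co-evolutionary settings discussed above.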

    Reinforcement Learning Produces Dominant Strategies for the Iterated Prisoner's Dilemma

    We present tournament results and several powerful strategies for the Iterated Prisoner's Dilemma created using reinforcement learning techniques (evolutionary and particle swarm algorithms). These strategies are trained to perform well against a corpus of over 170 distinct opponents, including many well-known and classic strategies. All of the trained strategies win standard tournaments against the total collection of other opponents. The trained strategies and one particular human-designed strategy are also the top performers in noisy tournaments.