Linear algebraic structure of zero-determinant strategies in repeated games
Zero-determinant (ZD) strategies, a recently discovered class of strategies
in repeated games, have attracted much attention in evolutionary game theory. A
ZD strategy unilaterally enforces a linear relation between the average payoffs
of the players. Although the existence and evolutionary stability of ZD
strategies have been studied in simple games, their mathematical properties are
not yet well understood. For example, what happens when more than one player
employs ZD strategies has not been clarified. In this paper, we provide a
general linear-algebraic framework for investigating situations in which
multiple players employ ZD strategies. First, we prove that a set
of linear relations of average payoffs enforced by ZD strategies always has
solutions, which implies that incompatible linear relations are impossible.
Second, we prove that linear payoff relations are independent of each other
under some conditions. These results hold for general games with public
monitoring including perfect-monitoring games. Furthermore, we provide a simple
example of a two-player game in which one player can simultaneously enforce two
linear relations, that is, simultaneously control her and her opponent's
average payoffs. All of these results elucidate general mathematical properties
of ZD strategies.
Comment: 19 pages, 2 figures
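The enforced payoff relation can be checked numerically. The sketch below (not from the paper; the opponent strategy q, the NumPy approach, and all numeric choices are illustrative) takes Press and Dyson's well-known extortion strategy with factor chi = 3 for the standard prisoner's dilemma with payoffs (T, R, P, S) = (5, 3, 1, 0), computes the stationary distribution of the Markov chain induced by two memory-one strategies, and verifies that s_X - P = 3 (s_Y - P) holds regardless of the opponent's strategy:

```python
import numpy as np

# Standard prisoner's dilemma payoffs; states are ordered CC, CD, DC, DD
# from player X's point of view.
T, R, P, S = 5, 3, 1, 0
SX = np.array([R, S, T, P], float)   # X's payoff in each state
SY = np.array([R, T, S, P], float)   # Y's payoff in each state

p = np.array([11/13, 1/2, 7/26, 0.0])   # X: extortionate ZD strategy, chi = 3
q = np.array([0.6, 0.3, 0.8, 0.2])      # Y: an arbitrary memory-one strategy

# Y sees the states with roles swapped (CD and DC exchanged).
qx = q[[0, 2, 1, 3]]

# Transition matrix of the induced Markov chain over the four states.
M = np.array([[a*b, a*(1-b), (1-a)*b, (1-a)*(1-b)]
              for a, b in zip(p, qx)])

# Stationary distribution: solve v M = v with v summing to 1.
A = np.vstack([M.T - np.eye(4), np.ones(4)])
v, *_ = np.linalg.lstsq(A, np.array([0., 0., 0., 0., 1.]), rcond=None)

sX, sY = v @ SX, v @ SY   # average payoffs; sX - P = 3 * (sY - P)
```

Changing q to any other memory-one strategy leaves the enforced relation intact, which is exactly the "unilateral" character of ZD strategies that the paper's framework generalizes.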
An open reproducible framework for the study of the iterated prisoner's dilemma
The Axelrod library is an open source Python package that allows for
reproducible game theoretic research into the Iterated Prisoner's Dilemma. This
area of research began in the 1980s but suffers from a lack of documentation
and test code. The goal of the library is to provide such a resource, with
facilities for the design of new strategies and interactions between them, as
well as conducting tournaments and ecological simulations for populations of
strategies.
With a growing collection of 139 strategies, the library is also a platform
for an original tournament that, in itself, is of interest to the game
theoretic community. This paper describes the Iterated Prisoner's Dilemma, the
Axelrod library and its development, and insights gained from some novel
research.
Comment: 11 pages, Journal of Open Research Software 4.1 (2016)
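The kind of round-robin tournament the library automates can be sketched in a few lines of plain Python. This is a minimal, self-contained toy (it does not use the Axelrod library's own API; the three strategies are the classic ones the library implements, and the 200-turn match length is an arbitrary choice):

```python
# Memory of play is passed as two action histories ("C"/"D" lists):
# own moves first, opponent's moves second.

def tit_for_tat(own, opp):   # cooperate first, then mirror the opponent
    return opp[-1] if opp else "C"

def defector(own, opp):      # always defect
    return "D"

def cooperator(own, opp):    # always cooperate
    return "C"

# Standard prisoner's dilemma payoff table: (row score, column score).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def play_match(s1, s2, turns=200):
    h1, h2, score1, score2 = [], [], 0, 0
    for _ in range(turns):
        a, b = s1(h1, h2), s2(h2, h1)
        p, q = PAYOFF[(a, b)]
        score1, score2 = score1 + p, score2 + q
        h1.append(a)
        h2.append(b)
    return score1, score2

def tournament(strategies, turns=200):
    """Round-robin: every strategy meets every other once."""
    totals = {name: 0 for name in strategies}
    names = list(strategies)
    for i, n1 in enumerate(names):
        for n2 in names[i + 1:]:
            s1, s2 = play_match(strategies[n1], strategies[n2], turns)
            totals[n1] += s1
            totals[n2] += s2
    return totals

totals = tournament({"TitForTat": tit_for_tat,
                     "Defector": defector,
                     "Cooperator": cooperator})
```

In this three-strategy toy field the Defector comes out on top; one point of the library's much larger collection is that tournament outcomes shift substantially with the population of entrants.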
Reinforcement Learning with Perturbed Rewards
Recent studies have shown that reinforcement learning (RL) models are
vulnerable in various noisy scenarios. For instance, the observed reward
channel is often subject to noise in practice (e.g., when rewards are collected
through sensors), and is therefore not credible. In addition, for applications
such as robotics, a deep reinforcement learning (DRL) algorithm can be
manipulated to produce arbitrary errors by receiving corrupted rewards. In this
paper, we consider noisy RL problems with perturbed rewards, which can be
approximated with a confusion matrix. We develop a robust RL framework that
enables agents to learn in noisy environments where only perturbed rewards are
observed. Our solution framework builds on existing RL/DRL algorithms and is
the first to address the biased noisy reward setting without any assumptions on
the true noise distribution (e.g., the zero-mean Gaussian noise assumed in
previous works). The core ideas of our solution are estimating a reward confusion
matrix and defining a set of unbiased surrogate rewards. We prove the
convergence and sample complexity of our approach. Extensive experiments on
different DRL platforms show that trained policies based on our estimated
surrogate reward can achieve higher expected rewards, and converge faster than
existing baselines. For instance, the state-of-the-art PPO algorithm obtains
84.6% and 80.8% improvements in average score over five Atari games at error
rates of 10% and 30%, respectively.
Comment: AAAI 2020 (Spotlight)
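The unbiased-surrogate idea can be illustrated directly for discrete rewards. In the sketch below (a hedged toy, not the paper's implementation; the confusion-matrix entries are made-up numbers), C[i, j] is the probability that true reward r_i is observed as r_j, and the surrogate rewards solving C @ r_hat = r are unbiased: the expected surrogate, conditioned on the true reward being r_i, equals r_i exactly.

```python
import numpy as np

r = np.array([0.0, 1.0])          # true (binary) reward values
C = np.array([[0.9, 0.1],         # true reward 0 flips to 1 with prob 0.1
              [0.3, 0.7]])        # true reward 1 flips to 0 with prob 0.3

r_hat = np.linalg.solve(C, r)     # surrogate rewards, C @ r_hat = r

# Unbiasedness check: row i of C @ r_hat is the expected surrogate
# reward given that the true reward is r[i].
expected = C @ r_hat
```

An agent then learns from r_hat in place of the observed reward; per the abstract, the confusion matrix itself is estimated rather than assumed known.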
The Evolution of Extortion in Iterated Prisoner's Dilemma Games
Iterated games are a fundamental component of economic and evolutionary game
theory. They describe situations where two players interact repeatedly and have
the possibility to use conditional strategies that depend on the outcome of
previous interactions. In the context of evolution of cooperation, repeated
games represent the mechanism of reciprocation. Recently a new class of
strategies has been proposed, the so-called 'zero-determinant strategies'. These
strategies enforce a fixed linear relationship between one's own payoff and
that of the other player. A subset of these strategies are 'extortioners', which
ensure that any increase in one's own payoff exceeds that of the other player by
a fixed percentage. Here we analyze the evolutionary performance of this new
class of strategies. We show that in reasonably large populations they can act
as catalysts for the evolution of cooperation, similar to tit-for-tat, but they
are not the stable outcome of natural selection. In very small populations,
however, relative payoff differences between two players in a contest matter,
and extortioners hold their ground. Extortion strategies do particularly well
in co-evolutionary arms races between two distinct populations: significantly,
they benefit the population which evolves at the slower rate - an instance of
the so-called Red King effect. This may affect the evolution of interactions
between host species and their endosymbionts.
Comment: contains 4 figures
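The catalytic effect has a simple mechanistic core that can be checked numerically: against an extortioner, the co-player's own long-run payoff is maximised by cooperating fully, so an adapting co-player is pushed toward cooperation. The sketch below (not from the paper; it uses Press and Dyson's chi = 3 extortioner, standard payoffs (T, R, P, S) = (5, 3, 1, 0), exact stationary-distribution payoffs, and arbitrary randomly sampled opponents) compares unconditional cooperation against sampled memory-one alternatives:

```python
import numpy as np

T, R, P, S = 5, 3, 1, 0
SY = np.array([R, T, S, P], float)   # co-player's payoff in states CC, CD, DC, DD

def coplayer_payoff(p, q):
    """Exact long-run payoff of memory-one strategy q against memory-one p."""
    qx = q[[0, 2, 1, 3]]             # q sees the states with roles swapped
    M = np.array([[a*b, a*(1-b), (1-a)*b, (1-a)*(1-b)]
                  for a, b in zip(p, qx)])
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    v, *_ = np.linalg.lstsq(A, np.array([0., 0., 0., 0., 1.]), rcond=None)
    return v @ SY

extortioner = np.array([11/13, 1/2, 7/26, 0.0])   # chi = 3 extortion strategy
allc = np.array([1.0, 1.0, 1.0, 1.0])             # unconditional cooperation

best = coplayer_payoff(extortioner, allc)          # payoff under full cooperation
rng = np.random.default_rng(1)
others = [coplayer_payoff(extortioner, rng.uniform(0, 1, 4)) for _ in range(20)]
```

Every sampled alternative does no better than full cooperation, illustrating why a population facing extortioners can be driven toward cooperative play even though extortion itself is not the end point of selection.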
Reinforcement Learning Produces Dominant Strategies for the Iterated Prisoner's Dilemma
We present tournament results and several powerful strategies for the
Iterated Prisoner's Dilemma created using reinforcement learning techniques
(evolutionary and particle swarm algorithms). These strategies are trained to
perform well against a corpus of over 170 distinct opponents, including many
well-known and classic strategies. All the trained strategies win standard
tournaments against the full collection of other opponents. The trained
strategies, along with one particular human-designed strategy, are also the top
performers in noisy tournaments.
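The flavour of such training can be shown with a deliberately small toy (this is not the paper's setup: the paper uses evolutionary and particle swarm methods against a corpus of over 170 opponents, whereas here a (1+1) hill climb tunes a memory-one strategy against a single fixed, slightly noisy Tit-for-Tat opponent, scoring each candidate by its exact long-run payoff from the stationary distribution of the induced Markov chain; all numeric choices are illustrative):

```python
import numpy as np

T, R, P, S = 5, 3, 1, 0
SX = np.array([R, S, T, P], float)   # payoffs in states CC, CD, DC, DD

def long_run_payoff(p, q):
    """Exact average payoff of memory-one strategy p against memory-one q."""
    qx = q[[0, 2, 1, 3]]             # q sees the states with roles swapped
    M = np.array([[a*b, a*(1-b), (1-a)*b, (1-a)*(1-b)]
                  for a, b in zip(p, qx)])
    A = np.vstack([M.T - np.eye(4), np.ones(4)])
    v, *_ = np.linalg.lstsq(A, np.array([0., 0., 0., 0., 1.]), rcond=None)
    return v @ SX

opponent = np.array([0.99, 0.01, 0.99, 0.01])   # slightly noisy Tit-for-Tat
rng = np.random.default_rng(0)

p = np.full(4, 0.5)                  # start from uniformly random play
initial = best = long_run_payoff(p, opponent)
for _ in range(300):                 # (1+1) hill climb: keep improving mutants
    cand = np.clip(p + rng.normal(0.0, 0.1, 4), 0.01, 0.99)
    score = long_run_payoff(cand, opponent)
    if score > best:
        p, best = cand, score
```

Because only improving mutations are accepted, the trained payoff is monotone in training; against a reciprocator like Tit-for-Tat it climbs toward the mutual-cooperation payoff, whereas the paper's strategies must balance performance across a far more heterogeneous field.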