Adaptive Regret Minimization in Bounded-Memory Games
Online learning algorithms that minimize regret provide strong guarantees in
situations that involve repeatedly making decisions in an uncertain
environment, e.g., a driver deciding which route to take to work every day.
While regret minimization has been extensively studied in repeated games, we
study regret minimization for a richer class of games called bounded-memory
games. In each round of a two-player bounded-memory game with memory m, both players
simultaneously play an action, observe an outcome and receive a reward. The
reward may depend on the last m outcomes as well as the actions of the players
in the current round. The standard notion of regret for repeated games is no
longer suitable because actions and rewards can depend on the history of play.
To account for this generality, we introduce the notion of k-adaptive regret,
which compares the reward the algorithm obtains against a hypothetical
k-adaptive adversary with the reward the best expert in hindsight would obtain
against the same adversary. Roughly, a
hypothetical k-adaptive adversary adapts her strategy to the defender's actions
exactly as the real adversary would within each window of k rounds. Our
definition is parametrized by a set of experts, which can include both fixed
and adaptive defender strategies.
We investigate the inherent complexity of and design algorithms for adaptive
regret minimization in bounded-memory games of perfect and imperfect
information. We prove a hardness result showing that, with imperfect
information, any k-adaptive regret-minimizing algorithm (with fixed strategies
as experts) must be inefficient unless NP=RP, even when playing against an
oblivious adversary. In contrast, for bounded-memory games of perfect and
imperfect information, we present approximate 0-adaptive regret minimization
algorithms that run in time n^{O(1)} against an oblivious adversary.
Comment: Full Version. GameSec 2013 (Invited Paper).
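The abstract does not spell out the algorithms themselves; as background, the following is a minimal Python sketch of classical expert-based regret minimization (the exponential-weights / Hedge rule), the repeated-game baseline that k-adaptive regret generalizes. The reward table, learning rate, and expert count are illustrative assumptions, not the paper's construction.

```python
import math
import random

def hedge(reward_table, eta=0.1):
    """Exponential-weights (Hedge) regret minimization over experts.

    reward_table[t][i] is the reward in [0, 1] expert i would earn in
    round t (full-information feedback). Sketch only: the paper's
    bounded-memory, k-adaptive setting is richer than this.
    """
    n = len(reward_table[0])
    weights = [1.0] * n
    total = 0.0
    for round_rewards in reward_table:
        z = sum(weights)
        # Play an expert sampled from the normalized weights.
        i = random.choices(range(n), weights=[w / z for w in weights])[0]
        total += round_rewards[i]
        # Boost every expert multiplicatively by its own reward.
        for j in range(n):
            weights[j] *= math.exp(eta * round_rewards[j])
    return total

# Toy usage: 3 experts, 100 rounds of random rewards.
table = [[random.random() for _ in range(3)] for _ in range(100)]
print(hedge(table))
```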
The Big Match in Small Space
In this paper we study how to play (stochastic) games optimally using little
space. We focus on repeated games with absorbing states, a type of two-player,
zero-sum concurrent mean-payoff game. The prototypical example of these games
is the well-known Big Match of Gillette (1957). These games may not admit
optimal strategies, but they always have ε-optimal strategies. In this paper we
design ε-optimal strategies for Player 1 in these games that use only
O(log log T) space. Furthermore, we construct strategies for Player 1 that use
space s(T), for an arbitrarily small unbounded non-decreasing function s, and
which guarantee an ε-optimal value for Player 1 in the limit superior sense.
The previously known strategies use space Ω(log T), and it was known that no
strategy can use constant space if it is ε-optimal even in the limit superior
sense. We also give a complementary lower bound. Finally, we show that no
Markov strategy, even extended with finite memory, can ensure a value greater
than 0 in the Big Match, answering a question posed by Abraham Neyman.
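For concreteness, here is a toy Python simulation of the Big Match, played with a count-based randomized strategy in the spirit of the classical Blackwell-Ferguson ε-optimal construction. The payoff convention, the constant N, and the exact mixing rule are illustrative assumptions; this is not the paper's O(log log T)-space strategy.

```python
import random

def big_match(player2_moves, N=50):
    """Toy simulation of Gillette's Big Match for Player 1.

    Each round Player 2 secretly picks 0 or 1. If Player 1 plays the
    safe action, she scores 1 whenever Player 2 played 0 and the game
    continues; if she plays the absorbing action, the game stops and
    she receives Player 2's current choice forever. The mixing rule
    below is only in the spirit of Blackwell-Ferguson; it is NOT the
    paper's O(log log T)-space construction.
    """
    score, rounds, k = 0, 0, 0  # k: excess of 0-choices over 1-choices
    for move in player2_moves:
        rounds += 1
        # Absorb more eagerly the more often Player 2 has played 1.
        if random.random() < 1.0 / max(k + N, 1) ** 2:
            return move  # absorbed: long-run average equals this move
        score += 1 - move  # safe action scores 1 iff Player 2 played 0
        k += 1 if move == 0 else -1
    return score / rounds

# Against a uniformly random Player 2 the long-run average tends
# toward the value 1/2 of the game.
print(big_match([random.randint(0, 1) for _ in range(100_000)]))
```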
Comparing reactive and memory-one strategies of direct reciprocity
Direct reciprocity is a mechanism for the evolution of cooperation based on
repeated interactions. When individuals meet repeatedly, they can use
conditional strategies to enforce cooperative outcomes that would not be
feasible in one-shot social dilemmas. Direct reciprocity requires that
individuals keep track of their past interactions and find the right response.
However, there are natural bounds on strategic complexity: humans find it
difficult to remember past interactions accurately, especially over long
timespans. Given these limitations, it is natural to ask how complex strategies
need to be for cooperation to evolve. Here, we study stochastic evolutionary
game dynamics in finite populations to systematically compare the evolutionary
performance of reactive strategies, which respond only to the co-player's
previous move, and memory-one strategies, which take into account both their
own and the co-player's previous moves. In both cases, we compare deterministic
and stochastic strategy spaces. For reactive strategies and small costs, we
find that stochasticity benefits cooperation, because it allows for generous
tit-for-tat. For memory-one strategies and small costs, we find that
stochasticity does not increase the propensity for cooperation, because the
deterministic rule of win-stay, lose-shift works best. For memory-one
strategies and large costs, however, stochasticity can augment cooperation.
Comment: 18 pages, 7 figures.
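To make the two strategy spaces concrete, here is a small Python sketch of an iterated donation game pitting a reactive strategy (generous tit-for-tat) against itself and a deterministic memory-one strategy (win-stay, lose-shift) against itself. The benefit b, cost c, opening moves, and round count are illustrative assumptions, not the paper's simulation setup.

```python
import random

# Donation-game payoffs: cooperating pays the co-player b at own cost c.
b, c = 2.0, 1.0  # illustrative values; the paper varies the cost

def play(strat1, strat2, rounds=10_000):
    """Iterated donation game between two memory-one strategies.

    A memory-one strategy maps the previous outcome (my_move, co_move)
    to a cooperation probability; a reactive strategy is the special
    case that ignores my_move. Returns strategy 1's average payoff.
    """
    m1, m2 = 'C', 'C'  # assume both open with cooperation
    payoff1 = 0.0
    for _ in range(rounds):
        n1 = 'C' if random.random() < strat1[(m1, m2)] else 'D'
        n2 = 'C' if random.random() < strat2[(m2, m1)] else 'D'
        payoff1 += (b if n2 == 'C' else 0.0) - (c if n1 == 'C' else 0.0)
        m1, m2 = n1, n2
    return payoff1 / rounds

# Generous tit-for-tat: reactive; forgives a defection with prob. 1 - c/b.
gtft = {(me, co): 1.0 if co == 'C' else 1.0 - c / b
        for me in 'CD' for co in 'CD'}
# Win-stay, lose-shift: deterministic memory-one rule.
wsls = {('C', 'C'): 1, ('C', 'D'): 0, ('D', 'C'): 0, ('D', 'D'): 1}

print(play(gtft, gtft), play(wsls, wsls))  # both sustain full cooperation
```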
Cooperation Enforcement and Collusion Resistance in Repeated Public Goods Games
Enforcing cooperation among self-interested agents is one of the main
objectives of multi-agent systems. However, due to the inherent social
dilemmas in many scenarios, the free-rider problem can arise during agents'
long-run interactions, and it becomes even more severe when self-interested
agents collude with each other to gain extra benefits. It is commonly accepted
that in such social dilemmas there exists no simple strategy whereby an agent
can simultaneously manipulate the utility of each of her opponents and promote
mutual cooperation among all agents. Here, we show that such strategies do
exist. In the conventional repeated public goods game, we identify them and
find that, when confronted with such strategies, a single opponent can maximize
his utility only via global cooperation and any colluding alliance cannot get
the upper hand. Since full cooperation is individually optimal for any single
opponent, a stable cooperation among all players can be achieved. Moreover, we
show experimentally that these strategies can still promote cooperation even
when the opponents are both self-learning and collusive.
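As background for the setting, the following is a minimal Python sketch of one round of a linear public goods game, which exhibits the free-rider problem the paper targets. The multiplier r and the contribution vector are illustrative assumptions; the paper's enforcement strategies are not reproduced here.

```python
def public_goods_round(contributions, r=1.6):
    """One round of a linear public goods game.

    Contributions are multiplied by r (with 1 < r < group size) and
    shared equally. The return value is each agent's net gain, i.e.
    share received minus amount contributed.
    """
    share = r * sum(contributions) / len(contributions)
    return [share - ci for ci in contributions]

# Illustrative: of three agents, two contribute 1 and one free-rides.
# The free-rider nets the most this round -- the social dilemma that
# the paper's cooperation-enforcing strategies are designed to defuse.
print(public_goods_round([1.0, 1.0, 0.0]))
```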
Qualitative Analysis of Concurrent Mean-payoff Games
We consider concurrent games played by two players on a finite-state graph,
where in every round the players simultaneously choose a move, and the current
state along with the joint moves determines the successor state. We study a
fundamental objective, namely, the mean-payoff objective, where a reward is
associated with each transition, and the goal of player 1 is to maximize the
long-run average of the rewards, while the objective of player 2 is strictly
the opposite. The path constraint for player 1 can be qualitative, i.e., the
mean-payoff is the maximal reward, or arbitrarily close to it; or quantitative,
i.e., a given threshold between the minimal and maximal reward. We consider the
computation of the almost-sure (resp. positive) winning sets, where player 1
can ensure that the path constraint is satisfied with probability 1 (resp.
positive probability). Our main results for qualitative path constraints are as
follows: (1) we establish qualitative determinacy results that show that for
every state either player 1 has a strategy to ensure almost-sure (resp.
positive) winning against all player-2 strategies, or player 2 has a spoiling
strategy to falsify almost-sure (resp. positive) winning against all player-1
strategies; (2) we present optimal strategy complexity results that precisely
characterize the classes of strategies required for almost-sure and positive
winning for both players; and (3) we present quadratic time algorithms to
compute the almost-sure and the positive winning sets, matching the best known
algorithmic bound for much simpler problems (such as reachability
objectives). For quantitative constraints, we show that a polynomial-time
solution for the almost-sure or the positive winning set would imply a solution
to a long-standing open problem (the value problem for turn-based deterministic
mean-payoff games) that is not known to be solvable in polynomial time.
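The mean-payoff objective itself is easy to illustrate in the degenerate one-player case: on a finite weighted graph controlled by a single player, the optimal long-run average reward equals the maximum mean-weight cycle, computable by Karp's algorithm as sketched below in Python. The graph encoding is an assumption for illustration; the concurrent two-player problems studied in the paper are far harder, as the abstract explains.

```python
def max_mean_cycle(n, edges):
    """Karp's algorithm for the maximum mean-weight cycle.

    In the one-player case, the optimal mean payoff on a finite
    weighted graph equals the maximum cycle mean computed here.
    edges is a list of (u, v, weight) with vertices 0..n-1.
    """
    NEG = float('-inf')
    # d[k][v] = maximum weight of a k-edge walk ending at v.
    d = [[NEG] * n for _ in range(n + 1)]
    for v in range(n):
        d[0][v] = 0.0
    for k in range(1, n + 1):
        for u, v, w in edges:
            if d[k - 1][u] > NEG:
                d[k][v] = max(d[k][v], d[k - 1][u] + w)
    best = NEG
    for v in range(n):
        if d[n][v] == NEG:
            continue  # no n-edge walk ends here, so no cycle through v
        best = max(best, min((d[n][v] - d[k][v]) / (n - k)
                             for k in range(n) if d[k][v] > NEG))
    return best

# Toy graph: a 2-cycle with rewards 3 and 1 has mean payoff (3 + 1)/2 = 2.
print(max_mean_cycle(2, [(0, 1, 3.0), (1, 0, 1.0)]))
```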