A Continuation Method for Nash Equilibria in Structured Games
Structured game representations have recently attracted interest as models
for multi-agent artificial intelligence scenarios, with rational behavior most
commonly characterized by Nash equilibria. This paper presents efficient, exact
algorithms for computing Nash equilibria in structured game representations,
including both graphical games and multi-agent influence diagrams (MAIDs). The
algorithms are derived from a continuation method for normal-form and
extensive-form games due to Govindan and Wilson; they follow a trajectory
through a space of perturbed games and their equilibria, exploiting game
structure through fast computation of the Jacobian of the payoff function. They
are theoretically guaranteed to find at least one equilibrium of the game, and
may find more. Our approach provides the first efficient algorithm for
computing exact equilibria in graphical games with arbitrary topology, and the
first algorithm to exploit fine-grained structural properties of MAIDs.
Experimental results are presented demonstrating the effectiveness of the
algorithms and comparing them to predecessors. The running time of the
graphical game algorithm is similar to, and often better than, the running time
of previous approximate algorithms. The algorithm for MAIDs can effectively
solve games that are much larger than those solvable by previous methods.
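The paper's structure-exploiting algorithm is not reproduced here, but the general path-following idea, tracing equilibria through a family of perturbed games while warm-starting each step from the last, can be illustrated with a much simpler continuation: the logit quantal-response path of a toy 2x2 zero-sum game, corrected at each step by Newton's method. The game matrix, step counts, and function names below are our own illustrative choices, not from the paper.

```python
import math

def sigma(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def logit_qre_path(A, lam_max=50.0, steps=200, newton_iters=20):
    """Trace the logit quantal-response path of a 2x2 zero-sum game.

    A[i][j] is the row player's payoff; the column player receives -A[i][j].
    p = Pr(row plays action 0), q = Pr(column plays action 0).
    Each homotopy step slightly increases lambda and re-solves the
    fixed-point conditions p = sigma(lam*du_row(q)), q = sigma(lam*du_col(p))
    by Newton's method, warm-started from the previous step's solution.
    """
    # du_row(q) = cr*q + dr: row's payoff advantage of action 0 over action 1
    cr = (A[0][0] - A[1][0]) - (A[0][1] - A[1][1])
    dr = A[0][1] - A[1][1]
    # du_col(p) = cc*p + dc: column's advantage of action 0 over action 1
    cc = -(A[0][0] - A[0][1]) + (A[1][0] - A[1][1])
    dc = -(A[1][0] - A[1][1])

    p, q = 0.5, 0.5                      # lambda = 0: uniform play
    for k in range(1, steps + 1):
        lam = lam_max * k / steps
        for _ in range(newton_iters):
            sr = sigma(lam * (cr * q + dr))
            sc = sigma(lam * (cc * p + dc))
            g1, g2 = p - sr, q - sc            # fixed-point residuals
            j12 = -lam * cr * sr * (1 - sr)    # dG1/dq
            j21 = -lam * cc * sc * (1 - sc)    # dG2/dp
            det = 1.0 - j12 * j21              # always >= 1 for zero-sum 2x2
            dp = -(g1 - j12 * g2) / det
            dq = -(g2 - j21 * g1) / det
            p, q = p + dp, q + dq
            if abs(dp) + abs(dq) < 1e-12:
                break
    return p, q

# zero-sum game whose unique mixed equilibrium is p = q = 0.4
p, q = logit_qre_path([[2.0, -1.0], [-1.0, 1.0]])
```

As lambda grows, the traced point approaches the exact mixed equilibrium (0.4, 0.4); the Govindan-Wilson method of the paper follows a different, theoretically grounded homotopy, but the warm-started predictor-corrector structure is the same.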
Online Convex Optimization for Sequential Decision Processes and Extensive-Form Games
Regret minimization is a powerful tool for solving large-scale extensive-form
games. State-of-the-art methods rely on minimizing regret locally at each
decision point. In this work we derive a new framework for regret minimization
on sequential decision problems and extensive-form games with general compact
convex sets at each decision point and general convex losses, as opposed to
prior work which has been for simplex decision points and linear losses. We
call our framework laminar regret decomposition. It generalizes the CFR
algorithm to this more general setting. Furthermore, our framework enables a
new proof of CFR even in the known setting, which is derived from a perspective
of decomposing polytope regret, thereby leading to an arguably simpler
interpretation of the algorithm. Our generalization to convex compact sets and
convex losses allows us to develop new algorithms for several problems:
regularized sequential decision making, regularized Nash equilibria in
extensive-form games, and computing approximate extensive-form perfect
equilibria. Our generalization also leads to the first regret-minimization
algorithm for computing reduced-normal-form quantal response equilibria based
on minimizing local regrets. Experiments show that our framework leads to
algorithms that scale at a rate comparable to the fastest variants of
counterfactual regret minimization for computing Nash equilibrium, and
therefore our approach leads to the first algorithm for computing quantal
response equilibria in extremely large games. Finally we show that our
framework enables a new kind of scalable opponent exploitation approach.
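The local regret minimization that CFR-style methods build on can be sketched, at a single simplex decision point, with plain regret matching in self-play. The rock-paper-scissors payoffs, seeds, and iteration count below are illustrative choices of ours, not from the paper.

```python
def regret_matching_strategy(regrets):
    """Map cumulative regrets to a mixed strategy: actions are played in
    proportion to their positive regret; uniform if no regret is positive."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    if total <= 0:
        return [1.0 / len(regrets)] * len(regrets)
    return [x / total for x in pos]

def train_rps(iterations=50000):
    """Self-play regret matching on rock-paper-scissors (0=R, 1=P, 2=S).

    u[i][j] is player 1's payoff when player 1 plays i and player 2 plays j;
    the game is zero-sum. Regrets are updated with expected (not sampled)
    payoffs, so the run is deterministic. In a zero-sum game the average
    strategies converge to a Nash equilibrium, here (1/3, 1/3, 1/3).
    """
    u = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
    # asymmetric regret seeds so play does not start at the uniform
    # equilibrium, where every update would be exactly zero
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strat_sum = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strats = [regret_matching_strategy(r) for r in regrets]
        for pl in range(2):
            opp = strats[1 - pl]
            # expected payoff of each of my actions against the opponent's mix
            if pl == 0:
                cf = [sum(u[a][j] * opp[j] for j in range(3)) for a in range(3)]
            else:
                cf = [sum(-u[j][a] * opp[j] for j in range(3)) for a in range(3)]
            realized = sum(strats[pl][a] * cf[a] for a in range(3))
            for a in range(3):
                regrets[pl][a] += cf[a] - realized
                strat_sum[pl][a] += strats[pl][a]
    return [[s / iterations for s in row] for row in strat_sum]
```

The framework in the abstract generalizes exactly this kind of local minimizer from simplex decision points with linear losses to general convex compact sets with convex losses.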
Ambiguity and Social Interaction
We present a non-technical account of ambiguity in strategic games and show how it may be applied to economics and social sciences. Optimistic and pessimistic responses to ambiguity are formally modelled. We show that pessimism has the effect of increasing (decreasing) equilibrium prices under Cournot (Bertrand) competition. In addition, the effects of ambiguity on peace-making are examined. It is shown that ambiguity may select equilibria in coordination games with multiple equilibria. Some comparative statics results are derived for the impact of ambiguity in games with strategic complements.
Experience-weighted Attraction Learning in Normal Form Games
In ‘experience-weighted attraction’ (EWA) learning, strategies have attractions that reflect initial predispositions, are updated based on payoff experience, and determine choice probabilities according to some rule (e.g., logit). A key feature is a parameter δ that weights the strength of hypothetical reinforcement of strategies that were not chosen according to the payoff they would have yielded, relative to reinforcement of chosen strategies according to received payoffs. The other key features are two discount rates, φ and ρ, which separately discount previous attractions, and an experience weight. EWA includes reinforcement learning and weighted fictitious play (belief learning) as special cases, and hybridizes their key elements. When δ = 0 and ρ = 0, cumulative choice reinforcement results. When δ = 1 and ρ = φ, levels of reinforcement of strategies are exactly the same as expected payoffs given weighted fictitious play beliefs. Using three sets of experimental data, parameter estimates of the model were calibrated on part of the data and used to predict a holdout sample. Estimates of δ are generally around .50, φ around .8–1, and ρ varies from 0 to φ. Reinforcement and belief-learning special cases are generally rejected in favor of EWA, though belief models do better in some constant-sum games. EWA is able to combine the best features of previous approaches, allowing attractions to begin and grow flexibly as choice reinforcement does, but reinforcing unchosen strategies substantially as belief-based models implicitly do.
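The update described above can be sketched in the standard Camerer-Ho form: an experience-weight recursion, an attraction recursion that mixes received and foregone payoffs through δ, and a logit choice rule. The parameter names δ, φ, ρ follow the abstract; the function names and the logit precision λ are our own illustrative choices.

```python
import math

def ewa_update(attractions, N, chosen, payoffs, delta, phi, rho):
    """One EWA step for a single player.

    attractions: current attraction A_j for each strategy j
    N:           current experience weight N(t-1)
    chosen:      index of the strategy actually played this round
    payoffs:     payoff each strategy j would have earned this round,
                 given the opponents' realized play
    delta:       weight on hypothetical (foregone) payoffs
    phi, rho:    discount rates for attractions and experience
    """
    N_new = rho * N + 1.0
    new_attr = []
    for j, (A_j, pi_j) in enumerate(zip(attractions, payoffs)):
        # chosen strategies are reinforced fully; unchosen ones by delta
        weight = delta + (1.0 - delta) * (1.0 if j == chosen else 0.0)
        new_attr.append((phi * N * A_j + weight * pi_j) / N_new)
    return new_attr, N_new

def logit_choice_probs(attractions, lam=1.0):
    """Logit response: P_j proportional to exp(lam * A_j)."""
    m = max(attractions)                       # subtract max for stability
    exps = [math.exp(lam * (a - m)) for a in attractions]
    z = sum(exps)
    return [e / z for e in exps]
```

With δ = 0, φ arbitrary, ρ = 0 this reduces to cumulative choice reinforcement (only the chosen strategy is reinforced), and with δ = 1, ρ = φ every strategy is reinforced by its payoff, matching weighted fictitious play, exactly the two special cases named in the abstract.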
Recursive Inspection Games
We consider a sequential inspection game where an inspector uses a limited
number of inspections over a larger number of time periods to detect a
violation (an illegal act) of an inspectee. Compared with earlier models, we
allow varying rewards to the inspectee for successful violations. As one
possible example, the most valuable reward may be the completion of a sequence
of thefts of nuclear material needed to build a nuclear bomb. The inspectee can
observe the inspector, but the inspector can only determine if a violation
happens during a stage where he inspects, which terminates the game; otherwise
the game continues. Under reasonable assumptions for the payoffs, the
inspector's strategy is independent of the number of successful violations.
This makes it possible to apply a recursive description of the game, even though such a description normally assumes fully informed players after each stage. The resulting
recursive equation in three variables for the equilibrium payoff of the game,
which generalizes several other known equations of this kind, is solved
explicitly in terms of sums of binomial coefficients. We also extend this
approach to non-zero-sum games and, similar to Maschler (1966), "inspector
leadership" where the inspector commits to (the same) randomized inspection
schedule, but the inspectee acts legally (rather than mixing as in the
simultaneous game) as long as inspections remain.
Comment: final version for Mathematics of Operations Research, new Theorem