
    A Continuation Method for Nash Equilibria in Structured Games

    Structured game representations have recently attracted interest as models for multi-agent artificial intelligence scenarios, with rational behavior most commonly characterized by Nash equilibria. This paper presents efficient, exact algorithms for computing Nash equilibria in structured game representations, including both graphical games and multi-agent influence diagrams (MAIDs). The algorithms are derived from a continuation method for normal-form and extensive-form games due to Govindan and Wilson; they follow a trajectory through a space of perturbed games and their equilibria, exploiting game structure through fast computation of the Jacobian of the payoff function. They are theoretically guaranteed to find at least one equilibrium of the game, and may find more. Our approach provides the first efficient algorithm for computing exact equilibria in graphical games with arbitrary topology, and the first algorithm to exploit fine-grained structural properties of MAIDs. Experimental results are presented demonstrating the effectiveness of the algorithms and comparing them to predecessors. The running time of the graphical game algorithm is similar to, and often better than, the running time of previous approximate algorithms. The algorithm for MAIDs can effectively solve games that are much larger than those solvable by previous methods.
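    A full implementation of the Govindan–Wilson continuation is involved; the sketch below illustrates equilibrium path-following in the same spirit using a deliberately simpler stand-in homotopy, not the paper's algorithm: it traces the logit quantal-response path of a two-player bimatrix game from the uniform profile toward an equilibrium, correcting at each step by damped fixed-point iteration. The payoff matrices and the λ grid are illustrative assumptions.

```python
# A minimal path-following sketch in the spirit of a continuation method.
# NOTE: this is NOT the Govindan-Wilson algorithm; it follows the logit
# quantal-response path of a bimatrix game from lambda = 0 (uniform play)
# toward large lambda (approximate Nash equilibrium).
import numpy as np

def softmax(v, lam):
    z = lam * (v - v.max())            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def logit_qre_path(A, B, lams, iters=500, damp=0.5):
    """Follow the logit-response path over an increasing grid of lambdas."""
    m, n = A.shape
    x, y = np.full(m, 1.0 / m), np.full(n, 1.0 / n)   # start at uniform mix
    for lam in lams:
        for _ in range(iters):                        # corrector step
            x_new = softmax(A @ y, lam)               # row player's logit response
            y_new = softmax(B.T @ x, lam)             # column player's logit response
            x = damp * x_new + (1 - damp) * x
            y = damp * y_new + (1 - damp) * y
    return x, y

# Illustrative game (matching-pennies payoffs, an assumption):
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A                                                # zero-sum counterpart
x, y = logit_qre_path(A, B, lams=np.linspace(0.1, 50.0, 60))
print("approx. equilibrium:", np.round(x, 3), np.round(y, 3))
```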

    Online Convex Optimization for Sequential Decision Processes and Extensive-Form Games

    Regret minimization is a powerful tool for solving large-scale extensive-form games. State-of-the-art methods rely on minimizing regret locally at each decision point. In this work we derive a new framework for regret minimization on sequential decision problems and extensive-form games with general compact convex sets at each decision point and general convex losses, as opposed to prior work, which has been limited to simplex decision points and linear losses. We call our framework laminar regret decomposition. It generalizes the counterfactual regret minimization (CFR) algorithm to this more general setting. Furthermore, our framework enables a new proof of CFR even in the known setting, derived from the perspective of decomposing polytope regret, thereby leading to an arguably simpler interpretation of the algorithm. Our generalization to convex compact sets and convex losses allows us to develop new algorithms for several problems: regularized sequential decision making, regularized Nash equilibria in extensive-form games, and computing approximate extensive-form perfect equilibria. Our generalization also leads to the first regret-minimization algorithm for computing reduced-normal-form quantal response equilibria based on minimizing local regrets. Experiments show that our framework leads to algorithms that scale at a rate comparable to the fastest variants of CFR for computing Nash equilibrium, and therefore yields the first algorithm for computing quantal response equilibria in extremely large games. Finally, we show that our framework enables a new kind of scalable opponent exploitation approach.
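    The sketch below shows the simplex/linear-loss special case that laminar regret decomposition generalizes: local regret minimization by regret matching, run in self-play on a zero-sum matrix game so that the players' average strategies approach a Nash equilibrium. The game matrix and iteration count are illustrative assumptions; this is not the paper's general framework.

```python
# Regret matching on a single simplex decision point with linear losses --
# the special case that the laminar framework generalizes. Self-play on a
# zero-sum matrix game; average strategies converge to equilibrium.
import numpy as np

def regret_matching(regrets):
    pos = np.maximum(regrets, 0.0)
    s = pos.sum()
    return pos / s if s > 0 else np.full(len(regrets), 1.0 / len(regrets))

def solve_matrix_game(A, iters=20000):
    m, n = A.shape
    r_row, r_col = np.zeros(m), np.zeros(n)
    x_sum, y_sum = np.zeros(m), np.zeros(n)
    for _ in range(iters):
        x = regret_matching(r_row)
        y = regret_matching(r_col)
        x_sum += x
        y_sum += y
        u_row = A @ y                    # row player's per-action payoffs
        u_col = -(A.T @ x)               # column player's (zero-sum)
        r_row += u_row - x @ u_row       # accumulate instantaneous regrets
        r_col += u_col - y @ u_col
    return x_sum / iters, y_sum / iters

# Rock-paper-scissors as an illustrative zero-sum game (assumption):
A = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
x, y = solve_matrix_game(A)
print(np.round(x, 3), np.round(y, 3))    # -> roughly uniform 1/3 each
```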

    Ambiguity and Social Interaction

    We present a non-technical account of ambiguity in strategic games and show how it may be applied to economics and social sciences. Optimistic and pessimistic responses to ambiguity are formally modelled. We show that pessimism has the effect of increasing (decreasing) equilibrium prices under Cournot (Bertrand) competition. In addition, the effects of ambiguity on peace-making are examined. It is shown that ambiguity may select equilibria in coordination games with multiple equilibria. Some comparative statics results are derived for the impact of ambiguity in games with strategic complements.
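    The abstract does not spell out the formal model, but a standard way to capture optimistic versus pessimistic responses to ambiguity is the Hurwicz α-criterion, sketched below as an assumed stand-in: an ambiguous act is valued at α times its best conceivable payoff plus (1 − α) times its worst. All numbers are illustrative assumptions.

```python
# A hedged sketch of optimism vs. pessimism under ambiguity using the
# Hurwicz alpha-criterion: alpha = 1 is pure optimism, alpha = 0 pure
# pessimism over the set of payoffs considered possible.

def hurwicz_value(possible_payoffs, alpha):
    """alpha-weighted mix of the best and worst conceivable payoff."""
    return alpha * max(possible_payoffs) + (1 - alpha) * min(possible_payoffs)

# Two acts whose payoff depends on an ambiguous rival response (assumed):
safe = [4, 5]         # narrow range of outcomes
risky = [0, 10]       # wide range of outcomes

for alpha in (0.0, 0.5, 1.0):
    choice = max(("safe", safe), ("risky", risky),
                 key=lambda a: hurwicz_value(a[1], alpha))[0]
    print(f"alpha={alpha}: choose {choice}")
# A pessimist (alpha=0) picks the safe act; an optimist (alpha=1) the risky one.
```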

    Experience-weighted Attraction Learning in Normal Form Games

    In ‘experience-weighted attraction’ (EWA) learning, strategies have attractions that reflect initial predispositions, are updated based on payoff experience, and determine choice probabilities according to some rule (e.g., logit). A key feature is a parameter δ that weights the strength of hypothetical reinforcement of strategies that were not chosen according to the payoff they would have yielded, relative to reinforcement of chosen strategies according to received payoffs. The other key features are two discount rates, φ and ρ, which separately discount previous attractions and an experience weight. EWA includes reinforcement learning and weighted fictitious play (belief learning) as special cases, and hybridizes their key elements. When δ = 0 and ρ = 0, cumulative choice reinforcement results. When δ = 1 and ρ = φ, levels of reinforcement of strategies are exactly the same as expected payoffs given weighted fictitious play beliefs. Using three sets of experimental data, parameter estimates of the model were calibrated on part of the data and used to predict a holdout sample. Estimates of δ are generally around 0.50, φ around 0.8–1, and ρ varies from 0 to φ. Reinforcement and belief-learning special cases are generally rejected in favor of EWA, though belief models do better in some constant-sum games. EWA is able to combine the best features of previous approaches, allowing attractions to begin and grow flexibly as choice reinforcement does, but reinforcing unchosen strategies substantially as belief-based models implicitly do.
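    The update rule described above translates directly into code. The sketch below implements one EWA attraction step and a logit choice rule, with δ weighting foregone payoffs of unchosen strategies, φ discounting past attractions, and ρ discounting the experience weight; the payoff vector and logit parameter are illustrative assumptions.

```python
# A minimal sketch of the EWA attraction update, following the abstract's
# parameterization: delta (hypothetical reinforcement), phi (attraction
# discount), rho (experience-weight discount).
import numpy as np

def ewa_update(A, N, payoffs, chosen, delta, phi, rho):
    """One EWA step. payoffs[j] is what strategy j would have earned."""
    N_new = rho * N + 1.0
    # Weight is 1 for the chosen strategy, delta for unchosen ones:
    weight = delta + (1.0 - delta) * np.eye(len(A))[chosen]
    A_new = (phi * N * A + weight * payoffs) / N_new
    return A_new, N_new

def logit_choice_probs(A, lam=1.0):
    e = np.exp(lam * (A - A.max()))      # logit rule, numerically stable
    return e / e.sum()

# Illustrative round: 3 strategies, player chose strategy 0 (assumptions):
A, N = np.zeros(3), 1.0
payoffs = np.array([2.0, 5.0, 1.0])      # realized and foregone payoffs
A, N = ewa_update(A, N, payoffs, chosen=0, delta=0.5, phi=0.9, rho=0.9)
print(np.round(A, 3), np.round(logit_choice_probs(A), 3))
# delta = 0, rho = 0 recovers cumulative choice reinforcement; delta = 1,
# rho = phi recovers weighted fictitious play, as noted in the abstract.
```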

    Recursive Inspection Games

    We consider a sequential inspection game where an inspector uses a limited number of inspections over a larger number of time periods to detect a violation (an illegal act) of an inspectee. Compared with earlier models, we allow varying rewards to the inspectee for successful violations. As one possible example, the most valuable reward may be the completion of a sequence of thefts of nuclear material needed to build a nuclear bomb. The inspectee can observe the inspector, but the inspector can only determine if a violation happens during a stage where he inspects, which terminates the game; otherwise the game continues. Under reasonable assumptions for the payoffs, the inspector's strategy is independent of the number of successful violations. This allows us to apply a recursive description of the game, even though such a description normally assumes fully informed players after each stage. The resulting recursive equation in three variables for the equilibrium payoff of the game, which generalizes several other known equations of this kind, is solved explicitly in terms of sums of binomial coefficients. We also extend this approach to non-zero-sum games and, as in Maschler (1966), to "inspector leadership", where the inspector commits to (the same) randomized inspection schedule but the inspectee acts legally (rather than mixing, as in the simultaneous game) as long as inspections remain.
    Comment: final version for Mathematics of Operations Research, new Theorem
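    The paper's recursive equation is not reproduced in the abstract, so the sketch below solves the classic simplified zero-sum variant (a single violation with fixed reward +1 and detection penalty −1, rather than the paper's varying rewards) by memoized recursion: each stage is a 2×2 zero-sum game whose value feeds the next.

```python
# A memoized recursion for a simplified zero-sum inspection game:
# v(n, k) is the inspectee's equilibrium payoff with n periods and
# k inspections remaining. Payoffs +1/-1 are illustrative assumptions.
from functools import lru_cache

def zero_sum_2x2_value(a, b, c, d):
    """Value of the 2x2 zero-sum game [[a, b], [c, d]] for the row maximizer."""
    lo = max(min(a, b), min(c, d))             # pure maximin
    hi = min(max(a, c), max(b, d))             # pure minimax
    if lo == hi:                               # saddle point: pure equilibrium
        return lo
    return (a * d - b * c) / (a + d - b - c)   # completely mixed value

@lru_cache(maxsize=None)
def v(n, k):
    if k <= 0:
        return 1.0 if n > 0 else 0.0           # no inspections left: violate safely
    if k >= n:
        return 0.0                             # every period inspected: stay legal
    # Rows: violate / act legally. Columns: inspect / don't inspect.
    # Violation ends the game (caught: -1, undetected: +1); legal play continues.
    return zero_sum_2x2_value(-1.0, 1.0, v(n - 1, k - 1), v(n - 1, k))

for n in range(1, 6):
    print([round(v(n, k), 3) for k in range(n + 1)])
# e.g. v(2, 1) = 1/3 and v(3, 1) = 1/2, the classic values for this variant.
```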