65,341 research outputs found

    No-Regret Learning in Extensive-Form Games with Imperfect Recall

    Full text link
    Counterfactual Regret Minimization (CFR) is an efficient no-regret learning algorithm for decision problems modeled as extensive games. CFR's regret bounds depend on the requirement of perfect recall: players always remember information that was revealed to them and the order in which it was revealed. In games without perfect recall, however, CFR's guarantees do not apply. In this paper, we present the first regret bound for CFR when applied to a general class of games with imperfect recall. In addition, we show that CFR applied to any abstraction belonging to our general class results in a regret bound not just for the abstract game, but for the full game as well. We verify our theory and show how imperfect recall can be used to trade a small increase in regret for a significant reduction in memory in three domains: die-roll poker, phantom tic-tac-toe, and Bluff.Comment: 21 pages, 4 figures, expanded version of article to appear in Proceedings of the Twenty-Ninth International Conference on Machine Learnin

    Extensive-Form Perfect Equilibrium Computation in Two-Player Games

    Full text link
    We study the problem of computing an Extensive-Form Perfect Equilibrium (EFPE) in 2-player games. This equilibrium concept refines the Nash equilibrium requiring resilience w.r.t. a specific vanishing perturbation (representing mistakes of the players at each decision node). The scientific challenge is intrinsic to the EFPE definition: it requires a perturbation over the agent form, but the agent form is computationally inefficient, due to the presence of highly nonlinear constraints. We show that the sequence form can be exploited in a non-trivial way and that, for general-sum games, finding an EFPE is equivalent to solving a suitably perturbed linear complementarity problem. We prove that Lemke's algorithm can be applied, showing that computing an EFPE is PPAD\textsf{PPAD}-complete. In the notable case of zero-sum games, the problem is in FP\textsf{FP} and can be solved by linear programming. Our algorithms also allow one to find a Nash equilibrium when players cannot perfectly control their moves, being subject to a given execution uncertainty, as is the case in most realistic physical settings.Comment: To appear in AAAI 1

    A finite-state, finite-memory minimum principle, part 2

    Get PDF
    In part 1 of this paper, a minimum principle was found for the finite-state, finite-memory (FSFM) stochastic control problem. In part 2, conditions for the sufficiency of the minimum principle are stated in terms of the informational properties of the problem. This is accomplished by introducing the notion of a signaling strategy. Then a min-H algorithm based on the FSFM minimum principle is presented. This algorithm converges, after a finite number of steps, to a person - by - person extremal solution

    Solving Imperfect Information Games Using Decomposition

    Full text link
    Decomposition, i.e. independently analyzing possible subgames, has proven to be an essential principle for effective decision-making in perfect information games. However, in imperfect information games, decomposition has proven to be problematic. To date, all proposed techniques for decomposition in imperfect information games have abandoned theoretical guarantees. This work presents the first technique for decomposing an imperfect information game into subgames that can be solved independently, while retaining optimality guarantees on the full-game solution. We can use this technique to construct theoretically justified algorithms that make better use of information available at run-time, overcome memory or disk limitations at run-time, or make a time/space trade-off to overcome memory or disk limitations while solving a game. In particular, we present an algorithm for subgame solving which guarantees performance in the whole game, in contrast to existing methods which may have unbounded error. In addition, we present an offline game solving algorithm, CFR-D, which can produce a Nash equilibrium for a game that is larger than available storage.Comment: 7 pages by 2 columns, 5 figures; April 21 2014 - expand explanations and theor

    Inductive Game Theory: A Basic Scenario

    Get PDF
    The aim of this paper is to present the new theory called “inductive game theory”. A paper, published by one of the present authors with A. Matsui, discussed some part of inductive game theory in a specific game. Here, we will give a more developed discourse of the theory. The paper is written to show one entire picture of the theory: From individual raw experiences, short-term memories to long-term memories, inductive derivation of individual views, classification of such views, decision making or modification of behavior based on a view, and repercussion from the modified play in the objective game. We focus on some clear-cut cases, forgetting a lot of possible variants, but will still give a lot of results. In order to show one possible discourse as a whole, we will ask the question of how Nash equilibrium is emerging from the viewpoint of inductive game theory, and will give one answer.

    Theoretical and Practical Advances on Smoothing for Extensive-Form Games

    Full text link
    Sparse iterative methods, in particular first-order methods, are known to be among the most effective in solving large-scale two-player zero-sum extensive-form games. The convergence rates of these methods depend heavily on the properties of the distance-generating function that they are based on. We investigate the acceleration of first-order methods for solving extensive-form games through better design of the dilated entropy function---a class of distance-generating functions related to the domains associated with the extensive-form games. By introducing a new weighting scheme for the dilated entropy function, we develop the first distance-generating function for the strategy spaces of sequential games that has no dependence on the branching factor of the player. This result improves the convergence rate of several first-order methods by a factor of Ω(bdd)\Omega(b^dd), where bb is the branching factor of the player, and dd is the depth of the game tree. Thus far, counterfactual regret minimization methods have been faster in practice, and more popular, than first-order methods despite their theoretically inferior convergence rates. Using our new weighting scheme and practical tuning we show that, for the first time, the excessive gap technique can be made faster than the fastest counterfactual regret minimization algorithm, CFR+, in practice
    corecore