
    Convergent learning algorithms for potential games with unknown noisy rewards

    In this paper, we address the problem of convergence to Nash equilibria in games with rewards that are initially unknown and must be estimated over time from noisy observations. These games arise in many real-world applications, whenever rewards for actions cannot be prespecified and must be learned online. Standard results in game theory, however, do not consider such settings. Specifically, using results from stochastic approximation and differential inclusions, we prove the convergence of variants of fictitious play and adaptive play to Nash equilibria in potential games and weakly acyclic games, respectively. These variants all use a multi-agent version of Q-learning to estimate the reward functions and a novel form of the ε-greedy decision rule to select an action. Furthermore, we derive ε-greedy decision rules that exploit the sparse interaction structure encoded in two compact graphical representations of games, known as graphical and hypergraphical normal form, to improve the convergence rate of the learning algorithms. The structure captured in these representations naturally occurs in many distributed optimisation and control applications. Finally, we demonstrate the efficacy of the algorithms in a simulated ad hoc wireless sensor network management problem.
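    A minimal sketch (not the authors' algorithm) of the ingredients this abstract combines: a Q-learning estimate of unknown noisy rewards, fictitious-play beliefs about the opponent, and an ε-greedy action rule with vanishing exploration. The 2x2 coordination game, noise level, and exploration schedule below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# True (unknown) common payoff of a 2x2 coordination game, observed only through noise.
TRUE_PAYOFF = np.array([[1.0, 0.0],
                        [0.0, 2.0]])
n_actions = 2

Q = [np.zeros((n_actions, n_actions)) for _ in range(2)]       # reward estimates, indexed [own, opp]
counts = [np.zeros((n_actions, n_actions)) for _ in range(2)]  # visit counts per joint action
beliefs = [np.full(n_actions, 1.0 / n_actions) for _ in range(2)]  # empirical model of the opponent

for t in range(1, 5001):
    eps = 1.0 / np.sqrt(t)                       # vanishing exploration rate (assumed schedule)
    acts = []
    for i in range(2):
        expected = Q[i] @ beliefs[i]             # estimated reward of each own action vs. belief
        if rng.random() < eps:
            acts.append(int(rng.integers(n_actions)))
        else:
            acts.append(int(np.argmax(expected)))
    reward = TRUE_PAYOFF[acts[0], acts[1]] + rng.normal(scale=0.5)   # common noisy reward
    for i in range(2):
        own, opp = acts[i], acts[1 - i]
        counts[i][own, opp] += 1
        alpha = 1.0 / counts[i][own, opp]        # decreasing step size for the Q estimate
        Q[i][own, opp] += alpha * (reward - Q[i][own, opp])
        beliefs[i] = ((t - 1) * beliefs[i] + np.eye(n_actions)[opp]) / t  # fictitious-play update

print("estimated best joint action:", np.unravel_index(int(np.argmax(Q[0])), Q[0].shape))
```

    With these settings the empirical play should settle on the high-payoff coordinated outcome, illustrating convergence to a Nash equilibrium of the underlying potential game.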

    A User's Guide to Solving Dynamic Stochastic Games Using the Homotopy Method

    This paper provides a step-by-step guide to solving dynamic stochastic games using the homotopy method. The homotopy method facilitates exploring the equilibrium correspondence in a systematic fashion; it is especially useful in games that have multiple equilibria. We discuss the theory of the homotopy method and its implementation and present two detailed examples of dynamic stochastic games that are solved using this method.
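    As a rough illustration of the homotopy idea (not the paper's implementation), the sketch below deforms an easy system G(x) = 0 into a target system F(x) = 0 through H(x, t) = (1 - t) G(x) + t F(x) and follows the solution as t moves from 0 to 1. The toy system F is an assumption standing in for the equilibrium conditions of a dynamic stochastic game.

```python
import numpy as np
from scipy.optimize import fsolve

def F(x):                      # "hard" target system (toy stand-in for equilibrium conditions)
    return np.array([x[0]**3 + x[1] - 1.0,
                     x[1]**3 - x[0] + 1.0])

def G(x):                      # "easy" system with a known root at the origin
    return x

def H(x, t):                   # convex homotopy between G and F
    return (1.0 - t) * G(x) + t * F(x)

x = np.zeros(2)                # known solution of G(x) = 0
for t in np.linspace(0.0, 1.0, 101):
    x = fsolve(lambda z: H(z, t), x)   # previous solution warm-starts the corrector step
print("approximate root of F:", x, "residual:", F(x))
```

    The printed residual lets one check that the path-following step has actually landed on a root of the target system.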

    Distributed convergence to Nash equilibria in two-network zero-sum games

    This paper considers a class of strategic scenarios in which two networks of agents have opposing objectives with regard to the optimization of a common objective function. In the resulting zero-sum game, individual agents collaborate with neighbors in their respective network and have only partial knowledge of the state of the agents in the other network. For the case when the interaction topology of each network is undirected, we synthesize a distributed saddle-point strategy and establish its convergence to the Nash equilibrium for the class of strictly concave-convex and locally Lipschitz objective functions. We also show that this dynamics does not converge in general if the topologies are directed. This justifies the introduction, in the directed case, of a generalization of this distributed dynamics, which we show converges to the Nash equilibrium for the class of strictly concave-convex differentiable functions with locally Lipschitz gradients. The technical approach combines tools from algebraic graph theory, nonsmooth analysis, set-valued dynamical systems, and game theory.
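    A minimal sketch of the kind of dynamics the abstract describes, under illustrative assumptions (a toy strictly convex-concave objective, undirected ring topologies, Euler discretization): each agent in network A keeps a local copy of x and descends, each agent in network B keeps a copy of y and ascends, and a graph-Laplacian term drives each network toward consensus.

```python
import numpy as np

def ring_laplacian(n):
    eye = np.eye(n)
    return 2 * eye - np.roll(eye, 1, axis=1) - np.roll(eye, -1, axis=1)

nA, nB = 4, 5
LA, LB = ring_laplacian(nA), ring_laplacian(nB)

# f(x, y) = x^2 - y^2 + x*y - 2x + 4y: strictly convex in x, strictly concave in y;
# saddle point at (x*, y*) = (0, 2).
def grad_x(x, y): return 2 * x + y - 2
def grad_y(x, y): return -2 * y + x + 4

x = np.random.randn(nA)          # network A's local copies of the decision variable x
y = np.random.randn(nB)          # network B's local copies of y
step = 0.02
for _ in range(5000):
    # each agent uses one agent of the other network as its partial view of that variable
    y_view = y[np.arange(nA) % nB]
    x_view = x[np.arange(nB) % nA]
    x = x + step * (-LA @ x - grad_x(x, y_view))   # consensus + gradient descent
    y = y + step * (-LB @ y + grad_y(x_view, y))   # consensus + gradient ascent

print("x copies:", x.round(3), " y copies:", y.round(3))  # should be near the saddle (0, 2)
```

    The Laplacian term alone only enforces agreement within each network; it is the combination with the descent/ascent terms that steers the consensus values toward the saddle point.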

    A Newton Collocation Method for Solving Dynamic Bargaining Games

    We develop and implement a collocation method to solve for an equilibrium in the dynamic legislative bargaining game of Duggan and Kalandrakis (2008). We formulate the collocation equations in a quasi-discrete version of the model, and we show that the collocation equations are locally Lipschitz continuous and directionally differentiable. In numerical experiments, we successfully implement a globally convergent variant of Broyden's method on a preconditioned version of the collocation equations, and the method economizes on computation cost by more than 50% compared to the value iteration method. We rely on a continuity property of the equilibrium set to obtain increasingly precise approximations of solutions to the continuum model. We showcase these techniques with an illustration of the dynamic core convergence theorem of Duggan and Kalandrakis (2008) in a nine-player, two-dimensional model with negative quadratic preferences.
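    As a small illustration of the numerical machinery (a quasi-Newton solve of collocation-style equations, not the Duggan-Kalandrakis system itself), the sketch below solves a toy fixed-point condition at a handful of collocation nodes with SciPy's broyden1; the residual function and node grid are assumptions standing in for the equilibrium conditions.

```python
import numpy as np
from scipy.optimize import broyden1

nodes = np.linspace(-1.0, 1.0, 9)                 # collocation nodes (assumed grid)

def residuals(v):
    # Toy "collocation residuals": a contraction-like fixed-point condition v = T(v),
    # where T adds a stage payoff to a discounted continuation value interpolated at 0.5*nodes.
    continuation = 0.9 * np.interp(0.5 * nodes, nodes, v)
    return v - (nodes**2 + continuation)

v0 = np.zeros_like(nodes)                          # crude initial guess
v = broyden1(residuals, v0, f_tol=1e-8)            # quasi-Newton (Broyden) solve
print("approximate solution at the nodes:", v.round(4))
```

    Because the toy operator is a contraction, the system has a unique solution and the Broyden iteration converges quickly without ever forming a Jacobian, which is the appeal of this class of methods over value iteration.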

    A Douglas-Rachford splitting for semi-decentralized equilibrium seeking in generalized aggregative games

    We address the generalized aggregative equilibrium seeking problem for noncooperative agents playing average aggregative games with affine coupling constraints. First, we use operator theory to characterize the generalized aggregative equilibria of the game as the zeros of a monotone set-valued operator. Then, we massage the Douglas-Rachford splitting to solve the monotone inclusion problem and derive a single-layer, semi-decentralized algorithm whose global convergence is guaranteed under mild assumptions. The potential of the proposed Douglas-Rachford algorithm is shown on a simplified resource allocation game, where we observe faster convergence with respect to forward-backward algorithms.
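    For reference, a minimal sketch of the generic Douglas-Rachford splitting iteration for a monotone inclusion 0 ∈ A(x) + B(x). The concrete problem below (a quadratic plus a box constraint, so both resolvents are available in closed form) is an illustrative assumption, not the paper's semi-decentralized game algorithm.

```python
import numpy as np

c = np.array([3.0, -2.0, 0.5])                 # target point of the quadratic term
lo, hi = -1.0, 1.0                             # box constraint
gamma = 1.0                                    # resolvent step size

def res_A(z):                                  # resolvent of A = grad of 0.5*||x - c||^2
    return (z + gamma * c) / (1.0 + gamma)

def res_B(z):                                  # resolvent of B = normal cone of the box
    return np.clip(z, lo, hi)

z = np.zeros_like(c)
for _ in range(200):
    x = res_A(z)
    y = res_B(2 * x - z)                       # reflected resolvent step
    z = z + y - x                              # Douglas-Rachford update

print("zero of A + B:", res_A(z).round(4))     # should approach clip(c, lo, hi) = [1, -1, 0.5]
```

    At a fixed point of the update, the first resolvent evaluated at z recovers the zero of A + B, which is why the last line prints res_A(z) rather than z itself.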

    Convergent learning algorithms for unknown reward games

    In this paper, we address the problem of convergence to Nash equilibria in games with rewards that are initially unknown and must be estimated over time from noisy observations. These games arise in many real-world applications, whenever rewards for actions cannot be prespecified and must be learned online, but standard results in game theory do not consider such settings. For this problem, we derive a multiagent version of Q-learning to estimate the reward functions using novel forms of the ε-greedy learning policy. Using these Q-learning schemes to estimate reward functions, we then provide conditions guaranteeing the convergence of adaptive play and the better-reply processes to Nash equilibria in potential games and games with more general forms of acyclicity, and of regret matching to the set of correlated equilibria in generic games. A secondary result is that we prove the strong ergodicity of stochastic adaptive play and stochastic better-reply processes in the case of vanishing perturbations. Finally, we illustrate the efficacy of the algorithms in a set of randomly generated three-player coordination games and show the practical necessity of our results by demonstrating that violations of the derived learning parameter conditions can cause the algorithms to fail to converge.
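    A minimal sketch of one ingredient mentioned above, regret matching, whose empirical joint play converges to the set of correlated equilibria. As a simplifying assumption, the counterfactual payoffs are observed directly with noise rather than estimated with Q-learning, and the 2x2 game, noise level, and horizon are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
# Each player's true payoff matrix of a 2x2 coordination game, indexed [own action, opponent action].
U = [np.array([[2.0, 0.0], [0.0, 1.0]]),
     np.array([[2.0, 0.0], [0.0, 1.0]])]

regret = [np.zeros(2), np.zeros(2)]

def pick(r):
    # mix over actions in proportion to positive cumulative regret
    pos = np.maximum(r, 0.0)
    return int(rng.integers(2)) if pos.sum() == 0 else int(rng.choice(2, p=pos / pos.sum()))

counts = np.zeros((2, 2))
for t in range(20000):
    a = [pick(regret[0]), pick(regret[1])]
    counts[a[0], a[1]] += 1
    for i in range(2):
        own, opp = a[i], a[1 - i]
        payoffs = U[i][:, opp] + rng.normal(scale=0.1, size=2)   # noisy observed payoffs
        regret[i] += payoffs - payoffs[own]    # regret for not having played each alternative

print("empirical joint play:\n", (counts / counts.sum()).round(3))
```

    In this coordination game the empirical joint distribution should place most of its mass on the diagonal (the coordinated outcomes), which is consistent with convergence to the correlated equilibrium set.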