15,868 research outputs found

    Function Approximation for Solving Stackelberg Equilibrium in Large Perfect Information Games

    Full text link
    Function approximation (FA) has been a critical component in solving large zero-sum games. Yet little attention has been given to FA for solving \textit{general-sum} extensive-form games, even though they are widely regarded as computationally more challenging than their fully competitive or cooperative counterparts. A key challenge is that for many equilibria in general-sum games, no simple analogue exists to the state value function used in Markov Decision Processes and zero-sum games. In this paper, we propose learning the \textit{Enforceable Payoff Frontier} (EPF) -- a generalization of the state value function for general-sum games. We approximate the optimal \textit{Stackelberg extensive-form correlated equilibrium} by representing EPFs with neural networks and training them with appropriate backup operations and loss functions. This is the first method that applies FA to the Stackelberg setting, allowing us to scale to much larger games while still enjoying performance guarantees based on the FA error. Additionally, our proposed method guarantees incentive compatibility and is easy to evaluate without having to depend on self-play or approximate best-response oracles. Comment: To appear in AAAI 202
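
    The abstract's core computational idea is to represent a per-state object (the EPF) with a neural network and fit it by regression against backed-up targets. Below is a minimal, hypothetical sketch of that representation-and-fitting pattern in PyTorch; the follower-payoff discretization, the architecture, and the fit routine are illustrative assumptions, not the paper's actual backup operator or loss.

```python
import torch
import torch.nn as nn

# Hypothetical illustration only: represent a per-state payoff frontier as the
# leader's payoff at K discretized follower-payoff levels, and fit the network
# by regression against targets produced by some backup step.
K = 32  # number of follower-payoff grid points (an assumption)

class FrontierNet(nn.Module):
    def __init__(self, state_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, K),  # leader payoff at each follower-payoff level
        )

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        return self.net(states)  # (batch, K)

def fit(model: FrontierNet, states: torch.Tensor, targets: torch.Tensor,
        epochs: int = 100, lr: float = 1e-3) -> FrontierNet:
    """Regress the predicted frontiers onto backed-up targets with an MSE loss."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(states), targets).backward()
        opt.step()
    return model
```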

    Approximate dynamic programming for two-player zero-sum Markov games

    Get PDF
    This paper provides an analysis of error propagation in Approximate Dynamic Programming applied to zero-sum two-player Stochastic Games. We provide a novel and unified error propagation analysis in $L^p$-norm of three well-known algorithms adapted to Stochastic Games (namely Approximate Value Iteration, Approximate Policy Iteration and Approximate Generalized Policy Iteration). We show that we can achieve a stationary policy which is $\frac{2\gamma\epsilon + \epsilon'}{(1-\gamma)^{2}}$-optimal, where $\epsilon$ is the value function approximation error and $\epsilon'$ is the approximate greedy operator error. In addition, we provide a practical algorithm (AGPI-Q) to solve infinite-horizon $\gamma$-discounted two-player zero-sum Stochastic Games in a batch setting. It is an extension of the Fitted-Q algorithm (which solves Markov Decision Processes from data) and can be non-parametric. Finally, we demonstrate experimentally the performance of AGPI-Q on a simultaneous two-player game, namely Alesia.
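
    For reference, the exact dynamic-programming operator that these approximate schemes relax is Shapley's value iteration, where each backup solves the matrix game at a state. A minimal sketch for small finite state and action spaces is below, with each matrix game solved by linear programming; AGPI-Q itself replaces this exact backup with a fitted regression from sampled transitions, which is not shown.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value of the zero-sum matrix game max_x min_y x^T A y, solved as an LP."""
    m, n = A.shape
    # Variables: the row player's mixed strategy x (length m) and the value v.
    c = np.zeros(m + 1)
    c[-1] = -1.0                                  # maximize v <=> minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])     # column j: v - sum_i A[i,j] x_i <= 0
    b_ub = np.zeros(n)
    A_eq = np.append(np.ones(m), 0.0).reshape(1, -1)   # sum_i x_i = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]     # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    return res.x[-1]

def shapley_value_iteration(R, P, gamma, iters=200):
    """R[s]: (m, n) stage payoffs; P[s, a, b]: distribution over next states."""
    S = R.shape[0]
    V = np.zeros(S)
    for _ in range(iters):
        V = np.array([matrix_game_value(R[s] + gamma * P[s] @ V) for s in range(S)])
    return V
```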

    A New Policy Iteration Algorithm For Reinforcement Learning in Zero-Sum Markov Games

    Full text link
    Optimal policies in standard MDPs can be obtained using either value iteration or policy iteration. However, in the case of zero-sum Markov games, there is no efficient policy iteration algorithm; e.g., it has been shown that one has to solve Omega(1/(1-alpha)) MDPs, where alpha is the discount factor, to implement the only known convergent version of policy iteration. Another algorithm, called naive policy iteration, is easy to implement but is only provably convergent under very restrictive assumptions. Prior attempts to fix the naive policy iteration algorithm have several limitations. Here, we show that a simple variant of naive policy iteration for games converges exponentially fast. The only addition we propose to naive policy iteration is the use of lookahead policies, which are used in practical algorithms anyway. We further show that lookahead can be implemented efficiently in the function approximation setting of linear Markov games, which are the counterpart of the much-studied linear MDPs. We illustrate the application of our algorithm by providing bounds for policy-based RL (reinforcement learning) algorithms. We extend the results to the function approximation setting. Comment: 41 pages
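
    A minimal illustration of the lookahead ingredient in the simplest setting: a depth-limited minimax backup over a turn-based game tree that bootstraps with the current value estimate at the horizon, plus a greedy policy extracted from it. In the simultaneous-move Markov games studied in the paper, the max/min alternation at each node would be replaced by solving a matrix game; the turn-based simplification and all names here are assumptions for illustration.

```python
def lookahead_value(state, depth, V, children, max_to_move):
    """Depth-limited minimax backup that bootstraps with the value estimate V
    at the horizon. Per-step rewards and discounting are omitted for brevity."""
    succ = children(state)                 # list of (action, next_state) pairs
    if depth == 0 or not succ:
        return V[state]
    vals = [lookahead_value(s, depth - 1, V, children, not max_to_move)
            for _, s in succ]
    return max(vals) if max_to_move else min(vals)

def lookahead_policy(state, depth, V, children, max_to_move=True):
    """Greedy action under an H-step lookahead rather than a one-step backup."""
    scored = [(lookahead_value(s, depth - 1, V, children, not max_to_move), a)
              for a, s in children(state)]
    key = lambda va: va[0]
    return (max(scored, key=key) if max_to_move else min(scored, key=key))[1]
```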

    Switching Diffusions: Applications To Ecological Models, And Numerical Methods For Games In Insurance

    Get PDF
    Recently, a class of dynamic systems called ``hybrid systems,'' containing both continuous dynamics and discrete events, has been adapted to treat a wide variety of situations arising in real-world applications. Motivated by such developments, this dissertation is devoted to the study of dynamical systems involving a Markov chain as the randomly switching process. The systems studied include hybrid competitive Lotka-Volterra ecosystems and non-zero-sum stochastic differential games between two insurance companies with regime-switching. The first part is concerned with the competitive Lotka-Volterra model with Markov switching. A novelty of the contribution is that the Markov chain has a countable state space. Our main objective is to reduce the computational complexity by using the two-time-scale formulation. Because the existence, uniqueness, and continuity of solutions for Lotka-Volterra ecosystems with Markovian switching over a countable set were not previously available, such properties are studied first. The two-time-scale feature is highlighted by introducing a small parameter into the generator of the Markov chain. When the small parameter goes to 0, there is a limit (reduced) system. It is established in this work that if the reduced system possesses certain properties, such as permanence and extinction, then the complex original system has the same properties when the parameter is sufficiently small. These results are obtained using perturbed Lyapunov function methods. The second part develops an approximation procedure for a class of non-zero-sum stochastic differential games for investment and reinsurance between two insurance companies. Both proportional reinsurance and excess-of-loss reinsurance policies are considered. We develop numerical algorithms to obtain the approximation to the Nash equilibrium by adopting the Markov chain approximation methodology. We establish the convergence of the approximation sequences and of the approximations to the value functions. Numerical examples are presented to illustrate the applicability of the algorithms.
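
    The first part concerns Lotka-Volterra dynamics modulated by a switching Markov chain. A minimal Euler-Maruyama simulation of a two-species, two-regime version of such a system is sketched below; all parameter values and the two-state generator are illustrative assumptions (the dissertation treats a countable regime space with a two-time-scale generator, which is not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-regime parameters: growth rates b, competition matrices A,
# noise intensities sigma, and the generator Q of the switching Markov chain.
b = {0: np.array([1.0, 0.8]), 1: np.array([0.6, 1.2])}
A = {0: np.array([[1.0, 0.5], [0.4, 1.0]]), 1: np.array([[0.8, 0.6], [0.7, 0.9]])}
sigma = {0: np.array([0.10, 0.10]), 1: np.array([0.20, 0.15])}
Q = np.array([[-0.5, 0.5], [1.0, -1.0]])

def simulate(x0, T=50.0, dt=1e-3):
    """Euler-Maruyama path of dx_i = x_i (b_i(r) - (A(r) x)_i) dt + sigma_i(r) x_i dW_i,
    with regime r switching according to the generator Q."""
    n = int(T / dt)
    x, regime = np.asarray(x0, dtype=float).copy(), 0
    path = np.empty((n, 2))
    for k in range(n):
        if rng.random() < -Q[regime, regime] * dt:   # leave the current regime
            regime = 1 - regime
        drift = x * (b[regime] - A[regime] @ x)
        noise = sigma[regime] * x * rng.normal(size=2) * np.sqrt(dt)
        x = np.maximum(x + drift * dt + noise, 0.0)  # keep populations nonnegative
        path[k] = x
    return path

path = simulate([0.5, 0.5])
```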

    Multigrid methods for two-player zero-sum stochastic games

    Full text link
    We present a fast numerical algorithm for large-scale zero-sum stochastic games with perfect information, which combines policy iteration and algebraic multigrid methods. This algorithm can be applied either to a true finite-state-space zero-sum two-player game or to the discretization of an Isaacs equation. We present numerical tests on discretizations of Isaacs equations or variational inequalities. We also present a full multi-level policy iteration, similar to FMG, which substantially improves the computation time for solving some variational inequalities. Comment: 31 pages
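
    A minimal sketch of policy iteration for a finite turn-based zero-sum stochastic game, in the spirit of what the multigrid method accelerates: fix the MIN player's stationary policy, solve the induced MDP for MAX, then improve MIN greedily against the resulting values. Array shapes and names are assumptions; the paper's contribution, solving the inner linear systems with algebraic multigrid, is only indicated by a comment.

```python
import numpy as np

def solve_max_mdp(P, r, gamma, min_policy, owner, iters=500):
    """Value iteration for the MDP faced by MAX once MIN's policy is fixed.
    With both policies fixed, evaluation reduces to a linear system -- the
    step an algebraic multigrid solver would accelerate."""
    S, A = r.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = r + gamma * P @ V                      # (S, A)
        V = np.where(owner > 0, Q.max(axis=1), Q[np.arange(S), min_policy])
    return V

def policy_iteration_game(P, r, owner, gamma, iters=50):
    """P[s, a]: next-state distribution; r[s, a]: reward to MAX;
    owner[s]: +1 if MAX moves at s, -1 if MIN moves."""
    S, A = r.shape
    min_policy = np.zeros(S, dtype=int)
    for _ in range(iters):
        V = solve_max_mdp(P, r, gamma, min_policy, owner)
        Q = r + gamma * P @ V
        new_min = np.where(owner < 0, Q.argmin(axis=1), min_policy)
        if np.array_equal(new_min, min_policy):
            break
        min_policy = new_min
    return V, min_policy
```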