
    Distributed Algorithms Based on Fictitious Play for Near Optimal Sequential Decision Making.

    We develop stochastic search algorithms to find optimal or close-to-optimal solutions for sequential decision making problems. We consider two problem classes.
    1. Large-scale, discrete, deterministic, finite-horizon dynamic programming problems: We use a Sampled Fictitious Play (SFP) algorithm for solving large-scale, finite-horizon, discrete dynamic programming (DP) problems. We model the DP problem as an identical-interest game between multiple players and show that the SFP algorithm converges to the equilibrium strategies of this game. In addition, we present two new algorithms, Repeated SFP and SFP-Based Local Search, that find globally optimal solutions using SFP as a base algorithm. We report the performance of the algorithms on dynamic lot sizing problems and the Traveling Salesman Problem (TSP). Numerical experiments show that our algorithms find close-to-optimal solutions very quickly, and we present small modifications that further improve their performance.
    2. Stochastic, discounted, infinite-horizon Markov Decision Problems: Using SFP concepts, we develop an online learning algorithm, referred to as SFP-based Learning (SFPL), for solving a discounted homogeneous Markov Decision Problem (MDP) whose transition probabilities are unknown. In SFPL, we simultaneously estimate and update the unknown transition probabilities, the optimal value, and the optimal action of each state. We prove the convergence of SFPL to the optimal solution and compare its performance with SARSA and Q-Learning on dynamic location and windy gridworld problems.
    Ph.D., Industrial & Operations Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/64602/1/esisikog_1.pd
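    As a rough illustration of the fictitious-play idea underlying this thesis, the Python sketch below runs a sampled form of fictitious play on a toy two-player identical-interest game: each player samples one action from the other player's empirical history and best-responds to it. The payoff table, function name, and sampling scheme are illustrative assumptions, not the thesis's exact SFP variants.

```python
import random
from collections import Counter

# Toy identical-interest game: both players get the same payoff, so the
# equilibrium of interest is a joint maximizer of the common payoff.
# This payoff table is illustrative; its maximum is at (1, 2).
PAYOFF = {(a, b): -(a - 1) ** 2 - (b - 2) ** 2 for a in range(3) for b in range(3)}

def sampled_fictitious_play(n_iters=200, seed=0):
    """Sketch of fictitious play with sampling: each player draws one sample
    from the other player's action history and best-responds to that sample
    rather than to the full empirical distribution."""
    rng = random.Random(seed)
    hist = [Counter({0: 1}), Counter({0: 1})]  # action histories, seeded arbitrarily
    for _ in range(n_iters):
        joint = []
        for i in (0, 1):
            other = 1 - i
            actions, counts = zip(*hist[other].items())
            sample = rng.choices(actions, weights=counts)[0]  # sampled opponent action
            if i == 0:
                best = max(range(3), key=lambda a: PAYOFF[(a, sample)])
            else:
                best = max(range(3), key=lambda b: PAYOFF[(sample, b)])
            joint.append(best)
        for i in (0, 1):
            hist[i][joint[i]] += 1
    # report each player's most frequent action as the recommended strategy
    return tuple(h.most_common(1)[0][0] for h in hist)

print(sampled_fictitious_play())  # settles at (1, 2) for this payoff table
```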

    A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games

    This paper proposes novel, end-to-end deep reinforcement learning algorithms for learning two-player zero-sum Markov games. Our objective is to find the Nash Equilibrium policies, which are free from exploitation by adversarial opponents. Distinct from prior efforts on finding Nash equilibria in extensive-form games such as Poker, which feature tree-structured transition dynamics and discrete state spaces, this paper focuses on Markov games with general transition dynamics and continuous state spaces. We propose (1) the Nash DQN algorithm, which integrates DQN with a Nash-finding subroutine for the joint value functions; and (2) the Nash DQN Exploiter algorithm, which additionally adopts an exploiter for guiding the agent's exploration. Our algorithms are practical variants of theoretical algorithms that are guaranteed to converge to Nash equilibria in the basic tabular setting. Experimental evaluation on both tabular examples and two-player Atari games demonstrates the robustness of the proposed algorithms against adversarial opponents, as well as their advantageous performance over existing methods.
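    The abstract describes a Nash-finding subroutine applied to joint value functions. As a hedged illustration, the sketch below solves the kind of stage game such a subroutine would face at a single state: it computes the row player's maximin (Nash) strategy and the value of a two-player zero-sum matrix game via a standard linear program. The function name and its role here are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_nash(Q):
    """Maximin (Nash) strategy and game value for the row player of a
    two-player zero-sum matrix game with payoff matrix Q (row maximizes).
    In a Nash-DQN-style method this would be applied to the matrix of
    joint Q-values at a given state."""
    m, n = Q.shape
    # Variables: x_1..x_m (row mixed strategy) and v (game value); minimize -v.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # For every column j:  v - sum_i x_i Q[i, j] <= 0.
    A_ub = np.hstack([-Q.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # Probabilities sum to one.
    A_eq = np.concatenate([np.ones(m), [0.0]])[None, :]
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[-1]

# Matching pennies: value 0, uniform strategies.
Q = np.array([[1.0, -1.0], [-1.0, 1.0]])
strategy, value = zero_sum_nash(Q)
print(np.round(strategy, 3), round(value, 3))  # ~[0.5 0.5] 0.0
```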

    Zero-sum Polymatrix Markov Games: Equilibrium Collapse and Efficient Computation of Nash Equilibria

    The works of (Daskalakis et al., 2009, 2022; Jin et al., 2022; Deng et al., 2023) indicate that computing Nash equilibria in multi-player Markov games is a computationally hard task. This fact raises the question of whether or not computational intractability can be circumvented if one focuses on specific classes of Markov games. One such example is two-player zero-sum Markov games, in which efficient ways to compute a Nash equilibrium are known. Inspired by zero-sum polymatrix normal-form games (Cai et al., 2016), we define a class of zero-sum multi-agent Markov games in which there are only pairwise interactions described by a graph that changes per state. For this class of Markov games, we show that an ε-approximate Nash equilibrium can be found efficiently. To do so, we generalize the techniques of (Cai et al., 2016) by showing that the set of coarse-correlated equilibria collapses to the set of Nash equilibria. Afterwards, it is possible to use any algorithm in the literature that computes approximate coarse-correlated equilibria over Markovian policies to get an approximate Nash equilibrium. Comment: Added missing proofs for the infinite-horizon case.
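    One possible way to write down the pairwise-interaction structure described above, in notation chosen here for illustration (the paper's own definitions may differ): at each state $s$ the players interact along the edges $E_s$ of a state-dependent graph, rewards decompose over those edges, and the rewards sum to zero.

```latex
% Pairwise-separable (polymatrix) reward structure; illustrative notation.
\[
  r_i(s, a) \;=\; \sum_{j \,:\, (i,j) \in E_s} r_{i,j}\bigl(s, a_i, a_j\bigr),
  \qquad
  \sum_{i \in N} r_i(s, a) \;=\; 0 \quad \text{for all } s,\, a,
\]
% where $E_s$ is the edge set of the interaction graph at state $s$ and
% $r_{i,j}$ is the pairwise reward between players $i$ and $j$.
```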

    Episodic Logit-Q Dynamics for Efficient Learning in Stochastic Teams

    We present new learning dynamics combining (independent) log-linear learning and value iteration for stochastic games within the auxiliary stage game framework. The dynamics presented provably attain the efficient equilibrium (also known as the optimal equilibrium) in identical-interest stochastic games, going beyond the recent concentration of progress on provable convergence to some (possibly inefficient) equilibrium. The dynamics are also independent in the sense that agents take actions consistent with their local viewpoint, to a reasonable extent, rather than seeking equilibrium. These aspects can be of practical interest in control applications of intelligent and autonomous systems. The key challenges are convergence to an inefficient equilibrium and the non-stationarity of the environment from a single agent's viewpoint due to the adaptation of others. The log-linear update plays an important role in addressing the former. We address the latter through a play-in-episodes scheme in which the agents update their Q-function estimates only at the end of the episodes.
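    For orientation, a minimal sketch of the log-linear (logit) choice rule that such dynamics build on: actions are sampled with probability proportional to exp(Q/temperature), so low temperatures concentrate mass on near-best responses. The function name and temperature are illustrative, and the paper's episodic Q-update schedule is not reproduced here.

```python
import numpy as np

def log_linear_choice(q_values, temperature, rng):
    """Log-linear (logit) action choice: sample action a with probability
    proportional to exp(Q[a] / temperature)."""
    q = np.asarray(q_values, dtype=float)
    logits = (q - q.max()) / temperature          # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return rng.choice(len(q), p=probs), probs

rng = np.random.default_rng(0)
action, probs = log_linear_choice([1.0, 1.5, 0.2], temperature=0.1, rng=rng)
print(action, np.round(probs, 3))  # mass concentrates on the highest-Q action
```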

    A survey of random processes with reinforcement

    The models surveyed include generalized Pólya urns, reinforced random walks, interacting urn models, and continuous reinforced processes. Emphasis is on methods and results, with sketches provided of some proofs. Applications are discussed in statistics, biology, economics and a number of other areas. Comment: Published at http://dx.doi.org/10.1214/07-PS094 in the Probability Surveys (http://www.i-journals.org/ps/) by the Institute of Mathematical Statistics (http://www.imstat.org).
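    For concreteness, a small simulation of the classical Pólya urn, the simplest of the reinforced processes covered by such a survey; the function name and default parameters are illustrative.

```python
import random

def polya_urn(steps, red=1, black=1, reinforcement=1, seed=0):
    """Simulate a classical Pólya urn: draw a ball uniformly at random, return
    it, and add `reinforcement` extra balls of the drawn colour.  The fraction
    of red balls converges almost surely to a random limit (Beta-distributed
    when starting from integer counts)."""
    rng = random.Random(seed)
    for _ in range(steps):
        if rng.random() < red / (red + black):
            red += reinforcement
        else:
            black += reinforcement
    return red / (red + black)

# Different seeds give different limiting fractions: the limit is random.
print([round(polya_urn(10_000, seed=s), 3) for s in range(5)])
```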

    A mirror descent approach for Mean Field Control applied to Demand-Side Management

    We consider a finite-horizon Mean Field Control problem for Markovian models. The objective function is a sum of convex and Lipschitz functions defined on a space of state-action distributions. We introduce an iterative algorithm which we prove to be a Mirror Descent associated with a non-standard Bregman divergence, with a convergence rate of order 1/√K. It requires the solution of a simple dynamic programming problem at each iteration. We compare this algorithm with learning methods for Mean Field Games after providing a reformulation of our control problem as a game problem. These theoretical contributions are illustrated with numerical examples applied to a demand-side management problem for power systems, aimed at controlling the average power consumption profile of a population of flexible devices contributing to the power system balance.
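    As a hedged illustration of the update shape only, the sketch below performs mirror descent with the KL (entropic) Bregman divergence on a probability simplex, i.e. an exponentiated-gradient step, with a 1/√k step size. The paper's non-standard Bregman divergence and its state-action distribution space are not reproduced; the toy linear cost and function name are assumptions.

```python
import numpy as np

def mirror_descent_step(mu, grad, step_size):
    """One mirror descent step on the probability simplex with the KL
    (entropic) Bregman divergence, i.e. an exponentiated-gradient update:
        mu_new  proportional to  mu * exp(-step_size * grad)."""
    mu = np.asarray(mu, dtype=float)
    new = mu * np.exp(-step_size * np.asarray(grad, dtype=float))
    return new / new.sum()

# Toy example: minimise a linear cost <c, mu> over the simplex.
c = np.array([0.3, 0.1, 0.6])
mu = np.ones(3) / 3
for k in range(1, 201):
    mu = mirror_descent_step(mu, grad=c, step_size=1.0 / np.sqrt(k))
print(np.round(mu, 3))  # mass concentrates on the cheapest coordinate (index 1)
```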