
    A Computationally Efficient Implementation of Fictitious Play for Large-Scale Games

    The paper is concerned with distributed learning and optimization in large-scale settings. The well-known Fictitious Play (FP) algorithm has been shown to achieve Nash equilibrium learning in certain classes of multi-agent games. However, FP can be computationally difficult to implement when the number of players is large. Sampled FP is a variant of FP that mitigates the computational difficulties arising in FP by using a Monte-Carlo (i.e., sampling-based) approach. The Sampled FP algorithm has been studied both as a tool for distributed learning and as an optimization heuristic for large-scale problems. Despite its computational advantages, a shortcoming of Sampled FP is that the number of samples that must be drawn in each round of the algorithm grows without bound (on the order of $\sqrt{t}$, where $t$ is the round of the repeated play). In this paper we propose Computationally Efficient Sampled FP (CESFP), a variant of Sampled FP in which only one sample need be drawn in each round of the algorithm (a substantial reduction from the $O(\sqrt{t})$ samples per round required in Sampled FP). CESFP operates using a stochastic-approximation-type rule to estimate the expected utility from round to round. It is proven that the CESFP algorithm achieves Nash equilibrium learning in the same sense as classical FP and Sampled FP. Simulation results suggest that the convergence rate of CESFP (in terms of repeated-play iterations) is similar to that of Sampled FP.
    Comment: Submitted for publication. Initial Submission: Jun. 2015. 15 pages.
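
    The core mechanic described in the abstract, replacing the exact expected-utility computation of FP with one sampled joint action per round and a stochastic-approximation update, can be sketched as follows. This is a minimal Python illustration, assuming the game is given as per-player payoff tensors; the name cesfp_sketch, the 1/t step size, and the best-response/averaging details are illustrative assumptions, not the paper's exact construction.

        import numpy as np

        def cesfp_sketch(utilities, n_rounds=500, rng=None):
            """Single-sample fictitious-play sketch for an n-player matrix game.

            utilities: list of n arrays; utilities[i] has shape (A_0, ..., A_{n-1})
            and gives player i's payoff for each joint action. Hypothetical names
            and parameters; a stochastic-approximation estimate of expected utility
            stands in for the exact expectation over opponents' empirical play.
            """
            rng = np.random.default_rng(rng)
            n = len(utilities)
            sizes = list(utilities[0].shape)
            empirical = [np.ones(k) / k for k in sizes]   # empirical strategies
            u_est = [np.zeros(k) for k in sizes]          # per-action utility estimates

            for t in range(1, n_rounds + 1):
                alpha = 1.0 / t                            # stochastic-approximation step size
                # One joint sample per round from the current empirical strategies.
                sample = [rng.choice(sizes[i], p=empirical[i]) for i in range(n)]
                for i in range(n):
                    for a in range(sizes[i]):
                        joint = tuple(sample[:i] + [a] + sample[i + 1:])
                        u_est[i][a] += alpha * (utilities[i][joint] - u_est[i][a])
                # Each player best-responds to its utility estimate and updates its
                # empirical strategy with the usual fictitious-play averaging.
                for i in range(n):
                    br = np.zeros(sizes[i])
                    br[np.argmax(u_est[i])] = 1.0
                    empirical[i] += (br - empirical[i]) / (t + 1)
            return empirical

    With the payoff tensors supplied as numpy arrays, calling cesfp_sketch(utilities) returns each player's empirical strategy after the given number of rounds; the point of the sketch is only that a single sample per round feeds the running utility estimate.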

    A Sampled Fictitious Play Based Learning Algorithm for Infinite Horizon Markov Decision Processes

    Using Sampled Fictitious Play (SFP) concepts, we develop SFPL: Sampled Fictitious Play Learning — a learning algorithm for solving discounted homogeneous Markov Decision Problems where the transition probabilities are unknown and need to be learned via simulation or direct observation of the system in real time. Thus, SFPL simultaneously updates the estimates of the unknown transition probabilities and the estimates of optimal value and optimal action in the observed state. In the spirit of SFP, the action after each transition is selected by sampling from the empirical distribution of previous optimal action estimates for the current state. The resulting algorithm is provably convergent. We compare its performance with other learning methods, including SARSA and Q-learning.
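
    The action-selection rule in SFPL, sampling from the empirical distribution of earlier optimal-action estimates while the transition probabilities are re-estimated from observed transitions, can be illustrated with a short sketch. This is a minimal Python illustration under simplifying assumptions (a known reward table, a single-state value update per transition, and hypothetical names such as sfpl_sketch and step); it is not the algorithm as specified in the paper.

        import numpy as np

        def sfpl_sketch(n_states, n_actions, step, reward, gamma=0.95,
                        n_steps=10000, rng=None):
            """Sketch of Sampled-Fictitious-Play-style learning for a discounted MDP.

            step(s, a) -> next state, drawn from the unknown dynamics (a simulator
            or the real system observed online); reward is an (n_states, n_actions)
            array assumed known for simplicity. Names and the value-update rule are
            illustrative assumptions, not the SFPL procedure from the paper.
            """
            rng = np.random.default_rng(rng)
            counts = np.ones((n_states, n_actions, n_states))   # smoothed transition counts
            V = np.zeros(n_states)                               # value estimates
            history = [list(range(n_actions)) for _ in range(n_states)]  # past best-action estimates
            s = 0
            for _ in range(n_steps):
                # Sample the action from the empirical distribution of previous
                # optimal-action estimates for the current state (the SFP idea).
                a = rng.choice(history[s])
                s_next = step(s, a)
                counts[s, a, s_next] += 1
                # Re-estimate Q(s, .) from the current transition-probability
                # estimates and record the new best-action estimate for this state.
                P_s = counts[s] / counts[s].sum(axis=1, keepdims=True)
                q = reward[s] + gamma * P_s @ V
                history[s].append(int(np.argmax(q)))
                V[s] = q.max()
                s = s_next
            return V, history

    Sampling from the accumulated history of best-action estimates, rather than always playing the most recent one, is what gives the procedure its fictitious-play character while the transition-probability estimates are still being learned.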