A Computationally Efficient Implementation of Fictitious Play for Large-Scale Games
The paper is concerned with distributed learning and optimization in
large-scale settings. The well-known Fictitious Play (FP) algorithm has been
shown to achieve Nash equilibrium learning in certain classes of multi-agent
games. However, FP can be computationally difficult to implement when the
number of players is large. Sampled FP is a variant of FP that mitigates the
computational difficulties arising in FP by using a Monte-Carlo (i.e.,
sampling-based) approach. The Sampled FP algorithm has been studied both as a
tool for distributed learning and as an optimization heuristic for large-scale
problems. Despite its computational advantages, a shortcoming of Sampled FP is
that the number of samples that must be drawn in each round of the algorithm
grows without bound (on the order of $\sqrt{t}$, where $t$ is the round of the
repeated play). In this paper we propose Computationally Efficient Sampled FP
(CESFP)---a variant of Sampled FP in which only one sample need be drawn each
round of the algorithm (a substantial reduction from the $O(\sqrt{t})$ samples
per round required in Sampled FP). CESFP operates using a
stochastic-approximation-type rule to estimate the expected utility from round
to round. It is proven that the CESFP algorithm achieves Nash equilibrium
learning in the same sense as classical FP and Sampled FP. Simulation results
suggest that the convergence rate of CESFP (in terms of repeated-play
iterations) is similar to that of Sampled FP.
Comment: Submitted for publication. Initial Submission: Jun. 2015. 15 pages.
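A minimal sketch of the one-sample idea for a two-player matrix game: each round, every player draws a single opponent action from the opponent's empirical distribution, folds the observed payoff into a running utility estimate via a stochastic-approximation step, and then best-responds to that estimate. The function name, the step size $\rho_t = 1/t$, and the tie-breaking below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def cesfp_two_player(payoffs, rounds=2000, seed=0):
    """One-sample, stochastic-approximation variant of Sampled FP
    (illustrative sketch of the CESFP idea for a two-player matrix game).

    payoffs: list of two (n0, n1) arrays; payoffs[i][a0, a1] is player i's
    utility when the joint action (a0, a1) is played.
    """
    rng = np.random.default_rng(seed)
    n0, n1 = payoffs[0].shape
    counts = [np.ones(n0), np.ones(n1)]   # empirical action counts (uniform prior)
    u_est = [np.zeros(n0), np.zeros(n1)]  # running expected-utility estimates
    for t in range(1, rounds + 1):
        rho = 1.0 / t                     # diminishing stochastic-approximation step
        # each player draws ONE sample of the opponent's action from the
        # opponent's empirical distribution (vs. a growing batch in Sampled FP)
        a1_sample = rng.choice(n1, p=counts[1] / counts[1].sum())
        a0_sample = rng.choice(n0, p=counts[0] / counts[0].sum())
        u_est[0] = (1 - rho) * u_est[0] + rho * payoffs[0][:, a1_sample]
        u_est[1] = (1 - rho) * u_est[1] + rho * payoffs[1][a0_sample, :]
        # each player best-responds to its current utility estimate
        counts[0][np.argmax(u_est[0])] += 1
        counts[1][np.argmax(u_est[1])] += 1
    # empirical mixed strategies after `rounds` rounds of play
    return [c / c.sum() for c in counts]
```

On a 2x2 common-interest coordination game, for example, the empirical strategies typically lock onto one of the pure Nash equilibria.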
A Sampled Fictitious Play Based Learning Algorithm for Infinite Horizon Markov Decision Processes
Using Sampled Fictitious Play (SFP) concepts, we develop SFPL: Sampled Fictitious Play Learning — a learning algorithm for solving discounted homogeneous Markov Decision Problems where the transition probabilities are unknown and need to be learned via simulation or direct observation of the system in real time. Thus, SFPL simultaneously updates the estimates of the unknown transition probabilities and the estimates of optimal value and optimal action in the observed state. In the spirit of SFP, the action after each transition is selected by sampling from the empirical distribution of previous optimal action estimates for the current state. The resulting algorithm is provably convergent. We compare its performance with other learning methods, including SARSA and Q-learning.
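A minimal tabular sketch of how the simultaneous updates described in the abstract might fit together: smoothed transition counts estimate the unknown dynamics, a Bellman backup on the visited state updates the value estimate, and the next action is sampled from the empirical history of past greedy-action estimates. The function names, the count-based smoothing, the known reward table, and the random tie-breaking are assumptions for illustration, not the authors' exact algorithm.

```python
import numpy as np

def sfpl_sketch(step, n_states, n_actions, rewards, gamma=0.9,
                steps=5000, seed=0):
    """Sampled-Fictitious-Play-style learner for an unknown tabular MDP
    (illustrative sketch). `step(s, a, rng) -> s_next` simulates the
    unknown dynamics; rewards[s, a] is assumed known."""
    rng = np.random.default_rng(seed)
    trans_counts = np.ones((n_states, n_actions, n_states))  # smoothed counts
    action_hist = np.ones((n_states, n_actions))  # past greedy-action estimates
    V = np.zeros(n_states)                        # value estimates
    s = 0
    for _ in range(steps):
        # sample the action from the empirical distribution of previous
        # optimal-action estimates for the current state (the SFP step)
        a = rng.choice(n_actions, p=action_hist[s] / action_hist[s].sum())
        s_next = step(s, a, rng)
        trans_counts[s, a, s_next] += 1           # learn the transition model
        P_s = trans_counts[s] / trans_counts[s].sum(axis=1, keepdims=True)
        Q_s = rewards[s] + gamma * P_s @ V        # Bellman backup in state s
        V[s] = Q_s.max()                          # update the value estimate
        greedy = rng.choice(np.flatnonzero(np.isclose(Q_s, Q_s.max())))
        action_hist[s, greedy] += 1               # record greedy-action estimate
        s = s_next
    return V, action_hist.argmax(axis=1)          # values, modal greedy actions
```

On a two-state chain where one action moves toward a rewarding state, the value estimate for that state settles near the discounted fixed point and the recorded greedy actions favor staying there.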