6 research outputs found
Analysis of Hannan Consistent Selection for Monte Carlo Tree Search in Simultaneous Move Games
Hannan consistency, or no external regret, is a key concept for learning in
games. An action selection algorithm is Hannan consistent (HC) if its
performance is eventually as good as that of the best fixed action in
hindsight. If both players in a zero-sum normal form game use a Hannan
consistent algorithm, their average behavior converges to a Nash equilibrium
(NE) of the game. A similar result is known for extensive form games, but
the played strategies need to be Hannan consistent with respect to
the counterfactual values, which are often difficult to obtain. We study
zero-sum extensive form games with simultaneous moves but otherwise perfect
information. These games generalize normal form games and are a special
case of extensive form games. We study whether applying HC algorithms in each
decision point of these games directly to the observed payoffs leads to
convergence to a Nash equilibrium. This learning process corresponds to a class
of Monte Carlo Tree Search algorithms, which are popular for playing
simultaneous-move games but have no known performance guarantees. We
show that using HC algorithms directly on the observed payoffs is not
sufficient to guarantee convergence. With an additional averaging over
joint actions, convergence is guaranteed, but empirically slower. We
further define an additional property of HC algorithms that is sufficient to
guarantee convergence without the averaging, and we empirically show that
commonly used HC algorithms have this property.
Comment: arXiv admin note: substantial text overlap with arXiv:1509.0014
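The abstract's normal-form claim (two Hannan consistent learners whose average play converges to a Nash equilibrium) can be illustrated with regret matching, a standard HC rule, in self-play on Matching Pennies. This is a hedged sketch for illustration only; the game, the rule, and all names here are chosen by the editor, not taken from the paper.

```python
# Illustrative sketch (not the paper's algorithm): regret matching, a
# standard Hannan consistent rule, in self-play on zero-sum Matching
# Pennies. The time-averaged strategies should approach the unique
# Nash equilibrium, which is uniform (0.5, 0.5) for both players.
import random

PAYOFF = [[1, -1], [-1, 1]]  # row player's payoff; the column player receives the negative

def strategy(regrets):
    """Mix proportionally to positive cumulative regrets; uniform if none are positive."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [0.5, 0.5]

random.seed(0)
regrets = [[0.0, 0.0], [0.0, 0.0]]  # cumulative regret per player and action
avg = [[0.0, 0.0], [0.0, 0.0]]      # running sum of the mixed strategies played
T = 20000
for _ in range(T):
    strats = [strategy(regrets[0]), strategy(regrets[1])]
    acts = [0 if random.random() < s[0] else 1 for s in strats]
    for p in range(2):
        sign = 1 if p == 0 else -1          # player 1 gets the negated payoff
        realized = sign * PAYOFF[acts[0]][acts[1]]
        for a in range(2):
            alt = sign * (PAYOFF[a][acts[1]] if p == 0 else PAYOFF[acts[0]][a])
            regrets[p][a] += alt - realized  # regret for not having played a
            avg[p][a] += strats[p][a]

avg_strategy = [[x / T for x in avg[p]] for p in range(2)]
```

In the paper's setting such HC updates would run independently in every decision point of the game tree, and the abstract's negative result is exactly that feeding them raw observed payoffs is not enough there, unlike in this single normal form game.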
Actor-Critic Fictitious Play in Simultaneous Move Multistage Games
Fictitious play is a game-theoretic iterative procedure meant to learn an equilibrium in normal form games. However, this algorithm requires that each player have full knowledge of the other players' strategies. Using an architecture inspired by actor-critic algorithms, we build a stochastic approximation of the fictitious play process. This procedure is online, decentralized (an agent has no information about others' strategies and rewards) and applies to multistage games (a generalization of normal form games). In addition, we prove convergence of our method towards a Nash equilibrium in both the cases of zero-sum two-player multistage games and cooperative multistage games. We also provide empirical evidence of the soundness of our approach on the game of Alesia, with and without function approximation.
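For contrast with the paper's decentralized actor-critic approximation, here is a hedged sketch of the classic fictitious play procedure it builds on, on Matching Pennies (a game chosen here for illustration). Note how each player needs the opponent's full empirical strategy, which is precisely the knowledge requirement the abstract says the paper removes.

```python
# Illustrative sketch of classic fictitious play (not the paper's
# actor-critic variant): each player best responds to the opponent's
# empirical action frequencies. In two-player zero-sum games the
# empirical frequencies converge to a Nash equilibrium; for Matching
# Pennies that equilibrium is uniform (0.5, 0.5).
PAYOFF = [[1, -1], [-1, 1]]  # row player's payoff; the column player receives the negative

counts = [[1, 0], [0, 1]]  # arbitrary initial empirical action counts
T = 30000
for _ in range(T):
    # Expected payoff of each pure action against the opponent's empirical mixture.
    row_vals = [sum(PAYOFF[a][b] * counts[1][b] for b in range(2)) for a in range(2)]
    col_vals = [sum(-PAYOFF[a][b] * counts[0][a] for a in range(2)) for b in range(2)]
    counts[0][row_vals.index(max(row_vals))] += 1  # row player's best response
    counts[1][col_vals.index(max(col_vals))] += 1  # column player's best response

freqs = [[c / sum(cs) for c in cs] for cs in counts]
```

The paper's contribution, per the abstract, is a stochastic approximation of this process in which an agent observes only its own rewards, extended from normal form to multistage games.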