Analysis of Hannan Consistent Selection for Monte Carlo Tree Search in Simultaneous Move Games
Hannan consistency, or no external regret, is a key concept for learning in
games. An action selection algorithm is Hannan consistent (HC) if its
performance is eventually as good as selecting the best fixed action in
hindsight. If both players in a zero-sum normal form game use a Hannan
consistent algorithm, their average behavior converges to a Nash equilibrium
(NE) of the game. A similar result is known about extensive form games, but
the played strategies need to be Hannan consistent with respect to
the counterfactual values, which are often difficult to obtain. We study
zero-sum extensive form games with simultaneous moves, but otherwise perfect
information. These games generalize normal form games and they are a special
case of extensive form games. We study whether applying HC algorithms in each
decision point of these games directly to the observed payoffs leads to
convergence to a Nash equilibrium. This learning process corresponds to a class
of Monte Carlo Tree Search algorithms, which are popular for playing
simultaneous-move games but do not have any known performance guarantees. We
show that using HC algorithms directly on the observed payoffs is not
sufficient to guarantee convergence. With additional averaging over
joint actions, convergence is guaranteed, but it is empirically slower. We
further define an additional property of HC algorithms that is sufficient to
guarantee convergence without the averaging, and we empirically show that
commonly used HC algorithms have this property.
Comment: arXiv admin note: substantial text overlap with arXiv:1509.0014
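
The normal form statement in the abstract (two Hannan consistent players in a zero-sum game have average behavior that approaches a Nash equilibrium) can be illustrated with a minimal sketch. It assumes regret matching as the HC rule and rock-paper-scissors as the game; both choices, along with the names `PAYOFF`, `regret_matching_policy`, and `self_play`, are illustrative assumptions and are not taken from the paper. The sketch covers only the single-matrix case, not the tree search setting the paper analyzes.

```python
import random

# Row player's payoffs in rock-paper-scissors (an assumed example game);
# the column player receives the negation.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]


def regret_matching_policy(cum_regret):
    """Mix over actions in proportion to positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    if total <= 0.0:
        return [1.0 / n] * n  # no positive regret yet: play uniformly
    return [p / total for p in positive]


def self_play(steps, seed=0):
    """Run regret matching for both players and return empirical
    action frequencies and average external regret per player."""
    rng = random.Random(seed)
    n = len(PAYOFF)
    regret = [[0.0] * n, [0.0] * n]  # cumulative external regret
    counts = [[0] * n, [0] * n]      # how often each action was played
    for _ in range(steps):
        a = rng.choices(range(n), weights=regret_matching_policy(regret[0]))[0]
        b = rng.choices(range(n), weights=regret_matching_policy(regret[1]))[0]
        counts[0][a] += 1
        counts[1][b] += 1
        for alt in range(n):
            # Gain from having played `alt` instead of the action taken.
            regret[0][alt] += PAYOFF[alt][b] - PAYOFF[a][b]
            regret[1][alt] += -PAYOFF[a][alt] + PAYOFF[a][b]
    avg_strategy = [[c / steps for c in row] for row in counts]
    avg_regret = [max(r) / steps for r in regret]
    return avg_strategy, avg_regret
```

Calling `self_play(100000)` returns the empirical action frequencies and the average external regret of each player. In a zero-sum matrix game the sum of the two average regrets equals the exploitability of the empirical frequencies, so small regret for both players means the average play is close to equilibrium.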