A Version of Geiringer-like Theorem for Decision Making in the Environments with Randomness and Incomplete Information
Purpose: In recent years Monte Carlo sampling methods, such as Monte Carlo
tree search, have achieved tremendous success in model-free reinforcement
learning. A combination of the so-called upper confidence bounds policy,
which preserves the "exploration vs. exploitation" balance when selecting
actions for sample evaluations, together with massive computing power to
store and dynamically update a rather large pre-evaluated game tree, has led
to software that has beaten the top human player in the game of Go on a
9-by-9 board. Much current research is devoted to widening the range of
applicability of the Monte Carlo sampling methodology to partially observable
Markov decision processes with non-immediate payoffs. The main challenge
introduced by randomness and incomplete information is evaluating actions at
chance nodes, owing to the drastic differences in the possible payoffs the
same action could lead to. The aim of this article is to establish a version
of a theorem that originated in population genetics and was later adapted in
evolutionary computation theory, one that will lead to novel Monte Carlo
sampling algorithms that provably increase the AI potential. Owing to space
limitations, the algorithms themselves will be presented in sequel papers;
the current paper provides a solid mathematical foundation for their
development and explains why they are so promising.

Comment: 53 pages. This work has recently been submitted to the IJICC
(International Journal on Intelligent Computing and Cybernetics).
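The upper confidence bounds policy mentioned in the abstract can be sketched with the standard UCB1 selection rule used in Monte Carlo tree search. This is a minimal illustration, not the paper's implementation: the dictionary-based child representation, field names, and exploration constant are assumptions made for the example.

```python
import math

def ucb1_select(node_children, total_visits, c=math.sqrt(2)):
    """Pick the child maximizing the UCB1 score: empirical mean value
    plus an exploration bonus that shrinks as a child is visited more."""
    def score(child):
        if child["visits"] == 0:
            return float("inf")  # always try unvisited actions first
        exploit = child["value"] / child["visits"]
        explore = c * math.sqrt(math.log(total_visits) / child["visits"])
        return exploit + explore
    return max(node_children, key=score)

# Toy example: action "a" has a higher empirical mean, but "b" has been
# sampled far less, so its exploration bonus makes it the UCB1 choice.
children = [
    {"action": "a", "visits": 10, "value": 7.0},
    {"action": "b", "visits": 2, "value": 1.0},
]
best = ucb1_select(children, total_visits=12)
```

The exploration term is what maintains the "exploration vs. exploitation" balance: it guarantees every action keeps being sampled at a logarithmic rate even when its current estimate looks poor.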
A New Algorithm for Generating Equilibria in Massive Zero-Sum Games
In normal scenarios, computer scientists often consider the number of states in a game to capture the difficulty of learning an equilibrium. However, players do not see games in the same light: most consider Go or Chess to be more complex than Monopoly. In this paper, we discuss a new measure of game complexity that links existing state-of-the-art algorithms for computing approximate equilibria to a more human measure. In particular, we consider the range of skill in a game, i.e., how many different skill levels exist. We then modify existing techniques to design a new algorithm for computing approximate equilibria whose performance can be captured by this new measure. We use it to develop the first near-Nash equilibrium for a four-round abstraction of poker, and show that it would have been able to win the bankroll competition from last year's AAAI poker competition handily.
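To make "computing approximate equilibria" in a zero-sum game concrete at toy scale, the sketch below uses fictitious play, a classical method in which each player repeatedly best-responds to the opponent's empirical mixture; this is an illustrative stand-in, not the algorithm the paper designs, and the rock-paper-scissors matrix and iteration count are assumptions of the example.

```python
def fictitious_play(payoff, iters=20000):
    """Approximate a Nash equilibrium of a two-player zero-sum game,
    given the row player's payoff matrix, via fictitious play:
    each round, both players best-respond to the other's
    empirical (historical) strategy, and the empirical frequencies
    converge to an equilibrium (Robinson's theorem)."""
    n, m = len(payoff), len(payoff[0])
    row_counts = [0] * n
    col_counts = [0] * m
    row_counts[0] = col_counts[0] = 1  # arbitrary initial play
    for _ in range(iters):
        # Row best-responds (maximizes) against the column's history.
        row_br = max(range(n),
                     key=lambda i: sum(payoff[i][j] * col_counts[j]
                                       for j in range(m)))
        # Column best-responds (minimizes row payoff) against row's history.
        col_br = min(range(m),
                     key=lambda j: sum(payoff[i][j] * row_counts[i]
                                       for i in range(n)))
        row_counts[row_br] += 1
        col_counts[col_br] += 1
    tr, tc = sum(row_counts), sum(col_counts)
    return ([c / tr for c in row_counts], [c / tc for c in col_counts])

# Rock-paper-scissors: the unique equilibrium is the uniform mixture.
rps = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
row_mix, col_mix = fictitious_play(rps)
```

For rock-paper-scissors both returned mixtures approach (1/3, 1/3, 1/3). State-of-the-art equilibrium solvers for large games replace this naive loop with abstraction plus more sophisticated iterative methods, which is the setting the abstract's complexity measure addresses.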