23,307 research outputs found
Theoretical and Practical Advances on Smoothing for Extensive-Form Games
Sparse iterative methods, in particular first-order methods, are known to be
among the most effective in solving large-scale two-player zero-sum
extensive-form games. The convergence rates of these methods depend heavily on
the properties of the distance-generating function that they are based on. We
investigate the acceleration of first-order methods for solving extensive-form
games through better design of the dilated entropy function---a class of
distance-generating functions related to the domains associated with the
extensive-form games. By introducing a new weighting scheme for the dilated
entropy function, we develop the first distance-generating function for the
strategy spaces of sequential games that has no dependence on the branching
factor of the player. This result improves the convergence rate of several
first-order methods by a factor of , where is the branching
factor of the player, and is the depth of the game tree.
Thus far, counterfactual regret minimization methods have been faster in
practice, and more popular, than first-order methods despite their
theoretically inferior convergence rates. Using our new weighting scheme and
practical tuning we show that, for the first time, the excessive gap technique
can be made faster than the fastest counterfactual regret minimization
algorithm, CFR+, in practice
Imperfect-Recall Abstractions with Bounds in Games
Imperfect-recall abstraction has emerged as the leading paradigm for
practical large-scale equilibrium computation in incomplete-information games.
However, imperfect-recall abstractions are poorly understood, and only weak
algorithm-specific guarantees on solution quality are known. In this paper, we
show the first general, algorithm-agnostic, solution quality guarantees for
Nash equilibria and approximate self-trembling equilibria computed in
imperfect-recall abstractions, when implemented in the original
(perfect-recall) game. Our results are for a class of games that generalizes
the only previously known class of imperfect-recall abstractions where any
results had been obtained. Further, our analysis is tighter in two ways, each
of which can lead to an exponential reduction in the solution quality error
bound.
We then show that for extensive-form games that satisfy certain properties,
the problem of computing a bound-minimizing abstraction for a single level of
the game reduces to a clustering problem, where the increase in our bound is
the distance function. This reduction leads to the first imperfect-recall
abstraction algorithm with solution quality bounds. We proceed to show a divide
in the class of abstraction problems. If payoffs are at the same scale at all
information sets considered for abstraction, the input forms a metric space.
Conversely, if this condition is not satisfied, we show that the input does not
form a metric space. Finally, we use these results to experimentally
investigate the quality of our bound for single-level abstraction
Solving Games with Functional Regret Estimation
We propose a novel online learning method for minimizing regret in large
extensive-form games. The approach learns a function approximator online to
estimate the regret for choosing a particular action. A no-regret algorithm
uses these estimates in place of the true regrets to define a sequence of
policies.
We prove the approach sound by providing a bound relating the quality of the
function approximation and regret of the algorithm. A corollary being that the
method is guaranteed to converge to a Nash equilibrium in self-play so long as
the regrets are ultimately realizable by the function approximator. Our
technique can be understood as a principled generalization of existing work on
abstraction in large games; in our work, both the abstraction as well as the
equilibrium are learned during self-play. We demonstrate empirically the method
achieves higher quality strategies than state-of-the-art abstraction techniques
given the same resources.Comment: AAAI Conference on Artificial Intelligence 201
Analysis and Optimization of Deep Counterfactual Value Networks
Recently a strong poker-playing algorithm called DeepStack was published,
which is able to find an approximate Nash equilibrium during gameplay by using
heuristic values of future states predicted by deep neural networks. This paper
analyzes new ways of encoding the inputs and outputs of DeepStack's deep
counterfactual value networks based on traditional abstraction techniques, as
well as an unabstracted encoding, which was able to increase the network's
accuracy.Comment: Long version of publication appearing at KI 2018: The 41st German
Conference on Artificial Intelligence
(http://dx.doi.org/10.1007/978-3-030-00111-7_26). Corrected typo in titl
Pirate plunder: game-based computational thinking using scratch blocks
Policy makers worldwide argue that children should be taught how technology works, and that the ‘computational thinking’ skills developed through programming are useful in a wider context. This is causing an increased focus on computer science in primary and secondary education.
Block-based programming tools, like Scratch, have become ubiquitous in primary education (5 to 11-years-old) throughout the UK. However, Scratch users often struggle to detect and correct ‘code smells’ (bad programming practices) such as duplicated blocks and large scripts, which can lead to programs that are difficult to understand. These ‘smells’ are caused by a lack of abstraction and decomposition in programs; skills that play a key role in computational thinking. In Scratch, repeats (loops), custom blocks (procedures) and clones (instances) can be used to correct these smells. Yet, custom blocks and clones are rarely taught to children under 11-years-old.
We describe the design of a novel educational block-based programming game, Pirate Plunder, which aims to teach these skills to children aged 9-11. Players use Scratch blocks to navigate around a grid, collect items and interact with obstacles. Blocks are explained in ‘tutorials’; the player then completes a series of ‘challenges’ before attempting the next tutorial. A set of Scratch blocks, including repeats, custom blocks and clones, are introduced in a linear difficulty progression. There are two versions of Pirate Plunder; one that uses a debugging-first approach, where the player is given a program that is incomplete or incorrect, and one where each level begins with an empty program.
The game design has been developed through iterative playtesting. The observations made during this process have influenced key design decisions such as Scratch integration, difficulty progression and reward system. In future, we will evaluate Pirate Plunder against a traditional Scratch curriculum and compare the debugging-first and non-debugging versions in a series of studies
- …