22,736 research outputs found

    Theoretical and Practical Advances on Smoothing for Extensive-Form Games

    Full text link
    Sparse iterative methods, in particular first-order methods, are known to be among the most effective in solving large-scale two-player zero-sum extensive-form games. The convergence rates of these methods depend heavily on the properties of the distance-generating function that they are based on. We investigate the acceleration of first-order methods for solving extensive-form games through better design of the dilated entropy function---a class of distance-generating functions related to the domains associated with the extensive-form games. By introducing a new weighting scheme for the dilated entropy function, we develop the first distance-generating function for the strategy spaces of sequential games that has no dependence on the branching factor of the player. This result improves the convergence rate of several first-order methods by a factor of Ω(bdd)\Omega(b^dd), where bb is the branching factor of the player, and dd is the depth of the game tree. Thus far, counterfactual regret minimization methods have been faster in practice, and more popular, than first-order methods despite their theoretically inferior convergence rates. Using our new weighting scheme and practical tuning we show that, for the first time, the excessive gap technique can be made faster than the fastest counterfactual regret minimization algorithm, CFR+, in practice

    Imperfect-Recall Abstractions with Bounds in Games

    Full text link
    Imperfect-recall abstraction has emerged as the leading paradigm for practical large-scale equilibrium computation in incomplete-information games. However, imperfect-recall abstractions are poorly understood, and only weak algorithm-specific guarantees on solution quality are known. In this paper, we show the first general, algorithm-agnostic, solution quality guarantees for Nash equilibria and approximate self-trembling equilibria computed in imperfect-recall abstractions, when implemented in the original (perfect-recall) game. Our results are for a class of games that generalizes the only previously known class of imperfect-recall abstractions where any results had been obtained. Further, our analysis is tighter in two ways, each of which can lead to an exponential reduction in the solution quality error bound. We then show that for extensive-form games that satisfy certain properties, the problem of computing a bound-minimizing abstraction for a single level of the game reduces to a clustering problem, where the increase in our bound is the distance function. This reduction leads to the first imperfect-recall abstraction algorithm with solution quality bounds. We proceed to show a divide in the class of abstraction problems. If payoffs are at the same scale at all information sets considered for abstraction, the input forms a metric space. Conversely, if this condition is not satisfied, we show that the input does not form a metric space. Finally, we use these results to experimentally investigate the quality of our bound for single-level abstraction

    Solving Games with Functional Regret Estimation

    Full text link
    We propose a novel online learning method for minimizing regret in large extensive-form games. The approach learns a function approximator online to estimate the regret for choosing a particular action. A no-regret algorithm uses these estimates in place of the true regrets to define a sequence of policies. We prove the approach sound by providing a bound relating the quality of the function approximation and regret of the algorithm. A corollary being that the method is guaranteed to converge to a Nash equilibrium in self-play so long as the regrets are ultimately realizable by the function approximator. Our technique can be understood as a principled generalization of existing work on abstraction in large games; in our work, both the abstraction as well as the equilibrium are learned during self-play. We demonstrate empirically the method achieves higher quality strategies than state-of-the-art abstraction techniques given the same resources.Comment: AAAI Conference on Artificial Intelligence 201

    Analysis and Optimization of Deep Counterfactual Value Networks

    Full text link
    Recently a strong poker-playing algorithm called DeepStack was published, which is able to find an approximate Nash equilibrium during gameplay by using heuristic values of future states predicted by deep neural networks. This paper analyzes new ways of encoding the inputs and outputs of DeepStack's deep counterfactual value networks based on traditional abstraction techniques, as well as an unabstracted encoding, which was able to increase the network's accuracy.Comment: Long version of publication appearing at KI 2018: The 41st German Conference on Artificial Intelligence (http://dx.doi.org/10.1007/978-3-030-00111-7_26). Corrected typo in titl

    Pirate plunder: game-based computational thinking using scratch blocks

    Get PDF
    Policy makers worldwide argue that children should be taught how technology works, and that the ‘computational thinking’ skills developed through programming are useful in a wider context. This is causing an increased focus on computer science in primary and secondary education. Block-based programming tools, like Scratch, have become ubiquitous in primary education (5 to 11-years-old) throughout the UK. However, Scratch users often struggle to detect and correct ‘code smells’ (bad programming practices) such as duplicated blocks and large scripts, which can lead to programs that are difficult to understand. These ‘smells’ are caused by a lack of abstraction and decomposition in programs; skills that play a key role in computational thinking. In Scratch, repeats (loops), custom blocks (procedures) and clones (instances) can be used to correct these smells. Yet, custom blocks and clones are rarely taught to children under 11-years-old. We describe the design of a novel educational block-based programming game, Pirate Plunder, which aims to teach these skills to children aged 9-11. Players use Scratch blocks to navigate around a grid, collect items and interact with obstacles. Blocks are explained in ‘tutorials’; the player then completes a series of ‘challenges’ before attempting the next tutorial. A set of Scratch blocks, including repeats, custom blocks and clones, are introduced in a linear difficulty progression. There are two versions of Pirate Plunder; one that uses a debugging-first approach, where the player is given a program that is incomplete or incorrect, and one where each level begins with an empty program. The game design has been developed through iterative playtesting. The observations made during this process have influenced key design decisions such as Scratch integration, difficulty progression and reward system. In future, we will evaluate Pirate Plunder against a traditional Scratch curriculum and compare the debugging-first and non-debugging versions in a series of studies
    • …
    corecore