1,142 research outputs found

    Deep Reinforcement Learning from Self-Play in Imperfect-Information Games

    Get PDF
    Many real-world applications can be described as large-scale games of imperfect information. To deal with these challenging domains, prior work has focused on computing Nash equilibria in a handcrafted abstraction of the domain. In this paper we introduce the first scalable end-to-end approach to learning approximate Nash equilibria without prior domain knowledge. Our method combines fictitious self-play with deep reinforcement learning. When applied to Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium, whereas common reinforcement learning methods diverged. In Limit Texas Holdem, a poker game of real-world scale, NFSP learnt a strategy that approached the performance of state-of-the-art, superhuman algorithms based on significant domain expertise.Comment: updated version, incorporating conference feedbac

    Approximations of Semicontinuous Functions with Applications to Stochastic Optimization and Statistical Estimation

    Get PDF
    Upper semicontinuous (usc) functions arise in the analysis of maximization problems, distributionally robust optimization, and function identification, which includes many problems of nonparametric statistics. We establish that every usc function is the limit of a hypo-converging sequence of piecewise affine functions of the difference-of-max type and illustrate resulting algorithmic possibilities in the context of approximate solution of infinite-dimensional optimization problems. In an effort to quantify the ease with which classes of usc functions can be approximated by finite collections, we provide upper and lower bounds on covering numbers for bounded sets of usc functions under the Attouch-Wets distance. The result is applied in the context of stochastic optimization problems defined over spaces of usc functions. We establish confidence regions for optimal solutions based on sample average approximations and examine the accompanying rates of convergence. Examples from nonparametric statistics illustrate the results

    On the Sample Complexity of Predictive Sparse Coding

    Full text link
    The goal of predictive sparse coding is to learn a representation of examples as sparse linear combinations of elements from a dictionary, such that a learned hypothesis linear in the new representation performs well on a predictive task. Predictive sparse coding algorithms recently have demonstrated impressive performance on a variety of supervised tasks, but their generalization properties have not been studied. We establish the first generalization error bounds for predictive sparse coding, covering two settings: 1) the overcomplete setting, where the number of features k exceeds the original dimensionality d; and 2) the high or infinite-dimensional setting, where only dimension-free bounds are useful. Both learning bounds intimately depend on stability properties of the learned sparse encoder, as measured on the training sample. Consequently, we first present a fundamental stability result for the LASSO, a result characterizing the stability of the sparse codes with respect to perturbations to the dictionary. In the overcomplete setting, we present an estimation error bound that decays as \tilde{O}(sqrt(d k/m)) with respect to d and k. In the high or infinite-dimensional setting, we show a dimension-free bound that is \tilde{O}(sqrt(k^2 s / m)) with respect to k and s, where s is an upper bound on the number of non-zeros in the sparse code for any training data point.Comment: Sparse Coding Stability Theorem from version 1 has been relaxed considerably using a new notion of coding margin. Old Sparse Coding Stability Theorem still in new version, now as Theorem 2. Presentation of all proofs simplified/improved considerably. Paper reorganized. Empirical analysis showing new coding margin is non-trivial on real dataset

    Convex Q Learning in a Stochastic Environment: Extended Version

    Full text link
    The paper introduces the first formulation of convex Q-learning for Markov decision processes with function approximation. The algorithms and theory rest on a relaxation of a dual of Manne's celebrated linear programming characterization of optimal control. The main contributions firstly concern properties of the relaxation, described as a deterministic convex program: we identify conditions for a bounded solution, and a significant relationship between the solution to the new convex program, and the solution to standard Q-learning. The second set of contributions concern algorithm design and analysis: (i) A direct model-free method for approximating the convex program for Q-learning shares properties with its ideal. In particular, a bounded solution is ensured subject to a simple property of the basis functions; (ii) The proposed algorithms are convergent and new techniques are introduced to obtain the rate of convergence in a mean-square sense; (iii) The approach can be generalized to a range of performance criteria, and it is found that variance can be reduced by considering ``relative'' dynamic programming equations; (iv) The theory is illustrated with an application to a classical inventory control problem.Comment: Extended version of "Convex Q-learning in a stochastic environment", IEEE Conference on Decision and Control, 2023 (to appear
    • …
    corecore