Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
Many real-world applications can be described as large-scale games of
imperfect information. To deal with these challenging domains, prior work has
focused on computing Nash equilibria in a handcrafted abstraction of the
domain. In this paper we introduce the first scalable end-to-end approach to
learning approximate Nash equilibria without prior domain knowledge. Our method
combines fictitious self-play with deep reinforcement learning. When applied to
Leduc poker, Neural Fictitious Self-Play (NFSP) approached a Nash equilibrium,
whereas common reinforcement learning methods diverged. In Limit Texas Hold'em,
a poker game of real-world scale, NFSP learnt a strategy that approached the
performance of state-of-the-art, superhuman algorithms based on significant
domain expertise.
Comment: updated version, incorporating conference feedback.
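As a rough illustration of the combination described above (a minimal sketch only, not the authors' implementation): one learner estimates an approximate best response by reinforcement learning, a second learner imitates the agent's own best-response actions by supervised learning to form the average strategy, and play mixes the two via an anticipatory parameter. All class and method names below are hypothetical placeholders.

```python
import random

class NFSPStyleAgent:
    """Sketch of a fictitious-self-play agent with two learners (hypothetical interfaces)."""

    def __init__(self, best_response_net, average_policy_net, eta=0.1):
        self.br = best_response_net     # trained by RL (e.g. a DQN-style learner)
        self.avg = average_policy_net   # trained by supervised learning on own play
        self.eta = eta                  # anticipatory parameter: prob. of playing the best response
        self.rl_memory = []             # transitions for the RL learner
        self.sl_memory = []             # (state, action) pairs of the agent's best-response play

    def act(self, state):
        if random.random() < self.eta:
            action = self.br.act(state)             # play the approximate best response
            self.sl_memory.append((state, action))  # record it; the average policy imitates this data
        else:
            action = self.avg.act(state)            # play the average strategy
        return action

    def observe(self, transition):
        self.rl_memory.append(transition)           # store experience for the RL update

    def train_step(self):
        self.br.update(self.rl_memory)   # RL step toward a best response to the others' average strategies
        self.avg.update(self.sl_memory)  # supervised step toward the agent's historical behaviour
```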
Approximations of Semicontinuous Functions with Applications to Stochastic Optimization and Statistical Estimation
Upper semicontinuous (usc) functions arise in the analysis of maximization
problems, distributionally robust optimization, and function identification,
which includes many problems of nonparametric statistics. We establish that
every usc function is the limit of a hypo-converging sequence of piecewise
affine functions of the difference-of-max type and illustrate resulting
algorithmic possibilities in the context of approximate solution of
infinite-dimensional optimization problems. In an effort to quantify the ease
with which classes of usc functions can be approximated by finite collections,
we provide upper and lower bounds on covering numbers for bounded sets of usc
functions under the Attouch-Wets distance. The result is applied in the context
of stochastic optimization problems defined over spaces of usc functions. We
establish confidence regions for optimal solutions based on sample average
approximations and examine the accompanying rates of convergence. Examples from
nonparametric statistics illustrate the results.
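For concreteness (notation chosen here for illustration; the paper's construction may differ in detail), a piecewise affine function of difference-of-max type is the difference of two convex "max of finitely many affine functions" terms:

```latex
% Illustrative difference-of-max form on R^n (expository notation).
f(x) \;=\; \max_{i=1,\dots,p}\bigl(\langle a_i, x\rangle + \alpha_i\bigr)
       \;-\; \max_{j=1,\dots,q}\bigl(\langle b_j, x\rangle + \beta_j\bigr),
\qquad x \in \mathbb{R}^n.
```

The abstract's first result states that every usc function arises as the limit, in the sense of hypo-convergence, of a sequence of functions of this form.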
On the Sample Complexity of Predictive Sparse Coding
The goal of predictive sparse coding is to learn a representation of examples
as sparse linear combinations of elements from a dictionary, such that a
learned hypothesis linear in the new representation performs well on a
predictive task. Predictive sparse coding algorithms recently have demonstrated
impressive performance on a variety of supervised tasks, but their
generalization properties have not been studied. We establish the first
generalization error bounds for predictive sparse coding, covering two
settings: 1) the overcomplete setting, where the number of features k exceeds
the original dimensionality d; and 2) the high or infinite-dimensional setting,
where only dimension-free bounds are useful. Both learning bounds intimately
depend on stability properties of the learned sparse encoder, as measured on
the training sample. Consequently, we first present a fundamental stability
result for the LASSO, a result characterizing the stability of the sparse codes
with respect to perturbations to the dictionary. In the overcomplete setting,
we present an estimation error bound that decays as \tilde{O}(\sqrt{d k / m}) with
respect to d and k. In the high or infinite-dimensional setting, we show a
dimension-free bound that is \tilde{O}(\sqrt{k^2 s / m}) with respect to k and
s, where s is an upper bound on the number of non-zeros in the sparse code for
any training data point.
Comment: Sparse Coding Stability Theorem from version 1 has been relaxed considerably using a new notion of coding margin. Old Sparse Coding Stability Theorem still in new version, now as Theorem 2. Presentation of all proofs simplified/improved considerably. Paper reorganized. Empirical analysis showing new coding margin is non-trivial on real dataset.
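A small runnable sketch of the encode-then-predict pipeline studied above (illustration only: the dictionary here is fixed and random, whereas predictive sparse coding learns it jointly with the predictor, which is exactly what the paper's bounds address). It uses scikit-learn's Lasso as the sparse encoder and a ridge regressor as the linear hypothesis on the codes.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)

m, d, k = 200, 20, 50                                   # m examples, original dim d, dictionary size k (overcomplete: k > d)
X = rng.normal(size=(m, d))                             # training examples
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=m)   # synthetic regression targets

# Fixed random dictionary with unit-norm columns (a stand-in for a learned dictionary).
D = rng.normal(size=(d, k))
D /= np.linalg.norm(D, axis=0)

def sparse_code(D, X, lam=0.1):
    """LASSO encoder: represent each row of X as a sparse combination of columns of D."""
    codes = np.empty((X.shape[0], D.shape[1]))
    for i, x in enumerate(X):
        lasso = Lasso(alpha=lam, fit_intercept=False, max_iter=10000)
        lasso.fit(D, x)              # solve the l1-regularized reconstruction problem for x
        codes[i] = lasso.coef_
    return codes

Z = sparse_code(D, X)                      # sparse codes: the new representation
predictor = Ridge(alpha=1.0).fit(Z, y)     # linear hypothesis on top of the codes
print("train R^2:", predictor.score(Z, y))
```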
Convex Q Learning in a Stochastic Environment: Extended Version
The paper introduces the first formulation of convex Q-learning for Markov
decision processes with function approximation. The algorithms and theory rest
on a relaxation of a dual of Manne's celebrated linear programming
characterization of optimal control. The first set of contributions concerns
properties of the relaxation, described as a deterministic convex program: we
identify conditions for a bounded solution, and a significant relationship
between the solution to the new convex program and the solution to standard
Q-learning. The second set of contributions concerns algorithm design and
analysis: (i) A direct model-free method for approximating the convex program
for Q-learning shares properties with its ideal. In particular, a bounded
solution is ensured subject to a simple property of the basis functions; (ii)
The proposed algorithms are convergent and new techniques are introduced to
obtain the rate of convergence in a mean-square sense; (iii) The approach can
be generalized to a range of performance criteria, and it is found that
variance can be reduced by considering ``relative'' dynamic programming
equations; (iv) The theory is illustrated with an application to a classical
inventory control problem.
Comment: Extended version of "Convex Q-learning in a stochastic environment", IEEE Conference on Decision and Control, 2023 (to appear).
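For background (recalled here in one standard discounted-cost form, not necessarily the paper's notation), Manne's linear-programming characterization of optimal control, and the Q-function analogue on which convex Q-learning formulations are built, can be written as follows, with one-step cost c, transition kernel P, discount factor \gamma \in (0,1), and any positive weighting \mu:

```latex
% Manne's LP: the optimal value function is the maximal solution of the Bellman inequalities.
\max_{V}\ \langle \mu, V \rangle
\quad\text{subject to}\quad
V(x) \;\le\; c(x,u) + \gamma \sum_{x'} P(x' \mid x, u)\, V(x')
\quad \text{for all } (x,u).

% Q-function analogue (a convex, no longer linear, program), with
% \underline{Q}(x) := \min_{u} Q(x,u):
\max_{Q}\ \langle \mu, Q \rangle
\quad\text{subject to}\quad
Q(x,u) \;\le\; c(x,u) + \gamma\, \mathbb{E}\bigl[\underline{Q}(X_{k+1}) \mid X_k = x,\ U_k = u\bigr]
\quad \text{for all } (x,u).
```

Replacing Q by a parameterized family and sampling or relaxing the constraints yields model-free approximations of this program, which is roughly where the boundedness and convergence questions listed in the abstract arise.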