Dyna-style planning with linear function approximation and prioritized sweeping
We consider the problem of efficiently learning optimal control policies and value functions over large state spaces in an online setting, in which estimates must be available after each interaction with the world. This paper develops an explicitly model-based approach that extends the Dyna architecture to linear function approximation. Dyna-style planning proceeds by generating imaginary experience from the world model and then applying model-free reinforcement learning algorithms to the imagined state transitions. Our main result is to prove that, under natural conditions, linear Dyna-style planning converges to a unique solution independent of the generating distribution. In the policy evaluation setting, we prove that the limit point is the least-squares temporal-difference (LSTD) solution. An implication of our results is that prioritized sweeping can be soundly extended to the linear approximation case, backing up to preceding features rather than to preceding states. We introduce two versions of prioritized sweeping with linear Dyna and briefly illustrate their performance empirically on the Mountain Car and Boyan Chain problems.
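The planning loop the abstract describes can be sketched as follows for the policy evaluation case. This is a minimal illustration, not the paper's exact algorithm: the function name, step sizes, and the choice of unit basis vectors as imagined start features are assumptions, and the model (F, b) is learned by simple gradient steps.

```python
import numpy as np

def linear_dyna_eval(transitions, n_features, gamma=0.9, alpha=0.1,
                     planning_steps=5, seed=0):
    """Policy evaluation with linear Dyna (sketch): learn a linear model
    (F, b) from real feature transitions, then apply TD(0) updates to
    imagined transitions generated from that model."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(n_features)            # value weights: v(s) ~ theta @ phi
    F = np.zeros((n_features, n_features))  # expected next features: F @ phi
    b = np.zeros(n_features)                # expected reward: b @ phi

    for phi, r, phi_next in transitions:
        # model update: gradient step toward the observed transition
        F += alpha * np.outer(phi_next - F @ phi, phi)
        b += alpha * (r - b @ phi) * phi
        # direct TD(0) update from real experience
        delta = r + gamma * theta @ phi_next - theta @ phi
        theta += alpha * delta * phi
        # planning: TD(0) updates on imagined transitions from the model
        for _ in range(planning_steps):
            phi_p = np.eye(n_features)[rng.integers(n_features)]  # unit feature
            r_p = b @ phi_p                   # imagined reward
            phi_p_next = F @ phi_p            # imagined next features
            delta_p = r_p + gamma * theta @ phi_p_next - theta @ phi_p
            theta += alpha * delta_p * phi_p
    return theta
```

Under the paper's conditions the planning updates converge toward the LSTD solution regardless of how the imagined start features `phi_p` are distributed; prioritized sweeping would replace the uniform draw with a priority queue over features.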
Why and When Can Deep -- but Not Shallow -- Networks Avoid the Curse of Dimensionality: a Review
The paper characterizes classes of functions for which deep learning can be exponentially better than shallow learning. Deep convolutional networks are a special case of these conditions, though weight sharing is not the main reason for their exponential advantage.
Rectified deep neural networks overcome the curse of dimensionality for nonsmooth value functions in zero-sum games of nonlinear stiff systems
In this paper, we establish that for a wide class of controlled stochastic differential equations (SDEs) with stiff coefficients, the value functions of the corresponding zero-sum games can be represented by a deep artificial neural network (DNN) whose complexity grows at most polynomially in both the dimension of the state equation and the reciprocal of the required accuracy. Such nonlinear stiff systems may arise, for example, from Galerkin approximations of controlled stochastic partial differential equations (SPDEs), or from controlled PDEs with uncertain initial conditions and source terms. This implies that DNNs can break the curse of dimensionality in numerical approximations and optimal control of PDEs and SPDEs. The main ingredient of our proof is the construction of a suitable discrete-time system that effectively approximates the evolution of the underlying stochastic dynamics. Similar ideas can also be applied to obtain expression rates of DNNs for value functions induced by stiff systems with regime-switching coefficients and driven by general Lévy noise. Comment: This revised version has been accepted for publication in Analysis and Applications.
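As a toy illustration of how rectified networks can represent nonsmooth functions exactly and with small size (this is a standard identity, not the paper's construction, which concerns value functions of stiff SDEs): the relation max(a, b) = a + relu(b - a) lets a ReLU network compute the maximum of d inputs with size linear in d and depth logarithmic in d.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def pairwise_max(a, b):
    # exact ReLU identity: max(a, b) = a + relu(b - a)
    return a + relu(b - a)

def relu_net_max(x):
    """Compute max(x_1, ..., x_d) by a tournament of pairwise ReLU
    maxima: O(d) units arranged in O(log d) layers, a toy instance of
    a nonsmooth function that rectified networks represent cheaply."""
    vals = list(x)
    while len(vals) > 1:
        nxt = [pairwise_max(vals[i], vals[i + 1])
               for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:          # odd element passes through unchanged
            nxt.append(vals[-1])
        vals = nxt
    return vals[0]
```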