Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning
We present for the first time an asymptotic convergence analysis of two
time-scale stochastic approximation driven by `controlled' Markov noise. In
particular, both the faster and slower recursions have non-additive controlled
Markov noise components in addition to martingale difference noise. We analyze
the asymptotic behavior of our framework by relating it to limiting
differential inclusions in both time-scales that are defined in terms of the
ergodic occupation measures associated with the controlled Markov processes.
Finally, we present a solution to the off-policy convergence problem for
temporal difference learning with linear function approximation, using our
results.
Comment: 23 pages (relaxed some important assumptions from the previous version), accepted in Mathematics of Operations Research in Feb, 201
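To make the off-policy setting concrete, here is a minimal sketch of a TDC-style two-time-scale update with linear function approximation on a toy random-feature chain. This is not the paper's algorithm or assumptions; the chain, features, step sizes, and the unit importance weight are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy random-feature Markov chain; all sizes/constants are invented here.
n_states, d = 5, 3
phi = rng.normal(size=(n_states, d))   # one feature vector per state
gamma = 0.9

def step(s):
    """Behavior policy: jump to a uniformly random state, noisy zero-mean reward."""
    return rng.integers(n_states), 0.1 * rng.normal()

theta = np.zeros(d)   # slower recursion: value-function weights
w = np.zeros(d)       # faster recursion: auxiliary correction weights

s = 0
for t in range(1, 20001):
    s2, r = step(s)
    rho = 1.0                                  # importance weight (taken as 1 here)
    delta = r + gamma * phi[s2] @ theta - phi[s] @ theta   # TD error
    a_t = 1.0 / t                              # slow step size
    b_t = 1.0 / t ** 0.6                       # fast step size (decays more slowly)
    w += b_t * rho * (delta - phi[s] @ w) * phi[s]
    theta += a_t * rho * (delta * phi[s] - gamma * (phi[s] @ w) * phi[s2])
    s = s2
```

Because the faster step sizes b_t dominate the slower a_t, the w-recursion effectively tracks its equilibrium for the current theta; this separation of time-scales is the structure the convergence analysis addresses.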
Convergence Rate of Stochastic Gradient Search in the Case of Multiple and Non-Isolated Minima
The convergence rate of stochastic gradient search is analyzed in this paper.
Using arguments based on differential geometry and Lojasiewicz inequalities,
tight bounds on the convergence rate of general stochastic gradient algorithms
are derived. As opposed to the existing results, the results presented in this
paper allow the objective function to have multiple, non-isolated minima,
impose no restriction on the values of the Hessian (of the objective function)
and do not require the algorithm estimates to have a single limit point.
Applying these new results, the convergence rate of recursive prediction error
identification algorithms is studied. The convergence rate of supervised and
temporal-difference learning algorithms is also analyzed using the results
derived in the paper.
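The setting can be illustrated with a toy objective whose minima form a continuum: f(z) = (|z|^2 - 1)^2 is minimized on the entire unit circle, so no isolated-minimum assumption applies. The sketch below (with invented step sizes and noise level) runs a noisy gradient recursion on it:

```python
import numpy as np

rng = np.random.default_rng(1)

def grad(z):
    # f(z) = (|z|^2 - 1)^2: its minimum set is the unit circle, a
    # continuum of non-isolated minima.
    return 4.0 * (z @ z - 1.0) * z

z = np.array([1.5, 0.5])
for t in range(1, 20001):
    noisy_grad = grad(z) + 0.01 * rng.normal(size=2)  # stochastic gradient
    z -= (0.05 / t ** 0.75) * noisy_grad              # diminishing step sizes
dist_to_minimum_set = abs(np.linalg.norm(z) - 1.0)
```

The iterates approach the minimum set rather than a single point, which is exactly the situation where the paper's rate bounds apply and classical isolated-minimum analyses do not.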
Adaptive trajectories sampling for solving PDEs with deep learning methods
In this paper, we propose a new adaptive technique, named adaptive
trajectories sampling (ATS), which is used to select training points for the
numerical solution of partial differential equations (PDEs) with deep learning
methods. The key feature of the ATS is that all training points are adaptively
selected from trajectories that are generated according to a PDE-related
stochastic process. We incorporate the ATS into three known deep learning
solvers for PDEs, namely the adaptive derivative-free-loss method (ATS-DFLM),
the adaptive physics-informed neural network method (ATS-PINN), and the
adaptive temporal-difference method for forward-backward stochastic
differential equations (ATS-FBSTD). Our numerical experiments demonstrate that
the ATS remarkably improves the computational accuracy and efficiency of the
original deep learning solvers for the PDEs. In particular, for some specific
high-dimensional PDEs, the ATS can even improve the accuracy of the PINN by two
orders of magnitude.
Comment: 18 pages, 12 figures, 42 references
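The selection idea can be sketched in a few lines, assuming a hypothetical residual function standing in for a partially trained network; the point counts, time step, and Brownian driving process below are illustrative choices, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(2)

def residual(x, t):
    # Residual of the trial solution u(x, t) = exp(-t) * sin(pi x) plugged
    # into the heat equation u_t = u_xx; it is nonzero, mimicking an
    # imperfectly trained network.
    u = np.exp(-t) * np.sin(np.pi * x)
    return -u - (-np.pi ** 2 * u)          # u_t - u_xx

# 1. Generate candidate points along trajectories of a PDE-related
#    stochastic process (here: Brownian motion started uniformly in [0, 1]).
n_traj, n_steps, dt = 200, 50, 0.01
x = rng.uniform(0.0, 1.0, size=n_traj)
pool = []
for k in range(n_steps):
    t = np.full(n_traj, k * dt)
    pool.append(np.stack([x, t], axis=1))
    x = x + np.sqrt(dt) * rng.normal(size=n_traj)   # Brownian increment
pool = np.concatenate(pool)                          # (n_traj * n_steps, 2)

# 2. Adaptively keep the points where the residual is largest, so the next
#    training batch concentrates where the current solution is worst.
res = np.abs(residual(pool[:, 0], pool[:, 1]))
train_pts = pool[np.argsort(res)[-500:]]
```

Re-running the two steps after each training phase refreshes the training set from new trajectories, adapting it to wherever the residual currently remains large.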
Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison
A confusingly wide variety of temporally asymmetric learning rules exists, related to reinforcement learning and/or to spike-timing-dependent plasticity; many of these rules look exceedingly similar while displaying strongly different behavior. They often find use in control tasks, for example in robotics, where rigorous convergence and numerical stability are required. The goal of this article is to review these rules and compare them, providing a better overview of their different properties. Two main classes are discussed: temporal-difference (TD) rules and correlation-based (differential Hebbian) rules, together with some transition cases. In general, we focus on neuronal implementations with changeable synaptic weights and a time-continuous representation of activity. In a machine-learning (non-neuronal) context, a solid mathematical theory of TD learning has existed for several years, and it can partly be transferred to a neuronal framework; only recently, however, has a more complete theory also emerged for differential Hebbian rules. In general, the rules differ in their convergence conditions and numerical stability, which can lead to very undesirable behavior in applications. For TD, convergence can be enforced with a certain output condition ensuring that the δ-error drops to zero on average (output control). Correlation-based rules, on the other hand, converge when one input drops to zero (input control). Temporally asymmetric learning rules treat situations where incoming stimuli follow each other in time; it is therefore necessary to remember the first stimulus in order to relate it to the later-occurring second one. To this end, the two types of rules use different kinds of so-called eligibility traces, which again leads to different properties of TD and differential Hebbian learning, as discussed here.
Thus, this paper, while also presenting several novel mathematical results, is mainly meant to provide a road map through the different neuronally emulated temporally asymmetric learning rules and their behavior, offering some guidance for possible applications.
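The contrast between the two rule classes can be sketched with a discretized toy experiment in which an early stimulus predicts a later one; the stimulus timings, trace decay, and learning rate below are invented for illustration, and the TD error uses one common discretization:

```python
import numpy as np

# Two stimuli: u1 (earlier) predicts u2 (later); both rule classes should
# develop a positive weight. All constants are invented for this sketch.
T, dt, eta = 60, 1.0, 0.1
u1 = np.zeros(T); u1[10:15] = 1.0     # first (predictive) stimulus
u2 = np.zeros(T); u2[30:35] = 1.0     # second stimulus / reward
gamma, lam = 0.9, 0.8

w_td, w_hebb = 0.0, 0.0
trace, v_prev, post_prev = 0.0, 0.0, 0.0
for k in range(T):
    trace = lam * trace + u1[k]                  # decaying eligibility trace
    # TD-style rule: delta-error times the trace (output control).
    v = w_td * u1[k]
    delta = u2[k] + gamma * v - v_prev
    w_td += eta * delta * trace
    v_prev = v
    # Differential Hebbian rule: presynaptic trace times the temporal
    # derivative of postsynaptic activity (input control).
    w_hebb += eta * trace * (u2[k] - post_prev) / dt
    post_prev = u2[k]
```

Both weights grow because u1 precedes u2, but they are driven by different signals: the TD weight stops changing once the δ-error vanishes, while the Hebbian weight stops changing once the postsynaptic derivative (here, the input transients) vanishes, mirroring the output-control versus input-control distinction above.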