
    Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning

    We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by 'controlled' Markov noise. In particular, both the faster and slower recursions have non-additive controlled Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in both time-scales that are defined in terms of the ergodic occupation measures associated with the controlled Markov processes. Finally, we present a solution to the off-policy convergence problem for temporal difference learning with linear function approximation, using our results.
    Comment: 23 pages (relaxed some important assumptions from the previous version), accepted in Mathematics of Operations Research in Feb, 201
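    For orientation, the coupled recursions in question fit the standard two time-scale template (a sketch in our own notation; the paper's precise assumptions are more detailed than this):

        \begin{align*}
        x_{n+1} &= x_n + a(n)\,\bigl[h(x_n, y_n, Z^{(1)}_n) + M^{(1)}_{n+1}\bigr],\\
        y_{n+1} &= y_n + b(n)\,\bigl[g(x_n, y_n, Z^{(2)}_n) + M^{(2)}_{n+1}\bigr],
        \end{align*}

    where the $Z^{(i)}_n$ are the controlled Markov noise processes, the $M^{(i)}_{n+1}$ are martingale difference sequences, and the step sizes satisfy $b(n)/a(n) \to 0$, so that $y_n$ evolves on the slower time-scale and sees the faster recursion as quasi-equilibrated.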

    Convergence Rate of Stochastic Gradient Search in the Case of Multiple and Non-Isolated Minima

    The convergence rate of stochastic gradient search is analyzed in this paper. Using arguments based on differential geometry and Łojasiewicz inequalities, tight bounds on the convergence rate of general stochastic gradient algorithms are derived. In contrast to existing results, the results presented in this paper allow the objective function to have multiple, non-isolated minima, impose no restriction on the values of the Hessian (of the objective function), and do not require the algorithm estimates to have a single limit point. Applying these new results, the convergence rate of recursive prediction error identification algorithms is studied. The convergence rate of supervised and temporal-difference learning algorithms is also analyzed using the results derived in the paper.
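    Concretely (illustrative notation, not taken from the paper), the recursions covered are of the form

        \theta_{n+1} = \theta_n - \gamma_n \bigl( \nabla f(\theta_n) + \xi_n \bigr),

    and a Łojasiewicz inequality, in one common form, states that near a critical point $\theta^{\star}$ there exist $C > 0$ and $\mu \in (0, 1]$ such that

        \lvert f(\theta) - f(\theta^{\star}) \rvert^{1-\mu} \le C \, \lVert \nabla f(\theta) \rVert,

    which bounds how slowly the gradient can vanish near the critical set and thereby yields rate estimates even when the minima form a non-isolated set.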

    Adaptive trajectories sampling for solving PDEs with deep learning methods

    In this paper, we propose a new adaptive technique, named adaptive trajectories sampling (ATS), which is used to select training points for the numerical solution of partial differential equations (PDEs) with deep learning methods. The key feature of ATS is that all training points are adaptively selected from trajectories that are generated according to a PDE-related stochastic process. We incorporate ATS into three known deep learning solvers for PDEs: the adaptive derivative-free-loss method (ATS-DFLM), the adaptive physics-informed neural network method (ATS-PINN), and the adaptive temporal-difference method for forward-backward stochastic differential equations (ATS-FBSTD). Our numerical experiments demonstrate that ATS remarkably improves the computational accuracy and efficiency of the original deep learning solvers. In particular, for some specific high-dimensional PDEs, ATS can even improve the accuracy of the PINN by two orders of magnitude.
    Comment: 18 pages, 12 figures, 42 references
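    To make the idea concrete, a minimal sketch of an ATS-style selection step might look like the following (hypothetical Python, not the authors' implementation; residual_fn stands in for the current network's PDE residual):

        import numpy as np

        def simulate_trajectories(x0, n_steps, dt, n_paths, rng):
            """Generate Brownian trajectories from x0 (a stand-in for the
            'PDE-related stochastic process', e.g. the diffusion underlying
            the heat equation)."""
            dim = x0.shape[0]
            steps = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps, dim))
            return x0 + np.cumsum(steps, axis=1)  # shape (n_paths, n_steps, dim)

        def select_training_points(trajectories, residual_fn, n_keep):
            """Keep the trajectory points where the current PDE residual is
            largest, so training concentrates where the solver is worst."""
            pts = trajectories.reshape(-1, trajectories.shape[-1])
            scores = np.abs(residual_fn(pts))
            idx = np.argsort(scores)[-n_keep:]  # worst-fitting points
            return pts[idx]

        # Usage sketch: inside the training loop, periodically refresh the
        # collocation set before the next batch of optimizer steps.
        # rng = np.random.default_rng(0)
        # trajs = simulate_trajectories(np.zeros(2), 50, 0.01, 200, rng)
        # train_pts = select_training_points(trajs, residual_fn, 1000)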

    Mathematical properties of neuronal TD-rules and differential Hebbian learning: a comparison

    A confusingly wide variety of temporally asymmetric learning rules exists related to reinforcement learning and/or to spike-timing dependent plasticity, many of which look exceedingly similar while displaying strongly different behavior. These rules often find use in control tasks, for example in robotics, where rigorous convergence and numerical stability are required. The goal of this article is to review these rules and compare them, to provide a better overview of their different properties. Two main classes will be discussed: temporal difference (TD) rules and correlation-based (differential Hebbian) rules, together with some transition cases. In general we will focus on neuronal implementations with changeable synaptic weights and a time-continuous representation of activity. In a machine learning (non-neuronal) context, a solid mathematical theory for TD-learning has existed for several years; this can partly be transferred to a neuronal framework, too. Only now, on the other hand, has a more complete theory also emerged for differential Hebbian rules. In general, rules differ in their convergence conditions and their numerical stability, which can lead to very undesirable behavior when one wants to apply them. For TD, convergence can be enforced with a certain output condition ensuring that the δ-error drops on average to zero (output control). Correlation-based rules, on the other hand, converge when one input drops to zero (input control). Temporally asymmetric learning rules treat situations where incoming stimuli follow each other in time; thus it is necessary to remember the first stimulus in order to relate it to the later-occurring second one. To this end, different types of so-called eligibility traces are used by these two types of rules. This aspect again leads to different properties of TD and differential Hebbian learning, as discussed here. Thus this paper, while also presenting several novel mathematical results, is mainly meant to provide a road map through the different neuronally emulated temporally asymmetric learning rules and their behavior, to provide some guidance for possible applications.
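    To make the contrast concrete, here is a minimal discretized sketch of the two rule families (our own notation, not the paper's time-continuous neuronal formulation): the TD rule is driven by a prediction error on the output, while the differential Hebbian rule correlates a trace-filtered input with the derivative of the output.

        import numpy as np

        def td_update(w, x, x_next, r, alpha=0.1, gamma=0.9):
            """TD(0): the delta-error r + gamma*V(s') - V(s) drives learning;
            convergence requires this error to drop to zero on average
            (output control)."""
            delta = r + gamma * (w @ x_next) - (w @ x)
            return w + alpha * delta * x

        def differential_hebb_update(w, u_trace, dv_dt, mu=0.01):
            """Differential Hebbian rule: correlate an eligibility-trace-
            filtered input u_trace with the output derivative dv_dt;
            learning stops when the input drops to zero (input control)."""
            return w + mu * u_trace * dv_dt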