32,003 research outputs found

    On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

    Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(lambda) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(lambda) and Q-learning belong.
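    The theorem covers iterative updates of the Robbins-Monro type. As a concrete illustration (a minimal sketch, not from the paper: the five-state chain MDP, epsilon-greedy exploration, and 1/n step sizes are assumptions), the following tabular Q-learning loop uses per-state-action step sizes satisfying the classical stochastic-approximation conditions sum(alpha) = inf and sum(alpha^2) < inf:

        import numpy as np

        n_states, n_actions, gamma = 5, 2, 0.9
        Q = np.zeros((n_states, n_actions))
        visits = np.zeros((n_states, n_actions))
        rng = np.random.default_rng(0)

        def step(s, a):
            # Toy chain dynamics: action 0 moves left, action 1 moves right;
            # reaching the last state pays reward 1 and ends the episode.
            s_next = max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)
            return s_next, (1.0 if s_next == n_states - 1 else 0.0)

        s = 0
        for t in range(20000):
            # Epsilon-greedy action selection (exploration rate assumed).
            a = rng.integers(n_actions) if rng.random() < 0.1 else int(Q[s].argmax())
            s_next, r = step(s, a)
            visits[s, a] += 1
            alpha = 1.0 / visits[s, a]  # sum(alpha) diverges, sum(alpha^2) converges
            # Robbins-Monro step: move Q(s,a) toward the sampled Bellman target.
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = 0 if s_next == n_states - 1 else s_next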

    Combining stochastic programming and optimal control to solve multistage stochastic optimization problems

    In this contribution we propose an approach to solving a multistage stochastic programming problem that yields both a time and a nodal decomposition of the original problem. This double decomposition is achieved by applying a discrete time optimal control formulation to the original stochastic programming problem in arborescent form. Combining the arborescent formulation of the problem with the optimal control point of view naturally yields, as a first result, the time decomposability of the optimality conditions, which can be organized, following the terminology and structure of a discrete time optimal control problem, into the systems of equations for the state and adjoint variable dynamics and the optimality conditions for the generalized Hamiltonian. Moreover, owing to the arborescent formulation of the stochastic programming problem, these conditions further decompose with respect to the nodes of the event tree. The optimal solution is obtained by solving small decomposed subproblems and combining them with a mean-value fixed-point iterative scheme. To enhance convergence we suggest an optimization step in which the weights are chosen optimally at each iteration.
    Keywords: stochastic programming, discrete time control problem, decomposition methods, iterative scheme
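    As a loose illustration of the combination step (the scalar quadratic subproblems, the mean coupling rule, and the fixed damping weight below are placeholder assumptions, not the paper's formulation), a weighted fixed-point iteration over per-node solutions might look like this:

        import numpy as np

        targets = np.array([1.0, 3.0, -2.0, 0.5])  # one hypothetical subproblem per node

        def solve_node(i, coupling):
            # Placeholder nodal subproblem: minimize (x - targets[i])^2 + (x - coupling)^2,
            # whose closed-form minimizer averages the node target with the coupling value.
            return 0.5 * (targets[i] + coupling)

        x = np.zeros(len(targets))
        w = 0.5  # fixed damping weight; the paper instead optimizes the weights per iteration
        for k in range(100):
            coupling = x.mean()  # couple the subproblems through the mean of the nodal solutions
            x_new = np.array([solve_node(i, coupling) for i in range(len(targets))])
            if np.linalg.norm(x_new - x) < 1e-10:
                break
            x = (1 - w) * x + w * x_new  # mean-value fixed-point combination step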

    Convergence Analysis of Mixed Timescale Cross-Layer Stochastic Optimization

    This paper considers a cross-layer optimization problem driven by multi-timescale stochastic exogenous processes in wireless communication networks. Due to the hierarchical information structure in a wireless network, a mixed timescale stochastic iterative algorithm is proposed to track the time-varying optimal solution of the cross-layer optimization problem, where the variables are partitioned into short-term controls updated on a faster timescale and long-term controls updated on a slower timescale. We focus on establishing a convergence analysis framework for such multi-timescale algorithms, which is difficult due to the timescale separation of the algorithm and the time-varying nature of the exogenous processes. To cope with this challenge, we model the algorithm dynamics using stochastic differential equations (SDEs) and show that studying the algorithm's convergence is equivalent to studying the stochastic stability of a virtual stochastic dynamic system (VSDS). Leveraging the techniques of Lyapunov stability, we derive a sufficient condition for the algorithm's stability and a tracking error bound in terms of the parameters of the multi-timescale exogenous processes. Based on these results, an adaptive compensation algorithm is proposed to enhance the tracking performance. Finally, we illustrate the framework with an application example in heterogeneous wireless networks.
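    The toy sketch below shows the basic shape of such a mixed timescale iteration (the quadratic objective, noise model, step-size exponents, and static exogenous value are illustrative assumptions, not the paper's system): a short-term variable x is updated on a fast timescale and a long-term variable y on a slow one, so that x sees y as quasi-static while y averages over x's fluctuations.

        import numpy as np

        rng = np.random.default_rng(1)
        x, y, theta = 0.0, 0.0, 2.0  # theta: stand-in for the exogenous process
        for t in range(1, 50001):
            a_t = t ** -0.6  # fast step size for the short-term control x
            b_t = 1.0 / t    # slow step size for the long-term control y; b_t/a_t -> 0
            # Noisy gradient steps on f(x, y) = (x - y)^2 + (y - theta)^2.
            x -= a_t * (2 * (x - y) + rng.normal(scale=0.1))
            y -= b_t * (2 * (y - x) + 2 * (y - theta) + rng.normal(scale=0.1))
        # Both x and y end up near theta; with a drifting theta, the tracking
        # error would depend on the drift rate, as in the paper's bounds.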

    Petuum: A New Platform for Distributed Machine Learning on Big Data

    What is a systematic way to efficiently apply a wide spectrum of advanced ML programs to industrial-scale problems, using Big Models (up to 100s of billions of parameters) on Big Data (up to terabytes or petabytes)? Modern parallelization strategies employ fine-grained operations and scheduling beyond the classic bulk-synchronous processing paradigm popularized by MapReduce, or even specialized graph-based execution that relies on graph representations of ML programs. The variety of approaches tends to pull systems and algorithms design in different directions, and it remains difficult to find a universal platform applicable to a wide range of ML programs at scale. We propose a general-purpose framework that systematically addresses data- and model-parallel challenges in large-scale ML by observing that many ML programs are fundamentally optimization-centric and admit error-tolerant, iterative-convergent algorithmic solutions. This presents unique opportunities for an integrative system design, such as bounded-error network synchronization and dynamic scheduling based on ML program structure. We demonstrate the efficacy of these system designs against well-known implementations of modern ML algorithms, allowing ML programs to run in much less time and at considerably larger model sizes, even on modestly-sized compute clusters.
    Comment: 15 pages, 10 figures; final version in KDD 2015 under the same title.
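    The bounded-error network synchronization mentioned above follows the stale synchronous parallel idea: workers may run ahead of one another, but only within a fixed staleness bound. The sketch below is a hypothetical in-process simulation of that gating rule (the class, names, and thread setup are assumptions for illustration, not Petuum's API):

        import threading

        class StalenessGate:
            def __init__(self, n_workers, staleness):
                self.clocks = [0] * n_workers
                self.staleness = staleness
                self.cond = threading.Condition()

            def advance(self, worker_id):
                with self.cond:
                    # Block while this worker would run too far ahead of the slowest one.
                    while self.clocks[worker_id] - min(self.clocks) >= self.staleness:
                        self.cond.wait()
                    self.clocks[worker_id] += 1
                    self.cond.notify_all()

        gate = StalenessGate(n_workers=4, staleness=2)

        def worker(i):
            for _ in range(10):
                gate.advance(i)  # each tick stands in for one iteration of an ML update

        threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
        for th in threads:
            th.start()
        for th in threads:
            th.join()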