4,890 research outputs found

    Two Timescale Stochastic Approximation with Controlled Markov noise and Off-policy temporal difference learning

    Full text link
    We present for the first time an asymptotic convergence analysis of two time-scale stochastic approximation driven by `controlled' Markov noise. In particular, both the faster and slower recursions have non-additive controlled Markov noise components in addition to martingale difference noise. We analyze the asymptotic behavior of our framework by relating it to limiting differential inclusions in both time-scales that are defined in terms of the ergodic occupation measures associated with the controlled Markov processes. Finally, we present a solution to the off-policy convergence problem for temporal difference learning with linear function approximation, using our results.Comment: 23 pages (relaxed some important assumptions from the previous version), accepted in Mathematics of Operations Research in Feb, 201

    Linear Hamilton Jacobi Bellman Equations in High Dimensions

    Get PDF
    The Hamilton Jacobi Bellman Equation (HJB) provides the globally optimal solution to large classes of control problems. Unfortunately, this generality comes at a price, the calculation of such solutions is typically intractible for systems with more than moderate state space size due to the curse of dimensionality. This work combines recent results in the structure of the HJB, and its reduction to a linear Partial Differential Equation (PDE), with methods based on low rank tensor representations, known as a separated representations, to address the curse of dimensionality. The result is an algorithm to solve optimal control problems which scales linearly with the number of states in a system, and is applicable to systems that are nonlinear with stochastic forcing in finite-horizon, average cost, and first-exit settings. The method is demonstrated on inverted pendulum, VTOL aircraft, and quadcopter models, with system dimension two, six, and twelve respectively.Comment: 8 pages. Accepted to CDC 201

    Convergence and Convergence Rate of Stochastic Gradient Search in the Case of Multiple and Non-Isolated Extrema

    Full text link
    The asymptotic behavior of stochastic gradient algorithms is studied. Relying on results from differential geometry (Lojasiewicz gradient inequality), the single limit-point convergence of the algorithm iterates is demonstrated and relatively tight bounds on the convergence rate are derived. In sharp contrast to the existing asymptotic results, the new results presented here allow the objective function to have multiple and non-isolated minima. The new results also offer new insights into the asymptotic properties of several classes of recursive algorithms which are routinely used in engineering, statistics, machine learning and operations research
    corecore