8 research outputs found

    An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning

    In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps. In particular, we show that varying the emphasis of linear TD(λ)'s updates in a particular way causes its expected update to become stable under off-policy training. The only prior model-free TD methods to achieve this with per-step computation linear in the number of function approximation parameters are the gradient-TD family of methods, including TDC, GTD(λ), and GQ(λ). Compared to these methods, our emphatic TD(λ) is simpler and easier to use; it has only one learned parameter vector and one step-size parameter. Our treatment includes general state-dependent discounting and bootstrapping functions, and a way of specifying varying degrees of interest in accurately valuing different states. Comment: 29 pages. This is a significant revision based on the first set of reviews. The most important change was to signal early that the main result is about stability, not convergence.
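The one-vector, one-step-size character of emphatic TD(λ) can be sketched as follows. The two-state chain, feature choice, and parameter values below are illustrative assumptions, not taken from the abstract; the update equations follow the standard published form of the algorithm (follow-on trace F, emphasis M, emphatic eligibility trace e).

```python
import numpy as np

def emphatic_td_lambda(steps, alpha=0.05, gam=0.5, lam=0.0):
    """Emphatic TD(lambda) on a toy two-state chain (illustrative).

    The chain deterministically alternates 0 -> 1 -> 0 -> ..., with
    reward 1 on leaving state 0 and 0 otherwise.  Behavior and target
    policies coincide here, so every importance ratio rho_t is 1, and
    the interest i(s) is 1 in every state.  True values: v = (4/3, 2/3).
    """
    X = np.eye(2)            # one-hot (tabular) features
    theta = np.zeros(2)      # the single learned parameter vector
    e = np.zeros(2)          # eligibility trace
    F = 0.0                  # follow-on trace
    s = 0
    for _ in range(steps):
        s_next = 1 - s
        r = 1.0 if s == 0 else 0.0
        rho, interest = 1.0, 1.0
        F = rho * gam * F + interest             # follow-on trace
        M = lam * interest + (1.0 - lam) * F     # emphasis for this step
        e = rho * (gam * lam * e + M * X[s])     # emphatic trace
        delta = r + gam * theta @ X[s_next] - theta @ X[s]
        theta = theta + alpha * delta * e        # single step-size alpha
        s = s_next
    return theta

theta = emphatic_td_lambda(5000)
```

On this on-policy chain the emphasis settles at F = 1/(1 - γ) = 2, and the iterates converge to the true values (4/3, 2/3).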

    Asynchronous Approximation of a Single Component of the Solution to a Linear System

    We present a distributed asynchronous algorithm for approximating a single component of the solution to a system of linear equations Ax = b, where A is a positive definite real matrix and b ∈ ℝⁿ. This is equivalent to solving for x_i in x = Gx + z for some G and z such that the spectral radius of G is less than 1. Our algorithm relies on the Neumann series characterization of the component x_i, and is based on residual updates. We analyze our algorithm within the context of a cloud computation model, in which the computation is split into small update tasks performed by small processors with shared access to a distributed file system. We prove a robust asymptotic convergence result when the spectral radius ρ(|G|) < 1, regardless of the precise order and frequency in which the update tasks are performed. We provide convergence rate bounds which depend on the order of update tasks performed, analyzing both deterministic update rules via counting weighted random walks, as well as probabilistic update rules via concentration bounds. The probabilistic analysis requires analyzing the product of random matrices which are drawn from distributions that are time and path dependent. We specifically consider the setting where n is large, yet G is sparse, e.g., each row has at most d nonzero entries. This is motivated by applications in which G is derived from the edge structure of an underlying graph. Our results prove that if the local neighborhood of the graph does not grow too quickly as a function of n, our algorithm can provide significant reduction in computation cost as opposed to any algorithm which computes the global solution vector x. Our algorithm obtains an ε‖x‖₂ additive approximation for x_i in constant time with respect to the size of the matrix when the maximum row sparsity d = O(1) and 1/(1 − ‖G‖₂) = O(1).
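The residual-update idea behind the single-component algorithm can be sketched in a synchronous, single-processor form: a residual vector r maintains the invariant x_i = x̂ + rᵀx, and each "update task" on a coordinate j moves the contribution r_j z_j into the estimate x̂ while redistributing r_j along row j of G (the Neumann series unrolled one term at a time). The specific matrix, the greedy task-selection rule, and the function name are illustrative assumptions; the paper runs such tasks asynchronously across many processors.

```python
import numpy as np

def solve_component(G, z, i, num_tasks=60):
    """Approximate x_i where x = G x + z, via residual updates.

    Invariant after every task: x_i = x_hat + r @ x, so x_hat -> x_i
    as the residual r shrinks (this requires rho(|G|) < 1).
    """
    n = len(z)
    x_hat = 0.0
    r = np.zeros(n)
    r[i] = 1.0                           # start from e_i
    for _ in range(num_tasks):
        j = int(np.argmax(np.abs(r)))    # one "update task" on coordinate j
        rj = r[j]
        x_hat += rj * z[j]               # collect the z_j contribution
        r[j] = 0.0
        r += rj * G[j]                   # substitute x_j = G_j x + z_j
    return x_hat

G = np.array([[0.0, 0.4],
              [0.3, 0.0]])
z = np.array([1.0, 2.0])
x_true = np.linalg.solve(np.eye(2) - G, z)
x0_approx = solve_component(G, z, 0)
```

Because the work per task touches only the d nonzero entries of one row, the cost of reaching a fixed residual tolerance is governed by the local neighborhood of coordinate i rather than by n.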

    Stabilization of Stochastic Iterative Methods for Singular and Nearly Singular Linear Systems

    We consider linear systems of equations, Ax = b, of various types frequently arising in large-scale applications, with an emphasis on the case where A is singular. Under certain conditions, necessary as well as sufficient, linear deterministic iterative methods generate sequences {x_k} that converge to a solution, as long as there exists at least one solution. We show that this convergence property is frequently lost when these methods are implemented with simulation, as is often done in important classes of large-scale problems. We introduce additional conditions and novel algorithmic stabilization schemes under which {x_k} converges to a solution when A is singular, and may also be used with substantial benefit when A is nearly singular. Moreover, we establish the mathematical foundation for related work that deals with special cases of singular systems, including some arising in approximate dynamic programming, where convergence may be obtained without a stabilization mechanism.
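One simple stabilization scheme in this spirit, shrinking the iterate by a factor (1 − δ_k) with δ_k → 0, can be illustrated on a deterministic singular example; the matrix, step size, and δ_k schedule below are illustrative assumptions, not the paper's exact algorithm. The nullspace component of the plain iteration never moves, which is exactly the component that drifts like a random walk once simulation noise enters the updates.

```python
import numpy as np

# Singular system: A = diag(1, 0), b = (1, 0).  Its solution set is
# the line {(1, c)}.  The plain iteration x <- x - gamma*(A x - b)
# leaves the nullspace component x[1] untouched; under simulation
# noise that component would wander instead of converging.
A = np.diag([1.0, 0.0])
b = np.array([1.0, 0.0])
gamma = 0.5

def iterate(stabilize, steps=2000):
    x = np.array([0.0, 1.0])              # start off the solution set
    for k in range(steps):
        delta_k = 1.0 / (k + 2) if stabilize else 0.0
        # Stabilized update: shrink the iterate by (1 - delta_k),
        # with delta_k -> 0 so the induced bias vanishes in the limit.
        x = (1.0 - delta_k) * x - gamma * (A @ x - b)
    return x

x_plain = iterate(stabilize=False)
x_stab = iterate(stabilize=True)
```

The stabilized sequence is driven toward the particular solution (1, 0), while the plain iteration keeps its initial nullspace component forever.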

    Stochastic methods for large-scale linear problems, variational inequalities, and convex optimization

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 201-207). This thesis considers stochastic methods for large-scale linear systems, variational inequalities, and convex optimization problems. I focus on special structures that lend themselves to sampling, such as when the linear/nonlinear mapping or the objective function is an expected value or is the sum of a large number of terms, and/or the constraint is the intersection of a large number of simpler sets. For linear systems, I propose modifications to deterministic methods to allow the use of random samples and maintain stochastic convergence, which is particularly challenging when the unknown system is singular or nearly singular. For variational inequalities and optimization problems, I propose a class of methods that combine elements of incremental constraint projection, stochastic gradient/subgradient descent, and proximal algorithms. These methods can be applied with various sampling schemes that are suitable for applications involving distributed implementation, large data sets, or online learning. I use a unified framework to analyze the convergence and the rate of convergence of these methods. This framework is based on a pair of supermartingale bounds, which control the convergence to feasibility and the convergence to optimality, respectively, and are coupled at different time scales. By Mengdi Wang, Ph.D.
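The flavor of combining gradient steps with incremental constraint projection, projecting onto one randomly sampled constraint set per iteration rather than onto the full intersection, can be sketched on a toy problem; the objective, constraint sets, step-size schedule, and sampling scheme below are illustrative assumptions.

```python
import random

def incremental_projection_descent(steps=5000, seed=0):
    """Minimize (x1-2)^2 + (x2-2)^2 over {x1 <= 1} intersect {x2 <= 1}.

    Each iteration takes a diminishing gradient step, then projects
    onto ONE randomly sampled constraint halfspace instead of the
    whole feasible set.  The optimum of this toy problem is (1, 1).
    """
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for k in range(steps):
        alpha = 1.0 / (k + 2)                       # diminishing step size
        g = [2.0 * (x[0] - 2.0), 2.0 * (x[1] - 2.0)]  # gradient of objective
        y = [x[0] - alpha * g[0], x[1] - alpha * g[1]]
        j = rng.randrange(2)                        # sample one constraint
        y[j] = min(y[j], 1.0)                       # project onto {x_j <= 1}
        x = y
    return x

x = incremental_projection_descent()
```

The two sources of error, infeasibility from projecting onto only one sampled set and suboptimality from the noisy descent direction, shrink on different time scales, which is what the paired supermartingale analysis in the thesis is built to control.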