8 research outputs found

    An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning

    In this paper we introduce the idea of improving the performance of parametric temporal-difference (TD) learning algorithms by selectively emphasizing or de-emphasizing their updates on different time steps. In particular, we show that varying the emphasis of linear TD(λ)'s updates in a particular way causes its expected update to become stable under off-policy training. The only prior model-free TD methods to achieve this with per-step computation linear in the number of function approximation parameters are the gradient-TD family of methods, including TDC, GTD(λ), and GQ(λ). Compared to these methods, our emphatic TD(λ) is simpler and easier to use; it has only one learned parameter vector and one step-size parameter. Our treatment includes general state-dependent discounting and bootstrapping functions, and a way of specifying varying degrees of interest in accurately valuing different states. Comment: 29 pages. This is a significant revision based on the first set of reviews. The most important change was to signal early that the main result is about stability, not convergence.
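The one-vector, one-step-size character of emphatic TD(λ) can be sketched as follows. The two-state chain, feature choice, and parameter values below are illustrative assumptions, not taken from the abstract; the update equations follow the standard published form of the algorithm (follow-on trace F, emphasis M, emphatic eligibility trace e).

```python
import numpy as np

def emphatic_td_lambda(steps, alpha=0.05, gam=0.5, lam=0.0):
    """Emphatic TD(lambda) on a toy two-state chain (illustrative).

    The chain deterministically alternates 0 -> 1 -> 0 -> ..., with
    reward 1 on leaving state 0 and 0 otherwise.  Behavior and target
    policies coincide here, so every importance ratio rho_t is 1, and
    the interest i(s) is 1 in every state.  True values: v = (4/3, 2/3).
    """
    X = np.eye(2)            # one-hot (tabular) features
    theta = np.zeros(2)      # the single learned parameter vector
    e = np.zeros(2)          # eligibility trace
    F = 0.0                  # follow-on trace
    s = 0
    for _ in range(steps):
        s_next = 1 - s
        r = 1.0 if s == 0 else 0.0
        rho, interest = 1.0, 1.0
        F = rho * gam * F + interest             # follow-on trace
        M = lam * interest + (1.0 - lam) * F     # emphasis for this step
        e = rho * (gam * lam * e + M * X[s])     # emphatic trace
        delta = r + gam * theta @ X[s_next] - theta @ X[s]
        theta = theta + alpha * delta * e        # single step-size alpha
        s = s_next
    return theta

theta = emphatic_td_lambda(5000)
```

On this on-policy chain the emphasis settles at F = 1/(1 - γ) = 2, and the iterates converge to the true values (4/3, 2/3).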

    Asynchronous Approximation of a Single Component of the Solution to a Linear System

    We present a distributed asynchronous algorithm for approximating a single component of the solution to a system of linear equations Ax = b, where A is a positive definite real matrix and b ∈ ℝⁿ. This is equivalent to solving for x_i in x = Gx + z for some G and z such that the spectral radius of G is less than 1. Our algorithm relies on the Neumann series characterization of the component x_i, and is based on residual updates. We analyze our algorithm within the context of a cloud computation model, in which the computation is split into small update tasks performed by small processors with shared access to a distributed file system. We prove a robust asymptotic convergence result when the spectral radius ρ(|G|) < 1, regardless of the precise order and frequency in which the update tasks are performed. We provide convergence rate bounds which depend on the order of update tasks performed, analyzing both deterministic update rules via counting weighted random walks, as well as probabilistic update rules via concentration bounds. The probabilistic analysis requires analyzing the product of random matrices which are drawn from distributions that are time and path dependent. We specifically consider the setting where n is large, yet G is sparse, e.g., each row has at most d nonzero entries. This is motivated by applications in which G is derived from the edge structure of an underlying graph. Our results prove that if the local neighborhood of the graph does not grow too quickly as a function of n, our algorithm can provide significant reduction in computation cost as opposed to any algorithm which computes the global solution vector x. Our algorithm obtains an ε‖x‖₂ additive approximation for x_i in constant time with respect to the size of the matrix when the maximum row sparsity d = O(1) and 1/(1 − ‖G‖₂) = O(1).
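The residual-update idea behind the single-component algorithm can be sketched in a synchronous, single-processor form: a residual vector r maintains the invariant x_i = x̂ + rᵀx, and each "update task" on a coordinate j moves the contribution r_j z_j into the estimate x̂ while redistributing r_j along row j of G (the Neumann series unrolled one term at a time). The specific matrix, the greedy task-selection rule, and the function name are illustrative assumptions; the paper runs such tasks asynchronously across many processors.

```python
import numpy as np

def solve_component(G, z, i, num_tasks=60):
    """Approximate x_i where x = G x + z, via residual updates.

    Invariant after every task: x_i = x_hat + r @ x, so x_hat -> x_i
    as the residual r shrinks (this requires rho(|G|) < 1).
    """
    n = len(z)
    x_hat = 0.0
    r = np.zeros(n)
    r[i] = 1.0                           # start from e_i
    for _ in range(num_tasks):
        j = int(np.argmax(np.abs(r)))    # one "update task" on coordinate j
        rj = r[j]
        x_hat += rj * z[j]               # collect the z_j contribution
        r[j] = 0.0
        r += rj * G[j]                   # substitute x_j = G_j x + z_j
    return x_hat

G = np.array([[0.0, 0.4],
              [0.3, 0.0]])
z = np.array([1.0, 2.0])
x_true = np.linalg.solve(np.eye(2) - G, z)
x0_approx = solve_component(G, z, 0)
```

Because the work per task touches only the d nonzero entries of one row, the cost of reaching a fixed residual tolerance is governed by the local neighborhood of coordinate i rather than by n.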

    Stabilization of Stochastic Iterative Methods for Singular and Nearly Singular Linear Systems

    We consider linear systems of equations, Ax = b, of various types frequently arising in large-scale applications, with an emphasis on the case where A is singular. Under certain conditions, necessary as well as sufficient, linear deterministic iterative methods generate sequences {x_k} that converge to a solution, as long as there exists at least one solution. We show that this convergence property is frequently lost when these methods are implemented with simulation, as is often done in important classes of large-scale problems. We introduce additional conditions and novel algorithmic stabilization schemes under which {x_k} converges to a solution when A is singular, and may also be used with substantial benefit when A is nearly singular. Moreover, we establish the mathematical foundation for related work that deals with special cases of singular systems, including some arising in approximate dynamic programming, where convergence may be obtained without a stabilization mechanism.
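One simple stabilization scheme in this spirit, shrinking the iterate by a factor (1 − δ_k) with δ_k → 0, can be illustrated on a deterministic singular example; the matrix, step size, and δ_k schedule below are illustrative assumptions, not the paper's exact algorithm. The nullspace component of the plain iteration never moves, which is exactly the component that drifts like a random walk once simulation noise enters the updates.

```python
import numpy as np

# Singular system: A = diag(1, 0), b = (1, 0).  Its solution set is
# the line {(1, c)}.  The plain iteration x <- x - gamma*(A x - b)
# leaves the nullspace component x[1] untouched; under simulation
# noise that component would wander instead of converging.
A = np.diag([1.0, 0.0])
b = np.array([1.0, 0.0])
gamma = 0.5

def iterate(stabilize, steps=2000):
    x = np.array([0.0, 1.0])              # start off the solution set
    for k in range(steps):
        delta_k = 1.0 / (k + 2) if stabilize else 0.0
        # Stabilized update: shrink the iterate by (1 - delta_k),
        # with delta_k -> 0 so the induced bias vanishes in the limit.
        x = (1.0 - delta_k) * x - gamma * (A @ x - b)
    return x

x_plain = iterate(stabilize=False)
x_stab = iterate(stabilize=True)
```

The stabilized sequence is driven toward the particular solution (1, 0), while the plain iteration keeps its initial nullspace component forever.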

    Stochastic methods for large-scale linear problems, variational inequalities, and convex optimization

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 201-207). This thesis considers stochastic methods for large-scale linear systems, variational inequalities, and convex optimization problems. I focus on special structures that lend themselves to sampling, such as when the linear/nonlinear mapping or the objective function is an expected value or is the sum of a large number of terms, and/or the constraint is the intersection of a large number of simpler sets. For linear systems, I propose modifications to deterministic methods to allow the use of random samples and maintain stochastic convergence, which is particularly challenging when the unknown system is singular or nearly singular. For variational inequalities and optimization problems, I propose a class of methods that combine elements of incremental constraint projection, stochastic gradient/subgradient descent, and proximal algorithms. These methods can be applied with various sampling schemes that are suitable for applications involving distributed implementation, large data sets, or online learning. I use a unified framework to analyze the convergence and the rate of convergence of these methods. This framework is based on a pair of supermartingale bounds, which control the convergence to feasibility and the convergence to optimality, respectively, and are coupled at different time scales. By Mengdi Wang, Ph.D.
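The flavor of combining gradient steps with incremental constraint projection, projecting onto one randomly sampled constraint set per iteration rather than onto the full intersection, can be sketched on a toy problem; the objective, constraint sets, step-size schedule, and sampling scheme below are illustrative assumptions.

```python
import random

def incremental_projection_descent(steps=5000, seed=0):
    """Minimize (x1-2)^2 + (x2-2)^2 over {x1 <= 1} intersect {x2 <= 1}.

    Each iteration takes a diminishing gradient step, then projects
    onto ONE randomly sampled constraint halfspace instead of the
    whole feasible set.  The optimum of this toy problem is (1, 1).
    """
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for k in range(steps):
        alpha = 1.0 / (k + 2)                       # diminishing step size
        g = [2.0 * (x[0] - 2.0), 2.0 * (x[1] - 2.0)]  # gradient of objective
        y = [x[0] - alpha * g[0], x[1] - alpha * g[1]]
        j = rng.randrange(2)                        # sample one constraint
        y[j] = min(y[j], 1.0)                       # project onto {x_j <= 1}
        x = y
    return x

x = incremental_projection_descent()
```

The two sources of error, infeasibility from projecting onto only one sampled set and suboptimality from the noisy descent direction, shrink on different time scales, which is what the paired supermartingale analysis in the thesis is built to control.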