An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning
In this paper we introduce the idea of improving the performance of
parametric temporal-difference (TD) learning algorithms by selectively
emphasizing or de-emphasizing their updates on different time steps. In
particular, we show that varying the emphasis of linear TD(λ)'s updates
in a particular way causes its expected update to become stable under
off-policy training. The only prior model-free TD methods to achieve this with
per-step computation linear in the number of function approximation parameters
are the gradient-TD family of methods including TDC, GTD(λ), and
GQ(λ). Compared to these methods, our _emphatic TD(λ)_ is
simpler and easier to use; it has only one learned parameter vector and one
step-size parameter. Our treatment includes general state-dependent discounting
and bootstrapping functions, and a way of specifying varying degrees of
interest in accurately valuing different states.
Comment: 29 pages. This is a significant revision based on the first set of
reviews. The most important change was to signal early that the main result
is about stability, not convergence.
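The emphasis-weighted update can be sketched in a few lines. The following is a minimal illustrative implementation for constant discounting, constant λ, and constant interest; the function name and episode-based interface are my own, and the paper's general state-dependent discounting and bootstrapping functions are omitted:

```python
import numpy as np

def emphatic_td(features, rewards, rhos, alpha=0.01, lam=0.9,
                gamma=0.99, interest=1.0):
    """One episode of linear emphatic TD(lambda) -- illustrative sketch.

    features: feature vectors phi_0 ... phi_T (terminal phi_T may be zeros)
    rewards:  rewards R_1 ... R_T
    rhos:     importance-sampling ratios rho_0 ... rho_{T-1}
    """
    n = features[0].shape[0]
    theta = np.zeros(n)   # the single learned parameter vector
    e = np.zeros(n)       # eligibility trace
    F = 0.0               # follow-on trace
    for t in range(len(rewards)):
        phi, phi_next = features[t], features[t + 1]
        rho = rhos[t]
        # TD error under the current linear value estimate
        delta = rewards[t] + gamma * theta @ phi_next - theta @ phi
        # follow-on trace and per-step emphasis
        F = gamma * (rhos[t - 1] if t > 0 else 1.0) * F + interest
        M = lam * interest + (1.0 - lam) * F
        # emphasis-weighted eligibility trace and parameter update
        e = rho * (gamma * lam * e + M * phi)
        theta = theta + alpha * delta * e
    return theta
```

The only learned quantity is the single parameter vector theta, updated with one scalar step size, which is the simplicity advantage the abstract claims over the gradient-TD methods.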
Asynchronous Approximation of a Single Component of the Solution to a Linear System
We present a distributed asynchronous algorithm for approximating a single
component of the solution to a system of linear equations Ax = b, where A
is a positive definite real matrix and b is a real vector. This is
equivalent to solving for x_i in x = Gx + z for some G and z such that the
spectral radius of G is less than 1. Our algorithm relies on the Neumann
series characterization of the component x_i, and is based on residual
updates. We analyze our algorithm within the context of a cloud computation
updates. We analyze our algorithm within the context of a cloud computation
model, in which the computation is split into small update tasks performed by
small processors with shared access to a distributed file system. We prove a
robust asymptotic convergence result when the spectral radius ρ(G) < 1,
regardless of the precise order and frequency in which the update tasks are
performed. We provide convergence rate bounds which depend on the order of
update tasks performed, analyzing both deterministic update rules via counting
weighted random walks, as well as probabilistic update rules via concentration
bounds. The probabilistic analysis requires analyzing the product of random
matrices which are drawn from distributions that are time and path dependent.
We specifically consider the setting where n is large, yet G is sparse,
e.g., each row has at most d nonzero entries. This is motivated by
applications in which G is derived from the edge structure of an underlying
graph. Our results prove that if the local neighborhood of the graph does not
grow too quickly as a function of n, our algorithm can provide significant
reduction in computation cost as opposed to any algorithm which computes the
global solution vector x. Our algorithm obtains an
additive approximation for x_i in constant time with respect to the size of
the matrix when the maximum row sparsity and
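A minimal single-machine sketch of the residual-update idea (not the paper's asynchronous cloud model) may help. It maintains the invariant x_i = est + r·x, starting from the residual r = e_i, and each update eliminates one residual coordinate by reading a single row of G, so per-step cost scales with row sparsity. The function name and the greedy coordinate choice are illustrative assumptions:

```python
import numpy as np

def solve_component(G, z, i, tol=1e-8, max_steps=100000):
    """Estimate x_i for x = G x + z via local residual updates (sketch).

    Invariant: x_i = est + r @ x, with est = 0 and r = e_i initially.
    """
    n = G.shape[0]
    r = np.zeros(n)
    r[i] = 1.0
    est = 0.0
    for _ in range(max_steps):
        u = int(np.argmax(np.abs(r)))   # greedy coordinate choice
        if abs(r[u]) < tol:
            break                       # residual negligible: est ~ x_i
        ru = r[u]
        est += ru * z[u]                # collect the z-part of x_u
        r[u] = 0.0
        r += ru * G[u, :]               # propagate through row u of G only
    return est
```

Because only one row of G is touched per step, the estimate of x_i can be computed without ever forming the global solution vector, matching the abstract's local-neighborhood intuition.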
Stabilization of Stochastic Iterative Methods for Singular and Nearly Singular Linear Systems
We consider linear systems of equations, Ax = b, of various types frequently arising in large-scale applications, with an emphasis on the case where A is singular. Under certain conditions, necessary as well as sufficient, linear deterministic iterative methods generate sequences {x_k} that converge to a solution, as long as there exists at least one solution. We show that this convergence property is frequently lost when these methods are implemented with simulation, as is often done in important classes of large-scale problems. We introduce additional conditions and novel algorithmic stabilization schemes under which {x_k} converges to a solution when A is singular; these schemes may also be used with substantial benefit when A is nearly singular. Moreover, we establish the mathematical foundation for related work that deals with special cases of singular systems, including some arising in approximate dynamic programming, where convergence may be obtained without a stabilization mechanism.
Stochastic methods for large-scale linear problems, variational inequalities, and convex optimization
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 201-207).
This thesis considers stochastic methods for large-scale linear systems, variational inequalities, and convex optimization problems. I focus on special structures that lend themselves to sampling, such as when the linear/nonlinear mapping or the objective function is an expected value or is the sum of a large number of terms, and/or the constraint is the intersection of a large number of simpler sets. For linear systems, I propose modifications to deterministic methods that allow the use of random samples while maintaining stochastic convergence, which is particularly challenging when the unknown system is singular or nearly singular. For variational inequalities and optimization problems, I propose a class of methods that combine elements of incremental constraint projection, stochastic gradient/subgradient descent, and proximal algorithms. These methods can be applied with various sampling schemes that are suitable for applications involving distributed implementation, large data sets, or online learning. I use a unified framework to analyze the convergence and the rate of convergence of these methods. This framework is based on a pair of supermartingale bounds, which control the convergence to feasibility and the convergence to optimality, respectively, and are coupled at different time scales.
by Mengdi Wang, Ph.D.
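A minimal sketch of the combined incremental-projection/stochastic-gradient idea, under illustrative assumptions (the function name, sampling scheme, and step-size rule are mine, not the thesis's exact algorithms): each iteration takes a gradient step using one randomly sampled component of the objective, then projects onto one randomly sampled simple constraint set, rather than onto the full intersection:

```python
import numpy as np

def incremental_projection_sgd(grads, projections, x0,
                               steps=5000, alpha0=1.0, seed=0):
    """Sketch of x_{k+1} = P_{X_{w_k}}( x_k - alpha_k * g_{v_k}(x_k) ),
    with one sampled gradient term and one sampled constraint per step."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(1, steps + 1):
        g = grads[rng.integers(len(grads))]              # sample a component
        P = projections[rng.integers(len(projections))]  # sample a constraint
        x = P(x - (alpha0 / np.sqrt(k)) * g(x))          # step, then project
    return x
```

For example, minimizing (x1 - 2)^2 + (x2 - 2)^2 over the intersection of the halfspaces {x1 <= 1} and {x2 <= 1}, sampling one quadratic term and one halfspace projection per iteration, drives the iterate to roughly (1, 1), the constrained optimum.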