80,220 research outputs found

Approximate policy iteration: A survey and some new methods

Author: A. G. Barto
A. Gosavi
A. L. Samuel
A. Nedić
B. Martinet
B. Roy Van
C. A. J. Fletcher
C. Szepesvari
C. Thiery
D. D. Castro
D. P. Bertsekas
D. S. Choi
D. White
Dimitri P. Bertsekas
E. V. Denardo
F. L. Lewis
F. Pineda
G. J. Gordon
G. J. Tesauro
G. Strang
H. Chang
H. Yu
I. Menache
I. Szita
J. A. Boyan
J. Liu
J. N. Tsitsiklis
L. Busoniu
L. C. Baird
L. Gurvits
L. N. Trefethen
L. S. Shapley
M. A. Krasnoselskii
M. G. Lagoudakis
M. L. Puterman
M. Wang
N. Polydorides
P. J. Werbos
P. T. Boer de
R. J. Williams
R. S. Sutton
R. T. Rockafellar
R. Y. Rubinstein
S. J. Bradtke
S. Meyn
S. P. Singh
T. Jaakkola
T. Jung
V. F. Farias
V. S. Borkar
W. B. Powell
X. R. Cao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

We consider the classical policy iteration method of dynamic programming (DP), where approximations and simulation are used to deal with the curse of dimensionality. We survey a number of issues: convergence and rate of convergence of approximate policy evaluation methods, singularity and susceptibility to simulation noise of policy evaluation, exploration issues, constrained and enhanced policy iteration, policy oscillation and chattering, and optimistic and distributed policy iteration. Our discussion of policy evaluation is couched in general terms and aims to unify the available methods in the light of recent research developments and to compare the two main policy evaluation approaches: projected equations and temporal differences (TD), and aggregation. In the context of these approaches, we survey two different types of simulation-based algorithms: matrix inversion methods, such as least-squares temporal difference (LSTD), and iterative methods, such as least-squares policy evaluation (LSPE) and TD(λ), and their scaled variants. We discuss a recent method, based on regression and regularization, which rectifies the unreliability of LSTD for nearly singular projected Bellman equations. An iterative version of this method belongs to the LSPE class of methods and provides the connecting link between LSTD and LSPE. Our discussion of policy improvement focuses on the role of policy oscillation and its effect on performance guarantees. We illustrate that policy evaluation when done by the projected equation/TD approach may lead to policy oscillation, but when done by aggregation it does not. This implies better error bounds and more regular performance for aggregation, at the expense of some loss of generality in cost function representation capability. Hard aggregation provides the connecting link between projected equation/TD-based and aggregation-based policy evaluation, and is characterized by favorable error bounds.

National Science Foundation (U.S.) (No. ECCS-0801549); Los Alamos National Laboratory. Information Science and Technology Institute; United States. Air Force (No. FA9550-10-1-0412)
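As a rough illustration of the LSTD idea summarized in the abstract above, the following minimal sketch builds sample averages C and d from a single simulated trajectory, solves the projected equation C r = d by matrix inversion, and also computes a regularized regression estimate of the form (C'C + βI)^{-1}(C'd + β r̄). The function names, the identity weighting in the regression, and the default prior r̄ = 0 are assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def lstd_matrices(states, costs, Phi, alpha):
    """Sample averages C and d built from one trajectory.

    states: visited state indices (length n + 1)
    costs:  one-stage costs of the n observed transitions
    Phi:    feature matrix, one row per state
    alpha:  discount factor
    """
    n_feat = Phi.shape[1]
    C = np.zeros((n_feat, n_feat))
    d = np.zeros(n_feat)
    n = len(costs)
    for k in range(n):
        phi_now = Phi[states[k]]
        phi_next = Phi[states[k + 1]]
        C += np.outer(phi_now, phi_now - alpha * phi_next)
        d += phi_now * costs[k]
    return C / n, d / n

def lstd(C, d):
    """Plain LSTD: solve C r = d directly (unreliable if C is nearly singular)."""
    return np.linalg.solve(C, d)

def regularized_lstd(C, d, beta, r_prior=None):
    """Regularized regression estimate (identity weighting is an assumption)."""
    if r_prior is None:
        r_prior = np.zeros_like(d)
    A = C.T @ C + beta * np.eye(C.shape[1])
    b = C.T @ d + beta * r_prior
    return np.linalg.solve(A, b)
```

With a larger β the estimate is pulled toward the prior, which trades bias for reduced sensitivity to simulation noise when C is close to singular.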

CiteSeerX

DSpace@MIT

Crossref

Institute of Mathematics AS CR, v. v. i.

Exponential Krylov time integration for modeling multi-frequency optical response with monochromatic sources

Author: Botchev M. A.
Hanse A. M.
Uppu R.
Publication venue: 'Elsevier BV'
Publication date: 30/06/2017

Light incident on a layer of scattering material such as a piece of sugar or white paper forms a characteristic speckle pattern in transmission and reflection. The information hidden in the correlations of the speckle pattern with varying frequency, polarization and angle of the incident light can be exploited for applications such as biomedical imaging and high-resolution microscopy. Conventional computational models for multi-frequency optical response involve multiple solution runs of Maxwell's equations with monochromatic sources. Exponential Krylov subspace time solvers are promising candidates for improving the efficiency of such models, as a single monochromatic solution can be reused for the other frequencies without performing full time-domain computations at each frequency. However, we show that the straightforward implementation appears to have serious limitations. We further propose alternative ways for efficient solution through Krylov subspace methods. Our methods are based on two different splittings of the unknown solution into different parts, each of which can be computed efficiently. Experiments demonstrate a significant gain in computation time with respect to the standard solvers.

Comment: 22 pages, 4 figures
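The basic building block behind exponential Krylov time solvers can be sketched as follows: the action exp(-tA)v is approximated inside an m-dimensional Krylov subspace built by the Arnoldi process, so only a small m-by-m matrix exponential is needed. This is a generic textbook sketch, not the splitting schemes proposed in the paper; the function names and the simple Taylor-based `small_expm` helper are assumptions introduced for this example.

```python
import numpy as np

def small_expm(M, terms=20):
    """Dense matrix exponential via scaling and squaring with a Taylor series."""
    norm = np.linalg.norm(M, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    M = M / 2.0**s                      # scaled so that the 1-norm is at most 0.5
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for k in range(1, terms + 1):
        term = term @ M / k
        E = E + term
    for _ in range(s):                  # undo the scaling by repeated squaring
        E = E @ E
    return E

def krylov_expm_action(A, v, t, m):
    """Approximate exp(-t A) v in an m-dimensional Krylov subspace (Arnoldi)."""
    n = v.shape[0]
    beta = np.linalg.norm(v)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):          # modified Gram-Schmidt orthogonalization
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:         # invariant subspace found: stop early
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(m)
    e1[0] = 1.0
    return beta * V[:, :m] @ (small_expm(-t * H[:m, :m]) @ e1)
```

The cost is dominated by m matrix-vector products with A, which is why the approach is attractive when A is large and sparse and m can be kept small.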

arXiv.org e-Print Archive

University of Twente Research Information

Application of Operator Splitting Methods in Finance

Author: Hout Karel in 't
Toivanen Jari
Publication venue
Publication date: 04/04/2015

Financial derivatives pricing aims to find the fair value of a financial contract on an underlying asset. Here we consider option pricing in the partial differential equations framework. The contemporary models lead to one-dimensional or multidimensional parabolic problems of the convection-diffusion type and generalizations thereof. An overview of various operator splitting methods is presented for the efficient numerical solution of these problems. Splitting schemes of the Alternating Direction Implicit (ADI) type are discussed for multidimensional problems, e.g. given by stochastic volatility (SV) models. For jump models, Implicit-Explicit (IMEX) methods are considered, which efficiently treat the nonlocal jump operator. For American options, an easy-to-implement operator splitting method is described for the resulting linear complementarity problems. Numerical experiments are presented to illustrate the actual stability and convergence of the splitting schemes. Here European and American put options are considered under four asset price models: the classical Black-Scholes model, the Merton jump-diffusion model, the Heston SV model, and the Bates SV model with jumps.
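To make the operator splitting idea for American options more concrete, here is a minimal sketch for a put under the Black-Scholes model. Each time step first solves a linear system with an auxiliary multiplier held fixed, then applies a pointwise update that enforces the early-exercise constraint. The implicit Euler time stepping, central differences, grid sizes, boundary values and function name are all assumptions chosen for brevity, and may differ from the schemes studied in the paper.

```python
import numpy as np

def american_put_operator_splitting(K=100.0, r=0.05, sigma=0.2, T=1.0,
                                    s_max=300.0, n_space=300, n_time=200):
    """American put under Black-Scholes via implicit Euler plus splitting."""
    s = np.linspace(0.0, s_max, n_space + 1)
    ds = s[1] - s[0]
    dt = T / n_time
    payoff = np.maximum(K - s, 0.0)

    # Central-difference discretization of 0.5*sigma^2*S^2*u_SS + r*S*u_S - r*u
    si = s[1:-1]
    diff = 0.5 * sigma**2 * si**2 / ds**2
    conv = 0.5 * r * si / ds
    lower = diff - conv                 # coefficient of u[i-1]
    main = -2.0 * diff - r              # coefficient of u[i]
    upper = diff + conv                 # coefficient of u[i+1]
    n = n_space - 1
    A = np.diag(main) + np.diag(lower[1:], -1) + np.diag(upper[:-1], 1)
    M = np.eye(n) - dt * A

    # Dirichlet boundaries (assumed): u(0) = K and u(s_max) = 0
    bc = np.zeros(n)
    bc[0] = lower[0] * K

    g = payoff[1:-1]
    u = g.copy()                        # value at maturity
    lam = np.zeros(n)                   # multiplier for the constraint u >= g
    for _ in range(n_time):
        # Step 1: linear solve with the multiplier frozen at its old value
        u_tilde = np.linalg.solve(M, u + dt * (lam + bc))
        # Step 2: pointwise projection enforcing the early-exercise constraint
        u = np.maximum(u_tilde - dt * lam, g)
        lam = np.maximum(lam + (u - u_tilde) / dt, 0.0)
    return s, np.concatenate(([K], u, [0.0])), payoff
```

The appeal of this splitting is that the complementarity problem is decoupled from the linear solve, so any efficient solver for the European-style system can be reused unchanged.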

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen