8,313 research outputs found

    Stable linear approximations to dynamic programming for stochastic control problems with local transitions

    Caption title. Includes bibliographical references (leaf [7]). Supported by the NSF (ECS 9216531), EPRI (8030-10), and the ARO. Benjamin Van Roy and John N. Tsitsiklis.

    Deterministic continuation of stochastic metastable equilibria via Lyapunov equations and ellipsoids

    Numerical continuation methods for deterministic dynamical systems have been one of the most successful tools in applied dynamical systems theory. Continuation techniques have been employed in all branches of the natural sciences as well as in engineering to analyze ordinary, partial and delay differential equations. Here we show that the deterministic continuation algorithm for equilibrium points can be extended to track information about metastable equilibrium points of stochastic differential equations (SDEs). We stress that we do not develop a new technical tool but that we combine results and methods from probability theory, dynamical systems, numerical analysis, optimization and control theory into an algorithm that augments classical equilibrium continuation methods. In particular, we use ellipsoids defining regions of high concentration of sample paths. It is shown that these ellipsoids and the distances between them can be efficiently calculated using iterative methods that take advantage of the numerical continuation framework. We apply our method to a bistable neural competition model and a classical predator-prey system. Furthermore, we show how global assumptions on the flow can be incorporated - if they are available - by relating numerical continuation, Kramers' formula and Rayleigh iteration.
    Comment: 29 pages, 7 figures [Fig.7 reduced in quality due to arXiv size restrictions]; v2 - added Section 9 on Kramers' formula, additional computations, corrected typos, improved explanation
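    Near a stable equilibrium, the concentration ellipsoid described in this abstract is determined by the covariance of the linearized SDE, which solves a Lyapunov equation. The sketch below shows only that single step, assuming additive noise, a hypothetical pitchfork-type drift invented for illustration, and SciPy's solve_continuous_lyapunov; the paper's actual algorithm embeds such a computation in a continuation loop, which is not reproduced here.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def jacobian(x, mu=1.0):
    # Jacobian of the illustrative drift f(x) = (mu*x1 - x1**3, -x2).
    return np.array([[mu - 3.0 * x[0] ** 2, 0.0],
                     [0.0, -1.0]])

def covariance_ellipsoid(x_star, sigma, F, mu=1.0):
    """Covariance C of fluctuations around a metastable equilibrium x_star.

    C solves the Lyapunov equation  A C + C A^T = -sigma^2 F F^T,
    where A is the drift Jacobian at x_star (assumed stable).
    The level set {x : (x - x_star)^T C^{-1} (x - x_star) <= r^2}
    is the concentration ellipsoid tracked along a continuation branch.
    """
    A = jacobian(x_star, mu)
    return solve_continuous_lyapunov(A, -sigma ** 2 * F @ F.T)

if __name__ == "__main__":
    x_star = np.array([1.0, 0.0])   # equilibrium of the toy drift for mu = 1
    F = np.eye(2)                   # additive noise, illustrative choice
    print(covariance_ellipsoid(x_star, sigma=0.1, F=F))
```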

    The non-locality of Markov chain approximations to two-dimensional diffusions

    In this short paper, we consider discrete-time Markov chains on lattices as approximations to continuous-time diffusion processes. The approximations can be interpreted as finite difference schemes for the generator of the process. We derive conditions on the diffusion coefficients which permit transition probabilities to match locally first and second moments. We derive a novel formula which expresses how the matching becomes more difficult for larger (absolute) correlations and strongly anisotropic processes, such that instantaneous moves to more distant neighbours on the lattice have to be allowed. Roughly speaking, for non-zero correlations, the distance covered in one timestep is proportional to the ratio of volatilities in the two directions. We discuss the implications for Markov decision processes and the convergence analysis of approximations to Hamilton-Jacobi-Bellman equations in the Barles-Souganidis framework.
    Comment: Corrected two errata from previous and journal version: definition of R in (5) and summations in (7)
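    As a concrete illustration of the local moment-matching constraint discussed above, the sketch below uses a standard Kushner-Dupuis-style construction (not code from the paper) for a driftless correlated 2D diffusion. It returns nearest-neighbour transition probabilities when non-negative probabilities matching the first two moments exist, and None otherwise, in which case moves to more distant neighbours would be needed. All volatilities, correlations, and grid parameters are illustrative.

```python
import numpy as np

def nearest_neighbour_probs(sigma1, sigma2, rho, h1, h2, dt):
    """Moment-matching probabilities on a lattice with spacings h1, h2.

    Moves considered (for rho >= 0): +/-(h1, 0), +/-(0, h2), +/-(h1, h2), stay.
    Matching the covariance a12 = rho*sigma1*sigma2 forces the diagonal move;
    the remaining axis probabilities can then become negative for large |rho|
    or strongly anisotropic volatilities.
    """
    a11, a22 = sigma1 ** 2, sigma2 ** 2
    a12 = abs(rho) * sigma1 * sigma2
    p_diag = 0.5 * dt * a12 / (h1 * h2)               # +/-(h1, h2)
    p1 = 0.5 * dt * (a11 / h1 ** 2 - a12 / (h1 * h2))  # +/-(h1, 0)
    p2 = 0.5 * dt * (a22 / h2 ** 2 - a12 / (h1 * h2))  # +/-(0, h2)
    p_stay = 1.0 - 2.0 * (p1 + p2 + p_diag)
    probs = np.array([p1, p2, p_diag, p_stay])
    return probs if np.all(probs >= 0.0) else None

# Strong correlation with anisotropic volatilities fails on a square grid...
print(nearest_neighbour_probs(1.0, 0.2, 0.9, h1=0.01, h2=0.01, dt=1e-5))
# ...but succeeds when the grid aspect ratio h2/h1 matches sigma2/sigma1.
print(nearest_neighbour_probs(1.0, 0.2, 0.9, h1=0.01, h2=0.002, dt=1e-5))
```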

    Reinforcement Learning: A Survey

    This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.
    Comment: See http://www.jair.org/ for any accompanying file
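    Two of the central issues named in this abstract, trading off exploration against exploitation and learning from delayed reinforcement, are both visible in tabular Q-learning with an epsilon-greedy policy. The sketch below is a generic textbook illustration rather than code from the survey; the toy chain environment and all hyperparameter values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 5-state chain: action 1 moves right, action 0 moves left;
# only reaching the last state pays off (delayed reinforcement).
N_STATES, N_ACTIONS = 5, 2

def step(state, action):
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward, next_state == N_STATES - 1

Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, epsilon = 0.1, 0.95, 0.1   # illustrative hyperparameters

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if rng.random() < epsilon:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Temporal-difference update propagates the delayed reward backwards.
        target = reward + gamma * (0.0 if done else np.max(Q[next_state]))
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state

print(Q)   # greedy policy: argmax over actions in each state ("move right")
```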