
    A unified framework for linear function approximation of value functions in stochastic control

    Get PDF
    Proceedings of the 21st European Signal Processing Conference (EUSIPCO 2013), held September 9-13, 2013, in Marrakech, Morocco. This paper contributes a unified formulation that merges previous analyses of predicting the performance (value function) of a given sequence of actions (policy) when an agent operates a Markov decision process with a large state space. When the states are represented by features and the value function is linearly approximated, our analysis reveals a new relationship between two common cost functions used to obtain the optimal approximation. In addition, this analysis allows us to propose an efficient adaptive algorithm that provides an unbiased linear estimate. The performance of the proposed algorithm is illustrated by simulation, showing competitive results when compared with state-of-the-art solutions. This work has been partly funded by the Spanish Ministry of Science and Innovation under project GRE3N (TEC 2011-29006-C03-01/02/03), grants TEC2009-14219-C03-01 and TEC2010-21217-C02-02-CR4HFDVL, and, within the CONSOLIDER-INGENIO 2010 program, project COMONSENS (CSD 2008-00010); and by the European Commission under grant FP7-ICT-2009-4-248894-WHERE-2.
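    The paper's adaptive algorithm is not reproduced in the abstract; as a generic illustration of linear value-function approximation in this setting, the sketch below runs standard TD(0) with tabular (one-hot) features on a tiny Markov reward process. All states, rewards, and step sizes are illustrative, not taken from the paper.

```python
# TD(0) with a linear value-function approximation on a tiny
# two-state Markov reward process (illustrative, not the paper's algorithm).
# Deterministic chain: state 0 -> 1 -> 0 -> ..., reward 1 per step, gamma = 0.9.
# With tabular (one-hot) features the true value is 1 / (1 - 0.9) = 10 in both states.

GAMMA = 0.9
ALPHA = 0.1          # step size
w = [0.0, 0.0]       # one weight per feature; with one-hot features, w[s] == V(s)

state = 0
for _ in range(2000):
    next_state = 1 - state          # deterministic transition
    reward = 1.0
    td_error = reward + GAMMA * w[next_state] - w[state]
    w[state] += ALPHA * td_error    # semi-gradient TD(0) update
    state = next_state

# w[0] and w[1] both approach the true value 10.
```

With feature vectors richer than one-hot encodings, the same update applies to the whole weight vector scaled by the current state's features, which is the large-state-space regime the abstract addresses.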

    A unified approach for the solution of the Fokker-Planck equation

    Full text link
    This paper explores the use of a discrete singular convolution algorithm as a unified approach for numerical integration of the Fokker-Planck equation. The unified features of the discrete singular convolution algorithm are discussed. It is demonstrated that different implementations of the present algorithm, such as global, local, Galerkin, collocation, and finite difference, can be deduced from a single starting point. Three benchmark stochastic systems, the repulsive Wong process, the Black-Scholes equation, and a genuine nonlinear model, are employed to illustrate the robustness and test the accuracy of the present approach for the solution of the Fokker-Planck equation via a time-dependent method. An additional example, the incompressible Euler equation, is used to further validate the present approach for more difficult problems. Numerical results indicate that the present unified approach is robust and accurate for solving the Fokker-Planck equation. Comment: 19 pages
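    The discrete singular convolution kernels are not given in the abstract; as a minimal stand-in, the sketch below integrates a 1-D Fokker-Planck equation for an Ornstein-Uhlenbeck process using the plain finite-difference scheme that the abstract lists as one special case of the method. Grid sizes, coefficients, and the initial condition are all illustrative.

```python
# Explicit finite-difference integration of the 1-D Fokker-Planck equation
#   dp/dt = d/dx (GAMMA * x * p) + D * d^2 p / dx^2
# for an Ornstein-Uhlenbeck process (a simple stand-in for the DSC scheme).

GAMMA, D = 1.0, 1.0
DX, DT, STEPS = 0.1, 0.002, 2000          # DT below the stability limit DX**2 / (2*D)
N = 101
x = [-5.0 + i * DX for i in range(N)]

# Initial condition: uniform density on [-1, 1], normalised.
p = [1.0 if abs(xi) < 1.0 else 0.0 for xi in x]
mass = sum(p) * DX
p = [pi / mass for pi in p]

for _ in range(STEPS):
    new_p = p[:]
    for i in range(1, N - 1):
        drift = (x[i + 1] * p[i + 1] - x[i - 1] * p[i - 1]) / (2 * DX)
        diff = (p[i + 1] - 2 * p[i] + p[i - 1]) / DX ** 2
        new_p[i] = p[i] + DT * (GAMMA * drift + D * diff)
    p = new_p                              # boundary values stay at 0

# p relaxes toward the stationary Gaussian density exp(-x**2 / 2) / sqrt(2 * pi).
```

The drift and diffusion stencils here are the low-order end of the spectrum; the paper's point is that higher-accuracy global and local kernels fit the same convolution template.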

    Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view

    Get PDF
    We investigate projection methods for evaluating a linear approximation of the value function of a policy in a Markov Decision Process context. We consider two popular approaches, the one-step Temporal Difference fix-point computation (TD(0)) and Bellman Residual (BR) minimization. We describe examples where each method outperforms the other. We highlight a simple relation between the objective functions they minimize, and show that while BR enjoys a performance guarantee, TD(0) does not in general. We then propose a unified view in terms of oblique projections of the Bellman equation, which substantially simplifies and extends the characterization of (Schoknecht, 2002) and the recent analysis of (Yu & Bertsekas, 2008). Finally, we describe simulations suggesting that although the TD(0) solution is usually slightly better than the BR solution, its inherent numerical instability can make it very bad in some cases, and thus worse on average.
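    The oblique-projection characterization is not reproduced here, but the gap between the two objectives is easy to see numerically. The sketch below computes both solutions in closed form for a two-state chain with a single aggregated feature, so the two fixed points differ; all numbers are illustrative and the state weighting is uniform.

```python
# TD(0) fixed point vs. Bellman Residual minimisation on a two-state chain
# with one shared (aggregated) scalar feature (illustrative numbers).
#
# States 0, 1; deterministic transitions 0 -> 1, 1 -> 1;
# rewards r = [1, 0]; discount gamma = 0.9; feature phi = [1, 2].

gamma = 0.9
phi = [1.0, 2.0]                  # phi[s]: scalar feature of state s
nxt = [1, 1]                      # successor of each state
r = [1.0, 0.0]

# A[s] = phi[s] - gamma * phi[next(s)] appears in both objectives.
A = [phi[s] - gamma * phi[nxt[s]] for s in range(2)]

# TD(0) fixed point: solve sum_s phi[s] * (A[s] * w - r[s]) = 0.
w_td = sum(phi[s] * r[s] for s in range(2)) / sum(phi[s] * A[s] for s in range(2))

# Bellman Residual: minimise sum_s (A[s] * w - r[s])**2 (scalar least squares).
w_br = sum(A[s] * r[s] for s in range(2)) / sum(A[s] ** 2 for s in range(2))
```

Here the true values are V = [1, 0], yet w_td = -2.5 (approximate values [-2.5, -5]) while w_br is roughly -1.18; under this aggregation the TD(0) solution is much farther off, echoing the abstract's observation that TD(0) can be very bad in some cases.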

    Nonparametric Infinite Horizon Kullback-Leibler Stochastic Control

    Full text link
    We present two nonparametric approaches to Kullback-Leibler (KL) control, or the linearly-solvable Markov decision problem (LMDP), based on Gaussian processes (GP) and Nyström approximation. Compared to recently developed parametric methods, the proposed data-driven frameworks feature accurate function approximation and efficient on-line operations. Theoretically, we derive the mathematical connection between KL control based on dynamic programming and earlier work in control theory that relies on information-theoretic dualities, for the infinite-time-horizon case. Algorithmically, we give explicit optimal control policies in nonparametric form and propose on-line update schemes with budgeted computational costs. Numerical results demonstrate the effectiveness and usefulness of the proposed frameworks.
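    As background for the "linearly-solvable" structure, the sketch below iterates the desirability function z = exp(-V) of a tiny first-exit LMDP to its fixed point under the passive dynamics. The states, costs, and transition probabilities are illustrative; this is the discrete textbook construction, not the paper's GP/Nyström one.

```python
import math

# First-exit linearly-solvable MDP (LMDP) in desirability form:
#   z(x) = exp(-q(x)) * sum_x' p(x'|x) * z(x'),   z(goal) = 1,   V = -log z.
# Tiny illustrative instance: states 0 and 1 interior, state 2 absorbing goal.

q = [1.0, 0.5, 0.0]                       # state costs
P = [[0.0, 0.5, 0.5],                     # passive dynamics p(x'|x)
     [0.5, 0.0, 0.5],
     [0.0, 0.0, 1.0]]

z = [1.0, 1.0, 1.0]
for _ in range(200):                      # fixed-point iteration (a contraction here)
    z = [math.exp(-q[i]) * sum(P[i][j] * z[j] for j in range(3))
         for i in range(2)] + [1.0]      # goal desirability pinned at 1

V = [-math.log(zi) for zi in z]           # optimal cost-to-go

# The optimal controlled transition is u*(x'|x) proportional to p(x'|x) * z(x').
```

Because the equation for z is linear, function approximation (parametric in earlier work, GP/Nyström in this paper) can target z directly instead of the nonlinear Bellman equation for V.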

    Semidefinite Relaxations for Stochastic Optimal Control Policies

    Full text link
    Recent results in the study of the Hamilton-Jacobi-Bellman (HJB) equation have led to the discovery of a formulation of the value function as a linear Partial Differential Equation (PDE) for stochastic nonlinear systems with a mild constraint on their disturbances. This has yielded promising directions for research in the planning and control of nonlinear systems. This work proposes a new method for obtaining approximate solutions to these linear stochastic optimal control (SOC) problems. A candidate polynomial with variable coefficients is proposed as the solution to the SOC problem. A Sum of Squares (SOS) relaxation is then applied to the partial differential constraints, leading to a hierarchy of semidefinite relaxations with an improving sub-optimality gap. The resulting approximate solutions are shown to be guaranteed over- and under-approximations of the optimal value function. Comment: Preprint. Accepted to the American Control Conference (ACC) 2014 in Portland, Oregon. 7 pages
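    Solving the semidefinite hierarchy needs an SDP solver, which is beyond a snippet; the sketch below only illustrates the core SOS idea the relaxation rests on: a sum-of-squares decomposition is an algebraic certificate that a polynomial is globally nonnegative. The polynomial and its decomposition are illustrative.

```python
# A sum-of-squares (SOS) decomposition certifies global nonnegativity:
# if p(x) = sum_k s_k(x)**2, then p(x) >= 0 for every real x.
# Illustrative check: p(x) = x**4 - 2*x**2 + 1 equals (x**2 - 1)**2.

def poly_mul(a, b):
    """Multiply coefficient lists (index = power of x)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_add(a, b):
    """Add coefficient lists of possibly different lengths."""
    out = [0.0] * max(len(a), len(b))
    for i, c in enumerate(a):
        out[i] += c
    for i, c in enumerate(b):
        out[i] += c
    return out

p = [1.0, 0.0, -2.0, 0.0, 1.0]            # x**4 - 2*x**2 + 1, constant term first
squares = [[-1.0, 0.0, 1.0]]              # one square root term: x**2 - 1

sos = [0.0]
for s in squares:
    sos = poly_add(sos, poly_mul(s, s))

certified = sos == p                      # exact coefficient match => p is SOS
```

In the paper's setting the certified-nonnegative quantity is the PDE residual of the candidate polynomial value function; searching over the decomposition (a positive semidefinite Gram matrix) is what turns the constraint into a semidefinite program.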