38,155 research outputs found
A unified framework for linear function approximation of value functions in stochastic control
Proceedings of the 21st European Signal Processing Conference (EUSIPCO 2013), held September 9-13, 2013, in Marrakech, Morocco. This paper contributes a unified formulation that merges previous analyses of predicting the performance (value function) of a given sequence of actions (policy) when an agent operates a Markov decision process with a large state space. When the states are represented by features and the value function is linearly approximated, our analysis reveals a new relationship between two common cost functions used to obtain the optimal approximation. In addition, this analysis allows us to propose an efficient adaptive algorithm that provides an unbiased linear estimate. The performance of the proposed algorithm is illustrated by simulation, showing competitive results when compared with state-of-the-art solutions. This work has been partly funded by the Spanish Ministry of Science and Innovation through the project GRE3N (TEC 2011-29006-C03-01/02/03) and through the program CONSOLIDER-INGENIO 2010 under project COMONSENS (CSD 2008-00010); supported in part by the Spanish Ministry of Science and Innovation under grants TEC2009-14219-C03-01 and TEC2010-21217-C02-02-CR4HFDVL and by the program CONSOLIDER-INGENIO 2010 under grant CSD2008-00010 COMONSENS; and by the European Commission under grant FP7-ICT-2009-4-248894-WHERE-2.
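The kind of closed-form linear value-function estimate the abstract discusses can be illustrated with a minimal sketch (a toy Markov chain and the standard least-squares temporal-difference (LSTD) normal equations, chosen for illustration; this is not the paper's algorithm):

```python
import numpy as np

# Hypothetical toy example: linear approximation V(s) ~= phi(s)^T w of a
# fixed policy's value function, solved via the LSTD normal equations.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])   # policy-induced transition matrix
r = np.array([1.0, 1.0, 1.0])    # expected reward per state
gamma = 0.9                      # discount factor

# One-hot features make the approximation exact, so the LSTD weights
# must match the true value function V = (I - gamma P)^{-1} r.
Phi = np.eye(3)

# Normal equations: A w = b with A = Phi^T (I - gamma P) Phi, b = Phi^T r
# (uniform state weighting for simplicity).
A = Phi.T @ (np.eye(3) - gamma * P) @ Phi
b = Phi.T @ r
w = np.linalg.solve(A, b)

V_true = np.linalg.solve(np.eye(3) - gamma * P, r)
# With constant reward 1 and gamma = 0.9, every entry of V_true is 10.
```

With richer (non-one-hot) features the same equations give the projected fixed point rather than the exact values, which is where the cost-function relationship studied in the paper becomes relevant.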
A unified approach for the solution of the Fokker-Planck equation
This paper explores the use of a discrete singular convolution algorithm as a
unified approach for numerical integration of the Fokker-Planck equation. The
unified features of the discrete singular convolution algorithm are discussed.
It is demonstrated that different implementations of the present algorithm,
such as global, local, Galerkin, collocation, and finite difference, can be
deduced from a single starting point. Three benchmark stochastic systems, the
repulsive Wong process, the Black-Scholes equation and a genuine nonlinear
model, are employed to illustrate the robustness and to test the accuracy of the
present approach for the solution of the Fokker-Planck equation via a
time-dependent method. An additional example, the incompressible Euler
equation, is used to further validate the present approach for more difficult
problems. Numerical results indicate that the present unified approach is
robust and accurate for solving the Fokker-Planck equation. Comment: 19 pages
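A finite-difference scheme, one of the special cases the abstract says can be deduced from the discrete singular convolution framework, can be sketched for a one-dimensional Fokker-Planck equation. The setup below (an Ornstein-Uhlenbeck process with parameters of my own choosing) is purely illustrative and is not one of the paper's benchmarks:

```python
import numpy as np

# Toy example: explicit Euler integration of the Fokker-Planck equation
# for dX = -theta*X dt + sigma dW, i.e.
#   dp/dt = theta * d/dx (x p) + (sigma^2 / 2) * d^2 p / dx^2.
theta, sigma = 1.0, 1.0
x = np.arange(-5.0, 5.0, 0.1)
dx, dt = 0.1, 0.001

# Initial density: a narrow Gaussian (variance 0.25), normalized on the grid.
p = np.exp(-x**2 / (2 * 0.25))
p /= p.sum() * dx

for _ in range(1000):                                    # integrate to t = 1
    drift = np.gradient(x * p, dx)                       # d/dx (x p)
    diff = (np.roll(p, -1) - 2 * p + np.roll(p, 1)) / dx**2
    p = p + dt * (theta * drift + 0.5 * sigma**2 * diff)

mass = p.sum() * dx            # probability mass should stay near 1
var = (p * x**2).sum() * dx    # variance relaxes toward sigma^2/(2*theta)=0.5
```

The conservative form of the drift term keeps the total probability mass approximately constant, a basic sanity check for any Fokker-Planck solver.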
Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view
We investigate projection methods for evaluating a linear approximation of
the value function of a policy in a Markov Decision Process context. We
consider two popular approaches, the one-step Temporal Difference fix-point
computation (TD(0)) and Bellman Residual (BR) minimization. We describe
examples where each method outperforms the other. We highlight a simple
relation between the objective functions they minimize, and show that while BR
enjoys a performance guarantee, TD(0) does not in general. We then propose a
unified view in terms of oblique projections of the Bellman equation, which
substantially simplifies and extends the characterization of (Schoknecht, 2002)
and the recent analysis of (Yu & Bertsekas, 2008). Finally, we describe
simulations suggesting that while the TD(0) solution is usually slightly better
than the BR solution, its inherent numerical instability makes it very poor in
some cases, and thus worse on average.
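Both solutions being compared can be written in closed form for a small example. The sketch below uses an assumed toy chain with feature aliasing (not one of the paper's examples) to show that the TD(0) fixed point and the BR minimizer generally differ, and that the BR solution has the smaller Bellman-residual norm by construction:

```python
import numpy as np

# Hypothetical toy example: linear approximation V ~= Phi w of a fixed
# policy's value function, with fewer features than states (aliasing).
gamma = 0.9
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.9, 0.1],
              [0.1, 0.0, 0.9]])      # transition matrix of the policy
r = np.array([0.0, 0.0, 1.0])        # expected rewards
Phi = np.array([[1.0, 0.0],
                [1.0, 1.0],
                [0.0, 1.0]])         # 2 features for 3 states

M = (np.eye(3) - gamma * P) @ Phi

# TD(0) fixed point: solve Phi^T (Phi w - r - gamma P Phi w) = 0.
w_td = np.linalg.solve(Phi.T @ M, Phi.T @ r)

# BR minimizer: least squares on the Bellman residual || M w - r ||^2.
w_br, *_ = np.linalg.lstsq(M, r, rcond=None)

res_td = np.linalg.norm(M @ w_td - r)
res_br = np.linalg.norm(M @ w_br - r)   # <= res_td by construction
```

Which of the two yields the better *value estimate* is a separate question, and the abstract's point is that neither dominates across problems.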
Nonparametric Infinite Horizon Kullback-Leibler Stochastic Control
We present two nonparametric approaches to Kullback-Leibler (KL) control, or
the linearly-solvable Markov decision problem (LMDP), based on Gaussian
processes (GPs) and the Nyström approximation. Compared to recently developed
parametric methods, the proposed data-driven frameworks feature accurate
function approximation and efficient on-line operations. Theoretically, we
derive the mathematical connection of KL control based on dynamic programming
with earlier work in control theory that relies on information-theoretic
dualities, for the infinite-time-horizon case. Algorithmically, we give
explicit optimal control policies in nonparametric forms and propose on-line
update schemes with budgeted computational costs. Numerical results demonstrate
the effectiveness and usefulness of the proposed frameworks.
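For background, the discrete LMDP that KL control builds on reduces to a linear eigenproblem in the exponentiated value function ("desirability") z = exp(-v). The sketch below is a tabular power-iteration illustration with made-up costs and passive dynamics, unlike the paper's nonparametric GP treatment:

```python
import numpy as np

# Hypothetical tabular LMDP: the desirability z = exp(-v) solves
# G z = lam z with G = diag(exp(-q)) P, where q is the state cost
# and P the passive (uncontrolled) transition dynamics.
q = np.array([0.0, 1.0, 2.0])               # state costs
P = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])              # passive dynamics
G = np.diag(np.exp(-q)) @ P

# Power iteration: G is nonnegative, irreducible, and has positive
# diagonal, so it converges to a strictly positive principal eigenvector.
z = np.ones(3)
for _ in range(500):
    z = G @ z
    z /= np.linalg.norm(z)
lam = z @ G @ z                              # eigenvalue at convergence
v = -np.log(z / z.max())                     # value function up to a constant
```

The optimal controlled dynamics are then obtained by reweighting the passive dynamics by z, which is the linearity the "linearly-solvable" name refers to.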
Semidefinite Relaxations for Stochastic Optimal Control Policies
Recent results in the study of the Hamilton Jacobi Bellman (HJB) equation
have led to the discovery of a formulation of the value function as a linear
Partial Differential Equation (PDE) for stochastic nonlinear systems with a
mild constraint on their disturbances. This has yielded promising directions
for research in the planning and control of nonlinear systems. This work
proposes a new method for obtaining approximate solutions to these linear
stochastic optimal control (SOC) problems. A candidate polynomial with variable
coefficients is proposed as the solution to the SOC problem. A Sum of Squares
(SOS) relaxation is then applied to the partial differential constraints,
leading to a hierarchy of semidefinite relaxations with an improving
sub-optimality gap. The resulting approximate solutions are shown to be
guaranteed over- and under-approximations of the optimal value function.
Comment: Preprint. Accepted to the American Control Conference (ACC) 2014 in
Portland, Oregon. 7 pages, color
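The linear-PDE formulation this abstract builds on is a standard result from linearly-solvable stochastic control; the derivation below is background sketched under the usual noise assumption, not material from this paper. For dynamics $dx = f(x)\,dt + G(x)(u\,dt + d\omega)$ with cost rate $q(x) + \tfrac{1}{2}u^\top R u$, substituting the optimal control $u^* = -R^{-1}G^\top V_x$ into the stationary HJB equation gives

```latex
0 = q + f^\top V_x - \tfrac{1}{2} V_x^\top G R^{-1} G^\top V_x
      + \tfrac{1}{2}\operatorname{tr}\!\left(\Sigma_t V_{xx}\right),
\qquad \Sigma_t = G \Sigma_\varepsilon G^\top .
```

Under the assumption $\Sigma_\varepsilon = \lambda R^{-1}$ (noise and control cost inversely related), the log transform $V = -\lambda \log \Psi$ makes the quadratic term cancel against part of the trace term, leaving a PDE that is linear in the desirability $\Psi$:

```latex
0 = -\frac{q}{\lambda}\,\Psi + f^\top \Psi_x
      + \tfrac{1}{2}\operatorname{tr}\!\left(\Sigma_t \Psi_{xx}\right).
```

It is this linearity in $\Psi$ that makes the polynomial candidate and SOS relaxation of the abstract tractable as a semidefinite program.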