
    Inverse stochastic optimal controls

    We study an inverse problem of the stochastic optimal control of general diffusions with a performance index having a quadratic penalty term in the control process. Under mild conditions on the drift, the volatility, and the cost functions of the state, and under the assumption that the optimal control belongs to the interior of the control set, we show that our inverse problem is well-posed using a stochastic maximum principle. With this well-posedness, we reduce the inverse problem to a root-finding problem for the expectation of a random variable involving the value function, which has a unique solution. Based on this result, we propose a numerical method for our inverse problem that replaces the expectation above with the arithmetic mean of observed optimal control processes and the corresponding state processes. Recent progress in the numerical analysis of Hamilton-Jacobi-Bellman equations makes the proposed method implementable in multi-dimensional cases. In particular, with the help of the kernel-based collocation method for Hamilton-Jacobi-Bellman equations, our method for the inverse problem still works well even when an explicit form of the value function is unavailable. Several numerical experiments show that the numerical method recovers the unknown weight parameter with high accuracy.
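
    The reduction to root finding, with the expectation replaced by an arithmetic mean over observed trajectories, can be illustrated with a deliberately simple toy model; the scalar feedback form, the residual function, and all parameters below are hypothetical placeholders rather than the paper's actual construction.

        # Hypothetical toy illustration (not the paper's construction): recover an
        # unknown weight parameter theta from observed optimal state/control pairs
        # by root finding, with the expectation replaced by an arithmetic mean.
        import numpy as np
        from scipy.optimize import brentq

        rng = np.random.default_rng(0)

        # Synthetic "observed" data: in this toy LQ-style model the optimal feedback
        # is assumed to be u = -theta_true * x, observed with a little noise.
        theta_true = 1.7
        x_obs = rng.normal(size=500)
        u_obs = -theta_true * x_obs + 0.01 * rng.normal(size=500)

        def sample_residual(theta):
            # Arithmetic-mean surrogate for E[(u + theta * x) * x]; it vanishes
            # at the true weight, so the inverse problem becomes root finding.
            return np.mean((u_obs + theta * x_obs) * x_obs)

        theta_hat = brentq(sample_residual, 0.0, 10.0)
        print(f"recovered weight: {theta_hat:.3f}")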

    Optimal Reinforcement Learning for Gaussian Systems

    The exploration-exploitation trade-off is among the central challenges of reinforcement learning. The optimal Bayesian solution is intractable in general. This paper studies to what extent analytic statements about optimal learning are possible if all beliefs are Gaussian processes. A first-order approximation of learning of both loss and dynamics, for nonlinear, time-varying systems in continuous time and space, subject to a relatively weak restriction on the dynamics, is described by an infinite-dimensional partial differential equation. An approximate finite-dimensional projection gives an impression of how this result may be helpful. (Comment: final pre-conference version of this NIPS 2011 paper; note some nontrivial changes to the exposition and interpretation of the results, in particular in Equation (9) and Eqs. 11-14. The algorithm and results have remained the same, but their theoretical interpretation has changed.)
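
    As a concrete, if generic, illustration of the modelling assumption that "all beliefs are Gaussian processes", the sketch below conditions a GP belief over one unknown scalar function (say, one component of the dynamics or of the loss) on a handful of observations; the kernel, data, and hyperparameters are hypothetical, and this is plain GP regression rather than the paper's infinite-dimensional PDE analysis.

        # Generic Gaussian-process belief update (illustration only, not the paper's method).
        import numpy as np

        def rbf_kernel(a, b, lengthscale=0.5, variance=1.0):
            # Squared-exponential covariance between two sets of 1-D inputs.
            d = a[:, None] - b[None, :]
            return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

        # Hypothetical observations of an unknown scalar function.
        X = np.array([-1.0, 0.0, 0.7, 1.5])
        y = np.sin(2.0 * X) + 0.05 * np.random.default_rng(1).normal(size=X.size)

        noise = 0.05 ** 2
        K = rbf_kernel(X, X) + noise * np.eye(X.size)
        Xs = np.linspace(-2.0, 2.0, 5)                 # test inputs
        Ks = rbf_kernel(Xs, X)

        alpha = np.linalg.solve(K, y)
        post_mean = Ks @ alpha                         # GP posterior mean
        post_cov = rbf_kernel(Xs, Xs) - Ks @ np.linalg.solve(K, Ks.T)
        print(post_mean, np.diag(post_cov))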

    Nonparametric Infinite Horizon Kullback-Leibler Stochastic Control

    We present two nonparametric approaches to Kullback-Leibler (KL) control, or linearly-solvable Markov decision problems (LMDPs), based on Gaussian processes (GPs) and the Nyström approximation. Compared to recently developed parametric methods, the proposed data-driven frameworks feature accurate function approximation and efficient on-line operations. Theoretically, we derive the mathematical connection between KL control based on dynamic programming and earlier work in control theory that relies on information-theoretic dualities, for the infinite-time-horizon case. Algorithmically, we give explicit optimal control policies in nonparametric form and propose on-line update schemes with budgeted computational costs. Numerical results demonstrate the effectiveness and usefulness of the proposed frameworks.
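
    For context, the linearly-solvable structure that KL control exploits can be seen in a toy discrete-state setting: with state cost q and passive dynamics P, the exponentiated value ("desirability") function z solves a linear eigenvector equation, and the optimal policy simply reweights P by z. The sketch below is this standard finite-state construction, not the paper's nonparametric GP/Nyström algorithm; the sizes and costs are made up.

        # Toy discrete LMDP (illustration of the linearly-solvable structure only).
        import numpy as np

        rng = np.random.default_rng(0)
        n = 6
        q = rng.uniform(0.0, 1.0, size=n)          # state costs (hypothetical)
        P = rng.uniform(size=(n, n))
        P /= P.sum(axis=1, keepdims=True)          # passive dynamics (row-stochastic)

        # Desirability z is the principal eigenvector of diag(exp(-q)) @ P.
        G = np.diag(np.exp(-q)) @ P
        z = np.ones(n)
        for _ in range(500):                       # power iteration
            z = G @ z
            z /= np.linalg.norm(z)

        # Optimal controlled transitions reweight the passive dynamics by z.
        P_opt = P * z[None, :]
        P_opt /= P_opt.sum(axis=1, keepdims=True)
        print(np.round(P_opt, 3))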

    A Probabilistic Numerical Method for Fully Nonlinear Parabolic PDEs

    We consider the probabilistic numerical scheme for fully nonlinear PDEs suggested in \cite{cstv}, and show that it can be introduced naturally as a combination of Monte Carlo and finite difference schemes, without appealing to the theory of backward stochastic differential equations. Our first main result provides the convergence of the discrete-time approximation and derives a bound on the discretization error in terms of the time step. An explicitly implementable scheme requires approximating the conditional expectation operators involved in the discretization, which induces a further Monte Carlo error. Our second main result proves the convergence of the latter approximation scheme and derives an upper bound on the approximation error. Numerical experiments are performed for the approximation of the solution of the mean curvature flow equation in dimensions two and three, and for two- and five-dimensional (plus time) fully nonlinear Hamilton-Jacobi-Bellman equations arising in the theory of portfolio optimization in financial mathematics.
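
    Schematically, a backward scheme of the type discussed here, for a fully nonlinear parabolic equation $-\partial_t v - F(t, x, v, Dv, D^2 v) = 0$, takes the following form; this display is a generic paraphrase (with $F$ the nonlinearity and $h$ the time step), not the paper's exact operator:

        v^h(t_i, x) = \mathbb{E}\big[ v^h(t_{i+1}, \hat{X}^{t_i, x}_{t_{i+1}}) \big]
                      + h \, F\big(t_i, x, v^h(t_i, x), \widehat{Dv^h}(t_i, x), \widehat{D^2 v^h}(t_i, x)\big),

    where $\hat{X}$ is a one-step Euler diffusion and the derivative estimates $\widehat{Dv^h}$, $\widehat{D^2 v^h}$ are themselves conditional expectations of $v^h(t_{i+1}, \cdot)$ against polynomial weights of the Gaussian increment. Replacing each conditional expectation by a Monte Carlo estimate is what produces the second error term analyzed above.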

    Backward Ornstein-Uhlenbeck transition operators and mild solutions of non-autonomous Hamilton-Jacobi equations in Banach spaces

    In this paper we revisit the mild-solution approach to second-order semilinear PDEs of Hamilton-Jacobi type in infinite-dimensional spaces. We show that a well-known result on the existence of mild solutions in Hilbert spaces can easily be extended to non-autonomous Hamilton-Jacobi equations in Banach spaces. The main tool is the regularizing property of Ornstein-Uhlenbeck transition evolution operators for stochastic Cauchy problems in Banach spaces with time-dependent coefficients.
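
    For orientation, the mild-solution formulation invoked here can be written, schematically, as the fixed-point (variation-of-constants) identity below; the precise Banach spaces, nonlinearity $F$, and evolution family are those of the paper, so this display is only a generic template for a terminal-value semilinear Hamilton-Jacobi equation:

        u(t) = P_{t,T}\,\varphi + \int_t^T P_{t,s}\, F\big(s, u(s), D u(s)\big)\, ds, \qquad 0 \le t \le T,

    where $(P_{t,s})$ is the Ornstein-Uhlenbeck transition evolution family of the underlying linear stochastic Cauchy problem; its regularizing (smoothing) property is what gives meaning to the gradient term $D u(s)$ even for data $\varphi$ that are merely bounded and continuous.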

    Feynman-Kac representation of fully nonlinear PDEs and applications

    The classical Feynman-Kac formula states the connection between linear parabolic partial differential equations (PDEs), such as the heat equation, and expectations of stochastic processes driven by Brownian motion. It thus gives a method for solving linear PDEs by Monte Carlo simulation of random processes. The extension to (fully) nonlinear PDEs has led in recent years to important developments in stochastic analysis and to the emergence of the theory of backward stochastic differential equations (BSDEs), which can be viewed as nonlinear Feynman-Kac formulas. We review in this paper the main ideas and results in this area, and present implications of these probabilistic representations for the numerical resolution of nonlinear PDEs, together with some applications to stochastic control problems and to model uncertainty in finance.
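
    A minimal sketch of the linear case described above: the Feynman-Kac representation $u(t, x) = \mathbb{E}[g(X_T) \mid X_t = x]$ for the heat-type equation $\partial_t u + \tfrac{1}{2}\sigma^2 \partial_{xx} u = 0$ with terminal condition $u(T, \cdot) = g$, evaluated by Monte Carlo simulation; the terminal function $g$ and the parameters are hypothetical choices for illustration.

        # Monte Carlo evaluation of the linear Feynman-Kac representation
        # u(t, x) = E[g(X_T) | X_t = x] with dX = sigma dW (heat-type equation).
        import numpy as np

        def feynman_kac_mc(g, x, t, T, sigma=1.0, n_paths=200_000, seed=0):
            rng = np.random.default_rng(seed)
            # For driftless constant-volatility dynamics, X_T | X_t = x is Gaussian,
            # so one draw per path suffices (no time-stepping needed).
            X_T = x + sigma * np.sqrt(T - t) * rng.standard_normal(n_paths)
            return g(X_T).mean()

        # Hypothetical terminal condition g(x) = x^2, for which the exact solution
        # is u(t, x) = x^2 + sigma^2 * (T - t), allowing a quick sanity check.
        u_mc = feynman_kac_mc(lambda x: x ** 2, x=0.5, t=0.0, T=1.0, sigma=1.0)
        print(u_mc, "vs exact", 0.5 ** 2 + 1.0 * (1.0 - 0.0))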

    Mean Field Games and Applications.

    This text is inspired by a “Cours Bachelier” held in January 2009 and taught by Jean-Michel Lasry. The course was based upon articles by the three authors and upon unpublished material they had developed. Proofs were not presented during the lectures and are now available, as are some issues that were only briefly touched upon in class.