
    A neural network based policy iteration algorithm with global H^2-superlinear convergence for stochastic games on domains

    In this work, we propose a class of numerical schemes for solving semilinear Hamilton-Jacobi-Bellman-Isaacs (HJBI) boundary value problems which arise naturally from exit time problems of diffusion processes with controlled drift. We exploit policy iteration to reduce the semilinear problem to a sequence of linear Dirichlet problems, which are subsequently approximated by a multilayer feedforward neural network ansatz. We establish that the numerical solutions converge globally in the H^2-norm, and further demonstrate that this convergence is superlinear by interpreting the algorithm as an inexact Newton iteration for the HJBI equation. Moreover, we construct the optimal feedback controls from the numerical value functions and deduce their convergence. The numerical schemes and convergence results are then extended to HJBI boundary value problems corresponding to controlled diffusion processes with oblique boundary reflection. Numerical experiments on the stochastic Zermelo navigation problem are presented to illustrate the theoretical results and to demonstrate the effectiveness of the method. Comment: Additional numerical experiments have been included (on Pages 27-31) to show that the proposed algorithm achieves more stable and more rapid convergence than existing neural network based methods within a similar computational time.
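
A minimal sketch of the policy-iteration idea, under simplifying assumptions that are not from the paper: a one-player (HJB rather than HJBI) exit-time problem on (0, 1), the hypothetical equation ν u'' + min_a (a u' + 1 + a²/2) = 0 with u(0) = u(1) = 0, a finite action set, and an upwind finite-difference solve standing in for the neural network ansatz. Each iteration freezes the feedback, solves one linear Dirichlet problem, then updates the feedback pointwise; the function name and all parameters are illustrative.

```python
import numpy as np


def policy_iteration_hjb(n=101, nu=0.1, actions=None, tol=1e-12, max_iter=100):
    """Howard policy iteration (toy, hypothetical setup) for
        nu * u'' + min_a (a * u' + 1 + a**2 / 2) = 0 on (0, 1), u(0) = u(1) = 0,
    using central differences for u'' and upwinding for a * u'."""
    if actions is None:
        actions = np.linspace(-1.0, 1.0, 21)   # assumed finite control set
    x = np.linspace(0.0, 1.0, n)
    h = x[1] - x[0]
    cost = 1.0 + 0.5 * actions**2               # running cost of each action
    policy = np.zeros(n - 2)                    # initial feedback a_0(x) = 0
    u_old = None
    for it in range(1, max_iter + 1):
        # Policy evaluation: a *linear* Dirichlet problem for the frozen feedback.
        ap, am = np.maximum(policy, 0.0), np.maximum(-policy, 0.0)
        lower = nu / h**2 + am / h              # coefficient of u_{i-1}
        upper = nu / h**2 + ap / h              # coefficient of u_{i+1}
        A = np.diag(-(lower + upper)) + np.diag(lower[1:], -1) + np.diag(upper[:-1], 1)
        u_int = np.linalg.solve(A, -(1.0 + 0.5 * policy**2))
        u = np.concatenate(([0.0], u_int, [0.0]))   # append boundary values
        # Policy improvement: pointwise minimization of the discrete Hamiltonian.
        dp = (u[2:] - u[1:-1]) / h              # forward difference
        dm = (u[1:-1] - u[:-2]) / h             # backward difference
        ham = (np.maximum(actions, 0.0)[:, None] * dp
               - np.maximum(-actions, 0.0)[:, None] * dm + cost[:, None])
        policy = actions[np.argmin(ham, axis=0)]
        if u_old is not None and np.max(np.abs(u - u_old)) < tol:
            break                               # value function has stopped changing
        u_old = u
    return x, u, policy, it
```

In this discrete setting the frozen-policy matrices are M-matrices, so the value iterates decrease monotonically and the loop stops after finitely many policy changes; the toy does not reproduce the paper's H^2 analysis or its neural network solver.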

    Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies

    Gradient-based methods have been widely used for system design and optimization in diverse application domains. Recently, there has been renewed interest in studying the theoretical properties of these methods in the context of control and reinforcement learning. This article surveys some of the recent developments in policy optimization, a gradient-based iterative approach for feedback control synthesis, popularized by the successes of reinforcement learning. We take an interdisciplinary perspective in our exposition that connects control theory, reinforcement learning, and large-scale optimization. We review a number of recently developed theoretical results on the optimization landscape, global convergence, and sample complexity of gradient-based methods for various continuous control problems such as the linear quadratic regulator (LQR), H∞ control, risk-sensitive control, linear quadratic Gaussian (LQG) control, and output feedback synthesis. In conjunction with these optimization results, we also discuss how direct policy optimization handles stability and robustness concerns in learning-based control, two main desiderata in control engineering. We conclude the survey by pointing out several challenges and opportunities at the intersection of learning and control. Comment: To appear in Annual Review of Control, Robotics, and Autonomous Systems
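
As a sketch of policy optimization for the LQR case, the code below runs plain gradient descent directly on the feedback gain K of a discrete-time LQR problem with u = −Kx, using the exact model-based gradient ∇C(K) = 2((R + BᵀP_K B)K − BᵀP_K A)Σ_K known from the policy-gradient LQR literature, and compares the result with the Riccati gain. The example matrices, the Armijo backtracking rule, and all function names are assumptions for illustration; sample-based gradient estimation is not shown.

```python
import numpy as np


def dlyap(M, Y):
    """Solve X = M X M^T + Y by vectorization (adequate for small state dimensions)."""
    n = M.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(M, M), Y.ravel())
    return x.reshape(n, n)


def lqr_cost_and_grad(K, A, B, Q, R, Sigma0):
    """Cost C(K) = tr(P_K Sigma0) of the policy u = -K x and its exact gradient."""
    Acl = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf, None                      # K is not stabilizing
    P = dlyap(Acl.T, Q + K.T @ R @ K)            # P = Q + K'RK + Acl' P Acl
    Sigma = dlyap(Acl, Sigma0)                   # Sigma = Sigma0 + Acl Sigma Acl'
    grad = 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
    return np.trace(P @ Sigma0), grad


def policy_gradient_lqr(A, B, Q, R, K0, iters=5000, gtol=1e-6):
    """Gradient descent on K with Armijo backtracking, started from a stabilizing K0."""
    Sigma0 = np.eye(A.shape[0])
    K = K0.copy()
    cost, grad = lqr_cost_and_grad(K, A, B, Q, R, Sigma0)
    for _ in range(iters):
        if np.linalg.norm(grad) < gtol:
            break
        step = 1.0
        for _ in range(60):                      # backtracking line search
            K_new = K - step * grad
            new_cost, new_grad = lqr_cost_and_grad(K_new, A, B, Q, R, Sigma0)
            if new_cost <= cost - 1e-4 * step * np.sum(grad**2):
                break
            step *= 0.5
        else:
            break                                # no acceptable step found: stop
        K, cost, grad = K_new, new_cost, new_grad
    return K


def riccati_gain(A, B, Q, R, iters=2000):
    """Reference optimal gain from value iteration on the discrete Riccati equation."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
```

The backtracking rule rejects any step that leaves the set of stabilizing gains (where the cost is infinite), which is one simple way to respect the stability concern the survey highlights.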

    A fast iterative PDE-based algorithm for feedback controls of nonsmooth mean-field control problems

    A PDE-based accelerated gradient algorithm is proposed to seek optimal feedback controls of McKean-Vlasov dynamics subject to nonsmooth costs, whose coefficients involve mean-field interactions on both the state and the action. It exploits a forward-backward splitting approach and iteratively refines the approximate controls based on the gradients of smooth costs, the proximal maps of nonsmooth costs, and dynamically updated momentum parameters. At each step, the state dynamics is realized via a particle approximation, and the required gradient is evaluated through a coupled system of nonlocal linear PDEs. The latter is solved by finite difference approximation or neural network-based residual approximation, depending on the state dimension. Exhaustive numerical experiments for low- and high-dimensional mean-field control problems, including sparse stabilization of stochastic Cucker-Smale models, are presented, which reveal that our algorithm captures important structures of the optimal feedback control and achieves robust performance with respect to parameter perturbation. Comment: Added Sections 2.3 and 2.4 on theoretical convergence results
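
The forward-backward splitting with momentum can be sketched as a FISTA-type accelerated proximal gradient loop. In this hypothetical finite-dimensional example, an explicit least-squares gradient replaces the PDE-based gradient and an ℓ1 penalty plays the role of the nonsmooth cost, so its proximal map is soft thresholding; none of this is taken from the paper's implementation.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def fista(grad_f, lipschitz, lam, u0, iters=500):
    """Accelerated forward-backward splitting for min_u f(u) + lam * ||u||_1."""
    u = u0.copy()
    y = u0.copy()
    t = 1.0
    step = 1.0 / lipschitz
    for _ in range(iters):
        # Forward (gradient) step on the smooth part, backward (proximal) step on the l1 part.
        u_next = soft_threshold(y - step * grad_f(y), step * lam)
        # Dynamically updated momentum parameter and extrapolation.
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = u_next + ((t - 1.0) / t_next) * (u_next - u)
        u, t = u_next, t_next
    return u
```

For a separable objective such as f(u) = ½ Σ_i (d_i u_i − b_i)², the exact minimizer is u_i = soft_threshold(d_i b_i, λ)/d_i², which gives a quick correctness check of the loop.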

    A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee

    We consider policy gradient methods for stochastic optimal control problems in continuous time. In particular, we analyze the gradient flow for the control, viewed as a continuous-time limit of the policy gradient method. We prove the global convergence of the gradient flow and establish a convergence rate under some regularity assumptions. The main novelty in the analysis is the notion of a local optimal control function, which is introduced to characterize the local optimality of the iterate. Comment: 53 pages
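
To make the gradient-flow viewpoint concrete, here is a deliberately simplified, deterministic scalar example (not the stochastic setting of the paper): for dx/dt = a x + b u with linear feedback u = −k x and cost ∫ (q x² + r u²) dt, the cost has the closed form J(k) = (q + r k²) x0² / (2(bk − a)) whenever bk > a, and an explicit Euler discretization of dk/dt = −J'(k) recovers the Riccati gain k* = (a + √(a² + b²q/r))/b for b > 0. The step size, horizon, and function names are illustrative assumptions.

```python
import math


def lq_cost(k, a, b, q, r, x0=1.0):
    """Closed-form cost of u = -k x for dx/dt = a x + b u (requires b * k > a)."""
    assert b * k > a, "feedback must be stabilizing"
    return (q + r * k * k) * x0 * x0 / (2.0 * (b * k - a))


def lq_cost_grad(k, a, b, q, r, x0=1.0):
    """dJ/dk, obtained by differentiating the closed-form cost."""
    return 2.0 * (r * b * k * k - 2.0 * r * a * k - b * q) * x0 * x0 / (2.0 * (b * k - a)) ** 2


def gradient_flow(k0, a, b, q, r, dt=0.1, steps=2000):
    """Explicit Euler discretization of the gradient flow dk/dt = -dJ/dk."""
    k = k0
    for _ in range(steps):
        k -= dt * lq_cost_grad(k, a, b, q, r)
    return k
```

With a = b = q = r = 1 and the stabilizing start k0 = 2, the discretized flow increases k monotonically toward 1 + √2.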

    Neural Q-learning for solving PDEs

    Solving high-dimensional partial differential equations (PDEs) is a major challenge in scientific computing. We develop a new numerical method for solving elliptic-type PDEs by adapting the Q-learning algorithm in reinforcement learning. For PDEs with Dirichlet boundary conditions, our “Q-PDE” algorithm is mesh-free and therefore has the potential to overcome the curse of dimensionality. Using a neural tangent kernel (NTK) approach, we prove that the neural network approximator for the PDE solution, trained with the Q-PDE algorithm, converges to the trajectory of an infinite-dimensional ordinary differential equation (ODE) as the number of hidden units → ∞. For monotone PDEs (i.e., those given by monotone operators, which may be nonlinear), despite the lack of a spectral gap in the NTK, we then prove that the limit neural network, which satisfies the infinite-dimensional ODE, strongly converges in L^2 to the PDE solution as the training time → ∞. More generally, we can prove that any fixed point of the wide-network limit for the Q-PDE algorithm is a solution of the PDE (not necessarily under the monotone condition). The numerical performance of the Q-PDE algorithm is studied for several elliptic PDEs.

    State-dependent Riccati equation feedback stabilization for nonlinear PDEs

    The synthesis of suboptimal feedback laws for controlling nonlinear dynamics arising from semi-discretized PDEs is studied. An approach based on the State-dependent Riccati Equation (SDRE) is presented for H2 and H∞ control problems. Depending on the nonlinearity and the dimension of the resulting problem, offline, online, and hybrid offline-online alternatives to the SDRE synthesis are proposed. The hybrid offline-online SDRE method reduces to the sequential solution of Lyapunov equations, effectively enabling the computation of suboptimal feedback controls for two-dimensional PDEs. Numerical tests for the Sine-Gordon, degenerate Zeldovich, and viscous Burgers’ PDEs are presented, providing a thorough experimental assessment of the proposed methodology.
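
A minimal sketch of the basic online SDRE loop (not the paper's offline or hybrid variants): write the dynamics in state-dependent coefficient form x' = A(x)x + Bu, solve the algebraic Riccati equation A(x)ᵀP + PA(x) − PBR⁻¹BᵀP + Q = 0 at the current state, and apply u = −R⁻¹BᵀP(x)x. The example system (an inverted-pendulum-like x1' = x2, x2' = sin x1 + u), the weights, and the Hamiltonian-eigenvector Riccati solver are assumptions chosen so the block needs only NumPy; for a semi-discretized PDE, the 2×2 matrices would be replaced by the discretized operators.

```python
import numpy as np

B = np.array([[0.0], [1.0]])   # assumed input matrix
Q = np.eye(2)                  # assumed state weight
R = np.array([[1.0]])          # assumed control weight


def care(A, B, Q, R):
    """Stabilizing solution of A'P + PA - P B R^{-1} B' P + Q = 0 from the stable
    invariant subspace of the Hamiltonian matrix (suitable for small systems only)."""
    n = A.shape[0]
    S = B @ np.linalg.solve(R, B.T)
    H = np.block([[A, -S], [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    stable = eigvecs[:, eigvals.real < 0]      # n eigenvectors with Re(lambda) < 0
    U1, U2 = stable[:n, :], stable[n:, :]
    return np.real(U2 @ np.linalg.inv(U1))


def sdc_matrix(x):
    """State-dependent coefficient form: sin(x1) = (sin(x1) / x1) * x1."""
    return np.array([[0.0, 1.0], [np.sinc(x[0] / np.pi), 0.0]])


def sdre_control(x):
    """Solve the Riccati equation frozen at the current state; u = -R^{-1} B' P(x) x."""
    P = care(sdc_matrix(x), B, Q, R)
    return -np.linalg.solve(R, B.T @ P @ x)


def closed_loop(x):
    return sdc_matrix(x) @ x + B @ sdre_control(x)


def simulate(x0, dt=0.01, steps=1000):
    """Classical RK4 integration of the SDRE closed loop."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        k1 = closed_loop(x)
        k2 = closed_loop(x + 0.5 * dt * k1)
        k3 = closed_loop(x + 0.5 * dt * k2)
        k4 = closed_loop(x + dt * k3)
        x = x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x
```

Solving a full Riccati equation at every state evaluation is what makes the online variant expensive for large semi-discretizations, which is the cost the abstract's hybrid Lyapunov-based method is designed to reduce.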

    Fourier Neural Network Approximation of Transition Densities in Finance

    This paper introduces FourNet, a novel single-layer feed-forward neural network (FFNN) method designed to approximate transition densities for which closed-form expressions of their Fourier transforms, i.e., characteristic functions, are available. A unique feature of FourNet lies in its use of a Gaussian activation function, enabling exact Fourier and inverse Fourier transformations and drawing analogies with the Gaussian mixture model. We mathematically establish FourNet's capacity to approximate transition densities in the L_2-sense arbitrarily well with a finite number of neurons. The parameters of FourNet are learned by minimizing a loss function derived from the known characteristic function and the Fourier transform of the FFNN, complemented by a strategic sampling approach to enhance training. Through a rigorous and comprehensive error analysis, we derive informative bounds for the L_2 estimation error and the potential (pointwise) loss of nonnegativity in the estimated densities. FourNet's accuracy and versatility are demonstrated through a wide range of dynamics common in quantitative finance, including Lévy processes and the Heston stochastic volatility models, among them variants augmented with the self-exciting Queue-Hawkes jump process. Comment: 27 pages, 5 figures
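
The property that a Gaussian activation has an exact Fourier transform can be checked directly. The sketch below assumes the parametrization f(x) = Σ_k β_k exp(−(w_k x + b_k)²) and the convention φ(ω) = ∫ e^{iωx} p(x) dx, which may differ from the paper's; for each neuron, ∫ e^{iωx} exp(−(wx + b)²) dx = (√π/|w|) exp(−iωb/w − ω²/(4w²)). The loss compares this transform with a known characteristic function at sample frequencies; training and the paper's sampling strategy are omitted, and all names are hypothetical.

```python
import numpy as np


def gaussian_net(x, beta, w, b):
    """Single-hidden-layer network sum_k beta_k * exp(-(w_k x + b_k)^2)."""
    return np.exp(-(np.outer(x, w) + b) ** 2) @ beta


def gaussian_net_cf(omega, beta, w, b):
    """Closed-form transform int exp(i omega x) f(x) dx of gaussian_net."""
    z = omega[:, None] / w[None, :]
    terms = np.sqrt(np.pi) / np.abs(w) * np.exp(-1j * z * b - 0.25 * z**2)
    return terms @ beta


def cf_loss(omega, target_cf, beta, w, b):
    """Mean squared mismatch between a known characteristic function and the net's transform."""
    return np.mean(np.abs(target_cf(omega) - gaussian_net_cf(omega, beta, w, b)) ** 2)
```

For instance, a single neuron with β = 1/√(2π), w = 1/√2, b = 0 is exactly the standard normal density, so its loss against exp(−ω²/2) vanishes up to rounding; in practice the target would be a Lévy or Heston characteristic function.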

    Neural Q-learning for solving elliptic PDEs

    Solving high-dimensional partial differential equations (PDEs) is a major challenge in scientific computing. We develop a new numerical method for solving elliptic-type PDEs by adapting the Q-learning algorithm in reinforcement learning. Our "Q-PDE" algorithm is mesh-free and therefore has the potential to overcome the curse of dimensionality. Using a neural tangent kernel (NTK) approach, we prove that the neural network approximator for the PDE solution, trained with the Q-PDE algorithm, converges to the trajectory of an infinite-dimensional ordinary differential equation (ODE) as the number of hidden units → ∞. For monotone PDEs (i.e., those given by monotone operators, which may be nonlinear), despite the lack of a spectral gap in the NTK, we then prove that the limit neural network, which satisfies the infinite-dimensional ODE, converges in L^2 to the PDE solution as the training time → ∞. More generally, we can prove that any fixed point of the wide-network limit for the Q-PDE algorithm is a solution of the PDE (not necessarily under the monotone condition). The numerical performance of the Q-PDE algorithm is studied for several elliptic PDEs.