A neural network-based policy iteration algorithm with global H²-superlinear convergence for stochastic games on domains
In this work, we propose a class of numerical schemes for solving semilinear
Hamilton-Jacobi-Bellman-Isaacs (HJBI) boundary value problems which arise
naturally from exit time problems of diffusion processes with controlled drift.
We exploit policy iteration to reduce the semilinear problem to a sequence of
linear Dirichlet problems, which are subsequently approximated by a multilayer
feedforward neural network ansatz. We establish that the numerical solutions
converge globally in the H²-norm, and further demonstrate that this
convergence is superlinear, by interpreting the algorithm as an inexact Newton
iteration for the HJBI equation. Moreover, we construct the optimal feedback
controls from the numerical value functions and deduce their convergence. The
numerical schemes and convergence results are then extended to HJBI boundary
value problems corresponding to controlled diffusion processes with oblique
boundary reflection. Numerical experiments on the stochastic Zermelo navigation
problem are presented to illustrate the theoretical results and to demonstrate
the effectiveness of the method.
Comment: Additional numerical experiments have been included (on Pages 27-31) to show that the proposed algorithm achieves more stable and more rapid convergence than existing neural network-based methods within similar computational time.
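To fix ideas, here is a minimal sketch of one policy-evaluation step in this spirit (our illustrative PyTorch rendering, not the authors' scheme; the penalty treatment of the Dirichlet condition, the domain, and the architecture are all assumptions):

```python
# One policy-evaluation step: fit a feedforward network to the linear
# Dirichlet problem  -Δu - drift·∇u = f  by least-squares residual
# minimisation on sampled collocation points (illustrative sketch only).
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(                 # value-function ansatz u_theta
    torch.nn.Linear(2, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

def laplacian(u, x):
    """Trace of the Hessian of u at x, via repeated autograd."""
    grad = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    return sum(torch.autograd.grad(grad[:, i].sum(), x, create_graph=True)[0][:, i]
               for i in range(x.shape[1]))

def policy_evaluation(drift, f, steps=500):
    """Fit u_theta on the unit disc with zero boundary data enforced by a
    penalty term (an illustrative treatment, not the scheme analysed here)."""
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        x = torch.rand(512, 2) * 2 - 1
        x = x[x.norm(dim=1) < 1].requires_grad_(True)     # interior samples
        xb = torch.randn(64, 2)
        xb = xb / xb.norm(dim=1, keepdim=True)            # boundary samples
        u = net(x)
        grad = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
        residual = -laplacian(u, x) - (drift(x) * grad).sum(1) - f(x)
        loss = residual.pow(2).mean() + 10.0 * net(xb).pow(2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    return net

# e.g. a zero-drift problem with constant source term
u_net = policy_evaluation(lambda x: torch.zeros_like(x),
                          lambda x: torch.ones(x.shape[0]))
```

A full policy-iteration sweep would alternate this evaluation step with a pointwise optimisation of the Hamiltonian to update the drift, which is where the inexact-Newton interpretation enters.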
Towards a Theoretical Foundation of Policy Optimization for Learning Control Policies
Gradient-based methods have been widely used for system design and
optimization in diverse application domains. Recently, there has been a renewed
interest in studying theoretical properties of these methods in the context of
control and reinforcement learning. This article surveys some of the recent
developments on policy optimization, a gradient-based iterative approach for
feedback control synthesis, popularized by successes of reinforcement learning.
We take an interdisciplinary perspective in our exposition that connects
control theory, reinforcement learning, and large-scale optimization. We review
a number of recently-developed theoretical results on the optimization
landscape, global convergence, and sample complexity of gradient-based methods
for various continuous control problems such as the linear quadratic regulator
(LQR), H∞ control, risk-sensitive control, linear quadratic
Gaussian (LQG) control, and output feedback synthesis. In conjunction with
these optimization results, we also discuss how direct policy optimization
handles stability and robustness concerns in learning-based control, two main
desiderata in control engineering. We conclude the survey by pointing out
several challenges and opportunities at the intersection of learning and
control.
Comment: To appear in Annual Review of Control, Robotics, and Autonomous Systems.
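As a concrete instance of the approach surveyed, here is a small, self-contained sketch of direct policy optimization on a toy discrete-time LQR problem (our example; the dynamics, horizon, and step size are arbitrary choices):

```python
# Direct policy optimisation for LQR: gradient descent on a static feedback
# gain K, with the finite-horizon quadratic cost differentiated by autodiff.
import torch

torch.manual_seed(0)
A = torch.tensor([[1.0, 0.1], [0.0, 1.0]])   # marginally stable double integrator
B = torch.tensor([[0.0], [0.1]])
Q, R = torch.eye(2), 0.1 * torch.eye(1)

K = torch.zeros(1, 2, requires_grad=True)    # linear policy u = -K x
opt = torch.optim.SGD([K], lr=0.05)

def cost(K, horizon=100):
    x = torch.tensor([[1.0], [1.0]])
    c = torch.tensor(0.0)
    for _ in range(horizon):
        u = -K @ x
        c = c + (x.T @ Q @ x + u.T @ R @ u).squeeze()
        x = A @ x + B @ u
    return c

for step in range(500):
    opt.zero_grad()
    cost(K).backward()
    opt.step()
print(K.detach())   # approaches the optimal LQR gain as iterations grow
```

The results surveyed in the article explain why such iterations succeed: despite the nonconvexity of the LQR cost in K, the optimization landscape admits global convergence guarantees for gradient-based methods under suitable conditions.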
A fast iterative PDE-based algorithm for feedback controls of nonsmooth mean-field control problems
A PDE-based accelerated gradient algorithm is proposed to seek optimal
feedback controls of McKean-Vlasov dynamics subject to nonsmooth costs, whose
coefficients involve mean-field interactions in both the state and the action. It
exploits a forward-backward splitting approach and iteratively refines the
approximate controls based on the gradients of smooth costs, the proximal maps
of nonsmooth costs, and dynamically updated momentum parameters. At each step,
the state dynamics is realized via a particle approximation, and the required
gradient is evaluated through a coupled system of nonlocal linear PDEs. The
latter is solved by finite difference approximation or neural network-based
residual approximation, depending on the state dimension. Exhaustive numerical experiments for low- and high-dimensional mean-field control problems, including the sparse stabilization of stochastic Cucker-Smale models, are presented, which reveal that our algorithm captures important structures of the optimal feedback control and achieves robust performance with respect to parameter perturbation.
Comment: Added Sections 2.3 and 2.4 for theoretical convergence results.
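The backbone of the iteration is a standard forward-backward (proximal-gradient) scheme with momentum. Below is a generic FISTA-style sketch in which grad_smooth stands in for the PDE/particle-based gradient of the smooth cost, applied to a toy L1-regularized least-squares problem (our illustration, not the paper's solver):

```python
# Forward-backward splitting with Nesterov-style momentum: gradient step on
# the smooth cost, proximal map (soft-thresholding) for the nonsmooth L1 cost.
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista(grad_smooth, u0, step, lam, iters=200):
    u, z, t = u0.copy(), u0.copy(), 1.0
    for _ in range(iters):
        u_next = soft_threshold(z - step * grad_smooth(z), step * lam)
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = u_next + ((t - 1.0) / t_next) * (u_next - u)  # momentum extrapolation
        u, t = u_next, t_next
    return u

# toy usage: minimise 0.5 * ||A u - b||^2 + lam * ||u||_1
rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 50)), rng.standard_normal(20)
u = fista(lambda v: A.T @ (A @ v - b), np.zeros(50), step=5e-3, lam=0.1)
```

In the algorithm described above, grad_smooth would be evaluated by propagating the particle approximation of the McKean-Vlasov dynamics forward and solving the coupled nonlocal linear PDEs, rather than by the matrix product used in this toy.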
A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee
We consider policy gradient methods for stochastic optimal control problems in
continuous time. In particular, we analyze the gradient flow for the control,
viewed as a continuous-time limit of the policy gradient method. We prove the
global convergence of the gradient flow and establish a convergence rate under
some regularity assumptions. The main novelty in the analysis is the notion of
local optimal control function, which is introduced to characterize the local
optimality of the iterate.
Comment: 53 pages.
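In symbols, with θ a generic control parameterisation and J the control objective (our notation, not the paper's), the gradient flow and its forward-Euler discretisation, the familiar policy gradient update, read

\[
\frac{\mathrm{d}\theta_s}{\mathrm{d}s} = -\nabla J(\theta_s),
\qquad
\theta_{k+1} = \theta_k - \eta\,\nabla J(\theta_k),
\]

with the flow on the left recovered from the update on the right in the small-step limit η → 0.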
Neural Q-learning for solving PDEs
Solving high-dimensional partial differential equations (PDEs) is a major challenge in scientific computing. We develop a new numerical method for solving elliptic-type PDEs by
adapting the Q-learning algorithm in reinforcement learning. To solve PDEs with Dirichlet
boundary condition, our “Q-PDE” algorithm is mesh-free and therefore has the potential
to overcome the curse of dimensionality. Using a neural tangent kernel (NTK) approach,
we prove that the neural network approximator for the PDE solution, trained with the Q-PDE algorithm, converges to the trajectory of an infinite-dimensional ordinary differential equation (ODE) as the number of hidden units → ∞. For monotone PDEs (i.e. those given by monotone operators, which may be nonlinear), despite the lack of a spectral gap in the NTK, we then prove that the limit neural network, which satisfies the infinite-dimensional ODE, converges strongly in L² to the PDE solution as the training time → ∞. More generally, we can prove that any fixed point of the wide-network limit for the Q-PDE algorithm is a solution of the PDE (not necessarily under the monotone condition). The numerical performance of the Q-PDE algorithm is studied for several elliptic PDEs.
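A schematic rendering of the mesh-free idea (our paraphrase, not the paper's exact scheme): sample interior points, evaluate the PDE residual of the current network, and take a semi-gradient step that moves u_theta in the residual direction, here for the model problem Δu + 1 = 0 on the unit cube with a penalised zero Dirichlet condition:

```python
# Semi-gradient, Q-learning-style update for an elliptic model problem:
# the residual is frozen (detached) and used to weight the parameter
# gradient of the network, in the spirit of the Q-PDE algorithm.
import torch

torch.manual_seed(0)
d = 5
net = torch.nn.Sequential(torch.nn.Linear(d, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 1))
opt = torch.optim.SGD(net.parameters(), lr=1e-3)

def residual(x):
    """PDE residual Δu_theta(x) + f(x) for f ≡ 1."""
    x = x.requires_grad_(True)
    u = net(x)
    g = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    lap = sum(torch.autograd.grad(g[:, i].sum(), x, create_graph=True)[0][:, i]
              for i in range(d))
    return lap + 1.0

for _ in range(1000):
    x = torch.rand(128, d)                      # mesh-free interior samples
    xb = torch.rand(128, d)                     # samples pushed to the boundary
    idx = torch.randint(d, (128,))
    xb[torch.arange(128), idx] = torch.randint(2, (128,)).float()
    r = residual(x).detach()                    # frozen residual, semi-gradient step
    loss = -(r * net(x).squeeze(1)).mean() + net(xb).pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```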
State-dependent Riccati equation feedback stabilization for nonlinear PDEs
The synthesis of suboptimal feedback laws for controlling nonlinear dynamics arising from semi-discretized PDEs is studied. An approach based on the State-dependent Riccati Equation (SDRE) is presented for H₂ and H∞ control problems. Depending on the nonlinearity and the dimension of the resulting problem, offline, online, and hybrid offline-online alternatives to the SDRE synthesis are proposed. The hybrid offline-online SDRE method reduces to the sequential solution of Lyapunov equations, effectively enabling the computation of suboptimal feedback controls for two-dimensional PDEs. Numerical tests for the Sine-Gordon, degenerate Zeldovich, and viscous Burgers’ PDEs are presented, providing a thorough experimental assessment of the proposed methodology.
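In pseudocode terms, the online SDRE synthesis amounts to refreezing the state-dependent coefficients and re-solving a Riccati equation along the closed-loop trajectory. A compact sketch, with an illustrative pendulum-like dynamics and state-dependent coefficient (SDC) factorisation of our own choosing:

```python
# Online SDRE loop: freeze (A(x), B) at the current state, solve the
# algebraic Riccati equation, apply u = -R^{-1} B^T P(x) x, and step forward.
import numpy as np
from scipy.linalg import solve_continuous_are

Q, R = np.eye(2), np.array([[1.0]])
B = np.array([[0.0], [1.0]])

def A_of_x(x):
    """SDC factorisation  ẋ = A(x) x + B u  using sin(x1) = (sin(x1)/x1) x1."""
    s = np.sinc(x[0] / np.pi)          # sin(x1)/x1, well defined at x1 = 0
    return np.array([[0.0, 1.0], [s, -0.1]])

x, dt = np.array([2.0, 0.0]), 1e-2
for _ in range(1000):                  # closed-loop forward-Euler simulation
    A = A_of_x(x)
    P = solve_continuous_are(A, B, Q, R)        # frozen-coefficient Riccati solve
    u = -np.linalg.solve(R, B.T @ P @ x)        # state-dependent feedback
    x = x + dt * (A @ x + B @ u)
```

The offline and hybrid variants described above avoid the per-step Riccati solve, the hybrid one by reducing it to a sequence of cheaper Lyapunov equations.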
Fourier Neural Network Approximation of Transition Densities in Finance
This paper introduces FourNet, a novel single-layer feed-forward neural
network (FFNN) method designed to approximate transition densities for which
closed-form expressions of their Fourier transforms, i.e. characteristic
functions, are available. A unique feature of FourNet lies in its use of a
Gaussian activation function, enabling exact Fourier and inverse Fourier
transformations and drawing analogies with the Gaussian mixture model. We
mathematically establish FourNet's capacity to approximate transition densities
in the L²-sense arbitrarily well with a finite number of neurons. The
parameters of FourNet are learned by minimizing a loss function derived from
the known characteristic function and the Fourier transform of the FFNN,
complemented by a strategic sampling approach to enhance training. Through a
rigorous and comprehensive error analysis, we derive informative bounds for the
L² estimation error and the potential (pointwise) loss of nonnegativity in
the estimated densities. FourNet's accuracy and versatility are demonstrated
through a wide range of dynamics common in quantitative finance, including
Lévy processes and the Heston stochastic volatility models, the latter augmented with the self-exciting Queue-Hawkes jump process.
Comment: 27 pages, 5 figures.
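The mechanism can be seen in a toy one-dimensional rendering (our parameterisation and target, not the paper's): Gaussian bumps have closed-form Fourier transforms, so the network's transform can be matched directly to a known characteristic function, here that of a standard normal:

```python
# Fit a single layer of Gaussian bumps so that its closed-form Fourier
# transform matches a known characteristic function (FourNet-style idea).
import torch

torch.manual_seed(0)
K = 20
mu = torch.linspace(-3, 3, K).requires_grad_(True)   # bump centres
logs = torch.zeros(K, requires_grad=True)            # log bandwidths
w = (torch.ones(K) / K).requires_grad_(True)         # bump weights

def fhat(omega):
    """Fourier transform of sum_k w_k exp(-(x - mu_k)^2 / (2 s_k^2)):
    sum_k w_k s_k sqrt(2 pi) exp(-s_k^2 w^2 / 2) exp(i w mu_k)."""
    s = logs.exp()
    amp = w * s * (2 * torch.pi) ** 0.5 * torch.exp(-0.5 * (s * omega[:, None]) ** 2)
    re = (amp * torch.cos(omega[:, None] * mu)).sum(1)
    im = (amp * torch.sin(omega[:, None] * mu)).sum(1)
    return re, im

opt = torch.optim.Adam([mu, logs, w], lr=5e-2)
omega = torch.linspace(-10, 10, 201)
target_re = torch.exp(-0.5 * omega**2)               # char. function of N(0, 1)
target_im = torch.zeros_like(omega)
for _ in range(2000):
    re, im = fhat(omega)
    loss = ((re - target_re) ** 2 + (im - target_im) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
# the fitted density is then x -> sum_k w_k exp(-(x - mu_k)^2 / (2 s_k^2))
```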