This paper studies learning the Kalman gain via policy optimization.
First, we reformulate the finite-horizon Kalman filter as a policy
optimization problem for the dual system. Second, we establish global linear
convergence of the exact gradient descent method when the parameters are
known. Third, we propose a gradient estimator and a stochastic gradient
descent method for solving the policy optimization problem, and we establish
global linear convergence and sample complexity guarantees for stochastic
gradient descent when the noise covariance matrices are unknown and the model
parameters are known.
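
To make the known-parameter setting concrete, the following is a minimal sketch of exact gradient descent over a sequence of filter gains. It is not the paper's algorithm or notation. It assumes a predictor-form estimator with error covariance recursion P_{t+1} = (A - L_t C) P_t (A - L_t C)^T + Q + L_t R L_t^T, takes the cost to be the sum of tr(P_t) over the horizon, and uses a hypothetical two-state system with a hand-picked step size. The gradient is computed by a backward costate recursion, which is where the dual (control-like) structure appears.

```python
import numpy as np

# Hypothetical system and noise parameters (illustrative, not from the paper).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
C = np.array([[1.0, 0.0]])
Q = 0.1 * np.eye(2)      # process noise covariance
R = np.array([[0.5]])    # measurement noise covariance
P0 = np.eye(2)           # initial error covariance
T = 5                    # horizon length


def covariances(gains):
    """Forward pass: error covariances P_0, ..., P_T under the given gains."""
    P = [P0]
    for L in gains:
        F = A - L @ C
        P.append(F @ P[-1] @ F.T + Q + L @ R @ L.T)
    return P


def cost(gains):
    """Assumed cost: sum of tr(P_t) for t = 1, ..., T."""
    return sum(np.trace(Pt) for Pt in covariances(gains)[1:])


def gradient(gains):
    """Exact gradient of the cost with respect to each gain L_t."""
    P = covariances(gains)
    lam = np.eye(A.shape[0])          # costate attached to P_T
    grads = [None] * len(gains)
    for t in reversed(range(len(gains))):
        L = gains[t]
        S = C @ P[t] @ C.T + R
        grads[t] = 2.0 * lam @ (L @ S - A @ P[t] @ C.T)
        F = A - L @ C
        lam = np.eye(A.shape[0]) + F.T @ lam @ F   # backward costate recursion
    return grads


def kalman_gains():
    """Reference solution: Riccati recursion for the predictor-form gain."""
    P, out = P0, []
    for _ in range(T):
        S = C @ P @ C.T + R
        L = A @ P @ C.T @ np.linalg.inv(S)
        out.append(L)
        F = A - L @ C
        P = F @ P @ F.T + Q + L @ R @ L.T
    return out


# Gradient descent from zero gains with a fixed, hand-picked step size.
gains = [np.zeros((2, 1)) for _ in range(T)]
step = 0.01
for _ in range(5000):
    grads = gradient(gains)
    gains = [L - step * g for L, g in zip(gains, grads)]

gd_gains = np.stack(gains)
kf_gains = np.stack(kalman_gains())
print("max gain error:", np.max(np.abs(gd_gains - kf_gains)))
print("cost at learned gains:", cost(gains))
```

Setting each gradient block to zero gives L_t = A P_t C^T (C P_t C^T + R)^{-1}, so the only stationary point under this cost is the Riccati solution, which is what the comparison at the end checks. The unknown-covariance setting would replace `gradient` with an estimate built from sampled data, which this sketch does not attempt.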