We present a mathematical framework and computational methods to optimally
design a finite number of sequential experiments. We formulate this sequential
optimal experimental design (sOED) problem as a finite-horizon partially
observable Markov decision process (POMDP) in a Bayesian setting and with
information-theoretic utilities. It is built to accommodate continuous random
variables, general non-Gaussian posteriors, and expensive nonlinear forward
models. sOED then seeks an optimal design policy that incorporates elements of
both feedback and lookahead, generalizing the suboptimal batch and greedy
designs. We solve for the sOED policy numerically via policy gradient (PG)
methods from reinforcement learning, and derive and prove the PG expression for
sOED. Adopting an actor-critic approach, we parameterize the policy and value
functions using deep neural networks and improve them using gradient estimates
produced from simulated episodes of designs and observations. The overall
PG-sOED method is validated on a linear-Gaussian benchmark, and its advantages
over batch and greedy designs are demonstrated through a contaminant source
inversion problem in a convection-diffusion field.
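For concreteness, the expected information-gain objective underlying such information-theoretic utilities, and one generic form of policy-gradient estimator, can be sketched as follows. The notation ($I_k$, $\pi_w$, $V_\phi$, $N$, $M$) and the score-function (REINFORCE-style) form with a value baseline are illustrative assumptions only; the paper derives its own PG expression for sOED, which need not coincide with this generic form. With $N$ experiments, designs $d_k$, observations $y_k$, and information state $I_k = (d_1, y_1, \ldots, d_{k-1}, y_{k-1})$, a design policy is commonly scored by the expected Kullback-Leibler divergence from posterior to prior,
\[
U(\pi) \;=\; \mathbb{E}\!\left[ D_{\mathrm{KL}}\!\left( p(\theta \mid I_{N+1}) \,\big\|\, p(\theta) \right) \right],
\qquad d_k = \pi_k(I_k),
\]
and, for a stochastic policy $\pi_w$ with a learned value baseline $V_\phi$ (the critic), a generic Monte Carlo gradient estimate from $M$ simulated episodes takes the form
\[
\nabla_w U(\pi_w) \;\approx\; \frac{1}{M} \sum_{m=1}^{M} \sum_{k=1}^{N}
\nabla_w \log \pi_w\!\left(d_k^{(m)} \mid I_k^{(m)}\right)
\left( R^{(m)} - V_\phi\!\left(I_k^{(m)}\right) \right),
\]
where $R^{(m)}$ is the realized information gain of episode $m$; in the actor-critic setup, $\pi_w$ and $V_\phi$ are the deep neural networks updated from such simulated-episode estimates.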