We propose a model-based reinforcement learning (RL) approach for noisy
time-dependent gate optimization with improved sample complexity over
model-free RL. Sample complexity is the number of controller interactions with
the physical system. Leveraging an inductive bias inspired by recent advances
in neural ordinary differential equations (ODEs), we represent the model
approximating the environment as an auto-differentiable ODE parametrised by a
learnable Hamiltonian ansatz; its time-dependent part, including the control,
is fully known. Control and Hamiltonian learning of the continuous
time-independent parameters are addressed jointly through interactions with
the system.
We demonstrate an order-of-magnitude advantage in the sample complexity of our
method over standard model-free RL in preparing some standard unitary gates
under closed- and open-system dynamics, in realistic numerical experiments
incorporating single-shot measurements, arbitrary Hilbert space truncations and
uncertainty in Hamiltonian parameters. Moreover, the learned Hamiltonian can be
leveraged by existing control methods such as GRAPE for further gradient-based
optimization, with the controllers found by RL serving as initializations. Our
algorithm, which we apply to nitrogen-vacancy (NV) centers and transmons in this
paper, is well suited for controlling partially characterised one- and two-qubit
systems.

Comment: 14+6 pages, 6+4 figures, comments welcome
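
The idea of a Hamiltonian ansatz with a known time-dependent control part and an unknown time-independent parameter can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the single-qubit Hamiltonian H(t) = (delta/2) Z + (u(t)/2) X, the piecewise-constant controls, the noise-free population measurements, and all function names. A simple grid search also stands in for gradient-based learning through an auto-differentiable ODE solver.

```python
import numpy as np

# Pauli matrices for the toy single-qubit ansatz (an assumption, not the paper's model).
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)


def step_unitary(delta, u, dt):
    """exp(-i H dt) for H = (delta/2) Z + (u/2) X, via eigendecomposition."""
    h = 0.5 * delta * SZ + 0.5 * u * SX
    evals, evecs = np.linalg.eigh(h)
    return evecs @ np.diag(np.exp(-1j * evals * dt)) @ evecs.conj().T


def propagate(delta, controls, dt):
    """Total unitary for a piecewise-constant control sequence."""
    total = np.eye(2, dtype=complex)
    for u in controls:
        total = step_unitary(delta, u, dt) @ total
    return total


def excited_population(delta, controls, dt):
    """Probability of measuring |1> after starting in |0>."""
    return abs(propagate(delta, controls, dt)[1, 0]) ** 2


def gate_fidelity(target, unitary):
    """Phase-insensitive gate fidelity |tr(target^dagger unitary)|^2 / d^2."""
    d = target.shape[0]
    return abs(np.trace(target.conj().T @ unitary)) ** 2 / d**2


def fit_delta(controls_list, observed, dt, grid):
    """Grid search (a stand-in for gradient descent) over the unknown delta."""
    losses = [
        sum((excited_population(d, c, dt) - p) ** 2
            for c, p in zip(controls_list, observed))
        for d in grid
    ]
    return grid[int(np.argmin(losses))]


# Toy experiment: the learner knows the controls but not true_delta.
dt = 0.5
true_delta = 0.7
controls_list = [[1.0] * k for k in (2, 4, 6, 8)]
observed = [excited_population(true_delta, c, dt) for c in controls_list]
grid = np.linspace(0.0, 2.0, 201)
estimate = fit_delta(controls_list, observed, dt, grid)
```

With the estimated `delta` in hand, `propagate` and `gate_fidelity` could serve as the model inside a gradient-based optimizer, which is the role the abstract assigns to the learned Hamiltonian. Single-shot measurement noise and open-system dynamics are deliberately left out of this sketch.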