Model-based reinforcement learning usually suffers from high sample
complexity when training the world model, especially for environments with
complex dynamics. To make training for general physical environments more
efficient, we introduce Hamiltonian canonical ordinary differential equations
into the learning process, which inspires a novel model, the neural ordinary
differential auto-encoder (NODA). NODA naturally models the physical world
and can flexibly impose priors from Hamiltonian mechanics (e.g., the
dimension of the physical equations), which further accelerates training of
the environment model. It can consequently empower an RL agent with robust
extrapolation from a small number of samples, as well as a guarantee of
physical plausibility. Theoretically, we prove that NODA has uniform bounds for
multi-step transition errors and value errors under certain conditions.
Extensive experiments show that NODA can learn environment dynamics
effectively with high sample efficiency, making it possible to facilitate
reinforcement learning agents at an early stage of training.
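
The abstract does not specify NODA's architecture. As a rough, non-authoritative illustration of the general idea, an auto-encoder's latent state can be split into canonical coordinates $(q, p)$ and evolved under the Hamiltonian canonical equations $\dot{q} = \partial H/\partial p$, $\dot{p} = -\partial H/\partial q$ before decoding. All concrete choices in the sketch below (PyTorch, an explicit Euler integrator, layer sizes, the observation dimension, and the reconstruction loss) are assumptions for illustration, not details taken from the paper.

```python
# Minimal NODA-style sketch (assumptions: PyTorch, explicit Euler integration,
# made-up layer sizes; the paper's actual architecture may differ).
import torch
import torch.nn as nn


class NODASketch(nn.Module):
    """Auto-encoder whose latent state follows Hamiltonian canonical ODEs."""

    def __init__(self, obs_dim, latent_dim=4, hidden=64):
        super().__init__()
        assert latent_dim % 2 == 0  # latent state splits into (q, p)
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                     nn.Linear(hidden, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, hidden), nn.Tanh(),
                                     nn.Linear(hidden, obs_dim))
        # Scalar Hamiltonian H(q, p) parameterised by a small MLP.
        self.hamiltonian = nn.Sequential(nn.Linear(latent_dim, hidden), nn.Tanh(),
                                         nn.Linear(hidden, 1))

    def dynamics(self, z):
        """Canonical equations: dq/dt = dH/dp, dp/dt = -dH/dq.

        Assumes z requires grad, which holds during training because it is
        produced by the (trainable) encoder.
        """
        H = self.hamiltonian(z).sum()
        grad = torch.autograd.grad(H, z, create_graph=True)[0]
        dHdq, dHdp = grad.chunk(2, dim=-1)
        return torch.cat([dHdp, -dHdq], dim=-1)

    def forward(self, obs, dt=0.05, steps=10):
        """Encode, integrate the latent ODE with explicit Euler, then decode."""
        z = self.encoder(obs)
        for _ in range(steps):
            z = z + dt * self.dynamics(z)
        return self.decoder(z)


# Hypothetical usage: predict an observation after the latent dynamics roll out.
model = NODASketch(obs_dim=6)
obs = torch.randn(32, 6)
pred = model(obs)
# Placeholder target; in practice the next observation would be supervised.
loss = nn.functional.mse_loss(pred, obs)
loss.backward()
```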