In this paper, we propose an ensemble inverse model network based disturbance observer (EIMN-DOB) to improve the robustness of a policy network (PN) obtained from policy-based reinforcement learning (RL), without physical modeling. EIMN-DOB uses an ensemble of inverse model networks (IMNs) as a nominal inverse model, and can estimate and cancel model uncertainty and disturbances like a conventional disturbance observer (DOB), without requiring a physical model. Because the EIMN is trained on the same data used for RL training, no additional training data are required to learn the inverse model. The experiments in this paper show that a PN trained with soft actor-critic (SAC) and combined with EIMN-DOB maintains control performance even in the presence of disturbances on continuous control benchmark tasks based on the MuJoCo physics engine. When the trained PN is used with EIMN-DOB in a real environment, the control performance achieved in simulation can be preserved, and the proposed method is therefore expected to be useful for minimizing the sim-to-real gap of RL.