    How to easily make a policy network of reinforcement learning robust without physical modeling

    In this paper, we propose an ensemble inverse model network based disturbance observer (EIMN-DOB) to improve the robustness of a policy network (PN) trained by policy-based reinforcement learning (RL), without physical modeling. EIMN-DOB uses an ensemble of inverse model networks (IMNs) as the nominal inverse model, and can estimate and cancel model uncertainty and disturbances like a typical disturbance observer (DOB), without physical modeling. Because the EIMN is trained on the data already collected during RL training, no additional training data are required to express the inverse model. The experiments in this paper show that the PN of soft actor-critic (SAC) combined with EIMN-DOB maintains control performance even in the presence of disturbances in continuous-control benchmark tasks based on the MuJoCo physics engine. When the trained PN is used with EIMN-DOB in the real environment, the control performance achieved in the simulator can be preserved, and the method is expected to help minimize the sim-to-real gap of RL.
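
    The DOB-style mechanism the abstract describes can be illustrated with a short sketch. The Python/PyTorch code below is a hypothetical illustration, not the authors' implementation: the class names InverseModelNet and EIMNDisturbanceObserver, the network sizes, and the compensation gain are all assumptions. The idea follows the abstract: the ensemble mean of the inverse models serves as the nominal inverse model; the gap between the action it reconstructs from an observed transition and the action actually applied is taken as the lumped disturbance estimate, which is subtracted from the policy output as in a classical DOB.

```python
import torch
import torch.nn as nn

class InverseModelNet(nn.Module):
    """One inverse model: maps a transition (s_t, s_{t+1}) to the action
    that would have produced it under the nominal (disturbance-free) plant."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, s, s_next):
        return self.net(torch.cat([s, s_next], dim=-1))

class EIMNDisturbanceObserver:
    """DOB-style compensator built on an ensemble of inverse models.
    The inverse models reconstruct the effective input (commanded action
    plus disturbance) from the observed transition, so the disturbance
    estimate is d_hat = u_nominal - u_applied, and the policy action is
    compensated as u = u_policy - d_hat."""
    def __init__(self, models, gain=1.0):
        self.models = models   # trained InverseModelNet instances
        self.gain = gain       # compensation gain in (0, 1] (assumed)
        self.d_hat = None      # current lumped-disturbance estimate

    @torch.no_grad()
    def update(self, s_prev, s_curr, u_applied):
        # Nominal inverse model = ensemble mean over the IMNs.
        u_nom = torch.stack([m(s_prev, s_curr) for m in self.models]).mean(dim=0)
        # Effective input recovered by the nominal model, minus what was
        # actually commanded, is the estimated disturbance.
        self.d_hat = u_nom - u_applied

    def compensate(self, u_policy):
        # Cancel the estimated disturbance from the policy action.
        if self.d_hat is None:
            return u_policy
        return u_policy - self.gain * self.d_hat
```

    In a control loop, one would pass the SAC policy's action through compensate before applying it to the plant, then call update with the resulting transition so the disturbance estimate is refreshed at every step.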