25 research outputs found
Max-Min Off-Policy Actor-Critic Method Focusing on Worst-Case Robustness to Model Misspecification
In the field of reinforcement learning, because of the high cost and risk of
policy training in the real world, policies are trained in a simulation
environment and transferred to the corresponding real-world environment.
However, the simulation environment does not perfectly mimic the real-world
environment, lead to model misspecification. Multiple studies report
significant deterioration of policy performance in a real-world environment. In
this study, we focus on scenarios involving a simulation environment with
uncertainty parameters and the set of their possible values, called the
uncertainty parameter set. The aim is to optimize the worst-case performance on
the uncertainty parameter set to guarantee the performance in the corresponding
real-world environment. To obtain a policy for the optimization, we propose an
off-policy actor-critic approach called the Max-Min Twin Delayed Deep
Deterministic Policy Gradient algorithm (M2TD3), which solves a max-min
optimization problem using a simultaneous gradient ascent descent approach.
Experiments in multi-joint dynamics with contact (MuJoCo) environments show
that the proposed method exhibited a worst-case performance superior to several
baseline approaches.Comment: Neural Information Processing Systems 2022 (NeurIPS '22