The generalization of model-based reinforcement learning (MBRL) methods to
environments with unseen transition dynamics is an important yet challenging
problem. Existing methods try to extract environment-specific information $Z$
from past transition segments so that the dynamics prediction model
generalizes to different dynamics. However, because environments are
unlabelled, the information extracted from transition segments inevitably
contains redundant, dynamics-irrelevant content and thus fails to maintain a
crucial property of $Z$: $Z$ should be similar within the same environment and
dissimilar across different ones. As a result, the learned dynamics prediction
function deviates from the true one, which undermines the generalization ability.
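For concreteness, this setup is usually formalized as a context-conditioned dynamics model; the following is a minimal sketch under that reading, where the segment encoder $g_\phi$, the prediction model $f_\theta$, and the segment length $k$ are our notation rather than the paper's:
\[
\hat{z} = g_\phi\big((s_{t-k}, a_{t-k}), \ldots, (s_{t-1}, a_{t-1}), s_t\big),
\qquad
\hat{s}_{t+1} = f_\theta(s_t, a_t, \hat{z}),
\]
so any dynamics-irrelevant information carried by $\hat{z}$ directly biases the learned $f_\theta$.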
To tackle this problem, we introduce an interventional prediction
module to estimate the probability of two estimated contexts $\hat{z}_i, \hat{z}_j$
belonging to the same environment. Furthermore, by exploiting the invariance of
$Z$ within a single environment, we propose a relational head to enforce the
similarity between the $\hat{Z}$ estimated from the same environment. As a
result, the redundant information in $\hat{Z}$ is reduced.
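To make these two components concrete, below is a minimal PyTorch-style sketch of a relational head that scores whether two estimated $\hat{z}$'s come from the same environment, trained with a binary cross-entropy over in-batch pairs. This is an illustration under stated assumptions, not the authors' implementation: the names RelationHead, relational_loss, and env_ids, the architecture, and the use of trajectory indices as proxy same-environment labels (since true environment labels are absent) are all our own choices.

```python
import torch
import torch.nn as nn

class RelationHead(nn.Module):
    """Scores whether two estimated context vectors z_i, z_j come from the
    same environment (a sketch of the 'relational head' idea; this exact
    architecture is an assumption, not the paper's)."""
    def __init__(self, z_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * z_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # logit for "same environment"
        )

    def forward(self, z_i: torch.Tensor, z_j: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([z_i, z_j], dim=-1)).squeeze(-1)

def relational_loss(head: RelationHead, z: torch.Tensor,
                    env_ids: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy over all unordered pairs in a batch. Pairs whose
    segments were cut from the same trajectory (used here as a proxy for the
    same unlabelled environment) are positives; all other pairs are negatives.
    Pulling positive pairs together suppresses segment-specific,
    dynamics-irrelevant information in z."""
    n = z.size(0)
    idx_i, idx_j = torch.triu_indices(n, n, offset=1)     # all unordered pairs
    logits = head(z[idx_i], z[idx_j])
    targets = (env_ids[idx_i] == env_ids[idx_j]).float()  # 1 = same environment
    return nn.functional.binary_cross_entropy_with_logits(logits, targets)

# Usage sketch: z = encoder(transition_segments)
#               loss = relational_loss(head, z, env_ids)
```

In this sketch the same pairwise score loosely serves both roles described above: its sigmoid acts as the same-environment probability estimate, and minimizing the loss on same-trajectory pairs pulls their $\hat{z}$'s together, enforcing the within-environment similarity of $\hat{Z}$.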
We empirically show that the $\hat{Z}$ estimated by our method carries less
redundant information than that of previous methods, and that such $\hat{Z}$
significantly reduces dynamics prediction errors and improves the performance
of model-based RL methods on zero-shot new environments with unseen dynamics.
The code is available at \url{https://github.com/CR-Gjx/RIA}.

Comment: ICLR 2022 accepted paper