We present RELATE, a model that learns to generate physically plausible
scenes and videos of multiple interacting objects. Similar to other generative
approaches, RELATE is trained end-to-end on raw, unlabeled data. RELATE
combines an object-centric GAN formulation with a model that explicitly
accounts for correlations between individual objects. This allows the model to
generate realistic scenes and videos from a physically-interpretable
parameterization. Furthermore, we show that modeling the object correlation is
necessary to learn to disentangle object positions and identity. We find that
RELATE is also amenable to physically realistic scene editing and that it
significantly outperforms prior art in object-centric scene generation in both
synthetic (CLEVR, ShapeStacks) and real-world data (cars). In addition, in
contrast to state-of-the-art methods in object-centric generative modeling,
RELATE also extends naturally to dynamic scenes and generates videos of high
visual fidelity. Source code, datasets and more results are available at
http://geometry.cs.ucl.ac.uk/projects/2020/relate/