The idea behind object-centric representation learning is that natural scenes
can better be modeled as compositions of objects and their relations as opposed
to distributed representations. This inductive bias can be injected into neural
networks to potentially improve systematic generalization and learning
efficiency of downstream tasks in scenes with multiple objects. In this paper,
we train state-of-the-art unsupervised models on five common multi-object
datasets and evaluate segmentation accuracy and downstream object property
prediction. In addition, we study systematic generalization and robustness by
investigating the settings where either single objects are out-of-distribution
-- e.g., having unseen colors, textures, and shapes -- or global properties of
the scene are altered -- e.g., by occlusions, cropping, or increasing the
number of objects. From our experimental study, we find object-centric
representations to be generally useful for downstream tasks and robust to
shifts in the data distribution, especially if shifts affect single objects