Enhancing Interpretable Object Abstraction via Clustering-based Slot Initialization
Slot-based object-centric representations have advanced efficient, flexible,
and interpretable abstraction from low-level perceptual features in
compositional scenes. Current approaches initialize the slots randomly and then
refine them iteratively. As we show in this paper,
the random slot initialization significantly affects the accuracy of the final
slot prediction. Moreover, current approaches require the number of slots to be
fixed in advance from prior knowledge of the data, which limits their
applicability in the real world. In our work, we initialize the slot representations with clustering
algorithms conditioned on the perceptual input features. This requires an
additional layer in the architecture to initialize the slots given the
identified clusters. We design permutation-invariant and permutation-equivariant
versions of this layer so that the slot representations are exchangeable
after clustering. Additionally, we employ mean-shift clustering
to automatically identify the number of slots for a given scene. We evaluate
our method on object discovery and novel view synthesis tasks with various
datasets. The results show that our method consistently outperforms prior work,
especially for complex scenes.
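
The idea of letting mean-shift clustering decide the number of slots can be illustrated with a minimal sketch. This is not the authors' implementation: the flat kernel, the `bandwidth` value, the toy 2-D `features`, and the helper name `mean_shift` are all assumptions made for illustration, and the cluster centres are used directly as a stand-in initial slot state, whereas the abstract describes a dedicated layer that maps clusters to slots.

```python
import math

def mean_shift(points, bandwidth, n_iter=50, tol=1e-6):
    """Flat-kernel mean shift (hypothetical helper); returns the distinct modes."""
    modes = [list(p) for p in points]
    for _ in range(n_iter):
        max_move = 0.0
        new_modes = []
        for m in modes:
            # Average all input points within `bandwidth` of the current estimate.
            neighbours = [p for p in points if math.dist(m, p) <= bandwidth]
            if not neighbours:
                # Defensive guard: keep the estimate if no point is in range.
                new_modes.append(m)
                continue
            centre = [sum(c) / len(neighbours) for c in zip(*neighbours)]
            max_move = max(max_move, math.dist(m, centre))
            new_modes.append(centre)
        modes = new_modes
        if max_move < tol:
            break
    # Merge estimates that converged to (almost) the same location.
    merged = []
    for m in modes:
        if all(math.dist(m, c) > bandwidth / 2 for c in merged):
            merged.append(m)
    return merged

# Toy "perceptual features": two well-separated groups in 2-D (assumed data).
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]

# One initial slot per discovered mode; the count is not fixed in advance.
slot_init = mean_shift(features, bandwidth=1.0)
num_slots = len(slot_init)
print(num_slots, slot_init)
```

In this sketch the number of slots falls out of the clustering step rather than being supplied as a hyperparameter, although the result still depends on the chosen bandwidth.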
Generalization and Robustness Implications in Object-Centric Learning
The idea behind object-centric representation learning is that natural scenes
can better be modeled as compositions of objects and their relations as opposed
to distributed representations. This inductive bias can be injected into neural
networks to potentially improve systematic generalization and learning
efficiency of downstream tasks in scenes with multiple objects. In this paper,
we train state-of-the-art unsupervised models on five common multi-object
datasets and evaluate segmentation accuracy and downstream object property
prediction. In addition, we study systematic generalization and robustness by
investigating the settings where either single objects are out-of-distribution
-- e.g., having unseen colors, textures, and shapes -- or global properties of
the scene are altered -- e.g., by occlusions, cropping, or increasing the
number of objects. From our experimental study, we find object-centric
representations to be generally useful for downstream tasks and robust to
shifts in the data distribution, especially if shifts affect single objects.