Many real world tasks such as reasoning and physical interaction require
identification and manipulation of conceptual entities. A first step towards
solving these tasks is the automated discovery of distributed symbol-like
representations. In this paper, we explicitly formalize this problem as
inference in a spatial mixture model where each component is parametrized by a
neural network. Based on the Expectation Maximization framework we then derive
a differentiable clustering method that simultaneously learns how to group and
represent individual entities. We evaluate our method on the (sequential)
perceptual grouping task and find that it is able to accurately recover the
constituent objects. We demonstrate that the learned representations are useful
for next-step prediction.Comment: Accepted to NIPS 201