Self-supervised, category-agnostic segmentation of real-world images is a
challenging open problem in computer vision. Here, we show how to learn static
grouping priors from motion self-supervision by building on the cognitive
science concept of a Spelke Object: a set of physical stuff that moves
together. We introduce the Excitatory-Inhibitory Segment Extraction Network
(EISEN), which learns to extract pairwise affinity graphs for static scenes
from motion-based training signals. EISEN then produces segments from
affinities using a novel graph propagation and competition network. During
training, objects that undergo correlated motion (such as robot arms and the
objects they move) are decoupled by a bootstrapping process: EISEN explains
away the motion of objects it has already learned to segment. We show that
EISEN substantially improves on the state of the art for
self-supervised image segmentation on challenging synthetic and real-world
robotics datasets.

Comment: 25 pages, 10 figures