Unsupervised Discovery of Parts, Structure, and Dynamics
Humans easily recognize object parts and their hierarchical structure by
watching how they move; they can then predict how each part moves in the
future. In this paper, we propose a novel formulation that simultaneously
learns a hierarchical, disentangled object representation and a dynamics model
for object parts from unlabeled videos. Our Parts, Structure, and Dynamics
(PSD) model learns to, first, recognize the object parts via a layered image
representation; second, predict hierarchy via a structural descriptor that
composes low-level concepts into a hierarchical structure; and third, model the
system dynamics by predicting the future. Experiments on multiple real and
synthetic datasets demonstrate that our PSD model works well on all three
tasks: segmenting object parts, building their hierarchical structure, and
capturing their motion distributions. Comment: ICLR 2019. The first two authors contributed equally to this work.
SCAN: Learning Hierarchical Compositional Visual Concepts
The seemingly infinite diversity of the natural world arises from a
relatively small set of coherent rules, such as the laws of physics or
chemistry. We conjecture that these rules give rise to regularities that can be
discovered through primarily unsupervised experiences and represented as
abstract concepts. If such representations are compositional and hierarchical,
they can be recombined into an exponentially large set of new concepts. This
paper describes SCAN (Symbol-Concept Association Network), a new framework for
learning such abstractions in the visual domain. SCAN learns concepts through
fast symbol association, grounding them in disentangled visual primitives that
are discovered in an unsupervised manner. Unlike state-of-the-art multimodal
generative model baselines, our approach requires very few pairings between
symbols and images and makes no assumptions about the form of symbol
representations. Once trained, SCAN is capable of multimodal bi-directional
inference, generating a diverse set of image samples from symbolic descriptions
and vice versa. It also allows for traversal and manipulation of the implicit
hierarchy of visual concepts through symbolic instructions and learnt logical
recombination operations. Such manipulations enable SCAN to break away from its
training data distribution and imagine novel visual concepts through
symbolically instructed recombination of previously learnt concepts.
- …