Physical Primitive Decomposition
Objects are made of parts, each with distinct geometry, physics,
functionality, and affordances. Developing such a distributed, physical, and
interpretable representation of objects will help intelligent agents better
explore and interact with the world. In this paper, we study physical
primitive decomposition---understanding an object through its components, each
with physical and geometric attributes. As annotated data for object parts and
physics are rare, we propose a novel formulation that learns physical
primitives by explaining both an object's appearance and its behaviors in
physical events. Our model performs well on block towers and tools in both
synthetic and real scenarios; we also demonstrate that visual and physical
observations often provide complementary signals. We further present ablation
and behavioral studies to better understand our model and contrast it with
human performance.
Comment: ECCV 2018. Project page: http://ppd.csail.mit.edu
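The idea of explaining an object through both its appearance and its behavior can be illustrated with a toy inference, in which a part's material posterior comes from multiplying a visual likelihood with a physical one. The material names and likelihood values below are invented for illustration and are not from the paper:

```python
def material_posterior(visual_like, physical_like):
    """Fuse visual and physical evidence for one part's material by
    multiplying per-modality likelihoods and normalizing (a toy stand-in
    for jointly explaining appearance and behavior)."""
    scores = {m: visual_like[m] * physical_like[m] for m in visual_like}
    z = sum(scores.values())
    return {m: s / z for m, s in scores.items()}

# Appearance alone is ambiguous here; the (hypothetical) observed motion
# disambiguates it -- the two signals are complementary.
visual = {"wood": 0.5, "metal": 0.5}
physical = {"wood": 0.9, "metal": 0.1}
```

A part whose appearance is ambiguous is resolved by how it behaved in a physical event, mirroring the complementary-signals finding reported above.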
A Comparative Evaluation of Approximate Probabilistic Simulation and Deep Neural Networks as Accounts of Human Physical Scene Understanding
Humans demonstrate remarkable abilities to predict physical events in complex
scenes. Two classes of models for physical scene understanding have recently
been proposed: "Intuitive Physics Engines", or IPEs, which posit that people
make predictions by running approximate probabilistic simulations in causal
mental models similar in nature to video-game physics engines, and memory-based
models, which make judgments based on analogies to stored experiences of
previously encountered scenes and physical outcomes. Versions of the latter
have recently been instantiated in convolutional neural network (CNN)
architectures. Here we report four experiments that, to our knowledge, are the
first rigorous comparisons of simulation-based and CNN-based models, where both
approaches are concretely instantiated in algorithms that can run on raw image
inputs and produce as outputs physical judgments such as whether a stack of
blocks will fall. Both approaches can achieve super-human accuracy levels and
can quantitatively predict human judgments to a similar degree, but only the
simulation-based models generalize to novel situations in ways that people do,
and are qualitatively consistent with systematic perceptual illusions and
judgment asymmetries that people show.
Comment: Accepted to CogSci 2016 as an oral presentation.
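An "Intuitive Physics Engine" in the sense above can be caricatured in a few lines: inject perceptual noise into the observed scene, run an approximate simulation many times, and read off the fraction of runs in which the tower falls. The geometry (unit-width blocks identified by their horizontal offsets) and the stability rule are simplifications invented for this sketch:

```python
import random

def tower_falls(block_xs, block_w=1.0):
    """A stacked tower falls if, for any block, the center of mass of
    the blocks above it lies outside that block's footprint."""
    for i in range(len(block_xs) - 1):
        above = block_xs[i + 1:]
        com = sum(above) / len(above)
        if abs(com - block_xs[i]) > block_w / 2:
            return True
    return False

def p_fall(block_xs, noise=0.1, n_samples=1000, seed=0):
    """Approximate probabilistic simulation: perturb the observed block
    positions with Gaussian perceptual noise and report the fraction of
    noisy runs in which the tower falls."""
    rng = random.Random(seed)
    falls = 0
    for _ in range(n_samples):
        noisy = [x + rng.gauss(0.0, noise) for x in block_xs]
        falls += tower_falls(noisy)
    return falls / n_samples
```

A graded probability rather than a hard yes/no is what lets such models quantitatively predict human judgments on near-threshold towers.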
Learning to Reconstruct Shapes from Unseen Classes
From a single image, humans are able to perceive the full 3D shape of an
object by exploiting learned shape priors from everyday life. Contemporary
single-image 3D reconstruction algorithms aim to solve this task in a similar
fashion, but often end up with priors that are highly biased by training
classes. Here we present an algorithm, Generalizable Reconstruction (GenRe),
designed to capture more generic, class-agnostic shape priors. We achieve this
with an inference network and training procedure that combine 2.5D
representations of visible surfaces (depth and silhouette), spherical shape
representations of both visible and non-visible surfaces, and 3D voxel-based
representations, in a principled manner that exploits the causal structure of
how 3D shapes give rise to 2D images. Experiments demonstrate that GenRe
performs well on single-view shape reconstruction, and generalizes to diverse
novel objects from categories not seen during training.
Comment: NeurIPS 2018 (Oral). The first two authors contributed equally to this paper. Project page: http://genre.csail.mit.edu
Unsupervised Discovery of Parts, Structure, and Dynamics
Humans easily recognize object parts and their hierarchical structure by
watching how they move; they can then predict how each part moves in the
future. In this paper, we propose a novel formulation that simultaneously
learns a hierarchical, disentangled object representation and a dynamics model
for object parts from unlabeled videos. Our Parts, Structure, and Dynamics
(PSD) model learns to, first, recognize the object parts via a layered image
representation; second, predict hierarchy via a structural descriptor that
composes low-level concepts into a hierarchical structure; and third, model the
system dynamics by predicting the future. Experiments on multiple real and
synthetic datasets demonstrate that our PSD model works well on all three
tasks: segmenting object parts, building their hierarchical structure, and
capturing their motion distributions.
Comment: ICLR 2019. The first two authors contributed equally to this work.
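The first of the three tasks, grouping pixels into parts from how they move, can be sketched with a crude motion-clustering heuristic. The flow dictionary and tolerance are illustrative only; the PSD model learns this grouping through a layered image representation rather than explicit clustering:

```python
def group_by_motion(flows, tol=0.5):
    """Greedily group pixels whose motion vectors agree within `tol`.
    `flows` maps a pixel id to its (dx, dy) flow vector."""
    groups = []
    for px, (dx, dy) in flows.items():
        for g in groups:
            n = len(g["members"])
            mx, my = g["sx"] / n, g["sy"] / n  # group's mean motion
            if abs(dx - mx) <= tol and abs(dy - my) <= tol:
                g["members"].append(px)
                g["sx"] += dx
                g["sy"] += dy
                break
        else:
            groups.append({"members": [px], "sx": dx, "sy": dy})
    return [g["members"] for g in groups]
```

Pixels that move together end up in the same group, which is the intuition behind recovering parts from unlabeled video.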
Visual Object Networks: Image Generation with Disentangled 3D Representation
Recent progress in deep generative models has led to tremendous breakthroughs
in image generation. However, while existing models can synthesize
photorealistic images, they lack an understanding of our underlying 3D world.
We present a new generative model, Visual Object Networks (VON), synthesizing
natural images of objects with a disentangled 3D representation. Inspired by
classic graphics rendering pipelines, we unravel our image formation process
into three conditionally independent factors---shape, viewpoint, and
texture---and present an end-to-end adversarial learning framework that jointly
models 3D shapes and 2D images. Our model first learns to synthesize 3D shapes
that are indistinguishable from real shapes. It then renders the object's 2.5D
sketches (i.e., silhouette and depth map) from its shape under a sampled
viewpoint. Finally, it learns to add realistic texture to these 2.5D sketches
to generate natural images. VON not only generates images that are more
realistic than those of state-of-the-art 2D image synthesis methods, but also
enables many 3D operations, such as changing the viewpoint of a generated
image, editing shape and texture, linearly interpolating in texture and shape
space, and transferring appearance across different objects and viewpoints.
Comment: NeurIPS 2018. Code: https://github.com/junyanz/VON Website: http://von.csail.mit.edu
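The intermediate 2.5D sketch step, a silhouette plus a depth map obtained from a shape under a viewpoint, can be mimicked with an orthographic projection of a binary voxel grid. A fixed z-axis viewpoint here stands in for the sampled viewpoint, and the grid itself is a toy stand-in for the learned shape:

```python
import numpy as np

def sketch_25d(voxels):
    """Render a 2.5D sketch (silhouette + depth map) of a binary voxel
    grid by orthographic projection along the z axis. Depth is the index
    of the first occupied voxel; background depth is set to inf."""
    occupied = voxels.astype(bool)         # shape (X, Y, Z)
    silhouette = occupied.any(axis=2)      # (X, Y) binary mask
    first_hit = occupied.argmax(axis=2)    # index of first True per ray
    depth = np.where(silhouette, first_hit, np.inf)
    return silhouette, depth
```

Separating shape (the voxels), viewpoint (the projection axis), and texture (applied afterward) is exactly the disentanglement the abstract describes.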
Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling
We study 3D shape modeling from a single image and make contributions to it
in three aspects. First, we present Pix3D, a large-scale benchmark of diverse
image-shape pairs with pixel-level 2D-3D alignment. Pix3D has wide applications
in shape-related tasks including reconstruction, retrieval, viewpoint
estimation, etc. Building such a large-scale dataset, however, is highly
challenging; existing datasets either contain only synthetic data, or lack
precise alignment between 2D images and 3D shapes, or only have a small number
of images. Second, we calibrate the evaluation criteria for 3D shape
reconstruction through behavioral studies, and use them to objectively and
systematically benchmark cutting-edge reconstruction algorithms on Pix3D.
Third, we design a novel model that simultaneously performs 3D reconstruction
and pose estimation; our multi-task learning approach achieves state-of-the-art
performance on both tasks.
Comment: CVPR 2018. The first two authors contributed equally to this work. Project page: http://pix3d.csail.mit.ed
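Among the criteria commonly calibrated for 3D shape reconstruction is intersection-over-union (IoU) between predicted and ground-truth shapes; on occupancy grids it reduces to a set computation. The set-of-occupied-cells representation is assumed here for brevity:

```python
def voxel_iou(pred, gt):
    """Intersection-over-Union between two binary voxel grids, each
    given as a set of occupied (x, y, z) cells. Two empty shapes are
    defined to match perfectly."""
    pred, gt = set(pred), set(gt)
    union = pred | gt
    if not union:
        return 1.0
    return len(pred & gt) / len(union)
```

Behavioral calibration asks how well a numeric score like this tracks human judgments of reconstruction quality, rather than taking the score at face value.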
ChainQueen: A Real-Time Differentiable Physical Simulator for Soft Robotics
Physical simulators have been widely used in robot planning and control.
Among them, differentiable simulators are particularly favored, as they can be
incorporated into gradient-based optimization algorithms that are efficient in
solving inverse problems such as optimal control and motion planning.
Simulating deformable objects, however, is more challenging than simulating
rigid body dynamics: the underlying physical laws are more complex, and the
resulting systems have orders of magnitude more degrees of freedom, making
them significantly more expensive to simulate. Computing gradients with
respect to physical design or controller parameters is typically even more
computationally challenging. In this paper,
we propose a real-time, differentiable hybrid Lagrangian-Eulerian physical
simulator for deformable objects, ChainQueen, based on the Moving Least Squares
Material Point Method (MLS-MPM). MLS-MPM can simulate deformable objects,
including contact, and can be seamlessly incorporated into inference, control,
and co-design systems. We demonstrate that our simulator achieves high
precision in both forward simulation and backward gradient computation. We have
successfully employed it in a diverse set of control tasks for soft robots,
including problems with nearly 3,000 decision variables.
Comment: In submission to ICRA 2019. Supplemental Video: https://www.youtube.com/watch?v=4IWD4iGIsB4 Project Page: https://github.com/yuanming-hu/ChainQuee
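Why differentiability matters for control can be seen in a one-dimensional toy, far simpler than MLS-MPM, with every constant chosen for illustration: because the unrolled dynamics are differentiable, the terminal error can be driven down by plain gradient descent on the force sequence:

```python
def simulate(forces, dt=0.1, mass=1.0):
    """Forward-simulate a 1D point mass under a sequence of forces
    (semi-implicit Euler) and return its terminal position."""
    x, v = 0.0, 0.0
    for f in forces:
        v += f / mass * dt
        x += v * dt
    return x

def grad(forces, target, dt=0.1, mass=1.0):
    """Analytic gradient of the squared terminal-position error w.r.t.
    each force: differentiating the unrolled dynamics gives
    d x_T / d f_k = (T - k) * dt**2 / mass."""
    T = len(forces)
    err = simulate(forces, dt, mass) - target
    return [2 * err * (T - k) * dt * dt / mass for k in range(T)]

def optimize(n_steps=10, target=1.0, lr=0.5, iters=500):
    """Gradient-based optimal control through the differentiable
    simulator: descend on the force sequence until x_T hits the target."""
    forces = [0.0] * n_steps
    for _ in range(iters):
        g = grad(forces, target)
        forces = [f - lr * gi for f, gi in zip(forces, g)]
    return forces
```

A differentiable simulator supplies exactly the `grad` step above for free; for thousands of decision variables, as in the soft-robot tasks mentioned, gradient-free search would be far more expensive.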
Exploiting Compositionality to Explore a Large Space of Model Structures
The recent proliferation of richly structured probabilistic models raises the question of how to automatically determine an appropriate model for a dataset. We investigate this question for a space of matrix decomposition models which can express a variety of widely used models from unsupervised learning. To enable model selection, we organize these models into a context-free grammar which generates a wide variety of structures through the compositional application of a few simple rules. We use our grammar to generically and efficiently infer latent components and estimate predictive likelihood for nearly 2500 structures using a small toolbox of reusable algorithms. Using a greedy search over our grammar, we automatically choose the decomposition structure from raw data by evaluating only a small fraction of all models. The proposed method typically finds the correct structure for synthetic data and backs off gracefully to simpler models under heavy noise. It learns sensible structures for datasets as diverse as image patches, motion capture, 20 Questions, and U.S. Senate votes, all using exactly the same code.
Funding: United States Army Research Office (ARO grant W911NF-08-1-0242); American Society for Engineering Education, National Defense Science and Engineering Graduate Fellowship.
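The search procedure can be sketched in miniature: a toy grammar whose production rules refine a structure symbol, plus a greedy loop that expands only the current best candidate, so only a small fraction of all derivable structures is ever scored. The rule names and scoring function below are invented for illustration and do not reproduce the paper's actual grammar:

```python
# Toy productions in the spirit of a matrix-decomposition grammar:
# a factor G may be refined into clustered (MG), sparse (SG), or
# chained (GG) variants. Purely illustrative symbols.
RULES = {"G": ["MG", "SG", "GG"]}

def expansions(structure):
    """All structures reachable by applying one production rule to one
    symbol of the current structure string."""
    out = []
    for i, sym in enumerate(structure):
        for rhs in RULES.get(sym, []):
            out.append(structure[:i] + rhs + structure[i + 1:])
    return out

def greedy_search(score, start="G", depth=3):
    """Greedy model-structure search: at each level, score only the
    one-step expansions of the current best structure and keep the
    winner, stopping when no expansion improves the score."""
    best = start
    for _ in range(depth):
        candidates = expansions(best)
        if not candidates:
            break
        top = max(candidates, key=score)
        if score(top) <= score(best):
            break
        best = top
    return best
```

In the real system the score would be an estimated predictive likelihood per structure; the greedy loop is what makes evaluating only a small fraction of the grammar feasible.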