ShapeStacks: Learning Vision-Based Physical Intuition for Generalised Object Stacking
Physical intuition is pivotal for intelligent agents to perform complex
tasks. In this paper we investigate the passive acquisition of an intuitive
understanding of physical principles as well as the active utilisation of this
intuition in the context of generalised object stacking. To this end, we
provide ShapeStacks, a simulation-based dataset featuring 20,000 stack
configurations composed of a variety of elementary geometric primitives, richly
annotated with semantics and structural stability. We train visual classifiers for
binary stability prediction on the ShapeStacks data and scrutinise their
learned physical intuition. Due to the richness of the training data, our
approach also generalises favourably to real-world scenarios, achieving
state-of-the-art stability prediction on a publicly available benchmark of
block towers. We then leverage the physical intuition learned by our model to
actively construct stable stacks and observe the emergence of an intuitive
notion of stackability, an inherent object affordance, induced by the active
stacking task. Our approach performs well even in challenging conditions where
it considerably exceeds the stack height observed during training or in cases
where initially unstable structures must be stabilised via counterbalancing.
Comment: revised version to appear at ECCV 2018
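To make the setup concrete, below is a minimal sketch of a binary stability classifier of the kind the abstract describes: a small CNN mapping a rendered stack image to a single stable/unstable logit. The architecture, input size, and training details are illustrative assumptions, not the authors' exact model.

```python
# Illustrative sketch only: a generic CNN for binary stability prediction.
# Architecture and hyperparameters are assumptions, not the paper's model.
import torch
import torch.nn as nn

class StabilityClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, 1)  # single stable/unstable logit

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = StabilityClassifier()
loss_fn = nn.BCEWithLogitsLoss()
images = torch.randn(8, 3, 224, 224)          # batch of rendered stacks
labels = torch.randint(0, 2, (8, 1)).float()  # 1 = stable, 0 = collapses
loss_fn(model(images), labels).backward()
```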
Scrutinizing and De-Biasing Intuitive Physics with Neural Stethoscopes
Visually predicting the stability of block towers is a popular task in the
domain of intuitive physics. While previous work focuses on prediction
accuracy, a one-dimensional performance measure, we provide a broader analysis
of the learned physical understanding of the final model and how the learning
process can be guided. To this end, we introduce neural stethoscopes as a
general-purpose framework for quantifying the degree of importance of specific
factors of influence in deep neural networks as well as for actively promoting
and suppressing information as appropriate. In doing so, we unify concepts from
multitask learning as well as training with auxiliary and adversarial losses.
We apply neural stethoscopes to analyse the state-of-the-art neural network for
stability prediction. We show that the baseline model is susceptible to being
misled by incorrect visual cues. This leads to a performance breakdown to the
level of random guessing when training on scenarios where visual cues are
inversely correlated with stability. Using stethoscopes to promote meaningful
feature extraction increases performance from 51% to 90% prediction accuracy.
Conversely, training on an easy dataset where visual cues are positively
correlated with stability, the baseline model learns a bias leading to poor
performance on a harder dataset. Using an adversarial stethoscope, the network
is successfully de-biased, leading to a performance increase from 66% to 88%.
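The stethoscope mechanism can be pictured as an auxiliary head attached to intermediate features, whose loss reaches the encoder scaled by a tunable weight: positive to promote a factor of influence, negative (gradient reversal) to suppress it adversarially. The sketch below illustrates this under assumed toy dimensions; ScaledGrad, the cue labels, and all layer sizes are hypothetical stand-ins rather than the paper's implementation.

```python
# Hedged sketch of a "stethoscope": an auxiliary head on intermediate
# features whose loss reaches the encoder scaled by lambda. lambda > 0
# promotes a factor of influence; lambda < 0 suppresses it adversarially
# (gradient reversal). All names and sizes are hypothetical stand-ins.
import torch
import torch.nn as nn

class ScaledGrad(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return ctx.lam * grad, None  # scale (or flip) gradient to the encoder

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
main_head = nn.Linear(256, 1)    # stability logit
stethoscope = nn.Linear(256, 1)  # predicts a visual cue, e.g. tower height

lam = -1.0                       # negative: de-bias the encoder features
x = torch.randn(8, 3, 64, 64)
y_stab = torch.randint(0, 2, (8, 1)).float()
y_cue = torch.randint(0, 2, (8, 1)).float()

bce = nn.BCEWithLogitsLoss()
feats = encoder(x)
loss_main = bce(main_head(feats), y_stab)
# The stethoscope itself always learns the cue; the encoder receives the
# cue gradient scaled by lambda, so it unlearns the cue when lambda < 0.
loss_steth = bce(stethoscope(ScaledGrad.apply(feats, lam)), y_cue)
(loss_main + loss_steth).backward()
```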
Learning Manipulation under Physics Constraints with Visual Perception
Understanding physical phenomena is a key competence that enables humans and
animals to act and interact under uncertain perception in previously unseen
environments containing novel objects and their configurations. In this work,
we consider the problem of autonomous block stacking and explore solutions to
learning manipulation under physics constraints with visual perception inherent
to the task. Inspired by intuitive physics in humans, we first present an
end-to-end learning-based approach that predicts stability directly from
appearance, contrasting it with a more traditional model-based approach using
explicit 3D representations and physical simulation. We study the model's
behavior together with an accompanying human subject test. The model is then
integrated into a real-world robotic system to guide the placement of a single
wooden block into the scene without collapsing the existing tower structure. To
further automate consecutive block stacking, we present an alternative approach
where the model learns the physics constraint through the interaction with the
environment, bypassing the dedicated physics learning as in the former part of
this work. In particular, we are interested in tasks that require the agent to
reach a given goal state that may differ for every new trial. To this end, we
propose a deep reinforcement learning framework that learns policies for
stacking tasks parametrized by a target structure.
Comment: arXiv admin note: substantial text overlap with arXiv:1609.04861, arXiv:1711.00267, arXiv:1604.0006
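The goal-parametrized policy described above can be pictured as a network that consumes both the current observation and an encoding of the target structure, so a single policy generalises across goals. The sketch below is a hedged illustration; GoalConditionedPolicy and all dimensions are assumptions, not the paper's architecture.

```python
# Illustrative sketch of a goal-parametrized policy: observation and target
# structure are encoded jointly, so one policy covers many stacking goals.
# GoalConditionedPolicy and all dimensions are assumptions, not the paper's.
import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    def __init__(self, obs_dim=64, goal_dim=16, n_actions=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, obs, goal):
        return self.net(torch.cat([obs, goal], dim=-1))  # action logits

policy = GoalConditionedPolicy()
obs = torch.randn(1, 64)   # encoding of the current scene
goal = torch.randn(1, 16)  # encoding of the target structure
action = policy(obs, goal).argmax(dim=-1)  # greedy action for this goal
```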
To Fall Or Not To Fall: A Visual Approach to Physical Stability Prediction
Understanding physical phenomena is a key competence that enables humans and
animals to act and interact under uncertain perception in previously unseen
environments containing novel objects and their configurations. Developmental
psychology has shown that such skills are acquired by infants from observations
at a very early stage.
In this paper, we contrast a more traditional model-based route with
explicit 3D representations and physical simulation against an end-to-end
approach that directly predicts stability and related quantities from
appearance. We ask whether, and to what extent and with what quality, such a
skill can be acquired directly in a data-driven way, bypassing the need for
explicit simulation.
We present a learning-based approach, trained on simulated data, that
predicts the stability of towers composed of wooden blocks under different
conditions, as well as quantities related to their potential fall. The
evaluation is carried out on synthetic data and compared to human judgments on
the same stimuli.
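The model-based route being contrasted can be illustrated by reconstructing the tower in a physics engine and simulating forward: if no block moves beyond a tolerance, the tower is judged stable. The sketch below uses PyBullet as a stand-in engine; block sizes, offsets, and the tolerance are arbitrary assumptions rather than the paper's setup.

```python
# Sketch of an explicit-simulation stability check. PyBullet is used here
# as an illustrative engine; all geometry and thresholds are assumptions.
import pybullet as p

p.connect(p.DIRECT)                      # headless physics
p.setGravity(0, 0, -9.81)
plane = p.createCollisionShape(p.GEOM_PLANE)
p.createMultiBody(0, plane)              # static ground plane

blocks = []
for i in range(4):                       # a 4-block tower with slight offsets
    box = p.createCollisionShape(p.GEOM_BOX, halfExtents=[0.05, 0.05, 0.05])
    body = p.createMultiBody(baseMass=1.0, baseCollisionShapeIndex=box,
                             basePosition=[0.01 * i, 0, 0.05 + 0.1 * i])
    blocks.append(body)

start = [p.getBasePositionAndOrientation(b)[0] for b in blocks]
for _ in range(240):                     # simulate one second at 240 Hz
    p.stepSimulation()
end = [p.getBasePositionAndOrientation(b)[0] for b in blocks]

# Stable if no block moved more than a small tolerance.
moved = max(abs(e[i] - s[i]) for s, e in zip(start, end) for i in range(3))
print("stable" if moved < 0.01 else "falls")
```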
Physical Primitive Decomposition
Objects are made of parts, each with distinct geometry, physics,
functionality, and affordances. Developing such a distributed, physical,
interpretable representation of objects will help intelligent agents
better explore and interact with the world. In this paper, we study physical
primitive decomposition: understanding an object through its components, each
with physical and geometric attributes. As annotated data for object parts and
physics are rare, we propose a novel formulation that learns physical
primitives by explaining both an object's appearance and its behaviors in
physical events. Our model performs well on block towers and tools in both
synthetic and real scenarios; we also demonstrate that visual and physical
observations often provide complementary signals. We further present ablation
and behavioral studies to better understand our model and contrast it with
human performance.
Comment: ECCV 2018. Project page: http://ppd.csail.mit.edu
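The formulation can be caricatured as fitting primitive parameters so that they jointly explain what the object looks like and how it moves in a physical event. In the hedged sketch below, render and simulate are stand-in differentiable stubs for the paper's much richer appearance and physics models.

```python
# Hedged sketch: primitive parameters are scored by how well they explain
# both appearance and behavior. render/simulate are stand-in differentiable
# stubs, not the paper's actual appearance and physics models.
import torch
import torch.nn as nn

render = nn.Linear(8, 64)    # primitive params -> appearance features (stub)
simulate = nn.Linear(8, 12)  # primitive params -> trajectory features (stub)

params = torch.randn(1, 8, requires_grad=True)  # e.g. size/material per part
obs_appearance = torch.randn(1, 64)             # features of observed image
obs_behavior = torch.randn(1, 12)               # features of observed motion

mse = nn.MSELoss()
loss = (mse(render(params), obs_appearance)
        + mse(simulate(params), obs_behavior))
loss.backward()  # both cues jointly constrain the primitive parameters
```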
Occlusion resistant learning of intuitive physics from videos
To reach human performance on complex tasks, a key ability for artificial
systems is to understand physical interactions between objects, and predict
future outcomes of a situation. This ability, often referred to as intuitive
physics, has recently received attention and several methods have been proposed to
learn these physical rules from video sequences. Yet, most of these methods are
restricted to the case where no, or only limited, occlusions occur. In this
work we propose a probabilistic formulation of learning intuitive physics in 3D
scenes with significant inter-object occlusions. In our formulation, object
positions are modeled as latent variables enabling the reconstruction of the
scene. We then propose a series of approximations that make this problem
tractable. Object proposals are linked across frames using a combination of a
recurrent interaction network, modeling the physics in object space, and a
compositional renderer, modeling the way in which objects project onto pixel
space. We demonstrate significant improvements over the state of the art on
the IntPhys intuitive physics benchmark. We apply our method to a second
dataset with increasing levels of occlusion, showing that it realistically
predicts segmentation masks up to 30 frames into the future. Finally, we also
show results on predicting the motion of objects in real videos.
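The physics component can be illustrated with one interaction-network step over latent object states: pairwise relations are encoded, aggregated per receiving object, and used to update each state, which the compositional renderer would then project to pixel space. Everything in the sketch below (dimensions, MLPs, the step function) is an illustrative assumption, not the paper's implementation.

```python
# Minimal sketch of one interaction-network step over latent object states:
# pairwise relations are encoded, aggregated per object, and used to update
# each state. All dimensions and MLPs here are illustrative assumptions.
import torch
import torch.nn as nn

state_dim = 6                    # e.g. latent position + velocity per object
relation = nn.Sequential(nn.Linear(2 * state_dim, 32), nn.ReLU(),
                         nn.Linear(32, 16))
update = nn.Sequential(nn.Linear(state_dim + 16, 32), nn.ReLU(),
                       nn.Linear(32, state_dim))

def step(states):
    """One physics step; states has shape (num_objects, state_dim)."""
    n = states.shape[0]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    idx_i = torch.tensor([i for i, _ in pairs])
    idx_j = torch.tensor([j for _, j in pairs])
    # Effect of every object j on every object i, summed per receiver i.
    effects = relation(torch.cat([states[idx_i], states[idx_j]], dim=-1))
    agg = torch.zeros(n, 16).index_add_(0, idx_i, effects)
    return states + update(torch.cat([states, agg], dim=-1))

next_states = step(torch.randn(5, state_dim))  # roll forward in object space
```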