Learning Manipulation under Physics Constraints with Visual Perception
Understanding physical phenomena is a key competence that enables humans and
animals to act and interact under uncertain perception in previously unseen
environments containing novel objects and their configurations. In this work,
we consider the problem of autonomous block stacking and explore solutions to
learning manipulation under physics constraints with visual perception inherent
to the task. Inspired by the intuitive physics in humans, we first present an
end-to-end learning-based approach to predict stability directly from
appearance, contrasting a more traditional model-based approach with explicit
3D representations and physical simulation. We study the model's behavior
together with an accompanying human subject test. The model is then integrated
into a real-world robotic system to guide the placement of a single wood block
into the scene without collapsing the existing tower structure. To further
automate the process of consecutive block stacking, we present an alternative approach
where the model learns the physics constraint through the interaction with the
environment, bypassing the dedicated physics learning as in the former part of
this work. In particular, we are interested in the type of tasks that require
the agent to reach a given goal state that may be different for every new
trial. To this end, we propose a deep reinforcement learning framework that learns
policies for stacking tasks which are parametrized by a target structure.
Comment: arXiv admin note: substantial text overlap with arXiv:1609.04861,
arXiv:1711.00267, arXiv:1604.0006
Optimal Camera Placement to measure Distances Conservativly Regarding Static and Dynamic Obstacles
In modern production facilities, industrial robots and humans are expected to
interact while sharing a common working area. In order to avoid collisions, the
distances between objects need to be measured conservatively, which can be done
by a camera network. To estimate the acquired distance, unmodelled objects,
e.g., an interacting human, need to be modelled and distinguished from
premodelled objects like workbenches or robots by image processing such as the
background subtraction method.
The quality of such an approach depends heavily on the configuration of the
camera network, that is, the positions and orientations of the individual
cameras. Of particular interest in this context is the minimization of the
error of the distance using the objects modelled by the background subtraction
method instead of the real objects. Here, we show how this minimization can be
formulated as an abstract optimization problem. Moreover, we discuss various
aspects of the implementation as well as reasons for the selection of a
suitable optimization method, analyze the complexity of the proposed method and
present a basic version used for extensive experiments.
Comment: 9 pages, 10 figures
Event-Based Motion Segmentation by Motion Compensation
In contrast to traditional cameras, whose pixels have a common exposure time,
event-based cameras are novel bio-inspired sensors whose pixels work
independently and asynchronously output intensity changes (called "events"),
with microsecond resolution. Since events are caused by the apparent motion of
objects, event-based cameras sample visual information based on the scene
dynamics and are, therefore, a more natural fit than traditional cameras to
acquire motion, especially at high speeds, where traditional cameras suffer
from motion blur. However, distinguishing between events caused by different
moving objects and by the camera's ego-motion is a challenging task. We present
the first per-event segmentation method for splitting a scene into
independently moving objects. Our method jointly estimates the event-object
associations (i.e., segmentation) and the motion parameters of the objects (or
the background) by maximization of an objective function, which builds upon
recent results on event-based motion-compensation. We provide a thorough
evaluation of our method on a public dataset, outperforming the
state-of-the-art by as much as 10%. We also show the first quantitative
evaluation of a segmentation algorithm for event cameras, yielding around 90%
accuracy at 4 pixels relative displacement.
Comment: When viewed in Acrobat Reader, several of the figures animate. Video:
https://youtu.be/0q6ap_OSBA
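
The motion-compensation idea that the abstract above builds on can be illustrated with a minimal sketch. This is not the authors' method: it covers only a single motion hypothesis, not the joint per-event segmentation, and it assumes a constant 2D image velocity, events given as (x, y, t) tuples, and image variance as the sharpness objective. The function name warped_event_contrast and the synthetic data are hypothetical.

```python
from collections import Counter


def warped_event_contrast(events, velocity, width, height):
    """Variance of the image of warped events for one candidate velocity.

    Assumption: constant image-plane velocity (vx, vy) in pixels per time unit.
    """
    vx, vy = velocity
    counts = Counter()
    for x, y, t in events:
        # Warp the event back to the reference time t = 0 along the candidate motion.
        wx = round(x - vx * t)
        wy = round(y - vy * t)
        if 0 <= wx < width and 0 <= wy < height:
            counts[(wx, wy)] += 1
    n_pixels = width * height
    mean = sum(counts.values()) / n_pixels
    # Pixels that received events, plus empty pixels contributing (0 - mean)^2 each.
    squared = sum((c - mean) ** 2 for c in counts.values())
    squared += (n_pixels - len(counts)) * mean ** 2
    return squared / n_pixels


# Synthetic example: one point moving at 2 pixels per time unit along x.
events = [(5 + 2 * t, 4, t) for t in range(5)]
candidates = [(0, 0), (1, 0), (2, 0), (3, 0)]
best = max(candidates, key=lambda v: warped_event_contrast(events, v, 32, 32))
print(best)
```

When the candidate velocity matches the true motion, all events of the moving point collapse onto one pixel and the variance of the warped image peaks, which is why the search selects that candidate.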
LiveCap: Real-time Human Performance Capture from Monocular Video
We present the first real-time human performance capture approach that
reconstructs dense, space-time coherent deforming geometry of entire humans in
general everyday clothing from just a single RGB video. We propose a novel
two-stage analysis-by-synthesis optimization whose formulation and
implementation are designed for high performance. In the first stage, a skinned
template model is jointly fitted to background subtracted input video, 2D and
3D skeleton joint positions found using a deep neural network, and a set of
sparse facial landmark detections. In the second stage, dense non-rigid 3D
deformations of skin and even loose apparel are captured based on a novel
real-time capable algorithm for non-rigid tracking using dense photometric and
silhouette constraints. Our novel energy formulation leverages automatically
identified material regions on the template to model the differing non-rigid
deformation behavior of skin and apparel. The two resulting non-linear
optimization problems per frame are solved with specially tailored
data-parallel Gauss-Newton solvers. In order to achieve real-time performance
of over 25Hz, we design a pipelined parallel architecture using the CPU and two
commodity GPUs. Our method is the first real-time monocular approach for
full-body performance capture. Our method yields accuracy comparable to
offline performance capture techniques, while being orders of magnitude
faster.
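
As a rough illustration of the Gauss-Newton iteration named in the abstract above, the following toy sketch fits a single 2D rotation angle so that rotated points match target points. It is not the paper's data-parallel GPU solver or its energy formulation; the one-parameter problem, the function name gauss_newton_rotation, and the sample points are assumptions chosen to keep the example small.

```python
import math


def gauss_newton_rotation(points, targets, theta=0.0, iterations=10):
    """Estimate the angle theta minimizing sum ||R(theta) p - q||^2 by Gauss-Newton."""
    for _ in range(iterations):
        c, s = math.cos(theta), math.sin(theta)
        jtj = 0.0  # J^T J (a scalar, since there is one parameter)
        jtr = 0.0  # J^T r
        for (px, py), (qx, qy) in zip(points, targets):
            # Residual: rotated point minus target.
            rx = c * px - s * py - qx
            ry = s * px + c * py - qy
            # Derivative of the rotated point with respect to theta.
            jx = -s * px - c * py
            jy = c * px - s * py
            jtj += jx * jx + jy * jy
            jtr += jx * rx + jy * ry
        # Gauss-Newton step: solve (J^T J) delta = -J^T r.
        theta -= jtr / jtj
    return theta


# Hypothetical data: three points rotated by 0.5 rad.
true_angle = 0.5
points = [(1.0, 0.0), (0.0, 2.0), (-1.0, 1.0)]
targets = [
    (math.cos(true_angle) * x - math.sin(true_angle) * y,
     math.sin(true_angle) * x + math.cos(true_angle) * y)
    for x, y in points
]
estimate = gauss_newton_rotation(points, targets)
print(estimate)
```

Each iteration linearizes the residuals around the current estimate and solves the resulting normal equations. With more parameters, the scalar division becomes a linear solve.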