26,224 research outputs found
SurfelWarp: Efficient Non-Volumetric Single View Dynamic Reconstruction
We contribute a dense SLAM system that takes a live stream of depth images as
input and reconstructs non-rigid deforming scenes in real time, without
templates or prior models. In contrast to existing approaches, we do not
maintain any volumetric data structures, such as truncated signed distance
function (TSDF) fields or deformation fields, which are performance and memory
intensive. Our system works with a flat point (surfel) based representation of
geometry, which can be directly acquired from commodity depth sensors. Standard
graphics pipelines and general purpose GPU (GPGPU) computing are leveraged for
all central operations: i.e., nearest neighbor maintenance, non-rigid
deformation field estimation and fusion of depth measurements. Our pipeline
inherently avoids expensive volumetric operations such as marching cubes,
volumetric fusion and dense deformation field update, leading to significantly
improved performance. Furthermore, the explicit and flexible surfel based
geometry representation enables efficient tackling of topology changes and
tracking failures, which makes our reconstructions consistent with updated
depth observations. Our system allows robots to maintain a scene description
with non-rigidly deformed objects that potentially enables interactions with
dynamic working environments.Comment: RSS 2018. The video and source code are available on
https://sites.google.com/view/surfelwarp/hom
Recommended from our members
An evaluation framework for stereo-based driver assistance
This is the post-print version of the Article - Copyright @ 2012 Springer VerlagThe accuracy of stereo algorithms or optical flow methods is commonly assessed by comparing the results against the Middlebury
database. However, equivalent data for automotive or robotics applications
rarely exist as they are difficult to obtain. As our main contribution, we introduce an evaluation framework tailored for stereo-based driver assistance able to deliver excellent performance measures while
circumventing manual label effort. Within this framework one can combine several ways of ground-truthing, different comparison metrics, and use large image databases.
Using our framework we show examples on several types of ground truthing techniques: implicit ground truthing (e.g. sequence recorded without a crash occurred), robotic vehicles with high precision sensors, and to a small extent, manual labeling. To show the effectiveness of our evaluation framework we compare three different stereo algorithms on
pixel and object level. In more detail we evaluate an intermediate representation
called the Stixel World. Besides evaluating the accuracy of the Stixels, we investigate the completeness (equivalent to the detection rate) of the StixelWorld vs. the number of phantom Stixels. Among many findings, using this framework enables us to reduce the number of phantom Stixels by a factor of three compared to the base parametrization. This base parametrization has already been optimized by test driving vehicles for distances exceeding 10000 km
XNect: Real-time Multi-Person 3D Motion Capture with a Single RGB Camera
We present a real-time approach for multi-person 3D motion capture at over 30
fps using a single RGB camera. It operates successfully in generic scenes which
may contain occlusions by objects and by other people. Our method operates in
subsequent stages. The first stage is a convolutional neural network (CNN) that
estimates 2D and 3D pose features along with identity assignments for all
visible joints of all individuals.We contribute a new architecture for this
CNN, called SelecSLS Net, that uses novel selective long and short range skip
connections to improve the information flow allowing for a drastically faster
network without compromising accuracy. In the second stage, a fully connected
neural network turns the possibly partial (on account of occlusion) 2Dpose and
3Dpose features for each subject into a complete 3Dpose estimate per
individual. The third stage applies space-time skeletal model fitting to the
predicted 2D and 3D pose per subject to further reconcile the 2D and 3D pose,
and enforce temporal coherence. Our method returns the full skeletal pose in
joint angles for each subject. This is a further key distinction from previous
work that do not produce joint angle results of a coherent skeleton in real
time for multi-person scenes. The proposed system runs on consumer hardware at
a previously unseen speed of more than 30 fps given 512x320 images as input
while achieving state-of-the-art accuracy, which we will demonstrate on a range
of challenging real-world scenes.Comment: To appear in ACM Transactions on Graphics (SIGGRAPH) 202
XNect: Real-time Multi-person 3D Human Pose Estimation with a Single RGB Camera
We present a real-time approach for multi-person 3D motion capture at over 30 fps using a single RGB camera. It operates in generic scenes and is robust to difficult occlusions both by other people and objects. Our method operates in subsequent stages. The first stage is a convolutional neural network (CNN) that estimates 2D and 3D pose features along with identity assignments for all visible joints of all individuals. We contribute a new architecture for this CNN, called SelecSLS Net, that uses novel selective long and short range skip connections to improve the information flow allowing for a drastically faster network without compromising accuracy. In the second stage, a fully-connected neural network turns the possibly partial (on account of occlusion) 2D pose and 3D pose features for each subject into a complete 3D pose estimate per individual. The third stage applies space-time skeletal model fitting to the predicted 2D and 3D pose per subject to further reconcile the 2D and 3D pose, and enforce temporal coherence. Our method returns the full skeletal pose in joint angles for each subject. This is a further key distinction from previous work that neither extracted global body positions nor joint angle results of a coherent skeleton in real time for multi-person scenes. The proposed system runs on consumer hardware at a previously unseen speed of more than 30 fps given 512x320 images as input while achieving state-of-the-art accuracy, which we will demonstrate on a range of challenging real-world scenes
Shape Animation with Combined Captured and Simulated Dynamics
We present a novel volumetric animation generation framework to create new
types of animations from raw 3D surface or point cloud sequence of captured
real performances. The framework considers as input time incoherent 3D
observations of a moving shape, and is thus particularly suitable for the
output of performance capture platforms. In our system, a suitable virtual
representation of the actor is built from real captures that allows seamless
combination and simulation with virtual external forces and objects, in which
the original captured actor can be reshaped, disassembled or reassembled from
user-specified virtual physics. Instead of using the dominant surface-based
geometric representation of the capture, which is less suitable for volumetric
effects, our pipeline exploits Centroidal Voronoi tessellation decompositions
as unified volumetric representation of the real captured actor, which we show
can be used seamlessly as a building block for all processing stages, from
capture and tracking to virtual physic simulation. The representation makes no
human specific assumption and can be used to capture and re-simulate the actor
with props or other moving scenery elements. We demonstrate the potential of
this pipeline for virtual reanimation of a real captured event with various
unprecedented volumetric visual effects, such as volumetric distortion,
erosion, morphing, gravity pull, or collisions
Wing and body motion during flight initiation in Drosophila revealed by automated visual tracking
The fruit fly Drosophila melanogaster is a widely used model organism in studies of genetics, developmental biology and biomechanics. One limitation for exploiting Drosophila as a model system for behavioral neurobiology is that measuring body kinematics during behavior is labor intensive and subjective. In order to quantify flight kinematics during different types of maneuvers, we have developed a visual tracking system that estimates the posture of the fly from multiple calibrated cameras. An accurate geometric fly model is designed using unit quaternions to capture complex body and wing rotations, which are automatically fitted to the images in each time frame. Our approach works across a range of flight behaviors, while also being robust to common environmental clutter. The tracking system is used in this paper to compare wing and body motion during both voluntary and escape take-offs. Using our automated algorithms, we are able to measure stroke amplitude, geometric angle of attack and other parameters important to a mechanistic understanding of flapping flight. When compared with manual tracking methods, the algorithm estimates body position within 4.4±1.3% of the body length, while body orientation is measured within 6.5±1.9 deg. (roll), 3.2±1.3 deg. (pitch) and 3.4±1.6 deg. (yaw) on average across six videos. Similarly, stroke amplitude and deviation are estimated within 3.3 deg. and 2.1 deg., while angle of attack is typically measured within 8.8 deg. comparing against a human digitizer. Using our automated tracker, we analyzed a total of eight voluntary and two escape take-offs. These sequences show that Drosophila melanogaster do not utilize clap and fling during take-off and are able to modify their wing kinematics from one wingstroke to the next. Our approach should enable biomechanists and ethologists to process much larger datasets than possible at present and, therefore, accelerate insight into the mechanisms of free-flight maneuvers of flying insects
- …