Non-rigid Reconstruction with a Single Moving RGB-D Camera
We present a novel non-rigid reconstruction method using a moving RGB-D
camera. Current approaches use only the non-rigid parts of the scene and completely
ignore the rigid background. Non-rigid parts often lack sufficient geometric
and photometric information for tracking large frame-to-frame motion. Our
approach uses camera pose estimated from the rigid background for foreground
tracking. This enables robust foreground tracking in situations where large
frame-to-frame motion occurs. Moreover, we propose a multi-scale
deformation graph which improves non-rigid tracking without compromising the
quality of the reconstruction. We also contribute a synthetic dataset, made
publicly available, for evaluating non-rigid reconstruction methods. The
dataset provides frame-by-frame ground-truth geometry of the scene, the camera
trajectory, and background/foreground masks. Experimental
results show that our approach is more robust in handling larger frame-to-frame
motions and provides better reconstruction compared to state-of-the-art
approaches.
Comment: Accepted at the International Conference on Pattern Recognition (ICPR 2018).
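A minimal sketch of the core idea in this abstract: use the camera pose estimated from the rigid background to pre-warp the foreground before non-rigid tracking, so large frame-to-frame motion is already roughly compensated. Names and the NumPy formulation are illustrative assumptions, not the paper's implementation.
```python
import numpy as np

def warp_foreground_with_camera_pose(fg_vertices, T_prev_to_curr):
    """Rigidly pre-warp foreground vertices with the camera motion estimated
    from the static background, giving the non-rigid tracker a coarse
    initialization that already accounts for large frame-to-frame motion."""
    # Homogeneous coordinates: (N, 4)
    v_h = np.hstack([fg_vertices, np.ones((fg_vertices.shape[0], 1))])
    return (T_prev_to_curr @ v_h.T).T[:, :3]

# Hypothetical usage: T_prev_to_curr would come from rigid background tracking
# (e.g. ICP against a fused background model); fg_vertices could be the
# deformation-graph nodes of the previous frame.
T_prev_to_curr = np.eye(4)            # placeholder camera motion
fg_vertices = np.random.rand(100, 3)  # placeholder foreground nodes
init_nodes = warp_foreground_with_camera_pose(fg_vertices, T_prev_to_curr)
```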
Co-Fusion: Real-time Segmentation, Tracking and Fusion of Multiple Objects
In this paper we introduce Co-Fusion, a dense SLAM system that takes a live
stream of RGB-D images as input and segments the scene into different objects
(using either motion or semantic cues) while simultaneously tracking and
reconstructing their 3D shape in real time. We use a multiple-model fitting
approach where each object can move independently of the background and still
be effectively tracked, with its shape fused over time using only the information
from pixels associated with that object's label. Previous attempts to deal with
dynamic scenes have typically considered moving regions as outliers, and
consequently do not model their shape or track their motion over time. In
contrast, we enable the robot to maintain 3D models for each of the segmented
objects and to improve them over time through fusion. As a result, our system
can enable a robot to maintain a scene description at the object level which
has the potential to allow interactions with its working environment; even in
the case of dynamic scenes.
Comment: International Conference on Robotics and Automation (ICRA) 2017,
http://visual.cs.ucl.ac.uk/pubs/cofusion,
https://github.com/martinruenz/co-fusion
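A toy illustration of the label-gated fusion described above: each object keeps its own model and is updated only from pixels carrying that object's label. The running depth average stands in for volumetric fusion of a 3D model; all names are hypothetical.
```python
import numpy as np

def fuse_per_object(models, depth, labels):
    """Label-gated fusion: each object model is updated only from the pixels
    assigned to its label (a running average of depth stands in here for
    TSDF-style fusion of a full 3D model)."""
    for obj_id, model in models.items():
        mask = labels == obj_id
        if not mask.any():
            continue
        # Running weighted average as a stand-in for volumetric fusion.
        model["depth_sum"][mask] += depth[mask]
        model["weight"][mask] += 1.0

# Hypothetical usage with two labels (0 = background, 1 = a moving object).
h, w = 4, 4
models = {i: {"depth_sum": np.zeros((h, w)), "weight": np.zeros((h, w))}
          for i in (0, 1)}
depth = np.full((h, w), 2.0)
labels = np.zeros((h, w), dtype=int)
labels[:, 2:] = 1
fuse_per_object(models, depth, labels)
```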
Motion Cooperation: Smooth Piece-Wise Rigid Scene Flow from RGB-D Images
We propose a novel joint registration and segmentation approach to estimate scene flow from RGB-D images. Instead of assuming the scene to be composed of a number of independent rigidly-moving parts, we use non-binary labels to capture non-rigid deformations at transitions between
the rigid parts of the scene. Thus, the velocity of any point can be computed as a linear combination (interpolation) of the estimated rigid motions, which provides better results
than traditional sharp piecewise segmentations. Within a variational framework, the smooth segments of the scene and their corresponding rigid velocities are alternately refined
until convergence. A K-means-based segmentation is employed as an initialization, and the number of regions is subsequently adapted during the optimization process to capture any arbitrary number of independently moving objects.
We evaluate our approach with both synthetic and
real RGB-D images that contain varied and large motions. The experiments show that our method estimates the scene flow more accurately than the most recent works in the field and at the same time provides a meaningful segmentation of the scene based on 3D motion.
Funding: Universidad de Málaga; Campus de Excelencia Internacional Andalucía Tech; Spanish Government grant programs FPI-MICINN 2012 and DPI2014-55826-R (co-funded by the European Regional Development Fund); EU ERC grant Convex Vision (grant agreement no. 240168).
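A small sketch of the interpolation idea in this abstract: the velocity of a point is a convex combination of the estimated rigid motions, weighted by the non-binary segment labels at that point. The twist parameterization (angular velocity plus linear velocity) and all symbols are illustrative assumptions.
```python
import numpy as np

def interpolated_velocity(p, weights, omegas, vs):
    """Velocity of point p as a convex combination of K rigid motions, each
    given by an angular velocity omega_k and a linear velocity v_k:
        vel(p) = sum_k w_k(p) * (omega_k x p + v_k)
    weights are the non-binary segment labels at p and should sum to 1."""
    vel = np.zeros(3)
    for w, omega, v in zip(weights, omegas, vs):
        vel += w * (np.cross(omega, p) + v)
    return vel

# Hypothetical example: a point halfway between two rigid segments.
p = np.array([0.5, 0.0, 1.0])
weights = [0.5, 0.5]
omegas = [np.array([0.0, 0.1, 0.0]), np.zeros(3)]
vs = [np.zeros(3), np.array([0.05, 0.0, 0.0])]
print(interpolated_velocity(p, weights, omegas, vs))
```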
MonoPerfCap: Human Performance Capture from Monocular Video
We present the first marker-less approach for temporally coherent 3D
performance capture of a human with general clothing from monocular video. Our
approach reconstructs articulated human skeleton motion as well as medium-scale
non-rigid surface deformations in general scenes. Human performance capture is
a challenging problem due to the large range of articulation, potentially fast
motion, and considerable non-rigid deformations, even from multi-view data.
Reconstruction from monocular video alone is drastically more challenging,
since strong occlusions and the inherent depth ambiguity lead to a highly
ill-posed reconstruction problem. We tackle these challenges with a novel
approach that employs sparse 2D and 3D human pose detections from a
convolutional neural network using a batch-based pose estimation strategy.
Joint recovery of per-batch motion allows us to resolve the ambiguities of the
monocular reconstruction problem based on a low-dimensional trajectory
subspace. In addition, we propose refinement of the surface geometry based on
fully automatically extracted silhouettes to enable medium-scale non-rigid
alignment. We demonstrate state-of-the-art performance capture results that
enable exciting applications such as video editing and free viewpoint video,
previously infeasible from monocular video. Our qualitative and quantitative
evaluation demonstrates that our approach significantly outperforms previous
monocular methods in terms of accuracy, robustness, and the scene complexity
that can be handled.
Comment: Accepted to ACM TOG 2018, to be presented at SIGGRAPH 2018.
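A brief sketch of a low-dimensional trajectory subspace of the kind the abstract relies on for batch-based monocular pose recovery. A truncated DCT basis is used here purely as an illustrative assumption; the paper's exact basis and projection are not reproduced.
```python
import numpy as np

def dct_trajectory_basis(num_frames, num_coeffs):
    """First few orthonormal DCT basis vectors over a batch of frames.
    Projecting per-frame trajectories onto this subspace regularizes the
    otherwise ambiguous monocular reconstruction."""
    t = np.arange(num_frames)
    basis = [np.ones(num_frames) / np.sqrt(num_frames)]
    for k in range(1, num_coeffs):
        basis.append(np.sqrt(2.0 / num_frames) *
                     np.cos(np.pi * (2 * t + 1) * k / (2 * num_frames)))
    return np.stack(basis, axis=1)  # shape (num_frames, num_coeffs)

# Hypothetical usage: project a noisy 1D joint trajectory onto the subspace.
B = dct_trajectory_basis(num_frames=50, num_coeffs=8)
traj = np.sin(np.linspace(0, 3, 50)) + 0.05 * np.random.randn(50)
smooth = B @ (B.T @ traj)  # least-squares projection (B has orthonormal columns)
```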
Multiframe Scene Flow with Piecewise Rigid Motion
We introduce a novel multiframe scene flow approach that jointly optimizes
the consistency of the patch appearances and their local rigid motions from
RGB-D image sequences. In contrast to the competing methods, we take advantage
of an oversegmentation of the reference frame and robust optimization
techniques. We formulate scene flow recovery as a global non-linear least
squares problem which is iteratively solved by a damped Gauss-Newton approach.
As a result, we obtain a qualitatively new level of accuracy in RGB-D based
scene flow estimation which can potentially run in real-time. Our method can
handle challenging cases with rigid, piecewise rigid, articulated and moderate
non-rigid motion, and does not rely on prior knowledge about the types of
motions and deformations. Extensive experiments on synthetic and real data show
that our method outperforms the state of the art.
Comment: International Conference on 3D Vision (3DV), Qingdao, China, October
2017.
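A generic sketch of the damped Gauss-Newton iteration the abstract refers to for solving a global non-linear least-squares problem. The solver structure is standard; the toy residual, Jacobian, and all names are illustrative assumptions, not the paper's energy.
```python
import numpy as np

def damped_gauss_newton(residual, jacobian, x0, damping=1e-3, iters=20):
    """Damped Gauss-Newton (Levenberg-Marquardt style) loop for
    min_x ||r(x)||^2, with user-supplied residual r and Jacobian J."""
    x = x0.copy()
    for _ in range(iters):
        r = residual(x)
        J = jacobian(x)
        H = J.T @ J + damping * np.eye(x.size)  # damped normal equations
        step = np.linalg.solve(H, -J.T @ r)
        x += step
        if np.linalg.norm(step) < 1e-10:
            break
    return x

# Hypothetical 1-parameter example: fit a to data y = exp(a * t).
t = np.linspace(0, 1, 20)
y = np.exp(0.7 * t)
res = lambda x: np.exp(x[0] * t) - y
jac = lambda x: (t * np.exp(x[0] * t)).reshape(-1, 1)
print(damped_gauss_newton(res, jac, np.array([0.0])))
```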
Fast Multi-frame Stereo Scene Flow with Motion Segmentation
We propose a new multi-frame method for efficiently computing scene flow
(dense depth and optical flow) and camera ego-motion for a dynamic scene
observed from a moving stereo camera rig. Our technique also segments out
moving objects from the rigid scene. In our method, we first estimate the
disparity map and the 6-DOF camera motion using stereo matching and visual
odometry. We then identify regions inconsistent with the estimated camera
motion and compute per-pixel optical flow only at these regions. This flow
proposal is fused with the camera motion-based flow proposal using fusion moves
to obtain the final optical flow and motion segmentation. This unified
framework benefits all four tasks (stereo, optical flow, visual odometry, and
motion segmentation), leading to overall higher accuracy and efficiency. Our
method is currently ranked third on the KITTI 2015 scene flow benchmark.
Furthermore, our CPU implementation runs in 2-3 seconds per frame, which is 1-3
orders of magnitude faster than the top six methods. We also report a thorough
evaluation on challenging Sintel sequences with fast camera and object motion,
where our method consistently outperforms OSF [Menze and Geiger, 2015], which
is currently ranked second on the KITTI benchmark.
Comment: 15 pages. To appear at the IEEE Conference on Computer Vision and
Pattern Recognition (CVPR 2017). Our results were submitted to the KITTI 2015
Stereo Scene Flow Benchmark in November 2016.
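A compact sketch of the consistency test described in this abstract: compute the flow that camera motion alone would induce (back-project with depth, apply the 6-DOF motion, re-project) and flag pixels whose observed flow disagrees as candidate moving objects. The names and the simple threshold test are illustrative assumptions, not the paper's exact formulation or its fusion-move step.
```python
import numpy as np

def rigid_flow_and_inconsistency(depth, K, T, flow_obs, thresh=3.0):
    """Flow induced by camera motion alone, plus a per-pixel inconsistency
    mask marking pixels whose observed flow deviates by more than `thresh`
    pixels from the camera-motion-based flow proposal."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    pts = (np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)   # back-project
    pts_h = np.vstack([pts, np.ones((1, pts.shape[1]))])
    proj = K @ (T @ pts_h)[:3]                                # move + re-project
    uv_new = (proj[:2] / proj[2]).T.reshape(h, w, 2)
    rigid_flow = uv_new - np.stack([u, v], axis=-1)
    inconsistent = np.linalg.norm(flow_obs - rigid_flow, axis=-1) > thresh
    return rigid_flow, inconsistent

# Hypothetical usage with identity camera motion (rigid flow is then zero).
K = np.array([[500.0, 0, 32], [0, 500.0, 24], [0, 0, 1]])
depth = np.full((48, 64), 5.0)
flow_obs = np.zeros((48, 64, 2))
rf, mask = rigid_flow_and_inconsistency(depth, K, np.eye(4), flow_obs)
```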