Co-Fusion: Real-time Segmentation, Tracking and Fusion of Multiple Objects
In this paper we introduce Co-Fusion, a dense SLAM system that takes a live
stream of RGB-D images as input and segments the scene into different objects
(using either motion or semantic cues) while simultaneously tracking and
reconstructing their 3D shape in real time. We use a multiple model fitting
approach where each object can move independently from the background and still
be effectively tracked and its shape fused over time using only the information
from pixels associated with that object label. Previous attempts to deal with
dynamic scenes have typically considered moving regions as outliers, and
consequently do not model their shape or track their motion over time. In
contrast, we enable the robot to maintain 3D models for each of the segmented
objects and to improve them over time through fusion. As a result, our system
can enable a robot to maintain a scene description at the object level which
has the potential to allow interactions with its working environment, even in
the case of dynamic scenes.
Comment: International Conference on Robotics and Automation (ICRA) 2017,
http://visual.cs.ucl.ac.uk/pubs/cofusion,
https://github.com/martinruenz/co-fusio
Skeleton Driven Non-rigid Motion Tracking and 3D Reconstruction
This paper presents a method which can track and 3D reconstruct the non-rigid
surface motion of human performance using a moving RGB-D camera. 3D
reconstruction of marker-less human performance is a challenging problem due to
the large range of articulated motions and considerable non-rigid deformations.
Current approaches use local optimization for tracking. These methods need many
iterations to converge and may get stuck in local minima during sudden
articulated movements. We propose a puppet model-based tracking approach using
a skeleton prior, which provides a better initialization for tracking articulated
movements. The proposed approach uses an aligned puppet model to estimate
correct correspondences for human performance capture. We also contribute a
synthetic dataset which provides ground truth locations for frame-by-frame
geometry and skeleton joints of human subjects. Experimental results show that
our approach is more robust when faced with sudden articulated motions, and
provides better 3D reconstruction compared to the existing state-of-the-art
approaches.
Comment: Accepted in DICTA 201
Non-rigid Reconstruction with a Single Moving RGB-D Camera
We present a novel non-rigid reconstruction method using a moving RGB-D
camera. Current approaches use only the non-rigid part of the scene and completely
ignore the rigid background. Non-rigid parts often lack sufficient geometric
and photometric information for tracking large frame-to-frame motion. Our
approach uses the camera pose estimated from the rigid background for foreground
tracking. This enables robust foreground tracking in situations where large
frame-to-frame motion occurs. Moreover, we propose a multi-scale
deformation graph which improves non-rigid tracking without compromising the
quality of the reconstruction. We also contribute a synthetic dataset
which is made publicly available for evaluating non-rigid reconstruction
methods. The dataset provides frame-by-frame ground truth geometry of the
scene, the camera trajectory, and masks for background and foreground. Experimental
results show that our approach is more robust in handling larger frame-to-frame
motions and provides better reconstruction compared to state-of-the-art
approaches.
Comment: Accepted in International Conference on Pattern Recognition (ICPR 2018)
MonoPerfCap: Human Performance Capture from Monocular Video
We present the first marker-less approach for temporally coherent 3D
performance capture of a human with general clothing from monocular video. Our
approach reconstructs articulated human skeleton motion as well as medium-scale
non-rigid surface deformations in general scenes. Human performance capture is
a challenging problem due to the large range of articulation, potentially fast
motion, and considerable non-rigid deformations, even from multi-view data.
Reconstruction from monocular video alone is drastically more challenging,
since strong occlusions and the inherent depth ambiguity lead to a highly
ill-posed reconstruction problem. We tackle these challenges with a novel
approach that employs sparse 2D and 3D human pose detections from a
convolutional neural network using a batch-based pose estimation strategy.
Joint recovery of per-batch motion allows us to resolve the ambiguities of the
monocular reconstruction problem based on a low dimensional trajectory
subspace. In addition, we propose refinement of the surface geometry based on
fully automatically extracted silhouettes to enable medium-scale non-rigid
alignment. We demonstrate state-of-the-art performance capture results that
enable exciting applications such as video editing and free viewpoint video,
previously infeasible from monocular video. Our qualitative and quantitative
evaluation demonstrates that our approach significantly outperforms previous
monocular methods in terms of accuracy, robustness and scene complexity that
can be handled.
Comment: Accepted to ACM TOG 2018, to be presented at SIGGRAPH 201
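The low-dimensional trajectory subspace mentioned in the abstract can be illustrated with a minimal sketch. It assumes a truncated DCT basis, which is a common choice for smooth motion but is not stated in the abstract; the function names, batch length, and basis size are hypothetical and are not taken from the paper.

```python
import numpy as np

def dct_basis(num_frames, num_coeffs):
    """Orthonormal DCT-II basis, shape (num_frames, num_coeffs).

    Assumption: a low-frequency DCT basis stands in for the
    'low dimensional trajectory subspace' named in the abstract.
    """
    t = np.arange(num_frames)[:, None]
    k = np.arange(num_coeffs)[None, :]
    basis = np.cos(np.pi * (t + 0.5) * k / num_frames)
    basis[:, 0] *= np.sqrt(1.0 / num_frames)
    basis[:, 1:] *= np.sqrt(2.0 / num_frames)
    return basis

def project_to_subspace(trajectory, basis):
    """Least-squares fit of a (num_frames, 3) trajectory to the basis.

    Because the basis columns are orthonormal, the least-squares
    coefficients are simply basis.T @ trajectory.
    """
    coeffs = basis.T @ trajectory
    return basis @ coeffs

def demo(seed=0, num_frames=50, num_coeffs=8):
    """Toy batch: one joint's 3D path lying in the subspace, plus noise."""
    rng = np.random.default_rng(seed)
    basis = dct_basis(num_frames, num_coeffs)
    clean = basis @ rng.standard_normal((num_coeffs, 3))
    noisy = clean + 0.1 * rng.standard_normal((num_frames, 3))
    smoothed = project_to_subspace(noisy, basis)
    err_noisy = np.linalg.norm(noisy - clean)
    err_smoothed = np.linalg.norm(smoothed - clean)
    return err_noisy, err_smoothed
```

Calling `demo()` returns the error of the noisy per-frame estimates and the error after projection onto the subspace. In this toy setup the projection discards the noise components outside the subspace, which is one way a per-batch constraint can reduce per-frame ambiguity.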
Articulation-aware Canonical Surface Mapping
We tackle the tasks of: 1) predicting a Canonical Surface Mapping (CSM) that
indicates the mapping from 2D pixels to corresponding points on a canonical
template shape, and 2) inferring the articulation and pose of the template
corresponding to the input image. While previous approaches rely on keypoint
supervision for learning, we present an approach that can learn without such
annotations. Our key insight is that these tasks are geometrically related, and
we can obtain supervisory signal via enforcing consistency among the
predictions. We present results across a diverse set of animal object
categories, showing that our method can learn articulation and CSM prediction
from image collections using only foreground mask labels for training. We
empirically show that allowing articulation helps learn more accurate CSM
prediction, and that enforcing the consistency with predicted CSM is similarly
critical for learning meaningful articulation.
Comment: To appear at CVPR 2020, project page
https://nileshkulkarni.github.io/acsm
Multi-body Non-rigid Structure-from-Motion
Conventional structure-from-motion (SFM) research is primarily concerned with
the 3D reconstruction of a single, rigidly moving object seen by a static
camera, or a static and rigid scene observed by a moving camera; in both cases
there is only one relative rigid motion involved. Recent progress has
extended SFM to multi-body SFM (where there are multiple rigid
relative motions in the scene), as well as non-rigid SFM (where there is a
single non-rigid, deformable object or scene). Along this line of thinking,
there is an apparent gap, "multi-body non-rigid SFM", in which the
task would be to jointly reconstruct and segment the 3D structures of
multiple non-rigid objects or deformable scenes from images. Such a multi-body
non-rigid scenario is common in reality (e.g. two persons shaking hands, or a
multi-person social event), and how to solve it represents a natural
next step in SFM research. By leveraging recent results of subspace
clustering, this paper proposes, for the first time, an effective framework for
multi-body NRSFM, which simultaneously reconstructs the 3D trajectories and
segments each of them into its respective low-dimensional subspace. Under our
formulation, 3D trajectories for each non-rigid structure can be well
approximated with a sparse affine combination of other 3D trajectories from the
same structure (self-expressiveness). We solve the resultant optimization with
the alternating direction method of multipliers (ADMM). We demonstrate the
efficacy of the proposed framework through extensive experiments on both
synthetic and real data sequences. Our method clearly outperforms other
alternative methods, such as first clustering the 2D feature tracks to groups
and then doing non-rigid reconstruction in each group or first conducting 3D
reconstruction by using single subspace assumption and then clustering the 3D
trajectories into groups.Comment: 21 pages, 16 figure
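The self-expressiveness idea and the ADMM solver named in the abstract can be sketched on toy data. This is a simplified illustration under stated assumptions, not the paper's formulation: it solves only a sparse self-representation problem, min 0.5*||X - XC||_F^2 + lam*||C||_1 subject to diag(C) = 0, with the trajectories already given; it omits the affine constraint and the joint reconstruction step. All function names and parameter values are hypothetical.

```python
import numpy as np

def soft_threshold(M, t):
    """Element-wise shrinkage, the proximal operator of t * ||.||_1."""
    return np.sign(M) * np.maximum(np.abs(M) - t, 0.0)

def self_expressive_admm(X, lam=0.05, rho=1.0, n_iter=200):
    """Sparse self-representation via ADMM (simplified, no affine constraint).

    X: (D, N) array whose columns are trajectories.
    Returns C of shape (N, N) with zero diagonal such that X ~ X @ C.
    """
    N = X.shape[1]
    G = X.T @ X
    inv = np.linalg.inv(G + rho * np.eye(N))
    C = np.zeros((N, N))
    U = np.zeros((N, N))  # scaled dual variable for the constraint A = C
    for _ in range(n_iter):
        A = inv @ (G + rho * (C - U))         # quadratic data-term update
        C = soft_threshold(A + U, lam / rho)  # sparsity update
        np.fill_diagonal(C, 0.0)              # a column may not represent itself
        U += A - C                            # dual ascent step
    return C

def toy_trajectories(seed=0, ambient=10, dim=2, per_group=20):
    """Two groups of unit-norm columns, each drawn from its own random subspace."""
    rng = np.random.default_rng(seed)
    blocks = []
    for _ in range(2):
        basis = rng.standard_normal((ambient, dim))
        blocks.append(basis @ rng.standard_normal((dim, per_group)))
    X = np.hstack(blocks)
    return X / np.linalg.norm(X, axis=0, keepdims=True)

def within_group_fraction(C, per_group):
    """Share of |C| mass linking columns that belong to the same group."""
    W = np.abs(C)
    labels = np.arange(W.shape[0]) // per_group
    same = labels[:, None] == labels[None, :]
    return W[same].sum() / W.sum()
```

On the toy data, most of the weight in `C` should connect columns from the same subspace, so `np.abs(C) + np.abs(C).T` can serve as an affinity matrix for a subsequent spectral clustering step that recovers the groups.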