24,209 research outputs found

    Co-Fusion: Real-time Segmentation, Tracking and Fusion of Multiple Objects

    Get PDF
    In this paper we introduce Co-Fusion, a dense SLAM system that takes a live stream of RGB-D images as input and segments the scene into different objects (using either motion or semantic cues) while simultaneously tracking and reconstructing their 3D shape in real time. We use a multiple model fitting approach where each object can move independently from the background and still be effectively tracked and its shape fused over time using only the information from pixels associated with that object label. Previous attempts to deal with dynamic scenes have typically considered moving regions as outliers, and consequently do not model their shape or track their motion over time. In contrast, we enable the robot to maintain 3D models for each of the segmented objects and to improve them over time through fusion. As a result, our system can enable a robot to maintain a scene description at the object level which has the potential to allow interactions with its working environment; even in the case of dynamic scenes.Comment: International Conference on Robotics and Automation (ICRA) 2017, http://visual.cs.ucl.ac.uk/pubs/cofusion, https://github.com/martinruenz/co-fusio

    Skeleton Driven Non-rigid Motion Tracking and 3D Reconstruction

    Full text link
    This paper presents a method which can track and 3D reconstruct the non-rigid surface motion of human performance using a moving RGB-D camera. 3D reconstruction of marker-less human performance is a challenging problem due to the large range of articulated motions and considerable non-rigid deformations. Current approaches use local optimization for tracking. These methods need many iterations to converge and may get stuck in local minima during sudden articulated movements. We propose a puppet model-based tracking approach using skeleton prior, which provides a better initialization for tracking articulated movements. The proposed approach uses an aligned puppet model to estimate correct correspondences for human performance capture. We also contribute a synthetic dataset which provides ground truth locations for frame-by-frame geometry and skeleton joints of human subjects. Experimental results show that our approach is more robust when faced with sudden articulated motions, and provides better 3D reconstruction compared to the existing state-of-the-art approaches.Comment: Accepted in DICTA 201

    Non-rigid Reconstruction with a Single Moving RGB-D Camera

    Full text link
    We present a novel non-rigid reconstruction method using a moving RGB-D camera. Current approaches use only non-rigid part of the scene and completely ignore the rigid background. Non-rigid parts often lack sufficient geometric and photometric information for tracking large frame-to-frame motion. Our approach uses camera pose estimated from the rigid background for foreground tracking. This enables robust foreground tracking in situations where large frame-to-frame motion occurs. Moreover, we are proposing a multi-scale deformation graph which improves non-rigid tracking without compromising the quality of the reconstruction. We are also contributing a synthetic dataset which is made publically available for evaluating non-rigid reconstruction methods. The dataset provides frame-by-frame ground truth geometry of the scene, the camera trajectory, and masks for background foreground. Experimental results show that our approach is more robust in handling larger frame-to-frame motions and provides better reconstruction compared to state-of-the-art approaches.Comment: Accepted in International Conference on Pattern Recognition (ICPR 2018

    MonoPerfCap: Human Performance Capture from Monocular Video

    Full text link
    We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem due to the large range of articulation, potentially fast motion, and considerable non-rigid deformations, even from multi-view data. Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem. We tackle these challenges by a novel approach that employs sparse 2D and 3D human pose detections from a convolutional neural network using a batch-based pose estimation strategy. Joint recovery of per-batch motion allows to resolve the ambiguities of the monocular reconstruction problem based on a low dimensional trajectory subspace. In addition, we propose refinement of the surface geometry based on fully automatically extracted silhouettes to enable medium-scale non-rigid alignment. We demonstrate state-of-the-art performance capture results that enable exciting applications such as video editing and free viewpoint video, previously infeasible from monocular video. Our qualitative and quantitative evaluation demonstrates that our approach significantly outperforms previous monocular methods in terms of accuracy, robustness and scene complexity that can be handled.Comment: Accepted to ACM TOG 2018, to be presented on SIGGRAPH 201

    Articulation-aware Canonical Surface Mapping

    Full text link
    We tackle the tasks of: 1) predicting a Canonical Surface Mapping (CSM) that indicates the mapping from 2D pixels to corresponding points on a canonical template shape, and 2) inferring the articulation and pose of the template corresponding to the input image. While previous approaches rely on keypoint supervision for learning, we present an approach that can learn without such annotations. Our key insight is that these tasks are geometrically related, and we can obtain supervisory signal via enforcing consistency among the predictions. We present results across a diverse set of animal object categories, showing that our method can learn articulation and CSM prediction from image collections using only foreground mask labels for training. We empirically show that allowing articulation helps learn more accurate CSM prediction, and that enforcing the consistency with predicted CSM is similarly critical for learning meaningful articulation.Comment: To appear at CVPR 2020, project page https://nileshkulkarni.github.io/acsm

    Multi-body Non-rigid Structure-from-Motion

    Get PDF
    Conventional structure-from-motion (SFM) research is primarily concerned with the 3D reconstruction of a single, rigidly moving object seen by a static camera, or a static and rigid scene observed by a moving camera --in both cases there are only one relative rigid motion involved. Recent progress have extended SFM to the areas of {multi-body SFM} (where there are {multiple rigid} relative motions in the scene), as well as {non-rigid SFM} (where there is a single non-rigid, deformable object or scene). Along this line of thinking, there is apparently a missing gap of "multi-body non-rigid SFM", in which the task would be to jointly reconstruct and segment multiple 3D structures of the multiple, non-rigid objects or deformable scenes from images. Such a multi-body non-rigid scenario is common in reality (e.g. two persons shaking hands, multi-person social event), and how to solve it represents a natural {next-step} in SFM research. By leveraging recent results of subspace clustering, this paper proposes, for the first time, an effective framework for multi-body NRSFM, which simultaneously reconstructs and segments each 3D trajectory into their respective low-dimensional subspace. Under our formulation, 3D trajectories for each non-rigid structure can be well approximated with a sparse affine combination of other 3D trajectories from the same structure (self-expressiveness). We solve the resultant optimization with the alternating direction method of multipliers (ADMM). We demonstrate the efficacy of the proposed framework through extensive experiments on both synthetic and real data sequences. Our method clearly outperforms other alternative methods, such as first clustering the 2D feature tracks to groups and then doing non-rigid reconstruction in each group or first conducting 3D reconstruction by using single subspace assumption and then clustering the 3D trajectories into groups.Comment: 21 pages, 16 figure
    • …
    corecore