
    Hi4D: 4D Instance Segmentation of Close Human Interaction

    We propose Hi4D, a method and dataset for the automatic analysis of physically close human-human interaction under prolonged contact. Robustly disentangling several in-contact subjects is a challenging task due to occlusions and complex shapes; hence, existing multi-view systems typically fuse the 3D surfaces of close subjects into a single, connected mesh. To address this issue we leverage i) individually fitted neural implicit avatars and ii) an alternating optimization scheme that refines pose and surface through periods of close proximity, and thus iii) segment the fused raw scans into individual instances. From these instances we compile the Hi4D dataset of 4D textured scans of 20 subject pairs, 100 sequences, and a total of more than 11K frames. Hi4D contains rich interaction-centric annotations in 2D and 3D alongside accurately registered parametric body models. We define varied human pose and shape estimation tasks on this dataset and provide results from state-of-the-art methods on these benchmarks.

    Comment: Project page: https://yifeiyin04.github.io/Hi4D
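The alternating optimization scheme (item ii above) can be illustrated with a toy coordinate-descent loop that alternately refines a "pose" variable and a "surface" variable while holding the other fixed. The quadratic objective, variable names, step size, and iteration count below are illustrative assumptions, not Hi4D's actual fitting energy.

```python
# Toy sketch of alternating (coordinate-descent) optimization:
# alternately refine "pose" and "surface" while the other is fixed.
# The coupled quadratic energy is a stand-in for the real objective.

def objective(pose, surface):
    # Minimized near pose = 1.0, surface = 2.0, with a coupling term
    # that loosely mimics pose/surface interdependence (assumption).
    return (pose - 1.0) ** 2 + (surface - 2.0) ** 2 + 0.1 * (pose - surface) ** 2

def refine(pose, surface, n_rounds=50, lr=0.2, eps=1e-6):
    for _ in range(n_rounds):
        # Step 1: refine pose with surface held fixed (numerical gradient).
        g = (objective(pose + eps, surface) - objective(pose - eps, surface)) / (2 * eps)
        pose -= lr * g
        # Step 2: refine surface with pose held fixed.
        g = (objective(pose, surface + eps) - objective(pose, surface - eps)) / (2 * eps)
        surface -= lr * g
    return pose, surface

pose, surface = refine(0.0, 0.0)
```

Each half-step only ever improves the shared objective, which is why alternating schemes of this kind converge robustly even when jointly optimizing all variables at once would be harder.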

    BodyNet: Volumetric Inference of 3D Human Body Shapes

    Human shape estimation is an important task for video editing, animation and the fashion industry. Predicting 3D human body shape from natural images, however, is highly challenging due to factors such as variation in human bodies, clothing and viewpoint. Prior methods addressing this problem typically attempt to fit parametric body models with certain priors on pose and shape. In this work we argue for an alternative representation and propose BodyNet, a neural network for direct inference of volumetric body shape from a single image. BodyNet is an end-to-end trainable network that benefits from (i) a volumetric 3D loss, (ii) a multi-view re-projection loss, and (iii) intermediate supervision of 2D pose, 2D body part segmentation, and 3D pose. Each of these yields a performance improvement, as demonstrated by our experiments. To evaluate the method, we fit the SMPL model to our network output and show state-of-the-art results on the SURREAL and Unite the People datasets, outperforming recent approaches. Besides achieving state-of-the-art performance, our method also enables volumetric body-part segmentation.

    Comment: Appears in: European Conference on Computer Vision 2018 (ECCV 2018). 27 pages.
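The loss terms in (i)-(iii) combine into a single training objective. The sketch below shows one plausible weighted multi-task loss over voxel occupancies, re-projections, and intermediate pose/part predictions; the weights, helper names, and toy list-based tensors are illustrative assumptions, not BodyNet's published formulation or values.

```python
import math

def bce(pred, target, eps=1e-7):
    # Mean binary cross-entropy over flattened occupancy/segmentation values.
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for p, t in zip(pred, target)) / len(pred)

def l2(pred, target):
    # Mean squared error, used here for re-projection and pose terms.
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def bodynet_style_loss(voxels, voxels_gt, reproj, reproj_gt,
                       pose2d, pose2d_gt, parts, parts_gt,
                       pose3d, pose3d_gt, w=(1.0, 0.5, 0.1, 0.1, 0.1)):
    # Weighted sum of: volumetric 3D loss, multi-view re-projection loss,
    # and intermediate 2D-pose / part-segmentation / 3D-pose supervision.
    # The weights w are placeholders, not the paper's values.
    terms = (bce(voxels, voxels_gt), l2(reproj, reproj_gt),
             l2(pose2d, pose2d_gt), bce(parts, parts_gt),
             l2(pose3d, pose3d_gt))
    return sum(wi * ti for wi, ti in zip(w, terms))

good = bodynet_style_loss([0.99, 0.01], [1, 0], [0.0], [0.0],
                          [0.0], [0.0], [0.99], [1], [0.0], [0.0])
bad = bodynet_style_loss([0.5, 0.5], [1, 0], [1.0], [0.0],
                         [1.0], [0.0], [0.5], [1], [1.0], [0.0])
```

A near-perfect prediction scores a much lower combined loss than a poor one, which is the property the intermediate supervision exploits during end-to-end training.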

    08291 Abstracts Collection -- Statistical and Geometrical Approaches to Visual Motion Analysis

    From 13.07.2008 to 18.07.2008, the Dagstuhl Seminar 08291 "Statistical and Geometrical Approaches to Visual Motion Analysis" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar, as well as abstracts of seminar results and ideas, are collected in this paper. The first section describes the seminar topics and goals in general.

    Human detection, tracking and segmentation from low-level to high-level vision

    The goal of this research is to detect, segment and track a human body, as well as estimate its limb configuration, against a cluttered background. These are fundamental research issues that have attracted intensive attention in the computer vision community because of their wide applications. Meanwhile, they also remain among the most challenging research issues, largely due to the ubiquitous visual ambiguities in images and videos. The other challenging factor is the ill-posed nature of the problems. Inspired by recent findings in cognitive psychology, we adopt several biologically plausible approaches to attack these challenging problems. This dissertation provides a comprehensive study of human detection, tracking and segmentation that covers research issues ranging from low- to high-level vision.

    In low-level vision, we investigate video segmentation, where the main challenge is the non-convex classification problem. We develop a cascaded multi-layer segmentation framework in which non-convex classification problems are addressed in a split-and-merge paradigm that combines the merits of statistical modeling and graph theory.

    In middle-level vision, we propose a segmentation-based hypothesis-and-test paradigm to achieve joint localization and segmentation that exploits the complementary nature of region-based and edge-based shape priors. In addition, we integrate both priors into a graph-cut framework to improve the segmentation results.

    In high-level vision, our research has two related parts. First, we propose a hybrid body representation that embraces part-whole shape priors and a part-based spatial prior for integrated pose recognition, localization and segmentation in a given image. Second, we further combine spatial and temporal priors in an integrated online learning and inference framework, where body parts can be detected, localized and segmented simultaneously from a video sequence. Both parts build on the preceding low-level and mid-level vision tasks.

    Experimental results show that the proposed algorithms achieve accurate and robust tracking, localization and segmentation results for different walking subjects with significant appearance and motion variability, against cluttered backgrounds.
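The split-and-merge paradigm mentioned for the low-level segmentation stage can be sketched on a 1-D signal: recursively split a region whose intensity variance is too high, then merge adjacent regions whose means are similar. The 1-D setting and both thresholds are illustrative assumptions, far simpler than the statistical and graph-theoretic machinery the dissertation actually combines.

```python
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def split(signal, lo, hi, var_thresh):
    # Split phase: recursively bisect [lo, hi) while the region is
    # too heterogeneous (variance above the threshold).
    if hi - lo <= 1 or variance(signal[lo:hi]) <= var_thresh:
        return [(lo, hi)]
    mid = (lo + hi) // 2
    return (split(signal, lo, mid, var_thresh)
            + split(signal, mid, hi, var_thresh))

def merge(signal, regions, mean_thresh):
    # Merge phase: fuse neighboring regions whose means are close.
    merged = [regions[0]]
    for lo, hi in regions[1:]:
        plo, phi = merged[-1]
        m_prev = sum(signal[plo:phi]) / (phi - plo)
        m_cur = sum(signal[lo:hi]) / (hi - lo)
        if abs(m_prev - m_cur) <= mean_thresh:
            merged[-1] = (plo, hi)  # absorb into the previous region
        else:
            merged.append((lo, hi))
    return merged

signal = [0, 0, 0, 0, 5, 5, 5, 5]
regions = merge(signal, split(signal, 0, len(signal), 0.5), 1.0)
```

On this toy step signal the split phase isolates the two flat halves and the merge phase keeps them separate, recovering the region boundary at index 4.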

    Learning Dense 3D Models from Monocular Video

    Reconstructing the dense, detailed 3D shape of dynamic scenes from monocular sequences is a challenging problem in computer vision. While robust and even real-time solutions to this problem exist when the observed scene is static, current systems for non-rigid dense shape capture are typically restricted to using complex multi-camera rigs, to exploiting the additional depth channel of RGB-D cameras, or to specific shapes such as faces or planar surfaces. In this thesis, we present two pieces of work for reconstructing dense generic shapes from monocular sequences.

    In the first work, we propose an unsupervised approach to the challenging problem of simultaneously segmenting the scene into its constituent objects and reconstructing a 3D model of the scene. The strength of our approach comes from its ability to deal with real-world dynamic scenes and to handle different types of motion seamlessly: rigid, articulated and non-rigid. We formulate the problem as a hierarchical graph-cuts based segmentation in which we decompose the whole scene into background and foreground objects and model the complex motion of non-rigid or articulated objects as a set of overlapping rigid parts. To validate the capability of our approach to deal with real-world scenes, we provide 3D reconstructions of challenging videos from the YouTube Objects and KITTI datasets, among others.

    In the second work, we propose a direct approach for capturing the dense, detailed 3D geometry of generic, complex non-rigid meshes using a single camera. Our method takes a single RGB video as input; it can capture the deformations of generic shapes; and the depth estimation is dense, per-pixel and direct. We first reconstruct a dense 3D template of the shape of the object using a short rigid sequence, and subsequently perform online reconstruction of the non-rigid mesh as it evolves over time. In our experimental evaluation, we show a range of qualitative results on novel datasets and quantitative comparisons with stereo reconstruction.
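The second pipeline (a template from a short rigid sequence, then online non-rigid updates) can be caricatured with per-pixel depths: average the first few "rigid" frames into a template, then fuse each later observation with the running estimate through a temporal-smoothness weight. The flat depth-list representation, the weight lambda, and the closed-form blend below are assumptions for illustration only, not the thesis's actual method.

```python
def build_template(rigid_frames):
    # Template stage: average the depths observed during the short
    # rigid sequence (frames assumed pre-registered; an assumption).
    n = len(rigid_frames)
    return [sum(f[i] for f in rigid_frames) / n
            for i in range(len(rigid_frames[0]))]

def online_update(prev, observed, lam=2.0):
    # Online stage: per-pixel fusion minimizing
    #   (d - observed)^2 + lam * (d - prev)^2,
    # whose closed-form minimizer blends the new frame with the
    # running estimate; larger lam trusts the template/history more.
    return [(o + lam * p) / (1.0 + lam) for p, o in zip(prev, observed)]

template = build_template([[1.0, 2.0], [1.2, 1.8]])
depth = online_update(template, [1.5, 1.5])
```

The update pulls each pixel toward the new observation while the smoothness weight keeps the estimate anchored to the template, which is the intuition behind tracking a deforming mesh from a single RGB stream.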