
    LiveCap: Real-time Human Performance Capture from Monocular Video

    We present the first real-time human performance capture approach that reconstructs dense, space-time coherent deforming geometry of entire humans in general everyday clothing from just a single RGB video. We propose a novel two-stage analysis-by-synthesis optimization whose formulation and implementation are designed for high performance. In the first stage, a skinned template model is jointly fitted to the background-subtracted input video, 2D and 3D skeleton joint positions found using a deep neural network, and a set of sparse facial landmark detections. In the second stage, dense non-rigid 3D deformations of skin and even loose apparel are captured based on a novel real-time capable algorithm for non-rigid tracking using dense photometric and silhouette constraints. Our novel energy formulation leverages automatically identified material regions on the template to model the differing non-rigid deformation behavior of skin and apparel. The two resulting non-linear per-frame optimization problems are solved with specially-tailored data-parallel Gauss-Newton solvers. In order to achieve real-time performance of over 25 Hz, we design a pipelined parallel architecture using the CPU and two commodity GPUs. Our method is the first real-time monocular approach for full-body performance capture. It yields accuracy comparable to off-line performance capture techniques, while being orders of magnitude faster.
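
    As a rough illustration of the per-frame optimization described above, the sketch below shows a plain, damped Gauss-Newton iteration for a stacked nonlinear least-squares energy. It is a minimal single-threaded analogue of the specially-tailored data-parallel GPU solvers the paper describes; the residual and Jacobian callbacks and the damping parameter are illustrative assumptions, not the authors' implementation.

    ```python
    import numpy as np

    def gauss_newton(residual_fn, jacobian_fn, x0, iters=10, damping=1e-6):
        """Minimize ||r(x)||^2 with damped Gauss-Newton steps.

        residual_fn(x) would stack e.g. photometric, silhouette and joint residuals;
        jacobian_fn(x) returns their Jacobian w.r.t. the pose/deformation parameters.
        Both callbacks are placeholders for whatever energy terms are being fitted.
        """
        x = x0.astype(float).copy()
        for _ in range(iters):
            r = residual_fn(x)
            J = jacobian_fn(x)
            H = J.T @ J + damping * np.eye(x.size)  # damped normal equations
            g = J.T @ r
            x -= np.linalg.solve(H, g)              # Gauss-Newton update
        return x
    ```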

    Statistical Model of Shape Moments with Active Contour Evolution for Shape Detection and Segmentation

    This paper describes a novel method for shape representation and robust image segmentation. The proposed method combines two well-known methodologies, namely statistical shape models and active contours implemented in a level set framework. Shape detection is achieved by maximizing a posterior function that consists of a prior shape probability model and an image likelihood function conditioned on shapes. The statistical shape model is built as the result of a learning process based on nonparametric probability estimation in a PCA-reduced feature space formed by the Legendre moments of training silhouette images. A greedy strategy is applied to optimize the proposed cost function by iteratively evolving an implicit active contour in the image space and subsequently performing a constrained optimization of the evolved shape in the reduced shape feature space. Experimental results presented in the paper demonstrate that the proposed method, contrary to many other active contour segmentation methods, is highly resilient to severe random and structural noise that could be present in the data.
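
    The sketch below illustrates the shape-feature construction the abstract refers to: Legendre moments of a binary silhouette, followed by a PCA reduction of a set of training moment vectors. It uses plain PCA rather than the paper's nonparametric density estimation in the reduced space, and the moment order, subspace dimension and function names are illustrative assumptions.

    ```python
    import numpy as np
    from numpy.polynomial.legendre import legval

    def legendre_moments(silhouette, order=8):
        """Legendre moments of a binary silhouette mapped onto [-1, 1] x [-1, 1]."""
        h, w = silhouette.shape
        x = np.linspace(-1.0, 1.0, w)
        y = np.linspace(-1.0, 1.0, h)
        Px = np.stack([legval(x, np.eye(order + 1)[p]) for p in range(order + 1)])  # P_p(x)
        Py = np.stack([legval(y, np.eye(order + 1)[q]) for q in range(order + 1)])  # P_q(y)
        dx, dy = 2.0 / (w - 1), 2.0 / (h - 1)
        lam = np.empty((order + 1, order + 1))
        for p in range(order + 1):
            for q in range(order + 1):
                norm = (2 * p + 1) * (2 * q + 1) / 4.0
                lam[p, q] = norm * np.sum(Py[q][:, None] * Px[p][None, :] * silhouette) * dx * dy
        return lam.ravel()

    def pca_shape_space(moment_vectors, k=10):
        """PCA of stacked training moment vectors: mean, top-k basis, and training coefficients."""
        mean = moment_vectors.mean(axis=0)
        _, _, Vt = np.linalg.svd(moment_vectors - mean, full_matrices=False)
        basis = Vt[:k]                              # top-k principal directions
        coeffs = (moment_vectors - mean) @ basis.T  # shapes expressed in the reduced space
        return mean, basis, coeffs
    ```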

    Review of Person Re-identification Techniques

    Person re-identification across different surveillance cameras with disjoint fields of view has become one of the most interesting and challenging subjects in the area of intelligent video surveillance. Although several methods have been developed and proposed, certain limitations and unresolved issues remain. In all of the existing re-identification approaches, feature vectors are extracted from segmented still images or video frames. Different similarity or dissimilarity measures have been applied to these vectors. Some methods have used simple constant metrics, whereas others have utilised models to obtain optimised metrics. Some have created models based on local colour or texture information, and others have built models based on the gait of people. In general, the main objective of all these approaches is to achieve a higher accuracy rate and lower computational cost. This study summarises several developments in recent literature and discusses the various available methods used in person re-identification. Specifically, their advantages and disadvantages are mentioned and compared. Comment: Published 201
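
    To make the distinction between constant and learned metrics concrete, the sketch below ranks gallery descriptors for one probe descriptor under either the plain Euclidean metric or a learned Mahalanobis-type metric represented by a positive semi-definite matrix M. The descriptor format and the matrix M are assumptions for illustration; the survey covers many other metric-learning formulations.

    ```python
    import numpy as np

    def metric_distance(x, y, M):
        """Squared distance between two re-ID descriptors under a learned metric M (PSD matrix)."""
        d = x - y
        return float(d @ M @ d)

    def rank_gallery(probe, gallery, M=None):
        """Rank gallery descriptors for one probe; M=None falls back to the Euclidean metric."""
        if M is None:
            M = np.eye(probe.size)
        dists = np.array([metric_distance(probe, g, M) for g in gallery])
        return np.argsort(dists)  # indices of gallery entries, best match first
    ```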

    A Survey on 2D Object Tracking in Digital Video

    This paper presents object tracking methods in video. Different algorithms based on rigid, non-rigid and articulated object tracking are studied. The goal of this article is to review the state-of-the-art tracking methods, classify them into different categories, and identify new trends. It is often the case that tracking objects in consecutive frames is supported by a prediction scheme. Based on information extracted from previous frames and any high-level information that can be obtained, the state (location) of the object is predicted. An excellent framework for prediction is the Kalman filter, which additionally estimates the prediction error. In complex scenes, instead of a single hypothesis, multiple hypotheses using a particle filter can be used. Different techniques are given for different types of constraints in video.
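
    The prediction scheme mentioned above is commonly realised as a constant-velocity Kalman filter over the object's image location. The sketch below is a minimal version of that idea; the state layout, time step and noise variances are illustrative assumptions rather than values from the survey.

    ```python
    import numpy as np

    class ConstantVelocityKalman:
        """Constant-velocity Kalman filter over the state [x, y, vx, vy] for 2D tracking."""

        def __init__(self, dt=1.0, process_var=1e-2, meas_var=1.0):
            self.F = np.array([[1, 0, dt, 0],
                               [0, 1, 0, dt],
                               [0, 0, 1,  0],
                               [0, 0, 0,  1]], dtype=float)  # state transition
            self.H = np.array([[1, 0, 0, 0],
                               [0, 1, 0, 0]], dtype=float)   # only the position is observed
            self.Q = process_var * np.eye(4)                  # process noise
            self.R = meas_var * np.eye(2)                     # measurement noise
            self.x = np.zeros(4)
            self.P = np.eye(4)

        def predict(self):
            """Predict the next state; returns the predicted location and its covariance."""
            self.x = self.F @ self.x
            self.P = self.F @ self.P @ self.F.T + self.Q
            return self.x[:2], self.P

        def update(self, z):
            """Correct the prediction with a measured location z = [x, y]."""
            y = z - self.H @ self.x                          # innovation (prediction error)
            S = self.H @ self.P @ self.H.T + self.R          # innovation covariance
            K = self.P @ self.H.T @ np.linalg.inv(S)         # Kalman gain
            self.x = self.x + K @ y
            self.P = (np.eye(4) - K @ self.H) @ self.P
    ```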

    Monocular Pose Estimation Based on Global and Local Features

    This thesis deals with several mathematical and practical aspects of the monocular pose estimation problem. Pose estimation means estimating the position and orientation of a model object with respect to a camera used as the sensor element. Three main aspects of the pose estimation problem are considered: model representations, correspondence search and pose computation. Free-form contours and surfaces are considered for the approaches presented in this work. The pose estimation problem and the global representation of free-form contours and surfaces are defined in the mathematical framework of conformal geometric algebra (CGA), which allows a compact and linear modeling of the monocular pose estimation scenario. Additionally, a new local representation of these entities is presented, which is also defined in CGA. Furthermore, it allows the extraction of local feature information of these models in 3D space and in the image plane. This local information is combined with the global contour information obtained from the global representations in order to improve the pose estimation algorithms. The main contribution of this work is the introduction of new variants of the iterative closest point (ICP) algorithm based on the combination of local and global features. Sets of compatible model and image features are obtained from the proposed local model representation of free-form contours. This makes it possible to translate the correspondence search problem onto the image plane and to use the feature information to develop new correspondence search criteria. The structural ICP algorithm is defined as a variant of the classical ICP algorithm with additional model and image structural constraints. Initially, this new variant is applied to planar 3D free-form contours. Then, the feature extraction process is adapted to the case of free-form surfaces, which allows the correlation ICP algorithm for free-form surfaces to be defined. In this case, the minimal Euclidean distance criterion is replaced by a feature correlation measure. The addition of structural information in the search process results in better-conditioned correspondences and therefore in a better computed pose. Furthermore, global information (position and orientation) is used in combination with the correlation ICP to simplify and improve the pre-alignment approaches for monocular pose estimation. Finally, all the presented approaches are combined to handle the pose estimation of surfaces when partial occlusions are present in the image. Experiments on synthetic and real data are presented to demonstrate the robustness and behavior of the new ICP variants in comparison with standard approaches.
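
    The core idea of the ICP variants described above, augmenting the nearest-neighbor criterion with feature compatibility, can be sketched as below in plain NumPy. This is not the thesis's CGA formulation: the feature descriptors, the weighting parameter alpha and the function name are assumptions used only to illustrate how a geometric distance and a feature dissimilarity can be mixed when choosing correspondences.

    ```python
    import numpy as np

    def feature_weighted_correspondences(model_pts, model_feats, image_pts, image_feats, alpha=0.5):
        """For each projected model point, pick the image point that minimizes a blend of
        Euclidean distance (classical ICP term) and feature dissimilarity (structural term)."""
        matches = []
        for p, f in zip(model_pts, model_feats):
            geo = np.linalg.norm(image_pts - p, axis=1)      # geometric nearest-neighbor cost
            feat = np.linalg.norm(image_feats - f, axis=1)   # local feature compatibility cost
            cost = (1.0 - alpha) * geo + alpha * feat        # alpha balances the two criteria
            matches.append(int(np.argmin(cost)))
        return np.asarray(matches)                           # index of the matched image point
    ```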

    MonoPerfCap: Human Performance Capture from Monocular Video

    We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid surface deformations in general scenes. Human performance capture is a challenging problem due to the large range of articulation, potentially fast motion, and considerable non-rigid deformations, even from multi-view data. Reconstruction from monocular video alone is drastically more challenging, since strong occlusions and the inherent depth ambiguity lead to a highly ill-posed reconstruction problem. We tackle these challenges by a novel approach that employs sparse 2D and 3D human pose detections from a convolutional neural network using a batch-based pose estimation strategy. Joint recovery of per-batch motion makes it possible to resolve the ambiguities of the monocular reconstruction problem based on a low-dimensional trajectory subspace. In addition, we propose refinement of the surface geometry based on fully automatically extracted silhouettes to enable medium-scale non-rigid alignment. We demonstrate state-of-the-art performance capture results that enable exciting applications such as video editing and free viewpoint video, previously infeasible from monocular video. Our qualitative and quantitative evaluation demonstrates that our approach significantly outperforms previous monocular methods in terms of accuracy, robustness and the scene complexity that can be handled. Comment: Accepted to ACM TOG 2018, to be presented at SIGGRAPH 201
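
    A low-dimensional trajectory subspace of the kind mentioned above is often realised as a truncated DCT basis over the frames of a batch, so each joint coordinate trajectory is described by a few smooth basis coefficients. The sketch below assumes such a DCT basis; the actual basis, batch length and dimensionality used by the paper may differ.

    ```python
    import numpy as np

    def dct_basis(T, k):
        """First k orthonormal DCT-II basis vectors over a batch of T frames (rows of shape (k, T))."""
        n = np.arange(T)
        B = np.array([np.cos(np.pi * (n + 0.5) * j / T) for j in range(k)])
        B[0] *= np.sqrt(1.0 / T)
        B[1:] *= np.sqrt(2.0 / T)
        return B

    def project_trajectory(traj, k=8):
        """Project one joint-coordinate trajectory (length T) onto the k-dim subspace and back."""
        B = dct_basis(len(traj), k)
        coeffs = B @ traj        # low-dimensional trajectory coefficients
        return B.T @ coeffs      # smooth reconstruction constrained to the subspace
    ```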