553 research outputs found

    Expressive Body Capture: 3D Hands, Face, and Body from a Single Image

    Full text link
    To facilitate the analysis of human actions, interactions and emotions, we compute a 3D model of human body pose, hand pose, and facial expression from a single monocular image. To achieve this, we use thousands of 3D scans to train a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with fully articulated hands and an expressive face. Learning to regress the parameters of SMPL-X directly from images is challenging without paired images and 3D ground truth. Consequently, we follow the approach of SMPLify, which estimates 2D features and then optimizes model parameters to fit the features. We improve on SMPLify in several significant ways: (1) we detect 2D features corresponding to the face, hands, and feet and fit the full SMPL-X model to these; (2) we train a new neural network pose prior using a large MoCap dataset; (3) we define a new interpenetration penalty that is both fast and accurate; (4) we automatically detect gender and the appropriate body models (male, female, or neutral); (5) our PyTorch implementation achieves a speedup of more than 8x over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to both controlled images and images in the wild. We evaluate 3D accuracy on a new curated dataset comprising 100 images with pseudo ground-truth. This is a step towards automatic expressive human capture from monocular RGB data. The models, code, and data are available for research purposes at https://smpl-x.is.tue.mpg.de.Comment: To appear in CVPR 201

    End-to-end Recovery of Human Shape and Pose

    Full text link
    We describe Human Mesh Recovery (HMR), an end-to-end framework for reconstructing a full 3D mesh of a human body from a single RGB image. In contrast to most current methods that compute 2D or 3D joint locations, we produce a richer and more useful mesh representation that is parameterized by shape and 3D joint angles. The main objective is to minimize the reprojection loss of keypoints, which allow our model to be trained using images in-the-wild that only have ground truth 2D annotations. However, the reprojection loss alone leaves the model highly under constrained. In this work we address this problem by introducing an adversary trained to tell whether a human body parameter is real or not using a large database of 3D human meshes. We show that HMR can be trained with and without using any paired 2D-to-3D supervision. We do not rely on intermediate 2D keypoint detections and infer 3D pose and shape parameters directly from image pixels. Our model runs in real-time given a bounding box containing the person. We demonstrate our approach on various images in-the-wild and out-perform previous optimization based methods that output 3D meshes and show competitive results on tasks such as 3D joint location estimation and part segmentation.Comment: CVPR 2018, Project page with code: https://akanazawa.github.io/hmr

    Adjustable Method Based on Body Parts for Improving the Accuracy of 3D Reconstruction in Visually Important Body Parts from Silhouettes

    Full text link
    This research proposes a novel adjustable algorithm for reconstructing 3D body shapes from front and side silhouettes. Most recent silhouette-based approaches use a deep neural network trained by silhouettes and key points to estimate the shape parameters but cannot accurately fit the model to the body contours and consequently are struggling to cover detailed body geometry, especially in the torso. In addition, in most of these cases, body parts have the same accuracy priority, making the optimization harder and avoiding reaching the optimum possible result in essential body parts, like the torso, which is visually important in most applications, such as virtual garment fitting. In the proposed method, we can adjust the expected accuracy for each body part based on our purpose by assigning coefficients for the distance of each body part between the projected 3D body and 2D silhouettes. To measure this distance, we first recognize the correspondent body parts using body segmentation in both views. Then, we align individual body parts by 2D rigid registration and match them using pairwise matching. The objective function tries to minimize the distance cost for the individual body parts in both views based on distances and coefficients by optimizing the statistical model parameters. We also handle the slight variation in the degree of arms and limbs by matching the pose. We evaluate the proposed method with synthetic body meshes from the normalized S-SCAPE. The result shows that the algorithm can more accurately reconstruct visually important body parts with high coefficients.Comment: 16 pages, 17 image

    Geometric Expression Invariant 3D Face Recognition using Statistical Discriminant Models

    No full text
    Currently there is no complete face recognition system that is invariant to all facial expressions. Although humans find it easy to identify and recognise faces regardless of changes in illumination, pose and expression, producing a computer system with a similar capability has proved to be particularly di cult. Three dimensional face models are geometric in nature and therefore have the advantage of being invariant to head pose and lighting. However they are still susceptible to facial expressions. This can be seen in the decrease in the recognition results using principal component analysis when expressions are added to a data set. In order to achieve expression-invariant face recognition systems, we have employed a tensor algebra framework to represent 3D face data with facial expressions in a parsimonious space. Face variation factors are organised in particular subject and facial expression modes. We manipulate this using single value decomposition on sub-tensors representing one variation mode. This framework possesses the ability to deal with the shortcomings of PCA in less constrained environments and still preserves the integrity of the 3D data. The results show improved recognition rates for faces and facial expressions, even recognising high intensity expressions that are not in the training datasets. We have determined, experimentally, a set of anatomical landmarks that best describe facial expression e ectively. We found that the best placement of landmarks to distinguish di erent facial expressions are in areas around the prominent features, such as the cheeks and eyebrows. Recognition results using landmark-based face recognition could be improved with better placement. We looked into the possibility of achieving expression-invariant face recognition by reconstructing and manipulating realistic facial expressions. We proposed a tensor-based statistical discriminant analysis method to reconstruct facial expressions and in particular to neutralise facial expressions. The results of the synthesised facial expressions are visually more realistic than facial expressions generated using conventional active shape modelling (ASM). We then used reconstructed neutral faces in the sub-tensor framework for recognition purposes. The recognition results showed slight improvement. Besides biometric recognition, this novel tensor-based synthesis approach could be used in computer games and real-time animation applications

    Silhouette Body Measurement Benchmarks

    Get PDF
    Anthropometric body measurements are importantfor industrial design, garment fitting, medical diagnosis andergonomics. A number of methods have been proposed toestimate the body measurements from images, but progress hasbeen slow due to the lack of realistic and publicly availabledatasets. The existing works train and test on silhouettes of3D body meshes obtained by fitting a human body model tothe commercial CAESAR scans. In this work, we introduce theBODY-fit dataset that contains fitted meshes of 2,675 female and1,474 male 3D body scans. We unify evaluation on the CAESAR-fit and BODY-fit datasets by computing body measurements fromgeodesic surface paths as the ground truth and by generating two-view silhouette images. We also introduce BODY-rgb - a realisticdataset of 86 male and 108 female subjects captured with an RGBcamera and manually tape measured ground truth. We propose asimple yet effective deep CNN architecture as a baseline methodwhich obtains competitive accuracy on the three datasets.acceptedVersionPeer reviewe
    • …
    corecore