Expressive Body Capture: 3D Hands, Face, and Body from a Single Image
To facilitate the analysis of human actions, interactions and emotions, we
compute a 3D model of human body pose, hand pose, and facial expression from a
single monocular image. To achieve this, we use thousands of 3D scans to train
a new, unified, 3D model of the human body, SMPL-X, that extends SMPL with
fully articulated hands and an expressive face. Learning to regress the
parameters of SMPL-X directly from images is challenging without paired images
and 3D ground truth. Consequently, we follow the approach of SMPLify, which
estimates 2D features and then optimizes model parameters to fit the features.
We improve on SMPLify in several significant ways: (1) we detect 2D features
corresponding to the face, hands, and feet and fit the full SMPL-X model to
these; (2) we train a new neural network pose prior using a large MoCap
dataset; (3) we define a new interpenetration penalty that is both fast and
accurate; (4) we automatically detect gender and the appropriate body models
(male, female, or neutral); (5) our PyTorch implementation achieves a speedup
of more than 8x over Chumpy. We use the new method, SMPLify-X, to fit SMPL-X to
both controlled images and images in the wild. We evaluate 3D accuracy on a new
curated dataset comprising 100 images with pseudo ground-truth. This is a step
towards automatic expressive human capture from monocular RGB data. The models,
code, and data are available for research purposes at
https://smpl-x.is.tue.mpg.de.
Comment: To appear in CVPR 2019
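The fitting loop described above, optimizing model parameters to match
detected 2D features under a prior penalty, can be sketched as follows.
This is a minimal illustration only: a toy linear map stands in for
SMPL-X, and the names `basis`, `lam`, and the plain gradient-descent
update are assumptions, not the paper's PyTorch implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_params, n_joints = 4, 6
basis = rng.standard_normal((2 * n_joints, n_params))  # toy params-to-2D map
theta_true = rng.standard_normal(n_params)
detected = basis @ theta_true                          # "detected" 2D features
lam = 1e-3                                             # prior weight

def loss(theta):
    resid = basis @ theta - detected            # reprojection residual
    return resid @ resid + lam * theta @ theta  # data term + prior penalty

theta = np.zeros(n_params)
lr = 0.01
for _ in range(2000):                           # plain gradient descent
    grad = 2 * basis.T @ (basis @ theta - detected) + 2 * lam * theta
    theta -= lr * grad
```

The real method swaps the toy map for the differentiable SMPL-X model,
the quadratic prior for the learned neural-network pose prior, and adds
the interpenetration penalty.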
End-to-end Recovery of Human Shape and Pose
We describe Human Mesh Recovery (HMR), an end-to-end framework for
reconstructing a full 3D mesh of a human body from a single RGB image. In
contrast to most current methods that compute 2D or 3D joint locations, we
produce a richer and more useful mesh representation that is parameterized by
shape and 3D joint angles. The main objective is to minimize the reprojection
loss of keypoints, which allows our model to be trained using in-the-wild
images that only have ground-truth 2D annotations. However, the reprojection
loss alone leaves the model highly under-constrained. In this work we address this
problem by introducing an adversary trained to tell whether a human body
parameter is real or not using a large database of 3D human meshes. We show
that HMR can be trained with and without using any paired 2D-to-3D supervision.
We do not rely on intermediate 2D keypoint detections and infer 3D pose and
shape parameters directly from image pixels. Our model runs in real-time given
a bounding box containing the person. We demonstrate our approach on various
images in-the-wild and outperform previous optimization-based methods that
output 3D meshes and show competitive results on tasks such as 3D joint
location estimation and part segmentation.
Comment: CVPR 2018, Project page with code: https://akanazawa.github.io/hmr
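The reprojection loss that drives HMR's training can be written down
compactly. The sketch below assumes a weak-perspective camera (discard
depth, then scale and translate in the image plane); the function names
and the 24-joint layout are illustrative, not the authors' code.

```python
import numpy as np

def reproject(joints3d, scale, trans):
    # Weak-perspective projection: drop depth, then scale and translate.
    return scale * joints3d[:, :2] + trans

def keypoint_loss(joints3d, scale, trans, kp2d, vis):
    # L1 reprojection loss over visible keypoints only, so images with
    # partial 2D annotations can still supervise the model.
    resid = np.abs(reproject(joints3d, scale, trans) - kp2d)
    return float((vis[:, None] * resid).sum() / max(vis.sum(), 1.0))

rng = np.random.default_rng(1)
joints3d = rng.standard_normal((24, 3))    # e.g. 24 SMPL-style joints
kp2d = reproject(joints3d, 2.0, np.array([0.1, -0.2]))
vis = np.ones(24)                          # all keypoints annotated here
perfect = keypoint_loss(joints3d, 2.0, np.array([0.1, -0.2]), kp2d, vis)
```

In the paper this loss is combined with the adversarial prior, which
penalizes implausible pose parameters rather than reprojection error.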
Adjustable Method Based on Body Parts for Improving the Accuracy of 3D Reconstruction in Visually Important Body Parts from Silhouettes
This research proposes a novel adjustable algorithm for reconstructing 3D
body shapes from front and side silhouettes. Most recent silhouette-based
approaches use a deep neural network trained by silhouettes and key points to
estimate the shape parameters but cannot accurately fit the model to the body
contours and consequently struggle to capture detailed body geometry,
especially in the torso. In addition, most of these methods give all body
parts the same accuracy priority, which makes the optimization harder and
prevents it from reaching the best possible result in essential body parts, like the torso,
which is visually important in most applications, such as virtual garment
fitting. In the proposed method, we can adjust the expected accuracy for each
body part based on our purpose by assigning coefficients for the distance of
each body part between the projected 3D body and 2D silhouettes. To measure
this distance, we first recognize the corresponding body parts using body
segmentation in both views. Then, we align individual body parts by 2D rigid
registration and match them using pairwise matching. The objective function
tries to minimize the distance cost for the individual body parts in both views
based on distances and coefficients by optimizing the statistical model
parameters. We also handle slight variations in the angles of the arms and
legs by matching the pose. We evaluate the proposed method with synthetic body
meshes from the normalized S-SCAPE. The result shows that the algorithm can
more accurately reconstruct visually important body parts with high
coefficients.
Comment: 16 pages, 17 images
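The adjustable objective amounts to a coefficient-weighted sum of
per-part distances. A minimal sketch, where the part names, weights,
and distance values are all made-up placeholders:

```python
def weighted_part_cost(part_dists, coeffs):
    # Each body part contributes its silhouette distance scaled by an
    # adjustable accuracy coefficient, so visually important parts (the
    # torso here) can dominate the optimization.
    return sum(coeffs[p] * d for p, d in part_dists.items())

dists = {"torso": 1.2, "left_arm": 2.0, "right_arm": 1.8}    # mean 2D distances
weights = {"torso": 5.0, "left_arm": 1.0, "right_arm": 1.0}  # accuracy priority
cost = weighted_part_cost(dists, weights)
```

Raising a part's coefficient makes the optimizer trade accuracy
elsewhere for a tighter fit on that part.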
Geometric Expression Invariant 3D Face Recognition using Statistical Discriminant Models
Currently there is no complete face recognition system that is invariant to all facial expressions.
Although humans find it easy to identify and recognise faces regardless of changes in illumination,
pose and expression, producing a computer system with a similar capability has proved to
be particularly difficult. Three dimensional face models are geometric in nature and therefore
have the advantage of being invariant to head pose and lighting. However they are still susceptible
to facial expressions. This can be seen in the decrease in the recognition results using
principal component analysis when expressions are added to a data set.
In order to achieve expression-invariant face recognition systems, we have employed a tensor
algebra framework to represent 3D face data with facial expressions in a parsimonious
space. Face variation factors are organised in particular subject and facial expression modes.
We manipulate this using singular value decomposition on sub-tensors representing one variation
mode. This framework possesses the ability to deal with the shortcomings of PCA in less constrained
environments and still preserves the integrity of the 3D data. The results show improved
recognition rates for faces and facial expressions, even recognising high intensity expressions
that are not in the training datasets.
We have determined, experimentally, a set of anatomical landmarks that describe facial
expression most effectively. We found that the best placements of landmarks to distinguish different
facial expressions are in areas around the prominent features, such as the cheeks and eyebrows.
Recognition results using landmark-based face recognition could be improved with better placement.
We looked into the possibility of achieving expression-invariant face recognition by reconstructing
and manipulating realistic facial expressions. We proposed a tensor-based statistical
discriminant analysis method to reconstruct facial expressions and in particular to neutralise
facial expressions. The results of the synthesised facial expressions are visually more realistic
than facial expressions generated using conventional active shape modelling (ASM). We
then used reconstructed neutral faces in the sub-tensor framework for recognition purposes.
The recognition results showed slight improvement. Besides biometric recognition, this novel
tensor-based synthesis approach could be used in computer games and real-time animation
applications.
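The per-mode decomposition this abstract describes operates on
unfoldings of the data tensor. A minimal numpy sketch of that step,
in which the shapes and the subject x expression x feature mode
ordering are assumptions, not the thesis's actual data layout:

```python
import numpy as np

def unfold(tensor, mode):
    # Mode-n unfolding: rows index the chosen mode, columns the rest.
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

rng = np.random.default_rng(0)
data = rng.standard_normal((10, 6, 30))  # subjects x expressions x features

# Left singular vectors of the mode-1 unfolding give an orthonormal
# basis for the variation along the facial-expression mode; this is
# the per-mode step of higher-order SVD.
U_expr, s, _ = np.linalg.svd(unfold(data, 1), full_matrices=False)
```

Projecting a new face onto such per-mode bases is what lets subject
identity be separated from expression for recognition or synthesis.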
Silhouette Body Measurement Benchmarks
Anthropometric body measurements are important for industrial design, garment
fitting, medical diagnosis and ergonomics. A number of methods have been
proposed to estimate the body measurements from images, but progress has been
slow due to the lack of realistic and publicly available datasets. The
existing works train and test on silhouettes of 3D body meshes obtained by
fitting a human body model to the commercial CAESAR scans. In this work, we
introduce the BODY-fit dataset that contains fitted meshes of 2,675 female
and 1,474 male 3D body scans. We unify evaluation on the CAESAR-fit and
BODY-fit datasets by computing body measurements from geodesic surface paths
as the ground truth and by generating two-view silhouette images. We also
introduce BODY-rgb - a realistic dataset of 86 male and 108 female subjects
captured with an RGB camera and manually tape-measured ground truth. We
propose a simple yet effective deep CNN architecture as a baseline method
which obtains competitive accuracy on the three datasets.
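Once the vertices of a measurement path on the mesh are known, a
ground-truth measurement of the kind used above reduces to summing
edge lengths around the closed loop. A toy sketch (the unit square
stands in for a real vertex loop on a body scan):

```python
import numpy as np

def loop_length(verts):
    # Sum of edge lengths around a closed vertex loop: each vertex is
    # paired with its predecessor, wrapping around via np.roll.
    diffs = verts - np.roll(verts, 1, axis=0)
    return float(np.linalg.norm(diffs, axis=1).sum())

# A unit square traversed as a closed loop has perimeter 4.
square = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], float)
perimeter = loop_length(square)
```

On a real scan the loop would follow the geodesic surface path for the
measurement (e.g. a waist circumference) rather than a planar polygon.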