Tex2Shape: Detailed Full Human Body Geometry From a Single Image
We present a simple yet effective method to infer detailed full human body
shape from only a single photograph. Our model can infer full-body shape
including face, hair, and clothing with wrinkles, at interactive
frame rates. Results feature details even on parts that are occluded in the
input image. Our main idea is to turn shape regression into an aligned
image-to-image translation problem. The input to our method is a partial
texture map of the visible region obtained from off-the-shelf methods. From a
partial texture, we estimate detailed normal and vector displacement maps,
which can be applied to a low-resolution smooth body model to add detail and
clothing. Despite being trained purely with synthetic data, our model
generalizes well to real-world photographs. Numerous results demonstrate the
versatility and robustness of our method.
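As an illustration only, the following sketch shows the general shape of such an aligned image-to-image setup (hypothetical layer sizes; a stand-in for the paper's U-Net with GAN training): a small encoder-decoder maps a partial UV texture to normal and vector-displacement maps, and a helper applies the displacement map to smooth body-model vertices via their UV coordinates.

# Minimal sketch of the image-to-image idea, not the authors' exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tex2MapsSketch(nn.Module):
    """Hypothetical encoder-decoder mapping a partial texture to detail maps."""
    def __init__(self, feat=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, feat, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat * 2, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(feat * 2, feat, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(feat, 6, 4, stride=2, padding=1),  # 3 normal + 3 displacement channels
        )

    def forward(self, partial_texture):
        out = self.decoder(self.encoder(partial_texture))
        return out[:, :3], out[:, 3:]                          # normal map, vector displacement map

def displace_vertices(vertices, displacement_map, vertex_uvs):
    """Apply a UV-space vector displacement map to smooth body-model vertices."""
    # vertices: (V, 3), displacement_map: (3, H, W), vertex_uvs: (V, 2) in [0, 1]
    grid = vertex_uvs.view(1, 1, -1, 2) * 2.0 - 1.0            # to [-1, 1] for grid_sample
    sampled = F.grid_sample(displacement_map.unsqueeze(0), grid, align_corners=True)  # (1, 3, 1, V)
    return vertices + sampled[0, :, 0].t()                     # (V, 3)

partial_texture = torch.rand(1, 3, 256, 256)                   # partial UV texture from an off-the-shelf method
normals, displacements = Tex2MapsSketch()(partial_texture)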
PhoMoH: Implicit Photorealistic 3D Models of Human Heads
We present PhoMoH, a neural network methodology to construct generative
models of photo-realistic 3D geometry and appearance of human heads including
hair, beards, an oral cavity, and clothing. In contrast to prior work, PhoMoH
models the human head using neural fields, thus supporting complex topology.
Instead of learning a head model from scratch, we propose to augment an
existing expressive head model with new features. Concretely, we learn a highly
detailed geometry network layered on top of a mid-resolution head model
together with a detailed, local geometry-aware, and disentangled color field.
Our proposed architecture allows us to learn photo-realistic human head models
from relatively little data. The learned generative geometry and appearance
networks can be sampled individually and enable the creation of diverse and
realistic human heads. Extensive experiments validate our method qualitatively
and across different metrics.
Comment: To be published at the International Conference on 3D Vision 202
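A rough, hypothetical sketch of the layered-neural-field idea (the real PhoMoH conditioning, architectures, and losses are more involved): a detail field refines the signed distance of a mid-resolution head model, and a separate geometry-aware color field predicts appearance, each driven by its own latent code so geometry and appearance can be sampled independently.

# Layered neural fields over a coarse head model; shapes and names are assumptions.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256, depth=4):
    layers, d = [], in_dim
    for _ in range(depth - 1):
        layers += [nn.Linear(d, hidden), nn.Softplus(beta=100)]
        d = hidden
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)

class LayeredHeadFieldSketch(nn.Module):
    def __init__(self, geo_dim=64, col_dim=64):
        super().__init__()
        self.detail_sdf = mlp(3 + geo_dim, 1)                  # refines coarse geometry into a detailed surface
        self.color_field = mlp(3 + geo_dim + col_dim, 3)       # geometry-aware, separately sampled color

    def forward(self, points, coarse_sdf, z_geo, z_col):
        # points: (N, 3); coarse_sdf: (N, 1) signed distance to the mid-resolution head model
        n = points.shape[0]
        geo_in = torch.cat([points, z_geo.expand(n, -1)], dim=-1)
        sdf = coarse_sdf + self.detail_sdf(geo_in)             # detail layered on top of the coarse model
        col_in = torch.cat([points, z_geo.expand(n, -1), z_col.expand(n, -1)], dim=-1)
        rgb = torch.sigmoid(self.color_field(col_in))
        return sdf, rgb

model = LayeredHeadFieldSketch()
pts = torch.randn(1024, 3)
sdf, rgb = model(pts, torch.randn(1024, 1), torch.randn(1, 64), torch.randn(1, 64))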
Learning to Transfer Texture from Clothing Images to 3D Humans
In this paper, we present a simple yet effective method to automatically
transfer textures of clothing images (front and back) to 3D garments worn on
top of SMPL, in real time. We first automatically compute training pairs of images
with aligned 3D garments using a custom non-rigid 3D to 2D registration method,
which is accurate but slow. Using these pairs, we learn a mapping from pixels
to the 3D garment surface. Our idea is to learn dense correspondences from
garment image silhouettes to a 2D-UV map of a 3D garment surface using shape
information alone, completely ignoring texture, which allows us to generalize
to the wide range of web images. Several experiments demonstrate that our model
is more accurate than widely used baselines such as thin-plate-spline warping
and image-to-image translation networks while being orders of magnitude faster.
Our model opens the door for applications such as virtual try-on, and allows
for generation of 3D humans with varied textures, which is necessary for
learning.
Comment: IEEE Conference on Computer Vision and Pattern Recognition
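The correspondence idea can be caricatured as follows (a simplified sketch with invented shapes, not the paper's network or training): from a binary garment silhouette alone, predict for every texel of the garment's UV map the image location it corresponds to, then fill the texture by sampling the clothing photo at those locations.

# Toy shape-only correspondence network followed by texture transfer by sampling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Silhouette2UVSketch(nn.Module):
    def __init__(self, uv_size=128):
        super().__init__()
        self.uv_size = uv_size
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
            nn.Linear(64 * 8 * 8, uv_size * uv_size * 2),
        )

    def forward(self, silhouette):
        # silhouette: (B, 1, H, W) -> per-texel image coordinates in [-1, 1]
        coords = torch.tanh(self.net(silhouette))
        return coords.view(-1, self.uv_size, self.uv_size, 2)

def transfer_texture(photo, correspondences):
    # photo: (B, 3, H, W); correspondences: (B, uv, uv, 2) image coords per texel
    return F.grid_sample(photo, correspondences, align_corners=True)  # (B, 3, uv, uv)

model = Silhouette2UVSketch()
photo = torch.rand(1, 3, 256, 256)
silhouette = (photo.mean(dim=1, keepdim=True) > 0.5).float()          # stand-in garment mask
uv_texture = transfer_texture(photo, model(silhouette))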
Video Based Reconstruction of 3D People Models
This paper describes how to obtain accurate 3D body models and texture of
arbitrary people from a single, monocular video in which a person is moving.
Based on a parametric body model, we present a robust processing pipeline
achieving 3D model fits with 5mm accuracy even for clothed people. Our main
contribution is a method to nonrigidly deform the silhouette cones
corresponding to the dynamic human silhouettes, resulting in a visual hull in a
common reference frame that enables surface reconstruction. This allows
efficient estimation of a consensus 3D shape, texture and implanted animation
skeleton based on a large number of frames. We present evaluation results for a
number of test subjects and analyze overall performance. Requiring only a
smartphone or webcam, our method enables everyone to create their own fully
animatable digital double, e.g., for social VR applications or virtual try-on
for online fashion shopping.
Comment: CVPR 2018 Spotlight, IEEE Conference on Computer Vision and Pattern Recognition 2018 (CVPR)
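To make the consensus-shape step concrete, here is a heavily simplified, hypothetical sketch: shared per-vertex offsets on a canonical template are optimized so that, after posing and projecting into every frame, vertices land inside that frame's silhouette. The actual pipeline nonrigidly deforms silhouette cones and fits a parametric body model rather than running this toy optimization.

# Toy consensus-shape optimization over many frames; all helpers are placeholders.
import torch
import torch.nn.functional as F

def pose_vertices(canonical_verts, frame_pose):
    """Placeholder for linear blend skinning with the frame's estimated pose."""
    rotation, translation = frame_pose                         # (3, 3), (3,)
    return canonical_verts @ rotation.t() + translation

def silhouette_distance(points_2d, distance_field, image_size):
    """Bilinearly sample a distance transform of the frame silhouette (0 inside)."""
    grid = (points_2d / image_size * 2.0 - 1.0).view(1, 1, -1, 2)   # pixel coords -> [-1, 1]
    field = distance_field.reshape(1, 1, *distance_field.shape)
    return F.grid_sample(field, grid, align_corners=True).view(-1)

def consensus_shape(template_verts, frames, focal=500.0, image_size=512, steps=200):
    offsets = torch.zeros_like(template_verts, requires_grad=True)
    optim = torch.optim.Adam([offsets], lr=1e-2)
    for _ in range(steps):
        loss = torch.zeros(())
        for pose, distance_field in frames:                    # many frames vote on one shape
            posed = pose_vertices(template_verts + offsets, pose)
            projected = focal * posed[:, :2] / posed[:, 2:].clamp(min=1e-3) + image_size / 2
            loss = loss + silhouette_distance(projected, distance_field, image_size).mean()
        loss = loss + 1e2 * offsets.pow(2).mean()              # keep the consensus near the template
        optim.zero_grad(); loss.backward(); optim.step()
    return template_verts + offsets.detach()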
Learning to Reconstruct People in Clothing from a Single RGB Camera
We present a learning-based model to infer the personalized 3D shape of
people from a few frames (1-8) of a monocular video in which the person is
moving, in less than 10 seconds with a reconstruction accuracy of 5mm. Our
model learns to predict the parameters of a statistical body model and instance
displacements that add clothing and hair to the shape. The model achieves fast
and accurate predictions based on two key design choices. First, by predicting
shape in a canonical T-pose space, the network learns to encode the images of
the person into pose-invariant latent codes, where the information is fused.
Second, based on the observation that feed-forward predictions are fast but do
not always align with the input images, we predict using both bottom-up and
top-down streams (one per view), allowing information to flow in both
directions. Learning relies only on synthetic 3D data. Once learned, the model
can take a variable number of frames as input, and is able to reconstruct
shapes even from a single image with an accuracy of 6mm. Results on 3 different
datasets demonstrate the efficacy and accuracy of our approach.
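A minimal sketch of the fuse-then-decode design, with assumed dimensions and without the paper's bottom-up/top-down refinement or an actual SMPL layer: each frame is encoded into a pose-invariant code, codes from a variable number of frames are fused by averaging, and heads predict body-model shape parameters plus canonical-space per-vertex displacements.

# Canonical-space shape prediction from a variable number of frames (sketch only).
import torch
import torch.nn as nn

class CanonicalShapeSketch(nn.Module):
    def __init__(self, latent=256, num_betas=10, num_verts=6890):
        super().__init__()
        self.encoder = nn.Sequential(                          # shared across frames
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent),
        )
        self.shape_head = nn.Linear(latent, num_betas)         # statistical body-model parameters
        self.displacement_head = nn.Linear(latent, num_verts * 3)  # clothing/hair detail in T-pose

    def forward(self, frames):
        # frames: (B, F, 3, H, W); any number of frames F can be fused
        b, f = frames.shape[:2]
        codes = self.encoder(frames.flatten(0, 1)).view(b, f, -1)
        fused = codes.mean(dim=1)                              # fuse pose-invariant codes across frames
        betas = self.shape_head(fused)
        displacements = self.displacement_head(fused).view(b, -1, 3)
        return betas, displacements

model = CanonicalShapeSketch()
betas, disp = model(torch.rand(1, 4, 3, 128, 128))             # e.g. four frames of one person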
Structured 3D Features for Reconstructing Controllable Avatars
We introduce Structured 3D Features, a model based on a novel implicit 3D
representation that pools pixel-aligned image features onto dense 3D points
sampled from a parametric, statistical human mesh surface. The 3D points have
associated semantics and can move freely in 3D space. This allows for optimal
coverage of the person of interest, beyond just the body shape, which in turn
helps in modeling accessories, hair, and loose clothing. Owing to
this, we present a complete 3D transformer-based attention framework which,
given a single image of a person in an unconstrained pose, generates an
animatable 3D reconstruction with albedo and illumination decomposition, as a
result of a single end-to-end model, trained semi-supervised, and with no
additional postprocessing. We show that our S3F model surpasses the previous
state-of-the-art on various tasks, including monocular 3D reconstruction, as
well as albedo and shading estimation. Moreover, we show that the proposed
methodology allows novel view synthesis, relighting, and re-posing the
reconstruction, and can naturally be extended to handle multiple input images
(e.g. different views of a person, or the same view, in different poses, in
video). Finally, we demonstrate the editing capabilities of our model for 3D
virtual try-on applications.
Comment: Accepted at CVPR 2023. Project page: https://enriccorona.github.io/s3f/, Video: https://www.youtube.com/watch?v=mcZGcQ6L-2
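Schematically, the feature-pooling step could look like the following sketch (assumed shapes and a stand-in backbone, not the full S3F model): points sampled on a parametric body mesh are projected into the image, pixel-aligned features are sampled at those projections, and a transformer attends over all featured points before a per-point prediction head.

# Pixel-aligned features pooled onto body-surface points, then transformer attention.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Structured3DFeatureSketch(nn.Module):
    def __init__(self, feat_dim=64, model_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(                         # stand-in image feature extractor
            nn.Conv2d(3, feat_dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_dim, feat_dim, 3, padding=1), nn.ReLU(),
        )
        self.point_proj = nn.Linear(feat_dim + 3, model_dim)
        encoder_layer = nn.TransformerEncoderLayer(model_dim, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.albedo_head = nn.Linear(model_dim, 3)             # e.g. per-point albedo prediction

    def forward(self, image, surface_points, camera):
        # image: (B, 3, H, W); surface_points: (B, N, 3) sampled on the body mesh
        feats = self.backbone(image)
        projected = surface_points[..., :2] / surface_points[..., 2:].clamp(min=1e-3) * camera  # toy pinhole
        grid = projected.clamp(-1, 1).unsqueeze(1)             # (B, 1, N, 2) in [-1, 1]
        pixel_feats = F.grid_sample(feats, grid, align_corners=True)[:, :, 0].transpose(1, 2)   # (B, N, C)
        tokens = self.point_proj(torch.cat([pixel_feats, surface_points], dim=-1))
        tokens = self.transformer(tokens)                      # attention across all structured 3D points
        return torch.sigmoid(self.albedo_head(tokens))         # (B, N, 3)

model = Structured3DFeatureSketch()
albedo = model(torch.rand(1, 3, 128, 128), torch.randn(1, 512, 3), camera=1.0)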