A Generative Model of People in Clothing
We present the first image-based generative model of people in clothing for
the full body. We sidestep the commonly used complex graphics rendering
pipeline and the need for high-quality 3D scans of dressed people. Instead, we
learn generative models from a large image database. The main challenge is to
cope with the high variance in human pose, shape and appearance. For this
reason, pure image-based approaches have not been considered so far. We show
that this challenge can be overcome by splitting the generation process into two
parts. First, we learn to generate a semantic segmentation of the body and
clothing. Second, we learn a conditional model on the resulting segments that
creates realistic images. The full model is differentiable and can be
conditioned on pose, shape or color. The results are samples of people in
different clothing items and styles. The proposed model can generate entirely
new people with realistic clothing. In several experiments we present
encouraging results that suggest that an entirely data-driven approach to
people generation is possible.
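The two-stage split described above can be pictured as two chained differentiable modules. The following is a minimal PyTorch sketch under assumed shapes; the module names and architectures (SegmentationGenerator, TextureGenerator) are illustrative stand-ins, not the paper's actual networks.

```python
import torch
import torch.nn as nn

class SegmentationGenerator(nn.Module):
    """Stage 1: map a latent code (optionally pose-conditioned) to a
    semantic segmentation of body and clothing parts."""
    def __init__(self, latent_dim=128, num_parts=12, size=64):
        super().__init__()
        self.size, self.num_parts = size, num_parts
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, num_parts * size * size),
        )
    def forward(self, z):
        logits = self.net(z).view(-1, self.num_parts, self.size, self.size)
        return torch.softmax(logits, dim=1)  # per-pixel part probabilities

class TextureGenerator(nn.Module):
    """Stage 2: translate the segmentation into a realistic RGB image."""
    def __init__(self, num_parts=12):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_parts, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )
    def forward(self, seg):
        return self.net(seg)

# End-to-end sampling: both stages are differentiable, so the full
# pipeline can be trained jointly and conditioned on pose, shape or color.
z = torch.randn(1, 128)
segmentation = SegmentationGenerator()(z)
image = TextureGenerator()(segmentation)
```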
Implicit Feature Networks for Texture Completion from Partial 3D Data
Prior work on inferring 3D texture uses either texture atlases, which require
uv-mappings and hence have discontinuities, or colored voxels, which are
memory-inefficient and limited in resolution. Recent work predicts RGB color at
every XYZ coordinate, forming a texture field, but focuses on completing
texture given a single 2D image. Instead, we focus on 3D texture and geometry
completion from
partial and incomplete 3D scans. IF-Nets have recently achieved
state-of-the-art results on 3D geometry completion using a multi-scale deep
feature encoding, but the outputs lack texture. In this work, we generalize
IF-Nets to texture completion from partial textured scans of humans and
arbitrary objects. Our key insight is that 3D texture completion benefits from
incorporating local and global deep features extracted from both the 3D partial
texture and completed geometry. Specifically, given the partial 3D texture and
the 3D geometry completed with IF-Nets, our model successfully in-paints the
missing texture parts consistently with the completed geometry. Our model won
the SHARP ECCV'20 challenge, achieving the highest performance on all
challenges.
Comment: SHARP Workshop, European Conference on Computer Vision (ECCV), 2020
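The core idea, predicting RGB at continuous XYZ query points from local and global deep features, can be illustrated with a small decoder. The PyTorch sketch below abstracts the multi-scale IF-Net encoder into a precomputed per-point feature vector; all names and shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TextureFieldDecoder(nn.Module):
    """Predict an RGB color for every continuous XYZ query point,
    conditioned on deep features gathered at that point."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 3), nn.Sigmoid(),  # RGB in [0, 1]
        )
    def forward(self, xyz, feats):
        # xyz:   (B, N, 3) query coordinates on the completed surface
        # feats: (B, N, feat_dim) local + global features at each query,
        #        assumed to come from both partial texture and geometry
        return self.net(torch.cat([xyz, feats], dim=-1))

decoder = TextureFieldDecoder()
xyz = torch.rand(2, 1024, 3)
feats = torch.randn(2, 1024, 256)
rgb = decoder(xyz, feats)  # (2, 1024, 3)
```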
Tex2Shape: Detailed Full Human Body Geometry From a Single Image
We present a simple yet effective method to infer detailed full human body
shape from only a single photograph. Our model can infer full-body shape,
including face, hair, and clothing with wrinkles, at interactive frame rates.
Results feature details even on parts that are occluded in the
input image. Our main idea is to turn shape regression into an aligned
image-to-image translation problem. The input to our method is a partial
texture map of the visible region obtained from off-the-shelf methods. From a
partial texture, we estimate detailed normal and vector displacement maps,
which can be applied to a low-resolution smooth body model to add detail and
clothing. Despite being trained purely with synthetic data, our model
generalizes well to real-world photographs. Numerous results demonstrate the
versatility and robustness of our method.
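As a rough illustration of turning shape regression into aligned image-to-image translation, the hypothetical PyTorch sketch below maps a partial texture map to normal and vector displacement maps, and shows how a displacement map could be applied to template vertices through their UV coordinates. Both the network and the sampling step are simplifications, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tex2ShapeNet(nn.Module):
    """Map a partial texture map to aligned normal and vector
    displacement maps (3 + 3 output channels)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 6, 3, padding=1),
        )
    def forward(self, partial_texture):
        out = self.net(partial_texture)
        return out[:, :3], out[:, 3:]  # normal map, displacement map

def displace_vertices(vertices, uvs, disp_map):
    """Add detail to a smooth body template: every vertex has a UV
    coordinate, so we sample the displacement map at those UVs.
    vertices: (V, 3), uvs in [-1, 1]: (V, 2), disp_map: (1, 3, H, W)."""
    grid = uvs.view(1, -1, 1, 2)
    d = F.grid_sample(disp_map, grid, align_corners=True)  # (1, 3, V, 1)
    return vertices + d.view(3, -1).t()

net = Tex2ShapeNet()
normals, disp = net(torch.rand(1, 3, 128, 128))
```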
4D Cardiac MRI Segmentation
Carried out in collaboration with the centre or company: Northeastern University
Sparse Inertial Poser: Automatic 3D Human Pose Estimation from Sparse IMUs
We address the problem of making human motion capture in the wild more
practical by using a small set of inertial sensors attached to the body. Since
the problem is heavily under-constrained, previous methods either use a large
number of sensors, which is intrusive, or they require additional video input.
We take a different approach and constrain the problem by: (i) making use of a
realistic statistical body model that includes anthropometric constraints and
(ii) using a joint optimization framework to fit the model to orientation and
acceleration measurements over multiple frames. The resulting tracker Sparse
Inertial Poser (SIP) enables 3D human pose estimation using only 6 sensors
(attached to the wrists, lower legs, back and head) and works for arbitrary
human motions. Experiments on the recently released TNT15 dataset show that,
using the same number of sensors, SIP achieves higher accuracy than the dataset
baseline without using any video data. We further demonstrate the effectiveness
of SIP on newly recorded challenging motions in outdoor scenarios such as
climbing or jumping over a wall.
Comment: 12 pages, Accepted at Eurographics 2017
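The multi-frame fitting idea, jointly explaining orientation and acceleration measurements with a statistical body model, can be sketched as a small optimization loop. The PyTorch code below substitutes plain gradient descent (Adam) for the paper's solver, and forward_kinematics is a hypothetical differentiable stand-in for the body model.

```python
import torch

def fit_pose(sensor_ori, sensor_acc, forward_kinematics,
             n_frames, n_joints, iters=200, lr=0.05, w_acc=0.1):
    """sensor_ori: (T, S, 3, 3) measured sensor orientations;
    sensor_acc: (T-2, S, 3) measured accelerations;
    forward_kinematics: assumed differentiable body model mapping pose
    parameters to sensor orientations (T, S, 3, 3) and positions (T, S, 3)."""
    poses = torch.zeros(n_frames, n_joints * 3, requires_grad=True)
    opt = torch.optim.Adam([poses], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        model_ori, model_pos = forward_kinematics(poses)
        # orientation term: match the measured sensor orientations
        e_ori = ((model_ori - sensor_ori) ** 2).sum()
        # acceleration term: second finite differences of sensor positions
        model_acc = model_pos[2:] - 2 * model_pos[1:-1] + model_pos[:-2]
        e_acc = ((model_acc - sensor_acc) ** 2).sum()
        (e_ori + w_acc * e_acc).backward()
        opt.step()
    return poses.detach()
```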
In the Wild Human Pose Estimation Using Explicit 2D Features and Intermediate 3D Representations
Convolutional Neural Network based approaches for monocular 3D human pose
estimation usually require a large amount of training images with 3D pose
annotations. While it is feasible to provide 2D joint annotations for large
corpora of in-the-wild images with humans, providing accurate 3D annotations to
such in-the-wild corpora is hardly feasible in practice. Most existing 3D
labelled data sets are either synthetically created or feature in-studio
images. 3D pose estimation algorithms trained on such data often have limited
ability to generalize to real world scene diversity. We therefore propose a new
deep learning based method for monocular 3D human pose estimation that shows
high accuracy and generalizes better to in-the-wild scenes. It has a network
architecture that comprises a new disentangled hidden space encoding of
explicit 2D and 3D features, and uses supervision by a new learned projection
model from predicted 3D pose. Our algorithm can be jointly trained on image
data with 3D labels and image data with only 2D labels. It achieves
state-of-the-art accuracy on challenging in-the-wild data.
Comment: Accepted to CVPR 2019
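The mixed supervision scheme, 3D losses where labels exist plus a learned projection that turns 2D-only labels into a training signal, can be sketched in a few lines. The PyTorch code below is illustrative: the linear projection module and the unit loss weights are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class LearnedProjection(nn.Module):
    """Map predicted 3D joints to 2D joints; trained jointly with the
    pose network so that 2D-only images still provide supervision."""
    def __init__(self, n_joints=17):
        super().__init__()
        self.net = nn.Linear(n_joints * 3, n_joints * 2)
    def forward(self, joints3d):
        b = joints3d.shape[0]
        return self.net(joints3d.view(b, -1)).view(b, -1, 2)

def mixed_loss(pred3d, proj, gt2d, gt3d=None):
    # the 2D reprojection loss is available for every training image ...
    loss = ((proj(pred3d) - gt2d) ** 2).mean()
    # ... and 3D supervision is added only when 3D labels exist
    if gt3d is not None:
        loss = loss + ((pred3d - gt3d) ** 2).mean()
    return loss

proj = LearnedProjection()
pred3d = torch.randn(4, 17, 3)
loss = mixed_loss(pred3d, proj, torch.randn(4, 17, 2))
```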
LiveCap: Real-time Human Performance Capture from Monocular Video
We present the first real-time human performance capture approach that
reconstructs dense, space-time coherent deforming geometry of entire humans in
general everyday clothing from just a single RGB video. We propose a novel
two-stage analysis-by-synthesis optimization whose formulation and
implementation are designed for high performance. In the first stage, a skinned
template model is jointly fitted to background subtracted input video, 2D and
3D skeleton joint positions found using a deep neural network, and a set of
sparse facial landmark detections. In the second stage, dense non-rigid 3D
deformations of skin and even loose apparel are captured based on a novel
real-time capable algorithm for non-rigid tracking using dense photometric and
silhouette constraints. Our novel energy formulation leverages automatically
identified material regions on the template to model the differing non-rigid
deformation behavior of skin and apparel. The two resulting per-frame
non-linear optimization problems are solved with specially-tailored
data-parallel Gauss-Newton solvers. In order to achieve real-time performance
of over 25Hz, we design a pipelined parallel architecture using the CPU and two
commodity GPUs. Our method is the first real-time monocular approach for
full-body performance capture. Our method yields accuracy comparable to
off-line performance capture techniques, while being orders of magnitude
faster.
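The Gauss-Newton solvers at the heart of the second stage can be illustrated with a single damped Gauss-Newton step. The PyTorch sketch below uses autograd to build the Jacobian of a user-supplied residual function; the real system evaluates photometric and silhouette residuals with hand-tailored data-parallel GPU kernels.

```python
import torch

def gauss_newton_step(params, residual_fn, damping=1e-4):
    """One damped Gauss-Newton update for min ||residual_fn(params)||^2.
    residual_fn: maps params (P,) to a residual vector (R,)."""
    J = torch.autograd.functional.jacobian(residual_fn, params)  # (R, P)
    r = residual_fn(params)
    # normal equations with Levenberg-style damping for stability
    H = J.t() @ J + damping * torch.eye(params.numel())
    delta = torch.linalg.solve(H, -J.t() @ r)
    return params + delta

# Toy usage: fit x to minimize ||A x - b||^2 in a few steps.
A, b = torch.randn(10, 3), torch.randn(10)
x = torch.zeros(3)
for _ in range(5):
    x = gauss_newton_step(x, lambda p: A @ p - b)
```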
Learning to Transfer Texture from Clothing Images to 3D Humans
In this paper, we present a simple yet effective method to automatically
transfer textures of clothing images (front and back) to 3D garments worn on
top of SMPL, in real time. We first automatically compute training pairs of images
with aligned 3D garments using a custom non-rigid 3D to 2D registration method,
which is accurate but slow. Using these pairs, we learn a mapping from pixels
to the 3D garment surface. Our idea is to learn dense correspondences from
garment image silhouettes to a 2D-UV map of a 3D garment surface using shape
information alone, completely ignoring texture, which allows us to generalize
to the wide range of web images. Several experiments demonstrate that our model
is more accurate than widely used baselines such as thin-plate-spline warping
and image-to-image translation networks while being orders of magnitude faster.
Our model opens the door for applications such as virtual try-on, and allows
for generation of 3D humans with varied textures, which is necessary for
learning.
Comment: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020
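The silhouette-driven correspondence idea can be sketched as a small network that predicts, for each texel of the garment's UV map, the image pixel it corresponds to, followed by one differentiable lookup. The PyTorch code below is a hedged illustration; the network, resolutions, and single-garment setup are assumptions, not the paper's pipeline.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Sil2Corr(nn.Module):
    """Silhouette -> per-UV-texel image coordinates in [-1, 1]. Using
    shape alone (no texture) is what lets the learned mapping
    generalize to the wide range of web clothing images."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1), nn.Tanh(),
        )
    def forward(self, silhouette):
        return self.net(silhouette)  # (B, 2, Huv, Wuv)

def transfer_texture(image, corr):
    # Gather image colors into UV space with a single grid_sample call;
    # the result is a texture map for the 3D garment surface.
    grid = corr.permute(0, 2, 3, 1)  # (B, Huv, Wuv, 2)
    return F.grid_sample(image, grid, align_corners=True)

sil = (torch.rand(1, 1, 128, 128) > 0.5).float()
img = torch.rand(1, 3, 128, 128)
texture = transfer_texture(img, Sil2Corr()(sil))
```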