From Multiview Image Curves to 3D Drawings
Reconstructing 3D scenes from multiple views has made impressive strides in
recent years, chiefly by correlating isolated feature points, intensity
patterns, or curvilinear structures. In the general setting - without
controlled acquisition, abundant texture, curves and surfaces following
specific models or limiting scene complexity - most methods produce unorganized
point clouds, meshes, or voxel representations, with some exceptions producing
unorganized clouds of 3D curve fragments. Ideally, many applications require
structured representations of curves, surfaces and their spatial relationships.
This paper presents a step in this direction by formulating an approach that
combines 2D image curves into a collection of 3D curves, with topological
connectivity between them represented as a 3D graph. This results in a 3D
drawing, which is complementary to surface representations in the same sense as
a 3D scaffold complements a tent taut over it. We evaluate our results against
ground truth on synthetic and real datasets.
Comment: Expanded ECCV 2016 version with tweaked figures and including an
overview of the supplementary material available at
multiview-3d-drawing.sourceforge.ne
Motion capture based on RGBD data from multiple sensors for avatar animation
With recent advances in technology and emergence of affordable RGB-D sensors for a
wider range of users, markerless motion capture has become an active field of research
both in computer vision and computer graphics.
In this thesis, we designed a POC (Proof of Concept) for a new tool that enables us
to perform motion capture by using a variable number of commodity RGB-D sensors of
different brands and technical specifications on constraint-less layout environments. The
main goal of this work is to provide a tool with motion capture capabilities by using a
handful of RGB-D sensors, without imposing strong requirements in terms of lighting,
background or extension of the motion capture area. Of course, the number of RGB-D
sensors needed is inversely proportional to their resolution, and directly proportional to
the size of the area to track.
Built on top of the OpenNI 2 library, we made this POC compatible with most of the non-high-end
RGB-D sensors currently available in the market. Due to the lack of resources on
a single computer, in order to support more than a couple of sensors working simultaneously,
we need a setup composed of multiple computers. In order to keep data coherency
and synchronization across sensors and computers, our tool makes use of a semi-automatic
calibration method and a message-oriented network protocol.
From color and depth data given by a sensor, we can also obtain a 3D pointcloud representation
of the environment. By combining pointclouds from multiple sensors, we can
collect a complete and animated 3D pointcloud that can be visualized from any viewpoint.
Given a 3D avatar model and its corresponding attached skeleton, we can use an
iterative optimization method (e.g. Simplex) to find a fit between each pointcloud frame
and a skeleton configuration, resulting in 3D avatar animation when using such skeleton
configurations as key frames.
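The point cloud step described in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes a pinhole camera model with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`), depth values in metres with 0 meaning "no reading", and a per-sensor rigid transform (`rotation`, `translation`) of the kind a calibration step might produce. All function names are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject(depth: Sequence[Sequence[float]],
                fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Turn a depth image into camera-frame 3D points (pinhole model, assumed)."""
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                # Skip pixels without a valid depth reading.
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def to_world(points: Sequence[Point],
             rotation: Sequence[Sequence[float]],
             translation: Sequence[float]) -> List[Point]:
    """Apply p_world = R * p_cam + t to every point of one sensor."""
    out: List[Point] = []
    for x, y, z in points:
        out.append((
            rotation[0][0] * x + rotation[0][1] * y + rotation[0][2] * z + translation[0],
            rotation[1][0] * x + rotation[1][1] * y + rotation[1][2] * z + translation[1],
            rotation[2][0] * x + rotation[2][1] * y + rotation[2][2] * z + translation[2],
        ))
    return out


def merge_clouds(clouds: Sequence[Sequence[Point]]) -> List[Point]:
    """Concatenate per-sensor clouds that already share one world frame."""
    merged: List[Point] = []
    for cloud in clouds:
        merged.extend(cloud)
    return merged


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    depth = [[1.0, 0.0], [2.0, 2.0]]  # toy 2x2 depth frame
    cam_a = backproject(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
    # Pretend a second sensor sees the same frame, shifted 1 m along x.
    cam_b = to_world(cam_a, identity, [1.0, 0.0, 0.0])
    print(len(merge_clouds([cam_a, cam_b])))
```

In a multi-sensor setup each frame would be back-projected with its own sensor's intrinsics and moved into the shared frame before merging, so the accuracy of the merged cloud depends directly on the calibration.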
Learning to Reconstruct People in Clothing from a Single RGB Camera
We present a learning-based model to infer the personalized 3D shape of people from a few frames (1-8) of a monocular video in which the person is moving, in less than 10 seconds with a reconstruction accuracy of 5mm. Our model learns to predict the parameters of a statistical body model and instance displacements that add clothing and hair to the shape. The model achieves fast and accurate predictions based on two key design choices. First, by predicting shape in a canonical T-pose space, the network learns to encode the images of the person into pose-invariant latent codes, where the information is fused. Second, based on the observation that feed-forward predictions are fast but do not always align with the input images, we predict using both bottom-up and top-down streams (one per view), allowing information to flow in both directions. Learning relies only on synthetic 3D data. Once learned, the model can take a variable number of frames as input, and is able to reconstruct shapes even from a single image with an accuracy of 6mm. Results on 3 different datasets demonstrate the efficacy and accuracy of our approach.
Multi-Garment Net: Learning to Dress 3D People from Images
We present Multi-Garment Network (MGN), a method to predict body shape and clothing, layered on top of the SMPL model, from a few frames (1-8) of a video. Several experiments demonstrate that this representation allows a higher level of control when compared to single mesh or voxel representations of shape. Our model allows us to predict garment geometry, relate it to the body shape, and transfer it to new body shapes and poses. To train MGN, we leverage a digital wardrobe containing 712 digital garments in correspondence, obtained with a novel method to register a set of clothing templates to a dataset of real 3D scans of people in different clothing and poses. Garments from the digital wardrobe, or predicted by MGN, can be used to dress any body shape in arbitrary poses. We will make publicly available the digital wardrobe, the MGN model, and code to dress SMPL with the garments.
Learning to Transfer Texture from Clothing Images to 3D Humans
In this paper, we present a simple yet effective method to automatically
transfer textures of clothing images (front and back) to 3D garments worn on
top of SMPL, in real time. We first automatically compute training pairs of images
with aligned 3D garments using a custom non-rigid 3D to 2D registration method,
which is accurate but slow. Using these pairs, we learn a mapping from pixels
to the 3D garment surface. Our idea is to learn dense correspondences from
garment image silhouettes to a 2D-UV map of a 3D garment surface using shape
information alone, completely ignoring texture, which allows us to generalize
to the wide range of web images. Several experiments demonstrate that our model
is more accurate than widely used baselines such as thin-plate-spline warping
and image-to-image translation networks while being orders of magnitude faster.
Our model opens the door for applications such as virtual try-on, and allows
for generation of 3D humans with varied textures which is necessary for
learning.
Comment: IEEE Conference on Computer Vision and Pattern Recognitio
Video Based Reconstruction of 3D People Models
This paper describes how to obtain accurate 3D body models and texture of
arbitrary people from a single, monocular video in which a person is moving.
Based on a parametric body model, we present a robust processing pipeline
achieving 3D model fits with 5mm accuracy also for clothed people. Our main
contribution is a method to nonrigidly deform the silhouette cones
corresponding to the dynamic human silhouettes, resulting in a visual hull in a
common reference frame that enables surface reconstruction. This enables
efficient estimation of a consensus 3D shape, texture and implanted animation
skeleton based on a large number of frames. We present evaluation results for a
number of test subjects and analyze overall performance. Requiring only a
smartphone or webcam, our method enables everyone to create their own fully
animatable digital double, e.g., for social VR applications or virtual try-on
for online fashion shopping.
Comment: CVPR 2018 Spotlight, IEEE Conference on Computer Vision and Pattern
Recognition 2018 (CVPR
MoCo-Flow: Neural Motion Consensus Flow for Dynamic Humans in Stationary Monocular Cameras
Synthesizing novel views of dynamic humans from stationary monocular cameras
is a popular scenario. This is particularly attractive as it does not require
static scenes, controlled environments, or specialized hardware. In contrast to
techniques that exploit multi-view observations to constrain the modeling,
given a single fixed viewpoint only, the problem of modeling the dynamic scene
is significantly more under-constrained and ill-posed. In this paper, we
introduce Neural Motion Consensus Flow (MoCo-Flow), a representation that
models the dynamic scene using a 4D continuous time-variant function. The
proposed representation is learned by an optimization which models a dynamic
scene that minimizes the error of rendering all observation images. At the
heart of our work lies a novel optimization formulation, which is constrained
by a motion consensus regularization on the motion flow. We extensively
evaluate MoCo-Flow on several datasets that contain human motions of varying
complexity, and compare, both qualitatively and quantitatively, to several
baseline methods and variants of our methods. Pretrained model, code, and data
will be released for research purposes upon paper acceptance.