HeadOn: Real-time Reenactment of Human Portrait Videos
We propose HeadOn, the first real-time source-to-target reenactment approach
for complete human portrait videos that enables transfer of torso and head
motion, face expression, and eye gaze. Given a short RGB-D video of the target
actor, we automatically construct a personalized geometry proxy that embeds a
parametric head, eye, and kinematic torso model. A novel real-time reenactment
algorithm employs this proxy to photo-realistically map the captured motion
from the source actor to the target actor. On top of the coarse geometric
proxy, we propose a video-based rendering technique that composites the
modified target portrait video via view- and pose-dependent texturing, and
creates photo-realistic imagery of the target actor under novel torso and head
poses, facial expressions, and gaze directions. To this end, we propose a
robust tracking of the face and torso of the source actor. We extensively
evaluate our approach and show that it enables significantly greater flexibility in creating realistic reenacted output videos.
Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at Siggraph'18
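Below is a minimal Python sketch of what view- and pose-dependent texturing can look like in spirit: textures captured under nearby head poses are blended with weights that fall off with pose distance. The function name, the two-angle pose parameterization, and the Gaussian weighting are illustrative assumptions, not HeadOn's actual formulation.

# Minimal sketch of pose-dependent texture blending; all names and the
# Gaussian weighting scheme are illustrative assumptions.
import numpy as np

def pose_dependent_texture(target_pose, captured_poses, captured_textures, sigma=10.0):
    """Blend captured textures according to pose similarity (angles in degrees)."""
    distances = np.linalg.norm(captured_poses - target_pose, axis=1)
    weights = np.exp(-0.5 * (distances / sigma) ** 2)
    weights /= weights.sum()
    # Weighted sum over the texture stack (K x H x W x 3).
    return np.tensordot(weights, captured_textures, axes=1)

# Toy usage: three textures captured at different yaw angles of the target actor.
captured_poses = np.array([[-30.0, 0.0], [0.0, 0.0], [30.0, 0.0]])   # yaw, pitch
captured_textures = np.random.rand(3, 256, 256, 3)
novel_view = pose_dependent_texture(np.array([10.0, 0.0]),
                                    captured_poses, captured_textures)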
Learning to Reconstruct People in Clothing from a Single RGB Camera
We present a learning-based model to infer the personalized 3D shape of people from a few frames (1-8) of a monocular video in which the person is moving, in less than 10 seconds with a reconstruction accuracy of 5mm. Our model learns to predict the parameters of a statistical body model and instance displacements that add clothing and hair to the shape. The model achieves fast and accurate predictions based on two key design choices. First, by predicting shape in a canonical T-pose space, the network learns to encode the images of the person into pose-invariant latent codes, where the information is fused. Second, based on the observation that feed-forward predictions are fast but do not always align with the input images, we predict using both bottom-up and top-down streams (one per view), allowing information to flow in both directions. Learning relies only on synthetic 3D data. Once learned, the model can take a variable number of frames as input, and is able to reconstruct shapes even from a single image with an accuracy of 6mm. Results on 3 different datasets demonstrate the efficacy and accuracy of our approach.
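As a rough illustration of the fusion idea described above, the sketch below encodes each frame to a pose-invariant latent code, averages the codes across a variable number of frames, and decodes body-model parameters plus per-vertex displacements in a canonical T-pose space. The module names, layer sizes, and mean-pooling fusion are assumptions for illustration, not the paper's architecture.

# Minimal sketch (not the paper's network): per-frame encoding, fusion of
# pose-invariant codes, and decoding to shape parameters plus displacements.
import torch
import torch.nn as nn

class FrameEncoder(nn.Module):
    """Maps one RGB frame to a pose-invariant latent code."""
    def __init__(self, latent_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

class ShapeDecoder(nn.Module):
    """Decodes a fused code into body-model parameters and vertex displacements."""
    def __init__(self, latent_dim=128, n_betas=10, n_verts=6890):
        super().__init__()
        self.n_verts = n_verts
        self.betas = nn.Linear(latent_dim, n_betas)               # statistical body shape
        self.displacements = nn.Linear(latent_dim, n_verts * 3)   # clothing / hair offsets

    def forward(self, z):
        return self.betas(z), self.displacements(z).view(-1, self.n_verts, 3)

encoder, decoder = FrameEncoder(), ShapeDecoder()
frames = torch.rand(5, 3, 128, 128)        # a few frames of the same person
codes = encoder(frames)                    # one pose-invariant code per frame
fused = codes.mean(dim=0, keepdim=True)    # fuse information across frames
betas, offsets = decoder(fused)            # shape in canonical T-pose space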
3D Face Synthesis with KINECT
This work describes the process of face synthesis by image morphing using data from less expensive 3D sensors, such as the KINECT, that are prone to sensor noise. Its main aim is to create a useful face database for future face recognition studies.
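To make the morphing idea concrete, here is a minimal sketch that denoises two registered KINECT face scans and linearly interpolates between them to synthesize intermediate faces; the Laplacian smoothing, function names, and toy data are illustrative assumptions rather than the paper's pipeline.

# Minimal sketch: morph between two noisy, vertex-registered face scans.
import numpy as np

def denoise(vertices, neighbors, iterations=3, lam=0.5):
    """Simple Laplacian smoothing to attenuate depth-sensor noise."""
    v = vertices.copy()
    for _ in range(iterations):
        avg = np.stack([v[idx].mean(axis=0) for idx in neighbors])
        v = (1.0 - lam) * v + lam * avg
    return v

def morph(face_a, face_b, alpha):
    """Blend two registered face meshes; alpha=0 gives face_a, alpha=1 gives face_b."""
    return (1.0 - alpha) * face_a + alpha * face_b

# Two registered scans (N x 3 vertex arrays) and a toy neighborhood structure.
scan_a = np.random.rand(100, 3)
scan_b = scan_a + 0.05 * np.random.randn(100, 3)
neighbors = [np.arange(max(0, i - 2), min(100, i + 3)) for i in range(100)]

synthetic_faces = [morph(denoise(scan_a, neighbors),
                         denoise(scan_b, neighbors), a)
                   for a in np.linspace(0.0, 1.0, 5)]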
Motion capture based on RGBD data from multiple sensors for avatar animation
With recent advances in technology and the emergence of affordable RGB-D sensors for a wider range of users, markerless motion capture has become an active field of research both in computer vision and computer graphics.
In this thesis, we designed a POC (Proof of Concept) for a new tool that enables us to perform motion capture using a variable number of commodity RGB-D sensors of different brands and technical specifications, without constraints on the layout of the environment. The main goal of this work is to provide a tool with motion capture capabilities using a handful of RGB-D sensors, without imposing strong requirements in terms of lighting, background or size of the motion capture area. Naturally, the number of RGB-D sensors needed is inversely proportional to their resolution, and directly proportional to the size of the area to track.
Built on top of the OpenNI 2 library, we made this POC compatible with most of the non-high-end RGB-D sensors currently available on the market. Because a single computer lacks the resources to support more than a couple of sensors working simultaneously, we need a setup composed of multiple computers. To keep the data coherent and synchronized across sensors and computers, our tool makes use of a semi-automatic calibration method and a message-oriented network protocol.
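The thesis does not spell out the protocol here, but a message-oriented scheme for keeping frames from several sensor PCs aligned might look like the sketch below: each frame travels with a sensor id, frame index, and timestamp so a central node can pair frames captured together. All field names and the length-prefixed framing are assumptions.

# Minimal sketch of a timestamped, length-prefixed frame message; field names
# and framing are illustrative assumptions, not the thesis's protocol.
import json, socket, struct, time

def send_frame_message(sock, sensor_id, frame_index, payload_bytes):
    """Send one timestamped frame so the receiver can match frames across sensors."""
    header = json.dumps({
        "sensor_id": sensor_id,
        "frame_index": frame_index,
        "timestamp": time.time(),          # used to pair frames captured together
        "payload_size": len(payload_bytes),
    }).encode("utf-8")
    sock.sendall(struct.pack("!I", len(header)) + header + payload_bytes)

def recv_frame_message(sock):
    """Read a length-prefixed header followed by the raw depth/color payload."""
    header_size = struct.unpack("!I", _recv_exact(sock, 4))[0]
    header = json.loads(_recv_exact(sock, header_size))
    payload = _recv_exact(sock, header["payload_size"])
    return header, payload

def _recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed")
        buf += chunk
    return buf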
From the color and depth data given by a sensor, we can also obtain a 3D point-cloud representation of the environment. By combining point clouds from multiple sensors, we can build a complete, animated 3D point cloud that can be visualized from any viewpoint.
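A minimal sketch of this step, assuming a standard pinhole model and known sensor-to-world extrinsics from the calibration: each depth image is back-projected into a camera-space point cloud, transformed into the world frame, and concatenated with the clouds from the other sensors. The intrinsic values below are placeholders.

# Minimal sketch, not the thesis code: depth back-projection and multi-sensor merge.
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters) into an N x 3 camera-space cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    x = (u.ravel() - cx) * z / fx
    y = (v.ravel() - cy) * z / fy
    points = np.stack([x, y, z], axis=1)
    return points[z > 0]                       # drop invalid (zero-depth) pixels

def merge_clouds(clouds, extrinsics):
    """Transform each sensor's cloud into the world frame and concatenate them."""
    merged = []
    for points, T in zip(clouds, extrinsics):  # T: 4x4 sensor-to-world transform
        homog = np.hstack([points, np.ones((len(points), 1))])
        merged.append((homog @ T.T)[:, :3])
    return np.vstack(merged)

# Toy usage with two sensors and identity calibration.
depth_a = np.full((480, 640), 1.5)
depth_b = np.full((480, 640), 2.0)
cloud_a = depth_to_pointcloud(depth_a, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
cloud_b = depth_to_pointcloud(depth_b, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
world = merge_clouds([cloud_a, cloud_b], [np.eye(4), np.eye(4)])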
Given a 3D avatar model and its corresponding attached skeleton, we can use an iterative optimization method (e.g. Simplex) to find a fit between each point-cloud frame and a skeleton configuration, resulting in a 3D avatar animation when such skeleton configurations are used as key frames.
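A toy sketch of the fitting loop, using SciPy's Nelder-Mead (Simplex) implementation: given one point-cloud frame, joint angles are optimized so that forward-kinematics joint positions lie close to the observed points. The planar two-bone arm and the cost function are illustrative assumptions, not the thesis's skeleton or objective.

# Minimal sketch of Simplex-based skeleton fitting to one point-cloud frame.
import numpy as np
from scipy.optimize import minimize
from scipy.spatial import cKDTree

BONE_LENGTHS = np.array([0.3, 0.25])           # upper arm, forearm (meters)

def forward_kinematics(angles):
    """Return shoulder, elbow and wrist positions of a planar 2-bone arm."""
    shoulder = np.zeros(2)
    elbow = shoulder + BONE_LENGTHS[0] * np.array([np.cos(angles[0]), np.sin(angles[0])])
    wrist = elbow + BONE_LENGTHS[1] * np.array([np.cos(angles[0] + angles[1]),
                                                np.sin(angles[0] + angles[1])])
    return np.stack([shoulder, elbow, wrist])

def fit_frame(pointcloud, initial_angles):
    """Optimize joint angles so the skeleton lies close to the observed points."""
    tree = cKDTree(pointcloud)

    def cost(angles):
        joints = forward_kinematics(angles)
        return tree.query(joints)[0].mean()    # mean distance between joints and cloud

    result = minimize(cost, initial_angles, method="Nelder-Mead")
    return result.x                            # one key-frame skeleton configuration

# Toy frame: points sampled around a bent arm, fitted from a rough initial guess.
observed = forward_kinematics(np.array([0.6, -0.4]))
cloud = observed.repeat(50, axis=0) + 0.01 * np.random.randn(150, 2)
key_frame_angles = fit_frame(cloud, initial_angles=np.array([0.0, 0.0]))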
EgoFace: Egocentric Face Performance Capture and Videorealistic Reenactment
Face performance capture and reenactment techniques use multiple cameras and sensors, positioned at a distance from the face or mounted on heavy wearable devices. This limits their applications in mobile and outdoor environments. We present EgoFace, a radically new lightweight setup for face performance capture and front-view videorealistic reenactment using a single egocentric RGB camera. Our lightweight setup allows operation in uncontrolled environments, and lends itself to telepresence applications such as video-conferencing from dynamic environments. The input image is projected into a low-dimensional latent space of facial expression parameters. Through careful adversarial training of the parameter-space synthetic rendering, a videorealistic animation is produced. Our problem is challenging, as the human visual system is sensitive to the smallest face irregularities that could occur in the final results, and this sensitivity is even stronger for video results. Our solution is trained in a pre-processing stage, in a supervised manner without manual annotations. EgoFace captures a wide variety of facial expressions, including mouth movements and asymmetrical expressions. It works under varying illumination, backgrounds and movements, handles people of different ethnicities, and can operate in real time.
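As a rough sketch of the pipeline shape described above (not EgoFace itself): an encoder projects the egocentric frame to a small vector of expression parameters, a generator conditioned on those parameters produces a front-view frame, and the generator is trained with an adversarial loss. Module names, sizes, and the toy 32x32 output are assumptions.

# Minimal sketch of an expression-parameter encoder and adversarially trained
# generator; all names and sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ExpressionEncoder(nn.Module):
    """Egocentric RGB frame -> low-dimensional facial expression parameters."""
    def __init__(self, n_params=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, n_params),
        )

    def forward(self, x):
        return self.net(x)

class FrontalGenerator(nn.Module):
    """Expression parameters -> front-view face image (tiny 32x32 toy output)."""
    def __init__(self, n_params=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_params, 3 * 32 * 32), nn.Tanh())

    def forward(self, p):
        return self.net(p).view(-1, 3, 32, 32)

encoder, generator = ExpressionEncoder(), FrontalGenerator()
discriminator = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1))

egocentric_frame = torch.rand(1, 3, 128, 128)
params = encoder(egocentric_frame)             # low-dimensional expression code
fake = generator(params)                       # front-view frame (toy stand-in)
adv_loss = nn.functional.binary_cross_entropy_with_logits(
    discriminator(fake), torch.ones(1, 1))     # generator's adversarial objective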