Single-frame Regularization for Temporally Stable CNNs
Convolutional neural networks (CNNs) can model complicated non-linear
relations between images. However, they are notoriously sensitive to small
changes in the input. Most CNNs trained to describe image-to-image mappings
generate temporally unstable results when applied to video sequences, leading
to flickering artifacts and other inconsistencies over time. In order to use
CNNs for video material, previous methods have relied either on estimating dense
frame-to-frame motion information (optical flow) in the training and/or the
inference phase, or on exploring recurrent learning structures. We take a
different approach to the problem, posing temporal stability as a
regularization of the cost function. The regularization is formulated to
account for different types of motion that can occur between frames, so that
temporally stable CNNs can be trained without the need for video material or
expensive motion estimation. The training can be performed as a fine-tuning
operation, without architectural modifications of the CNN. Our evaluation shows
that the training strategy leads to large improvements in temporal smoothness.
Moreover, for small datasets the regularization can help in boosting the
generalization performance to a much larger extent than what is possible with
naïve augmentation strategies.
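The idea of treating temporal stability as a regularization term can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the penalty form (comparing "process the moved frame" with "move the processed frame"), the integer-translation motion model, the weight `lam`, and the names `shift` and `regularized_loss`. The motion is synthesized from a single image, so no video or optical flow is needed.

```python
import numpy as np


def shift(img, dx, dy):
    """Hypothetical motion model: integer translation with wrap-around."""
    return np.roll(img, shift=(dy, dx), axis=(0, 1))


def regularized_loss(f, x, y, dx, dy, lam=0.5):
    """Reconstruction loss plus an assumed transform-consistency penalty.

    f   : image-to-image mapping (stands in for the CNN)
    x,y : input image and target image, both 2D arrays
    lam : assumed regularization weight
    """
    fx = f(x)
    rec = np.mean((fx - y) ** 2)
    # Penalize the gap between f applied to the moved input
    # and the moved output of f on the original input.
    reg = np.mean((f(shift(x, dx, dy)) - shift(fx, dx, dy)) ** 2)
    return rec + lam * reg, rec, reg


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((8, 8))
    y = x ** 2

    # A pointwise mapping commutes with translation, so the penalty vanishes.
    pointwise = lambda img: img ** 2
    # A position-dependent mapping does not, so the penalty is positive.
    ramp = np.linspace(0.5, 1.5, 8)[None, :]
    position_dependent = lambda img: img * ramp

    print(regularized_loss(pointwise, x, y, dx=1, dy=0))
    print(regularized_loss(position_dependent, x, y, dx=1, dy=0))
```

In a fine-tuning setting, a term of this kind would simply be added to the existing cost function, which is consistent with the abstract's statement that no architectural changes are required.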
MonoPerfCap: Human Performance Capture from Monocular Video
We present the first marker-less approach for temporally coherent 3D
performance capture of a human with general clothing from monocular video. Our
approach reconstructs articulated human skeleton motion as well as medium-scale
non-rigid surface deformations in general scenes. Human performance capture is
a challenging problem due to the large range of articulation, potentially fast
motion, and considerable non-rigid deformations, even from multi-view data.
Reconstruction from monocular video alone is drastically more challenging,
since strong occlusions and the inherent depth ambiguity lead to a highly
ill-posed reconstruction problem. We tackle these challenges by a novel
approach that employs sparse 2D and 3D human pose detections from a
convolutional neural network using a batch-based pose estimation strategy.
Joint recovery of per-batch motion makes it possible to resolve the ambiguities of the
monocular reconstruction problem based on a low dimensional trajectory
subspace. In addition, we propose refinement of the surface geometry based on
fully automatically extracted silhouettes to enable medium-scale non-rigid
alignment. We demonstrate state-of-the-art performance capture results that
enable exciting applications such as video editing and free viewpoint video,
previously infeasible from monocular video. Our qualitative and quantitative
evaluation demonstrates that our approach significantly outperforms previous
monocular methods in terms of accuracy, robustness, and the scene complexity that
can be handled.
Comment: Accepted to ACM TOG 2018, to be presented at SIGGRAPH 201
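The low-dimensional trajectory subspace mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which basis is used, so the choice of a truncated DCT basis here is an assumption, as are the names `dct_basis` and `project_to_subspace` and the batch size. The sketch shows only the general principle: constraining a per-batch joint trajectory to a few smooth basis vectors reduces the number of unknowns per batch.

```python
import numpy as np


def dct_basis(num_frames, num_coeffs):
    """First num_coeffs orthonormal DCT-II basis vectors.

    Returns an array of shape (num_frames, num_coeffs) whose columns
    are orthonormal. The DCT choice is an illustrative assumption.
    """
    n = np.arange(num_frames)[:, None]
    k = np.arange(num_coeffs)[None, :]
    basis = np.cos(np.pi * (2 * n + 1) * k / (2 * num_frames))
    basis[:, 0] *= np.sqrt(1.0 / num_frames)
    basis[:, 1:] *= np.sqrt(2.0 / num_frames)
    return basis


def project_to_subspace(trajectory, num_coeffs):
    """Project a (num_frames, dims) trajectory onto the truncated basis.

    Returns the smoothed trajectory and its (num_coeffs, dims) coefficients.
    """
    basis = dct_basis(trajectory.shape[0], num_coeffs)
    coeffs = basis.T @ trajectory  # valid because the columns are orthonormal
    return basis @ coeffs, coeffs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 50)
    # Hypothetical 3D joint trajectory over one batch of 50 frames, plus noise.
    clean = np.stack([np.sin(2 * np.pi * t), t, np.cos(np.pi * t)], axis=1)
    noisy = clean + 0.05 * rng.standard_normal(clean.shape)

    smooth, coeffs = project_to_subspace(noisy, num_coeffs=8)
    print(coeffs.shape)  # 8 coefficients per coordinate instead of 50 positions
```

With this parameterization, the unknowns for a batch are the subspace coefficients rather than per-frame positions, which is one way a joint per-batch solve can constrain an otherwise ambiguous monocular estimate.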