LiveCap: Real-time Human Performance Capture from Monocular Video
We present the first real-time human performance capture approach that
reconstructs dense, space-time coherent deforming geometry of entire humans in
general everyday clothing from just a single RGB video. We propose a novel
two-stage analysis-by-synthesis optimization whose formulation and
implementation are designed for high performance. In the first stage, a skinned
template model is jointly fitted to background-subtracted input video, 2D and
3D skeleton joint positions found using a deep neural network, and a set of
sparse facial landmark detections. In the second stage, dense non-rigid 3D
deformations of skin and even loose apparel are captured based on a novel
real-time capable algorithm for non-rigid tracking using dense photometric and
silhouette constraints. Our novel energy formulation leverages automatically
identified material regions on the template to model the differing non-rigid
deformation behavior of skin and apparel. The two resulting per-frame non-linear
optimization problems are solved with specially tailored
data-parallel Gauss-Newton solvers. To achieve real-time performance
of over 25 Hz, we design a pipelined parallel architecture using the CPU and two
commodity GPUs. Our method is the first real-time monocular approach for
full-body performance capture. It yields accuracy comparable to
off-line performance capture techniques while being orders of magnitude
faster.
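The abstract states that each per-frame problem is solved with Gauss-Newton. As a hedged illustration of what one Gauss-Newton iteration does (linearize the residuals, solve the normal equations, update), here is a minimal serial sketch on a hypothetical two-parameter curve-fitting problem. It is not the paper's data-parallel GPU solver, and the names `gauss_newton`, `residuals`, and `jacobian` are illustrative assumptions.

```python
import math


def gauss_newton(residuals, jacobian, params, iters=50, tol=1e-12):
    """Minimize sum_i r_i(p)^2 over a 2-parameter vector p with Gauss-Newton.

    Hypothetical helper: residuals(p) returns [r_0, ..., r_{n-1}] and
    jacobian(p) returns one row [dr_i/dp0, dr_i/dp1] per residual.
    """
    p = list(params)
    for _ in range(iters):
        r = residuals(p)
        J = jacobian(p)
        # Entries of the 2x2 Gauss-Newton matrix J^T J and the gradient J^T r.
        a = sum(row[0] * row[0] for row in J)
        b = sum(row[0] * row[1] for row in J)
        d = sum(row[1] * row[1] for row in J)
        g0 = sum(row[0] * ri for row, ri in zip(J, r))
        g1 = sum(row[1] * ri for row, ri in zip(J, r))
        det = a * d - b * b
        if abs(det) < 1e-15:
            break  # normal equations are (numerically) singular
        # Solve (J^T J) delta = -J^T r with Cramer's rule.
        d0 = (-g0 * d + g1 * b) / det
        d1 = (-g1 * a + g0 * b) / det
        p = [p[0] + d0, p[1] + d1]
        if abs(d0) + abs(d1) < tol:
            break  # step is negligible: converged
    return p


# Toy problem (assumed): recover a=2, b=0.5 in y = a * exp(b * x) from noise-free samples.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]


def residuals(p):
    return [p[0] * math.exp(p[1] * x) - y for x, y in zip(xs, ys)]


def jacobian(p):
    return [[math.exp(p[1] * x), p[0] * x * math.exp(p[1] * x)] for x in xs]


fitted = gauss_newton(residuals, jacobian, [1.5, 0.4])
```

In a capture setting the parameter vector would hold pose or deformation variables and the residuals would come from image constraints; the dense 2x2 solve here only stands in for the large sparse systems a real solver handles in parallel.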
Online Mutual Foreground Segmentation for Multispectral Stereo Videos
The segmentation of video sequences into foreground and background regions is
a low-level process commonly used in video content analysis and smart
surveillance applications. Using a multispectral camera setup can improve this
process by providing more diverse data to help identify objects despite adverse
imaging conditions. However, registering several data sources is not
trivial if the appearance of objects differs substantially from one sensor to
another. The problem is further complicated by parallax effects, which cannot
be ignored when using close-range stereo pairs. In this work, we present a new
method to simultaneously tackle multispectral segmentation and stereo
registration. Using an iterative procedure, we estimate the labeling result for
one problem using the provisional result of the other. Our approach is based on
the alternating minimization of two energy functions that are linked through
the use of dynamic priors. We rely on the integration of shape and appearance
cues to find proper multispectral correspondences, and to properly segment
objects in low contrast regions. We also formulate our model as a frame
processing pipeline using higher order terms to improve the temporal coherence
of our results. Our method is evaluated under different configurations on
multiple multispectral datasets, and our implementation is available online. Comment: Preprint accepted for publication in IJCV (December 2018)
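The core idea of alternating between two energies, each using the other's provisional result as a prior, can be shown on a deliberately tiny continuous analogue. In this sketch (an assumption, not the paper's discrete segmentation and stereo energies), each "problem" is a scalar with its own data term plus a quadratic prior pulling it toward the other problem's current estimate; `minimize_first`, `minimize_second`, and `alternate` are hypothetical names.

```python
def minimize_first(prior_y, a, lam):
    # Closed-form argmin_x of (x - a)^2 + lam * (x - prior_y)^2.
    return (a + lam * prior_y) / (1.0 + lam)


def minimize_second(prior_x, b, mu):
    # Closed-form argmin_y of (y - b)^2 + mu * (y - prior_x)^2.
    return (b + mu * prior_x) / (1.0 + mu)


def alternate(a, b, lam, mu, x0=0.0, y0=0.0, tol=1e-12, max_iters=1000):
    """Alternate between the two energies, each using the other's latest estimate as a prior."""
    x, y = x0, y0
    for _ in range(max_iters):
        x_new = minimize_first(y, a, lam)       # solve problem 1 given provisional y
        y_new = minimize_second(x_new, b, mu)   # solve problem 2 given updated x
        done = abs(x_new - x) + abs(y_new - y) < tol
        x, y = x_new, y_new
        if done:
            break
    return x, y


x, y = alternate(a=1.0, b=3.0, lam=1.0, mu=1.0)
```

With these values the iteration settles at x = 5/3 and y = 7/3, a compromise between each data term and the coupling prior. Because each update contracts the error by a factor of lam/(1+lam) times mu/(1+mu), the alternation converges here; the discrete energies in the paper carry no such simple guarantee.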
Multi-frame scene-flow estimation using a patch model and smooth motion prior
This paper addresses the problem of estimating the dense 3D motion of a scene over several frames using a set of calibrated cameras. Most current 3D motion estimation techniques are limited to estimating the motion over a single frame, unless a strong prior model of the scene (such as a skeleton) is introduced. Estimating the 3D motion of a general scene is difficult due to untextured surfaces, complex movements and occlusions. In this paper, we show that it is possible to track the surfaces of a scene over several frames by introducing an effective prior on the scene motion. Experimental results show that the proposed method estimates the dense scene flow over multiple frames without the need for multiple-view reconstructions at every frame. Furthermore, the accuracy of the proposed method is demonstrated by comparing the estimated motion against ground truth.
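One common way to express a smooth-motion prior over several frames is to penalize the discrete acceleration of each tracked point. The sketch below assumes that form (sum of squared second differences); the paper's exact prior and its patch parameterization may differ, and `smoothness_energy` is a hypothetical name. Only the prior term is shown; the photometric data term is omitted.

```python
def smoothness_energy(trajectory):
    """Sum of squared second differences (discrete acceleration) of a 3D trajectory.

    trajectory: list of (x, y, z) positions, one per frame.
    """
    energy = 0.0
    for prev, cur, nxt in zip(trajectory, trajectory[1:], trajectory[2:]):
        # p_{t+1} - 2 p_t + p_{t-1}, per coordinate
        energy += sum((n - 2 * c + p) ** 2 for p, c, n in zip(prev, cur, nxt))
    return energy


# Constant-velocity motion incurs no penalty; abrupt direction changes do.
constant_velocity = [(t * 1.0, t * 0.5, 0.0) for t in range(5)]
jerky = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (3.0, 1.0, 0.0)]
```

Here `smoothness_energy(constant_velocity)` is 0.0 and `smoothness_energy(jerky)` is 7.0, so a multi-frame tracker minimizing data term plus this prior would favour trajectories that change velocity gradually.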
Mono3D++: Monocular 3D Vehicle Detection with Two-Scale 3D Hypotheses and Task Priors
We present a method to infer 3D pose and shape of vehicles from a single
image. To tackle this ill-posed problem, we optimize two-scale projection
consistency between the generated 3D hypotheses and their 2D
pseudo-measurements. Specifically, we use a morphable wireframe model to
generate a fine-scaled representation of vehicle shape and pose. To reduce its
sensitivity to 2D landmarks, we jointly model the 3D bounding box as a coarse
representation, which improves robustness. We also integrate three task priors
(unsupervised monocular depth, a ground-plane constraint, and
vehicle shape priors) with forward projection errors into an overall energy
function. Comment: Proc. of the AAAI, September 201
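To make "forward projection errors combined with task priors into an overall energy" concrete, here is a minimal sketch under stated assumptions: a pinhole camera, a single-scale reprojection term between projected 3D points and 2D pseudo-measurements, and one prior (a ground-plane term on assumed wheel-contact points). The depth and shape priors, the two-scale structure, and the function names and weight `w_ground` are all hypothetical simplifications, not the paper's formulation.

```python
def project(point, fx, fy, cx, cy):
    # Pinhole projection of a camera-frame point (X, Y, Z) to pixel coordinates.
    X, Y, Z = point
    return (fx * X / Z + cx, fy * Y / Z + cy)


def reprojection_error(points_3d, landmarks_2d, intrinsics):
    # Squared distance between projected hypothesis points and 2D pseudo-measurements.
    err = 0.0
    for P, (u, v) in zip(points_3d, landmarks_2d):
        pu, pv = project(P, *intrinsics)
        err += (pu - u) ** 2 + (pv - v) ** 2
    return err


def ground_plane_error(contact_points, normal, offset):
    # Squared signed distance of contact points to the plane n . X + offset = 0 (unit n).
    return sum((sum(n * x for n, x in zip(normal, P)) + offset) ** 2 for P in contact_points)


def total_energy(points_3d, landmarks_2d, intrinsics, contact_points, normal, offset, w_ground=1.0):
    # Weighted sum of the data term and one prior term (weight is an assumed hyperparameter).
    return (reprojection_error(points_3d, landmarks_2d, intrinsics)
            + w_ground * ground_plane_error(contact_points, normal, offset))


intrinsics = (100.0, 100.0, 50.0, 50.0)  # fx, fy, cx, cy (assumed)
points = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0)]  # project to (50, 50) and (60, 50)
e = total_energy(points, [(50.0, 50.0), (61.0, 50.0)], intrinsics,
                 [(0.0, 1.0, 10.0)], (0.0, 1.0, 0.0), -1.5, w_ground=2.0)
```

The example gives a reprojection error of 1.0 (one landmark is off by a pixel) plus 2.0 times a ground-plane error of 0.25, so `e` is 1.5. An optimizer over pose and shape would lower this sum.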
Parsimonious Labeling
We propose a new family of discrete energy minimization problems, which we
call parsimonious labeling. Specifically, our energy functional consists of
unary potentials and high-order clique potentials. While the unary potentials
are arbitrary, the clique potentials are proportional to the diversity of
the set of unique labels assigned to the clique. Intuitively, our energy
functional encourages the labeling to be parsimonious, that is, to use as few
labels as possible. This in turn allows us to capture useful cues for important
computer vision applications such as stereo correspondence and image denoising.
Furthermore, we propose an efficient graph-cuts based algorithm for the
parsimonious labeling problem that provides strong theoretical guarantees on
the quality of the solution. Our algorithm consists of three steps. First, we
approximate a given diversity using a mixture of novel hierarchical
Potts models. Second, we use a divide-and-conquer approach for each mixture
component, where each subproblem is solved using an efficient
α-expansion algorithm. This provides us with a small number of putative
labelings, one for each mixture component. Third, we choose the best putative
labeling in terms of the energy value. Using both synthetic and standard real
datasets, we show that our algorithm significantly outperforms other graph-cuts
based approaches.
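The energy described above (arbitrary unary potentials plus clique potentials proportional to the diversity of the clique's label set) and the third step (pick the putative labeling with the lowest energy) can be sketched directly. As an assumption for illustration, the diversity is the diameter diversity induced by |a - b| on integer labels, and the putative labelings are supplied by hand rather than produced by the hierarchical-Potts and α-expansion steps. All function names are hypothetical.

```python
def diameter_diversity(labels):
    # Assumed diversity: diameter of the label set under the metric |a - b|.
    unique = set(labels)
    return max(unique) - min(unique)  # 0 when the clique uses a single label


def energy(labeling, unary, cliques, weights, diversity=diameter_diversity):
    # Unary terms: unary[i][l] is the cost of assigning label l to variable i.
    e = sum(unary[i][label] for i, label in enumerate(labeling))
    # Clique terms: weight times diversity of the labels used inside the clique.
    for clique, w in zip(cliques, weights):
        e += w * diversity([labeling[i] for i in clique])
    return e


def best_labeling(candidates, unary, cliques, weights):
    # Step 3: keep the putative labeling with the lowest energy.
    return min(candidates, key=lambda lab: energy(lab, unary, cliques, weights))


unary = [[0, 1, 1], [1, 0, 1], [0, 1, 1]]  # 3 variables, labels {0, 1, 2}
cliques = [[0, 1, 2]]                       # one clique over all variables
weights = [2]
candidates = [[0, 1, 0], [0, 0, 0], [2, 2, 2]]
best = best_labeling(candidates, unary, cliques, weights)
```

The labeling `[0, 1, 0]` has the lowest unary cost (0) but pays 2 for using two labels, while `[0, 0, 0]` pays 1 in unary cost and nothing for diversity, so `best` is `[0, 0, 0]`. This is the parsimony effect the energy is designed to produce.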