Robust Non-Rigid Registration with Reweighted Position and Transformation Sparsity
Non-rigid registration is challenging because it is ill-posed with high degrees of freedom and is thus sensitive to noise and outliers. We propose a robust non-rigid registration method using reweighted sparsities on position and transformation to estimate the deformations between 3-D shapes. We formulate the energy function with position and transformation sparsity on both the data term and the smoothness term, and define the smoothness constraint using local rigidity. The double-sparsity-based non-rigid registration model is enhanced with a reweighting scheme, and solved by transferring the model into four alternately optimized subproblems which have exact solutions and guaranteed convergence. Experimental results on both public datasets and real scanned datasets show that our method outperforms the state-of-the-art methods and is more robust to noise and outliers than conventional non-rigid registration methods.
Comment: IEEE Transactions on Visualization and Computer Graphics
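The reweighting scheme can be illustrated on a toy problem. The sketch below is an illustrative analogue, not the paper's registration model: it applies iteratively reweighted ℓ1 minimization to a 1-D sparse denoising problem, where each reweighted subproblem has an exact soft-thresholding solution. The regularization weight `lam`, the smoothing constant `eps`, and the synthetic signal are assumptions.

```python
import numpy as np

def soft_threshold(y, tau):
    """Element-wise soft thresholding: the exact solution of
    min_x 0.5*(x - y)**2 + tau*|x|."""
    return np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)

def reweighted_sparse_denoise(y, lam=0.5, eps=1e-3, n_iter=10):
    """Iteratively reweighted L1 denoising:
    min_x 0.5*||x - y||^2 + lam * sum_i w_i * |x_i|,
    with weights w_i = 1 / (|x_i| + eps) refreshed each sweep, so small
    (noise-like) entries are penalized harder than genuinely large ones."""
    x = y.copy()
    for _ in range(n_iter):
        w = 1.0 / (np.abs(x) + eps)       # reweighting step
        x = soft_threshold(y, lam * w)    # exact subproblem solution
    return x

# A sparse signal corrupted by small noise: reweighting suppresses the
# noise while preserving the large entries almost unshrunk.
rng = np.random.default_rng(0)
signal = np.zeros(50)
signal[[3, 17, 40]] = [5.0, -4.0, 6.0]
noisy = signal + 0.1 * rng.standard_normal(50)
recovered = reweighted_sparse_denoise(noisy)
```

Each sweep solves its subproblem in closed form, mirroring the exact alternating subproblem solutions described in the abstract.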
Global alignment of deformable objects captured by a single RGB-D camera
We present a novel global registration method for deformable objects captured using a single RGB-D camera. Our algorithm allows objects to undergo large non-rigid deformations, and achieves high quality results without constraining the actor's pose or camera motion. We compute the deformations of all the scans simultaneously by optimizing a global alignment problem to avoid the well-known loop closure problem, and use an as-rigid-as-possible constraint to eliminate the shrinkage problem of the deformed model. To handle large-scale problems, we design a coarse-to-fine multi-resolution scheme, which also avoids the optimization being trapped into local minima. The proposed method is evaluated on public datasets and real datasets captured by an RGB-D sensor. Experimental results demonstrate that the proposed method obtains better results than the state-of-the-art methods.
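The as-rigid-as-possible (ARAP) constraint penalizes the deviation of each local neighborhood from a rigid motion. Below is a minimal sketch of that idea only, not the paper's full alignment pipeline; the neighborhood structure and the Kabsch-based local rotation fit are illustrative assumptions.

```python
import numpy as np

def best_rotation(P, Q):
    """Optimal rotation aligning centred points P to Q (Kabsch/Procrustes);
    this is the per-neighborhood local step of an ARAP energy."""
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])           # guard against reflections
    return Vt.T @ D @ U.T

def arap_energy(rest, deformed, neighbors):
    """Sum over vertices i of || (q_j - q_i) - R_i (p_j - p_i) ||^2,
    with R_i the best-fit rotation of vertex i's neighborhood."""
    E = 0.0
    for i, nbrs in neighbors.items():
        P = rest[nbrs] - rest[i]         # rest-pose edge vectors
        Q = deformed[nbrs] - deformed[i] # deformed edge vectors
        R = best_rotation(P, Q)
        E += np.sum((Q - P @ R.T) ** 2)
    return E

# A rigid rotation of the point set incurs (numerically) zero ARAP energy;
# a non-rigid stretch does not.
rng = np.random.default_rng(1)
rest = rng.standard_normal((10, 3))
theta = 0.5
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0,            0.0,           1.0]])
rigid = rest @ Rz.T
neighbors = {i: [j for j in range(10) if j != i] for i in range(10)}
```

Minimizing this energy over the deformed positions is what discourages the shrinkage artifacts mentioned in the abstract.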
3-D motion recovery via low rank matrix analysis
Skeleton tracking is a useful and popular application of Kinect. However, it cannot provide accurate reconstructions for complex motions, especially in the presence of occlusion. This paper proposes a new 3-D motion recovery method based on low-rank matrix analysis to correct invalid or corrupted motions. We address this problem by representing a motion sequence as a matrix, and introducing a convex low-rank matrix recovery model, which fixes erroneous entries and finds the correct low-rank matrix by minimizing the nuclear norm and ℓ1-norm of the constituent clean motion and error matrices. Experimental results show that our method recovers the corrupted skeleton joints, achieving accurate and smooth reconstructions even for complicated motions.
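The convex model (nuclear norm plus ℓ1-norm) can be sketched with a standard robust-PCA splitting. This is an illustrative inexact augmented Lagrangian loop, not necessarily the authors' solver, and the synthetic rank-1 "motion" matrix, λ, and the penalty schedule are assumptions.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def shrink(X, tau):
    """Soft thresholding: proximal operator of the L1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def robust_pca(M, n_iter=100):
    """Split M into low-rank L plus sparse S by minimizing
    ||L||_* + lam * ||S||_1  subject to  L + S = M,
    via an inexact augmented Lagrangian iteration."""
    lam = 1.0 / np.sqrt(max(M.shape))
    mu = 1.25 / np.linalg.norm(M, 2)          # common ALM initialization
    S = np.zeros_like(M)
    Y = np.zeros_like(M)
    for _ in range(n_iter):
        L = svt(M - S + Y / mu, 1.0 / mu)     # low-rank (clean motion) update
        S = shrink(M - L + Y / mu, lam / mu)  # sparse (error) update
        Y = Y + mu * (M - L - S)              # dual ascent on the constraint
        mu = min(mu * 1.5, 1e7)               # penalty schedule
    return L, S

# A rank-1 "motion" matrix (frames x joint coordinates) with two
# corrupted entries standing in for invalid skeleton joints.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 60)
clean = np.outer(np.sin(t), rng.standard_normal(15))
corrupt = clean.copy()
corrupt[5, 3] += 10.0
corrupt[20, 7] -= 8.0
L, S = robust_pca(corrupt)
```

The sparse component S absorbs the corrupted joints while L restores the smooth low-rank motion, matching the decomposition described in the abstract.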
Global 3D non-rigid registration of deformable objects using a single RGB-D camera
We present a novel global non-rigid registration method for dynamic 3D objects. Our method allows objects to undergo large non-rigid deformations, and achieves high quality results even with substantial pose change or camera motion between views. In addition, our method does not require a template prior and uses less raw data than tracking-based methods since only a sparse set of scans is needed. We compute the deformations of all the scans simultaneously by optimizing a global alignment problem to avoid the well-known loop closure problem, and use an as-rigid-as-possible constraint to eliminate the shrinkage problem of the deformed shapes, especially near open boundaries of scans. To cope with large-scale problems, we design a coarse-to-fine multi-resolution scheme, which also avoids the optimization being trapped into local minima. The proposed method is evaluated on public datasets and real datasets captured by an RGB-D sensor. Experimental results demonstrate that the proposed method obtains better results than several state-of-the-art methods.
3-D motion recovery via low rank matrix restoration on articulation graphs
This paper addresses the challenge of 3-D skeleton recovery by exploiting the spatio-temporal correlations of corrupted 3-D skeleton sequences. A skeleton sequence is represented as a matrix. We propose a novel low-rank solution that effectively integrates both a low-rank model for robust skeleton recovery based on temporal coherence, and an articulation-graph-based isometric constraint for spatial coherence, namely consistency of bone lengths. The proposed model is formulated as a constrained optimization problem, which is efficiently solved by the Augmented Lagrangian Method with a Gauss-Newton solver for the subproblem of isometric optimization. Experimental results on the CMU motion capture dataset and a Kinect dataset show that the proposed approach achieves better recovery accuracy than a state-of-the-art method. The proposed method has wide applicability for skeleton tracking devices, such as the Kinect, because these devices cannot provide accurate reconstructions of complex motions, especially in the presence of occlusion.
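The isometric (bone-length) constraint can be demonstrated on its own. Below is a toy sketch: the 5-joint chain, the parent array, and the simple root-to-leaf rescaling are illustrative assumptions, not the paper's Gauss-Newton subproblem solver.

```python
import numpy as np

# Hypothetical 5-joint kinematic chain: parent index of each joint
# in the articulation graph (-1 marks the root).
PARENTS = [-1, 0, 1, 2, 3]

def bone_lengths(joints, parents=PARENTS):
    """Length of the bone connecting each non-root joint to its parent."""
    return np.array([np.linalg.norm(joints[j] - joints[p])
                     for j, p in enumerate(parents) if p >= 0])

def enforce_bone_lengths(joints, ref_lengths, parents=PARENTS):
    """Project a (possibly corrupted) pose onto the isometric constraint:
    walk the graph root-to-leaf, keeping each bone's direction but
    rescaling it to its reference length."""
    fixed = joints.copy()
    for j, p in enumerate(parents):       # parents precede children here
        if p < 0:
            continue
        v = joints[j] - joints[p]
        n = np.linalg.norm(v)
        d = v / n if n > 1e-12 else np.array([0.0, 0.0, 1.0])
        fixed[j] = fixed[p] + ref_lengths[j - 1] * d
    return fixed

# A straight rest pose, then a corrupted joint; projection restores
# consistent bone lengths while preserving bone directions.
rest = np.array([[0, 0, 0], [0, 0, 1], [0, 0, 2],
                 [0, 0, 3], [0, 0, 4]], dtype=float)
ref = bone_lengths(rest)
noisy = rest.copy()
noisy[3] += np.array([0.3, -0.2, 0.4])
fixed = enforce_bone_lengths(noisy, ref)
```

In the paper's formulation this constraint enters the optimization jointly with the low-rank term rather than as a post-hoc projection; the sketch only shows what "consistency of bone lengths" means.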
FOF: Learning Fourier Occupancy Field for Monocular Real-time Human Reconstruction
The advent of deep learning has led to significant progress in monocular human reconstruction. However, existing representations, such as parametric models, voxel grids, meshes and implicit neural representations, have difficulties achieving high-quality results and real-time speed at the same time. In this paper, we propose the Fourier Occupancy Field (FOF), a novel, powerful, efficient and flexible 3D representation, for monocular, real-time and accurate human reconstruction. The FOF represents a 3D object with a 2D field orthogonal to the view direction: at each 2D position, the occupancy field of the object along the view direction is compactly represented with the first few terms of its Fourier series, which retains the topology and neighborhood relations in the 2D domain. A FOF can be stored as a multi-channel image, which is compatible with 2D convolutional neural networks and can bridge the gap between 3D geometries and 2D images. The FOF is very flexible and extensible; e.g., parametric models can be easily integrated into a FOF as a prior to generate more robust results. Based on FOF, we design the first 30+ FPS high-fidelity real-time monocular human reconstruction framework. We demonstrate the potential of FOF on both public datasets and real captured data. The code will be released for research purposes.
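The core encoding is easy to demonstrate on a single view ray: the binary occupancy along the ray is summarized by the first few Fourier-series terms, and recovered by thresholding the truncated series. A minimal sketch follows; the sample count, the 16-term cut-off, and the occupied interval are assumptions for illustration, not values from the paper.

```python
import numpy as np

def fourier_coeffs(occ, n_terms=16):
    """First Fourier-series terms of a sampled 1-D occupancy function on
    [0, 1): the DC term followed by cosine/sine pairs per harmonic."""
    z = np.linspace(0.0, 1.0, len(occ), endpoint=False)
    coeffs = [np.mean(occ)]
    for k in range(1, n_terms):
        coeffs.append(2.0 * np.mean(occ * np.cos(2.0 * np.pi * k * z)))
        coeffs.append(2.0 * np.mean(occ * np.sin(2.0 * np.pi * k * z)))
    return np.array(coeffs)

def reconstruct(coeffs, n_samples=256):
    """Evaluate the truncated Fourier series and threshold at 0.5 to
    recover a binary occupancy estimate along the ray."""
    z = np.linspace(0.0, 1.0, n_samples, endpoint=False)
    f = np.full(n_samples, coeffs[0])
    n_terms = (len(coeffs) - 1) // 2 + 1
    for k in range(1, n_terms):
        f += coeffs[2 * k - 1] * np.cos(2.0 * np.pi * k * z)
        f += coeffs[2 * k] * np.sin(2.0 * np.pi * k * z)
    return (f > 0.5).astype(float), f

# Occupancy along one view ray: inside the object on [0.3, 0.6).
z = np.linspace(0.0, 1.0, 256, endpoint=False)
occ = ((z >= 0.3) & (z < 0.6)).astype(float)
c = fourier_coeffs(occ, n_terms=16)   # 31 numbers summarize the whole ray
recon, _ = reconstruct(c)
```

Stacking one such coefficient vector per pixel yields exactly the multi-channel image the abstract describes, which is what makes the representation CNN-friendly.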
High-Quality Animatable Dynamic Garment Reconstruction from Monocular Videos
Much progress has been made in reconstructing garments from an image or a video. However, none of the existing works meets the expectation of digitizing high-quality animatable dynamic garments that can be adjusted to various unseen poses. In this paper, we propose the first method to recover high-quality animatable dynamic garments from monocular videos without depending on scanned data. To generate reasonable deformations for various unseen poses, we propose a learnable garment deformation network that formulates the garment reconstruction task as a pose-driven deformation problem. To alleviate the ambiguity of estimating 3D garments from monocular videos, we design a multi-hypothesis deformation module that learns spatial representations of multiple plausible deformations. Experimental results on several public datasets demonstrate that our method can reconstruct high-quality dynamic garments with coherent surface details, which can be easily animated under unseen poses. The code will be provided for research purposes.
Generating 3D faces using multi-column graph convolutional networks
In this work, we introduce multi-column graph convolutional networks (MGCNs), a deep generative model for 3D mesh surfaces that effectively learns a non-linear facial representation. We perform spectral decomposition of meshes and apply convolutions directly in the frequency domain. Our network architecture involves multiple columns of graph convolutional networks (GCNs), namely a large GCN (L-GCN), a medium GCN (M-GCN) and a small GCN (S-GCN), with different filter sizes to extract features at different scales. The L-GCN is most useful for extracting large-scale features, the S-GCN is effective for extracting subtle and fine-grained features, and the M-GCN captures information in between. To obtain a high-quality representation, we propose a selective fusion method that adaptively integrates these three kinds of information. Spatially non-local relationships are also exploited through a self-attention mechanism to further improve the representation ability in the latent vector space. Through extensive experiments, we demonstrate the superiority of our end-to-end framework in improving the accuracy of 3D face reconstruction. Moreover, with the help of variational inference, our model has excellent generative ability.
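Convolution in the mesh frequency domain amounts to scaling the graph-Laplacian eigenmodes of a per-vertex signal. A minimal sketch on a toy 6-cycle graph follows; the cycle topology and the exponential low-pass response are illustrative assumptions, not the network's learned filters.

```python
import numpy as np

def graph_laplacian(n, edges):
    """Combinatorial Laplacian L = D - A of an undirected graph."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def spectral_filter(signal, L, response):
    """Graph convolution in the frequency domain: project the per-vertex
    signal onto the Laplacian eigenbasis, scale each graph frequency by
    response(eigenvalue), then transform back."""
    lam, U = np.linalg.eigh(L)
    return U @ (response(lam) * (U.T @ signal))

# A 6-cycle "mesh" and a low-pass response that damps high frequencies.
edges = [(i, (i + 1) % 6) for i in range(6)]
L = graph_laplacian(6, edges)
x = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])  # highest-frequency mode
smooth = spectral_filter(x, L, lambda lam: np.exp(-lam))
```

In an actual GCN the per-frequency response is learned rather than fixed, and the different columns correspond to different effective filter sizes; the sketch only shows the underlying spectral mechanism.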
PISE: person image synthesis and editing with decoupled GAN
Person image synthesis, e.g., pose transfer, is a challenging problem due to large variation and occlusion. Existing methods have difficulties predicting reasonable invisible regions and fail to decouple the shape and style of clothing, which limits their application to person image editing. In this paper, we propose PISE, a novel two-stage generative model for Person Image Synthesis and Editing, which is able to generate realistic person images with desired poses, textures, or semantic layouts. For human pose transfer, we first synthesize a human parsing map aligned with the target pose to represent the shape of clothing by a parsing generator, and then generate the final image by an image generator. To decouple the shape and style of clothing, we propose joint global and local per-region encoding and normalization to predict the reasonable style of clothing for invisible regions. We also propose spatial-aware normalization to retain the spatial context relationship in the source image. The results of qualitative and quantitative experiments demonstrate the superiority of our model on human pose transfer. Besides, the results of texture transfer and region editing show that our model can be applied to person image editing. The code is available for research purposes at https://github.com/Zhangjinso/PISE
SPA: Sparse Photorealistic Animation using a single RGB-D camera
Photorealistic animation is a desirable technique for computer games and movie production. We propose a new method to synthesize plausible videos of human actors with new motions using a single cheap RGB-D camera. A small database is captured in a usual office environment; this capture is needed only once for synthesizing different motions. We propose a markerless performance capture method using sparse deformation to obtain the geometry and pose of the actor for each time instance in the database. Then, we synthesize an animation video of the actor performing the new motion that is defined by the user. An adaptive model-guided texture synthesis method based on weighted low-rank matrix completion is proposed that is less sensitive to noise and outliers, which enables us to easily create photorealistic animation videos with new motions that are different from the motions in the database. Experimental results on the public dataset and our captured dataset have verified the effectiveness of the proposed method.
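The low-rank matrix completion at the heart of the texture synthesis step can be sketched with a soft-impute-style iteration. This is illustrative only: the binary observation mask, the rank-1 toy matrix, and the threshold value are assumptions, and the paper's weighted formulation is more general.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def complete(M, W, tau=0.01, n_iter=500):
    """Low-rank completion in the spirit of soft-impute: keep observed
    entries (W == 1) fixed and refill unobserved ones (W == 0) from the
    current low-rank estimate until the iteration settles."""
    X = W * M
    for _ in range(n_iter):
        X = W * M + (1.0 - W) * svt(X, tau)
    return X

# A rank-1 "texture" block with two unobserved (noisy/outlier) entries.
a = np.array([1.0, 2.0, 3.0, 4.0])
b = np.array([1.0, 0.5, -1.0])
clean = np.outer(a, b)
W = np.ones_like(clean)
W[0, 2] = 0.0
W[3, 1] = 0.0
filled = complete(clean, W)
```

Down-weighting unreliable texture samples and letting the low-rank structure fill them in is what makes the approach less sensitive to noise and outliers.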