Color-aware surface registration
Shape registration is fundamental to 3D object acquisition; it is used to fuse scans from multiple views. Existing algorithms mainly utilize geometric information to determine alignment, but this typically results in noticeable misalignment of textures (i.e., surface colors) when using RGB-depth cameras. We address this problem using a novel approach to color-aware registration, which takes both color and geometry into consideration simultaneously. Color information is exploited throughout the pipeline to provide more effective sampling, correspondence, and alignment, in particular for surfaces with detailed textures. Our method can furthermore tackle both rigid and non-rigid registration problems (arising, for example, due to small changes in the object during scanning, or camera distortions). We demonstrate that our approach produces significantly better results than previous methods.
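To make the idea of joint color-geometry matching concrete, the following is a minimal sketch, not the authors' actual pipeline: each point is embedded in a weighted 6-D position-plus-color space so that nearest-neighbour correspondence search trades off geometric proximity against texture agreement. The weight lam and the function name are assumptions made for illustration only.

# Minimal sketch of color-aware correspondence search (illustrative, not the
# authors' method): points are embedded in a joint 6-D space so that nearest-
# neighbour matching realises the metric ||dx||^2 + lam * ||dc||^2.
import numpy as np
from scipy.spatial import cKDTree

def color_aware_correspondences(src_xyz, src_rgb, tgt_xyz, tgt_rgb, lam=0.1):
    """src_xyz/tgt_xyz: (N,3)/(M,3) coordinates; src_rgb/tgt_rgb: colors in [0,1].
    lam is a hypothetical color weight; larger values favour texture agreement.
    Returns, for each source point, the index of its closest target point."""
    w = np.sqrt(lam)                      # scale colors so Euclidean distance in
    src6 = np.hstack([src_xyz, w * src_rgb])  # the stacked space gives the
    tgt6 = np.hstack([tgt_xyz, w * tgt_rgb])  # weighted sum of squared terms
    _, idx = cKDTree(tgt6).query(src6, k=1)
    return idx

Such color-augmented correspondences would simply replace purely geometric nearest neighbours inside a standard ICP-style loop; the rest of the alignment stays unchanged.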
Neural 3D Video Synthesis
We propose a novel approach for 3D video synthesis that is able to represent
multi-view video recordings of a dynamic real-world scene in a compact, yet
expressive representation that enables high-quality view synthesis and motion
interpolation. Our approach takes the high quality and compactness of static
neural radiance fields in a new direction: to a model-free, dynamic setting. At
the core of our approach is a novel time-conditioned neural radiance field
that represents scene dynamics using a set of compact latent codes. To exploit
the fact that changes between adjacent frames of a video are typically small
and locally consistent, we propose two novel strategies for efficient training
of our neural network: 1) An efficient hierarchical training scheme, and 2) an
importance sampling strategy that selects the next rays for training based on
the temporal variation of the input videos. In combination, these two
strategies significantly boost the training speed, lead to fast convergence of
the training process, and enable high quality results. Our learned
representation is highly compact and able to represent a 10-second, 30 FPS
multi-view video recording from 18 cameras with a model size of just 28 MB. We
demonstrate that our method can render high-fidelity wide-angle novel views at
over 1K resolution, even for highly complex and dynamic scenes. We perform an
extensive qualitative and quantitative evaluation that shows that our approach
outperforms the current state of the art. We include additional video and
information at the project website: https://neural-3d-video.github.io/
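As a rough illustration of the second strategy, the sketch below (hypothetical, not the paper's implementation) computes per-pixel temporal variation of one camera's video and draws training rays with probability proportional to that variation; the floor term and function names are assumptions.

# Sketch of temporal-variation-based ray importance sampling (assumed names,
# simplified weighting): pixels whose color changes more across neighbouring
# frames are sampled more often during training.
import numpy as np

def temporal_variation_weights(video, floor=0.05):
    """video: (T, H, W, 3) float array of one camera's frames in [0, 1].
    Returns an (H*W,) probability vector over pixels/rays."""
    # Mean absolute change between consecutive frames, averaged over channels.
    var = np.abs(np.diff(video, axis=0)).mean(axis=(0, 3))      # (H, W)
    w = var + floor * max(var.max(), 1e-8)   # floor keeps static pixels sampled
    p = w.flatten()
    return p / p.sum()

def sample_rays(p, n_rays, rng=np.random.default_rng(0)):
    """Draw ray (pixel) indices according to the importance weights."""
    return rng.choice(p.size, size=n_rays, replace=False, p=p)

In practice such weights would be recomputed per camera and combined with whatever hierarchical schedule governs which frames are currently being trained on.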
Efficient 3D Reconstruction, Streaming and Visualization of Static and Dynamic Scene Parts for Multi-client Live-telepresence in Large-scale Environments
Despite the impressive progress of telepresence systems for room-scale scenes
with static and dynamic scene entities, expanding their capabilities to
scenarios with larger dynamic environments beyond a fixed size of a few
square meters remains challenging.
In this paper, we aim at sharing 3D live-telepresence experiences in
large-scale environments beyond room scale with both static and dynamic scene
entities at practical bandwidth requirements, based only on lightweight scene
capture with a single moving consumer-grade RGB-D camera. To this end, we
present a system built upon a novel hybrid volumetric scene representation:
a voxel-based representation for the static content, which stores not only the
reconstructed surface geometry but also object semantics and their accumulated
dynamic movement over time, and a point-cloud-based representation for dynamic
scene parts, where the separation from static parts is based on semantic and
instance information extracted from the input frames. By streaming static and
dynamic content independently yet simultaneously, seamlessly integrating
potentially moving but currently static scene entities into the static model
until they become dynamic again, and fusing static and dynamic data at the
remote client, our system achieves VR-based live-telepresence at close to
real-time rates. Our evaluation demonstrates the potential of our approach in
terms of visual quality and performance, and includes ablation studies of the
involved design choices.
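The sketch below illustrates, under assumed names and a deliberately simplified voxel fusion, how per-frame semantic labels might route points into such a hybrid representation: points of classes considered dynamic go to the point-cloud part, everything else is fused into the static voxel model. It is not the actual system's code.

# Illustrative only (hypothetical label set and helpers, not the real system):
# split an incoming frame's points by semantic class and fuse the static part
# into an occupancy-count voxel grid kept as a dictionary.
import numpy as np

DYNAMIC_CLASSES = {"person", "chair"}      # assumed set of dynamic classes

def split_frame(points, labels, class_names, dynamic_classes=DYNAMIC_CLASSES):
    """points: (N,3) array; labels: (N,) semantic ids; class_names: id -> name.
    Returns (static_pts, dynamic_pts)."""
    is_dyn = np.array([class_names[l] in dynamic_classes for l in labels])
    return points[~is_dyn], points[is_dyn]

def integrate_static(voxel_grid, static_pts, voxel_size=0.05):
    """Very simplified fusion: increment an occupancy count per occupied voxel."""
    keys = np.floor(static_pts / voxel_size).astype(np.int64)
    for k in map(tuple, keys):
        voxel_grid[k] = voxel_grid.get(k, 0) + 1
    return voxel_grid

The dynamic points would then be streamed per frame, while the voxel grid changes only where static geometry is newly observed or an entity starts or stops moving.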
HIGH QUALITY HUMAN 3D BODY MODELING, TRACKING AND APPLICATION
Geometric reconstruction of dynamic objects is a fundamental task in computer vision and graphics, and high-fidelity modeling of the human body is considered a core part of this problem. Traditional human shape and motion capture techniques require an array of surrounding cameras or require subjects to wear reflective markers, limiting working space and portability. In this dissertation, a complete pipeline is designed, from geometric modeling of the detailed 3D human body and capturing its shape dynamics over time with a flexible setup, to guiding clothes/person re-targeting with such data-driven models. Since the mechanical movement of the human body can be treated as articulated motion, which readily drives skin animation but makes the reverse problem of recovering parameters from images without manual intervention difficult, we present a novel parametric model, GMM-BlendSCAPE, which jointly takes a linear skinning model and the prior art of BlendSCAPE (Blend Shape Completion and Animation for PEople) into consideration, and develop a Gaussian Mixture Model (GMM) to infer both body shape and pose from incomplete observations. We show that our model estimates joints and the skin surface more accurately than skeleton-based motion tracking. To model the detailed body, we start by capturing high-quality partial 3D scans with a single-view commercial depth camera. Based on GMM-BlendSCAPE, we then reconstruct multiple complete static models with large pose differences via our novel non-rigid registration algorithm. With vertex correspondences established, these models can be further converted into a personalized drivable template and used for robust pose tracking in a similar GMM framework. Moreover, we design a general-purpose real-time non-rigid deformation algorithm to accelerate this registration. Finally, we demonstrate a novel virtual clothes try-on application based on our personalized model that utilizes both image and depth cues to synthesize and re-target clothes for single-view videos of different people.
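As a rough illustration of the GMM view used for fitting, the sketch below (a simplification, not GMM-BlendSCAPE itself) treats the current model vertices as Gaussian centroids and computes soft correspondences from an incomplete scan in a single E-step; the variance and outlier weight are assumed parameters chosen only for illustration.

# Simplified E-step of a GMM-style fit (illustrative, not the dissertation's
# model): model vertices act as Gaussian centroids, scan points are assigned
# soft correspondences, and a uniform outlier term absorbs noise/occlusion.
import numpy as np

def e_step_responsibilities(scan_pts, model_verts, sigma2=1e-3, w_outlier=0.1):
    """scan_pts: (N,3) partial observation; model_verts: (M,3) current model.
    Returns an (N, M) matrix of soft correspondence weights (rows sum to <= 1)."""
    d2 = ((scan_pts[:, None, :] - model_verts[None, :, :]) ** 2).sum(-1)  # (N, M)
    g = np.exp(-d2 / (2.0 * sigma2))
    denom = g.sum(axis=1, keepdims=True) + w_outlier
    return g / denom

In a full fitting loop, the subsequent M-step would update the shape and pose parameters of the parametric body model to maximize the weighted data likelihood, and the two steps would alternate until convergence.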