Differentiable Rendering for Pose Estimation in Proximity Operations
Differentiable rendering aims to compute the derivative of the image
rendering function with respect to the rendering parameters. This paper
presents a novel algorithm for 6-DoF pose estimation through gradient-based
optimization using a differentiable rendering pipeline. We emphasize two key
contributions: (1) instead of solving the conventional 2D to 3D correspondence
problem and computing reprojection errors, images (rendered using the 3D model)
are compared only in the 2D feature space via sparse 2D feature
correspondences. (2) Instead of an analytical image formation model, we compute
an approximate local gradient of the rendering process through online learning.
The learning data consists of image features extracted from multi-viewpoint
renders at small perturbations in the pose neighborhood. The gradients are
propagated through the rendering pipeline for the 6-DoF pose estimation using
nonlinear least squares. This gradient-based optimization regresses directly
upon the pose parameters by aligning the 3D model to reproduce a reference
image shape. Using representative experiments, we demonstrate the application
of our approach to pose estimation in proximity operations.
Comment: AIAA SciTech Forum 2023, 13 pages, 9 figures
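The idea in the abstract above, learning a local gradient from renders at small pose perturbations and feeding it to nonlinear least squares, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The pinhole point projector `render_features` is a hypothetical stand-in for "render the 3D model, then extract 2D features", and the function names, axis-angle pose parameterization, perturbation scale, and sample count are all assumptions made for illustration.

```python
import numpy as np

def rotation_matrix(rvec):
    """Rodrigues formula: axis-angle vector -> 3x3 rotation matrix."""
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def render_features(pose, model_points, focal=500.0):
    """Hypothetical black box: 'render' the model and return 2D features.
    pose = [axis-angle rotation (3), translation (3)]."""
    cam = model_points @ rotation_matrix(pose[:3]).T + pose[3:]
    return (focal * cam[:, :2] / cam[:, 2:3]).ravel()

def local_jacobian(pose, model_points, rng, n_samples=24, scale=1e-3):
    """Fit a linear map from small pose perturbations to feature changes,
    using only renders in the neighborhood of the current pose."""
    f0 = render_features(pose, model_points)
    deltas = rng.normal(scale=scale, size=(n_samples, 6))
    diffs = np.stack([render_features(pose + d, model_points) - f0
                      for d in deltas])
    # Solve deltas @ J.T ~= diffs in the least-squares sense.
    Jt, *_ = np.linalg.lstsq(deltas, diffs, rcond=None)
    return Jt.T, f0

def estimate_pose(reference, model_points, pose0, iters=20, seed=0):
    """Gauss-Newton on the 2D feature residual with the learned Jacobian.
    The update is a plain vector addition in pose coordinates."""
    rng = np.random.default_rng(seed)
    pose = np.asarray(pose0, dtype=float).copy()
    for _ in range(iters):
        J, f = local_jacobian(pose, model_points, rng)
        step, *_ = np.linalg.lstsq(J, reference - f, rcond=None)
        pose += step
    return pose

# Usage: recover a known pose from the features of a reference "image".
points = np.random.default_rng(1).uniform(-1.0, 1.0, size=(8, 3))
true_pose = np.array([0.05, -0.04, 0.03, 0.1, -0.05, 5.0])
reference = render_features(true_pose, points)
estimate = estimate_pose(reference, points, pose0=[0, 0, 0, 0, 0, 5.2])
```

The sketch never differentiates the renderer analytically: the Jacobian comes entirely from a regression over perturbed renders, which is the property the abstract emphasizes. A real pipeline would replace the projector with an actual renderer and feature extractor, and would likely need damping or a robust loss.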
NeRF-GAN Distillation for Efficient 3D-Aware Generation with Convolutions
Pose-conditioned convolutional generative models struggle with high-quality
3D-consistent image generation from single-view datasets, due to their lack of
sufficient 3D priors. Recently, the integration of Neural Radiance Fields
(NeRFs) and generative models, such as Generative Adversarial Networks (GANs),
has transformed 3D-aware generation from single-view images. NeRF-GANs exploit
the strong inductive bias of neural 3D representations and volumetric rendering
at the cost of higher computational complexity. This study aims at revisiting
pose-conditioned 2D GANs for efficient 3D-aware generation at inference time by
distilling 3D knowledge from pretrained NeRF-GANs. We propose a simple and
effective method, based on re-using the well-disentangled latent space of a
pre-trained NeRF-GAN in a pose-conditioned convolutional network to directly
generate 3D-consistent images corresponding to the underlying 3D
representations. Experiments on several datasets demonstrate that the proposed
method obtains results comparable with volumetric rendering in terms of quality
and 3D consistency while benefiting from the computational advantage of
convolutional networks. The code will be available at:
https://github.com/mshahbazi72/NeRF-GAN-Distillatio
Robust Pose Transfer with Dynamic Details using Neural Video Rendering
Pose transfer of human videos aims to generate a high fidelity video of a
target person imitating actions of a source person. A few studies have made
great progress either through image translation with deep latent features or
neural rendering with explicit 3D features. However, both of them rely on large
amounts of training data to generate realistic results, and the performance
degrades on more accessible internet videos due to insufficient training
frames. In this paper, we demonstrate that dynamic details can be preserved
even when training on short monocular videos. Overall, we propose a neural video
rendering framework coupled with an image-translation-based dynamic details
generation network (D2G-Net), which fully utilizes both the stability of
explicit 3D features and the capacity of learning components. To be specific, a
novel texture representation is presented to encode both the static and
pose-varying appearance characteristics, which is then mapped to the image
space and rendered as a detail-rich frame in the neural rendering stage.
Moreover, we introduce a concise temporal loss in the training stage to
suppress the detail flickering that is made more visible due to high-quality
dynamic details generated by our method. Through extensive comparisons, we
demonstrate that our neural human video renderer is capable of achieving both
clearer dynamic details and more robust performance even on accessible short
videos with only 2k to 4k frames.
Comment: Video link: https://www.bilibili.com/video/BV1y64y1C7ge
MoSculp: Interactive Visualization of Shape and Time
We present a system that allows users to visualize complex human motion via
3D motion sculptures---a representation that conveys the 3D structure swept by
a human body as it moves through space. Given an input video, our system
computes the motion sculptures and provides a user interface for rendering them
in different styles, including the options to insert the sculpture back into
the original video, render it in a synthetic scene or physically print it.
To provide this end-to-end workflow, we introduce an algorithm that estimates
the human's 3D geometry over time from a set of 2D images and develop a
3D-aware image-based rendering approach that embeds the sculpture back into the
scene. By automating the process, our system takes motion sculpture creation
out of the realm of professional artists, and makes it applicable to a wide
range of existing video material.
By providing viewers with 3D information, motion sculptures reveal space-time
motion information that is difficult to perceive with the naked eye, and allow
viewers to interpret how different parts of the object interact over time. We
validate the effectiveness of this approach with user studies, finding that our
motion sculpture visualizations are significantly more informative about motion
than existing stroboscopic and space-time visualization methods.
Comment: UIST 2018. Project page: http://mosculp.csail.mit.edu
HeadOn: Real-time Reenactment of Human Portrait Videos
We propose HeadOn, the first real-time source-to-target reenactment approach
for complete human portrait videos that enables transfer of torso and head
motion, face expression, and eye gaze. Given a short RGB-D video of the target
actor, we automatically construct a personalized geometry proxy that embeds a
parametric head, eye, and kinematic torso model. A novel real-time reenactment
algorithm employs this proxy to photo-realistically map the captured motion
from the source actor to the target actor. On top of the coarse geometric
proxy, we propose a video-based rendering technique that composites the
modified target portrait video via view- and pose-dependent texturing, and
creates photo-realistic imagery of the target actor under novel torso and head
poses, facial expressions, and gaze directions. To this end, we propose a
robust method for tracking the face and torso of the source actor. We extensively
evaluate our approach and show significant improvements in enabling much
greater flexibility in creating realistic reenacted output videos.
Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at Siggraph'1
Adaptive User Perspective Rendering for Handheld Augmented Reality
Handheld Augmented Reality commonly implements some variant of magic lens
rendering, which turns only a fraction of the user's real environment into AR
while the rest of the environment remains unaffected. Since handheld AR devices
are commonly equipped with video see-through capabilities, AR magic lens
applications often suffer from spatial distortions, because the AR environment
is presented from the perspective of the camera of the mobile device. Recent
approaches counteract this distortion based on estimations of the user's head
position, rendering the scene from the user's perspective. To this end,
approaches usually apply face-tracking algorithms on the front camera of the
mobile device. However, this demands high computational resources and therefore
commonly affects the performance of the application beyond the already high
computational load of AR applications. In this paper, we present a method to
reduce the computational demands for user perspective rendering by applying
lightweight optical flow tracking and an estimation of the user's motion before
head tracking is started. We demonstrate the suitability of our approach for
computationally limited mobile devices and compare it to device-perspective
rendering, to head-tracked user-perspective rendering, and to fixed
point-of-view user-perspective rendering.