94 research outputs found
VoGE: A Differentiable Volume Renderer using Gaussian Ellipsoids for Analysis-by-Synthesis
Differentiable rendering allows the application of computer graphics onvision tasks, e.g. object pose and shape fitting, via analysis-by-synthesis,where gradients at occluded regions are important when inverting the renderingprocess. To obtain those gradients, state-of-the-art (SoTA) differentiablerenderers use rasterization to collect a set of nearest components for eachpixel and aggregate them based on the viewing distance. In this paper, wepropose VoGE, which uses ray tracing to capture nearest components with theirvolume density distributions on the rays and aggregates via integral of thevolume densities based on Gaussian ellipsoids, which brings more efficient andstable gradients. To efficiently render via VoGE, we propose an approximateclose-form solution for the volume density aggregation and a coarse-to-finerendering strategy. Finally, we provide a CUDA implementation of VoGE, whichgives a competitive rendering speed in comparison to PyTorch3D. Quantitativeand qualitative experiment results show VoGE outperforms SoTA counterparts whenapplied to various vision tasks,e.g., object pose estimation, shape/texturefitting, and occlusion reasoning. The VoGE library and demos are available athttps://github.com/Angtian/VoGE.<br
State of the Art on Neural Rendering
Efficient rendering of photo-realistic virtual worlds is a long standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning have given rise to a new approach to image synthesis and editing, namely deep generative models. Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. With a plethora of applications in computer graphics and vision, neural rendering is poised to become a new area in the graphics community, yet no survey of this emerging field exists. This state-of-the-art report summarizes the recent trends and applications of neural rendering. We focus on approaches that combine classic computer graphics techniques with deep generative models to obtain controllable and photo-realistic outputs. Starting with an overview of the underlying computer graphics and machine learning concepts, we discuss critical aspects of neural rendering approaches. This state-of-the-art report is focused on the many important use cases for the described algorithms such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of photo-realistic avatars for virtual and augmented reality telepresence. Finally, we conclude with a discussion of the social implications of such technology and investigate open research problems
StyleRig: Rigging StyleGAN for 3D Control over Portrait Images
StyleGAN generates photorealistic portrait images of faces with eyes, teeth, hair and context (neck, shoulders, background), but lacks a rig-like control over semantic face parameters that are interpretable in 3D, such as face pose, expressions, and scene illumination. Three-dimensional morphable face models (3DMMs) on the other hand offer control over the semantic parameters, but lack photorealism when rendered and only model the face interior, not other parts of a portrait image (hair, mouth interior, background). We present the first method to provide a face rig-like control over a pretrained and fixed StyleGAN via a 3DMM. A new rigging network, RigNet is trained between the 3DMM's semantic parameters and StyleGAN's input. The network is trained in a self-supervised manner, without the need for manual annotations. At test time, our method generates portrait images with the photorealism of StyleGAN and provides explicit control over the 3D semantic parameters of the face
Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation
Generating videos for visual storytelling can be a tedious and complex
process that typically requires either live-action filming or graphics
animation rendering. To bypass these challenges, our key idea is to utilize the
abundance of existing video clips and synthesize a coherent storytelling video
by customizing their appearances. We achieve this by developing a framework
comprised of two functional modules: (i) Motion Structure Retrieval, which
provides video candidates with desired scene or motion context described by
query texts, and (ii) Structure-Guided Text-to-Video Synthesis, which generates
plot-aligned videos under the guidance of motion structure and text prompts.
For the first module, we leverage an off-the-shelf video retrieval system and
extract video depths as motion structure. For the second module, we propose a
controllable video generation model that offers flexible controls over
structure and characters. The videos are synthesized by following the
structural guidance and appearance instruction. To ensure visual consistency
across clips, we propose an effective concept personalization approach, which
allows the specification of the desired character identities through text
prompts. Extensive experiments demonstrate that our approach exhibits
significant advantages over various existing baselines.Comment: Github: https://github.com/VideoCrafter/Animate-A-Story Project page:
https://videocrafter.github.io/Animate-A-Stor
- …