94 research outputs found

    VoGE: A Differentiable Volume Renderer using Gaussian Ellipsoids for Analysis-by-Synthesis

    Differentiable rendering allows the application of computer graphics to vision tasks, e.g. object pose and shape fitting, via analysis-by-synthesis, where gradients at occluded regions are important when inverting the rendering process. To obtain those gradients, state-of-the-art (SoTA) differentiable renderers use rasterization to collect a set of nearest components for each pixel and aggregate them based on the viewing distance. In this paper, we propose VoGE, which uses ray tracing to capture nearest components with their volume density distributions on the rays and aggregates via an integral of the volume densities based on Gaussian ellipsoids, which brings more efficient and stable gradients. To efficiently render via VoGE, we propose an approximate closed-form solution for the volume density aggregation and a coarse-to-fine rendering strategy. Finally, we provide a CUDA implementation of VoGE, which gives a competitive rendering speed in comparison to PyTorch3D. Quantitative and qualitative experiment results show VoGE outperforms SoTA counterparts when applied to various vision tasks, e.g., object pose estimation, shape/texture fitting, and occlusion reasoning. The VoGE library and demos are available at https://github.com/Angtian/VoGE.
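    The aggregation step described above can be illustrated with a small numerical sketch: each scene component is a 3D Gaussian ellipsoid whose density is evaluated along a ray, and each component's contribution to the pixel is weighted by the accumulated transmittance. The snippet below uses plain quadrature rather than VoGE's approximate closed-form solution and CUDA kernels, and all names are illustrative, not the VoGE library API.

import numpy as np

def gaussian_density(points, mu, cov_inv, weight):
    """Density of one Gaussian ellipsoid component at a set of 3D points."""
    diff = points - mu                                    # (T, 3)
    mahal = np.einsum('ti,ij,tj->t', diff, cov_inv, diff)
    return weight * np.exp(-0.5 * mahal)                  # (T,)

def render_ray(origin, direction, mus, cov_invs, weights, colors,
               t_near=0.1, t_far=5.0, n_samples=128):
    """Aggregate per-component colors along one ray via volume rendering."""
    t = np.linspace(t_near, t_far, n_samples)
    pts = origin[None, :] + t[:, None] * direction[None, :]         # (T, 3)
    dt = t[1] - t[0]

    # Per-component densities at every sample point along the ray: (K, T).
    dens = np.stack([gaussian_density(pts, m, c, w)
                     for m, c, w in zip(mus, cov_invs, weights)])
    total = dens.sum(axis=0)

    # Transmittance up to each sample (exclusive cumulative sum), then the
    # weight each component contributes to this pixel.
    transmittance = np.exp(-(np.cumsum(total * dt) - total * dt))
    comp_weight = (dens * transmittance[None, :] * dt).sum(axis=1)  # (K,)

    return comp_weight @ colors   # blended RGB for this pixel

# Tiny example: two ellipsoids in front of a ray looking down +z.
mus = np.array([[0.0, 0.0, 2.0], [0.1, 0.0, 3.0]])
cov_invs = [np.eye(3) * 25.0, np.eye(3) * 25.0]
weights = [4.0, 4.0]
colors = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
print(render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                 mus, cov_invs, weights, colors))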

    State of the Art on Neural Rendering

    Efficient rendering of photo-realistic virtual worlds is a long-standing effort of computer graphics. Modern graphics techniques have succeeded in synthesizing photo-realistic images from hand-crafted scene representations. However, the automatic generation of shape, materials, lighting, and other aspects of scenes remains a challenging problem that, if solved, would make photo-realistic computer graphics more widely accessible. Concurrently, progress in computer vision and machine learning has given rise to a new approach to image synthesis and editing, namely deep generative models. Neural rendering is a new and rapidly emerging field that combines generative machine learning techniques with physical knowledge from computer graphics, e.g., by the integration of differentiable rendering into network training. With a plethora of applications in computer graphics and vision, neural rendering is poised to become a new area in the graphics community, yet no survey of this emerging field exists. This state-of-the-art report summarizes the recent trends and applications of neural rendering. We focus on approaches that combine classic computer graphics techniques with deep generative models to obtain controllable and photo-realistic outputs. Starting with an overview of the underlying computer graphics and machine learning concepts, we discuss critical aspects of neural rendering approaches. This state-of-the-art report is focused on the many important use cases for the described algorithms such as novel view synthesis, semantic photo manipulation, facial and body reenactment, relighting, free-viewpoint video, and the creation of photo-realistic avatars for virtual and augmented reality telepresence. Finally, we conclude with a discussion of the social implications of such technology and investigate open research problems.
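    The "integration of differentiable rendering into network training" mentioned above boils down to backpropagating an image loss through the renderer to the scene parameters. Below is a minimal sketch of that analysis-by-synthesis loop, using a toy differentiable 2D Gaussian splat as a stand-in renderer; the stand-in renderer and parameter names are assumptions for illustration, not any particular method from the report.

import torch

def toy_renderer(center, scale, size=64):
    """Toy differentiable renderer: splat one 2D Gaussian blob onto an image."""
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, size),
                            torch.linspace(-1, 1, size), indexing='ij')
    d2 = (xs - center[0]) ** 2 + (ys - center[1]) ** 2
    return torch.exp(-d2 / (2 * scale ** 2))

# Target image rendered from ground-truth parameters.
target = toy_renderer(torch.tensor([0.3, -0.2]), torch.tensor(0.15)).detach()

# Unknown scene parameters, recovered by inverting the renderer.
center = torch.zeros(2, requires_grad=True)
scale = torch.tensor(0.3, requires_grad=True)
opt = torch.optim.Adam([center, scale], lr=0.05)

for step in range(200):
    opt.zero_grad()
    loss = torch.mean((toy_renderer(center, scale) - target) ** 2)  # photometric loss
    loss.backward()    # gradients flow through the renderer to the parameters
    opt.step()

print(center.detach(), scale.detach())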

    StyleRig: Rigging StyleGAN for 3D Control over Portrait Images

    StyleGAN generates photorealistic portrait images of faces with eyes, teeth, hair and context (neck, shoulders, background), but lacks a rig-like control over semantic face parameters that are interpretable in 3D, such as face pose, expressions, and scene illumination. Three-dimensional morphable face models (3DMMs), on the other hand, offer control over the semantic parameters, but lack photorealism when rendered and only model the face interior, not other parts of a portrait image (hair, mouth interior, background). We present the first method to provide a face rig-like control over a pretrained and fixed StyleGAN via a 3DMM. A new rigging network, RigNet, is trained between the 3DMM's semantic parameters and StyleGAN's input. The network is trained in a self-supervised manner, without the need for manual annotations. At test time, our method generates portrait images with the photorealism of StyleGAN and provides explicit control over the 3D semantic parameters of the face.
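    Structurally, the method maps a StyleGAN latent code plus 3DMM semantic parameters (pose, expression, illumination) to an edited latent that a frozen, pretrained generator turns into a portrait. The sketch below shows only that data flow; the network sizes, parameter split, and stand-in generator are assumptions, not the paper's exact architecture or training setup.

import torch
import torch.nn as nn

LATENT_DIM = 512           # assumed StyleGAN latent size
PARAM_DIM = 3 + 64 + 27    # assumed split: pose + expression + illumination

class RigNet(nn.Module):
    """Small MLP mapping (latent, 3DMM parameters) to an edited latent."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(LATENT_DIM + PARAM_DIM, 512), nn.ReLU(),
            nn.Linear(512, LATENT_DIM),
        )

    def forward(self, w, params):
        # Predict an offset to the latent so the source identity is kept and
        # only the controlled attributes change.
        return w + self.mlp(torch.cat([w, params], dim=-1))

rignet = RigNet()
frozen_stylegan = lambda w: torch.rand(w.shape[0], 3, 256, 256)  # stand-in generator

w = torch.randn(4, LATENT_DIM)        # latent codes of source portraits
params = torch.randn(4, PARAM_DIM)    # target 3DMM semantic parameters
images = frozen_stylegan(rignet(w, params))   # edited portraits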

    Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation

    Generating videos for visual storytelling can be a tedious and complex process that typically requires either live-action filming or graphics animation rendering. To bypass these challenges, our key idea is to utilize the abundance of existing video clips and synthesize a coherent storytelling video by customizing their appearances. We achieve this by developing a framework comprised of two functional modules: (i) Motion Structure Retrieval, which provides video candidates with the desired scene or motion context described by query texts, and (ii) Structure-Guided Text-to-Video Synthesis, which generates plot-aligned videos under the guidance of motion structure and text prompts. For the first module, we leverage an off-the-shelf video retrieval system and extract video depths as the motion structure. For the second module, we propose a controllable video generation model that offers flexible controls over structure and characters. The videos are synthesized by following the structural guidance and appearance instruction. To ensure visual consistency across clips, we propose an effective concept personalization approach, which allows the specification of the desired character identities through text prompts. Extensive experiments demonstrate that our approach exhibits significant advantages over various existing baselines. Comment: GitHub: https://github.com/VideoCrafter/Animate-A-Story Project page: https://videocrafter.github.io/Animate-A-Stor
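    The two-module pipeline can be summarized as: retrieve an existing clip per shot, extract its depth as the motion structure, then synthesize a plot-aligned clip conditioned on that structure, the text prompt, and a personalized character token. The sketch below mirrors only that control flow; every helper is a trivial stand-in (the real retrieval system, depth estimator, and controllable text-to-video model live in the linked VideoCrafter repository).

from dataclasses import dataclass

@dataclass
class Shot:
    retrieval_query: str   # text describing the desired scene or motion
    prompt: str            # plot-aligned prompt with a <character> placeholder

# Trivial stand-ins for the real components.
def retrieve_clip(query):                 # Module (i): off-the-shelf video retrieval
    return f"clip_for({query})"

def estimate_depth(clip):                 # depth frames used as the motion structure
    return f"depth_of({clip})"

def generate_video(prompt, structure):    # Module (ii): structure-guided text-to-video
    return f"video({prompt} | {structure})"

def make_story_video(shots, character_token):
    clips = []
    for shot in shots:
        structure = estimate_depth(retrieve_clip(shot.retrieval_query))
        prompt = shot.prompt.replace("<character>", character_token)
        clips.append(generate_video(prompt, structure))
    return clips

shots = [Shot("a person walking through a forest",
              "<character> walks through a misty forest"),
         Shot("a person opening a wooden door",
              "<character> opens an old wooden door")]
print(make_story_video(shots, "sks_hero"))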