TexFusion: Synthesizing 3D Textures with Text-Guided Image Diffusion Models
We present TexFusion (Texture Diffusion), a new method to synthesize textures
for given 3D geometries, using large-scale text-guided image diffusion models.
In contrast to recent works that leverage 2D text-to-image diffusion models to
distill 3D objects using a slow and fragile optimization process, TexFusion
introduces a new 3D-consistent generation technique specifically designed for
texture synthesis that employs regular diffusion model sampling on different 2D
rendered views. Specifically, we leverage latent diffusion models, apply the
diffusion model's denoiser on a set of 2D renders of the 3D object, and
aggregate the different denoising predictions on a shared latent texture map.
Final output RGB textures are produced by optimizing an intermediate neural
color field on the decodings of 2D renders of the latent texture. We thoroughly
validate TexFusion and show that we can efficiently generate diverse, high-quality, and globally coherent textures. We achieve state-of-the-art text-guided texture synthesis performance using only image diffusion models, while avoiding the pitfalls of previous distillation-based methods. The text conditioning offers detailed control, and we do not rely on any ground-truth 3D textures for training. This makes our method versatile and applicable to a broad range
of geometry and texture types. We hope that TexFusion will advance AI-based
texturing of 3D assets for applications in virtual reality, game design,
simulation, and more.
Comment: Videos and more results on https://research.nvidia.com/labs/toronto-ai/texfusion
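To make the aggregation step concrete, here is a minimal NumPy sketch of denoising several 2D views and averaging the predictions on a shared latent texture map. This is a sketch under assumptions, not the paper's implementation: the `denoise` callable, the pixel-to-texel index maps, and all shapes are hypothetical stand-ins for the latent diffusion denoiser and rasterized UV lookups.

```python
# Minimal sketch: aggregate per-view denoiser predictions on a shared
# latent texture map. `denoise` and the uv maps are hypothetical.
import numpy as np

def aggregate_denoising_step(latent_texture, view_uv_maps, denoise, t):
    """One denoising step averaged over several rendered 2D views.

    latent_texture : (H, W, C) shared latent texture map
    view_uv_maps   : list of (h, w, 2) integer arrays mapping each view
                     pixel to a texel (stand-in for rasterized UVs)
    denoise        : callable (h, w, C), t -> (h, w, C) denoiser prediction
    """
    H, W, _ = latent_texture.shape
    accum = np.zeros_like(latent_texture)
    weight = np.zeros((H, W, 1))
    for uv in view_uv_maps:
        # "Render" this view by gathering the texels it sees.
        view_latents = latent_texture[uv[..., 1], uv[..., 0]]
        pred = denoise(view_latents, t)
        # Scatter the per-view prediction back onto the shared texture.
        np.add.at(accum, (uv[..., 1], uv[..., 0]), pred)
        np.add.at(weight, (uv[..., 1], uv[..., 0]), 1.0)
    # Average where at least one view saw the texel; keep old values elsewhere.
    seen = weight[..., 0] > 0
    latent_texture[seen] = accum[seen] / weight[seen]
    return latent_texture

# Toy usage: one full-coverage view with an identity pixel-to-texel map.
H = W = C = 4
tex = np.random.randn(H, W, C)
ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
tex = aggregate_denoising_step(tex, [np.stack([xs, ys], -1)],
                               lambda v, t: 0.9 * v, t=10)
```

Averaging overlapping predictions on the texture map, rather than per view, is what keeps the views mutually consistent in this style of sampler.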
A Sampling Approach to Generating Closely Interacting 3D Pose-pairs from 2D Annotations
We introduce a data-driven method to generate a large number of plausible, closely interacting 3D human pose-pairs for a given motion category, e.g., wrestling or salsa dance. Since close interactions are difficult to acquire with 3D sensors, our approach utilizes abundant existing video data covering many human activities. Instead of treating data generation as a reconstruction problem, either through 3D acquisition or direct 2D-to-3D lifting of video annotations, we present a solution based on Markov Chain Monte Carlo (MCMC) sampling. With a focus on efficient sampling over the space of close interactions, rather than pose spaces, we develop a novel representation called interaction coordinates (IC) to encode both poses and their interactions in an integrated manner. The plausibility of a 3D pose-pair is then defined based on the ICs and with respect to the annotated 2D pose-pairs from video. We show that our sampling-based approach efficiently synthesizes a large volume of plausible, closely interacting 3D pose-pairs which provide good coverage of the input 2D pose-pairs.
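For readers unfamiliar with MCMC, the toy Python sketch below shows the sampling pattern the abstract describes: a random-walk Metropolis sampler drawing pose-pair encodings in proportion to a plausibility score. The `plausibility` function and the 12-dimensional vector are hypothetical placeholders, not the paper's IC formulation.

```python
# Toy random-walk Metropolis sampler in the spirit of the abstract's MCMC
# approach; the score and encoding below are hypothetical placeholders.
import numpy as np

def metropolis_hastings(plausibility, x0, n_steps=10_000, step=0.05, rng=None):
    """Draw vectors x (a flattened pose-pair encoding) with frequency
    proportional to plausibility(x), using a symmetric Gaussian proposal."""
    rng = rng if rng is not None else np.random.default_rng(0)
    x, p = np.asarray(x0, dtype=float), plausibility(x0)
    samples = []
    for _ in range(n_steps):
        cand = x + step * rng.standard_normal(x.shape)
        p_cand = plausibility(cand)
        # Accept with probability min(1, p_cand / p).
        if rng.random() < min(1.0, p_cand / max(p, 1e-12)):
            x, p = cand, p_cand
        samples.append(x.copy())
    return np.stack(samples)

# Toy usage: a Gaussian "plausibility" centred on one annotated pose-pair.
target = np.zeros(12)                      # hypothetical 12-D encoding
plaus = lambda x: np.exp(-0.5 * np.sum((np.asarray(x) - target) ** 2))
draws = metropolis_hastings(plaus, x0=np.ones(12))
```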
GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images
As several industries are moving towards modeling massive 3D virtual worlds,
the need for content creation tools that can scale in terms of the quantity,
quality, and diversity of 3D content is becoming evident. In our work, we aim
to train performant 3D generative models that synthesize textured meshes that can be directly consumed by 3D rendering engines and are thus immediately usable in downstream applications. Prior works on 3D generative modeling either lack
geometric details, are limited in the mesh topology they can produce, typically
do not support textures, or utilize neural renderers in the synthesis process,
which makes their use in common 3D software non-trivial. In this work, we
introduce GET3D, a Generative model that directly generates Explicit Textured
3D meshes with complex topology, rich geometric details, and high-fidelity
textures. We bridge recent successes in differentiable surface modeling, differentiable rendering, and 2D Generative Adversarial Networks to train our model from 2D image collections. GET3D is able to generate high-quality 3D
textured meshes, ranging from cars, chairs, animals, motorbikes, and human
characters to buildings, achieving significant improvements over previous
methods.
Comment: NeurIPS 2022, Project Page: https://nv-tlabs.github.io/GET3D
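The key structural idea, 2D GAN gradients reaching 3D shape and texture parameters through a differentiable render, can be sketched in a few lines of PyTorch. Everything below is a hypothetical placeholder: the actual GET3D uses DMTet-style surface extraction, a texture field, and a differentiable rasterizer, none of which appear here.

```python
# Tiny sketch of the training signal: 2D GAN gradients flow through a
# differentiable "render" into shape and texture parameters. All modules
# here are hypothetical placeholders, not GET3D's architecture.
import torch
import torch.nn as nn

IMG = 3 * 16 * 16  # stand-in flattened image resolution

class TinyGenerator(nn.Module):
    def __init__(self, z_dim=64):
        super().__init__()
        self.geo = nn.Linear(z_dim, IMG)  # stands in for surface parameters
        self.tex = nn.Linear(z_dim, IMG)  # stands in for a texture field

    def forward(self, z):
        # Placeholder differentiable "render": any differentiable map from
        # geometry and texture to pixels lets GAN gradients reach both.
        return torch.tanh(self.geo(z) + self.tex(z))

gen = TinyGenerator()
disc = nn.Sequential(nn.Linear(IMG, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
opt_g = torch.optim.Adam(gen.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(8, IMG)   # a batch of 2D training images
z = torch.randn(8, 64)

# Discriminator step: real images vs. detached rendered fakes.
fake = gen(z).detach()
loss_d = bce(disc(real), torch.ones(8, 1)) + bce(disc(fake), torch.zeros(8, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: gradients flow through the render into geo and tex.
loss_g = bce(disc(gen(z)), torch.ones(8, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```

Because supervision comes only from 2D images, this training recipe needs no 3D ground truth, which is what makes the approach scale to large image collections.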
NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models
Automatically generating high-quality real world 3D scenes is of enormous
interest for applications such as virtual reality and robotics simulation.
Towards this goal, we introduce NeuralField-LDM, a generative model capable of
synthesizing complex 3D environments. We leverage Latent Diffusion Models that
have been successfully utilized for efficient high-quality 2D content creation.
We first train a scene auto-encoder to express a set of image and pose pairs as
a neural field, represented as density and feature voxel grids that can be
projected to produce novel views of the scene. To further compress this
representation, we train a latent autoencoder that maps the voxel grids to a
set of latent representations. A hierarchical diffusion model is then fit to
the latents to complete the scene generation pipeline. We achieve a substantial
improvement over existing state-of-the-art scene generation models.
Additionally, we show how NeuralField-LDM can be used for a variety of 3D
content creation applications, including conditional scene generation, scene inpainting, and scene style manipulation.
Comment: CVPR 2023
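At sampling time the pipeline can be summarized as: draw scene latents with a diffusion prior, then decode them into density and feature voxel grids for rendering. The toy PyTorch sketch below follows that shape only; the denoising update, `LatentDecoder`, and all dimensions are hypothetical stand-ins, and the paper's diffusion prior is hierarchical rather than the single-level loop shown.

```python
# Toy sketch of the sampling pipeline: diffusion over scene latents, then
# decoding to a density/feature voxel field. All modules are hypothetical.
import torch
import torch.nn as nn

class LatentDecoder(nn.Module):
    """Inverse of the latent autoencoder: latents -> voxel neural field."""
    def __init__(self, z_dim=32, grid=8, feat=4):
        super().__init__()
        self.net = nn.Linear(z_dim, grid ** 3 * (1 + feat))
        self.grid, self.feat = grid, feat

    def forward(self, z):
        v = self.net(z).view(-1, self.grid, self.grid, self.grid, 1 + self.feat)
        return v[..., :1], v[..., 1:]   # density grid, feature grid

@torch.no_grad()
def sample_scene(denoiser, decoder, z_dim=32, steps=50):
    """Crude ancestral-sampling loop over scene latents, then decode."""
    z = torch.randn(1, z_dim)
    for t in reversed(range(steps)):
        eps = denoiser(z, torch.tensor([t]))      # predicted noise at step t
        z = z - eps / steps                       # toy denoising update
        if t > 0:                                 # re-inject shrinking noise
            z = z + 0.1 * (t / steps) ** 0.5 * torch.randn_like(z)
    return decoder(z)

# Toy usage with a stand-in noise predictor.
density, features = sample_scene(lambda z, t: 0.1 * z, LatentDecoder())
```

Running the diffusion prior in this compressed latent space, rather than directly on voxel grids, is what keeps generation of large scenes tractable.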