Text-to-image diffusion models pre-trained on billions of image-text pairs
have recently enabled 3D content creation by optimizing a randomly initialized
differentiable 3D representation with score distillation. However, the
optimization process suffers from slow convergence, and the resulting 3D models
often exhibit two limitations: (a) quality concerns such as missing attributes and
distorted shape and texture; and (b) extremely low diversity compared to
text-guided image synthesis. In this paper, we show that the conflict between
the 3D optimization process and uniform timestep sampling in score distillation
is the main reason for these limitations. To resolve this conflict, we propose
to prioritize timestep sampling with monotonically non-increasing functions,
which aligns the 3D optimization process with the sampling process of the
diffusion model. Extensive experiments show that our simple redesign significantly
improves 3D content creation with faster convergence, better quality and
diversity.

Comment: ICLR 202
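
To make the contrast between uniform and prioritized timestep sampling concrete, the following is a minimal sketch of one possible monotonically non-increasing schedule. It is not taken from the paper: the function names, the linear decay, and the bounds `t_max=980` and `t_min=20` are all assumptions chosen for illustration, and the paper's actual schedule may differ.

```python
import numpy as np


def uniform_timesteps(num_iters, t_max=980, t_min=20, seed=0):
    """Baseline: draw one diffusion timestep per iteration, uniformly at random.

    All names and default bounds here are hypothetical.
    """
    rng = np.random.default_rng(seed)
    return rng.integers(t_min, t_max + 1, size=num_iters)


def non_increasing_timesteps(num_iters, t_max=980, t_min=20):
    """Illustrative schedule: timesteps decay linearly from t_max to t_min.

    Rounding a decreasing sequence keeps it non-increasing, so early
    iterations see high noise levels and late iterations see low ones.
    """
    if num_iters < 1:
        raise ValueError("num_iters must be at least 1")
    return np.round(np.linspace(t_max, t_min, num_iters)).astype(int)


schedule = non_increasing_timesteps(1000)
baseline = uniform_timesteps(1000)
```

In a score distillation loop, iteration `i` would use `schedule[i]` as its diffusion timestep instead of a fresh uniform draw, so the optimization moves from coarse, high-noise guidance toward fine, low-noise guidance.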