Text-to-3D generation aims to craft a 3D object from a natural language
description. This can significantly reduce the workload of manually designing
3D models and provide a more natural way for users to interact. However, the
task remains challenging in two respects: recovering fine-grained details
effectively and optimizing a large 3D output efficiently. Inspired by the
success of progressive learning, we propose a Multi-Scale Triplane Network
(MTN) and a new progressive learning strategy. As the name implies, the
Multi-Scale Triplane Network consists of four triplanes transitioning from low
to high resolution. The low-resolution triplane serves as an initial shape
for the high-resolution ones, easing the optimization difficulty. To further
recover fine-grained details, we also introduce the progressive learning
strategy, which explicitly requires the network to shift its attention
from simple coarse-grained patterns to difficult fine-grained patterns. Our
experiments verify that the proposed method performs favorably against
existing methods. Even for the most challenging descriptions, where most
existing methods struggle to produce a viable shape, our proposed method
consistently produces one. We hope our work paves the way for automatic 3D
prototyping via natural language descriptions.
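
The multi-scale triplane idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the four resolutions, the channel count, the summation of features across planes and scales, and the linear unlocking schedule in `active_scales_at` are all assumptions made for illustration, since the abstract does not specify them. The sketch shows how a 3D point might be looked up in four triplanes of increasing resolution, and how a coarse-to-fine schedule could enable the finer triplanes later in training.

```python
import random

RESOLUTIONS = (8, 16, 32, 64)  # assumed; the abstract only says "four triplanes"
CHANNELS = 4                   # assumed feature width


def make_plane(res, channels, rng):
    """One res x res feature plane with `channels` values per cell."""
    return [[[rng.uniform(-1.0, 1.0) for _ in range(channels)]
             for _ in range(res)] for _ in range(res)]


def make_triplane(res, channels, rng):
    """Three axis-aligned planes (xy, xz, yz) at a single resolution."""
    return {name: make_plane(res, channels, rng) for name in ("xy", "xz", "yz")}


def bilinear(plane, u, v):
    """Bilinearly interpolate a plane at (u, v), both in [0, 1]."""
    res = len(plane)
    x, y = u * (res - 1), v * (res - 1)
    # Clamp the base cell so that u == 1 or v == 1 stays in bounds.
    x0, y0 = min(int(x), res - 2), min(int(y), res - 2)
    tx, ty = x - x0, y - y0
    out = []
    for c in range(len(plane[0][0])):
        top = (1 - tx) * plane[y0][x0][c] + tx * plane[y0][x0 + 1][c]
        bot = (1 - tx) * plane[y0 + 1][x0][c] + tx * plane[y0 + 1][x0 + 1][c]
        out.append((1 - ty) * top + ty * bot)
    return out


def query(triplanes, point, active_scales):
    """Sum features of the first `active_scales` triplanes at a point in [0, 1]^3.

    Summation is an assumed aggregation rule, not taken from the paper.
    """
    x, y, z = point
    feat = [0.0] * len(triplanes[0]["xy"][0][0])
    for tri in triplanes[:active_scales]:
        for name, (u, v) in (("xy", (x, y)), ("xz", (x, z)), ("yz", (y, z))):
            sample = bilinear(tri[name], u, v)
            feat = [f + s for f, s in zip(feat, sample)]
    return feat


def active_scales_at(step, total_steps, num_scales):
    """Hypothetical schedule: unlock one more scale per equal share of training."""
    return min(num_scales, 1 + step * num_scales // total_steps)


rng = random.Random(0)
triplanes = [make_triplane(r, CHANNELS, rng) for r in RESOLUTIONS]
for step in (0, 250, 500, 999):
    k = active_scales_at(step, 1000, len(triplanes))
    features = query(triplanes, (0.3, 0.6, 0.9), k)
    print(step, k, len(features))
```

Under these assumptions, early steps query only the coarsest triplane, which plays the role of the initial shape, while later steps add the higher-resolution triplanes that carry fine detail.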