City modeling and generation have attracted increasing interest for
applications such as gaming, urban planning, and autonomous driving. Unlike
previous work, which focused on generating single objects or indoor scenes,
city generation must handle huge volumes of spatial data, which poses a
challenge for generative models. Furthermore, the scarcity of publicly
available real-world 3D city datasets hinders the development of methods for
city generation. In this paper, we first
collect over 3,000,000 geo-referenced objects for New York, Zurich,
Tokyo, Berlin, Boston, and several other large cities. Based on this dataset, we
propose AETree, a tree-structured auto-encoder neural network, for city
generation. Specifically, we first propose a novel Spatial-Geometric Distance
(SGD) metric to measure the similarity between building layouts and then
construct a binary tree over the raw geometric data of buildings based on the
SGD metric. Next, we present a tree-structured network whose encoder learns to
iteratively extract and merge spatial information from the bottom up. The
resulting global representation is then decoded in the reverse, top-down
direction for reconstruction or generation. To address long-range dependencies
as the depth of the tree increases, a Long Short-Term Memory (LSTM) cell is
employed as a basic network element of the
proposed AETree. Moreover, we introduce a novel metric, Overlapping Area Ratio
(OAR), to quantitatively evaluate the generation results. Experiments on the
collected dataset demonstrate the effectiveness of the proposed model on 2D and
3D city generation. Furthermore, the latent features learned by AETree can
serve downstream urban planning applications.
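
To make the tree-construction step concrete, the following is a minimal sketch of how a binary tree could be built over building footprints by repeatedly merging the closest pair. It is an illustration under stated assumptions, not the implementation used in this work: the abstract does not define the SGD metric, so `toy_distance` is a hypothetical stand-in (center distance plus a weighted size gap), buildings are simplified to axis-aligned boxes, and the parent box is assumed to be the bounding box of its two children. The names `Node`, `toy_distance`, `merge`, `build_tree`, and `count_leaves` are all introduced here for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot
from typing import Optional


@dataclass
class Node:
    # Axis-aligned box: center (cx, cy), width w, height h.
    cx: float
    cy: float
    w: float
    h: float
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def toy_distance(a: Node, b: Node, alpha: float = 1.0) -> float:
    """Hypothetical stand-in for SGD: center distance plus weighted size gap."""
    spatial = hypot(a.cx - b.cx, a.cy - b.cy)
    geometric = abs(a.w - b.w) + abs(a.h - b.h)
    return spatial + alpha * geometric


def merge(a: Node, b: Node) -> Node:
    """Assumption: the parent is the bounding box of its two children."""
    x0 = min(a.cx - a.w / 2, b.cx - b.w / 2)
    x1 = max(a.cx + a.w / 2, b.cx + b.w / 2)
    y0 = min(a.cy - a.h / 2, b.cy - b.h / 2)
    y1 = max(a.cy + a.h / 2, b.cy + b.h / 2)
    return Node((x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0, a, b)


def build_tree(leaves: list[Node]) -> Node:
    """Greedily merge the closest pair of nodes until one root remains."""
    if not leaves:
        raise ValueError("need at least one building")
    nodes = list(leaves)
    while len(nodes) > 1:
        # combinations yields index pairs with i < j.
        i, j = min(
            combinations(range(len(nodes)), 2),
            key=lambda p: toy_distance(nodes[p[0]], nodes[p[1]]),
        )
        parent = merge(nodes[i], nodes[j])
        # Pop the larger index first so the smaller one stays valid.
        nodes.pop(j)
        nodes.pop(i)
        nodes.append(parent)
    return nodes[0]


def count_leaves(n: Node) -> int:
    """Count the original buildings stored at the leaves of the tree."""
    if n.left is None and n.right is None:
        return 1
    return count_leaves(n.left) + count_leaves(n.right)


if __name__ == "__main__":
    buildings = [Node(0, 0, 1, 1), Node(1.5, 0, 1, 1), Node(10, 10, 2, 2)]
    root = build_tree(buildings)
    print(count_leaves(root), root.w, root.h)
```

The exhaustive pair search is cubic in the number of buildings overall, which is acceptable only for small neighborhoods; it is chosen here for readability. An encoder of the kind described above would then traverse such a tree from the leaves to the root, and a decoder would traverse it in the opposite direction.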