In this work, we use multi-view aerial images to reconstruct the geometry,
lighting, and materials of facades using neural signed distance fields (SDFs).
Without requiring complex equipment, our method takes only simple RGB
images captured by a drone as input to enable physically based,
photorealistic novel-view rendering, relighting, and editing. However, a
real-world facade usually exhibits complex appearances, ranging from diffuse
rock with subtle details to large-area glass windows with specular reflections,
which makes it difficult to model all components well simultaneously. As a
result, previous methods can preserve geometric details but fail to reconstruct
smooth glass windows, or vice versa. To address this challenge, we introduce three spatial- and
semantic-adaptive optimization strategies, including a semantic regularization
approach based on zero-shot segmentation techniques to improve material
consistency, a frequency-aware geometry regularization to balance surface
smoothness and detail across different surfaces, and a visibility probe-based
scheme to enable efficient modeling of local lighting in large-scale
outdoor environments. In addition, we capture a real-world aerial 3D scanning
image set of facades and corresponding point clouds for training and
benchmarking. Experiments demonstrate the superior quality of our method on
holistic facade inverse rendering, novel view synthesis, and scene editing
compared to state-of-the-art baselines.