Deep Learning Guided Building Reconstruction from Satellite Imagery-derived Point Clouds
3D urban reconstruction of buildings from remotely sensed imagery has drawn
significant attention during the past two decades. While aerial imagery and
LiDAR provide higher resolution, satellite imagery is cheaper and more
efficient to acquire for large-scale needs. However, the high orbital altitude
of satellite observation brings intrinsic challenges, such as unpredictable
atmospheric effects, multiple view angles, significant radiometric differences
across the necessary multiple views, diverse land cover and urban structures
in a scene, and a small base-to-height ratio or narrow field of view, all of which may
degrade 3D reconstruction quality. To address these major challenges, we
present a reliable and effective approach for building model reconstruction
from the point clouds generated from multi-view satellite images. We utilize
multiple types of primitive shapes to fit the input point cloud. Specifically,
a deep-learning approach is adopted to distinguish the shape of building roofs
in complex and noisy scenes. For points that belong to the same roof shape,
a multi-cue, hierarchical RANSAC approach is proposed to efficiently and
reliably segment and reconstruct the building point cloud. Experimental
results over four selected urban areas (0.34 to 2.04 sq km in size) demonstrate
the proposed method can generate detailed roof structures under noisy data
environments. The average success rate for building shape recognition is
83.0%, while the overall completeness and correctness are over 70% with
reference to ground truth created from airborne LiDAR. As the first effort to
address the public need for large-scale city model generation, the development
is deployed as open-source software.
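
The multi-cue, hierarchical RANSAC variant is specific to this work, but its
core primitive, robustly fitting a plane to noisy roof points, is classical
RANSAC. The NumPy sketch below shows only that generic step; the function
name, iteration count, and inlier threshold are illustrative assumptions, not
the paper's implementation.

    import numpy as np

    def ransac_plane(points, n_iters=500, inlier_thresh=0.05, seed=0):
        # Vanilla RANSAC plane fit for an (N, 3) point cloud.
        # Returns (normal, d, inlier_mask) for the plane n.x + d = 0
        # with the largest inlier set found.
        rng = np.random.default_rng(seed)
        best_mask = np.zeros(len(points), dtype=bool)
        best_plane = (np.array([0.0, 0.0, 1.0]), 0.0)
        for _ in range(n_iters):
            # Three distinct random points define a candidate plane.
            p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
            normal = np.cross(p1 - p0, p2 - p0)
            norm = np.linalg.norm(normal)
            if norm < 1e-12:            # degenerate (collinear) sample
                continue
            normal /= norm
            d = -normal @ p0
            # Inliers: points within inlier_thresh of the candidate plane.
            mask = np.abs(points @ normal + d) < inlier_thresh
            if mask.sum() > best_mask.sum():
                best_mask, best_plane = mask, (normal, d)
        return best_plane[0], best_plane[1], best_mask

    # Toy usage: a noisy flat roof at z = 5 plus random clutter.
    rng = np.random.default_rng(1)
    roof = np.column_stack([rng.uniform(0, 10, 400),
                            rng.uniform(0, 10, 400),
                            5 + rng.normal(0, 0.02, 400)])
    clutter = rng.uniform(0, 10, (100, 3))
    normal, d, mask = ransac_plane(np.vstack([roof, clutter]))
    print(normal.round(2), int(mask.sum()))

A hierarchical variant would re-run this fit on the remaining outliers and
score candidates with additional cues (for example, the predicted roof-shape
class), which is presumably what the multi-cue, hierarchical design refers to.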
Towards Single Stage Weakly Supervised Semantic Segmentation
The costly process of obtaining semantic segmentation labels has driven
research towards weakly supervised semantic segmentation (WSSS) methods, using
only image-level, point, or box labels. The lack of dense scene representation
requires methods to increase complexity to obtain additional semantic
information about the scene, often through multiple stages of training and
refinement. Current state-of-the-art (SOTA) models leverage image-level labels
to produce class activation maps (CAMs) which go through multiple stages of
refinement before they are thresholded to make pseudo-masks for supervision.
The multi-stage approach is computationally expensive, and dependency on
image-level labels for CAM generation lacks generalizability to more complex
scenes. In contrast, our method offers a single-stage approach, generalizable
to arbitrary datasets, that is trainable from scratch without any dependency on
pre-trained backbones, classification, or separate refinement tasks. We utilize
point annotations to generate reliable, on-the-fly pseudo-masks through refined
and filtered features. While our method requires point annotations that are
only slightly more expensive than image-level annotations, we demonstrate
SOTA performance on a benchmark dataset (PASCAL VOC 2012) and significantly
outperform other SOTA WSSS methods on recent real-world datasets (CRAID,
CityPersons, IAD).
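
The abstract does not detail how the on-the-fly pseudo-masks are built, so
the PyTorch sketch below shows only one plausible reading of the general
idea: propagating sparse point labels to pixels with similar features and
filtering out low-confidence pixels with an ignore index. Every name, shape,
and threshold here is an assumption for illustration, not the paper's method.

    import torch
    import torch.nn.functional as F

    IGNORE = 255  # pixels below the similarity threshold are excluded from the loss

    def pseudo_mask_from_points(features, points, sim_thresh=0.6):
        # features: (C, H, W) feature map for one image.
        # points:   list of (row, col, class_id) annotated pixels.
        # Returns an (H, W) tensor of class ids, IGNORE where uncertain.
        C, H, W = features.shape
        pixels = F.normalize(features.reshape(C, -1), dim=0)      # (C, H*W)
        # One normalized prototype feature per annotated point.
        protos = torch.stack([features[:, r, c] for r, c, _ in points])
        protos = F.normalize(protos, dim=1)                       # (P, C)
        sims = protos @ pixels                                    # (P, H*W)
        best_sim, best_idx = sims.max(dim=0)
        classes = torch.tensor([cls for _, _, cls in points])
        mask = classes[best_idx]               # label of the most similar point
        mask[best_sim < sim_thresh] = IGNORE   # filter low-confidence pixels
        return mask.reshape(H, W)

    # Toy usage: random features with two annotated points of different classes.
    feats = torch.randn(64, 32, 32)
    pm = pseudo_mask_from_points(feats, [(4, 5, 0), (20, 25, 1)])
    print(pm.shape, pm.unique())

In a real pipeline the features would come from the segmentation network
itself, and the resulting mask would supervise a per-pixel loss with the
IGNORE index excluded.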