Reconstructing urban areas in 3D out of satellite raster images has been a
long-standing and challenging goal of both academical and industrial research.
The rare methods today achieving this objective at a Level Of Details 2 rely
on procedural approaches based on geometry, and need stereo images and/or LIDAR
data as input. We here propose a method for urban 3D reconstruction named
KIBS(\textit{Keypoints Inference By Segmentation}), which comprises two novel
features: i) a full deep learning approach for the 3D detection of the roof
sections, and ii) only one single (non-orthogonal) satellite raster image as
model input. This is achieved in two steps: i) by a Mask R-CNN model performing
a 2D segmentation of the buildings' roof sections, and after blending these
latter segmented pixels within the RGB satellite raster image, ii) by another
identical Mask R-CNN model inferring the heights-to-ground of the roof
sections' corners via panoptic segmentation, unto full 3D reconstruction of the
buildings and city. We demonstrate the potential of the KIBS method by
reconstructing different urban areas in a few minutes, with a Jaccard index for
the 2D segmentation of individual roof sections of 88.55% and 75.21% on
our two data sets resp., and a height's mean error of such correctly segmented
pixels for the 3D reconstruction of 1.60 m and 2.06 m on our two data sets
resp., hence within the LOD2 precision range