Fusing satellite imagery acquired with different sensors has been a
long-standing challenge in Earth observation, particularly across different
modalities such as optical and Synthetic Aperture Radar (SAR) images. Here, we
explore the joint analysis of imagery from different sensors in the light of
representation learning: we propose to learn a joint embedding of multiple
satellite sensors within a deep neural network. Our application problem is the
monitoring of lake ice on Alpine lakes. To reach the temporal resolution
requirement of the Swiss Global Climate Observing System (GCOS) office, we
combine three image sources: Sentinel-1 SAR (S1-SAR), Terra MODIS, and
Suomi-NPP VIIRS. The large gaps between the optical and SAR domains and between
the sensor resolutions make this a challenging instance of the sensor fusion
problem. Our approach can be characterised as late fusion, learned in a
data-driven manner. The proposed network architecture has separate encoding
branches for each image sensor, which feed into a single latent embedding.
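To make the architecture concrete, the following is a minimal sketch in PyTorch. The branch depths, channel counts, and number of classes are illustrative assumptions, not the exact configuration used in this work; the point is the structure of sensor-specific encoders feeding one shared head.

    # Sketch: one encoder branch per sensor, a shared embedding, and a
    # sensor-agnostic prediction head. All sizes below are assumptions.
    import torch
    import torch.nn as nn

    def make_branch(in_channels: int, embed_dim: int) -> nn.Module:
        # Illustrative encoder; real branches would be deeper.
        return nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, embed_dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    class JointEmbeddingNet(nn.Module):
        def __init__(self, embed_dim: int = 64, num_classes: int = 2):
            super().__init__()
            # Separate encoding branches per sensor (input channel
            # counts are assumptions, e.g. two polarisations for SAR).
            self.branches = nn.ModuleDict({
                "s1sar": make_branch(2, embed_dim),
                "modis": make_branch(1, embed_dim),
                "viirs": make_branch(1, embed_dim),
            })
            # Shared head: pixel-wise classification on the embedding.
            self.head = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

        def forward(self, x: torch.Tensor, sensor: str) -> torch.Tensor:
            z = self.branches[sensor](x)   # sensor-specific encoding
            return self.head(z)            # sensor-agnostic prediction

    net = JointEmbeddingNet()
    logits = net(torch.randn(1, 1, 128, 128), sensor="modis")

The key design choice is that only the encoders are sensor-specific; everything downstream of the latent embedding is shared across all inputs.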
That is, the network learns a common feature representation shared by all
inputs, such that subsequent processing steps deliver comparable output
irrespective of which type of input image was used. By fusing the satellite
data, we map lake ice at a
temporal resolution of < 1.5 days. The network produces spatially explicit lake
ice maps with pixel-wise accuracies > 91% (mIoU scores > 60%) and
generalises well across different lakes and winters. Moreover, it sets a new
state-of-the-art for determining the important ice-on and ice-off dates for the
target lakes, in many cases meeting the GCOS requirement.
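To illustrate how ice-on and ice-off dates can be read off a sequence of daily ice maps, the sketch below applies one common operational definition, stated here as an assumption rather than the exact criterion used in this work: ice-on is the first day on which the lake is frozen and remains frozen the following day; ice-off is defined symmetrically on the thawing side. The frozen fraction is assumed to be the share of lake pixels classified as frozen in each daily map.

    # Sketch: extracting ice-on/ice-off dates from a daily time series of
    # frozen-pixel fractions. Threshold and definitions are assumptions.
    from datetime import date, timedelta

    def ice_on_date(frozen_fraction, start, threshold=0.9):
        # First day that is frozen and is followed by another frozen day.
        for i in range(len(frozen_fraction) - 1):
            if (frozen_fraction[i] >= threshold
                    and frozen_fraction[i + 1] >= threshold):
                return start + timedelta(days=i)
        return None

    def ice_off_date(frozen_fraction, start, threshold=0.9):
        # First non-frozen day after the last frozen spell.
        for i in range(len(frozen_fraction) - 1, 0, -1):
            if (frozen_fraction[i - 1] >= threshold
                    and frozen_fraction[i] < threshold):
                return start + timedelta(days=i)
        return None

    series = [0.1, 0.2, 0.95, 0.97, 0.99, 0.92, 0.4, 0.1]
    print(ice_on_date(series, date(2017, 1, 1)))   # 2017-01-03
    print(ice_off_date(series, date(2017, 1, 1)))  # 2017-01-07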