Change detection in remote sensing imagery is essential for a variety of
applications such as urban planning, disaster management, and climate research.
However, existing methods for identifying semantically changed areas overlook
the semantic information readily available in the form of existing maps that
describe features of the Earth's surface. In this paper, we leverage this
information for change detection in bi-temporal images. We show that the simple
integration of the additional information via concatenation of latent
representations suffices to significantly outperform state-of-the-art change
detection methods.
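To make this concrete, such a concatenation baseline could be sketched as
follows. This is a minimal PyTorch illustration; the backbone, channel sizes,
and change head are hypothetical placeholders, not the exact architecture
evaluated in the paper:

```python
import torch
import torch.nn as nn

class ConcatFusionBaseline(nn.Module):
    """Baseline: fuse bi-temporal image features with pre-change map
    features by simple channel-wise concatenation."""

    def __init__(self, num_map_classes: int, feat_dim: int = 256):
        super().__init__()
        # Shared image encoder applied to both time steps (placeholder backbone).
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Lightweight encoder for the one-hot pre-change semantic map.
        self.map_encoder = nn.Conv2d(num_map_classes, feat_dim, kernel_size=1)
        # Binary change head on the concatenated latent representations.
        self.head = nn.Conv2d(3 * feat_dim, 1, kernel_size=1)

    def forward(self, img_t1, img_t2, map_t1):
        f1 = self.image_encoder(img_t1)          # pre-change image features
        f2 = self.image_encoder(img_t2)          # post-change image features
        fm = self.map_encoder(map_t1)            # pre-change map features
        fused = torch.cat([f1, f2, fm], dim=1)   # simple concatenation
        return self.head(fused)                  # per-pixel change logits
```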
Motivated by this observation, we propose the new task of Conditional Change
Detection, in which pre-change semantic information is provided as input
alongside the bi-temporal images. To fully exploit this extra information, we
propose MapFormer, a novel architecture based on a multi-modal feature fusion
module that allows for feature processing conditioned on the available semantic
information. We further employ a supervised, cross-modal contrastive loss to
guide the learning of visual representations.
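As an illustration of such conditioned feature processing, the sketch below
lets visual tokens attend to semantic-map tokens via multi-head
cross-attention; this mechanism, and all names and dimensions, are assumptions
for illustration, not necessarily the fusion module used in MapFormer:

```python
import torch.nn as nn

class ConditionalFusion(nn.Module):
    """Illustrative fusion: image features are refined conditioned on
    semantic-map features via multi-head cross-attention."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, img_feats, map_feats):
        # img_feats, map_feats: (batch, tokens, dim).
        # Queries come from the images, keys/values from the semantic map,
        # so the visual features are processed conditioned on the map.
        attended, _ = self.attn(img_feats, map_feats, map_feats)
        return self.norm(img_feats + attended)   # residual + normalization
```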
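The cross-modal contrastive objective can likewise be pictured as pulling each
visual embedding toward a learned embedding of its ground-truth semantic class.
The following InfoNCE-style sketch conveys only the general form; the pairing
scheme, temperature, and tensor shapes are assumptions rather than the paper's
exact loss:

```python
import torch.nn.functional as F

def cross_modal_contrastive_loss(visual_emb, class_emb, labels, tau=0.07):
    """Supervised contrastive sketch: each visual embedding is attracted
    to the embedding of its semantic class and repelled from the others.

    visual_emb: (N, D) per-pixel visual embeddings
    class_emb:  (C, D) learned embeddings, one per semantic class
    labels:     (N,)   ground-truth class index per pixel
    """
    v = F.normalize(visual_emb, dim=-1)
    c = F.normalize(class_emb, dim=-1)
    logits = v @ c.t() / tau          # (N, C) scaled cosine similarities
    return F.cross_entropy(logits, labels)
```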
Our approach outperforms existing change detection methods by an absolute
11.7% and 18.4% in terms of binary change IoU on DynamicEarthNet and HRSCD,
respectively. Furthermore, we
demonstrate the robustness of our approach to the quality of the pre-change
semantic information and to the absence of pre-change imagery. The code will
be made publicly available.