Recent progress in self-supervision has shown that pre-training large neural
networks on vast amounts of unlabeled data can lead to substantial improvements
in generalization to downstream tasks. Such models, recently coined foundation
models, have been transformational to the field of natural language processing.
Variants have also been proposed for image data, but their applicability to
remote sensing tasks is limited. To stimulate the development of foundation
models for Earth monitoring, we propose a benchmark comprising six
classification and six segmentation tasks, which were carefully curated and
adapted to be both relevant to the field and well-suited for model evaluation.
We accompany this benchmark with a robust methodology for evaluating models and
reporting aggregated results to enable a reliable assessment of progress.
Finally, we report results for 20 baselines to characterize the
performance of existing models. We believe that this benchmark will be a driver
of progress across a variety of Earth monitoring tasks.