Remote sensing observations of the Earth's surface are frequently stymied by
clouds, water vapour, and aerosols in our atmosphere. These degrade or preclude
the measurement of quantities critical to scientific and, hence, societal
applications. In this study, we train a natural language processing (NLP)
algorithm with high-fidelity ocean simulations in order to accurately
reconstruct masked or missing data in sea surface temperature (SST), one
of 54 essential climate variables identified by the Global Climate Observing
System. We demonstrate that the Enki model repeatedly outperforms previously
adopted inpainting techniques by up to an order of magnitude in reconstruction
error, while displaying high performance even in circumstances where the
majority of pixels are masked. Furthermore, experiments on real infrared sensor
data with masking fractions of at least 40% show reconstruction errors of less
than the known sensor uncertainty (RMSE < ~0.1 K). We attribute Enki's success
to the attentive nature of NLP combined with realistic SST model outputs, an
approach that may be extended to other remote sensing variables. This study
demonstrates that systems built upon Enki, or other advanced systems like
it, may therefore yield the optimal solution for accurately estimating otherwise
missing or masked parameters in climate-critical datasets sampling a rapidly
changing Earth.

Comment: 21 pages, 6 main figures, 3 in Appendix; submitte