Mitigating masked pixels in climate-critical datasets
Remote sensing observations of the Earth's surface are frequently stymied by
clouds, water vapour, and aerosols in our atmosphere. These degrade or preclude
the measurement of quantities critical to scientific and, hence, societal
applications. In this study, we train a natural language processing (NLP)
algorithm with high-fidelity ocean simulations in order to accurately
reconstruct masked or missing data in sea surface temperature (SST)--i.e. one
of 54 essential climate variables identified by the Global Climate Observing
System. We demonstrate that the Enki model repeatedly outperforms previously
adopted inpainting techniques by up to an order of magnitude in reconstruction
error, while displaying high performance even in circumstances where the
majority of pixels are masked. Furthermore, experiments on real infrared sensor
data with masking fractions of at least 40% show reconstruction errors of less
than the known sensor uncertainty (RMSE < ~0.1 K). We attribute Enki's success
to the attentive nature of NLP combined with realistic SST model outputs, an
approach that may be extended to other remote sensing variables. This study
demonstrates that systems built upon Enki--or other advanced systems like
it--may therefore yield the optimal solution to accurate estimates of otherwise
missing or masked parameters in climate-critical datasets sampling a rapidly
changing Earth.
Comment: 21 pages, 6 main figures, 3 in Appendix; submitte
Mitigating masked pixels in a climate-critical ocean dataset
Clouds and other data artefacts frequently limit the retrieval of key variables from remotely sensed Earth observations. We train a natural language processing (NLP)-inspired algorithm with high-fidelity ocean simulations to accurately reconstruct masked or missing data in sea surface temperature (SST) fields, one of 54 essential climate variables identified by the Global Climate Observing System. We demonstrate that the resulting model, referred to as Enki, repeatedly outperforms previously adopted inpainting techniques by up to an order of magnitude in reconstruction error, while displaying exceptional performance even in circumstances where the majority of pixels are masked. Furthermore, experiments on real infrared sensor data with masked percentages of at least 40% show reconstruction errors of less than the known uncertainty of this sensor (root mean square error (RMSE) ≲0.1 K). We attribute Enki’s success to the attentive nature of NLP combined with realistic SST model outputs, an approach that could be extended to other remotely sensed variables. This study demonstrates that systems built upon Enki, or other advanced systems like it, may therefore yield the optimal solution to mitigating masked pixels in climate-critical ocean datasets sampling a rapidly changing Earth.
Reconstructing Sea Surface Temperature Images: A Masked Autoencoder Approach for Cloud Masking and Reconstruction
This thesis presents a new algorithm to mitigate cloud masking in the analysis of sea surface temperature (SST) data generated by remote sensing instruments such as satellite infrared sensors, for example Level-2 data from the Visible Infrared Imaging Radiometer Suite (VIIRS). Cloud coverage interferes with the analysis of all remote sensing data using wavelengths shorter than ≈ 2 microns, significantly limiting the quantity of usable data and creating a biased geographical distribution towards equatorial and coastal regions. Prior studies have used in-painting algorithms such as Navier-Stokes, but these were typically applied only up to 5% masking and had limited success. To address this issue, we propose an unsupervised machine learning algorithm called ENKI, which uses a Vision Transformer with Masked Autoencoding to reconstruct pixels that are masked out by clouds. We train four ENKI models with training mask ratios (referred to as t) of 10%, 35%, 50%, and 75% on LLC4320, a dataset generated by an ocean general circulation model (OGCM). To evaluate performance, we reconstruct LLC4320 SST images at patch masking ratios (referred to as p) of 10%, 20%, 30%, 40%, and 50%, and examine the reconstructions qualitatively and statistically by calculating the root mean squared error (RMSE) of the reconstructed patches. Through our analysis we find that edge patches have higher error and that a bias appears in some models when reconstructing images at patch masking ratios p far from their training mask ratio t. However, at every value of p, at least one model produces reconstructions with a mean RMSE of less than ≈ 0.03 K, which is lower than the estimated sensor error of VIIRS data (≈ 0.078 K for daytime, along-scan, and ≈ 0.05 K for nighttime, along-scan). We also conclude that the complexity of the dynamics within an image and the patch masking ratio p both affect RMSE, with higher complexity and higher p giving higher RMSE values.
Critically, we also find at the patch level that, although RMSE is correlated with complexity, the two are not directly proportional: RMSE increases at a slower rate as complexity within a patch increases. Our analysis concludes that ENKI shows great promise in surpassing in-painting as a means of reconstructing cloud-masked data, and future research will analyze ENKI’s capabilities in reconstructing real-world data.
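The evaluation described in this abstract (hide a fraction p of image patches, then score the reconstruction by RMSE over the hidden pixels) can be illustrated with a minimal Python sketch. This is not the thesis code: the function names `random_patch_mask` and `masked_rmse`, the 8x8 toy field, the 4-pixel patch size, and the constant 0.02 K offset standing in for a reconstruction are all assumptions made for illustration.

```python
import math
import random


def random_patch_mask(n_rows, n_cols, patch, p, seed=0):
    """Return a boolean grid (True = masked) hiding a fraction p of patches.

    Hypothetical helper; assumes n_rows and n_cols are multiples of `patch`.
    """
    rng = random.Random(seed)
    # Top-left corner of every non-overlapping patch in the image.
    patches = [(r, c) for r in range(0, n_rows, patch)
               for c in range(0, n_cols, patch)]
    n_masked = round(p * len(patches))
    mask = [[False] * n_cols for _ in range(n_rows)]
    for r0, c0 in rng.sample(patches, n_masked):
        for r in range(r0, r0 + patch):
            for c in range(c0, c0 + patch):
                mask[r][c] = True
    return mask


def masked_rmse(truth, recon, mask):
    """RMSE (same units as the inputs, e.g. K) over masked pixels only."""
    sq = [(t - x) ** 2
          for t_row, x_row, m_row in zip(truth, recon, mask)
          for t, x, m in zip(t_row, x_row, m_row) if m]
    if not sq:
        raise ValueError("mask selects no pixels")
    return math.sqrt(sum(sq) / len(sq))


# Toy 8x8 "SST" field in kelvin and a stand-in reconstruction that is
# uniformly 0.02 K too warm (purely illustrative, not model output).
truth = [[290.0 + 0.1 * r + 0.05 * c for c in range(8)] for r in range(8)]
recon = [[v + 0.02 for v in row] for row in truth]

mask = random_patch_mask(8, 8, patch=4, p=0.5)
masked_fraction = sum(map(sum, mask)) / 64
rmse = masked_rmse(truth, recon, mask)
```

Restricting the error to masked pixels matters because unmasked pixels are given to the model as input, so including them would dilute the score. In a real comparison the resulting `rmse` would be set against a sensor-error figure such as those quoted in the abstract.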