Coupling Data Science Techniques and Numerical Weather Prediction Models for High-Impact Weather Prediction
Meteorologists have access to more model guidance and observations than ever before, but this additional information does not necessarily lead to better forecasts. New tools are needed to reduce the cognitive load on forecasters and to provide them with accurate, reliable consensus guidance. Techniques from the data science community, such as machine learning and image processing, have the potential to summarize and calibrate numerical weather prediction model output and to generate deterministic and probabilistic forecasts of high-impact weather. In this dissertation, I developed data-science-based approaches to improve the predictions of two high-impact weather domains: hail and solar irradiance. Both hail and solar irradiance produce large economic impacts, have non-Gaussian distributions of occurrence, are poorly observed, and are partially driven by processes too small to be resolved by numerical weather prediction models.
Hail forecasts were produced with convection-allowing model output from the Center for Analysis and Prediction of Storms and National Center for Atmospheric Research ensembles. The machine learning hail forecasts were compared against storm surrogate variables and physics-based diagnostic models of hail size. Initial machine learning hail forecasts reduced size errors but struggled to predict extreme events. By coupling the machine learning model to the prediction of hail size distributions and estimating the distribution parameters jointly, the machine learning methods showed skill and reliability in predicting both severe and significant hail.
Machine learning model and data configurations for gridded solar irradiance forecasting were evaluated on two numerical modeling systems. The evaluation determined how machine learning model choice, closeness of fit to training data, training data aggregation, and interpolation method affected forecasts of clearness index at Oklahoma Mesonet sites not included in the training data. The choice of machine learning model, interpolation scheme, and loss function had the biggest impacts on performance. Errors tended to be lower at testing sites with sunnier weather and those that were closer to training sites. All of the machine learning methods produced reliable predictions but underestimated the frequency of cloudiness compared to observations.
Generative ensemble deep learning severe weather prediction from a deterministic convection-allowing model
An ensemble post-processing method is developed for the probabilistic
prediction of severe weather (tornadoes, hail, and wind gusts) over the
conterminous United States (CONUS). The method combines conditional generative
adversarial networks (CGANs), a type of deep generative model, with a
convolutional neural network (CNN) to post-process convection-allowing model
(CAM) forecasts. The CGANs are designed to create synthetic ensemble members
from deterministic CAM forecasts, and their outputs are processed by the CNN to
estimate the probability of severe weather. The method is tested using
High-Resolution Rapid Refresh (HRRR) 1–24 hr forecasts as inputs and Storm
Prediction Center (SPC) severe weather reports as targets. The method produced
skillful predictions with up to 20% Brier Skill Score (BSS) increases compared
to other neural-network-based reference methods using a testing dataset of HRRR
forecasts in 2021. For the evaluation of uncertainty quantification, the method
is overconfident but produces meaningful ensemble spreads that can distinguish
good and bad forecasts. The quality of CGAN outputs is also evaluated. Results
show that the CGAN outputs behave similarly to a numerical ensemble; they
preserved the inter-variable correlations and the contribution of influential
predictors as in the original HRRR forecasts. This work provides a novel
approach to post-process CAM output using neural networks that can be applied
to severe weather prediction.
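The Brier Skill Score mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard definition, BSS = 1 - BS / BS_ref, where BS is the mean squared difference between forecast probabilities and binary outcomes. The function names and the choice of reference forecast are illustrative assumptions, not the authors' verification code.

```python
import numpy as np


def brier_score(prob, obs):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    prob = np.asarray(prob, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.mean((prob - obs) ** 2))


def brier_skill_score(prob, prob_ref, obs):
    """BSS = 1 - BS / BS_ref (hypothetical helper).

    prob_ref is whatever reference forecast is chosen, for example
    climatology or another post-processing method.
    """
    bs = brier_score(prob, obs)
    bs_ref = brier_score(prob_ref, obs)
    return 1.0 - bs / bs_ref
```

A positive value means the forecast improves on the reference, zero means no improvement, and 1 corresponds to a perfect forecast.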
Mimicking non-ideal instrument behavior for hologram processing using neural style translation
Holographic cloud probes provide unprecedented information on cloud particle
density, size and position. Each laser shot captures particles within a large
volume, where images can be computationally refocused to determine particle
size and shape. However, processing these holograms, either with standard
methods or with machine learning (ML) models, requires considerable
computational resources, time and occasional human intervention. ML models are
trained on simulated holograms obtained from the physical model of the probe
since real holograms have no absolute truth labels. Using another processing
method to produce labels would be subject to errors that the ML model would
subsequently inherit. Models perform well on real holograms only when image
corruption is performed on the simulated images during training, thereby
mimicking non-ideal conditions in the actual probe (Schreck et al., 2022).
Optimizing image corruption requires a cumbersome manual labeling effort.
Here we demonstrate the application of the neural style translation approach
(Gatys et al., 2016) to the simulated holograms. With a pre-trained
convolutional neural network (VGG-19), the simulated holograms are "stylized"
to resemble the real ones obtained from the probe, while at the same time
preserving the simulated image "content" (e.g., the particle locations and
sizes). Two image similarity metrics concur that the stylized images are more
like real holograms than the synthetic ones. With an ML model trained to
predict particle locations and shapes on the stylized data sets, we observed
comparable performance on both simulated and real holograms, obviating the need
to perform manual labeling. The described approach is not specific to hologram
images and could be applied in other domains for capturing noise and
imperfections in observational instruments to make simulated data more like
real-world observations.
Comment: 23 pages, 9 figures
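In neural style transfer as commonly formulated, "style" is summarized by Gram matrices of convolutional feature maps, which record channel co-activations while discarding spatial arrangement. The sketch below shows that idea with NumPy on a generic feature array. The normalization by the number of spatial positions and the mean-squared loss are assumptions made for illustration, and no claim is made that this matches the authors' exact loss or layer selection.

```python
import numpy as np


def gram_matrix(features):
    """Channel-by-channel correlation of a feature map.

    features: array of shape (channels, height, width), e.g. the
    activations of one convolutional layer (assumed layout).
    """
    channels, height, width = features.shape
    flat = features.reshape(channels, height * width)
    # Dividing by the number of positions is an assumed normalization.
    return flat @ flat.T / (height * width)


def style_loss(feat_generated, feat_style):
    """Mean squared difference between the two Gram matrices."""
    diff = gram_matrix(feat_generated) - gram_matrix(feat_style)
    return float(np.mean(diff ** 2))
```

Because the Gram matrix sums over all spatial positions, rearranging where features occur leaves the style loss unchanged, which is why a separate content term is needed to keep particle positions in place.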
Physically Explainable Deep Learning for Convective Initiation Nowcasting Using GOES-16 Satellite Observations
Convection initiation (CI) nowcasting remains a challenging problem for both
numerical weather prediction models and existing nowcasting algorithms. In this
study, object-based probabilistic deep learning models are developed to predict
CI based on multichannel infrared GOES-R satellite observations. The data come
from patches surrounding potential CI events identified in Multi-Radar
Multi-Sensor Doppler weather radar products over the Great Plains region from
June and July 2020 and June 2021. An objective radar-based approach is used to
identify these events. The deep learning models significantly outperform the
classical logistic model at lead times up to 1 hour, especially on the false
alarm ratio. In case studies, the deep learning model exhibits
dependence on the characteristics of clouds and moisture at multiple levels.
Model explanation further reveals the model's decision-making process with
different baselines. The explanation results highlight the importance of
moisture and cloud features at different levels depending on the choice of
baseline. Our study demonstrates the advantage of using different baselines in
further understanding model behavior and gaining scientific insights.
Machine Learning for Stochastic Parameterization: Generative Adversarial Networks in the Lorenz '96 Model
Stochastic parameterizations account for uncertainty in the representation of
unresolved sub-grid processes by sampling from the distribution of possible
sub-grid forcings. Some existing stochastic parameterizations utilize
data-driven approaches to characterize uncertainty, but these approaches
require significant structural assumptions that can limit their scalability.
Machine learning models, including neural networks, are able to represent a
wide range of distributions and build optimized mappings between a large number
of inputs and sub-grid forcings. Recent research on machine learning
parameterizations has focused only on deterministic parameterizations. In this
study, we develop a stochastic parameterization using the generative
adversarial network (GAN) machine learning framework. The GAN stochastic
parameterization is trained and evaluated on output from the Lorenz '96 model,
which is a common baseline model for evaluating both parameterization and data
assimilation techniques. We evaluate different ways of characterizing the input
noise for the model and perform model runs with the GAN parameterization at
weather and climate timescales. Some of the GAN configurations perform better
than a baseline bespoke parameterization at both timescales, and the networks
closely reproduce the spatio-temporal correlations and regimes of the Lorenz
'96 system. We also find that in general those models which produce skillful
forecasts are also associated with the best climate simulations.
Comment: Submitted to Journal of Advances in Modeling Earth Systems (JAMES)
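For readers unfamiliar with the Lorenz '96 system, the sketch below shows the tendency of the resolved variables, dX_k/dt = -X_{k-1}(X_{k-2} - X_{k+1}) - X_k + F - U_k, on a periodic ring. The optional `subgrid` term U stands in for whatever a parameterization supplies, and its sign convention, the forcing value F = 8, and the time step are assumptions for illustration rather than the configuration used in the study.

```python
import numpy as np


def lorenz96_tendency(x, forcing=8.0, subgrid=None):
    """Time tendency of the resolved Lorenz '96 variables on a periodic ring."""
    x = np.asarray(x, dtype=float)
    # np.roll(x, 1)[k] is x[k-1]; np.roll(x, -1)[k] is x[k+1].
    advection = -np.roll(x, 1) * (np.roll(x, 2) - np.roll(x, -1))
    tendency = advection - x + forcing
    if subgrid is not None:
        # Assumed convention: the sub-grid forcing U is subtracted.
        tendency = tendency - np.asarray(subgrid, dtype=float)
    return tendency


def rk4_step(x, dt=0.005, forcing=8.0):
    """Advance the unparameterized system one fourth-order Runge-Kutta step."""
    x = np.asarray(x, dtype=float)
    k1 = lorenz96_tendency(x, forcing)
    k2 = lorenz96_tendency(x + 0.5 * dt * k1, forcing)
    k3 = lorenz96_tendency(x + 0.5 * dt * k2, forcing)
    k4 = lorenz96_tendency(x + dt * k3, forcing)
    return x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
```

A stochastic parameterization would draw U from a learned conditional distribution at each step instead of using a fixed function of X.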
Neural Network Emulation of the Formation of Organic Aerosols Based on the Explicit GECKO-A Chemistry Model
Secondary organic aerosols (SOA) are formed from oxidation of hundreds of volatile organic compounds (VOCs) emitted from anthropogenic and natural sources. Accurate predictions of this chemistry are key for air quality and climate studies due to the large contribution of organic aerosols to submicron aerosol mass. Currently, only explicit models, such as the Generator for Explicit Chemistry and Kinetics of Organics in the Atmosphere (GECKO-A), can fully represent the chemical processing of thousands of organic species. However, their extreme computational cost prohibits their use in current chemistry-climate models, which rely on simplified empirical parameterizations to predict SOA concentrations. This study demonstrates that machine learning can accurately emulate SOA formation from an explicit chemistry model with an approximate error of 2%–8% for up to five days for several precursors, and potentially for up to one month for recurrent neural network models, with a speedup of 100 to 100,000 times over GECKO-A, making it computationally usable in a chemistry-climate model. We generated the training data using thousands of GECKO-A box simulations sampled from a broad range of initial environmental conditions, and focused on three representative SOA precursors: the oxidation by OH of two anthropogenic VOCs (toluene and dodecane) and the oxidation by O3 of one biogenic VOC (α-pinene). We compare several neural models and quantify their underlying uncertainty and robustness. These are promising results, suggesting that neural network models could be applied to predict SOA in chemistry-climate models, limited however to the range of environmental conditions that were considered in the training datasets.
Evidential Deep Learning: Enhancing Predictive Uncertainty Estimation for Earth System Science Applications
Robust quantification of predictive uncertainty is critical for understanding
factors that drive weather and climate outcomes. Ensembles provide predictive
uncertainty estimates and can be decomposed physically, but both physics and
machine learning ensembles are computationally expensive. Parametric deep
learning can estimate uncertainty with one model by predicting the parameters
of a probability distribution but does not account for epistemic uncertainty.
Evidential deep learning, a technique that extends parametric deep learning to
higher-order distributions, can account for both aleatoric and epistemic
uncertainty with one model. This study compares the uncertainty derived from
evidential neural networks to those obtained from ensembles. Through
applications of classification of winter precipitation type and regression of
surface layer fluxes, we show evidential deep learning models attaining
predictive accuracy rivaling standard methods, while robustly quantifying both
sources of uncertainty. We evaluate the uncertainty in terms of how well the
predictions are calibrated and how well the uncertainty correlates with
prediction error. Analyses of uncertainty in the context of the inputs reveal
sensitivities to underlying meteorological processes, facilitating
interpretation of the models. The conceptual simplicity, interpretability, and
computational efficiency of evidential neural networks make them highly
extensible, offering a promising approach for reliable and practical
uncertainty quantification in Earth system science modeling. In order to
encourage broader adoption of evidential deep learning in Earth System Science,
we have developed a new Python package, MILES-GUESS
(https://github.com/ai2es/miles-guess), that enables users to train and
evaluate both evidential and ensemble deep learning.
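As a rough illustration of how one evidential regression model can separate the two kinds of uncertainty, the sketch below assumes the common Normal-Inverse-Gamma formulation, in which a network outputs four parameters (gamma, nu, alpha, beta) per prediction. Under that assumption, aleatoric uncertainty is beta / (alpha - 1) and epistemic uncertainty is beta / (nu (alpha - 1)). The function name is hypothetical and this is not the MILES-GUESS API.

```python
import numpy as np


def nig_uncertainties(gamma, nu, alpha, beta):
    """Split Normal-Inverse-Gamma parameters into mean and uncertainties.

    Hypothetical helper: gamma is the predicted mean, nu > 0,
    alpha > 1 and beta > 0 are the evidential parameters.
    """
    gamma, nu, alpha, beta = (
        np.asarray(a, dtype=float) for a in (gamma, nu, alpha, beta)
    )
    if np.any(nu <= 0) or np.any(alpha <= 1) or np.any(beta <= 0):
        raise ValueError("requires nu > 0, alpha > 1, beta > 0")
    aleatoric = beta / (alpha - 1.0)          # expected data noise variance
    epistemic = beta / (nu * (alpha - 1.0))   # variance of the predicted mean
    return gamma, aleatoric, epistemic
```

Larger nu (more "virtual evidence") shrinks the epistemic term while leaving the aleatoric term unchanged, which is the behavior that lets one model report both sources.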
The importance of sea ice area biases in 21st century multimodel projections of Antarctic temperature and precipitation
Climate models exhibit large biases in sea ice area (SIA) in their historical simulations. This study explores the impacts of these biases on multimodel uncertainty in Coupled Model Intercomparison Project phase 5 (CMIP5) ensemble projections of 21st century change in Antarctic surface temperature, net precipitation, and SIA. The analysis is based on time slice climatologies in the Representative Concentration Pathway 8.5 future scenario (2070–2099) and historical (1970–1999) simulations across 37 different CMIP5 models. Projected changes in net precipitation, temperature, and SIA are found to be strongly associated with simulated historical mean SIA (e.g., cross-model correlations of r = 0.77, 0.71, and −0.85, respectively). Furthermore, historical SIA bias is found to have a large impact on the simulated ratio between net precipitation response and temperature response. This ratio is smaller in models with smaller-than-observed SIA. These strong emergent relationships on SIA bias could, if found to be physically robust, be exploited to give more precise climate projections for Antarctica.