Stochastic and Statistical Methods in Climate, Atmosphere, and Ocean Science
Introduction
The behavior of the atmosphere, oceans, and climate is intrinsically uncertain. The basic physical principles that govern atmospheric and oceanic flows are well known, for example, the Navier-Stokes equations for fluid flow, thermodynamic properties of moist air, and the effects of density stratification and Coriolis force. Notwithstanding, there are major sources of randomness and uncertainty that prevent perfect prediction and complete understanding of these flows.
The climate system involves a wide spectrum of space and time scales, from processes occurring on the order of microns and milliseconds, such as the formation of cloud and rain droplets, to global phenomena involving annual and decadal oscillations, such as the El Niño-Southern Oscillation (ENSO) and the Pacific Decadal Oscillation (PDO) [5]. Moreover, climate records display a spectral variability ranging from 1 cycle per month to 1 cycle per 100,000 years [23]. The complexity of the climate system stems in large part from the inherent nonlinearities of fluid mechanics and the phase changes of water substances. The atmosphere and oceans are turbulent, nonlinear systems that display chaotic behavior (e.g., [39]). The time evolutions of the same chaotic system starting from two slightly different initial states diverge exponentially fast, so that chaotic systems are marked by limited predictability. Beyond the so-called predictability horizon (on the order of 10 days for the atmosphere), initial state uncertainties (e.g., due to imperfect observations) have grown to the point that straightforward forecasts are no longer useful.
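The exponential divergence of nearby initial states can be illustrated with a minimal sketch. It assumes the Lorenz (1963) three-variable system with its classical parameter values as a stand-in for a chaotic flow; the system, the step size, the perturbation size, and all function names are illustrative choices and are not taken from the text above.

```python
import math


def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz (1963) system (assumed toy model)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2))
    k3 = lorenz_rhs(shifted(state, k2, dt / 2))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def separation_history(perturbation=1e-8, dt=0.01, n_steps=3000):
    """Distance between two trajectories whose initial states differ slightly."""
    a = (1.0, 1.0, 1.0)
    b = (1.0 + perturbation, 1.0, 1.0)  # tiny initial-state error
    distances = []
    for _ in range(n_steps):
        a = rk4_step(a, dt)
        b = rk4_step(b, dt)
        distances.append(math.dist(a, b))
    return distances


distances = separation_history()
for step in (0, 500, 1000, 1500, 2000, 2500, 2999):
    print(f"t = {(step + 1) * 0.01:6.2f}   separation = {distances[step]:.3e}")
```

In this toy setting the separation grows by many orders of magnitude before it saturates at the size of the attractor, which is the mechanism behind a finite predictability horizon.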
Another major source of uncertainty stems from the fact that numerical models for atmospheric and oceanic flows cannot describe all relevant physical processes at once. These models are in essence discretized partial differential equations (PDEs), and the derivation of suitable PDEs (e.g., the so-called primitive equations) from more general ones that are less convenient for computation (e.g., the full Navier-Stokes equations) involves approximations and simplifications that introduce errors in the equations. Furthermore, as a result of spatial discretization of the PDEs, numerical models have finite resolution so that small-scale processes with length scales below the model grid scale are not resolved. These limitations are unavoidable, leading to model error and uncertainty.
The uncertainties due to chaotic behavior and unresolved processes motivate the use of stochastic and statistical methods for modeling and understanding climate, atmosphere, and oceans. Models can be augmented with random elements in order to represent time-evolving uncertainties, leading to stochastic models. Weather forecasts and climate predictions are increasingly expressed in probabilistic terms, making explicit the margins of uncertainty inherent to any prediction.
Reduced model-error source terms for fluid flow
It is well known that the wide range of spatial and temporal scales present in geophysical flow problems represents a (currently) insurmountable computational bottleneck, which must be circumvented by a coarse-graining procedure. The effect of the unresolved fluid motions enters the coarse-grained equations as an unclosed forcing term, denoted as the ‘eddy forcing’. Traditionally, the system is closed by approximate deterministic closure models, i.e. so-called parameterizations. Instead of creating a deterministic parameterization, some recent efforts have focused on creating a stochastic, data-driven surrogate model for the eddy forcing from a (limited) set of reference data, with the goal of accurately capturing the long-term flow statistics. Since the eddy forcing is a dynamically evolving field, a surrogate should be able to mimic the complex spatial patterns displayed by the eddy forcing. Rather than creating such a (fully data-driven) surrogate, we propose to precede the surrogate construction step by a procedure that replaces the eddy forcing with a new model-error source term which: i) is tailor-made to capture spatially-integrated statistics of interest, ii) strikes a balance between physical insight and data-driven modelling, and iii) significantly reduces the amount of training data that is needed. Instead of creating a surrogate for an evolving field, we now only require a surrogate model for one scalar time series per statistical quantity-of-interest. Our current surrogate modelling approach builds on a resampling strategy, where we create a probability density function of the reduced training data that is conditional on (time-lagged) resolved-scale variables. We derive the model-error source terms, and construct the reduced surrogate using an ocean model of two-dimensional turbulence in a doubly periodic square domain.
Reducing data-driven dynamical subgrid scale models by physical constraints
Recent years have seen a growing interest in using data-driven (machine-learning) techniques for the construction of cheap surrogate models of turbulent subgrid scale stresses. These stresses display complex spatio-temporal structures, and constitute a difficult surrogate target. In this paper we propose a data-preprocessing step, in which we derive alternative subgrid scale models which are virtually exact for a user-specified set of spatially integrated quantities of interest. The unclosed component of these new subgrid scale models is of the same size as this set of integrated quantities of interest. As a result, the corresponding training data is massively reduced in size, decreasing the complexity of the subsequent surrogate construction.
Quantifying dependencies for sensitivity analysis with multivariate input sample data
We present a novel method for quantifying dependencies in multivariate datasets, based on estimating the Rényi entropy by minimum spanning trees (MSTs). The length of the MSTs can be used to order pairs of variables from strongly to weakly dependent, making it a useful tool for sensitivity analysis with dependent input variables. It is well-suited for cases where the input distribution is unknown and only a sample of the inputs is available. We introduce an estimator to quantify dependency based on the MST length, and investigate its properties with several numerical examples. To reduce the computational cost of constructing the exact MST for large datasets, we explore methods to compute approximations to the exact MST, and find the multilevel approach introduced recently by Zhong et al. (2015) to be the most accurate. We apply our proposed method to an artificial test case based on the Ishigami function, as well as to a real-world test case involving sediment transport in the North Sea. The results are consistent with prior knowledge and heuristic understanding, as well as with variance-based analysis using Sobol indices in the case where these indices can be computed.
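The idea of ranking variable pairs by MST length can be sketched as follows. This is a simplified illustration and not the estimator introduced in the abstract: it assumes each variable is first rank-transformed to the unit interval, and it uses the raw Euclidean length of the exact MST (Prim's algorithm) as the dependence score, with a shorter tree indicating stronger dependence. The synthetic data and all names are hypothetical.

```python
import math
import random


def rank_transform(values):
    """Map a sample to (0, 1) by rank, removing the marginal distribution."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = (rank + 0.5) / len(values)
    return ranks


def mst_length(points):
    """Total Euclidean edge length of the exact MST (Prim's algorithm, O(n^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge linking each point to the tree
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
    return total


def pair_mst_length(x, y):
    """MST length of the rank-transformed sample of the pair (x, y)."""
    return mst_length(list(zip(rank_transform(x), rank_transform(y))))


rng = random.Random(0)
n = 300
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [xi + 0.1 * rng.gauss(0, 1) for xi in x1]  # strongly dependent on x1
x3 = [rng.gauss(0, 1) for _ in range(n)]        # independent of x1

scores = {
    ("x1", "x2"): pair_mst_length(x1, x2),
    ("x1", "x3"): pair_mst_length(x1, x3),
}
# Order pairs from strongly to weakly dependent (shortest tree first).
for pair, length in sorted(scores.items(), key=lambda item: item[1]):
    print(pair, round(length, 3))
```

The quadratic cost of the exact tree is what motivates the approximate MST constructions mentioned in the abstract for large samples.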
Towards data-driven dynamic surrogate models for ocean flow
Coarse graining of (geophysical) flow problems is a necessity brought upon us by the wide range of spatial and temporal scales present in these problems, which cannot all be represented on a numerical grid without an inordinate amount of computational resources. Traditionally, the effect of the unresolved eddies is approximated by deterministic closure models, i.e. so-called parameterizations. The effect of the unresolved eddy field enters the resolved-scale equations as a forcing term, denoted as the ‘eddy forcing’. Instead of creating a deterministic parameterization, our goal is to infer a stochastic, data-driven surrogate model for the eddy forcing from a (limited) set of reference data, so as to accurately capture the long-term flow statistics. Our surrogate modelling approach essentially builds on a resampling strategy, where we create a probability density function of the reference data that is conditional on (time-lagged) resolved-scale variables. The choice of resolved-scale variables, as well as the employed time lag, is essential to the performance of the surrogate. We will demonstrate the effect of different modelling choices on a simplified ocean model of two-dimensional turbulence in a doubly periodic square domain.
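A resampling surrogate of this kind might look roughly like the sketch below. It assumes a single scalar conditioning variable, equal-width bins, and synthetic scalar data; the class `BinnedResampler`, the toy relation between `u` and `r`, the lag of 2, and the bin count are all hypothetical illustrations, not the setup used in the work described above.

```python
import bisect
import random


class BinnedResampler:
    """Draw reference samples r conditional on a binned conditioning variable c."""

    def __init__(self, conditioning, targets, n_bins=10, seed=0):
        lo, hi = min(conditioning), max(conditioning)
        width = (hi - lo) / n_bins
        # Interior bin edges; values outside the training range use the end bins.
        self.edges = [lo + width * k for k in range(1, n_bins)]
        self.bins = [[] for _ in range(n_bins)]
        for c, r in zip(conditioning, targets):
            self.bins[self._index(c)].append(r)
        self.rng = random.Random(seed)

    def _index(self, c):
        return bisect.bisect_right(self.edges, c)

    def sample(self, c):
        """Return one stored reference value from the bin of c (or the nearest non-empty bin)."""
        idx = self._index(c)
        for offset in range(len(self.bins)):
            for j in (idx - offset, idx + offset):
                if 0 <= j < len(self.bins) and self.bins[j]:
                    return self.rng.choice(self.bins[j])
        raise ValueError("no training data available")


# Synthetic reference data: a resolved-scale series u and a target r that
# depends on u two steps earlier (assumed toy relation).
rng = random.Random(1)
lag = 2
u = [0.0]
for _ in range(4999):
    u.append(0.9 * u[-1] + rng.gauss(0, 0.3))
r = [0.0] * lag + [2.0 * u[t - lag] + rng.gauss(0, 0.1) for t in range(lag, len(u))]

# Pair each target r[t] with the time-lagged resolved variable u[t - lag].
surrogate = BinnedResampler(u[:-lag], r[lag:], n_bins=20)

high = [surrogate.sample(0.5) for _ in range(200)]
low = [surrogate.sample(-0.5) for _ in range(200)]
print(sum(high) / len(high), sum(low) / len(low))
```

In this sketch the lag and the choice of conditioning variable are fixed by hand; changing either one changes which reference samples share a bin, which is why those choices matter for surrogate performance.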