
    Quantifying dependencies for sensitivity analysis with multivariate input sample data

    We present a novel method for quantifying dependencies in multivariate datasets, based on estimating the Rényi entropy by minimum spanning trees (MSTs). The length of the MSTs can be used to order pairs of variables from strongly to weakly dependent, making it a useful tool for sensitivity analysis with dependent input variables. It is well suited for cases where the input distribution is unknown and only a sample of the inputs is available. We introduce an estimator to quantify dependency based on the MST length, and investigate its properties with several numerical examples. To reduce the computational cost of constructing the exact MST for large datasets, we explore methods to compute approximations to the exact MST, and find the multilevel approach introduced recently by Zhong et al. (2015) to be the most accurate. We apply our proposed method to an artificial test case based on the Ishigami function, as well as to a real-world test case involving sediment transport in the North Sea. The results are consistent with prior knowledge and heuristic understanding, as well as with variance-based analysis using Sobol indices in the case where these indices can be computed.
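    A minimal sketch of the basic ingredient, ranking input-output pairs by the length of a Euclidean minimum spanning tree built from the standardized sample (a shorter MST suggesting stronger dependence); the function names and the standardization step are illustrative assumptions, not taken from the paper:

    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from scipy.sparse.csgraph import minimum_spanning_tree

    def mst_length(x, y):
        """Total edge length of the exact Euclidean MST of the standardized pair (x, y)."""
        pts = np.column_stack([(x - x.mean()) / x.std(), (y - y.mean()) / y.std()])
        dist = squareform(pdist(pts))              # dense pairwise distance matrix
        return minimum_spanning_tree(dist).sum()   # sum of the MST edge weights

    def rank_inputs(X, y):
        """Order the columns of X from (presumably) strongly to weakly dependent on y."""
        lengths = [mst_length(X[:, j], y) for j in range(X.shape[1])]
        return np.argsort(lengths)                 # shortest MST first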

    The relationship between biologicals and innovation

    There are no two European countries with the same – or even similar – health care systems. But they share one common denominator: in all European countries the costs for health care keep on rising faster than their GDP. The growing number of elderly people and the related extra claim on the system can only partly explain this cost increase. There are other drivers as well. Although the increasing use of generic drugs tends to reduce the cost of medicines, there is upward pressure from the category of novel medicines, in particular biologicals: medicinal products made through recombinant DNA technology. In the list of the 10 best-selling drugs (total sales 75 billion US$ in 2013), 7 out of 10 are biologicals. All 7 sell between 5 and 10 billion US$ per annum. These biologicals are used to treat serious, often life-threatening diseases, such as cancer and diabetes. And the annualised cost of treatment per patient can be as high as 100,000 Euros or even higher. To explain the high prices of biologicals, two arguments are being used: (I) these products are very costly to produce, because of the complex manufacturing process including downstream processing, and/or (II) the cost of innovative drug product development is high: 4.2 billion+ euros (period 2006-2012) for a successful product, including the money to be recouped for the many failed drug products in the pipeline (‘attrition’) (PWC, 2012). And somebody has to pay the bill. In the following I will demonstrate that the manufacturing costs argument is incorrect and that indeed ‘big pharma’ is – for now – still profitable because of these highly successful biologicals. But there is more to it.

    Resampling with neural networks for stochastic parameterization in multiscale systems

    In simulations of multiscale dynamical systems, not all relevant processes can be resolved explicitly. Taking the effect of the unresolved processes into account is important, which introduces the need for parameterizations. We present a machine-learning method, used for the conditional resampling of observations or reference data from a fully resolved simulation. It is based on the probabilistic classification of subsets of reference data, conditioned on macroscopic variables. This method is used to formulate a parameterization that is stochastic, taking the uncertainty of the unresolved scales into account. We validate our approach on the Lorenz 96 system, using two different parameter settings which are challenging for parameterization methods.
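    A minimal sketch of the conditional-resampling idea, with a multinomial logistic-regression classifier standing in for the neural network used in the paper; the quantile binning of the reference data and all names below are illustrative assumptions:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def fit_resampler(macro, r, n_bins=10):
        """Group reference subgrid data r into bins and learn P(bin | macroscopic state)."""
        edges = np.quantile(r, np.linspace(0, 1, n_bins + 1))
        labels = np.clip(np.digitize(r, edges[1:-1]), 0, n_bins - 1)
        clf = LogisticRegression(max_iter=1000).fit(macro, labels)
        subsets = [r[labels == k] for k in range(n_bins)]
        return clf, subsets

    def stochastic_tendency(clf, subsets, macro_t, rng=np.random.default_rng()):
        """Pick a bin with the predicted probabilities, then resample a value from that bin."""
        p = clf.predict_proba(macro_t.reshape(1, -1))[0]
        k = rng.choice(clf.classes_, p=p)
        return rng.choice(subsets[k])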

    Stochastic and Statistical Methods in Climate, Atmosphere, and Ocean Science

    Introduction: The behavior of the atmosphere, oceans, and climate is intrinsically uncertain. The basic physical principles that govern atmospheric and oceanic flows are well known, for example, the Navier-Stokes equations for fluid flow, thermodynamic properties of moist air, and the effects of density stratification and Coriolis force. Notwithstanding, there are major sources of randomness and uncertainty that prevent perfect prediction and complete understanding of these flows. The climate system involves a wide spectrum of space and time scales, from processes occurring on the order of microns and milliseconds, such as the formation of cloud and rain droplets, to global phenomena involving annual and decadal oscillations, such as the El Niño-Southern Oscillation (ENSO) and the Pacific Decadal Oscillation (PDO) [5]. Moreover, climate records display a spectral variability ranging from 1 cycle per month to 1 cycle per 100,000 years [23]. The complexity of the climate system stems in large part from the inherent nonlinearities of fluid mechanics and the phase changes of water substances. The atmosphere and oceans are turbulent, nonlinear systems that display chaotic behavior (e.g., [39]). The time evolutions of the same chaotic system starting from two slightly different initial states diverge exponentially fast, so that chaotic systems are marked by limited predictability. Beyond the so-called predictability horizon (on the order of 10 days for the atmosphere), initial state uncertainties (e.g., due to imperfect observations) have grown to the point that straightforward forecasts are no longer useful. Another major source of uncertainty stems from the fact that numerical models for atmospheric and oceanic flows cannot describe all relevant physical processes at once. These models are in essence discretized partial differential equations (PDEs), and the derivation of suitable PDEs (e.g., the so-called primitive equations) from more general ones that are less convenient for computation (e.g., the full Navier-Stokes equations) involves approximations and simplifications that introduce errors in the equations. Furthermore, as a result of spatial discretization of the PDEs, numerical models have finite resolution, so that small-scale processes with length scales below the model grid scale are not resolved. These limitations are unavoidable, leading to model error and uncertainty. The uncertainties due to chaotic behavior and unresolved processes motivate the use of stochastic and statistical methods for modeling and understanding climate, atmosphere, and oceans. Models can be augmented with random elements in order to represent time-evolving uncertainties, leading to stochastic models. Weather forecasts and climate predictions are increasingly expressed in probabilistic terms, making explicit the margins of uncertainty inherent to any prediction.
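    The exponential divergence of nearby initial states mentioned above can be illustrated with the classic Lorenz (1963) system; the parameter values, perturbation size, and tolerances below are standard textbook choices and not taken from this text:

    import numpy as np
    from scipy.integrate import solve_ivp

    def lorenz63(t, u, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
        x, y, z = u
        return [sigma * (y - x), x * (rho - z) - y, x * y - beta * z]

    t_eval = np.linspace(0.0, 25.0, 2501)
    u0 = np.array([1.0, 1.0, 1.0])
    sol_a = solve_ivp(lorenz63, (0.0, 25.0), u0, t_eval=t_eval, rtol=1e-9, atol=1e-12)
    sol_b = solve_ivp(lorenz63, (0.0, 25.0), u0 + 1e-8, t_eval=t_eval, rtol=1e-9, atol=1e-12)
    err = np.linalg.norm(sol_a.y - sol_b.y, axis=0)
    print(err[::250])   # error grows roughly exponentially before saturating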

    Reduced model-error source terms for fluid flow

    It is well known that the wide range of spatial and temporal scales present in geophysical flow problems represents a (currently) insurmountable computational bottleneck, which must be circumvented by a coarse-graining procedure. The effect of the unresolved fluid motions enters the coarse-grained equations as an unclosed forcing term, denoted as the ‘eddy forcing’. Traditionally, the system is closed by approximate deterministic closure models, i.e. so-called parameterizations. Instead of creating a deterministic parameterization, some recent efforts have focused on creating a stochastic, data-driven surrogate model for the eddy forcing from a (limited) set of reference data, with the goal of accurately capturing the long-term flow statistics. Since the eddy forcing is a dynamically evolving field, a surrogate should be able to mimic the complex spatial patterns displayed by the eddy forcing. Rather than creating such a (fully data-driven) surrogate, we propose to precede the surrogate construction step by a procedure that replaces the eddy forcing with a new model-error source term which: i) is tailor-made to capture spatially-integrated statistics of interest, ii) strikes a balance between physical insight and data-driven modelling, and iii) significantly reduces the amount of training data that is needed. Instead of creating a surrogate for an evolving field, we now only require a surrogate model for one scalar time series per statistical quantity-of-interest. Our current surrogate modelling approach builds on a resampling strategy, where we create a probability density function of the reduced training data that is conditional on (time-lagged) resolved-scale variables. We derive the model-error source terms, and construct the reduced surrogate using an ocean model of two-dimensional turbulence in a doubly periodic square domain.
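    A minimal sketch of the resampling strategy for a single reduced source term: an empirical distribution of the scalar training data is built conditional on a time-lagged resolved-scale variable and sampled from at run time. The lag, the quantile binning, and all names are illustrative assumptions, not the paper's implementation:

    import numpy as np

    def build_conditional_pdf(c, dQ, lag=1, n_bins=20):
        """Bin the scalar reduced data dQ[t] by the value of the lagged resolved variable c[t - lag]."""
        cond, target = c[:-lag], dQ[lag:]
        edges = np.quantile(cond, np.linspace(0, 1, n_bins + 1))
        idx = np.clip(np.digitize(cond, edges[1:-1]), 0, n_bins - 1)
        return edges, [target[idx == k] for k in range(n_bins)]

    def resample(edges, subsets, c_lagged, rng=np.random.default_rng()):
        """Draw one surrogate value of the reduced source term, given the current lagged condition."""
        k = int(np.clip(np.digitize(c_lagged, edges[1:-1]), 0, len(subsets) - 1))
        return rng.choice(subsets[k])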

    Reducing data-driven dynamical subgrid scale models by physical constraints

    Recent years have seen a growing interest in using data-driven (machine-learning) techniques for the construction of cheap surrogate models of turbulent subgrid scale stresses. These stresses display complex spatio-temporal structures, and constitute a difficult surrogate target. In this paper we propose a data-preprocessing step, in which we derive alternative subgrid scale models which are virtually exact for a user-specified set of spatially integrated quantities of interest. The unclosed component of these new subgrid scale models is of the same size as this set of integrated quantities of interest. As a result, the corresponding training data is massively reduced in size, decreasing the complexity of the subsequent surrogate construction.
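    A minimal sketch of the data reduction described above: rather than learning the full subgrid field, only its projections onto a small, user-specified set of spatially integrated quantities of interest are kept as training data. The weight fields and names below are illustrative assumptions, not the paper's notation:

    import numpy as np

    def reduce_training_data(r_snapshots, weights, dA):
        """
        r_snapshots : (n_t, ny, nx) snapshots of the reference subgrid field
        weights     : (n_qoi, ny, nx) weight fields defining the integrated quantities of interest
        dA          : grid-cell area
        Returns an (n_t, n_qoi) array: one scalar time series per quantity of interest.
        """
        return np.einsum('tij,qij->tq', r_snapshots, weights) * dA

    # e.g. from a (10000, 128, 128) reference field r to a (10000, 2) training set:
    # reduced = reduce_training_data(r, np.stack([w_energy, w_enstrophy]), dA=(2 * np.pi / 128)**2)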