20 research outputs found

    A filtered multilevel Monte Carlo method for estimating the expectation of discretized random fields

    Full text link
    We investigate the use of multilevel Monte Carlo (MLMC) methods for estimating the expectation of discretized random fields. Specifically, we consider a setting in which the input and output vectors of the numerical simulators have inconsistent dimensions across the multilevel hierarchy. This requires the introduction of grid transfer operators borrowed from multigrid methods. Starting from a simple 1D illustration, we demonstrate numerically that the resulting MLMC estimator degrades the estimation of high-frequency components of the discretized expectation field compared to a Monte Carlo (MC) estimator. By adapting mathematical tools initially developed for multigrid methods, we perform a theoretical spectral analysis of the MLMC estimator of the expectation of discretized random fields, in the specific case of linear, symmetric and circulant simulators. This analysis provides a spectral decomposition of the variance into contributions associated with each scale component of the discretized field. We then propose improved MLMC estimators using a filtering mechanism similar to the smoothing process of multigrid methods. The filtering operators improve the estimation of both the small- and large-scale components of the variance, resulting in a reduction of the total variance of the estimator. These improvements are quantified for the specific class of simulators considered in our spectral analysis. The resulting filtered MLMC (F-MLMC) estimator is applied to the problem of estimating the discretized variance field of a diffusion-based covariance operator, which amounts to estimating the expectation of a discretized random field. The numerical experiments support the conclusions of the theoretical analysis even with non-linear simulators, and demonstrate the improvements brought by the proposed F-MLMC estimator compared to both a crude MC and an unfiltered MLMC estimator.
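
    The core construction the abstract describes, an MLMC telescoping sum whose coarse-level contribution is mapped to the fine grid by a prolongation operator, can be sketched as follows. This is a minimal two-level illustration with a made-up smoothing "simulator" and a linear-interpolation grid-transfer operator; it is not the paper's filtered F-MLMC estimator, and all function names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_coarse, n_fine = 32, 64

def simulator(xi, n):
    # Toy stand-in for a level-dependent simulator: restrict a fine-grid random
    # input to an n-point grid and smooth it (purely illustrative).
    restricted = xi.reshape(n, -1).mean(axis=1)
    return 1.0 + np.convolve(restricted, [0.25, 0.5, 0.25], mode="same")

def prolong(u_coarse, n_out):
    # Grid-transfer (prolongation) operator: linear interpolation coarse -> fine.
    x_c = np.linspace(0.0, 1.0, u_coarse.size)
    x_f = np.linspace(0.0, 1.0, n_out)
    return np.interp(x_f, x_c, u_coarse)

def mlmc_mean(n0_samples=2000, n1_samples=200):
    # Level 0: many cheap coarse samples, prolonged to the fine grid.
    level0 = np.mean(
        [prolong(simulator(rng.standard_normal(n_fine), n_coarse), n_fine)
         for _ in range(n0_samples)], axis=0)
    # Level 1: fewer coupled fine/coarse samples of the correction term,
    # with the same random input feeding both levels.
    corrections = []
    for _ in range(n1_samples):
        xi = rng.standard_normal(n_fine)
        corrections.append(simulator(xi, n_fine)
                           - prolong(simulator(xi, n_coarse), n_fine))
    return level0 + np.mean(corrections, axis=0)

estimate = mlmc_mean()   # MLMC estimate of E[u] on the 64-point fine grid
```

    The paper's filtering mechanism would, roughly speaking, apply additional smoothing operators to the two terms before they are combined; that step is not reproduced here.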

    Multilevel assimilation of inverted seismic data

    Get PDF
    In ensemble-based data assimilation (DA), the ensemble size is usually limited to around one hundred members. Straightforward application of ensemble-based DA can therefore result in significant Monte Carlo errors, often manifesting themselves as severe underestimation of parameter uncertainties. Assimilation of large amounts of simultaneous data amplifies the negative effects of Monte Carlo errors. Distance-based localization is the conventional remedy for this problem. However, it has its own drawbacks: for example, it removes true long-range correlations, and it is difficult to apply to data that do not have a specific physical location. Use of lower-fidelity models reduces the computational cost per ensemble member and therefore makes it possible to reduce Monte Carlo errors by increasing the ensemble size, but it also adds to the modeling error. Multilevel data assimilation (MLDA) uses a selection of models forming hierarchies of both computational cost and computational accuracy, and tries to obtain a better balance between Monte Carlo errors and modeling errors. In this PhD project, several MLDA algorithms were developed and their quality for assimilation of inverted seismic data was assessed on simplified reservoir problems. Utilization of multilevel models introduces additional numerical errors (multilevel modeling error, MLME) on top of those already present. Several computationally inexpensive methods were devised to partially account for MLME in the context of multilevel data assimilation, and they were also investigated in simplified reservoir history-matching problems. Finally, one of the novel MLDA algorithms was chosen and its performance was assessed in a realistic reservoir history-matching problem.
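
    For readers less familiar with the ensemble-based DA building block this work builds on, the sketch below shows a generic ensemble-smoother parameter update with perturbed observations. It is a conventional single-level update only; the thesis's multilevel MLDA schemes and MLME corrections are not reproduced, and the toy forward model is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def ensemble_smoother_update(M, D, d_obs, obs_err_std):
    # M: (n_param, n_ens) prior parameter ensemble
    # D: (n_obs, n_ens) predicted data for each ensemble member
    n_ens = M.shape[1]
    A_m = M - M.mean(axis=1, keepdims=True)
    A_d = D - D.mean(axis=1, keepdims=True)
    C_md = A_m @ A_d.T / (n_ens - 1)              # parameter-data covariance
    C_dd = A_d @ A_d.T / (n_ens - 1)              # data-data covariance
    R = (obs_err_std ** 2) * np.eye(d_obs.size)   # observation-error covariance
    K = C_md @ np.linalg.inv(C_dd + R)            # Kalman-type gain
    perturbed_obs = d_obs[:, None] + obs_err_std * rng.standard_normal(D.shape)
    return M + K @ (perturbed_obs - D)            # updated (posterior) ensemble

# Toy usage with a hypothetical linear forward model d = G m.
G = rng.standard_normal((5, 3))
M_prior = rng.standard_normal((3, 100))
M_post = ensemble_smoother_update(M_prior, G @ M_prior, G @ np.ones(3), 0.1)
```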

    Addressing Variability in Hydrologic Systems Using Efficient Uncertainty Quantification

    Full text link
    The scale and complexity of environmental and earth systems introduce an array of uncertainties that need to be systematically addressed. In numerical modeling, the ever-increasing complexity of the representation of these systems confounds our ability to resolve relevant uncertainties. Specifically, the numerical simulation of the governing processes involves many inputs and parameters that have traditionally been treated as deterministic. Treating them as uncertain with traditional approaches introduces a large computational burden, stemming from the requirement of a prohibitive number of model simulations. Furthermore, within hydrology, most catchments are sparsely monitored, and there are limited, disparate types of data available to confirm the model's behavior. Here I present a blueprint of a general, computationally efficient approach to uncertainty quantification for complex hydrologic models, taking advantage of recent methodological developments. The framework is used in two basic science problems in hydrology. First, it is applied to the problem of combining heterogeneous data sources representing different physical processes to infer physical parameters for the complex hydrologic model tRIBS-VEGGIE. The inference provides a probabilistic interpretation of bulk soil characteristics and related hydraulic properties for an experimental watershed in central Amazonia. These parameters are then used to propagate uncertainty in hydrologic response to an array of quantities of interest through tRIBS-VEGGIE and to determine their sensitivity to uncertain model inputs. Second, the framework is used to explore landscape controls, mediated by subsurface hydrologic dynamics, on the distribution of vegetative traits in a mature Amazon rainforest. This study treats a large parameter set as uncertain across three different soil types and three layers of vegetation, explicitly incorporating interactions between subsurface moisture and vegetation biophysical function. Vegetative performance is examined using a hypothesized cost-benefit trade-off between vegetation carbon uptake and the hydraulic effort required to maintain long-term production. The research enables model-driven inference using a disparate set of observed hydrologic variables including stream discharge, water table depth, evapotranspiration, soil moisture, and gross primary production from the Asu experimental catchment near Manaus, Brazil. Computationally inexpensive model surrogates are constructed and shown to mimic the solution of the complex hydrologic model tRIBS-VEGGIE with high skill. The two applications demonstrate the flexibility of the framework for hydrologic inference in watersheds with sparse, irregular observations of varying accuracy. Significant computational savings imply that problems of greater computational complexity and dimension can be addressed. Furthermore, the framework simultaneously yields a probabilistic representation of model behavior, robust parameter inference, and sensitivity analysis without the need for greater investment in computational resources. (PhD dissertation, Environmental Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies; https://deepblue.lib.umich.edu/bitstream/2027.42/147605/1/dwellem_1.pd)
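
    As a concrete, hedged illustration of the "computationally inexpensive model surrogate" idea, the sketch below trains a Gaussian-process emulator on a handful of runs of a cheap stand-in function and then propagates input uncertainty through the emulator. The stand-in function, parameter ranges, and kernel choice are all hypothetical; tRIBS-VEGGIE and the dissertation's actual surrogate construction are not reproduced here.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(2)

def expensive_model(theta):
    # Stand-in for a costly hydrologic simulator (purely illustrative).
    return np.sin(3.0 * theta[0]) + 0.5 * theta[1] ** 2

# Train the surrogate on a small design of "expensive" runs.
X_train = rng.uniform(-1.0, 1.0, size=(30, 2))
y_train = np.array([expensive_model(t) for t in X_train])
surrogate = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), normalize_y=True)
surrogate.fit(X_train, y_train)

# Cheap Monte Carlo propagation of input uncertainty through the surrogate.
X_mc = rng.uniform(-1.0, 1.0, size=(10_000, 2))
y_mc = surrogate.predict(X_mc)
print("surrogate-based mean / std of the quantity of interest:", y_mc.mean(), y_mc.std())
```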

    2022 Review of Data-Driven Plasma Science

    Get PDF
    Data-driven science and technology offer transformative tools and methods to science. This review article highlights the latest developments and progress in the interdisciplinary field of data-driven plasma science (DDPS), i.e., plasma science whose progress is driven strongly by data and data analyses. Plasma is considered to be the most ubiquitous form of observable matter in the universe. Data associated with plasmas can, therefore, cover extremely large spatial and temporal scales, and often provide essential information for other scientific disciplines. Thanks to the latest technological developments, plasma experiments, observations, and computation now produce a large amount of data that can no longer be analyzed or interpreted manually. This trend now necessitates a highly sophisticated use of high-performance computers for data analyses, making artificial intelligence and machine learning vital components of DDPS. This article contains seven primary sections, in addition to the introduction and summary. Following an overview of fundamental data-driven science, five other sections cover widely studied topics of plasma science and technologies, i.e., basic plasma physics and laboratory experiments, magnetic confinement fusion, inertial confinement fusion and high-energy-density physics, space and astronomical plasmas, and plasma technologies for industrial and other applications. The final section before the summary discusses plasma-related databases that could significantly contribute to DDPS. Each primary section starts with a brief introduction to the topic, discusses the state-of-the-art developments in the use of data and/or data-scientific approaches, and presents a summary and outlook. Despite the recent impressive progress, DDPS is still in its infancy. This article attempts to offer a broad perspective on the development of this field and to identify where further innovations are required.

    Deep Learning And Uncertainty Quantification: Methodologies And Applications

    Get PDF
    Uncertainty quantification is an emerging interdisciplinary area that leverages the power of statistical methods, machine learning models, numerical methods and data-driven approaches to provide reliable inference for quantities of interest in natural science and engineering problems. In practice, the sources of uncertainty come from different aspects, such as aleatoric uncertainty, where the uncertainty comes from the observations or is due to the stochastic nature of the problem, and epistemic uncertainty, where the uncertainty comes from inaccurate mathematical models, computational methods or model parametrization. To cope with these different types of uncertainty, a successful and scalable model for uncertainty quantification requires prior knowledge of the problem, careful design of mathematical models, cautious selection of computational tools, etc. The fast growth in deep learning, probabilistic methods and the large volume of data available across different research areas enables researchers to take advantage of these recent advances to propose novel methodologies for scientific problems in which uncertainty quantification plays an important role. The objective of this dissertation is to address the existing gaps and propose new methodologies for uncertainty quantification with deep learning methods and demonstrate their power in engineering applications. On the methodology side, we first present a generative adversarial framework to model aleatoric uncertainty in stochastic systems. Secondly, we leverage the proposed generative model with recent advances in physics-informed deep learning to learn the uncertainty propagation in solutions of partial differential equations. Thirdly, we introduce a simple and effective approach for posterior uncertainty quantification for learning nonlinear operators. Fourthly, we consider inverse problems of physical systems, identifying unknown forms and parameters in dynamical systems from observed noisy data. On the application side, we first propose an importance sampling approach for sequential decision making. Second, we propose a physics-informed neural network method to quantify the epistemic uncertainty in cardiac activation mapping modeling and conduct active learning. Third, we present an auto-encoder-based framework for data augmentation and generation for data that are expensive to obtain, such as single-cell RNA sequencing data.
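
    The dissertation's specific estimators are not detailed in the abstract, so the snippet below only illustrates plain importance sampling, the generic idea behind the sequential-decision-making application mentioned above, on a toy tail-probability problem. The target, proposal, and threshold are arbitrary choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Estimate P(X > 3) for X ~ N(0, 1) by sampling from a shifted proposal N(3, 1).
x = rng.normal(3.0, 1.0, size=100_000)                 # draws from the proposal q
log_w = -0.5 * x**2 - (-0.5 * (x - 3.0) ** 2)          # log p(x) - log q(x), constants cancel
estimate = np.mean(np.exp(log_w) * (x > 3.0))          # unbiased importance-sampling estimate
print(estimate)                                        # exact value is about 1.35e-3
```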

    Developing Efficient Strategies for Automatic Calibration of Computationally Intensive Environmental Models

    Get PDF
    Environmental simulation models have been playing a key role in civil and environmental engineering decision-making processes for decades. The utility of an environmental model depends on how well the model is structured and calibrated. Model calibration is typically automated: the simulation model is linked to a search mechanism (e.g., an optimization algorithm) that iteratively generates many parameter sets (often thousands) and evaluates them by running the model, in an attempt to minimize differences between observed data and the corresponding model outputs. The challenge arises when the environmental model is computationally intensive to run (with run-times of minutes to hours, for example), as then any automatic calibration attempt would impose a large computational burden. Such a challenge may force model users to accept sub-optimal solutions and forgo the best model performance. The objective of this thesis is to develop innovative strategies to circumvent the computational burden associated with automatic calibration of computationally intensive environmental models. The first main contribution of this thesis is developing a strategy called “deterministic model preemption” which opportunistically evades unnecessary model evaluations in the course of a calibration experiment and can save a significant portion of the computational budget (even as much as 90% in some cases). Model preemption monitors the intermediate simulation results while the model is running and terminates (i.e., pre-empts) the simulation early if it recognizes that running the model further would not guide the search mechanism. This strategy is applicable to a range of automatic calibration algorithms (i.e., search mechanisms) and is deterministic in that it leads to exactly the same calibration results as when preemption is not applied. Another main contribution of this thesis is developing and utilizing the concept of “surrogate data”, a reasonably small but representative portion of the full set of calibration data. This concept is inspired by existing surrogate modelling strategies, in which a surrogate model (also called a metamodel) is developed and utilized as a fast-to-run substitute for an original computationally intensive model. A framework is developed to efficiently calibrate hydrologic models to the full set of calibration data while running the original model only on surrogate data for the majority of candidate parameter sets, a strategy which leads to considerable computational savings. To this end, mapping relationships are developed to approximate the model performance on the full data based on the model performance on surrogate data. This framework is applicable to the calibration of any environmental model where appropriate surrogate data and mapping relationships can be identified. As another main contribution, this thesis critically reviews and evaluates the large body of literature on surrogate modelling strategies from various disciplines, as these are the most commonly used methods to relieve the computational burden associated with computationally intensive simulation models. To reliably evaluate these strategies, a comparative assessment and benchmarking framework is developed which presents a clear, computational-budget-dependent definition of the success or failure of surrogate modelling strategies.
    Two large families of surrogate modelling strategies are critically scrutinized and evaluated: “response surface surrogate” modelling, which involves statistical or data-driven function approximation techniques (e.g., kriging, radial basis functions, and neural networks), and “lower-fidelity physically-based surrogate” modelling strategies, which develop and utilize simplified models of the original system (e.g., a groundwater model with a coarse mesh). This thesis raises fundamental concerns about response surface surrogate modelling and demonstrates that, although they might be less efficient, lower-fidelity physically-based surrogates are generally more reliable as they preserve, to some extent, the physics involved in the original model. Five different surface water and groundwater models are used across this thesis to test the performance of the developed strategies and to illustrate the discussion. However, the strategies developed are typically simulation-model-independent and can be applied to the calibration of any computationally intensive simulation model that has the required characteristics. This thesis leaves the reader with a suite of strategies for efficient calibration of computationally intensive environmental models while providing some guidance on how to select, implement, and evaluate the appropriate strategy for a given environmental model calibration problem.
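
    To make the "deterministic model preemption" idea concrete, here is a minimal sketch in which a sum-of-squared-errors objective is accumulated step by step and the simulation is abandoned as soon as the running total exceeds the best objective value found so far. The step-wise simulator and the greedy use of a best-so-far threshold are assumptions made for illustration; the thesis covers a broader range of calibration algorithms.

```python
import numpy as np

def sse_with_preemption(simulate_step, params, observations, best_sse):
    # Accumulate the SSE objective as the model marches through time and
    # pre-empt the run once the partial SSE can no longer beat best_sse.
    sse, state = 0.0, None
    for t, obs in enumerate(observations):
        state, output = simulate_step(state, params, t)   # advance one model step
        sse += (output - obs) ** 2                        # SSE only grows over time
        if sse > best_sse:
            return np.inf                                 # rejected without finishing the run
    return sse

# Toy usage: a hypothetical one-parameter exponential-decay "model".
def toy_step(state, params, t):
    return state, np.exp(-params[0] * t)

observations = np.exp(-0.3 * np.arange(50))
best = np.inf
for k in (0.9, 0.5, 0.31, 0.3):                           # candidate parameter sets from a search
    best = min(best, sse_with_preemption(toy_step, (k,), observations, best))
```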

    Bayesian Multi-Model Frameworks - Properly Addressing Conceptual Uncertainty in Applied Modelling

    Get PDF
    We use models to understand or predict a system. Often, there are multiple plausible but competing model concepts. Hence, modelling is associated with conceptual uncertainty, i.e., the question of how to properly handle such model alternatives. For mathematical models, it is possible to quantify their plausibility based on data and rate them accordingly. Bayesian probability calculus offers several formal multi-model frameworks to rate models in a finite set and to quantify their conceptual uncertainty as model weights. These frameworks are Bayesian model selection and averaging (BMS/BMA), Pseudo-BMS/BMA and Bayesian Stacking. The goal of this dissertation is to facilitate proper utilization of these Bayesian multi-model frameworks. They follow different principles in model rating, which is why the derived model weights have to be interpreted differently, too. These principles always concern the model setting, i.e., how the models in the set relate to one another and to the true model of the system that generated the observed data. This relation is formalized in model scores that are used for model weighting within each framework. The scores represent framework-specific compromises between the ability of a model to fit the data and the model complexity required to do so. Hence, the scores are first investigated systematically regarding their respective treatment of model complexity and are placed in a classification scheme developed for this purpose. This shows that BMS/BMA always aims to identify the true model in the set, that Pseudo-BMS/BMA searches for the model with the largest predictive power even though none of the models is the true one, and that, on that condition, Bayesian Stacking seeks reliability in prediction by combining the predictive distributions of multiple models. An application example with numerical models illustrates these behaviours and demonstrates which misinterpretations of model weights can arise if a framework is applied to a model setting it does not suit. Regarding applied modelling, first, a new setting is proposed that makes it possible to identify a "quasi-true" model in a set. Second, Bayesian Bootstrapping is employed to take into account that the rating of predictive capability is based on only limited data. To ensure that the Bayesian multi-model frameworks are employed properly and in a goal-oriented manner, a guideline is set up. Given a clearly defined modelling goal and the allocation of the available models to the respective setting, it leads to the suitable multi-model framework. Besides the three investigated frameworks, this guideline contains an additional one that makes it possible to identify a (quasi-)true model if it is composed of a linear combination of the model alternatives in the set. The gained insights enable a broad range of users in science and practice to properly employ Bayesian multi-model frameworks in order to quantify and handle conceptual uncertainty. Thus, maximum reliability in system understanding and prediction with multiple models can be achieved. Furthermore, the insights pave the way for systematic model development and improvement.
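
    A small, generic sketch of the BMS/BMA weighting step that such frameworks start from: converting per-model log marginal likelihoods (evidences) and a model prior into normalized model weights. The numbers are made up, and the Pseudo-BMS/BMA and Bayesian Stacking variants discussed above are not reproduced.

```python
import numpy as np

def bma_weights(log_evidences, prior=None):
    # Posterior model weights from log marginal likelihoods, computed with a
    # log-sum-exp shift so that very negative log evidences do not underflow.
    log_evidences = np.asarray(log_evidences, dtype=float)
    if prior is None:
        prior = np.full(log_evidences.size, 1.0 / log_evidences.size)  # uniform model prior
    log_post = np.log(prior) + log_evidences
    log_post -= log_post.max()
    weights = np.exp(log_post)
    return weights / weights.sum()

# Three competing models with illustrative (made-up) log evidences.
print(bma_weights([-120.4, -118.9, -125.0]))
```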

    Generalized averaged Gaussian quadrature and applications

    Get PDF
    A simple numerical method for constructing the optimal generalized averaged Gaussian quadrature formulas is presented. These formulas exist in many cases in which real positive Gauss-Kronrod formulas do not exist, and can be used as an adequate alternative to estimate the error of a Gaussian rule. We also investigate the conditions under which the optimal averaged Gaussian quadrature formulas and their truncated variants are internal.
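
    The abstract builds on the idea of estimating the error of an n-point Gauss rule by pairing it with an averaged rule. The sketch below implements only the plain (non-generalized) version of that idea for the Legendre weight: Laurie's (n+1)-point anti-Gaussian rule is obtained by doubling the last recurrence coefficient in the Jacobi matrix, and its average with the Gauss rule supplies an error estimate. The optimal generalized averaged formulas and their truncated variants studied in the paper are not reproduced here.

```python
import numpy as np

def rule_from_jacobi(offdiag):
    # Golub-Welsch: nodes/weights from a symmetric tridiagonal Jacobi matrix with
    # zero diagonal (Legendre weight on [-1, 1], zeroth moment mu0 = 2).
    J = np.diag(offdiag, 1) + np.diag(offdiag, -1)
    nodes, vecs = np.linalg.eigh(J)
    weights = 2.0 * vecs[0, :] ** 2
    return nodes, weights

def legendre_betas(m):
    # Three-term recurrence coefficients beta_1..beta_m for the Legendre weight.
    k = np.arange(1, m + 1)
    return k * k / (4.0 * k * k - 1.0)

def averaged_gauss_error_estimate(f, n):
    beta = legendre_betas(n)
    x_g, w_g = rule_from_jacobi(np.sqrt(beta[:n - 1]))    # n-point Gauss rule
    beta_anti = beta.copy()
    beta_anti[-1] *= 2.0                                  # anti-Gaussian modification
    x_a, w_a = rule_from_jacobi(np.sqrt(beta_anti))       # (n+1)-point anti-Gaussian rule
    gauss = np.dot(w_g, f(x_g))
    averaged = 0.5 * (gauss + np.dot(w_a, f(x_a)))        # Laurie's averaged Gaussian rule
    return gauss, averaged - gauss                        # Gauss value and its error estimate

value, err = averaged_gauss_error_estimate(np.cos, 3)
print(value, err)   # the integral of cos over [-1, 1] is 2*sin(1), about 1.6829
```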

    MS FT-2-2 7 Orthogonal polynomials and quadrature: Theory, computation, and applications

    Get PDF
    Quadrature rules find many applications in science and engineering. Their analysis is a classical area of applied mathematics and continues to attract considerable attention. This seminar brings together speakers with expertise in a large variety of quadrature rules. The aim of the seminar is to provide an overview of recent developments in the analysis of quadrature rules. The computation of error estimates and novel applications are also described.