
    Applying big data paradigms to a large scale scientific workflow: lessons learned and future directions

    The increasing amount of data related to the execution of scientific workflows has raised awareness of their shift towards parallel data-intensive problems. In this paper, we report our experience combining the traditional high-performance computing and grid-based approaches with Big Data analytics paradigms, in the context of scientific ensemble workflows. Our goal was to assess and discuss the suitability of such data-oriented mechanisms for production-ready workflows, especially in terms of scalability. We focused on two key elements in the Big Data ecosystem: the data-centric programming model, and the underlying infrastructure that integrates storage and computation in each node. We experimented with a representative MPI-based iterative workflow from the hydrology domain, EnKF-HGS, which we re-implemented using the Spark data analysis framework. We conducted experiments on a local cluster, a private cloud running OpenNebula, and the Amazon Elastic Compute Cloud (Amazon EC2). We analysed the results to synthesize the lessons learned from this experience and to discuss promising directions for further research. This work was supported by the Spanish Ministry of Economics and Competitiveness grant TIN-2013-41350-P, the IC1305 COST Action “Network for Sustainable Ultrascale Computing Platforms” (NESUS), and the FPU Training Program for Academic and Teaching Staff Grant FPU15/00422 by the Spanish Ministry of Education.
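
    As an illustration of the data-centric programming model discussed above, here is a minimal PySpark sketch of how one iteration of an ensemble workflow such as EnKF-HGS might be expressed; the member_forward_step and enkf_update functions are hypothetical placeholders standing in for the actual simulation and analysis kernels, not the EnKF-HGS code itself.

```python
# Minimal PySpark sketch of a data-centric ensemble iteration (hypothetical;
# member_forward_step and enkf_update stand in for the real EnKF-HGS kernels).
import numpy as np
from pyspark.sql import SparkSession

def member_forward_step(state):
    """Placeholder for one forward simulation of a single ensemble member."""
    return state + np.random.normal(scale=0.01, size=state.shape)

def enkf_update(states, observation):
    """Placeholder analysis step: nudge each member toward the observation."""
    mean = np.mean(states, axis=0)
    return [s + 0.5 * (observation - mean) for s in states]

spark = SparkSession.builder.appName("enkf-sketch").getOrCreate()
sc = spark.sparkContext

n_members, state_dim = 64, 1000
ensemble = sc.parallelize([np.zeros(state_dim) for _ in range(n_members)])

for step in range(10):
    # Forecast: each member advances independently, in parallel, on the workers.
    forecast = ensemble.map(member_forward_step)
    # Analysis: the (small) ensemble is collected on the driver, updated against
    # a placeholder observation vector, and redistributed for the next iteration.
    observation = np.ones(state_dim)
    ensemble = sc.parallelize(enkf_update(forecast.collect(), observation))

spark.stop()
```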

    Predictive Scale-Bridging Simulations through Active Learning

    Throughout computational science, there is a growing need to utilize the continual improvements in raw computational horsepower to achieve greater physical fidelity through scale-bridging rather than through brute-force increases in the number of mesh elements. For instance, quantitative predictions of transport in nanoporous media, critical to hydrocarbon extraction from tight shale formations, are impossible without accounting for molecular-level interactions. Similarly, inertial confinement fusion simulations rely on numerical diffusion to simulate molecular effects such as non-local transport and mixing without truly accounting for molecular interactions. With these two disparate applications in mind, we develop a novel capability that uses an active learning approach to optimize the use of local fine-scale simulations for informing coarse-scale hydrodynamics. Our approach addresses three challenges: forecasting the continuum coarse-scale trajectory in order to speculatively execute new fine-scale molecular dynamics calculations, dynamically updating the coarse scale from the fine-scale calculations, and quantifying uncertainty in neural network models.
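
    As a rough illustration of such an active-learning loop (the function names, model sizes and sampling choices below are assumptions, not the authors' implementation), the sketch uses the spread of a small neural-network ensemble as an uncertainty proxy and runs a placeholder fine-scale simulation only where the surrogate disagrees most.

```python
# Illustrative active-learning loop (not the paper's code): an ensemble of small
# neural networks emulates the fine-scale result; where the ensemble members
# disagree most, a (placeholder) fine-scale simulation is run and its result is
# added to the training set.
import numpy as np
from sklearn.neural_network import MLPRegressor

def fine_scale_simulation(x):
    """Placeholder for an expensive fine-scale (e.g. molecular dynamics) run."""
    return np.sin(3.0 * x) + 0.1 * x**2

rng = np.random.default_rng(0)
X_train = rng.uniform(-2, 2, size=(20, 1))
y_train = np.array([fine_scale_simulation(x[0]) for x in X_train])

ensemble = [MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=s)
            for s in range(5)]

for iteration in range(10):
    for model in ensemble:
        model.fit(X_train, y_train)

    # Candidate coarse-scale states, e.g. sampled along a forecast trajectory.
    candidates = rng.uniform(-2, 2, size=(200, 1))
    predictions = np.stack([m.predict(candidates) for m in ensemble])
    uncertainty = predictions.std(axis=0)      # ensemble spread as a UQ proxy

    # Speculatively run the fine-scale model only where the surrogate is unsure.
    worst = candidates[np.argmax(uncertainty)]
    X_train = np.vstack([X_train, worst[None, :]])
    y_train = np.append(y_train, fine_scale_simulation(worst[0]))
```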

    Fully Integrated Hydrocarbon Reservoir Studies: Myth or Reality?

    Abstract: Problem statement: In the petroleum industry, and especially during reservoir studies, data coming from different disciplines must be combined in order to generate a model that is representative of the reservoir being studied and can be used for defining the most viable development strategy of the field from both an economic and technical standpoint. Each of these disciplines represents an independent piece of a puzzle that is solved by professionals from various scientific fields who have different educational backgrounds. Integration among geophysics, geology, fluid dynamics and geomechanics is truly essential, but requires specific approaches and procedures for generating and calibrating a reservoir model capable of dealing with all of these aspects. Approach: Independent workflows were examined for each of the disciplines involved so as to highlight unavoidable interdependencies between static, dynamic and geomechanical models, even when the goal is to tackle each issue separately. Then, the traditional working method was compared to the integrated approach that supports the generation and calibration of models based on data and interpretation results from all the disciplines involved in the entire project. Results: The construction of a reservoir model should be regarded as a dynamic process, subject to repeated updates as new data becomes available and to frequent modifications when inconsistencies are found between the understanding that different specialists have of the same system. This approach has shown clear advantages in terms of improved model quality and flexibility, reduced working time, and the generation of a single final model that can be adapted or used for any kind of simulation problem. Conclusion: An integrated approach is necessary for reservoir modeling purposes. Modern reservoir studies should be designed accordingly to permit the full integration of static, dynamic and geomechanical data into a single reservoir model. Integration is always beneficial, even though there still remains a misconception that it is not needed at all times. For this reason, it is recommended that an effort be made to set up a model capable of handling all aspects of a reservoir study each time a new field study is undertaken, even when it is not envisioned that all aspects might be of interest in the future.
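
    A purely schematic sketch of the integrated approach (illustrative only; the class and field names are assumptions, not taken from the paper): a single shared reservoir model object that the static, dynamic and geomechanical disciplines all read from and update, so that a revision by one specialist is immediately visible to the others.

```python
# Schematic sketch (illustrative, not from the paper) of a single shared
# reservoir model that every discipline reads from and updates.
from dataclasses import dataclass, field

@dataclass
class ReservoirModel:
    static: dict = field(default_factory=dict)         # geology / geophysics
    dynamic: dict = field(default_factory=dict)        # fluid-flow properties
    geomechanical: dict = field(default_factory=dict)  # stress / compaction

    def update(self, discipline: str, new_data: dict) -> None:
        """Any discipline folds new data or reinterpretations into the one model."""
        getattr(self, discipline).update(new_data)

# Integrated workflow: all specialists calibrate against the same object, so an
# inconsistency found by one discipline immediately triggers a model revision.
model = ReservoirModel()
model.update("static", {"porosity_map": "revised after new well logs"})
model.update("dynamic", {"relative_permeability": "recalibrated to match history"})
model.update("geomechanical", {"compaction_coefficient": "updated from 4D seismic"})
```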

    Model Order Reduction in Porous Media Flow Simulation and Optimization

    Subsurface flow modeling and simulation is ubiquitous in many energy-related processes, including oil and gas production. These models are usually large scale, and simulating them can be very computationally demanding, particularly in workflows that require hundreds, if not thousands, of runs of a model to achieve the optimal production solution. The primary objective of this study is to reduce the complexity of reservoir simulation and to accelerate production optimization via model order reduction (MOR) by proposing two novel strategies: Proper Orthogonal Decomposition with the Discrete Empirical Interpolation Method (POD-DEIM) and the Quadratic Bilinear Formulation (QBLF). While the former is a training-based approach, whereby one runs several reservoir models for different input strategies before reducing the model, the latter is a training-free approach. Model order reduction by POD has been shown to be a viable way to reduce the computational cost of flow simulation. However, in the case of porous media flow models, this type of MOR scheme does not immediately yield a computationally efficient reduced system. The main difficulty arises in evaluating nonlinear terms on a reduced subspace. One way to overcome this difficulty is to apply DEIM to the nonlinear functions (fractional flow, for instance) and to select a small set of grid blocks based on a greedy algorithm. The nonlinear terms are evaluated at these few grid blocks, and projection-based interpolation is used for the rest. Furthermore, to reduce the number of POD-DEIM basis vectors and the error, a new approach is integrated in this study to update the basis online. In the regular POD-DEIM workflow, all the snapshots are used to find one single reduced subspace, whereas in the new technique, namely localized POD-DEIM, the snapshots are clustered into different groups by means of clustering techniques (k-means), and the reduced subspaces are computed for each cluster in the offline (pre-processing) phase. In the online phase, at each time step, the reduced states are used in a classifier to find the most representative basis and to update the reduced subspace. In the second approach, in order to overcome the issue of nonlinearity, the QBLF of the original nonlinear porous media flow system is introduced, yielding a system that is linear in the input and linear in the state, but not in both input and state jointly. First, a new set of variables is used to recast the problem in QBLF form. To highlight the advantages of this approach, the new formulation is compared with a Taylor series expansion of the system. At this initial phase of development, a POD-based model reduction is integrated with the QBLF in this study in order to reduce the computational costs. This new reduced model has the same form as the original high-fidelity model and thus preserves properties such as stability and passivity. This new form also facilitates the investigation of systematic MOR, where no training or snapshots are required. We test these MOR algorithms on the SPE10 benchmark, and the results suggest twofold runtime speedups for a case study with more than 60,000 grid blocks. In the case of the QBLF, the results suggest moderate speedups, but more investigation is needed to arrive at an efficient implementation. Finally, MOR is integrated into the optimization workflow to accelerate it. A gradient-based optimization framework is used due to its efficiency and fast convergence. This workflow is modified to include the reduced-order model and consequently to reduce the computational cost. The waterflooding optimization is applied to an offshore reservoir benchmark model, UNISIM-I-D, which has around 38,000 active grid blocks and 25 wells. The numerical solutions demonstrate that the POD-based model order reduction can reproduce accurate optimization results while providing reasonable speedups.
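
    A minimal NumPy sketch of the two building blocks named above may help fix ideas: the POD basis obtained from state snapshots via an SVD, and the greedy DEIM selection of the few grid blocks at which the nonlinear term (e.g. fractional flow) is evaluated exactly. The variable names and toy data are illustrative, not the study's implementation.

```python
# Minimal NumPy sketch of POD and DEIM (illustrative, not the study's code).
import numpy as np

def pod_basis(snapshots, r):
    """Leading left singular vectors of the snapshot matrix form the POD basis."""
    U, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    return U[:, :r]

def deim_indices(U_nl):
    """Greedy DEIM point selection on a basis of nonlinear-term snapshots."""
    n, m = U_nl.shape
    idx = [int(np.argmax(np.abs(U_nl[:, 0])))]
    for l in range(1, m):
        # Residual of the next basis vector after interpolation at chosen points.
        c = np.linalg.solve(U_nl[np.ix_(idx, range(l))], U_nl[idx, l])
        r = U_nl[:, l] - U_nl[:, :l] @ c
        idx.append(int(np.argmax(np.abs(r))))
    return np.array(idx)

# Toy snapshots standing in for saturation states and fractional-flow values.
rng = np.random.default_rng(1)
states = rng.standard_normal((60000, 40))   # n_grid_blocks x n_snapshots
nonlinear = np.tanh(states)                 # placeholder nonlinear function

Phi = pod_basis(states, r=10)               # reduced basis for the states
U_nl = pod_basis(nonlinear, r=10)           # basis for the nonlinear term
points = deim_indices(U_nl)                 # grid blocks evaluated exactly
print(points)                               # only these cells need the full model
```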

    Scaling full seismic waveform inversions

    The main goal of this research study is to scale full seismic waveform inversions using the adjoint-state method to the data volumes that are nowadays available in seismology. Practical issues hinder the routine application of this theoretically well-understood method. To a large extent, this comes down to outdated or altogether missing tools and the lack of reliable ways to automate the highly iterative procedure. This thesis tackles these issues in three successive stages. It first introduces a modern and properly designed data processing framework sitting at the very core of all the subsequent developments. The ObsPy toolkit is a Python library providing a bridge for seismology into the scientific Python ecosystem and giving seismologists effortless I/O and a powerful signal processing library, amongst other things. The following chapter deals with a framework designed to handle the specific data management and organization issues arising in full seismic waveform inversions, the Large-scale Seismic Inversion Framework. It has been created to orchestrate the various pieces of data accruing in the course of an iterative waveform inversion. Then, the Adaptable Seismic Data Format, a new, self-describing, and scalable data format for seismology, is introduced, along with the rationale for why it is needed for full waveform inversions in particular and seismology in general. Finally, these developments are put into service to construct a novel full seismic waveform inversion model of elastic subsurface structure beneath the North American continent and the Northern Atlantic, extending well into Europe. The spectral element method is used for the forward and adjoint simulations, coupled with windowed time-frequency phase misfit measurements. Later iterations use 72 events, all occurring after the USArray project commenced, resulting in approximately 150,000 three-component recordings that enter the inversion. 20 L-BFGS iterations yield a model that can produce complete seismograms in the period range between 30 and 120 seconds while comparing favorably to observed data.
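
    A minimal ObsPy example of the kind of effortless I/O and signal processing referred to above (the 30-120 s filter band mirrors the inversion's period range; called without a file name, obspy.read loads its bundled example stream):

```python
# Minimal ObsPy example: read waveforms, then detrend, taper and bandpass them.
from obspy import read

st = read()                                # no argument: bundled example stream
st.detrend("linear")                       # remove a linear trend
st.taper(max_percentage=0.05)              # taper the trace ends
st.filter("bandpass", freqmin=1.0 / 120.0, freqmax=1.0 / 30.0)  # 30-120 s band
for tr in st:
    print(tr.id, tr.stats.sampling_rate, tr.stats.npts)
```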