172 research outputs found
An Improved Multi-Stage Preconditioner on GPUs for Compositional Reservoir Simulation
The compositional model is often used to describe multicomponent multiphase
porous media flows in the petroleum industry. The fully implicit method with
strong stability and weak constraints on time-step sizes is commonly used in
the mainstream commercial reservoir simulators. In this paper, we develop an
efficient multi-stage preconditioner for the fully implicit compositional flow
simulation. The method employs an adaptive setup phase to improve the parallel
efficiency on GPUs. Furthermore, a multi-color Gauss-Seidel algorithm based on
the adjacency matrix is applied in the algebraic multigrid methods for the
pressure part. Numerical results demonstrate that the proposed algorithm
achieves good parallel speedup while yielding the same convergence behavior as
the corresponding sequential version.
Comment: 24 pages, 4 figures, and 8 tables. arXiv admin note: text overlap
with arXiv:2201.0197
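The multi-color Gauss-Seidel idea mentioned in the abstract above can be sketched as follows. This is a minimal CPU-side illustration under stated assumptions, not the paper's GPU implementation: the function names `greedy_coloring` and `multicolor_gauss_seidel` are hypothetical, the coloring is a plain greedy heuristic on the adjacency graph of the matrix, and the matrix is stored densely for brevity. Unknowns that share a color are not coupled to each other, so all updates within one color could in principle be computed in parallel.

```python
def greedy_coloring(adjacency):
    """Give each node the smallest color not already used by a neighbor."""
    colors = [-1] * len(adjacency)
    for node, neighbors in enumerate(adjacency):
        used = {colors[nb] for nb in neighbors if colors[nb] >= 0}
        color = 0
        while color in used:
            color += 1
        colors[node] = color
    return colors


def multicolor_gauss_seidel(A, b, x, sweeps=1):
    """Run Gauss-Seidel sweeps color by color on a dense matrix A (assumption)."""
    n = len(A)
    # Adjacency graph from the symmetrized sparsity pattern of A.
    adjacency = [
        [j for j in range(n) if j != i and (A[i][j] != 0 or A[j][i] != 0)]
        for i in range(n)
    ]
    colors = greedy_coloring(adjacency)
    groups = {}
    for i, c in enumerate(colors):
        groups.setdefault(c, []).append(i)

    for _ in range(sweeps):
        for c in sorted(groups):
            # Rows of one color are mutually independent, so every update is
            # computed from the same x before any is written back. On a GPU
            # this inner loop would map to one parallel kernel launch.
            updates = [
                (i, (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i])
                for i in groups[c]
            ]
            for i, value in updates:
                x[i] = value
    return x, colors


# Usage: a small 1D Laplacian whose exact solution is all ones.
n = 6
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0, 0.0, 0.0, 0.0, 0.0, 1.0]
x, colors = multicolor_gauss_seidel(A, b, [0.0] * n, sweeps=200)
```

For this chain-shaped graph the greedy heuristic produces the familiar red-black ordering with two colors; less regular adjacency graphs generally need more colors, which reduces the parallel work available per color.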
An Evaluation and Comparison of GPU Hardware and Solver Libraries for Accelerating the OPM Flow Reservoir Simulator
Realistic reservoir simulation is known to be prohibitively expensive in
terms of computation time when the accuracy of the simulation is increased or
the model grid is enlarged. One method to address this issue is to
parallelize the computation by dividing the model into several partitions and
using multiple CPUs to compute the result using techniques such as MPI and
multi-threading. Alternatively, GPUs are also a good candidate to accelerate
the computation due to their massively parallel architecture that allows many
floating-point operations per second to be performed. The numerical iterative
solver takes the most computation time and is challenging to run
efficiently due to the dependencies that exist in the model between cells. In
this work, we evaluate the OPM Flow simulator and compare several
state-of-the-art GPU solver libraries as well as custom developed solutions for
a BiCGStab solver using an ILU0 preconditioner and benchmark their performance
against the default DUNE library implementation running on multiple CPU
processors using MPI. The evaluated GPU software libraries include a manual
linear solver in OpenCL and the integration of several third-party sparse
linear algebra libraries, such as cuSparse, rocSparse, and amgcl. To perform
our benchmarking, we use small, medium, and large use cases, starting with the
public test case NORNE that includes approximately 50k active cells and ending
with a large model that includes approximately 1 million active cells. We find
that a GPU can accelerate a single dual-threaded MPI process by up to 5.6 times,
and that its performance is comparable to that of around 8 dual-threaded MPI processes.
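The solver configuration named in the abstract above, BiCGStab with an ILU0 preconditioner, can be illustrated with a small sketch. This is a plain-Python, dense-storage illustration under stated assumptions: it does not reproduce the OPM Flow, DUNE, cuSparse, rocSparse, or amgcl code paths, and all function names (`ilu0`, `apply_ilu0`, `bicgstab_ilu0`, `grid_matrix`) as well as the test matrix are hypothetical. The forward and backward triangular solves in `apply_ilu0` are the sequential dependency that makes this preconditioner hard to parallelize.

```python
import math


def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def ilu0(A):
    """Incomplete LU with zero fill-in: only entries nonzero in A are updated."""
    n = len(A)
    LU = [row[:] for row in A]
    for i in range(1, n):
        for k in range(i):
            if A[i][k] != 0:
                LU[i][k] /= LU[k][k]
                for j in range(k + 1, n):
                    if A[i][j] != 0:
                        LU[i][j] -= LU[i][k] * LU[k][j]
    return LU


def apply_ilu0(LU, r):
    """Solve (L U) z = r with unit-lower L and upper U stored together in LU."""
    n = len(LU)
    z = r[:]
    for i in range(n):                      # forward substitution
        z[i] -= sum(LU[i][j] * z[j] for j in range(i))
    for i in reversed(range(n)):            # backward substitution
        z[i] = (z[i] - sum(LU[i][j] * z[j] for j in range(i + 1, n))) / LU[i][i]
    return z


def bicgstab_ilu0(A, b, tol=1e-10, max_iter=100):
    """Right-preconditioned BiCGStab starting from x = 0."""
    n = len(b)
    LU = ilu0(A)
    x = [0.0] * n
    r = b[:]
    r_hat = r[:]
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    for it in range(1, max_iter + 1):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        y = apply_ilu0(LU, p)
        v = matvec(A, y)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) < tol:
            return [xi + alpha * yi for xi, yi in zip(x, y)], it
        z = apply_ilu0(LU, s)
        t = matvec(A, z)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * yi + omega * zi for xi, yi, zi in zip(x, y, z)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        rho = rho_new
        if math.sqrt(dot(r, r)) < tol:
            return x, it
    return x, max_iter


def grid_matrix(m):
    """Hypothetical nonsymmetric 5-point stencil on an m-by-m grid."""
    n = m * m
    A = [[0.0] * n for _ in range(n)]
    for row in range(m):
        for col in range(m):
            k = row * m + col
            A[k][k] = 4.0
            if col > 0:
                A[k][k - 1] = -1.5
            if col < m - 1:
                A[k][k + 1] = -0.5
            if row > 0:
                A[k][k - m] = -1.0
            if row < m - 1:
                A[k][k + m] = -1.0
    return A


# Usage: the right-hand side is chosen so that the exact solution is all ones.
A = grid_matrix(3)
b = matvec(A, [1.0] * 9)
x, iterations = bicgstab_ilu0(A, b)
```

Production libraries store the matrix in a sparse (often block-sparse) format and reorder or level-schedule the triangular solves; the dense loops here are only meant to make the algorithmic steps visible.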
A GPU-accelerated package for simulation of flow in nanoporous source rocks with many-body dissipative particle dynamics
Mesoscopic simulations of hydrocarbon flow in source shales are challenging,
in part due to the heterogeneous shale pores with sizes ranging from a few
nanometers to a few micrometers. Additionally, the sub-continuum fluid-fluid
and fluid-solid interactions in nano- to micro-scale shale pores, which are
physically and chemically sophisticated, must be captured. To address those
challenges, we present a GPU-accelerated package for simulation of flow in
nano- to micro-pore networks with a many-body dissipative particle dynamics
(mDPD) mesoscale model. Based on a fully distributed parallel paradigm, the
code offloads all intensive workloads to GPUs. Other advancements, such as
smart particle packing and a no-slip boundary condition in complex pore
geometries, are also implemented for the construction and the simulation of
realistic shale pores from 3D nanometer-resolution stack images. Our code is
validated for accuracy and compared against the CPU counterpart for speedup. In
our benchmark tests, the code delivers nearly perfect strong scaling and weak
scaling (with up to 512 million particles) on up to 512 K20X GPUs on Oak Ridge
National Laboratory's (ORNL) Titan supercomputer. Moreover, a single-GPU
benchmark on ORNL's SummitDev and IBM's AC922 suggests that the host-to-device
NVLink can boost performance over PCIe by a remarkable 40%. Lastly, we
demonstrate, through a flow simulation in realistic shale pores, that the CPU
counterpart requires 840 Power9 cores to rival the performance delivered by our
package with four V100 GPUs on ORNL's Summit architecture. This simulation
package enables quick-turnaround and high-throughput mesoscopic numerical
simulations for investigating complex flow phenomena in nano- to micro-porous
rocks with realistic pore geometries.
Parallel Numerical Solution of Two-Phase Flow in Porous Media On Non-Orthogonal Geometries: a Performance Study Using Different GPU Architectures
A parallel numerical model for two-phase flow (water and oil) in porous media on nonorthogonal geometries is solved on different Graphics Processing Unit (GPU) architectures in order to compare the performance that each of them can reach. The mathematical model is based on the transformed mass conservation equations for the water and oil phases, which result in two coupled non-linear partial differential equations (PDEs). The Finite Volume Method (FVM) is used to discretize the set of PDEs that govern this problem, and the Newton-Raphson method is used to linearize and solve them simultaneously. Solving the system of linear equations is computationally expensive and requires a large amount of time as the number of unknowns increases. We take advantage of current GPU computing technology to construct massively parallel numerical algorithms for modeling multi-phase flow in porous media [1, 2]. The Jacobian is constructed directly on the GPU, which reduces the information that needs to be exchanged between the CPU (Central Processing Unit) and the GPU. Libraries that include Krylov methods are used and tested. The numerical results indicate a speedup of up to 12x over a single CPU when GPU parallelism is applied on the different architectures tested in this study (Kepler, Pascal and Turing). Furthermore, this study also tries to identify which of these architectures is the best option according to our computing needs.
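The Newton-Raphson step described in the abstract above (linearize the coupled residuals, solve the linear system for an update, repeat) can be sketched on a toy problem. This is an illustration under stated assumptions only: the two algebraic equations below are hypothetical stand-ins for the discretized water and oil equations, the function name `newton_2x2` is invented for this example, and the 2-by-2 linear system is solved by Cramer's rule rather than by a Krylov library on a GPU.

```python
def newton_2x2(residual, jacobian, u, tol=1e-12, max_iter=20):
    """Newton-Raphson for two coupled equations in two unknowns."""
    for it in range(max_iter):
        f0, f1 = residual(u)
        if max(abs(f0), abs(f1)) < tol:
            return u, it
        (a, b), (c, d) = jacobian(u)
        det = a * d - b * c
        # Solve J * du = -F for the update (Cramer's rule on a 2x2 system).
        du0 = (-f0 * d + b * f1) / det
        du1 = (-a * f1 + c * f0) / det
        u = (u[0] + du0, u[1] + du1)
    return u, max_iter


# Hypothetical coupled residuals with a root at (1, 2); not the paper's equations.
def residual(u):
    return (u[0] ** 2 + u[1] - 3.0, u[0] + u[1] ** 2 - 5.0)


def jacobian(u):
    return ((2.0 * u[0], 1.0), (1.0, 2.0 * u[1]))


solution, iterations = newton_2x2(residual, jacobian, (1.5, 1.5))
```

In a reservoir simulator the Jacobian is a large sparse matrix with one block per grid cell, so the linear solve inside each Newton iteration dominates the run time, which is the part the abstract reports offloading to the GPU.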
- …