4,186 research outputs found
A GPU-accelerated package for simulation of flow in nanoporous source rocks with many-body dissipative particle dynamics
Mesoscopic simulations of hydrocarbon flow in source shales are challenging,
in part due to the heterogeneous shale pores with sizes ranging from a few
nanometers to a few micrometers. Additionally, the sub-continuum fluid-fluid
and fluid-solid interactions in nano- to micro-scale shale pores, which are
physically and chemically sophisticated, must be captured. To address those
challenges, we present a GPU-accelerated package for simulation of flow in
nano- to micro-pore networks with a many-body dissipative particle dynamics
(mDPD) mesoscale model. Based on a fully distributed parallel paradigm, the
code offloads all intensive workloads on GPUs. Other advancements, such as
smart particle packing and no-slip boundary condition in complex pore
geometries, are also implemented for the construction and the simulation of the
realistic shale pores from 3D nanometer-resolution stack images. Our code is
validated for accuracy and compared against the CPU counterpart for speedup. In
our benchmark tests, the code delivers nearly perfect strong scaling and weak
scaling (with up to 512 million particles) on up to 512 K20X GPUs on Oak Ridge
National Laboratory's (ORNL) Titan supercomputer. Moreover, a single-GPU
benchmark on ORNL's SummitDev and IBM's AC922 suggests that the host-to-device
NVLink can boost performance over PCIe by a remarkable 40\%. Lastly, we
demonstrate, through a flow simulation in realistic shale pores, that the CPU
counterpart requires 840 Power9 cores to rival the performance delivered by our
package with four V100 GPUs on ORNL's Summit architecture. This simulation
package enables quick-turnaround and high-throughput mesoscopic numerical
simulations for investigating complex flow phenomena in nano- to micro-porous
rocks with realistic pore geometries
Reproducibility, accuracy and performance of the Feltor code and library on parallel computer architectures
Feltor is a modular and free scientific software package. It allows
developing platform independent code that runs on a variety of parallel
computer architectures ranging from laptop CPUs to multi-GPU distributed memory
systems. Feltor consists of both a numerical library and a collection of
application codes built on top of the library. Its main target are two- and
three-dimensional drift- and gyro-fluid simulations with discontinuous Galerkin
methods as the main numerical discretization technique. We observe that
numerical simulations of a recently developed gyro-fluid model produce
non-deterministic results in parallel computations. First, we show how we
restore accuracy and bitwise reproducibility algorithmically and
programmatically. In particular, we adopt an implementation of the exactly
rounded dot product based on long accumulators, which avoids accuracy losses
especially in parallel applications. However, reproducibility and accuracy
alone fail to indicate correct simulation behaviour. In fact, in the physical
model slightly different initial conditions lead to vastly different end
states. This behaviour translates to its numerical representation. Pointwise
convergence, even in principle, becomes impossible for long simulation times.
In a second part, we explore important performance tuning considerations. We
identify latency and memory bandwidth as the main performance indicators of our
routines. Based on these, we propose a parallel performance model that predicts
the execution time of algorithms implemented in Feltor and test our model on a
selection of parallel hardware architectures. We are able to predict the
execution time with a relative error of less than 25% for problem sizes between
0.1 and 1000 MB. Finally, we find that the product of latency and bandwidth
gives a minimum array size per compute node to achieve a scaling efficiency
above 50% (both strong and weak)
Accelerating moderately stiff chemical kinetics in reactive-flow simulations using GPUs
The chemical kinetics ODEs arising from operator-split reactive-flow
simulations were solved on GPUs using explicit integration algorithms. Nonstiff
chemical kinetics of a hydrogen oxidation mechanism (9 species and 38
irreversible reactions) were computed using the explicit fifth-order
Runge-Kutta-Cash-Karp method, and the GPU-accelerated version performed faster
than single- and six-core CPU versions by factors of 126 and 25, respectively,
for 524,288 ODEs. Moderately stiff kinetics, represented with mechanisms for
hydrogen/carbon-monoxide (13 species and 54 irreversible reactions) and methane
(53 species and 634 irreversible reactions) oxidation, were computed using the
stabilized explicit second-order Runge-Kutta-Chebyshev (RKC) algorithm. The
GPU-based RKC implementation demonstrated an increase in performance of nearly
59 and 10 times, for problem sizes consisting of 262,144 ODEs and larger, than
the single- and six-core CPU-based RKC algorithms using the
hydrogen/carbon-monoxide mechanism. With the methane mechanism, RKC-GPU
performed more than 65 and 11 times faster, for problem sizes consisting of
131,072 ODEs and larger, than the single- and six-core RKC-CPU versions, and up
to 57 times faster than the six-core CPU-based implicit VODE algorithm on
65,536 ODEs. In the presence of more severe stiffness, such as ethylene
oxidation (111 species and 1566 irreversible reactions), RKC-GPU performed more
than 17 times faster than RKC-CPU on six cores for 32,768 ODEs and larger, and
at best 4.5 times faster than VODE on six CPU cores for 65,536 ODEs. With a
larger time step size, RKC-GPU performed at best 2.5 times slower than six-core
VODE for 8192 ODEs and larger. Therefore, the need for developing new
strategies for integrating stiff chemistry on GPUs was discussed.Comment: 27 pages, LaTeX; corrected typos in Appendix equations A.10 and A.1
Fast Hydraulic Erosion Simulation and Visualization on GPU
International audienceNatural mountains and valleys are gradually eroded by rainfall and river flows. Physically-based modeling of this complex phenomenon is a major concern in producing realistic synthesized terrains. However, despite some recent improvements, existing algorithms are still computationally expensive, leading to a time-consuming process fairly impractical for terrain designers and 3D artists. In this paper, we present a new method to model the hydraulic erosion phenomenon which runs at interactive rates on today's computers. The method is based on the velocity field of the running water, which is created with an efficient shallow-water fluid model. The velocity field is used to calculate the erosion and deposition process, and the sediment transportation process. The method has been carefully designed to be implemented totally on GPU, and thus takes full advantage of the parallelism of current graphics hardware. Results from experiments demonstrate that our method is effective and efficient. It can create realistic erosion effects by rainfall and river flows, and produce fast simulation results for terrains with large sizes
- …