Alternating direction implicit time integrations for finite difference acoustic wave propagation: Parallelization and convergence
This work studies the parallelization and empirical convergence of two finite
difference acoustic wave propagation methods on 2-D rectangular grids that use
the same alternating direction implicit (ADI) time integration. This ADI
integration is based on a second-order implicit Crank-Nicolson temporal
discretization that is factored by a Peaceman-Rachford decomposition of the
time and space terms of the equation. In space, the two methods differ
substantially and apply different fourth-order accurate differentiation
techniques. The first method uses compact finite differences (CFD) on nodal
meshes, which requires solving tridiagonal linear systems along each grid line,
while the second employs staggered-grid mimetic finite differences (MFD). For
each method, we implement
three parallel versions: (i) a multithreaded code in Octave, (ii) a C++ code
that exploits OpenMP loop parallelization, and (iii) a CUDA kernel for an NVIDIA
GTX 960 Maxwell card. In these implementations, the main source of parallelism
is the simultaneous ADI update of all columns or all rows of each wave-field
matrix, depending on the differentiation direction. In our numerical
applications, the best performance is achieved by the CFD and MFD CUDA
codes, with speedups of 7.21x and 15.81x, respectively, relative to
their sequential C++ counterparts compiled with optimal flags. Our test
cases also allow us to assess the numerical convergence and accuracy of both
methods. In a problem with an exact harmonic solution, both methods exhibit
convergence rates close to 4, and the MFD method is more accurate in practice.
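Empirical convergence rates such as these are commonly estimated by comparing errors on successively refined grids: if the error behaves like e(h) ≈ C·h^p, then p ≈ log(e1/e2) / log(h1/h2). The abstract does not state the authors' exact procedure, so this formula is an assumption. The Python sketch below applies it to a hypothetical test case, an explicit fourth-order central difference for the derivative of sin, which is neither the paper's CFD nor its MFD scheme; `observed_orders` and `d1_central4` are illustrative names.

```python
import math


def observed_orders(hs, errors):
    """Observed order p between consecutive grids, assuming error ~ C * h**p."""
    return [math.log(errors[k] / errors[k + 1]) / math.log(hs[k] / hs[k + 1])
            for k in range(len(hs) - 1)]


def d1_central4(f, x, h):
    """Explicit fourth-order central difference for f'(x) (illustration only)."""
    return (-f(x + 2 * h) + 8 * f(x + h) - 8 * f(x - h) + f(x - 2 * h)) / (12 * h)


if __name__ == "__main__":
    x0 = 0.7
    hs = [0.1 / 2**k for k in range(4)]  # each grid halves the spacing
    errors = [abs(d1_central4(math.sin, x0, h) - math.cos(x0)) for h in hs]
    print(observed_orders(hs, errors))  # entries should be close to 4
```

A rate that drifts from 4 toward 2 as h shrinks is the kind of degradation the abstract describes next, where a lower-order error term starts to dominate.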
In contrast, both convergence rates decay to second order on smooth problems
with steep gradients at the boundaries, and the MFD rates degrade on highly
resolved grids, leading to larger errors. This transition in empirical
convergence agrees with the nominal truncation errors in space and time.
Comment: 20 pages, 5 figures
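To illustrate why an ADI step parallelizes over grid lines, here is a minimal Python sketch of one Peaceman-Rachford step. It deliberately swaps in a simpler model: the 2-D diffusion equation u_t = u_xx + u_yy instead of the acoustic wave equation, second-order central differences instead of the fourth-order CFD or MFD operators, zero Dirichlet boundaries, and a square grid. The names `thomas`, `second_diff` and `pr_adi_step` are hypothetical. With A_x and A_y the 1-D difference operators, the step solves (I - dt/2·A_x) u* = (I + dt/2·A_y) u^n and then (I - dt/2·A_y) u^(n+1) = (I + dt/2·A_x) u*, so each half-step is a set of independent tridiagonal solves, one per x-line and then one per y-line.

```python
import math


def thomas(a, b, c, d):
    """Thomas algorithm for a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].

    a[0] and c[-1] are ignored; no pivoting (fine for diagonally dominant systems).
    """
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):  # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def second_diff(line, k):
    """Undivided second difference at index k, with zero values outside the line."""
    left = line[k - 1] if k > 0 else 0.0
    right = line[k + 1] if k < len(line) - 1 else 0.0
    return left - 2.0 * line[k] + right


def pr_adi_step(u, dt, h):
    """One Peaceman-Rachford ADI step for u_t = u_xx + u_yy (hypothetical model).

    u[i][j] holds the n x n interior nodes (i: x index, j: y index);
    boundary values are zero. D2 below is the undivided second difference.
    """
    n = len(u)
    r = dt / (2.0 * h * h)
    a, b, c = [-r] * n, [1.0 + 2.0 * r] * n, [-r] * n  # I - r*D2 along one line
    # Half-step 1: (I - r*D2_x) u* = (I + r*D2_y) u. Each x-line (fixed j) is an
    # independent solve, e.g. one OpenMP iteration or GPU thread per line.
    ustar = [[0.0] * n for _ in range(n)]
    for j in range(n):
        rhs = [u[i][j] + r * second_diff(u[i], j) for i in range(n)]
        line = thomas(a, b, c, rhs)
        for i in range(n):
            ustar[i][j] = line[i]
    # Half-step 2: (I - r*D2_y) u_new = (I + r*D2_x) u*, one independent solve per y-line.
    cols = [[ustar[i][j] for i in range(n)] for j in range(n)]
    return [thomas(a, b, c, [ustar[i][j] + r * second_diff(cols[j], i) for j in range(n)])
            for i in range(n)]


if __name__ == "__main__":
    n = 15
    h = 1.0 / (n + 1)
    # Lowest sine mode: with zero boundaries it is an eigenvector of both 1-D operators.
    u0 = [[math.sin(math.pi * (i + 1) * h) * math.sin(math.pi * (j + 1) * h)
           for j in range(n)] for i in range(n)]
    u1 = pr_adi_step(u0, dt=0.01, h=h)
    print(u1[7][7] / u0[7][7])  # compare with exp(-2 * pi**2 * 0.01)
```

For this mode the printed ratio should equal ((1 + μ)/(1 - μ))^2 with μ = -2(dt/h^2)·sin^2(πh/2), which is close to the exact decay factor exp(-2π^2·dt). The loops over j and i are the column-wise and row-wise sweeps that a parallel code can distribute.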
A novel approach to evaluating compact finite differences and similar tridiagonal schemes on GPU-accelerated clusters
Compact finite difference schemes are widely used in the direct numerical simulation of fluid flows for their ability to better resolve the small scales of turbulence. However, they can be expensive to evaluate and difficult to parallelize. In this work, we present an approach for the computation of compact finite differences and similar tridiagonal schemes on graphics processing units (GPUs). We present a variant of the cyclic reduction algorithm for solving the tridiagonal linear systems that arise in such numerical schemes. We study the impact of the matrix structure on the cyclic reduction algorithm and show that precomputing forward reduction coefficients can be especially effective for obtaining good performance. Our tridiagonal solver outperforms the NVIDIA CUSPARSE and the multithreaded Intel MKL tridiagonal solvers on GPU and CPU, respectively. In addition, we present a parallelization strategy for GPU-accelerated clusters and show the scalability of a 3-D compact finite difference application on up to 64 GPUs on Clemson’s Palmetto cluster.
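For readers unfamiliar with cyclic reduction, here is a minimal recursive sketch of the textbook algorithm. It is written sequentially in Python for readability and is not the authors' GPU variant. Each level lets every odd-indexed equation (0-based) absorb its even-indexed neighbours, which halves the system. Back substitution then recovers the even-indexed unknowns independently of one another, and this per-equation independence is what makes the method attractive on GPUs. The function name and the convention that a[0] and c[-1] are ignored are assumptions. Note that alpha, gamma and the reduced a, b, c depend only on the matrix, not on the right-hand side d. For a fixed matrix, such as a compact-scheme operator applied to many grid lines, they could therefore be computed once and reused, which is consistent with the abstract's remark on precomputing forward reduction coefficients; the paper's exact scheme may differ.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for i = 0..n-1.

    a[0] and c[n-1] are ignored. Plain recursive cyclic reduction without
    pivoting, so the matrix should be, for example, diagonally dominant.
    """
    n = len(d)
    if n == 0:
        return []
    if n == 1:
        return [d[0] / b[0]]
    # Forward reduction: each odd-indexed equation eliminates x[i-1] and x[i+1].
    # The iterations are mutually independent (one GPU thread each, in principle).
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = -a[i] / b[i - 1]
        has_right = i + 1 < n
        gamma = -c[i] / b[i + 1] if has_right else 0.0
        ra.append(alpha * a[i - 1])
        rb.append(b[i] + alpha * c[i - 1] + (gamma * a[i + 1] if has_right else 0.0))
        rc.append(gamma * c[i + 1] if has_right else 0.0)
        rd.append(d[i] + alpha * d[i - 1] + (gamma * d[i + 1] if has_right else 0.0))
    # Solve the half-size system for the odd-indexed unknowns.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)
    # Back substitution: even-indexed unknowns, again mutually independent.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x


if __name__ == "__main__":
    # tridiag(-1, 2, -1) x = 1; the exact solution is x_i = i*(n+1-i)/2 (1-based i).
    n = 7
    print(cyclic_reduction([-1.0] * n, [2.0] * n, [-1.0] * n, [1.0] * n))
```

The recursion depth is about log2(n), and each level does O(n) independent work, in contrast to the strictly sequential recurrences of the Thomas algorithm.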