1,016 research outputs found
PoisFFT - A Free Parallel Fast Poisson Solver
A fast Poisson solver software package, PoisFFT, is presented. It is available
as free software under the GNU GPL version 3. The package
uses the fast Fourier transform to directly solve the Poisson equation on a
uniform orthogonal grid. It can solve the pseudo-spectral approximation and the
second-order finite difference approximation of the continuous solution. The
paper reviews the mathematical methods for the fast Poisson solver and
discusses the software implementation and parallelization. The use of PoisFFT
in an incompressible flow solver is also demonstrated.
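
The FFT-based direct solve mentioned in this abstract can be illustrated with a minimal sketch. It is not PoisFFT code and does not use its API: the function names `fft` and `solve_poisson_periodic` are hypothetical, and the sketch assumes a 1D problem u'' = f with periodic boundaries, a power-of-two grid size, and a zero-mean right-hand side. The two branches correspond to the pseudo-spectral and the second-order finite difference approximations; they differ only in the eigenvalue by which each Fourier mode is divided.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT without normalization; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def solve_poisson_periodic(f, h, spectral=False):
    """Solve u'' = f on a periodic uniform grid with spacing h.

    Returns the zero-mean solution. The mean (k = 0) mode of f is ignored,
    so f is assumed to have zero mean.
    """
    n = len(f)
    f_hat = fft([complex(v) for v in f])
    u_hat = [0j] * n
    for k in range(1, n):
        if spectral:
            # Eigenvalue of the continuous second derivative for this mode.
            kk = k if k <= n // 2 else k - n
            wavenumber = 2.0 * math.pi * kk / (n * h)
            eig = -wavenumber ** 2
        else:
            # Eigenvalue of the 3-point second-order finite difference stencil.
            eig = (2.0 * math.cos(2.0 * math.pi * k / n) - 2.0) / h ** 2
        u_hat[k] = f_hat[k] / eig
    u = fft(u_hat, inverse=True)
    return [v.real / n for v in u]


# Example: f = -sin(x) on [0, 2*pi), whose zero-mean solution is u = sin(x).
n = 32
h = 2.0 * math.pi / n
f = [-math.sin(i * h) for i in range(n)]
u_fd = solve_poisson_periodic(f, h)
u_sp = solve_poisson_periodic(f, h, spectral=True)
```

For this smooth right-hand side the spectral branch reproduces sin(x) to rounding error, while the finite difference branch carries the O(h^2) discretization error of the stencil.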
Preparing sparse solvers for exascale computing.
Sparse solvers provide essential functionality for a wide variety of scientific applications. Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi-physics and multi-scale simulations, especially as we target exascale platforms. This paper describes the challenges, strategies and progress of the US Department of Energy Exascale Computing Project towards providing sparse solvers for exascale computing platforms. We address the demands of systems with thousands of high-performance node devices where exposing concurrency, hiding latency and creating alternative algorithms become essential. The efforts described here are works in progress, highlighting current successes and upcoming challenges. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'.
A distributed-memory package for dense Hierarchically Semi-Separable matrix computations using randomization
We present a distributed-memory library for computations with dense
structured matrices. A matrix is considered structured if its off-diagonal
blocks can be approximated by a rank-deficient matrix with low numerical rank.
Here, we use Hierarchically Semi-Separable representations (HSS). Such matrices
appear in many applications, e.g., finite element methods, boundary element
methods, etc. Exploiting this structure allows for fast solution of linear
systems and/or fast computation of matrix-vector products, which are the two
main building blocks of matrix computations. The compression algorithm we
use, which computes the HSS form of an input dense matrix, relies on randomized
sampling with a novel adaptive sampling mechanism. We discuss the
parallelization of this algorithm and also present the parallelization of
structured matrix-vector product, structured factorization and solution
routines. The efficiency of the approach is demonstrated on large problems from
different academic and industrial applications, on up to 8,000 cores.
This work is part of a more global effort, the STRUMPACK (STRUctured Matrices
PACKage) software package for computations with sparse and dense structured
matrices. Hence, although useful in their own right, the routines also
represent a step in the direction of a distributed-memory sparse solver.
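
The idea of compressing a low-rank block by randomized sampling, as referred to in this abstract, can be sketched as follows. This is an illustration only, not STRUMPACK code: it compresses a single dense block rather than building a full HSS hierarchy, it uses a fixed number of random samples instead of the adaptive mechanism the authors describe, and all names (`matmul`, `orthonormalize`, `randomized_low_rank`) are hypothetical. The block is multiplied by a Gaussian random matrix, the result is orthonormalized to obtain a basis Q of the column space, and the block is approximated as Q (Q^T A).

```python
import math
import random


def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def orthonormalize(cols, eps=1e-10):
    """Modified Gram-Schmidt; drops columns that are numerically dependent."""
    basis = []
    for c in cols:
        v = list(c)
        original_norm = math.sqrt(sum(x * x for x in v))
        for q in basis:
            proj = sum(x * y for x, y in zip(q, v))
            v = [x - proj * y for x, y in zip(v, q)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > eps * original_norm:
            basis.append([x / norm for x in v])
    return basis


def randomized_low_rank(a, samples, seed=0):
    """Return (q, b) with A approximately equal to Q B.

    q is a list of orthonormal basis vectors (the columns of Q) and
    b = Q^T A is stored as a list of rows. The sample count is fixed,
    which is a simplification compared with adaptive sampling.
    """
    rng = random.Random(seed)
    n = len(a[0])
    omega = [[rng.gauss(0.0, 1.0) for _ in range(samples)] for _ in range(n)]
    y = matmul(a, omega)                      # sample the column space of A
    q = orthonormalize([list(c) for c in zip(*y)])
    b = matmul(q, a)                          # q holds Q^T row by row
    return q, b


# Example: a 20 x 15 block of exact rank 3, sampled with 6 random vectors.
m, n = 20, 15
a = [[math.sin(i) * math.cos(j) + i * j / 100.0 + 1.0 for j in range(n)]
     for i in range(m)]
q, b = randomized_low_rank(a, samples=6)
approx = [[sum(q[k][i] * b[k][j] for k in range(len(q))) for j in range(n)]
          for i in range(m)]
```

Storing `q` and `b` takes (m + n) times the rank entries instead of m times n, which is the saving that structured formats exploit on every off-diagonal block.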
A scalable H-matrix approach for the solution of boundary integral equations on multi-GPU clusters
In this work, we consider the solution of boundary integral equations by
means of a scalable hierarchical matrix approach on clusters equipped with
graphics hardware, i.e. graphics processing units (GPUs). To this end, we
extend our existing single-GPU hierarchical matrix library hmglib such that it
is able to scale on many GPUs and such that it can be coupled to arbitrary
application codes. Using a model GPU implementation of a boundary element
method (BEM) solver, we are able to achieve more than 67 percent relative
parallel speed-up going from 128 to 1024 GPUs for a model geometry test case
with 1.5 million unknowns and a real-world geometry test case with almost 1.2
million unknowns. On 1024 GPUs of the cluster Titan, it takes less than 6
minutes to solve the 1.5 million unknowns problem, with 5.7 minutes for the
setup phase and 20 seconds for the iterative solver. To the best of the
authors' knowledge, we here discuss the first fully GPU-based
distributed-memory parallel hierarchical matrix Open Source library using the
traditional H-matrix format and adaptive cross approximation with an
application to BEM problems.
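
Adaptive cross approximation (ACA), which this abstract names as the compression technique, can be sketched in a few lines. The following is a plain CPU illustration of the partially pivoted variant and is not taken from hmglib; the function names `dot` and `aca`, the stopping rule and the test kernel 1/|x - y| on two well-separated point clusters are assumptions chosen for the example. The method builds a low-rank approximation from a few rows and columns of the block, without ever forming the full block.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def aca(entry, m, n, tol=1e-8, max_rank=50):
    """Partially pivoted ACA for an m x n block given by entry(i, j).

    Returns lists us, vs such that the block is approximately
    sum_k us[k] * vs[k]^T. Only a few rows and columns are evaluated.
    """
    us, vs = [], []
    used_rows = set()
    frob2 = 0.0  # running estimate of the squared Frobenius norm
    i = 0
    for _ in range(min(max_rank, m, n)):
        used_rows.add(i)
        # Residual of row i with respect to the current approximation.
        row = [entry(i, j) - sum(u[i] * v[j] for u, v in zip(us, vs))
               for j in range(n)]
        j_piv = max(range(n), key=lambda j: abs(row[j]))
        if abs(row[j_piv]) < 1e-14:
            break
        v = [r / row[j_piv] for r in row]
        # Residual of the pivot column.
        col = [entry(r, j_piv) - sum(u[r] * w[j_piv] for u, w in zip(us, vs))
               for r in range(m)]
        cross = sum(2.0 * dot(u, col) * dot(w, v) for u, w in zip(us, vs))
        us.append(col)
        vs.append(v)
        term = math.sqrt(dot(col, col) * dot(v, v))
        frob2 += cross + term ** 2
        if term <= tol * math.sqrt(abs(frob2)):
            break
        candidates = [r for r in range(m) if r not in used_rows]
        if not candidates:
            break
        # Next pivot row: largest residual entry in the pivot column.
        i = max(candidates, key=lambda r: abs(col[r]))
    return us, vs


# Example: kernel 1/|x - y| between two well-separated 1D point clusters.
xs = [i / 39.0 for i in range(40)]
ys = [3.0 + j / 39.0 for j in range(40)]


def kernel(i, j):
    return 1.0 / abs(xs[i] - ys[j])


us, vs = aca(kernel, 40, 40, tol=1e-10)
```

Because the two clusters are well separated, the kernel block is numerically low rank, so `len(us)` stays well below the block dimension. This is the property that an H-matrix format relies on for its admissible blocks.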
Density Functional Theory calculation on many-cores hybrid CPU-GPU architectures
The implementation of a full electronic structure calculation code on a
hybrid parallel architecture with Graphics Processing Units (GPUs) is presented.
The code underlying our implementation is a GNU GPL code based on
Daubechies wavelets. It shows very good performance, systematic convergence
properties and excellent efficiency on parallel computers. Our GPU-based
acceleration fully preserves all these properties. In particular, the code is
able to run on many cores which may or may not have an associated GPU. It is
thus able to run in parallel and massively parallel hybrid environments,
including those with a non-homogeneous CPU/GPU ratio. With double precision
calculations, we may achieve considerable speedup, between a factor of 20 for
some operations and a factor of 6 for the whole DFT code.
Comment: 14 pages, 8 figures
Simulation of Laser Propagation in a Plasma with a Frequency Wave Equation
The aim of this work is to perform numerical simulations of the propagation
of a laser in a plasma. At each time step, one has to solve a Helmholtz
equation in a domain consisting of several hundred million cells. To
solve this huge linear system, one uses an iterative Krylov method
preconditioned by a separable matrix. The corresponding linear system is
solved with a block cyclic reduction method. Some insights into the parallel
implementation are also given. Lastly, numerical results are presented,
including some features concerning the scalability of the numerical method on a
parallel architecture.