A scalable H-matrix approach for the solution of boundary integral equations on multi-GPU clusters
In this work, we consider the solution of boundary integral equations by
means of a scalable hierarchical matrix approach on clusters equipped with
graphics hardware, i.e. graphics processing units (GPUs). To this end, we
extend our existing single-GPU hierarchical matrix library hmglib such that it
is able to scale on many GPUs and such that it can be coupled to arbitrary
application codes. Using a model GPU implementation of a boundary element
method (BEM) solver, we are able to achieve more than 67 percent relative
parallel speed-up going from 128 to 1024 GPUs for a model geometry test case
with 1.5 million unknowns and a real-world geometry test case with almost 1.2
million unknowns. On 1024 GPUs of the cluster Titan, it takes less than 6
minutes to solve the 1.5 million unknowns problem, with 5.7 minutes for the
setup phase and 20 seconds for the iterative solver. To the best of the
authors' knowledge, this is the first fully GPU-based, distributed-memory
parallel, open-source hierarchical matrix library that uses the traditional
H-matrix format and adaptive cross approximation, with an application to BEM
problems.
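Adaptive cross approximation (ACA), named in the abstract above, can be illustrated with a minimal CPU sketch. It is not hmglib's GPU implementation or API. The function name `aca`, the `entry` callback, the simplified stopping rule (compare each new term with the first one) and the `1/|x - y|` test kernel are all assumptions chosen for illustration.

```python
import math


def aca(entry, m, n, tol=1e-8, max_rank=50):
    """Partially pivoted ACA sketch (hypothetical helper, not hmglib's API).

    entry(i, j) returns one matrix entry on demand. The result is two lists
    us, vs such that A[i][j] is approximated by sum_k us[k][i] * vs[k][j].
    """
    us, vs = [], []
    used_rows = set()
    i = 0                      # current pivot row
    first_norm = None
    while len(us) < min(max_rank, m, n):
        used_rows.add(i)
        # Row i of the residual A - sum_k u_k v_k^T
        row = [entry(i, j) - sum(u[i] * v[j] for u, v in zip(us, vs))
               for j in range(n)]
        j_piv = max(range(n), key=lambda j: abs(row[j]))
        if abs(row[j_piv]) > 1e-14:
            v = [x / row[j_piv] for x in row]
            # Column j_piv of the residual
            u = [entry(k, j_piv) - sum(a[k] * b[j_piv] for a, b in zip(us, vs))
                 for k in range(m)]
            us.append(u)
            vs.append(v)
            term_norm = (math.sqrt(sum(x * x for x in u))
                         * math.sqrt(sum(x * x for x in v)))
            if first_norm is None:
                first_norm = term_norm
            # Simplified stopping rule (assumption): new term is negligible
            # compared with the first rank-1 term.
            if term_norm <= tol * first_norm:
                break
        unused = [k for k in range(m) if k not in used_rows]
        if not unused:
            break
        # Next pivot row: largest entry of the latest column among unused rows
        i = max(unused, key=lambda k: abs(us[-1][k])) if us else unused[0]
    return us, vs


# Example: interaction between two well-separated point clusters
xs = [k / 29 for k in range(30)]
ys = [3 + k / 29 for k in range(30)]
us, vs = aca(lambda i, j: 1.0 / abs(xs[i] - ys[j]), 30, 30)
```

Only the rows and columns picked as pivots are ever evaluated, which is why ACA suits BEM matrices whose entries are expensive to compute. In an H-matrix, such a compression would be applied block by block to the admissible (well-separated) blocks.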
An accurate, fast, mathematically robust, universal, non-iterative algorithm for computing multi-component diffusion velocities
Using accurate multi-component diffusion treatment in numerical combustion
studies remains formidable due to the computational cost associated with
solving for diffusion velocities. To obtain the diffusion velocities for
low-density gases, one needs to solve the Stefan-Maxwell equations along with
the zero diffusion flux criterion, which scales as $\mathcal{O}(N^3)$ in the
number of species $N$ when solved exactly. In this article, we propose an
accurate, fast, direct and robust
algorithm to compute multi-component diffusion velocities. To our knowledge,
this is the first provably accurate algorithm (the solution can be obtained up
to an arbitrary degree of precision) scaling at a computational complexity of
$\mathcal{O}(N)$ in finite precision. The key idea involves leveraging the fact
that the matrix of the reciprocal of the binary diffusivities, $V$, is low
rank, with its rank being independent of the number of species involved. The
low-rank representation of the matrix $V$ is computed in a fast manner at a
computational complexity of $\mathcal{O}(N)$, and the Sherman-Morrison-Woodbury
formula is used to solve for the diffusion velocities at a computational
complexity of $\mathcal{O}(N)$. Rigorous proofs and numerical benchmarks
illustrate the low-rank property of the matrix $V$ and the scaling of the
algorithm.
Comment: 16 pages, 7 figures, 1 table, 1 algorithm
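The role of the Sherman-Morrison-Woodbury formula can be sketched on a generic "diagonal plus low-rank" system. The abstract does not give the actual Stefan-Maxwell system matrix, so this structure, and the names `woodbury_solve`, `solve_small`, `d`, `U` and `W`, are assumptions for illustration only. For $(D + U W^T)x = b$ with $D$ diagonal and $U, W$ of size $N \times r$, the formula gives $x = D^{-1}b - D^{-1}U\,(I + W^T D^{-1} U)^{-1} W^T D^{-1} b$, which costs $\mathcal{O}(N r^2 + r^3)$, linear in $N$ when the rank $r$ is fixed.

```python
def solve_small(A, b):
    """Solve a small dense r x r system by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    y = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (M[i][n] - s) / M[i][i]
    return y


def woodbury_solve(d, U, W, b):
    """Solve (diag(d) + U W^T) x = b without forming the N x N matrix.

    d: N diagonal entries; U, W: N rows of length r; b: right-hand side.
    Hypothetical helper illustrating the formula, not the paper's code.
    """
    n, r = len(d), len(U[0])
    z = [b[i] / d[i] for i in range(n)]                         # D^{-1} b
    Y = [[U[i][k] / d[i] for k in range(r)] for i in range(n)]  # D^{-1} U
    # Small r x r "capacitance" matrix C = I + W^T D^{-1} U
    C = [[(1.0 if p == q else 0.0) + sum(W[i][p] * Y[i][q] for i in range(n))
          for q in range(r)] for p in range(r)]
    rhs = [sum(W[i][p] * z[i] for i in range(n)) for p in range(r)]  # W^T D^{-1} b
    t = solve_small(C, rhs)
    return [z[i] - sum(Y[i][k] * t[k] for k in range(r)) for i in range(n)]
```

The only dense solve is on the $r \times r$ matrix `C`; every other step is a single pass over the $N$ rows.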