The paper describes a sparse direct solver for the linear systems that arise
from the discretization of an elliptic PDE on a two dimensional domain. The
solver is designed to reduce communication costs and perform well on GPUs; it
uses a two-level framework, which is easier to implement and optimize than
traditional multi-frontal schemes based on hierarchical nested dissection
orderings. The scheme decomposes the domain into thin subdomains, or "slabs".
Within each slab, a local factorization is executed that exploits the geometry
of the local domain. A global factorization is then obtained through the LU
factorization of a block-tridiagonal reduced coefficient matrix. The solver has
complexity O(N5/3) for the factorization step, and O(N7/6) for each
solve once the factorization is completed.
The solver described is compatible with a range of different local
discretizations, and numerical experiments demonstrate its performance for
regular discretizations of rectangular and curved geometries. The technique
becomes particularly efficient when combined with very high-order convergent
multi-domain spectral collocation schemes. With this discretization, a
Helmholtz problem on a domain of size 1000λ×1000λ (for
which N=100 \mbox{M}) is solved in 15 minutes to 6 correct digits on a
high-powered desktop with GPU acceleration