22 research outputs found
Domain Decomposition preconditioning for high-frequency Helmholtz problems with absorption
In this paper we give new results on domain decomposition preconditioners for
GMRES when computing piecewise-linear finite-element approximations of the
Helmholtz equation with an absorption parameter. Multigrid approximations of
this equation with nonzero absorption are commonly used as preconditioners
for the pure Helmholtz case (zero absorption). However a rigorous theory for
such (so-called "shifted Laplace") preconditioners, either for the pure
Helmholtz equation, or even the absorptive equation, is
still missing. We present a new theory for the absorptive equation that
provides rates of convergence for (left- or right-) preconditioned GMRES, via
estimates of the norm and field of values of the preconditioned matrix. This
theory uses a wavenumber- and absorption-explicit coercivity result for the
underlying sesquilinear form and shows, for example, that if , then classical overlapping additive Schwarz will perform optimally for
the absorptive problem, provided the subdomain and coarse mesh diameters are
carefully chosen. Extensive numerical experiments are given that support the
theoretical results. The theory for the absorptive case gives insight into how
its domain decomposition approximations perform as preconditioners for the pure
Helmholtz case. At the end of the paper we propose a
(scalable) multilevel preconditioner for the pure Helmholtz problem that has an
empirical computation time complexity of about for
solving finite element systems of size , where we have
chosen the mesh diameter to avoid the pollution effect.
Experiments on problems with a fixed number of grid points
per wavelength are also given.
Product quasi-interpolation in logarithmically singular integral equations
A discrete high-order method is constructed and justified for a class of Fredholm integral equations of the second kind with kernels that may have boundary and logarithmic diagonal singularities. The method is based on improving the boundary behaviour of the kernel with the help of a change of variables, and on product integration using quasi-interpolation by smooth splines of order m. Properties of the different proposed calculation schemes are compared through numerical experiments using, in particular, variable-precision interval arithmetic.
Petascale solvers for anisotropic PDEs in atmospheric modelling on GPU clusters
Memory-bound applications such as solvers for large sparse systems of equations remain a challenge for GPUs. Fast solvers should be based on numerically efficient algorithms and implemented such that global memory access is minimised. To solve systems with trillions ($\mathcal{O}(10^{12})$) of unknowns, the code has to make efficient use of several million individual processor cores on large GPU clusters. We describe the multi-GPU implementation of two algorithmically optimal iterative solvers for anisotropic PDEs which are encountered in (semi-)implicit time stepping procedures in atmospheric modelling. In this application the condition number is large but independent of the grid resolution and both methods are asymptotically optimal, albeit with different absolute performance. In particular, an important constant in the discretisation is the CFL number; only the multigrid solver is robust to changes in this constant. We parallelise the solvers and adapt them to the specific features of GPU architectures, paying particular attention to efficient global memory access. We achieve a performance of up to 0.78 PFLOPs when solving an equation with unknowns on 16384 GPUs; this corresponds to about of the theoretical peak performance of the machine and we use more than of the peak memory bandwidth with a Conjugate Gradient (CG) solver. Although the other solver, a geometric multigrid algorithm, has a slightly worse performance in terms of FLOPs per second, overall it is faster as it needs fewer iterations to converge; the multigrid algorithm can solve a linear PDE with half a trillion unknowns in about one second.
Parallel Privacy-Preserving Shortest Path Algorithms
In this paper, we propose and present secure multiparty computation (SMC) protocols for single-source shortest distance (SSSD) and all-pairs shortest distance (APSD) in sparse and dense graphs. Our protocols follow the structure of classical algorithms—Bellman–Ford and Dijkstra for SSSD; Johnson, Floyd–Warshall, and transitive closure for APSD. As the computational platforms offered by SMC protocol sets have performance profiles that differ from typical processors, we had to perform extensive changes to the structure (including their control flow and memory accesses) and the details of these algorithms in order to obtain good performance. We implemented our protocols on top of the secret sharing based protocol set offered by the Sharemind SMC platform, using single-instruction-multiple-data (SIMD) operations as much as possible to reduce the round complexity. We benchmarked our protocols under several different parameters for network performance and compared our performance figures against each other and with ones reported previously
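The remark about restructuring control flow can be illustrated with a plaintext sketch of Bellman–Ford written in a data-oblivious style. This is not the authors' protocol and involves no secret sharing. It only shows the shape such an algorithm tends to take: a fixed number of rounds, no early exit, and every edge relaxed in one batch per round, which is the kind of step that maps onto SIMD operations. The function name and the edge-list format are assumptions made for illustration, and edge endpoints are treated as public here for simplicity.

```python
INF = float("inf")


def oblivious_bellman_ford(num_vertices, edges, source):
    """Single-source shortest distances with a data-independent schedule.

    edges is a list of (u, v, w) triples (hypothetical format).
    In an SMC setting the weights w would be secret-shared; here they
    are plain numbers.
    """
    dist = [INF] * num_vertices
    dist[source] = 0
    # Always run exactly num_vertices - 1 rounds. There is no early exit,
    # so the number of rounds reveals nothing about the weights.
    for _ in range(num_vertices - 1):
        # Batch step: compute every candidate distance from the same
        # snapshot of dist (one vectorised addition in a SIMD setting).
        candidates = [dist[u] + w for (u, _v, w) in edges]
        new_dist = list(dist)
        # Combine candidates per target vertex with min, which has no
        # data-dependent branching.
        for (_u, v, _w), cand in zip(edges, candidates):
            new_dist[v] = min(new_dist[v], cand)
        dist = new_dist
    return dist


if __name__ == "__main__":
    example_edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)]
    print(oblivious_bellman_ford(4, example_edges, 0))  # [0, 3, 1, 4]
```

Because each round reads only the previous round's distances, the per-round work is a fixed pattern of additions and minimums, at the cost of doing all rounds even when the distances have already converged.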
Parallel implementation of a Schwarz Domain Decomposition Algorithm
We describe and compare some recent domain decomposition algorithms of Schwarz type with respect to parallel performance. A new, robust domain decomposition algorithm, Additive Average Schwarz, is compared with a classical overlapping Schwarz code. Complexity estimates are given in both two and three dimensions and actual implementations are compared on a Paragon machine as well as on a cluster of modern workstations.
1 Introduction
This paper is concerned with parallel algorithms for the numerical solution of equations of the form: Find $u \in V(\Omega)$ such that
$$\sum_{i=1}^{N} \int_{\Omega_i} \rho_i \, \nabla u \cdot \nabla v \, dx = \int_{\Omega} f v \, dx \quad \forall v \in V(\Omega), \qquad (1)$$
in a space $V(\Omega) \subset H^1(\Omega)$. Here $\rho_i$ are positive constants and $\bar{\Omega} = \bigcup_{i=1}^{N} \bar{\Omega}_i$. This problem is important in many fields of application, an example being the study of flow in porous media [2]. Domain decomposition algorithms have received much attention in the last ten years due to their potent..
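As a rough illustration of the classical overlapping Schwarz idea mentioned in this abstract, the sketch below applies a one-level additive Schwarz preconditioner inside conjugate gradients for a 1D model problem with a constant coefficient. It is a serial toy under stated assumptions, not the paper's Additive Average Schwarz method or its parallel code: the matrix is the standard (-1, 2, -1) finite-difference Laplacian, there is no coarse space, and all function names are hypothetical.

```python
def apply_A(x):
    """Matrix-vector product with the 1D (-1, 2, -1) Laplacian, zero Dirichlet ends."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def solve_local(rhs):
    """Thomas algorithm for the local (-1, 2, -1) block of a subdomain."""
    m = len(rhs)
    c = [0.0] * m  # modified super-diagonal
    d = [0.0] * m  # modified right-hand side
    c[0] = -0.5
    d[0] = rhs[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    x = [0.0] * m
    x[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def make_subdomains(n, num_sub, overlap):
    """Split indices 0..n-1 into contiguous blocks extended by `overlap` points."""
    size = n // num_sub
    subs = []
    for s in range(num_sub):
        lo = max(0, s * size - overlap)
        hi = n if s == num_sub - 1 else min(n, (s + 1) * size + overlap)
        subs.append((lo, hi))
    return subs


def additive_schwarz(r, subs):
    """z = sum_i R_i^T A_i^{-1} R_i r: independent local solves, results added."""
    z = [0.0] * len(r)
    for lo, hi in subs:
        local = solve_local(r[lo:hi])
        for j, val in enumerate(local):
            z[lo + j] += val
    return z


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def pcg(b, precond, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradients for A x = b (b must be nonzero)."""
    x = [0.0] * len(b)
    r = list(b)
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        Ap = apply_A(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * ai for ri, ai in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


if __name__ == "__main__":
    n = 64
    subs = make_subdomains(n, num_sub=4, overlap=2)
    x, iters = pcg([1.0] * n, lambda r: additive_schwarz(r, subs))
    print("iterations:", iters)
```

The local solves in `additive_schwarz` are independent of one another, which is what makes this family of methods attractive for parallel machines. Without a coarse space, the iteration count of a one-level method generally grows with the number of subdomains.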