22 research outputs found

    Domain Decomposition preconditioning for high-frequency Helmholtz problems with absorption

    In this paper we give new results on domain decomposition preconditioners for GMRES when computing piecewise-linear finite-element approximations of the Helmholtz equation $-\Delta u - (k^2 + {\rm i}\varepsilon)u = f$, with absorption parameter $\varepsilon \in \mathbb{R}$. Multigrid approximations of this equation with $\varepsilon \neq 0$ are commonly used as preconditioners for the pure Helmholtz case ($\varepsilon = 0$). However a rigorous theory for such (so-called "shifted Laplace") preconditioners, either for the pure Helmholtz equation, or even the absorptive equation ($\varepsilon \neq 0$), is still missing. We present a new theory for the absorptive equation that provides rates of convergence for (left- or right-) preconditioned GMRES, via estimates of the norm and field of values of the preconditioned matrix. This theory uses a $k$- and $\varepsilon$-explicit coercivity result for the underlying sesquilinear form and shows, for example, that if $|\varepsilon| \sim k^2$, then classical overlapping additive Schwarz will perform optimally for the absorptive problem, provided the subdomain and coarse mesh diameters are carefully chosen. Extensive numerical experiments are given that support the theoretical results. The theory for the absorptive case gives insight into how its domain decomposition approximations perform as preconditioners for the pure Helmholtz case $\varepsilon = 0$. At the end of the paper we propose a (scalable) multilevel preconditioner for the pure Helmholtz problem that has an empirical computation time complexity of about $\mathcal{O}(n^{4/3})$ for solving finite element systems of size $n = \mathcal{O}(k^3)$, where we have chosen the mesh diameter $h \sim k^{-3/2}$ to avoid the pollution effect. Experiments on problems with $h \sim k^{-1}$, i.e. a fixed number of grid points per wavelength, are also given.
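The scaling claims in this abstract can be checked with a few lines of arithmetic. The sketch below assumes a unit domain in two dimensions (an inference from the stated pairing of $h \sim k^{-3/2}$ with $n = \mathcal{O}(k^3)$, not something the abstract states) and sets all hidden constants to 1. The function names are hypothetical.

```python
def helmholtz_problem_size(k, mesh_exponent, dim=2):
    """Return (h, n) for mesh diameter h = k**(-mesh_exponent).

    Assumption: unit domain in `dim` dimensions, so the number of
    unknowns scales like h**(-dim). All constants are set to 1.
    """
    h = float(k) ** (-mesh_exponent)
    n = round((1.0 / h) ** dim)
    return h, n


def empirical_cost(n, exponent=4.0 / 3.0):
    """Cost model n**(4/3) quoted in the abstract (constant set to 1)."""
    return n ** exponent


# Pollution-free mesh h ~ k^(-3/2): n grows like k^3, cost like k^4.
h_fine, n_fine = helmholtz_problem_size(16, 1.5)
# Fixed number of points per wavelength h ~ k^(-1): n grows like k^2.
h_coarse, n_coarse = helmholtz_problem_size(16, 1.0)
```

Under these assumptions, doubling $k$ on the $h \sim k^{-3/2}$ mesh multiplies the number of unknowns by 8 and the modelled cost by 16.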

    Product quasi-interpolation in logarithmically singular integral equations

    A discrete high-order method is constructed and justified for a class of Fredholm integral equations of the second kind with kernels that may have boundary and logarithmic diagonal singularities. The method is based on improving the boundary behaviour of the kernel with the help of a change of variables, and on product integration using quasi-interpolation by smooth splines of order m. Properties of the different proposed calculation schemes are compared through numerical experiments using, in particular, variable-precision interval arithmetic.
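The idea behind product integration can be illustrated in its simplest form: the singular factor $\log|x - s|$ is integrated exactly over each panel, and only the smooth factor is approximated. The sketch below uses piecewise-constant (midpoint) approximation of the smooth factor on $[0, 1]$. This is a deliberately simplified stand-in, not the spline quasi-interpolation of order m described in the abstract, and the function names are hypothetical.

```python
import math


def log_antiderivative(t):
    """F(t) = t*log|t| - t, an antiderivative of log|t| (F(0) = 0 by continuity)."""
    return 0.0 if t == 0.0 else t * math.log(abs(t)) - t


def product_integrate_log(g, x, num_panels):
    """Approximate the integral over [0, 1] of log|x - s| * g(s) ds.

    The logarithmic factor is integrated exactly on every panel;
    g is replaced by its midpoint value on that panel.
    """
    h = 1.0 / num_panels
    total = 0.0
    for j in range(num_panels):
        a, b = j * h, (j + 1) * h
        # Exact integral of log|x - s| over [a, b] via the substitution t = s - x.
        weight = log_antiderivative(b - x) - log_antiderivative(a - x)
        total += weight * g(0.5 * (a + b))
    return total
```

Because the singularity is absorbed into the weights, the rule stays well defined even when `x` coincides with a panel endpoint.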

    Fortran95 ja MPI

    Petascale solvers for anisotropic PDEs in atmospheric modelling on GPU clusters

    Memory-bound applications such as solvers for large sparse systems of equations remain a challenge for GPUs. Fast solvers should be based on numerically efficient algorithms and implemented such that global memory access is minimised. To solve systems with trillions ($\mathcal{O}(10^{12})$) of unknowns the code has to make efficient use of several million individual processor cores on large GPU clusters. We describe the multi-GPU implementation of two algorithmically optimal iterative solvers for anisotropic PDEs which are encountered in (semi-) implicit time stepping procedures in atmospheric modelling. In this application the condition number is large but independent of the grid resolution and both methods are asymptotically optimal, albeit with different absolute performance. In particular, an important constant in the discretisation is the CFL number; only the multigrid solver is robust to changes in this constant. We parallelise the solvers and adapt them to the specific features of GPU architectures, paying particular attention to efficient global memory access. We achieve a performance of up to 0.78 PFLOPs when solving an equation with $0.55\cdot 10^{12}$ unknowns on 16384 GPUs; this corresponds to about $3\%$ of the theoretical peak performance of the machine and we use more than $40\%$ of the peak memory bandwidth with a Conjugate Gradient (CG) solver. Although the other solver, a geometric multigrid algorithm, has a slightly worse performance in terms of FLOPs per second, overall it is faster as it needs fewer iterations to converge; the multigrid algorithm can solve a linear PDE with half a trillion unknowns in about one second.
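For readers unfamiliar with the Conjugate Gradient solver mentioned here, the following is a minimal, single-threaded sketch in plain Python. It is not the paper's GPU implementation. The model operator (identity minus a 5-point Laplacian with a stronger coupling in one direction, unit grid spacing, zero Dirichlet boundaries) and the names `apply_operator` and `aniso` are assumptions chosen only to give a small symmetric positive definite anisotropic system.

```python
import math


def apply_operator(u, nx, nz, aniso):
    """Matrix-free y = A u for the model operator I - d2/dx2 - aniso * d2/dz2."""
    y = [0.0] * (nx * nz)
    for i in range(nx):
        for j in range(nz):
            p = i * nz + j
            val = (1.0 + 2.0 + 2.0 * aniso) * u[p]
            if i > 0:
                val -= u[p - nz]
            if i < nx - 1:
                val -= u[p + nz]
            if j > 0:
                val -= aniso * u[p - 1]
            if j < nz - 1:
                val -= aniso * u[p + 1]
            y[p] = val
    return y


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)  # residual for the zero initial guess
    p = list(r)
    rr = dot(r, r)
    b_norm = math.sqrt(rr)
    if b_norm == 0.0:
        return x, 0
    for it in range(1, max_iter + 1):
        Ap = matvec(p)
        alpha = rr / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        if math.sqrt(rr_new) <= tol * b_norm:
            return x, it
        p = [ri + (rr_new / rr) * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter
```

Each iteration touches every unknown a handful of times and does little arithmetic per value loaded, which is why solvers of this kind are limited by memory bandwidth rather than by floating-point throughput.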

    Parallel Privacy-Preserving Shortest Path Algorithms

    In this paper, we propose and present secure multiparty computation (SMC) protocols for single-source shortest distance (SSSD) and all-pairs shortest distance (APSD) in sparse and dense graphs. Our protocols follow the structure of classical algorithms—Bellman–Ford and Dijkstra for SSSD; Johnson, Floyd–Warshall, and transitive closure for APSD. As the computational platforms offered by SMC protocol sets have performance profiles that differ from typical processors, we had to perform extensive changes to the structure (including their control flow and memory accesses) and the details of these algorithms in order to obtain good performance. We implemented our protocols on top of the secret sharing based protocol set offered by the Sharemind SMC platform, using single-instruction-multiple-data (SIMD) operations as much as possible to reduce the round complexity. We benchmarked our protocols under several different parameters for network performance and compared our performance figures against each other and with ones reported previously
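The restructuring this abstract describes can be hinted at with a plaintext sketch of Bellman–Ford in which the control flow does not depend on the data: the number of rounds is fixed, and every round processes all edges as one batch. This is only an illustration of the data-oblivious, SIMD-friendly shape. It operates on ordinary integers, involves no secret sharing, and is not the Sharemind protocol. The sentinel `INF` and the assumption of non-negative weights are choices made for this sketch.

```python
INF = 10 ** 9  # finite "unreachable" sentinel (an assumption of this sketch)


def oblivious_bellman_ford(num_vertices, edges, source):
    """Single-source shortest distances with data-independent control flow.

    `edges` is a list of (u, v, w) triples with non-negative weights w.
    Always runs num_vertices - 1 rounds, with no early exit.
    """
    dist = [INF] * num_vertices
    dist[source] = 0
    for _ in range(num_vertices - 1):
        # Batch step 1: one vectorised addition over all edges.
        candidates = [dist[u] + w for (u, v, w) in edges]
        # Batch step 2: per-target minimum over the candidates.
        new_dist = list(dist)
        for (u, v, w), cand in zip(edges, candidates):
            new_dist[v] = min(new_dist[v], cand)
        dist = new_dist
    return dist


example_edges = [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 5)]
example_dist = oblivious_bellman_ford(5, example_edges, 0)
```

In the example, vertex 4 has no incoming edges, so its distance stays at `INF`.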

    Parallel implementation of a Schwarz Domain Decomposition Algorithm

    We describe and compare some recent domain decomposition algorithms of Schwarz type with respect to parallel performance. A new, robust domain decomposition algorithm, Additive Average Schwarz, is compared with a classical overlapping Schwarz code. Complexity estimates are given in both two and three dimensions and actual implementations are compared on a Paragon machine as well as on a cluster of modern workstations. 1 Introduction. This paper is concerned with parallel algorithms for the numerical solution of equations of the form: Find $u \in V(\Omega)$ such that $\sum_{i=1}^{N} \int_{\Omega_i} \rho_i \nabla u \cdot \nabla v \, dx = \int_{\Omega} f v \, dx \quad \forall v \in V(\Omega)$, (1) in a space $V(\Omega) \subset H^1(\Omega)$. Here $\rho_i$ are positive constants and $\bar{\Omega} = \bigcup_{i=1}^{N} \bar{\Omega}_i$. This problem is important in many fields of application, an example being the study of flow in porous media [2]. Domain decomposition algorithms have received much attention in the last ten years due to their potent..
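As a rough illustration of what an overlapping Schwarz method does, the sketch below runs the classical alternating (multiplicative) Schwarz iteration on the 1D model problem $-u'' = 1$ on $(0, 1)$ with $u(0) = u(1) = 0$, using two overlapping subdomains and a tridiagonal solve on each. This is neither the Additive Average Schwarz algorithm nor the parallel code from the paper. The model problem, the grid indices, and the function names are assumptions made for this example.

```python
def solve_subdomain(u, lo, hi, h, f=1.0):
    """Solve -u'' = f on grid points lo..hi in place (Thomas algorithm).

    u[lo - 1] and u[hi + 1] act as Dirichlet data taken from the current iterate.
    """
    m = hi - lo + 1
    rhs = [h * h * f] * m
    rhs[0] += u[lo - 1]
    rhs[-1] += u[hi + 1]
    c = [0.0] * m
    d = [0.0] * m
    c[0] = -0.5
    d[0] = rhs[0] / 2.0
    for k in range(1, m):  # forward elimination for the (-1, 2, -1) stencil
        denom = 2.0 + c[k - 1]
        c[k] = -1.0 / denom
        d[k] = (rhs[k] + d[k - 1]) / denom
    u[hi] = d[m - 1]
    for k in range(m - 2, -1, -1):  # back substitution
        u[lo + k] = d[k] - c[k] * u[lo + k + 1]


def alternating_schwarz(n, a, b, sweeps):
    """n interior points; subdomain 1 is points 1..b, subdomain 2 is a..n (a <= b)."""
    h = 1.0 / (n + 1)
    u = [0.0] * (n + 2)
    for _ in range(sweeps):
        solve_subdomain(u, 1, b, h)
        solve_subdomain(u, a, n, h)
    return u
```

With 19 interior points and subdomains 1..12 and 8..19, each sweep reduces the error by a fixed factor that depends on the size of the overlap; a wider overlap gives faster convergence at the price of larger subdomain solves.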