    Domain Decomposition preconditioning for high-frequency Helmholtz problems with absorption

    In this paper we give new results on domain decomposition preconditioners for GMRES when computing piecewise-linear finite-element approximations of the Helmholtz equation $-\Delta u - (k^2 + \mathrm{i}\varepsilon)u = f$, with absorption parameter $\varepsilon \in \mathbb{R}$. Multigrid approximations of this equation with $\varepsilon \neq 0$ are commonly used as preconditioners for the pure Helmholtz case ($\varepsilon = 0$). However, a rigorous theory for such (so-called "shifted Laplace") preconditioners, either for the pure Helmholtz equation or even the absorptive equation ($\varepsilon \neq 0$), is still missing. We present a new theory for the absorptive equation that provides rates of convergence for (left- or right-) preconditioned GMRES, via estimates of the norm and field of values of the preconditioned matrix. This theory uses a $k$- and $\varepsilon$-explicit coercivity result for the underlying sesquilinear form and shows, for example, that if $|\varepsilon| \sim k^2$, then classical overlapping additive Schwarz will perform optimally for the absorptive problem, provided the subdomain and coarse mesh diameters are carefully chosen. Extensive numerical experiments are given that support the theoretical results. The theory for the absorptive case gives insight into how its domain decomposition approximations perform as preconditioners for the pure Helmholtz case $\varepsilon = 0$. At the end of the paper we propose a (scalable) multilevel preconditioner for the pure Helmholtz problem that has an empirical computation time complexity of about $\mathcal{O}(n^{4/3})$ for solving finite element systems of size $n = \mathcal{O}(k^3)$, where we have chosen the mesh diameter $h \sim k^{-3/2}$ to avoid the pollution effect. Experiments on problems with $h \sim k^{-1}$, i.e., a fixed number of grid points per wavelength, are also given.
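
    A minimal sketch of the "shifted Laplace" preconditioning idea described above, not the paper's solver: 1D finite differences with Dirichlet boundary conditions stand in for the finite-element discretisation, and an exact LU solve of the absorptive operator stands in for its domain decomposition approximations. The wavenumber, grid size, and tolerance are illustrative choices, not values from the paper.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

k = 40.0                                   # wavenumber (illustrative)
n = 600                                    # interior grid points
h = 1.0 / (n + 1)

# 1D negative Laplacian, second-order finite differences, Dirichlet BCs
L = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc") / h**2
I = sp.identity(n, format="csc", dtype=complex)

eps = k**2                                          # absorption with |eps| ~ k^2
A = L.astype(complex) - k**2 * I                    # pure Helmholtz problem (eps = 0)
A_eps = L.astype(complex) - (k**2 + 1j * eps) * I   # absorptive problem

# Preconditioner: exact solve with the absorptive ("shifted Laplace") operator
lu = spla.splu(A_eps.tocsc())
M = spla.LinearOperator(A.shape, matvec=lu.solve, dtype=complex)

b = np.ones(n, dtype=complex)
residuals = []
x, info = spla.gmres(A, b, M=M, rtol=1e-8, maxiter=100,
                     callback=residuals.append, callback_type="pr_norm")
print(f"info = {info}, inner GMRES iterations = {len(residuals)}")
```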

    Performance Modeling and Prediction for the Scalable Solution of Partial Differential Equations on Unstructured Grids

    This dissertation studies the sources of poor performance in scientific computing codes based on partial differential equations (PDEs), which typically perform at a computational rate well below other scientific simulations (e.g., those with dense linear algebra or N-body kernels) on modern architectures with deep memory hierarchies. We identify the primary factors responsible for this relatively poor performance: insufficient available memory bandwidth, a low ratio of work to data size (good algorithmic efficiency), and the nonscaling cost of synchronization and gather/scatter operations (for fixed-size problem scaling). This dissertation also illustrates how to reuse legacy scientific and engineering software within a library framework. Specifically, a three-dimensional unstructured-grid incompressible Euler code from NASA has been parallelized with the Portable, Extensible Toolkit for Scientific Computation (PETSc) library for distributed-memory architectures. Using this newly instrumented code (called PETSc-FUN3D) as an example of a typical PDE solver, we demonstrate some strategies that are effective in tolerating the latencies arising from the hierarchical memory system and the network. Even on a single processor from each of the major contemporary architectural families, the PETSc-FUN3D code runs from 2.5 to 7.5 times faster than the legacy code on a medium-sized data set (with approximately 10^5 degrees of freedom). The major source of performance improvement is the increased locality in data reference patterns achieved through blocking, interlacing, and edge reordering. To explain these performance gains, we provide simple performance models based on memory bandwidth and instruction issue rates. Experimental evidence, in terms of translation lookaside buffer (TLB) and data cache miss rates, achieved memory bandwidth, and graduated floating-point instructions per memory reference, is provided through accurate measurements with hardware counters. The performance models and experimental results motivate algorithmic and software practices that lead to improvements in both parallel scalability and per-node performance. We identify the bottlenecks to scalability (algorithmic as well as implementation) for a fixed-size problem when the number of processors grows to several thousand (the expected level of concurrency on terascale architectures). We also evaluate the hybrid (mixed distributed/shared) programming model from a performance standpoint.
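
    The memory-bandwidth performance model mentioned above can be made concrete with a back-of-the-envelope calculation. The sketch below (all numbers are illustrative assumptions, not measurements from the dissertation) bounds the achievable flop rate of a sparse matrix-vector product by sustained memory bandwidth times arithmetic intensity, and shows why blocking helps: one column index shared across a b x b block cuts the index traffic per nonzero.

```python
def smvp_bandwidth_bound_mflops(bw_gb_per_s: float, block_size: int = 1) -> float:
    """Bandwidth-limited flop rate (Mflop/s) for a block-CSR matrix-vector
    product. Each nonzero costs 2 flops (multiply + add) and moves an
    8-byte value plus a 4-byte column index; with b x b blocks one index is
    shared by b*b values. Vector and row-pointer traffic is ignored to keep
    the model minimal."""
    flops_per_nnz = 2.0
    bytes_per_nnz = 8.0 + 4.0 / block_size**2
    intensity = flops_per_nnz / bytes_per_nnz          # flops per byte
    return bw_gb_per_s * 1e9 * intensity / 1e6

for b in (1, 4):
    rate = smvp_bandwidth_bound_mflops(10.0, b)        # assume 10 GB/s sustained
    print(f"block size {b}: bandwidth bound ~ {rate:.0f} Mflop/s")
```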

    A new generalized domain decomposition strategy for the efficient parallel solution of the FDS-pressure equation. Part I: Theory, Concept and Implementation

    Due to steadily increasing problem sizes and accuracy requirements, as well as storage restrictions on single-processor systems, the efficient numerical simulation of realistic fire scenarios can only be performed on modern high-performance computers based on multi-processor architectures. The transition to those systems requires an elaborate parallelization of the underlying numerical concepts, which must guarantee the same result as a corresponding serial execution and preserve the convergence order of the original serial method. Because of its low degree of inherent parallelism, the efficient parallelization of the elliptic pressure equation in particular remains a major challenge in many simulation programs for fire-induced flows such as the Fire Dynamics Simulator (FDS). In order to avoid losses of accuracy or numerical instabilities, the parallelization process must take into account the strong global character of the physical pressure. The current parallel FDS solver is based on a relatively coarse-grained parallelization concept which cannot guarantee these requirements in all cases. Therefore, an alternative parallel pressure solver, ScaRC, is proposed, which ensures a high degree of global coupling and good computational performance at the same time. Part I explains the theory, concept and implementation of this new strategy, whereas Part II describes a series of validation and verification tests to prove its correctness.
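
    The following is a minimal sketch, not ScaRC itself, of the structural point this abstract makes: purely local subdomain solves propagate information only between neighbours, so an elliptic pressure equation needs an additional global mechanism. Here a 1D Poisson system is preconditioned by independent exact block solves plus a small coarse problem (one constant basis function per subdomain) that couples all subdomains at once; the grid size and subdomain count are illustrative.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n, p = 256, 8                      # grid points and subdomains (p divides n)
h = 1.0 / (n + 1)
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr") / h**2

m = n // p
blocks = [np.arange(i * m, (i + 1) * m) for i in range(p)]
local_lu = [spla.splu(A[idx, :][:, idx].tocsc()) for idx in blocks]

# Coarse space: one piecewise-constant basis function per subdomain
R0 = sp.lil_matrix((p, n))
for i, idx in enumerate(blocks):
    R0[i, idx] = 1.0
R0 = R0.tocsr()
A0 = (R0 @ A @ R0.T).toarray()     # small p x p globally coupled operator

def apply_two_level(r):
    z = np.zeros_like(r)
    for idx, lu in zip(blocks, local_lu):      # independent local solves
        z[idx] += lu.solve(r[idx])
    z += R0.T @ np.linalg.solve(A0, R0 @ r)    # global coarse correction
    return z

M = spla.LinearOperator(A.shape, matvec=apply_two_level)
b = np.ones(n)
iterations = []
x, info = spla.cg(A, b, M=M, rtol=1e-8,
                  callback=lambda xk: iterations.append(1))
print(f"info = {info}, CG iterations = {len(iterations)}")
```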