5,985 research outputs found
Domain Decomposition preconditioning for high-frequency Helmholtz problems with absorption
In this paper we give new results on domain decomposition preconditioners for
GMRES when computing piecewise-linear finite-element approximations of the
Helmholtz equation $-\Delta u - (k^2 + i\varepsilon)u = f$, with
absorption parameter $\varepsilon \in \mathbb{R}$. Multigrid approximations of
this equation with $\varepsilon \neq 0$ are commonly used as preconditioners
for the pure Helmholtz case ($\varepsilon = 0$). However, a rigorous theory for
such (so-called "shifted Laplace") preconditioners, either for the pure
Helmholtz equation, or even the absorptive equation ($\varepsilon \neq 0$), is
still missing. We present a new theory for the absorptive equation that
provides rates of convergence for (left- or right-) preconditioned GMRES, via
estimates of the norm and field of values of the preconditioned matrix. This
theory uses a $k$- and $\varepsilon$-explicit coercivity result for the
underlying sesquilinear form and shows, for example, that if $|\varepsilon| \sim k^2$, then classical overlapping additive Schwarz will perform optimally for
the absorptive problem, provided the subdomain and coarse mesh diameters are
carefully chosen. Extensive numerical experiments are given that support the
theoretical results. The theory for the absorptive case gives insight into how
its domain decomposition approximations perform as preconditioners for the pure
Helmholtz case ($\varepsilon = 0$). At the end of the paper we propose a
(scalable) multilevel preconditioner for the pure Helmholtz problem that has an
empirical computation time complexity of about $\mathcal{O}(n^{4/3})$ for
solving finite element systems of size $n = \mathcal{O}(k^3)$, where we have
chosen the mesh diameter $h \sim k^{-3/2}$ to avoid the pollution effect.
Experiments on problems with $h \sim k^{-1}$, i.e. a fixed number of grid points
per wavelength, are also given.
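The shifted-Laplace idea above can be sketched in a few lines. The code below is a 1D toy illustration, not the paper's 2D/3D setting: it assembles piecewise-linear finite-element matrices for the Helmholtz equation with and without absorption, and uses an exact solve with the absorptive matrix (standing in for the multigrid or domain decomposition approximations the abstract discusses) as a preconditioner for GMRES on the pure Helmholtz system. All parameter values ($k$, mesh size, $\varepsilon = k^2$) are arbitrary choices for the sketch.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def helmholtz_fe(n, k, eps):
    """P1 finite-element matrix for -u'' - (k^2 + i*eps)u on (0,1), Dirichlet BCs."""
    h = 1.0 / (n + 1)
    # Stiffness matrix (1/h)*tridiag(-1, 2, -1)
    K = sp.diags([-np.ones(n - 1) / h, 2.0 * np.ones(n) / h, -np.ones(n - 1) / h],
                 [-1, 0, 1], format="csc")
    # Mass matrix (h/6)*tridiag(1, 4, 1)
    M = sp.diags([h / 6.0 * np.ones(n - 1), 4.0 * h / 6.0 * np.ones(n),
                  h / 6.0 * np.ones(n - 1)], [-1, 0, 1], format="csc")
    return K - (k**2 + 1j * eps) * M

k, n = 20.0, 300
A = helmholtz_fe(n, k, 0.0)        # pure Helmholtz system (eps = 0)
A_eps = helmholtz_fe(n, k, k**2)   # absorptive problem with |eps| ~ k^2

# Preconditioner: exact LU solve with the absorptive ("shifted") matrix.
lu = spla.splu(A_eps)
P = spla.LinearOperator(A.shape, matvec=lu.solve, dtype=complex)

b = np.ones(n, dtype=complex)
resids = []
x, info = spla.gmres(A, b, M=P, restart=200, maxiter=5,
                     callback=resids.append, callback_type="pr_norm")
print("converged:", info == 0, "after", len(resids), "iterations")
```

With $|\varepsilon| \sim k^2$ the preconditioned spectrum stays bounded away from zero, so the iteration count remains far below the system dimension; without the preconditioner, GMRES on the indefinite Helmholtz matrix converges much more slowly.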
Performance Modeling and Prediction for the Scalable Solution of Partial Differential Equations on Unstructured Grids
This dissertation studies the sources of poor performance in scientific computing codes based on partial differential equations (PDEs), which typically perform at a computational rate well below other scientific simulations (e.g., those with dense linear algebra or N-body kernels) on modern architectures with deep memory hierarchies. We identify the primary factors responsible for this relatively poor performance: insufficient available memory bandwidth, a low ratio of work to data size (a consequence of good algorithmic efficiency), and the nonscaling cost of synchronization and gather/scatter operations (in fixed-problem-size scaling). This dissertation also illustrates how to reuse legacy scientific and engineering software within a library framework.
Specifically, a three-dimensional unstructured grid incompressible Euler code from NASA has been parallelized with the Portable Extensible Toolkit for Scientific Computing (PETSc) library for distributed memory architectures. Using this newly instrumented code (called PETSc-FUN3D) as an example of a typical PDE solver, we demonstrate some strategies that are effective in tolerating the latencies arising from the hierarchical memory system and the network. Even on a single processor from each of the major contemporary architectural families, the PETSc-FUN3D code runs from 2.5 to 7.5 times faster than the legacy code on a medium-sized data set (with approximately 10^5 degrees of freedom). The major source of performance improvement is the increased locality in data reference patterns achieved through blocking, interlacing, and edge reordering. To explain these performance gains, we provide simple performance models based on memory bandwidth and instruction issue rates.
Experimental evidence, in terms of translation lookaside buffer (TLB) and data cache miss rates, achieved memory bandwidth, and graduated floating point instructions per memory reference, is provided through accurate measurements with hardware counters. The performance models and experimental results motivate algorithmic and software practices that lead to improvements in both parallel scalability and per-node performance. We identify the bottlenecks to scalability (algorithmic as well as implementation) for a fixed-size problem when the number of processors grows to several thousands (the expected level of concurrency on terascale architectures). We also evaluate the hybrid programming model (mixed distributed/shared) from a performance standpoint.
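The memory-bandwidth performance model described above can be made concrete with a back-of-the-envelope calculation: for a memory-bound kernel such as a sparse matrix-vector product, the achievable flop rate is capped by the flops performed divided by the time needed to move the data at the sustained bandwidth. The traffic model, block format, and all numbers below are illustrative assumptions, not figures from the dissertation; the sketch does show why interlacing unknowns into small dense blocks (which amortizes index traffic) raises the bandwidth-limited ceiling.

```python
def sparse_matvec_flop_rate(nnz, n, bandwidth_gbs, block_size=1):
    """Bandwidth-bound estimate of y = A*x in block-CSR format, in Gflop/s.

    Assumed traffic model: 8 bytes per stored matrix entry (double precision),
    one 4-byte column index per block_size-by-block_size block (i.e. 4/b^2
    bytes per entry), and 16 bytes per row to stream the x and y vectors.
    Each matrix entry contributes 2 flops (one multiply, one add).
    """
    b = block_size
    bytes_moved = nnz * (8.0 + 4.0 / (b * b)) + 16.0 * n
    flops = 2.0 * nnz
    seconds = bytes_moved / (bandwidth_gbs * 1e9)
    return flops / seconds / 1e9

# Illustrative PDE-sized problem: ~10^5 rows, ~7 nonzeros per row,
# 10 GB/s sustained memory bandwidth (an assumed figure).
n, nnz = 100_000, 700_000
for bs in (1, 4):   # interlacing unknowns into 4x4 blocks cuts index traffic
    rate = sparse_matvec_flop_rate(nnz, n, 10.0, bs)
    print(f"block size {bs}: ~{rate:.2f} Gflop/s ceiling")
```

A model of this kind predicts the kernel saturates at a small fraction of peak flop rate regardless of processor clock speed, which is exactly the bandwidth-starvation argument the dissertation makes.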
A new generalized domain decomposition strategy for the efficient parallel solution of the FDS-pressure equation. Part I: Theory, Concept and Implementation
Due to steadily increasing problem sizes and accuracy requirements as well as storage restrictions on single-processor systems, the efficient numerical simulation
of realistic fire scenarios can only be obtained on modern high-performance computers based on multi-processor architectures. The transition to those systems
requires the careful parallelization of the underlying numerical concepts, which must produce the same result as a corresponding serial execution and preserve the convergence order of the original serial method. Because
of its low degree of inherent parallelism, the efficient parallelization of the elliptic pressure equation in particular is still a major challenge in many simulation programs for fire-induced flows such as the Fire Dynamics Simulator (FDS). In order to avoid losses of accuracy or numerical instabilities, the parallelization process must take into account the strong global character of the physical pressure. The current parallel FDS solver is based on a relatively coarse-grained parallelization concept which cannot guarantee these requirements in all cases.
Therefore, an alternative parallel pressure solver, ScaRC, is proposed which ensures a high degree of global coupling and a good computational performance at the same time. Part I explains the theory, concept and implementation of this
new strategy, whereas Part II describes a series of validation and verification tests to prove its correctness.
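The domain decomposition difficulty described above, that a globally coupled elliptic pressure equation must be solved across subdomains, can be illustrated with a toy overlapping Schwarz iteration on a 1D Poisson problem. This is only a sketch of the general technique, not the ScaRC algorithm itself (ScaRC adds the global coarse-grid coupling that this two-subdomain toy omits); all sizes and the subdomain split are arbitrary choices.

```python
import numpy as np

n = 101                      # interior grid points of -u'' = f on (0,1)
h = 1.0 / (n + 1)
f = np.ones(n)               # right-hand side f = 1
u = np.zeros(n + 2)          # current iterate, including boundary values u(0)=u(1)=0

def solve_poisson_dirichlet(rhs, left, right):
    """Direct solve of the tridiagonal system for -u'' = rhs with given boundary values."""
    m = len(rhs)
    A = (np.diag(np.full(m, 2.0)) + np.diag(np.full(m - 1, -1.0), 1)
         + np.diag(np.full(m - 1, -1.0), -1))
    b = h * h * rhs.copy()
    b[0] += left             # move known boundary values to the right-hand side
    b[-1] += right
    return np.linalg.solve(A, b)

# Overlapping split: subdomain 1 = points 1..70, subdomain 2 = points 31..n.
s1, e1, s2 = 1, 70, 31
for _ in range(30):          # multiplicative (alternating) Schwarz sweeps
    # Each subdomain solve uses the neighbour's latest values as boundary data,
    # which is exactly the interface coupling a parallel pressure solver must manage.
    u[s1:e1 + 1] = solve_poisson_dirichlet(f[s1 - 1:e1], u[s1 - 1], u[e1 + 1])
    u[s2:n + 1] = solve_poisson_dirichlet(f[s2 - 1:n], u[s2 - 1], u[n + 1])

# Compare with the exact discrete solution computed on the undecomposed domain.
u_exact = np.zeros(n + 2)
u_exact[1:n + 1] = solve_poisson_dirichlet(f, 0.0, 0.0)
print("max error after 30 sweeps:", np.abs(u - u_exact).max())
```

With a generous overlap the 1D iteration contracts quickly, but the contraction rate degrades as subdomains shrink relative to the domain, which is why purely local coupling is insufficient for the pressure equation and a globally coupled (multilevel) scheme such as ScaRC is needed.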
- …