Composing Scalable Nonlinear Algebraic Solvers
Most efficient linear solvers use composable algorithmic components, with the
most common model being the combination of a Krylov accelerator and one or more
preconditioners. A similar set of concepts may be used for nonlinear algebraic
systems, where nonlinear composition of different nonlinear solvers may
significantly improve the time to solution. We describe the basic concepts of
nonlinear composition and preconditioning and present a number of solvers
applicable to nonlinear partial differential equations. We have developed a
software framework in order to easily explore the possible combinations of
solvers. We show that the performance gains from using composed solvers can be
substantial compared with gains from standard Newton-Krylov methods. Comment: 29 pages, 14 figures, 13 tables
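The idea of composing nonlinear solvers can be illustrated with a minimal sketch. It is not the authors' framework: the 2x2 test system, the function names (`richardson_step`, `newton_step`, `compose`, `solve`), and the damping value are all assumptions chosen for illustration. The sketch applies a damped nonlinear Richardson step followed by a Newton step in each outer iteration, which is one simple multiplicative way to combine two solvers.

```python
import math

def residual(x):
    # Hypothetical, mildly nonlinear 2x2 test system F(x) = 0 (an assumption).
    return [x[0] + 0.25 * math.sin(x[1]) - 1.0,
            x[1] + 0.25 * math.cos(x[0]) - 2.0]

def jacobian(x):
    # Analytic Jacobian of the test system above.
    return [[1.0, 0.25 * math.cos(x[1])],
            [-0.25 * math.sin(x[0]), 1.0]]

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def richardson_step(x, damping=0.5):
    # Damped nonlinear Richardson: x <- x - damping * F(x).
    f = residual(x)
    return [xi - damping * fi for xi, fi in zip(x, f)]

def newton_step(x):
    # One full Newton step; the 2x2 system J dx = -F is solved by Cramer's rule.
    f = residual(x)
    (a, b), (c, d) = jacobian(x)
    det = a * d - b * c
    return [x[0] + (-f[0] * d + f[1] * b) / det,
            x[1] + (-f[1] * a + f[0] * c) / det]

def compose(*steps):
    # Multiplicative composition: apply each solver step in sequence.
    def composed(x):
        for step in steps:
            x = step(x)
        return x
    return composed

def solve(step, x0, tol=1e-12, max_iter=50):
    # Repeat the (possibly composed) step until the residual norm drops below tol.
    x = list(x0)
    for k in range(max_iter):
        if norm(residual(x)) < tol:
            return x, k
        x = step(x)
    return x, max_iter

# Usage: compare plain Newton with the Richardson-then-Newton composition.
x_newton, its_newton = solve(newton_step, [0.0, 0.0])
x_comp, its_comp = solve(compose(richardson_step, newton_step), [0.0, 0.0])
```

Whether a composition pays off depends on the problem and on the cost of each inner step; on a toy system like this one the sketch only shows the mechanics, not a performance gain.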
Domain Decomposition preconditioning for high-frequency Helmholtz problems with absorption
In this paper we give new results on domain decomposition preconditioners for
GMRES when computing piecewise-linear finite-element approximations of the
Helmholtz equation $-\Delta u - (k^2 + i\varepsilon)u = f$, with
absorption parameter $\varepsilon \in \mathbb{R}$. Multigrid approximations of
this equation with $\varepsilon \neq 0$ are commonly used as preconditioners
for the pure Helmholtz case ($\varepsilon = 0$). However, a rigorous theory for
such (so-called "shifted Laplace") preconditioners, either for the pure
Helmholtz equation, or even the absorptive equation ($\varepsilon \neq 0$), is
still missing. We present a new theory for the absorptive equation that
provides rates of convergence for (left- or right-) preconditioned GMRES, via
estimates of the norm and field of values of the preconditioned matrix. This
theory uses a $k$- and $\varepsilon$-explicit coercivity result for the
underlying sesquilinear form and shows, for example, that if $|\varepsilon| \sim k^2$, then classical overlapping additive Schwarz will perform optimally for
the absorptive problem, provided the subdomain and coarse mesh diameters are
carefully chosen. Extensive numerical experiments are given that support the
theoretical results. The theory for the absorptive case gives insight into how
its domain decomposition approximations perform as preconditioners for the pure
Helmholtz case $\varepsilon = 0$. At the end of the paper we propose a
(scalable) multilevel preconditioner for the pure Helmholtz problem that has an
empirical computation time complexity of about $\mathcal{O}(n^{4/3})$ for
solving finite element systems of size $n = \mathcal{O}(k^3)$, where we have
chosen the mesh diameter $h \sim k^{-3/2}$ to avoid the pollution effect.
Experiments on problems with $h \sim k^{-1}$, i.e. a fixed number of grid points
per wavelength, are also given.
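For context, convergence rates derived from the norm and the field of values of a preconditioned matrix are commonly obtained from an Elman-type GMRES estimate. The statement below is a sketch of that standard bound in the Euclidean setting, not a result quoted from the paper, which may work in a weighted inner product. The symbols $A$ (the preconditioned matrix), $W(A)$, $\beta$, and $r_m$ (the residual after $m$ GMRES iterations) are notation introduced here, and the bound assumes $0 \notin W(A)$.

```latex
% Assumption: A is the preconditioned matrix, \|\cdot\| the Euclidean norm,
% and W(A) its field of values (numerical range), with 0 \notin W(A).
W(A) = \left\{ \frac{x^{*} A x}{x^{*} x} : x \in \mathbb{C}^{n},\ x \neq 0 \right\},
\qquad
\cos\beta = \frac{\operatorname{dist}\bigl(0, W(A)\bigr)}{\|A\|},
\qquad
\frac{\|r_m\|}{\|r_0\|} \le \sin^{m}\beta .
```

If the norm is bounded above and the distance from the origin to the field of values is bounded below, both independently of the problem parameters, the bound gives a parameter-independent iteration count.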
Solution strategies for nonlinear conservation laws
Nonlinear conservation laws form the basis for models for a wide range of physical phenomena. Finding an optimal strategy for solving these problems can be challenging, and a good strategy for one problem may fail spectacularly for others. As different problems have different challenging features, exploiting knowledge about the problem structure is a key factor in achieving an efficient solution strategy. Most strategies found in literature for solving nonlinear problems involve a linearization step, usually using Newton's method, which replaces the original nonlinear problem by an iteration process consisting of a series of linear problems. A large effort is then spent on finding a good strategy for solving these linear problems. This involves choosing suitable preconditioners and linear solvers. This approach is in many cases a good choice and a multitude of different methods have been developed. However, the linearization step to some degree involves a loss of information about the original problem. This is not necessarily critical, but in many cases the structure of the nonlinear problem can be exploited to a larger extent than what is possible when working solely on the linearized problem. This may involve knowledge about dominating physical processes and specifically on whether a process is near equilibrium. By using nonlinear preconditioning techniques developed in recent years, certain attractive features such as automatic localization of computations to parts of the problem domain with the highest degree of nonlinearities arise. In the present work, these methods are further refined to obtain a framework for nonlinear preconditioning that also takes into account equilibrium information. This framework is developed mainly in the context of porous media, but in a general manner, allowing for application to a wide range of problems. A scalability study shows that the method is scalable for challenging two-phase flow problems. 
It is also demonstrated for nonlinear elasticity problems. Some models arising from nonlinear conservation laws are best solved using strategies completely different from the approach outlined above. One such example can be found in the field of surface gravity waves. For special types of nonlinear waves, such as solitary waves and undular bores, the well-known Korteweg-de Vries (KdV) equation has been shown to be a suitable model. This equation has many interesting properties not typical of nonlinear equations, which may be exploited in the solver, and strategies usually reserved for linear problems may be applied. This work includes a comparative study of two discretization methods with highly different properties for this equation.
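For reference, the KdV equation and its solitary-wave solution can be written as follows in one common nondimensional normalization. The scaling, the wave speed $c$, and the phase shift $x_0$ are assumptions made here for illustration; the thesis may use a different, dimensional form.

```latex
% One common nondimensional normalization (an assumption; other scalings exist).
u_t + 6\,u\,u_x + u_{xxx} = 0,
\qquad
u(x,t) = \frac{c}{2}\,\operatorname{sech}^{2}\!\left(\frac{\sqrt{c}}{2}\,(x - c\,t - x_0)\right),
\quad c > 0 .
```

In this form a taller solitary wave is narrower and travels faster, since the amplitude $c/2$ and the speed $c$ are tied together.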
Performance Modeling and Prediction for the Scalable Solution of Partial Differential Equations on Unstructured Grids
This dissertation studies the sources of poor performance in scientific computing codes based on partial differential equations (PDEs), which typically perform at a computational rate well below other scientific simulations (e.g., those with dense linear algebra or N-body kernels) on modern architectures with deep memory hierarchies. We identify that the primary factors responsible for this relatively poor performance are: insufficient available memory bandwidth, low ratio of work to data size (good algorithmic efficiency), and nonscaling cost of synchronization and gather/scatter operations (for a fixed problem size scaling). This dissertation also illustrates how to reuse the legacy scientific and engineering software within a library framework.
Specifically, a three-dimensional unstructured grid incompressible Euler code from NASA has been parallelized with the Portable Extensible Toolkit for Scientific Computing (PETSc) library for distributed memory architectures. Using this newly instrumented code (called PETSc-FUN3D) as an example of a typical PDE solver, we demonstrate some strategies that are effective in tolerating the latencies arising from the hierarchical memory system and the network. Even on a single processor from each of the major contemporary architectural families, the PETSc-FUN3D code runs from 2.5 to 7.5 times faster than the legacy code on a medium-sized data set (with approximately 10^5 degrees of freedom). The major source of performance improvement is the increased locality in data reference patterns achieved through blocking, interlacing, and edge reordering. To explain these performance gains, we provide simple performance models based on memory bandwidth and instruction issue rates.
Experimental evidence, in terms of translation lookaside buffer (TLB) and data cache miss rates, achieved memory bandwidth, and graduated floating point instructions per memory reference, is provided through accurate measurements with hardware counters. The performance models and experimental results motivate algorithmic and software practices that lead to improvements in both parallel scalability and per-node performance. We identify the bottlenecks to scalability (algorithmic as well as implementation) for a fixed-size problem when the number of processors grows to several thousands (the expected level of concurrency on terascale architectures). We also evaluate the hybrid programming model (mixed distributed/shared) from a performance standpoint.
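A memory-bandwidth performance model of the kind mentioned above can be sketched for a sparse matrix-vector product. This is an illustrative assumption, not the dissertation's actual model: it assumes compressed sparse row (CSR) storage, 8-byte values, 4-byte indices, and that each vector moves through memory exactly once. The function names and the numbers in the usage lines are hypothetical.

```python
def spmv_bytes_and_flops(n_rows, nnz, value_bytes=8, index_bytes=4):
    """Optimistic memory traffic and flop count for y = A x with A in CSR format.

    Assumes (hypothetically) perfect cache reuse of x, so that x is read
    once and y is written once.
    """
    # Matrix values and column indices, plus the row-pointer array.
    matrix_bytes = nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes
    # Read x once and write y once.
    vector_bytes = 2 * n_rows * value_bytes
    # One multiply and one add per stored nonzero.
    flops = 2 * nnz
    return matrix_bytes + vector_bytes, flops

def bandwidth_bound_flop_rate(n_rows, nnz, bandwidth_bytes_per_s):
    """Upper bound on flop/s when memory bandwidth is the only limit."""
    total_bytes, flops = spmv_bytes_and_flops(n_rows, nnz)
    return flops * bandwidth_bytes_per_s / total_bytes

# Usage with hypothetical numbers: 10^5 rows, 7 nonzeros per row, 1 GB/s.
rate = bandwidth_bound_flop_rate(100_000, 700_000, 1.0e9)
print(f"bandwidth-bound rate: {rate / 1e6:.1f} Mflop/s")
```

Because the kernel does only two flops per twelve or more bytes moved, a bound of this type usually sits far below the processor's peak rate, which fits the abstract's emphasis on data locality over raw arithmetic speed.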
Large-Eddy Simulations of Flow and Heat Transfer in Complex Three-Dimensional Multilouvered Fins
The paper describes the computational procedure and
results from large-eddy simulations in a complex three-dimensional
louver geometry. The three-dimensionality in the
louver geometry occurs along the height of the fin, where the
angled louver transitions to the flat landing and joins with the
tube surface. The transition region is characterized by a swept
leading edge and decreasing flow area between louvers.
Preliminary results show a high-energy compact vortex jet
forming in this region. The jet forms in the vicinity of the louver
junction with the flat landing and is drawn under the louver in
the transition region. Its interaction with the surface of the
louver produces vorticity of the opposite sign, which aids in
augmenting heat transfer on the louver surface. The top surface
of the louver in the transition region experiences large velocities
in the vicinity of the surface and exhibits higher heat transfer
coefficients than the bottom surface. Air Conditioning and Refrigeration Project 9
A new generalized domain decomposition strategy for the efficient parallel solution of the FDS-pressure equation. Part I: Theory, Concept and Implementation
Due to steadily increasing problem sizes and accuracy requirements as well as storage restrictions on single-processor systems, the efficient numerical simulation
of realistic fire scenarios can only be obtained on modern high-performance computers based on multi-processor architectures. The transition to those systems
requires the elaborate parallelization of the underlying numerical concepts which must guarantee the same result as a potentially corresponding serial execution and preserve the convergence order of the original serial method. Because
of its low degree of inherent parallelism, the efficient parallelization of the elliptic pressure equation in particular is still a major challenge in many simulation programs for fire-induced flows such as the Fire Dynamics Simulator (FDS). In order to avoid losses of accuracy or numerical instabilities, the parallelization process must take into account the strong global character of the physical pressure. The current parallel FDS solver is based on a relatively coarse-grained parallelization concept which cannot guarantee these requirements in all cases.
Therefore, an alternative parallel pressure solver, ScaRC, is proposed which ensures a high degree of global coupling and a good computational performance at the same time. Part I explains the theory, concept and implementation of this
new strategy, whereas Part II describes a series of validation and verification tests to prove its correctness.
Domain decomposition preconditioning for the Helmholtz equation: a coarse space based on local Dirichlet-to-Neumann maps
In this thesis, we present a two-level domain decomposition method for the iterative solution of the heterogeneous Helmholtz equation. The Helmholtz equation governs wave propagation and scattering phenomena arising in a wide range of engineering applications. Its discretization with piecewise linear finite elements results in typically large, ill-conditioned, indefinite, and non-Hermitian linear systems of equations, for which standard iterative and direct methods encounter convergence problems. Therefore, specially designed methods are needed. The inherently parallel domain decomposition methods constitute a promising class of preconditioners, as they subdivide the large problems into smaller subproblems and are hence able to cope with many degrees of freedom. An essential element of these methods is a good coarse space. Here, the Helmholtz equation presents a particular challenge, as even slight deviations from the optimal choice can be fatal. We develop a coarse space that is based on local eigenproblems involving the Dirichlet-to-Neumann operator. Our construction is completely automatic, ensuring good convergence rates without the need for parameter tuning. Moreover, it naturally respects local variations in the wave number and is hence also suited for heterogeneous Helmholtz problems. Apart from the question of how to design the coarse space, we also investigate how to incorporate the coarse space into the method. Here too, the fact that the stiffness matrix is non-Hermitian and indefinite constitutes a major challenge. The resulting method is parallel by design, and its efficiency is investigated for two- and three-dimensional homogeneous and heterogeneous numerical examples.