
    Composing Scalable Nonlinear Algebraic Solvers

    Most efficient linear solvers use composable algorithmic components, with the most common model being the combination of a Krylov accelerator and one or more preconditioners. A similar set of concepts may be used for nonlinear algebraic systems, where nonlinear composition of different nonlinear solvers may significantly improve the time to solution. We describe the basic concepts of nonlinear composition and preconditioning and present a number of solvers applicable to nonlinear partial differential equations. We have developed a software framework in order to easily explore the possible combinations of solvers. We show that the performance gains from using composed solvers can be substantial compared with gains from standard Newton-Krylov methods. Comment: 29 pages, 14 figures, 13 tables
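    As a toy illustration of nonlinear composition, the sketch below composes a Picard fixed-point step with a Newton step on a scalar equation. This is not the paper's framework (which builds on PETSc's SNES solvers); the test problem and the particular inner-outer pairing are assumptions chosen only to make the pattern concrete.

```python
import math

def f(x):
    # Residual of the fixed-point problem x = cos(x)
    return x - math.cos(x)

def picard_step(x):
    # One fixed-point (Picard) iteration
    return math.cos(x)

def newton_step(x):
    # One Newton iteration on f; f'(x) = 1 + sin(x)
    return x - f(x) / (1.0 + math.sin(x))

def composed_solve(x0, tol=1e-12, max_it=50):
    """Multiplicative composition: each outer iteration applies one
    Picard step, then one Newton step on the result."""
    x = x0
    for it in range(max_it):
        if abs(f(x)) < tol:
            return x, it
        x = newton_step(picard_step(x))
    return x, max_it

root, iters = composed_solve(1.0)
print(root, iters)
```

The same pattern generalizes to systems: the "inner" solver acts as a nonlinear preconditioner whose output the "outer" solver accelerates.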

    Domain Decomposition preconditioning for high-frequency Helmholtz problems with absorption

    In this paper we give new results on domain decomposition preconditioners for GMRES when computing piecewise-linear finite-element approximations of the Helmholtz equation $-\Delta u - (k^2 + \mathrm{i}\varepsilon)u = f$, with absorption parameter $\varepsilon \in \mathbb{R}$. Multigrid approximations of this equation with $\varepsilon \neq 0$ are commonly used as preconditioners for the pure Helmholtz case ($\varepsilon = 0$). However, a rigorous theory for such (so-called "shifted Laplace") preconditioners, either for the pure Helmholtz equation or even the absorptive equation ($\varepsilon \neq 0$), is still missing. We present a new theory for the absorptive equation that provides rates of convergence for (left- or right-) preconditioned GMRES, via estimates of the norm and field of values of the preconditioned matrix. This theory uses a $k$- and $\varepsilon$-explicit coercivity result for the underlying sesquilinear form and shows, for example, that if $|\varepsilon| \sim k^2$, then classical overlapping additive Schwarz will perform optimally for the absorptive problem, provided the subdomain and coarse mesh diameters are carefully chosen. Extensive numerical experiments are given that support the theoretical results. The theory for the absorptive case gives insight into how its domain decomposition approximations perform as preconditioners for the pure Helmholtz case $\varepsilon = 0$. At the end of the paper we propose a (scalable) multilevel preconditioner for the pure Helmholtz problem that has an empirical computation time complexity of about $\mathcal{O}(n^{4/3})$ for solving finite element systems of size $n = \mathcal{O}(k^3)$, where we have chosen the mesh diameter $h \sim k^{-3/2}$ to avoid the pollution effect. Experiments on problems with $h \sim k^{-1}$, i.e. a fixed number of grid points per wavelength, are also given.
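    A minimal 1-D sketch of the shifted-Laplace idea: the absorptive operator with shift of size k^2 is inverted exactly and used as a preconditioner for the pure Helmholtz problem. This is an illustration under heavy assumptions (finite differences instead of finite elements, a stationary Richardson iteration instead of GMRES, no domain decomposition), not the paper's method.

```python
# 1-D model: -u'' - k^2 u = f on (0,1), u(0)=u(1)=0, second-order FD.
n, k = 50, 5.0
h = 1.0 / (n + 1)
eps = k**2                      # absorption shift, |eps| ~ k^2 as in the theory

def apply_A(u):
    """Matrix-vector product with the pure Helmholtz operator (eps = 0)."""
    v = [0j] * n
    for i in range(n):
        left  = u[i - 1] if i > 0 else 0j
        right = u[i + 1] if i < n - 1 else 0j
        v[i] = (-left + 2 * u[i] - right) / h**2 - k**2 * u[i]
    return v

def solve_shifted(r):
    """Thomas algorithm for the absorptive operator A + i*eps*I."""
    a = -1.0 / h**2                      # constant off-diagonal
    d = 2.0 / h**2 - k**2 + 1j * eps     # shifted diagonal
    c = [0j] * n
    g = [0j] * n
    c[0], g[0] = a / d, r[0] / d
    for i in range(1, n):
        piv = d - a * c[i - 1]
        c[i] = a / piv
        g[i] = (r[i] - a * g[i - 1]) / piv
    x = [0j] * n
    x[-1] = g[-1]
    for i in range(n - 2, -1, -1):
        x[i] = g[i] - c[i] * x[i + 1]
    return x

b = [1.0 + 0j] * n                       # arbitrary right-hand side
u = [0j] * n

def resnorm(u):
    Au = apply_A(u)
    return max(abs(b[i] - Au[i]) for i in range(n))

r0 = resnorm(u)
for _ in range(300):                     # preconditioned Richardson iteration
    Au = apply_A(u)
    r = [b[i] - Au[i] for i in range(n)]
    du = solve_shifted(r)
    u = [u[i] + du[i] for i in range(n)]
print(resnorm(u) / r0)
```

For this small example every eigenvalue of the iteration matrix has modulus below one, so the shifted solve drives the pure-Helmholtz residual down steadily; the paper's interest is in replacing the exact shifted solve by cheap domain decomposition approximations.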

    Solution strategies for nonlinear conservation laws

    Nonlinear conservation laws form the basis for models of a wide range of physical phenomena. Finding an optimal strategy for solving these problems can be challenging, and a good strategy for one problem may fail spectacularly for others. As different problems have different challenging features, exploiting knowledge about the problem structure is a key factor in achieving an efficient solution strategy. Most strategies found in the literature for solving nonlinear problems involve a linearization step, usually Newton's method, which replaces the original nonlinear problem by an iteration consisting of a series of linear problems. A large effort is then spent on finding a good strategy for solving these linear problems, which involves choosing suitable preconditioners and linear solvers. This approach is in many cases a good choice, and a multitude of different methods have been developed. However, the linearization step to some degree involves a loss of information about the original problem. This is not necessarily critical, but in many cases the structure of the nonlinear problem can be exploited to a larger extent than is possible when working solely on the linearized problem. This may involve knowledge about dominating physical processes and, specifically, whether a process is near equilibrium. Nonlinear preconditioning techniques developed in recent years offer attractive features such as automatic localization of computations to the parts of the problem domain with the highest degree of nonlinearity. In the present work, these methods are further refined to obtain a framework for nonlinear preconditioning that also takes equilibrium information into account. This framework is developed mainly in the context of porous media, but in a general manner that allows application to a wide range of problems. A scalability study shows that the method is scalable for challenging two-phase flow problems.
It is also demonstrated for nonlinear elasticity problems. Some models arising from nonlinear conservation laws are best solved using completely different strategies from the approach outlined above. One such example is found in the field of surface gravity waves. For special types of nonlinear waves, such as solitary waves and undular bores, the well-known Korteweg-de Vries (KdV) equation has been shown to be a suitable model. This equation has many interesting properties not typical of nonlinear equations, which may be exploited in the solver, and strategies usually reserved for linear problems may be applied. This work also includes a comparative study of two discretization methods with highly different properties for this equation.
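    The localization idea can be caricatured on a hypothetical decoupled nonlinear system: cells whose residual already sits at equilibrium are skipped, so Newton work concentrates where the nonlinearity is active. The problem, tolerances, and cell layout below are illustrative assumptions, not the thesis framework.

```python
def residual(u, b):
    # Componentwise residual of the decoupled system u_i + u_i^3 = b_i
    return [u_i + u_i**3 - b_i for u_i, b_i in zip(u, b)]

def local_newton(u_i, b_i, tol=1e-12):
    # Scalar Newton solve for one cell; returns solution and iteration count
    it = 0
    while abs(u_i + u_i**3 - b_i) > tol:
        u_i -= (u_i + u_i**3 - b_i) / (1.0 + 3.0 * u_i**2)
        it += 1
    return u_i, it

# Most cells start at equilibrium (b_i = 0, u_i = 0); the strong
# nonlinearity is confined to three cells.
n = 100
b = [0.0] * n
for j in (10, 50, 90):
    b[j] = 5.0
u = [0.0] * n

work = 0
r = residual(u, b)
for i in range(n):
    if abs(r[i]) > 1e-12:        # skip cells already in equilibrium
        u[i], it = local_newton(u[i], b[i])
        work += it
print(work)
```

Only the three active cells consume Newton iterations; a global solve would touch all one hundred. Real nonlinear preconditioners do this adaptively on coupled systems, which is where the equilibrium information in the framework comes in.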

    Performance Modeling and Prediction for the Scalable Solution of Partial Differential Equations on Unstructured Grids

    This dissertation studies the sources of poor performance in scientific computing codes based on partial differential equations (PDEs), which typically run at a computational rate well below that of other scientific simulations (e.g., those with dense linear algebra or N-body kernels) on modern architectures with deep memory hierarchies. We identify the primary factors responsible for this relatively poor performance as: insufficient available memory bandwidth, a low ratio of work to data size (good algorithmic efficiency), and the nonscaling cost of synchronization and gather/scatter operations (under fixed-problem-size scaling). This dissertation also illustrates how to reuse legacy scientific and engineering software within a library framework. Specifically, a three-dimensional unstructured grid incompressible Euler code from NASA has been parallelized with the Portable, Extensible Toolkit for Scientific Computation (PETSc) library for distributed memory architectures. Using this newly instrumented code (called PETSc-FUN3D) as an example of a typical PDE solver, we demonstrate some strategies that are effective in tolerating the latencies arising from the hierarchical memory system and the network. Even on a single processor from each of the major contemporary architectural families, the PETSc-FUN3D code runs 2.5 to 7.5 times faster than the legacy code on a medium-sized data set (with approximately 10^5 degrees of freedom). The major source of performance improvement is the increased locality in data reference patterns achieved through blocking, interlacing, and edge reordering. To explain these performance gains, we provide simple performance models based on memory bandwidth and instruction issue rates. Experimental evidence, in terms of translation lookaside buffer (TLB) and data cache miss rates, achieved memory bandwidth, and graduated floating point instructions per memory reference, is provided through accurate measurements with hardware counters.
The performance models and experimental results motivate algorithmic and software practices that lead to improvements in both parallel scalability and per-node performance. We identify the bottlenecks to scalability (algorithmic as well as implementation) for a fixed-size problem when the number of processors grows to several thousand (the expected level of concurrency on terascale architectures). We also evaluate the hybrid (mixed distributed/shared memory) programming model from a performance standpoint.
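    The kind of bandwidth-limited bound such models produce can be sketched as a simple roofline estimate. The peak rate, bandwidth, and arithmetic intensities below are illustrative assumptions, not measurements from the dissertation.

```python
# Attainable rate = min(peak flop rate, memory bandwidth * arithmetic intensity).
peak_gflops = 4.0        # assumed peak floating-point rate (Gflop/s)
bandwidth_gbs = 2.0      # assumed sustained memory bandwidth (GB/s)

def attainable_gflops(flops_per_byte):
    # Roofline bound for a kernel with the given arithmetic intensity
    return min(peak_gflops, bandwidth_gbs * flops_per_byte)

# Sparse matrix-vector products move on the order of 6 bytes per flop,
# so their intensity is low and they are bandwidth-bound; dense kernels
# with high intensity hit the compute peak instead.
spmv = attainable_gflops(1.0 / 6.0)
dgemm = attainable_gflops(8.0)
print(spmv, dgemm)
```

Under these assumed numbers the sparse kernel is capped near 0.33 Gflop/s by bandwidth while the dense kernel reaches the 4 Gflop/s peak, which is the qualitative gap the dissertation's models explain.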

    Large-Eddy Simulations of Flow and Heat Transfer in Complex Three-Dimensional Multilouvered Fins

    The paper describes the computational procedure and results from large-eddy simulations in a complex three-dimensional louver geometry. The three-dimensionality in the louver geometry occurs along the height of the fin, where the angled louver transitions to the flat landing and joins with the tube surface. The transition region is characterized by a swept leading edge and decreasing flow area between louvers. Preliminary results show a compact, high-energy vortex jet forming in this region. The jet forms in the vicinity of the louver junction with the flat landing and is drawn under the louver in the transition region. Its interaction with the surface of the louver produces vorticity of the opposite sign, which aids in augmenting heat transfer on the louver surface. The top surface of the louver in the transition region experiences large velocities in the vicinity of the surface and exhibits higher heat transfer coefficients than the bottom surface.
    Air Conditioning and Refrigeration Project 9

    A new generalized domain decomposition strategy for the efficient parallel solution of the FDS-pressure equation. Part I: Theory, Concept and Implementation

    Due to steadily increasing problem sizes and accuracy requirements, as well as storage restrictions on single-processor systems, the efficient numerical simulation of realistic fire scenarios can only be performed on modern high-performance computers based on multi-processor architectures. The transition to those systems requires the elaborate parallelization of the underlying numerical concepts, which must guarantee the same result as a corresponding serial execution and preserve the convergence order of the original serial method. Because of its low degree of inherent parallelism, the efficient parallelization of the elliptic pressure equation in particular is still a big challenge in many simulation programs for fire-induced flows, such as the Fire Dynamics Simulator (FDS). In order to avoid losses of accuracy or numerical instabilities, the parallelization process must take into account the strong global character of the physical pressure. The current parallel FDS solver is based on a relatively coarse-grained parallelization concept which cannot guarantee these requirements in all cases. Therefore, an alternative parallel pressure solver, ScaRC, is proposed which ensures a high degree of global coupling and good computational performance at the same time. Part I explains the theory, concept, and implementation of this new strategy, whereas Part II describes a series of validation and verification tests to prove its correctness.
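    For intuition about why overlap and global coupling matter, here is a minimal 1-D sketch of a classical alternating (multiplicative) Schwarz iteration for a Poisson model problem. This is a textbook method, not the ScaRC algorithm itself, and the grid size, subdomain split, and overlap are arbitrary choices.

```python
# -u'' = 1 on (0,1), u(0) = u(1) = 0, second-order finite differences.
n = 49
h = 1.0 / (n + 1)
f = [1.0] * n

def local_solve(u, lo, hi):
    """Exact Dirichlet solve on cells lo..hi, boundary data taken from
    the current global iterate (or the physical boundary value 0)."""
    m = hi - lo + 1
    a = -1.0 / h**2                 # constant off-diagonal
    d = 2.0 / h**2                  # constant diagonal
    rhs = [f[lo + j] for j in range(m)]
    rhs[0]  -= a * (u[lo - 1] if lo > 0 else 0.0)
    rhs[-1] -= a * (u[hi + 1] if hi < n - 1 else 0.0)
    c = [0.0] * m
    g = [0.0] * m
    c[0], g[0] = a / d, rhs[0] / d
    for j in range(1, m):           # Thomas algorithm, forward sweep
        piv = d - a * c[j - 1]
        c[j] = a / piv
        g[j] = (rhs[j] - a * g[j - 1]) / piv
    x = [0.0] * m
    x[-1] = g[-1]
    for j in range(m - 2, -1, -1):  # back substitution
        x[j] = g[j] - c[j] * x[j + 1]
    for j in range(m):
        u[lo + j] = x[j]

u = [0.0] * n
for _ in range(40):                 # alternating Schwarz sweeps
    local_solve(u, 0, 29)           # subdomain 1
    local_solve(u, 20, 48)          # subdomain 2 (overlap of 10 cells)

# Exact solution u(x) = x(1-x)/2; the 3-point stencil reproduces it exactly.
exact = [0.5 * (i + 1) * h * (1.0 - (i + 1) * h) for i in range(n)]
err = max(abs(u[i] - exact[i]) for i in range(n))
print(err)
```

Information crosses the domain only through the overlap, so the convergence rate degrades as subdomains multiply; that is the scalability gap a globally coupled solver like ScaRC (or a coarse level) is designed to close.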

    Domain decomposition preconditioning for the Helmholtz equation: a coarse space based on local Dirichlet-to-Neumann maps

    In this thesis, we present a two-level domain decomposition method for the iterative solution of the heterogeneous Helmholtz equation. The Helmholtz equation governs wave propagation and scattering phenomena arising in a wide range of engineering applications. Its discretization with piecewise linear finite elements results in typically large, ill-conditioned, indefinite, and non-Hermitian linear systems of equations, for which standard iterative and direct methods encounter convergence problems. Therefore, specially designed methods are needed. The inherently parallel domain decomposition methods constitute a promising class of preconditioners, as they subdivide the large problems into smaller subproblems and are hence able to cope with many degrees of freedom. An essential element of these methods is a good coarse space. Here, the Helmholtz equation presents a particular challenge, as even slight deviations from the optimal choice can be fatal. We develop a coarse space that is based on local eigenproblems involving the Dirichlet-to-Neumann operator. Our construction is completely automatic, ensuring good convergence rates without the need for parameter tuning. Moreover, it naturally respects local variations in the wave number and is hence suited also for heterogeneous Helmholtz problems. Apart from the question of how to design the coarse space, we also investigate the question of how to incorporate the coarse space into the method. Here too, the fact that the stiffness matrix is non-Hermitian and indefinite constitutes a major challenge. The resulting method is parallel by design, and its efficiency is investigated for two- and three-dimensional homogeneous and heterogeneous numerical examples.