
    Composing Scalable Nonlinear Algebraic Solvers

    The most efficient linear solvers use composable algorithmic components, the most common model being the combination of a Krylov accelerator with one or more preconditioners. A similar set of concepts may be used for nonlinear algebraic systems, where nonlinear composition of different nonlinear solvers may significantly improve the time to solution. We describe the basic concepts of nonlinear composition and preconditioning and present a number of solvers applicable to nonlinear partial differential equations. We have developed a software framework in order to easily explore the possible combinations of solvers. We show that the performance gains from using composed solvers can be substantial compared with gains from standard Newton-Krylov methods.
    Comment: 29 pages, 14 figures, 13 tables
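    The nonlinear composition described above can be illustrated with a minimal sketch (not from the paper; the toy two-equation system and all function names are invented for illustration). Here a cheap nonlinear Gauss-Seidel sweep is composed multiplicatively with an exact Newton step, in the spirit of combining an inner nonlinear solver with an outer one:

    ```python
    import numpy as np

    def F(x):
        # toy nonlinear system F(x) = 0 with solution (1, 2)
        return np.array([x[0]**2 + x[1] - 3.0,
                         x[0] + x[1]**2 - 5.0])

    def J(x):
        # Jacobian of F
        return np.array([[2.0*x[0], 1.0],
                         [1.0, 2.0*x[1]]])

    def newton_step(x):
        # one exact Newton update
        return x - np.linalg.solve(J(x), F(x))

    def gauss_seidel_sweep(x):
        # one nonlinear Gauss-Seidel sweep: solve each equation for its
        # own unknown while holding the other unknown fixed
        x = x.copy()
        x[0] = np.sqrt(max(3.0 - x[1], 0.0))   # from the first equation
        x[1] = np.sqrt(max(5.0 - x[0], 0.0))   # from the second equation
        return x

    def composed_solve(x, tol=1e-10, maxit=50):
        # multiplicative composition: cheap sweep first, then Newton,
        # applied alternately until the residual is small
        for k in range(maxit):
            x = newton_step(gauss_seidel_sweep(x))
            if np.linalg.norm(F(x)) < tol:
                break
        return x, k + 1
    ```

    The sweep damps components Newton handles poorly from a bad initial guess; the framework in the paper explores such combinations systematically rather than hand-coding one.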

    Asynchronous Stabilisation and Assembly Techniques for Additive Multigrid

    Multigrid solvers are among the best solvers in the world, but once applied in the real world there are issues they must overcome. Many multigrid phases exhibit low concurrency. Mesh and matrix assembly are challenging to parallelise and introduce algorithmic latency. Dynamically adaptive codes exacerbate these issues. Multigrid codes require the computation of a cascade of matrices, and dynamic adaptivity means these matrices are recomputed throughout the solve. Existing methods to compute the matrices are expensive and delay the solve. Non-trivial material parameters further increase the cost of accurate equation integration. We propose to assemble all matrix equations as stencils in a delayed element-wise fashion. Early multigrid iterations use cheap geometric approximations, and more accurate updated stencil integrations are computed in parallel with the multigrid cycles. New stencil integrations are evaluated lazily and asynchronously fed to the solver once they become available. They do not delay multigrid iterations. We deploy stencil integrations as parallel tasks that are picked up by cores that would otherwise be idle. Coarse grid solves in multiplicative multigrid also exhibit limited concurrency. Small coarse mesh sizes correspond to small computational workload and require costly synchronisation steps. This acts as a bottleneck and delays solver iterations. Additive multigrid avoids this restriction, but becomes unstable for non-trivial material parameters as additive coarse grid levels tend to overcorrect. This leads to oscillations. We propose a new additive variant, adAFAC-x, with a stabilisation parameter that damps coarse grid corrections to remove oscillations. Per level we solve an additional equation that produces an auxiliary correction. The auxiliary correction can be computed additively to the rest of the solve and uses ideas similar to smoothed aggregation multigrid to anticipate overcorrections. Pipelining techniques allow adAFAC-x to be written using single-touch semantics on a dynamically adaptive mesh.
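    The damping of additive coarse-grid corrections can be sketched on a two-level additive cycle for the 1D Poisson problem (a minimal sketch, not the adAFAC-x scheme itself; the damping weights, grid sizes, and function names are illustrative assumptions). In an additive cycle the smoother and the coarse correction both act on the same residual, so an undamped coarse term can overcorrect smooth error components; a damping factor on the coarse term suppresses this:

    ```python
    import numpy as np

    def poisson(n):
        # 1D Poisson stiffness matrix, Dirichlet boundaries, h = 1/(n+1)
        return (np.diag(2.0*np.ones(n)) - np.diag(np.ones(n-1), 1)
                - np.diag(np.ones(n-1), -1)) * (n + 1)**2

    def additive_two_level(A, b, x, omega_smooth=0.5, omega_coarse=0.5,
                           sweeps=100):
        n = A.shape[0]
        nc = (n - 1) // 2
        # linear interpolation P; full-weighting restriction R = 0.5 * P^T
        P = np.zeros((n, nc))
        for j in range(nc):
            i = 2*j + 1
            P[i-1, j], P[i, j], P[i+1, j] = 0.5, 1.0, 0.5
        R = 0.5 * P.T
        Ac = R @ A @ P                    # Galerkin coarse operator
        D = np.diag(A)
        for _ in range(sweeps):
            r = b - A @ x
            # additive: damped Jacobi smoother and DAMPED coarse-grid
            # correction are both computed from the SAME residual r
            x = (x + omega_smooth * r / D
                   + omega_coarse * (P @ np.linalg.solve(Ac, R @ r)))
        return x
    ```

    With omega_coarse = 1.0 the coarse term plus the smoother can overshoot smooth modes; damping both contributions keeps the combined correction contractive, which is the stabilisation idea the abstract generalises via auxiliary per-level equations.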

    Adaptive control in rollforward recovery for extreme scale multigrid

    With the increasing number of compute components, failures in future exa-scale computer systems are expected to become more frequent. This motivates the study of novel resilience techniques. Here, we extend a recently proposed algorithm-based recovery method for multigrid iterations by introducing an adaptive control. After a fault, the healthy part of the system continues the iterative solution process, while the solution in the faulty domain is reconstructed by an asynchronous on-line recovery. The computations in the faulty and healthy subdomains must be carefully coordinated; in particular, both under- and over-solving must be avoided, since both waste computational resources and therefore increase the overall time-to-solution. To control the local recovery and guarantee an optimal re-coupling, we introduce a stopping criterion based on a mathematical error estimator. It involves hierarchical weighted sums of residuals within the context of uniformly refined meshes and is well-suited to parallel high-performance computing. The re-coupling process is steered by local contributions of the error estimator. We propose and compare two criteria which differ in their weights. Failure scenarios when solving up to 6.9 × 10^11 unknowns on more than 245,766 parallel processes are reported on a state-of-the-art peta-scale supercomputer, demonstrating the robustness of the method.
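    The interplay of fault, local recovery, and a stopping criterion that avoids under- and over-solving can be sketched on a single-node toy (a minimal sketch, not the paper's multigrid method or its hierarchical estimator; the Jacobi solver, the residual-matching criterion, and all names are illustrative assumptions). After a simulated fault wipes a subdomain, local sweeps continue only until the faulty subdomain's residual matches the healthy level, then the subdomains re-couple:

    ```python
    import numpy as np

    def jacobi_sweep(A, b, x, idx):
        # one Jacobi sweep restricted to the unknowns in idx;
        # all other unknowns are held fixed
        xn = x.copy()
        for i in idx:
            xn[i] = (b[i] - A[i] @ x + A[i, i] * x[i]) / A[i, i]
        return xn

    def solve_with_recovery(A, b, faulty, fault_at=50,
                            tol=1e-8, maxit=5000):
        n = len(b)
        x = np.zeros(n)
        faulty = list(faulty)
        healthy = [i for i in range(n) if i not in faulty]
        for k in range(maxit):
            if k == fault_at:
                x[faulty] = 0.0            # fault wipes a subdomain
                # local recovery: iterate only in the faulty subdomain,
                # with healthy boundary values fixed, until the local
                # residual matches the surviving global residual level.
                # Stopping here avoids over-solving; the threshold
                # avoids under-solving at re-coupling.
                r_healthy = np.linalg.norm((b - A @ x)[healthy])
                while np.linalg.norm((b - A @ x)[faulty]) > r_healthy:
                    x = jacobi_sweep(A, b, x, faulty)
            x = jacobi_sweep(A, b, x, range(n))   # re-coupled iteration
            if np.linalg.norm(b - A @ x) < tol:
                break
        return x, k + 1
    ```

    The paper replaces this crude residual comparison with weighted hierarchical error estimators and runs the recovery asynchronously alongside the healthy subdomain's iterations.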