    Chaotic multigrid methods for the solution of elliptic equations

    Supercomputer power has been doubling approximately every 14 months for several decades, increasing the capabilities of scientific modelling at a similar rate. However, to utilize these machines effectively for applications such as computational fluid dynamics, improvements to strong scalability are required. Here, the particular focus is on semi-implicit, viscous-flow CFD, where the largest bottleneck to strong scalability is the parallel solution of the linear pressure-correction equation — an elliptic Poisson equation. State-of-the-art linear solvers, such as Krylov subspace or multigrid methods, provide excellent numerical performance for elliptic equations, but do not scale efficiently due to frequent synchronization between processes. Complete desynchronization is possible for basic, Jacobi-like solvers using the theory of ‘chaotic relaxations’. These non-deterministic, chaotic solvers scale superbly, as demonstrated herein, but lack the numerical performance to converge elliptic equations — even with the relatively lax convergence requirements of the example CFD application. However, these chaotic principles can also be applied to multigrid solvers. In this paper, a ‘chaotic-cycle’ algebraic multigrid method is described and implemented as an open-source library. It is tested on a model Poisson equation, and also within the context of CFD. Two CFD test cases are used: the canonical lid-driven cavity flow and the flow simulation of a ship (KVLCC2). The chaotic-cycle multigrid shows good scalability and numerical performance compared to classical V-, W- and F-cycles. On 2048 cores the chaotic-cycle multigrid solver performs up to faster than Flexible-GMRES and faster than classical V-cycle multigrid. Further improvements to chaotic-cycle multigrid can be made, relating to coarse-grid communications and desynchronized residual computations. 
It is expected that the chaotic-cycle multigrid could be applied to other scientific fields, wherever a scalable elliptic-equation solver is required.

    Nonlinear FETI-DP and BDDC Methods

    In the simulation of deformation processes in material science, the microscopic material structure often has to be considered, as in the simulation of modern high-strength steels. A straightforward finite element discretization of the complete deformed body that resolves the microscopic structure leads to very large nonlinear problems, and a solution is out of reach even on modern supercomputers. In homogenization approaches, such as the computational scale-bridging approach FE2, the macroscopic scale of the deformed object is decoupled from the microscopic scale of the material structure. These approaches only consider the microstructure in a localized fashion on independent and parallel representative volume elements (RVEs). This introduces massive parallelism on the macroscopic level and is thus ideal for modern computer architectures with large numbers of parallel computational cores. Nevertheless, the discretization of an RVE can still result in large nonlinear problems, and thus highly scalable parallel solvers are necessary. In this context, nonlinear FETI-DP (Finite Element Tearing and Interconnecting - Dual-Primal) and BDDC (Balancing Domain Decomposition by Constraints) domain decomposition methods are discussed in this thesis; these are parallel solution methods for nonlinear problems arising from a finite element discretization. These approaches can be viewed as strategies to further localize the computational work and to extend the parallel scalability of classical FETI-DP and BDDC methods towards extreme-scale supercomputers. Variants providing an inexact solution of the FETI-DP coarse problem are also considered in this thesis, combining two successful paradigms, i.e., nonlinear domain decomposition and AMG (Algebraic Multigrid). An efficient implementation of the resulting inexact reduced Nonlinear-FETI-DP-1 method is presented and scalability beyond 200,000 computational cores is shown. 
Finally, a highly scalable FE2 implementation using recent inexact reduced FETI-DP methods to solve the RVE problems on the microscopic level is presented, and scalability on all 458,752 cores of the JUQUEEN BlueGene/Q system at Forschungszentrum Jülich is demonstrated.

    Algebraic analysis of aggregation-based multigrid

    A convergence analysis of two-grid methods based on coarsening by (unsmoothed) aggregation is presented. For diagonally dominant symmetric (M-)matrices, it is shown that the analysis can be conducted locally; that is, the convergence factor can be bounded above by computing, separately for each aggregate, a parameter that in some sense measures its quality. The procedure is purely algebraic and can be used to control a posteriori the quality of automatic coarsening algorithms. Assuming the aggregation pattern is sufficiently regular, it is further shown that the resulting bound is asymptotically sharp for a large class of elliptic boundary value problems, including problems with variable and discontinuous coefficients. In particular, the analysis of typical examples shows that the convergence rate is insensitive to discontinuities under some reasonable assumptions on the aggregation scheme.

    Analysis of an aggregation‐based algebraic two‐grid method for a rotated anisotropic diffusion problem

    A two‐grid convergence analysis based on the paper [Algebraic analysis of aggregation‐based multigrid, by A. Napov and Y. Notay, Numer. Lin. Alg. Appl. 18 (2011), pp. 539–564] is derived for various aggregation schemes applied to a finite element discretization of a rotated anisotropic diffusion equation. As expected, it is shown that the best aggregation scheme is one in which aggregates are aligned with the anisotropy. In practice, however, this is not what automatic aggregation procedures do. We suggest approaches for determining appropriate aggregates based on eigenvectors associated with small eigenvalues of a block splitting matrix or based on minimizing a quantity related to the spectral radius of the iteration matrix.

    Analysis of an Aggregation-based Algebraic Multigrid Method and its Parallelization

    Thesis (Ph.D.)--University of Washington, 2014. The interests of this thesis are twofold. First, a two-grid convergence analysis based on the paper [Algebraic analysis of aggregation-based multigrid, by A. Napov and Y. Notay, Numer. Lin. Alg. Appl. 18 (2011), pp. 539-564] is derived for various aggregation schemes applied to a finite element discretization of a rotated anisotropic diffusion equation. As expected, it is shown that the best aggregation scheme is one in which aggregates are aligned with the anisotropy. In practice, however, this is not what automatic aggregation procedures do. We suggest an approach for determining appropriate aggregates based on eigenvectors associated with small eigenvalues of a block splitting matrix. In the second part of the thesis, several issues regarding the parallel implementation of aggregation-based multigrid methods are discussed. The coarsest-grid solve in multigrid cycles has been a bottleneck that prevents parallel multigrid algorithms from attaining good speedup. A parallel direct linear solver (MUMPS) is compared with a few steps of the preconditioned conjugate gradient (PCG) method for solving the coarsest-grid system, with tests run on the TACC Lonestar multi-processor machine. For the preconditioner of the conjugate gradient iterations, a parallel sparse approximate inverse (SAI) algorithm is used to construct an approximate inverse of the original matrix, so that the inherently sequential preconditioner solve is replaced by matrix-vector multiplications. The linear systems tested arise from discretizations of 2D or 3D partial differential equations and are symmetric positive definite. The results show that using PCG on the coarsest grid attains better speedup and better overall performance than MUMPS when the number of processors is greater than about 100. 
The effects of different decompositions of the physical domain (rows/slabs versus blocks/pencils) on the scaling and efficiency of aggregation-based algebraic multigrid are also studied; the blocks/pencils decomposition reduces the amount of communication and hence performs better.