147 research outputs found
Support graph preconditioning for elliptic finite element problems
A relatively new preconditioning technique called support graph preconditioning has
many merits over the traditional incomplete factorization based methods. A major
limitation of this technique is that it is applicable to symmetric diagonally dominant
matrices only. This work presents a technique that can be used to transform
the symmetric positive definite matrices arising from elliptic finite element problems
into symmetric diagonally dominant M-matrices. The basic idea is to approximate
the element gradient matrix by taking the gradients along chosen edges, whose unit
vectors form a new coordinate system. For Lagrangian elements, the rows of the
element gradient matrix in this new coordinate system are scaled edge vectors, thus
a diagonally dominant symmetric semidefinite M-matrix can be generated to approximate
the element stiffness matrix. Depending on the element type, one or more
such coordinate systems are required to obtain a global nonsingular M-matrix. Since
such approximation takes place at the element level, the degradation in the quality
of the preconditioner is only a small constant factor independent of the size of the
problem. This technique of element coordinate transformations applies to a variety of
first order Lagrangian elements. Combination of this technique and other techniques
enables us to construct an M-matrix preconditioner for a wide range of second order
elliptic problems even with higher order elements. Another contribution of this work is the proposal of a new variant of Vaidya’s
support graph preconditioning technique called modified domain partitioned support
graph preconditioners. Numerical experiments are conducted for various second order
elliptic finite element problems, along with performance comparison to the incomplete
factorization based preconditioners. Results show that these support graph preconditioners
are superior when solving ill-conditioned problems. In addition, the domain
partition feature provides inherent parallelism, and initial experiments show good
potential for parallelization and scalability of these preconditioners.
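As a small illustration of the property this abstract targets, the sketch below checks whether a matrix is a symmetric, weakly diagonally dominant matrix with positive diagonal and non-positive off-diagonal entries. It is not the element coordinate transformation described above; the function names `is_sdd_m_matrix` and `p1_stiffness` are hypothetical, and the linear-triangle Laplace stiffness matrix is an assumed model problem chosen only to show why a transformation is needed when a mesh contains obtuse angles.

```python
import numpy as np


def is_sdd_m_matrix(a, tol=1e-12):
    """Structural check (illustrative): symmetric, positive diagonal,
    non-positive off-diagonals, and weak row diagonal dominance."""
    a = np.asarray(a, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        return False
    if not np.allclose(a, a.T, atol=tol):
        return False
    diag = np.diag(a)
    off = a - np.diag(diag)
    if np.any(diag <= 0) or np.any(off > tol):
        return False
    return bool(np.all(diag + tol >= np.abs(off).sum(axis=1)))


def p1_stiffness(verts):
    """Laplace stiffness matrix of a linear (P1) triangle (assumed model problem)."""
    v = np.asarray(verts, dtype=float)  # shape (3, 2)
    # Unscaled gradient components of the three barycentric basis functions.
    b = np.array([v[1, 1] - v[2, 1], v[2, 1] - v[0, 1], v[0, 1] - v[1, 1]])
    c = np.array([v[2, 0] - v[1, 0], v[0, 0] - v[2, 0], v[1, 0] - v[0, 0]])
    # Twice the signed area of the triangle.
    area2 = (v[1, 0] - v[0, 0]) * (v[2, 1] - v[0, 1]) \
        - (v[2, 0] - v[0, 0]) * (v[1, 1] - v[0, 1])
    return (np.outer(b, b) + np.outer(c, c)) / (2.0 * abs(area2))


right_triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
obtuse_triangle = [(0.0, 0.0), (1.0, 0.0), (-1.0, 1.0)]

print(is_sdd_m_matrix(p1_stiffness(right_triangle)))
print(is_sdd_m_matrix(p1_stiffness(obtuse_triangle)))
```

The right triangle passes the check, while the obtuse triangle produces a positive off-diagonal entry and fails it, which is the kind of element an M-matrix approximation has to handle.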
Development of scalable linear solvers for engineering applications
The numerical simulation of modern engineering problems can easily incorporate millions or even billions of unknowns. In several applications, particularly those with diffusive character, sparse linear systems with symmetric positive definite (SPD) matrices need to be solved, and multilevel methods represent common choices for the role of iterative solvers or preconditioners. The weak scalability shown by those techniques is one of the main reasons for their popularity, since it allows the solution of linear systems of growing size without requiring a substantial increase in the computational time and number of iterations. On the other hand, single-level preconditioners such as the adaptive Factorized Sparse Approximate Inverse (aFSAI) might be attractive for reaching strong scalability due to their simpler setup. In this thesis, we propose four multilevel preconditioners based on aFSAI targeting the efficient solution of ill-conditioned SPD systems through parallel computing. The first two novel methods, namely Block Tridiagonal FSAI (BTFSAI) and Domain Decomposition FSAI (DDFSAI), rely on graph reordering techniques and approximate block factorizations carried out by aFSAI. Then, we introduce an extension of the previous techniques called the Multilevel Factorization with Low-Rank corrections (MFLR) that ensures positive definiteness of the Schur complements and improves their approximation with the aid of tall-and-skinny correction matrices. Lastly, we present the adaptive Smoothing and Prolongation Algebraic MultiGrid (aSPAMG) preconditioner, which belongs to the adaptive AMG family and introduces the use of aFSAI as a flexible smoother, three strategies for uncovering the near-null space of the system matrix, and two new approaches to dynamically compute the prolongation operator. We assess the performance of the proposed preconditioners through the solution of a set of model problems along with real-world engineering test cases.
Moreover, we perform comparisons to other approaches such as aFSAI, ILU (ILUPACK), and BoomerAMG (HYPRE), showing that our new methods prove comparable, if not superior, in many test cases.
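The distinction between weak and strong scalability drawn in this abstract can be made concrete with the usual efficiency definitions. The sketch below is only illustrative: the function names and the timings are hypothetical and are not taken from the thesis.

```python
def strong_scaling_efficiency(t_serial, t_parallel, n_procs):
    """Fixed total problem size: ideal time on n_procs is t_serial / n_procs."""
    return t_serial / (n_procs * t_parallel)


def weak_scaling_efficiency(t_base, t_scaled):
    """Fixed problem size per process: ideal time stays equal to t_base."""
    return t_base / t_scaled


# Hypothetical timings in seconds, for illustration only.
strong = strong_scaling_efficiency(t_serial=100.0, t_parallel=30.0, n_procs=4)
weak = weak_scaling_efficiency(t_base=10.0, t_scaled=12.5)
print(f"strong scaling efficiency: {strong:.3f}")
print(f"weak scaling efficiency:   {weak:.3f}")
```

A weakly scalable preconditioner keeps the second ratio close to 1 as the problem and the process count grow together, while strong scalability concerns the first ratio for a problem of fixed size.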
An algebraic multigrid method for mixed discretizations of the Navier-Stokes equations
Algebraic multigrid (AMG) preconditioners are considered for discretized
systems of partial differential equations (PDEs) where unknowns associated with
different physical quantities are not necessarily co-located at mesh points.
Specifically, we investigate a mixed finite element discretization of
the incompressible Navier-Stokes equations where the number of velocity nodes
is much greater than the number of pressure nodes. Consequently, some velocity
degrees-of-freedom (dofs) are defined at spatial locations where there are no
corresponding pressure dofs. Thus, AMG approaches leveraging this co-located
structure are not applicable. This paper instead proposes an automatic AMG
coarsening that mimics certain pressure/velocity dof relationships of the
discretization. The main idea is to first automatically define coarse
pressures in a somewhat standard AMG fashion and then to carefully (but
automatically) choose coarse velocity unknowns so that the spatial location
relationship between pressure and velocity dofs resembles that on the finest
grid. To define coefficients within the inter-grid transfers, an energy
minimization AMG (EMIN-AMG) is utilized. EMIN-AMG is not tied to specific
coarsening schemes and grid transfer sparsity patterns, and so it is applicable
to the proposed coarsening. Numerical results highlighting solver performance
are given on Stokes and incompressible Navier-Stokes problems.
Comment: Submitted to a journa
Performance Modeling and Prediction for the Scalable Solution of Partial Differential Equations on Unstructured Grids
This dissertation studies the sources of poor performance in scientific computing codes based on partial differential equations (PDEs), which typically perform at a computational rate well below other scientific simulations (e.g., those with dense linear algebra or N-body kernels) on modern architectures with deep memory hierarchies. We identify that the primary factors responsible for this relatively poor performance are: insufficient available memory bandwidth, low ratio of work to data size (good algorithmic efficiency), and nonscaling cost of synchronization and gather/scatter operations (for a fixed problem size scaling). This dissertation also illustrates how to reuse the legacy scientific and engineering software within a library framework.
Specifically, a three-dimensional unstructured grid incompressible Euler code from NASA has been parallelized with the Portable Extensible Toolkit for Scientific Computing (PETSc) library for distributed memory architectures. Using this newly instrumented code (called PETSc-FUN3D) as an example of a typical PDE solver, we demonstrate some strategies that are effective in tolerating the latencies arising from the hierarchical memory system and the network. Even on a single processor from each of the major contemporary architectural families, the PETSc-FUN3D code runs from 2.5 to 7.5 times faster than the legacy code on a medium-sized data set (with approximately 10^5 degrees of freedom). The major source of performance improvement is the increased locality in data reference patterns achieved through blocking, interlacing, and edge reordering. To explain these performance gains, we provide simple performance models based on memory bandwidth and instruction issue rates.
Experimental evidence, in terms of translation lookaside buffer (TLB) and data cache miss rates, achieved memory bandwidth, and graduated floating point instructions per memory reference, is provided through accurate measurements with hardware counters. The performance models and experimental results motivate algorithmic and software practices that lead to improvements in both parallel scalability and per-node performance. We identify the bottlenecks to scalability (algorithmic as well as implementation) for a fixed-size problem when the number of processors grows to several thousand (the expected level of concurrency on terascale architectures). We also evaluate the hybrid programming model (mixed distributed/shared) from a performance standpoint.
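To give a feel for what a performance model based on memory bandwidth looks like, the sketch below estimates an upper bound on the floating point rate of a sparse matrix-vector product in CSR format. This is a generic roofline-style estimate, not the dissertation's own model: the function names, the traffic assumptions (each array streamed from memory exactly once), and the machine numbers are all hypothetical.

```python
def csr_spmv_intensity(nnz, nrows, value_bytes=8, index_bytes=4):
    """Flops per byte for y = A @ x with A in CSR format.

    Assumes (optimistically) that values, column indices, row pointers,
    the source vector and the result vector each cross the memory bus once.
    """
    flops = 2.0 * nnz  # one multiply and one add per stored nonzero
    bytes_moved = (
        nnz * (value_bytes + index_bytes)  # matrix values and column indices
        + (nrows + 1) * index_bytes        # row pointer array
        + 2 * nrows * value_bytes          # read x once, write y once
    )
    return flops / bytes_moved


def attainable_flops(peak_flops, mem_bandwidth, flops_per_byte):
    """Attainable rate is capped by compute peak and by memory traffic."""
    return min(peak_flops, mem_bandwidth * flops_per_byte)


# Hypothetical machine: 1 Tflop/s peak, 100 GB/s sustained memory bandwidth.
intensity = csr_spmv_intensity(nnz=7000, nrows=1000)
bound = attainable_flops(peak_flops=1.0e12, mem_bandwidth=1.0e11,
                         flops_per_byte=intensity)
print(f"arithmetic intensity: {intensity:.3f} flop/byte")
print(f"bandwidth-bound rate: {bound / 1e9:.1f} Gflop/s")
```

With these assumed numbers the bound sits far below the compute peak, which is consistent with the abstract's observation that memory bandwidth, rather than floating point capability, limits such kernels.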
Scalable domain decomposition methods for finite element approximations of transient and electromagnetic problems
The main object of study of this thesis is the development of scalable and robust solvers based on domain decomposition (DD) methods for the linear systems arising from the finite element (FE) discretization of transient and electromagnetic problems.
The thesis commences with a theoretical review of the curl-conforming edge (or Nédélec) FEs of the first kind and a comprehensive description of a general implementation strategy for h- and p-adaptive elements of arbitrary order on tetrahedral and hexahedral non-conforming meshes. Then, a novel balancing domain decomposition by constraints (BDDC) preconditioner that is robust for multi-material and/or heterogeneous problems posed in curl-conforming spaces is presented. The new method, in contrast to existing approaches, is based on the definition of the ingredients of the preconditioner according to the physical coefficients of the problem and does not require spectral information. The result is a robust and highly scalable preconditioner that preserves the simplicity of the original BDDC method.
When dealing with transient problems, the time direction itself offers an opportunity for further parallelization. Aiming to design scalable space-time solvers, parallel-in-time methods for linear and non-linear ordinary differential equations (ODEs) are first proposed, based on efficient (non-linear) Schur complement solvers for a multilevel partition of the time interval. Then, these ideas are combined with DD concepts in order to design a two-level preconditioner as an extension to space-time of the BDDC method. The key ingredients for these new methods are defined such that they preserve the time causality, i.e., information only travels from the past to the future. The proposed schemes are weakly scalable in time and space-time, i.e., one can efficiently exploit increasing computational resources to solve more time steps in (approximately) the same time-to-solution.
All the developments presented herein are motivated by the driving application of the thesis, the 3D simulation of the low-frequency electromagnetic response of High Temperature Superconductors (HTS). Throughout the document, an exhaustive set of numerical experiments, which includes the simulation of a realistic 3D HTS problem, is performed in order to validate the suitability and assess the parallel performance of the High Performance Computing (HPC) implementation of the proposed algorithms.
Postprint (published version)