1,545 research outputs found

    A Fast Parallel Poisson Solver on Irregular Domains Applied to Beam Dynamic Simulations

    Full text link
    We discuss the scalable parallel solution of the Poisson equation within a Particle-In-Cell (PIC) code for the simulation of electron beams in particle accelerators of irregular shape. The problem is discretized by Finite Differences. Depending on the treatment of the Dirichlet boundary the resulting system of equations is symmetric or `mildly' nonsymmetric positive definite. In all cases, the system is solved by the preconditioned conjugate gradient algorithm with smoothed aggregation (SA) based algebraic multigrid (AMG) preconditioning. We investigate variants of the implementation of SA-AMG that lead to considerable improvements in the execution times. We demonstrate good scalability of the solver on distributed memory parallel processor with up to 2048 processors. We also compare our SAAMG-PCG solver with an FFT-based solver that is more commonly used for applications in beam dynamics

    Scalable domain decomposition methods for finite element approximations of transient and electromagnetic problems

    Get PDF
    The main object of study of this thesis is the development of scalable and robust solvers based on domain decomposition (DD) methods for the linear systems arising from the finite element (FE) discretization of transient and electromagnetic problems. The thesis commences with a theoretical review of the curl-conforming edge (or Nédélec) FEs of the first kind and a comprehensive description of a general implementation strategy for h- and p- adaptive elements of arbitrary order on tetrahedral and hexahedral non-conforming meshes. Then, a novel balancing domain decomposition by constraints (BDDC) preconditioner that is robust for multi-material and/or heterogeneous problems posed in curl-conforming spaces is presented. The new method, in contrast to existent approaches, is based on the definition of the ingredients of the preconditioner according to the physical coefficients of the problem and does not require spectral information. The result is a robust and highly scalable preconditioner that preserves the simplicity of the original BDDC method. When dealing with transient problems, the time direction offers itself an opportunity for further parallelization. Aiming to design scalable space-time solvers, first, parallel-in-time parallel methods for linear and non-linear ordinary differential equations (ODEs) are proposed, based on (non-linear) Schur complement efficient solvers of a multilevel partition of the time interval. Then, these ideas are combined with DD concepts in order to design a two-level preconditioner as an extension to space-time of the BDDC method. The key ingredients for these new methods are defined such that they preserve the time causality, i.e., information only travels from the past to the future. The proposed schemes are weakly scalable in time and space-time, i.e., one can efficiently exploit increasing computational resources to solve more time steps in (approximately) the same time-to-solution. All the developments presented herein are motivated by the driving application of the thesis, the 3D simulation of the low-frequency electromagnetic response of High Temperature Superconductors (HTS). Throughout the document, an exhaustive set of numerical experiments, which includes the simulation of a realistic 3D HTS problem, is performed in order to validate the suitability and assess the parallel performance of the High Performance Computing (HPC) implementation of the proposed algorithms.L’objecte principal d’estudi d’aquesta tesi és el desenvolupament de solucionadors escalables i robustos basats en mètodes de descomposició de dominis (DD) per a sistemes lineals que sorgeixen en la discretització mitjançant elements finits (FE) de problemes transitoris i electromagnètics. La tesi comença amb una revisió teòrica dels FE d’eix (o de Nédélec) de la primera família i una descripció exhaustiva d’una estratègia d’implementació general per a elements h- i p-adaptatius d’ordre arbitrari en malles de tetraedres i hexaedres noconformes. Llavors, es presenta un nou precondicionador de descomposició de dominis balancejats per restricció (BDDC) que és robust per a problemes amb múltiples materials i/o heterogenis definits en espais curl-conformes. El nou mètode, en contrast amb els enfocaments existents, està basat en la definició dels ingredients del precondicionador segons els coeficients físics del problema i no requereix informació espectral. El resultat és un precondicionador robust i escalable que preserva la simplicitat del mètode original BDDC. Quan tractem amb problemes transitoris, la direcció temporal ofereix ella mateixa l’oportunitat de seguir explotant paral·lelisme. Amb l’objectiu de dissenyar precondicionadors en espai-temps, primer, proposem solucionadors paral·lels en temps per equacions diferencials lineals i no-lineals, basats en un solucionador eficient del complement de Schur d’una partició multinivell de l’interval de temps. Seguidament, aquestes idees es combinen amb conceptes de DD amb l’objectiu de dissenyar precondicionadors com a extensió a espai-temps dels mètodes de BDDC. Els ingredients clau d’aquests nous mètodes es defineixen de tal manera que preserven la causalitat del temps, on la informació només viatja de temps passats a temps futurs. Els esquemes proposats són dèbilment escalables en temps i en espai-temps, és a dir, es poden explotar eficientment recursos computacionals creixents per resoldre més passos de temps en (aproximadament) el mateix temps transcorregut de càlcul. Tots els desenvolupaments presentats aquí són motivats pel problema d’aplicació de la tesi, la simulació de la resposta electromagnètica de baixa freqüència dels superconductors d’alta temperatura (HTS) en 3D. Al llarg del document, es realitza un conjunt exhaustiu d’experiments numèrics, els quals inclouen la simulació d’un problema de HTS realista en 3D, per validar la idoneïtat i el rendiment paral·lel de la implementació per a computació d’alt rendiment dels algorismes proposatsPostprint (published version

    Performance Modeling and Prediction for the Scalable Solution of Partial Differential Equations on Unstructured Grids

    Get PDF
    This dissertation studies the sources of poor performance in scientific computing codes based on partial differential equations (PDEs), which typically perform at a computational rate well below other scientific simulations (e.g., those with dense linear algebra or N-body kernels) on modern architectures with deep memory hierarchies. We identify that the primary factors responsible for this relatively poor performance are: insufficient available memory bandwidth, low ratio of work to data size (good algorithmic efficiency), and nonscaling cost of synchronization and gather/scatter operations (for a fixed problem size scaling). This dissertation also illustrates how to reuse the legacy scientific and engineering software within a library framework. Specifically, a three-dimensional unstructured grid incompressible Euler code from NASA has been parallelized with the Portable Extensible Toolkit for Scientific Computing (PETSc) library for distributed memory architectures. Using this newly instrumented code (called PETSc-FUN3D) as an example of a typical PDE solver, we demonstrate some strategies that are effective in tolerating the latencies arising from the hierarchical memory system and the network. Even on a single processor from each of the major contemporary architectural families, the PETSc-FUN3D code runs from 2.5 to 7.5 times faster than the legacy code on a medium-sized data set (with approximately 105 degrees of freedom). The major source of performance improvement is the increased locality in data reference patterns achieved through blocking, interlacing, and edge reordering. To explain these performance gains, we provide simple performance models based on memory bandwidth and instruction issue rates. Experimental evidence, in terms of translation lookaside buffer (TLB) and data cache miss rates, achieved memory bandwidth, and graduated floating point instructions per memory reference, is provided through accurate measurements with hardware counters. The performance models and experimental results motivate algorithmic and software practices that lead to improvements in both parallel scalability and per-node performance. We identify the bottlenecks to scalability (algorithmic as well as implementation) for a fixed-size problem when the number of processors grows to several thousands (the expected level of concurrency on terascale architectures). We also evaluate the hybrid programming model (mixed distributed/shared) from a performance standpoint

    Multilevel techniques for Reservoir Simulation

    Get PDF
    corecore