244 research outputs found

    Dynamic task fusion for a block-structured finite volume solver over a dynamically adaptive mesh with local time stepping

    Load balancing of generic wave equation solvers over dynamically adaptive meshes with local time stepping is difficult, as the load changes with every time step. Task-based programming promises to mitigate the load balancing problem. We study a Finite Volume code over dynamically adaptive block-structured meshes for two astrophysics simulations, where the patches (blocks) define tasks. They are classified into urgent and low priority tasks. Urgent tasks are algorithmically latency-sensitive. They are processed directly as part of our bulk-synchronous mesh traversals. Non-urgent tasks are held back in an additional task queue on top of the task runtime system. If they lack global side-effects, i.e. do not alter the global solver state, we can generate optimised compute kernels for these tasks. Furthermore, we propose to use the additional queue to merge tasks without side-effects into task assemblies, and to balance out imbalanced bulk-synchronous processing phases.
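
The queueing idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `PatchTask`, `HoldingQueue`, and `assembly_size` are hypothetical, and the "runtime" is just a sequential loop. Urgent tasks run as soon as they are submitted, non-urgent tasks are held back, and held-back tasks without global side-effects are merged into fixed-size assemblies when the queue is flushed.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PatchTask:
    """One task per mesh patch (hypothetical structure for illustration)."""
    patch_id: int
    urgent: bool             # latency-sensitive: must run during the traversal
    has_side_effects: bool   # alters global solver state, so it is not fused
    run: Callable[[], float]


class HoldingQueue:
    """Hypothetical extra queue sitting on top of a task runtime."""

    def __init__(self, assembly_size: int) -> None:
        self.assembly_size = assembly_size
        self.pending: List[PatchTask] = []

    def submit(self, task: PatchTask) -> List[float]:
        """Run urgent tasks at once; hold back everything else."""
        if task.urgent:
            return [task.run()]
        self.pending.append(task)
        return []

    def flush(self) -> List[List[float]]:
        """Fuse side-effect-free tasks into assemblies, then run all work items."""
        fusable = [t for t in self.pending if not t.has_side_effects]
        single = [t for t in self.pending if t.has_side_effects]
        self.pending = []
        batches = [fusable[i:i + self.assembly_size]
                   for i in range(0, len(fusable), self.assembly_size)]
        batches += [[t] for t in single]  # tasks with side-effects stay alone
        return [[t.run() for t in batch] for batch in batches]


queue = HoldingQueue(assembly_size=2)
immediate: List[float] = []
for i in range(5):
    task = PatchTask(i, urgent=(i == 0), has_side_effects=(i == 4),
                     run=lambda i=i: float(i) * 2.0)
    immediate += queue.submit(task)
assemblies = queue.flush()
```

In a real runtime each inner list returned by `flush` would be handed over as a single fused task, which is where an optimised compute kernel could be substituted.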

    Efficient Generating And Processing Of Large-Scale Unstructured Meshes

    Unstructured meshes are used in a variety of disciplines to represent simulations and experimental data. Scientists who want to increase the accuracy of simulations by increasing resolution must also increase the size of the resulting dataset. However, generating and processing extremely large unstructured meshes remains a barrier. Researchers have published many parallel Delaunay triangulation (DT) algorithms, often focusing on partitioning the initial mesh domain so that each rectangular partition can be triangulated in parallel. However, the common problems for this method are how to merge all triangulated partitions into a single domain-wide mesh and the significant cost of communicating the sub-region borders. We devised a novel algorithm, Triangulation of Independent Partitions in Parallel (TIPP), to deal with very large DT problems without requiring inter-processor communication while still guaranteeing the Delaunay criterion. The core of the algorithm is to find a set of independent partitions such that the circumcircles of triangles in one partition do not enclose any vertex in other partitions. For this reason, this set of independent partitions can be triangulated in parallel without affecting each other. The result of mesh generation is a set of large unstructured meshes, stored as vertex index and vertex coordinate files, which introduces a new challenge: locality. Partitioning unstructured meshes to improve locality is a key part of our own approach. Elements that were widely scattered in the original dataset are grouped together, speeding data access. To further improve unstructured mesh partitioning, we also describe our new approach, Direct Load, which mitigates the challenges of unstructured meshes by maximizing the proportion of useful data retrieved during each read from disk, which in turn reduces the total number of read operations, boosting performance.
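
The independence condition stated in this abstract (no circumcircle of a triangle in one partition encloses a vertex of another partition) can be checked with the standard in-circle determinant. The sketch below is only an illustration of that condition, not the TIPP algorithm itself. The function names are hypothetical, and plain floating-point arithmetic is assumed, so it is not robust for nearly co-circular points.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def in_circumcircle(tri: Triangle, p: Point) -> bool:
    """True if p lies strictly inside the circumcircle of tri."""
    (ax, ay), (bx, by), (cx, cy) = tri
    # Signed area term: positive for counter-clockwise vertex order.
    orient = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    adx, ady = ax - p[0], ay - p[1]
    bdx, bdy = bx - p[0], by - p[1]
    cdx, cdy = cx - p[0], cy - p[1]
    # 3x3 in-circle determinant, expanded along the squared-length column.
    det = ((adx * adx + ady * ady) * (bdx * cdy - bdy * cdx)
           - (bdx * bdx + bdy * bdy) * (adx * cdy - ady * cdx)
           + (cdx * cdx + cdy * cdy) * (adx * bdy - ady * bdx))
    # Multiplying by orient makes the test independent of vertex order.
    return det * orient > 0


def circumcircles_avoid(tris: Iterable[Triangle], verts: Iterable[Point]) -> bool:
    """True if no circumcircle of tris encloses any vertex in verts."""
    verts = list(verts)
    return all(not in_circumcircle(t, v) for t in tris for v in verts)


tri_a: Triangle = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
far_vertices = [(3.0, 3.0), (4.0, 3.0)]
near_vertices = [(0.9, 0.9)]

far_ok = circumcircles_avoid([tri_a], far_vertices)
near_ok = circumcircles_avoid([tri_a], near_vertices)
```

For two partitions to be independent in the sense of the abstract, the check has to hold in both directions, i.e. with the roles of the two partitions swapped as well.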

    Tetrahedral-Mesh Simulation of Turbulent Flows with the Space-Time Conservative Schemes

    Direct numerical simulations of turbulent flows are predominantly carried out using structured, hexahedral meshes despite decades of development in unstructured mesh methods. Tetrahedral meshes offer ease of mesh generation around complex geometries and the potential of an orientation-free grid that would provide unbiased small-scale dissipation and more accurate intermediate-scale solutions. However, due to the lack of consistent multi-dimensional numerical formulations in conventional schemes for triangular and tetrahedral meshes at the cell interfaces, numerical issues exist when flow discontinuities or stagnation regions are present. The space-time conservative conservation element solution element (CESE) method, due to its Riemann-solver-free shock capturing capabilities, non-dissipative baseline schemes, and flux conservation in time as well as space, has the potential to more accurately simulate turbulent flows using unstructured tetrahedral meshes. To pave the way towards accurate simulation of shock/turbulent boundary-layer interaction, a series of wave and shock interaction benchmark problems that increase in complexity are computed in this paper with triangular/tetrahedral meshes. Preliminary computations for the normal shock/turbulence interactions are carried out with a mesh that is relatively coarse by direct numerical simulation standards, in order to assess other effects such as boundary conditions and the necessity of a buffer domain. The results indicate that qualitative agreement with previous studies can be obtained for flows where strong shocks co-exist with unsteady waves that display a broad range of scales, with a relatively compact computational domain and less stringent requirements for grid clustering near the shock. With the space-time conservation properties, stable solutions without any spurious wave reflections can be obtained without a need for buffer domains near the outflow/farfield boundaries.
Computational results for the isotropic turbulent flow decay, at a relatively high turbulent Mach number, show a well-behaved spectral decay rate for medium to high wave numbers. The high-order CESE schemes offer very robust solutions even in the presence of strong shocks or widespread shocklets. The explicit formulation, in conjunction with a theoretical upper Courant number bound close to unity, has the potential to offer an efficient numerical framework for general compressible turbulent flow simulations with unstructured meshes.

    A Parallel Geometric Multigrid Method for Adaptive Finite Elements

    Applications in a variety of scientific disciplines use systems of Partial Differential Equations (PDEs) to model physical phenomena. Numerical solutions to these models are often found using the Finite Element Method (FEM), where the problem is discretized and the solution of a large linear system is required, containing millions or even billions of unknowns. Often, the domain of these solves will contain localized features that require very high resolution of the underlying finite element mesh to solve accurately, while a mesh with uniform resolution would require far too much computational time and memory overhead to be feasible on a modern machine. Therefore, techniques like adaptive mesh refinement, where one increases the resolution of the mesh only where it is necessary, must be used. Even with adaptive mesh refinement, these systems can still have far more than a million unknowns (large mantle convection applications like the ones in [90] show simulations on over 600 billion unknowns), and attempting to solve on a single processing unit is infeasible given the computational time and memory required. For this reason, any application code aimed at solving large problems must be built using a parallel framework, allowing the concurrent use of multiple processing units to solve a single problem, and the code must exhibit efficient scaling to large numbers of processing units. Multigrid methods are currently the only known optimal solvers for linear systems arising from discretizations of elliptic boundary value problems. These methods can be represented as an iterative scheme with contraction number less than one, independent of the resolution of the discretization [24, 54, 25, 103], with optimal complexity in the number of unknowns in the system [29].
Geometric multigrid (GMG) methods, where the hierarchy of spaces is defined by linear systems of finite element discretizations on meshes of decreasing resolution, have been shown to be robust for many different problem formulations, giving mesh-independent convergence for highly adaptive meshes [26, 61, 83, 18], but these methods require specific implementations for each type of equation, boundary condition, mesh, etc., required by the specific application. The implementation in a massively parallel environment is not obvious, and research into this topic is far from exhaustive. We present an implementation of a massively parallel, adaptive geometric multigrid (GMG) method used in the open-source finite element library deal.II [5], and perform extensive tests showing scaling of the V-cycle application on systems with up to 137 billion unknowns run on up to 65,536 processors, and demonstrating low communication overhead of the algorithms proposed. We then show the flexibility of the GMG by applying the method to four different PDE systems: the Poisson equation, linear elasticity, advection-diffusion, and the Stokes equations. For the Stokes equations, we implement a fully matrix-free, adaptive, GMG-based solver in the mantle convection code ASPECT [13], and give a comparison to the current matrix-based method used. We show improvements in robustness, parallel scaling, and memory consumption for simulations with up to 27 billion unknowns and 114,688 processors. Finally, we test the performance of IDR(s) methods compared to the FGMRES method currently used in ASPECT, showing the effects of the flexible preconditioning used for the Stokes solves in ASPECT, and demonstrating the possible reduction in memory consumption for IDR(s) and the potential for solving large-scale problems.
Parts of the work in this thesis have been submitted to peer-reviewed journals in the form of two publications ([36] and [34]), and the implementations discussed have been integrated into two open-source codes, deal.II and ASPECT. For the contributions to deal.II, including a full-length tutorial program, Step-63 [35], the author is listed as a contributing author to the newest deal.II release (see [5]). The implementation in ASPECT is based on work by the author and Timo Heister. The goal of the work here is to enable the community of geoscientists using ASPECT to solve larger problems than currently possible. Over the course of this thesis, the author was partially funded by the NSF Award OAC-1835452 and by the Computational Infrastructure in Geodynamics initiative (CIG), through the NSF under Awards EAR-0949446 and EAR-1550901 and the University of California, Davis.

    Scalable domain decomposition methods for finite element approximations of transient and electromagnetic problems

    The main object of study of this thesis is the development of scalable and robust solvers based on domain decomposition (DD) methods for the linear systems arising from the finite element (FE) discretization of transient and electromagnetic problems. The thesis commences with a theoretical review of the curl-conforming edge (or Nédélec) FEs of the first kind and a comprehensive description of a general implementation strategy for h- and p-adaptive elements of arbitrary order on tetrahedral and hexahedral non-conforming meshes. Then, a novel balancing domain decomposition by constraints (BDDC) preconditioner that is robust for multi-material and/or heterogeneous problems posed in curl-conforming spaces is presented. The new method, in contrast to existing approaches, is based on the definition of the ingredients of the preconditioner according to the physical coefficients of the problem and does not require spectral information. The result is a robust and highly scalable preconditioner that preserves the simplicity of the original BDDC method. When dealing with transient problems, the time direction itself offers an opportunity for further parallelization. Aiming to design scalable space-time solvers, first, parallel-in-time methods for linear and non-linear ordinary differential equations (ODEs) are proposed, based on efficient (non-linear) Schur complement solvers for a multilevel partition of the time interval. Then, these ideas are combined with DD concepts in order to design a two-level preconditioner as an extension to space-time of the BDDC method. The key ingredients for these new methods are defined such that they preserve the time causality, i.e., information only travels from the past to the future. The proposed schemes are weakly scalable in time and space-time, i.e., one can efficiently exploit increasing computational resources to solve more time steps in (approximately) the same time-to-solution.
All the developments presented herein are motivated by the driving application of the thesis, the 3D simulation of the low-frequency electromagnetic response of High Temperature Superconductors (HTS). Throughout the document, an exhaustive set of numerical experiments, which includes the simulation of a realistic 3D HTS problem, is performed in order to validate the suitability and assess the parallel performance of the High Performance Computing (HPC) implementation of the proposed algorithms.

    Higher-Order DGFEM Transport Calculations on Polytope Meshes for Massively-Parallel Architectures

    In this dissertation, we develop improvements to the solution of the discrete ordinates (S_N) neutron transport equation using a Discontinuous Galerkin Finite Element Method (DGFEM) spatial discretization on arbitrary polytope (polygonal and polyhedral) grids compatible with massively-parallel computer architectures. Polytope meshes are attractive for multiple reasons, including their use in other physics communities and their ease in handling local mesh refinement strategies. In this work, we focus on two topical areas of research. First, we discuss higher-order basis functions suitable for solving the DGFEM S_N transport equation on arbitrary polygonal meshes. Second, we assess Diffusion Synthetic Acceleration (DSA) schemes compatible with polytope grids for massively-parallel transport problems. We first utilize basis functions compatible with arbitrary polygonal grids for the DGFEM transport equation. We analyze four different basis functions that have linear completeness on polygons: the Wachspress rational functions, the PWL functions, the mean value coordinates, and the maximum entropy coordinates. We then describe the procedure to extend these polygonal linear basis functions into the quadratic serendipity space of functions. These quadratic basis functions can exactly interpolate monomial functions up to order 2. Both the linear and quadratic sets of basis functions preserve transport solutions in the thick diffusion limit. Maximum convergence rates of 2 and 3 are observed for regular transport solutions for the linear and quadratic basis functions, respectively. For problems that are limited by the regularity of the transport solution, convergence rates of 3/2 (when the solution is continuous) and 1/2 (when the solution is discontinuous) are observed. Spatial Adaptive Mesh Refinement (AMR) achieved higher convergence rates than uniform refinement, even for problems bounded by the solution regularity.
We demonstrated accuracy in the AMR solutions by allowing them to reach a level where the ray effects of the angular discretization are realized. Next, we analyzed DSA schemes to accelerate both the within-group iterations as well as the thermal upscattering iterations for multigroup transport problems. Accelerating the thermal upscattering iterations is important for materials (e.g., graphite) with significant thermal energy scattering and minimal absorption. All of the acceleration schemes analyzed use a DGFEM discretization of the diffusion equation that is compatible with arbitrary polytope meshes: the Modified Interior Penalty Method (MIP). MIP uses the same DGFEM discretization as the transport equation. The MIP form is Symmetric Positive Definite (SPD) and efficiently solved with Preconditioned Conjugate Gradient (PCG) with Algebraic MultiGrid (AMG) preconditioning. The analysis from previous work was extended to show MIP's stability and robustness for accelerating 3D transport problems. MIP DSA preconditioning was implemented in the Parallel Deterministic Transport (PDT) code at Texas A&M University and linked with the HYPRE suite of linear solvers. Good scalability was numerically verified out to around 131K processors. The fraction of time spent performing DSA operations was small for problems with sufficient work performed in the transport sweep (O(10^3) angular directions). Finally, we have developed a novel methodology to accelerate transport problems dominated by thermal neutron upscattering. Compared to historical upscatter acceleration methods, our method is parallelizable and amenable to massively parallel transport calculations. Speedup factors of about 3-4 were observed with our new method.
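
The convergence rates quoted in this abstract (2 and 3 for regular solutions, 3/2 and 1/2 for regularity-limited ones) are the kind of figure usually estimated from errors on successively refined meshes via p = log(e_1/e_2) / log(h_1/h_2). The sketch below shows that calculation under the assumption that the error behaves like C h^p. The function name is hypothetical, and the mesh sizes and errors are synthetic values generated for illustration, not results from the dissertation.

```python
import math
from typing import List, Sequence


def observed_orders(h: Sequence[float], err: Sequence[float]) -> List[float]:
    """Observed convergence order between each pair of consecutive meshes."""
    return [math.log(err[i] / err[i + 1]) / math.log(h[i] / h[i + 1])
            for i in range(len(h) - 1)]


# Synthetic data (assumption): errors that follow err = 3 * h**1.5 exactly,
# mimicking a problem whose rate is limited to 3/2 by solution regularity.
mesh_sizes = [0.1, 0.05, 0.025]
synthetic_errors = [3.0 * h ** 1.5 for h in mesh_sizes]

orders = observed_orders(mesh_sizes, synthetic_errors)
```

With measured errors the estimates typically only approach the asymptotic rate once the mesh is fine enough, so the last pair of meshes is usually the most informative.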

    Large-scale Geometric Data Decomposition, Processing and Structured Mesh Generation

    Mesh generation is a fundamental and critical problem in geometric data modeling and processing. In most scientific and engineering tasks that involve numerical computations and simulations on 2D/3D regions or on curved geometric objects, discretizing or approximating the geometric data using polygonal or polyhedral meshes is always the first step of the procedure. The quality of this tessellation often dictates the accuracy, efficiency, and numerical stability of the subsequent computation. Compared with unstructured meshes, structured meshes are favored in many scientific/engineering tasks due to their good properties. However, generating a high-quality structured mesh remains challenging, especially for complex or large-scale geometric data. In industrial Computer-aided Design/Engineering (CAD/CAE) pipelines, the geometry processing to create a desirable structured mesh of the complex model is the most costly step. This step is semi-manual, and often takes up to several weeks to finish. Several technical challenges remain unsolved in existing structured mesh generation techniques. This dissertation studies the effective generation of structured meshes on large and complex geometric data. We study a general geometric computation paradigm to solve this problem via model partitioning and divide-and-conquer. To apply effective divide-and-conquer, we study two key technical components: the shape decomposition in the divide stage, and the structured meshing in the conquer stage. We test our algorithm on various data sets; the results demonstrate the efficiency and effectiveness of our framework. The comparisons also show that our algorithm outperforms existing partitioning methods in final meshing quality. We also show that our pipeline scales up efficiently in HPC environments.

    Finite Element Modeling Driven by Health Care and Aerospace Applications

    This thesis concerns the development, analysis, and computer implementation of mesh generation algorithms encountered in finite element modeling in health care and aerospace. The finite element method can reduce a continuous system to a discrete idealization that can be solved in the same manner as a discrete system, provided the continuum is discretized into a finite number of simple geometric shapes (e.g., triangles in two dimensions or tetrahedrons in three dimensions). In health care, namely anatomic modeling, a discretization of the biological object is essential to compute tissue deformation for physics-based simulations. This thesis proposes an efficient procedure to convert 3-dimensional imaging data into adaptive lattice-based discretizations of well-shaped tetrahedra or mixed elements (i.e., tetrahedra, pentahedra and hexahedra). This method operates directly on segmented images, thus skipping a surface reconstruction that is required by traditional Computer-Aided Design (CAD)-based meshing techniques and is convoluted, especially in complex anatomic geometries. Our approach utilizes proper mesh gradation and tissue-specific multi-resolution, without sacrificing the fidelity and while maintaining a smooth surface to reflect a certain degree of visual reality. Image-to-mesh conversion can facilitate accurate computational modeling for biomechanical registration of Magnetic Resonance Imaging (MRI) in image-guided neurosurgery. Neuronavigation with deformable registration of preoperative MRI to intraoperative MRI allows the surgeon to view the location of surgical tools relative to the preoperative anatomical (MRI) or functional data (DT-MRI, fMRI), thereby avoiding damage to eloquent areas during tumor resection. This thesis presents a deformable registration framework that utilizes multi-tissue mesh adaptation to map preoperative MRI to intraoperative MRI of patients who have undergone a brain tumor resection. 
Our enhancements with mesh adaptation improve the accuracy of the registration by more than 5 times compared to rigid and traditional physics-based non-rigid registration, and by more than 4 times compared to publicly available B-Spline interpolation methods. The adaptive framework is parallelized for shared memory multiprocessor architectures. Performance analysis shows that this method could be applied, on average, in less than two minutes, achieving desirable speed for use in a clinical setting. The last part of this thesis focuses on finite element modeling of CAD data. This is an integral part of the design and optimization of components and assemblies in industry. We propose a new parallel mesh generator for efficient tetrahedralization of piecewise linear complex domains in aerospace. CAD-based meshing algorithms typically improve the shape of the elements in a post-processing step due to the high complexity and cost of the operations involved. In contrast, our method optimizes the shape of the elements throughout the generation process to obtain maximum quality and utilizes high performance computing to reduce the overheads and improve end-user productivity. The proposed mesh generation technique is a combination of Advancing Front type point placement, direct point insertion, and parallel multi-threaded connectivity optimization schemes. The mesh optimization is based on a speculative (optimistic) approach that has been proven to perform well on hardware-shared memory. The experimental evaluation indicates that the high quality and performance attributes of this method see substantial improvement over existing state-of-the-art unstructured grid technology currently incorporated in several commercial systems. The proposed mesh generator will be part of an Extreme-Scale Anisotropic Mesh Generation Environment to meet industry's expectations and NASA's CFD vision

    Decoupling method for parallel Delaunay two-dimensional mesh generation

    Parallel mesh generation procedures that are based on geometric domain decompositions require the permanent separators to be of good quality (in terms of their angles and length), in order to maintain the mesh quality. The Medial Axis Domain Decomposition, an innovative geometric domain decomposition procedure that addresses this problem, is introduced. The Medial Axis domain decomposition is of high quality in terms of the formed angles, and provides separators of small size as well as good work-load balance. It presents for the first time a decomposition method suitable for parallel meshing procedures that are based on geometric domain decompositions. The Decoupling Method for parallel Delaunay 2D mesh generation is a highly efficient and effective parallel procedure, able to generate billions of elements in a few hundred seconds on distributed memory machines. Our mathematical formulation introduces the notion of the decoupling path, which guarantees the decoupling property as well as the quality and conformity of the Delaunay submeshes. The subdomains are meshed independently, and as a result, the method eliminates the communication and the synchronization during the parallel meshing. A method for shielding small angles is introduced, so that the decoupled parallel Delaunay algorithm can be applied on domains with small angles. Moreover, I present the construction of a sizing function that encompasses an existing sizing function as well as geometric features and small angles. The decoupling procedure can be used for parallel graded Delaunay mesh generation, controlled by the sizing function.