
    Algorithms for massively parallel generic hp-adaptive finite element methods

    Efficient algorithms for the numerical solution of partial differential equations are required to solve problems on an economically viable timescale. In general, this is achieved by adapting the resolution of the discretization to the investigated problem, as well as by exploiting hardware specifications. For the latter category, parallelization plays a major role on modern multi-core and multi-node architectures, especially in the context of high-performance computing. Using finite element methods, solutions are approximated by discretizing the function space of the problem with piecewise polynomials. With hp-adaptive methods, the polynomial degrees of these basis functions may vary on locally refined meshes. We present algorithms and data structures required for generic hp-adaptive finite element software applicable to both continuous and discontinuous Galerkin methods on distributed-memory systems. Both the function space and the mesh may be adapted dynamically during the solution process. We cover details concerning the unique enumeration of degrees of freedom with continuous Galerkin methods, the communication of variable-size data, and load balancing. Furthermore, we present strategies to determine the type of adaptation based on error estimation and prediction as well as on smoothness estimation via the decay rate of the coefficients of Fourier and Legendre series expansions. Both refinement and coarsening are considered. A reference implementation in the open-source library deal.II is provided and applied to the Laplace problem on a domain with a reentrant corner, which introduces a singularity. With this example, we demonstrate the benefits of hp-adaptive methods in terms of error convergence and show that our algorithms scale up to 49,152 MPI processes.
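
    The abstract above mentions smoothness estimation via the decay rate of Legendre series coefficients as one criterion for choosing between h- and p-adaptation. The following is a minimal, self-contained sketch of that idea in plain NumPy; it is not the deal.II implementation, and the helper names, quadrature order, and decision threshold are illustrative assumptions only.

```python
# Hedged sketch: estimate the smoothness of a function on a reference cell [-1, 1]
# from the decay rate of its Legendre coefficients, then pick p- vs. h-refinement.
# This is NOT the deal.II code; helpers and thresholds are illustrative.
import numpy as np
from numpy.polynomial import legendre as leg

def legendre_coefficients(f, max_degree, n_quad=64):
    """L2-project f on [-1, 1] onto Legendre polynomials P_0 .. P_max_degree."""
    x, w = leg.leggauss(n_quad)                          # Gauss-Legendre quadrature
    coeffs = np.empty(max_degree + 1)
    for k in range(max_degree + 1):
        p_k = leg.legval(x, np.eye(max_degree + 1)[k])   # evaluate P_k at x
        coeffs[k] = (2 * k + 1) / 2 * np.sum(w * f(x) * p_k)
    return coeffs

def decay_rate(coeffs, tiny=1e-12):
    """Least-squares fit |c_k| ~ C * exp(-sigma * k); return sigma."""
    k = np.arange(1, len(coeffs))
    mags = np.abs(coeffs[1:])
    mask = mags > tiny * mags.max()                      # skip (near-)zero coefficients
    slope, _ = np.polyfit(k[mask], np.log(mags[mask]), 1)
    return -slope

if __name__ == "__main__":
    # Toy decision rule: fast decay -> smooth -> raise p; slow decay -> refine h.
    candidates = {"analytic": np.exp, "kinked": lambda x: np.abs(x - 0.3)}
    for name, f in candidates.items():
        sigma = decay_rate(legendre_coefficients(f, max_degree=10))
        choice = "p-refinement" if sigma > 1.0 else "h-refinement"
        print(f"{name}: decay rate {sigma:.2f} -> {choice}")
```

    Run on these two toy functions, the fitted decay rate is large for the analytic function and small for the one with a kink, which is the qualitative behavior such an indicator exploits.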

    HPCCP/CAS Workshop Proceedings 1998

    This publication is a collection of extended abstracts of presentations given at the HPCCP/CAS (High Performance Computing and Communications Program/Computational Aerosciences Project) Workshop held on August 24-26, 1998, at NASA Ames Research Center, Moffett Field, California. The objective of the Workshop was to bring together the aerospace high performance computing community, consisting of airframe and propulsion companies, independent software vendors, university researchers, and government scientists and engineers. The Workshop was sponsored by the HPCCP Office at NASA Ames Research Center. The Workshop consisted of over 40 presentations, including an overview of NASA's High Performance Computing and Communications Program and the Computational Aerosciences Project; ten sessions of papers representative of the high performance computing research conducted within the Program by the aerospace industry, academia, NASA, and other government laboratories; two panel sessions; and a special presentation by Mr. James Bailey.

    Parallelization of the Advancing Front Local Reconnection Mesh Generation Software Using a Pseudo-Constrained Parallel Data Refinement Method

    Preliminary results of a long-term project entailing the parallelization of an industrial-strength sequential mesh generator, called Advancing Front Local Reconnection (AFLR), are presented. AFLR has been under development for the last 25 years at the NSF/ERC center at Mississippi State University. The parallel procedure presented here is called Pseudo-constrained (PsC) Parallel Data Refinement (PDR) and consists of the following steps: (i) use an octree data-decomposition scheme to divide the original geometry into subdomains (octree leaves), (ii) refine each subdomain, with the proper adjustments of its neighbors, using the given refinement code, and (iii) combine all subdomain data into a single, conforming mesh. Parallelism was achieved by implementing Pseudo-constrained Parallel Data Refinement AFLR (PsC.AFLR) on top of a runtime system called Parallel Runtime Environment for Multi-computer Applications (PREMA). During run time, the PsC.AFLR method exposes data-decomposition information (the number of subdomains waiting to be refined) to the underlying runtime system. In turn, this system facilitates work-load balancing and guides the program's execution towards the most efficient utilization of hardware resources. Preliminary results on the mesh refinement operation show that end-user productivity (measured in terms of elements refined per second) increases as the number of cores in use grows. When using approximately 16 cores, PsC.AFLR outperforms the serial AFLR code by about 11 times. PsC.AFLR also maintains its stability by generating meshes of comparable quality. Although it offers good end-user productivity, PsC.AFLR falls short of generating meshes with the same density or quality as the serial AFLR software, due to the constraints set by the subdomain boundaries that are required to successfully execute AFLR. These constraints demonstrate that it is not ideal to use AFLR in a black-box manner when parallelizing the software. Its source code must be modified to a non-trivial extent if one wishes to remove these constraints and maximize end-user productivity and potential scalability.
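
    Step (i) of the PsC PDR procedure is an octree data decomposition whose leaves become the subdomains handed to the refinement code, and whose count is the information exposed to the runtime system for load balancing. Below is a small, hedged sketch of such a decomposition over a surrogate point cloud; the capacity limit, depth cap, and point set are assumptions for illustration and are not part of the AFLR/PREMA code.

```python
# Illustrative sketch only (not PsC.AFLR): build an octree decomposition of a
# point set so that each leaf (subdomain) holds at most `capacity` points; the
# leaves would then be queued for independent refinement and later merged.
import numpy as np

def octree_leaves(points, lo, hi, capacity=100, depth=0, max_depth=8):
    """Return a list of (lo, hi, point_indices) leaf boxes."""
    idx = np.where(np.all((points >= lo) & (points < hi), axis=1))[0]
    if len(idx) <= capacity or depth == max_depth:
        return [(lo, hi, idx)]
    mid = (lo + hi) / 2.0
    leaves = []
    # 8 children: every combination of lower/upper half along each axis
    for child in range(8):
        half = [child >> axis & 1 for axis in range(3)]
        c_lo = np.where(half, mid, lo)
        c_hi = np.where(half, hi, mid)
        leaves += octree_leaves(points, c_lo, c_hi, capacity, depth + 1, max_depth)
    return leaves

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.random((10_000, 3))                     # surrogate vertex cloud
    leaves = octree_leaves(pts, np.zeros(3), np.ones(3))
    # The number of leaves is the kind of decomposition information a runtime
    # system could use for work-load balancing.
    print(f"{len(leaves)} subdomains, largest holds "
          f"{max(len(i) for _, _, i in leaves)} points")
```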

    Scalable Parallel Delaunay Image-to-Mesh Conversion for Shared and Distributed Memory Architectures

    Mesh generation is an essential component of many engineering applications. The ability to generate meshes in parallel is critical for the scalability of the entire Finite Element Method (FEM) pipeline. However, parallel mesh generation applications belong to the broader class of adaptive and irregular problems, and are among the most complex, challenging, and labor-intensive to develop and maintain. In this thesis, we summarize several years of progress on a novel framework for highly scalable, guaranteed-quality mesh generation for finite element analysis in three dimensions. We studied and developed parallel mesh generation algorithms on both shared- and distributed-memory architectures. We present a novel two-level parallel tetrahedral mesh generation framework capable of delivering and sustaining close to 6000 concurrent work units (cores). We achieve this by leveraging concurrency at two different granularity levels, using a hybrid message-passing and multi-threaded execution model suited to the hardware hierarchy of distributed-memory clusters. An end-user productivity and scalability study was performed on up to 6000 cores and indicated very good end-user productivity, with about 300 million tetrahedra per second and a weak-scaling speedup of about 3600. Both results suggest that, compared to the best previous algorithm, we achieve an improvement of more than 7000 times in performance, measured in terms of speed (elements per second), using about 180 times more CPUs, for geometries that are many orders of magnitude more complex.
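
    The two-level execution model described above couples coarse-grained parallelism across subdomains (message passing) with fine-grained multi-threading inside each subdomain. The sketch below is only a single-process illustration of that structure, with placeholder work units standing in for Delaunay cavity refinement; it is not the thesis framework, and in the real setting the outer loop over subdomains would be distributed across MPI ranks.

```python
# Conceptual sketch of a two-level execution model: coarse work units
# (subdomains) on the outside, fine work units (cavities) refined by a thread
# pool on the inside. All names here are placeholders, not the thesis code.
from concurrent.futures import ThreadPoolExecutor
import os

def refine_cavity(cavity_id):
    """Placeholder for the fine-grained work unit (one cavity refinement)."""
    return cavity_id * cavity_id          # stand-in for newly created elements

def refine_subdomain(subdomain_id, n_cavities=64, n_threads=os.cpu_count()):
    """Coarse-grained work unit: one subdomain refined by a local thread pool."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        results = list(pool.map(refine_cavity, range(n_cavities)))
    return subdomain_id, sum(results)

if __name__ == "__main__":
    # In a distributed setting each rank would own a subset of subdomains;
    # here one process handles all of them for illustration.
    for sd in [0, 1, 2, 3]:
        sd_id, total = refine_subdomain(sd)
        print(f"subdomain {sd_id}: {total} (placeholder) elements created")
```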

    Load Balancing for Parallel Multiphase Flow Simulation


    A Unified Framework for Parallel Anisotropic Mesh Adaptation

    Finite-element methods are a critical component of the design and analysis procedures of many (bio-)engineering applications. Mesh adaptation is one of the most crucial of these components, since it discretizes the physics of the application at a relatively low cost to the solver. Highly scalable parallel mesh adaptation methods for High-Performance Computing (HPC) are essential to meet the ever-growing demand for higher-fidelity simulations. Moreover, the continuous growth in the complexity of HPC systems requires a systematic approach to exploit their full potential. Anisotropic mesh adaptation captures features of the solution at multiple scales while minimizing the required number of elements. However, it also introduces new challenges on top of mesh generation. In addition, the increased complexity of the targeted cases requires departing from traditional surface-constrained approaches in favor of CAD (Computer-Aided Design) kernels. Alongside these functionality requirements is the need to take advantage of ubiquitous multi-core machines. More importantly, the parallel implementation needs to handle the ever-increasing complexity of the mesh adaptation code. In this work, we develop a parallel mesh adaptation method that uses a metric-based approach for generating anisotropic meshes. Moreover, we enhance our method by interfacing with a CAD kernel, thus enabling its use on complex geometries. We evaluate our method both with fixed-resolution benchmarks and within a simulation pipeline, where the resolution of the discretization increases incrementally. With the Telescopic Approach to scalable mesh generation as a guide, we propose a node-level (multi-core) parallel method for mesh adaptation that is expected to scale efficiently on upcoming exascale machines. To facilitate an effective implementation, we introduce an abstraction layer between the application and the runtime system that enables the use of task-based parallelism for concurrent mesh operations. Our evaluation shows results comparable to state-of-the-art methods for fixed-resolution meshes, both in terms of performance and quality. The integration with an adaptive pipeline offers promising results regarding the capability of the proposed method to function as part of an adaptive simulation. Moreover, our abstract tasking layer allows the separation of different aspects of the implementation without any impact on the functionality of the method.
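
    For context on the metric-based approach mentioned above, the following is a minimal sketch of one standard way to build an anisotropic metric from the Hessian of a scalar solution field, with eigenvalues clamped between user-chosen minimum and maximum mesh sizes. The constants, example field, and function names are assumptions for illustration and do not come from the thesis.

```python
# Hedged sketch of Hessian-based anisotropic metric construction: the metric's
# eigendirections follow the Hessian's, and its eigenvalues prescribe smaller
# edge lengths where the solution curves strongly. Parameters are illustrative.
import numpy as np

def hessian_to_metric(H, eps=1e-2, h_min=1e-3, h_max=1.0):
    """Build a symmetric positive-definite metric M from a 3x3 Hessian H."""
    eigval, eigvec = np.linalg.eigh((H + H.T) / 2.0)     # symmetrize, decompose
    # Metric eigenvalue ~ |lambda| / eps, clamped to [1/h_max^2, 1/h_min^2] so
    # prescribed edge lengths stay within [h_min, h_max].
    lam = np.clip(np.abs(eigval) / eps, 1.0 / h_max**2, 1.0 / h_min**2)
    return eigvec @ np.diag(lam) @ eigvec.T

if __name__ == "__main__":
    # Example: a field varying sharply along x and mildly along y and z.
    H = np.diag([200.0, 2.0, 0.5])
    M = hessian_to_metric(H)
    # The prescribed edge length along eigendirection i is 1/sqrt(lambda_i(M)).
    sizes = 1.0 / np.sqrt(np.linalg.eigvalsh(M))
    print("target edge lengths per eigendirection:", sizes)
```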