
    Software concepts and algorithms for an efficient and scalable parallel finite element method

    Software packages for the numerical solution of partial differential equations (PDEs) using the finite element method are important in many fields of research. Their basic data structures and algorithms change over time, as users' requirements grow and the software must make efficient use of the newest highly parallel computing systems. This is the central point of this work. To make efficient use of parallel computing systems with a growing number of independent basic computing units, i.e. CPUs, we have to combine data structures and algorithms from different areas of mathematics and computer science. Two crucial parts are a distributed mesh and a parallel solver for linear systems of equations. For both, multiple independent approaches exist. In this work we argue that it is necessary to combine the two to allow for an efficient and scalable implementation of the finite element method. First, we present concepts, data structures, and algorithms for distributed meshes that allow for local refinement. The central point of our presentation is to provide arbitrary geometrical information about the mesh and its distribution to the linear solver. A large part of the overall computing time of the finite element method is spent in the linear solver, so its parallelization is of major importance. Based on the presented concept for distributed meshes, we present several different linear solver methods. We concentrate on general-purpose linear solvers, which make only few assumptions about the systems to be solved. For this, a new FETI-DP (Finite Element Tearing and Interconnect - Dual Primal) method is proposed. Although the standard FETI-DP method is quasi-optimal from a mathematical point of view, it is not possible to implement it efficiently for a large number of processors (> 10,000). The main reason is a relatively small but globally distributed coarse mesh problem. To circumvent this problem, we propose a new multilevel FETI-DP method that hierarchically decomposes the coarse grid problem. This leads to a more local communication pattern for solving the coarse grid problem and makes it possible to scale to a large number of processors. Besides the parallelization of the finite element method, we discuss an approach to speed up serial computations of existing finite element packages. In many computations the PDE to be solved consists of more than one variable; this is especially the case in multi-physics modeling. Observations show that in many of these computations the solution structure of the variables differs, yet in the standard finite element method only one mesh is used for the discretization of all variables. We present a multi-mesh finite element method that allows a system of PDEs to be discretized with two independently refined meshes.
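    As a rough illustration of the multi-mesh idea, the following 1D Python sketch (our own toy construction, not the thesis code; all names and the quadrature choice are assumptions) assembles a mass matrix coupling P1 hat functions from two independently refined meshes by integrating over their virtual common refinement, where every product of basis functions is a quadratic polynomial and Simpson's rule is exact.

    import numpy as np

    def hat(x, nodes, i):
        # Evaluate the i-th P1 hat function of a 1D mesh at the points x.
        phi = np.zeros_like(x)
        if i > 0:
            m = (x >= nodes[i - 1]) & (x <= nodes[i])
            phi[m] = (x[m] - nodes[i - 1]) / (nodes[i] - nodes[i - 1])
        if i < len(nodes) - 1:
            m = (x >= nodes[i]) & (x <= nodes[i + 1])
            phi[m] = (nodes[i + 1] - x[m]) / (nodes[i + 1] - nodes[i])
        return phi

    def coupling_mass_matrix(nodes_a, nodes_b):
        # M[i, j] = integral of phi_i^A * phi_j^B over the common refinement.
        union = np.union1d(nodes_a, nodes_b)  # virtual common refinement
        M = np.zeros((len(nodes_a), len(nodes_b)))
        for x0, x1 in zip(union[:-1], union[1:]):
            xq = np.array([x0, 0.5 * (x0 + x1), x1])         # Simpson points
            w = (x1 - x0) * np.array([1.0, 4.0, 1.0]) / 6.0  # exact for quadratics
            for i in range(len(nodes_a)):
                for j in range(len(nodes_b)):
                    M[i, j] += w @ (hat(xq, nodes_a, i) * hat(xq, nodes_b, j))
        return M

    mesh_a = np.linspace(0.0, 1.0, 5)             # uniformly refined mesh
    mesh_b = np.array([0.0, 0.1, 0.2, 0.5, 1.0])  # independently refined mesh
    print(coupling_mass_matrix(mesh_a, mesh_b))

    Such a coupling term is what lets each variable of a PDE system live on its own adaptively refined mesh while the bilinear forms are still assembled consistently.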

    Scalable parallel simulation of variably saturated flow

    In this thesis we develop highly accurate simulation tools for variably saturated flow through porous media that are able to take advantage of the latest supercomputing resources. Hence, we aim for parallel scalability to very large compute resources of over 10^5 CPU cores. Our starting point is the parallel subsurface flow simulator ParFlow. This library is in widespread use in the hydrology community and known to have excellent parallel scalability up to 16k processes. We first investigate the numerical tools this library implements in order to perform the simulations it was designed for. ParFlow solves the governing equation for subsurface flow with a cell-centered finite difference (FD) method. The code targets high performance computing (HPC) systems by means of distributed memory parallelism. We propose to reorganize ParFlow's mesh subsystem by using fast partitioning algorithms provided by the parallel adaptive mesh refinement (AMR) library p4est. We realize this in a minimally invasive manner by modifying selected parts of the code to reinterpret the existing mesh data structures. Furthermore, we evaluate the scaling performance of the modified version of ParFlow, demonstrating excellent weak and strong scaling up to 458k cores of the Juqueen supercomputer at the JĂŒlich Supercomputing Centre. The above-mentioned results were obtained for uniform meshes and hence without explicitly exploiting the AMR capabilities of the p4est library. A natural extension of our work is to activate such functionality and make ParFlow a true AMR application. Enabling ParFlow to use AMR is challenging for several reasons: it may be based on assumptions on the parallel partition that cannot be maintained with AMR, it may use mesh-related metadata that is replicated on all CPUs, and it may assume uniform meshes in the construction of mathematical operators. Additionally, the use of locally refined meshes will certainly change the spectral properties of these operators. In this work, we develop an algorithmic approach to activate the usage of locally refined grids in ParFlow. AMR allows meshes where elements of different size neighbor each other. In this case, ParFlow may produce erroneous results when it attempts to communicate data across inter-element boundaries. We propose and discuss two solutions to this issue operating at two different levels: the first manipulates the indices of the degrees of freedom, while the second operates directly on the degrees of freedom. Both approaches aim to introduce minimal changes to the original ParFlow code. In an AMR framework, the FD method taken by ParFlow will require modifications to correctly deal with elements of different size. Mixed finite elements (MFE), on the other hand, are better suited for the usage of AMR. It is known that the cell-centered FD method used in ParFlow may be reinterpreted as an MFE discretization using Raviart-Thomas elements of lowest order. We conclude this thesis by presenting a block preconditioner for saddle point problems arising from an MFE discretization on locally refined meshes. We evaluate its robustness with respect to various classes of coefficients for uniform and locally refined meshes.
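    p4est distributes mesh elements by splitting a space-filling (Morton/Z-order) curve into contiguous pieces of nearly equal size, one per process. The following serial Python sketch (our own toy illustration, not the p4est or ParFlow API) shows this partitioning idea for a 2D uniform grid.

    import numpy as np

    def morton2d(i, j, bits=16):
        # Interleave the bits of (i, j) into a Z-order (Morton) key.
        key = 0
        for b in range(bits):
            key |= ((i >> b) & 1) << (2 * b)
            key |= ((j >> b) & 1) << (2 * b + 1)
        return key

    def partition(nx, ny, nranks):
        # Assign each cell of an nx-by-ny grid to a rank by splitting the
        # Z-curve into contiguous chunks of (nearly) equal size.
        cells = sorted(((i, j) for i in range(nx) for j in range(ny)),
                       key=lambda c: morton2d(*c))
        chunks = np.array_split(np.arange(nx * ny), nranks)
        return {cells[k]: r for r, chunk in enumerate(chunks) for k in chunk}

    owner = partition(8, 8, 4)  # 64 cells over 4 "processes"
    print(sum(1 for c in owner if owner[c] == 0), "cells on rank 0")

    Because the curve is cheap to compute and yields spatially compact chunks, repartitioning after refinement reduces to re-splitting an ordered sequence, which is what makes this approach attractive at large process counts.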

    Composite Finite Elements for Trabecular Bone Microstructures

    In many medical and technical applications, numerical simulations need to be performed for objects with interfaces of geometrically complex shape. We focus on the biomechanical problem of elasticity simulations for trabecular bone microstructures. The goal of this dissertation is to develop and implement an efficient simulation tool for finite element simulations on such structures, so-called composite finite elements. We deal both with the case of material/void interfaces (complicated domains) and with the case of interfaces between different materials (discontinuous coefficients). In classical finite element simulations, geometric complexity is encoded in tetrahedral and typically unstructured meshes. Composite finite elements, in contrast, encode geometric complexity in specialized basis functions on a uniform mesh of hexahedral structure. Unlike alternative approaches (such as fictitious domain methods, generalized finite element methods, immersed interface methods, partition of unity methods, unfitted meshes, and extended finite element methods), composite finite elements are tailored to geometry descriptions by 3D voxel image data and use the corresponding voxel grid as the computational mesh, without introducing additional degrees of freedom, thus making use of efficient data structures for uniformly structured meshes. The composite finite element method for complicated domains goes back to Wolfgang Hackbusch and Stefan Sauter; it restricts standard affine finite element basis functions on the uniformly structured tetrahedral grid (obtained by subdividing each cube into six tetrahedra) to an approximation of the interior. This can be implemented as a composition of standard finite element basis functions on a local, purely virtual auxiliary grid by which we approximate the interface. In the case of discontinuous coefficients, the same local auxiliary composition approach is used. Composition weights are obtained by solving local interpolation problems, for which coupling conditions across the interface need to be determined. These depend both on the local interface geometry and on the (scalar or tensor-valued) material coefficients on both sides of the interface. We consider heat diffusion as a scalar model problem and linear elasticity as a vector-valued model problem to develop and implement the composite finite elements. Uniform cubic meshes contain a natural hierarchy of coarsened grids, which allows us to implement a multigrid solver for the case of complicated domains. Besides simulations of single loading cases, we also apply the composite finite element method to the problem of determining effective material properties, e.g. for multiscale simulations. For periodic microstructures, this is achieved by solving corrector problems on the fundamental cells using affine-periodic boundary conditions corresponding to uniaxial compression and shearing. For statistically periodic trabecular structures, representative fundamental cells can be identified but do not permit the periodic approach. Instead, macroscopic displacements are imposed using the same set of affine-periodic Dirichlet boundary conditions as before on all faces. The stress response of the material is subsequently computed only on an interior subdomain to prevent artificial stiffening near the boundary. We finally check for orthotropy of the macroscopic elasticity tensor and identify its axes.
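    As a much-simplified stand-in for using a voxel image directly as the computational mesh (the actual composite finite elements additionally compose specialized basis functions near the interface), this Python sketch assembles a scalar 5-point Laplacian only on the active voxels of a binary 2D image; the random test mask and all names are our own assumptions.

    import numpy as np
    import scipy.sparse as sp

    def laplacian_on_mask(mask):
        # 5-point Laplacian assembled only on active voxels of a binary image;
        # missing neighbors drop out (homogeneous Neumann at the material boundary).
        ny, nx = mask.shape
        idx = -np.ones(mask.shape, dtype=int)
        idx[mask] = np.arange(mask.sum())  # number the active voxels only
        rows, cols, vals = [], [], []
        for y, x in zip(*np.nonzero(mask)):
            diag = 0.0
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < ny and 0 <= xx < nx and mask[yy, xx]:
                    rows.append(idx[y, x]); cols.append(idx[yy, xx]); vals.append(-1.0)
                    diag += 1.0
            rows.append(idx[y, x]); cols.append(idx[y, x]); vals.append(diag)
        n = int(mask.sum())
        return sp.csr_matrix((vals, (rows, cols)), shape=(n, n))

    rng = np.random.default_rng(0)
    mask = rng.random((16, 16)) > 0.4  # toy porous "microstructure"
    A = laplacian_on_mask(mask)
    print(A.shape, "nonzeros:", A.nnz)

    Note that the operator has exactly one unknown per material voxel, mirroring the abstract's point that no additional degrees of freedom are introduced.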

    Automatic Performance Optimization of Stencil Codes

    Stencil codes are a widely used class of codes. Their general structure is very simple: data points in a large grid are repeatedly recomputed from neighboring values. This predefined neighborhood is the so-called stencil. Despite their very simple structure, stencil codes are hard to optimize, since only a few computations are performed while a comparatively large number of values have to be accessed; i.e., stencil codes usually have a very low computational intensity. Moreover, the set of optimizations and their parameters also depend on the hardware on which the code is executed. In short, current production compilers are not able to fully optimize this class of codes, and optimizing each application by hand is not practical. As a remedy, we propose a set of optimizations and describe how they can be applied automatically by a code generator for the domain of stencil codes. A combination of space and time tiling increases the data locality, which significantly reduces the memory-bandwidth requirements: a standard three-dimensional 7-point Jacobi stencil can be accelerated by a factor of 3. This optimization can target basically any stencil code, while others are more specialized. For example, support for arbitrary linear data layout transformations is especially beneficial for colored kernels, such as a Red-Black Gauss-Seidel smoother. On the one hand, an optimized data layout for such kernels reduces the bandwidth requirements; on the other hand, it simplifies an explicit vectorization. Other notable optimizations described in detail are redundancy elimination techniques that eliminate common subexpressions both in a sequence of statements and across loop boundaries, arithmetic simplifications and normalizations, and the vectorization mentioned previously. In combination, these optimizations increase the performance not only of the model problem given by Poisson’s equation, but also of real-world applications: an optical flow simulation and the simulation of a non-isothermal and non-Newtonian fluid flow.
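    To make the colored-kernel discussion concrete, here is a minimal Python sketch (our own, with assumed names, not the generated code) of a Red-Black Gauss-Seidel smoother for the 2D 5-point Laplacian; points of one color have no data dependence on each other, which is exactly what makes per-color layout transformations and explicit vectorization possible.

    import numpy as np

    def redblack_gauss_seidel(u, f, h, sweeps=1):
        # Red-black Gauss-Seidel for the 2D 5-point Laplacian: the grid is
        # checkerboard-colored, so each color can be updated with vector code.
        for _ in range(sweeps):
            for color in (0, 1):
                for i in range(1, u.shape[0] - 1):
                    j0 = 1 + (i + color) % 2  # first point of this color in row i
                    u[i, j0:-1:2] = 0.25 * (u[i - 1, j0:-1:2] + u[i + 1, j0:-1:2]
                                            + u[i, j0 - 1:-2:2] + u[i, j0 + 1::2]
                                            + h * h * f[i, j0:-1:2])
        return u

    n = 33
    u = np.zeros((n, n))  # homogeneous Dirichlet boundary values
    f = np.ones((n, n))
    redblack_gauss_seidel(u, f, 1.0 / (n - 1), sweeps=200)
    print(u[n // 2, n // 2])  # approaches ~0.0737 at the center as sweeps increase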

    Seventh Copper Mountain Conference on Multigrid Methods

    The Seventh Copper Mountain Conference on Multigrid Methods was held on 2-7 April 1995 at Copper Mountain, Colorado. This book is a collection of many of the papers presented at the conference and thus represents the conference proceedings. NASA Langley graciously provided printing of this document so that all of the papers could be presented in a single forum. Each paper was reviewed by a member of the conference organizing committee under the coordination of the editors. The multigrid discipline continues to expand and mature, as is evident from these proceedings. The vibrancy of this field is amply expressed in these important papers, and the collection shows its rapid trend toward further diversity and depth.

    Parallel unstructured solvers for linear partial differential equations

    This thesis presents the development of a parallel algorithm to solve symmetric systems of linear equations and the computational implementation of a parallel partial differential equation solver for unstructured meshes. The proposed method, called distributive conjugate gradient (DCG), is based on a single-level domain decomposition method and the conjugate gradient method to obtain a highly scalable parallel algorithm. An overview of methods for the discretization of domains and partial differential equations is given. The partition and refinement of meshes are discussed, and the formulation of the weighted residual method in two and three dimensions is presented. Some of the methods to solve systems of linear equations are introduced, highlighting the conjugate gradient method and domain decomposition methods. A parallel unstructured PDE solver is proposed and its implementation presented. Emphasis is given to the data partition adopted, and the scheme used for communication among adjacent subdomains is explained. A series of experiments in processor scalability is also reported. The derivation and parallelization of DCG are presented and the method is validated through numerical experiments. The method's capabilities and limitations were investigated by solving the Poisson equation with various source terms. The experimental results obtained using the parallel solver developed as part of this work show that the algorithm presented is accurate and highly scalable, achieving roughly linear parallel speed-up in many of the cases tested.
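    Since DCG builds on the conjugate gradient method, a minimal serial CG in Python (textbook form, not the DCG algorithm itself) may help fix notation; in a distributed setting this kernel runs with subdomain-local matrix-vector products plus global reductions for the inner products.

    import numpy as np

    def conjugate_gradient(A, b, tol=1e-8, maxiter=1000):
        # Textbook CG for a symmetric positive definite system A x = b.
        x = np.zeros_like(b)
        r = b - A @ x  # initial residual
        p = r.copy()   # first search direction
        rs = r @ r
        for _ in range(maxiter):
            Ap = A @ p
            alpha = rs / (p @ Ap)  # exact line search along p
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs) * p  # A-conjugate update of the direction
            rs = rs_new
        return x

    # Toy usage: 1D Poisson matrix with a constant right-hand side
    n = 50
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    x = conjugate_gradient(A, np.ones(n))
    print(np.linalg.norm(A @ x - np.ones(n)))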

    Resiliency in numerical algorithm design for extreme scale simulations

    This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’ held March 1–6, 2020, at Schloss Dagstuhl, which was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of an enormous amount of resources and costs. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 10^23 floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features and specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation, and (2) how do we best design the algorithms and software to meet these requirements? While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically.
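    One of the remedies named above, application-level checkpoint/rollback, can be sketched in a few lines of Python (a generic illustration with assumed names, not a library interface): keep an in-memory copy of the solver state at fixed intervals and restart from it whenever an error detector fires.

    import copy
    import numpy as np

    def run_with_checkpoints(step, state, nsteps, interval=10,
                             corrupted=lambda s: False):
        # In-memory checkpoint/rollback for an iterative computation: keep a
        # copy of (step index, state) every `interval` steps and restart from
        # it when an error is detected (assumes faults are transient).
        ckpt_i, ckpt_state = 0, copy.deepcopy(state)
        i = 0
        while i < nsteps:
            state = step(state)
            i += 1
            if corrupted(state):  # e.g. a NaN check or a residual sanity test
                i, state = ckpt_i, copy.deepcopy(ckpt_state)  # roll back
            elif i % interval == 0:
                ckpt_i, ckpt_state = i, copy.deepcopy(state)  # new checkpoint
        return state

    # Toy usage: a contraction iterated 100 steps with a NaN detector
    x0 = np.array([1.0, 2.0])
    x = run_with_checkpoints(lambda s: 0.9 * s, x0, nsteps=100,
                             corrupted=lambda s: bool(np.isnan(s).any()))
    print(x)

    The checkpoint interval trades saving overhead against recomputation after a rollback; at exascale this same trade-off drives the move from synchronous global checkpoints toward more local or asynchronous schemes.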

    Computational multiscale solvers for continuum approaches

    Computational multiscale analyses are currently ubiquitous in science and technology. Different problems of interest (e.g., mechanical, fluid, thermal, or electromagnetic) involving a domain with two or more clearly distinguished spatial or temporal scales are candidates to be solved using this technique. Moreover, the predictive capability and potential of multiscale analysis make it an interesting tool for the development of new concept materials with desired macroscopic or apparent properties obtained through the design of their microstructure, which is now even more feasible with the combination of nanotechnology and additive manufacturing. Indeed, information in terms of field variables at a finer scale is available by solving the associated localization problem. In this work, a review of the algorithmic treatment of multiscale analyses of several problems of technological interest is presented. The paper collects both classical and modern techniques of multiscale simulation, such as those based on the proper generalized decomposition (PGD) approach. Moreover, an overview of available software for the implementation of such numerical schemes is also given. The availability and usefulness of this technique in the design of complex microstructural systems are highlighted throughout the text. In this review, the fine and hence the coarse scale are associated with continuum variables, so atomistic approaches and coarse-graining transfer techniques are outside the scope of this paper.
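    As a minimal concrete example of the localization/homogenization step (a classical 1D result, our own illustration rather than anything specific to the reviewed methods): for a 1D periodic two-phase laminate, solving the cell (corrector) problem shows that the effective diffusion coefficient is the harmonic, not the arithmetic, mean of the fine-scale coefficient, because the flux a(y) u'(y) is constant across the layers and the layer "resistances" add.

    import numpy as np

    # Fine-scale coefficient of a two-phase laminate, equal volume fractions
    y = np.linspace(0.0, 1.0, 1000, endpoint=False)
    a = np.where(y < 0.5, 1.0, 100.0)

    a_arith = a.mean()                # 50.5: the naive average, far too stiff
    a_eff = 1.0 / np.mean(1.0 / a)    # ~1.98: harmonic mean from the cell problem
    print(a_arith, a_eff)

    In higher dimensions no such closed form exists in general, which is precisely why the cell problems are solved numerically by the multiscale solvers reviewed here.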
