73 research outputs found

    Software concepts and algorithms for an efficient and scalable parallel finite element method

    Software packages for the numerical solution of partial differential equations (PDEs) using the finite element method are important in many fields of research. Their basic data structures and algorithms change over time, as users' requirements grow and the software must make efficient use of the newest highly parallel computing systems. This is the central point of this work. To make efficient use of parallel computing systems with a growing number of independent basic computing units, e.g. CPUs, we have to combine data structures and algorithms from different areas of mathematics and computer science. Two crucial parts are a distributed mesh and a parallel solver for linear systems of equations. For both, there exist multiple independent approaches. In this work we argue that it is necessary to combine both of them, and to link their underlying data structures, to allow for an efficient and scalable implementation of the finite element method. First, we present concepts, data structures and algorithms for distributed meshes that allow for local refinement. The central point of our presentation is to provide arbitrary geometrical information about the mesh and its distribution to the linear solver. A large part of the overall computing time of the finite element method is spent in the linear solver. Thus, its parallelization is of major importance. Based on the presented concept for distributed meshes, we present several different linear solver methods. Here we concentrate on general-purpose linear solvers, which make only few assumptions about the systems to be solved. For this, a new FETI-DP (Finite Element Tearing and Interconnecting - Dual Primal) method is proposed. Although the standard FETI-DP method is quasi-optimal from a mathematical point of view, it is not possible to implement it efficiently for a large number of processors (> 10,000). The main reason is a relatively small but globally distributed coarse grid problem.
To circumvent this problem, we propose a new multilevel FETI-DP method which hierarchically decomposes the coarse grid problem. This leads to a more local communication pattern for solving the coarse grid problem and makes it possible to scale to a large number of processors. Besides the parallelization of the finite element method, we discuss an approach to speed up serial computations of existing finite element packages. In many computations the system of PDEs to be solved involves more than one variable. This is especially the case in multi-physics modeling. Observations show that in many of these computations the solution structures of the variables differ, being weakly or completely decoupled from one another. In the standard finite element method, however, only one mesh is used for the discretization of all variables. We present a multi-mesh finite element method, which allows a system of PDEs to be discretized with two independently refined meshes.
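
For orientation, the coarse problem named in this abstract can be written in the notation commonly used for FETI-DP in the literature. The symbols below are assumptions taken from that common convention, not from the abstract: B denotes the interior and dual interface unknowns of all subdomains, \Pi the primal (coarse) unknowns, and B_B the jump operator that enforces continuity of the dual unknowns through the Lagrange multipliers \lambda.

```latex
\tilde K =
\begin{pmatrix}
  K_{BB} & \tilde K_{\Pi B}^{T} \\
  \tilde K_{\Pi B} & \tilde K_{\Pi\Pi}
\end{pmatrix},
\qquad
\tilde S_{\Pi\Pi} = \tilde K_{\Pi\Pi} - \tilde K_{\Pi B} K_{BB}^{-1} \tilde K_{\Pi B}^{T},

F \lambda = d,
\qquad
F = B_B K_{BB}^{-1} B_B^{T}
  + B_B K_{BB}^{-1} \tilde K_{\Pi B}^{T}\,
    \tilde S_{\Pi\Pi}^{-1}\,
    \tilde K_{\Pi B} K_{BB}^{-1} B_B^{T}.
```

In this form K_{BB} is block diagonal over the subdomains, so applying K_{BB}^{-1} is purely local work. The Schur complement \tilde S_{\Pi\Pi}, by contrast, couples all subdomains through the primal unknowns; it is small compared with the full system but spread over all processes, which matches the "relatively small but globally distributed" coarse problem described above.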

    Nonlinear FETI-DP and BDDC Methods

    In the simulation of deformation processes in material science, the microscopic material structure often has to be taken into account, as in the simulation of modern high-strength steels. A straightforward finite element discretization of the complete deformed body resolving the microscopic structure leads to very large nonlinear problems whose solution is out of reach, even on modern supercomputers. In homogenization approaches, such as the computational scale bridging approach FE2, the macroscopic scale of the deformed object is decoupled from the microscopic scale of the material structure. These approaches only consider the microstructure in a localized fashion on independent and parallel representative volume elements (RVEs). This introduces massive parallelism on the macroscopic level and is thus ideal for modern computer architectures with large numbers of parallel computational cores. Nevertheless, the discretization of an RVE can still result in large nonlinear problems, and thus highly scalable parallel solvers are necessary. In this context, this thesis discusses nonlinear FETI-DP (Finite Element Tearing and Interconnecting - Dual-Primal) and BDDC (Balancing Domain Decomposition by Constraints) domain decomposition methods, which are parallel solution methods for nonlinear problems arising from a finite element discretization. These approaches can be viewed as strategies to further localize the computational work and to extend the parallel scalability of classical FETI-DP and BDDC methods towards extreme-scale supercomputers. Variants providing an inexact solution of the FETI-DP coarse problem are also considered, combining two successful paradigms, i.e., nonlinear domain decomposition and AMG (Algebraic Multigrid). An efficient implementation of the resulting inexact reduced Nonlinear-FETI-DP-1 method is presented, and scalability beyond 200,000 computational cores is shown.
Finally, a highly scalable FE2 implementation using recent inexact reduced FETI-DP methods to solve the RVE problems on the microscopic level is presented, and scalability on all 458,752 cores of the JUQUEEN BlueGene/Q system at Forschungszentrum Jülich is demonstrated.

    Reduced Order Modeling based Inexact FETI-DP solver for lattice structures

    This paper addresses the overwhelming computational resources needed by standard numerical approaches to simulate architected materials. These multiscale heterogeneous lattice structures are attracting intense interest in conjunction with the improvement of additive manufacturing, as they offer, among many other benefits, excellent stiffness-to-weight ratios. We develop here a dedicated HPC solver that exploits the specific nature of the underlying problem in order to drastically reduce the computational costs (memory and time) of the full fine-scale analysis of lattice structures. Our purpose is to take advantage of the natural domain decomposition into cells and, even more importantly, of the geometrical and mechanical similarities among cells. Our solver consists of a so-called inexact FETI-DP method in which the local, cell-wise operators and solutions are approximated with reduced order modeling techniques. Instead of considering every cell independently, we end up with only a few principal local problems to solve and make use of the corresponding principal cell-wise operators to approximate all the others. The result is a scalable algorithm that saves numerous local factorizations. Our solver is applied to the isogeometric analysis of lattices built by spline composition, which offers the opportunity to compute the reduced basis with macro-scale data, thereby making our method also multiscale and matrix-free. The solver is tested on various 2D and 3D analyses. It shows major gains with respect to black-box solvers; in particular, problems with several million degrees of freedom can be solved on a simple computer within a few minutes. Comment: 30 pages, 12 figures, 2 tables

    Computational Homogenization with Million-way Parallelism using Domain Decomposition Methods

    Parallel computational homogenization using the well-known FE2 approach is described and combined with fast domain decomposition and algebraic multigrid solvers. The purpose of this paper is to show that, and how, the FE2 method can take advantage of the largest supercomputers available and those of the upcoming exascale era for virtual material testing of micro-heterogeneous materials such as advanced steel. The FE2 method is a computational micro-macro homogenization approach which incorporates micromechanical finite element simulations into macroscopic finite element simulations. In this approach, a microscopic finite element problem, defined on a representative volume element (RVE), is attached to each Gauss integration point of the macroscopic finite element problem. Note that the FE2 method is not embarrassingly parallel, since the RVE problems are coupled through the macroscopic problem. Numerical results are presented for different grids on both the macroscopic and the microscopic level. Unstructured as well as structured grids with different irregular domain decompositions are considered on the microscale. Finally, weak scaling results from a few nodes up to a million parallel processes are presented.
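
The micro-macro structure described in this abstract can be illustrated with a deliberately small toy problem. The sketch below is not the authors' implementation: it assumes a 1D linear elastic bar with unit cross-section, one Gauss point per macroscopic element, and an RVE that is itself a 1D bar of equal-length segments. The function names `rve_stress`, `rve_tangent` and `solve_macro_bar` are hypothetical and chosen only for this example.

```python
import numpy as np


def rve_stress(macro_strain, moduli):
    """Solve a toy 1D RVE of unit length and return the homogenized stress.

    The RVE consists of equal-length linear segments with the given moduli.
    The macroscopic strain is imposed through the Dirichlet data
    u(0) = 0 and u(1) = macro_strain.
    """
    moduli = np.asarray(moduli, dtype=float)
    n = len(moduli)
    k = moduli * n  # segment stiffness E / h with h = 1 / n
    K = np.zeros((n + 1, n + 1))
    for e in range(n):
        K[e:e + 2, e:e + 2] += k[e] * np.array([[1.0, -1.0], [-1.0, 1.0]])
    u = np.zeros(n + 1)
    u[-1] = macro_strain
    if n > 1:
        interior = np.arange(1, n)
        # Move the prescribed boundary displacement to the right-hand side.
        rhs = -K[interior, n] * macro_strain
        u[interior] = np.linalg.solve(K[np.ix_(interior, interior)], rhs)
    # In a 1D bar the stress is constant, so any segment can be used.
    return k[-1] * (u[-1] - u[-2])


def rve_tangent(moduli):
    """Homogenized tangent modulus; valid this way only because the toy RVE is linear."""
    return rve_stress(1.0, moduli)


def solve_macro_bar(n_elems, length, force, moduli):
    """Macroscopic 1D bar, fixed at the left end and loaded by `force` at the right end.

    Every macroscopic element has one Gauss point, and every Gauss point
    obtains its material response from its own RVE problem.
    """
    he = length / n_elems
    K = np.zeros((n_elems + 1, n_elems + 1))
    for e in range(n_elems):
        c = rve_tangent(moduli)  # independent micro solve per Gauss point
        K[e:e + 2, e:e + 2] += (c / he) * np.array([[1.0, -1.0], [-1.0, 1.0]])
    f = np.zeros(n_elems + 1)
    f[-1] = force
    u = np.zeros(n_elems + 1)
    u[1:] = np.linalg.solve(K[1:, 1:], f[1:])  # the macro solve couples all RVEs
    return u
```

For a two-phase RVE with moduli 1 and 3, the micro solve reproduces the harmonic mean 1.5 as effective modulus. The loop over Gauss points is the part that parallelizes naturally, while the final macroscopic solve is the coupling that keeps the method from being embarrassingly parallel.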

    Energy Efficiency of Nonlinear Domain Decomposition Methods

    A nonlinear domain decomposition (DD) solver is considered with respect to improved energy efficiency. In this method, nonlinear problems are solved using Newton’s method on the subdomains in parallel and in asynchronous iterations. The method is compared to the more standard Newton-Krylov approach, where a linear domain decomposition solver is applied to the overall nonlinear problem after linearization using Newton’s method. It is found that in the nonlinear domain decomposition method, making use of the asynchronicity, some processor cores can be set to sleep to save energy and to allow better use of the power and thermal budget. Energy savings of up to 77% are observed compared to the more traditional Newton-Krylov approach, which is synchronous by design, using up to 5120 Intel Broadwell (Xeon E5-2630v4) cores. The total time to solution is not affected. On the contrary, the remaining cores of the same processor may be able to enter turbo mode, thus slightly reducing the total time to solution. Last, we consider the same strategy for the ASPIN (Additive Schwarz Preconditioned Inexact Newton) nonlinear domain decomposition method and observe a similar potential to save energy.

    Balancing domain decomposition by constraints and perturbation

    In this paper, we formulate and analyze a perturbed formulation of the balancing domain decomposition by constraints (BDDC) method. We prove that the perturbed BDDC has the same polylogarithmic bound for the condition number as the standard formulation. Two types of properly scaled zero-order perturbations are considered: one uses a mass matrix, and the other uses a Robin-type boundary condition, i.e., a mass matrix on the interface. With perturbation, the well-posedness of the local Neumann problems and the global coarse problem is automatically guaranteed, so coarse degrees of freedom need to be defined only for convergence purposes and not for well-posedness. This allows a much simpler implementation, as no complicated corner selection algorithm is needed. Minimal coarse spaces using only face or edge constraints can also be considered. They are very useful in extreme-scale calculations, where the coarse problem is usually the bottleneck that can jeopardize scalability. The perturbation also adds extra robustness, as the perturbed formulation works even when the constraints fail to eliminate a small number of subdomain rigid body modes from the standard BDDC space. This is extremely important when solving problems on unstructured meshes partitioned by automatic graph partitioners, since arbitrary disconnected subdomains are possible. Numerical results are provided to support the theoretical findings.
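
As a reminder of what a polylogarithmic bound looks like, the standard BDDC estimate from the literature has the form below. The notation is an assumption and is not taken from the abstract: H is the subdomain diameter, h the mesh size, S the interface Schur complement, M^{-1}_{BDDC} the preconditioner, and C a constant independent of H and h. The second line is only a schematic of a zero-order perturbation of a subdomain stiffness matrix K^{(i)} by a mass matrix M^{(i)}; the abstract does not give the scaling of \varepsilon.

```latex
\kappa\!\left( M^{-1}_{\mathrm{BDDC}}\, S \right)
  \;\le\; C \left( 1 + \log\frac{H}{h} \right)^{2},

K^{(i)}_{\varepsilon} \;=\; K^{(i)} + \varepsilon\, M^{(i)},
\qquad \varepsilon > 0 .
```

Since M^{(i)} is symmetric positive definite, K^{(i)}_{\varepsilon} is nonsingular even for a floating subdomain whose unperturbed stiffness matrix K^{(i)} has rigid body modes in its null space, which is why the local Neumann problems stay well-posed without additional constraints.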

    High-Performance Computing Two-Scale Finite Element Simulations of a Contact Problem Using Computational Homogenization - Virtual Forming Limit Curves for Dual-Phase Steel

    The valued macroscopic properties of dual-phase (DP) steels strongly depend on their microstructure. Therefore, accurate finite element (FE) simulations of a deformation process of such a steel require the incorporation of the microscopic heterogeneous structure. Usually, a brute-force FE discretization incorporating the microstructure is not feasible, since it results in exceedingly large problem sizes. Instead, the microstructure has to be incorporated by using computational homogenization. We present a numerical two-scale approach to the Nakajima test, a well-known material test in the steel industry, for a DP steel. The test can be used to derive forming limit diagrams (FLDs), which allow experts to judge the maximum formability of a specific type of sheet metal at the considered thickness. For the simulations, we use our software package FE2TI, which is a highly scalable implementation of the well-known FE2 homogenization approach. The microstructure is represented by a representative volume element (RVE) and is discretized separately from the macroscopic problem. We discuss the incorporation of contact constraints using a penalty formulation as well as appropriate boundary conditions. In addition, we introduce a simple load step strategy and different options for choosing the initial value of a single load step by using an interpolation polynomial. Finally, we obtain computationally derived FLDs. Although we use a computational homogenization strategy, the resulting problems on both scales can be quite large. The efficient solution of such large problems requires parallel strategies. Therefore, we consider the highly scalable nonlinear domain decomposition methods FETI-DP (Finite Element Tearing and Interconnecting - Dual-Primal) and BDDC (Balancing Domain Decomposition by Constraints).
For the first time, the BDDC approach is used for the parallel solution of the macroscopic problem in a simulation of the Nakajima test. We introduce a unified framework that combines all variants of nonlinear FETI-DP and nonlinear BDDC. For the first time, we introduce a nonlinear FETI-DP variant that chooses suitable elimination sets by utilizing information from the nonlinear residual. Furthermore, we show weak scaling results for different nonlinear FETI-DP variants and several model problems.

    Robust exact and inexact FETI-DP methods with applications to elasticity

    Domain decomposition methods are fast parallel iterative solvers for large systems of equations arising from the discretisation of partial differential equations, e.g. from structural mechanics. In this work, dual iterative substructuring methods of the FETI-DP (Finite Element Tearing and Interconnecting Dual-Primal) type are developed and applied to second order elliptic partial differential equations, with emphasis on elasticity. In particular, an attempt is made to develop robust methods for homogeneous and heterogeneous elasticity problems. New inexact FETI-DP methods are also introduced that allow for inexact coarse problem solvers and/or inexact subdomain solvers. It is shown that under certain conditions the new algorithms fulfill the same asymptotic condition number estimate as the traditional, exact FETI-DP methods. Parallel results using algebraic multigrid for the FETI-DP coarse problem show the improved scalability of the new algorithms.