152 research outputs found
Software concepts and algorithms for an efficient and scalable parallel finite element method
Software packages for the numerical solution of partial differential equations (PDEs) using the finite element method are important in different fields of research. The basic data structures and algorithms change in time, as the user\'s requirements are growing and the software must efficiently use the newest highly parallel computing systems. This is the central point of this work.
To make efficiently use of parallel computing systems with growing number of independent basic computing units, i.e.~CPUs, we have to combine data structures and algorithms from different areas of mathematics and computer science. Two crucial parts are a distributed mesh and parallel solver for linear systems of equations. For both there exists multiple independent approaches. In this work we argue that it is necessary to combine both of them to allow for an efficient and scalable implementation of the finite element method. First, we present concepts, data structures and algorithms for distributed meshes, which allow for local refinement. The central point of our presentation is to provide arbitrary geometrical information of the mesh and its distribution to the linear solver.
A large part of the overall computing time of the finite element method is spend by the linear solver. Thus, its parallelization is of major importance. Based on the presented concept for distributed meshes, we preset several different linear solver methods. Hereby we concentrate on general purpose linear solver, which makes only little assumptions about the systems to be solver. For this, a new FETI-DP (Finite Element Tearing and Interconnect - Dual Primal) method is proposed. Those the standard FETI-DP method is quasi optimal from a mathematical point of view, its not possible to implement it efficiently for a large number of processors (> 10,000). The main reason is a relatively small but globally distributed coarse mesh problem. To circumvent this problem, we propose a new multilevel FETI-DP method which hierarchically decompose the coarse grid problem. This leads to a more local communication pattern for solver the coarse grid problem and makes it possible to scale for a large number of processors.
Besides the parallelization of the finite element method, we discuss an approach to speed up serial computations of existing finite element packages. In many computations the PDE to be solved consists of more than one variable. This is especially the case in multi-physics modeling. Observation show that in many of these computation the solution structure of the variables is different. But in the standard finite element method, only one mesh is used for the discretization of all variables. We present a multi-mesh finite element method, which allows to discretize a system of PDEs with two independently refined meshes.Softwarepakete zur numerischen Lösung partieller Differentialgleichungen mit Hilfe der Finiten-Element-Methode sind in vielen Forschungsbereichen ein wichtiges Werkzeug. Die dahinter stehenden Datenstrukturen und Algorithmen unterliegen einer ständigen Neuentwicklung um den immer weiter steigenden Anforderungen der Nutzergemeinde gerecht zu werden und um neue, hochgradig parallel Rechnerarchitekturen effizient nutzen zu können. Dies ist auch der Kernpunkt dieser Arbeit.
Um parallel Rechnerarchitekturen mit einer immer höher werdenden Anzahl an von einander unabhängigen Recheneinheiten, z.B.~Prozessoren, effizient Nutzen zu können, müssen Datenstrukturen und Algorithmen aus verschiedenen Teilgebieten der Mathematik und Informatik entwickelt und miteinander kombiniert werden. Im Kern sind dies zwei Bereiche: verteilte Gitter und parallele Löser für lineare Gleichungssysteme. Für jedes der beiden Teilgebiete existieren unabhängig voneinander zahlreiche Ansätze. In dieser Arbeit wird argumentiert, dass für hochskalierbare Anwendungen der Finiten-Elemente-Methode nur eine Kombination beider Teilgebiete und die Verknüpfung der darunter liegenden Datenstrukturen eine effiziente und skalierbare Implementierung ermöglicht. Zuerst stellen wir Konzepte vor, die parallele verteile Gitter mit entsprechenden Adaptionstrategien ermöglichen. Zentraler Punkt ist hier die Informationsaufbereitung für beliebige Löser linearer Gleichungssysteme.
Beim Lösen partieller Differentialgleichung mit der Finiten Elemente Methode wird ein großer Teil der Rechenzeit für das Lösen der dabei anfallenden linearen Gleichungssysteme aufgebracht. Daher ist deren Parallelisierung von zentraler Bedeutung. Basierend auf dem vorgestelltem Konzept für verteilten Gitter, welches beliebige geometrische Informationen für die linearen Löser aufbereiten kann, präsentieren wir mehrere unterschiedliche Lösermethoden. Besonders Gewicht wird dabei auf allgemeine Löser gelegt, die möglichst wenig Annahmen über das zu lösende System machen. Hierfür wird die FETI-DP (Finite Element Tearing and Interconnect - Dual Primal) Methode weiterentwickelt. Obwohl die FETI-DP Methode vom mathematischen Standpunkt her als quasi-optimal bezüglich der parallelen Skalierbarkeit gilt, kann sie für große Anzahl an Prozessoren (> 10.000) nicht mehr effizient implementiert werden. Dies liegt hauptsächlich an einem verhältnismäßig kleinem aber global verteilten Grobgitterproblem. Wir stellen eine Multilevel FETI-DP Methode vor, die dieses Problem durch eine hierarchische Komposition des Grobgitterproblems löst. Dadurch wird die Kommunikation entlang des Grobgitterproblems lokalisiert und die Skalierbarkeit der FETI-DP Methode auch für große Anzahl an Prozessoren sichergestellt.
Neben der Parallelisierung der Finiten-Elemente-Methode beschäftigen wir uns in dieser Arbeit mit der Ausnutzung von bestimmten Voraussetzung um auch die sequentielle Effizienz bestehender Implementierung der Finiten-Elemente-Methode zu steigern. In vielen Fällen müssen partielle Differentialgleichungen mit mehreren Variablen gelöst werden. Sehr häufig ist dabei zu beobachten, insbesondere bei der Modellierung mehrere miteinander gekoppelter physikalischer Phänomene, dass die Lösungsstruktur der unterschiedlichen Variablen entweder schwach oder vollständig voneinander entkoppelt ist. In den meisten Implementierungen wird dabei nur ein Gitter zur Diskretisierung aller Variablen des Systems genutzt. Wir stellen eine Finite-Elemente-Methode vor, bei der zwei unabhängig voneinander verfeinerte Gitter genutzt werden können um ein System partieller Differentialgleichungen zu lösen
Benchmarking mixed-mode PETSc performance on high-performance architectures
The trend towards highly parallel multi-processing is ubiquitous in all modern computer architectures, ranging from handheld devices to large-scale HPC systems; yet many applications are struggling to fully utilise the multiple levels of parallelism exposed in modern high-performance platforms. In order to realise the full potential of recent hardware advances, a mixed-mode between shared-memory programming techniques and inter-node message passing can be adopted which provides high-levels of parallelism with minimal overheads. For scientific applications this entails that not only the simulation code itself, but the whole software stack needs to evolve. In this paper, we evaluate the mixed-mode performance of PETSc, a widely used scientific library for the scalable solution of partial differential equations. We describe the addition of OpenMP threaded functionality to the library, focusing on sparse matrix-vector multiplication. We highlight key challenges in achieving good parallel performance, such as explicit communication overlap using task-based parallelism, and show how to further improve performance by explicitly load balancing threads within MPI processes. Using a set of matrices extracted from Fluidity, a CFD application code which uses the library as its linear solver engine, we then benchmark the parallel performance of mixed-mode PETSc across multiple nodes on several modern HPC architectures. We evaluate the parallel scalability on Uniform Memory Access (UMA) systems, such as the Fujitsu PRIMEHPC FX10 and IBM BlueGene/Q, as well as a Non-Uniform Memory Access (NUMA) Cray XE6 platform. A detailed comparison is performed which highlights the characteristics of each particular architecture, before demonstrating efficient strong scalability of sparse matrix-vector multiplication with significant speedups over the pure-MPI mode
Recommended from our members
A Survey of High-Quality Computational Libraries and their Impactin Science and Engineering Applications
Recently, a number of important scientific and engineering problems have been successfully studied and solved by means of computational modeling and simulation. Many of these computational models and simulations benefited from the use of available software tools and libraries to achieve high performance and portability. In this article, we present a reference matrix of the performance of robust, reliable and widely used tools mapped to scientific and engineering applications that use them. We aim at regularly maintaining and disseminating this matrix to the computational science community. This matrix will contain information on state-of-the-art computational tools, their applications and their use
Implementation and scalability analysis of balancing domain decomposition methods
Manuscript submitted for publication in SIAM Journal of Scientific Computing. Under Review.Preprin
Productive and efficient computational science through domain-specific abstractions
In an ideal world, scientific applications are computationally efficient,
maintainable and composable and allow scientists to work very productively. We
argue that these goals are achievable for a specific application field by
choosing suitable domain-specific abstractions that encapsulate domain
knowledge with a high degree of expressiveness.
This thesis demonstrates the design and composition of
domain-specific abstractions by abstracting the stages a scientist goes
through in formulating a problem of numerically solving a partial differential
equation. Domain knowledge is used to transform this problem into a different,
lower level representation and decompose it into parts which can be solved
using existing tools. A system for the portable solution of partial
differential equations using the finite element method on unstructured meshes
is formulated, in which contributions from different scientific communities
are composed to solve sophisticated problems.
The concrete implementations of these domain-specific abstractions are
Firedrake and PyOP2. Firedrake allows scientists to describe variational
forms and discretisations for linear and non-linear finite element problems
symbolically, in a notation very close to their mathematical models. PyOP2
abstracts the performance-portable parallel execution of local computations
over the mesh on a range of hardware architectures, targeting multi-core CPUs,
GPUs and accelerators. Thereby, a separation of concerns is achieved, in which
Firedrake encapsulates domain knowledge about the finite element method
separately from its efficient parallel execution in PyOP2, which in turn is
completely agnostic to the higher abstraction layer.
As a consequence of the composability of those abstractions, optimised
implementations for different hardware architectures can be
automatically generated without any changes to a single high-level
source. Performance matches or exceeds what is realistically attainable by
hand-written code. Firedrake and PyOP2 are combined to form a tool chain that
is demonstrated to be competitive with or faster than available alternatives
on a wide range of different finite element problems.Open Acces
- …