A strategy for mapping unstructured mesh computational mechanics programs onto distributed memory parallel architectures
The motivation of this thesis was to develop strategies that would enable unstructured mesh based computational mechanics codes to exploit the computational advantages offered by distributed memory parallel processors. Strategies that successfully map structured mesh codes onto parallel machines have been developed over the previous decade and used to build a toolkit for automation of the parallelisation process. Extension of the capabilities of this toolkit to include unstructured mesh codes requires new strategies to be developed.
This thesis examines the method of parallelisation by geometric domain decomposition using the single program, multiple data (SPMD) programming paradigm with explicit message passing. This technique involves splitting (decomposing) the problem definition into P parts that may be distributed over P processors in a parallel machine. Each processor runs the same program and operates only on its part of the problem. Messages passed between the processors allow data exchange to maintain consistency with the original algorithm.
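The decomposition idea can be illustrated with a minimal sketch. It assumes a 1-D chain of cells as a simplified stand-in for an unstructured mesh, and it simulates the P processors with a plain loop rather than real message passing. The names `partition_cells`, `halo_cells`, `smooth_serial` and `smooth_spmd` are hypothetical and do not come from the thesis.

```python
def partition_cells(num_cells, num_parts):
    """Split cell indices 0..num_cells-1 into num_parts contiguous blocks."""
    base, extra = divmod(num_cells, num_parts)
    parts, start = [], 0
    for p in range(num_parts):
        size = base + (1 if p < extra else 0)
        parts.append(list(range(start, start + size)))
        start += size
    return parts


def halo_cells(parts, p, num_cells):
    """Cells owned by other parts that part p needs (1-D neighbours only)."""
    owned = set(parts[p])
    halo = set()
    for c in parts[p]:
        for nb in (c - 1, c + 1):
            if 0 <= nb < num_cells and nb not in owned:
                halo.add(nb)
    return sorted(halo)


def smooth_serial(u):
    """Reference algorithm: three-point average with clamped ends."""
    n = len(u)
    return [(u[max(i - 1, 0)] + u[i] + u[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]


def smooth_spmd(u, num_parts):
    """Same algorithm, executed part by part on owned data plus halo copies."""
    n = len(u)
    parts = partition_cells(n, num_parts)
    result = [0.0] * n
    for p in range(num_parts):  # each pass plays the role of one processor
        local = {c: u[c] for c in parts[p]}  # data owned by this part
        # Stand-in for the messages received from neighbouring parts.
        local.update({c: u[c] for c in halo_cells(parts, p, n)})
        for i in parts[p]:
            left = local[max(i - 1, 0)]
            right = local[min(i + 1, n - 1)]
            result[i] = (left + local[i] + right) / 3.0
    return result
```

In this sketch every part runs identical code on its own cells, and the halo copies are the only data that would have to travel between processors; the partitioned result matches the serial one exactly.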
The strategies developed to parallelise unstructured mesh codes should meet a number of requirements:
The algorithms are faithfully reproduced in parallel.
The code is largely unaltered in the parallel version.
The parallel efficiency is maximised.
The techniques should scale to highly parallel systems.
The parallelisation process should become automated.
Techniques and strategies that meet these requirements are developed and tested in this dissertation using a state of the art integrated computational fluid dynamics and solid mechanics code. The results presented demonstrate the importance of the problem partition in the definition of inter-processor communication and hence parallel performance.
The classical measure of partition quality based on the number of cut edges in the mesh partition can be inadequate for real parallel machines. Consideration of the topology of the parallel machine in the mesh partition is demonstrated to be a more significant factor than the number of cut edges in the achieved parallel efficiency. It is shown to be advantageous to allow an increase in the volume of communication in order to achieve an efficient mapping dominated by localised communications. The limitation to parallel performance resulting from communication startup latency is clearly revealed together with strategies to minimise the effect.
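The difference between the two partition-quality measures can be made concrete with a small sketch. It assumes that the processors form a linear chain and that the cost of a cut edge is proportional to the number of hops between the two owning processors; this hop-count model and the names `cut_edges`, `hop_weighted_cost` and `chain_hops` are illustrative assumptions, not the cost model used in the thesis.

```python
def cut_edges(edges, owner):
    """Classical measure: number of mesh edges whose ends lie on different processors."""
    return sum(1 for a, b in edges if owner[a] != owner[b])


def hop_weighted_cost(edges, owner, hops):
    """Topology-aware measure: each cut edge is weighted by the processor distance."""
    return sum(hops(owner[a], owner[b])
               for a, b in edges if owner[a] != owner[b])


def chain_hops(p, q):
    """Assumed machine topology: processors in a line, distance = index difference."""
    return abs(p - q)


# A small graph: a path of 8 nodes plus two long-range edges.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7),
         (0, 7), (1, 6)]

block = [0, 0, 1, 1, 2, 2, 3, 3]    # contiguous blocks on processors 0..3
folded = [0, 1, 2, 3, 3, 2, 1, 0]   # graph folded onto the processor chain

block_scores = (cut_edges(edges, block),
                hop_weighted_cost(edges, block, chain_hops))
folded_scores = (cut_edges(edges, folded),
                 hop_weighted_cost(edges, folded, chain_hops))
```

Under these assumptions the folded partition cuts more edges (6 against 5) but has a lower hop-weighted cost (6 against 9), because all of its communication is between neighbouring processors.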
The generic application of the techniques to other unstructured mesh codes is discussed in the context of automation of the parallelisation process. Automation of parallelisation based on the developed strategies is presented as possible through the use of run-time inspector loops to accurately determine the dependencies that define the necessary inter-processor communication.
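A run-time inspector loop of the kind mentioned above might look like the following sketch. It assumes an edge-based loop that reads node data through an indirection array; the function name `inspect_edge_loop` and the data layout are hypothetical and not taken from the thesis.

```python
def inspect_edge_loop(local_edges, edge_nodes, owner, me):
    """Scan the indirection array before the real loop runs and record
    which off-processor nodes the local edges will read.

    local_edges: indices of edges assigned to processor `me`
    edge_nodes:  edge index -> (node, node) indirection array
    owner:       node index -> owning processor
    Returns {owning processor: sorted list of nodes to receive from it}.
    """
    needed = {}
    for e in local_edges:
        for n in edge_nodes[e]:
            if owner[n] != me:
                needed.setdefault(owner[n], set()).add(n)
    return {p: sorted(nodes) for p, nodes in needed.items()}


# Four nodes in a ring, two per processor; processor 0 owns edges 0, 1 and 3.
edge_nodes = [(0, 1), (1, 2), (2, 3), (3, 0)]
owner = [0, 0, 1, 1]
receive_list = inspect_edge_loop([0, 1, 3], edge_nodes, owner, me=0)
```

The resulting receive list is the communication schedule that an executor phase would then use before each execution of the actual loop.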
Analysis of the discontinuous Galerkin method for elliptic problems on surfaces
We extend the discontinuous Galerkin (DG) framework to a linear second-order
elliptic problem on a compact smooth connected and oriented surface. An
interior penalty (IP) method is introduced on a discrete surface and we derive
a-priori error estimates by relating the latter to the original surface via the
lift introduced in Dziuk (1988). The estimates suggest that the geometric error
terms arising from the surface discretisation do not affect the overall
convergence rate of the IP method when using linear ansatz functions. This is
then verified numerically for a number of test problems. An intricate issue is
the approximation of the surface conormal required in the IP formulation,
choices of which are investigated numerically. Furthermore, we present a
generic implementation of test problems on surfaces.
Comment: 21 pages, 4 figures. IMA Journal of Numerical Analysis 2013. Link to
publication: http://imajna.oxfordjournals.org/cgi/content/abstract/drs033?ijkey=45b23qZl5oJslZQ&keytype=re
Numerical solution of 3-D electromagnetic problems in exploration geophysics and its implementation on massively parallel computers
The growing significance, technical development and employment of electromagnetic (EM) methods in exploration geophysics have led to the increasing need for reliable and fast techniques of interpretation of 3-D EM data sets acquired in complex geological environments. The first and most important step to creating an inversion method is the development of a solver for the forward problem. In order to create an efficient, reliable and practical 3-D EM inversion, it is necessary to have a 3-D EM modelling code that is highly accurate, robust and very fast. This thesis focuses precisely on this crucial and very demanding step to building a 3-D EM interpretation method.
The thesis presents as its main contribution a highly accurate, robust, very fast and extremely scalable numerical method for 3-D EM modelling in geophysics that is based on finite elements (FE) and designed to run on massively parallel computing platforms. Because the FE approach supports completely unstructured tetrahedral meshes as well as local mesh refinements, the presented solver is able to represent complex geometries of subsurface structures very precisely and thus improve the solution accuracy and avoid misleading artefacts in images. Consequently, it can be successfully used in geological environments of arbitrary geometrical complexity. The parallel implementation of the method, which is based on domain decomposition and a hybrid MPI-OpenMP scheme, has proved to be highly scalable: the achieved speed-up is close to linear for more than a thousand processors. Thanks to this, the code is able to deal with extremely large problems, which may have hundreds of millions of degrees of freedom, in a very efficient way. The importance of having this forward-problem solver lies in the fact that it is now possible to create a 3-D EM inversion that can deal with data obtained in extremely complex geological environments in a way that is realistic for practical use in industry. So far, such an imaging tool has not been proposed due to a lack of efficient, parallel FE solutions as well as the limitations of efficient solvers based on finite differences.
In addition, the thesis discusses physical, mathematical and numerical aspects and challenges of 3-D EM modelling, which have been studied during my research in order to properly design the presented software for EM field simulations on 3-D areas of the Earth. Through this work, a physical problem formulation based on the secondary Coulomb-gauged EM potentials has been validated, proving that it can be successfully used with the standard nodal FE method to give highly accurate numerical solutions. Also, this work has shown that Krylov subspace iterative methods are the best solution for solving linear systems that arise after FE discretisation of the problem under consideration. More precisely, it has been discovered empirically that the best iterative method for this kind of problem is biconjugate gradient stabilised with an elaborate preconditioner. Since most commonly used preconditioners proved to be either unable to improve the convergence of the implemented solvers to the desired extent, or impractical in the parallel context, I have proposed a preconditioning technique for Krylov methods that is based on algebraic multigrid. Tests for various problems with different conductivity structures and characteristics have shown that the new preconditioner greatly improves the convergence of different Krylov subspace methods, which significantly reduces the total execution time of the program and improves the solution quality. Furthermore, the preconditioner is very practical for parallel implementation. Finally, it has been concluded that there are no restrictions on employing the classical parallel programming models, MPI and OpenMP, for parallelisation of the presented FE solver. Moreover, they have proved to be sufficient to provide excellent scalability for it.
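The structure of a preconditioned biconjugate gradient stabilised (BiCGStab) iteration can be sketched as follows. This is a minimal dense-matrix illustration, not the thesis code: a simple Jacobi (diagonal) preconditioner is swapped in for the algebraic multigrid preconditioner proposed in the thesis, no breakdown handling is included, and the names `matvec`, `dot` and `bicgstab_jacobi` are hypothetical.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bicgstab_jacobi(A, b, tol=1e-10, max_iter=100):
    """Right-preconditioned BiCGStab with M = diag(A), starting from x = 0."""
    n = len(b)
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi stand-in for AMG
    x = [0.0] * n
    r = list(b)              # residual b - A x for the zero initial guess
    r_hat = list(r)          # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            break
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        y = [d * pi for d, pi in zip(inv_diag, p)]   # apply preconditioner
        v = matvec(A, y)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if math.sqrt(dot(s, s)) <= tol * b_norm:
            x = [xi + alpha * yi for xi, yi in zip(x, y)]
            break
        z = [d * si for d, si in zip(inv_diag, s)]   # apply preconditioner
        t = matvec(A, z)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * yi + omega * zi for xi, yi, zi in zip(x, y, z)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        rho = rho_new
    return x


# Small non-symmetric test system whose exact solution is [1, 2, 3].
A = [[4.0, 1.0, 0.0],
     [2.0, 5.0, 1.0],
     [0.0, 1.0, 3.0]]
b = [6.0, 15.0, 11.0]
solution = bicgstab_jacobi(A, b)
```

In a production solver the two preconditioner applications (the lines computing `y` and `z`) are where a multigrid cycle would be called instead of the diagonal scaling used here.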
Efficient Multigrid Preconditioners for Atmospheric Flow Simulations at High Aspect Ratio
Many problems in fluid modelling require the efficient solution of highly
anisotropic elliptic partial differential equations (PDEs) in "flat" domains.
For example, in numerical weather- and climate-prediction an elliptic PDE for
the pressure correction has to be solved at every time step in a thin spherical
shell representing the global atmosphere. This elliptic solve can be one of the
computationally most demanding components in semi-implicit semi-Lagrangian time
stepping methods which are very popular as they allow for larger model time
steps and better overall performance. With increasing model resolution,
algorithmically efficient and scalable algorithms are essential to run the code
under tight operational time constraints. We discuss the theory and practical
application of bespoke geometric multigrid preconditioners for equations of
this type. The algorithms deal with the strong anisotropy in the vertical
direction by using the tensor-product approach originally analysed by Börm
and Hiptmair [Numer. Algorithms, 26/3 (2001), pp. 219-234]. We extend the
analysis to three dimensions under slightly weakened assumptions, and
numerically demonstrate its efficiency for the solution of the elliptic PDE for
the global pressure correction in atmospheric forecast models. For this we
compare the performance of different multigrid preconditioners on a
tensor-product grid with a semi-structured and quasi-uniform horizontal mesh
and a one dimensional vertical grid. The code is implemented in the Distributed
and Unified Numerics Environment (DUNE), which provides an easy-to-use and
scalable environment for algorithms operating on tensor-product grids. Parallel
scalability of our solvers on up to 20,480 cores is demonstrated on the HECToR
supercomputer.
Comment: 22 pages, 6 figures, 2 tables
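One common way to treat strong vertical coupling on a tensor-product grid is to solve each vertical column exactly as a tridiagonal system inside the smoother. The sketch below shows such a column solve with the Thomas algorithm. Whether the DUNE implementation described above does exactly this is an assumption, and the function name `solve_tridiagonal` is hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    Row i reads: lower[i]*x[i-1] + diag[i]*x[i] + upper[i]*x[i+1] = rhs[i].
    lower[0] and upper[-1] are ignored. No pivoting is performed, so the
    matrix is assumed to be diagonally dominant or otherwise well behaved.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# One vertical column with a 1-D Laplacian stencil (-1, 2, -1);
# the right-hand side is chosen so that the exact solution is [1, 2, 3, 4].
column = solve_tridiagonal([0.0, -1.0, -1.0, -1.0],
                           [2.0, 2.0, 2.0, 2.0],
                           [-1.0, -1.0, -1.0, 0.0],
                           [0.0, 0.0, 0.0, 5.0])
```

Each column solve costs a number of operations proportional to the number of vertical levels, and the columns are independent of one another, which is what makes this kind of line relaxation attractive on a horizontally partitioned grid.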
HONEI: A collection of libraries for numerical computations targeting multiple processor architectures.
We present HONEI, an open-source collection of libraries offering a hardware oriented approach to numerical calculations. HONEI abstracts the hardware, and applications written on top of HONEI can be executed on a wide range of computer architectures such as CPUs, GPUs and the Cell processor. We demonstrate the flexibility and performance of our approach with two test applications, a Finite Element multigrid solver for the Poisson problem and a robust and fast simulation of shallow water waves. By linking against HONEI's libraries, we achieve a two-fold speedup over straightforward C++ code using HONEI's SSE backend, and an additional 3 to 4 and 4 to 16 times faster execution on the Cell and a GPU, respectively. A second important aspect of our approach is that the full performance capabilities of the hardware under consideration can be exploited by adding optimised application-specific operations to the HONEI libraries. HONEI provides all necessary infrastructure for development and evaluation of such kernels, significantly simplifying their development.
Advanced interface modelling for 2D shell & 3D continuum problems
This work is motivated by the need for an efficient yet accurate approach for static and dynamic contact analysis of large-scale structures which can a) capture the optimum contact position with a moderate number of contact elements, and b) enable across-partition adaptive contact analysis within a parallel processing environment. In addressing these two issues, a novel adaptive node-to-surface contact approach is proposed to discretise the contact boundaries and to trace the evolution of contact locations.
Contact search is a demanding process that can become quite complicated for certain types of problem. In this work, an efficient and robust contact search method is proposed, which can a) locally track the master facet of a given slave node despite the appearance of highly non-smooth contact surfaces, including surfaces with concave/convex regions or with distinct boundaries as well as reversible normals, and b) globally reallocate the master-slave contact pairs based on the penetration state without an expensive global search, providing an effective adaptive contact approach.
A dual-interface-based domain decomposition method emphasising across-partition contact coupling is proposed. A pair of fully decomposed node-to-surface contact elements is proposed to discretise the across-partition contact boundaries. The assumption of small incremental displacements is adopted, which a) avoids the excessive coupling between the decomposed master and slave, b) significantly reduces the communication overhead, and c) facilitates a flexible across-partition adaptive analysis. This strategy is found to provide good results for a sufficiently small time- or load-step, and it also facilitates mixed-dimensional contact simulation.
Another focus of the current thesis is the inaccuracy in non-smooth plates modelled using 2D displacement-based shell elements. In this work the dominant factor causing the inaccuracy is recognised as the incompatible tangential rotations on the two sides of the intersection. A 3-noded coupling element is introduced to impose a continuous constraint to couple the incompatible rotations. The significance of the discontinuity in the shell-based folded structure and the effectiveness of the coupling element are demonstrated through numerical studies comparing shell-based models to high fidelity solid-based models.
Strategies for producing fast finite element solutions of the incompressible Navier-Stokes equations on massively parallel architectures
To take advantage of the inherent flexibility of the finite element method in solving for flows within complex geometries, it is necessary to produce efficient implementations of the method. Segregation of the solution scheme and the use of parallel computers are two ways of doing this.
Here, the optimisation of a sequential segregated finite element algorithm is discussed, together with the various strategies by which this is done. Furthermore, the implications of parallelising the code onto a massively parallel computer, the MasPar, are explored.
This machine is of Single Instruction Multiple Data type and so modifications to the computer code have been necessary. A general methodology for the implementation of finite element programs is presented based on projecting the levels of data within the algorithm into a form which is ideal for parallelisation. Application of this methodology, in a high level language, has resulted in a code which runs at just under 30 MFlops (in double precision). The computations are performed with minimal inter-processor communication and this represents an efficiency of 20% of the theoretical peak speed. Even though only high level language constructs have been used, this efficiency is comparable with other work using low level constructs on machines of this architecture. In particular, the use of data parallel arrays and the utilisation of the non-unique machine specific features of the computer architecture have produced an efficient, fast program.
Communication-Avoiding Algorithms for a High-Performance Hyperbolic PDE Engine
The study of waves has always been an important subject of research. Earthquakes, for example,
have a direct impact on the daily lives of millions of people while gravitational waves reveal
insight into the composition and history of the Universe. These physical phenomena, despite
being tackled traditionally by different fields of physics, have in common that they are modelled
the same way mathematically: as a system of hyperbolic partial differential equations (PDEs).
The ExaHyPE project (“An Exascale Hyperbolic PDE Engine”) translates this similarity into
a software engine that can be quickly adapted to simulate a wide range of hyperbolic partial
differential equations. ExaHyPE’s key idea is that the user only specifies the physics while the
engine takes care of the parallelisation and the interplay of the underlying numerical methods.
Consequently, a first simulation code for a new hyperbolic PDE can often be realised within a
few hours. This is a task that traditionally can take weeks, months, even years for researchers
starting from scratch.
My main contribution to ExaHyPE is the development of the core infrastructure. This
comprises the development and implementation of ExaHyPE’s solvers and adaptive mesh
refinement procedures, its MPI+X parallelisation, as well as high-level aspects of ExaHyPE’s
application-tailored code generation, which allows ExaHyPE to be adapted to model many different
hyperbolic PDE systems. Like any high-performance computing code, ExaHyPE has to tackle the
challenges of the coming exascale computing era, notably network communication latencies and
the growing memory wall. In this thesis, I propose memory-efficient realisations of ExaHyPE’s
solvers that avoid data movement together with a novel task-based MPI+X parallelisation
concept that hides network communication behind computation in dynamically adaptive
simulations.
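The idea of hiding communication behind computation can be illustrated with a small sketch. It is a strong simplification of a task-based MPI+X scheme: a background thread with an artificial delay stands in for the network transfer, and the names `fetch_halo` and `smooth_with_overlap` are hypothetical and not part of ExaHyPE.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_halo(left_value, right_value, delay=0.01):
    """Stand-in for a non-blocking receive of the two neighbour values."""
    time.sleep(delay)  # pretend network latency
    return left_value, right_value


def smooth_with_overlap(u, left_value, right_value):
    """Three-point average on a local block of at least two cells.

    Interior cells are updated while the halo exchange is still in
    flight; only the two boundary cells wait for the communication.
    """
    n = len(u)
    new = list(u)
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_halo, left_value, right_value)
        for i in range(1, n - 1):      # interior work needs no halo data
            new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
        left, right = pending.result()  # wait only when the data is needed
    new[0] = (left + u[0] + u[1]) / 3.0
    new[-1] = (u[-2] + u[-1] + right) / 3.0
    return new


updated = smooth_with_overlap([3.0, 6.0, 9.0], 0.0, 12.0)
```

The larger the interior of a block relative to its boundary, the more of the transfer time can be absorbed by useful work in this kind of scheme.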