7 research outputs found

Parallel implementation of the finite element method on shared memory multiprocessors

Author: Pakzad Mustapha
Publication venue: Newcastle University
Publication date: 01/01/1995

PhD Thesis. The work presented in this thesis concerns parallel methods for finite element analysis. The research has been funded by British Gas and some of the presented material involves work on their software. Practical problems involving the finite element method can use a large amount of processing power and the execution times can be very large. It is consequently important to investigate the possibilities for the parallel implementation of the method. The research has been carried out on an Encore Multimax, a shared memory multiprocessor with 14 identical CPUs. We first experimented with autoparallelising a large British Gas finite element program (GASP4) using Encore's parallelising Fortran compiler (epf). The parallel program generated by epf proved not to be efficient. The main reasons are the complexity of the code and small grain parallelism. Since the program is hard for the compiler to analyse at high levels, only small grain parallelism has been inserted automatically into the code. This involves a great deal of low level synchronisation, which produces large overheads and causes inefficiency. A detailed analysis of the autoparallelised code has been made with a view to determining the reasons for the inefficiency. Suggestions have also been made about writing programs such that they are suitable for efficient autoparallelisation. The finite element method consists of the assembly of a stiffness matrix and the solution of a set of simultaneous linear equations. A sparse representation of the stiffness matrix has been used to allow experimentation on large problems. Parallel assembly techniques for the sparse representation have been developed. Some of these methods have proved to be very efficient, giving speed ups that are near ideal. For the solution phase, we have used the preconditioned conjugate gradient method (PCG). An incomplete LU factorization of the stiffness matrix with no fill-in (ILU(0)) has been found to be an effective preconditioner. The factors can be obtained at a low cost. We have parallelised all the steps of the PCG method. The main bottleneck is the triangular solves (preconditioning operations) at each step. Two parallel methods of triangular solution have been implemented. One is based on level scheduling (row-oriented parallelism) and the other is a new approach called independent columns (column-oriented parallelism). The algorithms have been tested for row and red-black orderings of the nodal unknowns in the finite element meshes considered. The best speed ups obtained are 7.29 (on 12 processors) for level scheduling and 7.11 (on 12 processors) for independent columns. Red-black ordering gives rise to better parallel performance than row ordering in general. An analysis of methods for the improvement of the parallel efficiency has been made.
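The level scheduling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the row storage format, the function names `level_schedule` and `solve_lower`, and the small test matrix are all assumptions made for illustration. Rows of a sparse lower-triangular factor are grouped into levels so that every row in a level depends only on rows from earlier levels; rows within one level could then be solved concurrently.

```python
from collections import defaultdict

def level_schedule(rows):
    """Group rows of a sparse lower-triangular matrix into dependency levels.

    rows[i] is a list of (column, value) pairs with column <= i,
    including the diagonal entry (hypothetical storage format).
    """
    level = [0] * len(rows)
    for i, row in enumerate(rows):
        deps = [level[j] for j, _ in row if j < i]
        # A row sits one level above the deepest row it depends on.
        level[i] = 1 + max(deps) if deps else 0
    groups = defaultdict(list)
    for i, lv in enumerate(level):
        groups[lv].append(i)
    return [groups[lv] for lv in sorted(groups)]

def solve_lower(rows, b, levels):
    """Forward substitution L x = b, processed level by level."""
    x = [0.0] * len(b)
    for group in levels:
        # Rows in one group are mutually independent, so this inner
        # loop is the part that could be distributed over processors.
        for i in group:
            s = b[i]
            diag = None
            for j, v in rows[i]:
                if j == i:
                    diag = v
                else:
                    s -= v * x[j]
            x[i] = s / diag
    return x

# Small illustrative 4x4 lower-triangular system (made-up data).
L = [
    [(0, 2.0)],
    [(1, 1.0)],
    [(0, 1.0), (2, 4.0)],
    [(1, 3.0), (2, 1.0), (3, 2.0)],
]
b = [2.0, 3.0, 9.0, 15.0]
levels = level_schedule(L)
x = solve_lower(L, b, levels)
print(levels, x)
```

The number of levels bounds the number of sequential steps, which is one way to see why an ordering of the unknowns that yields fewer, wider levels (such as the red-black ordering the abstract reports on) can improve parallel performance.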

Newcastle University eTheses

A class of alternate strip-based domain decomposition methods for elliptic partial differential Equations

Author: Mihai Loredana Angela
Publication date: 01/01/2005

The domain decomposition strategies proposed in this thesis are efficient preconditioning techniques with good parallelism properties for the discrete systems which arise from the finite element approximation of symmetric elliptic boundary value problems in two and three-dimensional Euclidean spaces. For two-dimensional problems, two new domain decomposition preconditioners are introduced, such that the condition number of the preconditioned system is bounded independently of the size of the subdomains and the finite element mesh size. First, the alternate strip-based (ASB2) preconditioner is based on the partitioning of the domain into a finite number of nonoverlapping strips without interior vertices. This preconditioner is obtained from direct solvers inside the strips and a direct fast Poisson solver on the edges between strips, and contains two stages. At each stage the strips change such that the edges between strips at one stage are perpendicular to the edges between strips at the other stage. Next, the alternate strip-based substructuring (ASBS2) preconditioner is a Schur complement solver for the case of a decomposition with multiple nonoverlapping subdomains and interior vertices. The subdomains are assembled into nonoverlapping strips such that the vertices of the strips are on the boundary of the given domain, the edges between strips align with the edges of the subdomains and their union contains all of the interior vertices of the initial decomposition. This preconditioner is produced from direct fast Poisson solvers on the edges between strips and the edges between subdomains inside strips, and also contains two stages such that the edges between strips at one stage are perpendicular to the edges between strips at the other stage. The extension to three-dimensional problems is via solvers on slices of the domain

Durham e-Theses

OpenGrey Repository

Parallel iterative methods in semiconductor device modelling

Author: Coomer Rob
Publication date: 01/01/1994

OPUS

Variational Domain Decomposition For Parallel Image Processing

Author: Kohlberger Timo
Publication venue: Universität Mannheim
Publication date: 01/01/2007

Many important techniques in image processing rely on partial differential equation (PDE) problems, which exhibit spatial couplings between the unknowns throughout the whole image plane. Therefore, a straightforward spatial splitting into independent subproblems and subsequent parallel solving aimed at diminishing the total computation time does not lead to the solution of the original problem. Typically, significant errors at the local boundaries between the subproblems occur. For that reason, most of the PDE-based image processing algorithms are not directly amenable to coarse-grained parallel computing, but only to fine-grained parallelism, e.g. on the level of the particular arithmetic operations involved with the specific solving procedure. In contrast, Domain Decomposition (DD) methods provide several different approaches to decompose PDE problems spatially so that the merged local solutions converge to the original, global one. Such methods fall into two main classes, overlapping and non-overlapping, referring to the overlap between the adjacent subdomains on which the local problems are defined. Furthermore, the classical DD methods, studied intensively in the past thirty years, are primarily applied to linear PDE problems, whereas some of the current important image processing approaches involve solving nonlinear problems, e.g. Total Variation (TV)-based approaches. Among the linear DD methods, non-overlapping methods are favored, since in general they require significantly fewer data exchanges between the particular processing nodes during the parallel computation and therefore reach a higher scalability. For that reason, the theoretical and empirical focus of this work lies primarily on non-overlapping methods, whereas for the overlapping methods we mainly stay with presenting the most important algorithms.
With the linear non-overlapping DD methods, we first concentrate on the theoretical foundation, which serves as the basis for gradually deriving the different algorithms thereafter. Although we make a connection between the very early methods on two subdomains and the current two-level methods on arbitrary numbers of subdomains, the experimental studies focus on two prototypical methods applied to the model problem of estimating the optic flow, at which point different numerical aspects, such as the influence of the number of subdomains on the convergence rate, are explored. In particular, we present results of experiments conducted on a PC-cluster (a distributed memory parallel computer based on low-cost PC hardware for up to 144 processing nodes) which show a very good scalability of non-overlapping DD methods. With respect to nonlinear non-overlapping DD methods, we pursue two distinct approaches, both applied to nonlinear, PDE-based image denoising. The first approach draws upon the theory of optimal control, and has been successfully employed for the domain decomposition of Navier-Stokes equations. The second nonlinear DD approach, on the other hand, draws on convex programming and relies on the decomposition of the corresponding minimization problems. Besides the main subject of parallelization by DD methods, we also investigate the linear model problem of motion estimation itself, namely by proposing and empirically studying a new variational approach for the estimation of turbulent flows in the area of fluid mechanics

MAnnheim DOCument Server

Iterative methods for heterogeneous media

Author: Lechner Patrick O.
Publication date: 01/01/2006

EThOS - Electronic Theses Online Service

OPUS

OpenGrey Repository

The Sixth Copper Mountain Conference on Multigrid Methods, part 1

Author: Manteuffel T. A.; McCormick S. F.; Melson N. Duane

The Sixth Copper Mountain Conference on Multigrid Methods was held on 4-9 Apr. 1993, at Copper Mountain, CO. This book is a collection of many of the papers presented at the conference and as such represents the conference proceedings. NASA LaRC graciously provided printing of this document so that all of the papers could be presented in a single forum. Each paper was reviewed by a member of the conference organizing committee under the coordination of the editors. The multigrid discipline continues to expand and mature, as is evident from these proceedings. The vibrancy in this field is amply expressed in these important papers, and the collection clearly shows its rapid trend to further diversity and depth

NASA Technical Reports Server