Mapping unstructured mesh codes onto local memory parallel architectures by Jones, Beryl Wyn
Greenwich Academic Literature Archive (GALA)
– the University of Greenwich open access repository
http://gala.gre.ac.uk
__________________________________________________________________________________________
Citation:
Jones, Beryl Wyn (1994) Mapping unstructured mesh codes onto local memory parallel architectures. 
PhD thesis, University of Greenwich.
__________________________________________________________________________________________
Please note that the full text version provided on GALA is the final published version awarded 
by the university. “I certify that this work has not been accepted in substance for any degree, 
and is not concurrently being submitted for any degree other than that of (name of research 
degree) being studied at the University of Greenwich. I also declare that this work is the result 
of my own investigations except where otherwise identified by references and that I have not 
plagiarised the work of others”.
Available at: http://gala.gre.ac.uk/6201/
__________________________________________________________________________________________
Contact: gala@gre.ac.uk
MAPPING UNSTRUCTURED MESH CODES ONTO LOCAL 
MEMORY PARALLEL ARCHITECTURES
Beryl Wyn Jones
A thesis submitted in partial fulfilment of the
requirements of the University of Greenwich
for the degree of Doctor of Philosophy.
September 1994 
This research programme was funded by SERC
Centre for Numerical Modelling and Process Analysis
School of Mathematics, Statistics and Computing
University of Greenwich
London U.K.
Table of Contents
Acknowledgements .. .. .. .. .. .. ... v
Abstract .. .. .. .. .. .. .. .. vi
Chapter 1: Introduction .. .. .. .. .. .. 1
1.1 Introduction .. .. .. .. .. .. 2
1.2 Outline of Thesis .. .. .. .. .. .. 3
1.3 Graph Theory .. .. .. .. .. 4
1.4 Parallel Architectures .. .. .. .. .. 7
1.5 MIMD Multiple Instruction Stream, Multiple Data Stream 8
1.6 Processor Configurations .. .. .. .. .. 12
1.7 Parallel Performance Measurement .. .. .. .. 13
1.8 Unstructured Control Volume Grid Methods .. .. 14
1.8.1 Introduction .. .. .. .. .. .. 14
1.8.2 Vertex Centred Approach .. .. .. .. 15
1.8.3 Cell Centred Approach .. .. .. .. 17
1.9 The Mapping Problem .. .. .. .. .. 18
1.9.1 Statement of Problem .. .. .. .. 18
1.9.2 Objectives .. .. .. .. .. .. 19
1.9.3 Complexity .. .. .. .. .. .. 19
1.10 Summary of Existing Methods .. .. .. .. 21
1.10.1 Graph Partitioning Problem .. .. .. .. 21
1.10.2 Graph Embedding Problem .. .. .. .. 23
1.10.3 Prior Work on the Mapping problem .. .. 26
Chapter 2: Overview of Existing Techniques .. .. .. 32
2.1 Introduction .. .. .. .. .. .. .. 33
2.2 Nearest neighbour .. .. .. .. .. .. 33
2.2.1 Introduction .. .. .. .. .. .. 33
2.2.2 Regular Grids .. .. .. .. .. 34
2.2.2.1 One Dimensional Strip Partitioning 34
2.2.2.2 Two Dimensional Strip Partitioning 36
2.2.3 Non Regular Grids .. .. .. .. .. 38
2.2.4 Analysis of Method .. .. .. .. .. 39
2.3 Recursive Spectral Bisection .. .. .. .. 41
2.3.1 Introduction .. .. .. .. .. .. 41
2.3.2 The Laplacian Matrix .. .. .. .. 41
2.3.3 The Fiedler vector .. .. .. .. .. 42
2.3.4 Analysis of method .. .. .. .. .. 44
2.3.5 Multilevel Recursive Spectral Bisection .. .. 44
2.4 Combinatorial Optimisation Methods .. .. .. 45
2.4.1 Introduction .. .. .. .. .. .. 45
2.4.2 General Formulation .. .. .. .. .. 46
2.4.3 General Purpose Algorithms .. .. .. .. 47
2.4.4 Simulated Annealing .. .. .. .. .. 49
2.4.4.1 Introduction .. .. .. .. 49
2.4.4.2 Methodology .. .. .. .. 49
2.4.4.3 Addressing Graph Partitioning .. 50
2.4.4.4 Analysis of Method .. .. .. 51
2.4.5 Tabu Search .. .. .. .. .. .. 52
2.4.5.1 Introduction .. .. .. .. 52
2.4.5.2 Methodology .. .. .. .. 52
2.4.5.3 Analysis of Method .. .. .. 55
Chapter 3: Recursive Clustering Algorithm .. .. .. 57
3.1 Introduction .. .. .. .. .. .. .. 58
3.2 Kernighan-Lin Graph Bisection Method .. .. .. 58
3.2.1 Definition of Problem .. .. .. .. 58
3.2.2 Analysis of Method .. .. .. .. .. 68
3.2.3 Running Time of the Algorithm .. .. .. 69
3.3 Using Recursive Clustering to Partition Unstructured Meshes 69
3.4 Cost Function .. .. .. .. .. .. 72
3.5 The Algorithm .. .. .. .. .. .. 73
Chapter 4: Extension of the Recursive Clustering Algorithm .. 79
4.1 Introduction .. .. .. .. .. .. .. 80
4.2 Eliminating Constraint of 2^n Sub-meshes .. .. .. 80
4.3 Local Minima Trap .. .. .. .. .. .. 81
4.3.1 Type 1 .. .. .. .. .. .. 81
4.3.2 Type 2 .. .. .. .. .. .. 81
4.3.3 Renumbering Elements .. .. .. .. 83
4.3.4 Cuthill-McKee Algorithm .. .. .. .. 88
4.4 Specifying Processor Topology .. .. .. .. 94
4.5 The Algorithm .. .. .. .. .. .. 102
4.5.1 Routine "Input" .. .. .. .. .. 103
4.5.2 Routine "Form_Clusters" .. .. .. .. 104
4.5.3 Routine "Swapset" .. .. .. .. .. 105
4.5.4 Routine "Findg" .. .. .. .. .. 108
4.6 Test Cases .. .. .. .. .. .. .. 109
4.7 Larger Meshes .. .. .. .. .. .. 113
Chapter 5: Dealing with Large Meshes .. .. .. .. 117
5.1 Introduction .. .. .. .. .. .. .. 118
5.2 Creating Super-Elements .. .. .. .. .. 119
5.2.1 Recursive Graph Bisection .. .. .. .. 120
5.2.2 Image Network .. .. .. .. .. 124
5.3 Level of Granularity .. .. .. .. .. 133
5.4 Conclusion .. .. .. .. .. .. .. 142
Chapter 6: Computational Results and Conclusions .. .. 143
6.1 Introduction .. .. .. .. .. .. .. 144
6.2 Parallelisation of UIFS .. .. .. .. .. 144
6.3 Mesh Division .. .. .. .. .. .. 145
6.4 Efficiency of Parallel Solution .. .. .. .. 147
6.4.1 Simple 2D problem .. .. .. .. .. 147
6.4.2 Larger Meshes .. .. .. .. .. 151
6.5 Conclusions and Further Work .. .. .. .. 154
Appendix A .. .. .. .. .. .. 159
References .. .. .. .. .. .. .. 163
Acknowledgements
I would like to thank my supervisors Professor Martin Everett and Professor Mark Cross 
for their invaluable advice, encouragement and guidance received during the course of 
this research.
I would also like to thank Steve Johnson for the indispensable discussions and assistance 
at various stages of the research.
Peter Lawrence and Kevin McManus are also gratefully acknowledged.
Thanks also go to the staff at the School of Mathematics, Statistics and Computing and 
to the postgraduates at the Centre for Numerical Modelling and Process Analysis of the 
University of Greenwich for providing a good working environment.
Thanks also to Frank for those unforgettable three days at Prague.
Finally, the financial support provided by the Science and Engineering Research Council 
is gratefully acknowledged.
Abstract
Initial work on mapping CFD codes onto parallel systems focused upon software which 
employed structured meshes. Increasingly, large scale CFD codes are being based 
upon unstructured meshes. One of the key problems when implementing such large scale 
unstructured problems on a distributed memory machine is how to partition the 
underlying computational domain efficiently. It is important that all processors are 
kept busy for as large a proportion of the time as possible and that the amount, level 
and frequency of communication are kept to a minimum.
Proposed techniques for solving the mapping problem have separated out the solution into 
two distinct phases. The first phase is to partition the computational domain into cohesive 
sub-regions. The second phase consists of embedding these sub-regions onto the 
processors. However, it has been shown that performing these two operations in isolation 
can lead to poor mappings and far from optimal communication times.
In this thesis we develop a technique which simultaneously takes account of the processor 
topology whilst identifying the cohesive sub-regions. Our approach is based on an 
unstructured mesh decomposition method that was originally developed by Sadayappan 
et al. [SER90] for a hypercube. This technique forms a basis for a method which enables 
a decomposition to an arbitrary number of processors on a specified processor network 
topology. Whilst partitioning the mesh, the optimisation method takes into account the 
processor topology by minimising the total interprocessor communication.
The problem with this technique is that it is not suitable for dealing with very large 
meshes, since the calculations often require prodigious amounts of computing power.
This problem can be overcome by creating clusters of the original elements and using these 
to create a reduced network which is homomorphic to the original mesh. The technique 
can now be applied to the image network with comparative ease. The clusters are created 
using an efficient graph bisection method. The coarseness of the reduced mesh inevitably 
leads to a degradation of the solution. However, it is possible to refine the resultant 
partition to recapture some of the richness of the original mesh and hence achieve 
reasonable partitions.
One of the issues to be addressed is the level of granularity needed to obtain the best balance 
between computational efficiency and optimality of the solution. Some progress has been 
made in trying to find an answer to this important issue.
In this thesis, we show how the above technique can be effectively utilised in large scale 
computations. Results include testing the above technique on large scale meshes for 
complex flow domains.
To Dad
Gresyn blodeuyn mor deg 
Ei ffoi cyn fo'i adeg
Chapter 1 
Introduction
1.1 Introduction
Many large scale computational problems are based on unstructured computational 
domains. Using unstructured meshes allows a code to cater for completely general 
geometries and hence a wide range of problems in both two and three space 
dimensions. Examples include unstructured grid calculations based on finite volume 
methods in computational fluid dynamics, and structural analysis problems based on finite 
element approximations.
Software packages have been developed to solve such problems. An analysis is carried 
out for selected input parameters and the results are interpreted in order to optimise 
a design. This iterative procedure requires interpretation of the results at each step 
and consumes a vast amount of time for a given problem. To reduce the computation time, 
various optimisation procedures have been incorporated into such codes. One practicable 
approach is to use parallel computation techniques. There is therefore a demand for 
parallel computers and for parallel algorithms to execute on them.
One of the important problems to be addressed in this situation is how to employ a 
sufficiently high fraction of the raw computational power of a parallel computer. 
Overheads due to interprocessor synchronisation and communication, processors sitting 
idle due to contention for shared hardware resources, and an uneven distribution of the 
computational load can all lead to poor overall performance. Optimising the speedup of 
a parallel program requires the parallel tasks of the program to be mapped onto the 
processors such that the computational load is distributed as evenly as possible while 
the amount of communication between the processors is minimised.
This thesis investigates mapping the tasks associated with the solution of unstructured 
grid problems to the processors of a parallel computer such that the execution time is 
minimised.
1.2 Outline of Thesis
In the remainder of Chapter 1, we define terminology and notation for graph theory that 
is used throughout the thesis. We then discuss various parallel architectures and various 
configurations that can be used. The mapping problem is discussed with a short summary 
of existing methods.
In Chapter 2, we give a brief outline of some of the existing techniques for graph 
partitioning and embedding. These are methods that we have examined extensively, and 
an analysis of each method is given.
Chapter 3 discusses the Recursive Clustering algorithm, a method based on the 
Kernighan-Lin mincut algorithm [KL70]. We have modified the Recursive Clustering 
algorithm to suit our needs, and these modifications are described in Chapter 4.
The modified algorithm gives reliable decompositions, but one drawback is the time 
taken to decompose the meshes. Chapter 5 describes how we have overcome this problem.
Finally, Chapter 6 shows the parallel efficiency of the decompositions used together with 
conclusions and discussions of further work.
1.3 Graph Theory
The following terminology and notation is used throughout this thesis [Wil85], [BM76]. 
A graph $G$ is a pair of sets $[V,E]$ where $V$ is non-empty and $E$ is a set of unordered pairs 
of elements of $V$. The elements of $V$ are called the vertices of $G$ and the elements of $E$ 
are called the edges of $G$. $V_G$ is used to represent the vertices of $G$ and $E_G$ is used to 
represent the edges of $G$. The symbols $\nu_G$ and $\varepsilon_G$ are used to denote the number of 
vertices and edges in $G$. If only one graph is being considered, then the letter $G$ will be 
omitted from the symbols, and therefore we use $V$, $E$, $\nu$ and $\varepsilon$ instead of $V_G$, $E_G$, $\nu_G$ and $\varepsilon_G$.
Two graphs $G$ and $H$ are said to be isomorphic if there is a one-one correspondence 
between their vertices which has the property that two vertices are joined by an edge in 
one graph if and only if the corresponding vertices are joined by an edge in the other.
Two vertices $u$, $v$ of a graph $G$ are adjacent if there is an edge joining them, i.e. 
$\langle u,v \rangle \in E$.
With each $\langle u,v \rangle \in E_G$, let there be associated an integer $c(\langle u,v \rangle)$, called its edge weight, 
and with each $v \in V_G$, let there be associated an integer $w(v)$ called its vertex weight. 
Then $G$, together with these edge and vertex weights, is called a weighted graph.
A vertex $v$ and an edge $e$ are incident if $v$ is one of the vertices of $e$.
The degree $\rho_G(v)$ of a vertex $v$ in $G$ is the number of edges incident with $v$.
Figure 1.1 shows a graph $G$ where $\nu = 8$ and $\varepsilon = 14$.
Figure 1.1: A graph G with 8 vertices
To any graph $G$, there corresponds an adjacency matrix. This is the $\nu \times \nu$ matrix 
$A(G)=[a_{ij}]$, where $a_{ij}$ is the number of edges joining $v_i$ and $v_j$. The Laplacian matrix of 
a graph $G$ is defined as $L(G)=[l_{ij}]$ where $l_{ij}=a_{ij}$ for $i \neq j$ and $l_{ii}=-\rho_G(v_i)$ for each $v_i \in V$.
Figure 1.2 shows the adjacency matrix and the Laplacian matrix for the graph G shown 
in Figure 1.1.
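$$A(G) = \begin{bmatrix}
0 & 1 & 0 & 1 & 1 & 0 & 0 & 0 \\
1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & 0 & 1 & 1 & 1 & 1 \\
1 & 0 & 0 & 1 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 & 0 & 0 & 1 & 0
\end{bmatrix}$$

$$L(G) = \begin{bmatrix}
-3 & 1 & 0 & 1 & 1 & 0 & 0 & 0 \\
1 & -3 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & -3 & 1 & 0 & 0 & 0 & 1 \\
1 & 1 & 1 & -7 & 1 & 1 & 1 & 1 \\
1 & 0 & 0 & 1 & -3 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & -3 & 1 & 0 \\
0 & 0 & 0 & 1 & 0 & 1 & -3 & 1 \\
0 & 0 & 1 & 1 & 0 & 0 & 1 & -3
\end{bmatrix}$$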
Figure 1.2 Adjacency matrix and Laplacian matrix of the graph G shown in 
Figure 1.1
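The two matrices can be rebuilt from an edge list, which is a convenient check on the definitions. The following is a minimal sketch: the edge list is read off the adjacency matrix above, the function names are illustrative and not part of the thesis, and the Laplacian uses the sign convention stated here (off-diagonal entries equal to the adjacency entries, diagonal entries equal to minus the degree).

```python
# Edge list read off the adjacency matrix of Figure 1.2 (vertices numbered 1..8).
EDGES = [(1, 2), (1, 4), (1, 5), (2, 3), (2, 4), (3, 4), (3, 8),
         (4, 5), (4, 6), (4, 7), (4, 8), (5, 6), (6, 7), (7, 8)]


def adjacency_matrix(n, edges):
    """Return the n x n matrix whose (i, j) entry counts edges joining v_i and v_j."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        a[u - 1][v - 1] += 1
        a[v - 1][u - 1] += 1
    return a


def laplacian_matrix(a):
    """Laplacian in the convention used in the text: l_ij = a_ij (i != j), l_ii = -degree(v_i)."""
    lap = [row[:] for row in a]
    for i, row in enumerate(a):
        lap[i][i] = -sum(row)  # the degree is the row sum, since there are no self-loops
    return lap


A = adjacency_matrix(8, EDGES)
L = laplacian_matrix(A)
for row in L:
    print(" ".join(f"{x:2d}" for x in row))
```

With this sign convention every row of the Laplacian sums to zero, which gives a quick consistency check on a hand-built matrix.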
A directed graph (digraph) D=(V,E) is a graph whose edges are ordered pairs of vertices. 
With each digraph D we can associate a graph G on the same vertex set; corresponding 
to each directed edge of D, there is an edge of G with the same ends.
A network N is defined to be a weighted digraph with two distinguished subsets of 
vertices, X and Y, which are assumed to be disjoint and nonempty. 
The vertices in X are the sources and those in Y are the sinks of N. The edge weight C 
of each edge is a non-negative integer called the capacity.
A cutset in a network N is a set of edges which when removed disconnects the source 
nodes from the sink nodes.
The weight of a cutset is equal to the sum of the capacities of the edges in the cutset. 
The Max-Flow Min-Cut theorem [FF62] states that the value of a maximum flow in a 
network is equal to the weight of a minimum cutset of that network.
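The theorem can be checked numerically on a small example. The sketch below is illustrative only: it assumes a single source and a single sink rather than the sets X and Y, it uses the Edmonds-Karp augmenting path algorithm (which is not a method described in this thesis), and the network `CAPACITY` and all function names are hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp: augment along shortest paths in the residual network."""
    # residual[u][v] is the remaining capacity from u to v (reverse arcs start at 0).
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # Breadth-first search for an augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No path left: the vertices reachable from the source form one side of a minimum cut.
            return flow, set(parent)
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def cut_weight(capacity, source_side):
    """Sum of the capacities of edges leaving the source side of the cut."""
    return sum(c for u in source_side for v, c in capacity.get(u, {}).items()
               if v not in source_side)


# Hypothetical four-vertex network with source "s" and sink "t".
CAPACITY = {"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}, "t": {}}
flow_value, source_side = max_flow(CAPACITY, "s", "t")
print(flow_value, cut_weight(CAPACITY, source_side))
```

For this network the maximum flow value and the weight of the cutset found are equal, as the theorem requires.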
1.4 Parallel Architectures
The availability of relatively cheap and efficient microprocessors has produced a 
tremendous upsurge in the development of parallel computers [Car88], [Cri88], [Duc86]. 
These computers now consist of numerous (up to thousands of) processors. These 
processors usually have reduced instruction sets and are frequently referred to as 
processing elements (PEs). This section gives a brief overview of some of the models 
that exist.
The SISD (Single Instruction Stream, Single Data Stream) model is the original von Neumann 
model of computation, where only one instruction is processed at a time on a single item 
of data. Some parallelism may occur in the internal operations of such machines, for 
example, parallel loading and storing of data items along with actual arithmetic 
operations.
The MISD (Multiple Instruction Stream, Single Data Stream) model performs several 
instructions simultaneously on a single stream of data. Strictly speaking, this category 
could contain internally parallel SISD architectures and pipeline processors, but since 
our interest is in the architecture as understood by the user, neither is included.
SIMD (Single Instruction Stream, Multiple Data Stream) architectures are commonly 
known as vector or pipeline computer architectures. A SIMD computational model 
corresponds to a single stream of instructions, each of which is applied to multiple 
data items.
Broadly defined, a vector processor is one in which each processing element performs 
the same sequence of operations at the same time but acts upon a different set of data. 
This type of operation often features in computations involving vectors of data.
With pipeline processors, overlapping in the execution of instructions is permitted. The 
data enters the pipeline at the processing element performing the first stage of the 
operation, passing through the other processing elements until finally arriving at the last 
one for the final stage of the operation. Parallelism is achieved when several data items 
pass through such a pipeline, but with each item passing through different stages at the 
same time. It is important that every processing element in the pipeline is kept busy in 
order to achieve a significant speed-up. This is accomplished by passing several data 
items that need the same overall operation to be performed on them through the same 
pipeline. This is typical for vector operations where the data passing through the pipeline 
consists of each consecutive element of the vector(s) concerned.
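The benefit of keeping every stage busy can be illustrated with a simple step count. This is a rough sketch under stated assumptions that are not in the thesis: every stage takes one unit of time, there are no stalls, and the function names are hypothetical.

```python
def sequential_steps(num_items, num_stages):
    """Each item passes through all stages before the next item starts."""
    return num_items * num_stages


def pipelined_steps(num_items, num_stages):
    """The first item needs num_stages steps; each later item completes one step after the previous one."""
    if num_items == 0:
        return 0
    return num_stages + num_items - 1


# Example: a vector of 100 elements passing through a 4-stage pipeline.
items, stages = 100, 4
print(sequential_steps(items, stages), pipelined_steps(items, stages))
```

Under these assumptions the ratio of the two counts approaches the number of stages as the vector becomes long, which is why long vector operations suit pipelines.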
1.5 MIMD Multiple Instruction Stream, Multiple Data Stream
This type of machine is the one that we are focusing on and it typically consists of a 
number of processing units each capable of executing its own program on separate sets 
of data. All the processing units are interconnected and to achieve parallelism, the overall 
task must be broken down into a group of many sub-tasks.
There are various designs of MIMD machines with a major distinguishing feature being 
the interconnection network. The two extreme classes of machines are discussed, namely 
the shared memory systems and distributed memory systems [Cri88].
Shared memory systems use a shared global memory that is accessible from every 
processing unit via a communication bus. The processors can be considered identical 
(provided they are of the same type) and the programmer need not be concerned with 
which task of the computation is mapped onto which processor, since the cost of 
communication between any pair of processors is the same. Problems occur with 
such systems when large numbers of processors are used, since the communication bus 
hardware becomes a bottleneck when many processors request access to the global 
memory. Another disadvantage is that the bus only permits one processor to access the 
global memory at any time. Thus if many pairs of processors require interaction on a 
pair-wise basis, they will have to do so in sequence rather than in parallel. Figure 1.3 
shows an example of a shared memory system.
Distributed memory systems consist of processing elements which each have their own 
local memory unit. The processors are joined by an interconnection network so that an 
overall task can be performed by many processors and data can be sent from one processor 
to another. With this type of machine, the processors do not have to compete for access 
to a shared global memory and no communication bus becomes a bottleneck, although data 
traffic bottlenecks can still occur with a large processor network. Unlike the shared 
memory system, task to processor allocation is not arbitrary: a task should be placed on 
a processor that either holds the data to be accessed or can access the data through as 
short a communications route as possible. To ensure the efficiency of such a system, the 
program data should, if possible, be divided over all the local memories with a minimum 
of duplication. Figure 1.4 shows an example of a distributed memory system.
PVM (Parallel Virtual Machine) [SHH94] from ORNL has become a de facto standard 
for message-passing systems and, because it is freely available, it has spread throughout 
the academic community and beyond. PVM has been ported to a wide variety of currently 
available machines, ranging from workstations to MPP systems. The highlight of PVM 
is its usability in heterogeneous environments. However, its functionality is limited. 
As a consequence, the international initiative MPI (Message Passing Interface) [Hem94] 
was started in 1992 by the Center for Research in Parallel Computing at Rice University 
and Oak Ridge National Laboratory. The goal is to define a message passing interface 
which will then be implemented and supported by all hardware manufacturers. Supporting 
low-level features for use by parallelising compilers was not a design goal. The 
focus of MPI is point-to-point communication between pairs of processors, and 
collective communication within process groups. More advanced concepts allow these 
groups to be created and given a topological structure.
[Figure labels: Communication Bus; Global Memory; PE: Processing Element]
Figure 1.3: Shared Memory System
[Figure labels: Interconnection Network; PE: Processing Unit; M: Memory]
Figure 1.4: Distributed Memory System
1.6 Processor Configurations
The processor network can be arranged in a number of different configurations [Car88]. 
The configuration chosen should be influenced by the data access structure of the code 
concerned. The communication time incurred can be minimised by a sensible choice of 
network configuration. Examples of network configurations are shown in Figure 1.5.
Figure 1.5: Processor Configurations 
(a) Chain; (b) Grid; (c) 3 Way Hypercube
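One way to compare these configurations is by the number of links a message must cross between two processors. The following sketch makes assumptions that are not stated in the thesis: processors are numbered from 0, the grid is numbered row by row with no wrap-around links, and hypercube processors carry binary labels that differ in one bit between neighbours. The function names are illustrative.

```python
def chain_distance(i, j):
    """Hops between processors i and j on a chain."""
    return abs(i - j)


def grid_distance(i, j, width):
    """Hops on a 2D grid with `width` columns, processors numbered row by row."""
    row_i, col_i = divmod(i, width)
    row_j, col_j = divmod(j, width)
    return abs(row_i - row_j) + abs(col_i - col_j)


def hypercube_distance(i, j):
    """Hops on a hypercube: the number of bit positions in which the labels differ."""
    return bin(i ^ j).count("1")


# Largest separation between any two of 8 processors on a chain and on a hypercube,
# and between any two of 9 processors on a 3 x 3 grid.
print(max(chain_distance(0, j) for j in range(8)))
print(max(grid_distance(0, j, 3) for j in range(9)))
print(max(hypercube_distance(0, j) for j in range(8)))
```

Distances of this kind are what a topology-aware mapping needs in order to weigh communication between sub-regions placed on non-adjacent processors.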
1.7 Parallel Performance Measurement
There are two practical ways in which performance of software on parallel systems can 
be measured. The first is speedup ($S_p$), which is defined as:
$$S_p = \frac{\text{Time on a single processor}}{\text{Time on } p \text{ processors}}$$
$S_p$ gives the number of times faster the software executes on $p$ processors as opposed to 
the execution on a single processor [HJ81]. However, there are two possible single 
processor times that can be used, each carrying slightly different information about the 
software.
Firstly, if the single processor time is that of the best serial version, using optimal serial 
algorithms, then the speedup signifies the advantage of using a parallel machine rather 
than a serial machine. If the algorithms used for the parallel version are different to those 
in serial, then the speedup figure can be reduced because the serial performance may be 
sacrificed for the parallel nature of the new algorithm. The second single processor time 
that can be used is that of the parallel version being run on a single processor. This 
speedup represents the performance of a parallel machine as more processors are used 
and not performance over serial because any serial version should always use the best 
serial algorithms available.
Efficiency is the second measure of performance of software on MIMD machines; it 
measures how well an application uses the available computer power. Again, 
there are two types that can be used. The first is known as efficiency percentage ($E_p$) and 
it is given by:
$$E_p = \frac{S_p}{p} \times 100, \qquad S_p = \text{speedup on } p \text{ processors}$$
$E_p$ indicates the percentage of available processor time which has been beneficially used,
$$E_p = \left(1 - \frac{\text{Total Idle Time}}{\text{Total Processor Time}}\right) \times 100$$
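As a small worked illustration of the speedup and efficiency percentage measures, the sketch below evaluates both for a single pair of timings. The timings are hypothetical values chosen for the example and are not results from this thesis; the function names are likewise illustrative.

```python
def speedup(time_single, time_parallel):
    """S_p: time on a single processor divided by time on p processors."""
    return time_single / time_parallel


def efficiency_percent(time_single, time_parallel, processors):
    """E_p: speedup per processor, expressed as a percentage."""
    return 100.0 * speedup(time_single, time_parallel) / processors


# Hypothetical timings: 100 s on one processor, 25 s on five processors.
s_p = speedup(100.0, 25.0)
e_p = efficiency_percent(100.0, 25.0, 5)
print(s_p, e_p)
```

Which single processor time is supplied (best serial version, or the parallel version run on one processor) determines which of the two interpretations of speedup discussed above is being measured.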
1.8 Control Volume Unstructured Grid Methods
1.8.1 Introduction
1.8.2 Vertex Centred Approach
Figure 1.6: Finite element mesh
Fig 1.7: Vertex-Centred Mesh-Control Volume
1.8.3 Cell-Centred Approach
Fig 1.8: Cell-Centred Mesh-Control Volume
1.9 The Mapping Problem
1.9.1 Statement of Problem
$\text{time}_{\text{comm}} + \text{time}_{\text{comp}}$
1.9.2 Objectives
1.9.3 Complexity
1.10 Summary of Existing Methods
1.10.1 Graph Partitioning Problem
1.10.2 Graph Embedding Problem
(a) Communication cost = 70
(b) Communication cost = 60
(c) Communication cost = 60
Figure 1.9: Possible 4-way partitions of a 40x20 grid with processor topology.
1.10.3 Prior Work on the Mapping Problem
Figure 1.10: A network flow graph constructed from a task graph with two vertices.
Chapter 2
Overview of Existing Techniques
2.1 Introduction
2.2 Nearest Neighbour 
2.2.1 Introduction
2.2.2 Regular Grids
2.2.2.1 One Dimensional Strip Partitioning
Fig 2.1: Example of one-dimensional strip partitioning
2.2.2.2 Two Dimensional Strip Partitioning
[Figure labels: Processors; Initial Loads]
2.2.3 Non Regular Grids
Figure 2.3: Example of a one-dimensional non-regular graph
2.2.4 Analysis of Method
Figure 2.4: A simple mesh illustrating the limitation of 
the nearest neighbour technique
2.3 Recursive Spectral Bisection 
2.3.1 Introduction
2.3.2 The Laplacian Matrix
2.3.3 The Fiedler Vector
a) Original mesh b) Partition into two using RSB
Figure 2.5: A simple mesh illustrating the RSB producing disconnected sub-domains
Figure 2.6: Recursive Spectral Bisection Algorithm.
2.3.4 Analysis of Method
2.3.5 Multilevel Recursive Spectral Bisection
2.4 Combinatorial Optimisation Methods
2.4.1 Introduction
2.4.2 General Formulation
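Simulated annealing algorithm, with cooling ratio $r$, $0 < r < 1$, and integer temperature length $L$:
1. Get an initial solution $S$.
2. Get an initial temperature $T > 0$.
3. While not yet frozen do the following:
   3.1 Perform the following loop $L$ times:
       3.1.1 Pick a random neighbour $S'$ of $S$.
       3.1.2 Let $\Delta = \text{cost}(S') - \text{cost}(S)$.
       3.1.3 If $\Delta \le 0$ (downhill move), set $S = S'$.
       3.1.4 If $\Delta > 0$ (uphill move), set $S = S'$ with probability $e^{-\Delta/T}$.
   3.2 Set $T = rT$ (reduce temperature).
4. Return $S$.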
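A simulated annealing loop driven by a cooling ratio $r$ and a temperature length $L$ can be sketched as follows. This is a minimal illustration, not the implementation examined in this thesis: the freezing test (a fixed minimum temperature), the six-vertex test graph, the swap move that keeps the two halves balanced, and all names and parameter values are assumptions made for the example.

```python
import math
import random

# Hypothetical test graph: two triangles joined by the single edge (2, 3).
VERTICES = frozenset(range(6))
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]


def cut_size(side_a):
    """Cost of a bisection: the number of edges with one end on each side."""
    return sum((u in side_a) != (v in side_a) for u, v in EDGES)


def swap_neighbour(side_a, rng):
    """Neighbouring solution: exchange one vertex from each side (keeps the halves balanced)."""
    out_v = rng.choice(sorted(side_a))
    in_v = rng.choice(sorted(VERTICES - side_a))
    return (side_a - {out_v}) | {in_v}


def simulated_annealing(initial, cost, neighbour, t_start, r, length, t_min, rng):
    """Annealing loop with cooling ratio r (0 < r < 1) and temperature length `length`."""
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    t = t_start
    while t > t_min:  # assumed freezing test: stop below a minimum temperature
        for _ in range(length):
            candidate = neighbour(current, rng)
            delta = cost(candidate) - current_cost
            # Downhill moves are always accepted, uphill moves with probability e^(-delta/T).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        t *= r  # cooling step
    return best, best_cost


start = frozenset({0, 3, 4})
best, best_cost = simulated_annealing(start, cut_size, swap_neighbour,
                                      t_start=2.0, r=0.9, length=20,
                                      t_min=0.05, rng=random.Random(0))
print(sorted(best), best_cost)
```

The sketch keeps the best partition seen so far, a common practical addition, so the returned cost is never worse than that of the starting partition.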
i    a_i   b_i   g_i   G = Σ g_i   kbest
1     2     8     1        1         1
2    12     3     2        3         2
3    11     4    -1        2         2
4     7    13    -2        0         2
5    10    16    -1       -1         2
6     6    17    -1       -2         2
7     1    14     1       -1         2
8     5    18    -1       -2         2
9     9    15     0       -2         2
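The last two columns of the tabulated values follow mechanically from the gains. The sketch below recomputes them, assuming the usual Kernighan-Lin reading of the columns: $a_i$ and $b_i$ are the $i$-th pair of vertices selected for exchange, $g_i$ is the gain of that exchange, $G$ is the running sum of gains, and kbest is the prefix length with the largest running sum so far. The variable and function names are illustrative.

```python
from itertools import accumulate

# Columns a_i, b_i and g_i as tabulated above (i = 1..9).
A_SEQ = [2, 12, 11, 7, 10, 6, 1, 5, 9]
B_SEQ = [8, 3, 4, 13, 16, 17, 14, 18, 15]
GAINS = [1, 2, -1, -2, -1, -1, 1, -1, 0]


def cumulative_gains(gains):
    """Running sum G_k = g_1 + ... + g_k."""
    return list(accumulate(gains))


def running_best_k(partial_sums):
    """For each k, the earliest prefix length whose cumulative gain is largest so far."""
    best_k, result = 1, []
    for k, total in enumerate(partial_sums, start=1):
        if total > partial_sums[best_k - 1]:
            best_k = k
        result.append(best_k)
    return result


G = cumulative_gains(GAINS)
KBEST = running_best_k(G)

# Only the first k pairs are actually exchanged, and only if the total gain is positive.
k = KBEST[-1]
swapped_pairs = list(zip(A_SEQ[:k], B_SEQ[:k])) if G[k - 1] > 0 else []
print(G, KBEST, swapped_pairs)
```

Under this reading, the pass exchanges only the first two pairs, for a total reduction in cut size of 3, even though later individual exchanges have negative gain.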


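$$\begin{bmatrix}
0 & 2 & 1 & 0 & 1 & 0 & 0 & 0 \\
2 & 0 & 2 & 1 & 2 & 1 & 1 & 0 \\
1 & 2 & 0 & 2 & 1 & 1 & 1 & 0 \\
0 & 1 & 2 & 0 & 1 & 1 & 2 & 1 \\
1 & 2 & 1 & 1 & 0 & 2 & 1 & 0 \\
0 & 1 & 1 & 1 & 2 & 0 & 2 & 1 \\
0 & 1 & 1 & 2 & 1 & 2 & 0 & 2 \\
0 & 0 & 0 & 1 & 0 & 1 & 2 & 0
\end{bmatrix}$$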
[Figure: regions labelled Processor 1, Processor 2, Processor 3, Processor 4 and Processor 5]
[Figure: regions labelled Processor 1, Processor 2, Processor 3, Processor 4 and Processor 5]
[Flowchart: Input -> Form clusters -> Swapset -> Findg -> Output Results]
[Figure legend: Node number; Element number]

$$\begin{bmatrix}
0 & 1 & 1 & 2 \\
1 & 0 & 1 & 1 \\
1 & 1 & 0 & 1 \\
2 & 1 & 1 & 0
\end{bmatrix}$$




[Figure: regions labelled Proc 1, Proc 2, Proc 3 and Proc 4]

[Figure: regions labelled Processor 1, Processor 2, Processor 3, Processor 4 and Processor 5]



















No of processors   T800   i860
1                  100    100
2                   98     92
3                   97     81
4                   96     76
5                   94     70
6                   90     66













REFERENCES





