A note on the XRAM and PRAM models, by Pierre Fraigniaud
HAL Id: hal-02102089
https://hal-lara.archives-ouvertes.fr/hal-02102089
Submitted on 17 Apr 2019
To cite this version:
Fraigniaud Pierre. A note on the XRAM and PRAM models. [Research Report] LIP RR-1996-03,
Laboratoire de l'informatique du parallélisme. 1996, 2+12p. hal-02102089
Laboratoire de l’Informatique du Parallélisme
Ecole Normale Supérieure de Lyon
Unité de recherche associée au CNRS n°1398 
Pierre Fraigniaud, January 1996
Research Report No RR-1996-03
Ecole Normale Supérieure de Lyon
46 Allée d'Italie, 69364 Lyon Cedex 07, France
Phone: (+33) 72.72.80.00    Fax: (+33) 72.72.80.80
E-mail: lip@lip.ens-lyon.fr
Abstract
In this paper, we deal with the XRAM model introduced by Cosnard and Ferreira. We mainly show that the original definition of the XRAM model was not consistent and must be slightly modified. Therefore, we modify the definition of the XRAM model to make it consistent, and we study the consequences of this modification on the complexity theory developed in the XRAM model. The new model modifies, in particular, the definition of a problem on an XRAM, and thus on a PRAM and on a RAM, since these two models are particular cases of the XRAM. However, we show that, though theoretically important, this modification has no practical consequence on the complexity theory developed on the XRAM model.
Keywords: PRAM, complexity
Resume
Cet article traite du modele XRAM introduit dans  et de ses implications sur le
modele PRAM Il rectie en particulier la denition originelle du modele XRAM pour
rendre ce modele robuste visavis de l	isomorphisme de graphe
Motscles  PRAM  complexite
Pierre Fraigniaud*
Laboratoire de l'Informatique du Parallélisme - CNRS
Ecole Normale Supérieure de Lyon
69364 Lyon Cedex 07, France
email: pfraign@lip.ens-lyon.fr
* Supported by the research programs PRS and ANM of the CNRS, and by the DRET of the DGA.


1 Introduction
This paper deals with the XRAM model introduced by Cosnard and Ferreira in [?]. The XRAM model generalizes the PRAM model by taking into account the various interconnection topologies of existing distributed memory parallel computers (which are not fully connected in general).

A random access machine (RAM) consists of:
- a memory with a potentially infinite number of locations, and
- a processor capable of loading and storing data from and into the memory, executing arithmetic and logical operations using a finite number of internal registers, and operating under the control of a program stored in a control unit.
In one step, requiring a unit of time, the processor can:
1. read a datum from an arbitrary location in memory into one of its internal registers,
2. perform a computation on the content of one or two registers, and
3. write the content of one register into an arbitrary memory location.
The parallel RAM (PRAM) consists of an arbitrarily large number $n$ of RAMs, all sharing the same common memory. Every step of a PRAM consists of three phases (throughout the paper, we restrict ourselves to the exclusive read, exclusive write (EREW) model):
1. all processors read simultaneously from $n$ different locations in the shared memory (one for each processor), and each processor stores the obtained value in one of its internal registers;
2. all processors perform a computation on the content of one or two local registers;
3. all processors write simultaneously into $n$ different locations in the shared memory (one for each processor).
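The three phases of an EREW step can be sketched in code. The following Python fragment is only an illustration and is not part of the model description: the function name `erew_pram_step`, the dictionary used for the shared memory, and the default value 0 for cells that were never written are assumptions made for this example.

```python
from typing import Callable, Dict, List


def erew_pram_step(memory: Dict[int, int],
                   read_addr: List[int],
                   compute: Callable[[int, int], int],
                   write_addr: List[int]) -> None:
    """Run one three-phase EREW step; processor i uses read_addr[i] and write_addr[i]."""
    n = len(read_addr)
    if len(write_addr) != n:
        raise ValueError("one read and one write address per processor")
    # Exclusive read, exclusive write: the n addresses of a phase are pairwise distinct.
    if len(set(read_addr)) != n or len(set(write_addr)) != n:
        raise ValueError("EREW violation: two processors use the same location")
    # Phase 1: all processors read (unwritten cells default to 0, an assumption).
    registers = [memory.get(a, 0) for a in read_addr]
    # Phase 2: all processors compute on their own register.
    registers = [compute(i, r) for i, r in enumerate(registers)]
    # Phase 3: all processors write.
    for a, r in zip(write_addr, registers):
        memory[a] = r


shared = {0: 10, 1: 11, 2: 12, 3: 13}
# Each processor i doubles the content of location i.
erew_pram_step(shared, [0, 1, 2, 3], lambda i, r: 2 * r, [0, 1, 2, 3])
```

All reads are collected before any write takes place, which is what makes the three phases synchronous across processors.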
Cosnard and Ferreira generalized the PRAM model by introducing the XRAM model as follows.
Definition 1 (From [?])
Let $X_i$, $i = 0, \ldots, n-1$, be a collection of subsets of $\{0, \ldots, n-1\}$. An XRAM$(P, M, X)$ is an undirected bipartite graph such that $P = \{P_i,\ i = 0, \ldots, n-1\}$ and $M = \{M_i,\ i = 0, \ldots, n-1\}$ are the two partitions representing the processors and the memory locations respectively, and such that $P_i$ is connected to $M_j$ if and only if $j \in X_i$. $X$ is the corresponding interconnection network. Each computation step of an XRAM satisfies the same constraints as the PRAM, except that the memory locations that processor $P_i$ can access are limited to $M_j$, $j \in X_i$, $i = 0, \ldots, n-1$.
For instance, the hypercube RAM is defined by the sets $X_i = \{j \in \{0, \ldots, n-1\} \mid$ the binary expressions of $i$ and $j$ differ in at most one bit position$\}$, $i = 0, \ldots, n-1$. The hypercube RAM is denoted by HRAM in the following. The PRAM is a special case of Definition 1, where $X_i = \{0, \ldots, n-1\}$ for every $i = 0, \ldots, n-1$ (although the number of memory locations is limited to $n$ instead of being infinite, this can be easily solved by allowing $|P|$ and $|M|$ to be different, and $|M|$ to be arbitrarily large).
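As a small illustration of Definition 1, the access sets $X_i$ of the HRAM and of the PRAM can be built explicitly and used to check whether one read (or write) phase is allowed. This is a sketch only: the function names below are hypothetical, and $n$ is assumed to be a power of two for the hypercube.

```python
from typing import List, Set


def hypercube_access_sets(n: int) -> List[Set[int]]:
    """X_i = {j : the binary expressions of i and j differ in at most one bit}."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    dim = n.bit_length() - 1
    return [{i} | {i ^ (1 << b) for b in range(dim)} for i in range(n)]


def pram_access_sets(n: int) -> List[Set[int]]:
    """X_i = {0, ..., n-1} for every i: the fully connected special case."""
    return [set(range(n)) for _ in range(n)]


def phase_is_legal(access_sets: List[Set[int]], addresses: List[int]) -> bool:
    """Check one read (or write) phase in which processor i accesses addresses[i]."""
    exclusive = len(set(addresses)) == len(addresses)
    reachable = all(a in access_sets[i] for i, a in enumerate(addresses))
    return exclusive and reachable


hram = hypercube_access_sets(4)
pram = pram_access_sets(4)
```

With these sets, an access pattern is legal on the PRAM as soon as it is exclusive, whereas on the HRAM each processor is additionally restricted to its own location and its hypercube neighbors.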
Though this model is quite attractive, and builds a bridge between PRAMs and distributed memory computers, it suffers from a major flaw: two virtually equivalent topologies are not comparable. This flaw is pointed out in the next section, and a new definition that corrects it is proposed. The main consequence of this new definition is the freedom of the initial and final placements of data and results. This also forces a new definition of the complexity of a problem, and then we can prove that two isomorphic topologies have indeed the same computational power. Fortunately, we also show that our new model does not modify the hierarchy of the complexity classes, since all results that were previously derived based on Definition 1 are still valid up to an additive factor corresponding to the time of the permutation routing problem. In Section 3, we discuss several properties of our new model. In particular, we show that a problem can be naturally decomposed into subproblems, whereas such a formal decomposition was not easy in the former model. We also discuss separation theorems, and show that most of the ones proved in [?] still hold in our new model. We also revisit the speedup folk theorem and the simulation theorem on a PRAM, and prove that a speedup of $2p-1$ is possible on a $p$-processor PRAM, even if there is no constraint on the memory location of the data and the results. We generalize in this way the result obtained in [?], where input and output memory locations were part of the problem. Finally, we conclude the paper in Section 4 with some comments about the XRAM model as a practical and/or theoretical model for parallel computation.
2 A new definition of the XRAM model
2.1 Comparability must be reflexive
We adopt the same terminology as in [?]: given a problem $P$ and two models of computation $M_1$ and $M_2$, $M_1(P) \le M_2(P)$ (resp. $M_1(P) < M_2(P)$) if the complexity of $P$ in the model $M_2$ is at most (resp. strictly smaller than) the complexity of $P$ in the model $M_1$. We say that the model $M_1$ is less powerful than the model $M_2$ if $M_1(P) \le M_2(P)$ for every problem $P$. This is denoted by $M_1 \le M_2$. Moreover, if $M_1 \le M_2$, and if there exists a problem $P$ such that $M_1(P) < M_2(P)$, then we say that $M_1$ is strictly less powerful than $M_2$, which is denoted by $M_1 < M_2$.
The two models PRAM and HRAM are separated in [?] as follows. Of course, HRAM $\le$ PRAM. Let us consider the cyclic shift problem defined by $C(M_i) \leftarrow C(M_{(i-1) \bmod n})$, $i = 0, \ldots, n-1$, where $C(M_i)$ denotes the content of the $i$th memory location. This problem can be solved in one step on a PRAM. On the other hand, solving this problem on an HRAM requires at least $\log n$ steps since $C(M_{n-1})$ must be sent to processor $P_0$, which is at distance $\log n$ from $M_{n-1}$ in the hypercubic network induced by the HRAM.
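The cyclic shift used in this separation argument can be written down directly. The sketch below is illustrative only: it assumes the shift direction $C(M_i) \leftarrow C(M_{(i-1) \bmod n})$, and the helper names are hypothetical. It also computes the number of bit positions in which the labels $0$ and $n-1$ differ, which is the quantity the distance argument rests on.

```python
from typing import List


def cyclic_shift(contents: List[int]) -> List[int]:
    """Return the memory contents after C(M_i) <- C(M_{(i-1) mod n})."""
    n = len(contents)
    # Read phase: processor i reads location (i-1) mod n.
    # Write phase: processor i writes location i.
    return [contents[(i - 1) % n] for i in range(n)]


def hamming_distance(i: int, j: int) -> int:
    """Number of bit positions in which the binary expressions of i and j differ."""
    return bin(i ^ j).count("1")


shifted = cyclic_shift([10, 11, 12, 13])
# For n = 2**d, the labels 0 and n-1 differ in all d = log n bit positions.
distances = [hamming_distance(0, 2 ** d - 1) for d in range(1, 6)]
```

On a PRAM the read and write phases above form a single legal EREW step, since all read addresses and all write addresses are pairwise distinct.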
Even if this proof is virtually correct, one can argue against it because it also proves that the HRAM is strictly less powerful than itself. Indeed, let us consider two isomorphic copies $G_1$ and $G_2$ of a Hamiltonian graph $G$. For instance, graphs (a) and (b) of Figure 1 are two isomorphic copies of the 2-dimensional hypercube $Q_2$. Assume that the vertices of the two copies are arbitrarily labeled. These two graphs induce two XRAMs. For instance, XRAMs (c) and (d) of Figure 1 are obtained from the graphs (a) and (b), respectively. Now, in the same way as a suitable ordering of the four vertices is a Hamiltonian cycle of graph (a) in Figure 1 but not of graph (b), it is likely true that one can find a permutation $\sigma_1 \in \mathfrak{S}_n$ such that the ordered set $\{\sigma_1(i),\ i = 0, \ldots, n-1\}$ is a Hamiltonian cycle in $G_1$ but not in $G_2$, and a permutation $\sigma_2 \in \mathfrak{S}_n$ such that the ordered set $\{\sigma_2(i),\ i = 0, \ldots, n-1\}$ is a Hamiltonian cycle in $G_2$ but not in $G_1$. Hence, following the same arguments as the separation proof for the HRAM and the PRAM, one can prove that the two XRAMs obtained from two isomorphic copies of the same graph $G$ are incomparable:
$$G_1\mathrm{RAM}(P) < G_2\mathrm{RAM}(P) \quad \text{and} \quad G_2\mathrm{RAM}(P') < G_1\mathrm{RAM}(P')$$
where $P$ and $P'$ are two different versions of the cyclic shift problem adapted to the corresponding Hamiltonian cycles of $G_1$ and $G_2$.
[Figure 1: Two isomorphic XRAMs obtained from two isomorphic copies of $Q_2$. Panels (a) and (b) show the two labeled copies of $Q_2$ on the vertices 0 to 3; panels (c) and (d) show the corresponding XRAMs.]
Therefore, the classification based on the comparator $\le$ defined before does not produce a partial order because it is not reflexive. As we will see later, it may also produce some strange results but, in the following section, we first modify the definition of the XRAM so that $\le$ produces a partial order. This will be enough to avoid the inconsistent results that can be obtained with Definition 1.
2.2 A new definition of the XRAM model
We propose the following new definition for the XRAM model. To make a distinction between the definition of Cosnard and Ferreira and the new definition, we denote our model by ioXRAM, for input-output XRAM.
Definition 2 (New definition of the XRAM: ioXRAM)
Let $G$ be any graph of $p$ vertices. An ioXRAM of topology $G$ consists of a set $P$ (for processor) of $p$ RAMs, a set $M$ (for memory) of $p$ memory blocks, each block being potentially infinite as is the memory of a RAM, and two sets $I$ (for input) and $O$ (for output) of $n$ memory locations. The $p$ RAMs of $P$ and the $p$ memory blocks of $M$ are connected as the incident bipartite graph of $G$. Computations on an ioXRAM are performed as follows:
1. Input the data. Data are initially stored in $I$. They are loaded in $M$ using an input function $\mathcal{I} = (\mu_I, \alpha_I) : \{0, \ldots, n-1\} \to \{0, \ldots, p-1\} \times \mathbb{N}$ that maps $I$ to $M$. The mapping $\mathcal{I}$ depends on the problem solved but not on the values of the data: the $i$th datum, that is the one stored in position $i$ of $I$, is stored in the memory block $M_{\mu_I(i)}$ at the address in this memory block specified by $\alpha_I(i)$.
2. Computation. This is done exactly in the same way as seen before for the PRAM or the XRAM: computation proceeds step by step, each step being composed of the three phases described in Section 1, where, for each processor, data can be loaded from and stored to adjacent memory blocks following the connections defined by the graph $G$.
3. Output the results. Results must be placed in $O$. They are loaded from $M$ using an output function $\mathcal{O} = (\mu_O, \alpha_O) : \{0, \ldots, n-1\} \to \{0, \ldots, p-1\} \times \mathbb{N}$ that maps $O$ to $M$. As for $\mathcal{I}$, the mapping $\mathcal{O}$ depends on the solved problem but not on the values of the results: the $i$th result, that is the one that must be placed in position $i$ of $O$, is stored in memory block $M_{\mu_O(i)}$ at the address $\alpha_O(i)$.
The two functions $\mathcal{I}$ and $\mathcal{O}$ make it possible to take into account that two XRAMs defined from two isomorphic copies of the same graph are the same even if the two sets of nodes are labeled in a different way: the choice of suitable functions $\mathcal{I}$ and $\mathcal{O}$ allows the same code to be executed for solving the same problem on the two machines. We will formally prove this fact soon but, before that, we need to define the complexity of a problem in the ioXRAM model.
2.3 Complexity of a problem
An instance of a problem on an ioXRAM is defined as a function from $I$ to $O$, whereas it was defined on an XRAM as a function from $M$ to $M$. We are free to choose the best adapted input and output functions $\mathcal{I}$ and $\mathcal{O}$ but this choice is, generally, of no help because it depends on the problem and not on its instances. For instance, one cannot choose the functions $\mathcal{I}$ and $\mathcal{O}$ such that $\mathcal{O}^{-1} \circ \mathcal{I}$ systematically sorts any set of keys.
Of course, the load of the data from the input set $I$ to the memory, and the store of the result from the memory to the output set $O$, are only virtual operations. It is simply a way to say where the data are initially and where the results can be obtained. Therefore, in the computation process, phases 1 and 3 are free, and only phase 2 is costly.
More precisely, given a problem $P$, and given $\mathcal{I} = (\mu_I, \alpha_I)$ and $\mathcal{O} = (\mu_O, \alpha_O)$, let $A$ be an algorithm solving $P$, that is, for any instance of $P$, $A$ transforms the contents of the memory locations according to the rules of the ioXRAM computation such that if, for every $i$, the $i$th component of the data is placed in memory block $M_{\mu_I(i)}$ at the address $\alpha_I(i)$, then, for every $j$, the $j$th component of the result is placed in $M_{\mu_O(j)}$ at the address $\alpha_O(j)$. As usual, the complexity of the algorithm $A$ is the maximum, taken over all the instances of $P$, of the number of steps of $A$ required to solve a given instance of $P$. Given $\mathcal{I}$ and $\mathcal{O}$, the complexity of a problem $P$ is the minimum, taken over all the algorithms $A$ solving $P$, of the complexity of $A$. It is denoted by $\mathrm{comp}_{\mathcal{I},\mathcal{O}}(P)$.
However, Definition 2 introduces a new degree of freedom, and solving a problem $P$ on an ioXRAM consists in:
1. finding $\mathcal{I}$ and $\mathcal{O}$;
2. given $\mathcal{I}$ and $\mathcal{O}$, finding the fastest algorithm $A$ solving $P$.
Therefore, the complexity of a problem $P$ is denoted by $\mathrm{comp}(P)$ and satisfies
$$\mathrm{comp}(P) = \min_{\mathcal{I},\mathcal{O}} \mathrm{comp}_{\mathcal{I},\mathcal{O}}(P).$$
We can now prove the following result, which was not true with Definition 1.
Theorem 1 Let $G_1$ and $G_2$ be two isomorphic copies of a graph $G$, and let $X_1$ and $X_2$ be the two ioXRAMs obtained from $G_1$ and $G_2$ respectively. $X_1$ and $X_2$ have the same power.
Proof. Let $\lambda_1$ and $\lambda_2$ be two arbitrary labelings of the nodes of $G_1$ and $G_2$ respectively; $\lambda_1$ and $\lambda_2$ then also label the processors and the memory locations of $X_1$ and $X_2$. These labelings, plus the isomorphism $\varphi$ between $G_1$ and $G_2$, induce a permutation $\sigma \in \mathfrak{S}_p$, $\sigma = \lambda_2 \circ \varphi \circ \lambda_1^{-1}$. Let $\mathcal{I} = (\mu_I, \alpha_I)$ and $\mathcal{O} = (\mu_O, \alpha_O)$ be the best input and output functions for solving a problem $P$ on $X_1$, and let $A$ be the best algorithm used to solve $P$ on $X_1$, given $\mathcal{I}$ and $\mathcal{O}$. Then choose the input function $(\sigma \circ \mu_I, \alpha_I)$ and the output function $(\sigma \circ \mu_O, \alpha_O)$ for $X_2$, and apply the algorithm $A'$ on $X_2$, where $A'$ is obtained from $A$ by replacing each instruction "$P_i$ accesses $M_j$ at the address $k$" by "$P_{\sigma(i)}$ accesses $M_{\sigma(j)}$ at the address $k$". $A$ and $A'$ have the same complexity.
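The relabeling used in this proof can be sketched as follows. The representation is an assumption made for the illustration: an algorithm is a list of steps, each step a list of accesses (processor $i$, memory block $j$, address $k$), and a processor is assumed to be allowed to access its own block or a block adjacent in the graph. All names are hypothetical.

```python
from typing import Callable, FrozenSet, List, Set, Tuple

Access = Tuple[int, int, int]  # (processor i, memory block j, address k)
Placement = Callable[[int], Tuple[int, int]]


def relabel_algorithm(algorithm: List[List[Access]], sigma: List[int]) -> List[List[Access]]:
    """Replace 'P_i accesses M_j at address k' by 'P_sigma(i) accesses M_sigma(j) at address k'."""
    return [[(sigma[i], sigma[j], k) for i, j, k in step] for step in algorithm]


def relabel_placement(placement: Placement, sigma: List[int]) -> Placement:
    """Compose the block component of a placement with sigma; addresses are unchanged."""
    def relabeled(i: int) -> Tuple[int, int]:
        block, address = placement(i)
        return sigma[block], address
    return relabeled


def accesses_follow_edges(algorithm: List[List[Access]], edges: Set[FrozenSet[int]]) -> bool:
    # Assumption: P_i may access M_j when i == j or {i, j} is an edge of the graph.
    return all(i == j or frozenset((i, j)) in edges
               for step in algorithm for i, j, _ in step)


cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
sigma = [0, 2, 1, 3]  # relabeling between the two copies (illustrative choice)
g1_edges = {frozenset(e) for e in cycle}
g2_edges = {frozenset((sigma[u], sigma[v])) for u, v in cycle}
algo_1 = [[(i, (i + 1) % 4, 0) for i in range(4)]]  # one step on the first copy
algo_2 = relabel_algorithm(algo_1, sigma)
```

The relabeled algorithm has exactly as many steps as the original one, and each of its accesses follows an edge of the second copy, whereas the original algorithm run unchanged on the second copy would not.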
Remark. The execution of the algorithm $A$ in the proof of Theorem 1 can also be done using the XRAM model (Definition 1), except that the data are not placed initially at their correct positions and therefore $A$ will not produce the correct answer.
Note also that, roughly speaking, the ioPRAM model and the PRAM model are identical because two labelings of the vertices of the complete graph cannot be distinguished. The only difference lies in the statement of problems in these two models: in the ioPRAM, a problem is defined in terms of input and output, and not in terms of memory location.
To show convincingly that the input and output functions must be included in the definition of the XRAM, let us consider the following example: let $C_n$ be the cycle of $n$ vertices and $Q_{\log n}$ be the hypercube of $n$ vertices (we assume $n$ to be a power of 2). Label the vertices of $C_n$ from $0$ to $n-1$ in the clockwise direction. Label the vertices of the hypercube as usual, that is with the labeling obtained using the recursive construction of the cube: vertex $i$ is joined by an edge to vertex $j$ if and only if the binary expressions of $i$ and $j$ differ in exactly one bit. Now, consider the cyclic shift problem $P$ as defined in Section 2.1 under the XRAM model. It allows us to prove that $Q_{\log n}(P) < C_n(P)$. Does it mean that the cycle is more powerful than the hypercube? Of course not: again, the several ways of labeling the vertices are not taken into account in Definition 1, and induce inconsistent results. In fact, the new definition of the XRAM model allows us to prove the following theorem, which sounds quite natural but was not true with the former definition.
Theorem 2 Let $G = (V, E)$ be any graph, and $G' = (V, E')$ be a subgraph of $G$, $E' \subseteq E$. Then the ioXRAM of topology $G'$ is less powerful than the ioXRAM of topology $G$: io$G'$RAM $\le$ io$G$RAM.
Proof. Let $\lambda$ and $\lambda'$ be two arbitrary labelings of the nodes of $G$ and $G'$ respectively. Since $G'$ is a subgraph of $G$, one can define $\sigma = \lambda \circ \lambda'^{-1}$. Let $\mathcal{I}'$, $A'$, $\mathcal{O}'$ be the placements and the algorithm solving a problem $P$ on $G'$. Using the relabeling function $\sigma$ as we did in the proof of Theorem 1, one can construct an algorithm $A$ and placements $\mathcal{I}$ and $\mathcal{O}$ that directly apply to $G$. Therefore io$G'$RAM$(P) \le$ io$G$RAM$(P)$.
Remark. Why did this straightforward proof not apply in the model of Definition 1? Simply because the labeling of the RAMs and the memory locations is more or less forced in Definition 1, whereas it is not considered in Definition 2.
Note also that there exist many conditions for which io$G'$RAM $<$ io$G$RAM, where $G'$ is a subgraph of $G$. For instance, it might be the case if the diameter or the girth of $G'$ turns out to be much larger than that of $G$. However, such conditions must be studied in detail because one must also find a problem for which these structural modifications really induce an increase in the problem complexity.
2.4 XRAM versus ioXRAM
It is known that sorting on the hypercube is in $\Omega(\log n)$ and in $O(\log n \log\log n)$ [?]. Now, can we prove that the complexity of sorting on an ioHRAM is in this range? Such a question is meaningful because a problem on an ioXRAM does not map the memory to itself, but an input set $I$ to an output set $O$, where $I$ and $O$ are both isomorphic to the memory space $M$, and where the choice of the isomorphisms $I \to M$ and $O \to M$ is free. Of course, the answer to this question is yes, though at the price of a permutation on the machine. More precisely:
Theorem 3 Let us consider an arbitrary $p$-processor XRAM of topology $G$. For any problem $P$, we have
$$\mathrm{comp}_{Id,Id}(P) = O\Big(\mathrm{comp}(P) + \max_{\sigma \in \mathfrak{S}_p} \mathrm{comp}_{Id,Id}(P_\sigma)\Big)$$
where $P_\sigma$ is the problem that consists in permuting any array $A$ stored in $I$ ($A(i)$ in position $i$) following $\sigma$, and obtaining the result in $O$ ($A(\sigma(i))$ in position $i$). Moreover, this bound is tight.
This theorem shows that although the virtual spaces $I$ and $O$, and the functions that map these spaces to the memory, must be introduced to keep consistent the formal definition of an abstraction of a distributed memory computer, the complexity of a problem can be computed in practice by fixing arbitrarily the input and output positions of the data. Note that the bound of Theorem 3 is tight because, for every $\sigma \in \mathfrak{S}_p$, $\mathrm{comp}(P_\sigma) = O(1)$. For instance, on a $p$-processor hypercube, any permutation can be off-line routed in $O(\log p)$ steps [?]. Therefore all the results for the hypercube that were previously derived are valid in the ioHRAM model up to an additive logarithmic factor.
3 General properties of the ioXRAM model
3.1 Decomposition of a problem in subproblems
As we have seen, the functions $\mathcal{I}$ and $\mathcal{O}$ were introduced to ensure the reflexivity of the comparability by taking into account the possible graph isomorphisms. As we said, such functions cannot be used for solving a problem because they depend on the problem only, and not on its instances. However, one can be tempted to cheat by decomposing a problem into subproblems. For instance, consider the problem of adding matrices in the following order:
1. $C = A + B$
2. $D = A^t + B$ ($A^t$ denotes the transposition of $A$)
where $A$ and $B$ are stored in $I$ in row major order, and $C$ and $D$ are stored in $O$ in row major order. This problem requires transposing $A$. This cannot be done using the input and output functions: once the data have been loaded, intermediate results cannot be output during the computation in order to be loaded again in different memory locations afterwards. Indeed, the complexity of a problem is evaluated once the data are loaded, and before they are output. Therefore, if a problem $P$ can be decomposed into two successive subproblems $P_1$ and $P_2$, then
$$\mathrm{comp}_{\mathcal{I},\mathcal{O}}(P) \le \mathrm{comp}_{\mathcal{I},Id}(P_1) + \mathrm{comp}_{Id,\mathcal{O}}(P_2). \qquad (1)$$
However, it could be interesting to redistribute the data between the execution of $P_1$ and $P_2$. This redistribution might be costly, but may also allow the data to be placed in the right position so that $P_2$ can be executed rapidly. For instance, if $\mathrm{comp}(P_1) = \mathrm{comp}_{\mathcal{I}^1,\mathcal{O}^1}(P_1)$ and $\mathrm{comp}(P_2) = \mathrm{comp}_{\mathcal{I}^2,\mathcal{O}^2}(P_2)$, then
$$\mathrm{comp}(P) \le \mathrm{comp}(P_1) + \mathrm{comp}_{Id,Id}\big(\mathcal{I}^2 \circ (\mathcal{O}^1)^{-1}\big) + \mathrm{comp}(P_2). \qquad (2)$$
In Equation (2), $\mathrm{comp}_{Id,Id}\big(\mathcal{I}^2 \circ (\mathcal{O}^1)^{-1}\big)$ is the time necessary to perform the permutation of the data from their positions after the execution of $P_1$ to the positions chosen to perform $P_2$ optimally. It is not clear whether or not the upper bound (2) is better than (1). In fact, there is a tradeoff between, on one hand, the time to perform $P_1$ and $P_2$ given the input and output positions of the data, and, on the other hand, the time to permute the data between $P_1$ and $P_2$. Therefore, we can state the following general upper bound:
$$\mathrm{comp}(P) \le \min_{\mathcal{I},\mathcal{O},\mathcal{I}',\mathcal{O}'} \Big[ \mathrm{comp}_{\mathcal{I},\mathcal{O}}(P_1) + \mathrm{comp}_{Id,Id}\big(\mathcal{I}' \circ \mathcal{O}^{-1}\big) + \mathrm{comp}_{\mathcal{I}',\mathcal{O}'}(P_2) \Big]. \qquad (3)$$
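The tradeoff expressed by this general upper bound can be illustrated numerically. The sketch below is purely hypothetical: the placements are reduced to two labels ("row" and "col"), and all the costs are made-up numbers chosen for the example, not results from the paper. The function evaluates the minimum, over the candidate placements, of the cost of $P_1$, plus the cost of redistributing the data, plus the cost of $P_2$.

```python
from typing import Dict, Tuple

# Cost tables indexed by (input placement, output placement); values are made up.
Cost = Dict[Tuple[str, str], int]


def decomposition_bound(cost_p1: Cost, cost_p2: Cost, permute_cost: Cost) -> int:
    """min over placements of cost(P1) + cost(redistribution) + cost(P2)."""
    return min(c1 + permute_cost[(out1, in2)] + c2
               for (_, out1), c1 in cost_p1.items()
               for (in2, _), c2 in cost_p2.items())


# Hypothetical costs: P2 is fast when its input is stored by columns.
cost_p1 = {("row", "row"): 4, ("row", "col"): 6}
cost_p2 = {("row", "row"): 9, ("col", "col"): 2}
# Hypothetical redistribution costs from the output layout of P1 to the input layout of P2.
permute_cost = {("row", "row"): 0, ("row", "col"): 3,
                ("col", "row"): 3, ("col", "col"): 0}

bound = decomposition_bound(cost_p1, cost_p2, permute_cost)
```

With these made-up numbers, keeping the row layout throughout costs 13, whereas letting $P_1$ produce its output by columns costs 6 + 0 + 2 = 8, which is the value returned.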


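More generally, if a problem $P$ can be decomposed into a succession of $k$ subproblems $P_1, P_2, \ldots, P_k$, $k \ge 1$, which we denote by $P = P_1|P_2|\cdots|P_k$, then $\mathrm{comp}(P)$ is equal to
$$\min_{k \ge 1}\ \min_{\substack{P_1, P_2, \ldots, P_k \\ P = P_1|P_2|\cdots|P_k}}\ \min_{\substack{(\mathcal{I}^i, \mathcal{O}^i) \\ i = 1, \ldots, k}} \left[ \sum_{i=1}^{k-1} \Big( \mathrm{comp}_{\mathcal{I}^i,\mathcal{O}^i}(P_i) + \mathrm{comp}_{Id,Id}\big(\mathcal{I}^{i+1} \circ (\mathcal{O}^i)^{-1}\big) \Big) + \mathrm{comp}_{\mathcal{I}^k,\mathcal{O}^k}(P_k) \right].$$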
The reader may find it interesting to refer to practical experiments where redistributing the data between the several phases of a problem yields better results than the direct algorithm [?]. This is typically the case in the parallel implementations of the ScaLAPACK subroutines for linear algebra [?].
Remark. Such a decomposition into subproblems was not so clear in the former XRAM model. Let us take an example: "finding the eigenvalues of a matrix" is a well defined problem, but nobody would ever understand the sentence "finding the eigenvalues of a matrix that is stored on a hypercube such that row ... is stored on processor ..., block ... is stored on processor ..., column ... is stored on processor ..., ..." as a problem. Indeed, the problem is "finding the eigenvalues of a matrix" and the other part of the sentence just gives indications about the initial storage of the data. Such a distinction between problem and storage formally appears in the ioXRAM model.
3.2 About separation theorems
We have seen that the HRAM and the PRAM can be separated in the XRAM model. This result still holds in the ioXRAM model. Indeed, let us consider the permutation problem defined by $C(O_i) = C(I_{\sigma(i)})$, $i = 0, \ldots, p-1$, where $\sigma$ is an arbitrary permutation of $\mathfrak{S}_p$ stored in $I$ between positions $p$ and $2p-1$. This problem can be solved in one step on a PRAM. However, whatever the choice of the input and output functions, there exists a permutation $\sigma$ such that, for some $i$, the memory block $M_{\mu_I(i)}$ and the memory block $M_{\mu_I(\sigma(i))}$ are at unbounded distance in the hypercube (where $\mu_I(i)$ denotes the index of the memory block that receives the $i$th input). Indeed, only a constant amount of data can be stored in each block, otherwise it would already take an unbounded time just to access the data locally. Therefore, it is true that
ioHRAM $<$ PRAM
when we restrict our study to the EREW model. Other separation results have been proved in [?]. Most of them separate not only topologies but also memory access constraints (EREW, CREW, CRCW). They stay true in the ioXRAM model because the proofs use arguments based on problems that are not defined in terms of memory location, but in fact in terms of input and output (like searching or prefix computation).
Tom Leighton deeply investigates in [?] the computational power of several topologies including cycles, linear arrays, meshes, meshes of trees, and hypercubes and related networks. We refer the reader to his book for the several simulation and separation results that link these topologies. He showed in particular that the butterfly network is universal in the sense that it can emulate every bounded degree network with a constant slowdown in the computation time. The XRAM model gives a general framework for such results.
 Speedup and simulation
Denition  applies to the ioXRAM of topologyKp the complete graph of p vertices  and therefore
to the PRAM model Of course it does not imply any modication of the PRAM theory because 
as we said  two labelings of the vertices of Kp cannot be distinguished However  we need to go
through the proofs of theorems based on problems described in terms of data movement inside the
memory the memory locations of the data and the results are specied as part of the problem
As an example  we consider the speedup folk theorem that says that the speedup of a parallel
algorithm using p processors cannot be greater than Op Of course  super linear speedup can be
obtained in practice that is on real parallel machines because a processor which deals with less
data may avoid problems as  for instance  cache miss  that might strongly slow down the sequential
computation on a large amount of data However  it is often said that a speedup larger than p
cannot be achieved on PRAM Akl  Cosnard and Ferreira 
 have shown that it is not true and
that a speedup of p  
 can be achieved on a PRAM of p processors This result holds mainly
because one must keep in mind that each RAM has a nite number of registers  and therefore a
PRAM of p processors has p times more registers than a single RAM This is why a pprocessors
PRAM is more than p times faster than a RAM
The proof in [?] relies on two arguments (in the following, we assume that each processor has a unique register; the generalization to an arbitrary number of registers can be found in [?]):
1. there exists a problem that can be solved in one step on a $p$-processor PRAM, and that cannot be solved in less than $2p-1$ steps on a single RAM (proved in [?]);
2. each step of a $p$-processor PRAM can be simulated in $2p-1$ steps on a RAM (proved in [?]).
The second argument stays true even under the model of Definition 2. However, the first argument used a problem of the class named data-movement-intensive problems, which is defined in terms of memory location as follows.

Problem 1 Let $I_1, \ldots, I_p$ be $p$ distinct integers in a prescribed range depending on $p$, stored in an array $A$ in such a way that $A(i) = I_i$, $i = 1, \ldots, p$. It is required to modify $A$ so that it satisfies the following condition:
$A(i) = i$ if there exists $j$ such that $I_j = i$,
$A(i) = I_i$ otherwise.
Problem 1 requires 1 step on a $p$-processor PRAM, whereas it requires at least $2p-1$ steps on a RAM. Indeed, the memory location of the input and the output is imposed, that is the datum $A(i)$ is given in memory location $i$, and the result $A(i)$ must be returned in memory location $i$, with a risk of overwriting an unread datum. We could now imagine storing the results elsewhere to avoid this problem. Indeed, what is important is that we must know where the result is, but why should the memory location of the result be specified in advance? In fact, it does not correspond to Definition 2. Problem 1 under the PRAM model is translated into the following problem in the ioPRAM model.
Problem 2 We are given an array $A$ stored in $I$ ($A(i)$ in position $i$). We want to modify it according to the rule of Problem 1, and we want the result stored in $O$ ($A(i)$ in position $i$).
The two phrases "stored in $I$" and "stored in $O$" just mean "we give you the data" and "we want the result", but the position where the data are stored and loaded in the memory is not part of the problem: it is part of the algorithm solving the problem. Anyway, one can still prove that Problem 2 cannot be solved in less than $2p-1$ steps on an ioRAM, that is, even with total freedom on the memory locations of the data and the results.
Lemma 1 Problem 2 requires at least $2p-1$ steps on an ioRAM.
Proof. Let $i$, $1 \le i \le p$, and let $m_i = \mathcal{O}(i)$. More precisely, $m_i$ denotes the memory location where the result $A(i)$ can be found after the modifications specified by the condition of Problem 1: $C(M_{m_i}) = i$ if there exists $j$ such that $I_j = i$, and $C(M_{m_i}) = I_i$ otherwise. It means that the last instruction "write at the address $m_i$" must follow at least $p$ instructions "read" because $p$ reads are necessary to check whether or not there exists $j$ such that $I_j = i$. Therefore, for every $i$, $1 \le i \le p$, each final write at the address $\mathcal{O}(i)$ must be preceded by $p$ reads. That is, a total of at least $2p-1$ steps are necessary: the $p$ reads and the $p$ writes amount to $2p$ operations, and one step can be saved because one can read and then write in the same step.
Therefore, we get the following result:
Theorem 4 The speedup of a $p$-processor ioPRAM over an ioRAM cannot exceed $2p-1$, and this bound is tight.
Proof. The tightness of the bound is given by Lemma 1. The simulation theorem in [?] shows that any step of a $p$-processor PRAM can be simulated on a RAM in at most $2p-1$ steps.
 XRAM yet another model
Everybody can state the simple but primordial requirement about what must satisfy a computer
model it must be simple and must reect the behavior of real computers Of course quite a few


models satisfy both requirements To conclude this paper  let us analyze the XRAM model in terms
of these two conditions
The XRAM model clearly satisfies the first condition. It is just a formal way to express the fact that we work on a machine that has some particular connection properties. The huge amount of results obtained in this framework (see for instance [?]) proves the fruitfulness of such a model.
Concerning the second requirement, the approach followed by theoreticians proving theorems in the XRAM framework is justified by the fact that real parallel computers are indeed not fully connected, and that many machines were built with topologies such as hypercubes, meshes, trees, etc.
Now  we can point out many defaults of the XRAM model For instance
  Computation times and communication times are not distinguished in the XRAM model
whereas the elementary computation time and the elementary communication time often
dier by many order of magnitude in real parallel computers 

  Computation steps and communication steps are linked in such a way that the e"cient and
practical method that consists to overlap communications with computations  cannot be
easily expressed in the XRAM model
  The several models of communication costs that apply to parallel machines ! are di"cult
to handle with the XRAM model because they make distinction between startup times 
commutation times  propagation time  etc
  The routing mode related to the XRAM model is packetswitching whereas circuitswitching
or wormhole routing are often preferred on the last generation of parallel computers 


  The computation grain of the XRAM model is ne whereas new parallel computers are often
composed of few very powerful processors 
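To make the distinctions listed above concrete, the following Python sketch evaluates linear
communication cost models of the kind commonly used in the communication literature. The function
names and the parameters (beta and alpha for start-up times, delta for the per-hop switching
time, tau for the per-unit propagation time, d for the distance, L for the message length) are
assumptions made for this illustration; none of them is defined in this report, and other cost
models exist.

```python
def store_and_forward_time(d: int, L: float, beta: float, tau: float) -> float:
    """Packet-switching style cost: the whole message is received and
    re-sent at each of the d hops, paying a start-up beta every time."""
    return d * (beta + L * tau)


def wormhole_time(d: int, L: float, alpha: float, delta: float, tau: float) -> float:
    """Circuit-switching / wormhole style cost: one start-up alpha, a switching
    delay delta per hop, then the message flows through the established path."""
    return alpha + d * delta + L * tau


def step_time(t_comp: float, t_comm: float, overlap: bool) -> float:
    """Duration of one phase: with overlap, communication is hidden behind
    computation (or vice versa); without it, the two durations add up."""
    return max(t_comp, t_comm) if overlap else t_comp + t_comm


# Example with arbitrary illustrative values: a message of length 100 sent at distance 4.
print(store_and_forward_time(d=4, L=100, beta=10.0, tau=0.5))
print(wormhole_time(d=4, L=100, alpha=10.0, delta=1.0, tau=0.5))
print(step_time(t_comp=3.0, t_comm=5.0, overlap=True))
```

None of these parameters has a counterpart in the XRAM model, where a communication step and a
computation step both count as one unit of time.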

Does it mean that the XRAM model is useless? Of course not. First, we can argue against some
of the previously listed shortcomings. For instance, the fine-grained approach is quite efficient
because it is often a good way to derive a parallel algorithm: a fine-grained algorithm can easily
be transformed into a coarse-grained algorithm. Other shortcomings can be addressed by modifying
the model to include a router attached to each processor. Doing this requires considering many
factors, such as elementary communication time, communication constraints, routing mode, etc.
Moreover, even if such a model closely approached the behavior of real machines, it would turn
out to be so complicated that very few powerful theoretical results could be derived from it.
So is it a vicious circle?
The answer is no, because one should not oppose simplicity and practicability. Theoreticians
must know about some part of applied computer science so that their models and results can be used
for practical applications. Engineers must know about some of the main theoretical results derived
under abstract models so that they can adapt these results and apply them to real programming
environments. From that point of view, we claim that the XRAM model is definitely a good
model.
Now  it is true that  in order to derive e"cient algorithms and to compute lower bounds
on the complexities of problems  all the formal environment provided by the input and output
sets  and by the input and output functions could be relaxed  and a model like the one dened
in 
Chapter 

 could certainly be preferred.



References

[1] S. Akl, M. Cosnard, and A. Ferreira. Data-movement-intensive problems: two folk theorems
    in parallel computation revisited. Theoretical Computer Science.

[2] E. Anderson, A. Benzoni, J. Dongarra, S. Moulton, S. Ostrouchov, B. Tourancheau, and
    R. Van de Geijn. LAPACK for distributed memory architecture. In Fifth SIAM Conference
    on Parallel Processing for Scientific Computing, USA.

[3] M. Cosnard and A. Ferreira. Designing parallel non numerical algorithms. In Joubert Evans
    and Liddell, editors, Parallel Computing. Elsevier Science.

[4] M. Cosnard, M. Loi, and B. Tourancheau. A framework for data migrations on the hypercube.
    In NATO Advanced Research Workshop, Software for Parallel Computation. Cetraro.

[5] R. Cypher and G. Plaxton. Deterministic sorting in nearly logarithmic time on the hypercube
    and related computers. In Twenty-second Annual ACM Symposium on Theory of Computing.

[6] F. Desprez, J.J. Dongarra, and B. Tourancheau. Performance Complexity of LU Factorization
    with Efficient Pipelining and Overlap on a Multiprocessor. Parallel Processing Letters, II.

[7] P. Fraigniaud and E. Lazard. Methods and Problems of Communication in Usual Networks.
    Discrete Applied Mathematics.

[8] A. Gibbons and W. Rytter. Efficient parallel algorithms. Cambridge University Press.

[9] J. Hopcroft and J. Ullman. Introduction to automata, languages and computation.
    Addison-Wesley.

[10] T. Leighton. Introduction to Parallel Algorithms and Architectures: Arrays, Trees,
     Hypercubes. Morgan Kaufmann.

[11] L.M. Ni and P.K. McKinley. A survey of wormhole routing techniques in direct networks.
     Computers, Feb.

[12] Loïc Prylli and Bernard Tourancheau. Efficient block cyclic data redistribution. Research
     Report, INRIA-LIP, Laboratoire de l'Informatique du Parallélisme, ENS-Lyon, France.

[13] Jean De Rumeur. Communication dans les réseaux de processeurs. Collection Etudes et
     Recherches en Informatique. Masson. English version to appear.