An efficient parallel algorithm for the all pairs shortest path problem using
processor arrays with reconfigurable bus systems by Wankar, Rajeev et al.












 * School of Computer Science, DAVV Indore, India
Abstract
The all pairs shortest path problem is an instance of the algebraic path problem. Many parallel algorithms for
this problem appear in the literature. One efficient parallel algorithm on the W-RAM model is given by
Kucera [17]. Though efficient, algorithms written for the W-RAM model of parallel computation are too idealistic to be
implemented on current hardware. In this report we present an efficient parallel algorithm for this
problem using a relatively new model of parallel computing, Processor Arrays with Reconfigurable Bus Systems (PARBS). The
parallel time complexity of this algorithm is $O(\log_2 n)$ and the processor complexity is $n^2 \times n \times n$.
Keywords: Splitting, SPP, PARBS.
1. This work is supported by the German Academic Exchange Services (DAAD) under the “Sandwich




Introduction:
The all pairs shortest path problem (SPP) is an instance of the algebraic path problem. One of the sequential
methods for solving it is based on dynamic programming [13]. A lot of research has been carried out
on this and related problems [4,9,10]. One method for the all pairs SPP has been given by Kucera [17];
it takes $O(\log^2 n)$ time on the P-RAM model and $O(\log n)$ time on the W-RAM model of computation.
Another method for the same problem has been given by G.H. Chen et al. [9]. Their method is based on the
Processor Arrays with Reconfigurable Bus System (PARBS) model of parallel computation. They use an
iterative procedure requiring $\log_2 n$ iterations, each consisting of one matrix addition and two matrix
multiplications. The matrix multiplication can be done in constant time if the “+” operator in the
innermost loop is MAX (MIN). This method solves the problem in $O(\log n)$ time using
$n^2 \times n \times n$ processors. In this paper we use the standard dynamic programming approach
to solve the problem. The report consists of three sections: Section 1 gives the problem definition and
states the standard algorithm given by Kucera. In Section 2 we explain the reconfigurable architecture
and its variants and present some basic results. In Section 3 we present the PARBS implementation of
the algorithm.
Section 1:
In this section we present the definition of the problem and the standard algorithm by Kucera.
Definition: Let $G = (V, E)$ be a directed graph with n vertices. Let M be a cost adjacency matrix
for G such that $M(i, i) = 0$, $1 \le i \le n$. $M(i, j)$ is the length or cost of edge $\langle i, j \rangle$
if $\langle i, j \rangle \in E(G)$, and $M(i, j) = \infty$ if $i \ne j$ and $\langle i, j \rangle \notin E(G)$.
The all pairs shortest path problem is to determine a matrix A such that A(i, j) is the length of the
shortest path from i to j.
The following algorithm, given by Kucera [17], finds the shortest path between every pair of vertices
of a weighted graph. The algorithm takes the edge weight matrix M of a directed graph G
and outputs a matrix A defined as follows:
$A(i, j) = 0$ if $i = j$,
$A(i, j) = \min\{ M(i_0, i_1) + M(i_1, i_2) + \dots + M(i_{k-1}, i_k) \}$ if $i \ne j$,
where the minimum is taken over all possible sequences $i_0, i_1, i_2, \dots, i_k$ for which $i = i_0$ and
$j = i_k$. Obviously, A(i, j) is the length of the shortest path from i to j. The iterative algorithm
works in stages. In each iteration the search for the shortest path between each pair of
vertices is extended to consider all paths using up to twice the number of edges considered so far.
Since a shortest path uses at most n - 1 edges, a logarithmic number of iterations is sufficient.
The algorithm is stated for the W-RAM model of computation.
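The doubling idea described above can be checked with a short sequential simulation. The sketch below is only an illustration and not the parallel algorithm itself: the function name `all_pairs_shortest_paths` is hypothetical, the parallel "for all" loops are replaced by ordinary list comprehensions, and rounding the iteration count up to ceil(log2 n) is an assumption for values of n that are not powers of two. The sample input is the cost matrix of Figure 10(b).

```python
import math

INF = math.inf  # stands for a missing edge


def all_pairs_shortest_paths(M):
    """Sequentially simulate the path-doubling iteration on cost matrix M."""
    n = len(M)
    # m(i, j) <- M(i, j)
    m = [row[:] for row in M]
    # Assumption: ceil(log2 n) rounds when n is not a power of two.
    rounds = math.ceil(math.log2(n)) if n > 1 else 0
    for _ in range(rounds):
        # q(i, j, k) <- m(i, j) + m(j, k), computed from the old values of m
        q = [[[m[i][j] + m[j][k] for k in range(n)]
              for j in range(n)]
             for i in range(n)]
        # m(i, j) <- min{m(i, j), q(i, 1, j), ..., q(i, n, j)}
        m = [[min(m[i][j], min(q[i][k][j] for k in range(n)))
              for j in range(n)]
             for i in range(n)]
    # A(i, j) <- m(i, j) if i != j, else 0
    return [[m[i][j] if i != j else 0 for j in range(n)] for i in range(n)]


# Cost matrix M of Figure 10(b)
M = [[0, 4, 11],
     [6, 0, 2],
     [3, INF, 0]]
A = all_pairs_shortest_paths(M)
```

For this input the sketch yields the rows (0, 4, 6), (5, 0, 2) and (3, 7, 0): for example, the direct edge of cost 11 from vertex 1 to vertex 3 is replaced by the two-edge path of cost 4 + 2 = 6.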
Algorithm:
1. for all i, j in parallel do $m(i, j) \leftarrow M(i, j)$
2. repeat $\log_2 n$ times
begin
for all i, j, k in parallel do $q(i, j, k) \leftarrow m(i, j) + m(j, k)$
for all i, j in parallel do $m(i, j) \leftarrow \min\{ m(i, j), q(i, 1, j), q(i, 2, j), \dots, q(i, n, j) \}$
end
3. for all i, j in parallel do
 if $i \ne j$ then $A(i, j) \leftarrow m(i, j)$ else $A(i, j) \leftarrow 0$
Section 2: Computational Models
The basic idea of processor arrays with a reconfigurable architecture was given by Miller, Prasanna
Kumar et al. [18]. They described it as: “Meshes with reconfigurable bus consists of a VLSI
array of processors overlaid with a reconfigurable bus systems”.
In the mid 90’s PARBS drew a lot of attention in the scientific community for high performance
computing with general purpose processors. Various models of PARBS appeared in the
literature [4]. Common models are the Bus automaton, the Polymorphic torus network [14],
Reconfigurable Meshes [18,19,15], Bypass capabilities etc.
A PARBS of size $N_1 \times N_2 \times \dots \times N_r$ consists of an $N_1 \times N_2 \times \dots \times N_r$
array of processors connected to an r-dimensional grid-shaped reconfigurable bus system. The processing
elements are connected to a bus through a fixed number of I/O ports. The ability to change the
configuration of the bus system dynamically by adjusting local or global switches makes it possible
to obtain various computational configurations such as row, zig-zag, staircase and diagonal at run
time.
A two dimensional processor array with a reconfigurable bus system of size $N^2$, consisting of
$N \times N$ identical processors connected to a rectangular mesh system, is called a reconfigurable
mesh. Figure 1 shows processing elements connected to a grid of buses and a PE with its four
I/O ports and connection patterns.
The basic computational unit of the reconfigurable mesh is the Processing Element (PE), which
consists of switches, a small storage and an ALU. A PE is capable of performing the following
operations in one unit of time:
1. Setting up a connection pattern
2. Reading from or writing onto a sub bus or memory
3. Performing logical and arithmetic operations
4. Disconnecting itself from the bus
Reconfigurable bus models are characterized by the following parameters [7]:
• Width: The data width of the PE. There are two models, which differ in the length
of the operand of the PE:
1. Bit model
2. Word model
• Delay: The time needed to propagate a signal on the buses. There are two models:
1. Unit delay model: “No matter how far signal has travelled”
2. Logarithmic delay model: “O(log2N) time is needed”
• Bus Access: Each PE is connected to the bus through its port and will either read or write to
it. There are two common models:
1. ER Model (Similar to CREW PRAM)











Figure 1. Reconfigurable Mesh of size 4x4
• Connection pattern: Each PE can set the connection between its four ports based on local
data or global instruction. There are 15 different connection patterns possible. Models differ in
the number of connection patterns (a subset of 15) which they allow.
Various Models:
Based on these classifications, various models of reconfigurable bus systems appear in the literature.
Most of these models are synchronous in nature and permit unconditional global switch setting
in addition to local switch setting. Unconditional switch setting is performed by
broadcasting a global instruction from a central controller.
These models differ in the internal connections they are allowed to make. A few to note are:




• PARBS: The most general and most powerful model is PARBS. No restriction is placed on the
allowed connections. All 15 patterns of internal connections at each node are possible (the
notation {xy} means that ports x and y are connected to each other) [20]:
• no connections - { }
• two-port connections - {NS}, {EW}, {NW}, {NE}, {SW}, {SE}
• three-port connections - {EWS}, {EWN}, {SNE}, {SNW}
• four-port connections - {EWSN}
• two-pair connections - {EW, SN}, {EN, WS}, {ES, WN}
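The count of 15 can be cross-checked by observing that a connection pattern is a partition of the four ports into groups of mutually connected ports, where an unconnected port forms a group of its own. The sketch below enumerates these partitions and tallies them by the sizes of their connected groups. The function names `set_partitions`, `connection_patterns` and `shape` are hypothetical and introduced only for this illustration.

```python
from collections import Counter


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for idx in range(len(partition)):
            yield (partition[:idx]
                   + [[first] + partition[idx]]
                   + partition[idx + 1:])
        # ... or into a new block of its own.
        yield [[first]] + partition


def connection_patterns(ports="NESW"):
    """Each pattern is a sorted list of blocks; a one-letter block is an unconnected port."""
    return [sorted("".join(sorted(block)) for block in partition)
            for partition in set_partitions(list(ports))]


def shape(pattern):
    """Sizes of the blocks that actually connect two or more ports."""
    return tuple(sorted(len(block) for block in pattern if len(block) > 1))


patterns = connection_patterns()
shape_counts = Counter(shape(p) for p in patterns)
```

Here `shape_counts` groups the patterns exactly as in the list above: one pattern with no connections, six two-port, four three-port, one four-port and three two-pair patterns.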
• RMESH: It is a two dimensional mesh where the PEs are located on the intersection of the












Figure 3. Connection patterns in RMESH: {EWSN}, {E,W,SN}, {EW,S,N}, {E,W,S,N}, {EWN,S}, {EWS,N},
{E,WSN}, {ESN,W}, {W,S,NE}, {N,W,SE}, {E,WS,N}, {E,S,NW}
• RN: The Reconfigurable Network is a general model in which PEs may not lie at the grid
point and a bus segment may join an arbitrary pair of PEs. [15]
“In RN model, each I/O port of a PE is connected to at most one other port”
• Polymorphic Torus: It is identical to the PARBS except that the rows and columns of the
underlying mesh wrap around.[14]
Figure 4. Connection patterns in RN: {W,E,NS}, {NW,SE}, {NW,E,S}, {N,W,SE}, {WS,N,E}, {W,S,NE},
{WE,N,S}, {WS,NE}, {NS,EW}, {N,S,E,W}
Figure 5. The 5 x 5 Torus
Higher dimensional PEs can be formed in the same way. Figure 6(b) shows a PE in 3 dimensions
with six ports labelled (U)pper, (L)ower, (F)ront, (B)ack, (R)ight and (L)eft.
Figure 6(b). A PE in 3-D.
We assume that constant time is needed to broadcast values through the established buses,
irrespective of their distance from the sending processor. Although this is a theoretical assumption and
somewhat unrealistic on current architectures, with the advancement of fibre optic communication
technology this architecture is expected to gain wide popularity.
In designing an algorithm for the all pairs SPP, we use the unit delay model of the 3-D PARBS. First
we present the basic algorithms on PARBS and later use some of them in designing the final
algorithm.
Before presenting the lemmas we introduce the concept of splitting.
Splitting a bus is a technique by which the processors exploit their ability to locally
control the effective size of the sub buses [19].
Lemma 1: The logical OR or AND of N bits can be obtained in constant time on a linear PARBS of
size N [19].
Lemma 2: Given a reconfigurable mesh of size N in which no more than one processor in each
column stores a data value, the maximum (minimum) of these ($N^{1/2}$) values can be determined in
$\Theta(1)$ time using the unit delay model and in O(log2 n) time using the log-time delay model [19].
Algorithm:
We assume that the n values of which the minimum is to be obtained are placed in row 0 of
the 2-D mesh.
Step 1: A column broadcast is used so that every PEi,j contains entry xj.
Step 2: Within each row i, processor PEi,i uses a row broadcast to inform all processors PEi,j of the
value of xi, $0 \le i, j \le n - 1$.
Step 3: Every processor PEi,j computes the boolean result of “xj > xi”.
Step 4: In every column j, the logical OR of these values is obtained (in constant time) to
decide whether or not xj is the minimum: xj is a minimum exactly when the OR is “0”.
More than one column may yield “0”; bus splitting on a row can be used to inform
PE0,0 of the minimum value.
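Steps 1 to 4 can be traced with a small sequential simulation. This is a sketch under stated assumptions, not a model of the bus hardware: `bus_or` and `mesh_minimum` are hypothetical names, `bus_or` merely stands in for the constant-time OR of Lemma 1, and the final loop imitates the row bus splitting by taking the leftmost column whose flag is "0". The input is assumed to be non-empty.

```python
def bus_or(bits):
    """Stand-in for Lemma 1: logical OR of the bits placed on one bus."""
    return any(bits)


def mesh_minimum(x):
    """Simulate the minimum algorithm on an n x n mesh for n values x[0..n-1]."""
    n = len(x)
    # Step 1: column broadcast, so PE(i, j) holds x[j].
    col_val = [[x[j] for j in range(n)] for i in range(n)]
    # Step 2: row broadcast from the diagonal PE(i, i), so PE(i, j) also holds x[i].
    row_val = [[x[i] for j in range(n)] for i in range(n)]
    # Step 3: every PE(i, j) evaluates "x[j] > x[i]".
    greater = [[col_val[i][j] > row_val[i][j] for j in range(n)]
               for i in range(n)]
    # Step 4: column-wise OR; a result of False (0) marks x[j] as a minimum.
    not_min = [bus_or(greater[i][j] for i in range(n)) for j in range(n)]
    # Row bus splitting (simulated): the leftmost column with flag 0 reaches PE(0, 0).
    for j in range(n):
        if not not_min[j]:
            return x[j]


smallest = mesh_minimum([6, 5, 3, 2, 8])
```

With the sample values 6, 5, 3, 2, 8 only the column holding 2 keeps the flag "0", so `smallest` is 2. Ties are harmless because every flagged column carries the same value.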



























Figure 7(a) and 7(b)
Lemma 3: Searching of a given number in a list of n numbers can be done in constant time on a
linear PARBS of size n.
Figure 8(a) and 8(b) (sample values xj = 6 5 3 2 8, shown against the row broadcasts 6, 5, 3 and 2)
Proof:
Assume that n numbers xi, $1 \le i \le n$, are stored at n processors, one value per processor PE(i).
The value “p” to be searched for is available at PE(1). The steps involved are:
Step 1: Connect port E to W and broadcast “p” on the established bus. Each processor PE(i) performs
 the test “if p = x(i)”, which results in either 0 or 1.
Step 2: Each processor PE(i) that has “1” as its data value splits the bus by setting its eastern
switch to disconnect its row bus.
Step 3: Each processor that has “1” as its data value broadcasts its index “j” on its sub bus. Processor
PE(1) receives the westernmost “j” as the position of “p” among the n values.
Figure 9(a-d) (sample values 6 4 3 8 9 5 2 with match flags 0 0 0 1 0 0 0)
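The three steps of the proof can be imitated sequentially as follows. The sketch is an illustration under assumptions: `split_bus` and `parbs_search` are hypothetical names, the linear bus is modelled as a Python list, and positions are reported 1-based as in the text. Because every flagged PE opens its eastern switch, each sub bus contains at most one writer, which is why PE(1) hears exactly the westernmost match.

```python
def split_bus(flag):
    """Cut a linear bus east of every flagged PE; return the sub buses as index lists."""
    segments, current = [], []
    for i, f in enumerate(flag):
        current.append(i)
        if f:  # eastern switch opened: the sub bus ends at this PE
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


def parbs_search(x, p):
    """Return the 1-based position of the westernmost occurrence of p, or None."""
    # Step 1: p is broadcast on the full bus and every PE compares locally.
    flag = [int(v == p) for v in x]
    # Step 2: flagged PEs split the bus.
    segments = split_bus(flag)
    # Step 3: flagged PEs write their index; PE(1) listens on the first sub bus.
    first = segments[0] if segments else []
    writers = [i for i in first if flag[i]]
    return writers[0] + 1 if writers else None


position = parbs_search([6, 4, 3, 8, 9, 5, 2], 8)
```

For the values 6 4 3 8 9 5 2 and p = 8 the flags are 0 0 0 1 0 0 0, the bus is cut after the fourth PE, and `position` is 4.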
Section 3:
PARBS algorithm for the all pairs SPP problem:
We assume that the values of m(i,j) are stored at the processors P((i-1)n+1,j,1), $1 \le i \le n$,
$1 \le j \le n$, of an $n^2 \times n \times n$ PARBS.
repeat steps 1-7 ($\log_2 n$) times
Step 1: Establish sub buses in the k-direction. Processors P((i-1)n+1,j,1) broadcast the values of m(i,j) on
the buses and then break up the sub buses.
Step 2: Establish sub buses in the j-direction (connect port E to W) and broadcast the values of m(i,j)
from processors P((i-1)n+1,j,j) to P((i-1)n+1,i,j), $1 \le i \le n$, $1 \le j \le n$.
Step 3: Disconnect the sub buses and establish sub buses in the i-direction (connect port U to D).
Broadcast the values of m received in step 2 from processors P((i-1)n+1,i,j), $1 \le i \le n$,
$1 \le j \le n$, on the established sub buses; they are received by processors P((k-1)n+1,i,j),
$1 \le i \le n$, $1 \le j \le n$, $1 \le k \le n$. At the end of this step, the values of m(i,j) and
m(j,k) are available at P((i-1)n+1,j,k), $1 \le i \le n$, $1 \le j \le n$, $1 \le k \le n$.
Step 4: Find the sum of m(i,j) and m(j,k) at every processor in constant time.
Step 5: Establish sub buses in the j-direction (connect port E to W) and find the minimum of these n values
row-wise in constant time. Store the minimum at processor P((i-1)n+1,1,j) in some
variable “x”, $1 \le i \le n$, $1 \le j \le n$.
Step 6: Break up the sub buses and establish sub buses in the j-direction; broadcast the values of “x” from
P((i-1)n+1,1,j) to P((i-1)n+1,j,j), $1 \le i \le n$, $1 \le j \le n$.
Step 7: Disconnect the established sub buses and connect sub buses in the k-direction; broadcast the value of
“x” from P((i-1)n+1,j,j) to P((i-1)n+1,j,1), $1 \le i \le n$, $1 \le j \le n$, and perform the operation
m(i,j) = minimum{m(i,j), x} at the respective processors.
Lemma 4: The shortest path between each pair of vertices of a directed graph stored in a 3-D
PARBS can be obtained in $O(\log_2 n)$ time using $n^2 \times n \times n$ processors.
Proof: The proof is immediate: each of the steps 1-7 takes constant time and the
loop is repeated ($\log_2 n$) times.
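The data movement of steps 1 to 7 can be followed in a sequential simulation that keeps one register per processor address. The sketch below rests on several assumptions that are not part of the report: the names `parbs_apsp`, `a`, `b` and `row` are hypothetical, each bus broadcast is modelled as a plain dictionary assignment, Python's `min` stands in for the constant-time minimum of Lemma 2 (for which the remaining rows of each block of n rows are presumably reserved), and steps 6 and 7 are folded into one update. The iteration count is rounded up to ceil(log2 n).

```python
import math

INF = math.inf  # stands for a missing edge


def parbs_apsp(M):
    """Trace steps 1-7 on registers addressed like P((i-1)n+1, j, k)."""
    n = len(M)
    idx = range(1, n + 1)

    def row(i):
        # First coordinate of the processors that serve vertex i.
        return (i - 1) * n + 1

    # m(i, j) is held at P(row(i), j, 1).
    m = {(i, j): M[i - 1][j - 1] for i in idx for j in idx}
    rounds = math.ceil(math.log2(n)) if n > 1 else 0
    for _ in range(rounds):
        a, b, x = {}, {}, {}
        # Step 1: k-direction broadcast of m(i, j) from P(row(i), j, 1).
        for i in idx:
            for j in idx:
                for k in idx:
                    a[row(i), j, k] = m[i, j]
        # Step 2: j-direction, from P(row(i), j, j) to P(row(i), i, j).
        for i in idx:
            for j in idx:
                b[row(i), i, j] = a[row(i), j, j]
        # Step 3: i-direction, from P(row(i), i, j) to P(row(k), i, j) for all k.
        for i in idx:
            for j in idx:
                value = b[row(i), i, j]
                for k in idx:
                    b[row(k), i, j] = value
        # Now P(row(i), j, k) holds a = m(i, j) and b = m(j, k).
        # Step 4: local sum at every processor.
        q = {key: a[key] + b[key] for key in a}
        # Step 5: minimum over j, stored as x at P(row(i), 1, k).
        for i in idx:
            for k in idx:
                x[row(i), 1, k] = min(q[row(i), j, k] for j in idx)
        # Steps 6-7: route x back to P(row(i), k, 1) and update m there.
        for i in idx:
            for k in idx:
                m[i, k] = min(m[i, k], x[row(i), 1, k])
    return [[0 if i == j else m[i, j] for j in idx] for i in idx]


# Cost matrix M of Figure 10(b)
result = parbs_apsp([[0, 4, 11],
                     [6, 0, 2],
                     [3, INF, 0]])
```

For the cost matrix of Figure 10(b), `row(i)` takes the values 1, 4 and 7, and `result` has the rows (0, 4, 6), (5, 0, 2) and (3, 7, 0).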




0         4       11
6         0        2
3         inf      0
Figure 10(a). Graph G Figure 10(b). Cost matrix M
Figures for the example of Figure 10, showing the planes i = 1, i = 4 and i = 7. The final panel lists
the value pairs 0,0 4,4 6,11 / 5,6 0,0 2,2 / 3,3 7,inf 0,0 and the resulting values 0 4 6 / 5 0 2 / 3 7 0.
Conclusion:
In this technical report we have presented two algorithms on the PARBS, namely an optimal constant
time searching algorithm and an efficient parallel algorithm for the all pairs
shortest path problem. The SPP algorithm is considered efficient since it belongs
to Nick’s class [11]. It has a parallel time complexity of $O(\log_2 n)$ and a processor
complexity of $n^2 \times n \times n$, the same time and processor complexity as reported in [9].
The work can be extended further to obtain a constant time solution of this problem, or an
optimal parallel algorithm by reducing the number of processors.
References:
[1] Aggarwal A., “Optimal Bounds for Finding Maximum on Array of Processors with k Global
Buses”, IEEE Transactions on Computers, pp. 62-64, Vol. C-35, No. 1, January 1986.
[2] Aho A., J. Hopcroft and J. Ullman, The Design and Analysis of Computer Algorithms,
Addison-Wesley, 1974.
[3] Bauer B. E., Practical Parallel Programming, Academic Press Inc., 1992.
[4] Biing-Feng Wang and Gen-Huey Chen, “Constant Time Algorithms for the Transitive Closure
and Some Related Problems on Processor Arrays with Reconfigurable Bus Systems”, IEEE
Transactions on Parallel and Distributed Systems, pp. 500-507, Vol. 1, No. 4, October 1990.
[5] Biing-Feng Wang, Gen-Huey Chen and Ferng-Ching Lin, “Constant Time Sorting on a
Processor Array with Reconfigurable Bus System”, Information Processing Letters,
pp. 187-192, 34 (1990).
[6] Bokhari S. H., “Finding Maximum on an Array Processor with a Global Bus”, IEEE
Transactions on Computers, pp. 133-139, Vol. C-33, No. 2, February 1984.
[7] Bondalapati K. and V.K. Prasanna Kumar, “Reconfigurable Meshes: Theory and Practice”,
Reconfigurable Architectures Workshop, International Parallel Processing Symposium,
April 1997.
[8] Fragopoulou P., “On the Efficient Summation of N Numbers on an N-Processor
Reconfigurable Mesh”, Parallel Processing Letters, pp. 71-78, Vol. 3, No. 1, 1993.
[9] G-H Chen, B-F Wang and C.J. Lu, “On the Parallel Computation of the Algebraic Path
Problem”, IEEE Transactions on Parallel and Distributed Systems, pp. 251-256, Vol. 3, No. 2,
March 1992.
[10] G-H Chen, S. Olariu et al., “Constant Time Tree Algorithms on Reconfigurable Meshes on




[11] Gibbons A. and Wojciech Rytter, Efficient Parallel Algorithms, Cambridge University
Press, Cambridge, May 1988.
[12] Golub G.H. and James Ortega, Scientific Computing: An Introduction with Parallel
Computing, Academic Press Inc., 1993.
[13] Horowitz E. and S. Sahni, Fundamentals of Computer Algorithms, Addison-Wesley,
Computer Science Press, NY, 1978.
[14] Hungwen Li and Massimo Maresca, “Polymorphic Torus Network”, IEEE Transactions on
Computers, pp. 1345-1351, Vol. 38, No. 9, September 1989.
[15] Ju-Wook J. and V.K. Prasanna, “An Optimal Sorting Algorithm on Reconfigurable Mesh”,
Journal of Parallel and Distributed Computing, pp. 31-41, 25 (1995).
[16] K. Li, Y. Pan and S.Q. Zheng, “Fast and Efficient Parallel Matrix Manipulation on a Linear
Array with a Reconfigurable Pipelined Bus System”, High Performance Computing
Systems and Applications, J. Schaeffer and R. Unrau eds., Kluwer Academic Press, 1998.
[17] Kucera L., “Parallel Computation and Conflict in Memory Access”, Information Processing
Letters, Vol. 14, No. 2, April 1982.
[18] Miller R., V.K. Prasanna Kumar, Dionisios Reisis and Quentin F. Stout, “Meshes with
Reconfigurable Buses”, Proc. 15th MIT Conference on Advanced Research in VLSI, pp.
163-178, March 1988.
[19] Miller R., V.K. Prasanna Kumar, Dionisios Reisis and Quentin F. Stout, “Parallel
Computations on Reconfigurable Meshes”, IEEE Transactions on Computers, Vol. 42, No. 6, June
1993.
[20] Ten-Hwang Lai and M.J. Sheng, “Triangulation on Reconfigurable Meshes: A Natural
Decomposition Approach”, Journal of Parallel and Distributed Computing, pp. 38-51, 30 (1995).
[21] Vipin Kumar et al., Introduction to Parallel Computing, Design and Analysis of
Algorithms, The Benjamin Cummings Publishing Company Inc., California, 1994.
[22] Wankar R., “Parallel Algorithms for Solving Symmetric Positive Definite System (SPD) of
Equations”, International Journal of Management and Systems, pp. 311-324, Vol. 11,
No. 3, Sept.-Dec. 1995.
[23] Wankar R., E. Fehr and N. S. Chaudhari, “A Fast Parallel Algorithm for Special Linear
Systems of Equations using Processor Arrays with Reconfigurable Bus Systems”, Technical
Report B-2-99, January 29, 1999, Freie Universität Berlin, Germany.
