An Improved Parallel Prefix Computation on 2D-Mesh Network  by Jha, Sudhanshu Kumar
 Procedia Technology  10 ( 2013 )  919 – 926 
2212-0173 © 2013 The Authors. Published by Elsevier Ltd. Open access under CC BY-NC-ND license.
Selection and peer-review under responsibility of the University of Kalyani, Department of Computer Science & Engineering
doi: 10.1016/j.protcy.2013.12.438 
International Conference on Computational Intelligence: Modeling Techniques and Applications (CIMTA) 2013
An Improved Parallel Prefix Computation on 2D-Mesh Network 
Sudhanshu Kumar Jha* 
Department of Computer Applications, 
National Institute of Technology, Jamshedpur – 831 014 (India)
Abstract 
Parallel prefix is an important technique that has been widely adopted in many areas of scientific and engineering research. In this paper we propose an improved parallel prefix computation algorithm on an n × n mesh network that requires 2n + 5 time steps. The proposed algorithm compares favorably with the traditional parallel prefix algorithm [1-4, 9], which requires 3n + 2 time steps on the same architecture.
Keywords: prefix computation, parallel prefix computation, modified prefix, 2D-mesh network
1. Introduction
Parallel prefix computation is a basic tool that has been researched extensively because of its wide application in fields such as data-parallel programming, the knapsack problem [5], sorting, parsing, combinator reduction, region labeling [6] and medial axis transformation [7]. For a given sequence of data items $x_0, x_1, \ldots, x_{k-1}$ and a binary associative operator $\oplus$, the process of computing the values

$$p_i = x_0 \oplus x_1 \oplus x_2 \oplus \cdots \oplus x_i, \qquad 0 \le i \le k - 1,$$

is called prefix computation.
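As a point of reference, the definition above is simply a sequential scan. The Python sketch below is illustrative only (the function name `prefix` is ours, not the paper's); it takes the operator as a parameter and keeps the running value as the left operand, so it stays correct for associative operators that are not commutative, such as string concatenation.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def prefix(xs: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Return [p_0, ..., p_{k-1}] with p_i = x_0 op x_1 op ... op x_i."""
    out: List[T] = []
    for x in xs:
        # Running prefix on the left, new item on the right: the data order is
        # preserved, so a non-commutative op still gives the right answer.
        out.append(x if not out else op(out[-1], x))
    return out


if __name__ == "__main__":
    print(prefix([3, 1, 4, 1, 5], lambda a, b: a + b))  # [3, 4, 8, 9, 14]
    print(prefix([3, 1, 4, 1, 5], max))                 # [3, 3, 4, 4, 5]
    print(prefix(["a", "b", "c"], lambda a, b: a + b))  # ['a', 'ab', 'abc']
```

Any associative operator with the same shape (sum, max, matrix product, concatenation) can be passed as `op`.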
* Corresponding author. Tel.: +91-9470311972; fax: +91-657-2373246.
E-mail address: sudhanshukumarjha@gmail.com

A rich body of literature is available on parallel prefix computation. Cole and Vishkin [8] developed an algorithm for prefix computation on the CRCW-PRAM model that requires $O(\log n / \log\log n)$ time using $n \log\log n / \log n$ processors. Egecioglu and Srinivasan [9] presented an algorithm on a $\sqrt{n} \times \sqrt{n}$ mesh network that requires $2\sqrt{n}\,\tau + O(\log n)$ time, where $\tau$ is the time for a single routing step. Akl [2] showed that a specialized network with $O(n \log n)$ processors requires $O(\log n)$ time. Lin and Lin [10] presented a parallel prefix computation on a fully connected message-passing system with n processors that requires at most $1.44 \log n + 1$ communication steps.
Meijer and Akl [11] presented an optimal-cost algorithm on a binary tree with p processors that runs in $O((n/p) + \log p)$ time for a cost of $O(n + p \log p)$. Ranka and Sahni [12] showed that, for a window of size $2^k$, a hypercube requires $O(k)$ time to compute the prefix sums. Li, Peng and Chu [13] presented a parallel prefix algorithm for the dual-cube network $D_n$ with $2^{2n-1}$ nodes and n links per node; the communication and computation times of their algorithm are at most 2n + 1 and 4n − 2, respectively. Wang and Sahni [14] developed two parallel algorithms for n-point prefix computation on an n-processor OTIS-Mesh network. Their first algorithm was shown to run in $8(n^{1/4} - 1)$ electronic moves and 2 OTIS moves for both the SIMD and MIMD models; they later improved it, and their second algorithm requires $7(n^{1/4} - 1)$ electronic moves and 2 OTIS moves. Jana and Sinha [15] reported an algorithm that improves on the work of Wang and Sahni [14] and runs in $5.5n^{1/4} + 3$ electronic moves and 2 OTIS moves on the OTIS-Mesh. The parallel algorithm proposed by Jana et al. [16] on an extended multi-mesh was shown to require $13n^{1/4} - 5$ communication steps and $4\log n^{1/4} + 4$ arithmetic steps. A parallel algorithm on optical multi-trees (OMULT) presented in [17] was shown to run in $O(\log n)$ electronic moves and 5 optical moves using $2n^3 - n^2$ processors for $n^2$ data points. This was later improved in [18], which runs in $O(\log n)$ electronic moves and 4 optical moves on the same OMULT network using $n^3$ data points. Lucas [19] presented a parallel prefix algorithm for the SIMD model of the OTIS k-ary 3-cube, in which prefix computation on $n = k^6$ data elements requires $O(k)$ electronic moves and $O(1)$ OTIS moves on $k^6$ processors. Mallick and Jana [20] presented parallel prefix computation on an $n^2$-processor Mesh of Trees and an $n^4$-processor OTIS-Mesh of Trees that requires $4\log n + O(1)$ time, and $13\log n + O(1)$ electronic moves plus 2 OTIS moves, respectively.
The mesh is one of the most popular interconnection network topologies for massively parallel processing systems. The benefits of mesh networks include their simplicity, regularity and good scalability. A number of large research and commercial multicomputer systems have been built on mesh architectures, including the Illiac IV, the Tera Computer System [21], the Intel Paragon SP/S and Paragon XP/E [22], Cray systems [23], the Blue Gene supercomputer [24] and the InfiniBand architecture [25]. The mesh topology has been used to solve many engineering and scientific problems, including sorting, matrix multiplication and inversion, Fourier transformation, convolution, signal and image processing, speech recognition, finite element analysis and polynomial interpolation. Because the connections among processors are short and local, the area consumed by the wires of a mesh network is negligible, so it is realistic and fair to equate the complexity, or implementation cost, of a 2D mesh network with its number of processors. The signal propagation delay between adjacent processors is quite small, which allows communication at very high speed. A 2D mesh can be laid out on a VLSI chip in an area that increases linearly with the number of processors [4].
 
In this paper we propose an improved parallel algorithm for prefix computation on a 2D mesh network. The proposed algorithm is shown to run in 2n + 5 time steps, which compares favorably with the algorithms presented in [1-5, 9].
The rest of the paper is organized as follows. Section 2 describes the topological structure of the mesh architecture. Section 3 presents the proposed parallel prefix algorithm on the two-dimensional mesh network, and Section 4 concludes the paper.
2. Topological Structure of Mesh Network 
An n × n mesh network is a simple and popular topology. It consists of $n^2$ processors arranged in a two-dimensional lattice. An n × n mesh has diameter 2n − 2 and bisection width n or n + 1. The processors of a 2D mesh can be indexed in a number of ways. The most convenient and widely accepted way is to identify each processor by its row and column, with the origin at (1, 1), so that processor (1, 1) resides at the upper left corner. Processor (1, n) is then at the upper right corner, processor (n, 1) at the lower left corner, and processor (n, n) at the lower right corner of the $n^2$-processor square 2D mesh. If each node is addressed by a pair of indices (x, y), where 1 ≤ x, y ≤ n, then two processors (x, y) and (x', y') are connected if and only if x' = x ± 1 and y' = y, or y' = y ± 1 and x' = x. As an example, a 7 × 7 2D mesh is shown in Fig. 1, in which processors and links are represented by small circles and solid lines, respectively.

We assume that all links are bidirectional, so data can move in both directions. In Fig. 1, the boldface labels adjacent to some processors give the indices of those processors.
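The adjacency rule and the diameter quoted above can be checked with a short simulation. In this Python sketch, `neighbors`, `hop_distances` and `diameter` are hypothetical helpers, not part of the paper; every link is treated as bidirectional, as assumed above, and breadth-first search measures hop counts.

```python
from collections import deque
from typing import Dict, Iterator, Tuple

Node = Tuple[int, int]


def neighbors(x: int, y: int, n: int) -> Iterator[Node]:
    """Yield the processors linked to (x, y) in an n x n mesh (1-based indices):
    (x', y') is a neighbour iff x' = x +/- 1 and y' = y, or y' = y +/- 1 and x' = x."""
    for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nx, ny = x + dx, y + dy
        if 1 <= nx <= n and 1 <= ny <= n:
            yield (nx, ny)


def hop_distances(src: Node, n: int) -> Dict[Node, int]:
    """Breadth-first search over the links: fewest hops from src to every processor."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nb in neighbors(node[0], node[1], n):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def diameter(n: int) -> int:
    """Largest shortest-path distance over all pairs of processors."""
    nodes = [(x, y) for x in range(1, n + 1) for y in range(1, n + 1)]
    return max(max(hop_distances(s, n).values()) for s in nodes)


if __name__ == "__main__":
    print(sorted(neighbors(1, 1, 7)))     # corner processor: [(1, 2), (2, 1)]
    print(len(list(neighbors(4, 4, 7))))  # interior processor: 4
    print(diameter(7))                    # 2n - 2 = 12 for the 7 x 7 mesh of Fig. 1
```

The all-pairs search is quadratic in the number of processors, which is fine for the small meshes used in the figures.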
3. Proposed Parallel Prefix Computation 
In the literature, the traditional parallel prefix computation on a mesh [1-5] requires three basic steps: 1) perform row-wise prefix computation in all rows in parallel; 2) perform a modified parallel prefix computation in the rightmost column; and 3) broadcast the modified prefix value computed at each rightmost-column processor to all processors of its row, where it is combined with the row prefix value computed in step 1. The modified prefix [3], also called the diminished prefix [4], gives each processor the combination of all values before its own, excluding its own value. It is an important technique for parallel prefix computation on parallel architectures such as the mesh [3], mesh of trees [20], OTIS mesh [14], OTIS mesh of trees [20], multi-mesh [26] and multi-mesh of trees [27].
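The three classical phases can be illustrated with a sequential Python simulation. This is a hypothetical sketch, not code from [1-5]: the mesh is modelled with nested lists (0-based indices), the operator and its identity element are parameters, and the example uses string concatenation because it is associative but not commutative, which exposes any mistake in operand order.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def traditional_mesh_prefix(grid: List[List[T]], op: Callable[[T, T], T],
                            identity: T) -> List[List[T]]:
    """Simulate the three classical phases on an n x n mesh whose processor (i, j)
    initially holds grid[i][j] (row-major order, 0-based indices)."""
    n = len(grid)
    # Phase 1: every row computes its prefix, left to right, in parallel.
    row = [list(r) for r in grid]
    for i in range(n):
        for j in range(1, n):
            row[i][j] = op(row[i][j - 1], grid[i][j])
    # Phase 2: modified (diminished) prefix down the rightmost column; processor
    # (i, n-1) obtains the combination of the row totals of rows 0 .. i-1.
    carry = [identity] * n
    for i in range(1, n):
        carry[i] = op(carry[i - 1], row[i - 1][n - 1])
    # Phase 3: broadcast carry[i] along row i; each processor combines it on the left.
    return [[op(carry[i], row[i][j]) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    letters = [list("abc"), list("def"), list("ghi")]
    for r in traditional_mesh_prefix(letters, lambda a, b: a + b, ""):
        print(r)
    # ['a', 'ab', 'abc']
    # ['abcd', 'abcde', 'abcdef']
    # ['abcdefg', 'abcdefgh', 'abcdefghi']
```

Phase 2 is the step during which, as observed below, every processor outside the rightmost column sits idle.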
 
Our basic observation is as follows: while the modified prefix is computed in the rightmost column, all other processors remain idle, which wastes the computational power of the processing elements. In an interconnection network where all processors operate independently, this computation can be carried out in the middle column instead of the rightmost column. While the middle-column processors are engaged in a prefix computation, other processors can simultaneously perform a prefix computation in the same way. In this way, we can reduce the overall time complexity of parallel prefix computation on a 2D mesh network.
 
In the modified prefix, the computed prefixes are shifted down by one processor, which requires n − 1 communication/combining steps. Therefore, the parallel prefix computation and the modified prefix computation cannot be performed at the same time on a SIMD architecture. This can be improved in a slightly different way: instead of performing a modified prefix, we 1) perform an ordinary parallel prefix computation and 2) shift the computed prefix values down by one processor, which together require n steps.
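The prefix-then-shift alternative described above can be sketched in Python as follows. The function name is hypothetical, and the first processor is assumed to receive the operator's identity element, which matches the 0 shown for the first row in Fig. 9.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def modified_prefix_by_shift(xs: List[T], op: Callable[[T, T], T], identity: T) -> List[T]:
    """Diminished prefix: position i ends with x_0 op ... op x_{i-1} (identity for i = 0)."""
    # 1) ordinary (inclusive) prefix, as computed along a linear array of processors
    inclusive: List[T] = []
    for x in xs:
        inclusive.append(x if not inclusive else op(inclusive[-1], x))
    # 2) shift every value down by one processor; the first processor gets the identity
    return [identity] + inclusive[:-1] if xs else []


if __name__ == "__main__":
    print(modified_prefix_by_shift([1, 2, 3, 4], lambda a, b: a + b, 0))      # [0, 1, 3, 6]
    print(modified_prefix_by_shift(["a", "b", "c"], lambda a, b: a + b, ""))  # ['', 'a', 'ab']
```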
 
We assume that every processor has two registers, A and B, one of which holds temporarily computed values. The proposed algorithm is designed for the SIMD architecture, in which all active processors perform the same task.
 
Fig. 1 Mesh network consisting of 7 × 7 processors. [Corner processors (1, 1), (1, 7), (7, 1) and (7, 7) are labelled; the axes are marked x and y.]
Initialization: We assume that the $n^2$ data elements $x_1, x_2, x_3, \ldots, x_{n^2}$ are stored in the A registers of the mesh network in row-major order, as shown in Fig. 2 for n = 5.
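In a simulation, the row-major layout is just a mapping from data index to processor. A minimal Python sketch (the helper `load_row_major` is hypothetical; 0-based list indices stand in for the paper's 1-based processor indices, and `None` stands for the don't-care value):

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def load_row_major(xs: Sequence[T], n: int) -> List[List[T]]:
    """Register A after initialization: p(i, j) (1-based) holds x_{(i-1)n + j}."""
    if len(xs) != n * n:
        raise ValueError("expected exactly n*n data elements")
    return [list(xs[i * n:(i + 1) * n]) for i in range(n)]


if __name__ == "__main__":
    A = load_row_major([f"x{k}" for k in range(1, 26)], 5)  # the 5 x 5 case of Fig. 2
    B = [[None] * 5 for _ in range(5)]  # every B register starts as '-'
    for row in A:
        print(row)
```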
 
Note: In each figure, a square represents a processing element (PE). The upper half of the square denotes the contents of register A and the lower half denotes the contents of register B. A '-' denotes a don't-care value.
 
Algorithm parallel_prefix_Mesh() 
 
Step 1: For all processors $p(i, j)$, $1 \le i, j \le n$, do Steps 1.1 and 1.2 in parallel.
 
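1.1 Each row i, $1 \le i \le n$, computes the prefixes of its first $\lceil n/2 \rceil$ elements and stores them in the A registers of $p(i, j)$, $1 \le j \le \lceil n/2 \rceil$. After this step, register A of processor $p(i, j)$, $1 \le i \le n$, $1 \le j \le \lceil n/2 \rceil$, holds the value

$$\bigoplus_{q=1}^{j} x_{i,q},$$

where $x_{i,q}$ denotes the element initially stored in $p(i, q)$, as shown in Fig. 3.
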
1.2 In each row, processors $p(i, j)$, $1 \le i \le n$, $\lceil n/2 \rceil + 1 \le j \le n$, compute the prefixes of the remaining $\lfloor n/2 \rfloor$ elements of the row, starting from $p(i, n)$. In this step the prefixes in each row therefore proceed in the direction $p(i, n) \to p(i, n-1) \to p(i, n-2) \to \cdots \to p(i, \lceil n/2 \rceil + 1)$, and the results are stored in register B. After this step, register B of processor $p(i, j)$, $1 \le i \le n$, $\lceil n/2 \rceil + 1 \le j \le n$, holds the prefix value

$$\bigoplus_{q=j}^{n} x_{i,q},$$

as shown in Fig. 3.
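For a single row, Steps 1.1 and 1.2 can be sketched in Python as below. This is a sequential, hypothetical simulation (`step1_row` is not from the paper; columns are 0-based and `None` plays the role of the don't-care value). The two loops correspond to work the mesh performs simultaneously, so on the mesh the step costs the longer of the two chains, at most $\lceil n/2 \rceil - 1$ rounds, rather than their sum.

```python
import math
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def step1_row(row: List[T], op: Callable[[T, T], T]) -> Tuple[List[T], List[Optional[T]]]:
    """Registers A and B of one mesh row after Steps 1.1 and 1.2 (0-based columns)."""
    n = len(row)
    m = math.ceil(n / 2)  # columns 0 .. m-1 form the left part, m .. n-1 the right part
    A: List[T] = list(row)  # register A initially holds the data (Fig. 2)
    B: List[Optional[T]] = [None] * n  # None stands for the don't-care value '-'
    # Step 1.1: left-to-right prefix over the left part, stored in A.
    for j in range(1, m):
        A[j] = op(A[j - 1], row[j])
    # Step 1.2: right-to-left prefix over the right part, starting at the last column,
    # stored in B. Each processor puts its own element on the left, so that
    # B[j] = row[j] op row[j+1] op ... op row[n-1] keeps the original order.
    if m < n:
        B[n - 1] = row[n - 1]
        for j in range(n - 2, m - 1, -1):
            B[j] = op(row[j], B[j + 1])
    return A, B


if __name__ == "__main__":
    A, B = step1_row(list("abcde"), lambda a, b: a + b)  # row 1 of the 5 x 5 example
    print(A)  # ['a', 'ab', 'abc', 'd', 'e']
    print(B)  # [None, None, None, 'de', 'e']
```

With letters standing for x1 to x5, the output matches row 1 of Fig. 3.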
Step 2: For all processors $p(i, \lceil n/2 \rceil)$, $1 \le i \le n$, do in parallel:

Processor $p(i, \lceil n/2 \rceil)$ receives the B register contents sent from $p(i, \lceil n/2 \rceil + 1)$ and stores them in its B register. The situation is shown in Fig. 4.
 
 
 
Fig. 2 Initialization of 25 data elements on 5 × 5 mesh network. [Contents of register A; every B register holds the don't-care value '-'.]
  x1   x2   x3   x4   x5
  x6   x7   x8   x9   x10
  x11  x12  x13  x14  x15
  x16  x17  x18  x19  x20
  x21  x22  x23  x24  x25
Fig. 3 Situation after step 1. [Row 1: A = x1, x1-2, x1-3, x4, x5 and B = -, -, -, x5-4, x5. The other rows are analogous, e.g. row 5: A = x21, x21-22, x21-23, x24, x25 and B = -, -, -, x25-24, x25.]
Step 3: For all processors $p(i, \lceil n/2 \rceil)$, $1 \le i \le n$, do in parallel:

Compute $A \oplus B$ from the contents of the A and B registers and save the result in register B. The result after this step is shown in Fig. 5.
          
Step 4: For all processors $p(i, \lceil n/2 \rceil + 1)$, $1 \le i \le n$, do in parallel:

In each row i, $1 \le i \le n$, processor $p(i, \lceil n/2 \rceil + 1)$ receives the A register contents sent from $p(i, \lceil n/2 \rceil)$, combines the received value (as the left operand) with its own A register contents, and stores the result in register A, as shown in Fig. 6.
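Steps 2 to 4 only involve the two processors on either side of the boundary between column $\lceil n/2 \rceil$ and column $\lceil n/2 \rceil + 1$. The hypothetical Python sketch below applies them to one row, starting from the register contents after Step 1 (Fig. 3); it assumes n ≥ 2 and uses 0-based columns.

```python
import math
from typing import Callable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def steps_2_to_4_row(A: List[T], B: List[Optional[T]],
                     op: Callable[[T, T], T]) -> Tuple[List[T], List[Optional[T]]]:
    """Registers of one row after Steps 2-4, given its registers after Step 1.
    Assumes n >= 2 so that column ceil(n/2) has a right-hand neighbour."""
    n = len(A)
    if n < 2:
        raise ValueError("this sketch assumes at least two columns")
    m = math.ceil(n / 2) - 1  # 0-based index of the middle column ceil(n/2)
    A, B = list(A), list(B)
    B[m] = B[m + 1]                # Step 2: receive the right part's total from p(i, m+1)
    B[m] = op(A[m], B[m])          # Step 3: A op B -> B, the total of the whole row
    A[m + 1] = op(A[m], A[m + 1])  # Step 4: p(i, m+1) extends the left prefix by one
    return A, B


if __name__ == "__main__":
    # Row 1 of the 5 x 5 example after Step 1 (letters stand for x1 .. x5).
    A1, B1 = ["a", "ab", "abc", "d", "e"], [None, None, None, "de", "e"]
    A4, B4 = steps_2_to_4_row(A1, B1, lambda a, b: a + b)
    print(B4[2])  # 'abcde' (the row total, cf. Fig. 5)
    print(A4[3])  # 'abcd'  (cf. Fig. 6)
```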
 
 
 
 
 
 
 
 
 
 
 
 
 
Fig. 4 Situation after step 2. [As in Fig. 3, except that the B register of column 3 now holds x5-4, x10-9, x15-14, x20-19 and x25-24 in rows 1 to 5.]
Fig. 5 Situation after step 3. [As in Fig. 4, except that the B register of column 3 now holds the row totals x1-5, x6-10, x11-15, x16-20 and x21-25.]
Fig. 6 Situation after step 4. [As in Fig. 5, except that the A register of column 4 now holds x1-4, x6-9, x11-14, x16-19 and x21-24.]
Fig. 7 Situation after step 5. [Register A holds the row prefixes (row 1: x1, x1-2, x1-3, x1-4, x1-5; row 2: x6, x6-7, x6-8, x6-9, x6-10; and so on), and the B register of column 3 holds x1-5, x1-10, x1-15, x1-20 and x1-25.]
 
Step 5: For all processors $p(i, j)$, $1 \le i \le n$, $\lceil n/2 \rceil \le j \le n$, do Steps 5.1 and 5.2 in parallel.

5.1 For all processors $p(i, j)$, $1 \le i \le n$, $\lceil n/2 \rceil + 1 \le j \le n$, do in parallel:
In each row i, compute the prefixes over the remaining $\lfloor n/2 \rfloor$ elements of the row. At the end, register A of processor $p(i, j)$, $1 \le i, j \le n$, holds the prefix value

$$y_{i,j} = \bigoplus_{q=1}^{j} x_{i,q}.$$

5.2 Only the processors of column $\lceil n/2 \rceil$ compute the prefixes of the row totals held in register B (computed in Step 3). Thus, at the end, register B of processor $p(i, \lceil n/2 \rceil)$, $1 \le i \le n$, holds

$$\bigoplus_{r=1}^{i} y_{r,n}.$$
         
The result after Steps 5.1 and 5.2 is shown in Fig. 7.
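Step 5 can be sketched in the same hypothetical style. In this Python simulation the two sub-steps run one after the other, although on the mesh they proceed in parallel; Step 5.2 walks a column of n processors, so it dominates the cost of the step. The example uses only two rows of the 5 × 5 case to keep the data short.

```python
import math
from typing import Any, Callable, List, Tuple

Grid = List[List[Any]]


def step5(A: Grid, B: Grid, op: Callable[[Any, Any], Any]) -> Tuple[Grid, Grid]:
    """Registers after Step 5, given the registers after Step 4 (0-based indices)."""
    rows, n = len(A), len(A[0])
    m = math.ceil(n / 2) - 1  # 0-based index of the middle column
    A = [list(r) for r in A]
    B = [list(r) for r in B]
    # Step 5.1: each row finishes its prefix over the columns right of m + 1.
    for i in range(rows):
        for j in range(m + 2, n):
            A[i][j] = op(A[i][j - 1], A[i][j])
    # Step 5.2 (concurrent on the mesh): prefix down the middle column over the
    # row totals held in B, so B[i][m] becomes the total of rows 0 .. i.
    for i in range(1, rows):
        B[i][m] = op(B[i - 1][m], B[i][m])
    return A, B


if __name__ == "__main__":
    A4 = [["a", "ab", "abc", "abcd", "e"], ["f", "fg", "fgh", "fghi", "j"]]
    B4 = [[None, None, "abcde", "de", "e"], [None, None, "fghij", "ij", "j"]]
    A5, B5 = step5(A4, B4, lambda a, b: a + b)
    print(A5)                  # row prefixes: [..., 'abcde'], [..., 'fghij']
    print([r[2] for r in B5])  # ['abcde', 'abcdefghij']
```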
 
Step 6: For all processors $p(i, \lceil n/2 \rceil)$, $1 \le i \le n$, do in parallel:

Shift the computed prefixes down by one processor and store them in register B; that is, processor $p(i, \lceil n/2 \rceil)$, $1 \le i \le n - 1$, sends its B register contents to processor $p(i + 1, \lceil n/2 \rceil)$, as shown in Fig. 8.
 
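Step 7: For all processors $p(i, \lceil n/2 \rceil)$, $1 \le i \le n$, do in parallel:

Broadcast the contents of register B to all processors in the same row. Each receiving processor stores the value in its B register, as shown in Fig. 9; the processors of row 1 receive the identity element of $\oplus$, shown as 0 in Fig. 9.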
 
Step 8: For all processors $p(i, j)$, $1 \le i, j \le n$, do in parallel:

Compute $B \oplus A$ from the contents of the B and A registers and store the result in register A. The result is shown in Fig. 10.
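Steps 6 to 8 turn the column of cumulative row totals into a correction for every processor. A hypothetical Python sketch, again with string concatenation as a non-commutative stand-in for ⊕ and the identity element passed in explicitly:

```python
import math
from typing import Any, Callable, List

Grid = List[List[Any]]


def steps_6_to_8(A: Grid, B: Grid, op: Callable[[Any, Any], Any], identity: Any) -> Grid:
    """Final A registers, given the registers after Step 5 (0-based indices)."""
    rows, n = len(A), len(A[0])
    m = math.ceil(n / 2) - 1  # 0-based index of the middle column
    # Step 6: shift the middle column of B down by one processor; the first row,
    # which receives nothing, uses the identity (shown as 0 in Fig. 9).
    shifted = [identity] + [B[i][m] for i in range(rows - 1)]
    # Step 7: broadcast each row's shifted value to every processor of that row.
    B = [[shifted[i]] * n for i in range(rows)]
    # Step 8: every processor computes B op A (B on the left) and keeps it in A.
    return [[op(B[i][j], A[i][j]) for j in range(n)] for i in range(rows)]


if __name__ == "__main__":
    A5 = [["a", "ab", "abc", "abcd", "abcde"], ["f", "fg", "fgh", "fghi", "fghij"]]
    B5 = [[None, None, "abcde", None, None], [None, None, "abcdefghij", None, None]]
    for row in steps_6_to_8(A5, B5, lambda a, b: a + b, ""):
        print(row)
    # ['a', 'ab', 'abc', 'abcd', 'abcde']
    # ['abcdef', 'abcdefg', 'abcdefgh', 'abcdefghi', 'abcdefghij']
```

Keeping B as the left operand in Step 8 is what makes the result correct when ⊕ is not commutative.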
  
Step 9: Stop 
 
Fig. 8 Situation after step 6. [Register A is as in Fig. 7; the B register of column 3 now holds -, x1-5, x1-10, x1-15 and x1-20 in rows 1 to 5.]
Fig. 9 Situation after step 7. [Register A is as in Fig. 7; every B register of row i holds the broadcast value: 0 in row 1, and x1-5, x1-10, x1-15 and x1-20 in rows 2 to 5.]
Time Complexity: 
 
Steps 1 and 7: Step 1 computes prefixes of at most $\lceil n/2 \rceil$ elements in each row, and Step 7 broadcasts a value from column $\lceil n/2 \rceil$ along its row; each takes at most $\lceil n/2 \rceil$ time.
Steps 2, 3, 4, 6 and 8: each takes constant time to receive, store, combine or shift data values.

Step 5 takes n time to perform the prefix computation.

Therefore, the proposed algorithm requires 2n + 5 steps to compute the prefixes of $n^2$ data elements on an n × n two-dimensional mesh network.
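To check the step tally, the hypothetical Python sketch below simulates Steps 1 to 8 sequentially and counts parallel rounds under an assumed cost model that is ours, not the paper's: every nearest-neighbour transfer or local ⊕ counts as one round, sub-steps the mesh runs concurrently cost the longer of the two, and a broadcast from column $\lceil n/2 \rceil$ costs its longest hop distance. Under this model the count comes out to 2n + 3 for every n ≥ 2, within the 2n + 5 bound stated above; the simulation also checks the final A registers against the sequential prefix of the row-major data.

```python
import math
from typing import Any, Callable, List, Tuple


def parallel_prefix_mesh(xs: List[Any], n: int, op: Callable[[Any, Any], Any],
                         identity: Any) -> Tuple[List[List[Any]], int]:
    """Simulate Steps 1-8 on an n x n mesh; return the final A registers and the
    number of parallel rounds under the assumed counting model."""
    if n < 2 or len(xs) != n * n:
        raise ValueError("need n >= 2 and exactly n*n elements")
    m = math.ceil(n / 2) - 1  # 0-based index of column ceil(n/2)
    A = [list(xs[i * n:(i + 1) * n]) for i in range(n)]  # row-major load (Fig. 2)
    B: List[List[Any]] = [[None] * n for _ in range(n)]
    rounds = 0
    # Step 1: 1.1 over columns 0..m into A, 1.2 from column n-1 down to m+1 into B.
    for i in range(n):
        for j in range(1, m + 1):
            A[i][j] = op(A[i][j - 1], A[i][j])
        B[i][n - 1] = A[i][n - 1]
        for j in range(n - 2, m, -1):
            B[i][j] = op(A[i][j], B[i][j + 1])
    rounds += max(m, n - m - 2)  # the two halves run concurrently
    for i in range(n):
        B[i][m] = B[i][m + 1]  # Step 2
    rounds += 1
    for i in range(n):
        B[i][m] = op(A[i][m], B[i][m])  # Step 3: row totals
    rounds += 1
    for i in range(n):
        A[i][m + 1] = op(A[i][m], A[i][m + 1])  # Step 4
    rounds += 1
    # Step 5: 5.1 finishes each row while 5.2 runs down the middle column.
    for i in range(n):
        for j in range(m + 2, n):
            A[i][j] = op(A[i][j - 1], A[i][j])
    for i in range(1, n):
        B[i][m] = op(B[i - 1][m], B[i][m])
    rounds += max(n - m - 2, n - 1)
    shifted = [identity] + [B[i][m] for i in range(n - 1)]  # Step 6
    rounds += 1
    B = [[shifted[i]] * n for i in range(n)]  # Step 7: broadcast from column m
    rounds += max(m, n - 1 - m)
    A = [[op(B[i][j], A[i][j]) for j in range(n)] for i in range(n)]  # Step 8
    rounds += 1
    return A, rounds


def concat(a: str, b: str) -> str:
    """Associative but not commutative: a stand-in for the operator."""
    return a + b


if __name__ == "__main__":
    for n in range(2, 8):
        xs = [chr(ord("a") + k % 26) for k in range(n * n)]
        A, rounds = parallel_prefix_mesh(xs, n, concat, "")
        flat = [v for row in A for v in row]
        assert flat == ["".join(xs[:k + 1]) for k in range(n * n)]
        print(f"n={n}: {rounds} rounds (stated bound 2n + 5 = {2 * n + 5})")
```

A different cost model (for example, charging a local combination as zero time) would shift the constant but not the leading 2n term.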
 
 
 
 
 
 
 
 
 
 
 
 
 
 
4. Conclusions 
In this paper we have presented an improved parallel prefix algorithm on a 2D mesh network that requires 2n + 5 time steps. The proposed algorithm compares favorably with the parallel prefix algorithms on mesh networks presented in [1-5, 9].
Fig. 10 Final prefixed value after step 8. [Register A of p(i, j) holds x1-k with k = 5(i − 1) + j: row 1 holds x1, x1-2, x1-3, x1-4, x1-5, row 2 holds x1-6, x1-7, x1-8, x1-9, x1-10, and so on up to x1-21, ..., x1-25 in row 5.]

References
1. R. E. Ladner, M. J. Fischer, Parallel Prefix Computation, Journal of the Association for Computing Machinery, Vol. 27, No. 4, October 1980, pp. 831-838.
2. S. G. Akl, The Design and Analysis of Parallel Algorithms, Englewood Cliffs, NJ: Prentice-Hall, 1989.
3. E. Horowitz, S. Sahni, S. Rajasekaran, Fundamentals of Computer Algorithms, Galgotia Publications Pvt. Ltd., 1998.
4. Behrooz Parhami, Introduction to Parallel Processing: Algorithms and Architectures, Springer, 1st edition, January 31, 1999.
5. A. Grama, G. Karypis, V. Kumar, A. Gupta, Introduction to Parallel Computing, Addison Wesley, 2nd edition, January 26, 2003.
6. P. K. Jana, Multi-mesh of trees with its parallel algorithms, Journal of Systems Architecture, 50 (2004) 193–206.
7. S. Saha, P. K. Jana, A parallel algorithm for medial axis transformation, In Proc. of the 2003 International Conference on Parallel and Distributed Processing and Applications, Aizu-Wakamatsu, Japan, pp. 356-361.
8. R. Cole, U. Vishkin, Faster optimal parallel prefix sum and list ranking, Journal of Inform. and Control, 4 (1989) 334–352.
9. O. Egecioglu, A. Srinivasan, Optimal parallel prefix on mesh architecture, Parallel Algorithms Appl. 1 (1993) 191–209.
10. Y. C. Lin, C. M. Lin, Efficient parallel prefix algorithms on fully connected message passing computers, In Proc. of 3rd Int. Conf. on High Performance Computing (HiPC), Trivandrum, India, December 19–22, 1996.
11. H. Meijer and S. G. Akl, Optimal Computation of Prefix Sums on a Binary Tree of Processors, International Journal of Parallel Programming, Vol. 16, No. 2, April 1987.
12. S. Ranka, S. Sahni, Hypercube algorithms: with applications to image processing and pattern recognition, Springer-Verlag, New York Inc., 
New York, NY, USA Pages: 237, 1990. 
13. Y. Li, S. Peng, W. Chu, Prefix Computation and Sorting in Dual-Cube, In Proc. of the 37th International Conference on Parallel 
Processing, Portland, Oregon, USA, September 8-12, 2008, pp: 389-396. 
14. C. F. Wang and S. Sahni, Basic operations on the OTIS-Mesh optoelectronic computer, IEEE Trans. on Parallel and Distributed Systems 
Vol. 9, No. 12, pp. 1226–1998. December 1998. 
15. P. K. Jana and B. P. Sinha, An Improved parallel prefix algorithm on OTIS-Mesh, Parallel Processing Letters (World Scientific), pp. 429-
440, Vol. 16, No. 4, 2006. 
16. P. K. Jana, B. D. Naidu, S. Kumar, M. Arora, and B. P. Sinha, Parallel prefix computation on extended multi-mesh network, Information 
Processing Letters (Elsevier Science), pp. 295-303, Vol. 84, No. 6, October 2002. 
17. B. P. Sinha and S. Bandyopadhyay, OMULT: An Optical Interconnection System for Parallel Computing, in proc. of International 
Conference on Computers and Devices for Communications. Jan. 1 - 3, 2004, Kolkata, India. 
18. P. K. Jana, Improved Parallel Prefix Computation on Optical Multi-Trees, In Proc. of  IEEE Indicon 2004, IIT Kharagpur, India, 20 - 22 
Dec. 2004, pp. 414-418. 
19. K. T. Lucas, Parallel Algorithm for Prefix Computation on OTIS k-Ary 3-Cube Parallel Computers, International Journal of Recent Trends 
in Engineering, Vol. 1, No. 1, May 2009. 
20. D. K. Mallick and P. K. Jana, Parallel Prefix on Mesh of Trees and OTIS Mesh of Trees, in proc. of the Intl. Conf. on Parallel and Distri. 
Processing Techniques and Applications (PDPTA'08), Las Vegas, Nevada, USA, 13-17 July 2008, pp. 359-364. 
21. R. Alverson et al., The Tera Computer System, in proc. of Int'l Conf. Supercomputing, Assoc. of Comput. Machinery, N.Y., 1990, pp.1-6. 
22. http://ed-thelen.org/comp-hist/intel-paragon.html 
23. S. Scott and G. Thorson., Optimized routing in the Cray T3D, in Proc. of First International Workshop on parallel Computer Routing and 
Communication, volume 853 of Lecture Notes in Computer Science, pages 281–294, 1994. 
24. A. Gara et al., Overview of the Blue Gene/L system architecture, IBM J. Res. & Dev., 49(2/3):195–212, 2005.
25. InfiniBand Trade Association. InfiniBand architecture specification release 1.2, 2004. 
26. D. Das, B. P. Sinha, Multi-mesh––an efficient topology for parallel processing, In Proc. of the Ninth International Parallel Processing 
Symposium, Santa Barbara, CA, April 25–28, 1995, pp. 17–21. 
27. S. K. Jha, P. K. Jana, Fast Parallel Prefix on Multi-Mesh of Trees, In Proc. of IEEE International Conference on Computer & Communication Technology, Motilal Nehru National Institute of Technology, Allahabad, 17-19 September 2010, pp. 641-646.
 
