Sublogarithmic deterministic selection on arrays with a reconfigurable optical bus by Han, Y. et al.
Sublogarithmic Deterministic Selection on
Arrays with a Reconfigurable Optical Bus
Yijie Han, Yi Pan, Senior Member, IEEE, and Hong Shen
AbstractÐThe Linear Array with a Reconfigurable Pipelined Bus System (LARPBS) is a newly introduced parallel computational
model, where processors are connected by a reconfigurable optical bus. In this paper, we show that the selection problem can be
solved on the LARPBS model deterministically in Olog logN2= log log logN time. To our best knowledge, this is the best
deterministic selection algorithm on any model with a reconfigurable optical bus.
Index TermsÐAnalysis of algorithms, massive parallelism, optical bus, parallel algorithms, selection.
æ
1 INTRODUCTION
FIBER optic communication offers the advantage of highbandwidth, low error probability, and gigabit transmis-
sion capacity. Therefore, its use in interconnecting multi-
processors of a parallel computer has been proposed by
many researchers [4], [15], [16], [31]. Among them, the
distributed-memory SIMD (Single Instruction Multiple
Data) computer with pipelined optical buses has received
much attention [6], [10], [16], [17], [18], [27], [28] due to its
simplicity and low cost. On such systems, messages are
transmitted concurrently on the optical bus which connects
all processors. One cycle of such a parallel system is merely
the delay of the light between the furthest processors over a
waveguided bus. This design integrates the advantages of
both optical transmission and electronic computation.
Several slightly different versions of such a system have
been proposed in the literature, including the array
processors with pipelined buses (APPB) [7], [16], the array
processors with pipelined buses using switches (APPBS) [8], the
array with synchronous optical switches (ASOS) [28], the
reconfigurable array with spanning optical buses (ROSOB) [29].
Based on the research in reconfigurable meshes and
pipelined optical buses, the model of arrays with
reconfigurable optical buses has been proposed indepen-
dently by Pavel and Akl [23], Pavel [24], and by Pan and
Hamdi [19], and Pan and Li [20]. The model proposed by
Pavel and Akl [23] and Pavel [24] is the arrays with
reconfigurable optical buses (AROB), and the model pro-
posed by Pan and Hamdi [19] and Pan and Li [20] is the
linear arrays with a reconfigurable pipelined bus system
(LARPBS). Many algorithms have been designed on these
models, including inversion number computation [9],
neural network computation [3], various matrix opera-
tions [11], [12], [25], selection [18], [30], sorting [19], [20],
[26], [30]. The two models differ in some aspects. For
example, counting is not allowed during a bus cycle on
the LARPBS model, while it is permitted on the AROB
model.
In this paper, we study the selection problem on the
LARPBS model. The selection problem, when given s, is to
select the sth smallest data item among the N ordered (but
not sorted) data items. A straightforward way of solving the
selection problem is to sort the input data items and then
pick the sth smallest item. This approach, however, is
usually not efficient due the relatively high cost of
performing the sorting. Currently, sorting requires
OlogN expected time or Olog2 N deterministic time for
sorting N data items on N processor LARPBS [19], [21].
Solving the selection problem without sorting first has been
considered by researchers. Pan has studied the selection
problem on the LARPBS model and gave a selection
algorithm which runs in ON logN=p expected time using
p processors [18]. Li et al. described a deterministic
algorithm which selects in OlogN deterministic time for
N data items on N-processor LARPBS. Selection problem
has also been considered on other related model. For
example, Rajasekaran and Sahni solved the selection
problem on the 2D AROB model. Their randomized
algorithm runs in O1 time with high probability [30].
We note that the selection problem on the PRAM model
has been studied by Cole [2] and Chaudhuri et al. [1]. The
original idea is from Cole's paper [2]. Chaudhuri et al.
improved Cole's algorithm on the CRCW model by using
approximate counting. Currently, the PRAM algorithms on
the EREW model [2] runs in OlogN logN time with
optimal processor speedup. On the CRCW model, the
algorithm by Chaudhuri et al. [1] runs in OlogN= log logN
time with optimal processor speedup. Selection algorithms
have also been studied on hypercubes [32].
In this paper, we exploit the features of the LARPBS
model. We also make use of the ideas in Cole's paper [2]. By
using fast sorting algorithm to sort a smaller set of data on
the LARPBS model, we are able to exhibit a selection
702 IEEE TRANSACTIONS ON COMPUTERS, VOL. 51, NO. 6, JUNE 2002
. Y. Han is with Electronic Data Systems, Inc., 750 Tower Dr., CPS, Mail
Stop 7121, Troy, MI 48098. E-mail: han@cstp.umkc.edu.
. Y. Pan is with the Department of Computer Science, Georgia State
University, Atlanta, GA 30303. E-mail: pan@cs.gsu.edu.
. H. Shen is with the Graduate School of Information Science, Japan
Advanced Institute of Science and Technology, Tatsunokuchi, Ishikawa
923-1292, Japan. E-mail: shen@jaist.ac.jp.
Manuscript received 2 July 1999; revised, 30 Apr. 2001; accepted 22 May
2001.
For information on obtaining reprints of this article, please send e-mail to:
tc@computer.org, and reference IEEECS Log Number 110175.
0018-9340/02/$17.00 ß 2002 IEEE
algorithm for the LARPBS model which runs in
Olog logN2= log log logN time. Note that our algorithm
is not simply a simulation of Cole's PRAM algorithm. Novel
features and efficient basic data movement operations on
the LARPBS model are exploited to speedup the algorithm,
thus achieving a better time complexity than those on the
PRAM. Many research groups are working in this area [9],
[3], [11], [12], [25], [30]. To our best knowledge, this is the
first deterministic selection algorithm with sublogarithmic
time complexity on the LARPBS model.
2 THE LARPBS MODEL
The LARPBS model uses an optical bus to connect its
processors. A pipelined optical bus system uses optical
waveguides instead of electrical buses to transfer mes-
sages among electronic processors. The advantages of
using waveguides can be seen as follows: Besides the
high propagation speed of light, there are two important
properties of optical signal (pulse) transmission on an
optical bus: unidirectional propagation and predictable
propagation delay per unit length. These two properties
enable synchronized concurrent access of an optical bus
in a pipelined fashion [6], [16], [27], [28]. This, combined
with the abilities of a bus structure to do efficient
broadcasting or multicasting, makes the architecture
suitable for many applications that involve intensive
communication operations.
Fig. 1 shows a linear array in which electronic processors
are connected with an optical bus. Each processor is
connected to the bus with two-directional couplers, one for
transmitting on the upper segment and the other for receiving
from the lower segment of the bus [6], [16], [27], [28].
Messages are organized as fixed-lengthmessageframes. Note
that optical signals propagate unidirectionally from left to
right on the upper segment and from right to left on the lower
segment. This bus system is also referred to as the folded-bus
connection in [6].
A linear array with a reconfigurable pipelined bus
system (LARPBS) consists of N processors P1; P2; . . . ; PN
connected by an optical bus. In addition to the tremendous
communication capabilities, a LARPBS can also be parti-
tioned into k  2 independent subarrays
LARPBS1;LARPBS2; . . . ;LARPBSk;
such that LARPBSj contains processors
Pijÿ11; Pijÿ12; . . . ; Pij ;
where 0  i0 < i1 < i2    < ik  N . The subarrays can
operate as regular linear arrays with pipelined optical bus
systems and all subarrays can be used independently for
different computations without interference [19], [21]. Fig. 2
shows the LARPBS model with six processors. The array is
split into two subarrays, with the first one having four
processors and the second one having two processors. As in
many other synchronous parallel computing systems, a
LARPBS computation is a sequence of alternate global
communication and local computation steps. The time
complexity of an algorithm is measured in terms of the
total number of bus cycles in all the communication steps,
as long as the time of the local computation steps between
successive communication steps is bounded by a constant
and independent of the problem size. This complexity
measure implies that a bus cycle takes constant time and
this assumption has been adopted widely in the literature
[3], [6], [5], [9], [11], [12], [13], [23], [24], [25], [26], [27], [28],
[29], [30], [33]. (Remark: To avoid controversy, let us
emphasize that in this paper, by ªOfp time,º we mean
Ofp bus cycles for global communication plus Ofp
number of local arithmetic/logic operations.)
3 BASIC OPERATIONS
For ease of algorithm development and specification, a
number of basic communication, data movement, and
global operations on the LARPBS model implemented
HAN ET AL.: SUBLOGARITHMIC DETERMINISTIC SELECTION ON ARRAYS WITH A RECONFIGURABLE OPTICAL BUS 703
Fig. 1. A linear optical bus system of N processors.
Fig. 2. The LARPBS model of size 6 with two subarrays.
using the coincident pulse processor addressing technique
have been developed ([11], [19], [20], [21]). Each of these
primitive operations can be performed in a constant
number of bus cycles. These powerful primitives that
support massive parallel communications, plus the reconfi-
gurability of the LARPBS model, make the LARPBS very
attractive in solving problems. Optical buses are not only
communication channels among the processors, but also
active components and agents of certain computations, e.g.,
binary prefix sum. The following primitive operations on
LARPBS are used in this paper and our selection algorithm
is developed using these operations as building blocks.
3.1 One-to-One Communication
In this operation, each processor sends one data item to
another processor. This operation can be done easily in one
bus cycle (see detailed discussion in [19], [21]).
3.2 Broadcast
In a broadcast operation, we have a source processor who
sends a value in its local register to all the N processors. The
details of this operation are described in [19], [21] and can
be accomplished in one bus cycle.
3.3 Multicast
Multicast is a one-to-many communication operation. Each
processor may send a message to a group of processors in
the system. Each processor receives only one message from
a source processor during a bus cycle. This is a special case
of the h-relation, where h  O1, defined in [33], can be
done in O1 bus cycles.
3.4 Compression
Assume an array of N data elements with each processor
having one data element. Also, assume that the number of
active data elements in the array is s. Active elements are
labeled based upon certain values of their local variables. A
processor with an active element is referred to as an active
processor. The compression algorithm moves these active
data elements to processors N ÿ sÿ 1; N ÿ s; . . . ; N ÿ 1. In
other words, the compression algorithm moves all active
data items to the right side of the array. This operation can
be done in O1 bus cycle on a LARPBS [20], [21].
3.5 Binary Prefix Sum
Consider a LARPBS with N processors and N binary values
vi, 0  i  N ÿ 1. The binary prefix sum requires the
computation of
psumi  v0  v1  . . . viÿ1;
for all 0  i  N ÿ 1. It is shown that the binary prefix sum
can be done in O1 bus cycle on a LARPBS [20], [21].
4 FAST SELECTION ON THE LARPBS MODEL
We apply the ideas in [2] to speed up our algorithm. In
order to do so, we need a fast sorting algorithm. We first
show that N data items can be sorted in constant time with
N2 processors on the LARPBS model and that M data items
can be sorted in Olog logN time with Mk processors on





algorithms will be used to speed up the selection process.
To sort N data items a0; a1; . . . ; aNÿ1 with N2 proces-
sors, we first broadcast ai to processors iN through
i 1N ÿ 1. Then, processor iN  j compare ai with aj at
processor jN  i. If aj < ai or aj  ai and j < i processor,
iN  j will record a 1 otherwise it will record 0. Then, we
compute the sum Si of binary bits at processors iN
through i 1N ÿ 1. Si is the rank of ai. After computing
Si, ai is moved to the Sith processor. This sorts the input
data items and the time consumed is a constant. See more
details of the algorithm in [22].
We use fM to denote the time needed to sort M data














data items. We sort each set in
parallel and by recursion. This recursion takes fM= kp  time.





S0; S1; . . . ; S kp ÿ1, each containingM= kp data items. Now, we
show how to merge them into one sorted set. We use ga; b to
denote the time needed for merging b sorted sets each
containing a data items with k processors allowed for each
data item. Thus, our merging problem takes gM= kp ; kp 
time. For each set, we pick every M=kth data item from the




data items are picked from each
set. The total number of data items picked is k. We sort picked
data items in constant time using the algorithm outlined in the
previous paragraph. Let the k data items in sorted order be
b0; b1; . . . ; bkÿ1. These k picked data items, after being sorted,
defines k 1 intervals. The jth interval is> bjÿ1 but bj (bÿ1
is ÿ1 and bk is1). Each input data item is in one interval.
Because the way these k data items are picked, there are no





falls within any interval. Each data item uses k processors to
compare itself with the k picked data items to determine into





into k 1 segments with each segment containing no more
than M=k data items. We first compress the data items from




, in the 0th interval, in constant time, using the
compressing algorithm in the previous section. We then
move data item a in processor i, 0  i < M, to processor
Mjÿ 1  i if a falls into the jth interval for j > 0. We
compress these data items. We then arrange data items so that
data items that fall into the 0th interval are before data items in
other intervals. What we have done so far is to have data items
in the j 1th interval follow data items in the jth interval. The
data items in the jth interval are coming from Si,





sets,T0; T1; . . . ; T kp ÿ1, of sorted data items in
each interval into one set. As we have explained before, each




, has no more than M=k data items. Thus, the
merging which needs to be done takes gM=k; kp  time.
From the above paragraph, we arrive at two equations:



























where c1 is a constant. The first equation says that to sort M









sorted sets with each set containing




data items. The second equation says















sets with each set containing no more than M=k data items.







p   c2 because k
data items can be sorted in constant time with k2 processors
using the sorting algorithm in previous paragraph.
Solving gM= kp ; kp   gM=k; kp   c1, we obtain that
gM; kp   c3 logM= log k, where c3 is a constant.





  c3 logM= log k
 c3
log k













































Using the ideas in [2], we now show how to speed up
the selection process. Suppose that we have L data items
and N  Lk processors and we are going to select the sth
item. We show how to find a data item p such that the






and s. The same
method also allows us to find another data item q such












=4 data items. Therefore, we reduce the selec-
tion problem of selecting from L data items to the problem




=4 data items. As will be seen,
the whole reduction takes Olog logN time.
We use S to denote the set of L data items. We have Lk
processors and we are going to select the sth data item. We
use two stages. In the first stage, we first divide L data items








data items. We sort each set. Because k processors can be
allocated to each data item, the sorting can be done in
Olog logN time by our sorting algorithm given in previous




=2th item from each
sorted set and form set S1. The item ranked s1th in set S1 is
































In the second stage, we execute a loop until we find p.
Suppose we have a set Si of Li data items and Liki
processors at the beginning of the ith iteration of the loop
and we are looking for the sith item. If ki  Li, we use our
constant time sorting algorithm to sort the Li data items and
then return the sith data item. Otherwise, we divide Li data
items into Li=ki sets with each set containing ki items. We
then sort each set in constant time by using our sorting























. And, we have finished the ith
iteration of the loop.
Suppose that the loop in the second stage executed j
iterations and p is the final item returned from the jth
iteration which ranked sj in Sj. By the way, we fix si,
2  i  j, the rank of p in S1 is at most s1. Also, by the way,









p ÿ Ljÿ1= kjÿ1=kjÿ2q









ÿ   
ÿ Ljÿ1=






















=Li  k3=2i :




in S1. Also, by the
























































After we find q (which is symmetrical to p) which is







use p and q to eliminate most of the data items in S so that




=4 data items left from
which we form set T . Suppose the rank of p in S is s0. By
setting a 1 for data items less than p and setting a 0 for other
data items, and by using summation on binary bits in
HAN ET AL.: SUBLOGARITHMIC DETERMINISTIC SELECTION ON ARRAYS WITH A RECONFIGURABLE OPTICAL BUS 705
constant time, we can compute s0. We have now reduced
the selection problem of selecting the sth data item in S to
the problem of selecting the sÿ s0th data item in T . This
reduction takes Olog logN time.
Thus, we are able to reduce a selection problem of size L





=4 in Olog logN time. We call such a reduction
a phase of our algorithm. Initially, the processor advantage
is a constant c, i.e., we could use one processor to simulate
c processors. After i phases of reduction, the number of data







total number of phases needed to reduce the number of
data items to constant is Olog logN= log log logN. Because
we use Olog logN time for each phase, the time complex-
ity of our algorithm is Olog logN2= log log logN.
Theorem 1. The selection problem for N data items can be solved
on an N processor LARPBS in Olog logN2= log log logN
time.
5 CONCLUSIONS
We have demonstrated a fast deterministic selection
algorithm on the LARPBS model. Because certain opera-
tions such as compression and sorting a smaller set of data
can be done in constant time on the LARPBS model, we are
able to take advantage of them and to make our selection
algorithm faster than any existing PRAM selection algo-
rithm in the literature. We believe many other algorithms
can also take advantage of the high communication
bandwidth on the LARPBS model. Our selection algorithm
may also implicitly improve the results for many other
algorithms on the LARPBS and related models. We expect
to see more results in this area published in the future.
ACKNOWLEDGMENTS
This research was supported in part by the US National
Science Foundation under Grants CCR-9211621, OSR-
9350540, and CCR-9503882 and the Australian Research
Council under its Large Grants Scheme (1996-98)
A849602031 and Small Grants Scheme (1998).
REFERENCES
[1] S. Chaudhuri, T. Hagerup, and R. Raman, Computer Science,
Springer-Verlag, pp. 352-361, 1993.
[2] R.J. Cole, ªAn Optimally Efficient Selection Algorithm,º Informa-
tion Processing Letters 26, pp. 295-299, 1987/1988.
[3] B. Cong, ªMapping of ANNs on Linear Array with a Reconfigur-
able Pipelined Bus System,º Proc. 1997 Int'l Conf. Parallel and
Distributed Processing Techniques and Applications, pp. 522-529,
1997.
[4] P.W. Dowd, ªWavelength Division Multiple Access Channel
Hypercube Processor Interconnection,º IEEE Trans. Computers,
vol. 41, no. 10, pp. 1223-1241, Oct. 1992.
[5] M. Hamdi, C. Qiao, Y. Pan, and J. Tong, ªCommunication-
Efficient Sorting Algorithms on Reconfigurable Array of Proces-
sors with Slotted Optical Buses,º J. Parallel and Distributed
Computing, vol. 57, no. 2, pp. 166-187, May 1999.
[6] Z. Guo, R. Melhem, R. Hall, D. Chiarulli, and S. Levitan,
ªPipelined Communication in Optically Interconnected Arrays,º
J. Parallel and Distributed Computing, vol. 12, no. 3, pp. 269-282,
1991.
[7] Z. Guo, ªSorting on Array Processors with Pipelined Buses,º Proc.
1992 Int'l Conf. Parallel Processing, pp. 289-292, 1992.
[8] Z. Guo, ªOptically Interconnected Processor Arrays with Switch-
ing Capacity,º J. Parallel and Distributed Computing, vol. 23, pp. 314-
329, 1994.
[9] H. Kimm, ªInversion Number Algorithm on a Linear Array with
Reconfigurable Pipelined Bus System,º Proc. 1996 Int'l Conf.
Parallel and Distributed Processing Techniques and Applications,
pp. 1398-1408, 1996.
[10] S. Levitan, D. Chiarulli, and R. Melhem, ªCoincident Pulse
Techniques for Multiprocessor Interconnection Structures,º Ap-
plied Optics, vol. 29, no. 14, pp. 2024-2039, 1990.
[11] K. Li, Y. Pan, and S.-Q. Zheng, ªFast and Processor Efficient
Parallel Matrix Multiplication Algorithms on a Linear Array with
Reconfigurable Pipelined Bus System,º IEEE Trans. Parallel and
Distributed Systems, vol. 9, no. 8, pp. 705-720, Aug. 1998.
[12] K. Li, Y. Pan, and S.-Q. Zheng, ªParallel Matrix Computations
Using a Reconfigurable Pipelined Optical Bus,º J. Parallel and
Distributed Computing, vol. 59, no. 1, pp. 13-30, Oct. 1999.
[13] K. Li, Y. Pan, and S.-Q. Zheng, ªEfficient Deterministic and
Probabilistic Simulations of PRAMs on Linear Arrays with
Reconfigurable Pipelined Bus Systems,º The J. Supercomputing,
vol. 15, no. 2, pp. 163-181, Feb. 2000.
[14] Y. Li, Y. Pan, and S.-Q. Zheng, ªPipelined Time-Division Multi-
plexing Optical Bus with Conditional Delays,º Optical Eng., vol. 36,
no. 9, pp. 2417-2424, Sept. 1997.
[15] A. Louri, ªThree-Dimensional Optical Architecture and Data-
Parallel Algorithms for Massively Parallel Computing,º IEEE
Micro, vol. 11, no. 2, Apr. 1991.
[16] R. Melhem, D. Chiarulli, and S. Levitan, ªSpace Multiplexing of
Waveguides in Optically Interconnected Multiprocessor Sys-
tems,º The Computer J., vol. 32, no. 4, pp. 362-369, 1989.
[17] Y. Pan, ªHough Transform on Arrays with an Optical Bus,º Proc.
Fifth ISMM Int'l Conf. Parallel and Distributed Computing and
Systems, pp. 161-166, 1992.
[18] Y. Pan, ªOrder Statistics on Optically Interconnected Multi-
processor Systems,º Proc. First Int'l Workshop Massively Parallel
Processing Using Optical Interconnections, pp. 162-169, 1994.
[19] Y. Pan and M. Hamdi, ªQuicksort on a Linear Array with a
Reconfigurable Pipelined Bus System,º Proc. IEEE Int'l Symp.
Parallel Architectures, Algorithms, and Networks, pp. 313-319, 1996.
[20] Y. Pan and K. Li, ªLinear Array with a Reconfigurable Pipelined
Bus SystemÐConcepts and Applications,º Information Sciences,
vol. 106, no. 3/4, pp. 237-258, May 1998.
[21] Y. Pan, ªBasic Data Movement Operations on the LARPBS
Model,º Parallel Computing Using Optical Interconnections, K. Li,
Y. Pan, and S.Q. Zheng, eds., Boston: Kluwer Academic Publish-
ers, 1998.
[22] Y. Pan, K. Li, and S.-Q. Zheng, ªFast Nearest Neighbor Algorithms
on a Linear Array with a Reconfigurable Pipelined Bus System,º
Parallel Algorithms and Applications, vol. 13, pp. 1-25, 1998.
[23] S. Pavel and S.G. Akl, ªOn the Power of Arrays with Optical
Pipelined Buses,º Proc. 1996 Int'l Conf. Parallel and Distributed
Processing Techniques and Applications, pp. 1443-1454, 1996.
[24] S. Pavel, ªComputation and Communication Aspects of Arrays
with Optical Pipelined Buses,º PhD Dissertation, Dept. of
Computing and Information Science, Queen's Univ., Canada,
Oct. 1996.
[25] S. Pavel and S.G. Akl, ªMatrix Operations Using Arrays with
Reconfigurable Optical Buses, Parallel Algorithms and Applications,
vol. 11, pp. 223-242, 1996.
[26] S. Pavel and S.G. Akl, ªInteger Sorting and Routing in Arrays with
Reconfigurable Optical Bus,º Proc. 1996 Int'l Conf. Parallel
Processing, vol. III, pp. 90-94, Aug. 1996.
[27] C. Qiao, R. Melhem, D. Chiarulli, and S. Levitan, ªOptical
Multicasting in Linear Arrays,º Int'l J. Optical Computing, vol. 2,
no. 1, pp. 31-48, 1991.
[28] C. Qiao and R. Melhem, ªTime-Division Optical Communications
in Multiprocessor Arrays,º IEEE Trans. Computers, vol. 42, no. 5,
pp. 577-590, May 1993.
[29] C. Qiao, ªEfficient Matrix Operations in a Reconfigurable Array
with Spanning Optical Buses,º Proc. Fifth IEEE Symp. Frontiers of
Massively Parallel Computations, pp. 273-280, 1995.
706 IEEE TRANSACTIONS ON COMPUTERS, VOL. 51, NO. 6, JUNE 2002
[30] S. Rajasekaran and S. Sahni, ªSorting, Selection and Routing on the
Arrays with Reconfigurable Optical Buses,º IEEE Trans. Parallel
and Distributed Systems, vol. 8, no. 11, pp. 1123-1132 Nov. 1997.
[31] S. Sahni, ªModels and Algorithms for Optical and Optoelectronic
Parallel Computers,º Proc. Fourth IEEE Int'l Symp. Parallel
Architectures, Algorithms, and Networks, pp. 2-7, 1999.
[32] H. Shen, ªImproved Universal k-Selection in Hypercubes,º
Parallel Computing, vol. 18, no. 2, pp. 177-184, 1992.
[33] J.L. Trahan, A.G. Bourgeois, Y. Pan, and R. Vaidyanathan, ªAn
Optimal and Scalable Algorithm for Permutation Routing on
Reconfigurable Linear Arrays with Optically Pipelined Buses,º J.
Parallel and Distributed Computing, vol. 60, no. 9, pp. 1125-1136,
Sept. 2000.
Yijie Han received the BS degree from the
University of Science and Technology of China,
Hefei, Anhui, China, and the MA and the PhD
degrees from Duke University, Durham, North
Carolina, respectively. He was an assistant
professor at the University of Kentucky, a
lecturer at the Hong Kong University, and an
information specialist at Electronic Data Sys-
tems, Inc. Currently, he is at the University of
Missouri, Kansas City. His main research inter-
est is algorithm design. Parallel algorithms invented by him together with
his collegues on several important problems including linked list
coloring, graph coloring, all-pairs shortest path, integer sorting, minimum
spanning tree and connected components (on the EREW PRAM model),
maximal independent set, maximal matching, etc. are the best
algorithms to date.
Yi Pan received the BEng degree in computer
engineering from Tsinghua University, China, in
1982, and the PhD degree in computer science
from the University of Pittsburgh in 1991.
Currently, he is an associate professor in the
Department of Computer Science at Georgia
State University. Previously, he was a faculty
member in the Department of Computer Science
at the University of Dayton, Ohio. His research
interests include parallel algorithms and archi-
tectures, optical communication and computing, wireless networks,
high-performance data mining, distributed computing, task scheduling,
and networking. He has published more than 120 research papers
including over 50 papers in international journals such as IEEE
Transactions on Computers, IEEE Transactions on Parallel and
Distributed Systems, IEEE/ACM Transactions on Networking, IEEE
Transactions on Circuits and Systems, IEEE Transactions on Systems,
Man, and Cybernetics, IEEE Transactions on Vehicular Technology,
IEEE Transactions on Reliability, IEEE Communications Magazine,
IEEE Intelligent Systems, IEEE Computing in Science and Engineering,
Journal of Parallel and Distributed Computing, Optical Engineering, and
The Journal of Supercomputing. He has received many awards
including the Outstanding Scholarship Award of the College of Arts
and Sciences at University of Dayton (1999), a Japanese Society for the
Promotion of Science Fellowship (1998), an AFOSR Summer Faculty
Fellowship (1997), US National Science Foundation (NSF) Research
Opportunity Awards (1994 and 1996), and the best paper award from
PDPTA '96 (1996). His research has been supported by the NSF, the
AFOSR, the US Air Force, and the state of Ohio. Dr. Pan is currently an
associate editor of the IEEE Transactions on Systems, Man, and
Cybernetics, area editor-in-chief of the journal of Information, editor of
the journal of Parallel and Distributed Computing Practices, associate
editor of the International Journal of Parallel and Distributed Systems
and Networks, and serves on the editorial board of The Journal of
Supercomputing. He has served as a guest editor of special issues for
several journals, and as general chair, program chair, vice program
chair, publicity chair, session chair, and as a member for steering,
advisory, and program committees for numerous international confer-
ences and workshops. He is an IEEE Computer Society Distinguished
Visitor, a senior member of the IEEE and a member of the IEEE
Computer Society. He is listed in Men of Achievement, Marquis Who's
Who in America, and Marquis Who's Who in Midwest.
Hong Shen received the BEng degree from the
Beijing University of Science and Technology,
the MEng degree from the University of Science
and Technology of China, and the PhLic and
PhD degrees from Abo Akademi University,
Finland. Currently, he is a professor in the
Graduate School of Information Science, Japan
Advanced Institute of Science and Technology.
Professor Shen has published more than 130
technical papers on algorithms, parallel and
distributed computing, networking, parallel databases, data mining,
and multimedia systems. He has served as an editor, associate editor,
and editorial-board member of five international journals and chaired
several international conferences.
. For more information on this or any computing topic, please visit
our Digital Library at http://computer.org/publications/dlib.
HAN ET AL.: SUBLOGARITHMIC DETERMINISTIC SELECTION ON ARRAYS WITH A RECONFIGURABLE OPTICAL BUS 707
