Abstract-We study the connection capacity of a class of rearrangeable nonblocking (RNB) and strictly nonblocking (SNB) networks with/without crosstalk-free constraint, model their routing problems as weak or strong edge-colorings of bipartite graphs, and propose efficient routing algorithms for these networks using parallel processing techniques. This class of networks includes networks constructed from Banyan networks by horizontal concatenation of extra stages and/or vertical stacking of multiple planes. We present a parallel algorithm that runs in $O(\lg^2 N)$ time for the RNB networks of complexities ranging from $O(N \lg N)$ to $O(N^{1.5} \lg N)$ crosspoints and parallel algorithms that run in $O(\min\{d^* \lg N, \sqrt{N}\})$ time for the SNB networks of $O(N^{1.5} \lg N)$ crosspoints, using a completely connected multiprocessor system of $N$ processing elements. Our algorithms can be translated into algorithms with an $O(\lg N \lg \lg N)$ slowdown factor for the class of $N$-processor hypercubic networks, whose structures are no more complex than a single plane in the RNB and SNB networks considered.
Index Terms-Banyan network, crosstalk, optical switching, rearrangeable nonblocking network, strictly nonblocking network, switch control, self-routing, graph coloring, parallel algorithm.

INTRODUCTION
To build a large IP router with a capacity of 1 Tb/s and beyond, either electronic or optical switching can be used. The deployment of optical fibers as a transmission medium has prompted the search for a solution to the speed mismatch between transmission and switching. Optical routers have better scalability than electronic routers in terms of switching capacity. However, the required optical technologies are too immature for all-optical switching to happen any time soon. A hybrid approach, in which optical signals are switched but both switch control and routing decisions are carried out electronically, is more practical. Advances in electro-optic technologies provide a promising choice to meet the increasing demands for high channel bandwidth and low communication latency in optical communication. However, due to the nature of optical devices, optical switches pose their own challenges [26].
Crosstalk in Photonic Switching
A switching network usually comprises a number of switching elements (SEs), grouped into several stages interconnected by a set of links. Without loss of generality, we assume that an SE is of size $2 \times 2$, i.e., it has two inputs and two outputs. If the two inputs (respectively, outputs) of an SE are to be connected with the same output (respectively, input), an output link conflict (respectively, input link conflict) occurs. If an I/O connection path does not have any link conflict with other connection paths, it is called a conflict-free path. Nonblocking switching networks have been favored in switching systems because they can be used to set up any conflict-free one-to-one I/O connection paths. There are three types of nonblocking networks: strictly nonblocking (SNB), wide-sense nonblocking (WSNB), and rearrangeable nonblocking (RNB) [3], [13]. In both SNB and WSNB networks, a connection can be established from any idle input to any idle output without disturbing existing connections. In SNB networks, any of the available conflict-free paths for a connection can be chosen, whereas in WSNB networks a rule must be followed to choose one. The high degree of connection capability in SNB and WSNB networks comes at a high hardware cost. RNB networks, usually constructed with lower hardware cost, can establish a conflict-free path for the connection from any idle input to any idle output if the rearrangement of existing connections is allowed.
In an electrical switching network, links are wires and SEs are simple crossbar switches. In an optical switching network, links are implemented by optical waveguides and SEs can be implemented by electro-optical SEs such as common lithium-niobate (LiNbO$_3$) SEs (e.g., [11], [12], [28]). Each electro-optical SE is a directional coupler with two inputs and two outputs. Depending on the amount of voltage at the junction of two waveguides, optical signals carried on either of the two inputs can be coupled to either of the two outputs. An electronically controlled optical SE can have a switching speed ranging from hundreds of picoseconds to tens of nanoseconds [27]. However, due to the nature of optical devices, optical switches introduce additional challenges. One problem is path-dependent loss: the signal loss is substantial and directly proportional to the connection diameter, i.e., the number of SEs on the longest connection path.
Another problem is crosstalk,$^1$ which is caused by undesired coupling between signals with the same wavelength carried in two waveguides so that two signal channels interfere with each other within an SE.
The crosstalk problem in photonic switching networks adds a new dimension of blocking, called node conflict, which happens when more than one connection with the same wavelength passes through the same SE at the same time. A technique called space dilation was introduced to avoid node conflict by increasing the number of SEs in a switching network (e.g., [15] , [16] , [24] , [29] , [30] , [31] , [33] ).
Motivation and Main Results
In a switching network, when more than one input requests to be connected with the same output, output contention occurs. Output contentions can be resolved by switch scheduling. For a set of connection requests without output contentions, the process of establishing conflict-free connection paths to satisfy these requests is called switch routing. A switch routing (or simply, routing) algorithm is needed to find these paths. Once a set of conflict-free paths is found, the SEs on these paths can be properly set up. Routing algorithms play a more fundamental role in WSNB and RNB networks since the nonblockingness depends on them. For SNB networks, routing algorithms tend to be overlooked since a conflict-free path is always guaranteed for the connection from any idle input to any idle output without rerouting the existing connections. An efficient routing algorithm, however, is still needed to find such a conflict-free path for each connection request. Any routing algorithm requiring more than linear time would be considered too slow. Thus, finding efficient algorithms to speed up the routing process is crucial for high-speed switching networks.

[...] $\frac{1}{2} p (x + \lg N) N = O(pN \lg N)$ SEs, and its diameter is $O(\lg N)$. $B(N, x, p, 0)$ and $B(N, x, p, 1)$ are suitable for electronic and optical implementation, respectively. It has been shown that $B(N, x, p, \alpha)$ can be SNB, WSNB, and RNB with certain values of $x$ and $p$ for given $N$ and $\alpha$ [15], [16], [21], [30], [31].
The focus of this paper is the control aspect of the class of $B(N, x, p, \alpha)$ networks in the context of their use as electrical and optical switching networks. In particular, our objective is to speed up the routing process using parallel processing techniques. By examining the connection capacity of $B(N, x, p, \alpha)$, we reduce the routing problems for this class of networks to a problem of partitioning a bipartite graph into "disjoint" subgraphs. Three general approaches for solving this type of graph partition problem have been reported: matrix decomposition (e.g., [5], [17], [23], [25]), matching (e.g., [6], [7], [9]), and graph edge-coloring (e.g., [6], [7], [10], [19], [22], [32]). For routing, these approaches are essentially equivalent [13]. We model the routing problems for this class of networks as weak and strong edge-colorings of bipartite graphs, which unifies and extends previous models for RNB and SNB networks. Based on our model, we propose fast routing algorithms for $B(N, x, p, \alpha)$ using parallel processing techniques. We show that the presented parallel routing algorithms can route $K$ connections in $O(\lg N \lg K)$ time for an RNB $B(N, x, p, \alpha)$ and in [...]

In Section 2, we discuss the topology of $B(N, x, p, \alpha)$. In Section 3, we model routing in $B(N, x, p, \alpha)$ as two coloring problems of an I/O mapping graph $G(N, K, g)$. In Section 4, we propose a fast parallel routing algorithm for RNB $B(N, x, p, \alpha)$ based on a weak $g$-edge coloring of $G(N, K, g)$. In Section 5, we present parallel routing algorithms for SNB $B(N, x, p, \alpha)$ based on a strong $(2g - 1)$-edge coloring of $G(N, K, g)$. We conclude our paper in Section 6.
NONBLOCKING NETWORKS BASED ON BANYAN NETWORKS
Banyan-Type Networks
A switching network is a self-routing network if any connection within it can be established using only the addresses of its source and destination, regardless of other connections. Self-routing is an attractive feature in that no complicated control mechanism is needed for establishing a connection. A class of multistage self-routing networks, Banyan-type networks, has received considerable attention. A network belonging to this class satisfies the following basic properties:
1. There are $N = 2^n$ inputs, $N = 2^n$ outputs, $n$ stages, and $N/2$ SEs in each stage.$^2$
2. There is a unique path between each input and each output.
3. Let $u$ and $v$ be two SEs in stage $i$, and let $S_j(u)$ and $S_j(v)$ be the two sets of SEs that $u$ and $v$ can reach in stage $j$, $0 < j = i + 1 \le \lg N$, respectively. Then, $S_j(u) \cap S_j(v) = \emptyset$ or $S_j(u) = S_j(v)$ for any $u$ and $v$.

Because of the above three properties (short connection diameter, unique connection path, uniform modularity, etc.), Banyan-type networks are very attractive for constructing switching networks. Several well-known networks, such as Banyan, Omega, and Baseline, belong to this class. It has been shown that these networks are topologically equivalent [1], [34]. In this paper, we use the Baseline network as the representative of Banyan-type networks.

1. In this paper, crosstalk refers to the first-order nonfilterable SE crosstalk [20], [21].
2. In this paper, $N = 2^n$ ($n = \lg N$) and all logarithms are in base 2.
An $N \times N$ Baseline network, denoted by $BL(N)$, is constructed recursively. A $BL(2)$ is a $2 \times 2$ SE. A $BL(N)$ consists of a switching stage of $N/2$ SEs, and a shuffle connection, followed by a stack of two $BL(N/2)$s. Thus, a $BL(N)$ has $\lg N$ stages labeled by $0, \cdots, n - 1$ from left to right, and each stage has $N/2$ SEs labeled by $0, \cdots, N/2 - 1$ from top to bottom. The upper and lower outputs of each SE in stage $i$ are connected with two $BL(N/2^{i+1})$s, named upper subnetwork and lower subnetwork, respectively. The $N$ links interconnecting two adjacent stages $i$ and $i + 1$ are called output links of stage $i$ and input links of stage $i + 1$. The input (respectively, output) links in the first (respectively, last) stage of $BL(N)$ are connected with the $N$ inputs (respectively, outputs) of $BL(N)$. To facilitate our discussions, the labels of stages, links, and SEs are represented by binary numbers. Let $a_l a_{l-1} \cdots a_1 a_0$ be the binary representation of $a$. We use $\bar{a}$ to denote the integer that has the binary representation $a_l a_{l-1} \cdots a_1 (1 - a_0)$. An example is shown in Fig. 1.
The self-routing in $BL(N)$ is decided by the destination, $d_{n-1} \cdots d_0$: if $d_{n-i-1} = 0$, the input of the SE on the connection path in stage $i$ is connected to the SE's upper output, and to the lower output otherwise (i.e., $d_{n-i-1} = 1$). As shown in Fig. 1, connection paths $P_0$ and $P_1$ are set up by self-routing in $BL(16)$. In general, the unique path for a connection from source $s_{n-1} \cdots s_0$ to destination $d_{n-1} \cdots d_0$ can be derived as follows: the path enters SE [...] of the SE and leaves the SE using its output link [...]. By this self-routing property, the connection path for any input/output pair of $BL(N)$ can be computed in $O(\lg N)$ time. Therefore, we have the following simple fact:

Lemma 1. Given any $K$ ($\le N$) one-to-one distinct input/output pairs, the connection paths in $BL(N)$ for these pairs can be computed in $O(\lg N)$ time using $N$ processing elements (PEs) if each PE is assigned to $O(1)$ pairs.
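The self-routing rule can be illustrated with a minimal Python sketch. The explicit SE and link labels used below (input link $d_{n-1} \cdots d_{n-i} s_{n-1} \cdots s_i$ in stage $i$) are an assumption derived here from the recursive construction of $BL(N)$, not formulas quoted from the text; the function name `baseline_path` is hypothetical. The labels agree with the conflicts reported for the two paths of Fig. 1 (0010 to 1011 and 0100 to 1010).

```python
def baseline_path(s, d, n):
    """Trace the unique path from input s to output d in BL(2**n).

    Returns one tuple (stage, se, in_link, out_link) per stage.
    Labeling is an assumption: in stage i the path uses input link
    d_{n-1}..d_{n-i} s_{n-1}..s_i, and the SE label drops the last bit.
    """
    path = []
    for i in range(n):
        d_prefix = d >> (n - i)              # i most significant bits of d
        s_rest = s >> i                      # n - i most significant bits of s
        in_link = (d_prefix << (n - i)) | s_rest
        se = in_link >> 1                    # two links share one 2x2 SE
        d_bit = (d >> (n - i - 1)) & 1       # 0: upper output, 1: lower output
        out_link = (se << 1) | d_bit
        path.append((i, se, in_link, out_link))
    return path


p0 = baseline_path(0b0010, 0b1011, 4)
p1 = baseline_path(0b0100, 0b1010, 4)
for hop0, hop1 in zip(p0, p1):
    print(hop0, hop1)
```

Each path takes $n = \lg N$ constant-time steps, which is the $O(\lg N)$ bound used in Lemma 1 when every PE handles $O(1)$ pairs.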
Horizontal Concatenation and Vertical Stacking
If a Baseline network is used for photonic switching, it is a blocking network since two connections may pass through the same SE, which causes node conflict. Even if a Baseline network is used for electronic switching, it is still blocking since two connections may try to pass through the same input (respectively, output) link, which causes input (respectively, output) link conflict. Fig. 1 shows two connection paths, $P_0$ from 0010 to 1011 and $P_1$ from 0100 to 1010. $P_0$ and $P_1$ have an output link conflict in stage 2 and an input link conflict in stage 3. If each SE in $BL(16)$ is an electro-optic SE, then they also have node conflicts at SEs 4 and 5 in stages 2 and 3, respectively. Although a Baseline network is blocking, a nonblocking network can be built by extending it in three ways: horizontal concatenation of extra stages to the back of a Baseline network, vertical stacking of multiple copies of a Baseline network, and the combination of both horizontal concatenation and vertical stacking [15], [16], [30], [31]. In the general approach, a network is constructed by concatenating the mirror image of the first $x$ ($< n$) stages of $BL(N)$ to the back of a $BL(N)$ to obtain $BL(N, x)$, then vertically making $p$ copies of $BL(N, x)$, where each copy is called a plane and, finally, connecting the inputs (respectively, outputs) in the first (respectively, last) stage to $N$ $1 \times p$ splitters (respectively, $p \times 1$ combiners). Specifically, the $i$th input (respectively, output) of the $j$th plane is connected with the $j$th output (respectively, input) of the $i$th $1 \times p$ splitter (respectively, $p \times 1$ combiner), which is connected with the $i$th input (respectively, output) of this network. We denote a network constructed in this way by $B(N, x, p, \alpha)$, where $\alpha$ is the crosstalk factor: $\alpha = 0$ if the network has no crosstalk-free constraint (i.e., the network has only the link conflict-free constraint) and $\alpha = 1$ if the network has the crosstalk-free constraint (i.e., the network has the node conflict-free constraint).
Asymptotically, the cost of $B(N, x, p, \alpha)$ is $O(pN \lg N)$, measured either by the number of SEs or by the number of crosspoints [13]. Note that $B(N, x, p, \alpha)$ can be nonblocking for certain combinations of $N$, $x$, $p$, and $\alpha$. The RNB networks considered in this paper have complexities ranging from $O(N \lg N)$ to $O(N^{1.5} \lg N)$, and the SNB networks considered have complexity $O(N^{1.5} \lg N)$. In $B(N, x, 1, \alpha)$, a subnetwork, denoted by $B(N, x, 1/2^l, \alpha)$ ($0 \le l \le n - 1$), is defined as a $B(N/2^l, \max\{x - l, 0\}, 1, \alpha)$ from stage $l$ to stage $n + \max\{x - l, 0\} - 1$. Fig. 2 shows an example of $B(16, 2, 3, \alpha)$, which contains three planes of $B(16, 2, 1, \alpha)$, and each $B(16, 2, 1, \alpha)$ is constructed from $B(16, 0, 1, \alpha)$ by adding two extra stages. Each $B(16, 2, 1, \alpha)$ contains two $B(16, 2, 1/2, \alpha)$s, each being $B(8, 1, 1, \alpha)$, and four $B(16, 2, 1/4, \alpha)$s, each being $B(4, 0, 1, \alpha)$.
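The cost and subnetwork definitions can be checked numerically with a short Python sketch. It assumes that each plane has $\lg N + x$ stages of $N/2$ SEs (splitters and combiners are not counted) and that there are $2^l$ subnetworks at level $l$, as in the Fig. 2 example; the function names `se_count` and `subnetwork` are hypothetical.

```python
def se_count(N, x, p):
    """Number of 2x2 SEs in B(N, x, p, alpha): p planes of (lg N + x) stages."""
    n = N.bit_length() - 1          # n = lg N for N a power of two
    return p * (n + x) * N // 2


def subnetwork(N, x, l):
    """Describe B(N, x, 1/2**l, alpha) as B(size, extra, 1, alpha)."""
    n = N.bit_length() - 1
    extra = max(x - l, 0)
    return {
        "size": N >> l,             # N / 2**l inputs and outputs
        "extra": extra,             # number of extra stages kept
        "first_stage": l,
        "last_stage": n + extra - 1,
        "copies": 2 ** l,           # subnetworks of this level in one plane
    }


print(se_count(16, 2, 3))           # B(16, 2, 3, alpha)
print(subnetwork(16, 2, 1))         # each is a B(8, 1, 1, alpha)
print(subnetwork(16, 2, 2))         # each is a B(4, 0, 1, alpha)
```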
Designing Parallel Switch Routing Algorithms
A trivial lower bound on the time for routing $K$ ($0 \le K \le N$) connections sequentially in $B(N, x, p, \alpha)$ is $\Omega(K \lg N)$. This lower bound is obtained by assuming that for any connection it takes $O(1)$ time to correctly guess which plane to use without conflict and $O(\lg N)$ time to compute the connection path in that plane. Clearly, correctly assigning connections to planes is not a simple task when $x \ne 0$ and $p > 1$. When the number of connection requests is large, the routing time complexity is greater than $O(N)$. Parallel processing techniques should be used to meet the stringent real-time timing requirement [13]. To the best of our knowledge, except for some special cases such as the Banyan network (i.e., $B(N, 0, 1, \alpha)$) and the Benes network (i.e., $B(N, \lg N - 1, 1, \alpha)$), no effort to investigate faster routing for the whole class of these networks has been reported in the literature.
We choose to present our parallel algorithms for a completely connected multiprocessor system. A completely connected multiprocessor system of size $N$ consists of $N$ processing elements (PEs), $PE_i$, $0 \le i \le N - 1$, connected in such a way that there is a connection between every pair of PEs. We assume that each PE can communicate with at most one PE during a communication step. The time complexity of an algorithm on such a multiprocessor system is measured in terms of the total number of parallel computation and communication steps required by the algorithm. Such a multiprocessor system is by no means practical, but it is used as a general abstract model to derive parallel algorithms. Efficient algorithms on more realistic models, such as the class of hypercubic parallel computers, whose architectural complexity is the same as that of a single plane of $B(N, x, p, \alpha)$, can be easily obtained from our algorithms.
GRAPH MODEL
I/O Mapping Graphs
For $B(N, x, p, \alpha)$, let $I$ be a set of $N$ inputs, $I_0, \cdots, I_{N-1}$, and $O$ be a set of $N$ outputs, $O_0, \cdots, O_{N-1}$. [...], $0 \le i \le n$. Then, the $k$th modulo-$g$ input group comprises inputs $I_{(k-1)g}, I_{(k-1)g+1}, \cdots, I_{kg-1}$, and the $k$th modulo-$g$ output group comprises outputs $O_{(k-1)g}, O_{(k-1)g+1}, \cdots, O_{kg-1}$, where $1 \le k \le N/g$. Let $\pi : I \mapsto O$ be an I/O mapping that indicates connections from $I$ to $O$. If there is a connection from $I_i$ to $O_j$, then set $\pi(i) = j$ and $\pi^{-1}(j) = i$; otherwise, set $\pi(i) = -1$. If $j \ne \pi(i)$ for any $I_i$, then set $\pi^{-1}(j) = -1$. We say that an input (respectively, output, link, SE) is active if it is on a connection path, and idle otherwise. An I/O mapping from $I$ to $O$ is one-to-one if each $I_i$ is mapped to at most one $O_j$ and $\pi(i) \ne \pi(j)$ for any $i \ne j$. In this paper, all I/O mappings are one-to-one and all connections belong to a one-to-one I/O mapping. Our goal is to quickly route $K$ ($\le N$) link (respectively, node) conflict-free paths for $K$ connections of any I/O mapping in $B(N, x, p, 0)$ (respectively, $B(N, x, p, 1)$). To achieve this goal, we decompose a set of connections into disjoint subsets, and route each subset in one plane of $B(N, x, p, \alpha)$ so that each subset is feasible for its assigned plane.
Given any I/O mapping with $K$ connections for $B(N, x, p, \alpha)$, we construct a graph $G(N, K, g)$, named the I/O mapping graph, as follows: The vertex set consists of two parts, $V_1$ and $V_2$. Each of them has $N/g$ vertices labeled from 0 to $N/g - 1$. Each modulo-$g$ input (respectively, output) group is represented by a vertex in $V_1$ (respectively, $V_2$). There is an edge between vertex $\lfloor i/g \rfloor$ in $V_1$ and vertex $\lfloor j/g \rfloor$ in $V_2$ if $j = \pi(i)$. Thus, $G(N, K, g)$ is a bipartite graph with $N/g$ vertices in each of $V_1$ and $V_2$ and $K$ edges, where at most $g$ edges are incident at any vertex. Clearly, the degree of $G(N, K, g)$, the maximum number of edges incident at a vertex, is no larger than $g$. Since there may be more than one connection from a modulo-$g$ input group to the same modulo-$g$ output group, $G(N, K, g)$ may have parallel edges, i.e., edges between the same two vertices, and it may be a multigraph. However, there is a one-to-one correspondence between active inputs/outputs in an I/O mapping and the edges in the I/O mapping graph and, thus, we can label each edge by its corresponding input.
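As an illustration of this construction, the following Python sketch builds the edge list of an I/O mapping graph from a partial one-to-one mapping. The function names and the small mapping are hypothetical examples, not data from the paper; idle inputs are simply absent from the dictionary rather than mapped to $-1$.

```python
from collections import Counter


def io_mapping_graph(pi, N, g):
    """Edges of G(N, K, g) as (input label, V1 vertex, V2 vertex).

    pi maps each active input i to its output pi(i); it must be one-to-one.
    """
    assert N % g == 0
    assert len(set(pi.values())) == len(pi), "mapping must be one-to-one"
    return [(i, i // g, j // g) for i, j in sorted(pi.items())]


def max_degree(edges):
    """Largest number of edges incident at any vertex of V1 or V2."""
    deg = Counter()
    for _, u, v in edges:
        deg[("V1", u)] += 1
        deg[("V2", v)] += 1
    return max(deg.values(), default=0)


pi = {0: 5, 1: 4, 2: 0, 5: 1, 7: 6}          # hypothetical mapping, N = 8
edges = io_mapping_graph(pi, N=8, g=4)
print(edges)
print(max_degree(edges))                      # never exceeds g
print(Counter((u, v) for _, u, v in edges))   # counts above 1 are parallel edges
```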
An edge $e$ is called the left edge (respectively, right edge) of edge $f$ if $e = \bar{f}$ (respectively, $\pi(e) = \overline{\pi(f)}$). Any edge has at most one left edge and at most one right edge in $G(N, K, g)$. Two edges $e$ and $f$ are called neighboring edges if $e$ is the left or right edge of $f$. We define a linear component (or simply, a component) of $G(N, K, g)$ as follows: two edges $e$ and $f$ belong to the same component if and only if there is a sequence of edges $e = e_1, \cdots, e_j = f$ such that $e_i$ and $e_{i+1}$, $1 \le i \le j - 1$, are neighboring edges. If every edge in a component has two neighboring edges, the component is called a closed component; otherwise, it is called an open component. By generalizing "neighboring edge" to an equivalence relation, each edge is in exactly one component and, thus, components are edge disjoint in $G(N, K, g)$. Fig. 3a shows an I/O mapping with 32 inputs, 25 of which are active. Fig. 3b shows the I/O mapping graph $G(32, 25, 8)$ of Fig. 3a, where $V_1$ (respectively, $V_2$) of $G(32, 25, 8)$ has four vertices and each vertex in $V_1$ (respectively, $V_2$) includes eight inputs (respectively, outputs) belonging to the same modulo-8 input (respectively, output) group. Fig. 3c shows all components of $G(32, 25, 8)$ in Fig. 3b.
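The component structure can be made concrete with a small sequential Python sketch. It assumes edges are labeled by their inputs, that the left edge of $e$ is the active sibling input $\bar{e}$, and that the right edge is the edge whose output is the sibling of $\pi(e)$; a component is treated as closed when every edge has both a left and a right edge. The function name and the two mappings are hypothetical examples.

```python
def linear_components(pi):
    """Split the edges (labeled by input) into linear components.

    pi maps active inputs to outputs (one-to-one). Returns a list of
    (sorted edge labels, "closed" or "open").
    """
    inv = {o: i for i, o in pi.items()}

    def left(e):
        return e ^ 1 if (e ^ 1) in pi else None   # sibling input, if active

    def right(e):
        return inv.get(pi[e] ^ 1)                 # edge at the sibling output

    seen, comps = set(), []
    for start in sorted(pi):
        if start in seen:
            continue
        seen.add(start)
        comp, stack = [], [start]
        while stack:
            e = stack.pop()
            comp.append(e)
            for f in (left(e), right(e)):
                if f is not None and f not in seen:
                    seen.add(f)
                    stack.append(f)
        closed = all(left(e) is not None and right(e) is not None for e in comp)
        comps.append((sorted(comp), "closed" if closed else "open"))
    return comps


print(linear_components(dict(enumerate([3, 7, 0, 4, 1, 6, 2, 5]))))
print(linear_components({0: 2, 1: 5, 3: 3, 4: 4, 6: 0}))
```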
Graph Coloring and Nonblockingness
Let us study the connection capability of $B(N, x, p, \alpha)$ first. We say that two connections share a modulo-$g$ input (respectively, output) group if their sources (respectively, destinations) are in the same modulo-$g$ input (respectively, output) group.

Lemma 2. For any connection set $C$ of $B(N, 0, 1, \alpha)$, if no two connections in $C$ share any modulo-$g$ input (respectively, output) group, then the connection paths for $C$ satisfy the following conditions: 1) they are node conflict-free in the first (respectively, last) $\lg g$ stages, and 2) they are input link conflict-free in the first $\lg g + 1$ (respectively, last $\lg g$) stages and output link conflict-free in the first $\lg g$ (respectively, last $\lg g + 1$) stages.
Lemma 3. For any pair of input and output in $B(N, x, 1, \alpha)$, there are $2^x$ paths connecting them.

It is easy to verify that Lemmas 2 and 3 are true according to the topology of $BL(N)$ (refer to [21] for formal proofs). We say that a set $C$ of I/O connections is feasible for $B(N, x, p, 0)$ (respectively, $B(N, x, p, 1)$) if they can be routed without any link (respectively, node) conflict. Using the above two lemmas, the following claim can be easily derived from the results of [21]. [...]

By Lemma 4, if we assign the connections of $B(N, x, p, \alpha)$ with sources (respectively, destinations) passing through the same modulo-$g$ input (respectively, output) group to different planes, then we can route connections in $B(N, x, p, \alpha)$ without conflict. Thus, in order to route conflict-free connections in $B(N, x, p, \alpha)$, we first need to determine which plane is to be used for each connection. By constructing an I/O mapping graph $G(N, K, g)$ with $g = 2^{\lfloor (n - x + \alpha)/2 \rfloor}$, we can reduce the problem of routing $K$ connections in $B(N, x, p, \alpha)$ to the following two graph coloring problems:
Weak Edge Coloring Problem (WEC problem): Given an I/O mapping graph $G(N, K, g)$ with $K'$ ($< K$) colored edges, color the $K$ edges with a set of colors such that no two edges with the same color are incident at the same vertex of $G(N, K, g)$, where changing the colors of the $K'$ colored edges is allowed. If we can find a weak edge-coloring of $G(N, K, g)$ using at most $c_1$ different colors, we call this coloring a (weak)$^3$ $c_1$-edge coloring of $G(N, K, g)$.

Strong Edge Coloring Problem (SEC problem): Given an I/O mapping graph $G(N, K, g)$ with $K'$ ($< K$) colored edges, color the $K - K'$ uncolored edges with a set of colors such that no two edges with the same color are incident at the same vertex of $G(N, K, g)$, without changing the colors of the $K'$ colored edges. If we can find a strong edge-coloring of $G(N, K, g)$ using at most $c_2$ different colors, we call this coloring a strong $c_2$-edge coloring of $G(N, K, g)$.
If we consider the colored (respectively, uncolored) edges in $G(N, K, g)$ as the existing (respectively, new) connections in $B(N, x, p, \alpha)$, a solution to the WEC problem is a plane assignment for routing in an RNB network since we can reroute existing connections, and a solution to the SEC problem is a plane assignment for routing in an SNB network since rerouting existing connections is prohibited.
Clearly, for the same $G(N, K, g)$, $c_1 \le c_2$. In Fig. 4, we [...] Fig. 4a, and an SEC solution is given in Fig. 4b. Note that, in Fig. 4b, an additional color is needed for edge $b$ because the colors of the existing colored edges $a$ and $c$ cannot be changed. To our knowledge, no parallel algorithm for the SEC problem has been reported in the literature.

It is important to note that the minimum value of $p$ in Lemma 5 equals the value of $g$ in Lemma 4, where $p$ is the number of $B(N, x, 1, \alpha)$ planes required for $B(N, x, p, \alpha)$ to be rearrangeable nonblocking. The number of crosspoints in such an RNB network is $O(N \lg N)$ for $x = n - 1$ and $O(N^{1.5} \lg N)$ for $x = 0$. By Lemmas 4 and 5, if we assign the connections (including existing and new connections) sharing the same modulo-$g$ input/output group to different planes, the connections assigned to each plane are feasible for that plane. Then, the routing can be completed by finding conflict-free connection paths within each plane. The following known fact is useful.
Lemma 6. Every bipartite multigraph $G$ has a $\Delta(G)$-edge coloring, where $\Delta(G)$ is the degree of $G$.
By Lemma 6 (see a proof in [4]), if we set $g = 2^{\lfloor (n - x + \alpha)/2 \rfloor}$ in $G(N, K, g)$, the plane assignments for a set of connections in RNB $B(N, x, p, \alpha)$ can be solved by finding a $g$-edge coloring of $G(N, K, g)$.
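A quick numerical check of the group size, as a minimal sketch: it assumes the exponent reads $\lfloor (n - x + \alpha)/2 \rfloor$ and that $p = g$ planes are used for the RNB network, as stated in this section. The function name `rnb_planes` is hypothetical.

```python
def rnb_planes(n, x, alpha):
    """Group size g = 2**floor((n - x + alpha) / 2), taken as the plane count p."""
    return 2 ** ((n - x + alpha) // 2)


n = 4                                   # N = 16
for x in range(n):
    for alpha in (0, 1):
        g = rnb_planes(n, x, alpha)
        ses = g * (n + x) * 2 ** n // 2  # SEs in g planes of (n + x) stages
        print(f"x={x} alpha={alpha} planes={g} SEs={ses}")
```

For $x = n - 1$ and $\alpha = 0$ a single plane suffices, while for $x = 0$ the plane count grows like $\sqrt{N}$, which matches the $O(N \lg N)$ and $O(N^{1.5} \lg N)$ extremes quoted above.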
Algorithm for Balanced 2-Coloring of $G(N, K, g)$
In order to solve the WEC problem efficiently, we present an algorithm for a related problem, named the balanced 2-coloring problem: Given an I/O mapping graph $G(N, K, g)$, color its edges with two colors so that every vertex is incident to at most $g/2$ edges with one color and at most $g/2$ edges with the other.
Our algorithm is for a completely connected multiprocessor system consisting of $N$ PEs. Initially, each $PE_i$ reads $\pi(i)$ from input $i$ and sets the value of $\pi^{-1}$ in $PE_{\pi(i)}$ to $i$. Then, the algorithm performs the following two steps.
3. The definition of weak edge-coloring is the same as the definition of edge-coloring in graph theory. Thus, we omit "weak" in the rest of this paper.

Step 1. Divide the I/O mapping graph $G(N, K, g)$ into a set of components. This step can be done by each edge $i$ finding its left edge $\bar{i}$ and right edge $\pi^{-1}(\overline{\pi(i)})$.
Step 2. Color components with two colors, red and blue, so that neighboring edges in each component have different colors.
Each component has two specific representatives, referred to simply as Reps. (There is an exception: a component of length 1 has only one Rep, which is the edge itself.) For closed and open components, the Reps are defined differently. For a closed component, we define two edges with the minimum labels as the two Reps; for an open component, if an edge $e$ has no left edge or $e$'s left edge has no right edge, $e$ is defined as one Rep. Fig. 3c shows the Reps of all possible types of components.
Step 2 can be done by coloring edges with the Reps as references using the pointer jumping technique in [14]. At the beginning, each edge sets its pointer to point to the right edge of its left edge if it exists and to itself otherwise. By doing so, two disjoint directed cycles are formed for a closed component, and two disjoint directed paths are formed for an open component with more than one edge, each containing a Rep. For an open component, furthermore, the end pointer of every directed path points to one of the Reps. For example, Fig. 3d shows the directed cycles and paths formed from the components of Fig. 3c. Then, by performing $\lceil \lg (K/2) \rceil$ rounds of parallel pointer jumping, each edge finds the Rep belonging to the same directed cycle or path. Finally, each edge can be colored by comparing the value of the Rep found by itself with that found by its neighbor. That is, if the value of the Rep found by an edge is no larger than its neighbor's, color the edge red; otherwise, color it blue. The detailed implementation of the balanced 2-coloring algorithm is given in Algorithm 1$^4$ (see Fig. 5), and the correctness and time complexity of this algorithm are given in the following theorem.

Theorem 1. A balanced 2-coloring of any $G(N, K, g)$ can be found in $O(\lg K)$ time using a completely connected multiprocessor system of $N$ PEs.
Proof. Given an I/O mapping graph $G(N, K, g)$, Step 1 can be done in $O(1)$ time using a completely connected multiprocessor system of $N$ PEs. In Step 2, since the length of each directed cycle or path is at most $\lceil K/2 \rceil$, each edge can find a Rep by $\lceil \lg (K/2) \rceil$ rounds of pointer jumping. Clearly, all edges in the same directed cycle or path are colored with the same color since they find the same Rep. The pointer initialization implies that each edge and its neighboring edge are in different directed cycles or paths and, thus, they have different colors. By the definition of left/right edge, there are no more than $g/2$ pairs of neighboring edges incident at any vertex of $G(N, K, g)$. Thus, the colorings of all components compose a balanced 2-coloring of $G(N, K, g)$. Therefore, a balanced 2-coloring of any $G(N, K, g)$ can be found in $O(\lg K)$ time. $\square$
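The two steps can be simulated sequentially with the following Python sketch. It is not Algorithm 1 itself: each synchronous round of pointer jumping is emulated with dictionary comprehensions, the round count $\lceil \lg K \rceil$ is a safe overestimate of the bound in the proof, and the rule for picking a Rep (the fixed point at the end of a directed path, otherwise the minimum label on a directed cycle) is an assumption based on the description above. The function name is hypothetical.

```python
import math


def balanced_two_coloring(pi):
    """Color edges (labeled by input) red/blue so neighboring edges differ."""
    inv = {o: i for i, o in pi.items()}

    def left(e):
        return e ^ 1 if (e ^ 1) in pi else None

    def right(e):
        return inv.get(pi[e] ^ 1)

    # Step 2 initialisation: point to the right edge of the left edge, else self.
    first = {}
    for e in pi:
        l = left(e)
        r = right(l) if l is not None else None
        first[e] = e if r is None else r

    ptr = dict(first)
    low = {e: e for e in pi}                 # smallest label seen along pointers
    rounds = max(1, math.ceil(math.log2(max(len(pi), 2))))
    for _ in range(rounds):                  # one synchronous jump per round
        low = {e: min(low[e], low[ptr[e]]) for e in pi}
        ptr = {e: ptr[ptr[e]] for e in pi}

    # Rep: end of a directed path (fixed point), or minimum label on a cycle.
    rep = {e: ptr[e] if first[ptr[e]] == ptr[e] else low[e] for e in pi}

    color = {}
    for e in pi:
        nb = left(e) if left(e) is not None else right(e)
        color[e] = "red" if nb is None or rep[e] <= rep[nb] else "blue"
    return color


print(balanced_two_coloring(dict(enumerate([3, 7, 0, 4, 1, 6, 2, 5]))))
```

Because sibling inputs and sibling outputs always receive different colors, each group of $g$ consecutive inputs (or outputs) contains at most $g/2$ edges of either color.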
Algorithm for $g$-Edge Coloring of $G(N, K, g)$
Based on the balanced 2-coloring algorithm, a WEC solution to any I/O mapping graph $G(N, K, g)$ with no more than $g$ colors can be found as follows: Let $d$ be the degree of $G(N, K, g)$. Let $k$ be the smallest integer such that $d \le 2^k$. Clearly, $0 \le k \le \lg g$ since $d \le g$. First, remove the colors of the $K'$ colored edges. Then, perform at most $\lceil \lg d \rceil$ iterations as follows: In the initial iteration (i.e., iteration 0), we find a balanced 2-coloring of $G(N, K, g)$ using colors 0 and 1 if $d > 1$, and let $G_0$ and $G_1$ be the graphs induced by the edges with colors 0 and 1, respectively. If $\Delta(G_0) > 1$ (respectively, $\Delta(G_1) > 1$), we execute iteration 1 to find a balanced 2-coloring for $G_0$ (respectively, $G_1$) using colors 00 and 01 (respectively, 10 and 11). This process continues recursively in a binary tree fashion until a solution to WEC is reached. More formally, in each recursive iteration $i$, $1 \le i \le \lceil \lg d \rceil - 1$, we find a balanced 2-coloring for each graph $G_z$ using colors $z0$ and $z1$ (i.e., concatenate 0 or 1 with $z$) if $\Delta(G_z) > 1$, where $z$ is a binary representation of an integer in $\{0, 1, \cdots, 2^i - 1\}$ denoting the color of the edges in $G_z$ in iteration $i - 1$.
Theorem 2. For any I/O mapping graph $G(N, K, g)$, a $g$-edge coloring can be found in $O(\lg d \cdot \lg K)$ time using a completely connected multiprocessor system of $N$ PEs, where $d$ is the degree of $G(N, K, g)$.
[...] $k$. We prove the theorem by induction on $k$. If $k = 1$, it is true since a balanced 2-coloring is a 2-edge coloring by Theorem 1. Assume that for any $k < m \le n$, the theorem holds. Now, we prove that the theorem holds for $k = m$.
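The recursive halving can be sketched sequentially in Python as follows. Two simplifications are assumptions of this sketch rather than part of the algorithm above: the 2-coloring is done by a plain traversal instead of pointer jumping, and each color class is relabeled by halving its input and output labels so that the next round again sees sibling pairs. The sketch always runs $\lg g$ rounds instead of stopping once $\Delta(G_z) \le 1$, and it assumes $g$ is a power of two. Function names are hypothetical.

```python
def two_color(pi):
    """Sequential 2-coloring: sibling inputs and sibling outputs differ."""
    inv = {o: i for i, o in pi.items()}
    color = {}
    for start in pi:
        if start in color:
            continue
        color[start] = 0
        stack = [start]
        while stack:
            e = stack.pop()
            left = e ^ 1 if (e ^ 1) in pi else None
            right = inv.get(pi[e] ^ 1)
            for f in (left, right):
                if f is not None and f not in color:
                    color[f] = 1 - color[e]
                    stack.append(f)
    return color


def g_edge_coloring(pi, g):
    """Assign each active input a color in range(g); no two edges of one
    color share a modulo-g input group or a modulo-g output group."""
    classes = [{i: (i, o) for i, o in pi.items()}]   # original -> current labels
    for _ in range(g.bit_length() - 1):              # lg g rounds
        nxt = []
        for cls in classes:
            col = two_color({ci: co for ci, co in cls.values()})
            halves = ({}, {})
            for orig, (ci, co) in cls.items():
                halves[col[ci]][orig] = (ci // 2, co // 2)   # relabel siblings
            nxt.extend(halves)
        classes = nxt
    return {orig: c for c, cls in enumerate(classes) for orig in cls}


pi = {i: (5 * i + 3) % 16 for i in range(16)}        # hypothetical permutation
print(g_edge_coloring(pi, 4))
```

Each color class can then be routed in its own plane, since no two of its connections share a modulo-$g$ input or output group.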
Parallel Routing in a Plane
We have shown how to assign each connection to a plane in an RNB $B(N, x, p, \alpha)$. In this section, we show how connections are routed within each plane. [...] which subnetwork is to be used for each connection since there are $2^i$ $B(N, x, 1/2^i, \alpha)$s. This can be reduced to a 2-edge coloring of a bipartite graph with degree 2. For each subnetwork $B(N, x, 1/2^i, \alpha)$, $0 \le i \le x - 1$, we construct an I/O mapping graph $G(N/2^i, K_i, 2)$, where $K_i$ is the number of connections passing through it. We color the edges of $G(N/2^i, K_i, 2)$ with two different colors and assign the connections (edges) with the same color to the same subnetwork $B(N, x, 1/2^{i+1}, \alpha)$. Specifically, in each iteration $i$, $0 \le i \le x - 1$, we run the $g$-edge coloring algorithm for the $2^i$ graphs $G(N/2^i, K_i, 2)$ with $g = 2$. By Theorem 2, each iteration can be done in $O(\lg K)$ time. Thus, the time to route $K$ feasible connections in the first and last $x$ stages is $O(x \lg K)$. By Lemmas 1 and 7, we can route the connections in the middle $\lg N - x$ stages by self-routing, which takes $\lg N - x$ time. Therefore, the total time to route $K$ feasible connections of $B(N, x, 1, \alpha)$ is $O(x \lg K + \lg N)$ using a completely connected multiprocessor system of $N$ PEs.

[...] For the RNB $B(N, n - 1, 1, 0)$, which is the electronic Benes network, this performance is the same as the best known results reported in [19], [22].
ROUTING IN STRICTLY NONBLOCKING NETWORKS
Strict Nonblockingness
The following lemma can be easily derived from the results of [31] .
then $B(N, x, p, \alpha)$ is strictly nonblocking.
For an SNB network, we can route new connections (as long as these connections form an I/O mapping from idle inputs to idle outputs) without disturbing the existing ones; however, this routing problem is harder than that in an RNB network when we need to route the new connections simultaneously. Based on the discussions in Section 3.2, we know that the routing problem for an SNB $B(N, x, p, \alpha)$ can be solved by finding a strong edge-coloring of the I/O mapping graph $G(N, K, g)$.
Lemma 9. Any multigraph $G$ has a strong $(2\Delta - 1)$-edge coloring, where $\Delta$ is the degree of $G$.
Proof. Consider coloring the edges in an arbitrary order. Since each edge in $G$ is adjacent to at most $2\Delta - 2$ edges, any uncolored edge in $G$ can always be assigned a color so that the total number of colors used is no larger than $2\Delta - 1$.
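The counting argument in this proof translates directly into a greedy procedure. The sketch below is a sequential illustration only, assuming a loop-free multigraph given as a list of vertex pairs; the name `greedy_edge_coloring` is hypothetical.

```python
from collections import defaultdict

def greedy_edge_coloring(edges):
    """Color edges one at a time with the smallest color not used by an
    adjacent edge. Returns (colors, delta), where delta is the maximum degree.
    edges: list of (u, v) pairs with u != v; parallel edges are allowed."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    delta = max(degree.values(), default=0)

    used = defaultdict(set)  # colors already present at each vertex
    colors = []
    for u, v in edges:
        # At most (delta - 1) + (delta - 1) colors are blocked, so one of the
        # 2*delta - 1 colors in the palette is always available.
        c = next(c for c in range(2 * delta - 1)
                 if c not in used[u] and c not in used[v])
        colors.append(c)
        used[u].add(c)
        used[v].add(c)
    return colors, delta
```

Because the edge order is arbitrary, edges that are already colored can simply be placed first in the order, which is why the same bound applies when existing connections must not be disturbed.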
We consider a subclass of SNB networks, $B(N, 0, p^*, \alpha)$.
By Lemma 8, we know that $B(N, 0, p^*, \alpha)$ is an SNB network. Since each plane of $B(N, 0, p^*, \alpha)$ is a Baseline network, the routing of connections in any plane can be done by self-routing. Thus, the problem of routing connections in $B(N, 0, p^*, \alpha)$ is reduced to finding a plane for each new connection so that all connections, including existing ones, are conflict-free. By Lemmas 4 and 9, this can be done by finding a strong $(2g - 1)$-edge coloring for $G(N, K, g)$ of $B(N, 0, p^*, \alpha)$ with $K'$ existing connections and $K - K'$ new connections. In the next two sections, we present two parallel algorithms to find a strong $(2g - 1)$-edge coloring of $G(N, K, g)$ using different approaches.
Before presenting our algorithms, we give a couple of definitions. Let $G(N, K - K', g)$ and $G(N, K', g)$ denote the graphs obtained from $G(N, K, g)$ by removing the $K'$ colored edges and by keeping only the $K'$ colored edges, respectively. Since $G(N, K, g)$ is a bipartite multigraph, $G(N, K - K', g)$ is also a bipartite multigraph with two vertex sets $V_1$ and $V_2$, where the $k$th vertex of $V_1$ and the $k$th vertex of $V_2$ correspond to the $k$th modulo-$g$ input group and output group, respectively. We say that color $c$ is free at vertex $v$ if none of the edges incident to $v$ has color $c$. If color $c$ is free at both ends of edge $e$, then $c$ is free for $e$. An edge $e$ conflicts with another edge $f$ if $e$ and $f$ are adjacent and have the same color.
First Algorithm for Strong Edge-Coloring of $G(N, K, g)$
The idea of the first algorithm is that we first partition the set of uncolored edges into edge-disjoint subsets, and then we color the subsets one by one. The edges in the same subset may be colored differently depending on the free colors for each edge. The edge-disjoint subsets can be found by finding a set of matchings of $G(N, K - K', g)$, where a matching of $G(N, K - K', g)$ is defined as a set $M$ of edges in $G(N, K - K', g)$ such that no two edges in $M$ are adjacent.
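The matching-based idea can be sketched sequentially as follows. This is only an illustration under stated assumptions: the matchings are peeled off greedily (a simple stand-in, not necessarily the procedure used in Step 1 of the paper), every vertex is assumed to have degree at most $g$ in $G(N, K, g)$, and the existing colors are assumed to lie in $\{0, \ldots, 2g - 2\}$. The names `peel_matchings` and `color_by_matchings` are hypothetical.

```python
from collections import defaultdict

def peel_matchings(edges, ids):
    """Partition the edge indices in ids into matchings by greedy peeling."""
    matchings = []
    while ids:
        seen_in, seen_out, matching, rest = set(), set(), [], []
        for e in ids:
            i, j = edges[e]
            if i in seen_in or j in seen_out:
                rest.append(e)          # adjacent to a chosen edge: defer
            else:
                matching.append(e)
                seen_in.add(i)
                seen_out.add(j)
        matchings.append(matching)
        ids = rest
    return matchings

def color_by_matchings(edges, g, precolored):
    """Extend a partial coloring to all edges using colors 0 .. 2g-2.
    edges: list of (input_group, output_group); precolored: {edge_index: color}."""
    color = dict(precolored)
    used_in, used_out = defaultdict(set), defaultdict(set)
    for e, c in precolored.items():
        i, j = edges[e]
        used_in[i].add(c)
        used_out[j].add(c)
    todo = [e for e in range(len(edges)) if e not in precolored]
    for matching in peel_matchings(edges, todo):
        # Edges of one matching share no vertex, so each can pick a free
        # color independently of the others (conceptually in parallel).
        for e in matching:
            i, j = edges[e]
            c = next(c for c in range(2 * g - 1)
                     if c not in used_in[i] and c not in used_out[j])
            color[e] = c
            used_in[i].add(c)
            used_out[j].add(c)
    return [color[e] for e in range(len(edges))]
```

The existing colors are never changed, which reflects the requirement that connections already routed in an SNB network are not disturbed.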
Our first algorithm computes a strong $(2g - 1)$-edge coloring of $G(N, K, g)$ with $K'$ $(< K)$ colored edges by performing the following two steps.
Step 1: Find a set of matchings $\{M_1, M_2, \ldots, M_{d'}\}$ of $G(N, K - K', g)$.

Our second algorithm consists of $2g$ iterations. In each iteration, we try to color a set of nonparallel uncolored edges using one of the colors in a set of $2g$ colors, $\{0, 1, \ldots, 2g - 1\}$, so that no two edges with the same color are adjacent to the same vertex. Then, for each edge $e$ with color $2g - 1$, we recolor it with a free color in $\{0, 1, \ldots, 2g - 2\}$. The following is the outline of the algorithm:
for all $i, j \in \{1, 2, \ldots, N/g\}$ do
    $c_{i,j} := (i + j + l) \bmod 2g$;
    if there is an uncolored edge in $E_{i,j}$ and color $c_{i,j}$ is free at

The correctness of this algorithm can be derived from the following five simple facts:
1. In iteration $l$, one uncolored edge, if any, in each $E_{i,j}$ is selected. This is obvious. Note that such a selected edge may not be colored in the iteration.
2. In iteration $l$, if two edges, one in $E_{i,j}$ and one in $E_{p,q}$, are assigned the same color, i.e., $c_{i,j} = c_{p,q}$, then $i \ne p$ and $j \ne q$. Suppose, to the contrary, that $i = p$ and $j \ne q$ (the case $j = q$ and $i \ne p$ is symmetric). Then $(i + j + l) \bmod 2g = (i + q + l) \bmod 2g$, which implies that $|j - q| = 2g \times y$, where $y$ is a nonnegative integer. Since $j, q \in \{1, 2, \ldots, N/g\}$ and $g = 2^{\lfloor (n + \alpha)/2 \rfloor}$, we have $|j - q| < 2g$. Thus, $y = 0$ and $j = q$, which contradicts the assumption.
3. For each uncolored edge, all $2g$ possible colors are tried before it is assigned a color in the worst case. By the algorithm, this is obviously true.
4. After $2g$ iterations, no two adjacent edges are assigned the same color. By Fact 2, this is obviously true for any two nonparallel edges. Any two (parallel) edges in $E_{i,j}$ are assigned different colors because of Fact 3 and the fact that their colors are computed using different $l$ values in different iterations.
5. The edges with the same color $2g - 1$ can be recolored concurrently using the colors in $\{0, 1, \ldots, 2g - 2\}$ so that no two adjacent edges are assigned the same color. By Fact 4 and Lemma 9, each edge with color $2g - 1$ can be reassigned a color in $\{0, 1, \ldots, 2g - 2\}$ without resulting in any color conflict.

Now, we show that this algorithm can be implemented in $O(g) = O(\sqrt{N})$ time using a completely connected multiprocessor system of $N$ PEs. This is equivalent to showing that each of the $2g$ iterations takes $O(1)$ time. We associate a $2g$-bit binary array $C_v[0..2g - 1]$ with each vertex $v$ of $G(N, K, g)$ such that $C_v[c] = 1$ if and only if color $c$ is available at vertex $v$, and assign $N/(2g)$ PEs to $v$. Then, the operations of finding whether a given color $c$ is available at $v$ and updating $C_v[c]$ can be carried out in $O(1)$ time. We only need to make sure that the operation of finding an uncolored edge in $E_{i,j}$, $1 \le i, j \le N/g$, (if any) in each iteration can be done in $O(1)$ time. This can be achieved by a preprocessing step of sorting. For each vertex $v'_i$, we can sort all edges in each $E_i = \bigcup_{j=1}^{N/g} E_{i,j}$, $1 \le i \le N/g$, of $G(N, K - K', g)$, using $g$ PEs with $O(1)$ edges per PE, in nondecreasing order of $j$ in $O(\lg^2 g)$ time.
Then, we assign a set of $N/(2g) = O(g)$ PEs to each vertex of $G(N, K, g)$ in such a way that each $E_{i,j}$ is allocated $O(1)$ PEs, which are used to find an uncolored edge in $E_{i,j}$. Based on the sorted edges, a PE associated with $E_{i,j}$ can find the starting locations of its assigned edges in $O(g)$ time. After this preprocessing, the operation of finding uncolored edges in each iteration can be done in $O(1)$ time.
Finally, recoloring the edges with color $2g - 1$ can be done in $O(\lg g)$ time, since this operation is similar to one iteration of Step 2 of our first algorithm presented in the previous section. In summary, we have the following result.
Theorem 6. For any I/O mapping graph $G(N, K, g)$ with $K'$ $(< K)$ colored edges, a strong $(2g - 1)$-edge coloring can be found in $O(g)$ time using a completely connected multiprocessor system of $N$ PEs.
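The second algorithm can be simulated sequentially to see how the color schedule $c_{i,j} = (i + j + l) \bmod 2g$ and the final recoloring interact. The sketch below is an assumed reading of the outline, not the parallel implementation: it supposes that every vertex of $G(N, K, g)$ has degree at most $g$, that existing colors lie in $\{0, \ldots, 2g - 2\}$, and that the spare color to be recolored is $2g - 1$. The function name `strong_coloring_2g` is hypothetical.

```python
from collections import defaultdict

def strong_coloring_2g(edges, g, precolored=None):
    """edges: list of (i, j), with i an input-group and j an output-group index.
    precolored: {edge_index: color} for existing connections.
    Returns a list of colors in 0 .. 2g-2, one per edge."""
    precolored = precolored or {}
    color = [None] * len(edges)
    used_in, used_out = defaultdict(set), defaultdict(set)
    pending = defaultdict(list)  # (i, j) -> uncolored parallel edges E_{i,j}
    for e, (i, j) in enumerate(edges):
        if e in precolored:
            color[e] = precolored[e]
            used_in[i].add(color[e])
            used_out[j].add(color[e])
        else:
            pending[(i, j)].append(e)

    # 2g iterations: every E_{i,j} tries exactly one color per iteration.
    for l in range(2 * g):
        for (i, j), bucket in pending.items():
            c = (i + j + l) % (2 * g)
            if bucket and c not in used_in[i] and c not in used_out[j]:
                e = bucket.pop()
                color[e] = c
                used_in[i].add(c)
                used_out[j].add(c)
    # Holds under the degree assumption stated in the docstring lead-in.
    assert not any(pending.values()), "degree bound violated"

    # Recolor the edges that received the spare color 2g - 1.
    spare = 2 * g - 1
    for e, (i, j) in enumerate(edges):
        if color[e] == spare:
            used_in[i].discard(spare)
            used_out[j].discard(spare)
            c = next(c for c in range(spare)
                     if c not in used_in[i] and c not in used_out[j])
            color[e] = c
            used_in[i].add(c)
            used_out[j].add(c)
    return color
```

In the parallel setting, all $E_{i,j}$ are processed at once within an iteration; the sequential loop gives the same result because, by Fact 2, edges that receive the same color in one iteration share no vertex.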
Performance Analysis
We summarize the overall performance of our routing algorithm for the SNB network $B(N, 0, p^*, \alpha)$ by the following theorem.
CONCLUSION
The major contribution of this paper is the design and analysis of parallel routing algorithms for a class of nonblocking switching networks, $B(N, x, p, \alpha)$. Although the assumed parallel machine model is a completely connected multiprocessor system of $N$ PEs, the proposed algorithms can be transformed into algorithms for more realistic parallel computing models. The pointer jumping technique and any one-to-one permutation communication step used in our proposed algorithms can be implemented by sorting on realistic parallel computing structures. Let $S(N)$ be the time for sorting $N$ elements on a parallel machine $M$ with $N$ processors; then our algorithms can be implemented with a slowdown factor of $S(N)$ on $M$. It is known that sorting $N$ numbers on the class of hypercubic networks takes $O(\lg N \lg \lg N)$ time [8], [18]. This class of networks includes hypercube, cube-connected-cycles, butterfly networks, baseline networks, reverse baseline networks, Omega networks, flip networks, de Bruijn graphs, shuffle-exchange networks, banyan networks, delta networks, bidelta networks, k-ary butterflies, and Benes networks [18]. Our algorithms can route connections in $B(N, x, p, \alpha)$ with a slowdown factor of $O(\lg N \lg \lg N)$ on all of these realistic parallel machine models. Although their topologies differ considerably from one another, their structural complexities are no larger than that of one plane in $B(N, x, p, \alpha)$.
Compared with sequential algorithms, we consider that our algorithms on realistic parallel computers provide a significant speedup, making them potentially valid and useful for large switches. The approach of applying edge-coloring techniques to investigate the capacity and routability of RNB switching networks has been widely used (refer to [6], [13], [19], [22]). We extended this approach to SNB networks by defining strong edge-coloring. For a class of RNB and SNB banyan-based switching networks obtained by horizontal expansion and vertical replication, we proposed a unified mathematical formulation, namely, the WEC and SEC problems, for designing parallel routing algorithms using this approach. Our algorithms can find solutions for the WEC problem in polylogarithmic time and for the SEC problem in sublinear time. Finding faster parallel algorithms for the WEC and SEC problems, especially for the SEC problem, however, remains very challenging.
The results of this paper have valuable architectural implications for the design and implementation of future large-scale electronic and optical switching networks. Scalable nonblocking switching networks tend to have no self-routing capability. For example, for a nonblocking switching network $B(N, x, p, \alpha)$, though self-routing capabilities exist in a portion of it, its routing is still computation intensive. Therefore, for the design of a switching network, in addition to its hardware cost in terms of the cost of SEs and interconnection links (and wavelengths), we must take the routing complexity into consideration. Finding low-cost, high-speed nonblocking switching networks remains a great challenge.

S.Q. Zheng received the PhD degree from the University of California, Santa Barbara, in 1987. After being on the faculty of Louisiana State University for 11 years, he joined the University of Texas at Dallas in 1998, where he is currently a professor of computer science, computer engineering, and telecommunications engineering. Dr. Zheng's research interests include algorithms, computer architectures, networks, parallel and distributed processing, telecommunications, and VLSI design. He has published approximately 200 papers in these areas. He served as the program committee chairman of numerous international conferences and the editor of several professional journals. He is a senior member of the IEEE.
