Abstract
Introduction
Interconnection networks have many applications, including interconnects for communication among processors and between processors and memory modules in a multiprocessor or multicomputer system, and switching networks within a network router or switch. Roughly speaking, interconnection networks fall into two classes: direct (router-based) networks and indirect (switch-based) networks [3]. A typical indirect interconnection network is a multistage interconnection network (MIN). In this paper, we consider MINs used as switching networks and investigate their ability to simultaneously realize one-to-one I/O mappings in the form of permutations.
A switching network usually comprises a number of electronic or photonic switching elements (SEs) grouped into several stages interconnected by a set of wires or optical links. A photonic switching network can be built from $2 \times 2$ electro-optical SEs such as the common lithium-niobate (LiNbO$_3$) SE (e.g. [4, 5]). Each SE is a directional coupler with two inputs and two outputs. Depending on the voltage applied at the junction of the two waveguides, an optical signal carried on either input can be coupled to either output. An electronically controlled optical SE can have a switching speed ranging from hundreds of picoseconds to tens of nanoseconds [15]. However, optical switches pose their own challenges. One problem is crosstalk, which is caused by undesired coupling between signals carried in two waveguides so that the two signal channels interfere with each other. Fig. 1 shows an example of crosstalk in an SE. Each SE has two logic states, namely straight and cross (see Fig. 1(a)). In the straight state, a small fraction of the input signal injected at the upper input may be detected at the lower output (see Fig. 1(b)). Crosstalk can also occur when an SE is in the cross state. Consequently, the input signal is distorted at the output by the crosstalk accumulated along its connection path. Crosstalk can be reduced by increasing the number of SEs in a switching network (e.g. [8, 9, 14, 16, 17, 18, 19]).
Nonblocking networks have been favored in switching systems since they can set up any one-to-one I/O mapping. There are three types of nonblocking networks: strictly nonblocking (SNB), wide-sense nonblocking (WSNB), and rearrangeable nonblocking (RNB) [1, 6]. In both SNB and WSNB networks, a connection can be established from any idle input to any idle output without disturbing existing connections. In an SNB network, any of the available conflict-free paths for a connection can be chosen, whereas in a WSNB network a rule must be followed to choose one.
The high degree of connection capability in SNB and WSNB networks comes at a high hardware cost. RNB networks, usually constructed at lower hardware cost, can establish a conflict-free path for a connection from any idle input to any idle output if rearrangement of existing connections is allowed. A network is self-routing if any connection is established using only the addresses of its source and destination, regardless of other connections. A self-routing network can be either blocking, such as a Banyan network, or nonblocking, such as a crossbar.
In a switching network, output contention occurs when more than one input requests a connection to the same output. Output contentions can be resolved by switch scheduling. For a set of connection requests without output contentions, the process of establishing conflict-free connection paths to satisfy these requests is called switch routing. A switch routing (or simply, routing) algorithm is needed to find these paths. Once a set of conflict-free paths is found, the SEs on these paths can be set accordingly. Routing algorithms play a more fundamental role in WSNB and RNB networks, since nonblockingness depends on them. For SNB networks, routing algorithms tend to be overlooked, since a conflict-free path is always guaranteed for a connection from any idle input to any idle output without rerouting the existing connections. An efficient routing algorithm, however, is still needed to find such a conflict-free path for each connection request. Any routing algorithm requiring more than linear time would be considered too slow. Thus, finding efficient algorithms to speed up the routing process is crucial for high-speed switching networks.
Recently, a class of multistage nonblocking switching networks has been proposed. Each network in this class, denoted by $B(N, x, p, \alpha)$, has relatively low hardware cost and short connection diameter, $O(N^{1.5} \lg N)$ and $O(\lg N)$ respectively, in terms of the number of SEs. Self-routing in the Baseline network $BL(N)$ is decided by the destination of each connection. If the $(n-i)$-th bit, $d_{n-i-1}$, of the destination equals $0$, the input of the SE that the connection path enters in stage $i$ is connected to the SE's upper output; otherwise (i.e., $d_{n-i-1} = 1$) it is connected to the lower output. Since two adjacent stages are connected by a shuffle connection, the unique path for each connection can be derived.
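As an illustration of the self-routing rule, the following Python sketch lists the output taken at each stage for a given destination. It assumes that stage $i$ ($0 \le i \le n-1$) examines destination bit $d_{n-i-1}$, i.e. the most significant bit first; the function name `self_route` and the string labels are hypothetical and are not part of the network definition.

```python
def self_route(dest: int, n: int) -> list[str]:
    """Return the SE output ('upper' or 'lower') chosen at each of the n stages.

    Assumption: stage i inspects bit d_{n-i-1} of the n-bit destination address.
    """
    choices = []
    for stage in range(n):
        bit = (dest >> (n - stage - 1)) & 1  # bit d_{n-i-1}
        choices.append("lower" if bit else "upper")
    return choices


# Destination 1011 in a 16 x 16 network (n = 4 stages).
print(self_route(0b1011, 4))
```

For destination 1011 the sketch selects the lower, upper, lower, and lower outputs in stages 0 through 3.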
If a Baseline network is used for photonic switching, it is a blocking network since two connections may pass through the same SE, which causes a node conflict. Even if a Baseline network is used for electronic switching, it is still a blocking network since two connections may try to pass through the same input (resp. output) link, which causes an input (resp. output) link conflict. Fig. 2 shows two connection paths, $P_0$ from $0010$ to $1011$ and $P_1$ from $0100$ to $1010$. $P_0$ and $P_1$ have an output link conflict in stage 2 and an input link conflict in stage 3, because both active inputs of an SE in stage 2 are to be connected with its lower output and both active outputs of an SE in stage 3 are to be connected with its upper input; they also have node conflicts at these SEs in stages 2 and 3.
Although a Baseline network is blocking, a nonblocking network can be built by extending it in three ways: horizontal concatenation of extra stages to the back of a Baseline network, vertical stacking of multiple copies of a Baseline network, and a combination of both [8, 9, 17, 18]. In the general approach, a network is constructed by concatenating the mirror image of the first $x$ ($< n$) stages of $BL(N)$ to the back of a $BL(N)$, then vertically making $p$ copies of the extended $BL(N)$ (each copy is called a plane), and finally connecting the inputs (resp. outputs) in the first (resp. last) stage to $N$ $1 \times p$ splitters (resp. $p \times 1$ combiners). Specifically, the $i$-th input (resp. output) of the $j$-th plane is connected with the $j$-th output (resp. input) of the $i$-th $1 \times p$ splitter (resp. $p \times 1$ combiner), which is connected with the $i$-th input (resp. output) of this network. We denote a network constructed in this way by $B(N, x, p, \alpha)$, where $\alpha$ is the crosstalk factor: $\alpha = 0$ if the network has no crosstalk-free constraint and $\alpha = 1$ if it has the crosstalk-free constraint.
Clearly, $B(N, 0, 1, \alpha)$ is a Baseline network and $B(N, n-1, 1, \alpha)$ is a Benes network [1]. Fig. 3 shows an example of $B(16, 2, 3, \alpha)$, which contains three planes of $B(16, 2, 1, \alpha)$, each of which contains two extra stages.
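A rough count of the SEs implied by this construction can be written down as follows. It is only a sketch under stated assumptions: every stage of a plane holds $N/2$ SEs of size $2 \times 2$, $n = \lg N$, a plane has $n + x$ stages, and splitters and combiners are not counted as SEs.

```latex
\[
  \#\mathrm{SE}\bigl(B(N, x, p, \alpha)\bigr) = p \cdot (n + x) \cdot \frac{N}{2},
  \qquad
  \#\mathrm{SE}\bigl(B(16, 2, 3, \alpha)\bigr) = 3 \cdot (4 + 2) \cdot 8 = 144 .
\]
```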
Graph Model

I/O Mapping Graphs
Let $I$ and $O$ be the sets of $N$ inputs and $N$ outputs, respectively. We say that an input (resp. output, link, SE) is active if it is on a connection path, and idle otherwise. An I/O mapping from $I$ to $O$ is one-to-one if each input in $I$ is mapped to at most one output in $O$ and no two inputs are mapped to the same output. In this paper, all I/O mappings are one-to-one and all connections belong to a one-to-one I/O mapping.
If a connection path does not have any link (resp. node) conflict with other connection paths, it is called a link conflict-free (resp. node conflict-free) path. Clearly, a node conflict-free path is also link conflict-free, but the converse is not true. If a set of connections can be set up by conflict-free paths in $B(N, x, 1, \alpha)$, these connections are said to be feasible for $B(N, x, 1, \alpha)$.
In Fig. 4, (a) shows an I/O mapping with 32 inputs, 25 of which are active; (b) shows the I/O mapping graph of (a), with vertex sets $V_1$ and $V_2$, where each vertex includes the inputs (resp. outputs) belonging to the same modulo-input (resp. modulo-output) group; (c) shows all components of the graph in (b).
Graph Coloring and Nonblockingness
If we set up connections in $B(N, x, p, \alpha)$ one by one using a sequential algorithm, the time complexity for establishing $K$ connections is $\Omega(K \cdot (\lg N + x))$, since it takes $\Omega(\lg N + x)$ time to set up one connection. For a large number of connections, the time required exceeds $O(N)$, which is not acceptable for real-time applications. Parallel processing techniques can be used to speed up routing in $B(N, x, p, \alpha)$.
We say that two connections share a modulo-input (resp. modulo-output) group if their sources (resp. destinations) are in the same modulo-input (resp. modulo-output) group. Let us first study the connection capability of $B(N, x, p, \alpha)$. It is easy to verify that Lemmas 1 and 2 are true according to the topology of $BL(N)$ (refer to [12] for formal proofs). Using these two lemmas, the following claim can be easily derived from the results of [12].
By Lemma 3, if we assign the connections of $B(N, x, p, \alpha)$ whose sources (resp. destinations) are in the same modulo-input (resp. modulo-output) group to different planes, then we can route connections in $B(N, x, p, \alpha)$ without conflict. Thus, in order to set up conflict-free connections in $B(N, x, p, \alpha)$, we first need to determine which plane to use for each connection.
If we regard the colored (resp. uncolored) edges in $G(N, K, g)$ as the existing (resp. new) connections in $B(N, x, p, \alpha)$, a solution to the weak edge coloring problem is a plane assignment for routing in an RNB network, since existing connections can be rerouted in such a network, and a solution to the strong edge coloring problem is a plane assignment for routing in an SNB network, since rerouting existing connections is not allowed in such a network. Clearly, for the same $G(N, K, g)$, the weak problem never needs more colors than the strong problem. (The definition of weak edge coloring is the same as the definition of edge coloring in graph theory; we therefore omit "weak" in the rest of the paper.) The above claim is implied by the results of [12].
It is important to note that the minimum value of $p$ in Lemma 4 equals the value of $g$ in Lemma 3, where $p$ is the number of $B(N, x, 1, \alpha)$ planes required for $B(N, x, p, \alpha)$ to be rearrangeable nonblocking.
Lemma 3 Given a connection
By Lemmas 3 and 4, if we assign the connections (including existing and new connections) sharing the same modulo-input/output group to different planes, the connections are feasible for every assigned plane. Then, the routing can be completed by setting up conflict-free connection paths within each plane.
Lemma 5 Every bipartite graph $G$ has a $\Delta$-edge coloring, where $\Delta$ is the degree of $G$.
By Lemma 5 (see a proof in [2]), if we set $g = 2^{\lfloor (n - x + \alpha)/2 \rfloor}$ in $G(N, K, g)$, the plane assignment for a set of connections in an RNB $B(N, x, p, \alpha)$ can be solved by finding a $g$-edge coloring of $G(N, K, g)$, since the degree of $G(N, K, g)$ equals $g$.
Algorithm for Balanced 2-Coloring of $G(N, K, g)$
In order to solve the weak edge coloring problem efficiently, we present an algorithm for the balanced 2-coloring problem: given an I/O mapping graph $G(N, K, g)$, color its edges with 2 colors so that every vertex is incident to at most $g/2$ edges of one color and at most $g/2$ edges of the other.
We present our parallel algorithms for a completely connected multiprocessor system, since any algorithm for this parallel computing model can be easily transformed into algorithms for more realistic multiprocessor systems. A completely connected multiprocessor system of size $N$ consists of a set of processing elements (PEs) $PE_i$, $0 \le i \le N-1$, connected in such a way that there is a connection between every pair of PEs. We assume that each PE can communicate with at most one PE during a communication step.
Initially, each PE $i$ reads the destination of input $i$, records $i$ as the inverse-mapping value in the PE indexed by that destination, and then performs the following two steps.
Step 1. Divide the I/O mapping graph $G(N, K, g)$ into a set of components. This step can be done by having each edge find its left edge and its right edge.
Step 2. Color the edges of each component, using its Rep's as references. Fig. 4(c) shows the Rep's of all possible types of components, where the Rep's of each component are marked as dark lines and edges are labeled by their corresponding inputs. Step 2 can be done with the pointer jumping technique in [7]. At the beginning, each edge sets its pointer to the right edge of its left edge if that edge exists, and to itself otherwise. By doing so, for a closed component or an open component with more than one edge, two disjoint directed cycles or paths are formed, each containing a Rep. For an open component, furthermore, the end pointer of every directed path points to one of the Rep's. For example, Fig. 4(d) shows the directed cycles and paths formed from the components of Fig. 4(c). Then, by performing $\lg (K/2)$ rounds of parallel pointer jumping, each edge finds the Rep belonging to the same directed cycle or path. Finally, each edge is colored by comparing the value of the Rep it found with that found by its neighbor: if the value of the Rep found by an edge is no larger than its neighbor's, the edge is colored red; otherwise it is colored blue. Fig. 4(b) shows a balanced 2-coloring of the I/O mapping graph whose components are shown in Fig. 4(c).
Proof. Given an I/O mapping graph $G(N, K, g)$, Step 1 can be done in $O(1)$ time using a completely connected multiprocessor system of $N$ PEs. In Step 2, since the length of each directed cycle or path is at most $K/2$, each edge can find a Rep by $\lg (K/2)$ rounds of pointer jumping. Clearly, all edges in the same directed cycle or path are colored with the same color since they find the same Rep. The pointer initialization implies that each edge and its neighboring edge are in different directed cycles or paths, and thus they have different colors. By the definition of left/right edge, there are no more than $g/2$ pairs of neighboring edges incident at any vertex of the graph. Thus, the colorings of all components compose a balanced 2-coloring of the graph.
Therefore, a balanced 2-coloring of any $G(N, K, g)$ can be found in $O(\lg K)$ time.
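The pointer jumping used in Step 2 can be illustrated by a sequential simulation. The sketch below is not the full algorithm: it handles only directed paths whose last element points to itself (directed cycles and the choice of Rep's are left out), and the array `nxt` and the function name `find_path_ends` are assumptions made for the example. In each round every element replaces its pointer by its pointer's pointer, so the distance covered doubles and a logarithmic number of rounds suffices.

```python
def find_path_ends(nxt: list[int]) -> list[int]:
    """For each element, return the last element of the directed path it lies on.

    nxt[i] is the successor of element i; the last element of a path points
    to itself. Each loop iteration simulates one synchronous parallel round.
    """
    ptr = list(nxt)
    # ceil(lg n) rounds are enough, because a path has at most n - 1 hops.
    rounds = (len(ptr) - 1).bit_length()
    for _ in range(rounds):
        # All elements read the old pointers, then update at once.
        ptr = [ptr[ptr[i]] for i in range(len(ptr))]
    return ptr


# Two paths: 0 -> 1 -> 2 -> 3 and 4 -> 5.
print(find_path_ends([1, 2, 3, 3, 5, 5]))
```

In a parallel setting each list position would be handled by its own PE, and one loop iteration corresponds to one communication step.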
Algorithm for $g$-Edge Coloring of $G(N, K, g)$
Based on the balanced 2-coloring algorithm, a weak edge coloring of any I/O mapping graph $G(N, K, g)$ with no more than $g$ colors can be found as follows. Initially, we remove the colors of the $K'$ already colored edges. In the initial step (i.e., step 0), we find a balanced 2-coloring of $G(N, K, g)$ using colors $0$ and $1$, and let $G_0$ and $G_1$ be the graphs induced by the edges with colors $0$ and $1$, respectively. In step 1, if the degree of $G_0$ and/or $G_1$ is no less than 2, we find a balanced 2-coloring for $G_0$ using colors $00$ and $01$, and/or for $G_1$ using colors $10$ and $11$.
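A sequential sketch of this recursive halving is given below. It rests on several assumptions: $g$ is a power of two, every vertex has degree at most $g$, edges are given as (input group, output group) pairs, and the balanced 2-coloring is computed by pairing the edges at each vertex and alternating colors along the resulting chains rather than by the parallel pointer-jumping procedure. All function names are hypothetical. Colors are returned as integers whose binary representations correspond to the color strings 00, 01, 10, 11, and so on.

```python
from collections import defaultdict


def balanced_two_coloring(edges):
    """Give each edge color 0 or 1 so that every vertex of degree d sees
    at most ceil(d / 2) edges of each color."""
    # partner[e] = [edge paired with e at its left vertex, same at its right vertex]
    partner = [[None, None] for _ in edges]
    for side in (0, 1):
        incident = defaultdict(list)
        for e, uv in enumerate(edges):
            incident[uv[side]].append(e)
        for group in incident.values():
            for a, b in zip(group[0::2], group[1::2]):
                partner[a][side], partner[b][side] = b, a
    # The partner links form paths and even-length cycles; alternate colors.
    color = [None] * len(edges)
    for start in range(len(edges)):
        if color[start] is not None:
            continue
        color[start] = 0
        stack = [start]
        while stack:
            e = stack.pop()
            for nb in partner[e]:
                if nb is not None and color[nb] is None:
                    color[nb] = 1 - color[e]
                    stack.append(nb)
    return color


def edge_color(edges, g):
    """Color a bipartite multigraph of maximum degree <= g (g a power of two)
    with colors 0 .. g-1 by repeated balanced 2-coloring."""
    colors = [0] * len(edges)

    def split(idx, width, base):
        if width == 1:  # remaining edges form a matching: one color suffices
            for e in idx:
                colors[e] = base
            return
        halves = balanced_two_coloring([edges[e] for e in idx])
        for c in (0, 1):
            part = [e for e, h in zip(idx, halves) if h == c]
            split(part, width // 2, base + c * (width // 2))

    split(list(range(len(edges))), g, 0)
    return colors


def is_proper(edges, colors):
    """Check that no two edges sharing a vertex have the same color."""
    seen = set()
    for (u, v), c in zip(edges, colors):
        if ("L", u, c) in seen or ("R", v, c) in seen:
            return False
        seen.update({("L", u, c), ("R", v, c)})
    return True


example = [(0, 0), (0, 1), (0, 0), (0, 1), (1, 0), (1, 1), (1, 0), (1, 1)]
print(is_proper(example, edge_color(example, 4)))
```

Each level of the recursion halves the maximum degree, so after $\lg g$ levels every remaining edge set is a matching and can take a single color.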
Parallel Routing in a Plane
We have shown how to assign a plane to each connection in an RNB $B(N, x, p, \alpha)$. In this section, we show how connections are routed within each plane.
Lemma 6
Routing for Strictly Nonblocking Networks
Strict Nonblockingness
The following lemma can be easily derived from the results of [18]. For an SNB network, we can set up new connections (as long as these connections form an I/O mapping from idle inputs to idle outputs) without disturbing the existing ones; however, this routing problem is by no means simpler than that in an RNB network when the new connections must be set up simultaneously. In this section, we present a parallel algorithm based on graph coloring to speed up routing.
Based on the discussions in Section 3, we know that the 
Algorithm for Strong $(2g-1)$-Edge Coloring of $G(N, K, g)$
A matching is defined as a set of edges that does not contain any adjacent edges. Conceptually, a strong $(2g-1)$-edge coloring of $G(N, K, g)$ with $K'$ ($\le K$) colored edges can be done in the following two steps.
Step 1: find a set of matchings in $G(N, K - K', g)$;
Step 2: color matchings one by one without changing the existing colors.
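The reason $2g-1$ colors are enough can be seen from a simple sequential greedy procedure, sketched below in Python. This is a substitute for illustration and not the matching-based parallel procedure outlined in the two steps: it assumes every vertex has degree at most $g$, represents edges as (input group, output group) pairs, and marks uncolored edges with `None`. The function name `extend_coloring` is hypothetical. An uncolored edge is adjacent to at most $g-1$ other edges at each endpoint, so at most $2g-2$ colors are forbidden and one of the $2g-1$ colors is always free.

```python
from collections import defaultdict


def extend_coloring(edges, colors, g):
    """Color the uncolored edges (color None) with colors 0 .. 2g-2
    without changing any existing color."""
    result = list(colors)
    used_left, used_right = defaultdict(set), defaultdict(set)
    # Record the colors already present at every vertex.
    for (u, v), c in zip(edges, result):
        if c is not None:
            used_left[u].add(c)
            used_right[v].add(c)
    # Greedily give each new edge the smallest color free at both endpoints.
    for e, (u, v) in enumerate(edges):
        if result[e] is None:
            c = next(k for k in range(2 * g - 1)
                     if k not in used_left[u] and k not in used_right[v])
            result[e] = c
            used_left[u].add(c)
            used_right[v].add(c)
    return result


# One existing connection with color 0 and three new connections, g = 2.
print(extend_coloring([(0, 0), (0, 1), (1, 0), (1, 1)], [0, None, None, None], 2))
```

Interpreting colors as planes, existing connections keep their planes while each new connection is assigned a plane that is free at both of its groups.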
It is easy to see that the edges with the same color compose a matching.
Proof. In $G(N, K, p)$, we assume that the edges corresponding to the existing connections in the $i$-th plane of $B(N, 0, p, \alpha)$ have been colored with color $i$ and that the edges corresponding to the new connections have not been colored yet. By Theorem 5, we can find a strong $(2p-1)$-edge coloring of $G(N, K, p)$ in $O(\lg p \lg K + p \lg p)$ time using a completely connected multiprocessor system of $N$ PEs. We assign each new connection with color $i$ to the $i$-th plane of $B(N, 0, p, \alpha)$. By Lemma 3, these new connections can be set up by self-routing in $O(\lg N)$ time.
$\Box$
By Lemma 7, we can derive the minimum number of planes, $p_{\min}$, in $B(N, 0, p, \alpha)$.
Compared with $B(N, 0, p_{\min}, \alpha)$, the hardware redundancy of $B(N, 0, p^*, \alpha)$ is shown as follows. The hardware cost of $B(N, 0, p^*, \alpha)$, in terms of the number of SEs, is higher than that of $B(N, 0, p_{\min}, \alpha)$ in half of the cases, but both have the same hardware complexity of $\Theta(N^{1.5} \lg N)$. The routing time for setting up $O(N)$ connections, however, is improved to sublinear $O(\sqrt{N} \lg N)$ from $\Omega(N \lg N)$.
Concluding Remarks
One major contribution of this paper is the design and analysis of parallel routing algorithms for a class of nonblocking switching networks, the $B(N, x, p, \alpha)$'s. Although the assumed parallel machine model is a completely connected multiprocessor system of $N$ PEs, the proposed algorithms can be transformed into algorithms for more realistic parallel computing models. The pointer jumping and binary searching, which dominate the complexity of the proposed algorithms, can be reduced to sorting on realistic parallel computing structures. It is interesting to note that sorting can be implemented in a Banyan-type network in $O(\lg^2 N)$ time [10]. Thus the proposed algorithms can set up connections in $B(N, x, p, \alpha)$ with a slow-down factor of $O(\lg^2 N)$ on a Banyan-type network, whose complexity is no larger than that of one plane of $B(N, x, p, \alpha)$.
