In this paper we present a hardware implementation of an algorithm for generating node disjoint routes in a Kautz network. Kautz networks are based on a family of digraphs described by W.H. Kautz [Kautz 68]. A Kautz network with in-degree and out-degree d has $N = d^k + d^{k-1}$ nodes (for any positive integers d and k). The diameter is at most k, and the degree is fixed and independent of the network size. Moreover, the network is fault-tolerant, its connectivity is d, and the mapping of standard computation graphs such as a linear array, a ring and a tree on a Kautz network is straightforward. The network has a simple routing mechanism, even when nodes or links are faulty. Imase et al. [Imase 86] showed the existence of d node disjoint paths between any pair of vertices. In Smit et al. [Smit 91] an algorithm is described that generates d node disjoint routes between two arbitrary nodes in the network. In this paper we present a simple and fast hardware implementation of this algorithm. It can be realized with standard components (Field Programmable Gate Arrays).
Introduction.
COMPAS (COMmunication in PArallel Systems) is a research project carried out at the University of Twente. The goals of this project are the design, realization and evaluation of interconnection networks for parallel systems. In this paper we consider regular interconnection networks for multi-computer systems. A multi-computer system is defined as a collection of linked node computers (abbreviated as nodes), in which the nodes communicate via message passing [Dally 87]. We assume that each node consists of a communication processor and one or more node processors with local memory (see fig. 1).
The choice of a suitable interconnection network topology is an important issue in parallel systems. As the number of processors increases, the demands imposed on the network become more stringent. There are several, sometimes conflicting, demands that have to be considered, such as:
-Diameter: The network should have a small diameter and thus a low average distance.
-Degree: The degree of a node should be independent of the actual size of the network.
Fig. 1: Architecture of a node.
-Fault tolerance: The network should be fault-tolerant. In a system consisting of a large number of nodes, the probability of a node or link failure cannot be neglected.
In this paper we will only consider directed Kautz graphs (Kautz digraphs). The undirected Kautz graphs can be obtained from the associated digraphs by removing the orientation and the parallel edges. (See Table 1.)
The degree of the graph is fixed and independent of N, so networks of arbitrarily large size can be built. Moreover, since the algorithm of section 3 finds d node disjoint routes of length at most k+2 between any pair of nodes, the performance degradation due to increased routing distances resulting from faults is fairly low. Another interesting property of the network is that it allows self-routing of messages, both when the network is fault-free and when some nodes or links are faulty. Self-routing refers to the ability to route messages from node to node using only the addresses of the source and destination (see section 3).
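The fixed degree and the node count can be checked on small instances. The Python sketch below assumes the usual definition of K(d,k), which is consistent with the shift rule of section 3: a node is a word of length k over the d+1 letters 0 to d in which no two adjacent letters are equal, and x has an arc to every word obtained by shifting out its first letter and shifting in a letter that differs from its last one. It builds the digraph and checks the in- and out-degree d, the node count $N = d^k + d^{k-1}$ and the diameter k by breadth-first search. The helper names are hypothetical, and letters are stored as single characters, so the sketch assumes d ≤ 9.

```python
from collections import Counter, deque
from itertools import product


def kautz_nodes(d: int, k: int) -> list[str]:
    """Words of length k over the letters 0..d with no two equal adjacent letters."""
    letters = [str(c) for c in range(d + 1)]
    return ["".join(w) for w in product(letters, repeat=k)
            if all(w[i] != w[i + 1] for i in range(k - 1))]


def out_neighbours(x: str, d: int) -> list[str]:
    """Shift out the first letter of x, shift in any letter that differs from its last one."""
    return [x[1:] + str(c) for c in range(d + 1) if str(c) != x[-1]]


def eccentricity(x: str, d: int) -> int:
    """Largest breadth-first-search distance from x to any reachable node."""
    dist = {x: 0}
    queue = deque([x])
    while queue:
        u = queue.popleft()
        for v in out_neighbours(u, d):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())


d, k = 2, 3
nodes = kautz_nodes(d, k)
# Count how often each node appears as an out-neighbour, i.e. its in-degree.
in_degree = Counter(v for x in nodes for v in out_neighbours(x, d))
print(len(nodes), d**k + d**(k - 1))            # 12 12
print(set(in_degree.values()))                  # {2}
print(max(eccentricity(x, d) for x in nodes))   # 3
```

Breadth-first search is used only because it is the simplest way to measure distances on such small graphs; it plays no role in the routing method itself.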
A Kautz digraph can emulate standard computation graphs. Fiol [Fiol 84] showed that Kautz digraphs are line digraphs of Eulerian digraphs, so a Kautz digraph contains a Hamilton cycle: an Euler circuit of the underlying digraph traverses every arc exactly once, and the arcs of a digraph are the nodes of its line digraph. This implies that a linear array and a ring can be mapped on it. Furthermore, a tree can be mapped on a K(d,k) digraph. Because the diameter of K(d,k) is at most k ($k < \log_d N$), a mesh can be embedded with dilation $O(\log_d N)$.
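The line-digraph argument can be made concrete. Assuming that nodes of K(d,k) are words of length k over the letters 0 to d with no two equal adjacent letters, the arc from u to v in K(d,k-1) corresponds to the node u followed by the last letter of v in K(d,k). The hypothetical sketch below, an illustration rather than the construction of [Fiol 84], finds an Euler circuit of K(d,k-1) with Hierholzer's algorithm and reads off a Hamilton cycle of K(d,k). It requires k ≥ 2 and d ≤ 9.

```python
from itertools import product


def kautz_nodes(d: int, k: int) -> list[str]:
    """Words of length k over the letters 0..d with no two equal adjacent letters."""
    letters = [str(c) for c in range(d + 1)]
    return ["".join(w) for w in product(letters, repeat=k)
            if all(w[i] != w[i + 1] for i in range(k - 1))]


def out_neighbours(x: str, d: int) -> list[str]:
    """Shift out the first letter of x, shift in any letter that differs from its last one."""
    return [x[1:] + str(c) for c in range(d + 1) if str(c) != x[-1]]


def euler_circuit(d: int, m: int) -> list[str]:
    """Closed walk through every arc of K(d, m) exactly once (Hierholzer's algorithm)."""
    unused = {u: out_neighbours(u, d) for u in kautz_nodes(d, m)}
    stack, circuit = [kautz_nodes(d, m)[0]], []
    while stack:
        u = stack[-1]
        if unused[u]:
            stack.append(unused[u].pop())   # follow an arc not used yet
        else:
            circuit.append(stack.pop())     # no arcs left here: vertex is finished
    return circuit[::-1]


def hamilton_cycle(d: int, k: int) -> list[str]:
    """Arc u -> v of K(d, k-1) becomes node u + v[-1] of K(d, k), its line digraph."""
    walk = euler_circuit(d, k - 1)
    return [u + v[-1] for u, v in zip(walk, walk[1:])]


cycle = hamilton_cycle(2, 3)
print(len(cycle), len(set(cycle)))   # 12 12
```

Reading the cycle in order gives a ring embedding with dilation 1; dropping one arc gives a linear array.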
2.3 Comparison with other graphs.
Note that for the de Bruijn and Kautz digraphs the out-degree and in-degree are half the degree mentioned in the table. Thus a Kautz digraph with in-degree and out-degree 4 and a diameter of 8 connects $4^8 + 4^7 = 81920$ nodes, which is significantly more than the $2^8 = 256$ nodes of a hypercube with the same degree (8) and diameter (8).
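The node counts quoted above follow from closed formulas: with in- and out-degree d and diameter k, a Kautz digraph has $d^k + d^{k-1}$ nodes and a de Bruijn digraph has $d^k$ nodes, while a hypercube of degree n has diameter n and $2^n$ nodes. The short Python sketch below, with illustrative function names, evaluates them for the case in the text.

```python
def kautz_size(d: int, k: int) -> int:
    """Nodes of a Kautz digraph with in/out-degree d and diameter k."""
    return d**k + d**(k - 1)


def de_bruijn_size(d: int, k: int) -> int:
    """Nodes of a de Bruijn digraph with in/out-degree d and diameter k."""
    return d**k


def hypercube_size(n: int) -> int:
    """Nodes of an n-dimensional hypercube (degree n, diameter n)."""
    return 2**n


# Degree 8 in the table's sense: in/out-degree 4 for the digraphs, dimension 8 for the cube.
print(kautz_size(4, 8), de_bruijn_size(4, 8), hypercube_size(8))  # 81920 65536 256
```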
3. Routing in a Kautz digraph.
Generic routing.
One of the interesting properties of a Kautz digraph is its straightforward routing. This follows immediately from the definition and is known as the self-routing property.
To find a route R(x,y) from $x = (a_1 \ldots a_k)$ to $y = (b_1 \ldots b_k)$, we first select one particular out-neighbour vertex $(a_2 \ldots a_k b_1)$ of x (provided $b_1 \neq a_k$). In other words, we find the first intermediate node $(a_2 \ldots a_k b_1)$ of the route R(x,y) from x to y by applying a "shift" operation on the letters of the source x, where $a_1$ is shifted out and $b_1$ is shifted in. We continue this process of shifting out source letters and shifting in destination letters until the source word is completely replaced by the destination word (see Table 2). In this way we can (by concatenation of the source and destination words) generate a straightforward generic route $R(x,y) = \langle a_1 \ldots a_k b_1 \ldots b_k \rangle$.
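The shift procedure above can be written down directly, assuming a node is stored as a string of k letters. The hypothetical function generic_route returns the nodes visited by the generic route: after i shifts the current node is the window of k consecutive letters starting at position i of the concatenation, so the route is simply the list of those windows.

```python
def generic_route(x: str, y: str) -> list[str]:
    """Nodes of the generic route <a_1 ... a_k b_1 ... b_k> from x to y."""
    if x[-1] == y[0]:
        raise ValueError("generic route needs b_1 != a_k")
    k = len(x)
    word = x + y          # concatenation of source and destination words
    # After i shifts, a_1..a_i have been shifted out and b_1..b_i shifted in.
    return [word[i:i + k] for i in range(k + 1)]


print(generic_route("120", "201"))   # ['120', '202', '020', '201']
```

Each node only needs its own address and the destination address to compute the next window, which is the self-routing property.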
Example:
Using the graph of figure 2 we find the following two routes from (120) to (201) with the Imase approach: $R_1$ = <1201> and $R_2$ = <12020201>. {$R_2$ contains a loop!} Our algorithm finds the following two routes: $R_3$ = <1201> and $R_4$ = <120201>. {$R_3$ and $R_4$ both have a length $\le k$.}
3.3 Informal description.
Our algorithm is based on the property that two generated routes that depart from the source node via two different out-nodes and arrive at the destination node via two different in-nodes are node disjoint.
Fig. 3: Routing strategy.
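The routes of the example can be checked mechanically, assuming that every window of k consecutive letters in a route string is a node on the route. The helpers below are hypothetical: they expand a route, detect a repeated node, and extract the out-node (the second node) and the in-node (the second-to-last node) on which the disjointness property relies. Checking the example illustrates the property; it does not prove it in general.

```python
def route_nodes(route: str, k: int) -> list[str]:
    """Every window of k consecutive letters of a route string is a node."""
    return [route[i:i + k] for i in range(len(route) - k + 1)]


def has_loop(route: str, k: int) -> bool:
    """True if some node occurs more than once on the route."""
    nodes = route_nodes(route, k)
    return len(nodes) != len(set(nodes))


def out_node(route: str, k: int) -> str:
    return route_nodes(route, k)[1]     # first node after the source


def in_node(route: str, k: int) -> str:
    return route_nodes(route, k)[-2]    # last node before the destination


def internally_disjoint(r1: str, r2: str, k: int) -> bool:
    """True if the routes share no node other than source and destination."""
    inner1 = set(route_nodes(r1, k)[1:-1])
    inner2 = set(route_nodes(r2, k)[1:-1])
    return not (inner1 & inner2)


k = 3
print(has_loop("12020201", k))                       # True
print(out_node("1201", k), out_node("120201", k))    # 201 202
print(in_node("1201", k), in_node("120201", k))      # 120 020
print(internally_disjoint("1201", "120201", k))      # True
```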
The algorithm finds all routes in three phases.
Phase A: routes with length $\le k$ ($R_I$).
Phase B: routes with length $k+1$ ($R_{II}$).
Phase C: routes with length $k+2$ ($R_{III}$).
We start by generating the routes $R_I$, then the routes $R_{II}$, and finally the routes $R_{III}$. Let $ON(x)$ denote the set of out-neighbour nodes (out-nodes) of x and $IN(x)$ the set of in-neighbour nodes (in-nodes). Let $ON_u(x)$ denote the set of out-nodes of x that have been used in a previous path and $IN_u(x)$ the set of used in-nodes. Initially $ON_u(x)$ and $IN_u(x)$ are empty.
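For a node stored as a string of k letters over 0 to d (an assumption of this sketch), both neighbour sets follow from the shift rule: an out-node shifts a new letter in at the end, an in-node shifts a new letter in at the front. The functions out_nodes and in_nodes are hypothetical stand-ins for $ON$ and $IN$, and the two empty sets model the initial $ON_u(x)$ and $IN_u(y)$.

```python
def out_nodes(x: str, d: int) -> set[str]:
    """ON(x): drop the first letter, append any letter different from the last letter of x."""
    return {x[1:] + str(c) for c in range(d + 1) if str(c) != x[-1]}


def in_nodes(y: str, d: int) -> set[str]:
    """IN(y): prepend any letter different from the first letter of y, drop the last letter."""
    return {str(c) + y[:-1] for c in range(d + 1) if str(c) != y[0]}


used_out = set()   # ON_u(x): out-nodes of the source already used, initially empty
used_in = set()    # IN_u(y): in-nodes of the destination already used, initially empty

print(sorted(out_nodes("021", 2)))   # ['210', '212']
print(sorted(in_nodes("201", 2)))    # ['020', '120']
```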
Phase A: Routes $R_I$.
There is a unique, ordered set of routes $R_I$ with length $\le k$. Only when the source and destination words overlap is there a possible candidate $R_i$ for the set $R_I$. For each route $R_i$ we test whether the out-neighbour of the route is in $ON_u(x)$ and whether the in-neighbour of the route is in $IN_u(y)$. If one of these neighbour nodes was used before, the route is rejected, because it is not node disjoint from an earlier route. Otherwise a new route is found and the neighbour nodes are added to $ON_u(x)$ and $IN_u(y)$, respectively. Searching naturally starts with an overlap of k-1 positions, which gives the shortest candidate route. The set of routes $R_I$ is uniquely defined by:
Note that if there is more than one route of length k+2, these routes are not uniquely defined.
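A software model of Phase A might look as follows, assuming nodes are stored as strings and the overlap search runs from k-1 letters down to 0. The name phase_a and its arguments are hypothetical; the two sets play the role of $ON_u(x)$ and $IN_u(y)$. An overlap of j letters gives the route x followed by the last k-j letters of y, a route of length k-j; with j = 0 the plain concatenation is valid only if the last letter of x differs from the first letter of y.

```python
def phase_a(x: str, y: str, used_out: set, used_in: set) -> list[str]:
    """Accepted routes of length <= k, tried from overlap k-1 down to overlap 0."""
    k = len(x)
    routes = []
    for j in range(k - 1, -1, -1):          # j = number of overlapping letters
        if j > 0 and x[k - j:] != y[:j]:
            continue                        # suffix of x does not match prefix of y
        if j == 0 and x[-1] == y[0]:
            continue                        # concatenation would repeat a letter
        route = x + y[j:]                   # route of length k - j
        out_n = route[1:1 + k]              # first node after x
        in_n = route[-k - 1:-1]             # last node before y
        if out_n in used_out or in_n in used_in:
            continue                        # shares a neighbour with an earlier route
        used_out.add(out_n)
        used_in.add(in_n)
        routes.append(route)
    return routes


print(phase_a("120", "201", set(), set()))   # ['1201', '120201']
```

For (120) to (201) in K(2,3) this yields the routes $R_3$ and $R_4$ of the earlier example.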
Example:
In figure 2, $R_e$ = <021201> is a route from (021) to (201) via the nodes (212) and (120): $on_{R_e}$ = (212); $in_{R_e}$ = (120); $ON(021)$ = {(212), (210)}.
4. Implementation.
The hardware implementation of the routing algorithm is also described in VHDL. A simplified schematic representation is given in figure 4. The datapath consists of a match block, two shift/increment parts and a shift block. The used in-nodes and used out-nodes parts keep track of the in-nodes and out-nodes that have already been used. All blocks are supervised by the control unit.
Footnote 1: |S| denotes the cardinality of set S.
There are external signals for initializing, for generating a new route and for status information (such as: a route has been found, or there are no more routes). Actually the last signal is not strictly necessary, since the number of routes follows from the degree d of the network. The route generator operates as follows:
1. In the initialization phase the source and destination are shifted into the match block, and the control registers (used in-node and used out-node) are set to their initial values.
2. The destination is shifted via the shift/increment parts to the shift block until an overlap between source and destination is found (i.e. all match units indicate a match). A new route is found only if an overlap is found and the corresponding in-node and out-node have not been used before (which is checked by the used in-node and used out-node units). This is signalled to the environment by the control unit. When the control unit receives the signal to generate the next route, the above process continues.
3. For routes with length k+1 the first shift/increment part generates all possible letters that can be inserted. If a letter has not been used before, which is checked in the used in-nodes and used out-nodes modules, a new route is found.
4. For routes with length k+2 both shift/increment parts generate these letters. When all out-nodes have been used, no more routes can be generated. This is also signalled to the environment.
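The four steps above can be mimicked in software. The Python generator below is a hypothetical behavioural model, not a transcription of the VHDL description: each yield corresponds to the control unit signalling a new route, and the two sets stand in for the used out-node and used in-node registers. That Phase B and Phase C insert one letter c, respectively two letters c1 c2, between source and destination is an assumption about what the shift/increment parts produce.

```python
from itertools import chain


def route_generator(x: str, y: str, d: int):
    """Yield node disjoint routes from x to y, one per request (behavioural sketch)."""
    k = len(x)
    letters = [str(c) for c in range(d + 1)]
    used_out, used_in = set(), set()   # models of the used out-node / in-node registers

    # Step 2 (Phase A): overlap of j letters, j = k-1 down to 0, route length k-j
    phase_a = (x + y[j:] for j in range(k - 1, -1, -1)
               if (j > 0 and x[k - j:] == y[:j]) or (j == 0 and x[-1] != y[0]))
    # Step 3 (Phase B): one inserted letter c, route length k+1
    phase_b = (x + c + y for c in letters if c != x[-1] and c != y[0])
    # Step 4 (Phase C): two inserted letters c1 c2, route length k+2
    phase_c = (x + c1 + c2 + y for c1 in letters for c2 in letters
               if c1 != x[-1] and c2 != c1 and c2 != y[0])

    for route in chain(phase_a, phase_b, phase_c):
        if len(used_out) == d:
            return                      # all out-nodes used: no more routes
        out_n = route[1:1 + k]          # node after x
        in_n = route[-k - 1:-1]         # node before y
        if out_n in used_out or in_n in used_in:
            continue                    # shares a neighbour with an earlier route
        used_out.add(out_n)
        used_in.add(in_n)
        yield route


print(list(route_generator("021", "201", 2)))   # ['021201', '0210201']
```

Using a generator keeps the request-driven behaviour of the hardware: the next candidate is only examined when the environment asks for another route.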
The VHDL description is designed hierarchically. The match block, for instance, consists of k blocks (match units), each of which compares one letter of a node word.
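A letter-level model of the match block could look as follows. It assumes that, when the destination has been shifted s positions, match unit i compares source letter i+s with destination letter i, and that units beyond the overlapping window are masked and report a match; these details, and the names match_units and overlap_length, are hypothetical rather than taken from the VHDL description.

```python
def match_units(x: str, y: str, shift: int) -> list[bool]:
    """Outputs of the k match units when y is shifted `shift` positions against x."""
    k = len(x)
    # Unit i compares x[shift + i] with y[i]; units past the end of x are masked (assumed).
    return [x[shift + i] == y[i] if shift + i < k else True for i in range(k)]


def overlap_length(x: str, y: str) -> int:
    """Largest j >= 1 with suffix(x, j) == prefix(y, j), or 0 if there is none."""
    for shift in range(1, len(x)):          # overlap k-1 first, then k-2, ...
        if all(match_units(x, y, shift)):
            return len(x) - shift
    return 0


print(match_units("120", "201", 1))   # [True, True, True]
print(overlap_length("120", "201"))   # 2
```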
Conclusion.
Kautz graphs form a class of interconnection networks with nice properties: the diameter is at most k (for $N = d^k + d^{k-1}$), the degree is independent of the network size, the network is fault-tolerant, it can embed standard computation graphs, and it has a simple routing algorithm. Kautz networks have the advantage of connecting considerably more processors for a given degree and diameter than other networks. Therefore they are an important candidate for interconnection networks in a new generation of massively parallel computer systems. We have presented an algorithm that finds d node disjoint paths between arbitrary nodes in the network. We have shown that a hardware implementation of this algorithm is simple and straightforward. The algorithm is described in VHDL. The hardware can be realized with standard components such as Field Programmable Gate Arrays. We aim at realizing the route generator, together with a dedicated communication processor, in one single FPGA.
