We consider a board-level routing problem applicable to FPGA-based logic emulation systems such as the Realizer System [Varghese et al. 1993] and the Enterprise Emulation System [Maliniak 1992] manufactured by Quickturn Design Systems. Optimal algorithms have been proposed for the case where all nets are two-terminal nets [Chan and Schlag 1993; Mak and Wong 1995]. We show how multiterminal nets can be handled by decomposition into two-terminal nets. We show that the multiterminal net decomposition problem can be modeled as a bounded-degree hypergraph-to-graph transformation problem where hyperedges are transformed to spanning trees. A network flow-based algorithm that solves both problems is proposed. It determines if there is a feasible decomposition and gives one whenever such a decomposition exists.
INTRODUCTION
Logic simulation is indispensable for the verification of digital system designs. Recently, several logic emulation systems [Varghese et al. 1993; Slimane-Kadi et al. 1994; Maliniak 1992; Walters 1991; Yamada et al. 1994] have been developed that use a set of interconnected Field-Programmable Gate Arrays (FPGAs) [Brown et al. 1992; Trimberger 1994] to prototype large digital logic designs. These systems can emulate complex digital system designs several orders of magnitude faster than software simulators. As a result, FPGA-based logic emulators can verify large designs that are otherwise not verifiable by software simulators.
For logic emulation, we first partition a large design into parts, each of which can fit inside a single FPGA on the logic emulator [Chou et al. 1994; Yang and Wong 1995]. Then, board-level routing is performed to connect signals between the FPGA chips. We call this the board-level routing problem (BLRP).
In logic emulators such as the Realizer system [Varghese et al. 1993] and the Enterprise Emulation system [Maliniak 1992], the set of FPGAs that implement the logic are interconnected by a set of small full crossbars. Here, we address the problem of board-level routing applicable to the logic emulation systems that use small full crossbars for interconnection.
The interconnection crossbars only connect to the FPGAs but not to each other. The I/O pins of each FPGA are evenly divided into proper subsets, using the same division on each one. The pins of each crossbar are connected to the same subset of pins from each FPGA. Thus crossbar S is connected to subset S of each FPGA's pins (Figure 1). As many crossbars are used as there are subsets in an FPGA, and each crossbar has as many pins as the number of pins in a subset times the number of FPGA chips. An interchip net can be connected via crossbar S if its net pins in different FPGA chips are all assigned to I/O pins in subset S. A net pin can be assigned to any subset type of I/O pins. In the rest of the paper, all "net(s)" should be understood to be "interchip net(s)," since here we are only interested in routing interchip nets.

Optimal algorithms for board-level routing, when all nets are two-terminal nets and the I/O-pin subset size is even, were proposed independently by Chan and Schlag [1993] and Mak and Wong [1995]. The algorithms connect each net via a crossbar by assigning both its net pins to I/O pins of the same subset type. One hundred percent routing completion is guaranteed. However, we also showed in Mak and Wong [1995] that the problem of assigning net pins to I/O pins, so that all net pins of the same net are assigned to I/O pins of the same subset type, is NP-complete in the presence of multiterminal nets. Here, we propose a way to relax the constraint to allow net pins of the same multiterminal net to be assigned to I/O pins of different subset types, and to complete the connection of the net by connecting some net pins to more than one I/O pin on an FPGA.
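The crossbar arrangement described in this section reduces to simple arithmetic. The following Python sketch computes the number of crossbars and the pins per crossbar, and checks whether a net can be routed through a single crossbar. The names `CrossbarLayout`, `crossbar_layout`, and `routable_via_single_crossbar`, as well as the subset size of 8 in the usage note, are illustrative assumptions and are not taken from the cited systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossbarLayout:
    num_crossbars: int      # one crossbar per I/O-pin subset
    pins_per_crossbar: int  # subset size times the number of FPGA chips


def crossbar_layout(num_fpgas, io_pins_per_fpga, subset_size):
    """Size the crossbars for FPGAs whose I/O pins are split evenly into subsets."""
    if io_pins_per_fpga % subset_size != 0:
        raise ValueError("I/O pins must divide evenly into subsets")
    return CrossbarLayout(
        num_crossbars=io_pins_per_fpga // subset_size,
        pins_per_crossbar=subset_size * num_fpgas,
    )


def routable_via_single_crossbar(pin_subsets):
    """pin_subsets maps chip index -> subset index used by the net pin on that chip.

    The net can go through one crossbar only if every net pin uses the same subset.
    """
    return len(set(pin_subsets.values())) == 1
```

For instance, with 16 FPGA chips of 128 I/O pins each and a hypothetical subset size of 8, `crossbar_layout(16, 128, 8)` gives 16 crossbars of 128 pins each.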
In Section 3, we will present a network flow-based algorithm to decompose multiterminal nets into sets of two-terminal nets when there are some extra I/O pins available, so that the two-terminal net BLRP algorithm can be applied. But first, in Section 2, we introduce a hypergraph-to-graph transformation problem closely related to our multiterminal net decomposition problem.
BOUNDED-DEGREE HYPERGRAPH-TO-GRAPH TRANSFORMATION
We are interested in the problem of transforming a hypergraph to a graph by modeling each hyperedge as a spanning tree, so that the degree of each vertex $v$ in the resultant graph does not exceed some given bound $\delta_v$. We call this the bounded-degree hypergraph-to-graph transformation problem. Figure 2 shows a transformation of a hypergraph to a graph where the degrees of all vertices are bounded by 3. Each hyperedge is transformed to a spanning tree that connects all the vertices in the hyperedge. In general, the degree bound $\delta_v$ can be different for different vertices $v$.
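To model a hyperedge of $p$ ($\ge 2$) vertices as a spanning tree that connects the $p$ vertices, the sum of the degrees of the vertices in the spanning tree must clearly be $2(p - 1)$ and the degree of each vertex must be at least one. On the other hand, we will prove that given any vector $d = (d_1, \ldots, d_p) \in \mathbb{N}^p$ that satisfies these two conditions, there is a spanning tree on the $p$ vertices whose vertex degrees are exactly the elements of $d$. We call such a vector a valid degree specification vector.

LEMMA 1. A vector $d = (d_1, \ldots, d_p) \in \mathbb{N}^p$ with $p \ge 2$ is a valid degree specification vector if and only if (i) $\sum_{i=1}^{p} d_i = 2(p - 1)$, and (ii) $d_i \ge 1$ for $i = 1, \ldots, p$.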
PROOF. It is clear that the condition is a necessary one, since any spanning tree on a set of $p$ vertices has exactly $(p - 1)$ edges that connect all the vertices. We will prove that it is sufficient by induction on the vector dimension $p$.

Fig. 2. Hypergraph-to-graph transformation; (a) a hypergraph with three hyperedges: $e_1$, $e_2$, and $e_3$; (b) a graph formed by combining three spanning trees corresponding to the three hyperedges in part (a).

Board-Level Multiterminal Net Routing
Base case. When $p = 2$, by conditions (i) and (ii) both $d_1$ and $d_2$ must be equal to 1. Clearly, a spanning tree with a single edge can be constructed which satisfies the degree specification vector $d = (1, 1)$.
Induction step. Assume the lemma holds for $p = r$ where $r \ge 2$. Let $d = (d_1, \ldots, d_{r+1}) \in \mathbb{N}^{r+1}$ be a vector that satisfies conditions (i) and (ii). We want to prove that $d$ is a valid degree specification vector. Since $\sum_{i=1}^{r+1} d_i = 2r < 2(r + 1)$, some element of $d$ equals 1, and since $2r > r + 1$, some element is at least 2. Without loss of generality, we may assume $d_1 = 1$ and $d_k \ge 2$ for some $k$. Let $d' = (d_2, \ldots, d_{k-1}, d_k - 1, d_{k+1}, \ldots, d_{r+1})$. The sum of the elements of $d'$ is $\sum_{i=1}^{r+1} d_i - 2$, which is equal to $2(r - 1)$ since $d$ satisfies (i). Since $d$ satisfies (ii) and $d_k \ge 2$, we have $d'_i \ge 1$ for $i = 1, \ldots, r$. Hence $d' \in \mathbb{N}^r$ satisfies (i) and (ii), and is a valid degree specification vector by the induction hypothesis. Thus there exists a spanning tree $T'$ of $r$ vertices whose degrees are specified by $d'$. Let $u$ be the vertex in $T'$ whose degree is $d'_{k-1}$ ($= d_k - 1$). If we add a new vertex to $T'$ and connect it to $u$, we get a spanning tree of $r + 1$ vertices whose degrees are exactly the $r + 1$ elements of $d$. Hence $d$ is a valid degree specification vector. □

Note that there may be more than one spanning tree satisfying a given valid degree specification vector. For example, Figure 3 shows two spanning trees for the degree specification vector (1, 1, 1, 1, 4, 2, 2).
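The induction in the proof is constructive: peel off a degree-1 vertex, attach it to a vertex that still needs at least two neighbors, and recurse on the shorter vector. The following Python sketch unrolls that recursion into a loop. The function name `tree_from_degrees` and the 0-based vertex indexing are assumptions made for illustration; the sketch returns just one of the possibly many trees that realize a given vector.

```python
def tree_from_degrees(d):
    """Return the edges of one spanning tree whose vertex i has degree d[i].

    Vertices are numbered 0 .. len(d) - 1. Raises ValueError when d violates
    condition (i) (sum equals 2(p - 1)) or condition (ii) (every entry >= 1).
    """
    p = len(d)
    if p < 2 or any(x < 1 for x in d) or sum(d) != 2 * (p - 1):
        raise ValueError("not a valid degree specification vector")

    remaining = list(d)          # degree each vertex still has to receive
    alive = list(range(p))       # vertices not yet peeled off
    edges = []
    while len(alive) > 2:
        # Conditions (i) and (ii) on the remaining vector guarantee both exist.
        leaf = next(v for v in alive if remaining[v] == 1)
        hub = next(v for v in alive if remaining[v] >= 2)
        edges.append((leaf, hub))
        alive.remove(leaf)
        remaining[hub] -= 1      # the shorter vector again satisfies (i), (ii)
    # Base case p = 2: both remaining degrees are 1, so join the last two.
    edges.append((alive[0], alive[1]))
    return edges
```

Calling `tree_from_degrees([1, 1, 1, 1, 4, 2, 2])` on the vector of Figure 3 yields six edges that realize exactly those degrees, though not necessarily either of the two trees drawn in the figure.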
We present a simple algorithm for constructing a spanning tree of the smallest height from a set of vertices, given the degree specification of each vertex and a particular vertex as the root. (We will see in the next section that it is advantageous to obtain a minimum height spanning tree to minimize the delay of a net.)
Minimum Height Spanning Tree Construction Algorithm
Input: A set of $n$ vertices and their degrees, and a vertex $r$ chosen to be the root of the spanning tree.
Output: A minimum height spanning tree of the $n$ vertices with $r$ as the root.
Order the vertices so that $v_0 = r$ and $\mathrm{degree}(v_1) \ge \mathrm{degree}(v_2) \ge \cdots \ge \mathrm{degree}(v_{n-1})$.
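One way to realize a minimum height construction of this kind is sketched below in Python. It is an assumption-laden illustration rather than the authors' exact procedure: after placing the root, it sorts the other vertices by nonincreasing degree and fills the tree level by level, giving the root `degree(r)` children and every other vertex `degree(v) - 1` children. The names `min_height_tree`, `degrees`, and `parent` are hypothetical.

```python
from collections import deque


def min_height_tree(degrees, root):
    """Build a rooted tree level by level, highest-degree vertices first.

    degrees: dict mapping vertex -> required degree (a valid specification).
    Returns (parent, height), where parent maps each vertex to its parent
    and the root maps to None.
    """
    n = len(degrees)
    if n < 2 or min(degrees.values()) < 1 or sum(degrees.values()) != 2 * (n - 1):
        raise ValueError("not a valid degree specification")

    # Non-root vertices in nonincreasing order of degree (stable sort).
    others = sorted((v for v in degrees if v != root), key=lambda v: -degrees[v])

    parent = {root: None}
    depth = {root: 0}
    queue = deque([root])
    nxt = 0  # index of the next vertex waiting to be attached
    while queue:
        u = queue.popleft()
        # One edge of every non-root vertex is already used by its parent.
        slots = degrees[u] if u == root else degrees[u] - 1
        for _ in range(slots):
            v = others[nxt]
            nxt += 1
            parent[v] = u
            depth[v] = depth[u] + 1
            queue.append(v)
    return parent, max(depth.values())
```

With degrees 1, 2, 2, 1 for vertices 1 to 4 and vertex 2 as the root, the sketch attaches vertices 3 and 1 to the root and vertex 4 to vertex 3, giving a tree of height 2.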
The correctness of the above algorithm follows from the following lemma.
LEMMA 2. Given the degree specification for each vertex in a tree and the tree root $r$, the tree height is minimized when the degree of any vertex at depth $k$ from the root $r$ is no less than those of the vertices at depth $k + 1$, for all $k \ge 1$.
PROOF. Let $T$ be a minimum height tree with root $r$ satisfying the degree specification and let the degree of each vertex $w$ be equal to $d_w$. Suppose there exist vertices $u$ at depth $k$, and $v$ at depth $k + 1$ in $T$ such that $d_u < d_v$ for some $k \ge 1$. We will show how to obtain another tree with the same root, the same degree for each vertex, and the same tree height but with the positions of $u$ and $v$ swapped.
There are two cases: either $v$ is a child of $u$ (see Figure 4(a)) or $v$ is not a child of $u$ (see Figure 5). Note that in both cases the tree height will be decreased by 1 or remain unchanged after the transformation. Since $T$ is already of the minimum possible height, the resultant tree must also have the minimum height.
Using the same argument repeatedly, after a finite number of steps we get a minimum height tree such that the degree of any vertex at depth $k$ is no less than those of the vertices at depth $k + 1$ for all $k \ge 1$, and the desired result follows. □

Next, we describe the algorithm for the bounded-degree hypergraph-to-graph transformation problem.
Suppose we are to transform a hypergraph $H = (V, E)$ to a graph $G$, given the degree bound $\delta_v$ of each vertex $v$ in $V$. We construct a flow network $W = (\mathcal{N}, \mathcal{A})$ as follows. The node set $\mathcal{N}$ is $\{e_1, e_2, \ldots, e_{|E|}, v_1, v_2, \ldots, v_{|V|}, s, t\}$ where node $e_i$ corresponds to hyperedge $e_i$ in $E$ ($i = 1, \ldots, |E|$), node $v_j$ corresponds to vertex $v_j$ in $V$ ($j = 1, \ldots, |V|$), node $s$ is the source, and node $t$ is the sink. For every hyperedge $e_i$, if it connects $p$ vertices then there is an arc from node $s$ to node $e_i$ with capacity $c(s, e_i) = p - 2$, and for every vertex $v_j$ connected by hyperedge $e_i$, there is an arc from node $e_i$ to node $v_j$ with capacity $c(e_i, v_j) = p - 2$. For every vertex $v_j$ in $V$, there is an arc from node $v_j$ to node $t$ with capacity $c(v_j, t) = \delta_{v_j} - |\{e_i \in E : v_j \in e_i\}|$, that is, the degree bound of $v_j$ minus the number of hyperedges that connect $v_j$. For example, to transform the hypergraph in Figure 2(a) to a graph where the degree of each vertex is bounded by 3, the network shown in Figure 6 is constructed.
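The network construction can be written down directly as a table of arc capacities. The Python sketch below is a minimal illustration: the function name `build_flow_network`, the node encoding (`"s"`, `"t"`, `("e", name)`, `("v", name)`), and the example instance are assumptions. The sink-arc capacity is taken here to be the degree bound minus the number of hyperedges containing the vertex, since every incident hyperedge consumes at least one unit of degree; treat that formula as an assumption of this sketch.

```python
def build_flow_network(hyperedges, degree_bound):
    """Return a dict mapping arcs (u, v) to integer capacities.

    hyperedges:   dict hyperedge name -> list of vertices it connects
    degree_bound: dict vertex -> maximum degree allowed in the final graph
    """
    cap = {}
    incident = {v: 0 for v in degree_bound}   # hyperedges touching each vertex
    for e, verts in hyperedges.items():
        extra = len(verts) - 2                # degree beyond one per vertex
        cap[("s", ("e", e))] = extra
        for v in verts:
            cap[(("e", e), ("v", v))] = extra
            incident[v] += 1
    for v, bound in degree_bound.items():
        slack = bound - incident[v]           # assumed capacity of arc (v, t)
        if slack < 0:
            raise ValueError(f"vertex {v!r} cannot even host its hyperedges")
        cap[(("v", v), "t")] = slack
    return cap


# Hypothetical instance: two hyperedges over the same four vertices.
example = build_flow_network(
    {"e1": [1, 2, 3, 4], "e2": [1, 2, 3, 4]},
    {1: 2, 2: 3, 3: 3, 4: 4},
)
```

In this made-up instance each source arc has capacity 2, and the sink arcs of vertices 1 to 4 have capacities 0, 1, 1, and 2.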
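To model each hyperedge as a spanning tree so that the total degree of each vertex $v$ in the resultant graph is bounded by $\delta_v$, we find an integral maximum flow from $s$ to $t$ in the constructed network [Bazaraa et al. 1990]. It is well known that if the capacities of all arcs in a network are integers, then there exists an integral maximum flow (i.e., the flow in each arc is an integer), and in this case a maximum flow algorithm such as the Ford and Fulkerson [1962] method always produces an integral maximum flow. In the following theorem, we show how a feasible transformation can be derived from an integral maximum flow solution.

THEOREM 1. Let $f(u, v)$ denote the flow from node $u$ to node $v$ in an integral maximum flow of $W$. If $f$ saturates arcs $(s, e_1), \ldots, (s, e_{|E|})$,¹ then for every hyperedge $e_i$, if $e_i$ connects $p_i$ vertices $v^i_1, \ldots, v^i_{p_i}$, the vector $(f(e_i, v^i_1) + 1, \ldots, f(e_i, v^i_{p_i}) + 1)$ is a valid degree specification vector for $e_i$, and the corresponding spanning trees give a feasible transformation.

PROOF. The vector satisfies conditions (i) and (ii) of Lemma 1: by flow conservation at node $e_i$, $\sum_{k} (f(e_i, v^i_k) + 1) = f(s, e_i) + p_i = 2(p_i - 1)$ (since $f(s, e_i) = p_i - 2$, the capacity of arc $(s, e_i)$ by assumption) and $f(e_i, v^i_k) + 1 \ge 1$ for all $k$. The degree of each vertex $v_j$ in the resultant graph is $f(v_j, t)$ plus the number of hyperedges that connect $v_j$, which does not exceed $\delta_{v_j}$ because of the capacity of arc $(v_j, t)$.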
¹ By the construction of $W$, if one maximum flow saturates arcs $(s, e_1), \ldots, (s, e_{|E|})$, then any other maximum flow will also saturate arcs $(s, e_1), \ldots, (s, e_{|E|})$.
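The flow-based feasibility test and the extraction of degree specification vectors can be sketched end to end in Python. This is an illustration under stated assumptions, not the authors' implementation: it uses the Edmonds-Karp (breadth-first) variant of the Ford-Fulkerson method, takes the sink-arc capacity to be the degree bound minus the number of incident hyperedges, and uses the hypothetical names `max_flow` and `degree_vectors`. A maximum flow is generally not unique, so the vectors returned may differ from those in any worked example while still being valid.

```python
from collections import defaultdict, deque


def max_flow(cap, s, t):
    """Edmonds-Karp maximum flow; returns the flow on each original arc."""
    residual = defaultdict(int)
    adj = defaultdict(set)
    for (u, v), c in cap.items():
        residual[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)                      # reverse arc in the residual graph
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:   # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break                          # no augmenting path remains
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[arc] for arc in path)
        for u, v in path:
            residual[(u, v)] -= push
            residual[(v, u)] += push
    # The network has no antiparallel arcs, so flow = capacity - residual.
    return {arc: c - residual[arc] for arc, c in cap.items()}


def degree_vectors(hyperedges, degree_bound):
    """Return {hyperedge: degree vector}, or None if no transformation exists."""
    cap, incident = {}, defaultdict(int)
    for e, verts in hyperedges.items():
        extra = len(verts) - 2
        cap[("s", ("e", e))] = extra
        for v in verts:
            cap[(("e", e), ("v", v))] = extra
            incident[v] += 1
    for v, bound in degree_bound.items():
        if bound < incident[v]:
            return None                    # bound too small for one unit each
        cap[(("v", v), "t")] = bound - incident[v]   # assumed sink capacity
    flow = max_flow(cap, "s", "t")
    if any(flow[("s", ("e", e))] != len(vs) - 2 for e, vs in hyperedges.items()):
        return None                        # some source arc is not saturated
    return {e: [flow[(("e", e), ("v", v))] + 1 for v in vs]
            for e, vs in hyperedges.items()}
```

Each returned vector sums to $2(p_i - 1)$ with all entries at least 1, so it can be handed to any spanning tree construction that accepts a valid degree specification.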
MULTITERMINAL NET DECOMPOSITION
We introduce a decomposition scheme for multiterminal nets where each multiterminal net is decomposed into a set of two-terminal nets based on the bounded-degree hypergraph-to-graph transformation. We want to decompose each $p$-terminal net (a hyperedge) into a set of $p - 1$ two-terminal nets (a spanning tree). We call the two-terminal nets originating from net $n$ the subnets of $n$. Consider the BLRP instance shown in Figure 7(a) where net 1 and net 2 are four-terminal nets, and there are extra I/O pins on FPGA chips 2, 3, and 4 (one on each of chip 2 and chip 3, and two on chip 4). We can transform this multiterminal net BLRP instance to a two-terminal net BLRP instance as shown in Figure 7(b). Net 1 is decomposed into three two-terminal nets, namely, subnets 1′, 1″, and 1‴. Similarly, net 2 is decomposed into three two-terminal nets, namely, subnets 2′, 2″, and 2‴. The underlying spanning trees of this decomposition of net 1 and net 2 are shown in Figure 11 where vertex $v_i$ corresponds to chip $i$.
To see why this decomposition is useful, we first apply an optimal algorithm for two-terminal net BLRP [Chan and Schlag 1993; Mak and Wong 1995] to determine a feasible assignment of the subnets and the original two-terminal nets to the I/O-pin subset types on the FPGA chips. A solution is shown in Figure 8. Note that it is not necessary to have all subnets of the same multiterminal net assigned to the same pin subset type. In Figure 8, subnets 1′ and 1‴ are assigned to pin subset A while subnet 1″ is assigned to pin subset B. To connect net 1, we can connect its net pin in chip 1 to the I/O pin assigned to subnet 1′, connect its net pin in chip 2 to the two I/O pins that are assigned to subnets 1′ and 1″, connect its net pin in chip 3 to the two I/O pins that are assigned to subnets 1″ and 1‴, and connect its net pin in chip 4 to the I/O pin assigned to subnet 1‴. In a similar way, we can connect net 2. Thus, when connecting a multiterminal net, we take advantage of the fact that a net pin inside a chip can be connected to more than one I/O pin on the chip.
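The wiring rule in this paragraph, where each net pin is tied to one I/O pin per incident subnet, can be tabulated mechanically. The Python sketch below uses the net 1 data quoted above (subnets 1′ and 1‴ in subset A, subnet 1″ in subset B); the function name `pins_per_chip` and the ASCII subnet labels are illustrative assumptions.

```python
from collections import defaultdict


def pins_per_chip(subnets, subset_of):
    """List, for each chip, the (subnet, subset) I/O pins its net pin must reach.

    subnets:   dict subnet name -> (chip_a, chip_b), the two chips it joins
    subset_of: dict subnet name -> I/O-pin subset the subnet was assigned to
    """
    wiring = defaultdict(list)
    for name, (a, b) in subnets.items():
        # A two-terminal subnet occupies one I/O pin on each of its two chips.
        wiring[a].append((name, subset_of[name]))
        wiring[b].append((name, subset_of[name]))
    return dict(wiring)


# Net 1 of the running example, with primes written as apostrophes.
net1_subnets = {"1'": (1, 2), "1''": (2, 3), "1'''": (3, 4)}
net1_subsets = {"1'": "A", "1''": "B", "1'''": "A"}
net1_wiring = pins_per_chip(net1_subnets, net1_subsets)
```

Here chips 1 and 4 each use one I/O pin for net 1, while chips 2 and 3 each use two, one in subset A and one in subset B.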
If we view the set of FPGA chips in a BLRP instance as a set of vertices, and each net as a hyperedge connecting the vertices (see Figure 9), then the decomposition of multiterminal nets into two-terminal subnets is equivalent to a bounded-degree hypergraph-to-graph transformation. Since subnets of the same multiterminal net in the same chip (vertex) will be assigned to distinct I/O pins, the degree bound of each vertex in the resultant graph should be set to the number of I/O pins on the chip. The whole algorithm is given below.

3. Decompose each multiterminal net into subnets according to the corresponding hyperedge-to-spanning-tree transformation.
We noted in Section 2 that there may be more than one spanning tree satisfying a given valid degree specification vector. Decomposing a multiterminal net into subnets according to different hyperedge-to-spanning tree transformations has a significant effect on the delay of a signal travelling from the source to the sinks of the net.
So when decomposing a multiterminal net, in step 2(d) of the above algorithm, we should build a spanning tree of the minimum possible height with the vertex corresponding to the chip containing the source of the net as the root of the spanning tree, using the algorithm in Section 2. This can minimize the delay of the net in the final routing solution.
For example, to decompose the multiterminal nets in the BLRP instance shown in Figure 7(a), we first model it as the hypergraph in Figure 9 where hyperedge $e_i$ represents net $i$ and vertex $v_j$ represents chip $j$. Then, we construct the network in Figure 10(a). A maximum flow $f$ of the network is shown in Figure 10(b). Using flow $f$, the degree specification vector for hyperedge $e_1$ is $(f(e_1, v_1) + 1, f(e_1, v_2) + 1, f(e_1, v_3) + 1, f(e_1, v_4) + 1) = (1, 2, 2, 1)$ and the degree specification vector for hyperedge $e_2$ is $(f(e_2, v_1) + 1, f(e_2, v_2) + 1, f(e_2, v_3) + 1, f(e_2, v_4) + 1) = (1, 1, 1, 3)$. Suppose chip 2 and chip 1 are the sources of net 1 and net 2, respectively. Using the algorithm in Section 2, we transform hyperedge $e_1$ into a spanning tree $T_1$ of the minimum possible height that has vertex $v_2$ as the root where the degrees of vertices $v_1$, $v_2$, $v_3$, and $v_4$ are equal to 1, 2, 2, and 1, respectively. We likewise transform hyperedge $e_2$ into a spanning tree $T_2$ of the minimum possible height that has vertex $v_1$ as the root where the degrees of vertices $v_1$, $v_2$, $v_3$, and $v_4$ are equal to 1, 1, 1, and 3, respectively. Trees $T_1$ and $T_2$ are shown in Figure 11. According to the way $T_1$ is connected, net 1 is decomposed into subnets 1′, 1″, and 1‴ where subnet 1′ is shared by chip 1 and chip 2, subnet 1″ is shared by chip 2 and chip 3, and subnet 1‴ is shared by chip 3 and chip 4. Net 2 is decomposed into subnets 2′, 2″, and 2‴ according to $T_2$. Finally, we get the BLRP instance shown in Figure 7(b).

Fig. 11. Spanning trees to model hyperedges $e_1$ and $e_2$.
To complete the story, let us briefly mention how two-terminal nets can be routed. Routing of two-terminal nets can be completed using the optimal algorithm in Mak and Wong [1995]. It starts with an arbitrary assignment of the nets to I/O-pin subset types; that is, both terminals of a net are assigned to the same type of I/O pins. The assignment is infeasible if, in some chip, one I/O-pin subset has more net pins assigned to it than it has I/O pins while another subset still has room for more net pins. In that case we iteratively balance the number of nets assigned to such subsets. Because a netlist containing only two-terminal nets can be modeled by a graph rather than a hypergraph, the balancing can be done efficiently by Euler circuit computation on its subgraphs. The details of the algorithm can be found in Mak and Wong [1995].
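The feasibility condition that triggers the balancing step is easy to state in code. The Python sketch below only detects overloaded subsets for a given assignment; it does not implement the Euler circuit balancing of Mak and Wong [1995]. The function name `overloaded_subsets` and the data layout are assumptions made for illustration.

```python
from collections import Counter


def overloaded_subsets(nets, assignment, subset_size):
    """Return {(chip, subset): excess} for every overfull I/O-pin subset.

    nets:        dict net name -> (chip_a, chip_b) for two-terminal nets
    assignment:  dict net name -> subset type used by both of its terminals
    subset_size: number of I/O pins in each subset on each chip
    """
    load = Counter()
    for net, (a, b) in nets.items():
        # Both terminals of a two-terminal net use the same subset type.
        load[(a, assignment[net])] += 1
        load[(b, assignment[net])] += 1
    return {key: n - subset_size for key, n in load.items() if n > subset_size}
```

An empty result means the assignment is feasible; a nonempty one names the chip and subset pairs from which nets would have to be moved.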
The counterpart of Theorem 1 for multiterminal net decomposition is given below.
So we need to guarantee that
Therefore each chip can be up to $(x + rx + r^2 x + \cdots)\% = \frac{x}{1 - r}\% = (1 - 2/N) \cdot 100 / (1 + r/3 + r^2/6 + r^3/2)\,\%$ full, and it is still guaranteed that all the nets can be successfully decomposed. □
EXPERIMENTAL RESULTS
We implemented the network flow-based decomposition algorithm and used it to decompose the multiterminal nets of some randomly generated BLRP instances under different net type distributions. An architecture with 16 FPGA chips, each with 128 I/O pins was used for the tests. For each distribution, we experimentally determined how full the chips can be before the algorithm failed to give a decomposition of all nets.
The results are shown in Table I. An I/O limit of x% means that x is the largest integer such that complete decomposition was obtained for all 100 randomly generated BLRP instances, where the chips were no more than x% full. The results show that performing multiterminal net decomposition is a viable way to solve the BLRP.
CONCLUSIONS
In this paper we showed that by decomposing all the multiterminal nets to two-terminal nets using some extra I/O pins, any optimal two-terminal net BLRP algorithm can be applied to complete the routing. We also introduced an efficient network flow-based decomposition algorithm for the task. We showed how to minimize the delay of a decomposed net. The experimental results confirmed that the decomposition algorithm is a viable approach to solving the BLRP with multiterminal nets.
Our approach of allowing a net to be routed through multiple crossbars not only improves routability but also makes it possible to devise an efficient algorithm for routability analysis and a nontrivial condition that guarantees routability. Note that our approach is likely to degrade circuit performance. When a multiterminal net is decomposed into a set of two-terminal nets and these two-terminal nets are assigned to different crossbars, net delay would be increased and source-to-sink delays might become skewed. We have attempted to address the performance issue by generating a minimum-height decomposition tree for each net. Further improvement in circuit performance can be obtained by modifying two-terminal net routing algorithms [Chan and Schlag 1993; Mak and Wong 1995] to encourage two-terminal nets belonging to the same net to be assigned to the same crossbar.
