Abstract-An
Introduction
The well-known SRAM-based FPGA architecture [3, 5, 6] consists of a 2-D array of logic blocks (L-cells) separated by vertical and horizontal channels, each with W prefabricated wire segments (tracks) for routing, where W is called the channel density; see Figure 1. Each track within a channel is assigned an integer in {1, ..., W} as its track ID. There is a connection box (C-box) in the channel area between each pair of adjacent L-cells, and a switch box (S-box) at each intersection of a vertical and a horizontal channel. Both C-boxes and S-boxes contain programmable switches.
When an FPGA is used to realize a specified Boolean function, the pins used to realize the function are partitioned into groups (called nets). The pins in each net are then connected by using available wire segments and switches in both C-boxes and S-boxes. This process is referred to as routing. Conventionally, routing is divided into two successive steps, global routing and detailed routing, although there is no absolute need to do routing in these two phases. The global router specifies the connection topologies for all nets, while the detailed router decides the assignment of wire segments and switches used to realize the complete routing. As the connectivity within a C-box is complete, the routability of the entire chip depends only on the structure and connectivity of the S-boxes [1, 3, 4, 5, 8, 9, 11, 12, 13, 14]. It is clearly desirable to design switch boxes with maximum routability and the minimum number of switches.
The Universal Switch Module proposed in [5] is routable for all possible global routings surrounding an S-box. However, that model is restricted to 2-pin nets. In this paper, we propose a new view of global routings and a powerful graph model for the most general FPGA routing problems, covering multi-pin nets (including nets with ≥ 3 pins) and adaptable to optimum routing problems covering the entire chip.
In order to complete a detailed routing for an entire chip, a greedy routing architecture has been proposed in [11, 14]. The approach starts the detailed routing from a pre-specified S-box. Then the routings of adjacent C-boxes are determined and serve as the predetermined side(s) of other neighboring (and unmapped) S-boxes. This process is repeated (propagated) until the entire chip routing is done. Depending on the propagation order (e.g., either spiral or snake-like [11, 14]), the process can be decomposed into a sequence of h-side-predetermined, k-sided S-box design problems for k = 3, 4 and 0 ≤ h ≤ k. With this design scheme, we need to design an h-side-predetermined, k-sided S-box, where 2 ≤ k ≤ 4 and 0 ≤ h ≤ k, that can accommodate any global routing (called being hyper-universal) with the minimum number of switches. For simplicity, we call a 0-side-predetermined k-sided S-box a k-sided S-box. In this regard, the well-studied Xilinx-based S-box shown in [10] and the Universal Switch Box [5] belong to the 4-sided routing models.
In our graph model G for a k-sided S-box, the track with ID j on the i-th side is denoted by a vertex v_{i,j}, and each switch in the S-box is represented by an edge. The global routing specified for an S-box is represented by a collection of subsets (nets) of {1, 2, ..., k}. A detailed routing of a net is represented by a subtree of G whose vertices represent the pins on different sides. For example, a net of a global routing is represented by the set {1, 2, 3} if it connects three wire segments located on sides 1, 2, and 3, respectively. A detailed routing of this net is represented by a tree of three vertices with its vertices in {v_{1,j} | j = 1, ..., W}, {v_{2,j} | j = 1, ..., W} and {v_{3,j} | j = 1, ..., W}, respectively. Therefore, the switch box design problem becomes a k-partite graph design problem, that is, to design a k-partite graph with the minimum number of edges that can realize any global routing. This flexible mathematical model can also be generalized to h-side-predetermined, k-sided S-box design problems with k ≥ 5 and 0 ≤ h ≤ k, which is useful for potential routing problems involving multiple routing dimensions.
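As a rough illustration of this model, the following Python sketch (not taken from the paper) encodes a vertex v_{i,j} as the pair (i, j) and a switch as an unordered pair of vertices, and checks whether a candidate tree is a detailed routing of a net. The function names and the tiny 3-sided example graph EXAMPLE_SBOX are assumptions made purely for illustration.

```python
def is_k_partite(graph_edges):
    """True if every switch joins tracks on two different sides."""
    return all(len({side for side, _ in edge}) == 2 for edge in graph_edges)


def is_detailed_routing(net, vertices, tree_edges, graph_edges):
    """Check that (vertices, tree_edges) is a tree in the S-box graph
    with exactly one vertex on each side named by the net."""
    vertices = set(vertices)
    if not vertices:
        return False
    tree = {frozenset(e) for e in tree_edges}
    # Every tree edge must be a switch of the S-box and stay inside `vertices`.
    if not tree <= graph_edges or any(not e <= vertices for e in tree):
        return False
    # Exactly one vertex per side of the net.
    if sorted(side for side, _ in vertices) != sorted(net):
        return False
    # A connected graph with |V| - 1 edges is a tree.
    if len(tree) != len(vertices) - 1:
        return False
    seen, stack = set(), [next(iter(vertices))]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        for e in tree:
            if u in e:
                stack.extend(e - {u})
    return seen == vertices


# Hypothetical 3-sided S-box with W = 2 and three switches.
EXAMPLE_SBOX = {
    frozenset({(1, 1), (2, 1)}),
    frozenset({(2, 1), (3, 1)}),
    frozenset({(1, 2), (3, 2)}),
}
```

Here the net {1, 2, 3} is routed by the path v_{1,1}, v_{2,1}, v_{3,1}, while a 1-pin net is routed by a single vertex and no edges.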
Definitions and Problems
For graph terminology and notation, we refer to [2]. Let
..., k} appears d times. A localized global routing in practice may not be balanced, but we can always make it balanced by including some singletons (1-pin nets). An r-bounded global routing is a global routing in which the size of each net is at most r. The case r = 2 has been used as the target model in the design of universal switch modules [5].
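The balancing step can be sketched in a few lines of Python. This is an illustrative sketch only: it assumes, as the surrounding text suggests, that a routing is balanced when every side in {1, ..., k} appears in the same number d of nets, and it takes d to be the largest per-side count. The helper names are hypothetical.

```python
from collections import Counter


def side_counts(gr, k):
    """Number of nets that touch each side 1..k."""
    counts = Counter(side for net in gr for side in net)
    return [counts[side] for side in range(1, k + 1)]


def is_balanced(gr, k):
    """True if every side appears in the same number of nets."""
    return len(set(side_counts(gr, k))) == 1


def balance(gr, k):
    """Pad a routing with singletons until every side appears d times,
    where d is the largest per-side count (assumed to be the density)."""
    counts = side_counts(gr, k)
    d = max(counts)
    padded = [frozenset(net) for net in gr]
    for side, count in enumerate(counts, start=1):
        padded += [frozenset({side})] * (d - count)
    return padded


def is_r_bounded(gr, r):
    """True if no net has more than r pins."""
    return all(len(net) <= r for net in gr)
```

For instance, the 3-way routing {{1, 2}, {1, 3}} uses side 1 twice and sides 2 and 3 once each, so balancing adds the singletons {2} and {3}.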
There are two major advantages of representing k-way global routings as a collection of subsets of {1, ..., k}. One is that we can make use of the theory and methods of combinatorics; the other is that such a representation is in fact a hypergraph, and a BGR is a regular hypergraph. This hypergraph representation helps us gain valuable ideas and simplifies our presentation.
Let k ≥ 2 and W ≥ 1 be integers and V_i = {v_{i,j} | j = 1, ..., W} for i = 1, ..., k. A k-partite graph on (V_1, ..., V_k) is a graph with vertex set V_1 ∪ ... ∪ V_k in which each V_i is an independent set for i = 1, ..., k. We denote a k-partite graph on (V_1, ..., V_k) with edge
(1) T(N_i) is a tree of |N_i| vertices, and

We note that a k-partite graph G with W vertices in each part is hyper-universal if it contains a detailed routing for any primitive and balanced localized (k, W)-GR ((k, W)-PBGR). To see this, let R be a (k, d)-GR with d ≤ W. We first add some singletons to R to make a (k, W)-BGR; then, by combining some singletons, we obtain a (k, W)-PBGR, say R'. Find a detailed routing of R' in G. A detailed routing of R can be derived from the detailed routing of R' by simply deleting the edges of those one-edge trees that represent the nets of size two in R' obtained by combining distinct singletons. Therefore, to verify that a (k, W) S-box is hyper-universal, we only need to show that each (k, W)-PBGR is routable in the S-box.
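The singleton-combining step of this reduction can be sketched as follows. This is only a sketch under an assumption: the definition of "primitive" is not visible in this section, so the code assumes a balanced routing is primitive once no two singletons on different sides remain. The greedy pairing rule and the function name are hypothetical, and the result is not unique.

```python
from collections import Counter


def combine_singletons(bgr):
    """Merge singletons on different sides into 2-pin nets.

    Returns (new_routing, merged), where `merged` lists the 2-pin nets
    created here; their one-edge trees are the ones to delete afterwards
    in order to recover a detailed routing of the original routing.
    """
    singles = Counter(next(iter(net)) for net in bgr if len(net) == 1)
    others = [frozenset(net) for net in bgr if len(net) != 1]
    merged = []
    while len(singles) >= 2:
        # Pair the two sides that currently hold the most singletons.
        (a, _), (b, _) = singles.most_common(2)
        merged.append(frozenset({a, b}))
        for side in (a, b):
            singles[side] -= 1
            if singles[side] == 0:
                del singles[side]
    # Whatever is left consists of singletons on a single side.
    rest = [frozenset({s}) for s, c in singles.items() for _ in range(c)]
    return others + merged + rest, merged
```

Merging two singletons leaves the number of nets touching each side unchanged, so the routing stays balanced with the same density.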
Our approach to the S-box design problem depends on a powerful decomposition property of localized global routings. Note that f(k) is uniquely determined by k by Lemma 1, and it is equal to the maximum density of all minimal k-way BGRs. Let GR1 and GR2 be two localized k-way global routings and m a positive integer. We denote the disjoint union (as a multiset) of GR1 and GR2 by GR1 + GR2, and the union of m copies of GR1 by mGR1.
Example: Let GR = {{1,2}, {1,3}, {1,4}, {2,3}, {2,3}, {3,4}}, which represents the localized global routing in Figure 2-(a). GR is a localized (4,4)-GR which is not balanced. GR' = GR + {{1}, {2}, {4}, {4}} is a (4,4)-BGR. GR' can be transformed into a PBGR (not unique) GR'' = {{1,2}, {1,3}, {1,4}, {2,3}, {2,3}, {3,4}, {1,2}, {4}, {4}}. This PBGR can be decomposed into the union of three minimal, localized PBGRs:

GR'' = {{1,2}, {3,4}} + {{1,4}, {2,3}} + {{1,2}, {2,3}, {1,3}, {4}, {4}}.

Figure 2 shows the transformation in hypergraph notation, where the dashed link represents the 2-pin net obtained by combining two singletons, while Figure 3 shows a (4,4) S-box and a detailed routing of GR'' in the box.

In the rest of the paper, a global routing refers to a localized global routing for simplicity.
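The decomposition in the example can be checked mechanically. The sketch below treats a global routing as a multiset of nets using Python's `Counter`; the helper names `nets` and `density` are hypothetical, and `density` returns the common per-side count of a balanced routing (or `None` if the routing is not balanced).

```python
from collections import Counter


def nets(*groups):
    """Build a multiset of nets from tuples of side numbers."""
    return Counter(frozenset(g) for g in groups)


def density(gr, k=4):
    """Common number of nets per side, or None if gr is not balanced."""
    counts = Counter()
    for net, multiplicity in gr.items():
        for side in net:
            counts[side] += multiplicity
    values = {counts[side] for side in range(1, k + 1)}
    return values.pop() if len(values) == 1 else None


# GR'' from the example and its three claimed components.
GR_PP = nets((1, 2), (1, 3), (1, 4), (2, 3), (2, 3), (3, 4), (1, 2), (4,), (4,))
P1 = nets((1, 2), (3, 4))
P2 = nets((1, 4), (2, 3))
P3 = nets((1, 2), (2, 3), (1, 3), (4,), (4,))

# Multiset union of the components reproduces GR''.
decomposition_holds = (P1 + P2 + P3 == GR_PP)
```

The three components have densities 1, 1 and 2, which add up to the density 4 of GR''.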
e(k, W) = O(W)
For any integer k ≥ 2, let {r_1, ..., r_t} be the set consisting of all densities of minimal k-way global routings. Then r_i ≤ f(k), and t depends only on k, where f(k) is determined by Lemma 1.
Let p(k) be the least common multiple of r_1, ..., r_t.
Our goal is to design all k-sided HUSBs for any fixed k with a minimized number of switches. The idea is to design a few k-sided HUSBs and combine these S-boxes to obtain a (k,
|E(F(k,
It follows that e(k, W) = O(W) for a fixed k.
Optimum (k, W)-HUSBs for k = 2, 3
Our basic method for solving the optimum (k, W)-HUSB design problem is first to give a lower bound on e(k, W), then to find a k-partite graph whose number of edges equals the lower bound, and finally to prove that it is hyper-universal. The following theorem is obvious.
Theorem 2 The graph G(2, W) with vertex set {v_{1,j} | j = 1, ..., W} ∪ {v_{2,j} | j = 1, ..., W} and edge set {v_{1,i} v_{2,i} | i = 1, ..., W} is an optimum (2, W)-HUSB, and e(2, W) = W.
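To see why the matching G(2, W) suffices, one possible routing procedure is sketched below in Python. It is an illustration rather than the paper's proof: each 2-pin net {1, 2} takes one switch v_{1,j} v_{2,j}, and each singleton takes the next unused track on its own side. The function name and the output format are assumptions.

```python
def route_two_sided(gr, W):
    """Route a 2-way global routing in G(2, W).

    Returns a list of (net, vertices, edges) triples, where a vertex is
    a (side, track) pair and an edge is a pair of vertices.
    """
    nets = [frozenset(net) for net in gr]
    if any(not net or not net <= {1, 2} for net in nets):
        raise ValueError("nets must be non-empty subsets of {1, 2}")
    two_pin = [net for net in nets if len(net) == 2]
    routing = []
    # Each 2-pin net uses the single switch joining track j on both sides.
    for j, net in enumerate(two_pin, start=1):
        routing.append((net, [(1, j), (2, j)], [((1, j), (2, j))]))
    # Singletons occupy the remaining tracks on their own side.
    next_track = {1: len(two_pin) + 1, 2: len(two_pin) + 1}
    for net in nets:
        if len(net) == 1:
            (side,) = net
            routing.append((net, [(side, next_track[side])], []))
            next_track[side] += 1
    if max(next_track.values()) - 1 > W:
        raise ValueError("routing needs more than W tracks on some side")
    return routing
```

For example, the routing {{1, 2}, {1}, {1, 2}, {2}} with W = 3 uses the switches on tracks 1 and 2 and places the two singletons on track 3 of their respective sides.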
The proof of Theorem 3 also gives an efficient algorithm to find a detailed routing of a (3, W)-GR in G(3, W).
and e(3, W) = 3W.
Figure 5. H(4, 2) and H(4, 3).
(4, W)-HUSBs
In this section, we first give a lower bound on e(k, W). Then we construct three (4, W)-HUSBs. The first two are connected, one of them being 4-regular. The third design is not connected but has fewer edges, which provides an upper bound for e(4, W).

We note that |E(H(4, W))| = 6W, which is the lower bound given by Theorem 4. It is easy to verify that H(4, 2) is a (4, 2)-HUSB, and therefore it is optimum. But H(4, 3) is not a (4, 3)-HUSB.

To design and verify a (4, W)-HUSB, we need to find all minimal 4-way PBGRs. By Lemma 1, we can find all 35 different minimal PBGRs, which are classified into eight equivalence classes under the permutation group S_4. In the following we list only one representative from each equivalence class. We use GR_{i,j} to denote the class of minimal (4, i)-PBGRs of type j. The number of elements in a class is represented by a superscript.
Let G1 = K4, G2 = H(4, 2), and let G3, G4, G5, G6 and G7 be as in Figure 7. Then we have |E(G1)| = 6, |E(G2)| = 12, |E(G3)| = 20, |E(G4)| = 26, |E(G5)| = 32, |E(G6)| = 40 and |E(G7)| = 46. We note that G3 contains vertex disjoint G1 and G2; G4 contains two vertex disjoint G2's; G5 contains vertex disjoint G2 and G3; G6 contains three vertex disjoint G2's; and G7 contains vertex disjoint G3 and G4.
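The stated edge counts can be sanity-checked against the containment claims and against the 6W lower bound quoted earlier. The Python sketch below is only an arithmetic consistency check, not a verification of hyper-universality; it assumes Gi has i tracks per side, and the names `EDGES` and `PARTS` are hypothetical.

```python
# Edge counts |E(Gi)| as stated in the text.
EDGES = {1: 6, 2: 12, 3: 20, 4: 26, 5: 32, 6: 40, 7: 46}

# Vertex-disjoint subgraphs that each Gi is said to contain.
PARTS = {3: (1, 2), 4: (2, 2), 5: (2, 3), 6: (2, 2, 2), 7: (3, 4)}


def containment_is_consistent(i):
    """The parts must use exactly i tracks per side and no more edges than Gi."""
    parts = PARTS[i]
    return sum(parts) == i and sum(EDGES[j] for j in parts) <= EDGES[i]


def excess_over_lower_bound(i):
    """Number of switches in Gi beyond the 6W lower bound with W = i."""
    return EDGES[i] - 6 * i
```

Under these assumptions, G1 and G2 meet the lower bound exactly, G3, G4 and G5 exceed it by two switches, and G6 and G7 exceed it by four.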
Lemma 3 Gi is a (4, i)-HUSB for 1 ≤ i ≤ 7.
Proof. It is obvious that G1 is hyper-universal. Let GR be a (4, i)-PBGR. We need to show that GR is routable in Gi for 2 ≤ i ≤ 7. For i = 2, GR is either a minimal (4,2)-PBGR or a union of two minimal (4,1)-PBGRs. It is easy to verify that G2 is a (4,2)-HUSB.
Let i = 3. If GR is a minimal (4,3)-PBGR, we can verify that G3 contains a detailed routing of GR. If GR is a union of a (4,1)-PBGR and a (4,2)-PBGR, then G3 contains a detailed routing of GR since G3 contains vertex disjoint subgraphs G1 and G2.
For i = 4, if GR is a union of a (4,1)-PBGR and a minimal (4,3)-PBGR, then we can verify that G4 contains a detailed routing of GR. Suppose GR is a union of two (4,2)-PBGRs. In this case, G4 contains a detailed routing of GR as G4 contains two vertex disjoint G2's.
For i = 5, GR can always be decomposed into a union of a (4,2)-PBGR and a (4,3)-PBGR. It is easy to see that G5 contains a detailed routing of GR since G5 contains vertex disjoint G2 and G3.
Let i = 6. If GR is a union of three (4,2)-PBGRs, then G6 contains a detailed routing of GR as G6 contains three vertex disjoint G2's. If GR is a union of two (4,3)-PBGRs, then we can verify that G6 contains a detailed routing of GR.
Finally, let i = 7. Then GR can always be decomposed into a (4,3)-PBGR and a (4,4)-PBGR. G7 contains a detailed routing of GR since G7 contains vertex disjoint subgraphs G3 and G4. 
□

Conclusion
We have developed the first general mathematical model that covers the multi-pin net perfect routing and design problems for arbitrary-dimension FPGA switch boxes. Under the new model, the switch box design problem is formulated as an optimum k-partite graph design problem. The new model has several advantages. First, it simplifies the representation of global and detailed routings and makes it possible to use techniques from graph theory and combinatorics to attack the problem. As a result, optimum 2-way and 3-way S-boxes have been obtained. Second, the new model of global routing has a powerful decomposition property, which enables us to construct large (k, W)-HUSBs by combining a small number of smaller k-sided HUSBs. This construction has led to a very low cost (4, W)-HUSB. The decomposition property also guarantees the existence of a polynomial-time algorithm for detailed routing in the universal S-boxes we have designed. Third, the new model enables us to generalize the k-way S-box design for k = 2, 3 and 4 in 2D FPGAs to the general (k, W) design problem with k ≥ 5, which can be directly applied to higher-dimension (≥ 3D) switch box designs. The theory developed here can also be used to solve various other switch box design problems, such as h-side-predetermined, k-sided switch boxes and switch boxes for r-restricted global routings, which can be applied in designing non-homogeneous greedy routing structures aiming for optimum routings covering the whole chip.
