A new class of routing structures with fixed orthogonal wire segments and field-programmable switches at the intersections of the wire segments is proposed. Compared with the conventional two-dimensional Field-Programmable Gate Array (FPGA) routing structure, this class of routing structures has the advantage of using fewer programmable switches. Using a probabilistic model, we prove that complete routing can be achieved with high probability in a routing structure of this class in which the number of tracks in each channel approaches the lower bound asymptotically. A sequential routing algorithm based on the solution of the single net routing problem is presented. We take into account the delay introduced by the programmable switches on a routing path and formulate the single net routing problem as a Node-Weighted Steiner Minimum Tree (NWSMT) problem in a bipartite graph G. Since our single net routing problem is NP-complete, a polynomial-time approximation algorithm is proposed. We prove that our single net routing algorithm produces an optimal solution for some special classes of bipartite graphs. In general, the solution obtained by our algorithm has a performance bound of min{Δ(V\Z), |Z| − 1}.
Introduction
By using programmable switches, field-programmable chips offer the advantages of short production time and low prototyping cost. The ON/OFF status of each programmable switch is programmed by the user without going through a foundry. Among the different kinds of field-programmable integrated circuits, the FPGA is a good choice for circuits of moderate size. To implement circuits that cannot fit on a single FPGA, the Field-Programmable Interconnect Chip (FPIC) was introduced [1, 8]. A large circuit is divided into several parts, and each part is implemented in an FPGA. An FPIC is then used to interconnect the FPGAs: each external pin of the FPGAs is connected to an FPIC pin through a trace on the board. An FPIC does not implement any logic function. Rather, it provides interconnecting paths between its pins (and, thus, between FPGAs) through orthogonal wiring segments. Since it is reprogrammable on the board, the FPIC lets a designer change the functionality of a circuit by altering the pin connections on the FPIC. The FPIC therefore offers a quick, reprogrammable means of making inter-FPGA connections and allows fast design modification of large circuits.
Figure 1 shows the conventional two-dimensional FPGA routing structures proposed in [3, 4, 11, 13], which contain only wire segments of unit length. Each square represents a logic block implementing logic functions. Each terminal of a block is located on the block boundary and is connected to a wire segment; we call such a segment a terminal segment. A switch, shown as a black circle, is placed at each intersection involving a terminal segment; this kind of switch consists of one pass transistor. In addition, each path must use another type of switch, shown as a white circle, in each unit length; this switch consists of six pass transistors and can implement a cross-over or a knock-knee. A routing example for a 3-terminal net is shown in heavy lines; this routing uses six switches.
In earlier Xilinx chips, the major interconnection resources, called general interconnects, run horizontally and vertically for a unit length between two switches. Since programmable switches are located a unit length apart in the general interconnects, the number of programmable switches on a connecting path is equal to the length of the path. These switches introduce significant delay even for a path of moderate length. To improve on this situation, the more recent Xilinx XC4000 series provides double-length lines, which are twice as long as single-length lines. However, the variety of wire segments of different lengths is still very limited. It is believed that there is a trend toward a larger variety of wire segment lengths, and we propose a class of routing structures that includes this feature.
Figure 1. Conventional two-dimensional FPGA structure and a routing example.
An example of the proposed routing structure is shown in Figure 2, together with a routing of the same 3-terminal net as in Figure 1. Note that only five switches are used on the connecting path, compared with six in Figure 1. A routing structure in this class consists of a two-dimensional symmetric array of blocks. In modeling an FPGA routing structure, each block is a logic block, which can be as simple as a gate or as complex as a lookup table with latches implementing sequential circuits. There is a horizontal interconnection channel between every two rows of blocks, and a vertical interconnection channel between every two columns of blocks. Each channel contains a set of tracks, and the tracks are divided into fixed wire segments of various lengths. Switches are located at some of the intersections between horizontal and vertical wire segments. As shown in Figure 2, each switch can be programmed to connect the two orthogonal segments that intersect there. Thus, in our class of routing structures, a signal passes through a programmable switch only when it makes a turn, whereas in the conventional FPGA routing structure one switch is used per unit length.
Therefore, signal delay due to the programmable switches can be reduced dramatically. Besides being suitable for FPGA routing, our class of routing structures also contains the routing structure of FPIC chips as a special case. In modeling the FPIC routing structure, each block in our routing structure is reduced to a point that represents an I/O pin of the FPIC chip. These I/O pins carry the signals of interchip nets, which are connected on the FPIC. This precisely models the routing structure of the FPIC described in [1, 8]. To model Actel's row-based routing structure, we regard each horizontal channel in our class of routing structures as a channel between two cell rows in an Actel chip. Vertical segments in our class of routing structures correspond to the vertical segments that an Actel chip uses as feed-throughs to bypass the cell rows and to connect nets spanning several cell rows. According to [Si, the number of anti-fuses between the driver and each input is generally limited to two for each net. Therefore, if we impose the restriction that each net connecting a driver-input pair can use at most one horizontal segment in each horizontal channel, our class of routing structures precisely models Actel's row-based FPGA routing structure.
Our model of an FPGA/FPIC chip consists of a two-dimensional N × N lattice with fixed wire segments placed along the columns and rows of the lattice points. Terminals are assumed to be located at the lattice points. Following earlier analyses of gate arrays [5, 12], we assume that the number of nets emanating from a logic block is a random variable following the Poisson distribution with mean λ. We also assume that each net has two terminals; the value of λ can be increased to approximate the case of multiple-terminal nets. To simplify the discussion, the chip is assumed to be embedded on the surface of a torus; that is, column 1 and column N are adjacent, and so are row 1 and row N. Our result can be extended when this assumption is removed. We further assume that the rectilinear distance between the two terminals of a net is chosen independently according to a geometric distribution with mean γ = N^p, 0 < p < 1, as in [5]. (The distance between two adjacent lattice points that are vertically or horizontally aligned is regarded as unit length.) Consider a net of length 1 emanating from block (i, j). There are four configurations for this net. We impose the restriction that each net can be routed only by 2-segment routing. (A routing is called a 2-segment routing if it uses at most two wire segments and one programmable switch.) For a net of length 2, there are 12 possible configurations. In general, for a net of length l there are 8l − 4 possible configurations: of the 4l lattice points at distance l, the 4 that share a row or column with the source are reached by one straight segment, and each of the other 4l − 4 is reached by two L-shaped routes, giving 4 + 2(4l − 4) = 8l − 4. We assume that each of these 8l − 4 configurations is equally likely.
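The count 8l − 4 can be checked by brute force. The sketch below uses a hypothetical helper, `count_configurations`, which is not from the paper. It ignores the wrap-around of the torus, that is, it assumes l is small enough relative to N that all 4l endpoint positions are distinct.

```python
def count_configurations(l: int) -> int:
    """Count 2-segment routings of a two-terminal net of rectilinear length l.

    A configuration is an offset (dx, dy) of the second terminal with
    |dx| + |dy| = l, together with one route that uses at most two wire
    segments and one switch.
    """
    total = 0
    for dx in range(-l, l + 1):
        rest = l - abs(dx)
        for dy in {rest, -rest}:  # a set, so dy = 0 is counted only once
            if dx == 0 or dy == 0:
                total += 1  # same row or column: one straight segment, no switch
            else:
                total += 2  # two L-shaped routes, each turning at one switch
    return total


for l in range(1, 6):
    print(l, count_configurations(l), 8 * l - 4)
```

The enumeration visits each of the 4l endpoint offsets once, so it directly mirrors the straight-versus-L-shaped case split in the text.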
Since the number of nets emanating from each lattice point has mean λ and the average length of each net is N^p, the expected total length of the nets emanating from a lattice point is λN^p. The total wire length for all N × N blocks is therefore λN^(2+p). Since the N horizontal and N vertical channels each have length N, channels with t tracks each provide 2N^2 t units of wire in total; so, even in a two-dimensional array without the restriction of using fixed wire segments of predetermined lengths, successful routing requires at least λN^p/2 tracks in each channel. The following theorem states that O(λN^p) tracks in each channel will, in fact, suffice even when fixed wire segments are used.
The analytic results presented above describe the asymptotic behavior of the proposed routing structures. In practice, the number of tracks in a routing structure is fixed, and the number of switches is also restricted by area constraints. We therefore propose an arrangement of wire segments in the rows and columns that follows the proofs of the above theorems, with some modification for a given number of tracks.
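Spelled out, the counting argument behind the lower bound on tracks reads as follows. This is a sketch assuming, as in the model above, N horizontal and N vertical channels of length N on the torus, and using the independence of the number of nets and their lengths to multiply the two means.

```latex
\begin{aligned}
\mathbb{E}[\text{wire length emanating from one lattice point}] &= \lambda \cdot N^{p},\\
\mathbb{E}[\text{total wire length over } N^{2} \text{ points}] &= N^{2} \cdot \lambda N^{p} = \lambda N^{2+p},\\
\text{capacity of } 2N \text{ channels of length } N \text{ with } t \text{ tracks each} &= 2N \cdot N \cdot t = 2N^{2} t,\\
2N^{2} t \ge \lambda N^{2+p} \quad\Longrightarrow\quad t &\ge \tfrac{1}{2}\,\lambda N^{p}.
\end{aligned}
```

Since this lower bound is itself of order λN^p, a routing structure with O(λN^p) tracks per channel is within a constant factor of it.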
The segments in each track all have the same length (except for the two incident to the boundary), and this length is set to a power of a constant, for example, 2. Given the number of tracks t in each row or column, the number of tracks with segments of length 2^i is t/2^i. That is, t/2 tracks have segments of length 2, t/4 tracks have segments of length 4, and so on. To distribute the routing resources evenly, the segments in these t/2^i tracks are arranged in a staggered manner: the segments in the (i+1)st track of a row (or column) are shifted right (or down) relative to the ith track by a fixed number of columns (or rows), as shown in Figure 3. Moreover, the switches are positioned so that, at the intersection of a column and a row, the same number of switches is located along each horizontal or vertical track. Figure 4 shows an example where this number is 3.
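The track arrangement can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's construction: `allocate_tracks`, `track_segments`, and `layout` are hypothetical names; rounding t/2^i down and giving leftover tracks to the longest length is an assumed rule (the text only says "some modification"); and the stagger offset `length // count` between consecutive tracks of the same length is likewise an assumption.

```python
from typing import Dict, List, Tuple


def allocate_tracks(t: int) -> Dict[int, int]:
    """Map segment length 2**i to a track count of about t / 2**i.

    Hypothetical rounding rule: floor(t / 2**i) tracks per length while that
    is at least 1, with any leftover tracks given to the longest length.
    """
    if t < 2:
        raise ValueError("need at least two tracks")
    alloc: Dict[int, int] = {}
    i = 1
    while t // 2 ** i >= 1:
        alloc[2 ** i] = t // 2 ** i
        i += 1
    alloc[max(alloc)] += t - sum(alloc.values())
    return alloc


def track_segments(length: int, shift: int, n: int) -> List[Tuple[int, int]]:
    """Split positions [0, n) of one track into segments [start, end).

    Interior segments all have the given length; the cut points are shifted
    by `shift`, so only the two segments touching the boundary may be shorter.
    """
    shift %= length
    cuts = [0] + list(range(shift or length, n, length)) + [n]
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if b > a]


def layout(t: int, n: int) -> List[List[Tuple[int, int]]]:
    """Staggered segments for all t tracks of one channel of n unit positions."""
    tracks = []
    for length, count in sorted(allocate_tracks(t).items()):
        offset = max(1, length // count)  # hypothetical stagger between tracks
        for j in range(count):
            tracks.append(track_segments(length, j * offset, n))
    return tracks


for segments in layout(6, 16):
    print(segments)
```

Spreading the cut points of same-length tracks across one segment length is one simple way to make the resources look the same from every column; other offsets would serve equally well for illustration.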
A Routing Algorithm
Our routing algorithm is a sequential algorithm that routes the nets one at a time in decreasing order of timing criticality. The algorithm is based on the solution of the single net routing problem, which is formulated as finding a minimum-weight Steiner tree in a graph with weights assigned to both vertices and edges. The single net routing problem can be formulated as follows. Consider a routing structure R consisting of a set of horizontal and vertical wire segments and a set of programmable switches at some of the intersections of the wire segments. Each wire segment is assigned a cost that is a weighted measure of the delay associated with the segment and of how much the segment is in demand for routing, for example, the number of nets whose terminal bounding boxes overlap the segment. The delay in a wire segment is assumed to be proportional to its length and to the number of programmable switches along it. Each programmable switch is assigned a fixed cost that is a function of the resistance and capacitance of a pass transistor. We then construct a graph G = (V, E) describing the relationship between the wire segments and the programmable switches as follows. Each wire segment in R is represented by a vertex whose weight equals the cost of the segment, and each programmable switch is represented by an edge, with weight equal to the cost of the switch, joining the vertices of the two segments it connects. Since a switch always joins a horizontal segment to a vertical one, G is bipartite. Let Z ⊆ V be the set of vertices corresponding to the terminal segments of a net; we call Z a demand vertex set. Our problem is to construct a minimum-weight tree of G that spans Z, called an optimal tree. All the vertices in Z must lie in one connected component of G, or else the problem has no solution; hereafter we therefore assume that G is connected. Since every vertex in Z contributes to the cost of any tree spanning Z, to simplify the presentation we set the weight of each demand vertex to 0 and minimize the total cost of the edges and the additional vertices in a tree that spans Z. Given a bipartite graph G, a demand vertex set Z, and a weight function w : V ∪ E → ℝ⁺ ∪ {0}, where w(e) has the same value for all e ∈ E, we use NWSMT(G, Z, w) to denote an instance of our node-weighted Steiner Minimum Tree (SMT) problem.
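The graph construction can be sketched as follows, assuming each switch becomes an edge joining the horizontal and vertical segments it connects. The names `Segment`, `Switch`, and `build_graph`, and the coefficients `alpha`, `beta`, and `switch_cost`, are hypothetical; the routing-demand term of the segment cost is left out here and could be folded into `alpha`.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Segment:
    horizontal: bool  # True: lies in row `line`; False: lies in column `line`
    line: int         # row index (horizontal) or column index (vertical)
    track: int        # track number within that channel
    start: int        # first lattice coordinate covered
    end: int          # last lattice coordinate covered (inclusive)


# A switch at column x, row y joining horizontal track h_track to vertical track v_track.
Switch = Tuple[int, int, int, int]


def build_graph(segments: List[Segment], switches: Iterable[Switch],
                alpha: float = 1.0, beta: float = 1.0, switch_cost: float = 1.0):
    """Build the bipartite segment/switch graph as (vertex_weight, adjacency).

    Hypothetical cost model: w(segment) = alpha * length + beta * (#switches
    on it), and every edge (switch) gets the same weight switch_cost.
    """
    owner: Dict[Tuple[bool, int, int, int], int] = {}
    for i, s in enumerate(segments):
        for c in range(s.start, s.end + 1):
            owner[(s.horizontal, s.line, s.track, c)] = i
    adj: Dict[int, Dict[int, float]] = {i: {} for i in range(len(segments))}
    on_segment = [0] * len(segments)
    for x, y, h_track, v_track in set(switches):
        h = owner.get((True, y, h_track, x))
        v = owner.get((False, x, v_track, y))
        if h is None or v is None:
            continue  # the switch does not meet two segments; ignore it
        adj[h][v] = adj[v][h] = switch_cost
        on_segment[h] += 1
        on_segment[v] += 1
    weight = [alpha * (s.end - s.start) + beta * on_segment[i]
              for i, s in enumerate(segments)]
    return weight, adj


segs = [Segment(True, 0, 0, 0, 3), Segment(False, 2, 0, 0, 2)]
weights, adj = build_graph(segs, [(2, 0, 0, 0)])
print(weights, adj)  # [4.0, 3.0] {0: {1: 1.0}, 1: {0: 1.0}}
```

Because edges are only ever added between a horizontal and a vertical segment, the resulting graph is bipartite by construction.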
Our problem can be proven NP-complete by reducing the Planar Vertex Covering problem [7] to a special case of our problem in which (1) all vertices and edges are assigned the same weight; (2) G is a planar bipartite graph; (3) each internal face of G has length 4; and (4) Z = X. Therefore, it is unlikely that a polynomial-time algorithm exists for solving our problem optimally. There is extensive literature on the conventional SMT problem; see [9] for a comprehensive survey. However, very little is known about the node-weighted SMT problem. In the rest of this section, we study the single net routing problem.
3.1
Before describing our algorithm, we need some definitions.
Let v be a vertex in G. G\{v} denotes the graph obtained from G by deleting v and all edges incident to it. A vertex u ∈ V satisfying the conditions stated in Lemma 1 is called a removable vertex. Our algorithm has two phases.
In the first phase, we check whether there is a removable vertex and, if there are any, remove all such vertices. This usually creates one or more new removable vertices in the resulting graph, so we repeat the removal process until no removable vertices remain. If every vertex of V\Z in the remaining graph is essential, then a Minimum Spanning Tree (MST) of the remaining graph is our solution. Otherwise, we continue with the second phase, in which we apply the MST approach for the conventional SMT problem (see [9] for details) to the graph G′ remaining after the first phase. The time complexity of the first phase is O(|V|^4), since finding the vertices u and v described in Lemma 1 takes O(|V|^3) time and there are at most |V| − 1 such vertex pairs. The first phase dominates the second phase, whose time complexity is O(|Z|(|E| + |V| log |V|)).
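The second phase can be sketched as follows. As a stand-in for "the MST approach" of [9], this uses the distance-network heuristic in the style of Kou, Markowsky, and Berman, adapted to node weights by charging w(e) + w(v) for entering vertex v. The first phase is not shown because the conditions of Lemma 1 are not reproduced here. All names (`node_weighted_dijkstra`, `mst_steiner`) and the example graph are hypothetical. Running one Dijkstra search per demand vertex is consistent with the O(|Z|(|E| + |V| log |V|)) bound quoted for this phase.

```python
import heapq
from typing import Dict, List, Set, Tuple

Graph = Dict[int, Dict[int, float]]  # adjacency: vertex -> {neighbor: edge weight}


def node_weighted_dijkstra(adj: Graph, w: List[float], src: int, demand: Set[int]):
    """Shortest paths where entering vertex v costs w(edge) + w(v).

    Demand vertices are given weight 0, as in the text.
    """
    def vertex_cost(v: int) -> float:
        return 0.0 if v in demand else w[v]

    dist: Dict[int, float] = {src: 0.0}
    prev: Dict[int, int] = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, ew in adj[u].items():
            nd = d + ew + vertex_cost(v)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def mst_steiner(adj: Graph, w: List[float], demand: Set[int]):
    """Distance-network MST heuristic for a node-weighted Steiner tree."""
    z = sorted(demand)
    paths = {s: node_weighted_dijkstra(adj, w, s, demand) for s in z}

    # Steps 1-2: Prim's MST on the complete distance graph over Z, expanding
    # each chosen distance-graph edge into its shortest path in G.
    joined, used = {z[0]}, {z[0]}
    path_edges: Set[Tuple[int, int]] = set()
    while len(joined) < len(z):
        d, a, b = min((paths[a][0].get(b, float("inf")), a, b)
                      for a in joined for b in z if b not in joined)
        if d == float("inf"):
            raise ValueError("demand vertices lie in different components")
        joined.add(b)
        v = b
        while v != a:
            u = paths[a][1][v]
            path_edges.add((min(u, v), max(u, v)))
            used.update((u, v))
            v = u

    # Step 3: Kruskal's MST of the union of paths, which may contain cycles.
    parent = {v: v for v in used}

    def find(v: int) -> int:
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    nbrs: Dict[int, Set[int]] = {v: set() for v in used}
    for u, v in sorted(path_edges, key=lambda e: adj[e[0]][e[1]]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            nbrs[u].add(v)
            nbrs[v].add(u)

    # Step 4: repeatedly prune leaves that are not demand vertices.
    leaves = [v for v in nbrs if len(nbrs[v]) <= 1 and v not in demand]
    while leaves:
        v = leaves.pop()
        for u in nbrs.pop(v, set()):
            nbrs[u].discard(v)
            if len(nbrs[u]) == 1 and u not in demand:
                leaves.append(u)

    edges = sorted((u, v) for u in nbrs for v in nbrs[u] if u < v)
    cost = sum(w[v] for v in nbrs if v not in demand)
    cost += sum(adj[u][v] for u, v in edges)
    return set(nbrs), edges, cost


# Hypothetical example: demand vertices 0, 1, 2; Steiner candidates 3 (w=5), 4, 5 (w=1).
example_adj: Graph = {0: {3: 1.0, 4: 1.0}, 1: {3: 1.0, 4: 1.0, 5: 1.0},
                      2: {3: 1.0, 5: 1.0}, 3: {0: 1.0, 1: 1.0, 2: 1.0},
                      4: {0: 1.0, 1: 1.0}, 5: {1: 1.0, 2: 1.0}}
example_w = [0.0, 0.0, 0.0, 5.0, 1.0, 1.0]
print(mst_steiner(example_adj, example_w, {0, 1, 2}))
```

In the example, routing through the two cheap vertices 4 and 5 costs 6, while the star through the heavy vertex 3 would cost 8, and the heuristic picks the former.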
(6, 2)-graph
In this section, we show that our algorithm produces an optimal tree if the graph G satisfies a certain chordality property. A chord of a cycle is an edge connecting two nonadjacent vertices of the cycle. A graph G is called an (s, t)-graph if and only if each cycle of length at least s in G has at least t chords. We have the following lemma:
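For small examples, the (s, t) property can be checked by brute force. The sketch below (hypothetical names `simple_cycles`, `count_chords`, `is_st_graph`) enumerates every simple cycle, so it takes exponential time and is meant only to make the definition concrete.

```python
from typing import Dict, Iterator, List, Set

Graph = Dict[int, Set[int]]  # undirected adjacency sets


def simple_cycles(adj: Graph) -> Iterator[List[int]]:
    """Yield every simple cycle (length >= 3) of an undirected graph exactly once.

    Each cycle is reported starting from its smallest vertex s, and of its
    two traversal directions only the one with path[1] < path[-1] is kept.
    """
    def extend(s: int, path: List[int], on_path: Set[int]) -> Iterator[List[int]]:
        for v in adj[path[-1]]:
            if v == s and len(path) >= 3 and path[1] < path[-1]:
                yield list(path)
            elif v > s and v not in on_path:
                path.append(v)
                on_path.add(v)
                yield from extend(s, path, on_path)
                path.pop()
                on_path.remove(v)

    for s in sorted(adj):
        yield from extend(s, [s], {s})


def count_chords(cycle: List[int], adj: Graph) -> int:
    """Number of edges joining two nonadjacent vertices of the cycle."""
    n = len(cycle)
    pos = {v: i for i, v in enumerate(cycle)}
    return sum(1 for i, u in enumerate(cycle) for v in adj[u]
               if v in pos and i < pos[v] and pos[v] - i not in (1, n - 1))


def is_st_graph(adj: Graph, s: int, t: int) -> bool:
    """True if every cycle of length at least s has at least t chords."""
    return all(count_chords(c, adj) >= t
               for c in simple_cycles(adj) if len(c) >= s)


c6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
k33 = {**{a: {3, 4, 5} for a in range(3)}, **{b: {0, 1, 2} for b in range(3, 6)}}
print(is_st_graph(c6, 6, 1), is_st_graph(k33, 6, 2))  # False True
```

A (6,1)(8,3)-graph, as defined for Theorem 4, would be checked as `is_st_graph(g, 6, 1) and is_st_graph(g, 8, 3)`.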
Lemma 2 Given a problem instance NWSMT(G, Z, w), if G is a (6, 2)-graph, then either every …
It follows immediately that our algorithm produces an optimal tree for any (6, 2)-graph, by observing that any induced subgraph of a (6, 2)-graph is also a (6, 2)-graph.
Theorem 3 Our algorithm produces an optimal tree for a problem instance NWSMT(G, Z, w) if G is a (6, 2)-graph.
A graph is a (6,1)(8,3)-graph if and only if each cycle of length at least 6 has at least one chord and each cycle of length at least 8 has at least 3 chords. Using a similar but more complicated argument, we can prove the following.
Theorem 4 For a given problem instance NWSMT(G, Z, w), if G is a bipartite (6,1)(8,3)-graph and the weights of all vertices and edges are the same, then our algorithm produces an optimal tree.
Performance Bound
The MST approach has been used extensively to approximate an optimal Steiner tree in conventional SMT problems encountered in VLSI routing [10], since the cost of an MST is guaranteed to be at most twice the cost of an optimal Steiner tree. However, no result has been reported on using the MST approach to approximate a node-weighted SMT. … shows that this bound is tight. Moreover, by reducing from the Approximating Set Covering problem [2], we can prove the following theorem:
Theorem 6 Given a problem instance NWSMT(G, Z, w), let T* be an optimal tree. The problem of obtaining a solution T such that w(T)/w(T*) < c, for any constant c, is NP-complete.
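For the conventional, edge-weighted SMT, the factor-of-two guarantee follows from the standard doubling argument sketched below, where D_Z is the complete distance graph on the demand vertices with shortest-path lengths as edge costs and T* is an optimal Steiner tree. This is the textbook argument, not one taken from this paper.

```latex
\begin{aligned}
&c(\text{Euler tour of } T^{*} \text{ with every edge doubled}) = 2\,c(T^{*}),\\
&\text{shortcutting the tour to the } |Z| \text{ demand vertices gives a tour } C \text{ in } D_Z \text{ with } c(C) \le 2\,c(T^{*}),\\
&\text{dropping the costliest of its } |Z| \text{ edges gives a spanning path } P \text{ with } c(P) \le \left(1 - \tfrac{1}{|Z|}\right) c(C),\\
&c(\mathrm{MST}(D_Z)) \le c(P) \le 2\left(1 - \tfrac{1}{|Z|}\right) c(T^{*}).
\end{aligned}
```

The first step counts each edge exactly twice, but a vertex of degree d in T* is passed d times by the tour, so with node weights its weight can be charged up to d times instead of once. This suggests why the factor of two need not carry over to the node-weighted problem.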
Experimental Results
Our algorithm was implemented in C and executed on a SPARCstation. Because of the sequential nature of our algorithm, a rip-up and re-route phase is incorporated. The program was used to route five industrial circuits used in [3]. Let t be the number of tracks in each channel, and let k be the number of programmable switches allocated for each horizontal segment and vertical segment at the intersection of a horizontal channel and a vertical channel. In Figure 4, t = 6 and k = 3. For BUS, a bus controller, we use Bz-y to denote the case t = z and k = y. For each t = 12, 13, 14, we found the minimum value of k for which our algorithm produces a 100% complete routing, as shown in the second, third, and fourth columns of Table 1.
BUS was routed on an array of 13 × 12 blocks with 151 nets, or 392 equivalent two-terminal nets. The routing result in [3] used a total of 1616 active switches in the connecting paths, whereas we used fewer than 1210 in all these instances, a reduction of more than 25%, as shown in Tables 1 and 2. It should be noted, however, that the routing structure in [3] uses only 10 tracks, while our algorithm cannot produce a 100% complete routing with fewer than 12 tracks, even if we let A = 1. The reason is that, compared with the conventional FPGA routing structure, our class of routing structures is less flexible, because we do not use switch blocks at every intersection of horizontal and vertical channels as the conventional FPGA routing structure does. Note also that the additional delay in the connecting paths due to the increase in chip size is negligible, since the delays due to the programmable switches dominate those due to metal wire segments [3].
The last three columns in Table 1 show that increasing the value of k decreases the number of switches used in our routing solution without significantly affecting the running time. However, the area of the chip increases accordingly.
Conclusions
In this paper, we propose a class of routing structures with fixed orthogonal wire segments. This class of routing structures offers rapid personalization using programmable switches at some intersections of the orthogonal wire segments. Two real examples of this class are the symmetric FPGA routing structure and the FPIC routing structure. We prove that, in a probabilistic model, this class of routing structures is asymptotically as efficient as routing structures without fixed wire segments. We also study the single net routing problem, taking into account the delay in a routing path due to the programmable switches and wire segments. Experimental results demonstrate that, compared with the conventional two-dimensional FPGA routing structure, our class of routing structures significantly reduces the number of switches on the connecting paths and, thus, the signal delays.
