The routing channels of an FPGA consist of wire segments of various types, which provide a tradeoff between performance and routability. The routing architectures of recently developed FPGAs (e.g., Virtex-II) offer more versatile wire types and richer connections between them than those of older generations of FPGAs (e.g., XC4000). To fully exploit the potential of the new routing architectures, it is beneficial to perform wire type assignment for all channels as an intermediate stage between global routing and detailed routing. In this paper, we present a wire type assignment algorithm based on iteratively applying a min-cost max-flow technique to route many nets simultaneously. At each stage of the network flow computation, the result is guaranteed to be optimal in terms of routability and delay cost. We use the routing architecture of the Virtex-II FPGAs from Xilinx as the target architecture in our experiments. Experimental results show that our algorithm outperforms the traditional sequential net-by-net approach.
INTRODUCTION
As the gate density of FPGAs grows, the routing architectures of FPGAs have changed significantly. The number of routing segments per channel has greatly increased, and the structure of the switch blocks has become very complicated to accommodate more connections between the wire segments. To take advantage of the tradeoff between interconnection delay and routability, most recent FPGAs have several types of routing wires, each with a different length and connectivity. Longer wire segments are intended for high-fanout, time-critical signal nets. In contrast, shorter wire segments are intended for short connections to avoid wasting routing resources. These wires of various lengths are placed in hierarchically designed channels or placed in the same channels using different kinds of switching components [3, 14]. To utilize routing resources even more efficiently, some recently developed FPGA architectures adopt several wire segment types that have the same length but different connectivity [14]. Because routing is the dominating factor in determining the performance of the overall FPGA system and because the routing resources take up a significant portion of the chip area, it is very important for routing algorithms to fully exploit the potential of the new routing architectures.
Existing FPGA routing algorithms can be classified into two categories. One of them is a two-stage approach in which global routing and detailed routing are performed sequentially [4, 6, 11, 15], and the other is a one-stage approach in which only a detailed routing algorithm is used [9, 10, 12, 13]. Detailed routers use detailed graphs to model the target architectures. Because they are applied to the detailed graphs that model each component of the routing resources directly, detailed routing algorithms can give very accurate routing results. However, these algorithms may not be able to handle big graphs, which model the routing architectures of the recent FPGAs. For example, the largest Virtex-II FPGA from Xilinx contains close to 60 million edges and 6 million nodes in the routing graph [8], and the graph size will grow for future FPGAs. In a two-stage approach, the global router abstracts the details of the routing architecture, and the routing is performed on a coarser graph to assign a series of channels to each path used by a signal, and the detailed router routes each signal on the detailed graph along the channels determined by the global router. Although this approach may be able to handle large graphs, it is possible that the coarse routing result determined by global routing may not be accurately refined into the underlying detailed routing. This is especially due to

The problem of assigning proper types of wire segments to nets was incorporated in the global routing stage in [6, 15], and it was performed in the detailed routing stage in [4]. But in both cases, wire type assignment is applied to conventional XC4000-style routing architectures where different types of wire segments are sparsely linked, and they used net-by-net approaches, which may suffer from net ordering problems.
In this paper, we consider the wire type assignment as a procedure in a separate stage between global routing and detailed routing. We modeled our Virtex-II-style architecture with a wire type connection graph in which each node abstracts all the wire segments from each CLB (configurable logic block) and each edge represents all the connection switches between the wire types. The wire type connection graph is coarser than the graph used by typical detailed routers, and it is finer than the grid routing graph used by most of the global routers. As the nature of the graph suggests, the wire type assignment stage can play a role of reducing inconsistency between global routing and detailed routing.
Given the global routing result for each net, our algorithm solves the problem of wire type assignment for all the net segments in a channel. First, we find the wire-type-assigned routes for all the net segments from one CLB to all the other CLBs in a channel using a polynomial-time exact algorithm. Our algorithm is based on min-cost flow computations, and it is guaranteed to find a congestion-free wire type assignment solution if one exists. Furthermore, it finds a solution with minimum total delay at the same time. Applied iteratively to each CLB in a channel, it provides a randomized polynomial-time algorithm to find the wire-type-assigned routes for all the net segments in a channel. The optimality of our algorithm can also be very helpful for incremental improvement of the routing result through interaction with the global router or the detailed router. Although we targeted the Virtex-II-style routing architecture, we believe our algorithm can be used in most recent architectures that feature various wire types.
The rest of the paper is organized as follows. Some preliminary notation and the architectural model used in this paper are introduced in Section 2. The wire type assignment problem is defined in Section 3. In Section 4, we present the flow network graph construction scheme and our network flow based algorithm for assigning wire types to nets. Experimental results are shown in Section 5, and our conclusions are in Section 6.
PRELIMINARIES
The FPGA model assumed in this paper is an array-based FPGA similar to the Xilinx Virtex-II architecture. It is quite different from the XC4000-style architecture used in most FPGA routing algorithms [4, 6, 9, 11, 13]. As shown in Figure 1, it consists of configurable logic blocks (CLBs) and interconnection wires. A CLB consists of a switch matrix, logic blocks, and internal interconnections. The switch matrix in a CLB performs the role of the switch module and connection modules of XC4000-style architectures. The connections among global wire segments and the connections between the global wire segments and logic module pins are made within the switch matrix. Each of the vertical and horizontal channels has several types of wire segments, and all channels have the same structure. Figure 2 shows the wire types used in the Virtex-II. The internal CLB local interconnections from logic block outputs to logic block inputs are omitted because our attention is focused on the global interconnections. Unlike conventional XC4000-style architectures, the routing resources of the Virtex-II architecture include both unidirectional and bidirectional wires. Among the global wire segments, the long lines are bidirectional wires, and they span the full height and width of the device. All other wires are directional wires. Organized in a staggered pattern, directional wires can be driven only from one end. As in the case of hex lines, lines with the same length can have different connectivity to the CLBs. Unlike the switch modules of the XC4000-style architecture, the connection topology in a switch matrix of our model is quite irregular. Because our algorithm is applied to a general graph, we do not assume any particular length, direction, or connectivity of the wire types.
To elaborate the results from global routing, the graph we are using for our wire type assignment algorithm is a more detailed graph than the grid routing graph, which is commonly used in most of the global routing algorithms [5, 6, 15] . At the same time, it is much coarser than the detailed routing resource graph used in detailed routing algorithms. We modeled the routing structure in a channel of the Virtex-II style architecture with the wire type connection graph Gw(Vw, Ew).
The set of nodes Vw represents wire types or CLBs. Each node that corresponds to a wire type encapsulates all wire segments of the same type driven from a CLB. If two wire segments have the same length, direction, and connectivity, we consider their wire type to be the same. To model the starting point and the end point of signals in a channel, each CLB is modeled with two nodes, a CLB source node and a CLB sink node. The set of edges Ew represents architecture-specific connections among the wire types or connections between CLBs and the wire types. Although the actual connections between the wires are made at a switch matrix in a CLB, the connections between the wire types in a channel are represented by edges between the nodes. An edge between a wire type node and a CLB sink node represents the connections from the wire segments of that type to the input pins of logic blocks or to the wire segments of other channels. Similarly, an edge between a CLB source node and a wire type node represents the connections to the wire segments of that type from the output pins of logic blocks or from wire segments of other channels. Figure 3(a) shows a partial view of a routing structure in a channel, and its graph representation is shown in Figure 3(b). For simplicity, only single lines and double lines of one direction are shown in this figure. The nodes ti are the CLB sink nodes, and the nodes si are the CLB source nodes. The single lines and double lines driven by the ith CLB are represented by nodes wi1 and wi2, respectively. Edges between wij and tk model the connections from type j wire segments that are driven by the ith CLB and connected to the kth CLB. From this wire type connection graph, we construct the flow network [7] that will be used in our algorithm.
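As an illustration of the graph just described, the following Python sketch builds a small wire type connection graph for one direction of a channel. It is only a sketch under stated assumptions: the function name `build_wire_type_graph`, the node naming scheme, and the `reach` table (single lines reaching the next CLB, double lines reaching the next two) are hypothetical, and the sketch assumes that every wire type arriving at a CLB can drive every wire type leaving it, whereas the real switch matrix is irregular.

```python
from collections import defaultdict

def build_wire_type_graph(m, reach):
    """Build a wire type connection graph for a channel with m CLBs.

    reach maps a wire type name to the CLB offsets its segments can reach,
    e.g. {"single": [1], "double": [1, 2]} (illustrative values only).
    Nodes: ("s", i) CLB source, ("t", i) CLB sink, ("w", i, typ) for the
    wire type typ driven from CLB i. Edges are directed adjacency sets.
    """
    edges = defaultdict(set)
    for i in range(1, m + 1):
        for typ, offsets in reach.items():
            w = ("w", i, typ)
            edges[("s", i)].add(w)            # CLB i can drive this wire type
            for d in offsets:
                k = i + d
                if k > m:
                    continue                  # segment would leave the channel
                edges[w].add(("t", k))        # wire type reaches the sink of CLB k
                for typ2 in reach:            # assumed: full connectivity at CLB k
                    edges[w].add(("w", k, typ2))
    return edges

g = build_wire_type_graph(4, {"single": [1], "double": [1, 2]})
print(sorted(g[("w", 1, "double")]))
```

Each node stands for all segments of one type driven from one CLB, so the graph size depends on the number of CLBs and wire types, not on the number of physical wires.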
PROBLEM DEFINITION
In two-stage routing algorithms for array-based FPGAs, global routing is performed on a coarser grid routing graph where the edges represent channels and the nodes represent connections between the neighboring channels. After global routing, each net can have a global route from its source pin to sink pins in terms of the sequence of channels and switch blocks it encounters. The detailed router decides wire segments and connection switches along the global route for each path. But the selection of a proper wire type for each wire segment along the route is very important for both effective use of resources and timing performance of the final routing results.
Because all the switching blocks as well as logic blocks are included in CLBs, each globally routed portion of a net within a channel (we call it a net segment) can be expressed as an interval between the two CLBs it spans, in the form (index of source CLB node, index of target CLB node), which we call the spanning interval of the net segment. Figure 4 shows an example of a global routing result for 3 nets. Each CLB is indexed from left to right horizontally, and bottom to top vertically. Net1 is routed through the (3, 2) portion of the vertical channel V1 and the (2, 3) portion of the horizontal channel H1. Similarly, net2 is routed through the (1, 3) portion of the H1 channel, and net3 is routed through the (3, 2) portion of the H2 channel and the (2, 1) portion of the H1 channel. In the H1 channel, there are 3 net segments, and their spanning intervals are (2, 3), (1, 3), and (2, 1). By finding a path from a CLB source node to a CLB sink node for a net segment in the wire type connection graph, we can assign proper types of wires to route the net segment in a channel. We can alleviate the net ordering problem, which can be found in some net-by-net approaches, by finding paths for all the net segments from the same CLB source node simultaneously.
Because no routing resource can be shared by different nets in an FPGA, wire types should be assigned to net segments such that the number of net segments assigned to a wire type does not exceed the capacity of that wire type. To make the route feasible, connections between wire segments along the route must be available. Hence we can associate a capacity with each node and edge of the wire type connection graph Gw(Vw, Ew). Because all the wire segments (or connections) of a type have the same delay, we can associate a delay cost with each node and edge of Gw, using the delay value of the corresponding routing resource. Note that we assume that all the connection switches are buffered switches, which holds true for actual Virtex-II FPGAs.
Before we define the problems of this paper, we first define the following notations for a given channel with m CLBs.
• Ss = {s1, s2, ..., sm}, where si is a CLB source node of the ith CLB.
• St = {t1, t2, ..., tm}, where ti is a CLB sink node of the ith CLB.
• Ni denotes a set of all net segments from the ith CLB.
• Nij denotes a set of all net segments whose spanning interval is (i, j).
• Ii denotes a set of spanning intervals of all the net segments in Ni.
• rw(e) denotes the routing resource capacity of edge e. It corresponds to the number of available switches connecting two wire types.
• rw(v) denotes the routing resource capacity of node v. It corresponds to the number of wire segments of the type modeled by v.
• cw(e) denotes a nonnegative integer value delay cost of edge e, which is obtained by scaling actual delay at the switch modeled by e to an integer.
• cw(v) denotes a nonnegative integer value delay cost of node v, which is obtained by scaling actual delay at the wire segment modeled by v to an integer.
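The sets Ni, Nij, and Ii can be derived directly from the spanning intervals of a channel. The sketch below does this for the three net segments of channel H1 in the Figure 4 example; the function name `group_net_segments` and the variable names are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

def group_net_segments(segments):
    """segments: list of (net_name, (i, j)) spanning intervals in one channel."""
    N_i = defaultdict(list)     # N_i: net segments leaving CLB i
    N_ij = defaultdict(list)    # N_ij: net segments with spanning interval (i, j)
    I_i = defaultdict(set)      # I_i: distinct spanning intervals of N_i
    for net, (i, j) in segments:
        N_i[i].append(net)
        N_ij[(i, j)].append(net)
        I_i[i].add((i, j))
    return N_i, N_ij, I_i

# Channel H1 from the Figure 4 example: net1 (2, 3), net2 (1, 3), net3 (2, 1).
h1 = [("net1", (2, 3)), ("net2", (1, 3)), ("net3", (2, 1))]
N_i, N_ij, I_i = group_net_segments(h1)
print(dict(N_i))
```

Note that the sizes |Nij| over all j add up to |Ni|, which is what bounds the flow into the sink in the network flow formulation.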
We define the problems of this paper as follows:
The Wire Type Assignment for One Source CLB (WTAO) Problem: Given a wire type connection graph Gw(Vw, Ew) with capacities and delay costs associated with the nodes and edges, a source CLB node s_{i_s}, a set of net segments N_{i_s}, and a set of spanning intervals I_{i_s}, find a path from s_{i_s} to the end point of the spanning interval of every net segment in N_{i_s} such that each edge and node is used no more than its capacity while the total delay cost of all net segments is minimized.
By solving the WTAO problem for all the CLBs in a channel, we can solve the following problem:
The Wire Type Assignment (WTA) Problem: Given a wire type connection graph Gw(Vw, Ew) for a channel, a set of all net segments in the channel, and a set of spanning intervals of all net segments, find a set of paths connecting the two end point CLBs of the spanning interval of every net segment in the channel such that each edge and node is used no more than its capacity while the total delay cost of all net segments is minimized.
ALGORITHM DESCRIPTION
In this section, we describe the algorithms to solve the problems introduced in the previous section. By performing the min-cost max flow computations on the flow network which is constructed from Gw, our algorithm for the WTAO problem, WTAO NF, can solve the WTAO problem exactly in polynomial time. We also solve the WTA problem by applying WTAO NF iteratively on each CLB in a channel. To alleviate the influence of the order of the source CLB selection, we adopt a randomized iteration scheme.
The Wire type assignment for one source CLB (WTAO)
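Given a WTAO problem, we construct the flow network G(V, E) as follows:
1. V = Vw ∪ {s, t}, where s is a source node and t is a sink node of G(V, E).
2. E = Ew ∪ Es ∪ Et, where Es is the set of edges from s to CLB source nodes and Et is the set of edges (t_j, t) from CLB sink nodes to t.
3. Edge Capacity: r(e) = |N_{i_s j}| for edges e(t_j, t); r(e) = rw(e) for other edges e.
4. Node Capacity: r(v) = rw(v) ∀v ∈ Vw; node s and node t are uncapacitated.
5. Edge Cost: c(e) = cw(e) if e ∈ Ew; c(e) = 0 if e ∈ Es ∪ Et.
6. Node Cost: c(v) = cw(v) ∀v ∈ Vw.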
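The construction can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `build_wtao_network`, the dictionary layout, and the node names are hypothetical. In particular, the capacity given here to the edge leaving s (the total number of net segments from the source CLB) is an assumption; any value at least that large yields the same flows, because the edges into t already bound the flow.

```python
def build_wtao_network(edge_attr, node_attr, i_s, demand):
    """Return (edges, nodes) of the flow network G(V, E) for source CLB i_s.

    edge_attr: {(u, v): (rw, cw)} for the edges Ew of the wire type graph.
    node_attr: {v: (rw, cw)} for the nodes Vw.
    demand:    {j: number of net segments with spanning interval (i_s, j)}.
    """
    edges = dict(edge_attr)                      # Ew keeps rw(e) and cw(e)
    nodes = dict(node_attr)                      # Vw keeps rw(v) and cw(v)
    total = sum(demand.values())                 # number of segments from i_s
    # Es: zero cost; the capacity is an assumption (see the text above).
    edges[("s", ("s", i_s))] = (total, 0)
    for j, n in demand.items():                  # Et: capacity |N_{i_s j}|, cost 0
        edges[(("t", j), "t")] = (n, 0)
    nodes["s"] = nodes["t"] = (float("inf"), 0)  # s and t are uncapacitated
    return edges, nodes

# Toy instance: CLB 2 drives two net segments to CLB 3 over single lines.
edge_attr = {(("s", 2), ("w", 2, "single")): (2, 1),
             (("w", 2, "single"), ("t", 3)): (2, 1)}
node_attr = {("s", 2): (8, 0), ("w", 2, "single"): (2, 3), ("t", 3): (8, 0)}
edges, nodes = build_wtao_network(edge_attr, node_attr, 2, {3: 2})
```

Only the edges touching s and t depend on the chosen source CLB, which is why the rest of the network can be reused from one source CLB to the next.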
The constructed flow network for the WTAO problem is illustrated in Figure 6. To make our problem conform to the classical network flow framework, we transform G(V, E) into a directed graph in which only edges have capacities and costs. Note that any undirected edge in Gw(Vw, Ew), which can arise from bidirectional wire segments, can be transformed into a pair of directed edges with the cost and the capacity of the original undirected edge [2]. By the node splitting transformation, any node i with nonzero cost and capacity is transformed into two nodes i' and i''. This transformation replaces each of the original edges (j, i) and (i, k) with (j, i') and (i'', k), respectively. It also adds an edge (i', i'') with the cost and the capacity of node i. Figure 5 shows an example of the network transformations.
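The node splitting transformation can be illustrated with a short Python sketch. The function name `split_nodes` and the `(v, "in")` / `(v, "out")` naming of the two copies are hypothetical; for uniformity the sketch splits every node, including those without cost, which is a simplification of the rule stated above.

```python
def split_nodes(edges, nodes):
    """Move node capacities and costs onto edges.

    edges: {(u, v): (cap, cost)}, nodes: {v: (cap, cost)}.
    Each node v becomes (v, "in") and (v, "out"), joined by an edge that
    carries the capacity and cost of v.
    """
    out = {}
    for v, attr in nodes.items():
        out[((v, "in"), (v, "out"))] = attr      # internal edge of node v
    for (u, v), attr in edges.items():
        out[((u, "out"), (v, "in"))] = attr      # original edge, re-attached
    return out

edges = {("j", "i"): (5, 1), ("i", "k"): (4, 2)}
nodes = {"j": (9, 0), "i": (3, 7), "k": (9, 0)}
g = split_nodes(edges, nodes)
print(g[(("i", "in"), ("i", "out"))])
```

After this step any standard edge-capacitated min-cost flow routine can be applied without special handling of node capacities.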
It can be shown that any flow in G is a wire type assignment solution for a subset of the given net segments. Each unit of flow from s to t through s_{i_s} and t_j corresponds to a wire-type-assigned route for a net segment in N_{i_s j}. The occupied capacity of a node in Vw is the number of used wire segments of that type, and the flow amount along an edge in Ew is the number of used connection switches. If a flow f exists and |f| = |N_{i_s}|, then we can find a feasible solution for all the net segments in N_{i_s}, and the cost of the flow is the cost of a solution to the WTAO problem. Since we assigned |N_{i_s j}| to each edge (t_j, t) ∈ Et, the total capacity of the edges connected to the sink node t is |N_{i_s}|, so no flow can be larger than |N_{i_s}|. If the maximum flow is smaller than |N_{i_s}|, there is no feasible solution to the WTAO problem, and the min-cost maximum flow assigns wire types to the routes for as many net segments in N_{i_s} as possible with minimum total delay cost. The following theorem shows that the wire type assignment for one source CLB problem can be exactly solved by a network flow computation on G. From a flow in G, a solution to the WTAO problem can be derived by a depth-first search from each CLB sink node to the source node in G. Figure 6 shows a flow f corresponding to a WTAO solution for 4 net segments in N2. Figure 7 summarizes our WTAO NF algorithm.
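To make the flow computation and the route derivation concrete, the sketch below computes a min-cost maximum flow and then peels one route per unit of flow. It is a minimal illustration under stated assumptions: it uses the textbook successive shortest path method with a queue-based Bellman-Ford search rather than the double scaling algorithm, it walks from s toward t instead of from each sink back to the source, it assumes node costs have already been moved onto edges, and the class name, node names, capacities, and costs are all hypothetical.

```python
from collections import deque

class MinCostFlow:
    """Min-cost max-flow by successive shortest paths (integer capacities)."""

    def __init__(self):
        self.adj = {}                          # node -> indices of outgoing arcs
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u, v, cap, cost):
        # Arc 2k is the forward arc, arc 2k+1 its residual twin.
        for a, b, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.adj.setdefault(a, []).append(len(self.to))
            self.to.append(b)
            self.cap.append(c)
            self.cost.append(w)

    def solve(self, s, t):
        """Return (maximum flow value, minimum cost of such a flow)."""
        flow = total = 0
        while True:
            dist, prev = {s: 0}, {}
            queue, queued = deque([s]), {s}
            while queue:                       # shortest path in residual graph
                u = queue.popleft()
                queued.discard(u)
                for e in self.adj.get(u, []):
                    v, d = self.to[e], dist[u] + self.cost[e]
                    if self.cap[e] > 0 and d < dist.get(v, float("inf")):
                        dist[v], prev[v] = d, e
                        if v not in queued:
                            queue.append(v)
                            queued.add(v)
            if t not in dist:
                return flow, total
            push, v = float("inf"), t
            while v != s:                      # bottleneck capacity on the path
                push = min(push, self.cap[prev[v]])
                v = self.to[prev[v] ^ 1]
            v = t
            while v != s:                      # augment along the path
                e = prev[v]
                self.cap[e] -= push
                self.cap[e ^ 1] += push
                v = self.to[e ^ 1]
            flow += push
            total += push * dist[t]

    def routes(self, s, t):
        """Peel unit s-t paths off the computed flow, one per net segment."""
        used = {e: self.cap[e ^ 1] for e in range(0, len(self.to), 2)}
        paths = []
        while True:
            path, v = [s], s
            while v != t:
                e = next((e for e in self.adj.get(v, [])
                          if e % 2 == 0 and used[e] > 0), None)
                if e is None:
                    return paths
                used[e] -= 1
                v = self.to[e]
                path.append(v)
            paths.append(path)

# Source CLB 2 with two net segments to CLB 3 and one to CLB 4 (toy values).
net = MinCostFlow()
for u, v, cap, cost in [("s", "s2", 3, 0), ("s2", "w_single", 1, 2),
                        ("s2", "w_double", 2, 3), ("w_single", "t3", 1, 0),
                        ("w_double", "t3", 2, 0), ("w_double", "t4", 2, 0),
                        ("t3", "t", 2, 0), ("t4", "t", 1, 0)]:
    net.add_edge(u, v, cap, cost)
flow, cost = net.solve("s", "t")
paths = net.routes("s", "t")
print(flow, cost, paths)
```

In this toy instance the flow value equals the number of net segments, so every segment receives a route; a smaller flow value would indicate that no congestion-free assignment exists for this source CLB.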
There are several polynomial-time optimal algorithms available for finding a min-cost maximum flow in a network [2] . Deriving a solution to the WTAO problem from a flow can be done in O(E) time. Thus, the WTAO problem can be solved efficiently as stated in the following theorem, if we adopt the double scaling algorithm [1] .
Theorem 2. The WTAO NF algorithm exactly solves the WTAO problem in O(V E log log Rmax log(V Cmax)) time, where Rmax is the maximum value of the capacities and Cmax is the maximum value of the costs.
The time complexity of our algorithm is mainly dependent on the number of nodes and edges in G(V, E). Because we abstracted all the wire segments of the same type as one node and all the connections between the wire segments of a pair of types as an edge, the number of actual wire segments only affects the Rmax term in the time complexity. Therefore the runtime of our algorithm does not grow seriously with the increased number of wire segments.

[Figure: algorithm listing. Recoverable steps: assign costs and capacities; run the min-cost max-flow algorithm on G(V, E); derive the corresponding wire type assigned routes from the computed flow; adjust capacities.]

In this section we solve the wire type assignment problem for a channel (WTA). We apply the WTAO NF algorithm iteratively on all the CLBs in a channel to solve this problem.
Given a wire type connection graph Gw(Vw, Ew), a source CLB is selected and WTAO NF is applied to solve the WTAO problem. The flow network G is generated only once during the whole procedure. After a solution to the WTAO problem is obtained, the capacity of each node and edge along the obtained routes is adjusted by subtracting the size of the flow from the original capacity. Then, a new CLB is selected as the source CLB, and the WTAO NF algorithm is applied after updating the capacities of the edges in Es and Et according to the interval information of the net segments from the new source CLB. This procedure is repeated until all the CLBs in a channel have been selected as source CLBs. Note that the node set and edge set of G do not change from step to step; only the capacities of some edges are updated.
Because the availability of the routing resources is updated over iterations, the ordering of the source CLB selection can influence the final result. To reduce the effect of this ordering, we randomly select the source CLB in every step. To enhance the result, we iterate the whole procedure several times. Due to the optimality of WTAO NF, the result obtained by the current iteration is guaranteed to be no worse than the result from previous iterations.
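The randomized iteration can be sketched as a loop around a one-source solver. The sketch below makes several assumptions that are not taken from the paper: the interface `solve_wtao(i, cap)`, the pooled "fast" and "slow" resources of the toy solver, and the policy of restarting from fresh capacities in each iteration and keeping the best result seen so far.

```python
import random

def wta_nf(clbs, capacity, solve_wtao, iterations=4, seed=0):
    """Visit source CLBs in random order and update capacities after each.

    solve_wtao(i, cap) -> (routed, cost, used) is a caller-supplied
    one-source solver (hypothetical interface); used maps a resource to
    the amount it consumed. Returns (routed, cost, order) of the best run.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(iterations):
        cap = dict(capacity)                    # assumed: fresh capacities per run
        order = rng.sample(clbs, len(clbs))     # random source CLB ordering
        routed = cost = 0
        for i in order:
            r, c, used = solve_wtao(i, cap)
            routed += r
            cost += c
            for res, amount in used.items():    # adjust capacities
                cap[res] -= amount
        key = (-routed, cost)                   # more routed first, then cheaper
        if best is None or key < best[0]:
            best = (key, routed, cost, order)
    return best[1:]

# Toy stand-in solver: fast wires cost 1 per segment, slow wires cost 3.
demand = {1: 2, 2: 1, 3: 2}

def toy_solver(i, cap):
    fast = min(demand[i], cap["fast"])
    slow = min(demand[i] - fast, cap["slow"])
    return fast + slow, fast * 1 + slow * 3, {"fast": fast, "slow": slow}

routed, cost, order = wta_nf([1, 2, 3], {"fast": 3, "slow": 2}, toy_solver)
print(routed, cost, order)
```

In a real setting the stand-in solver would be replaced by the min-cost max-flow computation for one source CLB, and the capacity dictionary by the capacities of the nodes and edges of the flow network.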
The optimality of WTAO NF can be very helpful for incremental improvement of the routing result, especially when WTAO NF is used interactively with the global router or the detailed router. Suppose there are nets that violate some timing or congestion constraints. After ripping up those nets and rerouting them using the global router, WTA NF can be applied to resolve the violations. Figure 8 summarizes our algorithm for the wire type assignment problem for a channel, WTA NF.
EXPERIMENTAL RESULTS
We have implemented our algorithms in the C++ programming language on a SUN Sparc Ultra 5 (360 MHz) with 128 MB of memory. The experiments were performed on 7 randomly generated global routing results, which were routed on channels of 7 different sizes. The channel sizes in the 6 smaller examples are the same as those of some FPGAs in the Virtex-II family. We added an example with 256 CLBs in a channel, which is twice as big as the largest Virtex-II FPGA. We assumed that the number of wire segments per CLB is the same for all the FPGAs, which holds true for actual Virtex-II FPGAs. We used the same number and types of wire segments and connection switches as those in Virtex-II FPGAs.
Because there is no standalone wire type assignment algorithm available, we compared our algorithm with a net-by-net approach in which each net is randomly selected and routed along the min-cost path. We compared runtime, number of routed (and wire type assigned) nets, and the sum of the delays of all the net segments of a channel. For each test circuit, we ran the algorithms 5 times. The iteration number used in WTA NF was 4 for each run. The average results are shown in Table 1. In all the cases, WTA NF successfully routed and assigned wire types to all the net segments. The total delay of net segments was improved significantly. Because the number of routed nets differs between the two approaches, the delay cost per net is also listed and the improvement is calculated. We obtained up to 47.5% (average 33.3%) improvement in the delay cost per net. The runtime of WTA NF listed in Table 1 is the total runtime for 4 iterations.
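The net ordering problem of such a net-by-net baseline can be reproduced on a tiny example. The sketch below routes net segments one at a time along the cheapest path that still has capacity, using Dijkstra's algorithm. It is only an illustration under stated assumptions: the function name `route_in_order`, the graph, and its capacities and costs are hypothetical and are not the benchmark data of the paper; a random baseline would shuffle the segment list before calling the function.

```python
import heapq

def route_in_order(edges, order):
    """Route segments one by one on the cheapest path with free capacity.

    edges: {(u, v): (cap, cost)}; order: list of (src, dst) net segments.
    Returns (number of routed segments, total cost of the routed ones).
    """
    cap = {e: c for e, (c, _) in edges.items()}
    adj = {}
    for (u, v), (_, w) in edges.items():
        adj.setdefault(u, []).append((v, w))
    routed = total = 0
    for src, dst in order:
        dist, prev, heap = {src: 0}, {}, [(0, src)]
        while heap:                               # Dijkstra on free edges only
            d, u = heapq.heappop(heap)
            if d > dist[u]:
                continue
            for v, w in adj.get(u, []):
                if cap[(u, v)] > 0 and d + w < dist.get(v, float("inf")):
                    dist[v], prev[v] = d + w, u
                    heapq.heappush(heap, (d + w, v))
        if dst not in dist:
            continue                              # segment stays unrouted
        v = dst
        while v != src:                           # consume one unit per edge
            cap[(prev[v], v)] -= 1
            v = prev[v]
        routed += 1
        total += dist[dst]
    return routed, total

# The fast wire reaches t2 and t3; the slow wire reaches only t2.
edges = {("s1", "fast"): (1, 1), ("fast", "t2"): (1, 0), ("fast", "t3"): (1, 0),
         ("s1", "slow"): (1, 2), ("slow", "t2"): (1, 0)}
bad = route_in_order(edges, [("s1", "t2"), ("s1", "t3")])
good = route_in_order(edges, [("s1", "t3"), ("s1", "t2")])
print(bad, good)
```

When the segment to t2 is routed first it takes the fast wire and blocks the segment to t3, whereas routing all segments from the same source simultaneously, as in a flow formulation, would find the assignment that routes both.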
CONCLUSIONS
In this paper, we have proposed wire type assignment as a separate stage between global routing and detailed routing, and presented a randomized polynomial-time algorithm for wire type assignment of all net segments in a channel. By routing all the net segments from a CLB at the same time, our algorithm can greatly alleviate the net ordering problem that can be observed in net-by-net approaches. Furthermore, the total delay of the net segments is also minimized, which can contribute to meeting the overall timing constraints of the circuit. Although our algorithm is intended for the net segments in a channel, it can be used for assigning wire types to all the nets in an FPGA by handling all the channels. We compared our algorithm with a net-by-net algorithm, and the experimental results show that our algorithm is very effective.
