Abstract: It is well known that the solution quality of the detailed routing phase is heavily influenced by the order in which nets are routed. To alleviate this problem, a number of routing strategies have been developed that rip up and reroute (R&R) previously routed nets that "block" the current net. The R&R approach offers little control over the solution quality (e.g., length, delay) of the ripped-up nets. In this paper we propose a detailed router, ROAD (bump&Refit based OptimAl Detailed router), that explores the solution space using an approach called bump-and-refit (B&R), in which the global routes of previously routed nets are not changed but their track assignments are systematically altered to make space for the net currently being routed. B&R thus does not have the above drawback of R&R. We start with an initial depth-first search method for this purpose that is optimal in that it finds a detailed routing solution with the minimum number of tracks irrespective of the net routing order. We then develop various optimality-preserving speedup methods, including search-space pruning based on clique detection, learning about and remembering unsuccessful search spaces, and second-level (lookahead) transition costs. The combination of these methods results in an average speedup of 604 for small to medium VPR circuits and an extrapolated speedup of more than 5763 for larger circuits. Furthermore, a comparison of ROAD run times to those of VPR's estimated detailed routing phase shows that ROAD is almost two times faster than VPR. This is noteworthy because an optimal detailed router obtains solutions in reasonable times that are also shorter than those of a non-optimal (though effective) router.
Introduction
Efficient routing is important for reducing total wiring area and/or the lengths of critical-path nets for performance optimization. Both metrics are impacted by the detailed routing phase. A major impediment to effective detailed routing for FPGAs is the net ordering problem, wherein the realization of the global routes of nets within the routing track resources is heavily dependent on the order in which the nets are detailed-routed [9, 7, 10]. A similar problem exists for maze routing in VLSI chips [11, 8, 12]. Though a number of novel approaches based on ripup and reroute (R&R) of previously routed nets that "block" or "collide" with the current net have been proposed and developed [4, 3, 5, 6], the problem has not yet been completely solved for either routing environment.
In this paper, we introduce a new approach to detailed routing that uses a bump-and-refit (B&R) strategy to solve the detailed routing problem for FPGAs with i-to-i switchboxes optimally (i.e., detail route a set of given global net routes of a circuit in the minimum number of tracks) irrespective of the order in which the nets are routed. The B&R approach along with a depth-first search (DFS) algorithm was proposed originally in [1, 2] for the purpose of incremental routing. A similar B&R approach can be used for complete detailed routing, since the routing of the current net in the presence of the previously routed nets is qualitatively an incremental routing problem. Quantitatively, however, truly incremental routing and the application of an incremental routing model to complete routing differ significantly in how often the basic incremental router is applied. In the former, it may need to be applied to 1-10% of the nets, while in the latter, it is applied to almost 100% of the nets. A straightforward application of the DFS B&R incremental algorithm of [1, 2] to the complete routing problem results in slow to extremely slow solution times on a set of medium to large VPR circuits. To alleviate this problem, we develop here a suite of optimality-preserving speedup methods that results in speedups of several orders of magnitude. We thus obtain a detailed router that is optimal as well as time-efficient. It is also noticeably faster than VPR's detailed routing phase [13], which implies that the run-time of our router is quite reasonable even though it is optimal.

This work was supported in part by NSF grant CCR-0204097.
The rest of the paper is organized as follows. The motivation behind the B&R approach to complete routing is given in Sec. 2. Section 3 discusses the basics of the B&R methodology proposed in [1, 2] and explains how such a methodology can be used to realize an order-impervious optimal detailed router for FPGAs (in the rest of the paper we assume FPGAs with i-to-i switchboxes, as also assumed by the VPR router). Optimality-preserving speedup methods for the B&R DFS algorithm are then developed in Sec. 4. Experimental results for a set of small to large VPR benchmark circuits are given in Sec. 5 and we conclude in Sec. 6.

Motivation for B&R

Figure 1 shows completed routings of nets $n_1$ to $n_4$ in a channel routing scenario, and the requirement to route another net $n_i$ that cannot be routed even with doglegs unless some of the previous net routings are undone or, equivalently, unless the nets had been routed in a different order, say, $(n_1, n_2, n_3, n_i, n_4)$; Fig. 1b shows a successful routing of all nets in this order. Thus, to be able to route nets in an optimal and order-impervious manner, a routing algorithm should be able to reverse routing decisions made for previously routed nets. Note that reversal of routing decisions on an existing net $n_j$ in the detailed routing context is equivalent to "bumping" $n_j$ from its current track to another track to make room for the new net(s) so that a more efficient fitting of nets to tracks is obtained. For example, the routing solution of Fig. 1b can be obtained from the configuration of Fig. 1a by new net $n_i$ bumping net $n_4$, which is then moved to the bottom track. Thus the ability to bump previously routed nets can circumvent the net ordering problem. Furthermore, B&R does not change the topology of existing nets while finding an optimal detailed routing solution for a new net (see Sec. 3), and thus preserves their electrical properties to a large degree.

Figure 2: (a) Routing in an FPGA, and a B&R process for new net $n_1$ connecting cells A3 and C1; for simplicity, pin connections of existing nets are not shown. Nets are shown by dark or dotted lines on the tracks. Numbered arrows from existing nets show the sequence of bumpings to accommodate the routing for $n_1$. (b) Searching the OG for a converging transition DAG for the O-net $n_2$. The track label at each end of an edge in the OG refers to the track on which the neighboring net corresponding to that edge lies.
Bump-and-Refit Algorithms for FPGAs
We first define some terminology pertaining to FPGAs; Figure 2a illustrates many of these terms. We define a channel in an FPGA as the set of all track segments between two adjacent switchboxes (SBoxes) of the FPGA; see Figure 2a. Each channel has the same number t of tracks, which we denote by $T_0, T_1, \ldots, T_{t-1}$. The length of a track segment is the number of channels it spans before it needs to connect to another segment via an SBox. For simplicity of exposition, we describe our B&R algorithm for FPGAs with track segments of length one and SBoxes with i-to-i interconnection capabilities.
Basic B&R
To set the stage for applying B&R to complete routing, we discuss here the basic B&R approach and relevant concepts for incremental routing. The "incremental routing problem" can be stated as follows. For each new net $n_i$ to be incrementally routed, if the required track segments decided by the detailed router are vacant, then no existing nets are disturbed. Note that the detailed router will always prefer to use vacant track segments in the channels assigned to $n_i$. However, if this is not possible, then it may use one or more segments that are occupied by other net(s), and the routing of $n_i$ will cause a "bumping" of this set of nets, called the occupying set; each net in this set is called an occupying net (O-net). In Figure 2a, $n_1$ is a new net connecting cells A3 and C1. Note that it has been routed in a way that minimizes bumpings and that there is no shortest route possible for $n_1$ that uses only vacant track segments. The given routing of $n_1$ bumps only one net, $n_2$, which is thus the only O-net. In general, refitting solutions for each O-net can be explored independently and in any order to arrive at a feasible refitting of all bumped nets (see Theorem 1 given later). In the sequel, without loss of generality, we thus describe the B&R process for a single O-net.
The O-net needs to be moved out of its current track to make space for the new net. We use $n_i^{T_j}$ to denote a net $n_i$ on track $T_j$. A transition is defined as the movement of net $n_i$ on a track $T_j$ to another track $T_k$, and is denoted by $n_i^{T_j \to T_k}$. This transition may result in net $n_i$ bumping into one or more nets on track $T_k$. These nets in turn will have to move out of their current track $T_k$, giving rise to a transition for each of them. This transition sequence is shown in Figure 2b by dark arcs, where net $n_2$ initiates a set of transitions that finally terminate in "spare" track segments, which are vacant segments of appropriate total lengths into which bumped nets can move without bumping other nets.

We introduce the concept of an overlap graph (OG), which is a graph representation of the circuit routing in the FPGA. The OG is an undirected graph with the circuit nets represented by the nodes of the graph. In the overlap graph OG(V, E), the set of nodes is $V = S \cup \{n_1, n_2, \ldots, n_m\}$, where each $n_i$ is a routed net of the circuit and S is the set of "spare" track segments as described above (note that a set of vacant segments of a track $T_j$ is a spare only with respect to the specific net(s) that can be moved to $T_j$ without bumping any nets on that track). There exists an edge between $n_i$ and $n_j$ in the OG if nets $n_i$ and $n_j$ share a channel in the FPGA. Figure 2b shows the OG for the routing of Fig. 2a. In the OG, for example, nets $n_2$ and $n_6$ have an edge between them since they are routed through a common vertical channel to the right of cell B3.
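The overlap graph construction can be illustrated with a minimal sketch. It assumes that each routed net is represented simply by the set of channel identifiers it passes through; the dictionary layout, the channel names, and the function name `build_overlap_graph` are illustrative choices rather than part of the original algorithm description, and spare nodes are omitted.

```python
from itertools import combinations
from typing import Dict, Set


def build_overlap_graph(channels: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return adjacency sets: two nets are neighbors iff they share a channel."""
    graph: Dict[str, Set[str]] = {net: set() for net in channels}
    for a, b in combinations(channels, 2):
        if channels[a] & channels[b]:  # at least one common channel
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical channel ids: n2 and n6 share channel "v1", n3 shares none.
example = {"n2": {"v1", "h4"}, "n6": {"v1", "h7"}, "n3": {"h9"}}
og = build_overlap_graph(example)
```

Storing adjacency as sets keeps the neighbor lookups used by the search and by the transition cost functions cheap.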
The OG can be used to determine whether the required converging transition DAG (T-DAG) exists, i.e., a DAG of transitions rooted at the O-net whose leaves are all spare nodes. Since the OG represents the circuit routing in the FPGA, a T-DAG is a DAG embedded in the OG (the undirected edges of the OG become directed arcs in the direction of the transitions; see Fig. 2b). Thus a converging T-DAG rooted at an O-net can be determined by performing a search on the OG until all leaf nodes of the search DAG are spare nodes. This process is illustrated in Figure 2 for a small circuit and for a single new net $n_1$. The corresponding O-net $n_2$ transits from $T_1$ to $T_3$ and bumps into $n_5$. The movement of $n_2$ from $T_1$ creates a "dynamic" spare node (labeled D_Sp in Figure 2b) for net $n_6$. The bumped net $n_5$ then transits from $T_3$ to $T_0$, where it bumps $n_6$ and $n_3$. Net $n_6$ then transits to the above dynamically created spare on $T_1$, while $n_3$ transits to its spare track segments on $T_2$. Thus a converging T-DAG is determined in the OG. The transition arcs are shown dark in Figure 2b and numbered chronologically in the order in which they are traversed in the search process. Figure 3 shows the final routing of the FPGA after the bumping sequence converges.
An Optimal Depth-First B&R Algorithm
A B&R incremental routing algorithm Conv-T-DAG (Figure 4) that performs a depth-first search in the OG for a converging T-DAG rooted at the O-net was developed in [2]. For a transition of the O-net to some track $T_k$, it recursively searches for converging T-DAGs rooted at each net on $T_k$ bumped by the O-net. A depth-first path terminates in success if a spare node is reached, and in failure either when all OG nodes have been visited in that path or when a cycle is detected (an ancestor along the current path is revisited). When an $n_i^{T_j \to T_k}$ transition fails in this manner, the search backtracks and tries an unexplored transition $n_i^{T_j \to T_{k'}}$ with $k' \neq k$. The following result (stated slightly differently from that in [2]) establishes the optimality of Conv-T-DAG [2].
Theorem 1 [2] If converging T-DAGs rooted at the O-nets bumped by a new net routing exist among the currently used tracks of the FPGA, they will be found by calling Conv-T-DAG for each O-net in any order.
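The depth-first search for a converging T-DAG can be sketched as follows. This is a simplified illustration and not the Conv-T-DAG pseudocode of Figure 4: it assumes that each net is described by the set of channels it occupies and a single track index, that tracks are tried in index order with no cost-based ordering, and that the new net which bumped the O-net is passed in as the initial ancestor. The names `conv_t_dag`, `tracks`, and `channels` are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Set

Assignment = Dict[str, int]  # net name -> track index


def conv_t_dag(net: str,
               tracks: Assignment,
               channels: Dict[str, Set[str]],
               num_tracks: int,
               ancestors: FrozenSet[str] = frozenset()) -> Optional[Assignment]:
    """Depth-first search for a refit of `net`; returns a new assignment or None."""
    src = tracks[net]
    path = ancestors | {net}
    for dst in range(num_tracks):
        if dst == src:
            continue
        # Nets on the destination track that share a channel with `net`.
        bumped = [m for m, t in tracks.items()
                  if t == dst and m != net and channels[m] & channels[net]]
        if any(m in ancestors for m in bumped):
            continue  # bumping an ancestor would close a cycle
        trial = dict(tracks)
        trial[net] = dst  # the segments vacated on src act as a dynamic spare
        for m in bumped:
            if trial[m] != dst:
                continue  # already moved away by an earlier refit
            refit = conv_t_dag(m, trial, channels, num_tracks, path)
            if refit is None:
                break  # this transition fails; backtrack to another track
            trial = refit
        else:
            return trial  # every bumped net converged
    return None


# Hypothetical 2-track instance: new net x takes T0 and bumps O-net a.
channels = {"x": {"c1"}, "a": {"c1", "c2"}, "b": {"c2", "c3"}}
solved = conv_t_dag("a", {"x": 0, "a": 0, "b": 1}, channels, 2, frozenset({"x"}))

# Three nets through one channel cannot fit on two tracks.
tight = {"x": {"c1"}, "a": {"c1"}, "b": {"c1"}}
failed = conv_t_dag("a", {"x": 0, "a": 0, "b": 1}, tight, 2, frozenset({"x"}))
```

In the first instance, a moves to T1 and bumps b, which then settles in the segments a vacated on T0; in the second, b would have to bump the ancestor x, so the search fails.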
While an existing converging T-DAG will ultimately be found by Conv-T-DAG, the search is more time-efficient if a suitable "cost" measure is used to determine which transitions are more likely to be successful, so that fewer T-DAGs are searched and backtracked. A good cost measure will consider both the "magnitude" of bumpings (total length of bumped nets) and the likelihood of convergence of these bumpings. Two transition cost (TC) measures evaluated are as follows.

1. Sum: the cost of moving $n_i$ to track $T_k$ is $\sum_{n_j \in adj_{T_k}(n_i)} l(n_j)$, where $adj_{T_k}(n_i)$ are the neighbors of $n_i$ in the OG that are on track $T_k$, and $l(n_j)$ is the total length of $n_j$ in terms of the track segments (each of length 1) that it occupies. This heuristic is reasonable, but only considers the bumping magnitude. For example, according to it, it is equally costly to bump a net of length 9 as it is to bump 3 nets each of length 3. However, the latter case has a higher likelihood of convergence since there is greater flexibility in moving 3 bumped nets than a single net of the same total length. This leads to the second cost function (referred to as sqrt in Sec. 4.3).
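A small sketch of the sum cost makes the limitation just described concrete. It assumes the OG is stored as adjacency sets and that net lengths and track assignments are held in dictionaries; the function name `tc_sum` and the instance below are hypothetical.

```python
from typing import Dict, Set


def tc_sum(net: str, dst: int, tracks: Dict[str, int],
           og: Dict[str, Set[str]], length: Dict[str, int]) -> int:
    """1st-level 'sum' cost: total length of OG neighbors of `net` on track dst."""
    return sum(length[m] for m in og[net] if tracks[m] == dst)


# Hypothetical instance: on T1 net n would bump one net of length 9,
# on T2 it would bump three nets of length 3 each.
og = {"n": {"a", "b", "c", "d"}}
tracks = {"n": 0, "a": 1, "b": 2, "c": 2, "d": 2}
length = {"n": 2, "a": 9, "b": 3, "c": 3, "d": 3}
cost_t1 = tc_sum("n", 1, tracks, og, length)
cost_t2 = tc_sum("n", 2, tracks, og, length)
```

Both transitions receive the same cost of 9, even though refitting three short nets is usually easier than refitting one long net.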
Using such TC functions to guide the search reduces the time-to-solution by an order of magnitude compared to a "blind" depth-first search [2]. We term the above TC functions 1st-level TC functions.

Time Complexity. The worst-case complexity of Conv-T-DAG is equal to the maximum number of paths in the OG starting from any node u, since in the worst case all paths starting from the O-net will be explored in the DFS process of the algorithm. Let b be the branching factor (average number of neighbors of a node) of the OG, m the number of nodes in it (i.e., the number of nets in the circuit), and L the length of the longest path in the OG. The number of such paths is $O((b-1)^L)$; when $L = O(m)$, this complexity becomes $O((b-1)^m)$.
Note that this complexity does not take into account the speedup methods discussed in Sec. 4 that we incorporate into the DFS process. Our empirical results shown in Sec. 5.2 suggest that the average-case complexity of the fast version of Conv-T-DAG is linear (it actually shows that the average-case complexity of the detailed router Route-With-B&R discussed in Sec. 3.3 that calls the fast version of Conv-T-DAG m times is at most quadratic in m). This also shows that the speedup methods developed here are very effective in pruning fruitless search spaces during DFS in Conv-T-DAG.
Applying B&R to Complete Detailed Routing
In Fig. 5, the pseudocode Route-With-B&R for detail routing a given set of nets using the B&R method is given. For the next net $n_i$ to be routed in the channels designated by the global router, a track $T_k$ is chosen for which $n_i$'s TC is the least. If $n_i$ does not bump any net in $T_k$, we are done. Otherwise, a B&R solution is obtained for every O-net $n_c$ bumped by $n_i$ on $T_k$ by calling Conv-T-DAG (Fig. 4).

For complete detailed routing, the time-efficiency of the DFS algorithm Conv-T-DAG (Fig. 4) becomes critical, as many more previously routed nets are bumped than in an incremental routing application, leading to extensive DAG searches. The crux of practically applying this optimal algorithm to the complete routing problem for large FPGA circuits is to determine significant search-space prunings that do not sacrifice optimality, as well as to develop much better DFS ordering heuristics than those given by the 1st-level TC functions of Sec. 3.2. These speedup methods are discussed next.
Optimality-Preserving Speedup Methods
We describe here our search-space pruning and lookahead DFS ordering techniques that help us to reduce the search time for the B&R DFS search process of Fig. 4 by a few orders of magnitude.
Learning-Based Search Space Pruning
We first give some useful definitions.

Ancestor net (AN): Figure 6a shows a search space P rooted at net B and preceded by search path $\tau_1$. An AN of search space P along a search path leading to the root net B of P ($\tau_1$ in the example of Fig. 6a) is a net preceding B in the search path. In the figure, A, C, D, and K are ANs of search space P and of its root net B.

Obstacle ancestor net (OAN): An OAN of a search space P rooted at net B is an AN of P that is a neighbor on the OG of at least one net in P. OANs are so called because they do not allow nets in P overlapping them to transit to the tracks in which they currently lie (in the context of the B&R algorithm of Fig. 4, which does not allow cycles in the T-DAG search process), thus becoming "obstacles" to the movement of nets in P. In Fig. 6a, nets A, D, and K are OANs of search space P.

Regular ancestor net (RAN): An ancestor net of a search space that is not its OAN is termed a RAN of the search space. In Fig. 6a, net C is a RAN of P.

Net vector (NV): A net vector is a two-tuple $(NS, T_j)$, where NS is a set of nets $\{n_1, n_2, \ldots, n_k\}$ that all lie on track $T_j$ (without overlapping each other, i.e., no two nets in NS are neighbors in the OG).

Net pattern (NP): A net pattern is a set of NVs such that no two NVs have the same track.

Ancestor pattern (AP): An ancestor pattern of a search space P rooted at net B on path $\tau$ is an NP $\{NV_1, NV_2, \ldots, NV_l\}$ such that each net in $NV_i$, $1 \le i \le l$, is an AN of P and each AN of P belongs to some $NV_j$, $1 \le j \le l$. For example, in Fig. 6a, for search space P rooted at B on path $\tau_1$, the AP is $AP_1 = \{(\{A, D\}, T_0), (\{C\}, T_1), (\{K\}, T_2)\}$, while in Fig. 6b, for search space Q rooted at B on path $\tau_2$, the AP is $AP_2 = \{(\{A, D\}, T_3), (\{K, X\}, T_1), (\{C\}, T_0), (\{Z\}, T_2)\}$.

Obstacle pattern (OP): An obstacle pattern of a search space P rooted at net B on path $\tau$ is an NP $\{NV_1, NV_2, \ldots, NV_l\}$ such that each net in $NV_i$, $1 \le i \le l$, is an OAN of P and each OAN of P belongs to some $NV_j$, $1 \le j \le l$. In Fig. 6a, the OP for search space P is $OP_1 = \{(\{A, D\}, T_0), (\{K\}, T_2)\}$; the OP for search space Q in Fig. 6b is formed in the same way from the OANs of Q.

The learning-based search-space pruning method is now described in terms of the following two fundamental results.
Lemma 1 Suppose that in a search path $\tau_1$, net B is bumped and there does not exist a solution for this bumping (i.e., the search space P rooted at B on $\tau_1$ is not a converging T-DAG). If in another search path $\tau_2$ (that may or may not have a common subpath with $\tau_1$) B is again bumped and the ancestor pattern for the search space Q rooted at B on $\tau_2$ is a superset of the obstacle pattern of P, then there does not exist a solution for bumping B in $\tau_2$.
Proof: In Fig. 6, assume that there is no solution on the path $\tau_1$ when net B is bumped by net K. Let $OP_1$ be the obstacle pattern for search space P on $\tau_1$, and $AP_2$ the ancestor pattern for search space Q on $\tau_2$. Some subset $P_1$ of P can be OANs of B on $\tau_2$, another subset $P_2$ of P can be RANs of B on $\tau_2$, and yet another subset $P_3$ of P can be a subset of Q. Hence $P = P_1 \cup P_2 \cup P_3$. If there exists a solution for bumping B on $\tau_2$, then there exists a valid track position for each net in $P_3$ (including B). Since the track positions available for subset $P_3$ of P in $\tau_2$ are a subset of the track positions available for $P_3$ in $\tau_1$ (because $OP_1 \subseteq AP_2$), the same track positions exist for each $n_i \in P_3$ in $\tau_1$. Further, the same track positions as in $\tau_2$ also exist in $\tau_1$ for the subsets $P_1$ and $P_2$ of P (again because $OP_1 \subseteq AP_2$). Thus a solution for bumping B would also exist on $\tau_1$, contradicting the assumption.

During the DFS process of algorithm Conv-T-DAG (Fig. 4), if we fail on bumping some net B, we note the corresponding OP and store it as an OP for B. As long as we have not increased the number of tracks in which we are trying to find a routing solution, when we bump B again in another DFS process, a suitable data structure lets us detect reasonably quickly whether any subset of the current AP of B is isomorphic to any stored OP of B (there could be multiple, non-isomorphic OPs of B discovered during various DFS searches). If an isomorphic match is found, then from Theorem 3 we know that the bumping of B in the current search path will not give us any solution, and we abandon exploring the search space rooted at B, thus saving a significant amount of search time. Subsequently, B's parent tries a different track transition and the search continues along a different path.
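The stored-pattern check can be sketched as follows, using the patterns $OP_1$ and $AP_2$ from the definitions above as data. The sketch adopts one possible reading of "isomorphic", stated here as an assumption: an OP matches an AP if every net vector of the OP is contained in a distinct net vector of the AP, with track labels allowed to differ. The type aliases and the function names `op_matches_ap` and `can_prune` are hypothetical, and no claim is made about the data structure actually used in ROAD.

```python
from typing import FrozenSet, List, Set, Tuple

NetVector = Tuple[FrozenSet[str], int]   # (nets on one track, track index)
NetPattern = Set[NetVector]


def op_matches_ap(op: NetPattern, ap: NetPattern) -> bool:
    """True if each net vector of `op` is contained in a distinct net vector of `ap`."""
    used_tracks = set()
    for op_nets, _ in op:
        # Track labels are ignored on purpose (assumed relabeling of tracks).
        host = next((t for ap_nets, t in ap if op_nets <= ap_nets), None)
        if host is None or host in used_tracks:
            return False
        used_tracks.add(host)
    return True


def can_prune(stored_ops: List[NetPattern], current_ap: NetPattern) -> bool:
    """Abandon the search space rooted at B if any stored OP of B matches the AP."""
    return any(op_matches_ap(op, current_ap) for op in stored_ops)


T0, T1, T2, T3 = range(4)
op1 = {(frozenset("AD"), T0), (frozenset("K"), T2)}
ap2 = {(frozenset("AD"), T3), (frozenset("KX"), T1),
       (frozenset("C"), T0), (frozenset("Z"), T2)}
prune_q = can_prune([op1], ap2)
```

Under this reading, the failure recorded for P on $\tau_1$ lets the search skip Q on $\tau_2$ without exploring it.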
Clique-Based Search Space Pruning
This method dynamically determines the presence of cliques in the OG among the longer nets, which gives an indication of the minimum number of distinct tracks required to route all the nets in the clique successfully. We define the term common unusable track (CUT) as follows: track $T_j$ is a CUT for clique C in the current search path $\tau$ if each net in C is adjacent to at least one AN of $\tau$ on track $T_j$. For each clique we dynamically maintain the number of CUTs. This, combined with the clique size, can help prune infeasible solution spaces as specified in the next theorem.

Proof: The nets in clique C need k distinct non-CUT tracks to be routed on. Since there are m CUTs, at least k + m tracks are needed for a feasible routing of the nets of C after A is bumped. As the total number of tracks $t < k + m$, no such routing is possible, and hence there is no solution to bumping A in the current search path $\tau$. □

Fig. 7 shows an example of the situation described in the above proof.
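The CUT count and the pruning test used in the proof ($t < k + m$) can be sketched as below. The sketch assumes the clique is already known and that the OG is stored as adjacency sets; how ROAD detects cliques and maintains CUT counts incrementally is not shown, and the names `count_cuts` and `clique_prunes` are hypothetical.

```python
from typing import Dict, Set


def count_cuts(clique: Set[str], ancestors: Set[str], tracks: Dict[str, int],
               og: Dict[str, Set[str]], num_tracks: int) -> int:
    """Number of tracks on which every clique net is adjacent to some ancestor net."""
    cuts = 0
    for t in range(num_tracks):
        on_t = {a for a in ancestors if tracks[a] == t}
        if on_t and all(og[c] & on_t for c in clique):
            cuts += 1
    return cuts


def clique_prunes(clique: Set[str], ancestors: Set[str], tracks: Dict[str, int],
                  og: Dict[str, Set[str]], num_tracks: int) -> bool:
    """Prune when t < k + m, with k the clique size and m its number of CUTs."""
    m = count_cuts(clique, ancestors, tracks, og, num_tracks)
    return num_tracks < len(clique) + m


# Hypothetical instance: clique {p, q, r}; ancestor A on T0 overlaps all three.
og = {"p": {"q", "r", "A"}, "q": {"p", "r", "A"},
      "r": {"p", "q", "A"}, "A": {"p", "q", "r"}}
clique = {"p", "q", "r"}
cuts = count_cuts(clique, {"A"}, {"A": 0}, og, 3)
```

With three tracks the clique needs 3 + 1 = 4 tracks, so the branch is pruned; with four tracks the test no longer fires.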
Lookahead TC Functions
As mentioned in Sec. 3.2, the 1st-level TC functions are very useful in pursuing search paths that are more likely to be successful before others that are less likely to yield solutions. However, the 1st-level TCs can still be misleading in some cases. For example, consider transitions $n_1^{T_0 \to T_1}$ and $n_1^{T_0 \to T_2}$ for net $n_1$, which is on track $T_0$. Suppose that in the $T_0 \to T_1$ transition $n_1$ bumps a net $n_2$ of length 7, and in the $T_0 \to T_2$ transition $n_1$ bumps a net $n_3$ of length 4. The 1st-level TCs will favor pursuing the $T_0 \to T_2$ transition before trying the $T_0 \to T_1$ one (if the former cannot find a solution), since the 1st-level TCs are based mainly on the lengths of the bumped nets. However, not counting the single channel in which $n_1$ overlaps $n_2$, suppose that $n_2$ goes through 6 other channels that are either empty except for $n_2$ or very sparsely occupied by short nets. Conversely, suppose that the other 3 channels (besides the one in which $n_1$ and $n_3$ overlap) occupied by $n_3$ are all full and occupied by very long nets. It is clear that it will be much more difficult to find a solution in which $n_3$ is moved from its current track position (due to $n_1^{T_0 \to T_2}$) than one in which $n_2$ is moved from its current track position (due to $n_1^{T_0 \to T_1}$) to some other track where it will either not bump any net or will bump very few short ones. Thus looking ahead to the next transitions of the nets bumped by $n_1$ gives us more accurate information on which transitions of $n_1$ to try first.

Essentially, for a possible transition $n_1^{T_i \to T_j}$, this lookahead information can be captured as some function of all possible 1st-level TCs of each net bumped by $n_1$ on track $T_j$. We choose the min function for this purpose, as it reflects the (1st-level) cost of the transition of a net bumped by $n_1$ that is most likely to lead to a solution. The 2nd-level (or lookahead) TC function is then obtained by substituting, in the 1st-level TC function of $n_1^{T_i \to T_j}$, the above min function for $n_k$ in place of the length $l(n_k)$, for each net $n_k$ bumped by $n_1$. Since we have two possible 1st-level TC functions (sum, sqrt), we get the following 4 possible 2nd-level TC functions (for the 4 combinations of the "inner" and "outer" 1st-level TC functions used to form the 2nd-level TCs):
where x is either sqrt or sum.
In particular, using the 2nd-level TC function with the sumsqrt combination yields a speedup factor of 83 over using the best 1st-level TC function sqrt (see Table 2 ).
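The lookahead idea can be sketched for a single combination, with the sum function used as both the "outer" and the "inner" 1st-level TC; the sqrt-based variants are not shown. The instance mirrors the example above ($n_2$ of length 7 in sparse channels, $n_3$ of length 4 next to long nets), but the specific neighbor nets L0 and L1, their lengths, and the function names are hypothetical.

```python
from typing import Dict, Set


def tc_sum(net: str, dst: int, tracks: Dict[str, int],
           og: Dict[str, Set[str]], length: Dict[str, int]) -> int:
    """1st-level 'sum' cost of moving `net` to track dst."""
    return sum(length[m] for m in og[net] if tracks[m] == dst)


def tc_lookahead(net: str, dst: int, tracks: Dict[str, int],
                 og: Dict[str, Set[str]], length: Dict[str, int],
                 num_tracks: int) -> int:
    """2nd-level cost: replace l(n_k) by the cheapest 1st-level cost of n_k."""
    total = 0
    for m in og[net]:
        if tracks[m] != dst:
            continue  # m is not bumped by this transition
        total += min(tc_sum(m, t, tracks, og, length)
                     for t in range(num_tracks) if t != dst)
    return total


og = {"n1": {"n2", "n3"}, "n2": {"n1"},
      "n3": {"n1", "L0", "L1"}, "L0": {"n3"}, "L1": {"n3"}}
tracks = {"n1": 0, "n2": 1, "n3": 2, "L0": 0, "L1": 1}
length = {"n1": 3, "n2": 7, "n3": 4, "L0": 10, "L1": 8}

first_level = {t: tc_sum("n1", t, tracks, og, length) for t in (1, 2)}
second_level = {t: tc_lookahead("n1", t, tracks, og, length, 3) for t in (1, 2)}
```

The 1st-level costs (7 for $T_1$, 4 for $T_2$) favor $T_2$, whereas the lookahead costs (0 for $T_1$, 8 for $T_2$) favor $T_1$, because $n_2$ has a free track to move to while $n_3$ can only move by bumping long nets.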
Experimental Results
To test the efficacy of the various speedup techniques of Sec. 4, we started from the basic algorithm Route-With-B&R (Figure 5) and added different speedup methods to create the next version until all methods were incorporated. We term the general routing methodology that uses Route-With-B&R and the various speedup methods ROAD (bump&Refit based OptimAl Detailed router). Table 1 shows the benchmark circuits that we used in our experiments. The number of nets for the circuits ranges from 147 to 4286.
Extracting VPR's Global Routing Topology
Our goal in this paper is to develop an order-impervious pure detailed router (i.e., one that performs only track assignments) that, given a global routing topology, uses the B&R paradigm to obtain a solution that is optimal in the number of tracks used. To this end, we tested all versions of ROAD (referred to collectively as ROAD whenever we are not distinguishing between the different versions) on the benchmark circuits of Table 1 by extracting the global routing topologies from the results of VPR's flat routing (global and detail routing performed in an integrated manner) [13] for these circuits, and discarding all track assignment information. As it turns out, for each circuit, VPR's route causes at least one channel in the FPGA to be fully occupied. This means that if VPR returns a solution with t tracks, then for the corresponding global topology of the nets, the optimal track assignment should also yield a t-track solution. We ran ROAD with different net orderings and always obtained the detailed routing solutions in the optimal number of tracks specified in Table 1. Note that VPR's overall solution may not be optimal because, say, it may not be inserting hubs in the optimal manner. ROAD is, however, constrained by the hub-based topology yielded by VPR. Thus for the given net topologies, ROAD performs optimally, as it is theoretically supposed to do.

Internal Comparisons

Table 2 shows runtimes for various ROAD versions. For some of the circuits, $ROAD_1$ could not complete the routings in a reasonable time (3 hr wall clock time). The runtimes show that $ROAD_0$ is 83 times faster than $ROAD_1$, while ROAD is 604 times faster than $ROAD_1$ for the circuits that the latter could route within the above time limit. Table 2 also shows that ROAD is 61 times faster than $ROAD_0$. This gives an extrapolated speedup of 61x83=5763 for ROAD over $ROAD_1$ for the larger circuits in Table 1.

As can be seen, runtimes are not reported for $ROAD_1$ for all circuits of Table 1 because it exceeded our prespecified time limit. However, ROAD is able to obtain routings for all circuits in Table 1, and these are reported in Table 3 (discussed shortly). Figure 8 shows the plot of ROAD run-time in seconds versus the number of nets (in logarithmic scale) across the circuits of Table 1, as well as the best linear and quadratic curve fits to this data. As shown in the figure, the quadratic function is a better fit to the ROAD run-time. However, note that the coefficient of the second-order term in the quadratic function is very small ($3 \times 10^{-5}$), and thus when the number of nets is not very large (e.g., < 4000), this term does not have any appreciable effect and the function is almost linear. When the number of nets increases, say, beyond 4000, the 2nd-order term dominates and the function is quadratic. Thus for small- to medium-size circuits, the empirical average-case time complexity of ROAD is $\Theta(m)$, while for large circuits it appears to be $\Theta(m^2)$, where m is the number of nets in a circuit.
Comparisons to VPR
To get an idea of how much time VPR spends on detailed routing, and thus compare it fairly to ROAD's runtime, for each circuit in Table 1 we obtained VPR's global routing time and subtracted it from VPR's global + detail (flat) routing time (it is not possible to run VPR in detailed routing mode only). Table 3 shows these estimated times for each circuit. For most of the circuits, ROAD has significantly smaller run-times. For the circuits in Table 3, the total time for ROAD is 2202.64 sec, while for VPR it is 4098.59 sec; ROAD thus achieves a speedup of almost a factor of two over VPR. Note again that both routers yield the same number of tracks for each circuit (see Sec. 5.1).

Table 2: ... and ROAD for 10 circuits. The second row of totals shows the runtimes for $ROAD_0$ and ROAD for all circuits ($ROAD_1$ could not get results for these circuits in a reasonable time; NC means not completed).

Figure 8: ROAD run-time versus the number of nets (in logarithmic scale) on the x-axis, and the best linear and quadratic curve fits to this data. The quadratic curve fits the ROAD data best, albeit with a very small coefficient for the 2nd-order term in the function.
Conclusions
We presented a new approach called ROAD to detailed routing in FPGAs that uses the bump-and-refit (B&R) paradigm. It thereby overcomes the well-known net ordering problem of detailed routing and obtains routings in the minimum number of tracks. This is an important advancement in detailed routing technology for current and future FPGA circuits, which are generally interconnect dominated. An early basic B&R algorithm searches significant portions of the routing search space, and thus tends to be quite slow for medium to large circuits when used in a complete detailed router like ROAD (B&R was initially proposed for incremental routing, where the search space is significantly smaller). We thus developed a number of optimality-preserving search-space pruning methods and lookahead transition cost functions that yield speedups of a few orders of magnitude. Furthermore, ROAD is almost twice as fast as the estimated detailed routing phase of VPR. The relevant point here is not so much the numerical value of the speedup we obtain over VPR, but that we have been able to develop an optimal routing algorithm with reasonable runtimes that compare favorably with those of a non-optimal (though effective) router. Furthermore, for very large FPGA circuits, the flat routing approach of VPR may become very time-expensive, thus calling for a two-stage approach of global routing followed by detailed routing for such circuits. In that case, ROAD would be well positioned to serve as a high-quality and time-efficient detailed routing phase for such a two-stage router.
