In this paper, the FPGA routing process is explored to mitigate, and take advantage of, the delay variability caused by process variation. A new method called partial rerouting is proposed to improve timing performance under process variation while reducing execution time. By rerouting only a small number of critical and near-critical paths, partial rerouting achieves about 6.3% timing improvement. At the same time, it speeds up the routing process by about 9 times compared with full chipwise routing for 100 target FPGAs (variation maps). Moreover, partial rerouting enables a trade-off between product yield and routing speed.
Introduction
In recent years, Field-Programmable Gate Arrays (FPGAs) have become one of the most popular implementation media for digital circuits. As transistor sizes shrink into the submicron domain, it becomes more difficult and expensive to keep their electrical characteristics uniform because of process variation, a challenge faced by every FPGA user [9]. To guarantee the physical timing performance and reliability of FPGA products, either the physical yield of FPGA chips is compromised through more rigorous manufacturing tests, or a more conservative timing model has to be used to take variation into account [1]. Therefore, a method based on placement or routing that optimizes FPGA performance without changing the fabrication process is preferred. Such an approach is expected to increase the yield of FPGAs with relatively low time and computational overhead in the design flow [2, 7].
Previous work

Full chipwise variation-aware routing
Similar to full chipwise placement, performing variation-aware routing on each individual device based on its own delay variation, called full chipwise routing, yields the best timing performance for that device. To apply full chipwise routing, each component is assigned a unique delay value based on the variation map. The main limitation of full chipwise routing is its execution time overhead [4]. For example, routing one of the MCNC benchmarks, clma, took 12 hours in our experiment with full chipwise routing. If this circuit design were applied to 100 FPGAs, routing would take 50 days on a single computer process. Therefore, full chipwise methodologies with traditional static timing analysis are not practical for a large number of variation maps because of the run time overhead.
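The 50-day figure follows from multiplying the per-chip routing time by the number of chips. A minimal sketch of that arithmetic, assuming strictly sequential routing on one process (the helper name is hypothetical):

```python
def full_chipwise_days(hours_per_chip: float, num_chips: int) -> float:
    """Total sequential run time, in days, when every chip is routed from scratch."""
    return hours_per_chip * num_chips / 24.0


# clma: 12 hours per chip, 100 chips
print(full_chipwise_days(12, 100))  # 50.0
```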
SSTA routing
SSTA routing is another strategy used to alleviate the impact of process variation. Instead of using static, deterministic timing for switches and interconnects, the delay of each element in the routing graph is defined as a distribution, which is used during the SSTA routing process to evaluate the cost and path delay [3]. The distribution of process variation is modelled based on either historical data or collected variation maps. Theoretically, by taking process variation into account, SSTA can achieve better timing performance than variation-blind routing. The results of this method are promising, but a mismatch between the delay model and the actual variation map can erode the timing improvement provided by SSTA routing.
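To make the idea of a per-element delay distribution concrete, the following is a minimal sketch rather than the SSTA router of [3]. It estimates the mean and standard deviation of each routing resource from a set of variation maps and then combines them along a path. The function names and the data layout are hypothetical, and treating resource delays as statistically independent is a simplifying assumption made only for this illustration.

```python
import math
from statistics import mean, pstdev


def resource_stats(delays_per_map):
    """delays_per_map[m][r] is the delay of resource r on variation map m.

    Returns one (mean, std) tuple per routing resource.
    """
    num_resources = len(delays_per_map[0])
    stats = []
    for r in range(num_resources):
        samples = [chip[r] for chip in delays_per_map]
        stats.append((mean(samples), pstdev(samples)))
    return stats


def path_delay_distribution(path, stats):
    """Mean and std of a path delay, ASSUMING independent resource delays.

    Means add directly; variances add only under the independence assumption.
    """
    mu = sum(stats[r][0] for r in path)
    var = sum(stats[r][1] ** 2 for r in path)
    return mu, math.sqrt(var)


# Two toy variation maps with two routing resources each (illustrative values)
maps = [[1.0, 2.0], [3.0, 2.0]]
stats = resource_stats(maps)
print(path_delay_distribution([0, 1], stats))
```

A router could then cost a candidate path using, for example, the mean plus a multiple of the standard deviation; the choice of cost function is left open here.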
Motivation of partial rerouting
For most circuits, the critical and near-critical paths, which determine the speed of the circuit, make up only a small portion of the total number of paths. Therefore, one reasonable variation-aware routing strategy is to reroute only the critical and near-critical paths in a circuit based on the variation maps. The partial rerouting procedure is performed on a base configuration generated by the SSTA router. The critical and near-critical paths are then released and rerouted with a variation-aware routing algorithm. However, if the FPGA chip is already highly congested and few unused routing resources are available, such partial rerouting attempts may be in vain. To avoid such a potentially fruitless endeavour, two strategies are employed. First, we can reserve a portion of the routing resources during the initial routing process, which are then available for the variation-aware rerouting phase, as illustrated in Fig. 1. Second, we can release a proportion of non-critical paths during the variation-aware rerouting phase, thus increasing the routing resources available for time-critical routes [7].
An example that explains the idea of partial rerouting is illustrated in Fig. 2. The most critical path has a criticality equal to 1. To apply partial rerouting, two parameters, Crit T and Non Crit T, are used to control the number of paths to be released. Crit T is a threshold on criticality (top horizontal dotted line in Fig. 2) which selects the paths with higher criticality, which may affect the delay of the whole circuit. In Fig. 2, Crit T is set to 0.8. Non Crit T is defined as a proportion of the total number of paths (right vertical dotted line in Fig. 2). The purpose of this definition is to avoid releasing more paths than necessary if there is a large number of paths with low criticality. In Fig. 2, Non Crit T is set to 20% of the 20 paths and therefore the 5 least critical paths (16-20) are released and rerouted [7]. The resultant path delays after partial rerouting are illustrated in Fig. 3. Ideally, the partial rerouting method decreases the delays of critical and near-critical paths while causing some increase in the delays of non-critical paths.
Fig. 2. The criticality of paths in descending order before rerouting [8].
Fig. 3. The criticality of paths after partial rerouting [8].
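The selection rule controlled by Crit T and Non Crit T can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name is hypothetical, a path is treated as critical when its criticality is greater than or equal to Crit T, and the number of non-critical paths is obtained by rounding Non Crit T times the total path count. The rounding rule and the inclusive comparison are assumptions not specified in the text, and the criticality values below are invented for the example.

```python
def select_paths_to_release(criticalities, crit_t, non_crit_t):
    """Return (critical_ids, non_critical_ids) of the paths to rip up.

    criticalities: criticality of each path in [0, 1], 1 = most critical.
    crit_t:        criticality threshold (paths at or above it are released).
    non_crit_t:    fraction of ALL paths to release from the least critical end.
    """
    # Path indices sorted from most to least critical
    order = sorted(range(len(criticalities)),
                   key=lambda i: criticalities[i], reverse=True)
    critical = [i for i in order if criticalities[i] >= crit_t]
    critical_set = set(critical)

    # Assumed rounding rule for the number of non-critical paths
    n_release = round(non_crit_t * len(criticalities))

    # Walk from the least critical path upwards, skipping critical ones
    candidates = [i for i in reversed(order) if i not in critical_set]
    return critical, candidates[:n_release]


crit = [1.0, 0.95, 0.85, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
critical, non_critical = select_paths_to_release(crit, crit_t=0.8, non_crit_t=0.2)
print(critical, non_critical)  # [0, 1, 2] [9, 8]
```

Defining Non Crit T as a fraction rather than a criticality threshold bounds the number of released non-critical paths even when many paths have low criticality.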
Experiment and results
Experiment flow
Four routing methods, variation-blind, SSTA, full chipwise variation-aware and partial rerouting, are tested as shown in Fig. 4. At the beginning, an original variation-blind placement and routing is executed to produce one placement configuration and to find the minimum required routing channel width (CW). To compare the routing strategies fairly, all four methods use the same placement configuration, and the channel width is fixed at 1.2 times the minimum CW for every router. For variation-blind routing, one routing configuration is generated. For SSTA routing, the mean and standard deviation of the delay of each routing resource across the 100 variation maps [5] are calculated first, and one routing configuration is then generated by the SSTA router. For full chipwise routing, the variation map is used directly during the routing process, and 100 routing configurations are produced, one for each variation map. For partial rerouting, SSTA routing is executed once to produce a base routing configuration with 1.0 times the minimum CW. After that, the critical path and a small number of near-critical paths identified with Crit T are released. In addition, certain non-critical paths are released to provide more routing space. With the reserved channel (0.2 CW) and the released resources, variation-aware routing is performed to reroute all unconnected source-sink pairs. At the end, the critical path delay of each method is evaluated with the 100 variation maps. The stages highlighted with dotted lines in Fig. 4 show the routing process modified with the variation-aware algorithm. With these constraints, the timing performance and execution time of each method can be fairly compared.
Experiment setup
Instead of estimating process variation, the variation maps used in this experiment are collected from 100 Cyclone III FPGAs on DE0 boards [5]. To model the expected increase of process variation in the future, the variation in the maps is amplified so that the overall variation is σ/μ = 30%. We have tested 20 MCNC benchmark circuits with variation-blind, SSTA, partial rerouting and full chipwise variation-aware routing to investigate the potential lower and upper bounds of improvement. For all routing methodologies, the same placement configuration is used. For the partial rerouting method, Crit T is set to 0.9 and Non Crit T is set to 15%. The research in [6] demonstrated that the VPR router does not provide a repeatable delay for the critical path. Such variation in the algorithm, termed "router noise", may mask the delay variation due to process variability. The authors of [6] proposed a method called target delay search to reduce this algorithm-induced uncertainty. This noise reduction method is applied in our router.
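The text does not state how the measured variation is amplified. One plausible reading, shown here purely as an assumption, is a linear scaling of each delay's deviation from the mean so that the coefficient of variation σ/μ reaches the target while the mean is preserved. The function name and the sample delays are hypothetical.

```python
from statistics import mean, pstdev


def amplify_variation(delays, target_cv=0.30):
    """Scale deviations from the mean so that std/mean equals target_cv.

    The mean delay is preserved. ASSUMPTION: linear scaling about the mean;
    the source does not specify the amplification method.
    """
    mu = mean(delays)
    sigma = pstdev(delays)
    if sigma == 0:
        return list(delays)  # no variation to amplify
    scale = target_cv * mu / sigma
    return [mu + (d - mu) * scale for d in delays]


measured = [0.95, 1.00, 1.05, 0.98, 1.02]  # illustrative normalised delays
amplified = amplify_variation(measured)
print(pstdev(amplified) / mean(amplified))  # approximately 0.30
```

With a large scale factor this simple scheme can push the smallest delays towards zero or below, so a practical version would need a lower bound on delay.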
Results of variation-aware partial rerouting
The results of variation-blind, SSTA, partial rerouting and full chipwise variation-aware routing are illustrated in Fig. 5. The improvement in critical path delay provided by full chipwise variation-aware routing and SSTA routing is about 7.6% and 3.6%, respectively, at 95% timing yield. Our proposed partial rerouting method with Crit T = 0.9 and Non Crit T = 15% achieves 6.3% timing improvement. As expected, the timing improvement of partial rerouting is larger than that of SSTA but smaller than that of full chipwise routing. However, its execution time is much shorter than that of full chipwise routing.
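The delay at 95% timing yield and the relative improvement can be computed from the per-chip critical path delays as sketched below. This assumes the yield delay is the smallest delay that at least 95% of the chips meet (an empirical percentile), which is one common reading and not a definition given in the text. The function names are hypothetical.

```python
import math


def delay_at_yield(critical_delays, timing_yield=0.95):
    """Smallest delay met by at least `timing_yield` of the chips."""
    ordered = sorted(critical_delays)
    # Small epsilon guards against floating-point error in yield * n
    k = max(1, math.ceil(timing_yield * len(ordered) - 1e-9))
    return ordered[k - 1]


def improvement_percent(baseline_delay, new_delay):
    """Relative reduction in delay with respect to the baseline, in percent."""
    return 100.0 * (baseline_delay - new_delay) / baseline_delay


# Illustrative data: 100 chips with critical path delays 1, 2, ..., 100
delays = list(range(1, 101))
print(delay_at_yield(delays))  # 95
```

Under this reading, the reported percentages would correspond to `improvement_percent` applied to the yield delays of variation-blind routing and of the method under test.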
A number of interesting observations can be made from Fig. 5. Firstly, for some benchmarks, e.g. ex5p and des, the difference between variation-aware routing (the upper bound) and partial rerouting is insignificant, which shows that partial rerouting is effective in dealing with process variation. Secondly, for some benchmark circuits, e.g. alu4 and apex4, the improvement provided by partial rerouting is modest, and the result is similar to that of variation-blind routing. One possible reason is that the critical paths of these benchmark circuits are located in a region of the variation map where the delays of the routing elements are identical, so partial rerouting cannot provide further improvement. In other words, the improvement of partial rerouting is affected by the initial variation-blind routing. Lastly, the improvement made by variation-aware routing (7.6%) is not as significant as that of variation-aware placement (11%) because the routing process is subject to more constraints, such as congestion, when process variation is considered.
The density of the critical path delay for ex5p is shown in Fig. 6. It can be seen that full chipwise routing, SSTA routing and partial rerouting achieve better critical path timing than variation-blind routing across the 100 variation maps. However, some results of these optimisation methods are similar to those of variation-blind routing, which is caused by router noise that remains even after applying the noise reduction method. SSTA routing provides better timing performance than variation-blind routing but worse than partial rerouting and full chipwise routing. The standard deviation of partial rerouting over the 100 variation maps is smaller than that of variation-aware routing because all partial rerouting processes are based on the same initial variation-blind routing. All in all, this result highlights that the quicker partial rerouting method achieves timing performance similar to that of full chipwise routing.
Comparison of run time cost between chipwise routing and partial rerouting
The timing performance and execution time of partial rerouting are examined by scaling Crit T from 0.5 to 0.95. Non Crit T is fixed at 15% to provide rerouting space in this case. The benchmark ex5p, which achieved the most improvement with variation-aware routing and partial rerouting, is used to explore the relationship between routing performance and Crit T.
The improvement for partial rerouting is shown in Fig. 7. Partial rerouting (plot with circle markers) with Crit T = 0.95 and Crit T = 0.5 achieves about 6.6% and 8.2% improvement, respectively. In other words, increasing the number of rerouted near-critical paths (decreasing Crit T from 0.95 to 0.5) provides only about 1.6 percentage points more improvement in timing performance. This result supports the idea behind partial rerouting: timing performance can be improved by rerouting only a small number of critical and near-critical paths. However, the timing performance with Crit T = 0.8 in Fig. 7 appears to be worse than that with Crit T = 0.85, which is likely due to the router noise remaining in VPR even after noise reduction.
The execution time for partial rerouting is illustrated in Fig. 8. The execution time of full chipwise routing is shown as the top dotted line in the plot for reference. Rerouting with Crit T = 0.95 requires only about 15% of the execution time of full chipwise routing. With decreasing Crit T, the execution time of partial rerouting (plot with diamond markers) increases as expected. Because of router noise, the plot is not expected to be perfectly monotonic. However, it does show a general trend of decreasing execution time as Crit T increases. For one complete partial rerouting procedure, an SSTA routing is also required to provide the pre-defined base routing solution.
Fig. 7. The delay of critical path for 95% timing yield provided by partial rerouting with Crit T from 0.5 to 0.95 for ex5p [8].
Fig. 8. The execution time used by partial rerouting by scaling Crit T from 0.5 to 0.95 for ex5p [8].
The execution time of SSTA routing is about 140 seconds (lower dotted line). Therefore, for one complete run of the partial rerouting procedure, the execution time is the sum of the SSTA routing time and the rerouting time (plot with circle markers), which is still less than that of full chipwise variation-aware routing [8].
Moreover, consider the scenario where partial rerouting is applied to multiple FPGAs. The execution time is dramatically reduced compared with full chipwise variation-aware routing. The execution time for 100 chips is illustrated in Fig. 9. To present the results clearly, the mean execution time across the 100 FPGAs is used for the full chipwise and partial rerouting methods. Since the variation-blind routing configuration is common to every FPGA, it only needs to be generated once. Similarly, SSTA routing is performed only once, but it costs slightly more time than variation-blind routing. For full chipwise routing, the total execution time for 100 variation-aware routing runs is about 2.7 × 10^4 seconds, which is about 7.5 hours. By applying partial rerouting, the execution time is reduced to about 0.25 × 10^4 seconds (≈42 minutes). Although the actual execution time depends heavily on the CPU load and the initial routing seeds, the results of this experiment show that the partial rerouting method achieves a speed-up of about 9 times.
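The multi-chip comparison reduces to a simple cost model: full chipwise routing pays the full routing time for every chip, whereas partial rerouting pays for SSTA routing once and then only for the rerouting step on each chip. The sketch below uses hypothetical function names and round illustrative timings, not the measured values behind Fig. 9.

```python
def full_chipwise_time(num_chips, t_full):
    """Every chip is routed from scratch with its own variation map."""
    return num_chips * t_full


def partial_rerouting_time(num_chips, t_ssta, t_reroute):
    """One shared SSTA base routing, then a short reroute per chip."""
    return t_ssta + num_chips * t_reroute


# Illustrative per-chip timings in seconds (assumed, not measured)
n = 100
full = full_chipwise_time(n, t_full=300.0)
partial = partial_rerouting_time(n, t_ssta=150.0, t_reroute=30.0)
print(f"full: {full:.0f} s, partial: {partial:.0f} s, "
      f"speed-up: {full / partial:.1f}x")
```

Because the SSTA cost is paid once, its share of the total shrinks as the number of chips grows, and the speed-up approaches the ratio of the per-chip full routing time to the per-chip rerouting time.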
Conclusion
This work employs detailed delay variation information of individual FPGA chips to drive a timing-driven router in VPR in order to improve the delay of the critical paths for a given design. The results show that full chipwise routing achieves about 7.6% improvement on average for 20 MCNC benchmarks. Partial rerouting with Crit T = 0.9 achieves a similar improvement (6.3%), which is better than that of SSTA (3.6%). For 100 FPGAs, the proposed variation-aware partial rerouting runs about 9 times faster than full chipwise variation-aware routing.
