In FPGAs the routing resources are fixed, and their usage is constrained by the locations of the antifuses. In addition, depending on the technology, antifuses significantly affect layout performance. Hence, simplistic placement-level assumptions turn out to be grossly inadequate for predicting the timing and wirability behavior of a layout. There is therefore a need for a layout technique that modifies the layout at the placement level based on accurate post-layout timing analysis and net wirability. In this paper we consider such a wirability- and performance-driven layout flow for row-based FPGAs. Timing information from a post-layout timing analyzer and wirability information from the global and channel routers are used by an incremental placer to perturb the placement effectively. Timing improvements of up to 29% over non-iterative FPGA layout have been obtained for a set of industrial designs and benchmark examples.
INTRODUCTION
The primary requirements of most digital circuit layout tools are ensuring 100% wirability and meeting the timing requirements. One class of such tools partitions the layout problem into three distinct steps: placement, global routing and detailed routing. At the placement stage, wirability requirements translate into placing highly connected blocks close together so as to reduce net lengths. Efforts are also made to estimate and reduce excess congestion. To handle timing requirements at the placement level, some placers use pre-placement critical path information [1, 2]. However, much better results can be achieved by dynamically updating the net or path timing criticalities in the course of min-cut [3, 4] or other placement techniques [5, 6, 7]. A path is defined as an alternating sequence of logic blocks and nets, and a set of paths is said to be critical if the signal delay through the paths is above a threshold value provided by the user. At intermediate stages, path criticalities and net delays are used to determine the slacks on the nets [8, 13, 14]. Other approaches employ linear programming to determine path constraints dynamically during partitioning [15], or formulate timing-driven placement as a quadratic programming problem [16, 17].
At the placement level, wirability requirements are handled by reducing net spans, as measured by the net bounding boxes, and timing requirements are handled by keeping the critical paths short. These placement-level heuristics rest on two underlying assumptions:
1. The wirability of a net is a function of its length and the congestion of its routing region.
2. The delay across a path is a function of its length.
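The two assumptions above amount to the usual placement-level proxies. The following is a minimal sketch of those proxies, assuming pins sit on an integer (zone, row) grid and taking the half-perimeter of a net's bounding box as its length; the function names and coordinates are hypothetical and only illustrate the kind of estimate that, as argued below, breaks down for FPGAs.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (zone, row) position of a pin; hypothetical grid coordinates


def bounding_box(pins: List[Point]) -> Tuple[int, int, int, int]:
    """Return (xmin, ymin, xmax, ymax) over the pin positions of one net."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return min(xs), min(ys), max(xs), max(ys)


def half_perimeter(pins: List[Point]) -> int:
    """Assumption 1 proxy: net length as the half-perimeter of its bounding box."""
    xmin, ymin, xmax, ymax = bounding_box(pins)
    return (xmax - xmin) + (ymax - ymin)


def path_length(path_nets: List[str], nets: Dict[str, List[Point]]) -> int:
    """Assumption 2 proxy: a path's delay tracks the summed length of its nets."""
    return sum(half_perimeter(nets[name]) for name in path_nets)


nets = {"a": [(0, 0), (3, 1)], "b": [(3, 1), (5, 0), (4, 2)]}
print(half_perimeter(nets["a"]))      # 4
print(path_length(["a", "b"], nets))  # 8
```

Neither proxy sees segment boundaries or the number of antifuses a net passes through, which is exactly the gap the rest of the paper addresses.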
In FPGA layout, however, these simple assumptions do not suffice. A sample row-based FPGA architecture [18] is shown in Figure 1. It comprises rows of logic modules separated by routing channels. The logic modules can be of different types (input/output, combinational, etc.). Horizontal routing resources are available in the form of segments, which can be electrically connected to adjacent segments by programming horizontal antifuses. Vertical routing resources are available in the form of segmented feedthroughs running across the channels. If netA is connected as shown in Figure 1, four cross antifuses (C), two horizontal antifuses (H) and one feedthrough antifuse (F) are programmed.
Here, n(k) is the number of nodes on the k paths that exceed the given threshold limit [19]. The net criticalities are generated from the interconnect delays, such that each net is assigned a value, at most p, proportional to its net delay. Net delay is defined as the maximum delay from the driver pin to any input pin of that net.
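The net-delay and net-criticality definitions can be illustrated with a short sketch. Only the upper value p of the criticality range is stated, so this assumes a linear scaling in which the slowest net receives exactly p; the per-sink delays and the names net_delay and net_criticalities are hypothetical.

```python
from typing import Dict


def net_delay(sink_delays: Dict[str, float]) -> float:
    """Maximum delay from the driver pin to any input (sink) pin of the net."""
    return max(sink_delays.values())


def net_criticalities(nets: Dict[str, Dict[str, float]], p: float) -> Dict[str, float]:
    """Assumed linear mapping: criticality = p * delay / worst delay."""
    delays = {name: net_delay(sinks) for name, sinks in nets.items()}
    worst = max(delays.values())
    if worst == 0:
        return {name: 0.0 for name in delays}
    return {name: p * d / worst for name, d in delays.items()}


# Hypothetical per-sink delays (arbitrary time units) from a timing analyzer.
nets = {"A": {"u1": 2.0, "u2": 3.5}, "B": {"u3": 7.0}}
print(net_criticalities(nets, p=10.0))  # {'A': 5.0, 'B': 10.0}
```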
THE INCREMENTAL PLACER
In general, it is non-trivial to change a placement incrementally. Doing so invariably results in cell overlaps, whose removal may change a substantial portion of the chip. However, for structured layouts such as gate arrays and FPGAs, an incremental change in the placement can be made by swapping modules or moving them to valid locations. This is exactly the method followed by our incremental placer. In order to arrive at a linear-time algorithm, we first select a set of η critical cells. Then, for each of these, the best swapping candidate is selected. Finally, the best of the η swap pairs determines the swap. This process is repeated a few times during the incremental placement phase. It is similar in spirit to the procedure for selecting the best set of swaps in Kernighan-Lin style bipartitioning [20].
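The three-step swap selection described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the parameter eta is the number of critical cells examined, and the criticality table, the candidates callback (valid swap partners of a cell) and the gain callback (estimated benefit of a swap) are hypothetical stand-ins for the placer's real data. Rejecting swaps with non-positive gain is also an assumption.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple


def select_swap(
    criticality: Dict[str, float],
    candidates: Callable[[str], List[str]],
    gain: Callable[[str, str], float],
    eta: int,
) -> Optional[Tuple[str, str, float]]:
    """Pick one swap: eta critical cells, best partner for each, best pair overall."""
    # Step 1: the eta most critical cells (O(N log eta) with a bounded heap).
    critical = heapq.nlargest(eta, criticality, key=criticality.get)

    # Step 2: for each critical cell, its best swap partner among the candidates.
    pairs = []
    for cell in critical:
        options = candidates(cell)
        if options:
            partner = max(options, key=lambda other: gain(cell, other))
            pairs.append((cell, partner, gain(cell, partner)))

    # Step 3: the best of the (at most eta) pairs; reject non-improving swaps.
    if not pairs:
        return None
    best = max(pairs, key=lambda pair: pair[2])
    return best if best[2] > 0 else None


gains = {("c1", "s1"): 1.0, ("c1", "s2"): 3.0, ("c2", "s1"): 2.0, ("c3", "s1"): 9.0}
crit = {"c1": 0.9, "c2": 0.5, "c3": 0.1}
print(select_swap(crit, lambda c: ["s1", "s2"], lambda a, b: gains.get((a, b), 0.0), eta=2))
# ('c1', 's2', 3.0); c3 is never examined because only the 2 most critical cells are
```

For a fixed eta and a bounded candidate list per cell, the cost is dominated by the single pass over the criticality table, which keeps each selection roughly linear in the number of cells.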
The rest of this section describes the components of the incremental placer in detail.
Information Handling
The global and channel routers provide the wirability information in terms of two m × n matrices, the vertical wiring matrix (V) and the horizontal wiring matrix (H), as shown in Figure 5. Here m is the number of channels and n is the number of zones per channel (each channel is divided into n zones, each as wide as a logic module). H_ij represents the number of segments assigned to nets in zone j of channel i. Similarly, V_ij represents the number of feedthroughs assigned to nets in zone j of channel i. In the example of Figure 5, there are three channels and three zones. The middle channel has only one feedthrough, in zone 2, while a segment is assigned to net A in all three zones of the middle channel (shown by highlighted lines).
Therefore, the second row of matrix H consists entirely of 1s.
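A minimal sketch of how H and V could be tallied from router output, assuming the routers report each assigned segment and feedthrough as a 0-indexed (channel, zone) pair; the input format and the function name wiring_matrices are hypothetical. Only the middle-channel entries described for Figure 5 are filled in, since the other channels' contents are not given in the text.

```python
from typing import List, Tuple

Matrix = List[List[int]]


def wiring_matrices(m: int, n: int,
                    segments: List[Tuple[int, int]],
                    feedthroughs: List[Tuple[int, int]]) -> Tuple[Matrix, Matrix]:
    """Count routed segments (H) and feedthroughs (V) per (channel, zone); 0-indexed."""
    H = [[0] * n for _ in range(m)]
    V = [[0] * n for _ in range(m)]
    for i, j in segments:
        H[i][j] += 1
    for i, j in feedthroughs:
        V[i][j] += 1
    return H, V


# Middle channel (index 1) of the 3-channel, 3-zone example: net A holds a segment
# in every zone, and the single feedthrough sits in zone 2 (index 1).
H, V = wiring_matrices(3, 3, segments=[(1, 0), (1, 1), (1, 2)], feedthroughs=[(1, 1)])
print(H[1])  # [1, 1, 1]
print(V[1])  # [0, 1, 0]
```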
The timing analyzer provides the path criticalities of the k most critical paths, as well as the net criticalities, which are based on the maximum driver-to-sink delay of each net.
Selecting Critical Cells
The criterion for selecting the η most critical cells is their predicted ability to cause a substantial change.
As a result of a move/swap, a set of net bounding boxes changes. These changes are used to update the horizontal and vertical wirability matrices. The path delays are updated using the ratios of the old and new path lengths. The net criticalities are updated by the ratio of the old and new maximum driver-to-sink delays of the corresponding net.
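The ratio-based updates above can be sketched as three small functions. The delay and criticality rescalings follow the text directly; the matrix update assumes, purely for illustration, that a net occupies one segment in every zone of its bounding-box span within a single channel. The function names are hypothetical.

```python
from typing import List, Tuple


def update_path_delay(old_delay: float, old_length: float, new_length: float) -> float:
    """Rescale a path delay by the ratio of its new to old length."""
    return old_delay * new_length / old_length


def update_net_criticality(old_crit: float, old_delay: float, new_delay: float) -> float:
    """Rescale a net criticality by the ratio of its new to old max driver-to-sink delay."""
    return old_crit * new_delay / old_delay


def move_horizontal_demand(H: List[List[int]], channel: int,
                           old_span: Tuple[int, int], new_span: Tuple[int, int]) -> None:
    """Assumed model: the net uses one segment per zone of its bounding-box span in one channel."""
    for j in range(old_span[0], old_span[1] + 1):
        H[channel][j] -= 1
    for j in range(new_span[0], new_span[1] + 1):
        H[channel][j] += 1


H = [[1, 1, 1]]
move_horizontal_demand(H, 0, old_span=(0, 2), new_span=(1, 2))
print(H)                                      # [[0, 1, 1]]
print(update_path_delay(10.0, 8.0, 6.0))      # 7.5
print(update_net_criticality(5.0, 3.5, 7.0))  # 10.0
```

Scaling by ratios avoids rerunning the router and timing analyzer after every swap, at the cost of accuracy that degrades as more swaps accumulate.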
Number of Moves
In every iteration, a certain number of swaps/moves are made by the incremental placer. This number should be kept small so that the wirability and timing information remains useful. If too few moves are made by the incremental placer, a large number of iterations through the flow of Figure 2 will be required to obtain a substantial improvement. On the other hand, if too many moves are made, the wirability and timing information ceases to be accurate, so later moves may look good yet be detrimental. To avoid these two extremes, we determine the number of moves dynamically. In the first iteration, the number of swaps/moves is set to a small user-defined percentage (default 5%) of the total number of cells. If the performance (measured after routing and timing analysis) improves, the maximum number of swaps/moves for the second iteration is increased by a user-defined percentage (default 10%); if it degrades, it is reduced by the same percentage.
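The dynamic move budget can be sketched as below. The text does not say whether the 10% step is relative to the current budget or to the total number of cells; this sketch assumes it is relative to the current budget, and the function names are hypothetical.

```python
def initial_moves(num_cells: int, start_pct: float = 5.0) -> int:
    """First-iteration move budget: a user-defined percentage of all cells (default 5%)."""
    return max(1, round(num_cells * start_pct / 100))


def next_max_moves(current: int, improved: bool, step_pct: float = 10.0) -> int:
    """Grow the budget by step_pct after an improvement, shrink it by step_pct otherwise."""
    factor = 1 + step_pct / 100 if improved else 1 - step_pct / 100
    return max(1, round(current * factor))


budget = initial_moves(1000)
print(budget, next_max_moves(budget, improved=True), next_max_moves(budget, improved=False))
# 50 55 45
```

The floor of one move keeps the placer from stalling after a long run of degrading iterations.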
Stopping Criterion
The critical path delay through a circuit may increase slightly after an iteration. This is partly because the placement moves out of one local minimum into another, and partly because, although the incremental placer uses the timing and wirability information to move blocks, it is not tightly coupled with the global and channel routers. Hence, the best result found so far is stored. The number of iterations through the loop is user-defined and serves as the stopping criterion.
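The outer loop with best-result tracking can be sketched generically. Here route_and_time is a hypothetical callback standing in for global routing, channel routing and timing analysis (returning the critical path delay), and perturb stands in for one pass of the incremental placer; the placement type is left abstract.

```python
from typing import Callable, Tuple, TypeVar

P = TypeVar("P")  # placement representation; left abstract here


def iterate_layout(placement: P,
                   route_and_time: Callable[[P], float],
                   perturb: Callable[[P], P],
                   max_iters: int) -> Tuple[P, float]:
    """Run max_iters placement/routing/timing iterations and keep the best result."""
    best, best_delay = placement, route_and_time(placement)
    current = placement
    for _ in range(max_iters):           # user-defined iteration count = stopping criterion
        current = perturb(current)       # incremental placer step
        delay = route_and_time(current)  # routing, then timing analysis
        if delay < best_delay:           # a worse iteration is continued from but not recorded
            best, best_delay = current, delay
    return best, best_delay


# Toy run: placements are integers and the "critical path delay" comes from a table.
delays = [10.0, 8.0, 9.0, 7.0, 12.0]
print(iterate_layout(0, lambda p: delays[p], lambda p: p + 1, max_iters=4))  # (3, 7.0)
```

The toy run shows the behavior described above: the delay rises from 8.0 to 9.0 after one iteration, the loop continues from that worse placement, and the stored best (7.0) is what is returned.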
IMPLEMENTATION AND RESULTS
The performance- and wirability-driven FPGA layout algorithms were implemented in C on an Apollo 425 workstation. We experimented with three designs currently used in industry and two MCNC benchmark examples. Texas Instruments' TPC1010 logic modules, with the corresponding channel segmentation scheme, were used in our study. The logic netlist, optimized by logic synthesis, was converted to a layout netlist using the pinmap information for each macro. A predefined two-dimensional layout array template consisting of rows of logic modules was used as the placement template. The TPC1010 series of FPGAs has a 44×8 template (8 rows, each with 44 logic modules); however, the template size was varied to fit the larger designs. Table I shows the results of applying our layout scheme to these examples: bw and duke2 are the two MCNC synthesis benchmarks, and the rest are industrial designs.
The wirability aspects of the layout flow were tested by reducing the number of tracks until one or two nets in each design were unroutable. In all cases, the layout tool was able to perturb the placement so that routability was achieved within a few iterations.
CONCLUSIONS
Fixed routing resources and the presence of antifuses severely degrade the validity of placement-level predictions of wirability and timing in FPGAs. However, the ability to influence these behaviors remains much greater at the placement level than at the routing level. There is therefore a need for a flow in which post-layout information is used effectively to change the placement. Such an iterative wirability and performance improvement scheme has been developed for row-based FPGAs.