In this paper we present a simple but efficient timing-driven placement algorithm for FPGAs. The algorithm computes forces acting on a logic block in the FPGA to determine its relative location with respect to other blocks. The forces depend on the criticality of nets shared between the two blocks. Unlike other net-based approaches, timing constraints are incorporated directly into the force equations to guide the placement. Slot assignment is then used to move the blocks into valid slot locations on the FPGA chip. The assignment algorithm also makes use of the delay information of nets so that the final placement is able to meet the timing criteria specified for the circuit. The novelty of the approach lies in the formulation of the force equations and the manner in which weights of the nets are dynamically altered to influence the placement. Experiments conducted on industrial test circuits and MCNC circuits give very promising results and indicate that the algorithm succeeds in significantly reducing the maximum delay in the circuit. In addition, routability is not adversely affected and running time is low.
INTRODUCTION
The emphasis in VLSI design today is on high circuit performance, which has become synonymous with high circuit speed. With recent advances in fabrication technology, interconnect delays play a dominant role in determining the speed of a chip. This is particularly true of FPGA architectures, in which the logic blocks have predetermined delays and the interconnects dictate the speed of the configured chip. One of the chief motivations for this work is to study timing-based layout algorithms designed specifically for FPGAs. Although a lot of effort has been devoted to technology mapping of the circuit logic onto FPGAs, the layout issue has not received comparable attention. In FPGAs, in contrast to gate-array and standard-cell technologies, layout must deal with many constraints because routing resources are limited and preset. In addition, for high-speed designs, timing constraints must be incorporated into the layout tools to regulate the timing behavior of the circuits. Most of the work on layout for FPGAs has concentrated on detailed routing, while placement is carried out using techniques such as simulated annealing, which can take a lot of time. Our aim is to incorporate timing delays of interconnects at the layout stage and determine how they influence the speed and routability of the FPGA. For gate-array and standard-cell technologies, incorporating timing constraints in the placement algorithm has been studied extensively, yielding good placement results [1, 2, 3, 4]. Efforts in timing-driven placement can be grouped into two categories: net-based and path-based. In a net-based algorithm, higher weights are assigned to nets lying on critical paths, as in [1], or stringent upper bounds are imposed on the lengths of these nets, as in [5]. The net-based approach helps to improve the timing behavior without incurring high computational overhead. A very good theoretical basis in support of this approach is provided in [6]. In a path-based algorithm, the delays of all nets lying on the critical paths are controlled simultaneously [7, 8]. Most of these algorithms employ a mathematical programming approach to obtain timing-accurate solutions but are computationally intensive. However, work on timing-driven layout for FPGAs has been scant, two recent papers being [9] and [10].
There are three key points in our algorithm that distinguish it from other net-based techniques. Most net-based algorithms use some form of net-weighting to assign weights to nets depending on their criticality. Often, the weights of the nets are selected from a prespecified range and it is left to the discretion of the user to specify the exact value for each net. Our algorithm makes a significant departure from this practice. We use specified allowable delays of nets to directly determine the weights of nets which are then used in the formulation to guide the placement. The second feature of our algorithm is that it ensures that non-critical nets are not adversely affected in the placement process. The third feature is an adaptive net weighting scheme in which the weights of the nets are altered dynamically in accordance with the path delay requirements to guide the subsequent placement.
The paper is organized as follows. In section 2, we describe our timing-constrained placement algorithm for symmetrical FPGAs. The factors that make the FPGA layout problem challenging and the underlying timing model are described first, followed by details of each step of the algorithm. Section 3 presents experimental results and a performance evaluation of the placement algorithm.
TIMING-CONSTRAINED FPGA PLACEMENT
The objective of timing-based placement is to optimize the locations of logic blocks on the chip for maximum speed, density, and a high percentage of routability. It is worthwhile to consider at this point the issues that distinguish FPGAs from other technologies and thereby make it imperative to revisit and alter the existing CAD layout tools. Since FPGA chips are not customized, it is critical to implement circuits on them with as little wastage of resources as possible. Placement takes the longest time to complete in layout. Timing behavior and routability can be influenced much more at the placement stage than at the routing stage, because the flexibility of routing is limited by an already existing arrangement of blocks. In gate-array, standard-cell, and custom-cell designs, the prime objective is to minimize the routing area, and the routing resources are not predetermined. The routing resources in FPGAs typically consist of prelaid interconnect segments of fixed lengths that can be extended using connection matrices. These matrices have capacitances associated with them and therefore contribute to the delay. As a result, the delay of a net is influenced by the number of segments and the number of connection matrices it uses. The CAD layout tools must take this effect into account in order to be efficient. Timing-driven placement in FPGAs is an art in tradeoffs. The main tradeoff is between a placement that meets the timing requirements and one that is routable for the FPGA architecture under consideration. The dominant and alterable part of the delays in FPGA layout comes from the interconnects, since the logic blocks have fixed delays. In contrast, in gate-array and standard-cell technologies, the delays in the circuit can also be controlled by redesigning the logic blocks or by transistor sizing.
We assume a generic symmetrical FPGA architecture in this paper, similar to that in [9]. It has a two-dimensional array of logic blocks interconnected by vertical and horizontal routing channels. The logic blocks are the functional elements used to construct the circuit. The input-output blocks on the periphery of the chip provide the interface with logic not resident on the chip. Both types of block are user-programmable.
The general flow of our algorithm is outlined in figure 1. The algorithm has three phases, as indicated by the while loop of the figure. Before delving into the details of each phase, we describe the concepts underlying the timing model assumed in this paper.
Timing Issues
The terminology used in the rest 
Here, k is a constant and f(l) is a function of the length of the net. In the case of FPGAs, the function f is governed by the number of segments and the number of connection matrices used to route the net, and c is the sum of the capacitance contributed by a connection matrix and its incoming grid segment. According to equation (1), solving a placement problem with net delay constraints is the same as solving a problem with constraints on the wire length of nets. Hence, we use the terms net delay and net length interchangeably in this paper.
The zero-slack algorithm [5] is used to obtain a set of upper bounds on the lengths of nets, and these constitute the timing constraints for the placement problem. Critical path information is obtained from a timing analysis done prior to placement. The zero-slack algorithm maximizes the range of wire lengths for each net and returns a set of non-unique bounds on the net lengths. If the timing constraints are to be met, all nets must have lengths less than or equal to their respective upper bounds. While the upper bounds over-constrain many other net-based algorithms, we use the non-uniqueness of the upper bounds to our advantage, as will be demonstrated in the section on weight assignment.
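As a rough illustration of the two checks this section relies on, the sketch below computes longest-path arrival times over a small delay graph and lists the nets whose lengths exceed their upper bounds. It is not the zero-slack algorithm of [5]. The function names, the graph encoding, and the example delays are all assumptions introduced here.

```python
from collections import defaultdict

def arrival_times(edges):
    """Longest-path arrival time at each node of a DAG.

    edges: list of (u, v, delay) triples (hypothetical encoding).
    Nodes with no incoming edge are treated as inputs arriving at time 0.
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v, d in edges:
        succ[u].append((v, d))
        indeg[v] += 1
        nodes.update((u, v))
    arrival = {n: 0.0 for n in nodes}
    ready = [n for n in nodes if indeg[n] == 0]
    while ready:  # process nodes in topological order
        u = ready.pop()
        for v, d in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + d)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return arrival

def violating_nets(lengths, upper_bounds):
    """Nets whose length exceeds the upper bound assigned to them."""
    return sorted(n for n in lengths if lengths[n] > upper_bounds[n])

edges = [("a", "b", 2.0), ("b", "d", 3.0), ("a", "c", 1.0), ("c", "d", 1.0)]
arrival = arrival_times(edges)
slack_at_d = 6.0 - arrival["d"]  # 6.0 is an assumed required time at output d
```

A net is considered to satisfy its timing constraint here exactly when its length does not exceed its bound, matching the condition stated above.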
A Force-Directed Placement Algorithm
The technique we adopt is based on the classical force-directed placement (FDP) method [11, 12] . The method uses a mathematical formulation to model the forces among the logic blocks (LBs) and solves the resulting set of equations to determine the relative placements of the LBs. The idea is based upon a physical model in which if a signal net is shared between two LBs, there is an attractive force between them; otherwise, a repulsive force comes into play between them. The attractive force tries to bring the connected LBs closer together while the repulsive force tries to move apart the LBs that do not share any nets. In the classical approach, the main objective is to obtain a placement that minimizes the total wire length. Consequently, the forces are a function of the number of signal nets between the LBs or the connectivity. This method increases routability by placing strongly connected components close together but does not ensure that timing requirements on the critical paths are met. In the formulation detailed next, instead of using a connectivity-based formulation we model the force equations to reflect the timing delays of the nets and thereby ensure that timing constraints specified for the critical paths of the circuit are satisfied. We refer to this as timing-constrained force-directed placement (TFDP).
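The classical connectivity-based force model can be sketched as follows. The force laws are assumptions made for illustration: attraction is taken to be proportional to the number of shared nets and to the distance between the blocks, and repulsion between unconnected blocks is given a constant magnitude `repulsion`. The function name and data layout are hypothetical and are not taken from [11, 12].

```python
import math

def net_force(block, positions, connectivity, repulsion=1.0):
    """Return the (fx, fy) force acting on `block`.

    positions:    {block: (x, y)}
    connectivity: {frozenset({a, b}): number of nets shared by a and b}
    """
    bx, by = positions[block]
    fx = fy = 0.0
    for other, (ox, oy) in positions.items():
        if other == block:
            continue
        dx, dy = ox - bx, oy - by
        shared = connectivity.get(frozenset((block, other)), 0)
        if shared > 0:
            # Attractive, spring-like pull toward a connected block.
            fx += shared * dx
            fy += shared * dy
        else:
            # Constant-magnitude push away from an unconnected block.
            dist = math.hypot(dx, dy)
            if dist > 0:
                fx -= repulsion * dx / dist
                fy -= repulsion * dy / dist
    return fx, fy

positions = {"a": (0.0, 0.0), "b": (2.0, 0.0), "c": (0.0, 3.0)}
connectivity = {frozenset(("a", "b")): 2}
force_on_a = net_force("a", positions, connectivity)
```

In this example block a is pulled toward b, with which it shares two nets, and pushed away from c, with which it shares none.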
There are three key points in favor of the force-directed formulation that have prompted its use.
The salient feature is that it takes a global perspective of the problem. The formulation takes into account constraints on all the blocks concurrently and hence attempts to find an arrangement that does justice to the constraints on all the blocks. This is unlike other iterative methods that perturb a subset of the blocks at any time and use a cost function to evaluate the effect of the perturbation. Such perturbation can adversely affect the blocks that are not perturbed if the cost function is not accurately modeled. This is particularly true of timing-based formulations, since it is difficult to capture the effect of block perturbations on all path or net delays in a single cost function.
The technique is relatively insensitive to the initial placement, unlike other commonly used placement techniques such as partitioning-based ones and those based on simulated annealing. The initial random positioning of blocks therefore has very little effect on the final placement. This obviates the need to make several runs to get the best placement, as must be done in many iterative algorithms.
The number of iterations required for convergence to the final placement is dependent more on the topology of the circuit rather than on the size of the circuit. The size of the circuit primarily affects the time required for each iteration only.
In addition to these, the timing-based formulation of our algorithm has distinct advantages over other net-based placement techniques. The other techniques rely on assigning weights to nets that are deemed critical after the timing analysis. The manner in which the weights are assigned is often left to the discretion of the user. For instance, typically, a range of weights (1 to 10) to be used for the nets is given. The non-critical nets are assigned a weight of 1 while the rest of the nets are assigned weights depending on how critical they are. However, it is very difficult to assess the criticality of each net in order to assign a suitable weight to it. Our algorithm obviates this shortcoming by using the upper bounds on the net lengths obtained after the timing analysis LBs b and e, R being the repulsion factor. In general, the repulsion factor between two LBs is 0 if they share a critical net; otherwise, it is given by R which is calculated as:
Here, K is a constant (experimentally found to be between and 2), M is the number of logic blocks, and Aij is 1 if the LBs i and j share a critical net and 0 otherwise. T is the total number of Aij's that are 0.
The total force on LB b is then given by the sum of the attractive and repulsive forces acting on it due to all the other LBs. The total force on each of the other blocks can be determined in a similar fashion. Since the information about criticality of the nets is embedded in the force equations, there is a tight coupling between the blocks that share critical nets. This favors our primary objective of controlling the delays on the critical paths. Once the force formulation has been done, the algorithm tries to find positions for the LBs such that the forces acting on them are reduced to zero; in other words, the attractive and repulsive forces are balanced. This is done by iteratively allowing the blocks to move in accordance with the forces acting upon them until either the total forces on all the blocks fall below a tolerance level or the blocks do not move significantly from their current positions. The forces are resolved in the X and Y directions for this purpose, resulting in 2M nonlinear equations that are solved using the quadratically conver- due to shift in location of block i from its current position to slot j. The motivation behind choosing such a cost function is that we wish to reduce the increase in the net length, and hence the delays on the nets, as a result of moving the blocks.
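The force-balancing iteration described in this section, in which blocks move according to the forces on them until the forces fall below a tolerance, can be illustrated with a much simpler scheme than the solver used in the paper. The sketch below uses a fixed-step update with attractive spring forces only, and it pins some blocks in place so the layout does not collapse to a point. The pinned blocks, the step size, and all names are assumptions introduced for illustration.

```python
def relax(positions, weights, fixed, step=0.1, tol=1e-6, max_iters=10_000):
    """Move free blocks along their net force until the forces fall below tol.

    positions: {block: (x, y)} starting coordinates
    weights:   {(u, v): w} attraction strength between blocks u and v
    fixed:     set of blocks that never move (for example, I/O pads)
    """
    pos = dict(positions)
    for _ in range(max_iters):
        forces = {}
        for b in pos:
            if b in fixed:
                continue
            fx = fy = 0.0
            for (u, v), w in weights.items():
                if b == u:
                    other = v
                elif b == v:
                    other = u
                else:
                    continue
                fx += w * (pos[other][0] - pos[b][0])
                fy += w * (pos[other][1] - pos[b][1])
            forces[b] = (fx, fy)
        # Stop once every remaining force component is below the tolerance.
        if not forces or max(max(abs(fx), abs(fy)) for fx, fy in forces.values()) < tol:
            break
        for b, (fx, fy) in forces.items():
            pos[b] = (pos[b][0] + step * fx, pos[b][1] + step * fy)
    return pos

start = {"p1": (0.0, 0.0), "p2": (4.0, 0.0), "m": (1.0, 2.0)}
weights = {("p1", "m"): 1.0, ("p2", "m"): 3.0}
final = relax(start, weights, fixed={"p1", "p2"})
```

The free block m settles near x = 3, the weighted average of the two pinned positions. The fixed step must be small relative to the total weight on a block, otherwise this simple update oscillates instead of converging.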
Slot Assignment
Once the relative locations of the blocks have been obtained at the end of the first phase, the LBs have to be assigned to fixed positions, called slots, on the FPGA chip, since the solution from the first phase of the algorithm assumes a continuous plane for movement of the blocks. The slots are predetermined by the architecture of the FPGA under consideration. We use a linear assignment algorithm to perform the slot assignment. In this phase also, we make use of the timing information on the nets to position the blocks. The assignment algorithm employs an M × S cost matrix, where S is the number of slots on the FPGA and M ≤ S. Each element of the matrix is the cost function value Lij, which is the cost of moving block i from its current position to slot j.
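The assignment step over an M × S cost matrix can be sketched with an exhaustive search, which is only practical for very small instances. A real implementation would use a polynomial-time linear assignment solver such as the Hungarian method. The function name and the cost values below are made up for illustration.

```python
from itertools import permutations

def assign_slots(cost):
    """Pick a distinct slot for each block so that the total cost is minimal.

    cost[i][j] is the cost Lij of moving block i to slot j.
    The matrix has M rows (blocks) and S columns (slots), with M <= S.
    Returns (slots, total) where slots[i] is the slot chosen for block i.
    """
    m, s = len(cost), len(cost[0])
    best_total, best = float("inf"), None
    # Enumerate every way of choosing M distinct slots in order.
    for slots in permutations(range(s), m):
        total = sum(cost[i][j] for i, j in enumerate(slots))
        if total < best_total:
            best_total, best = total, slots
    return list(best), best_total

cost = [
    [4, 1, 3],  # block 0: cost of each of the three slots
    [2, 6, 5],  # block 1
]
slots, total = assign_slots(cost)
```

Here block 0 is sent to slot 1 and block 1 to slot 0, and the third slot stays empty because there are more slots than blocks.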
In most of the linear assignment algorithms the cost function is chosen to be the Euclidean distance between the current position of the block and the position of the slot. But this measure minimizes the movement of the blocks only. This has the effect of moving the blocks to the nearest available slots without regard to the effect on the delays of the nets that a block shares with other blocks. To ensure that the delay bounds on nets are preserved after the first phase, we use a cost function that reflects the timing constraints imposed on the nets. In our implementa-
Adaptive Weight Assignment
As stated earlier, the attractive force between two logic blocks i and j is biased by a weight Wij which is dependent on the timing requirement of the critical path. An appropriate choice of weights is very important since it dictates how well the timing requirements will be met. It also affects the convergence of the force-directed placement algorithm, since it affects the attractive forces. Although the zero-slack algorithm gives a set of upper bounds on the net lengths which, when satisfied, guarantee that the circuit will meet the specified timing constraints, it can also overconstrain the problem. To circumvent this, we note that the set of upper bounds on net lengths is not unique, which means that some other combination of net lengths may satisfy the timing constraints just as well. Consequently, it is possible that while some net lengths along a critical path are in excess of the upper bounds specified for them, other nets Reevaluate Wij as the minimum of the weights of all nets shared between LBs i and j. This has the effect of increasing the attractive force between the blocks whose nets have delays longer than desired without adversely affecting the other nets.
The first two phases of the algorithm, force-directed placement and slot assignment, are repeated with the new set of weights. This is an iterative process which is carried out until there is no significant improvement in the path lengths that violate their respective upper bounds. We refer to each such iteration as a pass through the entire algorithm. In the end, a limited, local pairwise exchange is carried out to reduce the delays on any paths that still do not meet the constraints. Note that when weights are decremented, they are not allowed to fall below a certain threshold, to ensure that the attractive forces do not become numerically large. Another important point is that since the net weights are always decremented, the update does not in any way affect the other paths of which these nets may be a part. This is in contrast to other net-based approaches in which, often, non-critical paths become critical due to the manner in which net weights are assigned.
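One pass of the weight update described above might look like the sketch below. The text states only that weights of offending nets are decremented, that they are kept above a threshold, and that the weight between two blocks is the minimum over the nets they share. The size of the decrement (`step`), the threshold value (`floor`), and the function names are assumptions.

```python
def update_weights(net_weights, net_lengths, upper_bounds, step=0.25, floor=0.5):
    """Decrement the weight of every net that exceeds its length bound.

    Weights never drop below `floor`. Nets within their bound are unchanged.
    """
    new = dict(net_weights)
    for net, length in net_lengths.items():
        if length > upper_bounds[net]:
            new[net] = max(floor, net_weights[net] - step)
    return new

def block_pair_weight(shared_nets, net_weights):
    """Weight between two blocks: the minimum weight over their shared nets."""
    return min(net_weights[n] for n in shared_nets)

weights = {"n1": 1.0, "n2": 0.625, "n3": 1.0}
lengths = {"n1": 5, "n2": 9, "n3": 2}
bounds = {"n1": 4, "n2": 6, "n3": 3}
updated = update_weights(weights, lengths, bounds)
w_ij = block_pair_weight(["n1", "n3"], updated)
```

Net n3 meets its bound and keeps its weight, n1 is decremented once, and n2 is clamped at the threshold.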
EXPERIMENTAL RESULTS
For the experiments, we tested eight circuits [14] that represent a good mix of industrial circuits and circuits chosen from the MCNC suite of benchmarks. Having evaluated our algorithm with respect to timing performance, we now discuss the impact of timing-driven placement on metrics related to wireability. Total Wire Length (TWL) is an estimate of the sum of wire lengths of all the nets in the circuit.
While it is difficult to establish a closed-form relationship between the estimated wire length and the routability of a circuit layout, it is well known that minimizing the total wire length helps to reduce the routing area. A single-trunk Steiner tree is used to approximate the length of a net. Since timing-constrained placement aims to minimize path delays in the circuit and not the TWL, the latter tends to be larger than when the objective of the algorithm is to minimize TWL. However, it is not desirable to satisfy the delay requirements by compromising a great deal on total wire length. We measure the fractional increase in TWL due to timing-constraint consideration over the TWL obtained when the aim is to minimize TWL. The results are summarized in Table 2. A placement that meets all timing constraints but results in a great deal of wire congestion is not desirable. Because of the limited interconnection resources of FPGAs, routability assumes greater importance than in other technologies. The number of unrouted nets serves as a yardstick of the routability of a given placement. As can be seen in Table 2, the numbers of unrouted nets in placements obtained from FDP and TFDP are about the same in most cases. The last important metric deals with the computational resources required by the timing-driven placement algorithm. Since FPGAs are geared towards fast design turnaround times, it is essential to ensure that the timing-driven placement algorithm does not become a computational bottleneck. In Table 2, TFDP requires more CPU time than FDP. This is because of the additional computation required in the weight assignment phase of our algorithm and the additional passes required to ensure that timing constraints are completely satisfied. TFDP takes at least twice as long as FDP, as seen in circuit 4. However, our algorithm is faster than the simulated-annealing-based placement algorithm used in XAPR. For circuit 7, TFDP takes 408s less than XAPR, and for circuit 5, TFDP takes 20s less than XAPR.
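The wire-length metric used in this evaluation can be illustrated with a small sketch. It assumes one common form of the single-trunk Steiner estimate: a horizontal trunk at the median pin y-coordinate spanning the full x-extent of the net, with a vertical branch from each pin to the trunk. The paper does not spell out its exact variant, so the function names and the pin coordinates below are hypothetical.

```python
from statistics import median

def single_trunk_length(pins):
    """Estimated length of one net given its pin coordinates [(x, y), ...]."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    trunk_y = median(ys)  # horizontal trunk placed at the median y
    trunk = max(xs) - min(xs)
    branches = sum(abs(y - trunk_y) for y in ys)
    return trunk + branches

def total_wire_length(nets):
    """TWL: sum of the estimated lengths of all nets."""
    return sum(single_trunk_length(pins) for pins in nets.values())

def fractional_increase(twl_timing, twl_baseline):
    """Relative TWL increase of a timing-driven placement over a baseline."""
    return (twl_timing - twl_baseline) / twl_baseline

nets = {
    "n1": [(0, 0), (4, 2), (2, 5)],
    "n2": [(1, 1), (3, 1)],
}
twl = total_wire_length(nets)
increase = fractional_increase(twl, 10)  # 10 is an assumed baseline TWL
```

For net n1 the trunk spans 4 units and the branches add 5, giving an estimate of 9; net n2 is a straight 2-unit wire.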
From the above discussion, it is clear that our timing-constrained force-directed algorithm succeeds in attaining the primary objective of meeting the timing constraints specified for the circuit without seriously degrading the wireability of the resulting placement and without incurring excessive computational time. The results also demonstrate that reassigning weights to nets that lie on paths that do not meet timing constraints leads to very good timing results.
CONCLUSIONS
In this paper, we have presented a timing-driven placement algorithm for FPGAs that uses forces on the logic blocks to determine their placement. The algorithm employs a novel dynamic weight assignment technique for the nets to ensure that timing constraints for the circuit are met. The algorithm is evaluated comprehensively on eight test circuits and the results demonstrate that the algorithm performs very well in meeting the timing constraints, leads to circuit designs with higher speeds, and does not adversely affect the wireability or running time. All these factors are extremely important when implementing high-performance circuits using FPGAs.
