Due to increasing design complexity, routing congestion has become a critical problem in VLSI designs. This paper introduces a distributed metric to predict routing congestion for a premapped netlist and applies it to technology mapping that targets area optimization. Our technology mapping algorithm is guided by a probabilistic congestion map for the subject graph to identify the congested regions. Experimental results on the benchmark circuits in a 90nm technology show that congestion-aware mapping reduces track overflows by 37%, on average, as compared to conventional technology mapping.
INTRODUCTION
With increasing design complexity, designs are increasingly becoming wire-limited [1], thus aggravating the problem of routing congestion. Although exact routing congestion information is known only after global routing, failure to address congestion prior to this point implies that the designer is left with few degrees of freedom. Moving one step back, to placement, provides greater flexibility, but is still not enough, and it is known that this can still lead to significant design iterations.
This work was supported in part by SRC under contract 2002-TJ-1092 and by NSF under award CCR-0098117.
It is imperative to address congestion issues early in the design process to allow for more freedom to reduce congestion, and our work addresses this issue during the synthesis step. Technology mapping, which nowadays is interleaved with physical design for better delay estimation, provides powerful capabilities for absorbing long interconnect wires into internal connections within complex gates, or for splitting complex gates into simpler gates, thus helping to alter the overall distribution of wires in the layout. During this step, the routing congestion problem may be attacked with relatively more freedom (albeit relatively less information) than during placement and routing.
While congestion is an important consideration for technology mapping, the overriding objectives continue to be metrics such as area, delay, or power. Therefore, it is more appropriate to use congestion as a constraint rather than as an objective. While optimizing for area and delay, it is desirable to ensure that the final netlist does not have congested spots, so that long detours are avoided and the netlist remains routable. Typically, very few places in the circuit (ideally, zero) should have congestion values that are greater than some threshold, and the final netlist should be well optimized from the area and/or delay perspectives.
Some related work in the past is summarized as follows. Stok et al. proposed a clustering of closely placed cells during technology mapping so that the matching choices covering distantly placed cells in the subject graph are ruled out [2]. This approach may result in long wires in the final netlist, and more importantly, may be so limiting as to leave a significant portion of the design space unexplored. Pandini et al. proposed wirelength as a metric to be minimized during technology mapping in order to reduce the congestion [3]. Although large wirelength may be correlated with high congestion, the correlation is rather poor, and therefore, this may not result in an effective optimization. This observation has been borne out by recent work by the same authors [4], who state that such a metric, when considered during technology mapping employing a traditional cost function ($K_1 \times \text{Area} + K_2 \times \text{Delay} + K_3 \times \text{Wirelength}$, where $K_1$, $K_2$, and $K_3$ are constants), may not result in decreased congestion. As pointed out by them, congestion is a local property that varies from bin to bin, and it is difficult to capture its effects using a global metric like wirelength. This observation led them to the conclusion that congestion can only be targeted using iterative placement and technology mapping.
However, such a conclusion is valid only when the congestion optimization is performed using an indirect global metric in a traditional fashion. Instead of trying to absorb the congestion information into a single metric, we work with information about the distribution of congestion over the entire layout. The contributions of our work can be summarized as follows.
Using empirical data obtained from several benchmarks, using different scripts, placement algorithms, and libraries, we show the fidelity between the congestion maps for the subject graph and the mapped netlists, and then exploit this fidelity during technology mapping.
Instead of using an indirect metric such as wirelength, we use probabilistic congestion estimates to guide our technology mapping; these were shown in [5] to have good fidelity with post-routing congestion.
The congestion cost function is defined such that the mapper chooses area-optimal matches when the corresponding wires are likely to pass through a sparsely congested region, while congestion-optimal matches are chosen when the corresponding wires are likely to pass through a densely congested region. Thus, different optimization modes are applied at different places in the circuit depending on the context. To the best of our knowledge, such selective optimization has not been applied during technology mapping in the published literature.
PROBLEM DEFINITION
Routing congestion depends on the following factors: the connectivity of the network, its placement, and the routing solution. Since there is relatively less freedom to attack routing congestion during the placement and routing stages, we concentrate on the first factor in this paper. The technology mapping step makes crucial decisions regarding the connectivity of the network, since the mapping of primitive gates to the library cells determines the set of wires that will be present in the circuit netlist. Traditionally, this has been carried out without any placement information. Although this has changed in recent physical synthesis vendor offerings, most approaches focus on the prediction of wirelength based on bounding box estimates that ignore congestion. The estimation of routing congestion without a placement for a network is, if not impossible, liable to be highly inaccurate, and one may have to rely on high level metrics such as adhesion [6] . However, this is a very new metric and several open questions about it remain unanswered: for example, whether it can be measured in a computationally efficient manner, and whether its fidelity is valid for mapped netlists. On the other hand, probabilistic congestion estimation [5] used after the placement of a mapped network has been demonstrated to correlate well with the congestion map generated after the routing, on both academic and industrial benchmark circuits. The estimation method divides the layout into bins and computes the congestion for a given bin under all possible routes for a given net. The congestion is defined as follows. We use the probabilistic method of [5] to guide our technology mapping algorithm. However, even such a method is difficult to adapt, since only the premapped netlist is available prior to technology mapping, and the level of correlation between the probabilistic congestion maps of the premapped netlist and the mapped netlist has not been studied in the past. 
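To make the bin-based estimate concrete, the following minimal Python sketch computes, for a two-terminal net, the fraction of routes that pass through each bin of its bounding box. It assumes, purely for illustration, that every monotone (shortest) bin-to-bin staircase route from one corner of the bounding box to the opposite corner is equally likely; this is a simplification and not necessarily the exact estimator of [5]. The function name `route_probability` is hypothetical.

```python
from math import comb


def route_probability(rows, cols):
    """Fraction of monotone routes through each bin of a rows x cols bounding box.

    Assumption (not from the paper): the net runs from bin (0, 0) to bin
    (rows - 1, cols - 1), and every shortest staircase route is equally likely.
    """
    total = comb(rows + cols - 2, rows - 1)  # number of monotone routes overall
    grid = []
    for r in range(rows):
        row = []
        for c in range(cols):
            to_bin = comb(r + c, r)  # routes from the source bin to (r, c)
            from_bin = comb((rows - 1 - r) + (cols - 1 - c), rows - 1 - r)
            row.append(to_bin * from_bin / total)
        grid.append(row)
    return grid
```

Dividing each entry by the number of routing tracks available in the corresponding bin would give that net's contribution to the bin's congestion; summing over all nets would give a congestion map.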
One contribution of this work is to perform such a study. From empirical evidence obtained using different logic synthesis scripts and placement algorithms on a variety of benchmarks, we show a good congestion correlation between premapped and mapped netlists. Once we establish the congestion correlation between the premapped and mapped netlist, the problem of congestion-aware technology mapping can be defined as follows.
PROBLEM DEFINITION 
CONGESTION FIDELITY
This section explores the level of fidelity between the congestion estimates before and after technology mapping for any given circuit. For a given circuit, a netlist before technology mapping contains primitive gates such as 2-input NANDs, and technology mapping creates a netlist consisting of a set of gates from the given library. We refer to a netlist before technology mapping as a subject graph or a premapped netlist and that after technology mapping as a mapped netlist. Intuitively, the premapped and mapped netlist for a given circuit share the same global connectivity since the mapper absorbs some wires of the subject graph into the internal nodes of library cells, leaving other wires untouched. This points towards the possibility of good fidelity between congestion maps for premapped and mapped netlists. However, congestion also depends on the placement of elements (primitive gates or gates in the library) in the netlist. Placement algorithms used by commercial tools and in academia are typically based either on recursive multi-level bisectioning or force-directed quadratic programming. It would be useful to understand, even empirically, whether these placement algorithms react to the same global connectivity and block area constraints in a similar way. If so, there may be a good congestion correlation between premapped and mapped netlist. We explore this issue by performing a set of experiments using a variety of placers, logic synthesis scripts, libraries, and benchmarks.
Experimental setup
To verify the fidelity between congestion estimates before and after technology mapping, we placed several premapped netlists and the corresponding mapped netlists using the same block area and the same placement of input/output terminals. Two different placement algorithms were used: a recursive bisectioning based algorithm in a publicly available tool, Capo [7], and a force-directed quadratic algorithm, Kraftwerk [8], implemented in a proprietary industrial placer. Different scripts, such as rugged, boolean, algebraic, espresso, and speedup in SIS [9], were used for preprocessing the netlists before technology mapping, employing different libraries in SIS as well as an industrial library used for high performance microprocessor designs. Mapping was performed in SIS using the map -s -n 0 -AFG -p command that performs area and fanout optimization. No layout information was used to guide this technology mapping. Placement using Capo [7] was performed with default options to minimize the total wirelength based on half perimeter bounding box estimates, while placement using Kraftwerk [8] was performed to minimize total wirelength as well as congestion.
The premapped netlist is an abstract Boolean network. Since the number of nodes in this netlist is large, the area of primitive gates must be scaled by a certain factor to present the same white space constraints for the placement as the mapped netlist. This factor is computed a priori as a ratio of the targeted gate area to the area of premapped network. Note that this factor is readily available given the block area, the percentage area utilization, the premapped netlist, and the cell library, and does not require any testcase-specific tuning.
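The scaling step can be sketched as follows. This is a minimal illustration that assumes the targeted gate area equals the block area multiplied by the area utilization; the function and parameter names are hypothetical and the numbers in the usage note are made up.

```python
def premapped_scale_factor(block_area, utilization, premapped_gate_areas):
    """Factor by which primitive-gate areas are scaled before placement.

    Assumption (not stated explicitly in the text): the targeted gate area
    is block_area * utilization, with utilization given as a fraction.
    """
    premapped_area = sum(premapped_gate_areas)
    if premapped_area <= 0:
        raise ValueError("premapped netlist must have positive total area")
    target_gate_area = block_area * utilization
    return target_gate_area / premapped_area


def scale_gate_areas(premapped_gate_areas, factor):
    """Apply the same factor to every primitive gate."""
    return [area * factor for area in premapped_gate_areas]
```

For instance, with a block area of 1000 units, 60% utilization, and 500 primitive gates of area 2 each, the factor is 0.6, so the scaled premapped netlist occupies the same 600 units that the mapped netlist is expected to occupy.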
Experimental results
We show results for a few representative benchmarks: C432, C6288, C7552, and an industrial circuit containing an instruction decoder (IDC) in a high-performance microprocessor. Similar results are observed on other circuits, but are not shown here due to space limitations. Apart from the vastly different functionalities, the sizes of these benchmarks also vary from a few hundred cells to a few thousand cells. Figures 1 (a) and (b) show congestion maps for the benchmark IDC for the mapped and premapped netlists, respectively. The placement of both the networks is performed using Kraftwerk. In these plots, the XY plane shows the two dimensions of the layout area, while the Z-axis depicts the congestion. Visually, one can conclude that the distribution shown in Figure 1 (b) is similar in nature to the congestion map shown in Figure 1 (a).
Representative results for some ISCAS'85 benchmarks and the IDC circuit using different scripts, libraries, and placers are shown in Table 1. Columns 2, 3, and 4 show the scripts used, the number of cells in the mapped netlists, and placement tools used, respectively. Technology mapping in SIS [9] is performed using the area and fanout optimization option, employing the lib2.genlib library in SIS and an industrial library. It is worth noting that the mapped netlist is fanout-optimized, which possibly restructures the network after the mapping and may affect the global connectivity adversely. Columns 5 (6) and 7 (8) in the table show the average and maximum horizontal (vertical) congestion, respectively, while columns 9 and 10 show the statistical correlation between the congestion in premapped and mapped netlist. The correlation is defined as $\rho_{X,Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y}$, where $E$ is the expectation, $\mu$ is the mean, and $\sigma$ is the standard deviation; in our case, $X$ and $Y$ correspond to the congestion in the premapped and mapped netlists, respectively. A correlation value closer to 1 (-1) means that two random variables are strongly positively (negatively) correlated, while a value close to 0 means that variables are weakly correlated [10].
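As an illustration of how such a correlation could be computed, the following Python sketch evaluates the standard correlation coefficient between two congestion maps defined over the same grid of bins. The function name and the list-of-rows map representation are assumptions made for this example.

```python
from math import sqrt


def congestion_correlation(premapped_map, mapped_map):
    """Correlation coefficient between two congestion maps on the same bin grid.

    Each map is a list of rows of per-bin congestion values (a hypothetical
    representation chosen for this sketch).
    """
    xs = [v for row in premapped_map for v in row]
    ys = [v for row in mapped_map for v in row]
    if not xs or len(xs) != len(ys):
        raise ValueError("maps must be non-empty and cover the same bins")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("correlation is undefined for a constant map")
    return cov / (std_x * std_y)
```

Horizontal and vertical congestion would be passed as separate pairs of maps, yielding one correlation value for each direction.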
Justification based on experimental results
In spite of fanout optimization that may affect the global connectivity and hence congestion fidelity, the congestion correlation between subject graph and mapped netlist is always greater than 0.6, and is often quite close to 1, for all the netlists. One may deduce the following based on these experimental results.
Across different libraries, scripts, benchmarks, fanout optimization, and placement algorithms, a good correlation exists between the congestion map for the subject graph and congestion map for a mapped netlist.
The reasons for the congestion correlation are likely to be the similarities in the global connectivity in the subject graph and the mapped netlist, the similar block area and I/O terminal constraints, and the way any reasonable placement algorithm reacts to such resemblances in global connectivity and the block area constraints.
CONGESTION-AWARE MAPPING
For the purposes of congestion-aware technology mapping, the sparsely congested and densely congested regions must be identified. From the above experiments that demonstrate the congestion correlation between a subject graph and its mapped netlist, we can conclude that the former netlist is accurate enough for this purpose. The primary objective of our congestion-aware technology mapper is area minimization, and we employ a variation of a dynamic programming-based technology mapping algorithm [11] . The technology mapping procedure involves the matching and covering phases: the former comprises storing the set of optimal matches at each node, while the latter involves constructing the network by selecting from the matches stored during the matching.
Example
A pure area minimization objective during technology mapping can result in poor congestion, and Figure 2 illustrates a case where suboptimal area matches may reduce congestion. Assume that all of the bins, shown as dashed squares in the figure, are congested and a match for the AOI33 function is considered. The inputs to the match enter through top and bottom bins on the left, while the output leaves from the middle bin on the right. Figures 2(b) and (c) are 20 and 15, respectively. The latter also happens to be the minimum over all placements for the area-optimal AOI33 match. It is clear that the match in Figure 2 (a) distributes the logic and therefore, creates lower congestion. This example also highlights limitations of the placement in alleviating congestion, when area-optimal matches are chosen.
The cost of wires depends on the context: wires are inexpensive in sparsely congested regions, but are expensive in densely congested regions due to possible detours and hampered routability. One way to reduce this cost in densely congested zones without penalizing the design excessively is to account for their congestion contributions only in those zones. Our congestion-aware mapping heuristic serves this purpose well: in densely congested spots, it considers probabilistic routes based on the center-of-gravity locations for all possible matches and chooses the match that minimizes the congestion, while in sparsely congested spots, it chooses area-optimal matches. The congestion-aware mapping heuristic requires the assignment of a congestion cost, along with an area cost, to each match. The congestion cost depends on the total congestion caused by the nets subsumed by a match, its fanin nets, and its fanout nets. Specifically, it is given by,
Congestion cost computation
where $Cost_{Match}$ is the congestion cost of the match, and $Cost_{net\_created}$ ($Cost_{net\_subsumed}$) is the congestion cost of the nets created (subsumed) by the match. For example, for a 3-input NAND match shown in Figure 3(b) corresponding to the subject graph shown in Figure 3(a), the congestion cost is as follows:
The nets $w_1'$, $w_2'$, $w_3'$, and $w_4'$ correspond to the new location of the match and the fanins and fanouts of the match; we compute the new location of a match as the center of gravity of the locations of its fanin and fanout gates. Multi-terminal nets are modeled using cliques for the congestion computation, and the congestion contribution of each edge is scaled by a factor of $2/n$, where $n$ is the number of edges.
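The placement of a match and the combination of net costs into a match cost can be sketched in Python as follows. The sign convention (created nets add cost, subsumed nets remove cost) is an assumption inferred from the later statement that a negative match cost may decrease congestion; the function names are hypothetical.

```python
def center_of_gravity(locations):
    """Location of a match: mean of its fanin and fanout gate locations."""
    if not locations:
        raise ValueError("a match needs at least one fanin or fanout gate")
    n = len(locations)
    return (sum(x for x, _ in locations) / n,
            sum(y for _, y in locations) / n)


def match_congestion_cost(created_net_costs, subsumed_net_costs):
    """Congestion cost of a match from the costs of its incident nets.

    Assumed form: cost of the nets the match creates minus cost of the
    nets it subsumes, so a negative value means congestion is relieved.
    """
    return sum(created_net_costs) - sum(subsumed_net_costs)
```

For example, a match whose fanin and fanout gates sit at (0, 0), (4, 0), and (2, 6) would be placed at (2, 2), and its new nets would be routed probabilistically from that point.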
The congestion cost of a wire depends on the route and the congestion in the bins that the route passes through. Probabilistically, all of the routes in the bounding box of the net are assumed to be equally possible [5] (footnote 1). If the congestion in a bin in the bounding box of the net (say, 0.4) is small as compared to the threshold congestion (say, 1.0), then the congestion contribution of that net for that bin is assumed to be 0. This is because a small value of the congestion metric corresponds to the availability of numerous tracks, and the routability of the net through the bin is unaffected. However, if the bin is congested, then the probabilistic congestion contribution of the net to that bin must be considered, as its routability is hampered. In the case of Figure 4, wires w1 and w2 will have different congestion costs even though the shortest routes in both cases may have the same length; the congestion cost of w1 will be zero and the congestion cost of w2 will have a positive value, as its bounding box contains congested bins. The following equation captures this causality relation between routability and congestion while computing the congestion cost of a net, $Cost_{net}$:

$$Cost_{net} = \sum_{bin \in BoundingBox(net)} \left\lfloor \frac{C(bin)}{C_{max}} \right\rfloor \times C_{bin,net} \qquad (3)$$

where $C(bin)$ is the congestion in a bin, $C_{max}$ is the threshold congestion, and $C_{bin,net}$ is the congestion due to the specific net within the bin. It is easily seen that this definition filters out the contributions of uncongested bins from the congestion cost.
The bounding box for a two-terminal net is shown in Figure 5. It contains 16 bins, and the congestion value associated with each bin is shown in the figure. For the net connecting terminals 1 and 2, six possible L- and Z-shaped routes are shown for the purpose of illustration (footnote 2). To compute the congestion cost, if the threshold value of congestion ($C_{max}$) is set to 1.0, then we consider only the congested bins for which the congestion value is greater than 1.0, i.e., bins for which the congestion metric is 1.1 and 1.2. Three routes (routes 1, 4, and 5) pass through the bin with congestion 1.1, while two routes (routes 3 and 5) pass through the bin with congestion 1.2. Assuming all the routes to be equally possible, the demand (the ratio of the number of paths passing through the bin to the total number of paths) for tracks in the latter bin is $2/6$. Similarly, the demand for tracks in the former bin is $3/6$. Using Definition 2.1, the congestion contribution of the net for these bins can be computed by dividing the demands by the number of available tracks ($N_{Tracks}$).
Footnote 1: This assumption may not always be true. Typically, routers try to minimize vias and therefore, for two-terminal nets, only L and Z routes are considered. Such information can be taken into account while generating the congestion map.
Footnote 2: In practice, we use probabilistic congestion estimates that consider river routes as well.
Table 1 legend (partially recovered): [...] corresponds to maximum (average), while H (V) corresponds to horizontal (vertical).
Employing Equation 3, the congestion cost of the net is given by
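The computation for the Figure 5 example can be sketched in Python as follows. The sketch assumes that Equation 3 weights each bin's net contribution by the floor of the ratio of bin congestion to threshold congestion, so that bins below the threshold contribute nothing. The number of available tracks (10) and the congestion of the uncongested bin are made-up values for illustration, and the function name is hypothetical.

```python
from math import floor


def net_congestion_cost(bins, c_max):
    """Congestion cost of one net over the bins of its bounding box.

    bins: iterable of (bin_congestion, net_contribution) pairs.
    Assumed form of Equation 3: floor(C(bin) / C_max) * C_bin_net summed
    over the bins, so bins below the threshold are filtered out.
    Note that under this form a bin exactly at the threshold is counted.
    """
    return sum(floor(c / c_max) * contrib for c, contrib in bins)


# Figure 5 example: six equally likely routes, threshold 1.0.
N_TRACKS = 10  # hypothetical number of available tracks per bin
figure5_bins = [
    (1.1, (3 / 6) / N_TRACKS),  # three of six routes cross this bin
    (1.2, (2 / 6) / N_TRACKS),  # two of six routes cross this bin
    (0.4, (1 / 6) / N_TRACKS),  # an uncongested bin (illustrative), ignored
]
figure5_cost = net_congestion_cost(figure5_bins, c_max=1.0)
```

Under these assumptions the cost reduces to $(3/6 + 2/6)/N_{Tracks} = 5/(6 N_{Tracks})$, and only the two bins above the threshold contribute.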
The congestion cost for a match can be calculated from that of its incident nets. A positive cost implies that the match may increase the congestion beyond the threshold value in some bins, while a negative cost implies that it may decrease the congestion in some of the bins where congestion exceeds the threshold value. Algorithm 1 shows the pseudo-code for choosing the best match at a node during the matching phase of the mapping algorithm. Each match is associated with a triplet that denotes its congestion cost, area cost, and delay cost. The function CongestionMatch() is called for every match at a node during the matching phase to decide the best one to be stored at the node. The congestion cost is given priority over the area and delay only in congested regions, and area-optimal matches will be chosen for the nodes in the sparsely congested regions, as stated by the following proposition.
Description of the algorithm

PROPOSITION 1. If bins in bounding boxes of all of the nets, corresponding to all of the matches at a node, have congestion values that are smaller than the threshold congestion, then an area-optimal match will be stored as the best match at that node.
PROOF. This is a direct consequence of the fact that the congestion cost for all nets corresponding to all of the matches in such a case is zero from Equation (3), and the pseudocode shows that under this scenario, the area-optimal match is always chosen.
REMARK 1. The above result is important for congestion-aware mapping, since previous work in [4] has shown that the traditional way of considering the cost, ($K_1 \times \text{Area} + K_2 \times \text{Wirelength}$), during technology mapping requires different values of $K_2$ in the different regions of the circuit, as a single value of $K_2$ fails to capture the importance of congestion in different regions. Choosing a single value of $K_2$ may correspond to the case in which the entire circuit is uniformly congested with a single congestion value. In reality, the congestion in the circuit varies continuously from 0 to 1, or even exceeds 1, while the routability changes in a discrete manner: in a bin with congestion value greater than 1, at least some nets are detoured or are unroutable, while the routability of all the nets is unaffected when the congestion for the bin does not exceed 1. Assigning the congestion cost to the nets in the congested bins accounts for this discrete nature of routability and also allows the mapper to select area-optimal matches in the sparsely congested regions. Both of these purposes are critical and are served by our algorithm, while previous approaches [2, 3] have not addressed these.
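One selection rule that is consistent with Proposition 1 can be sketched in Python as follows. It assumes a lexicographic comparison of the (congestion cost, area cost, delay cost) triplet; this is an illustrative assumption and not necessarily the exact logic of Algorithm 1. The `Match` class and the `congestion_match` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Match:
    name: str
    congestion: float  # congestion cost of the match
    area: float        # area cost of the match
    delay: float       # delay cost of the match


def congestion_match(best: Optional[Match], candidate: Match) -> Match:
    """Return the match to keep at a node after examining one more candidate.

    Assumed rule: compare congestion cost first, then area, then delay.
    When every congestion cost is zero (sparsely congested region), the
    comparison falls through to area, so an area-optimal match is kept.
    """
    if best is None:
        return candidate

    def key(m: Match):
        return (m.congestion, m.area, m.delay)

    return candidate if key(candidate) < key(best) else best


def best_match(matches):
    """Fold congestion_match over all matches found at a node."""
    best = None
    for m in matches:
        best = congestion_match(best, m)
    return best
```

With this rule, the congestion cost only influences the choice when at least one candidate has a nonzero congestion cost, which mirrors the selective behavior described in Remark 1.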
The time complexity of our congestion-aware technology mapping is almost unchanged from that of a conventional technology mapping. The congestion cost computation of a match takes $O(Nets_{Match} \times N_{bins})$ time, where $Nets_{Match}$ is the number of nets associated with a match and $N_{bins}$ is the number of bins over the entire layout; $N_{bins}$ is a constant for a given layout. Therefore, congestion cost computation takes $O(Nets_{Match})$ time, the same as that required for structural matching used in the SIS mapper [9].
Pre-routed blockages in the design can be incorporated into our congestion cost by reducing the appropriate number of tracks in the corresponding bins. Most placers are adequate at handling blockages. Therefore, subject graph nodes or mapped cells are not placed in blocked areas. While long wires may require repeaters that are not visible in the subject graph, observe that these buffers do not change the congestion cost.
Limitations of the algorithm
Since this technology mapping procedure is applied to tree structures after the initial subject graph generation and the decomposition of DAGs into trees, the algorithm does not have any control over high fanout nets, or over the fanout nets created due to matches at the roots (footnote 3) of the trees. The congestion due to these high fanout nets is controlled by the structure of the initial network and fanout optimization. The effectiveness of the congestion-aware mapper proposed here is influenced by the scripts used for technology independent optimization, technology decomposition, and fanout optimization after technology mapping.
In our current implementation, we do not update the congestion map dynamically during technology mapping. However, this update can be easily carried out during the covering phase, thus allowing a more accurate selection of the best match stored at a node, provided multiple congestion-aware matches are stored in addition to an area-optimal one.
Footnote 3: All of the nodes in the tree have a fanout of 1 except for the root.
EXPERIMENTAL RESULTS
The probabilistic congestion estimation algorithm from [5] and the congestion-aware technology mapper were implemented in C/C++ and incorporated in SIS [9]. The subject graphs were created by running script.rugged followed by tech_decomp -o 2 in SIS [9]. We present a set of experimental results obtained using a force-directed quadratic placer, Kraftwerk [8], and a proprietary industrial maze router. The experimental flow used in our experiments is shown in Figure 6. For congestion-aware mapping, a subject graph was first created and placed using Kraftwerk. The congestion map for the subject graph was then generated and used in our congestion-aware mapper. After technology mapping, the circuits were placed using Kraftwerk, followed by global routing using the proprietary router. In all of our experiments, a bin size of [?] × [?] µm² was used.
[Figure 6: experimental flow; recoverable labels: "Subject graph", "Congestion"]
Table 2 shows the post-routing results obtained using our in-house Kraftwerk placer and maze router for conventionally mapped and congestion-aware netlists. Technology mapping is performed employing a proprietary cell library used in high-performance microprocessor designs. Our experiments use a 90nm technology and allow the router to use 4 metal layers (footnote 4): metal 1 with no preferred direction, metals 2 and 4 for the horizontal direction, and metal 3 for the vertical direction. The entries of the form 'a / b' in Columns 3 through 7 mean that 'a' ('b') corresponds to the conventionally (congestion-aware) mapped netlist. The block area shown in Column 2 is used for both of these netlists for the benchmarks shown in Column 1. Since the same block area is used for both the netlists, there is no area penalty. Columns 3, 4, and 5 show the average row utilization, the overflow after global routing, and the number of bins with congestion more than 1.0, respectively, while Columns 6 and 7 show the maximum and average congestion, respectively.
For small benchmarks such as C1355, C432, and C880, a small number of bins are congested in the conventionally mapped netlists, while none of the bins is congested in the congestion-aware mapped netlists. This shows that a congestion problem confined to a few bins can be easily resolved by congestion-aware mapping without any area penalty. C499 and C1908 show zero routing track overflows, while other small benchmarks have only a few congested bins, indicating that routing congestion is not a critical problem for designs up to a few hundred cells. As the design size grows beyond a thousand cells, routing congestion starts becoming a critical problem, as indicated by increased track overflows for benchmarks such as IDC, C6288, and C7552. In these cases, the congestion-aware mapped netlists have been able to reduce the track overflows by 87%, 43%, and 29%, while the number of congested bins has decreased by 81%, 65%, and 25%, respectively. Based on the increase in average congestion for all of the benchmarks, accompanied by a reduction in the number of congested bins and the number of track overflows, we see that congestion-aware mapping tends to map the logic so as to distribute the congestion from densely congested regions to the sparsely congested regions.
Footnote 4: While 90 nm and subsequent process generations have a large number of metal layers, the upper layers are usually reserved for global signal, global clock, and power distributions, leaving block synthesis to operate in the lower layers [12].
For large benchmarks, the wiring distributions obtained after global routing showed significant improvements as a result of our congestion-aware technology mapping flow. The improvement in the wiring distribution is best exemplified by a reduction in the incidence of detours on the routes, where we define the detour of a route as the difference between its actual length and the total size of its minimum spanning tree (MST; see footnote 5). [...] figure represents the number of nets in the conventional (congestion-aware) netlist for a given detour range. It can be observed that for shorter detour ranges the number of nets in the congestion-aware netlist dominates its conventional counterpart, while as the detour length increases, the number of nets from the conventional netlist dominates that in the congestion-aware netlist. Although the total number of wires increases in the congestion-aware case, most of this increase occurs at short wire lengths, as seen from the figure.
Figure 8 shows a plot of net length vs. detour length for all the nets in the congestion-aware and conventionally mapped netlists for IDC. In the figure, the symbols '+' and '×' indicate the actual length, in µm, of a net belonging to the corresponding detour range, in µm, specified on the X-axis, for the congestion-aware and conventionally mapped netlist, respectively. In the figure, a '×' corresponding to 230 µm on the Y-axis and in the column for 70 µm on the X-axis implies that there is a net of length 230 µm whose detour length lies between 67.5 and 72.5 µm in the conventional netlist. It can be seen from the figure that the conventional netlist tends to have longer detours than the congestion-aware netlist, especially on its longer wires. The congestion-aware technology mapping not only tends to reduce the length of the long wires, but also tends to route them with smaller detours (hence, making them more predictable prior to the routing).
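The detour definition used above can be sketched in Python as follows. The sketch assumes that the MST is built over the pin locations of a net with rectilinear (Manhattan) distances; the distance metric and the function names are assumptions made for this example.

```python
def mst_length(pins):
    """Total length of a minimum spanning tree over pin locations.

    Uses Prim's algorithm with Manhattan distance (an assumption; the text
    does not state the metric). pins is a list of (x, y) tuples.
    """
    if len(pins) < 2:
        return 0.0
    in_tree = {0}
    # dist[i] is the distance from pin i to the closest pin already in the tree.
    dist = [abs(pins[0][0] - x) + abs(pins[0][1] - y) for x, y in pins]
    total = 0.0
    while len(in_tree) < len(pins):
        j = min((i for i in range(len(pins)) if i not in in_tree),
                key=lambda i: dist[i])
        total += dist[j]
        in_tree.add(j)
        for i, (x, y) in enumerate(pins):
            if i not in in_tree:
                d = abs(pins[j][0] - x) + abs(pins[j][1] - y)
                if d < dist[i]:
                    dist[i] = d
    return total


def detour(routed_length, pins):
    """Detour of a route: its actual length minus the MST length of its net."""
    return routed_length - mst_length(pins)
```

For a three-pin net at (0, 0), (3, 0), and (3, 4), the MST length is 7, so a routed length of 9.5 corresponds to a detour of 2.5 in the same units.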
Figure 9 shows the nets whose length is greater than 100 µm, since these are the nets that are usually responsible for the routing problems; '+' and '×' have the same meaning as in Figure 8. Congestion-aware mapping tends to reduce the length of the longest wires, as is apparent from a larger population of '×' as compared to '+' in the figure. This is achieved by allowing the shorter wires to have slightly longer detours as compared to conventional mapping. However, since the predictability of the short wires is usually not a problem, the increased detours of the short wires do not impact the design convergence adversely. Furthermore, the reduction in the detours of the wires under congestion-aware mapping also improves the predictability of their length, delay, load, and repeater requirements prior to routing.
CONCLUSION
We have introduced a distributed metric to predict routing congestion at the logic synthesis stage and demonstrated its fidelity to post-mapping routing congestion. Based on the congestion correlations between premapped and mapped netlists, we have performed congestion-aware technology mapping considering congestion information on the subject graph. Experimental results on a set of benchmarks show a consistent improvement in the congestion and better wiring distributions.
