We empirically assess the implications of fixed terminals for hypergraph partitioning heuristics. Our experimental testbed incorporates a leading-edge multilevel hypergraph partitioner [14, 3] and IBM-internal circuits that have recently been released as part of the ISPD-98 Benchmark Suite [2, 1]. We find that the presence of fixed terminals can make a partitioning instance considerably easier (possibly to the point of being "trivial"): much less effort is needed to stably reach solution qualities that are near best-achievable. Toward development of partitioning heuristics specific to the fixed-terminals regime, we study the pass statistics of flat FM-based partitioning heuristics. Our data suggest that with more fixed terminals, the improvements in a pass are more likely to occur near the beginning of the pass. Restricting the length of passes, which degrades solution quality in the classic (free-hypergraph) context, is relatively safe for the fixed-terminals regime and considerably reduces the runtime of our FM-based heuristic implementations. We believe that the distinct nature of partitioning in the fixed-terminals regime has deep implications (i) for the design and use of partitioners in top-down placement, (ii) for the context in which VLSI hypergraph partitioning research is pursued, and (iii) for the development of new benchmark instances for the research community.
Introduction
Hypergraph partitioning research in VLSI CAD has been primarily motivated by the gate-level top-down placement context, which in modern ASIC design methodology can demand extremely efficient and high-quality solutions for netlist sizes exceeding 1 million vertices. New heuristics for hypergraph partitioning are typically evaluated in the context of free hypergraphs, where all vertices are free to move into any partition [4, 2]. Every benchmark, and every benchmark result reported in the literature, is for the free-hypergraph context. Even when I/O pad locations are specified in the .vpnr or .yal source for early ACM/SIGDA benchmarks, the partitioning benchmarks (in .net/.are format; see [1]) do not indicate how these pads correspond to fixed vertices in partitions.
Research supported by a grant from Cadence Design Systems, Inc.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to distribute to lists, requires prior specific permission and/or a fee.
Table 1: Block sizes below which the expected number of fixed vertices due to propagated terminals will exceed a specified percentage (5%, 10% or 20%) of the total number of vertices in a top-down placement when the design has given Rent parameter $p$. We assume that the average pins per cell in the design is $k = 3.5$.
Our study is motivated by the following observation: in top-down placement, the input to the partitioner is never a free hypergraph. Rather, the input contains fixed terminals that arise from the chip I/Os or from the propagated terminals of other sub-problems in the partitioning hierarchy [6, 16]. The number of these fixed terminals can be estimated from Rent's rule [15, 5], which states that in a layout with Rent parameter $p$, on average a block of $C$ cells will have $T = k \cdot C^p$ propagated or external terminals. This corresponds to a partitioning instance of $C + T$ vertices, of which $T$ are fixed.
Here, $k$ is a constant equal to the average number of pins per cell, and is approximately 3.5 for modern designs; Rent parameter values for modern designs have been estimated at around 0.68 [5, 17]. Table 1 shows the maximum block sizes below which we expect all blocks (in a design with Rent parameter $p$) to have a given percentage of their vertices fixed.¹ We observe that even rather sizable sub-blocks of the design can be expected to have a high proportion of fixed terminals. With this paper, we bring attention to the problem of partitioning with fixed terminals, and demonstrate that unique aspects of the fixed-terminals regime may require new partitioning heuristics. Hence, the nature of partitioning in the fixed-terminals regime can have deep implications (i) for the design and use of partitioners in top-down placement, (ii) for the context in which VLSI hypergraph partitioning research is pursued, and (iii) for the development of new benchmark instances for the research community.
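As a rough illustration of the Rent's-rule estimate, the following sketch computes the expected number of terminals for a block and the block size below which the fixed fraction exceeds a given threshold. It assumes the fraction is measured as T / (C + T), following the "C + T vertices, of which T are fixed" reading in the text; the function names are hypothetical, and the numbers it produces are not claimed to reproduce Table 1 exactly.

```python
def expected_terminals(cells: float, p: float, k: float = 3.5) -> float:
    """Rent's rule: a block of `cells` cells has about T = k * C**p terminals."""
    return k * cells ** p


def fixed_fraction(cells: float, p: float, k: float = 3.5) -> float:
    """Fraction of fixed vertices, assuming an instance of C + T vertices."""
    t = expected_terminals(cells, p, k)
    return t / (cells + t)


def max_block_size(p: float, fraction: float, k: float = 3.5) -> float:
    """Largest block size C with T / (C + T) >= fraction (needs 0 < p < 1).

    T/(C+T) >= f  <=>  k * C**(p-1) >= f/(1-f)
                  <=>  C <= (k * (1-f) / f) ** (1 / (1-p))
    """
    return (k * (1.0 - fraction) / fraction) ** (1.0 / (1.0 - p))


if __name__ == "__main__":
    # Rent parameter 0.68 and k = 3.5 are the values quoted in the text.
    for f in (0.05, 0.10, 0.20):
        print(f"p=0.68, at least {f:.0%} fixed: C <= {max_block_size(0.68, f):,.0f}")
```

Because the fixed fraction decreases monotonically with block size when p < 1, every block smaller than the returned size has at least the requested fraction of fixed vertices under this model.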
In Section 2 below, we empirically assess the implications of fixed terminals for hypergraph partitioning heuristics. Our experimental testbed incorporates a leading-edge multilevel hypergraph partitioner [14, 3] and IBM-internal circuits that have recently been released as part of the ISPD-98 Benchmark Suite [2, 1]. We conclude that the presence of fixed terminals can make a partitioning instance considerably easier (possibly to the point of being "trivial"): much less effort is needed to stably reach solution qualities that are near best-achievable. Section 3 presents early studies aimed at developing partitioning heuristics specific to the fixed-terminals regime. We study the pass statistics of flat FM-based partitioning heuristics, and demonstrate that with more fixed terminals, the improvements in a pass are more likely to occur near the beginning of the pass. A heuristic that restricts the length of passes, which would degrade solution quality in the classic (free-hypergraph) context, is relatively safe for the fixed-terminals regime and considerably reduces the runtime of our FM-based implementations. Section 4 concludes with directions for future work.
Effect of Fixed Terminals on Instance Difficulty
We start by posing our motivating experimental questions, then describe our experimental testbed and protocol, followed by empirical results.
¹ This assumes that the blocks are in "Region 1" of the Rent parameter fit [15].
Experimental Questions
1. By how much can the presence of fixed terminals affect the solution quality achieved by modern partitioning heuristics?
2. Do partitioning instances with fixed terminals require less effort to "solve well" (with modern partitioning heuristics) than similar-complexity instances without fixed terminals?
3. Can guidelines be inferred as to the effort required to achieve good partitioning solutions when a given proportion of the hypergraph vertices are fixed?
Experimental Testbed
Our experimental testbed contains implementations of common FM-based partitioning heuristics and standard partitioning benchmarks.
Partitioner
We use an internally developed partitioning engine that implements the multilevel FM approach described in [3] and [14]. Implementation details generally follow the parameters established in [3] (use of CLIP [7], heavy-edge matching, clustering ratio, etc.). The partitioning engine does not perform V-cycling as in [14], since V-cycling was determined to be a net loss in terms of the overall cost-runtime profile of our partitioner.
The partitioning engine achieves solution quality and runtimes on a per-start basis that are somewhat better than those reported for MLc [3] and hMetis [14] in the 1998 paper of Alpert [2] and on Alpert's web page [1]. This is confirmed by the experimental data reported in the next section.
Test Data
We have run experiments with the IBM01 through IBM05 test cases from the ISPD-98 Benchmark Suite developed by Alpert [2, 1]. We use actual areas of cells, and a 2% balance constraint. Because the cell areas vary considerably in the IBM benchmarks (there are often individual cells that occupy several percent of the total area [1]), there is little point in doing unit-area studies for the real-life placement context. Moreover, tight balance constraints are more appropriate to the top-down cell placement application.
Experimental Protocol
In our experiments we choose vertices to fix at random from the set of all vertices in the netlist. We either (i) fix the chosen vertices independently into random partitions ("rand" in Figures 1 and 2), or (ii) fix the chosen vertices according to where they are assigned in the best min-cut solution we could find for the instance when no vertices were fixed ("good"). For each of the resulting four regimes we fix a number of vertices equal to 0%, 0.1%, 0.5%, 1.0%, 2.0%, 5%, 10%, 15%, 20%, 30%, 40% and 50% of the total number of vertices in the instance.² We apply the multilevel CLIP FM engine noted in the previous subsection. A single trial applies this partitioner to the given partitioning instance for 1, 2, 4 or 8 independent starts, and returns the best cutsize obtained as well as the number of CPU seconds used. (All CPU times are for a 140MHz Sun Ultra-1 workstation running Solaris 2.6.) All of our data represent averages of 50 trials.
² In generating these instances, we incrementally fix additional vertices, e.g., all vertices fixed at 1.0% are also fixed at 2.0%.
In the "good" regime, the normalization is to a single constant value (since all instances have fixed vertices consistent with the same good solution), so the shape of the traces is similar to the plot of raw solution costs. However, in the "rand" regime, the raw solution costs increase drastically with the percentage of randomly chosen fixed vertices, and each percentage of fixed vertices corresponds to a distinct partitioning instance. Thus, for each instance in the "rand" regime, we normalize solution costs to the best solution cost seen for that instance.
We performed similar experiments where fixed vertices are chosen randomly from the set of identified I/Os (pads) in the netlist.³ However, we do not discuss these results, for several reasons. First, the number of I/Os is typically very small (less than one percent of all vertices). Second, for those percentages of fixed vertices that could be chosen from I/Os, we could find no difference in any experiment between fixing identified I/Os and fixing random vertices. Finally, for the vast majority of hierarchical block partitioning instances in top-down placement, the fixed terminals do not correspond to chip I/O pads anyway.
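The instance-generation protocol described above can be sketched as follows. This is only an illustration under stated assumptions: `make_fixed_instances`, `best_of_starts` and the `partition_once` callback are hypothetical names, vertex counts are rounded to the nearest integer, and a single shuffled vertex order is reused so that the fixed sets are nested across percentages.

```python
import random

# Percentages of fixed vertices used in the experiments.
PERCENTAGES = [0, 0.1, 0.5, 1.0, 2.0, 5, 10, 15, 20, 30, 40, 50]


def make_fixed_instances(num_vertices, mode, good_solution=None,
                         percentages=PERCENTAGES, seed=0):
    """Return {percentage: {vertex: partition}} with nested fixed sets.

    mode == "rand": each fixed vertex gets an independent random partition.
    mode == "good": each fixed vertex keeps its side in `good_solution`,
                    a list giving the partition (0 or 1) of every vertex.
    """
    rng = random.Random(seed)
    order = list(range(num_vertices))
    rng.shuffle(order)  # one random order; prefixes of it are nested
    if mode == "good":
        if good_solution is None:
            raise ValueError("mode 'good' needs a reference solution")
        side = list(good_solution)
    elif mode == "rand":
        # Sides are drawn once so a vertex keeps its side at every percentage.
        side = [rng.randrange(2) for _ in range(num_vertices)]
    else:
        raise ValueError("mode must be 'rand' or 'good'")
    instances = {}
    for pct in percentages:
        count = round(num_vertices * pct / 100.0)
        instances[pct] = {v: side[v] for v in order[:count]}
    return instances


def best_of_starts(partition_once, fixed, starts, seed=0):
    """One trial: best cutsize over `starts` independent starts.

    `partition_once(fixed, seed)` is a hypothetical callback that runs the
    partitioner once and returns the cutsize it found.
    """
    return min(partition_once(fixed, seed + s) for s in range(starts))
```

Averaging `best_of_starts` over 50 trials with different seeds, for 1, 2, 4 and 8 starts, would give one trace per start count in the style of the plots discussed in this section.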
Experimental Results
Figures 1 and 2 show our results for the IBM01 and IBM03 test cases. We also do not show results for the IBM02, IBM04 and IBM05 test cases, because the data look essentially identical to what we have shown for IBM01 and IBM03. Indeed, we have been agreeably surprised by the consistency of our experimental results.
From the Figures, we make the following observations.
• The raw solution costs⁴ indicate that as more fixed vertices are (randomly) selected and assigned to partitions, the achievable solution cost increases rapidly. This addresses the first experimental question: the presence of fixed vertices matters.
• The normalized solution costs indicate that if the netlist has many terminals fixed in partitions (which is what we believe distinguishes real-life partitioning instances generated during top-down placement), then the partitioning problem is indeed "easy".
  - When 0% of the vertices are fixed in partitions, more starts (e.g., 4 or 8) are required for the average best cutsize to approach the value that the multilevel partitioner is capable of achieving for the given instance.
  - When larger percentages of the vertices are fixed in partitions, fewer starts (e.g., 1 or 2) are required for the average best cutsize to approach the "good solution cost".
  - In the normalized traces, the curves are "flatter" (and there is less difference between the 1-start and 8-start traces) as the percentage of fixed vertices increases.
  - In all of our experiments, an instance with 20% or more vertices fixed is essentially solvable to very high quality in one or two starts, i.e., further starts are unnecessary. This suggests that most hierarchical block partitioning instances in placement are easy; recall Table 1 from Section 1.⁵
• Runtimes decrease substantially when the percentage of fixed vertices increases; this is expected since the partitioner has less freedom and a smaller number of movable vertices.
• Solution quality for "good" instances, and runtime for "rand" instances, are non-monotone in the percentage of fixed vertices. We suspect that this indicates "relatively overconstrained" instances where the inflexibility of the instance hurts the ability of the partitioner to find "trajectories to good solutions" more than it helps the partitioner by reducing the solution space. An interesting direction for future work is to attempt to demonstrate this effect. As discussed below, the data also suggest that current partitioning technology is not well-tuned to the fixed-terminals regime.
³ When the fixed vertices are chosen from pads in the netlist, the percentage is limited by the total number of pads and we do not fix any further vertices.
⁴ Note that these raw solution costs suggest that our multilevel partitioner is (at least) on par with [3, 14] in terms of solution quality.
Toward Partitioners for the Fixed-Terminals Regime
We now describe preliminary explorations into partitioning methods designed specifically for the fixed-terminals regime. The first subsection presents motivating studies of pass-length statistics. The second subsection gives runtime and cutsize results for a new heuristic variant, using the same fixed-terminals instances described above.
⁵ The benefit from additional starts decreases more noticeably in the "rand" regime than in the "good" regime. Since propagated terminals are not likely assigned to their ideal locations, the benefit from starts in the top-down placement context is likely somewhere between the "rand" and "good" portraits.
FM Pass Structure With Fixed Terminals
A motivating observation is that in the absence of sufficient fixed terminals, FM may occasionally produce passes in which nearly every node is moved. Recall that during an FM pass, nodes are moved one at a time until each node has been moved; for bipartitioning, all nodes have been "flipped" when the end of the pass is reached. Then, the best solution found during the pass (i.e., the best prefix of the move sequence) is restored. Any move "undone" in this process has essentially been wasted. Without terminals, FM will occasionally produce passes in which almost no moves are wasted: the pass flips almost all nodes between partition 0 and partition 1.
However, if there are sufficiently many nodes adjacent to fixed terminals, such a "near-flip" is very unlikely to be improving. Table 2 documents the average number of passes per run, and the average percentage of nodes moved per pass: increasingly higher percentages of the moves in the FM passes are wasted as the proportion of fixed terminals increases. This strongly suggests that in the fixed-terminals regime, FM-style heuristics can profitably impose a hard cut-off on pass lengths.
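To make the notions of "best prefix", "wasted moves" and a hard cut-off concrete, here is a minimal sketch of a single FM-style pass for bipartitioning. It is not the implementation used in our experiments: it assumes unit vertex areas, a simple minimum-side-size balance rule (`min_side`), no duplicate pins within a net, and it recomputes gains from scratch instead of using gain buckets, LIFO tie-breaking or CLIP. All function and parameter names are hypothetical.

```python
def cut_size(nets, side):
    """Number of nets with pins in both partitions."""
    return sum(1 for net in nets if len({side[v] for v in net}) > 1)


def move_gain(v, nets, nets_of, side):
    """Reduction in cut size if vertex v is flipped to the other partition."""
    gain = 0
    for e in nets_of[v]:
        net = nets[e]
        if len(net) < 2:
            continue  # a single-pin net can never be cut
        same = sum(1 for u in net if side[u] == side[v])
        if same == len(net):
            gain -= 1  # net is uncut now and becomes cut
        elif same == 1:
            gain += 1  # v is alone on its side; net becomes uncut
    return gain


def fm_pass(nets, side, fixed, min_side=1, max_moves=None):
    """Run one pass in place; return (best_gain, kept_moves, moves_made).

    `fixed` is a set of vertices that are never moved. `max_moves` is the
    optional hard cut-off on the pass length.
    """
    n = len(side)
    nets_of = [[] for _ in range(n)]
    for e, net in enumerate(nets):
        for v in net:
            nets_of[v].append(e)
    sizes = [side.count(0), side.count(1)]
    locked = set(fixed)
    limit = n - len(locked) if max_moves is None else max_moves
    moves, total, best_total, best_len = [], 0, 0, 0
    while len(moves) < limit:
        # A move is legal if it keeps the source side at min_side or more.
        legal = [v for v in range(n)
                 if v not in locked and sizes[side[v]] - 1 >= min_side]
        if not legal:
            break
        v = max(legal, key=lambda u: move_gain(u, nets, nets_of, side))
        total += move_gain(v, nets, nets_of, side)
        sizes[side[v]] -= 1
        side[v] ^= 1
        sizes[side[v]] += 1
        locked.add(v)
        moves.append(v)
        if total > best_total:
            best_total, best_len = total, len(moves)
    # Undo the "wasted" moves made after the best prefix.
    for v in moves[best_len:]:
        side[v] ^= 1
    return best_total, best_len, len(moves)
```

The fraction of wasted moves in a pass is `(moves_made - kept_moves) / moves_made`; with `max_moves` set, the pass stops early and at most that many moves can be wasted.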
FM Variants With Early-Stop Passes
Since the first FM pass traditionally begins with a random partitioning, many nodes will be moved, regardless of the number of fixed terminals. However, we may limit the number of moves per pass (after the first pass) in order to reduce overhead when the best solution found is near the beginning of the pass. Table 3 documents the effects on average cutsize and average CPU time for single LIFO FM starts, when FM passes are cut off after 50%, 25%, 10% or 5% of the moves have been made. For instances without sufficient terminals, early stopping has a detrimental effect on solution quality, but with sufficient terminals no effect on solution quality is seen. In all cases, limiting the number of moves in a pass improves runtime.
A surprising observation is that current partitioners appear to struggle when faced with only a small proportion (e.g., 5% or 10%) of fixed terminals. Because all terminals are fixed in a "good" location, and because fixed terminals are added only to produce problems with a higher percentage of fixed nodes, any solution for the cases of 20% or 0% fixed is also feasible for the case of 10% fixed. The fact that the partitioner produces better results for both the 20% and the 0% cases than for the 10% case may point to a failing of current partitioners.
Conclusions and Open Problems
We have empirically demonstrated a mismatch between the top-down placement context and current directions in VLSI CAD hypergraph partitioning research and benchmarking. We point out how easy the partitioning problem becomes when fixed terminals are present, and how strong this effect is. We believe that there is a great deal of work remaining to be done in the area of extremely fast partitioning for the fixed-terminals regime, i.e., the real-world placement context. Our early efforts have entailed per-pass analyses of flat FM-based partitioning heuristics, confirming that the presence of fixed terminals limits the improvements in a pass to moves made at the beginning of the pass. A hard cut-off on pass length, which degrades solution quality in the classical "free-hypergraph" context, is relatively safe in the presence of terminals, and considerably reduces the runtime of our FM-based implementations.
An open and rather pragmatic issue is whether faster algorithms can be developed that exploit the presence of fixed terminals in applications such as top-down placement. Further analysis of the effects of fixed terminals may be useful. In particular, it is not yet clear how to measure the "strength" of fixed terminals, or alternatively the "degree of constraint" in particular problem instances. While our experiments fix random terminals from known hypergraphs where most vertices have low degree, it is always possible to fix vertices of very high degree to yield qualitatively different problem instances with similar numbers of fixed terminals. Indeed, a bipartitioning instance with an arbitrary number/percent of fixed terminals can be represented by an equivalent instance with only two terminals, by clustering all terminals fixed in a given partition into a single terminal.
Figure 1: Experimental results for IBM01 test case, actual cell areas, 2% balance tolerance. The four traces in each plot correspond to 1, 2, 4 and 8 starts of the multilevel partitioner. We report raw best solution costs (left column), normalized best solution costs (middle column) and total CPU times (right column) for both the "good" (upper row) and "rand" (lower row) regimes. In all plots, the given parameter is plotted against the percentage of fixed vertices in the instance.
Figure 2: Experimental results for IBM03 test case, actual cell areas, 2% balance tolerance. The four traces in each plot correspond to 1, 2, 4 and 8 starts of the multilevel partitioner. We report raw best solution costs (left column), normalized best solution costs (middle column) and total CPU times (right column) for both the "good" (upper row) and "rand" (lower row) regimes. In all plots, the given parameter is plotted against the percentage of fixed vertices in the instance.
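The two-terminal reduction mentioned among the open problems can be sketched as follows: all terminals fixed in partition 0 are merged into one super-terminal and all terminals fixed in partition 1 into another, and nets are rewritten accordingly. This is an illustrative sketch only; the names are hypothetical and vertex areas are not modeled (in a weighted setting each super-terminal would presumably carry the summed area of the terminals it replaces).

```python
def collapse_terminals(num_vertices, nets, fixed):
    """Merge fixed terminals into two super-terminals.

    `fixed` maps a vertex to its partition (0 or 1). Returns
    (new_nets, new_fixed, mapping), where new vertex ids 0 and 1 are the
    super-terminals for partitions 0 and 1, and free vertices get ids 2, 3, ...
    """
    mapping = {}
    next_id = 2  # ids 0 and 1 are reserved for the super-terminals
    for v in range(num_vertices):
        if v in fixed:
            mapping[v] = fixed[v]
        else:
            mapping[v] = next_id
            next_id += 1
    # Pins that land on the same super-terminal collapse into a single pin.
    new_nets = [sorted({mapping[v] for v in net}) for net in nets]
    return new_nets, {0: 0, 1: 1}, mapping


def cut_size(nets, side):
    """Number of nets with pins in both partitions."""
    return sum(1 for net in nets if len({side[v] for v in net}) > 1)
```

For any assignment of the free vertices, a net is cut in the collapsed instance exactly when it is cut in the original one, so the two instances have the same cut size for corresponding solutions.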
