High-quality placement results are always produced at the cost of significant runtime. In this paper, we study the trade-off between overall quality and runtime for the standard-cell placement problem. We implemented and studied a class of schemes to achieve the runtime vs. quality trade-off. We developed a new trade-off oriented placement tool (TOOP) which is controlled by decision trees. TOOP can adjust itself based on the user's requests and netlist properties. Compared to Cadence QPlace, even the fastest mode of TOOP (lowest quality) can produce placements with similar or better layout. TOOP also shows a much stronger ability to produce routable placements when compared to Capo.
INTRODUCTION
Standard-cell placement is a fundamental problem in the VLSI physical design area. It has drawn considerable attention in the VLSI CAD field for more than twenty years. Even the most classical placement problem (minimizing total wirelength) is still a very active topic among researchers [1, 2, 3, 4, 5, 7]. Being an NP-hard problem, the placement problem is unlikely to be solved optimally within a reasonable amount of time. On the other hand, the problem size keeps growing exponentially, which makes old placement heuristics less and less effective. In order to handle current multimillion-gate designs, a state-of-the-art placement tool typically consists of several heuristics, each focused on a special subproblem. There are two essential aspects to consider in a VLSI design process: quality and time to complete the design. Ideally, designers would love to have designs with the highest quality obtained within the shortest amount of time. However, with the problem itself being NP-hard, quality and time inevitably trade off against each other. A good placement tool should be flexible enough to adapt itself to different requests from designers.
Almost all existing placement research has focused on improving the quality and/or reducing the runtime of an existing algorithm. Almost none documents a way of adaptively controlling the trade-off between quality and runtime. In this paper, we develop a trade-off oriented placement tool (TOOP). We seek to establish an adaptive way for our placer to perform placement based on both requests from users and the properties of the netlist.
The actual placement process is controlled by decision trees.
The rest of the paper is organized as follows. Section 2 briefly describes the framework of our trade-off oriented placement tool. In Section 3, several blocking schemes are introduced and the decision trees that control our placement tool are described. The experimental results are shown in Section 4 by comparing our placement tool with Cadence QPlace and an academic placement tool, Capo [2]. Section 5 is the conclusion.
FRAMEWORK OF TOOP
We use a high-quality academic placement tool, Dragon [5], as the barebones of our trade-off oriented placement system. Components and controls are added to Dragon to achieve the desired trade-off between quality and runtime. Dragon consists of several parts, including partitioning, clustering, simulated-annealing based optimization and local greedy improvement. Since we use Dragon to construct our trade-off oriented placement tool, it is worthwhile to briefly review how Dragon works.
Dragon works in a top-down fashion. At each hierarchical level, quadrisection is performed topologically on the netlist as well as physically on the layout area. At the top level, the layout area is divided into four areas and we call these areas "bins". Meanwhile, the netlist is partitioned into four cell clusters with similar size to minimize the interconnections between them. The next step is to put clusters into bins. Simulated annealing is used to minimize the total wirelength. This phase is called "bin annealing".
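As an illustration of the bin annealing step, the following Python sketch swaps clusters between bins under a simulated-annealing acceptance rule. It is a minimal sketch and not Dragon's implementation: the half-perimeter wirelength estimate, the pairwise swap move, the geometric cooling schedule, and all names and parameter values (`bin_annealing`, `steps`, `t_start`, `t_end`, `seed`) are assumptions made for the example.

```python
import math
import random
from typing import Dict, List, Tuple

Bin = Tuple[int, int]  # (column, row) of a bin in the grid


def total_wirelength(nets: List[List[str]], location: Dict[str, Bin]) -> int:
    """Sum of half-perimeter bounding boxes over all nets, in bin units."""
    total = 0
    for net in nets:
        xs = [location[c][0] for c in net]
        ys = [location[c][1] for c in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def bin_annealing(nets: List[List[str]], location: Dict[str, Bin],
                  steps: int = 2000, t_start: float = 2.0,
                  t_end: float = 0.01, seed: int = 0) -> Tuple[Dict[str, Bin], int]:
    """Swap clusters between bins; return the best assignment seen and its cost."""
    rng = random.Random(seed)
    current = dict(location)
    cost = total_wirelength(nets, current)
    best, best_cost = dict(current), cost
    names = sorted(current)
    for step in range(steps):
        # Geometric cooling from t_start towards t_end (assumed schedule).
        temperature = t_start * (t_end / t_start) ** (step / steps)
        a, b = rng.sample(names, 2)
        current[a], current[b] = current[b], current[a]
        new_cost = total_wirelength(nets, current)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = dict(current), cost
        else:
            current[a], current[b] = current[b], current[a]  # reject: undo the swap
    return best, best_cost


# Four clusters on a 2 x 2 grid of bins (hypothetical data).
nets = [["c0", "c3"], ["c0", "c3"], ["c1", "c2"]]
start = {"c0": (0, 0), "c1": (1, 0), "c2": (0, 1), "c3": (1, 1)}
placement, wirelength = bin_annealing(nets, start)
```

Because the sketch keeps the best assignment seen so far, the returned wirelength is never worse than that of the starting assignment.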
These steps are performed recursively until each cluster contains only a small number of cells. The "cell annealing" phase is performed after bin annealing. It moves single cells between bins to further reduce wirelength. Finally, the detailed placement phase removes overlaps between cells and greedily improves the final wirelength. Our new trade-off placement tool is constructed on top of Dragon, as shown in Fig. 1.
All results shown are obtained by running experiments on a SUN Ultra10 workstation with a 400 MHz CPU. TOOP is tested on selected benchmarks from the IBM placement benchmark suite. The placer reads LEF/DEF files and outputs placement results in the DEF format. The Cadence WarpRouter is used to read the placement and perform the routing. The properties of the benchmarks we used are summarized in Table 1.
Table 1. Properties of circuits used in experiments
In most top-down partitioning-based placement algorithms, interaction between partitions is prohibited. Once a cell is assigned to a certain partition, it stays there throughout the whole placement process. This helps to solve the original problem in a true divide-and-conquer manner. However, an obvious disadvantage of this approach is that a cell's position is restricted by a decision made early in the process. To address this issue, Dragon allows cells to be moved between partitions.
Interconnection-Aware Blocking Scheme
The pre-fixed blocking scheme can effectively reduce the size of the solution space for each cluster placement problem. On the other hand, it blindly imposes artificial restrictions on where clusters can be moved. A better way would be to determine the block size and shape dynamically to minimize the interconnections between blocks. When minimizing interconnections, we could have arbitrarily shaped blocks. To validate this idea, we implement the interconnection-aware scheme with two simplifications: (1) all the blocks are rectangular; (2) each block is cross-cut (meaning one horizontal cut followed by one vertical cut) to get smaller blocks at the next hierarchical level. This approach is illustrated in Fig. 3. When the horizontal cut is performed, we check all possible horizontal cuts in the current block and pick the one which results in the fewest interconnections at the cutline. The same procedure is performed for the vertical cut. The comparison with Dragon is given in Table 3. On average, the interconnection-aware blocking scheme speeds up Dragon by a factor of 1.1 with a quality loss of 7%.
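The cut selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a net is taken to cross a cutline when it has cells on both sides, cell positions are given as bin indices along the axis being cut, ties go to the first candidate, and the names `crossings_at_cut` and `best_cut` are hypothetical.

```python
from typing import Dict, List, Tuple


def crossings_at_cut(cut: int, nets: List[List[str]], coord: Dict[str, int]) -> int:
    """Count nets with at least one cell below the cut and one at or above it."""
    count = 0
    for net in nets:
        positions = [coord[cell] for cell in net]
        if min(positions) < cut <= max(positions):
            count += 1
    return count


def best_cut(lo: int, hi: int, nets: List[List[str]],
             coord: Dict[str, int]) -> Tuple[int, int]:
    """Try every cut strictly inside the block [lo, hi); return (cut, crossings)."""
    if hi - lo < 2:
        raise ValueError("block must span at least two bins to be cut")
    candidates = [(cut, crossings_at_cut(cut, nets, coord)) for cut in range(lo + 1, hi)]
    return min(candidates, key=lambda pair: pair[1])


# Horizontal cut: coord maps each cell to its bin row (hypothetical data).
rows = {"a": 0, "b": 1, "c": 2, "d": 3}
nets = [["a", "b"], ["a", "b"], ["b", "c"], ["c", "d"], ["c", "d"]]
cut_row, crossing_count = best_cut(0, 4, nets, rows)
```

For the vertical cut, the same function can be called with a mapping from cells to bin columns.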
Congestion-Aware Blocking Scheme
Congestion is one of the most important metrics in modern placement problems. In some highly utilized designs, we need to be especially aware of congestion during placement. In this subsection, we propose a congestion-aware blocking scheme to help reduce local congestion during placement.
Table 3. Final placement comparison between Dragon and the interconnection-aware blocking scheme
Cheng's bounding box model [6] is used to evaluate congestion during placement in our placement tool. To identify congested areas, we define the average congestion in a bin (i, j) in terms of $C_{avg,v}$ and $C_{avg,h}$, the vertical and horizontal crossing interconnections for each edge of the bin, respectively.
The purpose of the congestion-aware blocking scheme is to help the placer budget the congestion distribution well, so as to avoid locally congested spots in the final placement. We set the blocks in a way that balances bin congestion. Similar to what we did in the interconnection-aware blocking scheme, we cut the current block twice (one horizontal cut followed by one vertical cut). The criterion for each cut is to balance the average bin congestion on both sides of the cutline.
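One possible reading of this balancing criterion is sketched below in Python. It assumes that each bin already carries a congestion estimate and that "balance" means minimizing the absolute difference between the mean bin congestion on the two sides of the cut; both the interpretation and the names (`balanced_horizontal_cut`, `balanced_vertical_cut`) are assumptions for illustration.

```python
from typing import List, Tuple


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def balanced_horizontal_cut(congestion: List[List[float]]) -> Tuple[int, float]:
    """congestion[r][c] is the estimate for the bin in row r, column c.

    Returns (cut_row, imbalance), where rows [0, cut_row) lie on one side of
    the cut and rows [cut_row, n_rows) on the other.
    """
    n_rows = len(congestion)
    if n_rows < 2:
        raise ValueError("need at least two bin rows to place a cut")
    best_cut, best_imbalance = 0, float("inf")
    for cut in range(1, n_rows):
        below = [c for row in congestion[:cut] for c in row]
        above = [c for row in congestion[cut:] for c in row]
        imbalance = abs(_mean(below) - _mean(above))
        if imbalance < best_imbalance:
            best_cut, best_imbalance = cut, imbalance
    return best_cut, best_imbalance


def balanced_vertical_cut(congestion: List[List[float]]) -> Tuple[int, float]:
    """Same criterion along columns, obtained by transposing the grid."""
    transposed = [list(column) for column in zip(*congestion)]
    return balanced_horizontal_cut(transposed)


# A 4 x 2 block with one congested bin row (hypothetical numbers).
grid = [[1.0, 1.0], [1.0, 1.0], [4.0, 4.0], [2.0, 2.0]]
row_cut = balanced_horizontal_cut(grid)
column_cut = balanced_vertical_cut(grid)
```

In this example the horizontal cut lands just before the last row, where both sides average 2.0, rather than in the geometric middle of the block.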
Decision Tree Based Trade-off Oriented Placement Tool
In the previous subsections, we introduced several blocking schemes (pre-fixed, interconnection-aware and congestion-aware). Each has its own focus and trade-offs. To briefly summarize, the pre-fixed blocking scheme is the fastest of the three. The interconnection-aware blocking scheme produces the best placement in terms of final wirelength and is relatively fast. The congestion-aware scheme focuses on solving the local congestion problem in the final placement, but it may lose some placement quality. Based on these properties, we can construct a "Blocking Decision Tree (BDT)" to decide which individual blocking scheme to use for each block at a hierarchical level.
As shown in Fig. 4, we start traversing the BDT by looking at the congestion distribution among all bins at this level. If the total amount of congestion exceeds a certain threshold, we declare this block congested and use the congestion-aware blocking scheme on it. If the current block is not congested, we look at the number of interconnections for each bin inside this block. If the total number of interconnections exceeds a certain threshold, we use the interconnection-aware blocking scheme; otherwise we use the fastest, pre-fixed blocking scheme. Basically, the BDT tries to use the fastest blocking scheme (pre-fixed) on blocks it deems "easy" for the placer to handle and the most complicated blocking scheme (congestion-aware) on "difficult" blocks.
[...] represent the different approaches we used in this tool to achieve the trade-off between quality and runtime.
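The BDT traversal described above amounts to two threshold tests, which the following Python sketch makes explicit. The threshold parameters are placeholders, since the text does not give their values, and the function and enum names are hypothetical.

```python
from enum import Enum
from typing import Sequence


class Scheme(Enum):
    PRE_FIXED = "pre-fixed"
    INTERCONNECTION_AWARE = "interconnection-aware"
    CONGESTION_AWARE = "congestion-aware"


def choose_blocking_scheme(bin_congestion: Sequence[float],
                           bin_interconnections: Sequence[int],
                           congestion_threshold: float,
                           interconnection_threshold: int) -> Scheme:
    """Pick a blocking scheme for one block from the statistics of its bins."""
    # First test: is the block congested as a whole?
    if sum(bin_congestion) > congestion_threshold:
        return Scheme.CONGESTION_AWARE
    # Second test: does the block carry many interconnections?
    if sum(bin_interconnections) > interconnection_threshold:
        return Scheme.INTERCONNECTION_AWARE
    # Otherwise the block is "easy" and gets the fastest scheme.
    return Scheme.PRE_FIXED


# Threshold values below are made up for the example.
scheme = choose_blocking_scheme([0.4, 0.3, 0.2, 0.1], [12, 9, 7, 5],
                                congestion_threshold=2.0,
                                interconnection_threshold=30)
```

Here the block is below the congestion threshold but carries 33 interconnections, so the sketch selects the interconnection-aware scheme.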
In our placement tool, there are four possible values for Q (Excellent, Good, Average, Ok) and three possible values for T (Fast, Average, Ok). GDT picks which approach to use based on the input vector (Q, T). To achieve the trade-off between quality and runtime, the number of blocks at each hierarchical level is another important factor. The fewer blocks each level has, the longer the runtime is and the better the placement quality is.
Starting at the top node, GDT branches left or right depending on the input vector (Q, T). When Q is set to Excellent, we do not use any blocking scheme, in order to achieve the best placement quality. When Q is set to Good, approach set B is selected. Approach set B makes use of several blocking schemes described in the previous section to optimize both wirelength and congestion. Specifically, the pre-fixed blocking scheme is used to get the first-level blocks. Starting from the second hierarchical level, BDT is used to automatically select which blocking scheme to use. The individual approaches in set B (B1, B2, B3) differ from each other in the number of blocks at each hierarchical level. The more blocks an approach has, the faster the approach is. When Q is set to Average, approach set C is selected. Approach set C uses the pre-fixed blocking scheme at both the first and the second hierarchical levels. BDT is used after the second hierarchical level. As in approach set B, the approaches in set C differ in the number of blocks at each hierarchical level. When Q is set to Ok, approach D is selected. Approach D uses the pre-fixed blocking scheme at all hierarchical levels, and each block contains only one bin. It is essentially the same algorithm as the classical top-down partitioning placement algorithm.
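The mapping from the quality request Q to an approach set can be summarized in a short Python sketch. Several details are assumptions: the label "A" for the no-blocking mode is introduced here for symmetry, the record fields are hypothetical, and because the text does not spell out how T selects among the variants of a set (for example B1, B2, B3), the sketch only validates and records T.

```python
from dataclasses import dataclass
from typing import Optional

QUALITY_LEVELS = ("Excellent", "Good", "Average", "Ok")
TIME_LEVELS = ("Fast", "Average", "Ok")


@dataclass(frozen=True)
class Approach:
    approach_set: str               # "A" is a label assumed here for the no-blocking mode
    uses_blocking: bool             # False only when Q is Excellent
    prefixed_levels: Optional[int]  # top levels forced to pre-fixed; None means every level
    uses_bdt: bool                  # BDT picks the scheme below the pre-fixed levels
    time_request: str               # recorded only; variant selection is not modeled


def select_approach(q: str, t: str) -> Approach:
    """Map the input vector (Q, T) to an approach set, following the text above."""
    if q not in QUALITY_LEVELS:
        raise ValueError(f"unknown quality request: {q!r}")
    if t not in TIME_LEVELS:
        raise ValueError(f"unknown time request: {t!r}")
    if q == "Excellent":
        return Approach("A", False, 0, False, t)
    if q == "Good":
        return Approach("B", True, 1, True, t)
    if q == "Average":
        return Approach("C", True, 2, True, t)
    return Approach("D", True, None, False, t)  # q == "Ok"


plan = select_approach("Good", "Fast")
```

Keeping the decision in one small function makes it easy to see that quality requests differ only in how many top levels are forced to the pre-fixed scheme and whether BDT is consulted below them.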
To verify that the trade-off between quality and runtime exists in our decision tree, we tested our placer with all input vectors on the IBM placement benchmarks. Fig. 6 shows the runtime vs. wirelength curve for each circuit we tested.
The x axis is runtime in units of 100 seconds; the y axis is the total wirelength. From Fig. 6 we can see that the different approaches used in our placement tool can indeed control the trade-off between runtime and quality. Approaches that run longer produce final placements with higher quality (smaller total wirelength).
Compared to Capo (Table 5), TOOP shows a significantly stronger ability to produce routable placements. Capo fails to produce a routable placement (without routing violations) for ibm01, ibm07 and ibm08. For the other tested circuits, even the lowest quality mode of TOOP produces placements with better routing wirelength (by 1%) and fewer vias (by 4%).
