A new technique, named SHARP, is presented for the partitioning of VLSI integrated circuits. SHARP is a hill-climbing heuristic that is designed to be incorporated into a partitioning-based placement algorithm.
INTRODUCTION
The physical design process for VLSI circuits is often one of hierarchical decomposition. At all levels of the hierarchy, an important design step is partitioning the atomic circuit elements that compose the functional unit into a physical package. The physical package is typically realized as a collection of subpackages or modules that are chosen such that together they optimize some predetermined figures of merit. The principal figures of merit are usually concerned with one or more of the following values: number of modules, size of modules, number of external connections required by any module, and system delay [6,13].
Circuit partitioning research has concentrated primarily on the min-cut partitioning problem, which divides a circuit into two roughly equal-sized partitions in a manner that minimizes the inter-module connections. These research investigations have produced a variety of circuit element migration techniques that iteratively transform a given solution [5,6,7]. While these solution methods are primarily greedy heuristics, they all use hill-climbing techniques to varying extents.
Min-cut partitioning methods have proven to be quite effective for their traditional application of circuit packaging, where a package is characterized by its size and its number of external terminals. This success led researchers to apply the method to other physical design problems, most notably VLSI circuit placement [1,4,9,10].
The goal of the placement step is to optimally position circuit elements onto a layout surface. The positioning of a circuit element has two basic components: one is to determine its location, and the other is to specify its orientation. An optimal assignment is typically one which allows the interconnection activity to automatically achieve its goals. These goals are often over-constrained and almost always include minimizing the total wire length and layout surface. Although placement is clearly a problem of at least two dimensions, min-cut partitioning methods can produce a solution in the following manner.
Apply the partitioning method to construct two partitions. Elements in different partitions are constrained to lie in different halves of the package.
The algorithm is applied recursively and separately to the two partitions. The recursion terminates when no partition has more than one circuit element in it. The technique is depicted graphically in Figure 1.
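As an illustration of the two steps above, the following is a minimal Python sketch of recursive bisection placement. It is a sketch under stated assumptions, not the method of any cited placer: `bisect` is a hypothetical placeholder that simply halves the list (a real placer would call a min-cut heuristic here), and alternating the cut direction at each level is an assumption that the text does not specify.

```python
def bisect(cells):
    """Hypothetical placeholder for a min-cut heuristic.

    It splits the list in half by position and ignores the nets entirely.
    """
    mid = len(cells) // 2
    return cells[:mid], cells[mid:]


def place(cells, region, vertical=True, out=None):
    """Recursively assign each cell a rectangle (x0, y0, x1, y1)."""
    if out is None:
        out = {}
    if len(cells) <= 1:
        # Recursion ends when a partition holds at most one circuit element.
        for cell in cells:
            out[cell] = region
        return out
    x0, y0, x1, y1 = region
    first, second = bisect(cells)
    if vertical:
        xm = (x0 + x1) / 2
        r_first, r_second = (x0, y0, xm, y1), (xm, y0, x1, y1)
    else:
        ym = (y0 + y1) / 2
        r_first, r_second = (x0, y0, x1, ym), (x0, ym, x1, y1)
    # Each half is processed separately (assumed: cut direction alternates).
    place(first, r_first, not vertical, out)
    place(second, r_second, not vertical, out)
    return out


placement = place(["a", "b", "c", "d"], (0.0, 0.0, 4.0, 4.0))
```

Because each half is processed with no knowledge of the other, the sketch also makes visible the isolation of subproblems that later placement work set out to address.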
Placement researchers realized that the above method is too simple. To achieve acceptable solutions, the subproblems cannot be dealt with in isolation. Suaris and Kedem's experiments with quadrisection indicate that the method's solution quality is competitive with TIMBERWOLF, while running in only a tenth of the time. Yet, in spite of their generalization, some problems remain. For example, simultaneous congestion balancing is limited to quadrants in different halves, although the preference in practice is to balance cuts on opposite sides of the same half (e.g., a standard cell channel has uniform height in most design methodologies). As another example, the estimate of the routing area required by a net remains crude, as the coarseness of the partitioning allows only straight-line, single-bend, and horseshoe connections (with rotations) to be considered.
To overcome these problems, we propose a new partitioning method that is more strongly influenced by the geometry of the layout surface. It is tuned for intra-package connections rather than inter-package connections.
The method is named SHARP because the circuit layout surface is decomposed geometrically into nine regions in a manner that resembles a musical sharp sign (#). This is demonstrated graphically in Figure 2.
THE SHARP-LOOKING PHILOSOPHY
The SHARP decomposition was selected as it is the smallest, nontrivial, symmetric decomposition that allows contiguous regions of the circuit surface that share similar routing features and problems (e.g., congestion) to be processed as a unit. This property ensures that all its computations are readily tractable. For example, in determining the preferred interconnection given a net's block decomposition, every minimum-length Steiner tree form can be considered for the net. There are on average fewer than five such Steiner forms per decomposition, and no net decomposition requires the consideration of more than 192 different Steiner tree forms. Similarly, in determining favorable alternative decompositions for a net after moving one or more of its circuit elements from one block to another, there are on average only two new Steiner tree forms that need be considered. Also, since the total number of minimum-length Steiner tree forms is less than three thousand, these forms can be precomputed once and used via hashing or an appropriate indexing scheme.
The trees given in Figure 3 are the six possible minimum-length Steiner tree forms corresponding to the given block distribution of terminals.
As most partitioning algorithms are one-dimensional in nature, their optimization function consists of a single criterion, and as noted above, partitioning-based placement algorithms use the min-cut criterion. However, the true principal figure of merit for evaluating placement quality is layout surface size. Once the circuit elements have been chosen, this reduces to minimizing the routing region. For most design methodologies, minimizing the routing region has two primary components: minimizing total wire length and minimizing channel height. Therefore, it is these two criteria that SHARP uses to evaluate partitions.
Just as SHARP's optimization function is more complete than the min-cut criterion, so is the SHARP solution itself. Besides returning an assignment of circuit elements to partition blocks as a standard partitioning algorithm does, SHARP also returns a suggested Steiner tree form for each net to achieve the optimal expected use of the layout surface. This additional information makes it easier for a SHARP-based placer to incorporate a global router.
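The precomputation mentioned above can be illustrated with a small Python sketch. It rests on assumptions that the text does not state: the nine blocks are modelled as the nodes of a 3x3 grid graph numbered row by row from 0 to 8, adjacent blocks are joined by unit-length edges, a net's block decomposition is encoded as a 9-bit mask of occupied blocks, and a "form" is a set of grid edges. The names `tree_nodes`, `build_table`, and `TABLE` are hypothetical. Under this model the key with all nine blocks occupied has 192 forms, which is the number of spanning trees of a 3x3 grid graph.

```python
from itertools import combinations

SIDE = 3
NODES = range(SIDE * SIDE)
# Horizontal and vertical edges between adjacent blocks (12 in total).
EDGES = [(r * SIDE + c, r * SIDE + c + 1)
         for r in range(SIDE) for c in range(SIDE - 1)]
EDGES += [(r * SIDE + c, (r + 1) * SIDE + c)
          for r in range(SIDE - 1) for c in range(SIDE)]


def tree_nodes(edge_subset):
    """Return the node bitmask if the edges form one tree, else None."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edge_subset:
        ra, rb = find(a), find(b)
        if ra == rb:
            return None  # the edge would close a cycle
        parent[ra] = rb
    # An acyclic edge set is a single tree iff nodes == edges + 1.
    if len(parent) != len(edge_subset) + 1:
        return None
    return sum(1 << n for n in parent)


def build_table():
    """Map each nonempty block mask to its minimum-length tree forms."""
    # Single-block trees first, then trees in order of increasing edge count.
    trees = [(0, 1 << n, ()) for n in NODES]
    for k in range(1, len(NODES)):
        for subset in combinations(EDGES, k):
            mask = tree_nodes(subset)
            if mask is not None:
                trees.append((k, mask, subset))
    table = {}
    for terminals in range(1, 1 << len(NODES)):
        best, forms = None, []
        for length, mask, subset in trees:
            if best is not None and length > best:
                break  # all remaining trees are longer than the minimum
            if (mask & terminals) == terminals:
                best = length
                forms.append(subset)
        table[terminals] = forms
    return table


TABLE = build_table()
corner_to_corner = TABLE[(1 << 0) | (1 << 8)]  # terminals in blocks 0 and 8
```

A Python dictionary plays the role of the hashing scheme here; since the key is a 9-bit integer, a flat array of 512 entries would serve equally well as an index.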
In the section below, we describe in further detail a partitioning algorithm based on the above SHARP concepts.
SHARP PARTITIONING
The basic SHARP algorithm is given in Figure 4. As shown there, the algorithm is a greedy one that essentially alternates between improving the two wire usage components. We found that this alternation strengthened SHARP's hill-climbing abilities.
The initial partition is constructed using a simplified clustering algorithm [2]. However, we are also considering alternative constructions using techniques such as a genetic algorithm [3]. The initial Steiner tree form is selected randomly from one of the minimum length forms.
Figure 4. The basic SHARP algorithm:
    algorithm
        compute Steiner tree forms
        construct initial partition
        for each net U do
            assign to U one of its minimum length Steiner trees
        end
        while partition quality is improving do
            perform net length minimizing circuit element movements
            perform congestion reduction through alternative minimum Steiner tree selection
            perform congestion reduction through circuit element movements
        end
        perform congestion reduction through alternative Steiner tree selection
    end
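The control flow of Figure 4 can be sketched in Python as a generic driver. This is only a skeleton under assumptions: the function `run_passes` and its parameters are hypothetical names, each step is assumed to be a function that returns a new state without modifying its argument, quality is assumed to be a single number where lower is better, and rolling back to the best state seen when a pass fails to improve is a choice of this sketch rather than something the figure specifies.

```python
def run_passes(state, cost, pass_steps, final_step, max_passes=20):
    """Repeat pass_steps while the cost strictly improves, then apply final_step."""
    best_state, best_cost = state, cost(state)
    for _ in range(max_passes):
        candidate = best_state
        # One pass of the loop body: the steps are applied in the given order.
        for step in pass_steps:
            candidate = step(candidate)
        candidate_cost = cost(candidate)
        if candidate_cost >= best_cost:
            break  # quality stopped improving: leave the loop
        best_state, best_cost = candidate, candidate_cost
    return final_step(best_state)


# Toy usage: the "state" is an integer and the single step halves it.
result = run_passes(37, abs, [lambda x: x // 2], lambda x: x)
```

In a SHARP-like setting, `pass_steps` would hold the three steps inside the while loop and `final_step` the last congestion reduction; a single step may worsen one wire usage component, so the cost is evaluated only at the end of a pass.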
Net length minimizations are performed iteratively. During each iteration, the circuit element cluster, E, whose inter-block movement induces the greatest reduction in wire length is relocated to the desired block. The circuit elements in E are then frozen in that block for the remainder of the step. In addition, non-frozen circuit elements that share a net with a circuit element in E have their inter-block preferences updated.
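A simplified version of this step can be sketched in Python. The sketch departs from the description in several labelled ways: it moves single elements rather than clusters, it estimates wire length by the half-perimeter of the bounding box of a net's blocks instead of a Steiner tree form, it ignores block capacity, and it recomputes every gain from scratch in each iteration instead of updating the preferences of affected elements. Blocks are numbered 0 to 8 row by row, and all function names are hypothetical.

```python
def hpwl(net, block_of):
    """Half-perimeter estimate of a net's length on the 3x3 block grid."""
    rows = [block_of[e] // 3 for e in net]
    cols = [block_of[e] % 3 for e in net]
    return (max(rows) - min(rows)) + (max(cols) - min(cols))


def total_length(nets, block_of):
    return sum(hpwl(net, block_of) for net in nets)


def length_pass(nets, block_of, fixed=()):
    """One pass: repeatedly make the best single move, then freeze the element."""
    block_of = dict(block_of)
    frozen = set(fixed)
    nets_of = {}
    for net in nets:
        for e in net:
            nets_of.setdefault(e, []).append(net)
    while True:
        best = None  # (gain, element, target block)
        for e in block_of:
            if e in frozen:
                continue
            home = block_of[e]
            before = sum(hpwl(n, block_of) for n in nets_of.get(e, []))
            for target in range(9):
                if target == home:
                    continue
                block_of[e] = target  # trial move
                after = sum(hpwl(n, block_of) for n in nets_of.get(e, []))
                gain = before - after
                if gain > 0 and (best is None or gain > best[0]):
                    best = (gain, e, target)
            block_of[e] = home  # undo the trial move
        if best is None:
            return block_of  # no move reduces the wire length
        _, e, target = best
        block_of[e] = target
        frozen.add(e)  # frozen for the remainder of the pass


nets = [("a", "b"), ("a", "c")]
start = {"a": 8, "b": 0, "c": 2}
after_pass = length_pass(nets, start, fixed=("b", "c"))
```

In the example, elements `b` and `c` are held in the two top corners, so the only available improvement is to bring `a` up from the bottom-right block into the top row.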
During the next two steps, the congestion map is examined to see if better balancing can be achieved. In the first of these two steps, alternative minimum length Steiner tree forms are considered for the various nets. Since no module movement is being done here and since only minimal length Steiner trees are considered, the effect on the wire usage is limited to improving congestion (i.e., there is no increase in the wire length component). As in the net length minimization step, a priority ordering is established: nets are examined in an ordering based on the amount of possible congestion improvement.
The second congestion reducing step uses circuit element movement to improve solution quality. As in the net length minimization step, the circuit elements are examined in priority order, and are frozen for the step once they have been moved. However, in this step the circuit elements are selected with respect to possible congestion improvement rather than wire length improvement. Since circuit element moves are being made with respect to congestion improvement, this can increase total wire length. Similarly, the net length minimization step can increase the total congestion.
During both circuit element movement steps, it may be the case that the currently most desirable circuit element move would cause a partition block to be overloaded. Such overloading is initially permitted, but the amount of overloading is reduced with each pass of the loop. As a further hill-climbing feature, SHARP can be configured to use multiple priority queues so that the best feasible circuit element move is performed. It can also be configured to find the best feasible pair or even the best feasible chain of circuit element moves.
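The first of the two congestion steps, in which only the choice of Steiner tree form changes, can be illustrated with the following Python sketch. It makes several assumptions that the text does not state: a form is a tuple of edges between adjacent blocks, congestion is scored as the sum of squared edge usages, and nets are swept in a fixed order until no switch helps, rather than in the priority order described above. The names `congestion` and `rebalance` are hypothetical.

```python
from collections import Counter


def congestion(choice, forms):
    """Sum of squared edge usages for the currently chosen forms (assumed metric)."""
    usage = Counter()
    for net, idx in choice.items():
        usage.update(forms[net][idx])
    return sum(u * u for u in usage.values())


def rebalance(forms, choice):
    """Switch nets to alternative equal-length forms while congestion drops."""
    choice = dict(choice)
    improved = True
    while improved:
        improved = False
        for net in forms:
            best_idx, best_cost = choice[net], congestion(choice, forms)
            for idx in range(len(forms[net])):
                trial = dict(choice)
                trial[net] = idx
                cost = congestion(trial, forms)
                if cost < best_cost:
                    best_idx, best_cost = idx, cost
            if best_idx != choice[net]:
                choice[net] = best_idx
                improved = True
    return choice


# Two nets joining block 0 to block 4, each routable via block 1 or block 3.
two_forms = [((0, 1), (1, 4)), ((0, 3), (3, 4))]
forms = {"n1": two_forms, "n2": two_forms}
initial = {"n1": 0, "n2": 0}
balanced = rebalance(forms, initial)
```

Because both candidate forms of each net have the same length, the switch spreads the two nets over different edges without changing the total wire length, which is the property the step relies on.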
The final step of the algorithm also attempts to improve (reduce) the congestion. Unlike the previous congestion improvement steps, SHARP does not require that the alternative Steiner tree forms be of minimum length. Although the number of such tree forms increases, the number remains practical and the computation cost is worth the increase in solution quality. For example, on average there are fewer than 5 minimum length Steiner tree forms and approximately 50 non-minimum length distinct Steiner tree forms per net block decomposition, with a maximum number of 192 distinct forms per decomposition.
The running time of the partitioning algorithm is dominated by the cost of the while loop. Since this loop only iterates several times in practice, the expected running time of the algorithm is proportional to the cost of a single pass. While it is true that no more than m movements can be made in either of the circuit element movement steps, where m is the number of circuit elements, the priority of a circuit element can change multiple times. Using analysis similar to Fiduccia and Mattheyses [5], we can demonstrate that the total number of priority queue operations is on the order of p, where p is the total number of terminal pins. Since the maximum number of minimum length Steiner trees per net block decomposition is independent of the circuit instance (i.e., a constant), the total work performed as a result of circuit element movement or alternative Steiner tree selection also remains proportional to p. Thus, the running time of a loop iteration is proportional to the factor p log m, since priority queue manipulations (e.g., insertions, deletions) are readily done in logarithmic time.
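The logarithmic-time priority queue operations assumed in this analysis can be sketched with Python's standard `heapq` module. The class `GainQueue` and its methods are hypothetical names, and lazy deletion is one common way to support priority updates on a binary heap; it is not necessarily the structure used by SHARP. With lazy deletion the heap can hold stale entries, so each operation is logarithmic in the number of entries pushed so far rather than strictly in m.

```python
import heapq


class GainQueue:
    """Max-priority queue with priority updates handled by lazy deletion."""

    def __init__(self):
        self._heap = []  # entries are (-gain, item); some may be stale
        self._gain = {}  # current gain of every live item

    def set(self, item, gain):
        """Insert an item or change its gain."""
        self._gain[item] = gain
        heapq.heappush(self._heap, (-gain, item))

    def pop_best(self):
        """Remove and return (item, gain) with the largest gain, or None."""
        while self._heap:
            neg_gain, item = heapq.heappop(self._heap)
            if self._gain.get(item) == -neg_gain:
                del self._gain[item]
                return item, -neg_gain
            # Otherwise the entry is stale: the gain changed or the item left.
        return None


queue = GainQueue()
queue.set("a", 3)
queue.set("b", 5)
queue.set("b", 1)  # the priority of b changes after insertion
first = queue.pop_best()
second = queue.pop_best()
third = queue.pop_best()
```

Each change of a circuit element's priority costs one push, so a pass that performs on the order of p such changes spends time proportional to p times the logarithm of the heap size on queue maintenance.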
