A new k-way partitioning approach for multichip modules (MCM) is described. We apply an analytical technique combined with a problem-specific multi-way ratio cut method. Our method considers fixed MCM pad positions on the substrate border and assigns the cells to regularly arranged chips on the substrate. For the first time, k-way partitioning results of benchmark circuits with up to 100,000 cells are presented. They show an excellent solution quality in terms of cut nets as well as a low maximum and average number of required chiplevel pads for each chip. The average improvement of the number of cut nets compared to a recently published eigenvector-tabu-search approach is about 25%.
Introduction
Circuit partitioning is a fundamental problem in VLSI circuit design since the complexity of electronic systems is rapidly increasing. Physical design is faced with the problem of limited maximum chip size due to manufacturing technology. Thus, the entire system often has to be partitioned into a set of subsystems. These subsystems may correspond to chips, and the entire system can be realized as an MCM, PCB, or FPGA board. A partitioning approach has to consider the special requirements of the target architecture, which are to minimize the cutsize of each partition and to ensure subsystem sizes within specified ranges. The existing partitioning methods can be grouped into two-way and k-way partitioning algorithms.
The two-way partitioning algorithms include iterative improvement procedures and analytical techniques. The iterative improvement procedures contain the min-cut method [1, 2], which starts from an initial partition and iteratively reduces the cutsize by exchanging or moving cells or groups of cells. The ratio cut method [3] considers the balance of the partitions in the objective function when selecting the cell to be moved. Analytical techniques were proposed early for partitioning [4] as well as for placement [5, 6]. They include spectral methods [7] and placement-based techniques [8]. These algorithms keep the global view and obtain impressive partitioning quality.
Early k-way partitioning approaches applied recursive min-cut [1]. Later, the method of Fiduccia and Mattheyses was generalized and improved for k-way partitioning [9]. A four-way min-cut technique called Quadrisection [10] was applied in cell placement. Spectral methods were extended to k-way partitioning [11, 12] by calculating several eigenvectors. Hybrid k-way partitioning algorithms combine eigenvector methods and tabu search [13].
To apply these approaches to MCM partitioning is problematic, since some important constraints have to be considered. Many approaches [7, 8, 11, 12, 13] cannot handle fixed module-level pads, which connect the MCM to external signals. Furthermore, many approaches [7, 8, 11, 12, 13] assign the cells to chips, but the problem of placing the chips on the substrate is not solved. Neglecting the wire length between the module-level pads and the chips as well as the wire length between the chips may result in long chip interconnections with excessive signal delays.
In this paper, we will focus on the problem of MCM partitioning. Our intention is to create a partitioning method which considers the connections between logic cells and fixed module-level pads on the substrate border during partitioning. Our algorithm is driven by the specified target architecture, defined by the number and the arrangement of the chips on the substrate. This is achieved by assigning the cells to chips, which have a priori known locations on the substrate. Our approach reduces the cutsize of all chips and guarantees chip sizes within specified tolerances.
We propose a k-way partitioning approach that iteratively applies the following three steps. First, we calculate a 2-dimensional embedding for all cells by minimizing a linear objective. Next, each partition to be divided is initially partitioned using an adapted ratio cut metric. In the final step, the current partitioning is improved by applying a multi-way ratio cut method with a problem-specific objective function. These three steps are repeated until the number of obtained partitions is equal to the number of chips on the target architecture.
The remainder of our paper is organized as follows. The next section presents the addressed target architecture. Section 3 describes our new k-way partitioning approach, which applies an analytical technique combined with a multi-way ratio cut method. In Section 4, results of benchmark circuits with up to 100,000 cells are presented and discussed.
The Target Architecture
Our approach centers on the design of r x c matrix type MCM target architectures with r rows and c columns of chips of almost the same size, regularly arranged on the substrate (Figure 1). We assume that the module-level pads are fixed and located on the substrate border. The chips do not necessarily have to be squares but can be of any rectangular shape providing approximately the same area. The chip sizes may vary within a designer-specified size deviation.
K-Way Partitioning

Overview
On each level of our approach, we calculate a 2-dimensional embedding and do an initial 2-way partitioning of all existing partitions which have to be further partitioned. The obtained partitioning is improved by a new multi-way ratio cut procedure with a special objective function for MCM design. All levels of a possible partitioning scheme are shown in Figure 2 for a 3 x 3 target architecture. After calculating the embedding, the initial partitioning is performed. Next, the solution quality is improved by applying the multi-way ratio cut method. On the 2nd level, the partitions of the 1st level are maintained during the calculation of the embedding and the remaining partitions are partitioned again. These three steps are repeated until each partition corresponds to one chip of the target architecture.
Calculating the Embedding
We model the circuit as a hypergraph $H = (V, E')$ with vertices $V$ representing the cells and hyperedges $E'$ representing the nets. This hypergraph is transformed into a graph $G = (V, E)$ by mapping each hyperedge in the set $E'$ into a set of binary edges. To perform this mapping, we apply the well-known clique model [7]. The obtained graph $G$ may be described by an $n \times n$ adjacency matrix $M = [m_{ij}]$, $n$ being the number of cells of the circuit. The matrix elements $m_{ij}$ are calculated as the sum of the edge weights of all edges connecting the vertices $i$ and $j$. Let $e_{ii}$ denote the degree of vertex $i$ (i.e., the sum of the weights of all edges incident to vertex $i$) and $e_{ij} = 0$ for all $i \neq j$, and we obtain the $n \times n$ diagonal degree matrix $E$. Now the Laplacian $C$ is given by $C = E - M$.
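The construction of $M$, the degree matrix, and $C = E - M$ can be sketched in a few lines of Python. This is only an illustration: the text names the clique model but not its edge weights, so the weight $1/(k-1)$ for a net with $k$ cells is an assumption here, and the function name `clique_laplacian` is hypothetical.

```python
def clique_laplacian(num_cells, nets, weight=lambda k: 1.0 / (k - 1)):
    """Build adjacency matrix M, vertex degrees e_ii, and Laplacian C = E - M.

    nets   : list of nets, each a list of cell indices (one hyperedge each)
    weight : edge weight for a net with k cells (assumed, not from the paper)
    """
    M = [[0.0] * num_cells for _ in range(num_cells)]
    for net in nets:
        pins = sorted(set(net))
        k = len(pins)
        if k < 2:
            continue  # a net with a single cell contributes no edges
        w = weight(k)
        # Clique model: connect every pair of cells on the net.
        for a in range(k):
            for b in range(a + 1, k):
                i, j = pins[a], pins[b]
                M[i][j] += w
                M[j][i] += w
    degree = [sum(row) for row in M]  # diagonal entries e_ii of E
    C = [[(degree[i] if i == j else 0.0) - M[i][j] for j in range(num_cells)]
         for i in range(num_cells)]
    return M, degree, C


# One two-cell net and one three-cell net sharing cell 1.
M, degree, C = clique_laplacian(4, [[0, 1], [1, 2, 3]])
```

Every row of $C$ sums to zero by construction, which is a quick sanity check on any implementation.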
Since the objective function we want to minimize can be separated in x- and y-direction, we consider only the x-component in the following. The vector $x = [x_1, \ldots, x_i, x_j, \ldots, x_n]^T \in \mathbb{R}^n$ contains a coordinate for each cell $i$. We formulate our objective function:
(1)
To minimize a linear objective function, we adapt matrix $C$ during the optimization process according to the adaption scheme proposed by Sigl et al. [14]. Since our problem considers fixed module-level pads on the substrate border, the coordinates of the fixed module-level pads in the vector $x$ are constant and some quadratic terms in equation (1) become linear or constant. By omitting the constant term, we obtain:
Matrix $C'$ is equivalent to matrix $C$ except that the elements of the linear and constant terms are removed. In the subsequent partitioning steps, the cells are assigned to partitions. To consider these partitions during the calculation of the next embedding, the center of gravity of all cells of each partition is fixed to the center of each partition. The centers of the partitions on the l-th level form the CO [...] By solving this problem for x- and y-coordinates, we obtain a 2-dimensional embedding of all cells. Since on all levels the embedding of all cells is calculated simultaneously, the cells of different partitions can influence each other.
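The role of the fixed module-level pads can be illustrated with a small one-dimensional sketch. It assumes a plain quadratic objective $\sum_{i<j} m_{ij}(x_i - x_j)^2$ and leaves out both the linear adaption of Sigl et al. [14] and the center-of-gravity constraints, so it is not the procedure used in our approach. Setting the gradient to zero for each movable cell gives $x_i = \sum_j m_{ij} x_j / e_{ii}$; the sketch iterates this with Gauss-Seidel sweeps, a solver chosen here only for brevity. The function name `embed_1d` is hypothetical.

```python
def embed_1d(M, fixed, sweeps=500):
    """Place movable cells by minimizing sum_{i<j} m_ij * (x_i - x_j)^2.

    M     : symmetric adjacency matrix as a list of lists (zero diagonal)
    fixed : dict {vertex index: coordinate} for the module-level pads
    Pads keep their coordinate; each movable cell is repeatedly moved to the
    weighted mean of its neighbors (Gauss-Seidel on the optimality condition).
    """
    n = len(M)
    x = [fixed.get(i, 0.0) for i in range(n)]
    movable = [i for i in range(n) if i not in fixed]
    for _ in range(sweeps):
        for i in movable:
            e_ii = sum(M[i])  # degree of vertex i
            if e_ii > 0.0:
                x[i] = sum(M[i][j] * x[j] for j in range(n)) / e_ii
    return x


# Chain: pad at 0.0, two movable cells, pad at 3.0.
chain = [[0, 1, 0, 0],
         [1, 0, 1, 0],
         [0, 1, 0, 1],
         [0, 0, 1, 0]]
x = embed_1d(chain, {0: 0.0, 3: 3.0})
```

In the chain example the two movable cells settle evenly between the pads. Without any fixed pad the quadratic form would be minimized by collapsing all cells onto one point, which is why the pad coordinates matter for the embedding.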
Initial Partitioning
After the embedding is calculated, each partition containing cells for more than one chip is divided into two parts. This is done in the direction (x- or y-direction) in which the partition has to be further partitioned, whereby both directions may be allowed. Possible regions where a partition may be cut are located between the rows or the columns of the chips in the partition. The centers of these cut regions are defined by the cut lines which would lead to absolutely equisized partitions. Tolerance intervals, which specify a certain permissible deviation U (in percent) from equisized partitions, establish cut regions around these cut lines. This original ratio cut objective (4) favors balanced partition sizes. This is achieved by the parabola
$$D_{org}(A_{p_1}) = A_{p_1} \cdot (A - A_{p_1}) \qquad (5)$$
with $A = A_{p_1} + A_{p_2} = \text{const.}$, in the denominator, which reaches its maximum when the partition sizes are absolutely equal. In our approach, however, intermediate partitions do not have to meet this relation but may be quite unbalanced. To provide equal chances for all possible cut regions, we move the vertex of the parabola in the denominator to the point defined by the desired size ratio
$$a_{12} = \frac{\#chips(p_1)}{\#chips(p_1) + \#chips(p_2)}$$
for each tolerance interval. The number of chips included in partition $p$ is denoted by $\#chips(p)$. We divide $D_{org}(A_{p_1})$ into two parts with one common point and gradient in the vertex at $a_{12} \cdot A$. We get the following continuously differentiable denominator and obtain the adapted ratio cut objective (ARC):
Finally, the set of cells of the partition is divided at the position where the minimal adapted ratio cut value within all cut regions of all possible cut directions is obtained. Figure 3 presents a partition to be cut in y-direction containing cells for 3 chips, showing the mapped cut regions, two ratio cut diagrams, and the minimal adapted ratio cut.
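The effect of moving the vertex can be illustrated with the following sketch. The exact adapted denominator is not reproduced here, so the code is only one construction with the properties stated in the text: two parabola branches that meet at the vertex $a_{12} \cdot A$ with the same value and zero slope. That each branch falls to zero at its end of the interval, and that the peak keeps the value $(A/2)^2$ of the original parabola $A_{p_1} \cdot (A - A_{p_1})$, are assumptions. The names `adapted_denominator` and `adapted_ratio_cut` are hypothetical.

```python
def adapted_denominator(a_p1, total, ratio):
    """Piecewise parabola with its vertex moved to ratio * total.

    a_p1  : size of the first part, 0 <= a_p1 <= total
    total : A = A_p1 + A_p2, held constant
    ratio : desired size ratio a12, with 0 < ratio < 1
    For ratio = 0.5 this reduces to the original a_p1 * (total - a_p1).
    """
    vertex = ratio * total
    # Each branch is the original parabola stretched to reach zero at its end.
    half_width = vertex if a_p1 <= vertex else total - vertex
    peak = (total / 2.0) ** 2  # maximum of the original parabola (assumed kept)
    return peak * (1.0 - ((a_p1 - vertex) / half_width) ** 2)


def adapted_ratio_cut(cutsize, a_p1, total, ratio):
    """Cutsize divided by the adapted denominator (needs 0 < a_p1 < total)."""
    return cutsize / adapted_denominator(a_p1, total, ratio)


# A partition holding cells for 3 chips, to be cut 1 : 2 (a12 = 1/3).
balanced = adapted_ratio_cut(10, 4.5, 9.0, 1.0 / 3.0)
desired = adapted_ratio_cut(10, 3.0, 9.0, 1.0 / 3.0)
```

With the same cutsize, the cut at the desired 1 : 2 ratio now scores better (lower) than the balanced cut, whereas the original denominator would have preferred the balanced one.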
Multi-Way Ratio Cut
On all levels of our approach, the partitions obtained after initial partitioning are improved by applying our new multi-way ratio cut algorithm. This approach is an adaptation of the two-way ratio cut [3] and the Quadrisection algorithm [10] to MCM design.
The algorithm starts by selecting the partition with the highest cutsize as central partition. All partitions adjacent to the central partition are called neighbor partitions. Two partitions are adjacent if they have a common border. Cells can either be moved from the central partition to any neighbor partition or vice versa. We perform a simultaneous ratio cut with all participating partitions. As proposed for the two-way ratio cut [3], we use a bucket list data structure [2] to maintain cell gains. The gain of a cell is the number of nets by which the cutsize would decrease if the cell is moved from its current partition to a destination partition. From all cells with the highest gain in the bucket lists of the current move directions, the cell with the highest ratio cut gain (RCG) is selected, preliminarily moved to its destination partition, and locked, if the size tolerances of the partitions can be maintained. The ratio cut gain is calculated by a special problem-specific objective function as described in Section 3.4.2. Updating the bucket lists and repeating the previous step, we obtain a sequence of cells to be moved from their source partition to a destination partition. When either all cells are locked or any further movement would violate the partition size specifications, the cells of that sequence which achieves the minimum cutsize are actually moved to their destination partitions. This moving of cell groups is applied until no further improvement can be obtained [2]. Subsequently, from all remaining partitions, the partition with the highest cutsize is selected as the next central partition. The iterative improvement procedure is finished when each partition has been selected as central partition exactly once.
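The gain definition above can be made concrete with a short sketch. It recomputes the cutsize of the affected nets from scratch, which is far slower than the incremental bucket list updates of [2] and is meant only to show what the gain measures. The function names and the dictionary `part_of` (cell to partition index) are hypothetical.

```python
def cutsize(nets, part_of):
    """Number of nets whose cells lie in more than one partition."""
    return sum(1 for net in nets if len({part_of[c] for c in net}) > 1)


def move_gain(cell, dest, nets, part_of):
    """Decrease in cutsize if `cell` is moved to partition `dest`."""
    touched = [net for net in nets if cell in net]  # only these nets can change
    before = cutsize(touched, part_of)
    trial = dict(part_of)  # tentative move; the real assignment is untouched
    trial[cell] = dest
    after = cutsize(touched, trial)
    return before - after


nets = [("a", "b"), ("a", "c"), ("b", "c", "d")]
part_of = {"a": 0, "b": 1, "c": 1, "d": 2}

gain_to_1 = move_gain("a", 1, nets, part_of)  # both nets of "a" become uncut
gain_to_2 = move_gain("a", 2, nets, part_of)  # both nets of "a" stay cut
```

The example shows why the destination matters in the multi-way case: the same cell has a different gain for each neighbor partition, so one gain per move direction has to be maintained.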
The New Multi-Way Ratio Cut
Finally, the center of gravity of all cells in each new partition is assigned to the center of the partition to maintain the partitioning during the next embedding step. One step of the algorithm is shown in Figure 4.
[Figure 4: Overview of the multi-way ratio cut]
The advantage of this approach is that not only can the cutsize of two partitions just obtained from one initial partitioning step be reduced, but previously divided partitions can also exchange cells if they are adjacent. This combines the advantages of a k-way partitioning algorithm [10] and the ratio cut objective [3].
A New Problem-Specific Objective
The cell with the highest ratio cut gain (RCG) is selected to be moved to its destination partition. The ratio cut gain is calculated by a new problem-specific objective function, derived from the adapted ratio cut objective (6).
Since the total size $A = A_{p_1} + A_{p_2}$ of two adjacent partitions may differ for each participating pair of partitions, the adapted ratio cut (ARC) will be different for a cell to be moved to partitions of different size.
This effect is illustrated in Figure 5. Suppose a cell of partition $p_1$ can either be moved to $p_2$ or moved to $p_3$. Then, the adapted ratio cut for a move of the cell to partition $p_3$ will be smaller than the adapted ratio cut for the move to partition $p_2$, as the denominator $D(A_{p_1})$ has different maximum values for the two move directions, since the total size $A = A_{p_1} + A_{p_2}$ of the partitions $p_1$ and $p_2$ is much smaller than the total size $A = A_{p_1} + A_{p_3}$ of the partitions $p_1$ and $p_3$. To provide equal chances for all move directions, we scale the denominator with $A^2$ and obtain the scaled ratio cut objective (SRC):

In the left example, the cutsize will decrease by one and the number of chiplevel pads will also decrease by one if one cell is moved to the other partition. In contrast to that, in the right example, the cutsize will also decrease by one if one of the cells is moved to the other partition, but the number of chiplevel pads will decrease by two. Thus, the reduction of the cutsize is the same in both cases, but the reduction of the number of chiplevel pads is higher in the right case. To favor a reduction of the number of chiplevel pads, we substitute the cutsize in the numerator of equation (7) by the sum of the number of chiplevel pads (sum-of-clpads) in the source and destination partition, scaled by the sum of the number of chips ($\#chips(p_1) + \#chips(p_2)$) currently contained in these two partitions, and obtain the pad oriented scaled ratio cut objective (PSRC):
[Figure: cutsize and the number of chiplevel pads]

Finally, we calculate the ratio cut gain (RCG) by subtracting the density considering pad oriented scaled ratio cut after the cell is moved ($DPSRC_A$) from the one before the cell is moved ($DPSRC_B$) and obtain:
$$RCG = DPSRC_B - DPSRC_A$$
This objective is used by our approach to select the cell to be moved with the maximal RCG from all cells in the bucket list data structure with the highest gain.

Areibi and Vannelli presented results for two-way, four-way, and six-way partitioning, allowing each partition to have up to 10% more or less than the equipartitioned number of cells. To be comparable, we generated results which also meet this size constraint. As our multi-way ratio cut approach differs from the traditional ratio cut only for three or more participating partitions, we computed results for 2 x 2 and larger target architectures.

For all circuits and both target architectures, our approach outperforms EIG-TS in seven of eight cases. The results range from 11.67% deterioration up to 65.63% improvement. For four-way partitioning, our approach is always superior to EIG-TS with an average improvement of about 25%. For six-way partitioning, the result of SEAPART is worse for the smallest circuit primary1, but for the other circuits SEAPART obtains better results and we obtain an average improvement of about 26%. Although our approach takes fixed module-level pad positions and predefined chip locations into consideration, on the overall average, our approach yields about 25% improvement. This outweighs the increased computational effort. A closer look reveals that the results produced by our approach tend to get better with increasing circuit size. Thus, our approach seems to be promising for the partitioning problem of very complex designs.
Results for Large Circuits and Additional Target Architectures
Since our intention is to deal with very complex circuits, we additionally tested our approach on the largest benchmark circuits available. As in MCM design the number of chiplevel pads required for each chip is the most restricted resource, the number of cut nets (NCN) is of minor significance. Hence, we examined our approach with respect to the number of chiplevel pads required for the partition with the highest cutsize (MAX) and the average number of required chiplevel pads per chip (AVG). Furthermore, we selected some additional target architectures to evaluate the performance of our approach. The maximum size deviation is still set to 10%. Tables 2 and 3 show the partitioning results for 2 x 2, 2 x 3, 3 x 3, 3 x 4, and 4 x 4 type target architectures. Obviously, the maximum number of required chiplevel pads rises for larger designs, except for the two avq circuits, where the results seem to be surprisingly good.
Compared to the average number of required chiplevel pads for each partition, the maximum is usually close to the average, at least for the smaller architectures.
To evaluate the efficiency of the multi-way ratio cut method, we compared the results of SEAPART to our approach without the multi-way ratio cut procedure.
SEAPART obtains an average improvement of 13.3% for the number of cut nets. The maximum number of required chiplevel pads is reduced by 12.4% and the average number of required chiplevel pads per chip by 11.9%. Thus, on an overall average, the multi-way ratio cut method improves the initial partitioning by 12.5%. The computational effort for the iterative improvement procedure is about 10% of the total computation time.
Thus, the computation time for the improvement procedure is moderate compared to the computation time to calculate the 2-dimensional embedding and the initial partitioning. Total computation time ranges from a few seconds for small circuits up to about 14 hours for the 100,000 cell design and the 4 x 4 target architecture.
Conclusions
We developed an efficient partitioning method for k-way partitioning. The partitioning is based on the calculation of a 2-dimensional embedding. After initial partitioning, the quality of the partitioning results is further improved by applying a multi-way ratio cut
