Survey of partitioning techniques in silicon compilation by Wu, Allen C.H.
UC Irvine
ICS Technical Reports
Title
Survey of partitioning techniques in silicon compilation
Permalink
https://escholarship.org/uc/item/1hv4d2k1
Author
Wu, Allen C.H.
Publication Date
1991
 
Peer reviewed
Survey of Partitioning Techniques in Silicon Compilation
by 
Allen C. H. Wu 
Technical Report 91-15 
Information and Computer Science Department 
University of California, Irvine
Irvine, CA. 92717 
Abstract 
In the silicon compilation design process, partitioning is usually the first 
problem to be investigated because partitioning algorithms form the backbone of 
many algorithms including: system synthesis, processor synthesis, floorplanning, 
and placement. In this survey, several partitioning techniques will be examined. 
In addition, this paper will review the partitioning algorithms used by synthesis 
systems at different design levels. 
TABLE OF CONTENTS
1. Introduction
2. Definition of Problem
3. Techniques and Algorithms
3.1 Overview of Partitioning Methods
3.2 Constructive Methods
3.2.1 Cluster Growth
3.2.2 Hierarchical Clustering
3.3 Iterative Improved Methods
3.3.1 Pairwise interchange
3.3.2 Group migration
3.3.2.1 The Kernighan-Lin Min-Cut Algorithm
3.3.3 Simulated annealing
4. Partitioning Algorithms in Silicon Compilation
4.1 System Synthesis
4.2 Processor Synthesis
4.3 Floorplanning
4.4 Placement
5. Conclusions
6. References
LIST OF FIGURES
Figure 1. The Y-chart representation of silicon compilation
Figure 2. The cluster tree formation
Figure 3. Divisive clustering method
Figure 4. Multi-stage clustering
Figure 5. Two-way partition
Figure 6. Local search strategy
Figure 7. Data closeness calculation
Figure 8. An example of cluster
Figure 9. The slicing tree formation
Figure 10. Partitioning with external consideration
Figure 11. Terminal propagation
1. Introduction
Silicon compilation was introduced by Dave Johannsen [Joha79] for assembling
parameterized modules of layout. The term silicon compilation can be extended to
define a translation process that starts with a behavioral description and ends with a set
of fabrication masks.
To represent the design process of silicon compilation, Gajski and Kuhn [Gajs83]
introduced a Y-chart representation of silicon compilation (Figure 1). In the Y-chart,
the design process is classified into three different dimensions: (i) behavioral, (ii)
structural, and (iii) geometrical. The translation from behavior to structure is called
synthesis. There are four levels in the synthesis process: (i) system, (ii) processor, (iii)
module, and (iv) cell [Gajs88]. In system synthesis, the program is decomposed into a
set of communicating processes. In processor synthesis, each process is decomposed into
a set of microarchitecture modules. In module synthesis, each module is decomposed
into a set of cells. Finally, in cell synthesis, each cell is decomposed into gates,
transistors, and fabrication masks. To obtain information about the physical domain of
a design, each synthesis step is followed by a translation from structure into geometry.
Depending on the level, this translation is called floorplanning, placement and routing,
or cell generation.
In the silicon compilation design process, partitioning is the first problem to be
investigated because partitioning algorithms form the backbone of many algorithms,
including those for system synthesis, processor synthesis, floorplanning, and placement.
For example, in system synthesis, designers partition the design into a set of chips
according to constraints such as power dissipation, number of pins per chip, chip size,
and speed. In processor synthesis, designers decompose chips into a set of datapaths
and control units according to different layout architectures and constraints. At the
placement level, partitioning is used to decompose thousands of objects (gates) into
manageable clusters, which decreases the complexity of placement. If the partitioning
is done effectively, the design process is simplified without sacrificing the overall
performance.
Figure 1. The Y-chart representation of silicon compilation (axes: behavioral,
structural, and geometric representation).
In this survey, several partitioning techniques will be examined. In addition, this
paper will review the partitioning algorithms used by synthesis systems at different
design levels. Section 2 defines the partitioning problem. Section 3
presents basic partitioning techniques and algorithms. Section 4 describes the
algorithms that have been used in different synthesis systems. Section 5 contains the
conclusion.
2. Definition of Problem 
Partitioning is the task of decomposing a given set of objects into subsets so that
(i) the number of objects in each subset is less than or equal to the given size constraint
and (ii) the number of wire connections crossing between subsets is minimized.
Let
$G = (V,E)$, where $V = \{v_i\}$, $i = 1..n$, is a set of nodes and $E = \{e_j\}$, $j = 1..m$, is a set of
edges;
$V_p = \{v_i\}$ be a subset $p$ of nodes;
$S(V_p)$ be the number of nodes in subset $p$ and $S(V)$ be the total number of nodes in
$V$;
$C(V_p)$ be the size constraint of $V_p$ in terms of number of nodes;
$e_{ni,mj}$ be the net that connects node $n$ in $V_i$ to node $m$ in $V_j$.
The problem of a k-way partition of $G$ is to partition $V$ into $k$ subsets $\{V_1, V_2, ..., V_k\}$,
which can be formulated as follows:
$$\bigcup_{p=1}^{k} V_p = V, \qquad S(V_p) \le C(V_p) \quad \text{for } p = 1..k,$$
and minimizing
$$\sum_{n \in V_i,\; m \in V_j,\; i \ne j} e_{ni,mj}$$
The partitioning problem was shown to be NP-complete by [Gare79]. Consider the
problem of partitioning a graph $G$ of $n$ nodes into $k$ subsets of equal size $m$, where $km = n$.
There are $\binom{n}{m}$ ways of choosing the first subset, $\binom{n-m}{m}$ ways for the second, and so on.
Since the ordering of the subsets is immaterial, the number of choices for such a
partition is:
$$\frac{1}{k!}\binom{n}{m}\binom{n-m}{m}\cdots\binom{2m}{m}\binom{m}{m}$$
For example, for $n=30$, $m=10$, and $k=3$, about $10^{12}$ computations are required to perform
an exhaustive search. Because the computation time grows at an exponential rate with
the size of the partitioning problem, it is impractical to perform an exhaustive search to
find the optimal partition. Over the years, numerous heuristic methods have been
proposed for solving the partitioning problem. Often in practical applications, heuristic
methods can produce good results in a reasonable amount of time.
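The size of the exhaustive search space can be checked numerically. The following is a minimal sketch, assuming that the ordering of the subsets is immaterial (so the product of binomial coefficients is divided by k!); the function name `count_equal_partitions` is hypothetical and is not taken from the report.

```python
from math import comb, factorial


def count_equal_partitions(n: int, m: int) -> int:
    """Count the ways to split n nodes into k = n // m unordered subsets of size m."""
    if m <= 0 or n % m != 0:
        raise ValueError("n must be a positive multiple of m")
    k = n // m
    ordered = 1
    remaining = n
    for _ in range(k):
        # Choose the next subset of size m from the nodes not yet assigned.
        ordered *= comb(remaining, m)
        remaining -= m
    # Assumption: subsets are unlabeled, so divide by the k! orderings.
    return ordered // factorial(k)


if __name__ == "__main__":
    # For n = 30 and m = 10 (k = 3) the count is on the order of 10**12.
    print(count_equal_partitions(30, 10))
```

Even for this small instance the count is close to a trillion, which is why the heuristic methods in the following sections are used instead of enumeration.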
3. Techniques and Algorithms 
3.1. Overview of Partitioning Methods 
There are two basic methodologies: (i) the constructive method and (ii) the
iterative improvement method [Sang87, Dona88, PrKa88]. The constructive method
uses a clustering (aggregation) strategy [Ever74, Spat80, Rome84] to assign one node at a
time to a partition. The cluster growth process is based on closeness measurements of
nodes, such as functions and interconnections. Several clustering approaches are
reported in [Kurt65, HaKu72, Kodr72, ScUl72, Schw76, CoCa80, Kang83, OdHa87]. A
variation of the traditional clustering algorithm is hierarchical clustering
[Snea57, John67]. Hierarchical clustering uses different closeness measurements to group
the objects into clusters at different levels. Using hierarchical clustering, a cluster tree
is formed for further analysis.
The iterative improvement method starts with some initial partition, then improves
the result by moving nodes between partitions. There are three major iterative
improvement algorithms: (i) pairwise interchange, (ii) group migration, and (iii)
simulated annealing.
3.2. Constructive Methods
3.2.1. Cluster Growth
Cluster growth is a constructive method that does not need a given initial
partition. This approach starts with a nonpartitioned set and operates by selecting
unplaced objects and adding them to the proper clusters. Consider partitioning a set $S$
of $n$ nodes into two subsets $\{S_1, S_2\}$ such that $|S_1| = |S_2| = n/2$. Assume that the sizes of
nodes and weights of nets are equal. The cost function for a given partition of $S$ is
$$C(S) = \sum_{n \in S_1,\; m \in S_2} e_{n1,m2}$$
where $e_{n1,m2}$ is the net that connects node $n$ in $S_1$ to node $m$ in $S_2$.
The first step of cluster growth is to select the seeds of the partitions. The seed
nodes may be chosen randomly, by the user, or determined algorithmically
[Schw76, WuGa90] in order to guide the partitioning process. Next, unplaced nodes are
selected according to the cost function $C(n)$, where $n$ is an unplaced node. The internal
cost $I(n)$ denotes the number of nets between node $n$ in $S_1$ and other nodes in $S_1$. The
external cost $E(n)$ denotes the number of nets between node $n$ in $S_1$ and other nodes
in $S_2$. $C(n)$ is the difference between the external and internal costs of node $n$ such that
$C(n) = E(n) - I(n)$. The unplaced node with minimal $C(n)$ is chosen and clustered into
$S_1$. The process is repeated until the number of nodes assigned to $S_1$ reaches its size
constraint. Finally, all of the unplaced nodes are assigned to $S_2$.
The cluster growth algorithm is easy to implement and fast. However, each
clustering decision is based only on the information in the clusters built so far,
without taking global considerations into account. Therefore, it often produces poor
results. The cluster growth algorithm is mostly used to produce an initial partition for an
iterative improvement method.
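The cluster growth procedure can be sketched as follows. This is a minimal illustration, assuming unit node sizes and unit net weights, a graph given as an adjacency dictionary, and a user-supplied seed; the names `cluster_growth` and `EXAMPLE_ADJ` and the example graph are hypothetical.

```python
def cluster_growth(adj, seed, size_limit):
    """Grow S1 from a seed node; every node left unplaced ends up in S2.

    adj maps each node to the set of its neighbours (unit-weight two-pin nets).
    """
    s1 = {seed}
    unplaced = set(adj) - s1

    def cost(n):
        internal = len(adj[n] & s1)   # I(n): nets from n into S1
        external = len(adj[n] - s1)   # E(n): nets from n to nodes outside S1
        return external - internal    # C(n) = E(n) - I(n)

    while len(s1) < size_limit and unplaced:
        # Pick the unplaced node with minimal C(n); sorting makes ties deterministic.
        best = min(sorted(unplaced), key=cost)
        s1.add(best)
        unplaced.remove(best)
    return s1, unplaced


# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the net 2-3.
EXAMPLE_ADJ = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}

if __name__ == "__main__":
    print(cluster_growth(EXAMPLE_ADJ, seed=0, size_limit=3))
```

On this small graph the greedy choice happens to recover the natural split; on larger designs the same purely local decisions are what lead to the poor results noted above.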
3.2.2. Hierarchical Clustering 
In hierarchical clustering [Snea57, John67], the objects are clustered into groups.
The process is executed repeatedly at different levels to form a tree. Hierarchical
techniques can be divided into two methods: (i) agglomerative and (ii) divisive.
The agglomerative method proceeds by successively fusing objects into groups. In
general, cluster analysis consists of three steps: (i) computing the closeness matrix, (ii)
executing the clustering method, and (iii) rearranging the closeness matrix. First, a
closeness matrix $X = (x_{ij})$, $i, j = 1..n$, is formed, where $x_{ij}$ is the closeness coefficient
between objects $i$ and $j$ (Figure 2(a)). Since the closeness between two objects is
symmetric, only the lower-left half of the matrix contains values. Second, the procedure
fuses the individuals or groups of objects that are most similar or closest (Figure 2(b)).
Third, the closeness matrix is rearranged according to the new cluster configuration.
Steps (ii) and (iii) are executed repeatedly until no more clusters can be merged (Figure
2(c)). Using this method, the clustering tree is formed in a bottom-up fashion.
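The three agglomerative steps can be sketched in a few lines. This is an illustrative sketch only: the closeness values below are hypothetical (they are not the values of Figure 2), and the rule used to rearrange the matrix after a merge (the average closeness between the members of two clusters) is an assumption, since the text does not fix one.

```python
from itertools import combinations


def agglomerate(closeness):
    """Build a cluster tree bottom-up; returns the tree as nested tuples.

    closeness maps frozenset({i, j}) to the closeness of objects i and j;
    missing pairs are treated as closeness 0.
    """
    leaves = sorted({obj for pair in closeness for obj in pair})
    members = {leaf: frozenset([leaf]) for leaf in leaves}
    trees = list(leaves)

    def link(u, v):
        # Assumed update rule: average closeness over all cross-cluster pairs.
        total = sum(closeness.get(frozenset((i, j)), 0)
                    for i in members[u] for j in members[v])
        return total / (len(members[u]) * len(members[v]))

    while len(trees) > 1:
        # Step (ii): fuse the two closest clusters.
        u, v = max(combinations(trees, 2), key=lambda pair: link(*pair))
        merged = (u, v)
        # Step (iii): record the new cluster so later links are recomputed.
        members[merged] = members[u] | members[v]
        trees = [t for t in trees if t not in (u, v)] + [merged]
    return trees[0]


# Hypothetical closeness values for objects a..e.
EXAMPLE_CLOSENESS = {
    frozenset("ab"): 5, frozenset("ac"): 4, frozenset("ad"): 4,
    frozenset("bc"): 1, frozenset("bd"): 8, frozenset("ce"): 3,
}

if __name__ == "__main__":
    print(agglomerate(EXAMPLE_CLOSENESS))
```

With these values, b and d are fused first, a then joins (b,d), and c and e form the second branch of the tree.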
Figure 2. The cluster tree formation: (a) the initial closeness matrix for objects a, b,
c, d, and e; (b) after merging b and d; (c) after merging a with (b,d).
The divisive method partitions the set of objects into clusters. The first task of the
divisive method is to split the initial set of objects into two subsets. For a set of $n$
objects, there are $2^{n-1}-1$ possible ways to divide the $n$ objects into two subsets, so it is
impractical to search exhaustively for the best split. One of the most
feasible divisive techniques was proposed by MacNaughton-Smith et al. [MacNa64].
MacNaughton-Smith introduced a technique of dissimilarity analysis, which measures
the dissimilarity between each object and the other objects in the group.
Consider a closeness matrix $D = \{d_{ij} \mid i = 1..n,\ j = 1..n\}$, where $n$ is the number of objects
and $d_{ij}$ is the closeness between objects $i$ and $j$. For example, in Figure 3(a), there are 7
objects to be divided into two groups according to the dissimilarity analysis. The initial
partition is based on the average closeness between each object and the remaining
objects. For example, the average closeness between groups {1} and {2,3,4,5,6,7} is
calculated as (10+7+30+29+38+42)/6=26. The average closeness values between objects 1,
2, 3, 4, 5, 6, and 7 and the groups {2,3,4,5,6,7}, {1,3,4,5,6,7}, ..., {1,2,3,4,5,6} are 26, 22.5, 20.7, 17.3,
18.5, 22.2, and 25.5, respectively. Object 1 has the largest average, so the initial two
groups are {1} and {2,3,4,5,6,7}.
Next, the average closeness of each remaining object to the two groups is calculated as shown in
Figure 3(b). For example, consider object 2 in row 1 of Figure 3(b): the average
closeness between {1} and {2} is 10 and between {2} and {3,4,5,6,7} is
(7+23+25+34+36)/5=25. The difference between these two averages is 15.
After calculating the average closeness for every object, the maximum difference is 16.4, for object 3.
Therefore, object 3 is merged with object 1. The new groups are {1,3} and {2,4,5,6,7}.
Repeating the analysis moves object 2 as well, which gives the groups {1,3,2} and {4,5,6,7} analyzed in
Figure 3(c). As all the differences are now negative, the partitioning into two subsets
is completed. The process continues within each subset until no more objects can be split. Using this
method, the clustering tree is formed in a top-down fashion.
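The first split of the seven-object example can be reproduced with a short script. This is a sketch of the dissimilarity analysis as described in the text, using the matrix of Figure 3(a); the function names `mean_to` and `dissimilarity_split` are hypothetical, and the stopping rule (stop when no difference is positive) follows the description above.

```python
# Matrix of Figure 3(a); row and column k correspond to object k + 1.
D = [
    [0, 10, 7, 30, 29, 38, 42],
    [10, 0, 7, 23, 25, 34, 36],
    [7, 7, 0, 21, 22, 31, 36],
    [30, 23, 21, 0, 7, 10, 13],
    [29, 25, 22, 7, 0, 11, 17],
    [38, 34, 31, 10, 11, 0, 9],
    [42, 36, 36, 13, 17, 9, 0],
]


def mean_to(obj, group, d):
    """Average entry of d between obj and the other members of group."""
    others = [j for j in group if j != obj]
    return sum(d[obj][j] for j in others) / len(others)


def dissimilarity_split(d):
    """Split all objects into two groups; returns 1-based object labels."""
    main = list(range(len(d)))
    # The object with the largest average to all others starts the new group.
    first = max(main, key=lambda i: mean_to(i, main, d))
    main.remove(first)
    splinter = [first]
    while len(main) > 1:
        # Column (2-1) of Figure 3(b): average to own group minus average to new group.
        diff = {i: mean_to(i, main, d) - mean_to(i, splinter, d) for i in main}
        candidate = max(diff, key=diff.get)
        if diff[candidate] <= 0:
            break  # all differences are non-positive: the split is complete
        main.remove(candidate)
        splinter.append(candidate)
    return [i + 1 for i in splinter], [i + 1 for i in main]


if __name__ == "__main__":
    print(dissimilarity_split(D))
```

Applying the same function recursively to each resulting group would build the cluster tree top-down.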
Figure 3. Divisive clustering method.
(a) Closeness matrix D
        1    2    3    4    5    6    7
   1    0   10    7   30   29   38   42
   2   10    0    7   23   25   34   36
   3    7    7    0   21   22   31   36
   4   30   23   21    0    7   10   13
   5   29   25   22    7    0   11   17
   6   38   34   31   10   11    0    9
   7   42   36   36   13   17    9    0
(b) Average closeness
   Individual    (1)     (2)    (2-1)
       2        10.0    25.0    15.0
       3         7.0    23.4    16.4
       4        30.0    14.8   -15.2
       5        29.0    16.4   -12.6
       6        38.0    19.0   -19.0
       7        42.0    22.2   -19.8
(c) Average closeness
   Individual    (1)     (2)    (2-1)
       4        24.3    10.0   -14.3
       5        25.3    11.7   -13.6
       6        34.3    10.0   -24.3
       7        38.0    13.0   -25.0
A variation of hierarchical clustering is multi-stage clustering [DiTh89]. Multi-stage
clustering was proposed by Dirkes Lagnese and Thomas for solving the large-scale
problem of architectural partitioning. This approach performs clustering in
several stages. Each clustering stage is allowed to use a different closeness criterion. The
clustering stages are consecutive, and each stage builds on the results of the previous
stage. A two-stage clustering example is shown in Figure 4. In the first clustering
stage, the objects are clustered together according to their closeness measurements on
criterion A. As a result, it produces 4 clusters, indicated as a, b, c, and d in Figure 4(a).
In the second stage, the clusters a, b, c, and d cut from the first stage are clustered
further according to closeness criterion B (Figure 4(b)).
The multi-stage clustering approach has two advantages over the traditional
clustering approach. The first advantage is that this approach decouples the clustering
criteria over several stages. Hence, it provides a hierarchy of criteria that can apply the
most important criterion first to ensure that the constraints are satisfied. However, it is
difficult to determine the proper weighting for the various criteria to ensure good
clustering results. The second advantage is that this approach allows objects to be
considered as groups rather than as individual objects. Thus, this approach can cluster
objects using more global considerations.
Figure 4. Multi-stage clustering: (a) clustering with criterion A; (b) clustering with
criterion B.
3.3. Iterative Improved Methods
3.3.1. Pairwise interchange
In the pairwise interchange algorithm, a pair of nodes is selected from different
partitions. A cost function determines the effect of interchanging the two nodes. If the
partitioning cost function is improved, then these two nodes are interchanged.
Otherwise, the nodes remain in their previous partitions. This algorithm results in
$n(n-1)/2$ trial exchanges, which contributes $O(n^2)$ complexity, where $n$ is the number of
nodes. Several layout systems [HaWo76, Schw76, IoKi83] have used a pairwise
interchange algorithm for placement. Pairwise interchange is a simple heuristic method,
and it is not guaranteed to find even a locally optimal solution.
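A single pass of pairwise interchange can be sketched as follows. This is a minimal illustration, assuming unit-weight two-pin nets and the cut size as the cost function; for brevity it only tries pairs that lie in different partitions, and the names `cut_size`, `pairwise_interchange`, and `EXAMPLE_EDGES` are hypothetical.

```python
def cut_size(edges, side):
    """Number of nets whose two endpoints lie in different partitions."""
    return sum(1 for u, v in edges if side[u] != side[v])


def pairwise_interchange(edges, part_a, part_b):
    """One pass of trial exchanges; a swap is kept only if it lowers the cut."""
    side = {n: 0 for n in part_a}
    side.update({n: 1 for n in part_b})
    best = cut_size(edges, side)
    for x in sorted(part_a):
        for y in sorted(part_b):
            if side[x] != 0 or side[y] != 1:
                continue  # one of the two nodes was moved by an earlier swap
            side[x], side[y] = 1, 0          # trial exchange
            trial = cut_size(edges, side)
            if trial < best:
                best = trial                 # improvement: keep the exchange
            else:
                side[x], side[y] = 0, 1      # no improvement: undo it
    a = {n for n, s in side.items() if s == 0}
    b = {n for n, s in side.items() if s == 1}
    return a, b, best


# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the net 2-3.
EXAMPLE_EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

if __name__ == "__main__":
    print(pairwise_interchange(EXAMPLE_EDGES, {0, 1, 3}, {2, 4, 5}))
```

Starting from a cut of 5, this pass accepts the first improving swap it meets and ends with a cut of 4, although swapping nodes 3 and 2 alone would have reached the best cut of 1. The outcome depends on the order in which pairs are tried, which illustrates the weakness noted above.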
3.3.2. Group migration 
The group migration algorithm is also known as the Kernighan-Lin algorithm [KeLi70].
The Kernighan-Lin algorithm uses the solution of two-way uniform partitions as the
basis for solving general partitioning problems. The basic idea is to interchange the
group of nodes that contributes the maximal partitioning improvement between two
groups. The Kernighan-Lin algorithm is guaranteed to find a locally optimal solution.
Several group migration algorithms have been reported in [FiMa82, Kris84, ScKe72,
BhHi88, SaRa90]. Because group migration algorithms can produce excellent results
using a small amount of CPU time (linear complexity), they are widely used in
many applications.
Another approach used to solve the partitioning problem is the network flow
algorithm [ChKu84, WeCh89]. This algorithm is based on the Ford and Fulkerson
maximum-flow minimum-cut algorithm [FoFu62] for finding a minimum cut between
two partitions in a network. The major difficulty of using this algorithm is the inability
to constrain the cut-set sizes. In practice, this algorithm usually generates subsets with
greatly uneven sizes, and hence its applications are limited. However, it does find the
minimal cost for unconstrained (without size constraints for subsets) two-way partitions,
which can be used as a lower bound for solutions produced by any partitioning method.
3.3.2.1. The Kernighan-Lin Min-Cut Algorithm 
The original Kernighan-Lin algorithm finds a minimal-cost partition of a given
graph of $n$ nodes connected by edges into two equal subsets of $n/2$ nodes (Figure 5).
This two-way uniform partitioning algorithm uses heuristics to obtain the minimal
uniform partition. It interchanges the nodes in cut-set A and cut-set B, and then
performs a local search to find a sequence of favorable swaps of nodes from the two
subsets.
Figure 5. Two-way partition (cut-set A and cut-set B separated by the cutline).
Consider the partitioning of a set $S$ of size $n$ into two subsets $\{S_1, S_2\}$ with equal
size of $n/2$. $e_{ab}$ denotes the net connected between node $a$ and node $b$. Let us define
the external cost $E_a$, where $a \in S_1$, by
$$E_a = \sum_{x \in S_2} e_{ax}$$
and the internal cost by
$$I_a = \sum_{y \in S_1} e_{ay}$$
Similarly, define $E_b$, $I_b$ for each $b \in S_2$. Let
$$D_i = E_i - I_i$$
be the difference between external and internal cost for all $i \in S$.
Let $C_{ab}$ denote a double-counting correction coefficient that applies if node $a$ and node $b$
connect to the same net. The partition gain, $gain_{ab}$, from interchanging nodes $a$ and $b$ is
$$gain_{ab} = D_a + D_b - 2C_{ab}$$
The idea of group migration is to search for a favorable group of swaps rather than
to search for one favorable swap. The Kernighan-Lin algorithm first generates a
sequence of gains as follows:
(1) Calculate $D_i$ for all nodes $i \in S$.
(2) Choose the pair $(a, b)$, $a \in S_1$ and $b \in S_2$, that generates the maximal gain.
(3) Swap $a$ and $b$ and recompute the $D$ values for all unswapped nodes.
(4) Repeat steps 2 and 3, obtaining a sequence of swapped pairs $\{a_1, b_1\}, ..., \{a_{n/2}, b_{n/2}\}$.
Once a pair is swapped, this pair is locked and cannot be considered for
swapping again.
As a result, the algorithm generates a sequence of gains $gain_1, ..., gain_{n/2}$. The total
gain from interchanging the set $A = \{a_1, ..., a_k\}$ with $B = \{b_1, ..., b_k\}$ is
$$GAIN(k) = \sum_{i=1}^{k} gain_i$$
Next, the algorithm uses local search to choose the $k$ that maximizes the partial sum
$GAIN(k)$. If $GAIN(k) > 0$, the algorithm interchanges the corresponding sets A and B
and starts the process over again from step 1. If $GAIN(k) \le 0$, the algorithm stops.
Based on the sequence of gains, the algorithm produces the swapping-gain function
$f(GAIN(k))$ as shown in Figure 6. In this example, the total gain peaks at $k = 7$.
Thus the algorithm will interchange the first 7 pairs of nodes between sets A and B.
Because individual gains in the sequence may be negative, this local
search strategy allows the algorithm to climb out of local minima.
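One pass of the procedure above can be sketched as follows. This is an illustrative sketch rather than the original implementation: it assumes two-pin nets stored as a dictionary of weights, takes $C_{ab}$ to be the weight of the net between a and b, and uses the standard incremental update of the D values after a tentative swap. The names `kl_pass` and `EXAMPLE_WEIGHTS` are hypothetical.

```python
def kl_pass(weights, s1, s2):
    """One Kernighan-Lin pass; returns the new subsets and the gain achieved.

    weights maps frozenset({a, b}) to the weight of the net between a and b.
    """
    def w(a, b):
        return weights.get(frozenset((a, b)), 0)

    side1, side2 = set(s1), set(s2)
    # Step 1: D_i = E_i - I_i for every node.
    d = {}
    for a in side1:
        d[a] = sum(w(a, x) for x in side2) - sum(w(a, y) for y in side1 if y != a)
    for b in side2:
        d[b] = sum(w(b, x) for x in side1) - sum(w(b, y) for y in side2 if y != b)

    free1, free2 = sorted(side1), sorted(side2)
    swaps, gains = [], []
    while free1 and free2:
        # Step 2: pick the unlocked pair with maximal gain D_a + D_b - 2*C_ab.
        gain, a, b = max(((d[p] + d[q] - 2 * w(p, q), p, q)
                          for p in free1 for q in free2), key=lambda t: t[0])
        swaps.append((a, b))
        gains.append(gain)
        free1.remove(a)   # lock the pair
        free2.remove(b)
        # Step 3: update D for unlocked nodes as if a and b had been exchanged.
        for x in free1:
            d[x] += 2 * w(x, a) - 2 * w(x, b)
        for y in free2:
            d[y] += 2 * w(y, b) - 2 * w(y, a)

    # Choose the k that maximizes the partial sum GAIN(k).
    best_k, best_gain, running = 0, 0, 0
    for k, g in enumerate(gains, start=1):
        running += g
        if running > best_gain:
            best_k, best_gain = k, running
    for a, b in swaps[:best_k]:
        side1.remove(a)
        side1.add(b)
        side2.remove(b)
        side2.add(a)
    return side1, side2, best_gain


# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the net 2-3.
EXAMPLE_WEIGHTS = {frozenset(e): 1 for e in
                   [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]}

if __name__ == "__main__":
    print(kl_pass(EXAMPLE_WEIGHTS, {0, 1, 3}, {2, 4, 5}))
```

In this example the gain sequence is 4, -4, 0, so the partial sum peaks at k = 1 and only the first pair (3, 2) is exchanged, reducing the cut from 5 to 1. A full implementation would repeat the pass until the best total gain is no longer positive.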
Kernighan and Lin also extended their two-way partitioning technique to perform
multi-way partitions. Consider the problem of partitioning a set $S$ into $k$ subsets such
that $|S| = \sum_{i=1}^{k} |S_i|$. The multi-way partitioning algorithm executes two-way partitioning
repeatedly based on the cut-set sizes $(C_A, C_B)$, where $C_A = |S_i|$ and
$C_B = \sum_{j=i+1}^{k} |S_j|$ for $1 \le i \le k-1$. Frequently, $C_A$ and $C_B$ are not equal; therefore, a set of dummy nodes is
added to the original set to allow unbalanced partitioning.
Figure 6. Local search strategy (plot of f(GAIN(k)) against k for k = 1..10).
The complexity of the original Kernighan and Lin two-way uniform partitioning
algorithm is $O(n^2 \log n)$. However, Fiduccia and Mattheyses [FiMa82] used a clever
implementation to achieve linear complexity in terms of the number of pins. Dunlop
and Kernighan [DuKe85] have compared the Kernighan/Lin and Fiduccia/Mattheyses
algorithms. They found that the results of Fiduccia and Mattheyses are not quite as
good as those of Kernighan and Lin but that the execution time is substantially
reduced. In addition, Krishnamurthy [Kris84] has proposed a look-ahead strategy to
guide the heuristics toward better partitions.
3.3.3. Simulated annealing 
Simulated annealing was proposed by Kirkpatrick [KiGe83] for solving combinatorial
optimization problems. Using this technique, it is possible to escape from a locally
optimal solution and move toward a globally optimal solution. In physics, an annealing
process starts by melting a solid and then slowly lowering the temperature to find the
minimal energy state where a crystal is formed. This same idea can be applied to
combinatorial optimization. A simulated annealing algorithm generates moves
randomly and calculates the change in configuration cost $\Delta c_{ij}$ for a move from configuration $i$
to $j$. If $\Delta c_{ij} < 0$, then a lower energy level is achieved and the move is accepted. If
$\Delta c_{ij} \ge 0$, then the move is accepted with probability $e^{-\Delta c_{ij}/t}$. As the simulated temperature $t$
decreases, the probability of move acceptances decreases.
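The acceptance rule and a simple annealing loop for two-way partitioning can be sketched as follows. This is a minimal illustration under stated assumptions: moves are random swaps of one node from each side (so the partition stays balanced), the cost is the cut size, and the geometric cooling schedule with the parameters `t0`, `alpha`, `moves_per_t`, and `t_min` is a hypothetical choice, not one prescribed by the text.

```python
import math
import random


def accept(delta_cost, temperature, rng=random):
    """Accept improving moves always, others with probability exp(-delta/t)."""
    if delta_cost < 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


def cut_size(edges, side):
    """Number of nets whose two endpoints lie in different partitions."""
    return sum(1 for u, v in edges if side[u] != side[v])


def anneal(edges, part_a, part_b, t0=2.0, alpha=0.9, moves_per_t=20,
           t_min=0.05, seed=0):
    """Anneal a balanced two-way partition; returns the best side map and its cut."""
    rng = random.Random(seed)
    side = {n: 0 for n in sorted(part_a)}
    side.update({n: 1 for n in sorted(part_b)})
    cost = cut_size(edges, side)
    best_side, best_cost = dict(side), cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            # Random move: exchange one node from each side.
            x = rng.choice([n for n in side if side[n] == 0])
            y = rng.choice([n for n in side if side[n] == 1])
            side[x], side[y] = 1, 0
            new_cost = cut_size(edges, side)
            if accept(new_cost - cost, t, rng):
                cost = new_cost
                if cost < best_cost:
                    best_side, best_cost = dict(side), cost
            else:
                side[x], side[y] = 0, 1  # rejected: undo the move
        t *= alpha  # assumed geometric cooling schedule
    return best_side, best_cost


# Hypothetical example: two triangles {0,1,2} and {3,4,5} joined by the net 2-3.
EXAMPLE_EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

if __name__ == "__main__":
    print(anneal(EXAMPLE_EDGES, {0, 1, 3}, {2, 4, 5}))
```

Because the best configuration seen so far is recorded, the returned cut is never worse than the initial one, while the occasional acceptance of cost-increasing swaps at high temperature is what lets the search leave a local minimum.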
Theoretical studies [RoSa85] have shown that simulated annealing can climb out of
local minima and find the globally optimal solution. However, it is impractical to find
the optimal solution by performing an infinite number of iterations at each temperature.
Several heuristics [KiGe83, RoSa84, HuRo86] have been developed to reduce run time.
Simulated annealing usually produces very good results; however, it suffers from very
long run times.
4. Partitioning Algorithms in Silicon Compilation
4.1. System Synthesis 
APARTY [DiTh89] is an architectural partitioner. It uses a multi-stage clustering
algorithm to extract high level structural information from the behavioral description.
The high level structure reflects physical design considerations such as interconnects.
APARTY attempts to examine physical considerations in the early design stage so that
the synthesis tools can choose a better design in terms of area.
The multi-stage clustering consists of three major clustering stages: (i) control
clustering, (ii) data clustering, and (iii) schedule clustering.
Control clustering. Control clustering groups the operators in the same control path
together so that the control flow passing between two clusters can be reduced. Control
flow between clusters is measured based on the probability that control will be passed
from one operator to another. The control closeness between two operators $a$ and $b$ is:
$$C^{C}_{a,b} = P(OP_b \mid OP_a)$$
where $P(OP_b \mid OP_a)$ denotes the probability that $OP_b$ will be activated by $OP_a$. For
example, if operators $a$ and $b$ are in the same path without branches, then
$P(OP_b \mid OP_a) = 1$. On the other hand, if operators $a$ and $b$ are on different paths, then
$P(OP_b \mid OP_a) = 0$.
Data clustering. Data clustering considers data similarities between individual
operators. The goal of this stage is to reduce the amount of data passed between two
clusters. The data closeness, based on the number of common values between two
operator clusters $a$ and $b$, is:
$$C^{D}_{a,b} = \frac{Common(a,b)}{V(a) + V(b)}$$
where $Common(a,b)$ measures the number of common values between cluster $a$ and
cluster $b$, and $V(a)$ is the number of values flowing to and from $a$. For example, in Figure 7,
the "+" and "-" operators each have 2 out of 3 connections (B and C) in common.
Thus the data similarity is:
$$C^{D}_{+,-} = \frac{Common(+,-)}{3 + 3}$$
Figure 7. Data closeness calculation.
Schedule clustering. Since data clustering merges operators according to data
similarities only, it may prevent some operators that share hardware from being
scheduled simultaneously. As a result, this leads to a poor schedule. Schedule
clustering considers the potential low-level parallelism to ensure a reasonable schedule
with minimum hardware. The schedule closeness calculation for two clusters $a$ and $b$ is:
$$C^{S}_{a,b} = C^{D}_{a,b} \times (1 - INC_{a,b})$$
where $INC_{a,b}$ is the incompatibility of clusters $a$ and $b$. $INC_{a,b}$ measures the
incompatibility of all the operators in $a$ as compared with the operators in $b$. If any two
operators are incompatible, the penalty is calculated as the excess hardware needed to
put the two incompatible operators into the same partition while maintaining the same
schedule. Data clustering tends to group operators that share data. Meanwhile,
the incompatibility measurement tends to push incompatible operators into different
partitions.
Based on the information obtained from these clustering stages, the area and delay 
of different designs can be estimated for guiding scheduling, datapath allocation, and 
the selection of busses. The choice of which stages and the order to be run is decided 
by the user. 
4.2. Processor Synthesis 
BUD [McFa86, McKo90] uses a bottom-up analysis of the synthesis process in two
ways. First, it obtains the physical and logical information about the primitives
available for use in the design from a database [Wolf86]. Second, the data operations
are partitioned into clusters [McFa83], using a metric that takes into account functional
unit sharing, interconnect, and parallelism. Each cluster represents a portion of the
chip. A leaf cluster contains one or more function units. Nonterminal clusters contain
a set of leaf clusters and nonterminal clusters with the interconnects among them. The
database offers size and timing information for the individual modules. For example, the
floorplan of Figure 8(a) is shown in Figure 8(b). Clusters 1 and 2 are leaf clusters, which
consist of a functional unit and storage interconnected by busses. Clusters 3 and 4 are
nonterminal clusters, which consist of lower level clusters plus the wires interconnecting
them.
BUD groups operations using a metric that measures the closeness of putting two
operations into the same cluster. The closeness between operations depends on three
factors: their common functionality, degree of interconnection, and potential
parallelism. The closeness between operations $x$ and $y$ is defined as follows:
$$closeness(x,y) = -S_1 \times fprox(x,y) - S_2 \times cprox(x,y) + N \times S_3 \times par(x,y)$$
where
$$fprox(x,y) = \frac{fcost(x) + fcost(y) - fcost(x,y)}{fcost(x,y)}$$
$$cprox(x,y) = \frac{commconn(x,y)}{totalconn(x,y)}$$
$par(x,y) = 1$ if $x$ and $y$ can be done in parallel, and 0 otherwise.
Here $fcost(x,y,z,..)$ denotes the minimum number of function units required to perform
all the operations $x,y,z,..$ in the list. $fprox(x,y)$ denotes the ratio of the shared
functionality of $x$ and $y$. $cprox(x,y)$ is the ratio of the dataflow connections shared by $x$
and $y$. The $S_1$ and $S_2$ factors are the ratio of the area of the function unit required to
do operations $x_1$ and $x_2$ to the total area of the design. $S_3$ is the probability of either $x_1$
or $x_2$ being executed in one major cycle. $N$ denotes the relative weight given by the
user to speed.
Figure 8. An example of cluster: (a) the clusters; (b) the corresponding floorplan.
Based on this closeness function, a closeness matrix is computed. Then, a
hierarchical clustering tree is formed from these closeness data. Different configurations
are formed by cutting the tree at different levels. Each configuration represents a
particular hardware configuration. The cluster tree guides the search of the design
space. The cut line starts at the root and moves toward the leaves. Each time a new
cut line is formed, a new design configuration is evaluated in terms of area and delay.
The configuration that best meets the design objectives is chosen as the final design.
This partitioning approach leads to a simple method for systematically exploring the
space of possible designs to find the optimal design.
4.3. Floorplanning 
Floorplanning is the first step of VLSI chip design. Designers first partition the
chip into macro modules. Next they determine the areas, relative positions, aspect
ratios, and I/O pin locations of these modules and try to optimize the overall area
utilization, power dissipation, and delays along critical paths.
Many partitioning approaches have been proposed for solving floorplanning 
problems. These approaches can be divided into three groups: (i) cluster growth, (ii) 
connectivity clustering, and (iii) partitioning and slicing. 
Cluster growth. The cluster growth floorplanning method operates in a bottom-up
fashion. Preas and vanCleemput [Prva79] used a clustering method to estimate and
define cell shapes. Horng and Lie [HoLi81] build the floorplan by starting in the lower
left corner and clustering cells toward the upper right corner. The cluster growth
floorplanning method is easy to implement. However, the layout quality is not as good
as that of other methods.
Connectivity clustering. Dai and Kuh [DaKu86,DaEs89] introduced a connectivity
clustering method which provides a simultaneous solution of floorplanning and global
routing. Their approach consists of two steps: bottom-up clustering and top-down
space allocation.
In the bottom-up phase, modules are hierarchically clustered according to their
size and connectivity. The cluster size for each level is limited to five for two reasons:
(i) five is the minimal number of elements necessary to form a non-slicing floorplan
topology and (ii) the number of different floorplans for five components is 92, which can
be examined exhaustively. During the clustering stage, the optimal shape, aspect ratio,
and the information about connectivity among clusters are passed up along the cluster
tree. At the final cluster level, the chip has at most five components which contain
clusters formed at previous steps.
In the top-down phase, all different floorplans are evaluated starting at the top
level of the hierarchy. Since the number of components in each cluster level is limited
to five, all possible floorplans can be exhaustively examined. The floorplan with
minimal total area is chosen as the final design. This approach demonstrates that
hierarchical decomposition can simplify the floorplanning problem and produce high
quality results.
Partitioning and slicing. Lauther [Laut79] first applied the min-cut partitioning
approach to place general cells. Later, LaPotin and Director [LaDi86] applied the
min-cut method to solve the floorplanning problem. Using a min-cut method, the
rectangular chip area is first decomposed to form a slicing tree. To take
interconnectivity into account, LaPotin and Director proposed an in-place partitioning
method that is identical to the terminal propagation algorithm [DuKe85], which will be
described in the next section. The purpose of forming the slicing tree is to represent the
partitioning hierarchy. A min-cut partitioning and slicing tree formation example of
five modules is shown in Figure 9. The slicing tree determines the relative positions of
modules. After forming the slicing tree, a two-phase traversal is performed to determine
the absolute positions of the modules. In the first phase, a postorder traversal is used to
determine a set of possible floorplan dimensions. In the second phase, a preorder
traversal is performed to determine the aspect ratio and location of each module in the
slicing tree.
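The two-phase traversal can be illustrated with a small sketch. It assumes that each leaf module offers a short list of candidate (width, height) shapes, that internal nodes are tagged 'V' (children side by side) or 'H' (children stacked), and that the root option with minimum area is selected. The helper names, the shape data, and the selection rule are illustrative assumptions, not the algorithm of [LaDi86].

```python
def leaf(name, *shapes):
    """Leaf module with hypothetical candidate (width, height) shapes."""
    return {"cut": None, "name": name, "shapes": list(shapes)}

def cut(kind, left, right):
    """Internal slicing node: kind is 'V' (side by side) or 'H' (stacked)."""
    return {"cut": kind, "left": left, "right": right}

def postorder(node):
    """Phase 1: list every (width, height, left_idx, right_idx) option."""
    if node["cut"] is None:
        node["opts"] = [(w, h, None, None) for w, h in node["shapes"]]
        return node["opts"]
    left_opts = postorder(node["left"])
    right_opts = postorder(node["right"])
    node["opts"] = []
    for i, (wl, hl, *_) in enumerate(left_opts):
        for j, (wr, hr, *_) in enumerate(right_opts):
            if node["cut"] == "V":
                node["opts"].append((wl + wr, max(hl, hr), i, j))
            else:
                node["opts"].append((max(wl, wr), hl + hr, i, j))
    return node["opts"]

def preorder(node, k, x, y, placed):
    """Phase 2: fix option k at origin (x, y) and push choices to the leaves."""
    w, h, i, j = node["opts"][k]
    if node["cut"] is None:
        placed[node["name"]] = (x, y, w, h)
        return
    wl, hl = node["left"]["opts"][i][:2]
    preorder(node["left"], i, x, y, placed)
    if node["cut"] == "V":
        preorder(node["right"], j, x + wl, y, placed)
    else:
        preorder(node["right"], j, x, y + hl, placed)

# Hypothetical three-module slicing tree.
tree = cut("V",
           leaf("A", (2, 4), (4, 2)),
           cut("H", leaf("B", (2, 2)), leaf("C", (2, 2), (1, 4))))
opts = postorder(tree)
best = min(range(len(opts)), key=lambda k: opts[k][0] * opts[k][1])
placed = {}
preorder(tree, best, 0, 0, placed)
```

The sketch enumerates every combination of child options; a practical implementation would prune dominated shapes at each node to keep the option lists short.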
Another partitioning approach for floorplanning is Multi-Terrain
Partitioning [LuDe89a,LuDe89b]. There are three types of terrains for a datapath chip:
random logic, datapath stack, and large macros. The multi-terrain partitioning
approach uses a min-cut algorithm to partition the objects into terrains. Then, it
evaluates all possible terrain configurations and selects the optimal floorplan.
Another approach is termed Capacity-Based Partitioning [WuGa90]. This
approach dissects the layout area into area blocks according to the given constraints.
The algorithm estimates the transistor capacity for each area block, then uses a
seed-based multiway partitioning strategy to assign glue-logic components to area blocks.
The algorithm runs iteratively and selects the partition with the minimum total area as
the final floorplan.
Figure 9. The slicing tree formation. (a) Min-cut partitioning. (b) Slicing tree.
4.4. Placement
The goal of placement is to determine the positions of components on a layout. To
place hundreds or thousands of components and successfully satisfy a set of given
constraints is a very complex problem. To reduce the complexity of placement,
partitioning approaches are widely used for solving placement problems. Kernighan
and Lin [KeLi70] developed a two-way min-cut partitioning scheme for general graph
partitioning. Schweikert and Kernighan [ScKe72] extended the min-cut algorithm to
take into account special properties of electrical circuits such as multi-net connections.
Based on this min-cut partitioning foundation, many algorithms
[Schw76,Breu77,Corr79,Laut79,Burs82,DuKe85,SuKe87,BhHi88,Hill88] have been
reported for solving cell placement problems.
The basic concept of min-cut placement is to partition components into two
clusters so that the number of interconnections crossing the cut is minimized. The
min-cut algorithm is executed recursively until each cluster contains only a few cells.
With min-cut partitioning, it is not adequate to simply partition components into
clusters without considering the external connections. For instance, by swapping node
X and node Y (Figure 10), the partitioning cost increases by 1 if the external
connections between the nodes in block A and node X in block B are not taken
into account. However, the actual partitioning cost decreases by 1 when the external
connections are taken into account.
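The effect of ignoring external connections can be reproduced on a toy netlist. The sketch below is not a reconstruction of Figure 10: the node names, the edges, and the two already-placed external nodes `a1` and `a2` (pinned to side 0) are hypothetical, chosen only so that one swap looks worse when external edges are ignored and better when they are counted.

```python
def cut_cost(edges, side):
    """Count edges whose two endpoints lie on different sides of the cut."""
    return sum(1 for u, v in edges if side[u] != side[v])

# Hypothetical two-pin connections among the cells being partitioned.
internal = [("x", "p"), ("x", "q"), ("y", "p")]
# Hypothetical connections to cells fixed by an earlier partitioning step.
external = [("a1", "x"), ("a2", "x")]

def costs(side):
    """Return (cost ignoring external edges, cost including them)."""
    with_fixed = dict(side, a1=0, a2=0)  # external nodes are pinned to side 0
    return cut_cost(internal, side), cut_cost(internal + external, with_fixed)

before = {"y": 0, "p": 0, "x": 1, "q": 1}
after = {"x": 0, "p": 0, "y": 1, "q": 1}  # x and y swapped

internal_before, actual_before = costs(before)
internal_after, actual_after = costs(after)
```

Here the swap raises the internal-only cost from 1 to 2 but lowers the cost with external edges from 3 to 2, so a partitioner that sees only the internal edges would reject a swap that actually helps.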
Figure 10. Partitioning with external consideration.
To solve this problem, Dunlop and Kernighan [DuKe85] have proposed a
modification of the Kernighan and Lin min-cut algorithm. They introduced a terminal
propagation strategy to take the external connections into account more accurately.
For example, a penalty external cost PE_cost can be added when calculating the
partitioning cost. PE_cost can be (i) zero, (ii) negative, or (iii) positive. In case (i)
(Figure 11(a)), block B and block C are adjacent to block A. Thus, PE_cost can be set to
zero by swapping node X and node Y. In case (ii) (Figure 11(b)), if node X connects to
block A and block D is not adjacent to block A, PE_cost will be made negative by
swapping node X and node Y (it needs one more extra vertical routing track). In case
(iii) (Figure 11(c)), if element X connects to block B and block D is adjacent to block A,
PE_cost will be made positive by swapping node X and node Y (it reduces one vertical
routing track).
Figure 11. Terminal propagation: cases (a), (b), and (c).
Two improvements to min-cut placement have been reported: (i) Quadrisection
[SuKe87]. Instead of using a bi-partitioning approach, quadrisection partitions the given
set along horizontal and vertical division lines into four partitions simultaneously. This
approach obtains results comparable to the simulated annealing approach but with a
much shorter run time. (ii) The Min-Cut Shuffle [BhHi88], an approach that takes the
order of partitioning into account to achieve a solution with global considerations.
5. Conclusion 
This survey paper has presented the partitioning techniques used in different VLSI
design processes. The variety of partitioning implementations demonstrates
that partitioning methods are suitable for solving large scale problems. In the past
years, partitioning techniques have been widely used in layout synthesis. However, the
usage of partitioning techniques at the processor and system synthesis levels is still at
an early stage. Different approaches for system and processor partitioning need to be
investigated further. Thus, system and processor partitioning will become one of the
most active research areas in the years to come.
6. References
[BhHi88] Bhandari, I., Hirsch, M., and Siewiorek, D., "The Min-Cut Shuffle: Toward a
Solution for the Global Effect Problem of Min-Cut Placement," Proc. 25th DAC,
pp.681-685, 1988.
[Breu77] Breuer, M. A., "A Class of Min-Cut Placement Algorithms," Proc. 14th DAC,
pp.284-290, 1977.
[Burs82] Burstein, M., "Partitioning of VLSI Networks," Proc. 19th DAC, 1982.
[ChKu84] Cheng, C. K., and Kuh, E. S., "Module Placement Based on Resistive
Network Optimization," IEEE Trans. on CAD, vol. CAD-3, no. 3, pp.218-225, 1984.
[CoCa80] Cox, G. W., and Carroll, B. D., "The Standard Transistor Array (STAR), Part
II: Automatic Cell Placement Techniques," Proc. 17th DAC, pp.451-457, 1980.
[Corr79] Corrigan, L. I., "A Placement Capability Based on Partitioning," Proc. 16th
DAC, 1979.
[DaEs89] Dai, W. M., Eschermann, B., Kuh, E. S., and Pedram, M., "Hierarchical
Placement and Floorplanning in BEAR," IEEE Trans. on CAD, vol. 8, no. 12,
pp.1335-1349, 1989.
[DiTh89] Dirkes Lagnese, E., and Thomas, D. E., "Architectural Partitioning for
System Level Design," Proc. 26th DAC, pp.62-67, 1989.
[Dona88] Donath, W. E., "Logic Partitioning" in Physical Design Automation of
VLSI Systems (Preas, B. T., and Lorenzetti, M., editors), Benjamin/Cummings, 1988.
[DuKe85] Dunlop, A. E., and Kernighan, B. W., "A Procedure for Placement of
Standard-Cell VLSI Circuits," IEEE Trans. on CAD, vol. CAD-4, no. 1, pp.92-98,
January, 1985.
[Ever74] Everitt, B., Cluster Analysis, Heinemann Educational Books Ltd., 1974.
[FiMa82] Fiduccia, C. M., and Mattheyses, R. M., "A Linear-Time Heuristic for
Improving Network Partitions," Proc. 19th DAC, pp.175-181, 1982.
[FoFu62] Ford, L. R., and Fulkerson, D. R., Flows in Networks, Princeton University
Press, 1962.
[Gajs88] Gajski, D. D., Silicon Compilation, Addison-Wesley Publishing Company,
1988.
[Gajs83] Gajski, D. D., and Kuhn, R. H., "New VLSI Tools," Computer, vol. 16, no.
12, pp.11-14, December, 1983.
[Gare79] Garey, M. R., and Johnson, D. S., Computers and Intractability, A Guide to
the Theory of NP Completeness, W. H. Freeman and Co., San Francisco,
California, pp.209-210, 1979.
[HaKu72] Hanan, M., and Kurtzberg, J. M., "Placement Techniques" in Design
Automation of Digital Systems (Breuer, M. A., editor), Prentice-Hall, Inc.,
Englewood Cliffs, New Jersey, pp.213-282, 1972.
[HaWo76] Hanan, M., Wolff, Sr., P. K., and Agule, B. J., "Some Experimental
Results on Placement Techniques," Proc. 13th DAC, pp.214-224, 1976.
[Hill88] Hill, D. D., "Alternative Strategies for Applying Min-Cut to VLSI
Placement," Proc. ICCD, pp.440-444, 1988.
[HoLi81] Horng, C., and Lie, M., "An Automatic/Interactive Layout Planning System
for Arbitrarily-Sized Rectangular Building Blocks," Proc. 18th DAC,
pp.293-300, 1981.
[HuRo86] Huang, M. D., Romeo, F., and Sangiovanni-Vincentelli, A., "An Efficient
General Cooling Schedule for Simulated Annealing," Proc. ICCAD, pp.381-384, 1986.
[IoKi83] Iosupovici, A., King, C., and Breuer, M. A., "A Module Interchange
Placement Machine," Proc. 20th DAC, pp.171-174, 1983.
[Joha79] Johannsen, D., "Bristle Blocks: A Silicon Compiler," Proc. 16th DAC,
pp.310-313, 1979.
[John67] Johnson, S. C., "Hierarchical Clustering Schemes," Psychometrika,
pp.241-254, September, 1967.
[Kang83] Kang, S., "Linear Ordering and Application to Placement," Proc. 20th DAC,
pp.457-464, 1983.
[KeLi70] Kernighan, B. W., and Lin, S., "An Efficient Heuristic Procedure for
Partitioning Graphs," Bell System Technical Journal, vol. 49, no. 2,
pp.291-307, February, 1970.
[KiGe83] Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P., "Optimization by
Simulated Annealing," Science, vol. 220, no. 4598, pp.671-680, 1983.
[Kodr72] Kodres, U. R., "Partitioning and Card Selection" in Design Automation of
Digital Systems (Breuer, M. A., editor), Prentice-Hall, Inc., Englewood
Cliffs, New Jersey, pp.173-212, 1972.
[Kris84] Krishnamurthy, B., "An Improved Min-Cut Algorithm for Partitioning VLSI
Networks," IEEE Trans. on Computers, vol. C-33, pp.438-446, May, 1984.
[Kurt65] Kurtzberg, J. M., "Algorithms for Backplane Formation" in Microelectronics
in Large Systems, Spartan Books, pp.51-76, 1965.
[LaDi86] La Potin, D. P., and Director, S. W., "Mason: A Global Floorplanning
Approach for VLSI Design," IEEE Trans. on CAD, vol. CAD-5, no. 4,
pp.477-489, October, 1986.
[Laut79] Lauther, U., "A Min-Cut Placement Algorithm for General Cell Assemblies
Based on A Graph Representation," Proc. 16th DAC, pp.1-10, 1979.
[LuDe89a] Luk, W. K., and Dean, A. A., "Multi-Stack Optimization for Data-Path
Chip (Microprocessor) Layout," Proc. 26th DAC, pp.110-115, 1989.
[LuDe89b] Luk, W. K., Dean, A. A., and Mathews, J. W., "Multi-Terrain Partitioning
and Floorplanning for Data-Path Chip (Microprocessor) Layout," Proc.
ICCAD'89, pp.492-495, 1989.
[McFa83] McFarland, S.J. M. C., "Computer-Aided Partitioning of Behavioral
Hardware Descriptions," Proc. 20th DAC, pp.472-478, 1983.
[McFa86] McFarland, S.J. M. C., "Using Bottom-Up Design Techniques in the
Synthesis of Digital Hardware from Abstract Behavioral Descriptions," Proc.
23rd DAC, pp.330-336, 1986.
[McKo90] McFarland, S.J. M. C., and Kowalski, T. J., "Incorporating Bottom-Up
Design into Hardware Synthesis," IEEE Trans. on CAD, vol. 9, no. 9,
pp.938-950, September, 1990.
[MacNa64] MacNaughton-Smith, P., Williams, W. T., Dale, N. B., and Mockett, L. G.,
"Dissimilarity Analysis," Nature, Lond., 202, 1034-1035, 1964.
[OdHa87] Odawara, G., Hamuro, T., Iijima, K., Yoshino, T., and Dai, Y., "A
Rule-based Placement System for Printed Wiring Boards," Proc. 22nd DAC,
pp.777-785, 1987.
[PrKa88] Preas, B. T., and Karger, P. G., "Placement, Assignment and
Floorplanning" in Physical Design Automation of VLSI Systems (Preas, B.
T., and Lorenzetti, M., editors), Benjamin/Cummings, 1988.
[Prva79] Preas, B. T., and vanCleemput, W. M., "Placement Algorithms for
Arbitrarily Shaped Blocks," Proc. 16th DAC, pp.474-480, 1979.
[Rome84] Romesburg, H. C., Cluster Analysis for Researchers, Wadsworth, Inc.,
1984.
[RoSa84] Romeo, F., Sangiovanni-Vincentelli, A., and Sechen, C., "Research on
Simulated Annealing at Berkeley," Proc. of the Intl. Conf. on Computer
Design, pp.652-657, 1984.
[RoSa85] Romeo, F., and Sangiovanni-Vincentelli, A., "Probabilistic Hill Climbing
Algorithms: Properties and Applications," Proc. of the 1985 Chapel Hill
Conf. on VLSI, pp.393-417, 1985.
[Sang87] Sangiovanni-Vincentelli, A., "Automatic Layout of Integrated Circuits" in
Design Systems for VLSI Circuits: Logic Synthesis and Silicon
Compilation (DeMicheli, Sangiovanni-Vincentelli, Antognetti, editors),
Kluwer Academic Publishers, 1987.
[SaRa90] Saab, Y. G., and Rao, V. B., "Fast Effective Heuristics for the Graph
Bisectioning Problem," IEEE Trans. on CAD, vol. 9, no. 1, pp.91-98, January, 1990.
[Schw76] Schweikert, D. G., "A 2-Dimensional Placement Algorithm for The Layout of
Electrical Circuits," Proc. 13th DAC, pp.408-416, 1976.
[ScKe72] Schweikert, D. G., and Kernighan, B. W., "A Proper Model for the
Partitioning of Electrical Circuits," Proc. 9th DAC, pp.56-62, 1972.
[ScUl72] Schuler, D. M., and Ulrich, E. G., "Clustering and Linear Placement," Proc.
9th DAC, pp.475-481, 1972.
[Snea57] Sneath, P. H. A., "The Application of Computers to Taxonomy," J. gen.
Microbiol., 17, 201-226, 1957.
[Spat80] Spath, H., Cluster Analysis Algorithms, Ellis Horwood Ltd., 1980.
[SuKe87] Suaris, P. R., and Kedem, G., "Quadrisection: A New Approach to Standard
Cell Layout," Proc. ICCAD'87, pp.474-477, 1987.
[WeCh89] Wei, Y-C., and Cheng, C-K., "Towards Efficient Hierarchical Designs by
Ratio Cut Partitioning," Proc. ICCAD'89, pp.298-301, 1989.
[Wolf86] Wolf, W., "An Object-Oriented, Procedural Database for VLSI Chip
Planning," Proc. 23rd DAC, 1986.
[WuGa90] Wu, A. C. H., and Gajski, D., "Partitioning Algorithms for Layout Synthesis
from Register-Transfer Netlists," Proc. ICCAD'90, 1990.
