ENHANCING PERFORMANCE OF ITERATIVE HEURISTICS FOR VLSI NETLIST PARTITIONING by Sait, Sadiq M. et al.
ENHANCING PERFORMANCE OF ITERATIVE HEURISTICS 
FOR VLSI NETLIST PARTITIONING 
Sadiq M. Sait, Aiman H. El-Maleh and Raslan H. Al-Abaji
King Fahd University of Petroleum and Minerals, Computer Engineering, 
Dhahran 31261, Saudi Arabia 
{sadiq,aimane,raslan}@ccse.kfupm.edu.sa
ABSTRACT 
In this paper, we present a new heuristic called PowerFM, which is a modification of the well-known Fiduccia-Mattheyses algorithm for VLSI netlist partitioning. PowerFM considers the minimization of power consumption due to the nets cut. The advantages of using PowerFM as an initial solution generator for other iterative algorithms, in particular Genetic Algorithm (GA) and Tabu Search (TS), for multiobjective optimization are investigated. A series of experiments is conducted on ISCAS-85/89 benchmark circuits to evaluate the efficiency of the PowerFM algorithm. Results suggest that this heuristic would provide a good starting solution for multiobjective optimization using iterative algorithms.
1. INTRODUCTION 
In recent years, the focus of portable devices has shifted from low throughput devices (e.g., watches, calculators) to high performance devices like notebook computers, cellular phones, etc. Minimizing power is the primary concern for these battery-powered products, as longer battery life translates to extended use and better marketability. Exploring the tradeoffs between power, performance, and other objectives during synthesis and physical design is thus demanding more attention.
The optimization for power consumption can be performed at various levels of VLSI design, including the behavioral, architectural, logic, and physical levels. Another compelling reason for the desire for low power consumption is the increasing density of VLSI circuits. Present technology allows integration of tens of millions of transistors on a single chip, and still-advancing technology is allowing further integration. The excessive power consumption of high density circuits results in heating and thus becomes a hindrance to higher integration and hence to the feasible packaging of circuits [1, 2]. Also, circuits are operating at much higher clock frequencies than before. Therefore, power dissipation, which is a function of clock frequency, is becoming significantly more prominent. This phenomenon is an obstacle to further increases in clock frequency. For these reasons, there is an emerging need for minimizing the power requirement of VLSI circuits. For the partitioning phase, two low-power oriented techniques based on the Simulated Annealing (SA) algorithm have recently been presented in [3]. An enumerative optimal delay partitioning algorithm targeting low power is proposed by Vaishnav et al. in [4].
1.1. FM Partitioning Heuristic 
The FM heuristic is a modification of the Kernighan-Lin group migration method for circuit partitioning. In the FM algorithm, all nodes initially in the free set are arranged into a bucket array data structure, in which each bucket contains nodes with the same gain. For each move, the node with the highest gain is considered as the primary candidate to be moved from its current block (From block) to its complementary block (To block). The candidate node must satisfy the balance criterion, used to control the size of subcircuits. If the candidate node does not meet the balance criterion, the node with the next highest gain is selected from the free nodes subset and moved. The moved node is locked and eliminated from the bucket array. The move is completed by modifying the gains of all nodes connected to the critical nets. At the end of a pass, all cells are freed and the process is repeated until we reach a position where no further gain can be achieved. The best partition encountered during the pass is taken as the output of the pass. The number of cells to move is given by the value of $k$ which yields the maximum positive gain $G_k$, where $G_k = \sum_{i=1}^{k} g_i$. Only the cells given by the best sequence, that is $c_1, c_2, \ldots, c_k$, are permanently moved to their complementary blocks. Then all cells are freed and the procedure is repeated from the beginning. A general description of the heuristic is given in Fig. 1. The best candidate node is defined according to the highest cut-gain associated with moving a node from one subcircuit to another. These gains are measured using the net-cut model [8]. A net is called a cut net if it belongs to the current cut set; otherwise the net is referred to as an uncut net. A net is called critical if it is a cut net that, as a result of moving a single node, can become an uncut net, or vice versa.
ALGORITHM FM
Begin
Step 1: Compute gains of cells;
Step 2: i = 1;
        Select 'base cell' and call it c_i;
        If no base cell Then Exit EndIf;
        A base cell is the one which
          (i) has maximum gain;
          (ii) satisfies balance criterion;
        If tie Then use size criterion or
          internal connections;
        EndIf;
Step 3: Lock cell c_i;
        Update gains of cells of those affected critical nets;
Step 4: If free cells ≠ ∅
        Then i = i + 1;
          select next base cell c_i;
          Goto Step 1
End.
Figure 1. Fiduccia-Mattheyses bipartitioning algorithm [5].
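The choice of the best move sequence described above (the prefix of tentative moves with the largest cumulative gain $G_k$) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation; the function name `best_prefix` and the plain list of per-move gains are assumptions made here.

```python
from itertools import accumulate


def best_prefix(gains):
    """Return (k, G_k) for the prefix of moves with maximum cumulative gain.

    gains[i] is the gain g_{i+1} of the (i+1)-th tentative move of a pass.
    Returns (0, 0) when no prefix has positive cumulative gain, meaning
    that no move should be made permanent.
    """
    best_k, best_gain = 0, 0
    for k, g_k in enumerate(accumulate(gains), start=1):
        if g_k > best_gain:  # strict comparison: ties keep the shorter prefix
            best_k, best_gain = k, g_k
    return best_k, best_gain
```

For the gain sequence 2, -1, 3, -4, 1 the cumulative gains are 2, 1, 4, 0, 1, so only the first three moves would be kept.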
The basic concept of min-cut gain calculation under the net-cut model can be explained as follows. Let node $i$ be connected to $n$ critical cut nets and to $m$ critical uncut nets. The gain associated with the reassignment of node $i$ is defined as the difference:
$G_i = n - m$ (1)
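As an illustration of Eq. 1, the following Python sketch counts the critical cut and critical uncut nets of a node for a two-way partition. The data layout (a dictionary of nets and a dictionary assigning each cell to block 0 or 1) and the name `fm_gain` are assumptions chosen for this example, not part of the original work.

```python
def fm_gain(cell, nets, side):
    """Cut gain n - m of moving `cell` to the other block (Eq. 1).

    nets: dict mapping net name -> list of cells on that net
    side: dict mapping cell -> 0 or 1 (its current block)
    """
    src = side[cell]
    n = m = 0
    for cells in nets.values():
        if cell not in cells:
            continue
        in_src = sum(1 for c in cells if side[c] == src)
        in_dst = len(cells) - in_src
        if in_src == 1 and in_dst > 0:
            n += 1  # critical cut net: it becomes uncut after the move
        elif in_dst == 0 and in_src > 1:
            m += 1  # critical uncut net: it becomes cut after the move
    return n - m
```

With nets n1 = {a, b}, n2 = {a, c}, n3 = {a, b, c} and cells a, c in block 0 and b in block 1, moving b removes both n1 and n3 from the cut, so its gain is 2.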
0-7803-8163-7/03/$17.00 © 2003 IEEE
ICECS-2003
In this work, an extension to the FM algorithm that considers optimizing power as the main objective of partitioning is presented.
2. PROBLEM FORMULATION AND COST FUNCTIONS 
This work addresses the problem of VLSI netlist partitioning with the objectives of optimizing power consumption, timing performance (delay), and cut-set while considering the balance constraint (the same as the area constraint, since unit area is assumed for every gate). Formally, the problem can be stated as follows:
Given a set of modules $V = \{v_1, v_2, \ldots, v_n\}$, the purpose of partitioning is to assign the modules to a specified number of clusters $k$ (two in our case) satisfying prescribed properties. In general, a circuit can have multi-pin connections (nets) apart from two-pin ones. Our task is to divide $V$ into 2 subsets (blocks) $V_0$ and $V_1$ in such a way that the objectives are optimized, subject to some constraints.
Cutsize: The cutsize cost function can be written as follows:
Minimize $f = \sum_{e \in \psi} w(e)$ (2)
where $\psi \subset E$ denotes the set of off-chip wires. The weight $w(e)$ on the edge $e$ represents the cost of wiring the corresponding connection as an external wire.
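The cutsize objective can be illustrated with a short Python sketch. It assumes a net is off-chip exactly when its cells lie in both blocks, and it uses a unit weight for any net without an explicit weight; the name `cut_size` and the dictionary layout are choices made for this example only.

```python
def cut_size(nets, side, weight=None):
    """Sum of w(e) over all nets that have cells in both blocks (Eq. 2).

    nets:   dict mapping net name -> list of cells on that net
    side:   dict mapping cell -> 0 or 1
    weight: optional dict mapping net name -> w(e); missing nets weigh 1
    """
    weight = weight or {}
    total = 0
    for name, cells in nets.items():
        if len({side[c] for c in cells}) > 1:  # the net spans both blocks
            total += weight.get(name, 1)
    return total
```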
Delay: In the general delay model, where gate delay $d(v)$ and constant inter-chip wire delay $d_c$ are considered, $d_c \gg d(v)$, where $d_c$ is due to the off-chip capacitance, denoted $C_{off}$. Let the delay of node $v_i \in V$ be $d(v_i)$ and the delay of a net $e \in E$ which is cut be $d_c$. Given a partition $(V_A, V_B)$, the path delay $d(p_{ij})$ between nodes $v_i$ and $v_j$ is the sum of the delays $d(v_i)$ of the nodes $v_i \in V(p_{ij})$ and the delay of the nets which are cut, that is:
Minimize $d(p_{ij}) = \sum_{v_i \in V(p_{ij})} d(v_i) + d_c \times ncut(p_{ij})$ (3)
Power: The average dynamic power consumed by a CMOS logic gate in a synchronous circuit is given by:
$P_i^{avg} = 0.5 \times \frac{V_{dd}^2}{T_{cycle}} \times C_i^{load} \times N_i$ (4)
where $C_i^{load}$ is the load capacitance, $V_{dd}$ is the supply voltage, $T_{cycle}$ is the global clock period, and $N_i$ is the number of gate output transitions per clock cycle. $N_i$ is calculated using the symbolic simulation technique of [6] under a zero delay model. $C_i^{load}$ in Eq. 4 consists of two components: $C_i^{basic}$, which accounts for the load capacitances driven by a gate before circuit partitioning, and the extra load $C_i^{extra}$, which accounts for the additional load capacitance due to the external connections of the net after circuit partitioning. Then, the total power dissipation of any circuit C is obtained by summing the per-gate power of Eq. 4 over all gates of C (Eq. 5).
Balance: The balance constraint (Eq. 6) is stated in terms of the number of cells in partition $i$, the total number of cells in the circuit, and the balance factor $\alpha$ ($0.5 < \alpha < 1.0$).
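The delay and power cost terms of this section can be sketched in Python as follows. Two assumptions are made and should be read as such: a path is given as an ordered list of nodes and a connection on it counts as cut when consecutive nodes lie in different blocks, and the per-gate power uses the standard CMOS dynamic power expression 0.5 × Vdd² / T_cycle × C_load × N, with the circuit total taken as the sum over gates. All function and parameter names are hypothetical.

```python
def path_delay(path, gate_delay, side, d_c):
    """Path delay: node delays plus d_c for every cut connection on the path.

    Assumption: a connection between consecutive path nodes is cut when
    the two nodes are assigned to different blocks.
    """
    node_part = sum(gate_delay[v] for v in path)
    ncut = sum(1 for u, v in zip(path, path[1:]) if side[u] != side[v])
    return node_part + d_c * ncut


def gate_power(c_basic, c_extra, n_transitions, vdd, t_cycle):
    """Average dynamic power of one gate with C_load = C_basic + C_extra."""
    c_load = c_basic + c_extra
    return 0.5 * vdd ** 2 / t_cycle * c_load * n_transitions


def total_power(gates, vdd, t_cycle):
    """Sum of gate powers; `gates` holds (c_basic, c_extra, n) tuples."""
    return sum(gate_power(cb, ce, n, vdd, t_cycle) for cb, ce, n in gates)
```

Since partitioning only changes the extra load of gates driving cut nets, a move affects `total_power` only through the `c_extra` entries, which is what makes a switching-activity-weighted cut a natural proxy for power.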
2.1. Overall Fuzzy Cost Function 
In order to solve the multiobjective partitioning problem, linguis- 
tic variables are defined as: cut-set, power dissipation, delay and 
balance. The following fuzzy rule is used to combine the con- 
flicting objectives: 
IF a solution has 
Small cut-set AND
Low power consumption AND
Short delay AND
Good Balance 
THEN it is a GOOD solution. 
Figure 2. Membership functions.
The above rule is translated to an and-like OWA fuzzy operator [7], and the membership $\mu^c(x)$ of a solution $x$ in the fuzzy set good solution is given as:
$\mu^c(x) = \beta^c \times \min(\mu_p^c(x), \mu_d^c(x), \mu_c^c(x), \mu_b^c(x)) + (1 - \beta^c) \times \frac{1}{4} \sum_{j=p,d,c,b} \mu_j^c(x)$ (7)
where $\mu^c(x)$ is the membership of solution $x$ in the fuzzy set of acceptable solutions, and $\mu_j^c(x)$, for $j = p, d, c, b$, are the membership values in the fuzzy sets "within acceptable power", "within acceptable delay", "within acceptable cut-set" and "within acceptable balance", respectively. $\beta^c$ is a constant in the range [0, 1], and the superscript $c$ represents the cost. In this paper, $\mu^c(x)$ is used as the aggregating function. The solution that results in the maximum value of $\mu^c(x)$ is reported as the best solution found by the search heuristic.
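The and-like OWA aggregation can be illustrated by the short Python sketch below, which combines the minimum and the arithmetic mean of the individual memberships with weight beta. The function name and the dictionary keyed by objective (p, d, c, b) are assumptions made for the example.

```python
def owa_and_like(memberships, beta):
    """And-like OWA: beta * min(mu_j) + (1 - beta) * mean(mu_j).

    memberships: dict mapping objective name -> membership value in [0, 1]
    beta:        constant in [0, 1]; beta = 1 gives the pure min operator
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    values = list(memberships.values())
    return beta * min(values) + (1.0 - beta) * sum(values) / len(values)
```

For memberships of 0.5, 1.0, 0.5 and 1.0 with beta = 0.5, the minimum is 0.5 and the mean is 0.75, giving an overall membership of 0.625.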
The membership functions for the fuzzy sets Low power consumption, Short delay and Small cut-set are shown in Fig. 2(a). We can vary the preference of an objective $j$ in the overall membership function by changing the value of $g_j$, which represents the relative acceptable limit for each objective, where $g_j \geq 1.0$. Fig. 2(b) represents the membership function for the fuzzy set Good balance. $O_i$ is the estimate of the lower bound on the cost of an individual $i$, and $C_i$ is the actual cost of $i$. The $O_i$'s are independent of the iteration; therefore, they are estimated only at the beginning, whereas $C_i$ has to be calculated in every iteration for every element.
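The exact shape of the membership functions is given only in Fig. 2, which is not reproduced in the text. As a purely illustrative sketch, the Python function below assumes a piecewise-linear shape for the cost objectives (power, delay, cut-set): full membership while the cost does not exceed its lower bound, dropping linearly to zero when the cost reaches g times the lower bound. The shape, the function name and the parameter names are all assumptions.

```python
def membership(cost, lower_bound, g):
    """Assumed piecewise-linear membership in 'within acceptable cost'.

    Returns 1.0 when cost <= lower_bound, falls linearly to 0.0 at
    cost = g * lower_bound, and is 0.0 beyond that point.
    """
    if g < 1.0:
        raise ValueError("g must be at least 1.0")
    ratio = cost / lower_bound
    if ratio <= 1.0:
        return 1.0
    if ratio >= g:
        return 0.0
    return (g - ratio) / (g - 1.0)
```

Under this assumption, a larger g tolerates costs further above the lower bound before the membership reaches zero, which is one way to relax the preference for that objective.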
3. POWERFM HEURISTIC 
PowerFM is a modification of the FM algorithm which seeks to minimize the power consumption due to the cut. All concepts of FM are maintained; the major difference is that the gain is calculated from the sum of the switching probabilities of the cut nets. Some other necessary modifications are made in parts of the algorithm, as discussed in what follows.
3.1. Power Gain Calculation 
The power gain for a cell $i$ is calculated using Eqn. 8. $X_i$ is the set of critical cut nets and $U_i$ is the set of critical uncut nets.
ALGORITHM Compute Cell gains;
Begin
  For each free cell i Do
    g(i) ← 0;
    F ← From block of cell i;
    T ← To block of cell i;
    For each net n on cell i Do
      If F(n) = 1
        Then g(i) = g(i) + (C_off × Sw prob of driving net)
        (Cell i is the only cell in the From block connected to net n.)
      If T(n) = 0
        Then g(i) = g(i) − (C_off × Sw prob of driving net)
        (All of the cells connected to net n are in the From block.)
    EndFor
  EndFor
End
Figure 3. Procedure to compute gains of free cells.
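The gain procedure of Fig. 3 can be sketched in Python as follows. The data layout (dictionaries for nets, block assignment and per-net switching probability of the driving gate) and the names `power_gains`, `sw_prob`, `c_off` and `locked` are assumptions made for illustration; the sketch recomputes all gains from scratch rather than using the bucket structure of FM.

```python
def power_gains(nets, side, sw_prob, c_off, locked=()):
    """Power gain of every free cell, following the steps of Fig. 3.

    nets:    dict mapping net name -> list of cells on that net
    side:    dict mapping cell -> 0 or 1 (current block)
    sw_prob: dict mapping net name -> switching probability of its driver
    c_off:   off-chip capacitance charged to every cut net
    locked:  cells that are no longer free in the current pass
    """
    gains = {}
    for cell in side:
        if cell in locked:
            continue
        src = side[cell]
        g = 0.0
        for net, cells in nets.items():
            if cell not in cells:
                continue
            f_n = sum(1 for c in cells if side[c] == src)  # cells in From block
            t_n = len(cells) - f_n                         # cells in To block
            if f_n == 1:  # cell is the only one of the net in the From block
                g += c_off * sw_prob[net]
            if t_n == 0:  # all cells of the net are in the From block
                g -= c_off * sw_prob[net]
        gains[cell] = g
    return gains
```

Compared with the plain FM gain, each critical net contributes its switching-activity-weighted capacitance instead of a unit count, so a rarely switching net that is cut costs less than a frequently switching one.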
In each pass, the gain of every free cell is updated according to the Compute Cell Gains algorithm shown in Fig. 3. Let F(n) be the number of cells connected to net n in the From block (current block) of the moved cell i. Let T(n) be the number of cells connected to net n in the To block (destination block) of the moved cell i. When computing the gain we consider only the critical nets: a net is critical if it has a cell which, if moved, will change its cut state (from cut to uncut, or vice versa). If F(n) = 1, then moving cell i will increase the gain by C_off × Sw prob of driving net, and if T(n) = 0, then moving cell i will decrease the gain by C_off × Sw prob of driving net.
3.2. GA and TS
Genetic Algorithm is an elegant search technique that emulates the process of natural evolution as a means of progressing towards the optimal solution. The algorithm starts with a set of initial solutions called the population, which is generated randomly. In each iteration (known as a generation in GA terminology), all the individual chromosomes in the population are evaluated using a fitness function. Then, in the selection step, two of these chromosomes at a time are selected from the population. The individuals having higher fitness values are more likely to be selected. After the selection step, different operators, namely crossover and mutation, act on the selected individuals to evolve new individuals called offspring.
Tabu Search starts from an initial feasible solution and carries out its search by making a sequence of random moves or perturbations. A Tabu list is maintained which stores the attributes of a number of previous moves. This list prevents taking the search process back to recently visited states. In each iteration, a subset of neighbor solutions is generated by making a certain number of moves, and the best move (the move that resulted in the best solution) is accepted, provided it is not in the Tabu list. Otherwise, if the said move is in the Tabu list, it is accepted only if it leads to a solution better than the best solution found so far (aspiration criterion). Thus, the aspiration criterion can override the Tabu list restrictions. The solution encoding and initialization steps are similar to those described above for GA. These two multiobjective optimization iterative algorithms (GA and TS) for VLSI partitioning were proposed in [8], [9], [10].
4. EXPERIMENTAL RESULTS
A series of experiments was performed on ISCAS-85/89 benchmark circuits; the results are analyzed and reported in this section. Table 1 shows a comparison of the results of TS and GA when the initial solution is chosen at random or provided by PowerFM, for both algorithms. Table 1 also shows the results obtained from PowerFM when used on its own. P_avg refers to the average power of the results obtained from 100 runs of PowerFM. The notation in Table 1 is as follows: D(ps) stands for delay, measured in picoseconds; Cut is the number of nets cut; P(sp) is the power dissipation measured in terms of switching probability; T(s) is the total time taken by the whole run of PowerFM.
When starting from a random solution, it was observed that TS outperforms GA in terms of final solution costs and execution time. These two algorithms are complex and take relatively more execution time than PowerFM. The idea of using PowerFM as a starting solution for iterative algorithms is relevant because PowerFM proved to be an extremely fast algorithm compared to GA and TS (at least 100 times faster), with reasonable performance. This saves a lot of time for algorithms like GA and TS, where the convergence rate is slow. Furthermore, the results showed that GA and TS were able to improve the solutions provided by PowerFM. Fig. 4 and Fig. 5 show the performance of GA and TS, respectively, when applied to the circuit s1488 starting from an initial solution provided by PowerFM. In Fig. 4 and Fig. 5, (a) shows the number of nets cut, (b) shows the longest path delay of the circuit in picoseconds, (c) shows the power dissipation, (d) shows the cell difference between the two partitions, and the remaining plot tracks the overall quality of solution.
Figure 4. Genetic Algorithm starting from PowerFM for circuit s1488.
It can be noted that, for most of the circuits, a starting solution provided by PowerFM yields better solution quality than a random starting solution. An important point to notice also is that, although TS performed better than GA when starting from a random solution, GA proves to be more efficient than TS when starting from PowerFM. This is due to the fact that GA starting from a good solution has the ability to inherit its good characteristics and improve on them, and so benefits more than TS from a good solution provided by PowerFM. This is noted when the results of TS (starting from PowerFM) and GA (starting from PowerFM) in Table 1 are compared; it can be seen that GA is better for large circuits (s3330 ... s15850) in terms of power and cutset. The results showed that it is beneficial to use PowerFM as a starting solution for multiobjective GA and TS. Moreover, looking at the results of PowerFM alone, it provided comparably impressive results in terms of power and cutset, considering that its main aim is to optimize power.
Table 1. Start from PowerFM versus random start for GA and TS.
Figure 5. Tabu Search algorithm starting from PowerFM for circuit s1488.
5. CONCLUSIONS 
In this paper, we proposed PowerFM, a new modification of the FM algorithm which targets power optimization. The possibility of using the algorithm as a provider of initial solutions for other iterative multiobjective algorithms, in particular GA and TS, was investigated. GA performed better than TS when starting from a solution provided by PowerFM. The PowerFM results were important due to its speed and the good quality of the final solution. A series of experiments was performed, analyzed and reported to evaluate the efficiency of the algorithm. Results suggest that the algorithm is efficient for optimizing power, and would provide a good starting solution for multiobjective optimization using Genetic and Tabu Search partitioning algorithms.
Acknowledgment 
The authors thank King Fahd University of Petroleum & Minerals, Dhahran, Saudi Arabia, for support under project # COE/ITERATE/221.
6. REFERENCES 
[1] M. Pedram. CAD for Low Power: Status and Promising Directions. IEEE International Symposium on VLSI Technology, Systems and Applications, pages 331-336, 1995.
[2] U. Narayanan, G. I. Stamoulis, and R. Roy. Characterizing Individual Gate Power Sensitivity in Low Power Design. 12th International Conference on VLSI Design, pages 625-628, January 1999.
[3] I. S. Choi and S. Y. Hwang. Circuit Partitioning Algorithm for Low-Power Design Under Area Constraints Using Simulated Annealing. IEE Proc. Circuits Devices Systems, 146(1):8-15, February 1999.
[4] H. Vaishnav and M. Pedram. Delay optimal partitioning targeting low power VLSI circuits. IEEE Trans. on Computer Aided Design, 18(6):298-301, June 1999.
[5] C. M. Fiduccia and R. M. Mattheyses. A Linear-Time Heuristic for Improving Network Partitions. pages 175-181, 1982.
[6] A. Ghosh, S. Devadas, K. Keutzer, and J. White. Estimation of Average Switching Activity in Combinational and Sequential Circuits. Design Automation Conference, pages 253-259, 1992.
[7] R. R. Yager. On Ordered Weighted Averaging Aggregation Operators in Multicriteria Decisionmaking. IEEE Transactions on Systems, Man, and Cybernetics, 18(1), January 1988.
[8] R. H. Al-Abaji. Evolutionary Techniques for Multi-objective VLSI Netlist Partitioning. Master's thesis, King Fahd University of Petroleum and Minerals, Dhahran, Kingdom of Saudi Arabia, May 2002.
[9] Sadiq M. Sait, Aiman El-Maleh, and Raslan Al-Abaji. Simulated evolution algorithm for multiobjective VLSI netlist bi-partitioning. Proc. of the IEEE International Symposium on Circuits and Systems (ISCAS), V:457-460, May 2003.
[10] Sadiq M. Sait, Aiman El-Maleh, and Raslan Al-Abaji. General iterative heuristics for VLSI multiobjective partitioning. Proc. of the IEEE International Symposium on Circuits and Systems (ISCAS), V:497-500, May 2003.
