Design partitioning and layer assignment for 3D integrated circuits using tabu search and simulated annealing  by Sait, Sadiq M. et al.
Available online at www.sciencedirect.com
Journal of Applied Research and Technology 14 (2016) 67–76
www.jart.ccadet.unam.mx
Original
Design partitioning and layer assignment for 3D integrated circuits using
tabu search and simulated annealing
Sadiq M. Sait ∗, Feras Chikh Oughali, Mohammed Al-Asli
Center for Communications & IT Research, Computer Engineering Department, King Fahd University of Petroleum & Minerals, Dhahran 31261, Saudi Arabia
Received 30 August 2015; accepted 19 November 2015
Available online 22 January 2016
Abstract
3D integrated circuits (3D-ICs) are an emerging technology with considerable potential. 3D-ICs enjoy a small footprint area and vertical interconnections between different dies, which allow shorter wirelength among gates. Hence, they exhibit both lower interconnect delay and lower power consumption. The design flow of 3D integrated circuits consists of many steps, the first of which is 3D partitioning and layer assignment. This step is of significant importance, as its outcome influences the performance of subsequent steps. Like other partitioning problems, this one is NP-hard. The approach taken to address this critical task is the application of iterative heuristics (Sait & Youssef, 1999), as they have proven to be of great value in handling such problems. Many aspects are taken into consideration when solving this problem, including layer assignment, location of I/O terminals, TSV minimization, and area balancing. Tabu Search and Simulated Annealing are employed and engineered to tackle this task. Results on well-known benchmarks show that both techniques produce high-quality solutions. The average percentage of the area deviation between layers is around 2.4% and the total number of required TSVs is reduced.
All Rights Reserved © 2015 Universidad Nacional Autónoma de México, Centro de Ciencias Aplicadas y Desarrollo Tecnológico. This is an open access item distributed under the Creative Commons CC License BY-NC-ND 4.0.
Keywords: Through-silicon via (TSV); 3D integrated circuits (3D ICs); Iterative heuristics; Tabu search; Simulated annealing; Combinatorial optimization; Multi-way partitioning; NP-hard problems
1. Introduction
The rapid advancement in technology has allowed devices to become faster and to be fabricated in smaller sizes. These factors have helped to enable higher integration densities and decreased circuit delay. Nonetheless, this high density on chips requires longer interconnections, which leads to greater interconnect delays. Consequently, related issues such as signal integrity and power consumption have become the bottleneck of today's technology. Fortunately, 3D integrated circuits (3D-ICs) have emerged as an approach to overcome interconnect delay and its consequences in 2D-ICs (Baliga, 2004). 3D-ICs enjoy a smaller footprint area, and the vertical interconnections between different dies allow shorter wirelength among gates. Moreover, 3D-ICs enhance system integration by either increasing functionality or combining different technologies (Davis et al., 2005).
∗ Tel.: +966 503826267.
E-mail addresses: sadiq@kfupm.edu.sa (S.M. Sait), oughali@kfupm.edu.sa (F.C. Oughali), alasli@kfupm.edu.sa (M. Al-Asli).
Peer Review under the responsibility of Universidad Nacional Autónoma de México.
http://dx.doi.org/10.1016/j.jart.2015.11.001
1665-6423/All Rights Reserved © 2015 Universidad Nacional Autónoma de México, Centro de Ciencias Aplicadas y Desarrollo Tecnológico.
3D-ICs are built up of an IC stack with short vertical interconnections between adjoining dies using through-silicon vias (TSVs). Despite their attractiveness in mitigating congestion and reducing wirelength, TSVs occupy significant silicon area. In addition to increasing die area, excessive use of TSVs has a negative impact on wirelength in the 3D design (Kim, Mukhopadhyay, & Lim, 2009). There are two types of TSVs: via-first TSVs, which reside within the device layer only, and via-last TSVs, which occupy both device and metal layers, as illustrated in Fig. 1. Both types have a much larger area footprint than other components (e.g., wires, local vias and gates) (Gerousis, 2010; Beyne et al., 2008). Therefore, usage of TSVs must be kept minimal in order to exploit their advantages and reduce their negative impact.
Fig. 1. 3D structure: via-first and via-last TSVs. (Labels: via-first TSV, via-last TSV, metal layer, device layer, epoxy.)
Depending on the stacking style, 3D integration is categorized into three types. The basic face-to-face (F2F) type stacks only two chips face-to-face. For more than two layers, face-to-back (F2B) or back-to-back (B2B) stacking is inevitable. F2B is the most commonly used multi-chip stacking process; hence it is adopted in this paper. Although B2B stacking is also used to stack multiple chips, it requires two TSVs to link two adjacent dies and leads to a larger delay.
In this work, we present a multi-objective circuit partitioning and layer assignment technique that takes into consideration TSV count minimization and area balancing across all dies. We use the simulated annealing (SA) and tabu search (TS) iterative optimization heuristics to achieve these objectives. The rest of this paper is organized as follows. Related work is discussed in Section 2. Section 3 introduces the problem description. Sections 4 and 5 present the tabu search and simulated annealing heuristics and their implementation. In Section 6, experimental results are reported and discussed. Finally, Section 7 concludes this work.
2. Related work
Much work has been reported in the literature on all aspects of the design and implementation of 3D integrated circuits. It has been suggested that it is critical to perform circuit partitioning as an independent stage in the design flow of 3D integration (Chiang & Sinha, 2009; Li et al., 2006; Pavlidis & Friedman, 2009). The flow first partitions a given design into different layers and then solves the remaining problems of floorplanning, placement and routing, as shown in Fig. 2. This approach effectively reduces the complexity of the problem and preserves the quality of results (Li et al., 2006; Pavlidis & Friedman, 2009).
Fig. 2. Design flow of 3D integrated circuits. (Flow: Start, 3D partitioning and layer assignment, 3D floorplanning, 3D placement, 3D routing, End.)
Li, Mak, and Wang (2012) proposed computing a tier assignment (i.e., circuit partitioning) based on 1-D placement. They first conduct a traditional 1-D placement of the modules using a spectral method (Hagen & Kahng, 1992). The intuition behind this approach is that tightly connected modules are placed close to each other in the 1-D placement and thus will be assigned to the same layer. The number of TSVs inserted into the ith layer for connecting signals between the ith and the (i + 1)th layer is determined by the number of nets crossing the ith cut position. The authors also imposed a constraint on the total area utilized in each layer. Then, they used a dynamic-programming-based algorithm to determine the best L − 1 cut positions on the linear placement to obtain an assignment with the minimum number of TSVs while satisfying the area constraint. However, the authors did not take into consideration connections to I/Os in the 1-D placement, which results in additional TSVs in all the layers.
Another approach to solving the partitioning problem employs integer linear programming (ILP) (Lee, Jiang, & Mei, 2012). However, this can only be used for small circuits since its runtime grows very fast with problem size. An alternative method is introduced as part of 3D place and route for FPGAs (Ababei, Mogal, & Bazargan, 2006; Siozios, Sotiriadis, Pavlidis, & Soudris, 2007). The authors used a two-step approach. First, they applied the hMetis partitioning algorithm (Karypis, Aggarwal, Kumar, & Shekhar, 1999) to divide a design into a set of partitions. Then, they associated each partition with a layer. A typical partitioning algorithm gives similar weights to cuts between any two partitions, whereas in 3D partitioning those weights can have a great impact, depending on the distance between partitions.
Huang, Liu, and Huang (2011) proposed an iterative layer-aware partitioning algorithm which can minimize the number of TSVs and smooth the distribution of TSVs in 3D structures. The algorithm gradually produces the final result layer by layer, applying layer-aware partitioning at each iteration. While the focus of the algorithm was on TSV minimization, there was no clear evidence of how it handled area balancing among layers.
In another work, Kim, Athikulwongse, and Lim (2013) studied the impact of TSVs on various aspects of 3D layouts. These aspects include the maximum allowable TSV count, the trade-off between wirelength and TSV count, and wirelength and die area versus number of dies. The authors stated that although they could not draw a clear conclusion on the relationship between wirelength and number of TSVs, using too many TSVs will eventually increase the die area, which will result in a wirelength increase.
No non-deterministic algorithms have been attempted to solve the problem of TSV minimization and layer assignment thus far. Therefore, an attempt to use Tabu Search and Simulated Annealing (Sait & Youssef, 1994) is presented in this work. As more design constraints than those reported in the literature are considered, this work can serve as the basis for future work on the partitioning step or on the subsequent steps.
3. Problem description
3.1. Motivation
There are many critical aspects that must be considered when solving the partitioning problem in the context of the 3D design flow. These aspects comprise the following points:
• Layer assignment: The assignment of partitions to layers must be considered explicitly, since different layer assignments usually result in different TSV requirements (Huang et al., 2011).
• Location of I/O terminals: All external I/O terminals must be assigned to the top-most layer (or the bottom-most layer). Failing to satisfy this requirement will result in additional TSV requirements in later stages of the design.
• TSV minimization: A design with a minimal number of TSVs is desired, as excessive use of them increases the area of the design and hence the total wirelength, which eventually degrades the performance of the circuit.
• Area balancing: TSVs have a significant area footprint compared to other components. Thus, a poor distribution of TSVs across layers will eventually result in a design with imbalanced area. To achieve this objective, several factors have to be considered, namely, the area of cells or blocks, the area of TSVs, and the distribution of both cells and TSVs across all dies.
Subject to the above requirements, we present in this paper a multi-objective circuit partitioning technique, using iterative non-deterministic heuristics, that takes into consideration (1) TSV count minimization and (2) balanced area across all layers or dies.
3.2. Problem formulation
Any design can be represented as a hypergraph H = (V, E), where V is a set of vertices that includes the set of blocks or cells B and the set of I/O terminals I. E is the set of hyperedges that connect vertices from the set V, which in turn corresponds to the netlist of a given design. Therefore, the problem can be mapped into a hypergraph partitioning problem with N + 1 total partitions P = {P0, P1, . . ., PN}, where the index i of the ith layer (partition) lies in the range 0 ≤ i ≤ N. As all I/O terminals must reside in the top-most layer, all vertices in the set I are assigned to the first partition P0. All other vertices in the set V are assigned to N disjoint partitions {P1, P2, . . ., PN} such that P1 ∩ P2 ∩ . . . ∩ PN = ∅. For each vertex v ∈ V, area(v) denotes the area occupied by v. TSVarea denotes the area cost of a TSV.
For each net j in the design (i.e., hyperedge e in H) that spans two layers, we assume that only one TSV is used to connect the subnet in the first layer to the one in the other layer. In general, if net j spans different layers, where u is the upper layer and l is the lower layer, the number of TSVs required to connect all subnets of j equals TSV(j) = u − l. Thus, the total number of required TSVs in the design can be calculated as
$$TSV_{total} = \sum_{\forall j \in netlist} TSV(j).$$
Given a 3D design and a target of N + 1 total partitions, our technique maps the problem into a partitioning problem on the hypergraph H. It partitions the design into N + 1 disjoint partitions such that all I/O terminals reside in the first partition (layer) P0. The technique has the objective of minimizing the total number of required TSVs (TSVtotal). It also handles area balancing among the different layers by minimizing the standard deviation of area across all layers, while taking into consideration the area of blocks or cells, the number of TSVs in each layer, and the area occupied by these TSVs. Thus, it results in area-balanced, TSV-minimized, layer-assigned partitions that respect the I/O constraints.
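The problem can therefore be formulated as:
$$\text{Minimize } TSV_{total} = \sum_{\forall j \in netlist} TSV(j)$$
$$\text{Minimize } \sigma_{area}$$
Subject to:
$$\forall v \in I \Rightarrow v \in P_0$$
$$P_1 \cap P_2 \cap \ldots \cap P_N = \emptyset$$
$$\forall v \in (V - I) \Rightarrow \exists P_i,\ i \in \{1, \ldots, N\} \mid v \in P_i$$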
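The two objectives in this formulation can be evaluated for a given layer assignment roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, each TSV is charged to the deeper of the two layers it connects, the I/O layer P0 is excluded from the deviation, and the population standard deviation is used.

```python
from statistics import pstdev


def net_span(net, layer_of):
    """Return (lowest, highest) layer index touched by a net."""
    layers = [layer_of[v] for v in net]
    return min(layers), max(layers)


def total_tsvs(nets, layer_of):
    """TSV_total: sum over nets of the layer span (u - l)."""
    total = 0
    for net in nets:
        lo, hi = net_span(net, layer_of)
        total += hi - lo
    return total


def layer_areas(nets, layer_of, area, num_layers, tsv_area):
    """Area per layer (index 0 is the I/O layer P0): blocks plus TSVs.

    Assumption: the TSV joining layers i and i+1 is charged to layer i+1.
    """
    areas = [0.0] * (num_layers + 1)
    for v, layer in layer_of.items():
        areas[layer] += area.get(v, 0.0)   # I/O terminals default to zero area
    for net in nets:
        lo, hi = net_span(net, layer_of)
        for layer in range(lo + 1, hi + 1):
            areas[layer] += tsv_area
    return areas


def area_deviation(nets, layer_of, area, num_layers, tsv_area):
    """sigma_area over the block layers P1..PN (assumption: P0 is left out)."""
    return pstdev(layer_areas(nets, layer_of, area, num_layers, tsv_area)[1:])


# Toy design: one I/O terminal on layer 0, three blocks on two block layers.
layer_of = {"io1": 0, "a": 1, "b": 2, "c": 2}
nets = [["io1", "a"], ["a", "b", "c"], ["b", "c"]]
area = {"a": 4.0, "b": 2.0, "c": 1.0}
print(total_tsvs(nets, layer_of))
print(area_deviation(nets, layer_of, area, num_layers=2, tsv_area=0.5))
```

In the toy design, the first two nets each cross one layer boundary and the third stays within a single layer.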
4. Tabu Search and its implementation
Tabu Search is a general iterative metaheuristic for solving combinatorial optimization problems. TS progresses by making iterative perturbations while preventing cycling back to a certain number of recently visited points in the search space. The TS procedure starts from an initial feasible solution S (the current solution) in the search space Ω. A neighborhood ℵ(S) is defined for each S. A sample of neighbor solutions V* ⊂ ℵ(S), called trial solutions, is generated (n = |V*| ≪ |ℵ(S)|); it comprises what is known as the candidate list. From this generated set of trial solutions, the best solution, say S* ∈ V*, is chosen for consideration as the next solution. A solution S* ∈ ℵ(S) can be reached from S by an operation called a move to S*. The move to S* is considered even if S* is worse than S, that is, Cost(S*) > Cost(S). Selecting the best move in V* is based on the supposition that good moves are more likely to reach optimal or near-optimal solutions. It is this acceptance of non-improving moves that enables escaping from local optima. However, with this strategy the search may cycle: because moves with Cost(S*) > Cost(S) are accepted, the search can leave a local optimum and return to it in a later iteration.
In order to block returning to previously visited solutions, a memory or list T, known as the tabu list, is maintained. This list contains information that to some extent forbids the search from returning to a previously visited solution. Whenever a move is accepted, its attributes are introduced into the tabu list T. Move reversals are prevented for the next k = |T| iterations because they might lead back to a previously visited solution. The tabu list can be visualized as a window on accepted moves, as shown in Fig. 3; moves which tend to undo previous moves within this window are forbidden.
Fig. 3. Tabu list visualized as window over accepted moves. (The window length equals the tabu list size.)
Ω : Set of feasible solutions (i.e., partitions).
S : Current solution.
S* : Best admissible solution.
Cost : Objective function (Reduce # of TSVs & Std. deviation of area).
ℵ(S) : Neighborhood of S ∈ Ω.
V* : Sample of neighborhood solutions.
T : Tabu list.
AL : Aspiration Level.
Begin
1.  Start with an initial feasible solution (partitions) S ∈ Ω.
2.  Initialize tabu list and aspiration level.
3.  For fixed number of iterations Do
4.      Generate neighbor solutions V* ⊂ ℵ(S).
        (Each solution results from swap/move of cell(s)).
5.      Find best S* ∈ V*.
6.      If move S to S* is not in T Then
7.          Accept move & update best solution.
8.          Update tabu list (Store swap/move attributes).
9.          Update aspiration level.
            (AL = Cost of best solution seen so far).
10.         Increment iteration number.
11.     Else
12.         If Cost(S*) < AL Then
13.             Accept move & update best solution.
14.             Update tabu list & aspiration level.
15.             Increment iteration number.
16.         EndIf
17.     EndIf
18. EndFor
End.
In some cases, it is necessary to overrule the tabu status, since only move attributes (not complete solutions) are stored in tabu lists, and tabu moves may therefore also prevent the consideration of solutions which were not visited earlier. This is done with the help of the notion of an aspiration criterion, a device used to override the tabu status of moves whenever appropriate. The aspiration criterion must make sure that the reverse of a recently made move leads the search to an unvisited solution, generally a better one (Sait & Youssef, 1999). A flow chart illustrating the basic short-term memory tabu search algorithm is given in Fig. 4. Intermediate-term and long-term memory processes are used to intensify and diversify the search, respectively (Sait & Youssef, 1999).
One of the Tabu Search algorithm parameters is the size of the tabu list. A small tabu list is preferred for exploring the solution space near a local optimum, and a larger tabu list is preferable for breaking free of the vicinity of a local minimum. List sizes varying between 5 and 12 have been used in many applications (Sait & Youssef, 1999). Any aspect (feature or component of a solution) that changes as a result of a move from S to Strial can be an attribute of that move. A single move can have several attributes. The duration for which a move containing a particular tabu attribute is forbidden (the size of the tabu list) is called the tabu tenure. An algorithmic description of a simple implementation of tabu search is given in Fig. 5.
Fig. 4. Flow-chart of tabu search algorithm. (Elements: current solution, moves 1 to n, new solutions, “best” new solution, tabu test, aspiration criterion test, regenerate moves, best solution.)
Fig. 5. Algorithmic description of short-term tabu search (TS).
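The short-term scheme described above can be sketched in a few lines of Python. This is an illustrative skeleton only, with hypothetical names: `neighbors(current)` is assumed to return the candidate list as `(solution, move_attributes)` pairs, lower cost is assumed to be better, the best cost seen so far doubles as the aspiration level, and the loop bounds the total number of iterations rather than the number of accepted moves.

```python
from collections import deque


def tabu_search(initial, cost, neighbors, iterations=4000, tabu_size=10):
    """Short-term-memory tabu search; lower cost is assumed to be better."""
    current = best = initial
    best_cost = cost(initial)          # also serves as the aspiration level AL
    tabu = deque(maxlen=tabu_size)     # attributes of the most recent accepted moves

    for _ in range(iterations):
        # Candidate list V*: evaluate every trial solution once.
        scored = [(cost(sol), sol, attrs) for sol, attrs in neighbors(current)]
        if not scored:
            break
        cand_cost, cand, attrs = min(scored, key=lambda item: item[0])

        # A move is tabu if it shares an attribute with a recent move.
        is_tabu = any(a in recent for recent in tabu for a in attrs)
        if is_tabu and cand_cost >= best_cost:
            continue                   # tabu, and the aspiration criterion fails

        current = cand                 # accepted even if worse than before
        tabu.append(tuple(attrs))
        if cand_cost < best_cost:
            best, best_cost = cand, cand_cost
    return best, best_cost


# Toy usage: minimise x*x over the integers; a move's attribute is the value reached.
def toy_cost(x):
    return x * x


def toy_neighbors(x):
    return [(x - 1, (x - 1,)), (x + 1, (x + 1,))]


print(tabu_search(5, toy_cost, toy_neighbors, iterations=50))
```

With a randomized `neighbors` function, a rejected tabu candidate is simply followed by a fresh candidate list on the next iteration, which corresponds to the "regenerate moves" branch of the flow chart.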
4.1. Solution representation and initialization
Since the problem is mapped into graph partitioning and layer assignment, a solution is represented as a number of partitions, each of which corresponds to a layer in the 3D structure. The first partition, which represents the top-most layer, includes all the I/O terminals. Other blocks or cells of the circuit are assigned to one of the other partitions. In the initialization step, all I/O terminals are assigned to the first partition, while blocks are randomly assigned to the remaining partitions.
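The initialization step can be sketched as below. The dictionary representation (vertex name mapped to layer index) and the function name are assumptions made for illustration; the text only specifies that I/O terminals go to the first partition and blocks are placed at random in the others.

```python
import random


def initial_solution(io_terminals, blocks, num_layers, rng=random):
    """Map every vertex to a layer: I/O terminals to layer 0, blocks to 1..num_layers."""
    layer_of = {t: 0 for t in io_terminals}
    for b in blocks:
        layer_of[b] = rng.randint(1, num_layers)   # randint is inclusive on both ends
    return layer_of


solution = initial_solution(["i1", "i2"], ["a", "b", "c"], 3, random.Random(0))
print(solution)
```

Passing a seeded `random.Random` instance makes a run reproducible, which helps when comparing heuristics from the same starting point.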
4.2. Cost evaluation
The cost function is a measure of how good a particular solution is. For 3D partitioning and assignment, there are two main criteria: area balancing and TSV count. The cost function should reflect all objectives. Traditionally, multi-objective problems are handled by combining the objectives into a scalar function such as a weighted sum of the multiple attributes. Usually, it is difficult to determine suitable weights for these objectives, as their values belong to different ranges.
Fig. 6. Normalized membership functions for TSV-count and area-deviation. (Panels plot the degree of membership, from 0 to 1, against tsv-count/Tmax and area-deviation/Dmax, with breakpoints at Tmin/Tmax and Dmin/Dmax.)
One practice is to use fuzzy logic to handle these types of multi-objective problems (Sait & Youssef, 1999; Zadeh, 1965; Zadeh, 1973; Zadeh, 1975). Unlike ordinary set theory, where an element is either in a set or not, in fuzzy set theory an element partially belongs to a set with a certain degree of membership. This is realized by using a membership function μA(.) that maps the space of points (solutions) to the interval [0,1]. Operations like union (∪), intersection (∩), and complementation (¬), which are used in ordinary set theory, have also been defined for fuzzy sets. One such logic is defined by Zadeh and called the min-max logic (Sait & Youssef, 1994). In this logic, the above operations are defined as follows:
μA∩B (x) = min (μA (x) , μB (x)) ,
μA∪B (x) = max (μA (x) , μB (x)) ,
μ¬A (x) = 1.0 − (μA (x)) .
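As a small illustration of the min-max operators, the following sketch applies them to two membership degrees. The function names and the sample values are hypothetical.

```python
def fuzzy_and(mu_a, mu_b):
    """Intersection in min-max logic."""
    return min(mu_a, mu_b)


def fuzzy_or(mu_a, mu_b):
    """Union in min-max logic."""
    return max(mu_a, mu_b)


def fuzzy_not(mu_a):
    """Complement in min-max logic."""
    return 1.0 - mu_a


# Hypothetical degrees: the solution is 0.8 "low tsv-count", 0.6 "small area-deviation".
mu_tsv, mu_area = 0.8, 0.6
print(fuzzy_and(mu_tsv, mu_area))   # degree of "low tsv-count AND small area-deviation"
print(fuzzy_or(mu_tsv, mu_area))    # degree of "low tsv-count OR small area-deviation"
```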
The evaluation of a solution comprises the evaluation of fuzzy rules, which return a value that corresponds to the degree of membership of that solution in the fuzzy subset of good solutions. The fuzzy logic rules are usually expressed on problem-specific linguistic variables (Zadeh, 1965; Zadeh, 1973; Zadeh, 1975).
In our case, the objective is to find a solution which is optimized with respect to TSV count and the standard deviation in area across all partitions. To obtain a fuzzy logic definition of this multi-criteria objective function, two linguistic variables, tsv-count and area-deviation, are introduced, and the linguistic values “low” and “small” are defined for them, respectively. These linguistic values characterize the degree of satisfaction of the designer with the values of the objectives.
Membership functions for low tsv-count, μtsv, and small area-deviation, μarea, are built easily. These are non-increasing functions, since the lower the tsv-count and the smaller the area-deviation, the higher the degree of satisfaction. The base variables tsv-count and area-deviation are normalized to the interval [0,1], as shown in Fig. 6. The values Tmin and Dmin are lower bounds on the TSV count and area deviation. Tmin equals the number of nets that have an I/O terminal. This is based on the assumption that all blocks which belong to these nets are assigned to the layer next to the I/O layer, so that one TSV is used to connect the I/O terminal to its associated net in the next layer. Dmin equals 0, assuming that the sum of the areas of blocks and the areas of TSVs is the same in all layers. The values Tmax and Dmax are the corresponding upper bounds. Tmax equals the number of nets multiplied by the number of layers minus one (# nets × (# layers − 1)). It has been proven that the standard deviation of a set of data with three or more elements cannot exceed 58% of the range of that set (Croucher, 2002; Croucher, 2004; Lee, Lee, & Lee, 2006). Therefore, the value of Dmax is set to 58% of the range between maximum and minimum area.
The most desirable solution is the one with the highest membership in the fuzzy subsets low tsv-count and small area-deviation. However, such a solution most likely does not exist. Therefore, one has to trade off these individual criteria against each other. A weighted averaging operator (Fodor & Roubens, 1994; Fodor & Roubens, 1992; Fodor, Marichal, & Roubens, 1995) is used to incorporate this trade-off as follows:
$$\mu_S(x) = \frac{1}{2}\left(\beta \times \mu_{tsv} + (1 - \beta) \times \mu_{area}\right) \qquad (1)$$
where μS is the membership function of the fuzzy subset of good solutions, and β is a parameter between 0 and 1. When β = 1, the focus is on the tsv-count objective, and for β = 0 the focus is switched to the other objective.
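A possible evaluation of Eq. (1) is sketched below. Several points are assumptions rather than details given in the text: the membership functions are taken to be piecewise linear between the lower and upper bounds, the function names are hypothetical, the leading factor of one half follows Eq. (1) as printed, and the caller supplies the maximum and minimum areas that define the range used for Dmax.

```python
def bounds(num_io_nets, num_nets, num_layers, max_area, min_area):
    """Bounds used to normalise the two objectives (Dmin is 0)."""
    t_min = num_io_nets                       # one TSV per net with an I/O terminal
    t_max = num_nets * (num_layers - 1)       # every net spans every layer
    d_max = 0.58 * (max_area - min_area)      # 58% of the area range
    return t_min, t_max, d_max


def membership(value, lower, upper):
    """Non-increasing membership: 1 at or below `lower`, 0 at or above `upper`.

    Assumption: linear in between.
    """
    if value <= lower:
        return 1.0
    if value >= upper:
        return 0.0
    return (upper - value) / (upper - lower)


def solution_quality(tsv_count, area_dev, t_min, t_max, d_max, beta=0.5):
    """Membership of a solution in the fuzzy subset of good solutions, Eq. (1)."""
    mu_tsv = membership(tsv_count, t_min, t_max)
    mu_area = membership(area_dev, 0.0, d_max)
    return 0.5 * (beta * mu_tsv + (1.0 - beta) * mu_area)


t_min, t_max, d_max = bounds(num_io_nets=10, num_nets=100, num_layers=4,
                             max_area=50.0, min_area=0.0)
print(solution_quality(155, 14.5, t_min, t_max, d_max, beta=0.5))
```

A heuristic that minimizes a cost could use one minus this value as its cost, although the text does not state which convention was used.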
4.3. Neighborhood solutions generation
TS makes several neighborhood moves and selects the move producing the best solution among all candidate moves for the current iteration. This best candidate solution may not improve the current solution. In each iteration, a number of neighbor solutions (equal in number to the size of the candidate list) are generated by making perturbations as follows:
(1) 75% of the time, two random blocks are selected from two random layers, and their respective layers are interchanged.
(2) 25% of the time, one random block is selected from a random layer and moved to another randomly selected layer.
Subsequently, each solution in the candidate list is evaluated using the fuzzy function in Eq. (1), based on the change in the number of TSVs and in the standard deviation of area among the different layers before and after the swap/move. If two or more neighborhood solutions have equal cost, one of them is selected.
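The perturbation scheme above might be coded as follows. The names and the dictionary representation are hypothetical, and two details are assumptions: the swapped blocks are drawn from two different non-empty block layers, and the sketch falls back to a single-block move when only one block layer is occupied. At least two block layers are required.

```python
import random


def random_neighbor(layer_of, blocks, num_layers, rng=random):
    """Return (new_assignment, touched_blocks) after one swap (75%) or move (25%)."""
    new = dict(layer_of)                       # leave the current solution untouched
    by_layer = {}
    for b in blocks:
        by_layer.setdefault(new[b], []).append(b)

    if rng.random() < 0.75 and len(by_layer) >= 2:
        # Swap: one block from each of two distinct layers exchange layers.
        layer_a, layer_b = rng.sample(sorted(by_layer), 2)
        a = rng.choice(by_layer[layer_a])
        b = rng.choice(by_layer[layer_b])
        new[a], new[b] = layer_b, layer_a
        return new, (a, b)

    # Move: one block goes to a different block layer (never to the I/O layer 0).
    block = rng.choice(blocks)
    targets = [l for l in range(1, num_layers + 1) if l != new[block]]
    new[block] = rng.choice(targets)
    return new, (block,)


current = {"io": 0, "a": 1, "b": 1, "c": 2, "d": 3}
rng = random.Random(1)
candidate_list = [random_neighbor(current, ["a", "b", "c", "d"], 3, rng)
                  for _ in range(25)]
print(len(candidate_list))
```

The returned tuple of touched blocks can serve as the move attributes stored in a tabu list.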
4.4. Tabu list and aspiration level
A variety of tabu attributes were tested for the case when blocks (cells) i and j are swapped/moved. One experiment considered both i and j, forbidding any perturbation that includes both of them. Another attribute was to forbid moves related to block i, i.e., any move which includes i; this covers the reverse of swapping i with j. The second attribute is used in all results reported in this paper: if two blocks are involved in an interchange, any move that includes either of these two blocks is forbidden. The same applies to moving a block from one layer to another. A short-term memory element is used throughout the implementation, and tabu list sizes ranging from 5 to 12 were tested. Changing the tabu list size within this range has little impact on the quality of the solutions; therefore the size of the tabu list was set to 10. The aspiration criterion is implemented as follows: the tabu restriction is overridden if the current solution is the best seen so far, in which case it is accepted as the new best solution and the tabu list is updated. In the next section, the simulated annealing iterative heuristic is discussed. Details about the parameters used in this work are reported in Section 6.
5. Simulated annealing and its implementation
Simulated annealing is a general heuristic and one of the most well developed iterative techniques. It is widely used for solving optimization problems. One important feature of simulated annealing is that, like tabu search, it also accepts solutions with deteriorated cost. It is this feature that gives the heuristic its hill-climbing capability. Initially, the probability of accepting inferior solutions is large; but as the search progresses, only smaller deteriorations are accepted, until finally only good solutions are accepted. In order to simulate the annealing process, much flexibility is allowed in neighborhood generation at higher “temperatures”, i.e., many ‘uphill’ moves are permitted at higher temperatures. The temperature parameter is lowered gradually as the search progresses. As the temperature is lowered, fewer and fewer uphill moves are accepted. In fact, at absolute zero, the simulated annealing algorithm turns greedy, allowing only downhill moves.
The iterative improvement scheme starts with some given state and examines a local neighborhood of the state for better solutions. A local neighborhood of a state S, denoted by ℵ(S), is the set of all states which can be reached from S by making a small change to S.
The simulated annealing algorithm is shown in Fig. 7. The core of the algorithm is the Metropolis procedure, which simulates the annealing process at a given temperature T (Fig. 8) (Metropolis, Rosenbluth & Teller, 1953). The Metropolis procedure receives as input the current temperature T and the current solution CurS, which it improves through local search. Finally, Metropolis must also be provided with the value M, which is the amount of time for which annealing must be applied at temperature T. The procedure Simulated_annealing simply invokes Metropolis at decreasing temperatures. Temperature is initialized to a value T0 at the beginning of the procedure, and is reduced in a controlled manner (typically in a geometric progression); the parameter α is used to achieve this cooling. The amount of time spent in annealing at a temperature is gradually increased as the temperature is lowered. This is done using the parameter β > 1. The variable Time keeps track of the time being expended in each call to Metropolis. The annealing procedure halts when Time exceeds the allowed time (Sait & Youssef, 1999).
Algorithm Simulated_annealing(S0, T0, α, β, M, Maxtime);
(* S0 is the initial solution *)
(* BestS is the best solution *)
(* T0 is the initial temperature *)
(* α is the cooling rate *)
(* β a constant *)
(* Maxtime is the total allowed time for the annealing process *)
(* M represents the time until the next parameter update *)
Begin
    T = T0;
    CurS = S0;
    BestS = CurS; /* BestS is the best solution seen so far */
    CurCost = Cost(CurS);
    BestCost = Cost(BestS);
    Time = 0;
    Repeat
        Call Metropolis(CurS, CurCost, BestS, BestCost, T, M);
        Time = Time + M;
        T = αT;
        M = βM
    Until (Time ≥ Maxtime);
    Return(BestS)
End. (* of Simulated_annealing *)
Fig. 7. Procedure for simulated annealing algorithm.
The Metropolis procedure is shown in Fig. 8. It uses the procedure Neighbor to generate a local neighbor NewS of any given solution S. The function Cost returns the cost of a given solution S. If the cost of the new solution NewS is better than the cost of the current solution CurS, then the new solution is accepted, and we do so by setting CurS=NewS. If the cost of the new solution is better than the best solution (BestS) seen thus far, then we also replace BestS by NewS. If the new solution has a higher cost in comparison to the original solution CurS, Metropolis will accept the new solution on a probabilistic basis. A random number is generated in the range 0 to 1. If this random number is smaller than e^(−ΔCost/T), where ΔCost is the difference in costs (ΔCost = Cost(NewS) − Cost(CurS)) and T is the current temperature, the uphill solution is accepted. This criterion for accepting the new solution is known as the Metropolis criterion. The Metropolis procedure generates and examines M solutions.
Algorithm Metropolis(CurS, CurCost, BestS, BestCost, T, M);
Begin
    Repeat
        NewS = Neighbor(CurS); /* Return a neighbor from CurS */
        NewCost = Cost(NewS);
        ΔCost = (NewCost − CurCost);
        If (ΔCost < 0) Then
            CurS = NewS;
            If NewCost < BestCost Then
                BestS = NewS
            EndIf
        Else
            If (RANDOM < e^(−ΔCost/T)) Then
                CurS = NewS;
            EndIf
        EndIf
        M = M − 1
    Until (M = 0)
End. (* of Metropolis *)
Fig. 8. The Metropolis procedure.
Figure (caption truncated in the source): plots of membership value versus iterations, panels (a) and (b); the surviving caption text reads “... of the solution, versus the number of iterations for the “tseng” ...”.
The probability that an inferior solution is accepted by Metropolis is given by P(RANDOM < e^(−ΔCost/T)). The random number generation is assumed to follow a uniform distribution. Remember that ΔCost > 0, since we have assumed that NewS is uphill from CurS. At very high temperatures (when T → ∞), e^(−ΔCost/T) ≈ 1, and hence the above probability approaches 1. On the contrary, when T → 0, the probability e^(−ΔCost/T) falls to 0.
In order to implement simulated annealing, the same cost
unction formulated in Section IV is used. In addition, simi-
ar techniques are used to implement the Neighbor  function to
enerate new states from current states.
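The acceptance rule above can be illustrated with a short Python sketch. It is a generic rendering of the Metropolis criterion for a cost that is minimized, not the code used in this work: the names `metropolis`, `cost`, `neighbor`, and `rng` are hypothetical, and the toy one-dimensional problem in the demo stands in for the partitioning cost function and neighbor move, which are not reproduced here.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def metropolis(cur_s: S, best_s: S, temperature: float, m: int,
               cost: Callable[[S], float],
               neighbor: Callable[[S, random.Random], S],
               rng: random.Random) -> Tuple[S, S]:
    """Examine m neighbors; return the (current, best) solutions afterwards."""
    cur_cost, best_cost = cost(cur_s), cost(best_s)
    for _ in range(m):
        new_s = neighbor(cur_s, rng)
        new_cost = cost(new_s)
        delta = new_cost - cur_cost  # Cost(NewS) - Cost(CurS)
        # Downhill moves are always taken; uphill moves are taken with
        # probability exp(-delta / T), the Metropolis criterion.
        if delta < 0 or rng.random() < math.exp(-delta / temperature):
            cur_s, cur_cost = new_s, new_cost
            if cur_cost < best_cost:
                best_s, best_cost = cur_s, cur_cost
    return cur_s, best_s


if __name__ == "__main__":
    # Toy problem (an assumption for illustration): minimize (x - 3)^2.
    rng = random.Random(0)
    toy_cost = lambda x: (x - 3) ** 2
    toy_neighbor = lambda x, r: x + r.choice((-1, 1))
    print(metropolis(10, 10, 5.0, 200, toy_cost, toy_neighbor, rng))
```

At a very low temperature the exponential underflows to 0, so the sketch degenerates to pure descent. At a very high temperature almost every move is accepted, matching the two limits discussed above.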
6. Experimental results

The algorithms described in this work were implemented
and tested on a set of benchmark circuits to solve the L-way
partitioning and assignment problem, where L is the number
of layers. These include small, medium, and large size circuits
(“MCNC/GSRC benchmarks,” n.d.; Yang, 1991). Table 1
reports these circuits and their respective number of IOs, blocks,
and nets.

Table 1
Benchmark circuits.
Circuit   Num. of IOs  Num. of blocks  Num. of nets
ami33     42           33              123
ami49     22           49              408
n100      334          100             885
n200      564          200             1585
n300      569          300             1893
tseng     174          2417            2295
diffeq    103          3024            2985
elliptic  245          6831            6717
frisc     136          7024            6908
pdc       56           7609            7569

Fig. 9. Membership value of the fuzzy cost function versus iterations for the
“tseng” circuit for: (a) tabu search; (b) simulated annealing.

After many experiments and fine tuning of the parameters of
the iterative heuristics, a candidate list of size 25 is used for tabu
search; the tabu list size is set to 10. For simulated annealing,
different values for parameters α, β, and M were tested, and the
values α = 0.98, β = 1.001, and M = 50 were found suitable.
The initial temperature, T0, was determined using the classical
heating method, by experimenting with increasing temperatures
until the probability of accepting both good and bad moves
was very high. The simulation started by setting the temperature to
1 and then increasing it up to 100. Acceptable values of T0 were
found to be in the range [40,50]. The number of iterations for
TS is fixed to 4,000. Since the candidate list size in TS is 25,
for the sake of comparison simulated annealing is allowed to
run for 4000 × 25 = 100,000 iterations. The objective is set to
partition the design into 4 layers in addition to the I/O layer. The
total number of TSVs reported later includes TSVs that connect
I/O terminals to other blocks in the circuit. The size of a TSV,
TSVarea, is fixed to 10 units. The value of the weight of the total
membership function δ is set to 0.8 in Eq. (1).
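The heating procedure used to choose T0 can be sketched as follows. This is only a rough illustration under assumptions: the sampled cost changes `deltas`, the 0.9 acceptance target, the unit temperature step, and the function names are hypothetical; the text specifies only that the temperature was raised from 1 towards 100 until good and bad moves were both accepted with very high probability.

```python
import math
from typing import Optional, Sequence


def acceptance_ratio(deltas: Sequence[float], temperature: float) -> float:
    """Mean Metropolis acceptance probability over sampled cost changes."""
    probs = [1.0 if d <= 0 else math.exp(-d / temperature) for d in deltas]
    return sum(probs) / len(probs)


def heat_up(deltas: Sequence[float], target: float = 0.9,
            t_start: float = 1.0, t_max: float = 100.0,
            step: float = 1.0) -> Optional[float]:
    """Return the first temperature whose acceptance ratio reaches target."""
    temperature = t_start
    while temperature <= t_max:
        if acceptance_ratio(deltas, temperature) >= target:
            return temperature
        temperature += step  # assumption: linear sweep in unit steps
    return None              # target never reached within [t_start, t_max]


if __name__ == "__main__":
    # Hypothetical sample of cost changes from trial moves.
    print(heat_up([-1.0, 5.0]))
```

In practice the `deltas` would be collected by applying trial moves to an initial solution. A stricter target pushes the chosen T0 higher and lengthens the nearly random phase at the start of the run.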
The performance of the presented algorithms is compared
considering different aspects. Fig. 9 shows a plot of the values
retrieved by the fuzzy cost function, that is, the membership value
of the solution, versus the number of iterations for the “tseng”
circuit for both TS and SA. Recall that TS and SA are seeking
to maximize the membership function. In this regard, SA
outperformed TS as it achieved a higher membership value.
After a brief period of almost random behavior while the
temperature is high, simulated annealing improved sharply
from 0.65 to 0.85. Later on, it improved at a
slower pace until it almost saturated around a membership value
of 0.92. On the other hand, TS showed a steady improvement
which started sharply at the beginning and then continued slow
and steady. TS could not pass the 0.9 threshold in the given time.
However, giving TS enough time should eventually lead to better
solutions as TS can always escape local minima. To verify
this, Fig. 10 shows the performance of tabu search when the
run-time is extended to 10,000 iterations and the candidate list
size is increased to 40. It is evident that, given enough run-time,
tabu search can produce high quality solutions.

Fig. 10. Membership value of the fuzzy cost function versus iterations for
the “tseng” circuit for tabu search with extended run-time and candidate list
size = 40.
Another experiment consists of comparing the two heuristics
with respect to the searched spaces. Results are summarized
in Fig. 11 in the form of bar charts. TS exhibited the better
performance, while SA lagged behind. This figure depicts where each
algorithm spent its time. TS concentrated its efforts in good
subspaces, i.e., evaluating solutions with high membership values.
In contrast, SA spent most of its time evaluating poor quality
solutions as it starts with almost a random behavior at the initial
temperature. These behaviors were observed with all test cases.
Tabu search showed a better behavior in general. As shown earlier
in Fig. 10, given enough time, it can produce high quality
results. Thus, the termination criterion, i.e., number of maximum
iterations, can be decided later based on the application
and the time constraint.
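A bar chart of this kind can be produced by counting how many evaluated solutions fall into each membership range. The Python sketch below assumes the seven ranges shown on the horizontal axis of Fig. 11 (0.6-0.65 up to 0.9-1); the function name `searched_subspace` and the choice to ignore values outside [0.6, 1] are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, Iterable

# Bin edges taken from the axis labels of Fig. 11.
BIN_EDGES = [0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 1.00]


def searched_subspace(values: Iterable[float]) -> Dict[str, int]:
    """Count evaluated solutions per membership-value range."""
    labels = [f"{lo:g}-{hi:g}" for lo, hi in zip(BIN_EDGES, BIN_EDGES[1:])]
    counts = dict.fromkeys(labels, 0)
    for value in values:
        # Assumption: values outside [0.6, 1] are simply not plotted.
        if BIN_EDGES[0] <= value <= BIN_EDGES[-1]:
            # The last bin is closed on the right so that 1.0 is counted.
            index = min(bisect_right(BIN_EDGES, value) - 1, len(labels) - 1)
            counts[labels[index]] += 1
    return counts


if __name__ == "__main__":
    print(searched_subspace([0.62, 0.66, 0.91, 0.95, 1.0, 0.5]))
```

Feeding the function the membership value of every solution a heuristic evaluates gives the frequency profile for that heuristic. A profile skewed towards the right-hand bins corresponds to time spent in good subspaces.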
Fig. 11. Bar charts depicting the sub-space searched by each heuristic for the
“tseng” circuit: (a) tabu search; (b) simulated annealing.

Finally, Tables 2 and 3 report detailed results about the application
of TS and SA, respectively. These tables present the membership
values of the fuzzy cost function, the total number of required
TSVs, the standard deviation of area across all layers (in units),
the average layer area, and the percentage of area deviation relative
to the average area (average values of different runs are
reported). Both algorithms exhibited a good performance
on small as well as large circuits. It is evident that,
in terms of membership value of the fuzzy cost function, SA
has achieved better results when compared to TS in most of the
cases in our setup. This membership value translates itself to the
total number of required TSVs and the standard deviation in area
across different layers. A higher membership value could result
in a better TSV count and a better area deviation, or at least one
of them. If we look at the “n100” circuit in the two tables, we
observe almost the same membership value. However, in the first
table, associated with TS, it results in a better TSV count and
a worse area deviation as compared to the second table, which
is associated with SA. In the used setup, SA outperformed TS
in achieving a better TSV count and area deviation, especially
for large circuits like “tseng”, “diffeq”, “elliptic”, and “pdc”.
The standard deviation of area for all the cases is less than 5%,
with an average around 2.4%. Unfortunately, we could not compare
our results with the ones reported in (Huang et al., 2011)
as they did not report the area deviation. Moreover, the number of
TSVs reported for some circuits is less than the number of nets with
I/O connections, which contradicts both our assumptions
and practicality. For example, the “aqua” circuit has 3792 IOs and
the reported number of required TSVs for this circuit is 909.6
(Huang et al., 2011). However, the results presented in this work
can be the base for future work on the partitioning step or the
other subsequent steps.

Table 2
Tabu search results with: 4,000 iterations, candidate list size = 25, and tabu list size = 10.
Circuit   Avg. membership  Total # TSVs  Area Deviation (units)  Avg. Layer Area (units)  Dev. to Avg. ratio (%)
ami33     0.908            113           8                       593                      1.28
ami49     0.941            277           55,797                  7,481,946                0.75
n100      0.821            991           812                     48,129                   1.69
n200      0.791            2076          831                     51,181                   1.62
n300      0.813            2210          797                     76,519                   1.04
tseng     0.910            1568          479                     9924                     4.83
diffeq    0.893            1986          489                     12,761                   3.83
elliptic  0.870            4365          1185                    29,275                   4.05
frisc     0.870            4442          1163                    29,711                   3.91
pdc       0.846            7296          498                     41,699                   1.20
Avg.      0.866            2532          6206                    778,174                  2.42

Table 3
Simulated annealing results with: 100,000 iterations, α = 0.98, β = 1.001, M = 50, and the initial temperature, T0, in the range [40,50].
Circuit   Avg. membership  Total # TSVs  Area Deviation (units)  Avg. Layer Area (units)  Dev. to Avg. ratio (%)
ami33     0.849            116           9                       573                      1.48
ami49     0.925            292           56,082                  7,481,931                0.75
n100      0.820            996           725                     48,170                   1.50
n200      0.795            2035          930                     50,646                   1.84
n300      0.820            2133          882                     76,223                   1.16
tseng     0.924            1091          399                     8927                     4.47
diffeq    0.930            1309          398                     11,337                   3.51
elliptic  0.904            3502          897                     26,728                   3.36
frisc     0.897            3683          1213                    27,772                   4.37
pdc       0.865            6398          399                     40,647                   0.98
Avg.      0.873            2155          6193                    777,295                  2.34
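The last column of Tables 2 and 3 relates the area deviation to the average layer area. A minimal Python sketch of that calculation is given below. It assumes the population standard deviation (`statistics.pstdev`); whether the population or the sample form was used is not stated, and the function name `area_balance` and the example layer areas are hypothetical.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def area_balance(layer_areas: Sequence[float]) -> Tuple[float, float, float]:
    """Return (area deviation, average layer area, deviation-to-average %)."""
    average = mean(layer_areas)
    # Assumption: population standard deviation across the layers.
    deviation = pstdev(layer_areas)
    return deviation, average, 100.0 * deviation / average


if __name__ == "__main__":
    # Hypothetical areas of four layers, in the same units as the tables.
    print(area_balance([90, 110, 90, 110]))
```

As a check against the reported figures, the “tseng” row of Table 2 gives 100 × 479 / 9924, which rounds to the listed 4.83%.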
7. Conclusion

In this work we provided multi-objective circuit partitioning
and layer assignment techniques, using the iterative optimization
heuristics Tabu Search and Simulated Annealing, that
take into consideration TSV count minimization and area
balancing across all layers while including the I/O constraint. Both
heuristics have been engineered to tackle this problem, and
both were able to produce high quality outcomes. The average
percentage of the area deviation compared to the average area
of a layer was around 2.4%, and the total number of required
TSVs was reduced.

Conflict of interest

The authors have no conflicts of interest to declare.

Acknowledgment

The authors acknowledge King Fahd University of Petroleum &
Minerals for all support.
References

Ababei, C., Mogal, H., & Bazargan, K. (2006). Three-dimensional
    place and route for FPGAs. IEEE Transactions on Computer-Aided
    Design of Integrated Circuits & Systems, 25(6), 1132–1140.
    http://dx.doi.org/10.1109/TCAD.2005.855945
Baliga, J. (2004). Chips go vertical [3D IC interconnection]. Spectrum, IEEE,
    41(3), 43–47. http://dx.doi.org/10.1109/MSPEC.2004.1270547
Beyne, E., De Moor, P., Ruythooren, W., Labie, R., Jourdain, A., Tilmans,
    H., et al. (2008). Through-silicon via and die stacking technologies for
    microsystems-integration. In 2008 IEEE International Electron Devices
    Meeting (pp. 1–4). http://dx.doi.org/10.1109/IEDM.2008.4796734
Chiang, C., & Sinha, S. (2009). The road to 3D EDA tool readiness. In
    Proceedings of the Asia and South Pacific Design Automation Conference,
    ASP-DAC (pp. 429–436). http://dx.doi.org/10.1109/ASPDAC.2009.4796519
Davis, W. R., Wilson, J., Mick, S., Xu, J., Hua, H., Mineo, C.,
    et al. (2005). Demystifying 3D ICs: The pros and cons of going
    vertical. IEEE Design and Test of Computers, 22(6), 498–510.
    http://dx.doi.org/10.1109/MDT.2005.136
Fodor, J., Marichal, J.-L., & Roubens, M. (1995). Characterization of the ordered
    weighted averaging operators. IEEE Transactions on Fuzzy Systems, 3(2),
    236–240. http://dx.doi.org/10.1109/91.388176
Fodor, J., & Roubens, M. (1994). Fuzzy preference modelling and multicriteria
    decision support. Theory and Decision Library Series D: System
    Theory, Knowledge Engineering, and Problem Solving.
    http://dx.doi.org/10.1002/(SICI)1099-0771(199612)9:4<300::AID-BDM226>3.0.CO;2-8
Gerousis, V. (2010). Physical design implementation for 3D IC: methodology
    and tools. In Proceedings of the 19th International Symposium
    on Physical Design (p. 57). New York, USA: ACM.
    http://dx.doi.org/10.1145/1735023.1735042
Hagen, L., & Kahng, A. B. (1992). New spectral methods for ratio
    cut partitioning and clustering. IEEE Transactions on Computer-Aided
    Design of Integrated Circuits and Systems, 11(9), 1074–1085.
    http://dx.doi.org/10.1109/43.159993
Huang, Y.-S., Liu, Y.-H., & Huang, J.-D. (2011). Layer-aware design
    partitioning for vertical interconnect minimization. In 2011 IEEE
    Computer Society Annual Symposium on VLSI (pp. 144–149).
    http://dx.doi.org/10.1109/ISVLSI.2011.16
Croucher, J. (2002). Statistics: Making business decisions. Sydney: McGraw-Hill.
Fodor, J., & Roubens, M. (1992). Aggregation and scoring procedures in
    multicriteria decision making methods. In IEEE International Conference on
    Fuzzy Systems (pp. 1261–1267).
Croucher, J. S. (2004). An upper bound on the value of the standard
    deviation. Teaching Statistics, 26(2), 54–55.
    http://dx.doi.org/10.1111/j.1467-9639.2004.00157.x
Karypis, G., Aggarwal, R., Kumar, V., & Shekhar, S. (1999). Multilevel
    hypergraph partitioning: applications in VLSI domain. IEEE Transactions
    on Very Large Scale Integration (VLSI) Systems, 7(1), 69–79.
    http://dx.doi.org/10.1109/92.748202
Kim, D. H., Athikulwongse, K., & Lim, S. K. (2013). Study of
    through-silicon-via impact on the 3-D stacked IC layout. IEEE Transactions
    on Very Large Scale Integration (VLSI) Systems, 21(5), 862–874.
    http://dx.doi.org/10.1109/TVLSI.2012.2201760
Kim, D. H., Mukhopadhyay, S., & Lim, S. K. (2009). TSV-aware interconnect
    length and power prediction for 3D stacked ICs. In 2009
    IEEE International Interconnect Technology Conference (pp. 26–28).
    http://dx.doi.org/10.1109/IITC.2009.5090331
Lee, C.-F., Lee, J. C., & Lee, A. C. (2006). Statistics for business and economics.
    The American Statistician, Vol. 60. http://dx.doi.org/10.1198/tas.2006.s59
Lee, W. Y., Jiang, I. H. R., & Mei, T. W. (2012). Generic integer
    linear programming formulation for 3D IC partitioning. Journal
    of Information Science and Engineering, 28(6), 1129–1144.
    http://dx.doi.org/10.1109/SOCCON.2009.5398032
Li, C.-R., Mak, W.-K., & Wang, T.-C. (2012). Fast fixed-outline 3-D
    IC floorplanning with TSV co-placement. IEEE Transactions on
    Very Large Scale Integration (VLSI) Systems, 21(3).
    http://dx.doi.org/10.1109/TVLSI.2012.2190537
Li, Z., Hong, X., Zhou, Q., Cai, Y., Bian, J., Yang, H. H., et al. (2006).
    Hierarchical 3-D floorplanning algorithm for wirelength optimization. IEEE
    Transactions on Circuits and Systems I: Regular Papers, 53(12), 2637–2646.
    http://dx.doi.org/10.1109/TCSI.2006.883857
MCNC/GSRC benchmarks. (n.d.). Retrieved from
    http://vlsicad.cs.binghamton.edu/benchmarks.html
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H.,
    & Teller, E. (1953). Equation of state calculations by fast computing
    machines. The Journal of Chemical Physics, 21(6), 1087–1092.
    http://dx.doi.org/10.1063/1.1699114
Pavlidis, V. F., & Friedman, E. G. (2009). Interconnect-based design
    methodologies for three-dimensional integrated circuits. Proceedings of the IEEE,
    97(1), 123–140. http://dx.doi.org/10.1109/JPROC.2008.2007473
Sait, S. M., & Youssef, H. (1999). Iterative computer algorithms with
    applications in engineering: Solving combinatorial optimization problems.
    California: IEEE Computer Society Press.
Yang, S. (1991). Logic Synthesis and Optimization Benchmarks User Guide
    Version 3.0.
Sait, S. M., & Youssef, H. (1994). VLSI physical design automation: Theory and
    practice. McGraw-Hill, Inc.
Siozios, K., Sotiriadis, K., Pavlidis, V. F., & Soudris, D. (2007).
    A software-supported methodology for designing high-performance
    3D FPGA architectures. In 2007 IFIP International Conference on
    Very Large Scale Integration, VLSI-SoC (pp. 54–59).
    http://dx.doi.org/10.1109/VLSISOC.2007.4402472
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8(3), 338–353.
    http://dx.doi.org/10.1016/S0019-9958(65)90241-X
Zadeh, L. A. (1973). Outline of a new approach to the analysis of complex systems
    and decision processes. IEEE Transactions on Systems, Man, and
    Cybernetics SMC, 3(1), 28–44. http://dx.doi.org/10.1109/TSMC.1973.5408575
Zadeh, L. A. (1975). The concept of a linguistic variable and its application
    to approximate reasoning-II. Information Sciences, 8(4), 301–357.
    http://dx.doi.org/10.1016/0020-0255(75)90046-8
