ePlace-3D: Electrostatics based Placement for 3D-ICs by Lu, Jingwei et al.
arXiv:1512.08291v5 [cs.OH] 28 Feb 2016
ePlace-3D: Electrostatics based Placement for 3D-ICs
Jingwei Lu
Cadence Design Systems, Inc.
francesco.ljw@gmail.com
Hao Zhuang
Dept. CSE of UCSD
hao.zhuang@cs.ucsd.edu
Ilgweon Kang
Dept. CSE of UCSD
igkang@ucsd.edu
Pengwen Chen
Dept. Applied Maths of NCHU
pengwen@nchu.edu.tw
Chung-Kuan Cheng
Dept. CSE of UCSD
ckcheng@ucsd.edu
ABSTRACT
We propose ePlace-3D, a flat, analytic, mixed-size placement algorithm for three-dimensional integrated circuits (3D-ICs) using nonlinear optimization. Our contributions are (1) an electrostatics based 3D density function with globally uniform smoothness, (2) a 3D numerical solution with an improved spectral formulation, (3) a 3D nonlinear preconditioner for convergence acceleration, and (4) interleaved 2D-3D placement for efficiency enhancement. Our placer outperforms the leading works mPL6-3D and NTUplace3-3D with 6.44% and 37.15% shorter wirelength and 9.11% and 10.27% fewer 3D vertical interconnects (VI) on average over the IBM-PLACE circuits. Validation on the large-scale modern mixed-size (MMS) 3D circuits shows high performance and scalability.
1. INTRODUCTION
Placement remains a dominant factor in the overall quality of physical design automation [29, 30]. Following logic synthesis [31], back-end design objectives such as timing [45], power [9, 44], routability [8,38] and variability [3,42] are highly impacted by placement performance. The emerging 3D-IC [28] challenges the traditional 2D placers [1,2,5,17,19,25,41] to produce a 3D circuit layout with minimum wirelength yet limited vertical interconnects (through-silicon vias (TSVs), monolithic inter-tier vias (MIVs), etc.). Innovations in mixed-size 3D-IC placement are therefore highly desirable.
Previous combinatorial 3D-IC placers fall into two categories. Folding based methods [4] fold the 2D-IC placement layout to produce a 3D solution with local refinement. Partitioning based approaches [7,18] minimize the usage of vertical resources. Kim et al. [18] partition the netlist, followed by tier assignment, then apply 2D quadratic placement [40] simultaneously over all the tiers. Analytic placers achieve better 3D-IC placement performance than combinatorial algorithms. Goplen et al. [6] model 3D-IC placement by a quadratic framework [5]. Hsu et al. [10] extend the 2D-IC placement prototype [11] and use the bell-shape function [34] to smooth the vertical dimension. Luo et al. [27] utilize the 2D algorithm in [1] and relax the discrete tiers via the Huber function [12]. However, these modeling functions are only locally smooth. Moreover, their hierarchical cell clustering and grid coarsening would degrade the quality [25]. Separately, prior 3D placement benchmarks [13, 15] contain at most 210K cells, which is too small to represent modern design complexity. Large-scale bookshelf 3D-IC placement benchmarks are therefore desirable.

ISPD '16, April 3–6, 2016, Santa Rosa, California, USA. © 2016 ACM. ISBN 978-1-4503-4039-7/16/04. DOI: http://dx.doi.org/10.1145/2872334.2872361
In this work, we extend the 2D placers ePlace [22,23,25] and ePlace-MS [22,26] to the 3D domain. Our algorithm, named ePlace-3D, focuses on wirelength minimization and density equalization, while other 3D-IC objectives such as thermal are not covered. To the best of our knowledge, this is the first work in the literature achieving analytically global smoothness along all three dimensions. In contrast, previous analytic works [10,27] only ensure (partially) local smoothness in their density functions [12, 34], and their less continuous cell movement would slow down placement convergence and cause more penalty on wirelength. We conduct analytic global placement and stochastic legalization in the entire 3D cuboid domain, which maximizes the search space and thus further boosts the solution quality. ePlace-3D demonstrates the applicability of the electrostatic density model eDensity [23, 24] in various physical dimensions. Our specific contributions are listed as follows.
• eDensity-3D: an electrostatics based 3D density func-
tion ensuring global smoothness.
• A 3D numerical solution based on fast Fourier trans-
form (FFT) and improved spectral formulation.
• A nonlinear 3D preconditioner to equalize all the mov-
ing objects in the optimization perspective.
• Interleaving coarse-grained 3D placement with fine-
grained 2D placement to enhance efficiency.
• Our mixed-size 3D-IC placement prototype ePlace-3D outperforms the leading placers mPL6-3D [27] and NTUplace3-3D [10] with 6.44% and 37.15% shorter wirelength and 9.11% and 10.27% fewer 3D vertical interconnects on average over all the ten IBM-PLACE benchmarks [13]. Normalized to the runtime of ePlace-3D, mPL6-3D and NTUplace3-3D take 2.55× and 0.30× the time, respectively.
The remainder is organized as follows. Section 2 introduces the background knowledge. Section 3 discusses our 3D placement density function eDensity-3D, its numerical solution, and the nonlinear preconditioner. Section 4 provides an overview of the ePlace-3D algorithm. Experiments and results are shown in Section 5. We conclude in Section 6.
(a) Iter=0, U=6.18e17, τ = 90.13%.
(b) Iter=3, U=5.84e16, τ = 47.56%.
(c) Iter=6, U=3.56e15, τ = 11.75%.
(d) Iter=20, U=2.07e14, τ = 2.53%.
Figure 1: Iterative density-driven global placement (wirelength force disabled) with potential U and density overflow τ on the MMS ADAPTEC1 benchmark with three tiers and a resolution of 8 × 8 × 8. Electric density and field are shown by gray scale and red arrows. All the movable objects are initialized at the bottom tier, where all IO blocks are located. eDensity-3D iteratively spreads all the movable objects evenly within the entire 3D domain to equalize the placement density.
2. BACKGROUND
Given a set V of n objects, a net set N and a 3D cuboid core region $R = [0, d_x] \times [0, d_y] \times [0, d_z]$, global placement is formulated as constrained optimization. The constraint requires all the objects to be accommodated with zero overlap. Let v denote the placement solution, which consists of the physical coordinates of all the objects. The region R is uniformly decomposed into $m_x \times m_y \times m_z$ 3D bins denoted as the set B. For every bin $b \in B$, the density $\rho_b(v)$ should not exceed the target density $\rho_t$. The objective is to minimize the total half-perimeter wirelength (HPWL) of all the nets. Let $HPWL_{e_x}(v) = \max_{i,j \in e} |x_i - x_j|$ denote the horizontal wirelength of net e (similarly for $HPWL_{e_y}$). The total 2D HPWL is

$$HPWL(v) = \sum_{e \in N} \left( \beta_x HPWL_{e_x}(v) + \beta_y HPWL_{e_y}(v) \right).$$

We use $\beta_x$, $\beta_y$ and $\beta_z$ as dimensional weighting factors. 3D-IC placement needs vertical interconnects, such as through-silicon vias (TSVs) and monolithic inter-tier vias (MIVs), to penetrate silicon tiers. Different types of connects have different physical and electrical properties. However, ePlace-3D is compatible with any type of connect, which can be reflected in the weight $\beta_z$. In the remainder of this manuscript, we refer to all types of 3D vertical interconnects uniformly as VI for simplicity. The number of vertical interconnect units (#VI) is computed as how many times silicon tiers have been penetrated, e.g., one vertical connect between tier one and tier three is counted as two VIs. The nonlinear placement optimization is formulated as
$$\min_{v} \left( HPWL(v) + \beta_z \#VI \right) \quad \text{s.t.} \quad \rho_b(v) \le \rho_t, \ \forall b \in B. \tag{1}$$
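The objective of Eq. (1) can be evaluated for a given placement with a few lines of code. The following is a minimal sketch, not the authors' implementation: the function name `hpwl_and_vi`, the dictionary-based data layout, and the per-net counting rule (the #VI of a net is taken as the highest tier index minus the lowest tier index among its objects) are assumptions made for illustration.

```python
def hpwl_and_vi(nets, pos, tier, beta_x=1.0, beta_y=1.0):
    """Return (weighted 2D HPWL, #VI) of a placement.

    nets: list of nets, each a list of object ids (hypothetical layout)
    pos:  object id -> (x, y) planar coordinates
    tier: object id -> integer tier index
    """
    total_hpwl = 0.0
    total_vi = 0
    for net in nets:
        xs = [pos[i][0] for i in net]
        ys = [pos[i][1] for i in net]
        ts = [tier[i] for i in net]
        # Half-perimeter of the net bounding box, weighted per dimension.
        total_hpwl += beta_x * (max(xs) - min(xs)) + beta_y * (max(ys) - min(ys))
        # Assumed rule: a net spanning tiers t_min..t_max penetrates
        # (t_max - t_min) tiers, e.g. tier 1 to tier 3 counts as two VIs.
        total_vi += max(ts) - min(ts)
    return total_hpwl, total_vi


def objective(nets, pos, tier, beta_z):
    """Cost of Eq. (1) without the density constraint."""
    hpwl, vi = hpwl_and_vi(nets, pos, tier)
    return hpwl + beta_z * vi
```

For a single net connecting objects at (0, 0), (3, 4) and (1, 1) on tiers 1, 3 and 1, the sketch reports a wirelength of 7 and two VIs, matching the tier-one-to-tier-three example in the text.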
Analytic methods conduct placement using gradient-directed optimization. As HPWL(v) is not differentiable, we smooth the wirelength by the weighted-average (WA) model [10]:

$$W_{e_x}(v) = \frac{\sum_{i \in e} x_i \exp(x_i/\gamma_x)}{\sum_{i \in e} \exp(x_i/\gamma_x)} - \frac{\sum_{i \in e} x_i \exp(-x_i/\gamma_x)}{\sum_{i \in e} \exp(-x_i/\gamma_x)}. \tag{2}$$

Here $W_e(v) = \beta_x W_{e_x}(v) + \beta_y W_{e_y}(v) + \beta_z W_{e_z}(v)$ and $W(v) = \sum_{e} W_e(v)$. $\gamma_x$, $\gamma_y$ and $\gamma_z$ control the modeling accuracy.
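To make the role of γ concrete, the one-dimensional WA term of Eq. (2) can be sketched as follows. The function name `wa_wirelength` and the max/min shift of the exponents (added here only for numerical stability, it cancels in each ratio) are choices of this illustration rather than details given in the paper.

```python
import math


def wa_wirelength(coords, gamma):
    """Weighted-average smoothing of max(coords) - min(coords), as in Eq. (2)."""
    hi, lo = max(coords), min(coords)
    # Shifting by hi (or lo) rescales numerator and denominator by the same
    # factor, so each ratio is unchanged while exp() cannot overflow.
    e_pos = [math.exp((c - hi) / gamma) for c in coords]
    e_neg = [math.exp(-(c - lo) / gamma) for c in coords]
    smooth_max = sum(c * e for c, e in zip(coords, e_pos)) / sum(e_pos)
    smooth_min = sum(c * e for c, e in zip(coords, e_neg)) / sum(e_neg)
    return smooth_max - smooth_min
```

With a small γ the value approaches the exact span of the net, while a larger γ gives a smoother function that underestimates the span.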
The density function relaxes all the |B| constraints in Eq. (1). Most 2D and 3D quadratic placers [6,18,19] follow the linear density force formulation of [5]. Nonlinear placers [1,10,11,27] have their dedicated density functions. NTUplace3-3D [10] leverages the bell-shape curve [34] for local smoothness in the 3D domain. mPL6-3D [1] uses the Helmholtz function to globally smooth the 2D plane and Huber's function to locally smooth the vertical dimension. The electrostatics based density function [25] converts objects to charges. By the Lorentz law, the electric repulsive force spreads charges apart towards the electrostatic equilibrium state, which produces a globally even density distribution. Let U(v) denote the density cost function. The constraints in Eq. (1) can be relaxed by the penalty factor λ, and the resulting unconstrained optimization is shown below.
$$\min_{v} f(v) = W(v) + \lambda U(v). \tag{3}$$
In this work, we treat vertical connects as zero-volume objects and thus do not consider them in eDensity-3D (see footnote 1). Therefore, the optimization of electrostatics is not affected and can still be achieved based on the movement of netlist objects. Density overflow, denoted as τ, is used to terminate global placement and is defined as

$$\tau = \frac{\sum_{b \in B} \max\left( V_b^{m} - \rho_t V_b^{WS},\ 0 \right)}{V_m}. \tag{4}$$

Here $V_m$ is the total volume of all the movable objects, $V_b^{m}$ is the total volume of objects in bin b, and $V_b^{WS}$ is the total whitespace in bin b. The volume of each cell is computed as its planar area multiplied by the depth of each tier.
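Eq. (4) translates directly into code once the per-bin volumes are known. The sketch below assumes that the caller has already accumulated, for every bin, the object volume $V_b^{m}$ and the whitespace $V_b^{WS}$ into two flat lists; the function and argument names are hypothetical.

```python
def density_overflow(bin_object_volume, bin_whitespace, rho_t):
    """Density overflow tau of Eq. (4).

    bin_object_volume[b]: total object volume inside bin b (V_b^m)
    bin_whitespace[b]:    total whitespace of bin b (V_b^WS)
    rho_t:                target density
    """
    total_movable = sum(bin_object_volume)  # V_m
    excess = sum(
        max(v_obj - rho_t * v_ws, 0.0)
        for v_obj, v_ws in zip(bin_object_volume, bin_whitespace)
    )
    return excess / total_movable
```

For three bins with object volumes 3, 1 and 0, whitespace 2 each and ρt = 1, only the first bin is overfilled (by one unit), so the overflow is 1/4.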
1 Practically, vertical connects can never have zero volume. However, in academic research we are able to simplify the engineering problems to boost scientific innovation. Similarly, state-of-the-art academic 2D placement works [1,2,17,19,25,40,41] target wirelength only and ignore other objectives like timing, power and routability. As vertical connects may have large volume and thus contribute significantly to the placement density, we leave their modeling to future work.
3. EDENSITY-3D: 3D DENSITY FUNCTION
In this section, we introduce our novel 3D density function eDensity-3D, a fast numerical solution by spectral methods, and an approximated 3D nonlinear preconditioner. The key insight is that we treat the third dimension equally with the other two dimensions, such that vertical cell movement is as smooth as the planar movement in 2D placement. The behavior of eDensity-3D is visualized in Figure 1.
3.1 3D Density Function
Extending the planar function eDensity in [25], eDensity-3D models the entire placement instance as a 3D electrostatic field. Every placement object (standard cells, macros and fillers) is converted to a positively charged cuboid. The electric repulsive force spreads all the objects away from the high-density region. The 3D density cost U is modeled as the total potential energy of the system and defined as

$$U(v) = \sum_{i \in V} U_i(v) = \sum_{i \in V} q_i \Phi_i(v). \tag{5}$$
$q_i$ denotes the electric quantity of charge i and is set as the physical volume of placement object i. $\Phi_i$ is the electric potential at charge i. Charges with high potential reduce the placement overlap by moving in the direction of largest energy descent. Unlike the spatial density distribution ρ(x, y, z) (Figure 1), which is coarse and non-differentiable, the electric potential distribution Φ(x, y, z) is globally smooth. We use the potential gradient (thus electric field), ∇Φ(x, y, z) = E(x, y, z), to direct cell movement for density equalization. Given a placement layout v, we generate the density map ρ(x, y, z), then compute the potential map Φ(x, y, z) by solving the 3D Poisson's equation

$$\begin{cases} \nabla \cdot \nabla \Phi(x, y, z) = -\rho(x, y, z), \\ \hat{n} \cdot \nabla \Phi(x, y, z) = 0, \quad (x, y, z) \in \partial R, \\ \iiint_R \Phi(x, y, z) = \iiint_R \rho(x, y, z) = 0. \end{cases} \tag{6}$$
Here $\hat{n}$ is the outer unit normal of the placement cuboid R. ∂R is the boundary and consists of orthogonal rectangular planes that enclose the placement cuboid. In Eq. (6), the first equation has $\nabla \cdot \nabla \equiv \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}$. The Neumann condition in the second equation requires that when any object i reaches any boundary plane, the component of its density force vector perpendicular to the plane is reduced to zero, in order to prevent i from penetrating the plane. The third equation sets the integrals of the density ρ(x, y, z) and the potential Φ(x, y, z) within R to zero to ensure that (1) the electric force drives all the charges towards an even density distribution rather than pushing them to infinity, which matches the placement objective, and (2) the 3D Poisson's equation has a unique solution satisfying the Neumann condition. We differentiate the potential $\Phi_i$ on each charge i to generate the electric field $\nabla \Phi_i = E_i = (E_{i_x}, E_{i_y}, E_{i_z})$. The electric (density) force is $\nabla U_i = q_i \nabla \Phi_i = q_i E_i$.
3.2 3D Numerical Solution
Based on the 2D solution in [25], we solve the 3D Poisson's equation by spectral methods using frequency decomposition [39]. To satisfy the Neumann condition of zero gradients at the boundaries, we use sinusoidal waves to express the electric field E(x, y, z). We construct an odd and periodic field distribution by negatively mirroring the field w.r.t. the origin, then periodically extending it towards positive and negative infinity. The electric potential and density distributions are then expressed by cosine waveforms, which are the integration and differentiation of the field, respectively. Let $a_{j,k,l}$ denote the 3D coefficients of the density frequency:

$$a_{j,k,l} = \frac{1}{n^3} \sum_{x,y,z} \rho(x, y, z) \cos(w_j x) \cos(w_k y) \cos(w_l z). \tag{7}$$
eDensity [25] sets $w_j = \pi \frac{j}{m_x}$, which equals the discrete index for the j-th frequency component. However, as we are conducting placement in a continuous domain, the multiplication of x and $w_j$ induces inconsistency. In this work, we propose improved spectral methods for the 3D placement density function. Specifically, we set $w_j = \pi \frac{j}{d_x}$ since x ranges within $(0, d_x)$. As a result, $w_j x = \pi j \frac{x}{d_x}$ matches the original unit of the discrete frequency index, and we have all the frequency indexes defined as $\{w_j, w_k, w_l\} = \{\frac{\pi j}{d_x}, \frac{\pi k}{d_y}, \frac{\pi l}{d_z}\}$. As mentioned in Section 2, $d_x$, $d_y$ and $d_z$ represent the dimensions of the cuboid placement core region. $d_{\{x,y,z\}}$ can be set to any value since $w_{\{j,k,l\}}$ is normalized by $\frac{\{x,y,z\}}{d_{\{x,y,z\}}}$. j, k and l range in [0, n−1], which covers only half of a cosine function period. In contrast, one complete function period centered at the origin is [−n, n−1]. Therefore, we have π rather than 2π in the above frequency index. We set $a_{0,0,0} = 0$ to remove the zero-frequency component. The spatial density distribution ρ(x, y, z) is
$$\rho(x, y, z) = \sum_{j,k,l} a_{j,k,l} \cos(w_j x) \cos(w_k y) \cos(w_l z). \tag{8}$$

Figure 2: The flowchart of ePlace-3D.
To achieve ∇ · ∇Φ(x, y, z) = −ρ(x, y, z), the solution for the potential can be expressed as

$$\Phi(x, y, z) = \sum_{j,k,l} \frac{a_{j,k,l}}{w_j^2 + w_k^2 + w_l^2} \cos(w_j x) \cos(w_k y) \cos(w_l z). \tag{9}$$
By differentiating Eq. (9), we have the electric field distribution $E(x, y, z) = (E_x, E_y, E_z)$ shown below:

$$\begin{cases} E_x(x, y, z) = \sum_{j,k,l} \frac{a_{j,k,l} w_j}{w_j^2 + w_k^2 + w_l^2} \sin(w_j x) \cos(w_k y) \cos(w_l z), \\ E_y(x, y, z) = \sum_{j,k,l} \frac{a_{j,k,l} w_k}{w_j^2 + w_k^2 + w_l^2} \cos(w_j x) \sin(w_k y) \cos(w_l z), \\ E_z(x, y, z) = \sum_{j,k,l} \frac{a_{j,k,l} w_l}{w_j^2 + w_k^2 + w_l^2} \cos(w_j x) \cos(w_k y) \sin(w_l z). \end{cases} \tag{10}$$
Let $|B| = m_x \times m_y \times m_z$ denote the total number of bins in global placement. Instead of quadratic complexity, the above spectral equations can be solved efficiently using FFT algorithms [36] with $O(|B| \log |B|)$ complexity.
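The relation between Eqs. (7), (9) and (10) can be checked on a tiny grid by evaluating the cosine series directly. The sketch below deliberately uses the quadratic-cost direct summation instead of an FFT, so it is only suitable for illustration. Three points are assumptions of this sketch rather than statements of the paper: the density is sampled at bin centers, the coefficient normalization uses the standard DCT-II factor of 2 per nonzero index (the paper writes a single 1/n³), and the function name `spectral_poisson_3d` is hypothetical.

```python
import math
from itertools import product


def spectral_poisson_3d(rho, dims):
    """Direct cosine-series evaluation of Eqs. (7), (9) and (10).

    rho[x][y][z]: zero-mean density sampled at bin centers
    dims:         (dx, dy, dz) of the placement cuboid
    Returns (phi, ex, ey, ez) as nested lists on the same grid.
    """
    m = (len(rho), len(rho[0]), len(rho[0][0]))
    cells = list(product(range(m[0]), range(m[1]), range(m[2])))
    # w[a][j] = pi * j / d_a; bin centers sit at (i + 0.5) * d_a / m_a.
    w = [[math.pi * j / dims[a] for j in range(m[a])] for a in range(3)]
    ctr = [[(i + 0.5) * dims[a] / m[a] for i in range(m[a])] for a in range(3)]
    cos_t = [[[math.cos(wj * c) for c in ctr[a]] for wj in w[a]] for a in range(3)]
    sin_t = [[[math.sin(wj * c) for c in ctr[a]] for wj in w[a]] for a in range(3)]

    def zeros():
        return [[[0.0] * m[2] for _ in range(m[1])] for _ in range(m[0])]

    phi, ex, ey, ez = zeros(), zeros(), zeros(), zeros()
    for j, k, l in cells:
        if j == k == l == 0:
            continue  # a_{0,0,0} = 0 removes the zero-frequency component
        # Eq. (7) with the assumed DCT-II normalization.
        scale = (2 if j else 1) * (2 if k else 1) * (2 if l else 1) / len(cells)
        a = scale * sum(
            rho[x][y][z] * cos_t[0][j][x] * cos_t[1][k][y] * cos_t[2][l][z]
            for x, y, z in cells
        )
        w2 = w[0][j] ** 2 + w[1][k] ** 2 + w[2][l] ** 2
        for x, y, z in cells:
            cx, cy, cz = cos_t[0][j][x], cos_t[1][k][y], cos_t[2][l][z]
            sx, sy, sz = sin_t[0][j][x], sin_t[1][k][y], sin_t[2][l][z]
            phi[x][y][z] += a / w2 * cx * cy * cz           # Eq. (9)
            ex[x][y][z] += a * w[0][j] / w2 * sx * cy * cz  # Eq. (10), as printed
            ey[x][y][z] += a * w[1][k] / w2 * cx * sy * cz
            ez[x][y][z] += a * w[2][l] / w2 * cx * cy * sz
    return phi, ex, ey, ez
```

A convenient sanity check is a density consisting of a single mode, ρ = cos(πx/dx): the returned potential is then ρ divided by (π/dx)², and only the x-component of the field is nonzero.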
3.3 3D Nonlinear Precondition
Theoretically, preconditioning improves the convergence rate rather than the solution quality. However, as placement is a highly nonlinear, non-convex and ill-conditioned problem, a Hessian matrix with an improved condition number reshapes the search direction that the nonlinear solver follows. As a result, preconditioning opens the gate to unexplored regions of the high-dimensional search space, where a quality enhancement can be expected.
Preconditioned mixed-size placement should tolerate the huge physical and topological differences between standard cells, macros and dummy fillers. In [25], the nonlinear preconditioner H for 2D placement is modeled as

$$H_{i_x} = \frac{\partial^2 f}{\partial x_i^2} = \frac{\partial^2 W}{\partial x_i^2} + \lambda \frac{\partial^2 U}{\partial x_i^2} \approx |N_i| + \lambda A_i. \tag{11}$$
Here $N_i$ is the set of all nets incident to object i, and $A_i$ is the 2D area of object i. In 3D placement, we use $V_i$ to denote the volume of i instead. The preconditioned gradient $\nabla f_{pre} = H^{-1} \nabla f$ then improves and accelerates the placement. Our studies show that Eq. (11) relies on the assumption $\frac{\partial^2 W / \partial x_i^2}{\partial^2 U / \partial x_i^2} \approx \frac{|N_i|}{A_i}$. However, the third dimension weakens $\frac{\partial^2 W}{\partial x_i^2}$ and breaks this assumption. As a result, $|N_i|$ dominates $\lambda V_i$ and makes fillers and macros with small $|N_i|$ spread faster than standard cells, as Eq. (12) shows:

$$\left\| H_{i_x}^{-1} \nabla U \right\| = \left\| \frac{\partial U / \partial x_i}{\lambda V_i} \right\| \gg \left\| \frac{\partial f / \partial x_i}{|N_i| + \lambda V_i} \right\| = \left\| H_{i_x}^{-1} \nabla f \right\|. \tag{12}$$
Instead, we propose a new preconditioner:

$$H_{i_x} = \frac{\partial^2 f}{\partial x_i^2} \approx \lambda \frac{\partial^2 U}{\partial x_i^2} \approx \lambda V_i. \tag{13}$$

The noise introduced by $|N_i|$ is thereby removed, so that all the objects are equalized from the optimizer's perspective and spread simultaneously over the entire domain. Experiments show that our 3D preconditioner reduces the global placement iterations by 15% and improves the wirelength by 30% over all the 16 MMS benchmarks.
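Because the preconditioner of Eq. (13) is diagonal, applying $H^{-1}$ amounts to a per-object scaling of the gradient. A minimal sketch is given below; the function name `precondition`, the tuple-based gradient layout and the small `eps` guard against division by zero are assumptions of this illustration and are not described in the paper.

```python
def precondition(grad, volumes, lam, eps=1e-12):
    """Scale the gradient of every object i by 1 / (lam * V_i), following Eq. (13).

    grad[i]:    (df/dx_i, df/dy_i, df/dz_i)
    volumes[i]: volume V_i of object i
    lam:        density penalty factor lambda
    eps:        hypothetical lower bound on the divisor
    """
    return [
        tuple(g / max(lam * v, eps) for g in g_i)
        for g_i, v in zip(grad, volumes)
    ]
```

With this scaling, a large macro and a small filler that feel density forces proportional to their volumes receive comparable step sizes.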
3.4 Complexity
Complexity significantly impacts the placement runtime. In each iteration, we traverse all the bins to reset their density in O(|B|) time, then traverse all the placement objects in O(n) time to update the superimposed density map. By Eqs. (7), (9) and (10), five 3D FFT computations are invoked, which cost O(5n log n) time. By our grid sizing strategy in Eq. (14), |B|/n is bounded by a constant. The overall complexity is thus O(|B| + n + 5n log n) ≈ O(n log n).

In ePlace-3D, the placement domain is geometrically transformed from $R = [0, d_x] \times [0, d_y] \times [0, d_z]$ to $R' = [0, 1] \times [0, 1] \times [0, 1]$. We set the density resolutions $m_x = m_y = m_z = m_{3D}$ so that the placement domain R′ is uniformly decomposed into $|B| = m_{3D}^3$ cubic bins. Let $V_R$ denote the total volume of R and $V_{C_{avg}}$ denote the average area of all standard cells. The grid sizing is set as

$$|B| = \frac{V_R}{k \times V_{C_{avg}} \times \rho_t^{-1}}. \tag{14}$$

Here every k standard cells are accommodated by one bin. The value of k determines the trade-off between placement quality and efficiency. In this work, we set k = 1.0 throughout.
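The grid sizing of Eq. (14) is a one-line computation. The sketch below also derives a per-dimension resolution by rounding the cube root of |B| to the nearest integer; this rounding rule and the function name `grid_size` are assumptions, since the paper does not state how $m_{3D}$ is made integral.

```python
def grid_size(v_region, v_cell_avg, rho_t, k=1.0):
    """Return (|B|, m_3D) following Eq. (14).

    |B| = V_R / (k * V_Cavg / rho_t); m_3D is the cube root of |B|,
    rounded to the nearest integer (an assumption of this sketch).
    """
    n_bins = v_region * rho_t / (k * v_cell_avg)
    m_3d = max(1, round(n_bins ** (1.0 / 3.0)))
    return n_bins, m_3d
```

For example, a region of volume 1000 with unit-size cells, ρt = 1 and k = 1 yields 1000 bins, i.e. a 10 × 10 × 10 grid.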
4. EPLACE-3D: OVERVIEW
ePlace-3D is built upon the infrastructure of ePlace-MS [26]. Figure 2 shows the flowchart of our algorithm. Given a placement instance, our algorithm minimizes the quadratic wirelength over the 3D domain to produce the initial solution $v_{IP}$. To approach the optimum solution in the end, we let $v_{IP}$ be a minimum-wirelength solution that tolerates density violations.
(a) Iter=0, WL=1.32e7, #VI=0, τ = 93.7%.
(b) Iter=266, WL=3.29e7, #VI=1.35e3, τ = 77.1%.
(c) Iter=328, WL=3.91e7, #VI=4.14e3, τ = 61.1%.
(d) Iter=376, WL=4.21e7, #VI=7.70e3, τ = 45.2%.
(e) Iter=432, WL=4.64e7, #VI=8.57e3, τ = 28.9%.
(f) Iter=481, WL=5.06e7, #VI=8.70e3, τ = 14.9%.
Figure 3: 3D-IC mixed-size global placement on MMS ADAPTEC1 with three tiers. Standard cells, macros and fillers are denoted by red dots, blue rectangles and cyan dots, respectively.
Our 3D-IC global placement is visualized in Figure 3. Unconnected fillers [1, 25] are inserted to populate the extra whitespace. All the fillers are equally sized by the average dimensions of all the standard cells. The optimum solution of 3D global placement has all the cells, macros and fillers oriented towards discrete tiers. Otherwise, some cuboid placement sites are partially wasted, degrading the solution quality. Figure 3(f) illustrates the strength of our approach: the analytic 3D placer visibly approaches density evenness along the vertical dimension, which ensures negligible quality overhead during tier assignment. We use Nesterov's method [35] as the nonlinear solver and determine the steplength by [25].
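For readers unfamiliar with Nesterov's method, the following is a generic sketch of the accelerated gradient iteration with a fixed step length. It is not the ePlace-3D solver: the step-length prediction of [25] is replaced by a user-supplied constant `step`, and the names `nesterov_minimize`, `grad` and `v0` are hypothetical.

```python
import math


def nesterov_minimize(grad, v0, step, iters):
    """Generic Nesterov accelerated gradient descent with a fixed step.

    grad:  function mapping a point (list of floats) to its gradient
    v0:    starting point
    step:  fixed step length (stand-in for the prediction scheme of [25])
    iters: number of iterations
    """
    u_prev = list(v0)  # major solution u_k
    v = list(v0)       # reference (look-ahead) solution v_k
    a = 1.0            # optimization parameter a_k
    for _ in range(iters):
        g = grad(v)
        # Gradient step from the reference solution.
        u = [vi - step * gi for vi, gi in zip(v, g)]
        a_next = (1.0 + math.sqrt(4.0 * a * a + 1.0)) / 2.0
        # Extrapolate along the direction of the last major update.
        v = [ui + (a - 1.0) * (ui - upi) / a_next for ui, upi in zip(u, u_prev)]
        u_prev, a = u, a_next
    return u_prev
```

In a placer, `grad` would return the (preconditioned) gradient of f(v) = W(v) + λU(v) from Eq. (3); here any differentiable convex function can be used to try the iteration.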
A multi-tier 2D-IC mixed-size global placement follows, which assigns all the macros and standard cells to their closest tiers and separately fills the remaining whitespace on each tier with fillers. Planar placement is conducted simultaneously over all the tiers. As wirelength smoothing is homogeneous over all the tiers (with the same γ), heterogeneous grid sizing is not feasible because the density force depends on the resolution by Eq. (10). We set all the tiers with the same density resolution $m_{2D}$, which is the maximum over all the tiers by Eq. (14) with k = 1. In practice, we have $O(m_{2D}^2) \approx O(m_{3D}^3)$. Figure 4 illustrates the progression.
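The assignment of objects to their closest tiers can be sketched as a simple rounding step. The sketch assumes equally thick tiers stacked over [0, dz] and that each object is represented by the z-coordinate of its center; the function name `assign_tiers` and these modeling choices are assumptions for illustration only.

```python
def assign_tiers(z_centers, dz, num_tiers):
    """Map continuous z-coordinates to the index of the closest tier.

    Tiers are assumed to be equally thick slabs covering [0, dz], so the
    closest tier center is the one whose slab contains the coordinate.
    """
    depth = dz / num_tiers
    tiers = []
    for z in z_centers:
        t = int(z / depth)
        # Clamp coordinates on or beyond the boundary into the valid range.
        tiers.append(min(max(t, 0), num_tiers - 1))
    return tiers
```

After this step every object has a fixed tier, and the subsequent 2D placement only moves objects within their tiers, which is why #VI stays constant in Figure 4.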
(a) Iter=0, WL=4.71e7, #VI=7.96e3, τ = 31.9%.
(b) Iter=246, WL=3.64e7, #VI=7.96e3, τ = 50.3%.
(c) Iter=322, WL=4.46e7, #VI=7.96e3, τ = 31.9%.
(d) Iter=395, WL=4.99e7, #VI=7.96e3, τ = 14.8%.
Figure 4: 2D-IC mixed-size global placement on MMS ADAPTEC1 with three tiers. Initial and final overflow are both larger than the final overflow of 3D placement due to finer granularity ($m_{2D} \gg m_{3D}$).
Our 3D-IC macro legalizer generates a legal macro layout with zero macro overlap and small wirelength overhead. The algorithm is stochastic and based on simulated annealing [20]. A 3D-IC standard-cell global placement follows to mitigate the quality loss due to sub-optimal macro legalization. We then assign standard cells to their closest tiers and conduct simultaneous 2D-IC standard-cell placement on all the tiers, so that the standard-cell layouts of all the tiers are locally refined. Figure 5 shows the respective placement progressions; more details can be found in [26]. The detailed placer from [37] is then invoked for tier-by-tier standard-cell legalization and detailed placement from the bottom to the top tier.
In general, we have fine-grained 2D placement interleaved with coarse-grained 3D placement, which achieves a good trade-off between quality and efficiency. On average over all the ten IBM-PLACE circuits, the application of 2D refinement reduces the wirelength by more than 4%.
5. EXPERIMENTS AND RESULTS
We implement ePlace-3D in the C programming language in single-thread mode and execute the program on a Linux machine with an Intel i7 920 2.67GHz CPU and 12GB memory. There is no benchmark-specific parameter tuning in our work. #VI is controlled by the weighting factor $\beta_z$ based on a capacitance ratio. By [16], one TSV (VI) has a capacitance of $C_{VI} = 30$fF at the 45nm tech-node. ITRS annual reports [14] show that the unit capacitance of interconnects at intermediate routing layers is constantly 2pF/cm across various tech-nodes. As the placement row height is 1.4µm at the 45nm tech-node (70nm M1 half-pitch, ten M1 tracks per row), the capacitance becomes $C_{ROW} = 0.3$fF for a 2D interconnect spanning one row height. Based on the length units of each benchmark, as well as our geometric transformation of the placement core region to [0, 1] × [0, 1] × [0, 1] as discussed in Section 3.4, we compute the respective capacitance ratio of one VI versus one unit of wirelength and use it as the VI weight. Specifically, we have

$$\beta_z = \frac{\#tiers \times C_{VI}}{\#rows \times C_{ROW}}. \tag{15}$$
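The arithmetic behind $C_{ROW}$ and Eq. (15) can be reproduced as follows. The function names and the example values for the number of tiers and rows are hypothetical; only the 2pF/cm unit capacitance, the 1.4µm row height and $C_{VI}$ = 30fF are taken from the text.

```python
def row_capacitance_fF(cap_pF_per_cm=2.0, row_height_um=1.4):
    """Capacitance (fF) of a 2D wire spanning one placement row height."""
    # 1 pF = 1e3 fF and 1 um = 1e-4 cm.
    return cap_pF_per_cm * 1e3 * row_height_um * 1e-4


def beta_z(num_tiers, num_rows, c_vi_fF=30.0, c_row_fF=0.3):
    """VI weight of Eq. (15): (#tiers * C_VI) / (#rows * C_ROW)."""
    return (num_tiers * c_vi_fF) / (num_rows * c_row_fF)
```

The first function returns 0.28fF, which the text rounds to 0.3fF. As a purely illustrative case, a design with four tiers and 1000 placement rows would give βz = (4 × 30) / (1000 × 0.3) = 0.4.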
(a) 3D macro LG: iter=0, WL=4.99e7, #VI=7.96e3, Om=9.05e5.
(b) 3D macro LG: iter=4, WL=5.10e7, #VI=9.10e3, Om=0.
(c) 3D standard-cell GP: iter=0, WL=5.10e7, #VI=9.10e3, τ = 8.1%.
(d) 2D standard cell GP: iter=0, WL=4.92e7, #VI=9.10e3, τ = 66.7%.
(e) 2D standard cell GP: iter=394, WL=5.08e7, #VI=9.11e3, τ = 14.8%.
(f) 2D standard cell DP: WL=5.42e7, #VI=9.10e3, τ = 0%.
Figure 5: Post-placement on MMS ADAPTEC1 with three tiers. Standard cells, macros and fillers are denoted by red dots, blue rectangles and cyan dots, respectively. Om denotes the total macro overlap.
Notice that the focus of this work is the algorithmic framework of 3D placement, not the accurate weight modeling of vertical connects. The weighting factor can be adjusted by VLSI designers for their particular needs, e.g., vertical connects of different electrical and physical attributes (TSVs, MIVs, super contacts, etc.).

Table 1: HPWL (e7), #VI (vertical interconnect) (e3) and runtime (minutes) on the IBM-PLACE benchmark suite [13]. Cited results are marked with ∗. All the experiments are conducted under single-thread mode. The results are evaluated by the same scripts and normalized to ePlace-3D. The best result for each case is in bold-face.

| Benchmarks | # Cells | # Nets | NTUplace3-3D [10] HPWL | NTUplace3-3D [10] #VI | NTUplace3-3D [10] CPU | mPL6-3D∗ [27] HPWL | mPL6-3D∗ [27] #VI | mPL6-3D∗ [27] CPU | ePlace-3D HPWL | ePlace-3D #VI | ePlace-3D CPU |
|---|---|---|---|---|---|---|---|---|---|---|---|
| IBM01 | 12K | 12K | 0.34 | 0.69 | 0.20 | 0.26 | 1.04 | 2.95 | 0.25 | 1.31 | 0.58 |
| IBM03 | 22K | 22K | 0.76 | 3.32 | 0.50 | 0.59 | 3.11 | 4.72 | 0.56 | 3.27 | 1.33 |
| IBM04 | 27K | 26K | 1.00 | 2.60 | 0.60 | 0.81 | 2.95 | 6.41 | 0.74 | 3.53 | 1.88 |
| IBM06 | 32K | 33K | 1.30 | 3.99 | 0.80 | 1.05 | 3.97 | 6.20 | 0.92 | 4.50 | 2.98 |
| IBM07 | 45K | 44K | 1.92 | 5.73 | 1.30 | 1.59 | 4.68 | 8.64 | 1.50 | 4.39 | 3.87 |
| IBM08 | 51K | 48K | 2.08 | 4.90 | 1.70 | 1.71 | 3.94 | 11.23 | 1.54 | 4.90 | 4.75 |
| IBM09 | 52K | 50K | 1.92 | 3.88 | 1.50 | 1.45 | 3.24 | 14.61 | 1.40 | 3.18 | 5.63 |
| IBM13 | 82K | 84K | 3.69 | 3.98 | 2.60 | 2.88 | 5.59 | 19.62 | 2.67 | 4.73 | 8.65 |
| IBM15 | 158K | 161K | 9.16 | 15.67 | 7.20 | 6.79 | 10.52 | 46.82 | 6.39 | 9.16 | 40.25 |
| IBM18 | 210K | 201K | 13.41 | 12.19 | 13.60 | 9.16 | 15.22 | 52.09 | 9.47 | 6.83 | 63.07 |
| Avg. | 69K | 68K | 37.15% | 10.27% | 0.30× | 6.44% | 9.11% | 2.55× | 0.00% | 0.00% | 1.00× |
We conduct experiments on the IBM-PLACE [13] standard-cell benchmarks without macros or blockages, all of which are derived from real IC designs. We include two state-of-the-art 3D-IC placers, mPL6-3D [27] (see footnote 2) and NTUplace3-3D [10], in our experiments on IBM-PLACE. As other categories of algorithms (e.g., folding and partitioning based approaches) have been outperformed by analytic placement in the literature, we do not include them in our experiments. We have obtained the binary of NTUplace3-3D from the original authors and executed it on our machine for experiments (see footnote 3). mPL6-3D is not available (as notified by the author), so we cite the performance from their latest publication [27]. We use exactly the same benchmark transformation as that used by mPL6-3D and NTUplace3-3D, i.e., we insert four silicon tiers into each benchmark, scale down each tier to 1/4 of the original 2D placement area, add 10% whitespace to each tier, and keep the aspect ratio of each tier the same as the original 2D design. As a result, all the experiments on the three placers, including those from [27], are conducted on exactly the same IBM-PLACE-3D benchmarks. As HPWL and #VI are computed in exactly the same way, the performance comparison among the three placers is fair. The results on the IBM-PLACE cases are shown in Table 1. On average over all the ten circuits, ePlace-3D outperforms mPL6-3D and NTUplace3-3D with 6.44% and 37.15% shorter wirelength together with 9.11% and 10.27% fewer VIs. Normalized to ePlace-3D, the average runtime of mPL6-3D is 2.55× and that of NTUplace3-3D is 0.30× (Table 1), so ePlace-3D is faster than mPL6-3D but slower than NTUplace3-3D. Nevertheless, the improvement over NTUplace3-3D on wirelength (37.15%) and VI (10.27%) is significant.
To validate the scalability of ePlace-3D, we also conduct experiments on the large-scale modern mixed-size (MMS) benchmarks [43] with on average 829K and up to 2.5M netlist objects. The MMS benchmarks were first published at DAC 2009. The circuits inherit the same netlists and density constraints ρt from the ISPD 2005 [33] and ISPD 2006 [32] benchmarks but have all the macros freed to place. The original planar placement domain is geometrically transformed to have 2, 3 and 4 silicon tiers, where each tier is equally downsized to keep both the aspect ratio and the total silicon area unchanged. All the standard cells and macros keep their original dimensions and span only one tier. MMS circuits have all their fixed objects with zero area (volume) and outside the placement boundaries, and we geometrically transform them to the boundary of the bottom (first) tier. Also, as macros are all free to move, we skip the geometrical transformation of the fixed macro layout from 2D to 3D, which is sub-optimal and usually causes quality loss. Similar to mPL6-3D [27] and NTUplace3-3D [10], we add 10% extra whitespace to each tier, in order to relieve the placement dilemma due to the increased area ratio between large macros and silicon tiers (see footnote 4). There are benchmark-dependent target densities ρt for eight out of the sixteen MMS circuits. Detailed circuit statistics can be found in Table 1 of [43]. We create evaluation scripts to compute the total wirelength, the number of vertical interconnects, and the legality of the produced 3D-IC placement solution. The results on the MMS benchmarks are shown in Table 2. Notice that here HPWL is the original half-perimeter wirelength. It is not penalized by the amount of density overflow, since the density overflow in the 3D domain has one more dimension and is thus hard to compare with that of the 2D domain. The binary of NTUplace3-3D does not work with these benchmarks, while the binary of mPL6-3D is not available for use. As a result, we compare the 3D MMS placement solutions with the best published (golden) 2D results in the literature [26]. By using two, three and four tiers, ePlace-3D outperforms the golden 2D placement with on average 13.67%, 20.50% and 27.54% shorter wirelength. On the other hand, the average ratios between the number of vertical interconnect units and the number of placement objects (standard cells and macros) are only 2.17%, 4.30% and 6.10%, respectively. These vertical connect ratios are much smaller than the average VI ratio on IBM-PLACE, which is more than 9% for all the three placers in Table 1.

2 Although mPL6-3D has an extension to thermal-aware placement, its experiments on the IBM-PLACE cases are based on the original prototype driven only by wirelength and density, not thermal.
3 There is a small quality gap for NTUplace3-3D between our local experiment results and those published in [10], which may be due to differences in computing platforms.
4 BIGBLUE3, NEWBLUE2 and NEWBLUE3 have very
large macros. For the tier insertion of two, three and four,
we add 20%, 30% and 40% whitespace to each tier to make
sure that the largest macro can be accommodated.
Table 2: HPWL (e6), #VI (vertical interconnect) and runtime (mins) on MMS circuits. Cited results are
marked with ∗. All the experiments are in single-thread mode. The HPWL and CPU results are normalized
to the best published 2D placement results [26], #VI are normalized to # objects.
                        | ePlace-MS∗ [26] | ePlace-3D w/ 2 tiers    | ePlace-3D w/ 3 tiers    | ePlace-3D w/ 4 tiers
Benchmark # Objs # Nets |    HPWL     CPU |    HPWL     #VI     CPU |    HPWL     #VI     CPU |    HPWL     #VI     CPU
ADAPTEC1    211K   221K |   67.15    5.47 |   59.51    5733   24.63 |   54.19    9104   14.65 |    51.3   13568   16.03
ADAPTEC2    255K   266K |   77.37    7.43 |   73.97    9269   39.67 |   75.38    9929   25.18 |   59.97   18085   24.57
ADAPTEC3    451K   466K |  164.50   27.23 |  141.97    5557   95.48 |  136.85   18203   88.55 |  120.29   28694   94.42
ADAPTEC4    496K   515K |  148.38   29.35 |  126.94    8149  107.15 |  113.22   13811  121.40 |  106.34   14527  118.13
BIGBLUE1    278K   284K |   86.82    7.82 |   76.06    8272   40.63 |   71.34   10508   36.17 |   63.64   19403   38.05
BIGBLUE2    557K   577K |  130.18   13.70 |  109.27    2565   70.25 |    97.1    5347   63.58 |   90.14    9241   64.95
BIGBLUE3   1096K  1123K |  302.29   72.98 |  251.77   24466  268.47 |  271.27   42053  291.38 |  295.38   62669  388.08
BIGBLUE4   2177K  2229K |  657.92  204.15 |  577.98   21263  491.97 |   537.2   50552  563.98 |  500.25  113590  420.17
ADAPTEC5    843K   867K |  310.54   48.35 |  258.18   22705  170.90 |  244.57   27764  146.22 |  223.44   50732  149.22
NEWBLUE1    330K   338K |   61.85   10.87 |   56.36    5901   28.15 |   53.05    7295   24.08 |   48.85   12346   25.07
NEWBLUE2    441K   465K |  162.93   62.40 |  179.82   25571   67.27 |   143.5   43642   77.20 |  169.78   53487   72.98
NEWBLUE3    494K   552K |  304.15   17.53 |  240.47    7686  308.62 |  365.10   48979  410.73 |  397.46   51597  265.67
NEWBLUE4    646K   637K |  228.54   29.73 |  197.21   11372  110.02 |  177.82   29767  112.80 |  171.21   35067  101.78
NEWBLUE5   1233K  1284K |  392.27   63.40 |  344.95   45995  202.12 |  303.05   64336  195.52 |  280.42   95768  216.22
NEWBLUE6   1255K  1288K |  408.36   69.65 |  379.59   10901  222.72 |  325.35   50487  194.57 |  298.82   66983  180.88
NEWBLUE7   2507K  2636K |  894.31  191.47 |  814.79   18615  363.30 |  696.27   92943  375.65 |  670.51  111562  353.92
Avg.        829K   859K |   0.00%   1.00× |  13.67%   2.17%   3.13× |  20.50%   4.30%   3.03× |  27.54%   6.10%   2.94×
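The per-benchmark quantities behind the normalized figures in Table 2 can be recomputed from its raw columns. The Python sketch below does this for two rows copied from the table. It rests on assumptions: the HPWL reduction is taken as 1 - HPWL_3D / HPWL_2D (the text does not spell out the exact normalization or averaging convention behind the Avg. row), the VI ratio uses the object counts as rounded to thousands in the table, and the names ROWS and metrics are illustrative only.

```python
# Two rows copied from Table 2. HPWL is in units of 1e6, CPU in minutes.
# "objs" is the "# Objs" column, which the table rounds to thousands.
ROWS = {
    "ADAPTEC1": {
        "objs": 211_000, "hpwl_2d": 67.15, "cpu_2d": 5.47,
        # tiers -> (HPWL, #VI, CPU)
        "tiers": {2: (59.51, 5733, 24.63),
                  3: (54.19, 9104, 14.65),
                  4: (51.3, 13568, 16.03)},
    },
    "NEWBLUE7": {
        "objs": 2_507_000, "hpwl_2d": 894.31, "cpu_2d": 191.47,
        "tiers": {2: (814.79, 18615, 363.30),
                  3: (696.27, 92943, 375.65),
                  4: (670.51, 111562, 353.92)},
    },
}


def metrics(row, num_tiers):
    """Return normalized metrics for one benchmark and tier count.

    Assumed conventions (not confirmed by the paper):
      hpwl_reduction = 1 - HPWL_3D / HPWL_2D
      vi_ratio       = #VI / # objects
      cpu_ratio      = CPU_3D / CPU_2D
    """
    hpwl, num_vi, cpu = row["tiers"][num_tiers]
    return {
        "hpwl_reduction": 1.0 - hpwl / row["hpwl_2d"],
        "vi_ratio": num_vi / row["objs"],
        "cpu_ratio": cpu / row["cpu_2d"],
    }


if __name__ == "__main__":
    for name, row in ROWS.items():
        for k in sorted(row["tiers"]):
            m = metrics(row, k)
            print(f"{name} {k} tiers: "
                  f"HPWL -{100 * m['hpwl_reduction']:.2f}%, "
                  f"VI ratio {100 * m['vi_ratio']:.2f}%, "
                  f"CPU {m['cpu_ratio']:.2f}x")
```

Because the averaging convention is not stated, averaging these per-benchmark values need not reproduce the Avg. row exactly.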
Due to the introduction of the third dimension, the search
space of placement optimization is substantially enlarged.
However, the average runtime increase is only about 3×
(2.94× to 3.13× in Table 2), which indicates
high efficiency of ePlace-3D.
We also study the trends of HPWL and #VI by linearly
sweeping the number of tiers and exponentially sweeping
the VI weight. For the tier sweep, we select eight of the sixteen MMS
benchmarks (ADAPTEC1, ADAPTEC4, BIGBLUE1, BIGBLUE2,
BIGBLUE3, BIGBLUE4, NEWBLUE6, NEWBLUE7),
all of which can accommodate their largest macro block
after inserting ten tiers. Keeping the same aspect ratio, the
area of each tier is scaled down by a factor of ten, with the
insertion of 10% extra whitespace. Figure 6 shows that ePlace-3D
is able to reduce the total 2D wirelength by up to 40% (with
the insertion of up to ten tiers), while #VI scales up roughly
with the number of tiers.
Figure 6: Avg. HPWL and #VI of eight selected
MMS cases w.r.t. number of silicon tiers.
VI weight sweeping is conducted
on all sixteen MMS benchmarks. Figure 7 shows the
trends of average HPWL and #VI when dividing the normal
VI weight by up to 32 (i.e., × 0.03125). The reduction in total
2D HPWL saturates at about 7%, while #VI is
scaled up by roughly 2.5×.
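The two sweeps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-tier area is taken as the 2D area divided by the number of tiers and then enlarged by the whitespace fraction, with the 2D aspect ratio preserved, and the VI weight is halved at each step down to 1/32 of its starting value. The function names tier_outline and vi_weight_sweep and their parameters are hypothetical and are not taken from the ePlace-3D implementation.

```python
import math


def tier_outline(width_2d, height_2d, num_tiers, whitespace=0.10):
    """Return (width, height) of one tier, keeping the 2D aspect ratio.

    Assumption: tier area = (2D area / num_tiers) * (1 + whitespace).
    """
    if num_tiers < 1:
        raise ValueError("num_tiers must be at least 1")
    # Scaling both sides by sqrt(s) scales the area by s.
    scale = math.sqrt((1.0 + whitespace) / num_tiers)
    return width_2d * scale, height_2d * scale


def vi_weight_sweep(base_weight=1.0, steps=5):
    """Exponential sweep: base, base/2, ..., base/2**steps."""
    return [base_weight / 2 ** k for k in range(steps + 1)]


if __name__ == "__main__":
    # Linear sweep over the number of tiers (1 to 10).
    for k in range(1, 11):
        w, h = tier_outline(1000.0, 800.0, k)
        print(f"{k:2d} tiers: {w:8.2f} x {h:8.2f}")
    # Exponential sweep over the VI weight multiplier.
    print(vi_weight_sweep())
```

With the default arguments, the last multiplier returned by vi_weight_sweep is 1/32 = 0.03125, matching the smallest VI weight used in the sweep.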
Figure 7: Avg. HPWL and #VI of all the sixteen
MMS cases w.r.t. VI weights.
Our 3D-IC placement algorithm shows significant quality
improvement with limited runtime overhead. BIGBLUE4
and NEWBLUE7 are the largest circuits with 2.2M and
2.5M cells, and they consume among the longest runtimes in
3D-IC placement. However, compared to the respective golden
2D placement solutions, their runtime ratio is at most
2.76× (BIGBLUE4 with three tiers in Table 2), which is still
less than the average runtime ratios
of 3.13× for two tiers, 3.03× for three tiers and 2.94× for
four tiers. ePlace-3D thus shows good
scalability and acceptable efficiency on the large cases.
In this work, we do not test ePlace-3D on circuits with
fixed macros, as geometrically transforming a 2D floorplan
into 3D is difficult and error-prone. However,
ePlace-3D shows high performance and scalability on MMS
benchmarks with many movable large macros, which are more
difficult to place than fixed-macro layouts. As a result, we
are confident in the performance of ePlace-3D on circuits
with fixed macros. The advantage of 3D tier insertion
vanishes if there are large macros to accommodate (BIGBLUE3,
NEWBLUE3, etc.). Transforming 2D planar
macros into 3D cuboid macros would resolve this issue and
ensure consistent benefits from inserting more tiers. However,
this is beyond the scope of this work and will be covered in future work.
6. CONCLUSION
We propose ePlace-3D, the first electrostatics based placement
algorithm for 3D-ICs, which is effective and efficient
with uniform exploration over the entire 3D space. Our 3D-IC
density function leverages the analogy between placement
spreading and electrostatic equilibrium, while global
and uniform smoothness is realized in all three dimensions.
Our balancing and preconditioning techniques prevent
solution oscillation or divergence. The interleaved 3D
coarse-grained optimization followed by 2D fine-grained
post-processing obtains a good trade-off between quality and
efficiency. The experimental results validate the high
performance and scalability of our approach, indicating the
benefits of placement smoothness. In future work, we will develop a 3D
density function to address the volume of vertical interconnects
(VI). We would also like to explore advanced
3D-IC placement/routing with patterning and graph
coloring technology [21].
7. ACKNOWLEDGMENT
The authors acknowledge (1) Prof. Dae Hyun Kim and
Prof. Sung Kyu Lim for providing the 3D-IC flow scripts
and IWLS testcases; (2) Dr. Meng-Kai Hsu and Prof. Yao-Wen
Chang for providing the binary of NTUplace3-3D; (3)
Dr. Guojie Luo and Prof. Jason Cong for providing the
binary of mPL6-3D; and (4) the support of NSF CCF-1017864.
8. REFERENCES
[1] T. F. Chan, J. Cong, J. R. Shinnerl, K. Sze, and M. Xie. mPL6:
Enhanced Multilevel Mixed-Size Placement. In ISPD, pages
212–214, 2006.
[2] T.-C. Chen, Z.-W. Jiang, T.-C. Hsu, H.-C. Chen, and Y.-W.
Chang. NTUPlace3: An Analytical Placer for Large-Scale
Mixed-Size Designs with Preplaced Blocks and Density
Constraint. IEEE TCAD, 27(7):1228–1240, 2008.
[3] Y.-G. Chen, T. Wang, K.-Y. Lai, W.-Y. Wen, Y. Shi, and S.-C.
Chang. Critical Path Monitor Enabled Dynamic Voltage
Scaling for Graceful Degradation in Sub-Threshold Designs. In
DAC, pages 1–6, 2014.
[4] J. Cong, G. Luo, J. Wei, and Y. Zhang. Thermal-Aware 3-D IC
Placement Via Transformation. In ASPDAC, pages 780–785,
2007.
[5] H. Eisenmann and F. M. Johannes. Generic Global Placement
and Floorplanning. In DAC, pages 269–274, 1998.
[6] B. Goplen and S. Sapatnekar. Efficient Thermal Placement of
Standard Cells in 3D ICs using a Force Directed Approach. In
ICCAD, 2003.
[7] B. Goplen and S. Sapatnekar. Placement of 3-D ICs with
Thermal and Interlayer Via Considerations. In DAC, pages
626–631, 2007.
[8] S. K. Han, K. Jeong, A. B. Kahng, and J. Lu. Stability and
Scalability in Global Routing. In SLIP, pages 1–6, 2011.
[9] Q. He, D. Chen, and D. Jiao. From Layout Directly to
Simulation: A First-Principle Guided Circuit Simulator of
Linear Complexity and Its Efficient Parallelization. IEEE
CPMT, 2(4):687–699, 2012.
[10] M.-K. Hsu, V. Balabanov, and Y.-W. Chang. TSV-Aware
Analytical Placement for 3D IC Designs Based on a Novel
Weighted-Average Wirelength Model. IEEE TCAD,
32(4):497–509, 2013.
[11] M.-K. Hsu and Y.-W. Chang. Unified Analytical Global
Placement for Large-Scale Mixed-Size Circuit Designs. IEEE
TCAD, 2012.
[12] P. J. Huber. Robust Statistics. John Wiley and Sons, 1981.
[13] IBM-PLACE. http://er.cs.ucla.edu/benchmarks/ibm-place.
2001.
[14] ITRS. http://www.itrs.net/Links/2012ITRS/Home2012.htm.
2012.
[15] IWLS. http://iwls.org/iwls2005/benchmarks.html. 2005.
[16] M. Jung et al. How to Reduce Power in 3D IC Designs: A Case
Study with OpenSPARC T2 Core. In CICC, 2013.
[17] A. B. Kahng and Q. Wang. A Faster Implementation of
APlace. In ISPD, pages 218–220, 2006.
[18] D. H. Kim, K. Athikulwongse, and S. K. Lim. A Study of
Through-Silicon-Via Impact on the 3-D Stacked IC Layout. In
ICCAD, 2009.
[19] M.-C. Kim and I. Markov. ComPLx: A Competitive
Primal-dual Lagrange Optimization for Global Placement. In
DAC, 2012.
[20] S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi. Optimization by
Simulated Annealing. Science, 220(4598):671–680, 1983.
[21] W. Lin, M. McGrath, I. Ramzy, T. H. Lai, and D. Lee.
Detecting Job Interference in Large Distributed Multi-Agent
Systems - A Formal Approach. In IEEE IM, 2013.
[22] J. Lu. Analytic VLSI Placement using Electrostatic Analogy.
Ph.D. Dissertation, University of California, San Diego, 2014.
[23] J. Lu, P. Chen, C.-C. Chang, L. Sha, D. Huang, C.-C. Teng,
and C.-K. Cheng. ePlace: Electrostatics based Placement using
Fast Fourier Transform and Nesterov’s Method. ACM
TODAES, 20(2):article 17, 2015.
[24] J. Lu, P. Chen, C.-C. Chang, L. Sha, D. J.-H. Huang, C.-C.
Teng, and C.-K. Cheng. FFTPL: An Analytic Placement
Algorithm Using Fast Fourier Transform for Density
Equalization. In ASICON, pages 1–4, 2013.
[25] J. Lu, P. Chen, C.-C. Chang, L. Sha, D. J.-H. Huang, C.-C.
Teng, and C.-K. Cheng. ePlace: Electrostatics based Placement
using Nesterov’s Method. In DAC, pages 1–6, 2014.
[26] J. Lu, H. Zhuang, P. Chen, H. Chang, C.-C. Chang, Y.-C.
Wong, L. Sha, D. Huang, Y. Luo, C.-C. Teng, and C.-K.
Cheng. ePlace-MS: Electrostatics based Placement for
Mixed-Size Circuits. IEEE TCAD, 34(5):685–698, 2015.
[27] G. Luo, Y. Shi, and J. Cong. An Analytical Placement
Framework for 3-D ICs and Its Extension on Thermal
Awareness. IEEE TCAD, 2013.
[28] P.-W. Luo, T. Wang, C.-L. Wey, L.-C. Cheng, B.-L. Sheu, and
Y. Shi. Reliable Power Delivery System Design for
Three-Dimensional Integrated Circuits (3D ICs). In ISVLSI,
pages 356–361, 2012.
[29] I. L. Markov, J. Hu, and M.-C. Kim. Progress and Challenges
in VLSI Placement Research. In DAC, 2012.
[30] J. Miao, A. Gerstlauer, and M. Orshansky. Approximate Logic
Synthesis under General Error Magnitude and Frequency
Constraints. In ICCAD, pages 779–786, 2013.
[31] J. Miao, A. Gerstlauer, and M. Orshansky. Multi-Level
Approximate Logic Synthesis under General Error Constraints.
In ICCAD, pages 504–510, 2014.
[32] G.-J. Nam. ISPD 2006 Placement Contest: Benchmark Suite
and Results. In ISPD, pages 167–167, 2006.
[33] G.-J. Nam et al. The ISPD2005 Placement Contest and
Benchmark Suite. In ISPD, pages 216–220, 2005.
[34] W. C. Naylor, R. Donelly, and L. Sha. Non-Linear
Optimization System and Method for Wire Length and Delay
Optimization for an Automatic Electric Circuit Placer. In US
Patent 6301693, 2001.
[35] Y. E. Nesterov. A Method of Solving A Convex Programming
Problem with Convergence Rate O(1/k^2). Soviet Math,
27(2):372–376, 1983.
[36] T. Ooura. General Purpose FFT Package,
http://www.kurims.kyoto-u.ac.jp/~ooura/fft.html. 2001.
[37] M. Pan, N. Viswanathan, and C. Chu. An Efficient and
Effective Detailed Placement Algorithm. In ICCAD, pages
48–55, 2005.
[38] C.-W. Sham, E. F.-Y. Young, and J. Lu. Congestion Prediction
in Early Stages of Physical Design. ACM TODAES,
14(1):12:1–18, 2009.
[39] G. Sköllermo. A Fourier Method for the Numerical Solution of
Poisson’s Equation. Mathematics of Computation,
29(131):697–711, 1975.
[40] P. Spindler, U. Schlichtmann, and F. M. Johannes. Kraftwerk2
- A Fast Force-Directed Quadratic Placement Approach Using
an Accurate Net Model. IEEE TCAD, 27(8):1398–1411, 2008.
[41] N. Viswanathan, M. Pan, and C. Chu. FastPlace3.0: A Fast
Multilevel Quadratic Placement Algorithm with Placement
Congestion Control. In ASPDAC, pages 135–140, 2007.
[42] T. Wang, C. Zhang, J. Xiong, and Y. Shi. Eagle-Eye: A
Near-Optimal Statistical Framework for Noise Sensor
Placement. In ICCAD, pages 437–443, 2013.
[43] J. Z. Yan, N. Viswanathan, and C. Chu. Handling Complexities
in Modern Large-Scale Mixed-Size Placement. In DAC, 2009.
[44] X. Zhang, J. Lu, Y. Liu, and C.-K. Cheng. Worst-Case Noise
Area Prediction of On-Chip Power Distribution Network. In
SLIP, pages 1–8, 2014.
[45] H. Zhuang, J. Lu, K. Samadi, Y. Du, and C.-K. Cheng.
Performance-Driven Placement for Design of Rotation and
Right Arithmetic Shifters in Monolithic 3D ICs. In ICCCAS,
pages 509–513, 2013.
