Effect of Process Variations in 3D Global Clock Distribution Networks
HU XU, VASILIS F. PAVLIDIS, and GIOVANNI DE MICHELI, Integrated Systems Lab - EPFL
In three-dimensional (3D) integrated circuits, the effect of process variations on clock skew differs from that in
2D circuits. The combined effect of inter-die and intra-die process variations on the design of 3D clock
distribution networks is considered in this article. A statistical clock skew model incorporating both the
systematic and random components of process variations is employed to describe this effect. Two regular 3D
clock tree topologies are investigated and compared in terms of clock skew variation. The statistical skew
model used to describe clock skew variations is verified through Monte-Carlo simulations. The clock skew
is shown to change in different ways with the number of planes forming the 3D IC and the clock network
architecture. Simulations based on a 45-nm CMOS technology show that the maximum standard deviation
of clock skew can vary from 15 ps to 77 ps. Results indicate that simply increasing the number of planes of
a 3D IC does not necessarily lead to lower skew variation and higher operating frequencies. A multigroup
3D clock tree topology is proposed to effectively mitigate the variability of clock skew. Tradeoffs between the
investigated 3D clock distribution networks and the number of planes comprising a 3D circuit are discussed
and related design guidelines are offered. The skew variation in 3D clock trees is also compared with the
skew variation of clock grids.
Categories and Subject Descriptors: B.7.1 [Hardware]: Integrated Circuits—VLSI (very large scale
integration)
General Terms: Performance, Reliability
Additional Key Words and Phrases: Clock distribution network, clock skew, process variations, 3D ICs
ACM Reference Format:
Xu, H., Pavlidis, V. F., and De Micheli, G. 2012. Effect of process variations in 3D global clock distribution
networks. ACM J. Emerg. Technol. Comput. Syst. 8, 3, Article 20 (August 2012), 25 pages.
DOI = 10.1145/2287696.2287703 http://doi.acm.org/10.1145/2287696.2287703
1. INTRODUCTION
Clock skew is typically defined as the difference between the propagation delays of the clock signal from the source to different sinks of the clock distribution network. High clock frequencies severely constrain the clock skew budget, since a sufficient fraction of the clock period must remain available for data processing.
3D integration has emerged as a potent solution to alleviate the increasing interconnect delay in modern ICs [Pavlidis and Friedman 2009a, 2009b]. Multiplane circuits considerably reduce the interconnect length because of the vertical interconnections [Joyner et al. 2001]. Since synchronization is a critical issue, the reduced interconnect latency can be exploited either to relax the clock skew constraints or to further increase the speed of a circuit. A careful compromise between the two approaches can, nevertheless, be a better solution to avoid timing failures while delivering higher (yet not the highest) performance as compared to a 2D circuit. The operating frequency of a 3D circuit is considered to increase proportionally with the number of planes forming the circuit [Joyner et al. 2001, 2004]. The behavior of the clock skew in such diverse systems, however, is not considered in these analyses.
This work is funded in part by the Swiss National Science Foundation (no. 260021 126517/1), European Research Council grant (no. 246810 NANOSYS), and Intel Braunschweig Labs, Germany.
Authors' addresses: H. Xu (corresponding author), V. F. Pavlidis, and G. De Micheli, Integrated Systems Lab, EPFL, Switzerland; email: hughesxuh@gmail.com.
© 2012 ACM 1550-4832/2012/08-ART20 $15.00
ACM Journal on Emerging Technologies in Computing Systems, Vol. 8, No. 3, Article 20, Pub. date: August 2012.
Clock skew is introduced at the design and fabrication stages and during the operation of ICs. There is a plethora of methods to manage excessive clock skew in the design phase [Bakoglu et al. 1986; Cong et al. 1998; Friedman 2001]. Note that these techniques were developed for 2D circuits and are not directly applicable to 3D ICs. Careful physical design, however, does not guarantee the elimination of undesirable skew, since unwanted skew can also be introduced in the fabrication phase. The primary sources of process variations include fluctuations of the gate length, doping concentrations, oxide thickness, and Inter-Layer Dielectric (ILD) thickness [Nassif 2000; Zarkesh-Ha et al. 1999].
The resulting process variations are generally divided into inter-die and intra-die variations. Inter-die variations affect device characteristics independently from die to die, but all the devices within one die are affected uniformly. Intra-die variations affect the characteristics of devices unequally within one die. Intra-die variations include systematic and random components [Orshansky et al. 2000; Bowman et al. 2002, 2009]. Since both intra- and inter-die process variations are present in a 3D IC, both must be included in the analysis and design of 3D clock distribution networks.
The focus of this article is on the effect of process variations on the clock skew of
potential 3D clock distribution architectures. The case studies include clock distribution
networks that globally distribute the clock signal in a 3D stack [Mondal et al. 2007;
Pavlidis et al. 2008; Arunachalam and Burleson 2008] and a new 3D topology. The
proposed model can also be used to analyze synthesized 3D clock trees [Zhao and
Lim 2010; Kim and Kim 2010; Yang et al. 2011]. The resulting skew variations in
synthesized clock trees implicitly depend on the efficiency of the synthesis technique.
Since the intention is to investigate the effect of process variations rather than the
efficiency of a 3D clock tree synthesizer, such as in Zhao and Lim [2010], Kim and Kim
[2010] and Yang et al. [2011], regular structures such as H-trees are explored.
The considered 3D technologies include manufacturing processes where multiple physical planes, bonded with different approaches, are electrically connected by Through Silicon Vias (TSVs) [Katti et al. 2010]. In these 3D ICs, the clock distribution network can span more than one plane, where each plane is fabricated separately. The variation of TSVs is negligible due to the small dimensions of TSVs as compared with the wavelength used by the manufacturing equipment [Reda et al. 2009]. The impact of TSV stress on clock skew is discussed in Yang et al. [2011]. In this article, this impact on global clock distribution networks is assumed to be mitigated by proper keep-out zones around each TSV.
Simulation results indicate that 3D integration can decrease skew variation as com-
pared to planar circuits. Nevertheless, this reduction depends on the employed 3D clock
tree topology and the number of planes. Consequently, the objectives of this article are (1) to accurately model skew variation in 3D clock trees, (2) to determine the behavior of skew variation within 3D clock H-trees, (3) to propose an enhanced 3D clock tree topology, and (4) to provide a set of design guidelines to limit the variability of clock skew within 3D clock distribution networks.
The remainder of the article is organized as follows. Existing techniques for skew
analysis in 2D circuits are briefly reviewed in the following section. A statistical model
of the delay of critical paths in 3D ICs is also discussed. The problem of skew analysis
for 3D clock distribution networks is formulated in Section 3. An accurate clock skew
model for 3D clock trees considering the impact of process variations is described in
Section 4. This model can be extended to include the variation of interconnects, as presented in the appendix. Simulation results and a comparison of two regular 3D H-tree topologies are presented in Section 5. A new 3D topology is described in Section 6. A comparison of skew variations between the clock trees and a typical 2D clock grid is presented in Section 7. Conclusions are drawn in Section 8.
2. RELATED WORK
Several techniques for analyzing the effect of process variations on clock skew have been developed for 2D circuits, emphasizing intra-die variations. Some of these techniques are briefly summarized in this section.
For 2D ICs, clock skew variations are estimated either by corner analysis or by statistical analysis [Jiang and Horiguchi 2001]. Corner analysis is useful for estimating process variations in 2D ICs, but this method often introduces pessimism in the timing of a circuit. A method for statistical clock skew analysis based on Monte Carlo simulations is introduced in Malavasi et al. [2002]. The computational time of this method, however, is prohibitively high for large-scale ICs. Based on statistical timing analysis [Liou et al. 2001], other statistical skew modeling methods that consider intra-die variations and analyze skew variations efficiently are presented in Jiang and Horiguchi [2001], Harris and Naffziger [2001], Agarwal et al. [2003, 2004], and Sundareswaran et al. [2008].
Although statistical skew analysis has been studied in 2D ICs, the resulting methods
cannot be directly applied to 3D systems. In 2D ICs, since the inter-die process vari-
ations uniformly affect the devices within a circuit, the majority of the skew analysis
methods emphasize the intra-die variations.
Recent work analyzing the effect of process variations on the performance of 3D ICs
is presented in Garg and Marculescu [2009a, 2009b] and Reda et al. [2009], where the
impact of process variations on the delay of critical paths is studied. The distribution of
the maximum delay of datapaths is determined by considering that the datapaths have
no common segments. Since the clock paths to different sinks can (and, by design, typically do) share common segments, these models are not directly applicable to determining the skew of clock distribution networks. The effect of process variations in 3D clock distribution networks is discussed in Xu et al. [2010], where the intra-die variation is considered independent among the devices within the same plane. Nevertheless, the spatial correlation of intra-die variations has been shown to be non-negligible when analyzing the process-induced skew in clock trees [Orshansky et al. 2000; Agarwal et al. 2003; Hashimoto et al. 2005].
3. PROBLEM DEFINITION
The problem of skew analysis for 3D clock distribution networks considering process
variations is formulated in this section. As discussed in Friedman [2001], only the
clock skew between the sequential elements which transfer data between each other
(data-related or “sequentially-adjacent” registers) affects the performance of a circuit.
Consequently, in addition to global skew, appropriate pairwise skew distributions are
adopted to evaluate the performance of clock distribution networks [Sundareswaran
et al. 2008].
The H-tree is a common topology used to globally distribute the clock signal within a circuit [Bakoglu et al. 1986; Friedman 2001]. A typical buffered 3D H-tree is illustrated in Figure 1 [Pavlidis et al. 2008; Mondal et al. 2007]. The pairwise clock skew is defined as the skew between every pair of sinks in a 3D clock distribution network,
$$ S_{skew} = \{\, s_{i,j} \mid s_{i,j} = D_i - D_j,\ 1 \le i, j \le n_{sink} \,\}. $$
Sinks i and j can be located in any plane of the 3D circuit, and $s_{i,j}$ denotes the skew between sinks i and j. The clock delays to sinks i and j are denoted by $D_i$ and $D_j$, respectively. The number of clock sinks is $n_{sink}$.
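As a minimal illustration of the definition of $S_{skew}$, the Python sketch below builds the set of pairwise skews from a table of sink delays. The delay values and the function names are hypothetical; in the model developed below, each $D_i$ is a random variable rather than a fixed number.

```python
from itertools import combinations

def pairwise_skews(delays):
    """Return {(i, j): D_i - D_j} for every unordered sink pair with i < j.

    `delays` maps a sink index to its clock delay D_i (any time unit).
    Only i < j is stored, since s_{j,i} = -s_{i,j}.
    """
    return {(i, j): delays[i] - delays[j]
            for i, j in combinations(sorted(delays), 2)}

def global_skew(delays):
    """Largest |s_{i,j}| over all pairs, i.e., max(D) - min(D)."""
    return max(delays.values()) - min(delays.values())

# Hypothetical delays (ps) of four sinks located in different planes.
d = {1: 120.0, 2: 118.5, 3: 121.2, 4: 119.0}
skews = pairwise_skews(d)
print(skews[(1, 3)], global_skew(d))
```

Only pairs of data-related sinks matter for performance, so in practice the dictionary would be filtered to sequentially adjacent register pairs.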
[Figure 1: (a) topology of a buffered 3D H-tree whose 64 numbered sinks are distributed over the 1st to 4th planes, driven from a single clock source; (b) 3D view with axes X [mm], Y [mm], and index of plane, showing the source, the TSVs, and sinks 1 to 5.]
Fig. 1. 3D H-trees spanning four planes, where (a) is the topology of a 3D H-tree and (b) is the 3D view of a 3D H-tree.
As discussed in Section 5.2, the number of buffers and the length of the interconnects in clock trees significantly affect the distribution of clock skew. Consequently, the area and the number of physical planes comprising a 3D IC affect the highest clock frequency that the 3D IC can support. By investigating the effect of process variations on $S_{skew}$, several guidelines are offered for the design of 3D global clock trees with low skew variation.
4. CLOCK SKEW MODELING FOR 3D CLOCK TREES
The distribution of clock skew considering inter- and intra-die process variations in 3D clock trees is modeled in this section. The model used to obtain the distribution of buffer delay is presented in Section 4.1. Utilizing this model, the distribution of the delay of a 3D clock path is analytically derived in Sections 4.2 and 4.3. In Section 4.4, the distribution of the clock skew between each pair of clock sinks in 3D clock trees is discussed.
4.1. Buffer Delay Distribution
The variation of buffer delay originates from the device parameters deviating from their
nominal values, as statistically modeled in Zarkesh-Ha et al. [1999] and Azuma et al.
[1998]. The fluctuation of the buffer delay is typically approximated as being linear
in the device parameter variations [Sundareswaran et al. 2008; Devgan and Kashyap
2003]. Alternatively, the variation in delay can be determined through the variations of
the input capacitance and output resistance [Jiang and Horiguchi 2001; Xu et al. 2010].
Here, the second method is enhanced by considering the input slew rate, to model the distribution of the buffer delay more accurately. The interconnects constituting the 3D circuit are modeled as distributed RC wires. The circuit illustrated in Figure 2 is utilized to obtain the variation of buffer delay for different slew rates of the input signal.
Let Rin denote the output resistance of a buffer driving the buffer under considera-
tion. The load capacitance of the buffer under consideration is denoted by Cl. Intercon-
nects with diverse impedance characteristics are modeled by employing different Rint
and Cint, where Rint and Cint denote the resistance and capacitance of interconnects,
respectively. The interconnect Rint and Cint can also be adjusted to produce different
slew rates for the input signal of the buffer in Figure 2.
[Figure 2: source S drives, through the driver resistance Rin, an interconnect modeled by Rint and Cint to node I, the input of the buffer under test {Cb, Db, Rb}; the buffer output O drives the load capacitance Cl.]
Fig. 2. An elemental circuit used to measure the variations in the buffer characteristics.
For a step input signal, the Elmore delays [Elmore 1948] from source S to nodes I and O in Figure 2, and their variations, are
$$ D_{SI} = 0.69 R_{in} C_{int} + 0.38 R_{int} C_{int} + 0.69 (R_{in} + R_{int}) C_b, \tag{1} $$
$$ \Delta D_{SI} = 0.69 (R_{in} + R_{int}) \Delta C_b, \tag{2} $$
$$ D_{SO} = D_{SI} + D_b + 0.69 R_b C_l, \tag{3} $$
$$ \Delta D_{SO} = \Delta D_{SI} + \Delta D_b + 0.69 C_l \Delta R_b, \tag{4} $$
where $C_b$, $R_b$, and $D_b$ are the input capacitance, the output resistance, and the intrinsic delay of the buffer, respectively. The variations of $C_b$, $R_b$, and $D_b$ are denoted by $\Delta C_b$, $\Delta R_b$, and $\Delta D_b$, respectively. While investigating the buffer in Figure 2, the driver resistance $R_{in}$ is considered constant (for the moment).
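The nominal delays (1) and (3) can be evaluated directly. The sketch below is illustrative only: it borrows the buffer and per-millimeter wire values of Table I (Section 5.1), while the 1 mm wire, the driver resistance ($R_{in} = R_b$), and the 200 fF load are hypothetical choices. Units are kΩ, fF, and ps, since kΩ × fF = ps.

```python
def elmore_si(r_in, r_int, c_int, c_b):
    """Eq. (1): delay from source S to the buffer input I (kOhm * fF = ps)."""
    return 0.69 * r_in * c_int + 0.38 * r_int * c_int + 0.69 * (r_in + r_int) * c_b

def elmore_so(d_si, d_b, r_b, c_l):
    """Eq. (3): delay from source S to the buffer output O."""
    return d_si + d_b + 0.69 * r_b * c_l

# Buffer values from Table I: Rb = 349 Ohm, Cb = 5.7 fF, Db = 24.8 ps.
R_B, C_B, D_B = 0.349, 5.7, 24.8          # kOhm, fF, ps
# Hypothetical stage: identical driver (Rin = Rb), 1 mm of global wire, 200 fF load.
r_int, c_int = 0.0512, 230.2               # kOhm and fF for 1 mm (Table I per-mm values)
d_si = elmore_si(R_B, r_int, c_int, C_B)
d_so = elmore_so(d_si, D_B, R_B, 200.0)
print(f"D_SI = {d_si:.2f} ps, D_SO = {d_so:.2f} ps")
```

Sweeping `r_int` and `c_int` in such a script mirrors how the interconnect in Figure 2 is adjusted to produce different input slew rates.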
By measuring the delay variation at nodes I and O in Monte Carlo simulations, with $C_l$ set to zero (yielding $\Delta D_{SO_0}$) and to another value (e.g., 200 fF, yielding $\Delta D_{SO_1}$), the mean value and standard deviation of $\Delta C_b$, $\Delta R_b$, and $\Delta D_b$ are obtained from (2) and (4) [Xu et al. 2010]. Assuming that the sources of process variations can be described by Gaussian distributions, the characteristics of a buffer can also be approximated by Gaussian distributions [Chang and Sapatnekar 2005],
$$ \Delta C_b \sim N(0, \sigma^2_{\Delta C_b}), \quad \Delta R_b \sim N(0, \sigma^2_{\Delta R_b}), \quad \Delta D_b \sim N(0, \sigma^2_{\Delta D_b}). \tag{5} $$
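$\sigma_{\Delta C_b}$ is directly obtained through (2). According to (2) and (4), $\sigma_{\Delta D_{SO}}$ is determined by $\sigma_{\Delta C_b}$, $\sigma_{\Delta D_b}$, $\sigma_{\Delta R_b}$, and the covariances among these variables,
$$ \sigma^2_{\Delta D_{SO}} = \left(0.69 (R_{in} + R_{int}) \sigma_{\Delta C_b}\right)^2 + \sigma^2_{\Delta D_b} + \left(0.69\, \sigma_{\Delta R_b} C_l\right)^2 + 1.38 (R_{in} + R_{int})\, \mathrm{cov}(\Delta D_b, \Delta C_b) + 1.38\, C_l\, \mathrm{cov}(\Delta D_b, \Delta R_b) + 0.952\, C_l (R_{in} + R_{int})\, \mathrm{cov}(\Delta C_b, \Delta R_b), \tag{6} $$
$$ \sigma^2_{\Delta D_b} = \sigma^2_{\Delta D_{SO_0}} - \sigma^2_{\Delta D_{SI}} - 1.38 (R_{in} + R_{int})\, \mathrm{cov}(\Delta D_b, \Delta C_b), \tag{7} $$
$$ \sigma_{\Delta C_b} = \frac{\sigma_{\Delta D_{SI}}}{0.69 (R_{in} + R_{int})}. \tag{8} $$
$\sigma_{\Delta R_b}$ is obtained from (6) by substituting $\sigma_{\Delta D_b}$ and $\sigma_{\Delta C_b}$ from (7) and (8), respectively.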
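Because (7) contains a covariance that itself depends on $\sigma_{\Delta D_b}$, (6) to (8) must be solved jointly. A minimal sketch, assuming every covariance is written as a single correlation coefficient `rho` times the product of the two standard deviations (`rho = 1` reproduces the full-correlation assumption adopted in this article), reduces (7) and (6) to two quadratics with closed-form positive roots. The function names and the "true" values in the round-trip check are hypothetical; units are kΩ, fF, and ps.

```python
import math

def forward_sigma_so(a, c, s_c, s_d, s_r, rho):
    """Eq. (6) with every covariance written as rho * sigma_x * sigma_y.

    a = 0.69 (Rin + Rint), c = 0.69 Cl; returns the sigma of Delta D_SO.
    """
    x, y, z = a * s_c, s_d, c * s_r
    return math.sqrt(x * x + y * y + z * z + 2 * rho * (x * y + y * z + x * z))

def extract_buffer_sigmas(s_si, s_so0, s_so1, r_drv, c_l1, rho=1.0):
    """Invert (6)-(8) from the measured sigmas of D_SI, D_SO0 (Cl = 0), D_SO1 (Cl = c_l1).

    r_drv = Rin + Rint. Returns (sigma_Cb, sigma_Db, sigma_Rb).
    """
    a = 0.69 * r_drv
    s_c = s_si / a                                            # eq. (8)
    # Eq. (7) with cov = rho*s_d*s_c: s_d^2 + 2*rho*s_si*s_d - (s_so0^2 - s_si^2) = 0.
    s_d = -rho * s_si + math.sqrt((rho * s_si) ** 2 + s_so0 ** 2 - s_si ** 2)
    # Eq. (6) at Cl = c_l1 with z = 0.69*c_l1*s_r:
    # z^2 + 2*rho*(s_si + s_d)*z - (s_so1^2 - s_so0^2) = 0.
    b = rho * (s_si + s_d)
    z = -b + math.sqrt(b * b + s_so1 ** 2 - s_so0 ** 2)
    return s_c, s_d, z / (0.69 * c_l1)

# Round trip with hypothetical "true" values (kOhm, fF, ps).
r_drv, c_l1 = 0.4, 200.0
a = 0.69 * r_drv
true = (0.31, 1.2, 0.0178)      # sigma_Cb [fF], sigma_Db [ps], sigma_Rb [kOhm]
for rho in (1.0, 0.5):
    s_si = a * true[0]
    s_so0 = forward_sigma_so(a, 0.0, *true, rho)
    s_so1 = forward_sigma_so(a, 0.69 * c_l1, *true, rho)
    print(rho, extract_buffer_sigmas(s_si, s_so0, s_so1, r_drv, c_l1, rho))
```

With `rho = 1` the roots collapse to $\sigma_{\Delta D_b} = \sigma_{\Delta D_{SO_0}} - \sigma_{\Delta D_{SI}}$ and $0.69\, C_l\, \sigma_{\Delta R_b} = \sigma_{\Delta D_{SO_1}} - \sigma_{\Delta D_{SO_0}}$.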
Consider that Cb, Rb, and Db are used to obtain the delay variation of each
buffer stage di, which is similar to DSO. When calculating σdi (similar to recalcu-
lating σDSO1 through (6)), the σCb , σRb , and σDb are substituted into (6) again. In this
procedure, the covariances cov(Db, Rb), cov(Db,Cb) and cov(Cb, Rb) are considerably
canceled out. Consequently, the correlation among Cb, Rb, and Db does not affect
di significantly, as long as di is calculated based on this same correlation. Since
Cb, Rb, and Db are due to the same process variation sources, these variables are
assumed to be fully correlated herein.
4.2. Delay Distribution of a 3D Clock Path
The buffer delay model described in Section 4.1 is used to produce the delay distribution of clock paths in 3D ICs. An example of a 3D clock path is illustrated in Figure 3. Note that this path is general and can be applied to any 3D clock tree in addition to the 3D topologies investigated herein. The devices in different physical planes are connected by TSVs [Katti et al. 2010], which, in turn, are modeled as RC wires whose resistance and capacitance differ from those of the horizontal wires (e.g., $R_{TSV}$ and $C_{TSV}$ in Figure 3). $R_{TSV}$ and $C_{TSV}$ are considered fixed.
[Figure 3: clock path segment with buffers i−1, i, and i+1 and a side-branch buffer j, each annotated with {Cb, Db, Rb}; horizontal wires are modeled by Rint and Cint, and a TSV between the 1st and 2nd planes is modeled by RTSV and CTSV.]
Fig. 3. The electrical model of a segment of a clock path.
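Consider the clock path consisting of buffers i − 1, i, and i + 1. From (2) and (4), the delay variation $\Delta d_i$ attributed to the variation of buffer i along the investigated path is
$$ \Delta d_i = 0.69\left(R'_{in(i)} + \Delta R_{b(i-1)}\right)\Delta C_{b(i)} + 0.69\,\Delta R_{b(i)}\left(C'_{l(i)} + \Delta C_{b(i+1)} + \Delta C_{b(j)}\right) + 0.69\,R'_{b(i)}\,\Delta C_{b(j)} + \Delta D_{b(i)}, \tag{9} $$
$$ R_{in(i)} = R_{b(i-1)} + R_{TSV}, \tag{10} $$
$$ C_{l(i)} = 2C_{int} + C_{b(i+1)} + C_{b(j)}, \tag{11} $$
where the prime (′) denotes the nominal value. For buffer i, the $\Delta R_{b(i-1)}$ of the upstream buffer and the $\Delta C_{b(i+1)}$ of the downstream buffer are both included in (9). To determine the delay of a clock path, $\Delta d_i$ is summed over all the buffers along this path. In this sum, each cross term appears twice: $\Delta R_{b(i)}\Delta C_{b(i+1)}$ is included in $\Delta d_i$ and, as the term $\Delta R_{b(i-1)}\Delta C_{b(i)}$ with the index shifted by one, also in $\Delta d_{i+1}$. Therefore, one of these two terms needs to be removed, and $\Delta d_i$ is rewritten as
$$ \Delta d_i = 0.69\left(R'_{in(i)}\Delta C_{b(i)} + \Delta R_{b(i)}\left(C'_{l(i)} + \Delta C_{b(i+1)} + \Delta C_{b(j)}\right) + R'_{b(i)}\Delta C_{b(j)}\right) + \Delta D_{b(i)} = 0.69\left(R'_{in(i)}\Delta C_{b(i)} + R'_{b(i)}\Delta C_{b(j)}\right) + \Delta D_{b(i)} + \delta_i, \tag{12} $$
where $\delta_i = 0.69\,\Delta R_{b(i)}\left(C'_{l(i)} + \Delta C_{b(i+1)} + \Delta C_{b(j)}\right)$.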
The variation ofCb is relatively low as compared with the nominalCb (σ/μ < 3% for
both D2D and WID variations as reported in Table II). The observed delay variation
of buffers in other works is also typically much lower than the nominal value (e.g.,
σ/μ ≤ 5% for both D2D and WID variations as reported in Bowman et al. [2009] ). δi
can, therefore, be approximated linearly using a first-order Taylor expansion around
zero [Chang and Sapatnekar 2005].
δi ≈
[
∂δi
∂Rb(i)
]
0
Rb(i) +
[
∂δi
∂Cb(i+1)
]
0
Cb(i+1) +
[
∂δi
∂Cb( j)
]
0
Cb( j)
= 0.69C ′l(i)Rb(i) (13)
As reported in Table II and discussed in Bowman et al. [2009] and Garg and Marculescu [2009a, 2009b], the σ/μ of the transistor characteristics is typically considered to be ≤5%. The 3σ variation is smaller than 15% of the nominal value for $R_b$ and 10% for $C_b$. Since $\Delta C_b$ and $\Delta R_b$ are modeled by Gaussian distributions, for more than 99.7% of the buffers, $\Delta C_b \Delta R_b$ is lower than 1.5% of $C_b R_b$ (i.e., 15% × 10%). Moreover, from the nominal values and standard deviations of $C_b$, $R_b$, and $D_b$ reported in Table II, $0.69\, C_b R_b$ and $0.69\, \Delta C_b \Delta R_b$ are much lower than $D_b$ and $\Delta D_b$, respectively. Consequently, approximating $\delta_i$ with (13) does not introduce significant loss of accuracy.
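The magnitude argument above can be checked numerically. This sketch takes the WID column of Table II at an input slew of 16 mV/ps and compares $0.69\, C_b R_b$ with $D_b$, and the second-order term $0.69\, \Delta C_b \Delta R_b$ at 3σ excursions with the 3σ intrinsic delay variation. Picking this particular row and the 3σ point are illustrative choices.

```python
# Table II values at an input slew of 16 mV/ps (units: kOhm, fF, ps).
R_B, SIG_R_WID = 0.349, 0.0178
C_B, SIG_C_WID = 5.7, 0.31
D_B, SIG_D_WID = 24.8, 1.49

# Nominal RC product of the buffer versus its intrinsic delay.
rc_nominal = 0.69 * C_B * R_B
# Second-order term at 3-sigma WID excursions versus the 3-sigma delay variation.
rc_cross_3sig = 0.69 * (3 * SIG_C_WID) * (3 * SIG_R_WID)

print(f"0.69*Cb*Rb = {rc_nominal:.3f} ps ({rc_nominal / D_B:.1%} of Db)")
print(f"0.69*dCb*dRb at 3 sigma = {rc_cross_3sig:.4f} ps "
      f"({rc_cross_3sig / (3 * SIG_D_WID):.2%} of 3-sigma dDb)")
# Product bound quoted in the text: 15% (Rb) x 10% (Cb) = 1.5% of CbRb.
print(f"bound: {0.15 * 0.10:.1%}")
```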
As mentioned previously, Rb(i), Cb(i), and Db(i) are approximated as Gaussian
distributions and can be assumed to be fully correlated. According to (12) and (13), di
can be approximated as a Gaussian distribution.
di ∼ N
(
0, σ 2dD2Di + σ
2
dWIDi
)
(14)
σ 2dD2Di
=
{
(σ1 + σ2 + σ3)2 + σ 24 if buffers i and j are in different planes
(σ1 + σ2 + σ3 + σ4)2 if buffers i and j are in the same plane (15)
σ 2dWIDi
= (σ5 + σ6 + σ7)2 + σ 28 + 2corr(i, j)(σ5 + σ6 + σ7)σ8 (16)
σ1 = 0.69R′in(i)σCD2Db(i) , σ2 = 0.69C
′
l(i)σRD2Db(i)
, σ3 = σDD2Db(i) , σ4 = 0.69R
′
b(i)σCD2Db( j)
,
σ5 = 0.69R′in(i)σCWIDb(i) , σ6 = 0.69C
′
l(i)σRWIDb(i)
, σ7 = σDWIDb(i) , σ8 = 0.69R
′
b(i)σCWIDb( j)
The correlation between buffers i and j is denoted by corr(i, j), the model of which is
discussed in Section 4.3.
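A minimal sketch of (14) to (16): the three fully correlated terms of buffer i add linearly, and the term of the side-branch buffer j is added linearly (same plane) or in quadrature (different planes) for D2D, and with weight corr(i, j) for WID. The function name, argument layout, and the assumed nominal $R'_{in(i)}$ and $C'_{l(i)}$ in the usage lines are hypothetical; the σ values come from the 16 mV/ps row of Table II, and $R_{TSV}$ from Table I.

```python
import math

def buffer_sigmas(r_in, c_l, r_b, sig_i_d2d, sig_j_d2d, sig_i_wid, sig_j_wid,
                  same_plane, corr_ij):
    """Return (sigma_d2d, sigma_wid) of Delta d_i following eqs. (14)-(16).

    r_in, c_l, r_b: nominal R'_in(i), C'_l(i), R'_b(i) (kOhm, fF).
    sig_i_*: (sigma_Cb, sigma_Rb, sigma_Db) of buffer i for D2D or WID.
    sig_j_*: sigma_Cb of the side-branch buffer j for D2D or WID.
    corr_ij: WID correlation of buffers i and j (must be 0 for different planes).
    """
    def split(sig_i, sig_j_c):
        s_c, s_r, s_d = sig_i
        own = 0.69 * r_in * s_c + 0.69 * c_l * s_r + s_d   # fully correlated: add linearly
        return own, 0.69 * r_b * sig_j_c                    # contribution of buffer j
    own, s4 = split(sig_i_d2d, sig_j_d2d)                   # sigma1+sigma2+sigma3, sigma4
    var_d2d = (own + s4) ** 2 if same_plane else own ** 2 + s4 ** 2       # eq. (15)
    own, s8 = split(sig_i_wid, sig_j_wid)                   # sigma5+sigma6+sigma7, sigma8
    var_wid = own ** 2 + s8 ** 2 + 2 * corr_ij * own * s8                 # eq. (16)
    return math.sqrt(var_d2d), math.sqrt(var_wid)

# Hypothetical stage: R'_in = Rb + R_TSV, C'_l = 500 fF (assumed), Table II sigmas at 16 mV/ps.
args = (0.349 + 0.000133, 500.0, 0.349, (0.16, 0.0147, 1.21), 0.16, (0.31, 0.0178, 1.49), 0.31)
print(buffer_sigmas(*args, same_plane=True, corr_ij=0.5))
print(buffer_sigmas(*args, same_plane=False, corr_ij=0.0))
```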
Consequently, for a 3D clock path to a sink u that includes $n_u$ clock buffers, the delay variation is expressed as the sum of (12) over all the buffers along the path. The delay variation of a 3D clock path follows a Gaussian distribution whose variance consists of the WID and D2D contributions of the buffers,
$$ \Delta D_u = \sum_{i=1}^{n_u} \Delta d_i, \tag{17} $$
$$ \Delta D_u \sim N\left(0,\ \sigma^2_{\Delta D^{D2D}_u} + \sigma^2_{\Delta D^{WID}_u}\right). \tag{18} $$
The D2D and WID sources of delay variation along a 3D clock path are, respectively,
discussed in the following subsections.
4.2.1. D2D Variation Model for the Delay of 3D Clock Paths. The variation of the delay of 3D clock paths due to D2D process variations is the sum of the D2D variations of the buffer delays in all the planes,
$$ \Delta D^{D2D}_u = \sum_{j=1}^{N_p} \Delta D^{D2D}_{u(j)}, \tag{19} $$
$$ \Delta D^{D2D}_{u(j)} = \sum_{i=1}^{n_{u(j)}} \Delta D^{D2D}_{u(j,i)}, \tag{20} $$
where $N_p$ is the number of planes that the clock tree spans. $\Delta D^{D2D}_{u(j)}$ is the variation of the delay of the clock path from the clock source to sink u in plane j. The number of buffers located in plane j along this clock path is denoted by $n_{u(j)}$. The variation of the delay related to the ith buffer in plane j is denoted by $\Delta D^{D2D}_{u(j,i)}$.
Since the D2D variations affect the buffers in the same plane equally, according to (14), (15), and (19), the distribution of $\Delta D^{D2D}_{u(j)}$ is a Gaussian distribution. The D2D variations affect the buffers in different planes independently and, therefore, $\Delta D^{D2D}_{u(j)}$ is independent from $\Delta D^{D2D}_{u(k)}$ for any $j \ne k$. Consequently, according to (19), the distribution of $\Delta D^{D2D}_u$ is also a Gaussian distribution,
$$ \Delta D^{D2D}_u \sim N\left(0,\ \sigma^2_{\Delta D^{D2D}_u}\right), \tag{21} $$
$$ \sigma^2_{\Delta D^{D2D}_u} = \sum_{j=1}^{N_p} \sigma^2_{\Delta D^{D2D}_{u(j)}} = \sum_{j=1}^{N_p} \left( \sum_{i=1}^{n_{u(j)}} \sigma_{\Delta D^{D2D}_{u(j,i)}} \right)^2. \tag{22} $$
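Equation (22) is a two-step reduction: standard deviations are summed linearly within a plane (full correlation) and the per-plane results are combined in quadrature (independence across planes). The Python sketch below, with hypothetical names and σ values, shows that step.

```python
import math
from collections import defaultdict

def path_sigma_d2d(path):
    """Eq. (22): sigma of the D2D delay variation of one clock path.

    `path` is a list of (plane, sigma_i) pairs, one per buffer from source to
    sink. Within a plane the D2D terms are fully correlated (sigmas add
    linearly); different planes are independent (variances add).
    """
    per_plane = defaultdict(float)
    for plane, sigma in path:
        per_plane[plane] += sigma
    return math.sqrt(sum(s * s for s in per_plane.values()))

# Hypothetical: six buffers with sigma = 1 ps, all in one plane vs. spread over three planes.
print(path_sigma_d2d([(1, 1.0)] * 6))   # 6.0
print(path_sigma_d2d([(1, 1.0), (1, 1.0), (2, 1.0), (2, 1.0), (3, 1.0), (3, 1.0)]))
```

Spreading the same buffers over independent planes lowers the D2D path σ (here from 6 ps to about 3.46 ps); as the skew model in Section 4.4 shows, that same independence prevents D2D terms from canceling between sinks in different planes.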
4.2.2. WID Variation Model for the Delay of 3D Clock Paths. The delay variation of a 3D clock path due to WID variations is the sum of the WID variations of all the buffers along this path. Consequently, according to (16), the distribution of $\Delta D^{WID}_u$ is also a Gaussian distribution. The resulting variance of the delay of sink u due to WID variations is
$$ \Delta D^{WID}_u \sim N\left(0,\ \sigma^2_{\Delta D^{WID}_u}\right), \tag{23} $$
$$ \sigma^2_{\Delta D^{WID}_u} = \sum_{i=1}^{n_u} \sigma^2_{d^{WID}_i} + 2 \sum_{1 \le i < j \le n_u} \mathrm{corr}(i,j)\, \sigma_{d^{WID}_i}\, \sigma_{d^{WID}_j}, \tag{24} $$
where corr(i, j) is the correlation between the WID variations of buffers i and j. If buffers i and j are located in different planes, corr(i, j) = 0. The WID variations affecting different buffers within the same plane can be classified into systematic and random components. The systematic WID variations typically exhibit a spatial correlation [Bowman et al. 2009; Agarwal et al. 2003; Orshansky et al. 2000; Hashimoto et al. 2005]. For buffers located within the same plane, two types of correlation for the WID variations are investigated, as presented in Section 4.3.
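Equation (24) is a quadratic form over the buffers of one path. In this sketch `corr` is any callable returning corr(i, j), for example one built from the models of Section 4.3; the function name and σ values are hypothetical. With `corr` identically zero the result reduces to the uncorrelated case (25).

```python
import math

def path_sigma_wid(sigmas, corr):
    """Eq. (24): sigma of the WID delay variation along one clock path.

    sigmas[i] is sigma_{d_i^WID} of the ith buffer on the path and corr(i, j)
    returns the WID correlation of buffers i and j (0 for different planes).
    """
    n = len(sigmas)
    var = sum(s * s for s in sigmas)
    for i in range(n):
        for j in range(i + 1, n):
            var += 2 * corr(i, j) * sigmas[i] * sigmas[j]
    return math.sqrt(var)

sig = [1.0, 1.0, 1.0, 1.0]
print(path_sigma_wid(sig, lambda i, j: 0.0))   # uncorrelated, eq. (25): 2.0
print(path_sigma_wid(sig, lambda i, j: 1.0))   # fully correlated: 4.0
```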
4.3. Correlations between the WID Variations of Buffers within the Same Plane
The correlation between the WID variations of buffers i and j within one plane is
described by corr(i, j). Two types of correlations are considered in this article: uncorre-
lated and multilevel correlation.
4.3.1. Uncorrelated WID Variations. In this case, the WID variations of the buffers within one plane are considered independent of each other. For any pair of buffers i and j, corr(i, j) = 0. Consequently, from (24), the variance of the delay of sink u due to WID variations is
$$ \sigma^2_{\Delta D^{WID}_u} = \sum_{i=1}^{n_u} \sigma^2_{d^{WID}_i}. \tag{25} $$
4.3.2. Multilevel Correlation. The adopted spatial correlation model (multilevel correlation) is based on the statistical timing analysis method proposed in Agarwal et al. [2003]. A multilevel quad-tree partitioning is used, and the intra-die variations of a device are divided into l levels, as illustrated in Figure 4 [Agarwal et al. 2003]. At the mth level (1 ≤ m ≤ l), there are $4^{m-1}$ regions.
An independent variable is assigned to each region to represent a component of the WID variation of a device. The overall WID variation of a buffer k is composed of the sum of these independent components at the different levels,
$$ L_{WID,k} = \sum_{\substack{1 \le i \le l \\ \text{region } r \text{ intersects } k}} L_{i,r}, $$
where $L_{i,r}$ is the random variable associated with the quad-tree at level i, region (i, r). The distribution of $L_{WID,k}$ is captured by the elementary circuit in Figure 2. This distribution is obtained by assigning the same probability distribution to all the random variables associated with a particular level and by dividing the total intra-die variability among the different levels. Consequently, the spatial correlation between devices in the same plane can be modeled. Devices located close to each other are highly correlated, while devices separated by a large horizontal distance exhibit low correlation.
[Figure 4: quad-tree partitioning of a die into three levels with 1, 4, and 16 regions, labeled (0,1), (1,1) to (1,4), and (2,1) to (2,16), respectively.]
Fig. 4. Modeling spatial correlations using quad-tree partitioning [Agarwal et al. 2003].
The correlation between the WID variations of buffers i and j is described by the average of the correlations at all the levels,
$$ \mathrm{corr}(i,j) = \frac{1}{l} \sum_{m=1}^{l} \mathrm{corr}_m(i,j), \tag{26} $$
where $\mathrm{corr}_m(i,j)$ is the correlation between buffers i and j at the mth level. As illustrated in Figure 4, assuming buffers i and j are located in regions $(m, \mathrm{region}_i)$ and $(m, \mathrm{region}_j)$, respectively,
$$ \mathrm{corr}_m(i,j) = \begin{cases} 1, & \text{if } (m, \mathrm{region}_i) = (m, \mathrm{region}_j), \\ 0, & \text{if } (m, \mathrm{region}_i) \ne (m, \mathrm{region}_j). \end{cases} \tag{27} $$
The uncorrelated WID variations and the multilevel correlation model are imple-
mented for several 3D clock trees. The resulting model used to describe the skew
variation is presented in the following section.
4.4. Clock Skew Distribution in 3D Clock Trees
The clock skew between any pair of sinks in a 3D clock tree is the difference of the clock delays to these sinks. For a 3D clock tree with $n_{sink}$ sinks distributed in $N_p$ planes, the nominal value and the variation of the clock skew $s_{u,v}$ between sinks u and v are, respectively,
$$ s'_{u,v} = D'_u - D'_v, \tag{28} $$
$$ \Delta s_{u,v} = \Delta s^{WID}_{u,v} + \Delta s^{D2D}_{u,v} = \Delta D^{WID}_u - \Delta D^{WID}_v + \Delta D^{D2D}_u - \Delta D^{D2D}_v. \tag{29} $$
The mean value of $\Delta s_{u,v}$ is $E(\Delta s_{u,v}) = E(\Delta s^{WID}_{u,v}) = E(\Delta s^{D2D}_{u,v}) = 0$. Since $\Delta D^{WID}_u - \Delta D^{WID}_v$ and $\Delta D^{D2D}_u - \Delta D^{D2D}_v$ are independent of each other, $\Delta s^{D2D}_{u,v}$ and $\Delta s^{WID}_{u,v}$ are discussed separately in the following subsections.
4.4.1. Skew Model of 3D Clock Trees with D2D Variations. The correlation between any two terms in the expression of $\Delta s^{D2D}_{u,v}$ is either one or zero (i.e., the terms are either fully correlated or uncorrelated). According to (19), $\Delta s^{D2D}_{u,v}$ can be written as the sum of the terms in different planes,
$$ \Delta s^{D2D}_{u,v} = \sum_{j=1}^{N_p} \Delta s^{D2D}_{(u,v)j}, \qquad \Delta s^{D2D}_{(u,v)j} = \sum_{i=1}^{n_{u(j)}} \Delta D^{D2D}_{u(j,i)} - \sum_{i=1}^{n_{v(j)}} \Delta D^{D2D}_{v(j,i)}, \tag{30} $$
where $\Delta D^{D2D}_{u(j,i)}$ is the D2D delay variation related to the ith buffer in the jth plane along the clock path ending at sink u. The number of buffers in the jth plane along this path is denoted by $n_{u(j)}$.
[Figure 5: clock paths from the clock source to sinks u and v, containing $n_u$ and $n_v$ buffers, respectively, of which the first $n_{u,v}$ are shared.]
Fig. 5. The clock paths to sinks u and v where the paths share $n_{u,v}$ buffers.
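All the buffers in the same plane are equally affected by the D2D variations, which means that the correlation between each pair of variables in (30) that belong to the same plane is one. Since $\Delta D^{D2D}_{u(j,i)}$ and $\Delta D^{D2D}_{v(j,i)}$ are both modeled as Gaussian distributions, $\Delta s^{D2D}_{(u,v)j}$ is also a Gaussian distribution. In (30), $\Delta s^{D2D}_{(u,v)j_1}$ is independent from $\Delta s^{D2D}_{(u,v)j_2}$ for all $j_1 \ne j_2$ $(1 \le j_1, j_2 \le N_p)$. Consequently, $\Delta s^{D2D}_{u,v}$ is also described by a Gaussian distribution,
$$ \Delta s^{D2D}_{u,v} \sim N\left(0,\ \sigma^2_{\Delta s^{D2D}_{u,v}}\right), \tag{31} $$
$$ \sigma^2_{\Delta s^{D2D}_{u,v}} = \sum_{j=1}^{N_p} \sigma^2_{\Delta s^{D2D}_{(u,v)j}} = \sum_{j=1}^{N_p} \left( \sum_{i=1}^{n_{u(j)}} \sigma_{\Delta D^{D2D}_{u(j,i)}} - \sum_{i=1}^{n_{v(j)}} \sigma_{\Delta D^{D2D}_{v(j,i)}} \right)^2. \tag{32} $$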
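Equation (32) differs from (22) only in the sign of the second path: within each plane the σ values of path v are subtracted from those of path u before squaring, so the buffers shared by both paths, and matching buffers in the same plane, cancel. The sketch below uses hypothetical names and σ values.

```python
import math
from collections import defaultdict

def skew_sigma_d2d(path_u, path_v):
    """Eq. (32): sigma of the D2D skew between sinks u and v.

    Each path is a list of (plane, sigma) pairs for all buffers from the source
    to the sink, shared ones included. Within a plane all D2D terms are fully
    correlated, so per-plane sigmas are subtracted before squaring; this also
    cancels the contribution of the buffers shared by both paths.
    """
    diff = defaultdict(float)
    for plane, sigma in path_u:
        diff[plane] += sigma
    for plane, sigma in path_v:
        diff[plane] -= sigma
    return math.sqrt(sum(d * d for d in diff.values()))

shared = [(1, 1.0), (1, 1.0)]
# Hypothetical mirror-image branches ending in the same plane vs. in different planes.
print(skew_sigma_d2d(shared + [(1, 1.0)], shared + [(1, 1.0)]))   # 0.0
print(skew_sigma_d2d(shared + [(2, 1.0)], shared + [(3, 1.0)]))   # sqrt(2)
```

The contrast between the two calls illustrates why sinks located in different planes incur D2D skew even in a perfectly balanced tree.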
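4.4.2. Skew Model of 3D Clock Trees with WID Variations. According to (24), the distribution of $\Delta s^{WID}_{u,v}$ is also a Gaussian distribution,
$$ \Delta s^{WID}_{u,v} \sim N\left(0,\ \sigma^2_{\Delta s^{WID}_{u,v}}\right), \tag{33} $$
$$ \begin{aligned} \sigma^2_{\Delta s^{WID}_{u,v}} ={}& \sum_{i=n_{u,v}+1}^{n_u} \sigma^2_{\Delta D^{WID}_{u(i)}} + \sum_{j=n_{u,v}+1}^{n_v} \sigma^2_{\Delta D^{WID}_{v(j)}} + 2 \sum_{n_{u,v}+1 \le i < j \le n_u} \mathrm{corr}(i,j)\, \sigma_{\Delta D^{WID}_{u(i)}} \sigma_{\Delta D^{WID}_{u(j)}} \\ &+ 2 \sum_{n_{u,v}+1 \le i < j \le n_v} \mathrm{corr}(i,j)\, \sigma_{\Delta D^{WID}_{v(i)}} \sigma_{\Delta D^{WID}_{v(j)}} - 2 \sum_{\substack{n_{u,v}+1 \le i \le n_u \\ n_{u,v}+1 \le j \le n_v}} \mathrm{corr}(i,j)\, \sigma_{\Delta D^{WID}_{u(i)}} \sigma_{\Delta D^{WID}_{v(j)}}, \end{aligned} \tag{34} $$
where $n_{u,v}$ is the number of buffers shared by the clock paths ending at sinks u and v, as depicted in Figure 5. Downstream of buffer $n_{u,v}$, the subpaths to u and v do not share any buffer; the shared buffers contribute equally to both delays and therefore cancel in the skew. The correlation between the variations of the buffers is presented in Section 4.3.
According to (29) through (34), the variation of the clock skew $\Delta s_{u,v}$ between sinks u and v in a 3D clock tree is modeled as a Gaussian distribution,
$$ \Delta s_{u,v} \sim N\left(0,\ \sigma^2_{\Delta s^{WID}_{u,v}} + \sigma^2_{\Delta s^{D2D}_{u,v}}\right). \tag{35} $$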
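A minimal sketch of (34) and (35), assuming the caller passes only the non-shared buffers of each subpath (those after buffer $n_{u,v}$) together with three hypothetical correlation callables: `corr_uu(a, b)` and `corr_vv(a, b)` between buffers a and b of the same subpath, and `corr_uv(a, b)` between buffer a of subpath u and buffer b of subpath v. All names are illustrative.

```python
import math

def skew_sigma_wid(sig_u, sig_v, corr_uu, corr_vv, corr_uv):
    """Eq. (34): sigma of the WID skew between sinks u and v.

    sig_u / sig_v hold the WID sigmas of the non-shared buffers on the
    subpaths to u and v. The corr_* callables return the WID correlation of
    the indexed buffers (0 if they sit in different planes).
    """
    var = sum(s * s for s in sig_u) + sum(s * s for s in sig_v)
    for a in range(len(sig_u)):
        for b in range(a + 1, len(sig_u)):
            var += 2 * corr_uu(a, b) * sig_u[a] * sig_u[b]
    for a in range(len(sig_v)):
        for b in range(a + 1, len(sig_v)):
            var += 2 * corr_vv(a, b) * sig_v[a] * sig_v[b]
    for a in range(len(sig_u)):
        for b in range(len(sig_v)):
            var -= 2 * corr_uv(a, b) * sig_u[a] * sig_v[b]
    return math.sqrt(max(var, 0.0))   # guard against tiny negative round-off

def skew_sigma_total(sigma_wid, sigma_d2d):
    """Eq. (35): the WID and D2D skew components are independent."""
    return math.hypot(sigma_wid, sigma_d2d)

def no_corr(a, b):
    return 0.0

print(skew_sigma_wid([1.0, 1.0], [1.0, 1.0], no_corr, no_corr, no_corr))   # 2.0
print(skew_sigma_total(3.0, 4.0))                                          # 5.0
```

Positive correlation between the two subpaths (the last loop) reduces the skew σ, which is why nearby sinks in the same plane tend to exhibit lower WID skew under the multilevel model.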
Table I. Device and Interconnect Parameters of the Investigated Circuit
Parameter   Wn/Ln         Wp/Lp          Vdd [V]     Rb [Ω]      Cb [fF]     Db [ps]
Value       30            60             1.0         349.0       5.7         24.8
Parameter   rint [Ω/mm]   cint [fF/mm]   øTSV [μm]   lTSV [μm]   RTSV [mΩ]   CTSV [fF]
Value       51.2          230.2          2           20          133         52

Table II. Variations of the Electrical Characteristics of the Buffers
Input slew   Rb [Ω]                 Cb [fF]                Db [ps]
[mV/ps]      μ     σWID   σD2D      μ     σWID   σD2D      μ     σWID   σD2D
47           371   18.8   15.3      4.9   0.04   0.03      19.9  1.04   0.85
  σ/μ              5.1%   4.1%            0.8%   0.7%            5.2%   4.3%
16           349   17.8   14.7      5.7   0.31   0.16      24.8  1.49   1.21
  σ/μ              5.1%   4.2%            2.3%   2.1%            6.0%   4.9%
6            345   16.7   13.7      7.2   0.08   0.06      30.1  2.19   1.79
  σ/μ              4.8%   4.0%            1.1%   0.9%            7.3%   5.9%
If the maximum tolerable skew variation is S ≥ 0, the probability that a 3D clock tree satisfies this constraint is

$$P\left(|s_{u,v}| \le S\right) = \int_{-S-s'_{u,v}}^{S-s'_{u,v}} f_{s_{u,v}}(t)\, dt, \tag{36}$$

$$f_{s_{u,v}}(t) = \frac{1}{\sqrt{2\pi\sigma^2_{s_{u,v}}}}\, e^{-t^2/\left(2\sigma^2_{s_{u,v}}\right)}. \tag{37}$$
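Because the density in (37) is a zero-mean Gaussian, the integral in (36) reduces to a difference of two Gaussian CDF values, which Python's `math.erf` provides. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, all quantities are in ps, and the argument `s_prime` simply plays the role of $s'_{u,v}$ in the integration limits of (36). The total σ is first formed from the WID and D2D components as in (35).

```python
import math


def skew_sigma(var_wid: float, var_d2d: float) -> float:
    """Eq. (35): the WID and D2D components add in variance."""
    return math.sqrt(var_wid + var_d2d)


def zero_mean_gaussian_cdf(x: float, sigma: float) -> float:
    """Integral of the density in Eq. (37) from -infinity to x."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def skew_constraint_probability(S: float, sigma: float, s_prime: float = 0.0) -> float:
    """Eq. (36): integrate f_{s_{u,v}} from -S - s_prime to S - s_prime."""
    if S < 0:
        raise ValueError("S must be non-negative")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    upper = zero_mean_gaussian_cdf(S - s_prime, sigma)
    lower = zero_mean_gaussian_cdf(-S - s_prime, sigma)
    return upper - lower


# Illustrative variances in ps^2 (not from the article): sigma = 5 ps.
sigma = skew_sigma(var_wid=9.0, var_d2d=16.0)
print(skew_constraint_probability(S=15.0, sigma=sigma))  # S = 3 sigma, about 0.9973
```

Any nonzero `s_prime` shifts the integration window away from the mean and therefore lowers the probability for the same S and σ.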
The model of skew variations is used to analyze the effect of process variations in
various 3D clock trees. This model can be extended to include the variations of horizon-
tal interconnects, as analyzed in the appendix. The investigated 3D clock distribution
networks and simulation results are presented in the following section.
5. SIMULATION RESULTS
Based on the skew variation model presented in Section 4.4, two H-tree-based 3D topologies are investigated to highlight the impact of process variations on the clock skew in 3D circuits. The accuracy of the variation model is discussed in Section 5.1.
The investigated global 3D H-tree topologies are described in Section 5.2. Simulation
results and a comparison between these 3D clock trees are presented in Sections 5.3
and 5.4.
5.1. Accuracy of the Clock Skew Variation Model for 3D Clock Trees
The skew variation model is compared with Monte Carlo simulations in this section.
The structure used for this purpose is an H-tree clock distribution network. This H-tree
is placed in a circuit with total area 10 mm × 10 mm.
The circuit is assumed to be implemented in a 45 nm CMOS technology. The parameters of the transistors and the interconnects are extracted from the PTM 45 nm CMOS and global interconnect models [NIMO ASU 2008] and the ITRS reports [ITRS 2009].
The clock buffers consist of two inverters connected in series. The circuit parameters
used in the following sections are listed in Table I. The ratio of the width to the chan-
nel length is denoted by Wn/Ln and Wp/Lp for NMOS and PMOS, respectively. The
interconnect resistance and capacitance per unit length are denoted by rint and cint, re-
spectively. The physical and electrical characteristics of TSVs are also listed in Table I
and are based on the data reported in Katti et al. [2010] and Savidis and Friedman
[2008]. The diameter and length of the TSVs are denoted by øTSV and lTSV, respectively.
The variation of the effective channel length of transistors, Leff, is considered in this article, since it has been identified as the most significant component of device variations [Nassif 2000; Bowman et al. 2002, 2009; Agarwal et al. 2003]. Note that the effect of other sources of process variations can also be determined by the circuit illustrated in Figure 2 and described with the proposed model. The corresponding nominal Leff, D2D variation ($3\sigma^{D2D}_{L_{eff}}$), and WID variation ($3\sigma^{WID}_{L_{eff}}$) are 27 nm, 2.2 nm, and 2.7 nm, respectively [ITRS 2009]. Cadence Spectre is used for the Monte Carlo simulations [Cadence Design Systems, Inc. 2008]. The resulting variations of Rb, Cb, and Db are listed in Table II; they are obtained as discussed in Section 4.1 for input slew rates of 47 mV/ps, 16 mV/ps, and 6 mV/ps. The mean value and standard deviation are denoted by μ and σ, respectively. The ratio σ/μ usually indicates the importance of variations [Bowman et al. 2002]. The Monte Carlo simulation is repeated 1000 times.

Fig. 6. A single-via 3D clock H-tree where different views (a) 2D and (b) 3D are illustrated. [Figure: (a) 2D view showing the clock source and the 1st to 4th planes; (b) 3D view with axes X [mm], Y [mm], and index of plane.]
As reported in Table II, the σ of Rb, Cb, and Db all depend on the input slew rate. Both the slew rate and the load must, therefore, be considered when evaluating the variations of the buffer delay. The method presented in Section 4.1 accurately captures this dependence.
Two H-tree topologies are used to verify the accuracy of the skew variation model.
The first topology (multi-via) is illustrated in Figure 1(b). The second topology (single-
via) is illustrated in Figure 6. Both these topologies are introduced in the following
section. The H-tree spans four planes. The clock source is located at the center of
the first plane. There are 128 clock sinks in total, 32 in each plane. Clock buffers are inserted following the technique described in Téllez and Sarrafzadeh [1997]. The clock frequency is 1 GHz and the constraint on the input slew rate is 16 mV/ps (the transition time is 5% of the clock period). The numbers of inserted buffers in the multi-via and single-via topologies are 168 and 540, respectively. Only a few buffers are illustrated in Figures 1 and 6 for improved readability. The wire segments between two buffers are simulated using a standard π model.
As shown in Figure 1(b), sinks 1, 2, and 3 are located in the first plane. Sinks 4
and 5 are located in the topmost plane. Skews s1,2, s1,3, s1,4, and s1,5 are considered to
demonstrate the accuracy of the adopted model. The difference between the resulting
standard deviation σ of Spectre simulation and the skew variation model is reported
in Table III. The skew variations with uncorrelated (independent) WID variations are reported under the "Indep." columns. The variations modeled by the multilevel spatial correlation are reported under the "Multi-Level" columns, where five levels are assumed (l = 5). The magnitude of the error of the skew variation model does not exceed 6% for any pair of sinks in the investigated clock trees. As listed in Table III, the distribution of the clock skew determined by the skew variation model exhibits reasonable accuracy as compared with Monte Carlo simulations.
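Table III. σ of Skew Variation of the 3D Circuits Shown in Figures 1 and 6

| Topology | Quantity | Indep. σs1,2 | Indep. σs1,3 | Indep. σs1,4 | Indep. σs1,5 | Multi-Level σs1,2 | Multi-Level σs1,3 | Multi-Level σs1,4 | Multi-Level σs1,5 | CPU time |
|---|---|---|---|---|---|---|---|---|---|---|
| Multi-via | Model [ps] | 6.96 | 14.76 | 7.41 | 14.97 | 7.00 | 33.55 | 7.11 | 32.92 | 26 sec. |
| | Spectre [ps] | 7.27 | 15.71 | 7.56 | 15.97 | 7.40 | 34.90 | 7.31 | 35.10 | 48 min. |
| | Error [%] | -4 | -6 | -2 | -6 | -5 | -4 | -3 | -6 | - |
| Single-via | Model [ps] | 3.91 | 13.59 | 51.33 | 51.33 | 2.43 | 29.01 | 71.14 | 69.63 | 39 sec. |
| | Spectre [ps] | 3.84 | 13.13 | 50.13 | 50.09 | 2.34 | 28.1 | 68.19 | 67.9 | 55 min. |
| | Error [%] | 2 | 3 | 2 | 2 | 4 | 3 | 4 | 3 | - |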
The computational time for the proposed model and the SPICE-based Monte Carlo simulations is also listed in Table III. The runtime is reported as the average time for independent and spatially correlated WID variations. To decrease the runtime, only the clock paths related to the reported σ are simulated. Even though only part of each 3D clock tree is simulated, the proposed model reduces the runtime by approximately 85× to 110×.
The efficiency of the variability-aware design of 3D clock distribution networks can
be significantly improved by the proposed model, especially when different topologies
of clock trees are compared and iterative design modifications are required. The com-
parison of the runtime for the entire 3D clock distribution networks is discussed in
Section 6.
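The runtime reduction follows from simple arithmetic on the CPU times in Table III. The short Python sketch below, with a hypothetical helper `speedup`, converts the Spectre times to seconds and divides by the model times; it assumes, as stated above, that each CPU time is the average over the independent and multilevel-correlated runs.

```python
def speedup(spectre_seconds: float, model_seconds: float) -> float:
    """Ratio of the Monte Carlo (Spectre) runtime to the analytical-model runtime."""
    return spectre_seconds / model_seconds


# CPU times from Table III.
multi_via = speedup(48 * 60, 26)   # 48 min vs. 26 sec
single_via = speedup(55 * 60, 39)  # 55 min vs. 39 sec
print(round(multi_via, 1), round(single_via, 1))
```

This prints 110.8 and 84.6, that is, roughly 111× for the multi-via tree and 85× for the single-via tree.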
5.2. Typical 3D Clock H-Tree Topologies
The skew variation for two types of 3D global H-trees is investigated in the following
subsections. Both of these networks have been utilized in the design of a prototype
3D circuit [Pavlidis et al. 2008] and other case studies of multiplane circuits [Mondal
et al. 2007]. H-trees are typically used to globally deliver the clock signal to large-scale
circuits [Friedman 2001]. The regularity of these topologies facilitates the investigation
of WID and D2D variations as compared to synthesized clock trees, which exhibit
significantly different wire length and TSV density characteristics. In other words, the
main objective is to demonstrate the physical behavior of a 3D system under these
variations and the related trade-offs, rather than the decrease in the wire length of
a clock tree which is produced by an efficient 3D clock tree synthesis technique. A
number of local clock networks, such as local meshes, clock trees [Zhao and Lim 2010],
and rings can be used to distribute the clock signal in the vicinity of each leaf of the
H-tree. Although H-trees are considered, the analysis also applies to other global clock
architectures, such as X-trees.
The first topology (multi-via topology) is shown in Figure 1, where the clock source
and buffers (except for the buffers at the last level) of a 3D H-tree are located in a single
physical plane (e.g., the first plane). In this topology, the clock signal is propagated to
the sinks in other planes by multiple TSVs. The vertical lines at each leaf correspond
to a cluster of TSVs.
The second clock tree topology (single-via topology) is illustrated in Figure 6, where
a 2D H-tree is replicated in each plane. The clock signal is propagated by a single via
(or a group of TSVs to prevent TSV failures and to lower the resistance of this vertical
path) connecting the clock source to each H-tree replica.
5.3. Skew Variations between the Clock Sinks in the Same Plane
In 3D H-trees, the intra-plane clock paths of a single plane have the same number, size, and location of buffers, since the multi-via and single-via topologies are both symmetric (at least in the x and y directions). The D2D
variations in each plane, therefore, affect these clock paths equally. Consequently, ac-
cording to (30), for both the multi-via and single-via topologies, only WID variations
affect the variation of skew between sinks located in the same plane. For both the
single-via and multi-via tree topologies, the variation of skew between the buffers
located in the same plane exhibits the same behavior as in 2D circuits.
For the considered topologies, the clock buffers are inserted by uniform buffer insertion techniques under the same skew and slew rate constraints [Téllez and Sarrafzadeh 1997]. The Rin and Cl of each buffer are, therefore, approximately equal. For a 3D clock tree, as described by (34), if Rin and Cl of each buffer remain unchanged, the $\sigma_{s^{WID}_{u,v}}$ between two sinks decreases as the number of nonshared clock buffers (e.g., the buffers downstream of the $n_{u,v}$ shared buffers in Figure 5) decreases. For a 3D IC with total area A, the side length of each plane is $L \propto \sqrt{A/N_p}$. Consequently, the number of buffers in one plane decreases as L decreases for an increasing number of planes forming the 3D circuit.
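The scaling $L \propto \sqrt{A/N_p}$ can be made concrete with a few lines of Python. This is a sketch under an added assumption: each plane is taken to be square with area exactly $A/N_p$, whereas the text only states a proportionality. The function name is hypothetical.

```python
import math


def plane_side_length_mm(total_area_mm2: float, n_planes: int) -> float:
    """Side length of one square plane when area A is split evenly over N_p planes."""
    if n_planes < 1:
        raise ValueError("a 3D IC needs at least one plane")
    return math.sqrt(total_area_mm2 / n_planes)


# 100 mm^2 total area, as in the circuits simulated in Section 5.
for n_p in (1, 2, 4, 8):
    print(n_p, round(plane_side_length_mm(100.0, n_p), 2))
```

For 1, 2, 4, and 8 planes this gives side lengths of 10.0, 7.07, 5.0, and 3.54 mm, so each doubling of Np shortens the in-plane wires by a factor of about √2.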
For the single-via topology, all the clock sinks within a plane are connected to the clock source by the same TSV. The length of this TSV and the increasing number of buffers vertically connected to this TSV do not affect the intra-plane skew. Consequently, based on the proposed model and the preceding analysis, the following conjecture is formulated.
Conjecture 5.1. For the single-via topology, the distribution of the skew between the
clock sinks in the same plane becomes narrower as the number of planes increases.
For the multi-via topology, however, the clock sinks in the same plane connect to dif-
ferent TSVs. As the number of planes increases, both the number of buffers connecting
to a TSV and the length of the TSVs increase. The input slew rate decreases since an
increasing load is driven. As reported in Table II, the resulting delay variation of the
buffers after the TSVs increases. Moreover, the load of the buffers driving the TSVs
increases. These changes in the topological characteristics increase $\sigma_{d(i)}$, as described by (16). This increase counteracts, and can outweigh, the decrease in variations due to the decreasing number of clock buffers along the clock paths.
Conjecture 5.2. For the multi-via topology, the distribution of the skew between the
clock sinks in the same plane changes nonmonotonically as the number of planes
increases.
Example 1. Simulation results exhibiting the different behavior of the single-via
and multi-via topologies are shown in Figure 7. In this example, a global clock tree
with 256 sinks is placed in a 3D IC with an increasing number of planes. A sink can be
either a subtree, a local clock mesh, or a cluster of registers (or a buffer driving any
of these structures). The total area of the circuit is 100 mm2. The impact of process
variations on the skew between pairs of sinks within the same plane is demonstrated
by skews s1,2 and s1,3 as illustrated in Figure 1(b). The results of the simulations with
the independent and multilevel correlated WID variations (l = 5) are illustrated in
Figures 7(a) and 7(b), respectively.
The buffers inserted into the 3D clock trees are reported in Table IV. The number of buffers inserted within one plane in the single-via topology is lower than in the multi-via topology, which results in a lower skew variation. The total number of buffers in a single-via topology is, however, much higher than in the multi-via topology.
Conjecture 5.3. The variance of the skew between two intra-plane sinks of the single-via 3D H-tree is smaller than the corresponding variance of the skew in a multi-via 3D H-tree.
The numbers of buffers in each plane for both topologies decrease as the number of planes increases. The increasing number of buffers connected to TSVs (due to the greater number of planes) increases $\sigma_{s_{u,v}}$ in the multi-via topology but does not affect $\sigma_{s_{u,v}}$ in the single-via topology. Consequently, for the single-via topology, the decrease in the number of buffers reduces the skew variation within the same plane, as shown by the single-via curves (s(1,2)_single_via and s(1,3)_single_via) in Figure 7. For the multi-via topology, however, as shown by the multi-via curves (s(1,2)_mul_via and s(1,3)_mul_via), $\sigma_{s_{u,v}}$ within the same plane changes nonmonotonically with the number of planes. For sinks separated by a short distance, $\sigma_{s_{1,2}}$ even increases with the number of planes. As a result, for multi-via 3D H-trees, simply increasing the number of planes does not necessarily improve the clock skew. The number of planes that produces the lowest skew variation can be determined by employing the proposed skew variation model.

Fig. 7. σ of skew within the first plane for increasing number of planes, where the WID variations are considered (a) independent and (b) multilevel correlated. [Figure: both panels plot σ of skew [ps] versus # of planes (1 to 10) for s(1,2)_mul_via, s(1,2)_single_via, s(1,3)_mul_via, and s(1,3)_single_via.]

Table IV. The Number of Buffers Inserted into the 3D Clock Trees

| # of planes | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| multi-via | 981 | 588 | 558 | 296 | 264 | 242 | 234 | 138 | 134 | 134 |
| single-via (per plane) | 981 | 460 | 430 | 231 | 199 | 177 | 169 | 105 | 101 | 101 |
| single-via (total) | 981 | 920 | 1290 | 924 | 995 | 1062 | 1183 | 840 | 909 | 1010 |
The maximum supported clock frequency fmax of a circuit is constrained by skew [Friedman 2001]. Although fmax is typically determined by the critical path delay, the skew criterion is used here to offer a tangible explanation of the effect of process variations on circuit performance. The maximum allowed skew is assumed to be 10% of the clock period, that is, $0.1/f_{max}$, for the simulated 3D clock trees. To achieve a timing yield higher than 99%, fmax should therefore satisfy

$$f_{max} < \frac{0.1}{3\sigma_{s_{u,v}}},$$

where $3\sigma_{s_{u,v}}$ is the skew at the 3σ point from the mean value. Assuming that the clock frequency is limited by the largest skew variation, the fmax corresponding to $\sigma_{s_{1,3}}$ shown in Figure 7 is illustrated in Figure 8. The results with independent WID variations are labeled "multi-via (I)" and "single-via (I)", and the results with multilevel WID correlations are labeled "multi-via (II)" and "single-via (II)".

Fig. 8. The maximum supported clock frequency determined by the skew variation within one plane. [Figure: max clock frequency [GHz] versus # of planes (1 to 10) for the series mul_via (I), single_via (I), mul_via (II), and single_via (II).]
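The skew-limited frequency criterion used in this subsection (a maximum skew of 10% of the clock period, checked at the 3σ point) can be evaluated in both directions with a short Python sketch. The helper names are hypothetical; σ is in ps and frequencies are in GHz, so the bound $0.1/(3\sigma)$ becomes $100/(3\sigma)$ GHz. The constant `THREE_SIGMA_COVERAGE` shows why the 3σ point corresponds to a yield above 99% for a zero-mean Gaussian skew.

```python
import math

# Two-sided coverage of a zero-mean Gaussian within +/- 3 sigma (about 0.9973).
THREE_SIGMA_COVERAGE = math.erf(3.0 / math.sqrt(2.0))


def fmax_ghz(sigma_ps: float, skew_fraction: float = 0.10) -> float:
    """Largest f with 3 * sigma <= skew_fraction / f, for sigma in ps; result in GHz."""
    if sigma_ps <= 0:
        raise ValueError("sigma must be positive")
    # skew_fraction / (3 * sigma[s]) in Hz equals skew_fraction * 1e3 / (3 * sigma[ps]) in GHz.
    return skew_fraction * 1e3 / (3.0 * sigma_ps)


def max_sigma_ps(f_ghz: float, skew_fraction: float = 0.10) -> float:
    """Inverse relation: the largest skew sigma (ps) that still supports f_ghz."""
    if f_ghz <= 0:
        raise ValueError("frequency must be positive")
    return skew_fraction * 1e3 / (3.0 * f_ghz)


for f in (2.16, 0.44):  # extreme fmax values reported in Table V
    print(f, round(max_sigma_ps(f), 1))
```

Applied to the extreme frequencies of Table V, 2.16 GHz and 0.44 GHz map to a largest tolerable σ of about 15.4 ps and 75.8 ps, respectively.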
As illustrated in Figure 8, the single-via topology can produce up to 53% and 64% higher clock frequency for independent and multilevel correlated WID variations, respectively, as compared to the multi-via topology. This improvement increases with the number of planes. The fmax of the multi-via topology changes nonmonotonically with the number of planes.
Guideline 1. In a 3D circuit, if the data-related sinks are located mostly within the
same plane, the single-via topology is more efficient in reducing the skew variations
and can support a higher clock frequency.
The fmax produced by a 3D tree with independent WID variations is, not surprisingly, higher than that of a 3D tree with multilevel correlated WID variations. According to (34), this is due to the larger spatial correlation between devices, which introduces higher skew variation into a 3D clock tree.
5.4. Skew Variations between the Clock Sinks in Different Planes
As described by (30) through (32), when the investigated clock sinks are located in different planes, the corresponding clock skew is also affected by D2D variations. As a result, the skew variation between inter-plane sinks differs from the intra-plane skew variation. To demonstrate this difference, the $\sigma_{s_{u,v}}$ between each pair of sinks of a multi-via 3D clock tree is illustrated in Figure 9. Since $\sigma_{s_{u,v}} = \sigma_{s_{v,u}}$, only half of the skew array is shown in Figure 9 for clarity.
Example 2. In this example, a 3D clock tree spanning eight planes is implemented
using the single- and multi-via topologies. The resulting skew of the multi-via topology
is illustrated in Figure 9. The electrical parameters of the wires are given in Section 5.1.
There are 32 sinks in each plane and 256 sinks in total. Sinks 1 to 32 are located in
the first plane and sinks 33 to 64 are located in the second plane, etc. As an example,
consider the σ of the skew between sinks 3 and 4. This standard deviation is determined
by the z value of the point x = 3 and y = 4. As these figures show, the σ of the inter-plane skew is larger than the σ of the intra-plane skew. The change of skew variation between
inter-plane sinks with the number of planes is illustrated in Figure 10.
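The sink numbering of Example 2 and the symmetry $\sigma_{s_{u,v}} = \sigma_{s_{v,u}}$ determine how many distinct skews must be evaluated and which of them are inter-plane pairs that can pick up the D2D term of (32). The Python sketch below is illustrative only; `plane_of` and `unordered_pairs` are hypothetical helpers that encode the stated layout of 32 consecutive sinks per plane.

```python
from typing import Iterator, Tuple

SINKS_PER_PLANE = 32  # Example 2: 32 sinks per plane, 256 sinks over 8 planes


def plane_of(sink: int) -> int:
    """1-based plane of a 1-based sink index (1-32 -> plane 1, 33-64 -> plane 2, ...)."""
    if sink < 1:
        raise ValueError("sink indices start at 1")
    return (sink - 1) // SINKS_PER_PLANE + 1


def unordered_pairs(n_sinks: int) -> Iterator[Tuple[int, int, bool]]:
    """Yield (u, v, same_plane) once per pair with u < v.

    Because sigma_{s_{u,v}} equals sigma_{s_{v,u}}, this half of the skew
    array carries all the information.
    """
    for u in range(1, n_sinks + 1):
        for v in range(u + 1, n_sinks + 1):
            yield u, v, plane_of(u) == plane_of(v)


pairs = list(unordered_pairs(256))
intra = sum(1 for _u, _v, same in pairs if same)
print(len(pairs), intra, len(pairs) - intra)
```

For the eight-plane tree this prints 32640 3968 28672: of the 32,640 distinct sink pairs, 3,968 are intra-plane and 28,672 are inter-plane.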
For the single-via topology, the skew variation for inter-plane pairs of sinks remains approximately the same irrespective of the planes to which the sinks belong. This is because the paths to sinks located in different planes do not share any common segments (see Figure 6).
When the number of planes is greater than two, the skew variation decreases as the number of planes increases, as also shown in Figure 10. Since the paths lie in different planes, according to (32), the effect of D2D variations on the 3D single-via topology is much larger than in planar H-trees.

Fig. 9. The σ of skew between each pair of clock sinks under the multi-via topology where (a) is the 3D view and (b) is the top view.

Fig. 10. σ of skew between the sinks in the first and the topmost plane for the single-via and multi-via topologies. The locations of the pairs of sinks defining s1,4 and s1,5 are shown in Figure 1(b). (a) is based on independent WID variations and (b) is based on multilevel correlated WID variations. [Figure: both panels plot σ of skew [ps] versus # of planes (1 to 10) for s(1,4)_mul_via, s(1,4)_single_via, s(1,5)_mul_via, and s(1,5)_single_via.]

Table V. The Maximum Clock Frequency Supported by Multi-via and Single-Via Topologies

| WID model | # of planes | 1 (2-D) | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| I | multi-via [GHz] | 1.62 | 1.93 | 2.07 | 2.16 | 2.15 | 2.13 | 2.11 | 2.09 | 2.00 | 1.92 |
| I | single-via [GHz] | 1.62 | 0.44 | 0.52 | 0.60 | 0.64 | 0.69 | 0.75 | 0.82 | 0.84 | 0.86 |
| I | multi / single | 1 | 4.4 | 4.0 | 3.6 | 3.4 | 3.1 | 2.8 | 2.5 | 2.4 | 2.2 |
| II | multi-via [GHz] | 0.49 | 0.74 | 0.98 | 0.96 | 1.05 | 1.14 | 1.19 | 1.33 | 1.50 | 1.45 |
| II | single-via [GHz] | 0.49 | 0.34 | 0.40 | 0.44 | 0.47 | 0.51 | 0.56 | 0.61 | 0.62 | 0.63 |
| II | multi / single | 1.0 | 2.2 | 2.5 | 2.2 | 2.2 | 2.2 | 2.1 | 2.2 | 2.4 | 2.3 |
The skew variation under the multi-via topology differs significantly from that of the single-via topology, as illustrated in Figure 10. The skew variation between planes depends significantly on the location of the related sinks. According to (30) through (32), the impact of D2D variations increases as the number of buffers located in different planes increases. For the multi-via topology, all clock paths preceding the TSVs are in the first plane. The effect of D2D variations on the multi-via topology is, therefore, much smaller than on the single-via topology, as shown in Figure 10. Nevertheless, as also shown in Figure 10, the skew variation of the multi-via topology changes nonmonotonically with the number of planes, for reasons similar to those discussed for the skew variation within the same plane.
Guideline 2. In a 3D circuit, if the data-related sinks are widely distributed in several
planes, the multi-via topology is more efficient in reducing the skew variation and
supports a higher clock frequency.
Assuming the fmax of a 3D IC is limited by the inter-plane skew variation, the
maximum operating frequency supported by the single-via and multi-via topologies
is reported in Table V. The results based on independent and multilevel correlated
WID variations are reported after "I" and "II", respectively. As listed in this table, depending on the number of planes and the clock tree topology, fmax varies from 440 MHz to 2.16 GHz; the corresponding largest $\sigma_{s_{u,v}}$ varies from 77 ps to 15 ps. For the same number of planes, the fmax of the multi-via topology is up to 4.4 times higher than that of the single-via topology. As reported under "I", the single-via topology produces a lower fmax than a planar tree when the WID variation is completely random. The single-via topology can, however, produce a higher fmax than a 2D tree when the systematic WID variation is considered, as reported under "II".
3D integration is expected to significantly reduce the interconnect delay and enhance the clock frequency of circuits [Pavlidis and Friedman 2009b; Joyner et al. 2004]. This enhancement, however, is shown not to grow proportionally with the number of planes when process variations (both WID and D2D) are considered in the design process.
Results indicate that the performance improvement in a 3D clock network depends significantly on the distribution of the sinks (and consequently the clock paths) among the planes. As reported in Table V, when the data-related sinks are distributed across different planes, the skew of single-via 3D clock trees is affected more by process variations than that of the corresponding 2D clock trees. This behavior is consistent with the conclusions made in Garg and Marculescu [2009a] and Akopyan et al. [2008]. In this case, the effect of process variations on 3D clock distribution networks can be mitigated by employing a multi-via topology. This topology can better exploit the traits of vertical integration (i.e., shorter wires) to significantly increase the operating frequency.
Fig. 11. An example of the multigroup 3D clock H-tree topology. [Figure: 3D view with axes X [mm], Y [mm], and index of plane (planes 1 to 5); sinks 1, 6, and 7 are marked.]
6. MULTIGROUP 3D CLOCK TREE TOPOLOGY
As stated in Guidelines 1 and 2, the single-via 3D clock H-tree topology is more
efficient in reducing the skew variation within a single plane, while the multi-via topol-
ogy is more efficient in reducing the skew variation between planes. To exploit these
advantages, a hybrid H-tree topology (multigroup topology) combining the features of
these topologies is proposed in this section.
The new multigroup topology is illustrated in Figure 11. The key idea is that the Np planes forming a 3D circuit are divided into G groups of "data-related planes". The data-related planes are the physical planes containing data-related registers. The ith group of data-related planes consists of $h_i$ ($\le N_p$) physical planes. The clock signal is distributed within these $h_i$ planes by a multi-via topology.
An example of this H-tree topology is illustrated in Figure 11. This H-tree includes two groups of data-related planes (G = 2), which span three (h1 = 3) and two (h2 = 2) physical planes, respectively. The buffers of the two groups of data-related planes are marked with different symbols in Figure 11. The TSVs connecting these buffers are called "sink-TSVs". The roots of the multi-via topologies are connected with a "root-TSV" (or a cluster of TSVs), as illustrated by the segment at the center of the planes.
For a 3D IC in which the data-related clock sinks cannot all be placed within the same plane but can be placed in adjacent planes, the multigroup topology is more efficient in reducing the skew variation than the aforementioned topologies. Compared with the single-via topology, the multigroup topology uses G instead of Np H-trees and, thus, significantly reduces the skew variation between data-related planes. Compared with the multi-via topology, fewer buffers are connected to the sink-TSVs of the multigroup topology than to the TSVs of the multi-via topology. Therefore, both the skew variation within a single plane and the skew variation between data-related planes are reduced.
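The grouping rule of the multigroup topology can be captured by a simple plane-to-group map: two planes are clocked by the same multi-via sub-H-tree exactly when they belong to the same group, while different groups are reached through the root-TSV. The Python sketch below is a hypothetical illustration, not the authors' code; it additionally assumes that each group occupies consecutive planes, in line with the adjacent-plane placement discussed above.

```python
from typing import List, Sequence


def plane_groups(group_sizes: Sequence[int]) -> List[int]:
    """Map physical plane k (0-based) to its data-related group (0-based).

    group_sizes[i] is h_i; groups are assumed to occupy consecutive planes,
    so N_p = sum(group_sizes) and G = len(group_sizes).
    """
    mapping: List[int] = []
    for g, h in enumerate(group_sizes):
        if h < 1:
            raise ValueError("every group needs at least one plane")
        mapping.extend([g] * h)
    return mapping


def share_multivia_subtree(p: int, q: int, mapping: Sequence[int]) -> bool:
    """True if planes p and q are clocked by the same multi-via sub-H-tree."""
    return mapping[p] == mapping[q]


fig11 = plane_groups([3, 2])      # G = 2, h1 = 3, h2 = 2 (Figure 11)
hybrid_4 = plane_groups([2] * 4)  # G = 4, hi = 2 (hybrid 4 in Example 3)
print(fig11, share_multivia_subtree(0, 2, fig11), share_multivia_subtree(2, 3, fig11))
```

For the topology of Figure 11 this prints `[0, 0, 0, 1, 1] True False`: the first and third planes share a sub-H-tree, whereas the third and fourth planes belong to different groups.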
Example 3. A 3D circuit with eight planes is simulated for the three topologies to
investigate the efficiency of the multigroup 3D clock tree. The physical and electrical
characteristics of the circuit are reported in Section 5.1. Two variants of the multigroup
topology are simulated, including two groups (hybrid 2, G = 2, hi = 4) and four groups
(hybrid 4, G = 4, hi = 2) of data-related planes, respectively. Simulation results are
shown in Figure 12.
In Figure 12, skews s1,2, s1,3 (defined in Figure 1(b)), s1,6, and s1,7 (defined in Figure 11)
are depicted showing the skew variation between the nearest and the farthest sinks.
The results based on independent and multilevel correlated WID variations are denoted by (I) and (II), respectively. The $\sigma_{s_{1,2}}$ and $\sigma_{s_{1,3}}$ produced by the multigroup topology are lower than those of the multi-via topology and decrease as the number of sub-H-trees increases. For the topology with four sub-H-trees (hybrid 4), s1,2(I), s1,3(I), s1,2(II), and s1,3(II) are reduced by 55%, 23%, 44%, and 10%, respectively, as compared with the multi-via topology.

Fig. 12. σ of skew for three 3D clock tree topologies. (a) Intra-plane skews s1,2 and s1,3. (b) Inter-plane skews s1,6 and s1,7 within a group of data-related planes. [Figure: bar charts of σ of skew [ps] for the single via, mul via, hybrid_2, and hybrid_4 topologies; (a) series s(1,2) (I), s(1,3) (I), s(1,2) (II), s(1,3) (II); (b) series s(1,6) (I), s(1,7) (I), s(1,6) (II), s(1,7) (II).]

Table VI. σs1,7 and Computational Time of Three 3D Clock Tree Topologies

| | Topology | single-via | multi-via | hybrid 2 | hybrid 4 |
|---|---|---|---|---|---|
| σs1,7 | Model [ps] | 40.6 | 15.9 | 13.0 | 11.9 |
| | Spectre [ps] | 41.6 | 16.8 | 13.7 | 12.3 |
| | Error [%] | -2 | -5 | -5 | -3 |
| CPU time | Model [min.] | 29 | 28 | 25 | 27 |
| | Spectre [h.] | 500 | 173 | 221 | 265 |
| | Spectre/Model | 1034 | 364 | 535 | 582 |
Although the $\sigma_{s_{1,3}}$ within the same plane of the multigroup topology is still greater (by 4% for hybrid 4) than that of the single-via topology, the inter-plane skews $\sigma_{s_{1,6}}$ and $\sigma_{s_{1,7}}$ within a group of data-related planes of the multigroup topology are significantly reduced, as shown in Figure 12(b). This reduction is also greater than that achieved by the multi-via topology. The number of sub-H-trees within a multigroup 3D topology is determined by the distribution of the data-related sinks.
Guideline 3. When the data-related sinks are located in adjacent planes of a 3D
circuit, the multigroup 3D clock tree topology is more efficient in reducing the skew
variation than both the single- and multi-via topologies.
Furthermore, as illustrated in Figure 12, for the sinks with a short horizontal dis-
tance in a multigroup topology, the systematic WID variations (denoted by (II)) in-
troduce lower skew variation than the random WID variations (denoted by (I)), for
example, s1,2 and s1,6. For the sinks with a large horizontal distance (e.g., s1,3 and s1,7),
the skew variation produced by the systematic WID variations is higher.
The results illustrated in Figure 12 are compared with Monte Carlo simulations.
The setup of the Monte Carlo simulation environment is described in Section 5.1. The σ of s1,7 within a group of data-related planes is reported for the independent WID variations in Table VI. As reported in this table, the preceding analysis of the multigroup 3D H-trees is consistent with the results of Monte Carlo simulations. The error of the skew variation model is within 5% as compared with the Monte Carlo simulations.
The computational time for the different topologies is also listed in Table VI. Since this runtime is for the entire 3D clock trees, the computational time is significantly higher than that reported in Table III for both the proposed model and the Monte Carlo simulations. As the complexity of the 3D clock tree increases, the time savings achieved by the proposed model increase significantly, up to about 1000×. Consequently, the efficiency of the variability-aware design of 3D clock distribution networks can be considerably improved by estimating the skew variation with the proposed model.
Fig. 13. An example of combining clock trees and grids, where (a) is the topology of a tree-grid structure [Restle et al. 2002] and (b) is the investigated global grid. [Figure: (a) labels PLL out, Clock Distribution, and Global Clock Grid; (b) grid with nodes labeled 1 to 4.]
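Table VII. Monte Carlo Results of Different Clock Distribution Networks

| # of planes | μmax [ps] grid | μmax [ps] multi | μmax [ps] single | σmax [ps] grid | σmax [ps] multi | σmax [ps] single | Power@1 GHz [mW] grid | Power@1 GHz [mW] multi | Power@1 GHz [mW] single |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 9.77 | 0.00 | 0.00 | 12.00 | 21.53 | 21.53 | 189.60 | 125.90 | 125.90 |
| 2 | 8.83 | 0.04 | 0.04 | 11.25 | 18.30 | 76.10 | 105.10 | 72.01 | 110.00 |
| 4 | 4.15 | 0.15 | 0.18 | 6.15 | 16.22 | 56.65 | 67.86 | 51.70 | 103.10 |
| 8 | 3.10 | 0.34 | 0.38 | 6.39 | 16.82 | 41.58 | 47.68 | 39.84 | 89.71 |

For one plane, the multi-via and single-via trees coincide with the same 2D H-tree, so a single tree value applies to both columns.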
7. SKEW VARIATION IN CLOCK GRIDS
To compare the skew variation in 3D clock trees with other clock distribution topologies,
the skew of clock grids is discussed in this section. A typical hybrid structure of clock
trees and grids is simulated and compared with the 3D clock trees in terms of process-
induced clock skew.
A pregrid clock distribution network is required to drive a grid structure. A combi-
nation of clock trees and grids (tree-grid structure) can meet this requirement [Restle
et al. 2002], as illustrated in Figure 13(a). The sinks of the clock tree are connected to
a global clock grid. Buffers are inserted into the clock tree to meet the constraint on
the slew rate.
A clock tree-grid structure with 256 sinks is compared with the previous 3D clock
trees in terms of skew variations. The 2D global clock grid has 256 nodes and the
area of each cell is 0.58 mm × 0.58 mm, as illustrated in Figure 13(b). When the grid is embedded in a 3D IC, the area of the grid is shrunk proportionally to the area of each plane. The grid is located in the first plane, and the clock signal is propagated to other planes through TSVs at each node, similar to the multi-via 3D tree. The electrical and physical characteristics of the circuit are reported in Section 5.1. The
Monte Carlo simulation results of the tree-grid are reported in Table VII, where the
independent WID variations are considered. The results correspond to the circuits
with one, two, four, and eight planes, respectively. The maximum mean skew and
the standard deviation within each clock distribution network are denoted by μmax
and σmax, respectively. The average power for each clock distribution network for the
nominal device parameters is also reported at the clock frequency of 1 GHz.
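To make the scaling of the embedded grid concrete, the sketch below computes the grid cell side length for one to eight planes. It assumes (this is not stated numerically in the text) that each plane of an N-plane 3D IC has 1/N of the 2D die area, so the grid area also scales by 1/N and the cell side scales by 1/√N. The function name `cell_pitch_mm` is hypothetical.

```python
import math

# Cell size of the 2D global clock grid in this section: 0.58 mm x 0.58 mm.
CELL_2D_MM = 0.58


def cell_pitch_mm(num_planes: int, cell_2d_mm: float = CELL_2D_MM) -> float:
    """Grid cell side length when the grid area shrinks with the plane area.

    Assumption: each of the N planes has 1/N of the 2D die area, so the
    grid area scales by 1/N and the cell side length by 1/sqrt(N).
    """
    if num_planes < 1:
        raise ValueError("num_planes must be >= 1")
    return cell_2d_mm / math.sqrt(num_planes)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} plane(s): cell side = {cell_pitch_mm(n):.3f} mm")
```

Under this assumption the cell side halves when going from one to four planes, which shortens the grid wires and is consistent with the lower grid power and mean skew for more planes in Table VII.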
As shown in Table VII, the tree-grid structure produces the lowest skew variation
(σmax) of all the topologies. Nevertheless, the tree-grid also produces the largest
mean skew (μmax), which is significantly higher than that of the 3D H-trees. This is
due to the considerable delay and capacitance of the wires in the global clock grid.
Furthermore, the tree-grid consumes more power than the multi-via 3D H-trees for every
number of planes and is the most power-hungry option in the 2D case. Extending the grid
to multiple planes, however, reduces the power
consumption and mean skew. In conclusion, clock grids reduce the skew variation
compared with clock trees but increase the mean skew and, compared with the multi-via
3D H-trees, the power consumption. The multi-via 3D H-trees significantly reduce the
power consumption and maintain a sufficiently low mean skew, with a skew variation
considerably lower than that of the single-via 3D H-trees. For two or more planes, the
single-via 3D H-trees produce the highest power consumption and skew variation due to
the large number of buffers, as reported in Table IV.
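The comparisons drawn from Table VII can be checked mechanically. The sketch below re-enters the table values (the dictionary layout and the helper names `best` and `summarize` are illustrative) and reports, for each number of planes, which topology has the lowest σmax, the highest μmax, and the highest power, together with the σmax and power ratios of the grid against the multi-via tree. For one plane, the single reported tree value is reused for both the multi- and single-via columns.

```python
# Table VII values, keyed by the number of planes. Each entry holds
# (mu_max [ps], sigma_max [ps], power @ 1 GHz [mW]) as (grid, multi, single).
# For one plane the table reports one tree value; it is reused for both
# "multi" and "single" (both reduce to the same 2D H-tree).
TOPOLOGIES = ("grid", "multi", "single")
TABLE_VII = {
    1: ((9.77, 0.00, 0.00), (12.00, 21.53, 21.53), (189.60, 125.90, 125.90)),
    2: ((8.83, 0.04, 0.04), (11.25, 18.30, 76.10), (105.10, 72.01, 110.00)),
    4: ((4.15, 0.15, 0.18), (6.15, 16.22, 56.65), (67.86, 51.70, 103.10)),
    8: ((3.10, 0.34, 0.38), (6.39, 16.82, 41.58), (47.68, 39.84, 89.71)),
}


def best(values, lowest=True):
    """Return the topology name with the lowest (or highest) value."""
    pick = min if lowest else max
    return TOPOLOGIES[pick(range(len(values)), key=lambda k: values[k])]


def summarize(planes):
    """Summarize the grid-versus-tree tradeoff for one plane count."""
    mu, sigma, power = TABLE_VII[planes]
    return {
        "lowest_sigma": best(sigma),
        "highest_mu": best(mu, lowest=False),
        "highest_power": best(power, lowest=False),
        # Tradeoff of the grid against the multi-via tree.
        "sigma_grid_over_multi": sigma[0] / sigma[1],
        "power_grid_over_multi": power[0] / power[1],
    }


if __name__ == "__main__":
    for n in TABLE_VII:
        print(n, summarize(n))
```

Running it confirms that the grid has the lowest σmax and the highest μmax for every plane count, that it always consumes more power than the multi-via tree, and that the single-via tree is the most power-hungry option once two or more planes are used.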
8. CONCLUSIONS
The effect of process variations of devices on the clock skew of 3D ICs is analyzed.
An accurate method to estimate the skew variation within 3D clock trees considering
spatial intra-die correlations is presented. By applying this method, the skew variation
for single- and multi-via 3D clock tree topologies is investigated. The process
variations of both transistors and wires are included in this model.
Results based on a 45-nm CMOS technology demonstrate that the appropriate clock tree
topology should be chosen according to the distribution of data-related sinks. When
data-related sinks are mostly located in the same plane, a single-via 3D H-tree is more
effective in reducing the skew variation, and its skew variation decreases as the
number of planes increases. If the data-related sinks are widely distributed in different
planes, multi-via 3D H-trees produce considerably smaller skew variation. Based on
the features of these topologies, a multigroup 3D clock tree topology is proposed. When
the data-related sinks are distributed in adjacent planes, the skew variation is further
reduced by the proposed multigroup 3D clock tree.
The clock trees are also compared with a typical clock grid. The simulation results
show that a global clock grid can decrease the maximum skew variations but signifi-
cantly increase the power and the mean skew. Considering that stricter power budgets
can apply in 3D systems due to the accentuated thermal issues, carefully designed
3D H-trees can be a preferred solution. Following the proposed design guidelines
results in 3D H-trees with reduced skew variation and low power dissipation.
The proposed multigroup 3D clock tree exhibits superior performance, combining the
advantages of both the multi- and single-via topologies.
APPENDIX: Extension of the Proposed Model to Include Variations of Wires
The proposed model can be extended to include the variations of horizontal wires, as
presented in this appendix. Consider the 3D clock tree shown in Figure 3, where the
delay variation of a buffer stage, $d_{\text{stage}}(i)$, also includes the variations of
the wire capacitance $C_{\text{int}}$ and the wire resistance $R_{\text{int}}$:
dstage(i) = di + 0.69(R′b(i) + Rb(i))Cint + 0.38(R′intCint + RintC ′int + RintCint)
+ 0.69(R′intCb(i+1) + RintC ′b(i+1) + RintCb(i+1)) (38)
According to the definition of di in (12), the term 0.69R′intCb(i+1) is included in di+1.
Consequently, dstage(i) is rewritten as
dstage(i) = di + 0.69(R′b(i) + Rb(i))Cint + 0.38(R′intCint + RintC ′int + RintCint)
+ 0.69(RintC ′b(i+1) + RintCb(i+1)) = di + dint(i), (39)
where the delay variation due to the wires is denoted by dint(i).
As discussed in Chang and Sapatnekar [2005], since the variations of the metal wire
characteristics are small compared with their nominal values, the variation of the
wire delay can be approximated by a first-order Taylor expansion without significant
Table VIII. Parameters of Horizontal Interconnects
Parameter | Wm [nm] | tm [nm] | tILD [nm]
Nominal | 430 | 1000 | 160
3σ (D2D) | 43 | 50 | 12
3σ (WID) | 21.5 | 25 | 6
Table IX. Skew Variation of the 3D Circuits Considering Wire Variations
Topology | multi-via: σs1,2 / σs1,3 / σs1,4 / σs1,5 | single-via: σs1,2 / σs1,3 / σs1,4 / σs1,5
Model [ps] | 7.01 / 15.09 / 7.45 / 15.3 | 3.99 / 13.94 / 56.46 / 56.46
Spectre [ps] | 7.19 / 16.44 / 7.64 / 16.55 | 4.00 / 13.77 / 56.38 / 56.3
Error [%] | -3 / -8 / -2 / -8 | 0 / 1 / 0 / 0
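loss of accuracy. Analogously to (13), $d_{\text{int}}(i)$ can be approximated as
$$
d_{\text{int}}(i) \approx \sum_{p_j \in P} \left( \left[ \frac{\partial d_{\text{int}}(i)}{\partial R_{\text{int}}} \frac{\partial R_{\text{int}}}{\partial p_j} \right]_0 p_j + \left[ \frac{\partial d_{\text{int}}(i)}{\partial C_{\text{int}}} \frac{\partial C_{\text{int}}}{\partial p_j} \right]_0 p_j \right), \tag{40}
$$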
where $p_j$ is the $j$th wire parameter and $P$ is the vector of wire parameters affected
by process variations. For example, when the variations of the metal width, the metal
thickness, and the ILD thickness are considered [Chang and Sapatnekar 2005; Bowman
et al. 2009], $P = (W_m, t_m, t_{\text{ILD}})$. Assuming these parameters follow mutually
independent Gaussian distributions [Chang and Sapatnekar 2005], the distribution of
$d_{\text{int}}(i)$, which by (40) is approximately a linear combination of these parameters,
can also be approximated by a Gaussian distribution:
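$$
d_{\text{int}}(i) \sim \mathcal{N}\big(0,\ \sigma^2_{d_{\text{int}}(i)}\big), \tag{41}
$$
$$
\sigma^2_{d_{\text{int}}(i)} = \sum_{p_j \in P} \left( \left[ \frac{\partial d_{\text{int}}(i)}{\partial R_{\text{int}}} \frac{\partial R_{\text{int}}}{\partial p_j} \right]_0 + \left[ \frac{\partial d_{\text{int}}(i)}{\partial C_{\text{int}}} \frac{\partial C_{\text{int}}}{\partial p_j} \right]_0 \right)^2 \sigma^2_{p_j}, \tag{42}
$$
$$
R_{\text{int}} = \frac{\rho\, l}{t_m W_m}, \tag{43}
$$
$$
C_{\text{int}} = 2\,(C_g + C_c)\, l, \tag{44}
$$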
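where $\rho$ is the metal resistivity, $l$ is the wire length, $C_g$ is the ground and fringe
capacitance per unit length, and $C_c$ is the coupling capacitance per unit length. The
expressions for $C_g$ and $C_c$ are taken from Wong et al. [2000] and NIMO ASU [2008].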
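The first-order propagation in (40) and (42) can be sketched numerically. This is a minimal sketch under loudly stated assumptions: the nominal and WID 3σ values come from Table VIII, but the resistivity, permittivity, wire spacing, wire length, buffer resistance, and load capacitance are hypothetical, and the parallel-plate expressions for $C_g$ and $C_c$ are a toy stand-in for the Wong et al. [2000] expressions. The delay used is the wire-dependent part of (38), $0.69R_{b(i)}C_{\text{int}} + 0.38R_{\text{int}}C_{\text{int}} + 0.69R_{\text{int}}C_{b(i+1)}$, whose first-order variation plays the role of $d_{\text{int}}(i)$. The sketch evaluates (42) with central-difference derivatives of (43) and (44) and compares the result with a direct Monte Carlo estimate.

```python
import math
import random

# Wire parameters in nm: nominal values and WID 3-sigma values from Table VIII.
NOMINAL = {"Wm": 430.0, "tm": 1000.0, "tILD": 160.0}
SIGMA = {k: v / 3.0 for k, v in {"Wm": 21.5, "tm": 25.0, "tILD": 6.0}.items()}

# Hypothetical electrical constants (NOT taken from the paper).
RHO = 2.2e-8              # metal resistivity [ohm*m]
EPS = 3.9 * 8.854e-12     # dielectric permittivity [F/m]
SPACING = 430e-9          # wire-to-wire spacing [m]
LENGTH = 1e-3             # wire length l [m]
R_B = 100.0               # driving buffer output resistance R_b(i) [ohm]
C_B_NEXT = 20e-15         # next buffer input capacitance C_b(i+1) [F]
NM = 1e-9                 # nm -> m


def r_int(p):
    """Eq. (43): R_int = rho * l / (t_m * W_m)."""
    return RHO * LENGTH / (p["tm"] * NM * p["Wm"] * NM)


def c_int(p):
    """Eq. (44) with a TOY parallel-plate model for C_g and C_c (per unit length)."""
    c_g = EPS * p["Wm"] / p["tILD"]       # ground term [F/m]
    c_c = EPS * p["tm"] * NM / SPACING    # coupling term [F/m]
    return 2.0 * (c_g + c_c) * LENGTH


def wire_delay(p):
    """Wire-dependent Elmore-type terms of the stage delay in (38)."""
    r, c = r_int(p), c_int(p)
    return 0.69 * R_B * c + 0.38 * r * c + 0.69 * r * C_B_NEXT


def central_diff(f, p, name, rel_step=1e-4):
    """Central-difference derivative of f with respect to parameter `name`."""
    h = rel_step * p[name]
    up, dn = dict(p), dict(p)
    up[name] += h
    dn[name] -= h
    return (f(up) - f(dn)) / (2.0 * h)


def sigma_linear(p0=NOMINAL, sigma=SIGMA):
    """Eq. (42): first-order standard deviation of the wire delay [s]."""
    r0, c0 = r_int(p0), c_int(p0)
    dd_dr = 0.38 * c0 + 0.69 * C_B_NEXT   # d(wire_delay)/dR_int at nominal
    dd_dc = 0.69 * R_B + 0.38 * r0        # d(wire_delay)/dC_int at nominal
    var = 0.0
    for name in p0:
        sens = dd_dr * central_diff(r_int, p0, name) + dd_dc * central_diff(c_int, p0, name)
        var += (sens * sigma[name]) ** 2
    return math.sqrt(var)


def sigma_monte_carlo(n=20000, seed=1, p0=NOMINAL, sigma=SIGMA):
    """Reference: sample independent Gaussian parameters, return the sample std. dev. [s]."""
    rng = random.Random(seed)
    samples = [wire_delay({k: rng.gauss(p0[k], sigma[k]) for k in p0}) for _ in range(n)]
    mean = sum(samples) / n
    return math.sqrt(sum((s - mean) ** 2 for s in samples) / (n - 1))


if __name__ == "__main__":
    lin, mc = sigma_linear(), sigma_monte_carlo()
    print(f"first-order sigma: {lin * 1e12:.3f} ps, Monte Carlo: {mc * 1e12:.3f} ps")
```

Because the wire parameters vary by only a few percent around their nominal values, the two estimates agree closely, which is the property that justifies the first-order expansion in (40).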
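Considering the delay variation caused by both the clock buffers and the wires in (39),
the skew variation $s_{u,v}$ consists of two terms, $s_{u,v} = s_b(u,v) + s_{\text{int}}(u,v)$. The
distribution of $s_b(u,v)$ is obtained through (12) to (35), and the distribution of
$s_{\text{int}}(u,v)$ is obtained through (17) to (35) by substituting $d_{\text{int}}(i)$ for $d_i$.
Since the wire parameters are treated as independent of the device parameters, the two
terms are independent Gaussian variables whose variances add, so $s_{u,v}$ is also Gaussian:
$$
s_{u,v} \sim \mathcal{N}\big(0,\ \sigma^2_{s_b(u,v)} + \sigma^2_{s_{\text{int}}(u,v)}\big) \tag{45}
$$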
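Equation (45) relies on the buffer and wire components being independent, so that their variances add. The short sketch below illustrates this with hypothetical component standard deviations (not taken from the paper): it evaluates the closed form of (45) and checks it against samples drawn independently for the two terms. The function names are illustrative.

```python
import math
import random


def combined_skew_sigma(sigma_b: float, sigma_int: float) -> float:
    """Std. dev. of s = s_b + s_int for independent zero-mean Gaussians, as in (45)."""
    return math.sqrt(sigma_b ** 2 + sigma_int ** 2)


def sampled_skew_sigma(sigma_b: float, sigma_int: float, n: int = 50_000, seed: int = 7) -> float:
    """Monte Carlo check: draw the two terms independently and measure the spread."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        s = rng.gauss(0.0, sigma_b) + rng.gauss(0.0, sigma_int)
        total += s * s  # both terms have zero mean, so E[s^2] is the variance
    return math.sqrt(total / n)


if __name__ == "__main__":
    # Hypothetical component values in ps, for illustration only.
    sb, si = 15.0, 4.0
    print(f"(45): {combined_skew_sigma(sb, si):.2f} ps, sampled: {sampled_skew_sigma(sb, si):.2f} ps")
```

Note that a wire term much smaller than the buffer term adds little to the total: with the values above, a 4 ps wire component raises a 15 ps buffer component to only about 15.5 ps.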
The extended model is compared with Monte Carlo simulations that include the variations
of $r_{\text{int}}$ and $c_{\text{int}}$ in the π model of the interconnects. Based on the parameters used
in Chang and Sapatnekar [2005], the nominal values and 3σ variations of the wire
parameters are listed in Table VIII. The multi-via and single-via trees simulated
in Section 5.1 are used to verify the accuracy of the extended model, and the results for
independent WID variations are reported in Table IX. The model remains accurate when
wire variations are included: its error with respect to Spectre is at most 8%.
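The Error row of Table IX appears consistent with the relative error of the model with respect to Spectre, (Model − Spectre)/Spectre × 100, rounded to an integer; this convention is inferred from the numbers rather than stated in the text. The sketch below recomputes the row under that assumption (the names `TABLE_IX` and `error_row` are illustrative).

```python
# Table IX: model vs. Spectre skew standard deviations [ps], independent WID variations.
LABELS = ("s1,2", "s1,3", "s1,4", "s1,5")
TABLE_IX = {
    "multi-via": {"model": (7.01, 15.09, 7.45, 15.3), "spectre": (7.19, 16.44, 7.64, 16.55)},
    "single-via": {"model": (3.99, 13.94, 56.46, 56.46), "spectre": (4.00, 13.77, 56.38, 56.3)},
}


def relative_error_pct(model: float, reference: float) -> float:
    """Relative error of the model w.r.t. the Spectre reference, in percent (assumed convention)."""
    return (model - reference) / reference * 100.0


def error_row(topology: str) -> list:
    """Rounded percentage errors for one topology, in the column order of LABELS."""
    row = TABLE_IX[topology]
    return [round(relative_error_pct(m, r)) for m, r in zip(row["model"], row["spectre"])]


if __name__ == "__main__":
    for topo in TABLE_IX:
        print(topo, dict(zip(LABELS, error_row(topo))))
```

The recomputed values reproduce the reported Error row for both topologies, and the largest magnitude is 8%.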
REFERENCES
AGARWAL, A., BLAAUW, D., AND ZOLOTOV, V. 2003. Statistical timing analysis for intra-die process variations
with spatial correlations. In Proceedings of the IEEE/ACM International Conference on Computer-Aided
Design. 900–907.
AGARWAL, A., ZOLOTOV, V., AND BLAAUW, D. T. 2004. Statistical clock skew analysis considering intradie-process
variations. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 23, 8, 1231–1242.
AKOPYAN, F., OTERO, C., FANG, D., JACKSON, S. J., AND MANOHAR, R. 2008. Variability in 3-D integrated circuits.
In Proceedings of the IEEE Custom Integrated Circuits Conference. 659–662.
ARUNACHALAM, V. AND BURLESON, W. 2008. Low-Power clock distribution in a multilayer core 3D microprocessor.
In Proceedings of the Great Lakes Symposium on VLSI. 429–434.
AZUMA, A., OISHI, A., OKAYAMA, Y., KASAI, K., AND TOYOSHIMA, Y. 1998. Methodology of MOSFET characteristics
fluctuation description using BSIM3v3 SPICE model for statistical circuit simulations. In Proceedings of
the International Workshop on Statistical Metrology. 14–17.
BAKOGLU, H. B., WALKER, T. J., AND MEINDL, J. D. 1986. A symmetric clock-distribution tree and optimized
high-speed interconnections for reduced clock skew in ULSI and WSI circuits. In Proceedings of the IEEE
International Conference on Computer Design. 118–122.
BOWMAN, K. A., ALAMELDEEN, A. R., SRINIVASAN, S. T., AND WILKERSON, C. B. 2009. Impact of die-to-die and
within-die parameter variations on the clock frequency and throughput of multi-core processors. IEEE
Trans. VLSI Syst. 17, 12, 1679–1690.
BOWMAN, K. A., DUVALL, S. G., AND MEINDL, J. D. 2002. Impact of die-to-die and within-die parameter fluctuations
on the maximum clock frequency distribution for gigascale integration. IEEE J. Solid-State Circ. 37, 2,
183–190.
CADENCE DESIGN SYSTEMS. 2008. Virtuoso Spectre Circuit Simulator User Guide 7.0.1 Ed. Cadence Design
Systems, Inc.
CHANG, H. AND SAPATNEKAR, S. 2005. Statistical timing analysis under spatial correlations. IEEE Trans.
Comput.-Aid. Des. Integr. Circ. Syst. 24, 9, 1467–1482.
CONG, J., KAHNG, A. B., KOH, C.-K., AND TSAO, C.-W. A. 1998. Bounded-Skew clock and Steiner routing. ACM
Trans. Des. Autom. Electron. Syst. 3, 3, 341–388.
DEVGAN, A. AND KASHYAP, C. 2003. Block-Based static timing analysis with uncertainty. In Proceedings of the
IEEE/ACM International Conference on Computer-Aided Design. 607–614.
ELMORE, W. 1948. The transient response of damped linear networks with particular regard to wide-band
amplifiers. J. Appl. Phys. 19, 1, 55–63.
FRIEDMAN, E. 2001. Clock distribution networks in synchronous digital integrated circuits. Proc. IEEE 89, 5,
665–692.
GARG, S. AND MARCULESCU, D. 2009a. 3D-GCP: An analytical model for the impact of process variations on the
critical path delay distribution of 3D ICs. In Proceedings of the International Symposium on Quality of
Electronic Design. 147–155.
GARG, S. AND MARCULESCU, D. 2009b. System-Level process variability analysis and mitigation for 3D MPSoCs.
In Proceedings of the Design, Automation and Test in Europe Conference. 604–609.
HARRIS, D. AND NAFFZIGER, S. 2001. Statistical clock skew modeling with data delay variations. IEEE Trans.
VLSI Syst. 9, 6, 888–898.
HASHIMOTO, M., YAMAMOTO, T., AND ONODERA, H. 2005. Statistical analysis of clock skew variation in H-tree
structure. In Proceedings of the International Symposium on Quality of Electronic Design. Vol. 88. 402–
407.
ITRS. 2009. International technology roadmap for semiconductors. http://www.itrs.net/
JIANG, X. AND HORIGUCHI, S. 2001. Statistical skew modeling for general clock distribution networks in presence
of process variations. IEEE Trans. VLSI Syst. 9, 5, 704–717.
JOYNER, J. W., VENKATESAN, R., ZARKESH-HA, P., DAVIS, J. A., AND MEINDL, J. D. 2001. Impact of three-dimensional
architectures on interconnects in gigascale integration. IEEE Trans. VLSI Syst. 9, 6, 922–928.
JOYNER, J. W., ZARKESH-HA, P., AND MEINDL, J. D. 2004. Global interconnect design in a three-dimensional
system-on-a-chip. IEEE Trans. VLSI Syst. 12, 4, 367–372.
KATTI, G., STUCCHI, M., DE MEYER, K., AND DEHAENE, W. 2010. Electrical modeling and characterization of
through silicon via for three-dimensional ICs. IEEE Trans. Electron. Dev. 57, 1, 256–262.
KIM, T.-Y. AND KIM, T. 2010. Clock tree embedding for 3D ICs. In Proceedings of the Asia and South Pacific
Design Automation Conference. 486–491.
LIOU, J., CHENG, K.-T., KUNDU, S., AND KRSTIĆ, A. 2001. Fast statistical timing analysis by probabilistic event
propagation. In Proceedings of the IEEE/ACM Design Automation Conference. 661–666.
MALAVASI, E., ZANELLA, S., CAO, M., USCHERSOHN, J., MISHELOFF, M., AND GUARDIANI, C. 2002. Impact analysis of
process variability on clock skew. In Proceedings of the International Symposium on Quality Electronic
Design. 129–132.
MONDAL, M., RICKETTS, A. J., KIROLOS, S., RAGHEB, T., LINK, G., VIJAYKRISHNAN, N., AND MASSOUD, Y. 2007.
Thermally robust clocking schemes for 3D integrated circuits. In Proceedings of the Design, Automation
and Test in Europe Conference. 1206–1211.
NASSIF, S. 2000. Delay variability: Sources, impacts and trends. In Proceedings of the IEEE International
Solid-State Circuits Conference. 368–369.
NEWMAN, M., MUTHUKUMAR, S., SCHUELEIN, M., DAMBRAUSKAS, T., DUNAWAY, P. A., JORDAN, J. M., KULKARNI, S.,
LINDE, C. D., OPHEIM, T. A., STEINGEL, R. A., WORWAG, W., TOPIC, L. A., AND SWAN, J. M. 2006. Fabrication and
electrical characterization of 3D vertical interconnects. In Proceedings of Electronic Components and
Technology Conference. 394–398.
NIMO ASU. 2008. ASU predictive technology model. http://ptm.asu.edu/
ORSHANSKY, M., MILOR, L., CHEN, P., KEUTZER, K., AND HU, C. 2000. Impact of systematic spatial intra-chip
gate length variability on performance of high-speed digital circuits. In Proceedings of the IEEE/ACM
International Conference on Computer-Aided Design. 62–67.
PAVLIDIS, V. AND FRIEDMAN, E. 2009a. Three-Dimensional Integrated Circuit Design. Morgan Kaufmann.
PAVLIDIS, V. AND FRIEDMAN, E. 2009b. Interconnect-Based design methodologies for three-dimensional inte-
grated circuits. Proc. IEEE 97, 1, 123–140.
PAVLIDIS, V., SAVIDIS, I., AND FRIEDMAN, E. 2008. Clock distribution networks for 3-D integrated circuits. In
Proceedings of the IEEE Custom Integrated Circuits Conference. 651–654.
REDA, S., SI, A., AND BAHAR, R. 2009. Reducing the leakage and timing variability of 2D ICs using 3D ICs. In
Proceedings of the IEEE/ACM International Symposium on Low Power Electronics and Design. 283–286.
RESTLE, P., CARTER, C. A., ECKHARDT, J. P., KRAUTER, B. L., MCCREDIE, B. D., JENKINS, K. A., WEGER, A. J.,
AND MULE, A. V. 2002. The clock distribution of the POWER4 microprocessor. In Proceedings of the IEEE
International Solid-State Circuits Conference. Vol. 88, 144–145.
SAVIDIS, I. AND FRIEDMAN, E. 2008. Electrical modeling and characterization of 3-D vias. In Proceedings of the
IEEE International Symposium on Circuits and Systems. 784–787.
SUNDARESWARAN, S., NECHANICKA, L., PANDA, R., GAVRILOV, S., SOLOVYEV, R., AND ABRAHAM, J. A. 2008. A timing
methodology considering within-die clock skew variations. In Proceedings of the IEEE International
SOC Conference. 351–356.
TÉLLEZ, G. E. AND SARRAFZADEH, M. 1997. Minimal buffer insertion in clock trees with skew and slew rate
constraints. IEEE Trans. Comput.-Aid. Des. Integr. Circ. Syst. 16, 4, 333–342.
WONG, S.-C., LEE, G.-Y., AND MA, D.-J. 2000. Modeling of interconnect capacitance, delay, and crosstalk in
VLSI. IEEE Trans. Semicond. Manufact. 13, 1, 108–111.
XU, H., PAVLIDIS, V., AND DE MICHELI, G. 2010. Process-Induced skew variation for scaled 2-D and 3-D ICs. In
Proceedings of the IEEE/ACM System Level Interconnect Prediction Workshop. 17–24.
YANG, J., PAK, J., ZHAO, X., AND LIM, S. 2011. Robust clock tree synthesis with timing yield optimization for
3D-ICs. In Proceedings of the IEEE/ACM Asia South Pacific Design Automation Conference. 621–626.
ZARKESH-HA, P., MULE, T., AND MEINDL, J. D. 1999. Characterization and modeling of clock skew with process
variations. In Proceedings of the IEEE Custom Integrated Circuits Conference. 441–444.
ZHAO, X. AND LIM, S. K. 2010. Power and slew-aware clock network design for through-silicon-via (TSV) based
3D ICs. In Proceedings of the Asia and South Pacific Design Automation Conference. 175–180.
Received February 2011; revised June 2011; accepted September 2011