Skew Variability in 3-D ICs with Multiple Clock Domains
Hu Xu, Vasilis F. Pavlidis, and Giovanni De Micheli
LSI - EPFL, CH-1015, Switzerland
Email: {hu.xu, vasileios.pavlidis, giovanni.demicheli}@epfl.ch
Abstract—The effect of process variations on the clock skew in three
dimensional (3-D) circuits with multiple clock domains is investigated. In
3-D ICs, the combined effect of inter-die and intra-die process variations
should be considered in the design of clock distribution networks. A
statistical clock skew model incorporating spatially correlated intra-die
process variations is employed to describe this effect. The clock skew
is shown to change in different ways with the allocation of the clock
domains within the 3-D circuit. Various schemes to assign the clock
domains are investigated. Different scenarios of inter-die and intra-die
process variations and an intra-die spatial correlation model are applied
to these schemes. An approach where each physical plane corresponds to
a single clock domain is shown to be inferior to other clocking schemes
for specific variation scenarios. Tradeoffs between the number of clock
domains within a physical plane and the number of planes a clock tree
spans are discussed and related design guidelines are offered.
Index Terms—3-D ICs, clock tree, process variations, clock skew,
multiple clock domains.
I. INTRODUCTION
3-D integration emerges as a potent solution to alleviate the
increasing interconnect delay in modern ICs [1]. Considering the
important synchronization issue, the reduced interconnect latency can
be exploited to either relax the clock skew constraints or further
increase the speed of a circuit.
Clock skew is typically defined as the difference between the
propagation delays of the clock signal from the source to the sinks
of the clock distribution network. There is a plethora of methods to
manage the excessive clock skew in the design phase [2]. Careful
physical design, however, does not eliminate the undesirable skew
since unwanted skew can also be introduced in the fabrication
phase. In this phase, the primary sources of process variations
include fluctuations of the gate length, doping concentrations, oxide
thickness, and inter-layer dielectric thickness [3], [4].
The resulting process variations are generally divided into inter-die
and intra-die variations. Inter-die variations affect the characteristics
of devices independently among dice, but the devices within one die
are uniformly affected. Intra-die variations affect the characteristics
of devices unequally within one die. The inclusion of both the intra-
and inter-die process variations is required in the analysis and design
of 3-D clock distribution networks.
The effect of process variations on the performance of 3-D clock
distribution networks is discussed in [5], where a single clock
domain is considered. The focus of this paper is on the effect of
process variations on the clock skew of potential 3-D synchronization
architectures with multiple clock domains. The case studies include
regular clock networks that globally distribute the clock signal in a
3-D stack [6], [7]. The proposed model can also be used to analyze
synthesized 3-D clock trees [8], [9]. The resulting skew variations in
synthesized clock trees also depend on the efficiency of the synthesis
technique. Since the intention is to investigate the effect of process
variations rather than the efficiency of a 3-D clock tree synthesizer,
such as in [8], [9], regular structures, such as H-trees, are explored.
The considered 3-D technologies include those manufacturing processes
where multiple physical planes, bonded by different means,
are electrically connected by through silicon vias (TSVs) [10]. In
such 3-D ICs, a clock tree can span more than one plane, where
each plane is fabricated separately.
This work is funded in part by the Swiss National Science Foundation
(No. 260021 126517/1), European Research Council Grant (No. 246810
NANOSYS), and Intel Braunschweig Labs, Germany.
Simulation results indicate that in 3-D ICs with multiple clock
domains, the resulting skew variation depends on the assignment of
the clock domains and on the relation between the inter-die and intra-
die variations. Moreover, the spatial correlation of intra-die variations
[11]–[13] is shown to be non-negligible when analyzing the process-
induced skew in multiple-domain 3-D clock trees. Consequently, the
objectives of this paper are 1) to determine the behavior of skew
variations in 3-D ICs with multiple clock domains, 2) to include spatially
correlated process variations in statistical skew analysis for 3-D clock
trees, and 3) to provide a set of design guidelines, thereby decreasing
the variability of clock skew within multi-domain 3-D clock H-trees.
The remainder of the paper is organized as follows. A statistical
skew model for 3-D clock trees considering the spatially correlated
intra-die process variations is introduced in the following section. The
investigated multi-domain 3-D clock trees are discussed in Section
III. Simulation results and a comparison of various multi-domain 3-D
H-tree topologies are presented in Section IV. Design guidelines are
also provided. The conclusions are drawn in Section V.
II. SPATIALLY CORRELATED INTRA-DIE VARIATION MODELS IN
3-D CLOCK TREES
The statistical skew model for 3-D ICs presented in [5] is employed
and extended to include the spatial correlation of intra-die process
variations. Since the planes of a 3-D IC are usually fabricated
separately, the inter-die process variations are considered independent
from plane to plane and uniform for the devices within one plane
[14]. For a clock tree spanning N planes, the distribution of the
skew S1,2 between sinks s1 and s2 considering inter-die (die-to-die
(D2D)) variations is described by a Gaussian distribution [5],
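$$f_{\Delta S^{D2D}_{1,2}} = N\left(0, \sigma^2_{S^{D2D}_{1,2}}\right), \quad \sigma^2_{S^{D2D}_{1,2}} = \sum_{j=1}^{N} \sigma^2_{S^{D2D}_{1,2(j)}}, \tag{1}$$
$$\sigma_{S^{D2D}_{1,2(j)}} = \sum_{i=1}^{n_{s_1(j)}} \sigma_{d^{D2D}_{s_1(j,i)}} - \sum_{i=1}^{n_{s_2(j)}} \sigma_{d^{D2D}_{s_2(j,i)}}. \tag{2}$$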
The D2D skew variation in plane $j$ is denoted as $S^{D2D}_{1,2(j)}$. The number
of buffers along the paths to sinks $s_1$ and $s_2$ in plane $j$ is $n_{s_1(j)}$ and
$n_{s_2(j)}$, respectively. The delay of the $i$th buffer in plane $j$ along the
path to sink $s_1$ is denoted as $d_{s_1(j,i)}$.
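A minimal Python sketch of (1) and (2) follows, for illustration only. The function name `d2d_skew_sigma` and the nested-list input layout are assumptions introduced here and are not part of [5]; each inner list holds the D2D delay deviations of the buffers that one path places in a given plane.

```python
import math

def d2d_skew_sigma(sigma_path_s1, sigma_path_s2):
    """Standard deviation of the D2D-induced skew between sinks s1 and s2.

    sigma_path_s1[j] and sigma_path_s2[j] list the D2D delay deviations of
    the buffers in plane j along the paths to s1 and s2 (hypothetical layout).
    """
    variance = 0.0
    for plane_s1, plane_s2 in zip(sigma_path_s1, sigma_path_s2):
        # Expression (2): buffers in one plane shift together, so the
        # per-plane skew deviation is the difference of the two path sums.
        sigma_plane = sum(plane_s1) - sum(plane_s2)
        # Expression (1): planes are independent, so their variances add.
        variance += sigma_plane ** 2
    return math.sqrt(variance)
```

When both paths place buffers with identical deviations in every plane, as for symmetric paths confined to a single plane, each per-plane term cancels and the function returns zero.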
978-1-4244-9472-9/11/$26.00 ©2011 IEEE
Fig. 1. Modeling spatial correlations using quad-tree partitioning [12]. (Regions are labeled (level, index): 0,1 at the top level; 1,1 to 1,4 at the next level; and 2,1 to 2,16 at the finest level.)
Alternatively, the intra-die process variations affect the delay of
buffers within one plane non-uniformly. This effect consists of a
random and a systematic component modeled as a Gaussian distribution
and an analytic spatial correlation function, respectively [11],
[12]. Between planes, the intra-die variations are still considered as
independent. The distribution of the skew with intra-die (within-die
(WID)) variations is, therefore, written as
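$$f_{S^{WID}_{1,2}} = N\left(0, \sigma^2_{S^{WID}_{1,2}}\right), \quad \sigma^2_{S^{WID}_{1,2}} = \sum_{j=1}^{N} \sigma^2_{S^{WID}_{1,2(j)}}, \tag{3}$$
$$\begin{aligned}
\sigma^2_{S^{WID}_{1,2(j)}} = {} & \sum_{i=1}^{n_{s_1(j)}} \sigma^2_{d^{WID}_{s_1(j,i)}} + \sum_{i=1}^{n_{s_2(j)}} \sigma^2_{d^{WID}_{s_2(j,i)}} \\
& + 2 \sum_{i=1,\,h=1,\,i \neq h}^{n_{s_1(j)}} \mathrm{Cov}\left(d_{s_1(j,i)}, d_{s_1(j,h)}\right) + 2 \sum_{i=1,\,h=1,\,i \neq h}^{n_{s_2(j)}} \mathrm{Cov}\left(d_{s_2(j,i)}, d_{s_2(j,h)}\right) \\
& - 2 \sum_{i=1}^{n_{s_1(j)}} \sum_{h=1}^{n_{s_2(j)}} \mathrm{Cov}\left(d_{s_1(j,i)}, d_{s_2(j,h)}\right).
\end{aligned} \tag{4}$$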
The spatial correlation between buffers b1 and b2 within one plane
is described by the covariance, Cov(b1, b2). Both the random WID
variations and the spatial correlations are considered herein.
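The per-plane variance in (4) can be sketched in Python as below. This is an illustrative reading rather than the authors' implementation: the function name, the buffer identifiers, and the `sigma` and `cov` inputs are hypothetical, the two paths are assumed to share no buffers in the plane, and each unordered pair of buffers on the same path is counted once before the factor of 2 is applied (the standard variance-of-a-sum identity, which is an assumption about how the index range in (4) is meant to be read).

```python
from itertools import combinations

def wid_skew_variance_in_plane(path_s1, path_s2, sigma, cov):
    """Per-plane WID skew variance between sinks s1 and s2, after (4).

    path_s1, path_s2: buffer identifiers in this plane on each path,
                      assumed disjoint (hypothetical layout)
    sigma:            dict mapping buffer identifier to WID delay deviation
    cov:              function returning the covariance of two buffer delays
    """
    variance = sum(sigma[b] ** 2 for b in path_s1)
    variance += sum(sigma[b] ** 2 for b in path_s2)
    # Each unordered pair on the same path is counted once, then doubled.
    variance += 2.0 * sum(cov(a, b) for a, b in combinations(path_s1, 2))
    variance += 2.0 * sum(cov(a, b) for a, b in combinations(path_s2, 2))
    # Correlation between the two paths reduces the skew variance.
    variance -= 2.0 * sum(cov(a, b) for a in path_s1 for b in path_s2)
    return variance
```

Passing a `cov` function that always returns zero reduces the result to the sum of the individual buffer variances.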
For the random WID variations, the covariance between different
buffers is always zero, i.e., Cov(b1, b2) = 0 for b1 ≠ b2. Expression (4) can be rewritten as
$$\sigma^2_{S^{WID}_{1,2(j)}} = \sum_{i=1}^{n_{s_1(j)}} \sigma^2_{d^{WID}_{s_1(j,i)}} + \sum_{i=1}^{n_{s_2(j)}} \sigma^2_{d^{WID}_{s_2(j,i)}}. \tag{5}$$
Consequently, $\sigma_{S^{WID}_{1,2(j)}}$ increases as the number of buffers along the
related paths increases.
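As a brief worked illustration of (5), consider the simplifying assumption (not made in the source) that every buffer in plane $j$ has the same random WID delay deviation $\sigma_d$. The expression then collapses to a square-root dependence on the total number of buffers along the two paths:

$$\sigma_{S^{WID}_{1,2(j)}} = \sqrt{\left(n_{s_1(j)} + n_{s_2(j)}\right)\sigma_d^2} = \sigma_d \sqrt{n_{s_1(j)} + n_{s_2(j)}}.$$

Under this assumption, doubling the number of buffers on both paths raises the per-plane skew deviation by a factor of $\sqrt{2}$.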
The spatial correlation model (multi-correlation) is based on the
statistical timing analysis method proposed in [12]. A multi-level
quad-tree partitioning is used and the intra-die variations of a device
are divided into $l$ levels, as illustrated in Fig. 1 [12]. At the $l$th level,
there are $4^{l-1}$ regions.
The intra-die variations of a buffer, for example, b1, are described
by the sum of the variations of all the regions that b1 belongs to,
$$\Delta d_{b_1} = \sum_{i=1}^{l} \Delta d_{i,j}, \tag{6}$$
where $\Delta d_{i,j}$ is the delay variation caused by intra-die variations in
region $(i, j)$ (where b1 is located) at the $i$th level, as illustrated in Fig.
1. The covariance between two buffers b1 and b2 within one plane is
$$\mathrm{Cov}(d_{b_1}, d_{b_2}) = \sum_{i=1}^{l} \mathrm{Cov}(d_{i,j}, d_{i,k}), \tag{7}$$
$$\mathrm{Cov}(d_{i,j}, d_{i,k}) = \begin{cases} \sigma^2_{d_{i,j}}, & \text{if } j = k \\ 0, & \text{if } j \neq k. \end{cases} \tag{8}$$
[Figure 2: a four-plane stack plotted against X [mm] and Y [mm] (0 to 10 mm each) and the plane index (1 to 4), with Clock domains 1 to 4, the clock source, and sinks s1, s2, and s3 marked.]
Fig. 2. A four-plane 3-D IC with four clock domains. A PLL and an H-tree
are used to generate and distribute, respectively, the clock signal within each
domain (plane). The clock sources are located at the center of each plane.
Both the random and multi-correlated skew variation models are
implemented for multi-domain 3-D clock trees. The investigated 3-D
clock distribution networks are presented in the following section.
III. MULTIPLE CLOCK DOMAINS FOR 3-D ICS
In 3-D circuits, the clock trees belonging to different clock domains
can be located in the same or different planes. Various approaches to
distribute multi-domain 3-D clock trees are discussed in this section.
A straightforward idea is to assign each clock domain to a single
plane, as illustrated in Fig. 2. For each clock domain, a PLL is
assumed to generate the clock signal for the corresponding clock
network. In this scenario, excluding any synchronization requirement
between different clock domains, the impact of D2D process varia-
tions expressed by (1) can be eliminated. Only WID variations need
to be considered.
As illustrated in Fig. 2, the sinks of a clock domain are distributed
across the entire plane. Consequently, long interconnects and a large number
of buffers can be required, and each clock tree can be
significantly affected by WID variations. An approach to mitigate this
problem is to decrease the total wire length of the tree by distributing
the clock registers to other planes. In this case, several clock domains
are integrated in one plane, as illustrated in Fig. 3(a). The design of
the 3-D clock H-tree within each domain is based on [5], [7].
In Fig. 3(a), each clock tree spans four planes through TSVs. The
skew variation within each clock domain is affected by the D2D
variations in all the four planes. The topology illustrated in Fig. 3(b)
produced by combining the topologies in Figs. 2 and 3(a) provides
another approach to manage the effect of D2D and WID variations.
A comparison of different D2D and WID variation scenarios for the
investigated 3-D circuits with multiple clock domains is presented in
the following section.
IV. SIMULATION RESULTS AND DISCUSSION
The multi-domain 3-D clock trees discussed in Section III are
analyzed with the extended models of skew variations. Several
combinations of D2D and WID process variations are simulated to
investigate the efficiency of different allocations of the clock domains
within a 3-D stack.
The PTM model for a 90 nm technology node is used [15]. The
characteristics of TSVs are extracted based on [10]. An eight-plane
3-D IC (10 mm × 10 mm per plane), envisioning highly complex
3-D systems, with eight clock domains is simulated. There are 128
clock sinks within each clock domain, i.e., 1024 sinks in total. A clock
buffer is inserted at each sink driving the downstream devices (e.g., a
cluster of flip-flops or a local clock mesh). Clock buffers are inserted
into the clock trees following [16], where the constraint on the slew rate
is 8.8 mV/ps. The electrical characteristics of the clock networks
are listed in Table I. The output resistance and input capacitance
of the buffers, the resistance and capacitance per unit length of the
interconnects, and the resistance and capacitance of the TSVs are
denoted by Rb, Cb, r, c, Rv, and Cv, respectively.
[Figure 3: two panels, (a) and (b), each showing a four-plane stack plotted against X [mm] and Y [mm] (0 to 10 mm each) and the plane index (1 to 4), with Clock domains 1 to 4 and sinks s1, s2, and s3 marked.]
Fig. 3. Different assignments of clock domains in a four-plane 3-D IC. (a)
Four clock domains within each plane. (b) Two clock domains within each
plane (a total of four clock domains).
TABLE I
ELECTRICAL CHARACTERISTICS OF THE INVESTIGATED CIRCUITS.
Buffer          Interconnect          TSV            Clock
Rb = 536 Ω      r = 244.44 Ω/mm       Rv = 0.13 Ω    Vdd = 1.2 V
Cb = 15.7 fF    c = 225.039 fF/mm     Cv = 50 fF     fclk = 1 GHz
Four schemes of multiple clock domains are investigated: (A) one
clock domain per plane (see Fig. 2), (B) two clock domains per
plane, each spanning two planes, (C) four clock domains per plane,
each traversing four planes (similar to Fig. 3(b)), and (D) eight clock
domains, each extending in all of the planes (similar to Fig. 3(a)).
Note that the total number of clock domains remains the same for all
four schemes; the distribution of these domains among and within the
planes, however, changes. The objective is to determine the scheme
with the lowest skew variations within each domain. The pair of sinks located
farthest apart within one domain demonstrates the largest skew variation
Smax [5], e.g., Smax = S1,3 between s1 and s3 in Fig. 2. The smallest
skew variation Smin is S1,2, a typical trait of an H-tree.
The variations of the gate length ($l_{mos}$) of both the NMOS and
PMOS are considered [14]. Other sources of variations can also be
described by the proposed model. The resulting variations in Rb,
Cb, and the intrinsic delay of the buffers are extracted by SPICE
simulations. Three different scenarios for D2D and WID process
variations are investigated: 1) D2D variations are assumed to be
higher than the WID variations (D2D > WID). The $\sigma_{l_{mos}}$ due to D2D
and WID variations is assumed to be $\sigma^{D2D}_{l_{mos}} = 6\%$ and $\sigma^{WID}_{l_{mos}} = 2\%$,
respectively. 2) The WID variations are dominant (D2D < WID),
$\sigma^{D2D}_{l_{mos}} = 2\%$ and $\sigma^{WID}_{l_{mos}} = 6\%$. 3) The D2D and WID variations are
equivalent, $\sigma^{D2D}_{l_{mos}} = \sigma^{WID}_{l_{mos}} = 5\%$.
Fig. 4. Waveform of the signal at s1 with different gate lengths of MOSFET.
A clock tree of Scheme (A) is simulated through SPICE. The
waveform of the clock signal at sink s1 is illustrated in Fig. 4. The
slew rate at the sinks is well constrained by the buffer insertion. The
delay variation due to the process variations, however, is significant.
The delay variation with $l_{mos} - 3\sigma_{l_{mos}}$ and $l_{mos} + 3\sigma_{l_{mos}}$ (variation scenario 1)
is -1.2 ns and 1.1 ns, respectively.
The accuracy of the original statistical model compared with
Monte-Carlo simulations has been demonstrated in [5]. The largest
and smallest skew variations within a clock domain are reported for the
four clock schemes (Schemes A, B, C, and D) and the three variation
scenarios (Scenarios 1, 2, and 3) in Table II. The lowest σ among
the different schemes is reported in bold face. The two models of the
WID variations are compared in the following subsections.
A. Uncorrelated WID Variations
In this case, the WID variations are assumed to be independent
among the devices within one plane. As reported in Table II, Scheme
A produces the highest σSmax for all the three scenarios of variations.
This behavior occurs because the horizontal area (i.e., wire length)
occupied by each tree in Scheme A is the greatest among the four
schemes, requiring the largest number of buffers. As described by (1)
and (3), the skew variations of Scheme A are higher than those of the other
schemes.
For clock Schemes B, C, and D, σSmax varies significantly with
the allocation of 3-D clock trees to the planes. Note that although
reducing the horizontal area of a tree helps to decrease the WID
variations, Scheme D does not produce the smallest σSmax. The reason
is that Scheme D introduces a larger number of buffers connected to
a TSV in different planes. The effect of D2D variations, therefore,
increases. As reported in Table II, Scheme C produces the smallest
σSmax in all the three variation scenarios.
As the number of planes that a clock tree spans increases, the
load capacitance connected to a TSV increases. Consequently, more
buffers are inserted along the path from the last branching point
to the TSV. For pairs of sinks that are a short distance apart,
this increase in the number of buffers along this specific path has
a greater effect than the decreasing number of buffers for the entire
tree. Consequently, skew variations between the nearest sinks increase
with the number of planes a tree spans.
TABLE II
SKEW VARIATION ANALYSIS OF AN EIGHT-PLANE 3-D IC WITH EIGHT CLOCK DOMAINS.
σskew [ps]              WID uncorrelated               Multi-correlation
                      A      B      C      D        A      B      C      D
D2D > WID   σSmin    5.0    8.1    9.5   16.1      4.5    9.1   10.2   16.0
            σSmax   25.8   23.6   20.0   21.8     50.6   41.3   28.3   28.1
D2D < WID   σSmin   15.1   24.8   29.2   49.5     13.7   27.7   31.1   49.3
            σSmax   79.0   69.2   57.7   63.5    154.7  126.1   85.3   85.8
D2D = WID   σSmin   12.6   20.6   24.2   41.0     11.3   23.0   25.8   40.8
            σSmax   65.4   57.3   48.1   52.9    128.2  104.5   70.7   71.1

Per clock tree        A      B      C      D
# of buffers        716    533    291    203
# of TSVs             0    512    256    128
μSmax [fs]            0      4     29    151
As WID variations increase, the skew variations of the four
clock schemes increase. In the three investigated variation scenarios,
extending a clock tree of a domain to multiple planes decreases σSmax
up to 26% as compared with Scheme A. σSmin increases, however,
up to 3.3 times.
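The comparison above can be re-derived from the "WID uncorrelated" columns of Table II with a short Python sketch. The helper names are hypothetical, and because the table entries are rounded, the computed ratios only approximate the percentages quoted in the text.

```python
# Standard deviations in ps, copied from the "WID uncorrelated" columns of
# Table II and ordered as Schemes A, B, C, D.
SCHEMES = ("A", "B", "C", "D")
SIGMA_SMAX = {
    "D2D > WID": (25.8, 23.6, 20.0, 21.8),
    "D2D < WID": (79.0, 69.2, 57.7, 63.5),
    "D2D = WID": (65.4, 57.3, 48.1, 52.9),
}
SIGMA_SMIN = {
    "D2D > WID": (5.0, 8.1, 9.5, 16.1),
    "D2D < WID": (15.1, 24.8, 29.2, 49.5),
    "D2D = WID": (12.6, 20.6, 24.2, 41.0),
}

def best_scheme(values):
    """Scheme with the smallest standard deviation in one scenario."""
    return SCHEMES[values.index(min(values))]

def max_reduction_vs_a(table):
    """Largest relative decrease with respect to Scheme A over all scenarios."""
    return max(1.0 - min(values) / values[0] for values in table.values())

def max_increase_vs_a(table):
    """Largest ratio with respect to Scheme A over all scenarios."""
    return max(max(values) / values[0] for values in table.values())

if __name__ == "__main__":
    for scenario, values in SIGMA_SMAX.items():
        print(scenario, "lowest sigma_Smax:", best_scheme(values))
    print("max sigma_Smax reduction:", round(100 * max_reduction_vs_a(SIGMA_SMAX), 1), "%")
    print("max sigma_Smin ratio:", round(max_increase_vs_a(SIGMA_SMIN), 2))
```

Keeping the table as plain tuples makes it easy to repeat the same check for other columns or for additional variation scenarios.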
Guideline 1. For independent WID process variations, extending
a clock domain to multiple planes of a 3-D circuit decreases the
maximum skew variation. Extending the clock tree to the greatest
supported number of planes, however, does not necessarily produce
the smallest skew variations. If most of the data-related sinks are
distributed close to each other, having one domain within each plane
can decrease the skew variations.
B. Multi-Level Correlations of WID Variations
In this case, the correlation of WID variations is modeled by (7). As
reported in the column "Multi-correlation" in Table II, the behavior of
the investigated clocking schemes differs from that observed for
uncorrelated WID variations.
For all the three variation cases, extending a clock domain to
multiple planes produces a smaller σSmax, as compared with Scheme
A. For "D2D > WID", this decrease in σSmax grows as the number
of planes that a tree spans increases. For "D2D < WID" and "D2D
= WID", extending the clock tree to all the planes does not, however,
produce the smallest σSmax. Consequently, the efficiency of extending
a clock tree to multiple planes depends on the relation between D2D
and WID variations. For σSmin, in this correlation model, the behavior
of the four clocking schemes is similar to that for independent WID
variations: σSmin increases as the number of planes that a clock tree
spans increases.
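The scenario dependence described above can be checked against the "Multi-correlation" σSmax columns of Table II with the following Python sketch. The helper names are hypothetical and the values are the rounded table entries.

```python
# sigma of S_max in ps, copied from the "Multi-correlation" columns of
# Table II and ordered as Schemes A, B, C, D.
SCHEMES = ("A", "B", "C", "D")
SIGMA_SMAX_MULTI = {
    "D2D > WID": (50.6, 41.3, 28.3, 28.1),
    "D2D < WID": (154.7, 126.1, 85.3, 85.8),
    "D2D = WID": (128.2, 104.5, 70.7, 71.1),
}

def scheme_with_lowest_sigma(values):
    """Scheme with the smallest sigma of S_max in one scenario."""
    return SCHEMES[values.index(min(values))]

def decreases_with_span(values):
    """True if sigma of S_max falls strictly from Scheme A through Scheme D."""
    return all(later < earlier for earlier, later in zip(values, values[1:]))

LOWEST = {name: scheme_with_lowest_sigma(v) for name, v in SIGMA_SMAX_MULTI.items()}
MONOTONIC = {name: decreases_with_span(v) for name, v in SIGMA_SMAX_MULTI.items()}

if __name__ == "__main__":
    for name in SIGMA_SMAX_MULTI:
        print(name, "lowest:", LOWEST[name], "monotonic decrease:", MONOTONIC[name])
```

Only the scenario in which σSmax decreases strictly from Scheme A to Scheme D favors spanning all planes; in the others the minimum falls on an intermediate scheme.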
Guideline 2. For multi-level correlated WID process variations, increasing the
number of planes a clock domain spans increases the skew variation
between sinks that are a short distance apart. The decrease in the maximum
skew variation depends on the relation between D2D and WID variations.
V. CONCLUSIONS
The effect of process variations in 3-D ICs with multiple clock
domains is investigated. To accurately model the effect of spatially
correlated intra-die variations, the statistical skew model for 3-D
clock trees is extended to incorporate a method that describes spatial
correlations.
Various approaches to allocate multiple clock domains in a 3-D IC
are investigated. Simulation results show that for different scenarios
of inter- and intra-die process variations and different WID variation
models, these approaches exhibit different characteristics in reducing
the skew variations within each domain. Assigning one clock domain
to each physical plane does not always result in the clock distribution
network with the lowest skew variations. A set of guidelines is
provided to improve the performance of 3-D ICs by limiting the
variations of the clock skew.
REFERENCES
[1] V. F. Pavlidis and E. G. Friedman, “Interconnect-Based Design Method-
ologies for Three-Dimensional Integrated Circuits,” Proceedings of the
IEEE, Vol. 97, No. 1, pp. 123–140, January 2009.
[2] E. Friedman, “Clock Distribution Networks in Synchronous Digital
Integrated Circuits,” Proceedings of the IEEE, Vol. 89, No. 5, pp. 665–
692, May 2001.
[3] S. Nassif, “Delay Variability: Sources, Impacts and Trends,” in Proceed-
ings of the IEEE International Solid-State Circuits Conference, February
2000, pp. 368–369.
[4] P. Zarkesh-Ha, T. Mule, and J. D. Meindl, “Characterization and Model-
ing of Clock Skew with Process Variations,” in Proceedings of the IEEE
Custom Integrated Circuits Conference, May 1999, pp. 441–444.
[5] H. Xu, V. Pavlidis, and G. D. Micheli, “Process-Induced Skew Variation
for scaled 2-D and 3-D ICs,” in Proceedings of the IEEE/ACM System
Level Interconnect Prediction Workshop, June 2010, pp. 17–24.
[6] M. Mondal et al., “Thermally Robust Clocking Schemes for 3D Inte-
grated Circuits,” in Proceedings of Conference on Design, Automation
and Test in Europe, April 2007, pp. 1206–1211.
[7] V. Pavlidis, I. Savidis, and E. Friedman, “Clock Distribution Networks
for 3-D Integrated Circuits,” in Proceedings of the IEEE Custom
Integrated Circuits Conference, September 2008, pp. 651–654.
[8] X. Zhao and S. K. Lim, “Power and Slew-aware Clock Network Design
for Through-Silicon-Via (TSV) based 3D ICs,” in Proceedings of Asia
and South Pacific Design Automation Conference, January 2010, pp.
175–180.
[9] T.-Y. Kim and T. Kim, “Clock Tree Embedding for 3D ICs,” in
Proceedings of Asia and South Pacific Design Automation Conference,
January 2010, pp. 486–491.
[10] G. Katti et al., “Electrical Modeling and Characterization of Through
Silicon Via for Three-Dimensional ICs,” IEEE Transactions on Electron
Devices, Vol. 57, No. 1, pp. 256–262, January 2010.
[11] M. Orshansky et al., “Impact of Systematic Spatial Intra-Chip Gate
Length Variability on Performance of High-Speed Digital Circuits,” in
Proceedings of the IEEE/ACM International Conference on Computer-
Aided Design, November 2000, pp. 62–67.
[12] A. Agarwal, D. Blaauw, and V. Zolotov, “Statistical Timing Analysis for
Intra-Die Process Variations with Spatial Correlations,” in Proceedings
of the IEEE/ACM International Conference on Computer-Aided Design,
November 2003, pp. 900–907.
[13] M. Hashimoto, T. Yamamoto, and H. Onodera, “Statistical Analysis
of Clock Skew Variation in H-tree Structure,” in Proceedings of the
International Symposium on Quality of Electronic Design, Vol. 88,
No. 12, December 2005, pp. 402 – 407.
[14] S. Garg and D. Marculescu, “3D-GCP: An Analytical Model for the
Impact of Process Variations on the Critical Path Delay Distribution of
3D ICs,” in Proceedings of the International Symposium on Quality of
Electronic Design, March 2009, pp. 147–155.
[15] “ASU Predictive Technology Model.” [Online]. Available:
http://www.eas.asu.edu/~ptm/
[16] G. E. Téllez and M. Sarrafzadeh, “Minimal Buffer Insertion in Clock
Trees with Skew and Slew Rate Constraints,” IEEE Transactions on
Computer-Aided Design of Integrated Circuits and Systems, Vol. 16,
No. 4, pp. 333–342, April 1997.
