Abstract-The effect of process variations on the clock skew in three-dimensional (3-D) circuits with multiple clock domains is investigated. In 3-D ICs, the combined effect of inter-die and intra-die process variations should be considered in the design of clock distribution networks. A statistical clock skew model incorporating spatially correlated intra-die process variations is employed to describe this effect. The clock skew is shown to change in different ways with the allocation of the clock domains within the 3-D circuit. Various schemes to assign the clock domains are investigated. Different scenarios of inter-die and intra-die process variations and an intra-die spatial correlation model are applied to these schemes. An approach where each physical plane corresponds to a single clock domain is shown to be inferior to other clocking schemes for specific variation scenarios. Tradeoffs between the number of clock domains within a physical plane and the number of planes a clock tree spans are discussed, and related design guidelines are offered.
I. INTRODUCTION
3-D integration emerges as a potent solution to alleviate the increasing interconnect delay in modern ICs [1]. Considering the important issue of synchronization, the reduced interconnect latency can be exploited to either relax the clock skew constraints or further increase the speed of a circuit.
Clock skew is typically defined as the difference between the propagation delays of the clock signal from the source to the sinks of the clock distribution network. There is a plethora of methods to manage excessive clock skew in the design phase [2]. Careful physical design, however, does not eliminate the undesirable skew, since skew can also be introduced in the fabrication phase. In this phase, the primary sources of process variations include fluctuations of the gate length, doping concentrations, oxide thickness, and inter-layer dielectric thickness [3], [4].
The resulting process variations are generally divided into inter-die and intra-die variations. Inter-die variations affect the characteristics of devices independently among dice, but the devices within one die are uniformly affected. Intra-die variations affect the characteristics of devices unequally within one die. The inclusion of both intra- and inter-die process variations is required in the analysis and design of 3-D clock distribution networks.
The effect of process variations on the performance of 3-D clock distribution networks is discussed in [5], where a single clock domain is considered. The focus of this paper is on the effect of process variations on the clock skew of potential 3-D synchronization architectures with multiple clock domains. The case studies include regular clock networks that globally distribute the clock signal in a 3-D stack [6], [7]. The proposed model can also be used to analyze synthesized 3-D clock trees [8], [9]. The resulting skew variations in synthesized clock trees also depend on the efficiency of the synthesis technique. Since the intention is to investigate the effect of process variations rather than the efficiency of a 3-D clock tree synthesizer, such as in [8], [9], regular structures, such as H-trees, are explored.
The considered 3-D technologies include those manufacturing processes where multiple physical planes, bonded by different means, are electrically connected by through-silicon vias (TSVs) [10]. In such 3-D ICs, a clock tree can span more than one plane, where each plane is fabricated separately.
Simulation results indicate that in 3-D ICs with multiple clock domains, the resulting skew variation depends on the assignment of the clock domains and on the relation between the inter-die and intra-die variations. Moreover, the spatial correlation of intra-die variations [11]-[13] is shown to be non-negligible when analyzing the process-induced skew in multiple-domain 3-D clock trees. Consequently, the objectives of this paper are 1) to determine the behavior of skew variations in 3-D ICs with multiple clock domains, 2) to include spatially correlated process variations in statistical skew analysis for 3-D clock trees, and 3) to provide a set of design guidelines, thereby decreasing the variability of clock skew within multi-domain 3-D clock H-trees.
The remainder of the paper is organized as follows. A statistical skew model for 3-D clock trees considering the spatially correlated intra-die process variations is introduced in the following section. The investigated multi-domain 3-D clock trees are discussed in Section III. Simulation results and a comparison of various multi-domain 3-D H-tree topologies are presented in Section IV. Design guidelines are also provided. The conclusions are drawn in Section V.
II. SPATIALLY CORRELATED INTRA-DIE VARIATION MODELS IN 3-D CLOCK TREES
The statistical skew model for 3-D ICs presented in [5] is employed and extended to include the spatial correlation of intra-die process variations. Since the planes of a 3-D IC are usually fabricated separately, the inter-die process variations are considered independent from plane to plane and uniform for the devices within one plane [14]. For a clock tree spanning N planes, the distribution of the skew $S_{1,2}$ between sinks $s_1$ and $s_2$ considering inter-die (die-to-die (D2D)) variations is described by a Gaussian distribution [5],
The D2D skew variation in plane j is denoted as $S^{D2D}_{1,2(j)}$. The number of buffers along the paths to sinks $s_1$ and $s_2$ in plane j is $n_{s_1(j)}$ and $n_{s_2(j)}$, respectively. The delay of the i-th buffer in plane j along the path to sink $s_1$ is denoted as $d_{s_1(j,i)}$.
Fig. 1. Modeling spatial correlations using quad-tree partitioning [12].
In contrast, the intra-die process variations affect the delay of buffers within one plane non-uniformly. This effect consists of a random and a systematic component, modeled as a Gaussian distribution and an analytic spatial correlation function, respectively [11], [12]. Between planes, the intra-die variations are still considered independent. The distribution of the skew with intra-die (within-die (WID)) variations is, therefore, written as
The spatial correlation between buffers $b_1$ and $b_2$ within one plane is described by the covariance $Cov(b_1, b_2)$. Both the random WID variations and the spatial correlations are considered herein. For the random WID variations, the covariance between different buffers is always zero, i.e., $Cov(b_1, b_2) = 0$ for $b_1 \neq b_2$. Expression (4) can be rewritten as
Consequently, $\sigma_{S^{WID}_{1,2(j)}}$ increases as the number of buffers along the related paths increases.
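The growth of the WID skew variation with the number of buffers can be illustrated with a minimal Python sketch. It is not the model of [5]: it assumes that every buffer contributes an independent zero-mean Gaussian delay variation with the same standard deviation `sigma_buf`, and that buffers shared by the two paths cancel in the skew, so only the non-shared buffers count. The function name and all numbers are hypothetical.

```python
import math

def wid_skew_sigma(n_unshared_s1, n_unshared_s2, sigma_buf):
    """Std. dev. of the skew between two sinks under uncorrelated WID variations.

    Assumption: each non-shared buffer adds an independent delay variation
    with standard deviation sigma_buf, so the variances simply add.
    """
    n_total = n_unshared_s1 + n_unshared_s2
    return math.sqrt(n_total) * sigma_buf

# Hypothetical values: 5 ps per-buffer sigma, growing path length.
for n in (1, 2, 4, 8):
    print(n, round(wid_skew_sigma(n, n, 5.0), 2))
```

Under these assumptions the standard deviation grows with the square root of the number of non-shared buffers, which is why longer paths within one plane are more exposed to random WID variations.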
The spatial correlation model (multi-correlation) is based on the statistical timing analysis method proposed in [12]. A multi-level quad-tree partitioning is used and the intra-die variations of a device are divided into l levels, as illustrated in Fig. 1 [12]. At the l-th level, there are $4^{l-1}$ regions. The intra-die variations of a buffer, for example $b_1$, are described by the sum of the variations of all the regions to which $b_1$ belongs,
where $\Delta d_{i,j}$ is the delay variation caused by intra-die variations in region (i, j) (where $b_1$ is located) at the i-th level, as illustrated in Fig. 1. The covariance between two buffers $b_1$ and $b_2$ within one plane is
Both the random and multi-correlated skew variation models are implemented for multi-domain 3-D clock trees. The investigated 3-D clock distribution networks are presented in the following section.
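The quad-tree partitioning described above can be sketched in Python as follows. This is a simplified illustration, not the implementation used in this paper or in [12]. It assumes that the die is normalized to the unit square, that level k is a regular grid with $4^{k-1}$ regions, and that each region carries an independent zero-mean variable whose standard deviation depends only on its level. Under these assumptions, the covariance of two buffers is the sum of the variances of the regions that contain both of them. The function names and values are hypothetical.

```python
def region_index(x, y, level):
    """Grid cell of point (x, y) in the unit square at a quad-tree level.

    Level 1 is the whole die; level k has 2**(k-1) cells per side,
    i.e. 4**(k-1) regions in total.
    """
    cells = 2 ** (level - 1)
    col = min(int(x * cells), cells - 1)  # clamp x == 1.0 into the last cell
    row = min(int(y * cells), cells - 1)
    return row, col

def quadtree_cov(p1, p2, level_sigmas):
    """Covariance of the WID delay variation of two buffers.

    Assumption: each region carries an independent zero-mean variable whose
    std. dev. depends only on the level (level_sigmas[k-1] for level k).
    Only regions that contain both buffers contribute.
    """
    cov = 0.0
    for level, sigma in enumerate(level_sigmas, start=1):
        if region_index(*p1, level) == region_index(*p2, level):
            cov += sigma ** 2
    return cov

sigmas = [1.0, 1.0, 1.0]  # hypothetical per-level sigmas, l = 3
near = quadtree_cov((0.10, 0.10), (0.20, 0.20), sigmas)
far = quadtree_cov((0.10, 0.10), (0.90, 0.90), sigmas)
print(near, far)
```

In this sketch, two nearby buffers share the regions of all three levels, whereas two buffers in opposite corners share only the top-level region, so their delay variations are less correlated.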
III. MULTIPLE CLOCK DOMAINS FOR 3-D ICS
In 3-D circuits, the clock trees belonging to different clock domains can be located in the same or different planes. Various approaches to distribute multi-domain 3-D clock trees are discussed in this section.
A straightforward idea is to assign each clock domain to a single plane, as illustrated in Fig. 2 . For each clock domain, a PLL is assumed to generate the clock signal for the corresponding clock network. In this scenario, excluding any synchronization requirement between different clock domains, the impact of D2D process variations expressed by (1) can be eliminated. Only WID variations need to be considered.
As illustrated in Fig. 2, the sinks of a clock domain are distributed across the entire plane. Long interconnects and a large number of buffers can consequently be required, and each clock tree can be significantly affected by WID variations. An approach to mitigate this problem is to decrease the total wire length of the tree by distributing the clock registers to other planes. In this case, several clock domains are integrated in one plane, as illustrated in Fig. 3(a). The design of the 3-D clock H-tree within each domain is based on [5], [7].
In Fig. 3(a), each clock tree spans four planes through TSVs. The skew variation within each clock domain is affected by the D2D variations in all four planes. The topology illustrated in Fig. 3(b), produced by combining the topologies in Figs. 2 and 3(a), provides another approach to manage the effect of D2D and WID variations. A comparison of different D2D and WID variation scenarios for the investigated 3-D circuits with multiple clock domains is presented in the following section.
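The tradeoff between D2D and WID variations discussed above can be illustrated with a first-order Python sketch. It is a deliberately simplified stand-in, not expression (1) or the WID expression of this paper. It assumes identical buffers, a D2D shift that changes the delay of every buffer in a plane by the same amount and is independent between planes, and WID variations that are independent per buffer. The buffer counts and standard deviations are hypothetical.

```python
import math

def skew_sigma(n_s1, n_s2, sigma_d2d, sigma_wid):
    """First-order std. dev. of the skew between two sinks in a 3-D tree.

    n_s1[j], n_s2[j]: number of non-shared buffers in plane j on the path
    to each sink. Assumptions: a D2D shift in plane j changes the delay of
    every buffer in that plane by the same amount (std. dev. sigma_d2d) and
    is independent between planes; WID variations are independent per
    buffer (std. dev. sigma_wid).
    """
    var = 0.0
    for a, b in zip(n_s1, n_s2):
        var += ((a - b) * sigma_d2d) ** 2  # D2D: only the imbalance in a plane matters
        var += (a + b) * sigma_wid ** 2    # WID: every non-shared buffer adds variance
    return math.sqrt(var)

# Hypothetical buffer counts. Single-plane tree: long, balanced paths.
one_plane = skew_sigma([8], [8], sigma_d2d=3.0, sigma_wid=1.0)
# Tree spanning two planes: shorter paths, sinks in different planes.
two_planes = skew_sigma([3, 0], [1, 2], sigma_d2d=3.0, sigma_wid=1.0)
print(one_plane, two_planes)
```

With these hypothetical numbers, the single-plane tree is affected only by WID variations, while the two-plane tree has fewer buffers but picks up a D2D contribution. Which topology yields the lower skew variation therefore depends on the ratio of `sigma_d2d` to `sigma_wid`.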
IV. SIMULATION RESULTS AND DISCUSSION
The multi-domain 3-D clock trees discussed in Section III are analyzed with the extended models of skew variations. Several combinations of D2D and WID process variations are simulated to investigate the efficiency of different allocations of the clock domains within a 3-D stack.
The PTM model for a 90 nm technology node is used [15]. The characteristics of TSVs are extracted based on [10]. An eight-plane 3-D IC (10 mm × 10 mm per plane), envisioning highly complex 3-D systems, with eight clock domains is simulated. There are 128 clock sinks within each clock domain, i.e., 1024 sinks in total. A clock buffer is inserted at each sink, driving the downstream devices (e.g., a cluster of flip-flops or a local clock mesh). Clock buffers are inserted into the clock trees following [16], where the constraint on the slew rate is 8.8 mV/ps. The electrical characteristics of the clock networks are listed in Table I. The output resistance and input capacitance of the buffers, the resistance and capacitance per unit length of the interconnects, and the resistance and capacitance of the TSVs are denoted by $R_b$, $C_b$, $r$, $c$, $R_v$, and $C_v$, respectively. Four schemes of multiple clock domains are investigated: (A) one clock domain per plane (see Fig. 2), (B) two clock domains per plane, each spanning four planes, (C) four clock domains per plane, each traversing two planes (similar to Fig. 3(b)), and (D) eight clock domains, each extending across all of the planes (similar to Fig. 3(a)). Note that the total number of clock domains remains the same for all four schemes; the distribution of these domains among and within the planes, however, changes. The objective is to determine the scheme with the lowest skew variations within each domain. The sinks located farthest apart within one domain exhibit the largest skew variation $S_{max}$ [5], e.g., $S_{max} = S_{1,3}$ between $s_1$ and $s_3$ in Fig. 2. The smallest skew variation $S_{min}$ is $S_{1,2}$, a typical trait of an H-tree.
The variations of the gate length ($l_{mos}$) of both the NMOS and PMOS transistors are considered [14]. Other sources of variations can also be
A clock tree of Scheme A is simulated in SPICE. The waveform of the clock signal at sink $s_1$ is illustrated in Fig. 4. The slew rate at the sinks is well constrained by the buffer insertion. The delay variation due to the process variations, however, is significant. The delay variation with $l_{mos} - 3\sigma_{l_{mos}}$ and $l_{mos} + 3\sigma_{l_{mos}}$ (variation 1) is -1.2 ns and 1.1 ns, respectively.
The accuracy of the original statistical model compared with Monte-Carlo simulations has been demonstrated in [5]. The largest and smallest skew variations within a clock domain are reported for the four clock schemes (Schemes A, B, C, and D) and the three variation scenarios (Scenarios 1, 2, and 3) in Table II. The lowest values of σ among the schemes are reported in boldface. The two models of the WID variations are compared in the following subsections.
A. Uncorrelated WID Variations
In this case, the WID variations are assumed to be independent among the devices within one plane. As reported in Table II, Scheme A produces the highest $\sigma_{S_{max}}$ for all three variation scenarios. This behavior occurs because the horizontal area (i.e., wire length) occupied by each tree in Scheme A is the greatest among the four schemes, requiring the largest number of buffers. As described by (1) and (3), the skew variations of Scheme A are therefore higher than those of the other schemes.
For clock schemes B, C, and D, $\sigma_{S_{max}}$ varies significantly with the allocation of 3-D clock trees to the planes. Note that although reducing the horizontal area of a tree helps to decrease the WID variations, Scheme D does not produce the smallest $\sigma_{S_{max}}$. The reason is that Scheme D introduces a larger number of buffers connected to a TSV in different planes. The effect of D2D variations, therefore, increases. As reported in Table II, Scheme C produces the smallest $\sigma_{S_{max}}$ in all three variation scenarios.
As the number of planes that a clock tree spans increases, the load capacitance connected to a TSV increases. Consequently, more buffers are inserted along the path from the last branching point to the TSV. For pairs of sinks that are close to each other, this increase in the number of buffers along this specific path has a greater effect than the decreasing number of buffers for the entire
B. Multi-Level Correlations of WID Variations
In this case, the correlation of WID variations is modeled by (7). As reported in the column "Multi-correlation" in Table II, the behavior of the investigated clocking schemes differs from that obtained with the uncorrelated model.
For all three variation cases, extending a clock domain to multiple planes produces a smaller $\sigma_{S_{max}}$ as compared with Scheme A. For "D2D > WID", this decrease in $\sigma_{S_{max}}$ grows as the number of planes that a tree spans increases. For "D2D < WID" and "D2D = WID", however, extending the clock tree to all the planes does not produce the smallest $\sigma_{S_{max}}$. Consequently, the efficiency of extending a clock tree to multiple planes depends on the relation between D2D and WID variations. For $\sigma_{S_{min}}$, the behavior of the four clocking schemes in this correlation model is similar to the case of independent WID variations: $\sigma_{S_{min}}$ increases as the number of planes that a clock tree spans increases.
V. CONCLUSIONS
The effect of process variations in 3-D ICs with multiple clock domains is investigated. To accurately model the effect of spatially correlated intra-die variations, the statistical skew model for 3-D clock trees is extended to incorporate a method that describes spatial correlations.
Various approaches to allocate multiple clock domains in a 3-D IC are investigated. Simulation results show that for different scenarios of inter- and intra-die process variations and different WID variation models, these approaches exhibit different characteristics in reducing the skew variations within each domain. Assigning one clock domain to each physical plane does not always result in the clock distribution network with the lowest skew variations. A set of guidelines is provided to improve the performance of 3-D ICs by limiting the variations of the clock skew.
