Technology scaling and three-dimensional integration are two design paradigms that offer high device density. Process variations affect these design paradigms in different ways. The effect of process variations on clock skew for a 2-D circuit implemented at scaled technology nodes and for a 3-D circuit with an increasing number of planes is investigated in this paper. An accurate model used to describe the effect of the proper sources of variations on each of these design approaches is proposed. The distribution of the pair-wise skew variation is obtained for single scaled or multi-plane (not scaled) clock distribution networks. The accuracy of the presented statistical skew model is verified through Monte-Carlo simulations. As shown in this paper, the clock skew variation due to technology scaling and/or die stacking exhibits a considerably different behavior. A comparison between these two design paradigms is offered such that the appropriate technology node and number of planes are selected to produce a low clock skew variation and high operating frequency. A popular global clock tree topology is employed in a planar (2-D) circuit where technology scaling is applied and in a 3-D circuit with an increasing number of planes. For this clock tree topology, the maximum supported clock frequency increases from 2.75 GHz to 3.74 GHz by proper die-stacking at a 90 nm technology node. 3-D integration is shown to be an alternative to reduce skew variation without the need of aggressive technology scaling.
INTRODUCTION
Technology scaling has been the driving force to increase integration density for the past decades. Three-dimensional integration emerges as an alternative design paradigm that can also increase integration density [1] . This increase in density is achieved by plane stacking without necessarily scaling the devices, alleviating problems related to scaling, such as process variations. In very deep sub-micrometer technologies process variations significantly complicate the IC design process [2, 3] . One of the affected design tasks is the increasing difficulty to produce clock distribution networks with specific performance characteristics. This situation is due to the increase and variation of clock skew, which can limit the maximum operating frequency supported by a clock distribution network.
Clock skew is, typically, defined as the difference between the propagation delays of the clock signal from the source to the sinks of the clock distribution network. High clock frequencies severely constrain clock skew to allow an as large as possible portion of the clock period for data processing. Reducing the interconnect latency can allow relaxed clock skew constraints or a greater circuit speed.
Clock skew is introduced at the design and fabrication stages and during the operation of ICs. There is a plethora of methods to manage the excessive clock skew in the design phase [4] [5] [6] . Careful physical design, however, does not guarantee the elimination of undesirable skew since the unwanted skew can be introduced in other phases. Process variability is demonstrated to be an important issue in vertically integrated circuits [7] . The effect of process variations on the performance of 3-D clock distribution networks, however, has not been considered.
The primary sources of process variations include fluctuations of the gate length, doping concentrations, oxide thickness, and interlevel dielectric (ILD) thickness [2, 8] . The resulting process variations are generally divided to inter-die (die-to-die) and intra-die (within-die) variations. Inter-die variations affect the characteristics of devices independently among dice, but the devices within one die are uniformly affected. Intra-die variations affect the characteristics of devices unequally within one die. Intra-die variations can be characterized as either systematic or random [9] [10] [11] . Since both intra-and inter-die process variations are present in a 3-D IC, the modeling of these process variations is required in the analysis and design of 3-D clock distribution networks.
The effect of intra-and inter-die process variations on the clock skew for scaled 2-D ICs and 3-D ICs with a different number of planes is discussed in this paper. This effect is investigated for a typical global H-tree topology. The proposed model of skew variation can be also used to analyze other 3-D tree topologies generated by clock tree synthesis techniques [6, 12] .
Results indicate that technology scaling significantly decreases skew variations for clock trees. 3-D integration also decreases skew variation as the number of planes comprising the circuit increases. This reduction, however, is smaller as compared to technology scaling. It is demonstrated that by exploiting both design approaches, the lowest skew variation can be achieved.
The remainder of this paper is organized as follows. Existing techniques for skew analysis in 2-D circuits are briefly reviewed in the following section. A statistical model of the delay of critical paths in 3-D ICs is also discussed. The problem of skew analysis for 3-D clock distribution networks is formulated in Section 3. An accurate clock skew model for 3-D clock trees considering the impact of process variations is described in Section 4. Simulation results and a comparison of 3-D clock trees and scaled 2-D trees are presented in Section 5. The conclusions are drawn in Section 6.
RELATED WORK
Several techniques for analyzing the effect of process variation on the clock skew have been developed for 2-D circuits emphasizing intra-die variations. Some of these techniques are briefly summarized in this section.
For 2-D ICs, clock skew variations are estimated either by corner analysis or statistical analysis [13] . Corner analysis is useful for estimating inter-die process variations for 2-D ICs but this method often introduces pessimism in the timing of a circuit. Another method for statistical clock skew analysis based on Monte Carlo simulation is introduced in [14] ; the computational time of this method is, however, prohibitively high for large scale ICs. Several statistical skew modeling and timing analysis methods considering intra-die variations are presented in [13, 15, 16] to efficiently analyze skew variations. Although statistical skew analysis has been explored in 2-D ICs, the resulting methods cannot be directly applied to 3-D systems. In 2-D ICs, since the inter-die process variations uniformly affect the devices within a circuit, the majority of the skew analysis methods emphasizes the intradie variations neglecting the impact of inter-die variations.
Recent work analyzing the effect of process variations on the performance of 3-D ICs is presented in [3, 17] , where the impact of process variations on the delay of critical paths is studied. The distribution of the maximum delay of datapaths is determined by considering that the datapaths have no common segments. Since the clock paths to different sinks can have some common segments, these models are not directly applicable to determine the skew of multiplane clock distribution networks.
The partial correlation of the clock paths and both the inter-and intra-die process variations should be considered in statistical skew analysis for 3-D clock distribution networks. The problem of clock skew modeling under these variations in 3-D ICs is discussed in the following section. 
PROBLEM DESCRIPTION
The problem of skew analysis for 3-D clock distribution networks considering process variations is described in this section. Since both inter-and intra-die process variations should be considered in 3-D clock distribution networks, the paths starting from the same node cannot be independently treated. Consequently, the previous methods used for determining global skew are not applicable for 3-D clock distribution networks.
Since only the clock skew between the sequential elements which transfer data between each other (data-related sequential elements) affects the performance of a circuit [5, 16] , in addition to global skew, appropriate pair-wise skew distributions are adopted to evaluate the performance of clock distribution networks [16] .
The H-tree is a common topology used to globally distribute the clock signal within a circuit [4, 5] . A typical buffered 3-D H-tree is illustrated in Fig. 1 [1, 18] . The pairwise clock skew used in the following analysis is defined as the skew between every pair of sinks in 3-D clock distribution networks, M skew = {mij|mij = Di − Dj, 1 ≤ i, j ≤ S}. mij denotes the skew between sinks si and sj. The clock delay to sinks si and sj is denoted by Di and Dj, respectively. The number of clock sinks is S.
The number of buffers (determined by the length of interconnects) in clock trees affect the distribution of the delay of clock paths. The area and the number of physical planes constituting a circuit, therefore, significantly affect mij and the highest supported clock frequency. By investigating the effect of process variations on M skew , the appropriate technology node and number of planes is selected producing a low M skew and supporting a high clock frequency.
CLOCK SKEW MODELING FOR 3-D CLOCK TREES
The distribution of clock skew considering inter-and intradie process variations in 3-D clock trees is modeled in this section. The model adopted to obtain the distribution of buffer delay is first presented in Section 4. 
Buffer Delay Distribution
The variation of the buffer delay originates from the device parameters deviating from their nominal values, as statistically modeled in [8, 14] . The fluctuation of the buffer delay is typically approximated as being linear to the device parameter variations [16, 19] . Alternatively, the variation in delay can be determined through the variations of the input capacitance and output resistance [13] . The second method is enhanced in this paper to more accurately obtain the buffer delay and the correlation between different buffers. The interconnects constituting the 3-D circuit are modeled as distributed RC wires.
The delay variation of buffers can be estimated by extracting the variations of the device characteristics [8] . Nevertheless, the variation in buffer delay also depends upon the slew rate of the input signal and the load driven by this buffer. The circuit illustrated in Fig. 2 is utilized to obtain the variation of buffer delay for different slew rates of the input signal and RC load. Let Rin denote the output resistance of a buffer. Interconnects with diverse impedance characteristics are modeled by setting different Rint and Cint. Different slew rates of the input signal to the buffer in Fig. 2 are, consequently, investigated.
For a step input signal, the Elmore delay [20] variation from source S to nodes I and O in Fig. 2 , respectively, is
where ∆C b , ∆R b , and ∆D b are the variation of the input capacitance, the output resistance, and the intrinsic delay of the buffer, respectively. Monte Carlo simulations are performed to evaluate the delay variation at nodes I and O where C l is set to two different values (e.g., 0 and 200 fF). The probability distribution function (PDF) of ∆C b , ∆R b , and ∆D b is obtained through (1) and (2) . An example of a buffer consisting of two inverters based on a UMC 90 nm CMOS technology [21] is simulated. The PDF of ∆C b , ∆R b , and ∆D b is illustrated in Fig. 3 .
The set up of the circuit and the resulting variations of the characteristics of the buffer are reported in Table 1 . Slew 1 and slew 2 denote two different slew rates at node I due to two dissimilar pairs of Rint and Cint. The standard deviation σ of the distribution of ∆R b , ∆C b , and ∆D b is shown to non-negligibly depend on the input slew. When the slew rate increases, the distribution of ∆C b and ∆D b becomes narrower, while the distribution of ∆R b widens. The statistical delay characteristics of the buffers are, therefore, more accurately captured by considering the different slew rate of the input signal. Since the slew rate and load capacitance of the clock buffers are carefully balanced in well designed Table 1 .
clock trees [22] , there is a small set of Rint and Cint which needs to be simulated for a clock tree. The process variations include inter-die (D2D) and intradie (WID) variations, which are independent from each other [10] . ∆R b , ∆C b , and ∆D b are each expressed by the summation of the terms produced by D2D and WID variations.
The values of D2D and WID variations can be obtained from experimental data [10, 11] or statistical parameters model files [21] . The distribution of D2D and WID variations are modeled as a Gaussian distribution. The mean value and standard deviation are obtained through Monte-Carlo simulations with industrial libraries [21, 23] .
The Delay Distribution of a 3-D Clock Path
An example of a 3-D clock path is illustrated in Fig. 4 . The devices on different physical planes are connected by Through Silicon Vias (TSVs) [24] , which, in turn, are modeled as RC wires of different resistance and capacitance as compared to the horizontal wires (e.g., RT SV and CT SV in Fig. 4) .
For a clock path consisting of buffers i − 1, i, and i + 1, through (1) and (2), the component of the variation of the clock delay ∆di related to the delay variation of buffer i but unrelated to the delay variation of other buffers along the same path, is rewritten as Figure 4 : Electrical model of a segment of a clock path.
The prime ( ) denotes the nominal (mean) values. The ∆R b(i) , ∆C b(i) , and ∆D b(i) are assumed to be fully correlated. According to (3), ∆di is approximated as a Gaussian distribution,
Consequently, for a 3-D clock path to a sink s which includes ns clock buffers, the variation of the delay is expressed as the summation of expression (3) applied to each buffer along the path,
∆di.
The WID and D2D sources of ∆Ds are introduced in the following sub-sections, respectively.
WID Variation Model for the Delay of 3-D Clock Paths
The correlation of the impact of WID variations on different buffers can be systematic or independent. The systematic WID variations often exhibit spatial correlation [11, 25] , which can be modeled as a combination of multi-level random WID variations [25] . Consequently, a model for the random WID variations is utilized herein, following the methods used by other statistical skew analysis and process variation analysis papers [13, [15] [16] [17] . The validity of this approach is discussed and demonstrated in these publications. According to (3)- (7), the distribution of the delay of sink s due to WID variations is a Gaussian distribution and the PDF is
D2D Variation Model for the Delay of 3-D Clock Paths
The variation of the delay of 3-D clock paths due to the D2D process variations is the sum of the variations of the buffer delay in all the planes,
where N is the number of the planes that the clock tree spans. ∆D = N (0, (
where n s(j) is the number of buffers located in plane j along this clock path and d for any j = k. Consequently, according to (9) , the distribution of ∆D D2D s is also a Gaussian distribution and the resulting PDF is
).
The distribution of ∆Ds is approximated as a Gaussian distribution through (8) and (11). Some of the above expressions are widely used in modeling process variations. Based on these expressions, a model for the skew variation of 3-D clock trees is proposed in the following section
Clock Skew Distribution in 3-D Clock Trees
For a 3-D clock tree with S sinks distributed in N planes, the skew variation ∆m sk between sink s and sink k is ∆m sk = ∆m 
Skew Model of 3-D Clock Trees with WID Variations
WID variations affect the delay of each buffer independently. According to (8) , the distribution of ∆m
where n s,k is the number of the buffers shared by the clock paths ending at sinks s and k, respectively, as depicted in Fig. 5 .
Skew Model of 3-D Clock Trees with D2D Variations
Unlike WID variations, the skew fluctuation caused by D2D variations is not obtained by adding independent random variables. The correlation between every two terms in the expression of ∆m D2D sk can be one or zero (i.e., fully correlated or uncorrelated, respectively). The skew variation ∆m D2D sk can be written as the sum of the terms in different planes,
where ∆d D2D s(j,i) is the D2D delay variation related to the i th buffer on the j th plane along the clock path ending at sink s. The number of buffers in j th plane along this path is denoted as n s(j) .
In (16) , all the random variables are equally affected by the D2D variations in plane j, which means that the correlation between every two variables is one. Since ∆d is also described by a Gaussian distribution,
According to (13) through (19), the skew variation ∆m sk between j and k in a 3-D clock tree is modeled as a Gaussian distribution,
This model is used in the following section to investigate the skew variation in 3-D ICs and scaled 2-D ICs for different number of planes and technology nodes, respectively.
SIMULATION RESULTS
Based on the analysis results in Section 4.3, a global H-tree topology is investigated highlighting the impact of process variations on the clock skew in scaled 2-D and 3-D circuits. The accuracy of the variation model is verified in Section 5.1. Simulation results for scaled 2-D ICs and 3-D ICs with a different number of planes are presented and compared in Section 5.2. The behavior of the skew variation for these design paradigms is also discussed.
Accuracy of the Clock Skew Variation Model for 3-D ICs
The proposed skew variation model is compared with MonteCarlo simulations in this section. The circuit used for this purpose is an H-tree clock distribution network. This Htree is assumed to be placed in a circuit with area 10 mm × 10 mm.
The circuit is assumed to be implemented at a 90 nm CMOS technology. The parameters of the interconnects are obtained from the PTM 90 nm interconnect model [26] . The clock buffers consist of two inverters connected in series based on a UMC 90 nm library [21] . The circuit parameters used in the following sections are listed in Table 2 . rint and Figure 6 : The H-tree used to examine the accuracy of skew variation model. cint are the interconnect resistance and capacitance per unit length, respectively. The physical and electrical characteristics of the TSVs are also listed in Table 2 and are based on data reported in [24] . The diameter, length, and pitch of TSVs are, respectively, notated as tsv , ltsv, and ptsv. The topology of the H-tree used to examine the accuracy of the skew variation model is illustrated in Fig. 6 . The clock source is located at the center of the chip. There are eight clock sinks, labeled from one to eight. The clock buffers, which are marked with , are inserted following the technique described in [22] . The wire segments between two buffers are simulated using a standard π model. Cadence Spectre is used for the Monte-Carlo simulations [23] . The Monte-Carlo simulation is repeated 1000 times.
Skews m1,2 and m1,8 are measured to illustrate the accuracy of the adopted model. The curves of cumulative distribution functions (CDF) from Spectre and the proposed skew variation model are shown in Fig. 7 .
The difference between the resulting standard deviation σ of Spectre simulation and the skew variation model is within 10% between any pair of sinks in the investigated clock tree. As shown in Figs. 7(a)-7(b) , the distribution of the clock skew determined by the skew variation model has a reasonable accuracy as compared with Monte-Carlo simulations.
Skew Variation of Scaled 2-D and 3-D Clock Trees
The skew variation for a typical global H-tree topology in scaled 2-D and 3-D ICs are investigated in this section. The global H-tree topology is shown in Fig. 1 . This Htree topology is selected among others [6, 12, 27] since this topology provides a fairer comparison between the planar and 3-D global clock trees. In this topology, the clock signal is propagated to the sinks in other planes by multiple TSVs.
Skew Variation of Clock Trees in Scaled 2-D ICs
For scaled 2-D ICs, since the investigated H-tree topology is symmetric, the number, size, and location of buffers along the paths are equal. The D2D variations, therefore, affect these clock paths equally. Consequently, only WID variations affect the variation of pair-wise skew.
The σm ij between each pair of sinks of a 2-D H-tree is illustrated in Fig. 8 . Since σm ij = σm ji , only half of the skew matrix is shown in Fig. 8 for clarity. In this example, the electrical parameters are identical to those parameters used in Section 5.1. The area of the circuit is 10 mm × 10 mm. There are 256 sinks in the clock tree. A sink can be either a local sub-tree, a clock mesh, or a cluster of registers. X and Y axes denote the indices of sinks ordered as described in Fig. 1 . As shown in Fig. 8 , σm ij changes with the location of sinks i and j. The σm ij between the sinks which do not share any segment of a clock path is the largest.
The 2-D H-tree is implemented at 90 nm, 65 nm, 45 nm, 32 nm, and 22 nm technology nodes to investigate the effect of process variations on clock skew with technology scaling. The number of sinks is considered to remain invariable throughout these nodes. The area of the circuit is assumed to scale with the feature size of the transistors. The electrical characteristics of the wires and buffers are obtained from ITRS [28] . The resulting largest σm ij of the 2-D clock tree at different technology nodes including process variations is illustrated in Fig. 9 , where two variation scenarios are investigated.
In Fig. 9 , variation 1 is based on the assumption that
, and
remain constant for all the technology nodes. Variation 2 is based on the assumptions that 3σL ef f = 10%L ef f and 3σV th = 30 mV [26] , where L ef f is the effective channel length and V th is the threshold voltage. As shown in Fig. 9 , σm ij decreases with the technology scaling for both scenarios. This situation is due to the decrease in σR b , σC b , σD b , and the decrease in the circuit area and, consequently, in wire length with technology scaling. The σm ij for variation 2 is larger than variation 1, since for variation 2,
increase with the decreasing feature size. This situation is due to the effect of process variations which is predicted to increase with technology scaling [2, 28] .
Skew Variation in 3-D Clock Trees
As described by (15)- (19) , for 3-D clock trees, the corresponding clock skew is additionally affected by D2D variations. The skew variation between the sinks in different planes vary spatially. The σm ij between each pair of sinks of a 3-D clock tree is illustrated in Fig. 10 .
In this example, a 3-D clock tree spanning eight planes is implemented using the topology shown in Fig. 1 . The electrical parameters of the wires are given in Section 5.1. There are 32 sinks in each plane and 256 sinks in total. Sinks 1 to 32 are located in the first plane and sinks 33 to 64 are located in the second plane, etc. As an example, consider the σ of the skew between sinks 3 and 4. This variance is determined by the z value of the point x = 3 and y = 4. As shown in Fig. 10 , σm ij depends on the location of sinks i and j. The σm ij between the sinks in the same plane is the smallest.
For 3-D ICs implemented at the same technology node, the clock buffers are inserted by uniform buffer insertion techniques under the same constraints of skew and slew rate. Rin and C l of each buffer, therefore, remain approximately the same. For a 3-D clock tree, as described by (14) , the σ m W ID (s,k) between two sinks decreases as the number of non- shared clock buffers (e.g., the buffers after the n s,k buffers in Fig. 5 As the number of planes increases, however, both the number of buffers connected to a TSV and the length of TSVs increase. The load of the buffers driving the TSVs increases, which results in the increase of σ d(i) , as described by (3) . This situation counteracts the benefit of decreasing the number of clock buffers in a plane. Consequently, for the 3-D H-tree, the distribution of pair-wise skew changes non-monotonically as the number of planes increases.
The maximum skew variation of 3-D clock trees spanning different planes is reported in Table 3 . The nominal value (µ) and the standard deviation (σ) of the maximum skew between the first and the topmost planes are reported in Table 3 . Note that µ depends upon the clock tree topology. In 2-D H-trees, µ is equal to zero. In 3-D H-trees, the TSVs introduce larger mean skew with the increasing number of planes. The variance is determined by process variations. The importance of process variations is described by µ/σ. The skew variation is shown to be dominant as µ/σ < 2%.
The maximum supported clock frequency fmax of a circuit is constrained by the skew [5] . Although fmax is typically determined by the critical path delay, the skew criterion is used here to offer a tangible explanation of the effect of process variations on the performance of circuits. The maximum tolerant skew is assumed to be 10% 1 fmax for the simulated 3-D clock trees. To achieve a timing yield higher than 99%, fmax should be smaller than 10%
, where mij(3σ) is the skew at the 3σ point from the mean value.
The fmax determined by the skew variation is reported in the last row of Table 3 . The reported fmax changes nonmonotonically with the number of planes, from 2.75 GHz to 3.74 GHz. Consequently, the maximum supported clock frequency of this 3-D clock tree can be improved up to 36% by selecting the proper number of planes. For a specific technology node, increasing the number of planes can decrease the effect of process variations as compared with a planar circuit, which is also indicated in [3] .
The skew variation due to technology scaling and 3-D integration is reported in Table 4 and illustrated in Fig. 11 . In Table 4 , the first column denotes the technology node and the first row denotes the number of planes of a 3-D circuit. As reported in Table 4 and shown in Fig. 11 , the skew variation decreases monotonically with technology scaling but non-monotonically with 3-D integration.
From Table 4 and Fig. 11 , the skew variation is shown to decrease more with technology scaling as compared to the increase in the number of planes for a 3-D circuit. For example, the skew variation along column two (scaled 2-D ICs) in Table 4 , decreases more as compared to the reduction in skew variation along the third row (multiplane circuit). Note, however, that the speed of the circuit as determined by the critical path delay increases with the number of planes more as compared with the scaled 2-D circuit as also discussed in [3, 17] . In addition, a multiplane circuit with a comparable or smaller skew variation can be used instead of a planar 2-D circuit at a smaller technology node. For example, a three plane 3-D IC at 65 nm exhibits similar skew variation with a planar circuit at 45 nm as listed in Table  4 . The same applies for a four plane 3-D IC at 90 nm and a planar circuit at 65 nm. At deep submicrometer nodes, however, directly increasing the number for a 3-D IC implemented at an older technology node cannot decrease the skew variation as effectively as a deeply scaled planar 2-D circuit.
CONCLUSIONS
The effect of process variations of devices on the clock skew of scaled 2-D ICs and 3-D ICs is analyzed in this paper. An accurate method to estimate the skew variation within both 2-D and 3-D clock trees considering the effect of the input slew on the clock buffers is presented. Employing the proposed skew model and using an industrial 90 nm CMOS technology as the baseline technology, skew variation is observed to decrease in different ways between technology scaling and 3-D integration.
A planar global H-tree and an extension of this topology in three dimensions are investigated. Although the skew variation in 2-D ICs with technology scaling decreases more than 3-D integration, for the investigated 3-D clock trees, the resulting maximum supported clock frequency is enhanced up to 36% by selecting the proper number of planes. 3-D inte-gration provides an alternative to offer high device density and low process-induced skew variation without the need of technology scaling. In addition, a combination of technology scaling and vertically integration produces the lowest skew variation in the clock distribution network.
