I. INTRODUCTION
AN OMNIPRESENT and challenging issue for synchronous digital circuits is the reliable distribution of the clock signal to the many hundreds of thousands of sequential elements distributed throughout a synchronous circuit [1], [2]. The complexity of this task is further exacerbated in 3-D integrated circuits (ICs) as sequential elements belonging to the same clock domain (i.e., synchronized by the same clock signal) can be located on multiple planes. Another fundamental issue in the design of clock distribution networks is low power consumption, since the clock network dissipates a significant portion of the total power consumed by a synchronous circuit [3], [4]. This constraint is stricter for 3-D ICs due to the higher power density and related thermal concerns. In 2-D circuits, symmetric interconnect structures, such as H- and X-trees, are widely utilized to distribute the clock signal across a circuit [2]. The symmetry of these structures permits the clock signal to arrive at the leaves of the tree at approximately the same time, resulting in synchronous data processing. Maintaining this symmetry within a 3-D circuit, however, is a difficult task.
An extension of an H-tree to three dimensions does not guarantee equidistant interconnect paths from the root to the leaves of the tree. The clock signal propagates through vertical interconnects, typically implemented by through silicon vias (TSVs) from the output of the clock driver to the center of the H-tree on the other planes. The impedance of the TSVs can increase the time for the clock signal to arrive at the leaves of the tree on these planes as compared to the time for the clock signal to arrive at the leaves of the tree located on the same plane as the clock driver. Furthermore, in a multiplane 3-D circuit, three or four branches can emanate at each branch point, as depicted in Fig. 1 . The third and fourth branches propagate the clock signal to the other planes within the 3-D circuit. Similar to a design methodology for a 2-D H-tree topology, the width of each branch is reduced by a third (or more) of the segment width preceding the branch point to match the impedance at that branch point [2] . This requirement, however, is difficult to achieve as the third and fourth branches are connected through a TSV.
Global signaling issues in 3-D circuits, such as clock signal distribution, have only recently been explored [5] - [9] . Recent studies consider thermal effects on buffered 3-D clock trees [10] and H-tree topologies [11] , [12] . No experimental characterization of 3-D clock distribution networks, however, has yet been presented. Measurements from a 3-D test circuit employing several clock distribution architectures are presented in this paper. The test circuit was fabricated by the MIT Lincoln Laboratories (MITLL) [13] , [14] .
The objective of the paper is to summarize the analysis of different 3-D clock distribution topologies for both skew (and therefore delay) and power consumption. Clock distribution networks of increasing asymmetry in a 3-D stack are also investigated. Analysis of the skew and power consumption provides enhanced understanding of the advantages and disadvantages of each topology, and aids in the design of the synchronous circuitry in 3-D integrated systems. In addition, the effect of the TSVs on distributing the clock signal within a stack of device planes is described. The effect that these structures have on propagating clock signals in a fabricated 3-D circuit is demonstrated and described for the first time in this paper.
In the following section, the design of the 3-D test circuit is described. A brief discussion of the MITLL process is provided in Section III. Experimental results and a discussion of the characteristics of the three clock distribution networks are presented in Section IV. Simulations of the clock distribution topologies including expressions modeling the 3-D via impedance are compared to experimental results in Section V. Some conclusions are offered in Section VI. The closed-form expressions characterizing the impedance of the 3-D via are provided in Appendix A, and the circuit parameters used to model the clock skew within the 3-D clock topologies are summarized in Appendix B.
II. DESIGN OF THE 3-D TEST CIRCUIT
The test circuit consists of three blocks. Each block includes the same logic circuit but implements a different clock distribution architecture. The total area of the test circuit is 3 mm × 3 mm, where each block occupies an approximate area of 1 mm². Each block contains about 30,000 transistors with a power supply voltage of 1.5 V. The design kit used for the implementation has been developed by North Carolina State University.¹ The common logic circuitry within each clock module is described in Section II-A, and the different clock distribution architectures are reviewed in Section II-B.
A. 3-D Circuit Architecture
The logic circuit common to the three blocks is described in this section. An overview of the logic circuitry is depicted in Fig. 2. The function of the logic is to emulate different switching patterns of the circuit and operating conditions for the clock distribution networks under investigation. The logic is repeated in each plane and includes three pseudorandom number generators (PNGs), a six-by-six bit crossbar switch, control logic for the crossbar switch, several groups of four-bit counters, and current loads.
¹ [Online]. Available: http://www.ece.ncsu.edu/erl/3DIC/pub
The pseudorandom number generators use linear feedback shift registers and XOR operations to generate a random 16-bit word every clock cycle once the generators are initialized [15]. The data flow in this circuit can be described as follows. After resetting the circuit, the PNGs are initialized and the control logic connects each input port to the appropriate output port. Since the control logic includes an eight-bit counter, each input port of the crossbar switch is successively connected every 256 clock cycles to each output port.
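The pseudorandom word generation described above can be illustrated with a minimal software sketch of a 16-bit linear feedback shift register. The tap positions (16, 14, 13, 11), the Fibonacci configuration, and the function names are assumptions chosen for illustration; the feedback polynomial used in the test circuit is not specified here.

```python
def lfsr16_step(state: int) -> int:
    """Advance a 16-bit Fibonacci LFSR by one clock cycle.

    Assumed taps (not taken from the test circuit): bits 16, 14, 13, 11,
    i.e., the polynomial x^16 + x^14 + x^13 + x^11 + 1.
    """
    # XOR the tapped bits to form the feedback bit.
    feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    # Shift right and insert the feedback bit at the MSB.
    return (state >> 1) | (feedback << 15)


def lfsr16_words(seed: int, count: int) -> list[int]:
    """Return `count` successive 16-bit words, one per clock cycle."""
    if not 0 < seed < (1 << 16):
        raise ValueError("seed must be a nonzero 16-bit value")
    words = []
    state = seed
    for _ in range(count):
        state = lfsr16_step(state)
        words.append(state)
    return words


def lfsr16_period(seed: int) -> int:
    """Count the clock cycles until the register returns to the seed."""
    state = lfsr16_step(seed)
    cycles = 1
    while state != seed:
        state = lfsr16_step(state)
        cycles += 1
    return cycles
```

With these assumed taps the register visits every nonzero 16-bit state before repeating, so consecutive words feeding the crossbar switch do not recur within 65,535 cycles.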
The output ports of the crossbar switch, each 16 bits wide, are connected to four 4-bit counters (see Fig. 2 ). Each of these counters is loaded with a four-bit word, counts upwards, and is loaded with a new word each time all four bits are equal to one. The MSB of each counter is connected to four current loads that are turned on when this bit is equal to one. Since the counters are loaded with random numbers through the crossbar switch, the current loads draw a variable amount of current during circuit operation. This randomness is used to mimic different switching patterns that can exist within a circuit.
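The behavior of a single counter and the associated current loads can be sketched as follows. This is a behavioral illustration only; the function names are hypothetical, and recycling the list of load words when it is exhausted is an assumption made to keep the example self-contained.

```python
from itertools import cycle


def counter_msb_trace(load_words, cycles):
    """Return the MSB of a reloadable four-bit up-counter for each cycle.

    The counter is loaded with a four-bit word, counts upwards, and is
    loaded with the next word once all four bits are equal to one.
    `load_words` is recycled if exhausted (an assumption of this sketch).
    """
    words = cycle(load_words)
    value = next(words) & 0xF
    trace = []
    for _ in range(cycles):
        trace.append((value >> 3) & 1)   # MSB drives the current loads
        if value == 0xF:
            value = next(words) & 0xF    # all ones: load a new word
        else:
            value += 1                   # otherwise count upwards
    return trace


def active_loads(trace, loads_per_counter=4):
    """Number of current loads switched on in each cycle (four per MSB)."""
    return [loads_per_counter * bit for bit in trace]
```

For example, loading the words 12 and 5 keeps the loads on for four cycles (12 through 15), off for three cycles (5 through 7), and on again for eight cycles (8 through 15), which shows how random load words translate into a variable current draw.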
The current loads are implemented with cascode current mirrors, as shown in Fig. 3 . The reference current is externally provided to control the amount of current drawn from the circuit. The gate of transistor M5 is connected to the MSB of a four-bit counter, shown in Fig. 3 as the sel signal. This additional device is used to switch the current sinks. The width of the devices shown in Fig. 3 is 600 nm and 2000 nm. Several TSVs are used to connect these circuits. For example, in each block, each PNG is placed on a different plane but at the same location within each plane. Vertical busses connect the output of the PNGs to the input ports of the switch. Additionally, interplane signals connect the current loads with the control signal, which is generated by the MSB of the 4-bit counters. Furthermore, the reset signal is distributed by the TSVs to each sequential element throughout the multiple planes.
Several capacitors are included in each circuit block and serve as an extrinsic decoupling capacitance which is implemented by MIM capacitors [16] . Additionally, each of the circuit blocks is supplied by separate power and ground pads (three pairs of power and ground pads per block) to ensure that each block can be individually tested. Furthermore, one pair of power and ground pads is connected to the pad ring to provide protection from electrostatic discharge and provide power and ground to the I/O drivers.
B. 3-D Clock Topologies
Several clock network topologies for 3-D ICs are described in this section. These architectures combine different topologies which are commonplace in 2-D circuits, such as H-trees, rings, and meshes [2] . Each of the three blocks includes a different clock distribution structure, which is schematically illustrated in Fig. 4 . The dashed lines depict vertical interconnects implemented by groups of through silicon vias. Multiple TSVs at the connection points between the clock networks are used to lower the resistance of the vertical path while enhancing reliability.
As shown in Fig. 4 , these topologies range from purely symmetric to highly asymmetric networks to investigate the different features of these topologies. A primary objective is to determine the effect of the TSVs on the clock skew. The symmetry of the H-tree topology should be sufficient if the effect of the TSVs is small (for this specific technology). Alternatively, load balancing the global rings may reduce the delay of the clock signal caused by the TSVs. Local meshes may be preferable since the distribution of the clock signal to the sinks is primarily limited within a physical plane. Stacks of TSVs subsequently connect the sinks on other planes through local rings. This topology offers the advantage of limiting most of the clock paths within one physical plane, while distributing the signal vertically to localized areas within neighboring planes.
The effect that these topological choices have on the clock skew, power dissipation, and signal slew are experimentally investigated. Since the clock signal is distributed in three dimensions, achieving equidistant signal propagation in a 3-D system is not straightforward. This task is further complicated by the different impedance characteristics of the vertical and horizontal interconnects. Consequently, the objective is to provide a global clock topology that produces sufficiently low skew (or predictable skew for delay compensation) within (intra-plane) and among (inter-plane) the planes of a 3-D circuit. The symmetry of an H-tree and the load balancing characteristics of rings and meshes are thereby exploited. Additionally, the power consumed by each 3-D clock architecture is considered due to the importance of thermal issues in 3-D circuits.
In each of the circuit blocks, the clock driver for the entire clock network is located on the second plane. The location of the clock driver is chosen to ensure that the clock signal propagates through identical vertical interconnect paths to the first and third planes, ideally resulting in the same delay. The clock driver is implemented with a traditional chain of tapered buffers [17], [18]. Additionally, buffers are inserted at the leaves of each H-tree in all three topologies. The width of the branches within the H-tree is halved at each branch point [19], with an initial width of 8 μm.
Note that in a 3-D circuit employing an even number of planes, the inherent symmetry of an H-tree topology in the vertical direction is not possible, increasing the inter-plane clock skew between specific planes, as depicted in Fig. 4. Therefore, for a 3-D technology supporting n physical planes, where n is even, symmetry along the vertical direction is not feasible. Placing the clock driver in plane n/2 or n/2 + 1 results in an increase in skew between the first and nth plane equal to the effective delay of the group of TSVs connecting two successive planes. This increase in delay can be compensated by placing fewer TSVs (with a higher effective impedance) in the vertical direction.
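The asymmetry for an even number of planes can be made concrete with a small sketch. It assumes, purely for illustration, that every group of TSVs between two successive planes contributes the same delay and that these delays add linearly; the function name and the delay unit are hypothetical.

```python
def top_bottom_skew(n_planes: int, driver_plane: int,
                    tsv_group_delay: float) -> float:
    """Skew between the first and last plane due to vertical asymmetry.

    Planes are numbered 1..n_planes. Assumes each TSV group between two
    successive planes adds the same delay (an idealization).
    """
    if not 1 <= driver_plane <= n_planes:
        raise ValueError("driver_plane must lie within the stack")
    hops_to_first = driver_plane - 1         # TSV groups down to plane 1
    hops_to_last = n_planes - driver_plane   # TSV groups up to plane n
    return abs(hops_to_last - hops_to_first) * tsv_group_delay
```

With three planes and the driver on the second plane, the two vertical paths are identical and the function returns zero. With four planes, placing the driver on either the second or the third plane leaves one unmatched TSV group, so the returned skew equals the delay of a single group.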
The architectures employed in the blocks are as follows.
Block A: All of the planes contain a four-level H-tree (i.e., equivalent to 16 leaves) with identical interconnect characteristics. All of the H-trees are connected through a group of TSVs at the output of the clock driver. Note that in Fig. 4(a) the H-tree on the second plane is rotated by 90° with respect to the H-trees on the other two planes. This rotation eliminates inductive coupling between the H-trees. All of the H-trees are shielded with two parallel lines connected to ground.
Block B: A four-level H-tree is included in the second plane. All of the leaves of this H-tree are connected by four TSVs to small local rings on the first and third planes, as illustrated in Fig. 4(b). As in Block A, the H-tree is shielded with two parallel lines connected to ground. Additional interconnect resources form local rings. Due to the limited interconnect resources, however, achieving a uniform mesh in each ring is difficult. Clock routing is constrained by the power and ground lines as only three metal layers are available on each plane [13], [14].
Block C: The clock distribution network for the second plane is a shielded four-level H-tree. Two global rings are utilized for the other two planes, as shown in Fig. 4(c).
Buffers are inserted to drive each ring, which are connected by TSVs to the four branch points on the second level of the H-tree. The rings on planes A and C are connected to the second level of the H-tree for two reasons: first, to avoid an unnecessarily long ring that would result in a significant capacitive load, and second, to maintain a ring with sides of equal length. Additionally, connecting the ring to the leaves at the perimeter of the H-tree results in a considerable difference in the load among the sinks of the tree, since only the outer leaves are connected to the ring. The registers in each plane are connected either directly to the rings on the first and third planes or are driven by buffers at the leaves of the H-tree on the second plane. With this arrangement, the balancing properties of the rings result in low interplane skew for the first and third planes. In addition, since the interplane path to these planes is the same for both planes, the skew between these two planes is low as compared to the H-tree topology in the second plane.
A primary objective of this paper is to evaluate the delay and power characteristics of different clock distribution architectures. A secondary and related objective is to analyze the characteristics of asymmetric topologies in 3-D systems. This objective poses several limitations on the power distribution network within each block. Power and ground rings at the periphery of each block are utilized. Although this architecture is not optimal, the structure is sufficiently small (approximately 1 mm²). The small size of the blocks does not cause a significant voltage drop across and among the planes. In addition, minimal noise is observed during circuit operation. The power and ground rings on each plane are connected by a large number of TSVs to lower the impedance of the vertical interconnections.
The local rings topology requires greater area for distributing power and ground in the first and third planes, where local rings distribute the clock signal. The resulting clock and power networks for planes A and C are illustrated in Fig. 5 for this specific block. In this topology, the power distribution network consists of a coarse mesh of power and ground lines [16] .
III. FABRICATION OF THE 3-D TEST CIRCUIT
The manufacturing process developed by MITLL for fully depleted silicon-on-insulator (FDSOI) 3-D circuits is summarized here [13], [14]. The MITLL process is a wafer-level 3-D integration process [20], [21]. An intermediate step of the fabrication process is illustrated in Fig. 6, where some salient features of this technology are also depicted. SOI technologies are particularly suitable for 3-D circuits, since the SOI device layers can be used for both monolithic [22] and wafer-level 3-D integrated systems. In the latter case, SOI is a better solution for 3-D circuits because the wafers can be etched more aggressively than in standard bulk CMOS technologies [23]. This capability is due to the high selectivity of the etching solutions, where a Si to SiO2 selectivity of 300:1 is possible [23], and results in significantly shorter through silicon vias, a critical component in 3-D systems. The primary obstacle for 3-D SOI technologies is the high thermal resistance of the oxide, which impedes the heat removal process.
As depicted in Fig. 6 , this process includes both face-to-face and face-to-back plane bonding. The TSV length, however, is not affected due to the aggressive etching feasible with this technology. Alternatively, employing a bulk CMOS technology can require TSVs of longer length due to the greater thickness of the silicon substrate [24] . In the context of clock skew, the presence of TSVs with different lengths increases the asymmetry in the vertical direction, requiring more careful design to balance the clock signal delay across the plane. The horizontal interconnect is partitioned into segments by the TSVs, which also affects the clock signal delay.
IV. EXPERIMENTAL RESULTS
The clock distribution network topologies of the 3-D test circuit are evaluated in this section. The fabricated circuit is depicted in Fig. 7(a) , where the individual blocks can be distinguished. A magnified view of one block is shown in Fig. 7(b) . Each block includes four RF pads for measuring the delay of the clock signal. The pad located at the center of each block provides the input clock signal. The clock input waveform is a sinusoidal signal with a dc offset, which is converted to a square waveform at the output of the clock driver. The remaining three RF pads are used to measure the delay of the clock signal at specific points on the clock distribution network within each plane. A buffer is connected at each of these measurement points. The output of this buffer drives the gate of an open drain transistor connected to the RF pad.
A clock waveform acquired from the topology combining an H-tree and global rings is illustrated in Fig. 8, demonstrating circuit operation at 1.4 GHz. The clock skew between the planes of each block is listed in Table I. In Table I, the delay of the clock signal from the RF input pad at the center of each block to the measurement point on a plane is labeled by that plane; for example, the plane A delay is the delay of the clock signal to the measurement point on plane A. The skew between two planes is the difference in the delay of the clock signal between the measurement points on those planes. The clock delay from the source node on the second plane to each leaf on the three separate planes is listed in Table II. Differences between the data listed in Tables I and II are due to the method of analysis for the clock skew and clock delay, respectively. The clock skew between each leaf is the average skew as determined from data samples collected at 10, 20, 40, 80, 160, 500, and 1000 MHz. The clock delay is the average root to leaf delay from data collected at frequencies of 500 and 1000 MHz.
For the H-tree topology, the clock signal delay is measured from the root to a leaf of the tree on each plane, with no additional load connected to these leaves. The skew between the leaves of the H-tree on planes A and C is effectively the delay of a stacked TSV traversing the three planes, transferring the clock signal from the target leaf to the RF pad on the third plane (plane C). A schematic of this topology including a path of the clock signal is shown in Fig. 9. The delay of the clock signal to the sink of the H-tree on the second plane is larger due to the additional capacitance coupled to that quadrant of the H-tree. This capacitance is intentional on-chip decoupling capacitance placed under the quadrant, increasing the measured skew between the second plane and each of planes A and C. This topology produces, on average, comparable skew to the global ring topology, and less skew than the local rings clock structure.
The measured and average slew for each block is reported in Table III. The measurements are for a clock frequency of 1 GHz, where the time resolution is sufficiently small to produce reasonable accuracy (1.22 ps). From the reported results, any undershoots during the falling edge increase significantly as compared to during the rise time. The mismatch between the size of the devices in the clock buffers also contributes to unbalanced clock edges, although from simulations, equal rise and fall times are demonstrated.
In the H-tree topology, each leaf of a tree is connected to only those registers located within the same plane. Allowing one sink of an H-tree to drive a register on another plane adds the delay of another TSV to the clock signal path, further increasing the delay. Consequently, the registers within each plane are connected to the H-tree on the same plane. Note that this approach does not require the data paths to be within one plane.
The clock skew among the planes is greater for the local ring topology as compared to the H-tree and global ring topologies, primarily due to the imbalance in the clock load for certain local rings. Indeed, this topology has only 16 tap points within the global clock distribution network; three times fewer than the H-tree topology illustrated in Fig. 4(a) . This difference can produce a considerable load imbalance, greatly increasing the local clock skew as compared to the local clock skew within the H-tree and global ring topologies. By inserting the local rings on planes A and C, connected to the 16 sinks of the H-tree on the second plane, the local clock skew is significantly larger than either the H-tree only, or H-tree with global ring topologies.
Consequently, a limitation of the local rings topology is that greater effort is required to control the local skew. The smaller number of sinks driven by the global clock distribution network increases the number of registers clocked by each sink. To better explain this situation, consider a segment of each topology, as shown in Figs. 10(a) and 10(b), respectively. For the H-tree topology, the clock signal is distributed from three sinks, one on each plane, to the registers within the circular area depicted in Fig. 10(a). Note that the radius of the circle on planes A and C is slightly smaller to compensate for the additional delay of the clock signal due to the impedance characteristics of the TSVs. The registers located within these regions satisfy specific local skew constraints. Alternatively, in the case of the local ring topology, the clock signal at the sinks of the H-tree on the second plane feeds registers on each of the three planes. Consequently, each sink of the tree connects to a larger number of registers as compared to the H-tree topology, as depicted by the shaded region in Fig. 10(b). Despite the beneficial effect of the local rings, load imbalances are more pronounced with this topology. Alternatively, the H-tree topology [see Fig. 4(a)] utilizes a significant amount of interconnect resources, dissipating greater power.
The clock distribution network with the global rings exhibits low skew for planes A and C, those planes that include the global rings. The objective of this topology is to evaluate the effectiveness of a less symmetric architecture in distributing the clock signal within a 3-D circuit. Although the clock load on each ring is non-uniformly distributed, the load balancing characteristic of the rings yields a relatively low skew between the planes. Since the clock distribution network on the second plane is implemented with an H-tree, the skew between adjacent planes is significantly larger than the skew between the top and bottom planes. Note that the sinks of the H-tree on plane B are located at a greater distance from the rings on planes A and C [see Fig. 4(c)] . A combination of H-tree and global rings, consequently, is not a suitable approach for 3-D circuits due to the difficulty in matching the distance that the clock signal traverses on each plane from the sink of the tree or the ring to the many registers distributed across a plane.
The measured power consumption of the blocks operating at 1 GHz is reported in Table IV . The local ring topology dissipates the lowest power. This topology requires the least interconnect resources for a global clock network, since the local rings are connected at the output of the buffers located on the last level of the H-tree on the second plane. In addition, this topology requires a small amount of local interconnect resources as compared to the H-tree and global rings topologies. Most of the registers are connected directly to the local rings. Alternatively, the power consumed by the H-tree topology is the highest, as this topology requires three H-trees and additional wiring for the local connections to the leaves of each tree. In addition, the largest number of buffers is included in this topology. This number is threefold as compared to the number of buffers used for the local ring topology. Finally, the global rings block consumes slightly less power than the H-tree topology due to the reduced amount of wiring resources used by the global clock network.
Although the local ring topology requires the least interconnect resources, a large number of TSVs is required for the interplane connections. Since the TSVs block all of the metal layers and occupy silicon area, the routing blockage increases considerably as compared to the H-tree topology. The global rings topology requires a moderate number of TSVs as only four connections between the vertices of the rings and the branch points of the H-tree are necessary.
Since 3-D integration greatly increases the complexity of designing an integrated system, a topology that offers low overhead during the design process of a 3-D clock distribution network is preferable. From this perspective, a potential advantage of the H-tree topology is that each plane can be individually analyzed. This approach is supported by the H-tree topology since the clock distribution network in each plane is exclusively connected to registers within the same plane. Alternatively, in the local ring topology, registers from all of the planes, which are connected to each sink of the tree on the second plane, all need to be simultaneously considered.
V. MODELS OF THE CLOCK DISTRIBUTION NETWORK TOPOLOGIES INCORPORATING THE 3-D VIA IMPEDANCE
Simulation of the fabricated clock distribution topologies incorporating the modeled electrical impedance of the interplane 3-D vias is described in this section. A comparison between the simulated and experimental results is also presented here. The electrical impedance of the 3-D vias is described for several diameters, lengths, dielectric thicknesses (bulk), and via-to-via spacings [25] , [26] . The extracted parameters are used in the closed-form expressions characterizing the 3-D via impedance [26] . These equations are used here to model the contribution of the 3-D vias to the delay and skew characteristics of the clock distribution topologies and are summarized in Appendix A.
In addition to characterizing the electrical parameters of the TSVs, the electrical characteristics of the clock distribution network on each plane are determined through numerical simulation. This set of simulations has been performed for the three widths used in the fabricated test circuit, and for five different lengths. Trend lines for the capacitance, dc resistance, 1 GHz resistance, dc self- and mutual inductance, and the asymptotic self- and mutual inductance approximate the electrical parameters of different length interconnect segments within the clock network. These simulations include two ground return paths spaced 2 μm from either side of the clock line. These return paths behave as ground for the electrical field lines emanating from the clock line, resulting in a more accurate estimate of the capacitance.
The electrical paths of the clock signal propagating from the root to the leaves of each plane for the H-tree clock topology [see Fig. 4(a)] are depicted in Fig. 11. The size of the source follower nMOS transistor and the dimensions of the clock buffers at the root, leaves, and output circuitry are included in Appendix B. The clock network on each plane is composed of 50 μm segments, where a π-model represents the electrical properties of each segment. These 50 μm segments model the distributed electrical properties of the interconnect. Similarly, when either meshes [see Fig. 4(b)] or rings [see Fig. 4(c)] are used on planes A and C (see Fig. 11), each 50 μm segment is replaced with an equivalent π-model to more accurately represent the single mesh and ring structure within the test circuit. Note that for the mesh structures, the clock signal is distributed to planes A and C from the leaves of the H-tree in plane B, while for the rings topology, the clock signal distributed to planes A and C is driven by buffers at the second level of the H-tree. The delay from the root to the leaves of each plane is included in Table V.
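The segmentation of a clock line into 50 μm π-models can be illustrated with a simplified first-order delay estimate. The sketch below is an RC-only Elmore approximation and therefore omits the inductance included in the models used in this paper; the per-unit-length values, driver resistance, load capacitance, and function names are hypothetical.

```python
def pi_ladder_elmore_delay(r_per_um, c_per_um, length_um,
                           r_driver, c_load, segment_um=50.0):
    """Elmore delay (s) of a line modeled as cascaded RC pi-segments.

    RC-only simplification: each segment places half of its capacitance
    at either end of its series resistance. Inductance is ignored.
    """
    n_segments = max(1, round(length_um / segment_um))
    seg_len = length_um / n_segments
    r_seg = r_per_um * seg_len
    c_seg = c_per_um * seg_len

    # Capacitance at each node: interior nodes collect two half-caps.
    node_caps = [c_seg] * (n_segments + 1)
    node_caps[0] = c_seg / 2
    node_caps[-1] = c_seg / 2 + c_load

    delay = 0.0
    upstream_r = r_driver            # resistance between source and node
    for k, cap in enumerate(node_caps):
        if k > 0:
            upstream_r += r_seg
        delay += upstream_r * cap
    return delay


def distributed_elmore_delay(r_total, c_total, r_driver, c_load):
    """Closed-form Elmore delay of the same driver, line, and load."""
    return r_driver * (c_total + c_load) + r_total * (c_total / 2 + c_load)
```

For a uniform line, the cascaded π-segments reproduce the closed-form Elmore delay regardless of the number of segments, so in this first-order view the segmentation matters mainly when the line is nonuniform or when inductance and waveform shape are of interest.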
The clock delays listed in Table V are compared with the measured values listed in Table II. Good agreement between the model and experimental data is shown. The percent error between the model and experimental clock delays is listed in Table VI. A maximum error of less than 10% is achieved for the clock paths within the H-tree topology. The larger errors shown in Table VI are due to the small time scale being examined. All of the values listed in Tables II and V are less than 550 ps; therefore, any small deviation in delay produces a large error.
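The sensitivity of the percent error to the time scale can be seen from a short calculation. The delay values used in the comments and in the usage note are hypothetical and serve only to illustrate the arithmetic; they are not taken from Tables II, V, or VI.

```python
def percent_error(modeled_ps: float, measured_ps: float) -> float:
    """Percent error of a modeled delay relative to the measured delay."""
    if measured_ps <= 0:
        raise ValueError("measured delay must be positive")
    return 100.0 * abs(modeled_ps - measured_ps) / measured_ps


# Hypothetical example: the same 20 ps deviation on two time scales.
short_path = percent_error(220.0, 200.0)     # deviation on a 200 ps delay
long_path = percent_error(2020.0, 2000.0)    # deviation on a 2000 ps delay
```

The same absolute deviation of 20 ps corresponds to an error ten times larger on the 200 ps path than on the 2000 ps path, which is why sub-550 ps delays magnify small modeling discrepancies.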

VI. CONCLUSION
The design of a clock distribution network for application to 3-D circuits is considerably more complex than the design of a 2-D clock distribution network. Three topologies to globally distribute a clock signal within a 3-D circuit have been evaluated. A 3-D test circuit, based on the MITLL 3-D IC manufacturing process, has been designed, fabricated, and measured and is shown to operate at 1.4 GHz. Clock skew simulations incorporating both numerical simulation and analytic expressions produce comparable results to the experimentally extracted clock skew measurements. The clock skew measurements indicate that a topology combining the symmetry of an H-tree on the second plane and global rings on the remaining two planes results in low clock skew in 3-D circuits while consuming a moderate amount of power. This structure, however, produces the largest root to leaf clock delay as compared to the other investigated topologies. Alternatively, for the H-tree and local rings topology, the lowest power is consumed. The performance characteristics of these topologies suggest that the target requirements should be considered when designing a 3-D clock distribution network.
APPENDIX A ELECTRICAL PARAMETERS OF A THROUGH SILICON VIA
The resistance of a 3-D via [26] is given by (1) as a function of the TSV length, the TSV radius, and the conductivity of tungsten. The effect of the skin depth is included in (2), which reduces the cross-sectional area of the 3-D via. An empirical constant is used to fit the 1 GHz resistance to the simulation data. The equations for the skin depth and the empirical constant are provided in (4)-(6).
For frequencies other than dc and 1 GHz, the values produced by (1)-(3) are adjusted by (7)
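The dependence of the via resistance on geometry and skin depth can be sketched with the textbook expressions for a cylindrical conductor. This is not the fitted closed-form model of [26]: the empirical fitting constant and the frequency adjustment are omitted, the tungsten conductivity is an assumed bulk value, and the function names and geometry are hypothetical.

```python
import math

MU_0 = 4e-7 * math.pi        # permeability of free space (H/m)
SIGMA_TUNGSTEN = 1.79e7      # assumed bulk conductivity of tungsten (S/m)


def skin_depth(freq_hz, sigma=SIGMA_TUNGSTEN):
    """Classical skin depth (m) of a nonmagnetic conductor."""
    return 1.0 / math.sqrt(math.pi * freq_hz * MU_0 * sigma)


def tsv_resistance_dc(length_m, radius_m, sigma=SIGMA_TUNGSTEN):
    """DC resistance (ohm) of a solid cylindrical via."""
    return length_m / (sigma * math.pi * radius_m ** 2)


def tsv_resistance_ac(length_m, radius_m, freq_hz, sigma=SIGMA_TUNGSTEN):
    """Resistance with the cross section reduced to a skin-depth annulus.

    No empirical fitting constant is applied (a simplification).
    """
    delta = skin_depth(freq_hz, sigma)
    if delta >= radius_m:
        # Current fills the whole cross section; same as the dc value.
        return tsv_resistance_dc(length_m, radius_m, sigma)
    area = math.pi * (radius_m ** 2 - (radius_m - delta) ** 2)
    return length_m / (sigma * area)
```

Under the assumed conductivity, the skin depth at 1 GHz is a few micrometers, so a via with a radius near 1 μm conducts through its full cross section in this simplified view, while a via with a radius of 10 μm shows an increased high frequency resistance.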
The inductance of a 3-D via is described by (8)-(11) [26]. These four equations express the self- and mutual inductance of the vias at both dc and high frequency. The expressions for the high frequency inductance represent the asymptotic value of the inductance. The range of frequencies for which the closed-form inductance expressions are valid is depicted in Fig. 12. The dc and high frequency self-inductance of a TSV is described by (8) and (9), respectively, while the dc and high frequency mutual inductance of a TSV is described by (10) and (11), respectively.
The inductance expressions are dependent on the length and radius of the TSV. The radius is replaced by the pitch for the expressions characterizing the mutual inductance between two TSVs. The first fitting parameter, used to adjust the partial self-inductance, approaches unity at dc and 0.94 at high frequencies with increasing aspect ratio (the ratio of the TSV length to the TSV diameter). The second fitting parameter, used to adjust the partial mutual inductance, is unity at dc and ranges between 0.49 and 0.93 at high frequencies with increasing aspect ratio [26]. The two parameters are given by (12) and (13), and (14) and (15), respectively. Each parameter is determined at dc and at high frequency [26].
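The role of the length, radius, and pitch in the inductance can be illustrated with the classical partial inductance expressions for straight cylindrical conductors (Rosa). These are unfitted textbook formulas, not the closed-form expressions of [26]; the fitting parameters discussed above are not applied, and the function names and geometry are hypothetical.

```python
import math

MU_0 = 4e-7 * math.pi   # permeability of free space (H/m)


def partial_self_inductance_dc(length_m, radius_m):
    """Classical dc partial self-inductance (H) of a straight round wire."""
    root = math.hypot(length_m, radius_m)
    return MU_0 / (2 * math.pi) * (
        length_m * math.log((length_m + root) / radius_m)
        - root + radius_m + length_m / 4   # length/4: internal inductance
    )


def partial_mutual_inductance(length_m, pitch_m):
    """Classical partial mutual inductance (H) of two parallel filaments.

    The radius of the self-inductance expression is replaced by the pitch.
    """
    root = math.hypot(length_m, pitch_m)
    return MU_0 / (2 * math.pi) * (
        length_m * math.log((length_m + root) / pitch_m)
        - root + pitch_m
    )
```

For a hypothetical via with a length of 8.5 μm and a radius of 1 μm, these expressions give a self-inductance of a few picohenries and a smaller mutual inductance at a 5 μm pitch, consistent with the expectation that the coupling decreases as the pitch increases.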
The capacitance of a bulk 3-D via [26] is given by (16) as a function of the TSV radius, the TSV length, the thickness of the dielectric surrounding the 3-D via, the depletion region depth of p-type silicon, and the electrical permittivities of silicon dioxide and silicon. The depletion region depth is dependent on the p-type silicon work function, which is expressed in terms of the intrinsic semiconductor concentration and the silicon doping concentration. Two fitting parameters adjust the capacitance for two physical factors: 1) the distance to the ground plane, and 2) the diminishing contribution of the upper portion of the 3-D via to the total capacitance relative to a ground plane below the via. These fitting parameters are given by (19) and (20).
The resistance, inductance, and capacitance expressions are compared to numerical simulations for the TSV structures used in the MITLL multiproject wafer (for 3-D via parameters of 1 μm, 8.5, and 5 μm) in Table VII. The equivalent electrical model of the TSV is shown in Fig. 13.
APPENDIX B CIRCUIT PARAMETERS TO MODEL THE CLOCK SKEW OF THE 3-D CLOCK TOPOLOGIES
The circuit parameters used to model the skew within the clock network are provided below. The dimensions of the buffer circuits at the root, leaves, and output circuitry are listed in Table VIII. Two sets of transistor widths are provided as each location is double buffered to maintain the same signal logic level. The dimensions of the ring and a single mesh are listed in Table IX. These lengths are composed of 50 μm long segments, and each segment is replaced with an equivalent π-model for a line width of 4 μm. The source follower nMOS transistor located in the output circuitry has a length of 180 nm and a width of 12 μm. The interconnect length connecting the output pads and output circuitry to the leaves on each of the three device planes varies from 0 to 150 μm depending upon the clock topology (line width of 2 μm), and is also represented by an equivalent π-model.
