An examination of large-scale stacking of 3D integrated ICs from a power-supply and thermal reliability perspective is presented. Noise characteristics and scaling issues related to through-silicon-via (TSV) size and pitch, as well as other characteristics of the power-supply topology, are included. Thermal simulations are carried out assuming the use of micro-fluidic heatsinks to cool systems with up to 46 integrated silicon tiers and power dissipation of up to 525 watts. Results indicate that these large systems are feasible given sufficient planning. Power-delivery-bump pitch is identified as the most important factor influencing IR-drop and dynamic noise. Contact resistance may also become a major limiting factor.
I. Introduction
The shift to the multi-core era has increased the demand on the memory bandwidth of high-performance processor systems. 3D integration of memory and processors has emerged as a potential solution to this memory bandwidth problem. However, several questions regarding 3D integration remain unanswered. In this work we study integration scaling from a power-supply and thermal perspective. As the number of tiers in a 3D system increases, the uppermost tiers are separated from the power-supply bumps by increasing resistance and inductance. These parasitics are introduced by the through-silicon vias (TSVs) that provide the vertical connections between tiers. Additionally, increasing integration quickly raises the volumetric power density of the IC stack.
The largest contributor to transient noise is the so-called "first droop" noise, which results from the interaction between the package inductance and the on-chip decoupling capacitance (decap) during sleep transitions. When power-gated circuits enter or leave a sleep state, the current demand on the power supply network changes abruptly. In 3D systems, the TSVs add their own inductance to that of the package.
Thermal effects are a major factor that must be considered when designing 3D systems. For the maximally integrated systems considered here, standard air-cooled heatsinks will be unable to cope with the power density. Recent work has focused on fabricating micro-fluidic channels [1] on the backs of 3D stacked ICs to remove heat using liquid coolants. The heat removal capacity of these systems is very promising. In this work we assume that micro-fluidic heatsinks are used to dissipate heat, and we design our power distribution model around this assumption.
This material is based upon work supported by the National Science Foundation under CAREER Grant No. CCF-0546382 and the Interconnect Focus Center (IFC).

Previous work on power supply issues related to 3D stacking has examined the problem mainly from a packaging perspective [2], [3]. Other works that have investigated the impact of 3D stacking have limited their scope to systems with only a small number of tiers [4], [5], [6]. In this work we perform detailed transient simulations of 3D stacks with up to 46 tiers and examine how scaling the components of the power distribution grid topology, as well as other components important to power integrity, affects maximally integrated 3D systems. Additionally, we present thermal scaling results for micro-fluidic-cooled stacks of the same systems.
II. TSV RLC Parasitic Modeling
The focus in this work is on TSV parasitics as they apply to the power distribution network. Power distribution TSVs are typically much larger than signal TSVs, because in the power distribution network it is more important to reduce parasitic inductance and resistance than to save area. For our base case, and for the limit study that follows, we assume copper TSVs with a square cross-section of 40 × 40 μm. For scaling, we consider TSVs ranging in size from 20 × 20 μm to 80 × 80 μm. The length of the conducting path of our TSVs, equal to the thickness of the die through which they pass, is assumed to be 15 μm for thinned dies and 150 μm for dies that contain micro-fluidic channels or are not substantially thinned.
A. Resistance Scaling
The resistance of a metal interconnect is calculated assuming a uniform current density. This assumption is valid in the power supply grid because there are no high-frequency signals that would make skin effects dominant, as could be the case in signal TSVs. The resistance of a TSV, $R_{TSV}$, is calculated as:

$$R_{TSV} = \frac{\rho\, l}{A}$$

where $l$ is the conducting path length, $\rho$ is the resistivity of the TSV material, and $A$ is the cross-sectional area of the conducting path.
In this work we assume that TSVs are made from copper, and we use a conservative resistivity estimate of 21 nΩ·m, which is intended to account for the increase in resistivity at elevated temperatures. Additionally, the TSVs we investigate are large enough that their resistivity should approach that of bulk copper. Table I lists the resistance of some typical TSVs in the power distribution networks we simulate. For thinned dies, the resistance can be very low, less than one milliohm.
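As a quick check of these magnitudes, the formula $R_{TSV} = \rho l / A$ can be evaluated directly. The sketch below is illustrative only: it assumes square copper TSVs with the conservative 21 nΩ·m resistivity stated above, and the helper name `tsv_resistance` is hypothetical.

```python
# Sketch: DC resistance of a square power-delivery TSV, R_TSV = rho * l / A,
# assuming uniform current density (no skin effect), as argued in the text.
RHO_CU = 21e-9  # conservative copper resistivity assumed in this work, ohm*m


def tsv_resistance(side_um: float, length_um: float, rho: float = RHO_CU) -> float:
    """Return the resistance in ohms of a TSV with a side_um x side_um cross-section."""
    side_m = side_um * 1e-6
    length_m = length_um * 1e-6
    return rho * length_m / (side_m * side_m)


if __name__ == "__main__":
    for side in (20, 40, 80):        # TSV sizes considered in the scaling study
        for length in (15, 150):     # thinned die vs. die with micro-fluidic channels
            print(f"{side}x{side} um, {length} um long: "
                  f"{tsv_resistance(side, length) * 1e3:.3f} mOhm")
```

For the 40 × 40 μm baseline this gives roughly 0.20 mΩ through a 15 μm thinned die and roughly 2.0 mΩ through a 150 μm die, and every thinned-die case stays below one milliohm.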
B. Inductance Scaling
The main cause of low-frequency first-droop power-supply noise is the interaction between the package inductance and the on-die capacitance. Adding large vertical TSVs for power delivery increases the inductance of the supply path and may therefore exacerbate this noise. The TSVs in the power-supply network have a pitch much larger than their length, so the mutual inductance between neighboring TSVs is small compared with their self-inductance.
The self-inductance of a TSV, $L_{TSV}$, with rectangular cross-section can be calculated approximately using the following equation [7]:

$$L_{TSV} = 2l\left[\ln\left(\frac{2l}{w+t}\right) + \frac{1}{2} - k\right]$$
where $k = f(w, t)$ and $0 < k < 0.0025$; $l$ is the TSV length and $w$ and $t$ are its two cross-sectional dimensions (width and thickness), all in cm; and $L_{TSV}$ is in nH. However, this formula breaks down when the length of the conducting path is smaller than the other two dimensions, as is the case for the TSVs in thinned dies.
C. Capacitance Scaling
TSV capacitance can improve the dynamic performance of the power supply network, so omitting it would be conservative. Nevertheless, we provide TSV capacitance values here for completeness. The capacitance of the power/ground TSVs comes solely from horizontally neighboring wires and TSVs, and it can be calculated by breaking the problem into several parts.
The capacitance of two TSVs placed in parallel is calculated as follows:
where $\varepsilon_{di}$ is the dielectric constant of silicon dioxide, $H_{TSV}$ is the TSV height, $H_{INT}$ is the height of the TSV covered by surrounding signal wires, $W_{TSV}$ is the cross-sectional dimension of the TSV, and $S_{TSV}$ is the separation between the TSVs. The coupling capacitance of two diagonally placed TSVs is calculated as follows:
where $K_{corner}$ is an empirically derived constant that depends on geometry and spacing. The capacitance of the TSV due to surrounding wires is calculated by:
where $m_{sw}$ is the number of metal layers and $C_{side,i}$ are the sidewall capacitances of the various sidewalls of the TSV. Finally, the total TSV capacitance is calculated as follows:
Detailed calculation of sidewall TSV capacitance is complicated and requires consideration of neighboring wires in many orientations [9]. The capacitance values used in our experiments were computed using the formulas above as well as with Raphael [8], and are shown in Table I.
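The self-inductance formula of Subsection B can be evaluated in the same way. This sketch assumes the Grover-type form $L_{TSV} = 2l[\ln(2l/(w+t)) + 1/2 - k]$ nH with dimensions in cm, where $k$ is a small aspect-ratio correction (defaulted to 0 here). It refuses inputs for which the bracketed term is non-positive, which makes the breakdown for short TSVs explicit. The function name is hypothetical.

```python
import math


def tsv_self_inductance_nh(length_um: float, width_um: float, thick_um: float,
                           k: float = 0.0) -> float:
    """Approximate self-inductance (nH) of a straight bar with rectangular cross-section.

    Assumes L = 2 l [ln(2 l / (w + t)) + 1/2 - k] with l, w, t in cm and 0 <= k < 0.0025.
    """
    l_cm, w_cm, t_cm = (d * 1e-4 for d in (length_um, width_um, thick_um))  # um -> cm
    bracket = math.log(2.0 * l_cm / (w_cm + t_cm)) + 0.5 - k
    if bracket <= 0.0:
        # The approximation is already unreliable once l is comparable to w and t;
        # a non-positive bracket means it has broken down completely.
        raise ValueError("TSV too short relative to its cross-section for this formula")
    return 2.0 * l_cm * bracket


if __name__ == "__main__":
    print(f"150 um TSV: {tsv_self_inductance_nh(150, 40, 40) * 1e3:.1f} pH")
    try:
        tsv_self_inductance_nh(15, 40, 40)  # thinned die
    except ValueError as err:
        print("15 um TSV:", err)
```

Under this assumed form, a 40 × 40 μm TSV through a 150 μm die comes out at about 55 pH, while the 15 μm thinned-die case falls outside the formula's range, as noted in the text.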
III. Many-tier Prototype System
The system targeted by this work consists of a multi-core processor with system memory (DRAM) integrated onto the same 3D stack, which also implies the integration of a memory controller. We chose the Intel Penryn [10] architecture as a baseline. In the past, system memory capacity scaled with processor speed; in the multi-core era it will more likely scale with the number of processor cores. Taking this into account, we assume a constant ratio of 2 GB of DRAM per processor core. Using the next-generation memory-density assumptions of Loh [11], we define a "set" of our scalable prototype system that consists of a single tier of processors and 8 tiers of DRAM. Each processor tier contains four cores and each DRAM tier contains 1 GB of storage, so each set preserves the 2 GB-per-core ratio. This set can be stacked any number of times, along with memory controllers, to create a scalable many-tier system. The system is assumed to have a footprint of approximately 300 mm².
Our initial target was to examine a system composed of a large number of tiers. Most previous works consider only two to four tiers, with a few considering up to eight. We chose to examine a system with approximately 50 tiers. Using our scalable prototype design, we choose five sets that, along with one memory controller tier, provide a system containing 46 tiers (5 × 9 + 1). Using the power consumption assumptions from the next section, this 46-tier system would consume approximately 525 watts. Unlike systems designed from the ground up for 3D integration, this prototype has a relatively low signal-TSV requirement: only the signals that are currently sent off-chip need to be sent between tiers. Because TSVs can be placed at a much finer pitch than off-chip bumps, this does not stress the limits of the technology.
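The tier and capacity arithmetic of the prototype can be written out explicitly. This sketch assumes, as in the 46-tier case, a single memory controller tier regardless of the number of sets; the function and parameter names are illustrative.

```python
def stack_summary(num_sets: int, dram_tiers_per_set: int = 8, cores_per_proc_tier: int = 4,
                  gb_per_dram_tier: int = 1, mc_tiers: int = 1) -> dict:
    """Tier count, core count and DRAM capacity of a stack built from identical sets."""
    tiers = num_sets * (1 + dram_tiers_per_set) + mc_tiers  # 1 processor tier + DRAM per set
    cores = num_sets * cores_per_proc_tier
    dram_gb = num_sets * dram_tiers_per_set * gb_per_dram_tier
    return {"tiers": tiers, "cores": cores, "dram_gb": dram_gb,
            "gb_per_core": dram_gb / cores}


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, stack_summary(n))
```

Five sets give 46 tiers, 20 cores and 40 GB of DRAM, preserving the 2 GB-per-core ratio.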
Our system design target contains many components that dissipate a large amount of power. Combined with the stacked nature of 3D technology, this makes the heat removal path through a standard air-cooled heatsink far too poor to support full-speed operation. Accordingly, we assume the use of efficient micro-fluidic heatsinks [1]. For our scalable prototype, we assume that each set contains a group of micro-fluidic channels fabricated onto the back of the processor tier. Typical 3D stacked circuits have dies thinned to very small thicknesses, in the range of 5 to 20 μm. However, to accommodate the size and mechanical stresses of micro-fluidic channels, the tiers that contain them are assumed to be 150 μm thick, while the others are 15 μm thick.
An important consideration when designing a large-scale 3D system is the organization of the different types of tiers. Different tier types have different power consumption, noise generation, and decoupling capacitor distribution profiles, all of which affect the thermal and noise profiles of the stack. In this work we consider three simple organization styles, detailed in Figure 3. In the cores-first style, the processor tiers are all placed next to the bumps. In the cores-spread style, the processor tiers are separated by the memory tiers associated with each set. Finally, the cores-last style places the processor tiers closest to the air-cooled heatsink but farthest from the bumps.
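One compact way to pin down the three organization styles is to list tier types from the bump side upward. This sketch is a hypothetical illustration: it omits the memory controller tier, whose position is not specified here, and it assumes that in the cores-spread style each processor tier sits directly below the eight DRAM tiers of its own set.

```python
def tier_order(style: str, num_sets: int = 5, dram_per_set: int = 8) -> list:
    """Return tier labels ('P' = processor, 'D' = DRAM) starting at the bump side."""
    if style == "cores-first":
        return ["P"] * num_sets + ["D"] * (num_sets * dram_per_set)
    if style == "cores-last":
        return ["D"] * (num_sets * dram_per_set) + ["P"] * num_sets
    if style == "cores-spread":
        order = []
        for _ in range(num_sets):
            order += ["P"] + ["D"] * dram_per_set  # each set: processor tier, then its DRAM
        return order
    raise ValueError(f"unknown organization style: {style}")


if __name__ == "__main__":
    for style in ("cores-first", "cores-spread", "cores-last"):
        order = tier_order(style)
        print(style, [i for i, t in enumerate(order) if t == "P"])
```

The processor-tier indices make the difference plain: 0 to 4 for cores-first, every ninth tier for cores-spread, and 40 to 44 for cores-last.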
IV. Modeling
Our scaling studies target a combined multicore processor and DRAM system. The final power-supply-noise model includes the power supply grid and the power consumption maps for the processor, memory, and memory controller tiers. We also include uniformly distributed decoupling capacitors (decaps). The decaps on the memory tiers are assumed to have one quarter the density of those on the processor and memory controller tiers.
A. Power Maps
The processor power map used in this work is based on the Intel Penryn architecture. To create this map, we examined a publicly released die photo of the 45 nm Penryn and produced a floorplan based on the work of Puttaswamy and Loh [12]. We then divided the floorplan by area into its logical pipeline stages and generated a power distribution by dividing a total power dissipation of 54 watts for a dual-core version among the component stages of the two processors. The percentage breakdown for each stage is based on the numbers published by George et al. of Intel [10].
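Turning a per-stage power breakdown into a power map amounts to dividing each stage's share of the total power by its floorplan area. The sketch below shows only that arithmetic. The stage names, fractions and areas in the usage example are hypothetical placeholders, not the Penryn breakdown of George et al. [10].

```python
def stage_power_density(total_power_w: float, fractions: dict, areas_mm2: dict) -> dict:
    """Return power density (W/mm^2) per stage from its share of total power and its area."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("power fractions must sum to 1")
    return {stage: total_power_w * frac / areas_mm2[stage]
            for stage, frac in fractions.items()}


if __name__ == "__main__":
    # Hypothetical illustration values only.
    fractions = {"front_end": 0.2, "execution": 0.5, "cache": 0.3}
    areas = {"front_end": 10.0, "execution": 20.0, "cache": 40.0}
    print(stage_power_density(54.0, fractions, areas))  # 54 W dual-core budget from the text
```

Each density value is then applied uniformly to the grid nodes covered by that stage, so the map integrates back to the chosen total power.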
The power map for the DRAM tiers has two parts. For worst-case noise behaviour, we model the IDD07 condition [13] (all banks interleaved read current), which is widely acknowledged as the condition that generates the most power-supply noise. For static IR-drop calculations, we use an average system power value generated from Micron's system design power calculation spreadsheet. In both cases, the values are projected from currently available DDR3 data to the memory size modeled in our targeted processor+DRAM system. The values used in this work for the DRAM tiers are provided in Table II.
The power map for the memory controller tier is based on data sheets released for various north bridge chips, because very little power consumption information is available for memory controllers alone. We conservatively estimate that 50% of the average power consumption of a north bridge comes from the memory controller, and that a single memory controller occupies about 50% of the total chip area. The values used in this work for the memory controller tier are shown in Table II.
B. Power Grids
The scaling studies presented in the next section vary many of the design parameters of the power grid. The values presented in this section represent the baseline case.
The power grid for the processor tiers has two levels of granularity. We assume a flip-chip package with ball grid array chip connections. The power/ground TSVs are connected directly to the power/ground balls and travel vertically through the stack to the top tier. The TSVs are connected to one another in a grid pattern using thick 10 μm wires. Within this coarse mesh is a fine-grained mesh for local power delivery, with 20 small 5 μm wires in each direction for each of the power and ground grids. The power-to-power pitch of the bumps, TSVs, and coarse grid is 400 μm, and the ground grid is offset from the power grid by 200 μm in each direction. This arrangement allows the power TSVs to accommodate the baseline size and pitch of the micro-fluidic channels. The memory controller tiers have the same power distribution grid as the processor tiers.
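The baseline bump/TSV placement follows from the pitch and offset alone. This sketch assumes the power grid starts at the die corner, which the text does not specify, and uses a hypothetical function name. With a 400 μm pitch and a 200 μm offset, each ground TSV sits at the center of a square of four power TSVs.

```python
import math


def pg_tsv_sites(die_w_um: float, die_h_um: float, pitch_um: float = 400.0,
                 offset_um: float = 200.0):
    """Return (power_sites, ground_sites) as lists of (x, y) coordinates in micrometres."""
    def grid(x0: float, y0: float) -> list:
        nx = math.ceil((die_w_um - x0) / pitch_um)  # sites with x0 + i * pitch < die width
        ny = math.ceil((die_h_um - y0) / pitch_um)
        return [(x0 + i * pitch_um, y0 + j * pitch_um) for j in range(ny) for i in range(nx)]

    return grid(0.0, 0.0), grid(offset_um, offset_um)


if __name__ == "__main__":
    power, ground = pg_tsv_sites(2000.0, 2000.0)
    print(len(power), len(ground), power[:2], ground[:2])
```

For a square die of roughly 300 mm² (about 17.3 mm on a side), the same function yields 44 × 44 = 1,936 power TSV sites.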
The DRAM tiers also have a two-level power grid and share the same coarse grid as the processor tiers. However, because of the lower power requirements of the DRAM tiers, we assume that their fine-grained power distribution wires are at half the density of those on the processor tiers; that is, there are 10 small 5 μm wires for each of the power and ground grids. This effectively gives the DRAM tiers one quarter of the power/ground metallization of the processor tiers.
C. Circuit Models
To model the 3D power distribution grid, we assume that TSVs and package bumps have parasitic resistance, capacitance, and inductance, while the 2D distribution grid on each tier is purely resistive. At each node, a capacitor (representing decap) and a current source (representing the current demand of the transistors) connect the power and ground grids. Each current source is modeled as a ramp from 0 to the current demand of the module that covers that area of the floorplan. The rise-time of the ramp depends on the type of tier on which the current source is located: 5 ns for processor tiers and 20 ns for memory tiers in our baseline.
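The ramp current sources and the first-droop mechanism can be illustrated with a single lumped loop: an ideal supply feeds a decap node through a series resistance and inductance (standing in for the package, bumps and TSVs), and the node is loaded by a ramp current. This is a hypothetical toy model, not the distributed grid simulated in this work. All parameter values are illustrative assumptions, and the loop is integrated with a semi-implicit Euler step.

```python
def ramp_current(t: float, i_final: float, rise_time: float) -> float:
    """Current that ramps linearly from 0 at t = 0 to i_final at t = rise_time."""
    if t <= 0.0:
        return 0.0
    return i_final * min(t / rise_time, 1.0)


def simulate_droop(rise_time: float, vdd: float = 1.0, r: float = 1e-3, l: float = 50e-12,
                   c: float = 500e-9, i_final: float = 10.0, t_end: float = 1e-6,
                   dt: float = 1e-11):
    """Return (minimum, final) decap-node voltage for a lumped R-L supply path with decap C."""
    i_l, v = 0.0, vdd  # start from the quiescent state (no load current)
    v_min = v
    for n in range(int(round(t_end / dt))):
        t_next = (n + 1) * dt
        # Semi-implicit Euler: update the inductor current with the old node voltage,
        # then update the node voltage with the new inductor current.
        i_l += dt * (vdd - r * i_l - v) / l
        v += dt * (i_l - ramp_current(t_next, i_final, rise_time)) / c
        v_min = min(v_min, v)
    return v_min, v


if __name__ == "__main__":
    for tr in (5e-9, 20e-9):  # processor-tier and memory-tier rise-times from the text
        v_min, v_end = simulate_droop(tr)
        print(f"rise {tr * 1e9:.0f} ns: peak droop {1.0 - v_min:.3f} V, "
              f"settled drop {1.0 - v_end:.4f} V")
```

With these illustrative values, the 5 ns ramp produces a peak droop of roughly 0.1 V and the 20 ns ramp a noticeably smaller one. Both are well above the 10 mV steady-state IR-drop across the series resistance, which is the first-droop behavior described in Section I.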
D. Power Noise Simulation
Power grid simulation is a major topic of study because of the size of the networks involved: typical power grids contain on the order of millions of nodes, and most commercial and free circuit simulators cannot efficiently simulate circuits of this size. To deal with this problem, we created a custom circuit simulator based on Modified Nodal Analysis (MNA) [14], with a modification based on Domain Decomposition (DD) [15].
Domain Decomposition is a technique used mainly for solving partial differential equations. For circuit analysis, the basic idea is to partition the circuit into domains based on connectivity and to define an interface between all of the domains. In our simulations, the TSVs provide a natural (and low-node-count) interface boundary. The MNA matrix equations for the individual domains are solved and then combined with the matrix equations for the interface to arrive at the final solution.
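The domain-decomposition idea can be made concrete with a small dense sketch. After the MNA conductance matrix is assembled, ordering the unknowns as domain interiors followed by interface (TSV) nodes gives a bordered block-diagonal system in which each domain block $A_d$ couples to the interface only through $B_d$ and $C_d$. Eliminating the domains leaves a small Schur-complement system $S = A_\Gamma - \sum_d C_d A_d^{-1} B_d$ for the interface voltages, after which each domain is recovered independently. With $k$ equal domains of $n/k$ nodes, dense factorization costs about $k(n/k)^3 = n^3/k^2$ plus the small interface solve. This is the textbook Schur-complement form of DD, assumed here for illustration; the authors' simulator may differ in detail, and the 7-node network, function names and pure-Python solver are hypothetical.

```python
def solve(a, b):
    """Solve A X = B by Gaussian elimination with partial pivoting (A: n x n, B: n x m)."""
    n, m = len(a), len(b[0])
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for k in range(col, n + m):
                aug[r][k] -= f * aug[col][k]
    x = [[0.0] * m for _ in range(n)]
    for i in reversed(range(n)):
        for j in range(m):
            s = aug[i][n + j] - sum(aug[i][k] * x[k][j] for k in range(i + 1, n))
            x[i][j] = s / aug[i][i]
    return x

def sub(g, rows, cols):
    """Extract the sub-matrix of g with the given row and column node indices."""
    return [[g[r][c] for c in cols] for r in rows]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def stamp(g, n1, n2, cond):
    """MNA stamp of a conductance between n1 and n2 (n2=None means the reference node)."""
    g[n1][n1] += cond
    if n2 is not None:
        g[n2][n2] += cond
        g[n1][n2] -= cond
        g[n2][n1] -= cond

def dd_solve(g, rhs, domains, interface):
    """Solve G v = rhs via a Schur complement on the interface nodes.

    Assumes each conductance connects nodes within one domain or a domain to the interface.
    """
    schur = sub(g, interface, interface)
    s_rhs = [[rhs[i]] for i in interface]
    local = []
    for dom in domains:
        a_d, b_d, c_d = sub(g, dom, dom), sub(g, dom, interface), sub(g, interface, dom)
        # One elimination gives both A_d^-1 B_d and A_d^-1 b_d.
        sol = solve(a_d, [b_d[i] + [rhs[node]] for i, node in enumerate(dom)])
        ainv_b = [row[:-1] for row in sol]
        ainv_r = [[row[-1]] for row in sol]
        cb, cr = matmul(c_d, ainv_b), matmul(c_d, ainv_r)
        for i in range(len(interface)):
            s_rhs[i][0] -= cr[i][0]
            for j in range(len(interface)):
                schur[i][j] -= cb[i][j]
        local.append((dom, ainv_b, ainv_r))
    v_if = solve(schur, s_rhs)  # interface voltages first
    v = [0.0] * len(g)
    for i, node in enumerate(interface):
        v[node] = v_if[i][0]
    for dom, ainv_b, ainv_r in local:  # back-substitute each domain independently
        corr = matmul(ainv_b, v_if)
        for i, node in enumerate(dom):
            v[node] = ainv_r[i][0] - corr[i][0]
    return v

def example_network():
    """Hypothetical IR-drop network: node 0 is a TSV/bump node feeding two 3-node wire chains."""
    g = [[0.0] * 7 for _ in range(7)]
    stamp(g, 0, None, 10.0)  # bump/TSV path to the ideal supply (reference)
    for a, b in [(0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6)]:
        stamp(g, a, b, 1.0)  # 1 S wire segments
    return g, [0.0, 0.0, 0.0, 0.1, 0.0, 0.0, 0.1]  # 0.1 A loads at the chain ends

if __name__ == "__main__":
    g, rhs = example_network()
    print([round(x, 4) for x in dd_solve(g, rhs, [[1, 2, 3], [4, 5, 6]], [0])])
```

The DD result matches a direct solve of the full matrix: a 20 mV drop at the shared TSV node and 0.32 V at the far end of each chain.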
Because matrix inversion is an $O(n^3)$ operation, solving several smaller domain systems and a small interface system takes much less total time than inverting the full matrix. Using MNA with DD, we have simulated networks containing up to 11 million nodes. Unfortunately, our full-size 46-tier system contains over 25 million nodes. To simulate our largest system efficiently, we therefore limit our studies to an area that covers one core. This area contains one core per processor tier, as well as one memory controller and the active areas of the DRAM tiers. Our experiments indicate that this introduces a small error of approximately 5%. However, the error is systematic and should not affect the conclusions of the scaling studies.
This work focuses on a conservative approximation of the power noise of a given system. Accordingly, our dynamic noise simulations represent an extreme worst-case scenario: we simulate the power noise generated when all processors in the system wake from a sleep state simultaneously while every DRAM tier is continuously in the interleaved-read state.
E. Thermal Model
The three-dimensional thermal model of Koo et al. [16] is modified for our work to consider the lateral temperature and flow-rate distributions caused by a non-uniform power/heat flux distribution. The temperatures of both the fluid and the solid are assumed to be uniform within each grid point. The thermal conductivity of the oxide-metal layer is conservatively assumed to be the same as that of the oxide layer. Because these layers are very thin and have low thermal conductivity compared with the silicon layers, heat transfer through them is negligible, which validates this assumption. For heat transfer and fluid flow in the micro-fluidic channels, the energy and momentum conservation equations for the fluid and the energy conservation equation for the solid are established as Equations (5), (6), and (7), respectively.
where $T_w$ and $T_f$ are the temperatures of the solid and fluid, respectively, and $\dot{m}$, $i$, and $h_{conv}$ are the mass flow rate, enthalpy, and convective heat transfer coefficient of the fluid, respectively. In the micro-fluidic channel, heat is transferred directly only to the channel base; the channel walls are modeled as fins attached to the base. Thus, the overall surface efficiency ($\eta_o$) is adopted to characterize the array of fins together with the base surface. The micro-fluidic channel geometry is described by the channel perimeter $P$ and width $w$, and the irregular placement of the micro-fluidic channel layers is also incorporated into the set of thermal equations so that the effect of the number of micro-fluidic channel layers can be evaluated.
Equation (5) states that the fluid enthalpy gradient is driven by the convective heat transfer, $q_{conv}$, which results from the temperature difference between the solid and the fluid as well as from the convective motion of the fluid. The pressure drop across the micro-fluidic channel is obtained from the fluid momentum conservation equation (6), where $P$, $G$, and $\rho$ are the pressure, mass flux, and density of the fluid, $f$ is the fluid friction factor, and $d_h$ is the hydraulic diameter of the micro-fluidic channel. Equation (7) is the three-dimensional heat conduction equation, which has two source terms: the heat generated in the active and oxide-metal layers, $q_g$, and the convective heat transfer, $q_{conv}$. Finally, $k$ is the thermal conductivity of the solid, in this case silicon.
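To illustrate the structure of Equation (5) and its upwind discretization, the sketch below marches the fluid temperature along a single channel whose wall temperature is held fixed. It assumes constant specific heat (so $di = c_p\,dT_f$), a convective heat flow per unit channel length of $\eta_o h_{conv} P (T_w - T_f)$, and channel walls treated as straight rectangular fins with adiabatic tips. Under those assumptions the textbook fin efficiency is $\eta_f = \tanh(mH)/(mH)$ with $m = \sqrt{2h_{conv}/(k\,t_{wall})}$, and $\eta_o = 1 - (A_{fin}/A_{total})(1 - \eta_f)$. All numerical values and function names are hypothetical. In the full model $T_w$ itself depends on Equation (7), which is why the coupled discrete system must be solved iteratively; here a single march suffices, and the exact solution $T_f(z) = T_w - (T_w - T_{in})\exp(-\eta_o h_{conv} P z / (\dot{m} c_p))$ serves as a check.

```python
import math


def fin_efficiency(h_conv, k_solid, wall_thickness, fin_height):
    """Straight rectangular fin with an adiabatic tip: eta_f = tanh(mH) / (mH)."""
    m = math.sqrt(2.0 * h_conv / (k_solid * wall_thickness))
    mh = m * fin_height
    return math.tanh(mh) / mh


def overall_efficiency(eta_fin, fin_area, total_area):
    """Overall surface efficiency of a finned surface: 1 - (A_f / A) (1 - eta_f)."""
    return 1.0 - (fin_area / total_area) * (1.0 - eta_fin)


def march_fluid_upwind(t_in, t_wall, m_dot, cp, h_conv, perimeter, eta_o, length, n_cells):
    """Implicit first-order upwind march of m_dot*cp*dT/dz = eta_o*h*P*(T_wall - T)."""
    dz = length / n_cells
    a = m_dot * cp / dz             # advective coupling to the upstream cell
    b = eta_o * h_conv * perimeter  # convective exchange per unit length
    temps = [t_in]
    for _ in range(n_cells):
        temps.append((a * temps[-1] + b * t_wall) / (a + b))
    return temps


if __name__ == "__main__":
    # Hypothetical channel: 100 um wide, 100 um tall walls 50 um thick, in silicon.
    w, h_ch, wall_t = 100e-6, 100e-6, 50e-6
    h_conv, k_si = 3.0e4, 148.0
    eta_f = fin_efficiency(h_conv, k_si, wall_t, h_ch)
    perimeter = w + 2 * h_ch                           # heated base plus two walls
    eta_o = overall_efficiency(eta_f, 2 * h_ch, perimeter)
    temps = march_fluid_upwind(20.0, 60.0, 1e-5, 4180.0, h_conv, perimeter, eta_o, 0.01, 2000)
    ntu = eta_o * h_conv * perimeter * 0.01 / (1e-5 * 4180.0)
    exact = 60.0 - 40.0 * math.exp(-ntu)
    print(f"eta_f={eta_f:.3f} eta_o={eta_o:.3f} upwind={temps[-1]:.3f} exact={exact:.3f}")
```

Because each cell only looks upstream, the march is unconditionally stable and the fluid temperature rises monotonically toward, but never past, the wall temperature.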
Deionized water is assumed to be the working fluid and to remain single-phase inside the micro-fluidic channels. The governing Equations (5), (6), and (7) are integrated and then discretized using the upwind scheme [17]. The discretized equations are solved simultaneously and iteratively using successive under-relaxation (SUR) to handle the non-linearity. The properties of water are determined using REFPROP 6.0 [18].
V. Experimental Results
Figure 4 shows the voltage drop as a function of time for our five-set cores-first organization. Each line in the graph represents the noise at one point on one tier. The figure shows several interesting features. The noise on the core tiers has a much larger amplitude than that on the memory tiers, and its frequency is also higher, corresponding to the 5 ns rise-time of the core tiers. However, the voltage drop of the system as a whole follows the 20 ns rise-time of the memory tiers.
A. Power Noise Scaling Results
For clarity of presentation, all results in this section relate to the cores-first organization style; results for the other organization styles are presented in Subsection V-B. Figures 5 and 6 show the effect on dynamic noise and IR-drop of scaling the TSV dimension from 20 × 20 μm to 80 × 80 μm. Larger TSVs have lower resistance and inductance, as shown in Section II. The TSV dimension has a much stronger effect on IR-drop than on dynamic noise, and its effect becomes increasingly pronounced as the number of sets increases. Figures 7 and 8 show the effects of simultaneously changing the TSV and bump pitch on dynamic noise and IR-drop. Both are highly sensitive to the pitch of the TSVs and bumps: for five sets, increasing the pitch from 200 μm to 400 μm doubles the dynamic noise.
Alignment of power/ground bumps and TSVs is an important factor in power-supply noise and IR-drop. Figure 9 shows the percentage change in IR-drop as a function of the offset between the TSV stacks and the bumps, for 40 × 40 μm TSVs. Whenever the offset is nonzero, all the current flowing through the TSVs must detour through the relatively small wires of the power distribution grid. The graph shows that misaligning TSVs from the bumps should be avoided: for the five-set stack, the maximum IR-drop increases by 160% at maximum misalignment.
Most of our results assume that the contact resistance between stacked TSVs is much smaller than the resistance of the TSVs themselves. This corresponds to TSVs that are manufactured as a single unit (pillar) or are bonded very tightly. To examine the effects of this assumption, we performed experiments that vary the contact resistance. Figure 10 shows the IR-drop scaling for increasing contact resistance on a log scale. The graph shows that even a contact resistance of 10 mΩ is too large for large 3D stacks. Figures 11 and 12 show the dynamic noise and IR-drop, respectively, for various thin-wire sheet resistance values, illustrating the noise sensitivity of 3D stacks to the geometry of the distribution-grid wires. The baseline case corresponds to copper wires 0.5 μm thick and 1 μm wide. Increasing wire width and thickness can make taller stacks more feasible.
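A rough series-resistance estimate shows why contact resistance matters so much. The hypothetical helper below assumes the 21 nΩ·m resistivity from Section II and one bonded interface per die crossed, and compares a single column of 40 × 40 μm TSVs through forty 15 μm thinned dies with and without contacts. It also computes the sheet resistance of the baseline 0.5 μm-thick grid wires, optimistically assuming that thin wires share the same resistivity as the TSVs.

```python
RHO_CU = 21e-9  # ohm*m, conservative copper resistivity assumed in Section II


def tsv_column_resistance(n_dies, side_um=40.0, die_thickness_um=15.0, contact_ohm=0.0,
                          rho=RHO_CU):
    """Series resistance of one stacked TSV column crossing n_dies dies, one contact per die."""
    r_tsv = rho * die_thickness_um * 1e-6 / (side_um * 1e-6) ** 2
    return n_dies * (r_tsv + contact_ohm)


def sheet_resistance(thickness_um, rho=RHO_CU):
    """Sheet resistance in ohms per square of a film of the given thickness."""
    return rho / (thickness_um * 1e-6)


if __name__ == "__main__":
    for rc in (0.0, 1e-3, 10e-3):
        r = tsv_column_resistance(40, contact_ohm=rc)
        print(f"contact {rc * 1e3:4.1f} mOhm -> column {r * 1e3:7.2f} mOhm")
    print(f"0.5 um wire: {sheet_resistance(0.5):.3f} ohm/sq")
```

Under these assumptions, 10 mΩ contacts raise the column resistance by a factor of about 50, from roughly 8 mΩ to roughly 0.41 Ω, consistent with the observation that such contact resistances dominate the IR-drop of tall stacks.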
B. Other Organization Styles
This section focuses on the differences between the core organization styles. The trends in the preceding graphs are broadly similar across organization styles. The cores-last and cores-first styles respond similarly, but the cores-last style has worse noise than cores-first. In general, the cores-spread style has lower dynamic noise than the other styles because of the large amount of decap provided by the memory tiers: even though the decap density of the memory tiers is much lower than that of the processor tiers, the large number of memory tiers makes their total decap high.
We begin by examining the effects of changing various parameters related to the current demand and the decap distribution. First, Figure 13 shows the result of changing the rise-time of the processor tiers, while the rise-time of the memory tiers is kept constant. The baseline rise-time of 5 ns is quite conservative; however, this should not affect the rest of our scaling studies. We observe that a longer rise-time benefits the cores-spread organization style significantly more than the cores-first style, because the cores-spread style is much more sensitive to dynamic noise.
In Figure 14 we show the changes in dynamic noise as the decap density in the memory tiers is varied. As expected, the cores-spread organization is much more sensitive to memory-tier decap density than the cores-first organization. For very low decap densities, there is a crossover point below which the cores-first organization performs better than the cores-spread organization, because its processors are closer to the power delivery bumps. Next, Figure 15 shows the sensitivity of dynamic noise to the processor-tier decap density. This study shows a crossover point more clearly. However, as the number of sets increases, so does the amount of decap required to match the results of the cores-spread experiments with high memory-tier decap density. Overall, a comparison of Figures 14 and 15 shows that, with the cores-spread organization style, increasing memory-tier decap is much more effective, because there are many more memory tiers than processor tiers. Conversely, for the cores-first organization, the processor-tier decap density is more important.
C. Temperature Scaling Results
Figure 16 shows the maximum temperature scaling for the three core organization styles. The horizontal axis denotes the number of sets stacked together, and the vertical axis denotes the maximum temperature observed within the chip stack. The cores-first organization style has the worst temperature scaling behavior, but its maximum temperature is still below the thermal packaging limits of modern technology. The cores-spread organization style has the best scaling behavior of the three styles. Figure 17 shows the per-tier temperature for the cores-first organization style; the horizontal axis denotes the tier number and the vertical axis denotes the maximum temperature observed on that tier. The memory tiers have no micro-fluidic channels, so their temperature increases with stacking. If memory-tier power consumption were higher, the memory tiers could become hotter than the processor tiers.
In this case, micro-fluidic channels could be added to some memory tiers to lower their temperature. Experiments show that a memory-tier power dissipation of approximately 1.3 watts per tier makes the maximum memory-tier temperature match that of the cores for a five-set cores-first stack. This value is more than double the value assumed in the rest of our experiments. The cores-last organization style has temperature curves very similar to those of the cores-first style, so we omit those results due to space limitations.
The per-tier temperature scaling of the cores-spread organization style is shown in Figure 18. The humps mark the locations of the processor tiers within the stack. The maximum temperature is always observed at the tier closest to the bumps and farthest from the air-cooled heatsink. This profile is explained by the large distance between neighboring processor tiers in the cores-spread organization style, which also gives this style the lowest volumetric power density of the three options.
VI. Conclusions
We have presented scaling studies related to through-silicon-via (TSV) parasitics and power distribution networks for maximally integrated 3D systems. Our basic model is a scalable prototype set of nine tiers containing four high-performance processor cores and eight gigabytes of DRAM. Micro-fluidic heatsinks are considered insofar as they affect the design of the power supply grid.
Inductance of TSVs is a major factor affecting power supply integrity in large-scale 3D systems with die-stacking heights greater than 700 μm. Additionally, the TSV pitch and the package-level power-supply bump pitch are among the most critical factors in reducing power supply noise and IR-drop in these devices. Contact resistance between TSVs may become a limiting factor; however, this depends largely on manufacturing technology. Overall, our studies indicate that stacks with up to 46 tiers are possible, provided that sufficient care is taken in the design of the power distribution network. Additionally, a cores-spread tier organization style can generally outperform a cores-first style for dynamic noise; however, the relationship is reversed for IR-drop.
Our thermal scaling studies using micro-fluidic heatsinks also indicate that this type of large-scale integration is feasible. Temperature scaling is best for the cores-spread organization style, but stacks of up to five sets in the other organization styles remain within current packaging thermal limits.
