Existing thermal-aware 3D placement methods assume that the temperature of 3D ICs can be optimized by properly distributing the power dissipations, while ignoring the thermal conductivity of through-silicon vias (TSVs). However, our study indicates that this assumption is not accurate. Although considering the thermal effect of TSVs during placement appears complicated, we prove that the peak temperature is minimized when the TSV area in each bin is proportional to the lumped power consumption of that bin together with the bins in all the tiers directly above it. Based on this criterion, we implement a thermal-aware 3D placement tool. Methods that favor a uniform power distribution reduce the peak temperature by only 8%, whereas our method reduces it by 34% on average, with slightly less wirelength overhead. These results suggest that considering the thermal effects of TSVs during the placement stage is both necessary and effective. To the best of the authors' knowledge, this is the first thermal-aware 3D placement tool that directly takes into consideration the thermal and area impact of TSVs.
INTRODUCTION
One of the most critical challenges in 3D IC design is heat dissipation, which already poses serious problems even for 2D IC designs [1]. The thermal problem is exacerbated in 3D ICs for two main reasons: 1) the vertically stacked layers of active devices cause a rapid increase in power density; 2) for face-to-back tier bonding, a dielectric layer exists between adjacent tiers to provide insulation, and the thermal conductivity of these dielectric layers is very low compared to silicon and metal. For instance, at room temperature the thermal conductivity of the dielectric layer is 0.05 W/mK, while those of silicon and copper are 150 W/mK and 285 W/mK, respectively [17]. Accordingly, heat flows mainly through the through-silicon vias (TSVs) rather than through the entire substrate. This reduction in the cross-sectional area of the heat channel further increases the chip temperature. Therefore, it is necessary to consider thermal integrity during every stage of 3D IC design, including placement. Several works address the thermal issue during 3D placement. The work in [9] applies a force-directed method with thermal forces to move cells away from high-temperature regions. The transformation-based 3D placement [7] relieves the thermal issues at the legalization stage, where hot cells are preferably placed close to the heat sink. The partitioning-based 3D placement [11] uses net weights to shorten nets with high switching activity to reduce power, and uses pseudo nets to pull hot cells toward the heat sink to reduce temperature. The work in [18] models and minimizes the unevenness of the thermal distribution, in addition to minimizing the wirelength and the unevenness of the cell area distribution. A detailed survey of 3D physical design can be found in [3], [4]. It is well known that for 2D ICs, a properly distributed power dissipation (e.g., a uniform distribution) results in low temperatures.
Most of the aforementioned works simply extend this conclusion to 3D and still focus on properly distributing power dissipations for temperature reduction. However, as detailed in Section 2, a uniform power distribution is no longer a good heuristic for temperature reduction in 3D ICs. Since TSVs are the major channels for heat flow, their distribution also has a significant impact on the temperature. A survey on concurrent TSV planning within thermal-aware 3D floorplanning and 3D routing is given in [19]. Unfortunately, none of the existing work in thermal-aware 3D placement takes the thermal effect of TSVs into consideration, mainly due to the high complexity of doing so. In this paper we propose a thermal-aware 3D placement method that considers both the thermal effect and the area impact of TSVs. We first devise a simple criterion that guides the placement of TSVs toward the lowest temperature. Based on the approximation that the dielectric layer is an ideal heat insulator, we prove that the peak temperature is minimized when the TSV area in each bin is proportional to the lumped power consumption of that bin together with the bins in all the tiers directly above it. We then use this result to guide our analytical 3D placement tool. Experimental results show that, compared to methods that favor a uniform power distribution, which achieve only an 8% peak temperature reduction, our method reduces the peak temperature by 34% on average with slightly less wirelength overhead. To the best of the authors' knowledge, this is the first thermal-aware 3D placement tool that directly takes into consideration the thermal and area impact of TSVs. The remainder of the paper is organized as follows. Section 2 provides the motivation for our work. Section 3 discusses the optimal distribution of TSVs to minimize the temperature, which is integrated into a 3D placement framework in Section 4.
Experimental results are given in Section 5, and concluding remarks are given in Section 6.
MOTIVATION
The stacked-die structure dramatically increases power density compared to conventional 2D ICs, and thus threatens the thermal reliability of 3D ICs. In addition, the low thermal conductivity of the dielectric layers in face-to-back bonded tiers impedes vertical heat flow. Accordingly, as pointed out in [10], TSVs are the major channels for vertical heat flow. This observation leads to a fundamental difference between thermal-aware placement for 2D ICs and for 3D ICs. In 2D placement, by properly distributing the power dissipations across the chip, heat can flow uniformly through the entire substrate to the heat sink, and the temperature can be minimized [16]. In 3D ICs, however, it is the correlation between the distributions of the TSVs and the power density that directly determines the temperature. For example, consider the two artificial placement results with the relative power values shown in Figure 1. In Figure 1(a), the power distribution is uniform while the TSVs are clustered in the center; in Figure 1(b), the power distribution is non-uniform, with 2 to 8 times higher power density in some regions than in the previous case, and the TSVs are distributed in proportion to the regional power density. The corresponding temperature maps are shown in Figure 1(c) and (d), respectively, assuming a 4-tier 3D chip with 6 W of power in a 1.5 mm² area and about 1200 TSVs per tier, where the 3D technology parameters for temperature evaluation are the same as in Section 5. This artificial example shows that the locations of the TSVs play a very important role in the thermal integrity of 3D ICs. Consequently, existing thermal-aware 3D placement methods, which target the power distribution and neglect the thermal effect of TSVs, are sub-optimal. A naïve improvement would be to compute the TSV locations that yield the minimum temperature during each iteration of placement. However, this results in an optimization-in-the-loop with significant runtime overhead, and since thermal-aware placement mainly targets large designs, such an approach is impractical. On the other hand, adjusting the TSV locations after placement to minimize the temperature brings significant wirelength overhead, because these TSVs are also part of the signal nets. We address this dilemma in the remainder of the paper.
There are many different 3D integration technologies, including face-to-face bonding, face-to-back bonding, via-first, via-middle, and via-last, and different technologies can have totally different thermal models. In this paper we focus on face-to-back bonding with via-first technology. In addition, although it is possible to insert additional thermal TSVs [10] after placement to further suppress the temperature, doing so brings extra area overhead. In this paper we explore the opportunities for temperature reduction by utilizing the signal TSVs in 3D placement. Our experimental results show that signal TSVs alone can already reduce the temperature significantly, with minimal wirelength or runtime overhead.
PROPERTIES OF A THERMALLY-OPTIMAL TSV DISTRIBUTION
As discussed in Section 2, the fundamental problem in thermal-aware placement can be stated as follows: given a power distribution, what is the distribution of TSVs that minimizes the temperature? Although this problem seems complicated, we will show that the answer is surprisingly simple: we can derive an analytical solution without involving any optimization tools. For simplicity of presentation, we summarize the key notations used in this section in Table 1. To start, we assume steady-state analysis to calculate the temperature, where the chip is thermally modeled as a resistive network. We also lump the TSVs in each bin into a single thermal conductor, with its conductance proportional to the total TSV area. The power-temperature relation can be expressed as
$$BT = P$$
where $B$ is the thermal conductance matrix, $T$ is the vector of node temperatures, and $P$ is the vector of node power dissipations.
A two-tier example of the thermal resistive network is illustrated in Figure 2, where the nodes (labeled with numbers) are connected by thermal conductors (labeled with subscripted symbols), and the bin numbers are shown in gray. Taking node 3 (bin 3, tier 1) as an example, its power-temperature relation is expressed as
Collecting such equations for all nodes, the network can be written in the matrix form of equation (1), where each row corresponds to one node.
Figure 2. A two-tier example of the thermal resistive network
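As a minimal sketch of how such a network is assembled, the following Python/NumPy code stamps each conductance into $B$ (a conductance $g$ between nodes $p$ and $q$ adds $g$ to entries $(p,p)$ and $(q,q)$ and subtracts $g$ from $(p,q)$ and $(q,p)$; a conductance to the heat sink, taken as the temperature reference, only adds to the diagonal) and then solves $BT = P$. A lumped TSV enters with conductance $g_{TSV}$ times its area. The two-tier, two-bin geometry, node numbering, conductance values, and the helper names `stamp` and `build_B` are illustrative assumptions, not data from the paper.

```python
import numpy as np

def stamp(B, p, q, g):
    """Add conductance g between nodes p and q (q=None means the heat sink / ground)."""
    B[p, p] += g
    if q is not None:
        B[q, q] += g
        B[p, q] -= g
        B[q, p] -= g

def build_B(n_nodes, fixed, tsvs, g_tsv):
    """Assemble B from fixed conductances plus lumped TSVs of conductance g_tsv * area.

    fixed: list of (p, q, g) conductances that exist without TSVs.
    tsvs:  list of (p, q, area) lumped TSVs.
    """
    B = np.zeros((n_nodes, n_nodes))
    for p, q, g in fixed:
        stamp(B, p, q, g)
    for p, q, area in tsvs:
        stamp(B, p, q, g_tsv * area)
    return B

# Hypothetical 2-tier, 2-bin network: nodes 0, 1 = top tier; nodes 2, 3 = bottom tier.
fixed = [
    (0, 1, 1.0), (2, 3, 1.0),           # lateral conduction within each tier
    (0, 2, 0.01), (1, 3, 0.01),         # weak vertical conduction through the dielectric
    (2, None, 0.01), (3, None, 0.01),   # weak path from the bottom tier to the heat sink
]
tsvs = [(0, 2, 1.0), (1, 3, 3.0), (2, None, 2.0), (3, None, 6.0)]
B = build_B(4, fixed, tsvs, g_tsv=10.0)
P = np.array([1.0, 3.0, 1.0, 3.0])      # node power dissipations
T = np.linalg.solve(B, P)               # node temperatures relative to the heat sink
print(np.round(T, 3), "peak:", round(float(np.max(np.abs(T))), 3))
```

In this made-up example the TSV areas of each tier follow the lumped-power proportions discussed later in this section, so the temperatures within each tier come out nearly equal.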
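If we treat the TSV areas as variables, the thermal conductance matrix $B$ of the network can be expressed in a parameterized form as
$$B = B_0 + g_{TSV} \sum_{j=1}^{K} \sum_{i=1}^{n} a_i^j M_i^j$$
where $B_0$ is the constant thermal conductance matrix without TSVs, $g_{TSV}$ is the conductance of a unit-area TSV, the variable $a_i^j$ is the area of the lumped TSV in bin $i$ of tier $j$, and $M_i^j$ is its stamping matrix: if the lumped TSV connects nodes $p$ and $q$, then $M_i^j(p,p) = M_i^j(q,q) = +1$, $M_i^j(p,q) = M_i^j(q,p) = -1$, and all the other elements in $M_i^j$ are zeros. Again, taking node 3 as an example, let $b_{3,7}$ be the thermal conductance between node 3 and node 7 when there are no TSVs, and let the variable $a_3^1$ be the area of the lumped TSV in bin 3; the conductance between the two nodes then becomes $b_{3,7} + g_{TSV}\, a_3^1$.
In this example, the stamping matrix $M_3^1$ only has the non-zero elements $M_3^1(3,3) = M_3^1(7,7) = 1$ and $M_3^1(3,7) = M_3^1(7,3) = -1$.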
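Now, we can mathematically state the problem of optimal TSV placement as
$$\text{(P1)}\quad \min_{\{a_i^j\}}\ \|T\|_\infty = \Big\| \Big(B_0 + g_{TSV} \sum_{j=1}^{K}\sum_{i=1}^{n} a_i^j M_i^j\Big)^{-1} P \Big\|_\infty \quad \text{s.t.}\quad \sum_{i=1}^{n} a_i^j = A_{tot}^j,\ \ a_i^j \ge 0\ \ \forall i, j$$
where $A_{tot}^j$ is the total area of the TSVs connecting tier $j$ and tier $j+1$, which is determined once floorplanning is done. The infinity norm is defined as $\|x\|_\infty = \max\{|x_1|, |x_2|, \ldots, |x_n|\}$.
The objective function is obtained by simply substituting (3) into (1).
Here $M_i^K$ is the stamping matrix for the lumped TSV in the last tier, which connects to the heat sink.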
The two constraints are also self-evident: the total TSV area in each tier is a fixed number, and the lumped TSV area in each bin must be non-negative. Note that we have relaxed the constraint that the TSV area $a_i^j$ in each bin is discrete. As such, the TSV areas given by the theorems and corollaries below should be rounded.
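Turning a relaxed (continuous) solution into integer TSV counts can be done in many ways; one simple option, sketched below, is largest-remainder rounding with a per-bin cap on the number of TSV slots, since a bin cannot hold more TSVs than fit in its area. This heuristic is an assumption made for illustration, not the procedure used in the paper. The 6 μm TSV size and 12 μm pitch follow the technology in Section 5, while the bin size, the input areas, and the function name `round_tsv_counts` are hypothetical.

```python
import math

def round_tsv_counts(areas, tsv_area, capacity):
    """Turn continuous per-bin TSV areas into integer TSV counts.

    The total count is round(sum(areas) / tsv_area), and no bin receives more
    TSVs than its capacity (number of TSV slots).
    """
    ideal = [a / tsv_area for a in areas]            # fractional TSV count per bin
    total = round(sum(ideal))
    if total > sum(capacity):
        raise ValueError("total TSV count exceeds the total number of TSV slots")
    counts = [min(math.floor(x), c) for x, c in zip(ideal, capacity)]
    remaining = total - sum(counts)                  # never negative: floors sum to <= total
    while remaining > 0:
        # Give one TSV to the non-full bin whose count lags its ideal value the most.
        open_bins = [i for i in range(len(counts)) if counts[i] < capacity[i]]
        best = max(open_bins, key=lambda i: ideal[i] - counts[i])
        counts[best] += 1
        remaining -= 1
    return counts

pitch_um, size_um = 12.0, 6.0                        # TSV pitch and size from Section 5
bin_um = 60.0                                        # hypothetical square bin edge length
cap = int(bin_um // pitch_um) ** 2                   # TSV slots per bin
areas_um2 = [122.4, 1004.4, 367.2, 18.0]             # hypothetical continuous optimum
counts = round_tsv_counts(areas_um2, size_um ** 2, [cap] * len(areas_um2))
print(cap, counts)
```

Here the second bin asks for more TSVs than it has slots, so its excess is handed to the other bins; a placement-oriented variant might prefer to push the excess into neighboring bins instead.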
Problem (P1) is nonlinear in nature, and integrating a nonlinear optimization engine directly into a placement tool would be impractical due to its high complexity. Before tackling (P1) directly, we resort to a simpler version of the problem: for a one-tier 3D IC (footnote 2) with a given power distribution, what are the TSV locations that minimize the temperature?
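In this case, each TSV is directly connected to the heat sink. As such, (P1) can be rewritten as
$$\text{(P2)}\quad \min_{\{a_i\}}\ \Big\| \Big(B_0 + g_{TSV} \sum_{i=1}^{n} a_i M_i\Big)^{-1} P \Big\|_\infty \quad \text{s.t.}\quad \sum_{i=1}^{n} a_i = A_{tot},\ \ a_i \ge 0$$
where $M_i$ is the stamping matrix for the TSV in bin $i$, $a_i$ is the total TSV area in bin $i$, and $A_{tot}$ is the total TSV area.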
At first glance, this problem is still nonlinear and difficult to solve. Intuitively, however, we should place more TSVs in the bins with higher power density to provide a lower impedance to the thermal ground. This leads to the conjecture that the optimal TSV area $a_i^*$ in bin $i$ should be proportional to the power consumption $p_i$ of that bin.
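The conjecture can be checked numerically on a small single-tier model. The sketch below assumes a 1D row of bins with a lateral conductance `g_lat` between neighbors, where each bin reaches the heat sink only through its own TSVs (the ideal-insulator approximation), and compares TSV areas proportional to bin power against a uniform split of the same total area. All values and the function name `peak_temperature` are hypothetical.

```python
import numpy as np

def peak_temperature(p, a, g_tsv=1.0, g_lat=0.5):
    """Solve B T = P for a 1D row of bins; bin i reaches the heat sink only via g_tsv * a[i]."""
    n = len(p)
    B = np.diag(g_tsv * np.asarray(a, dtype=float))   # TSV conductances to the heat sink
    for i in range(n - 1):                            # stamp lateral conductances
        B[i, i] += g_lat
        B[i + 1, i + 1] += g_lat
        B[i, i + 1] -= g_lat
        B[i + 1, i] -= g_lat
    T = np.linalg.solve(B, np.asarray(p, dtype=float))
    return T, T.max()

p = np.array([1.0, 4.0, 2.0, 8.0, 1.0])      # hypothetical bin powers
A_tot = 10.0                                  # total TSV area to distribute
a_prop = A_tot * p / p.sum()                  # areas proportional to power (the conjecture)
a_unif = np.full(len(p), A_tot / len(p))      # same total area, spread uniformly

T_prop, peak_prop = peak_temperature(p, a_prop)
T_unif, peak_unif = peak_temperature(p, a_unif)
print("proportional:", np.round(T_prop, 3), "peak", round(peak_prop, 3))
print("uniform:     ", np.round(T_unif, 3), "peak", round(peak_unif, 3))
```

With proportional areas every bin settles at the same temperature, $\sum_i p_i / (g_{TSV} A_{tot})$, so no heat flows laterally; the uniform split leaves the high-power bins hotter and raises the peak.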
In the interest of space, we only outline the proof of the theorem. From the fact that TSVs are the major vertical heat flow channel (i.e., the TSV conductance is much larger than $b$, where $b$ is the inter-tier conductance without TSVs), we can obtain (8) and (9); combining (8) and (9), we obtain the lower bound (10) on $\|T\|_\infty$. In order for $\|T\|_\infty$ to attain this minimum, the inequalities in (9) must become equalities. According to Hölder's inequality, this condition is
$$T_1 = T_2 = \cdots = T_n \quad (11)$$
Substituting (11) back into (8) yields the optimal TSV areas, which are proportional to the bin powers.
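One way to write out the bound used in this proof is sketched below, under the ideal-insulator approximation: the only path from bin $i$ to the heat sink is its lumped TSV with conductance $g_{TSV} a_i$, and temperatures are measured relative to the heat sink. The exact forms of (8) to (10) in the original derivation may differ.

$$
\begin{aligned}
\sum_{i=1}^{n} p_i &= g_{TSV} \sum_{i=1}^{n} a_i T_i && \text{(all heat leaves through the TSVs)} \\
&\le g_{TSV} \Big(\sum_{i=1}^{n} a_i\Big) \|T\|_\infty = g_{TSV} A_{tot} \|T\|_\infty && \text{(Hölder's inequality, } a_i \ge 0\text{)} \\
\Longrightarrow \quad \|T\|_\infty &\ge \frac{\sum_{i=1}^{n} p_i}{g_{TSV} A_{tot}}
\end{aligned}
$$

Equality requires $T_i = \|T\|_\infty$ in every bin with $a_i > 0$. If all bins share a common temperature $T^*$, no heat flows laterally, so the TSVs of each bin carry exactly its own power, $p_i = g_{TSV} a_i T^*$. Summing over $i$ gives $T^* = \sum_i p_i / (g_{TSV} A_{tot})$ and hence $a_i^* = A_{tot}\, p_i / \sum_k p_k$, consistent with the conjecture and with the uniform temperature stated in Corollary 1.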
Note that in the above theorem we neglected the facts that the TSV area in each bin is discrete, that the dielectric layer is not an ideal thermal insulator, and that the total TSV area allocated to a bin cannot exceed the area of that bin. In practice, the optimality condition needs to be adapted to these constraints. We can also easily derive a corollary from this theorem. Corollary 1. When the TSVs are placed in proportion to the power consumption in each bin, the temperature in each bin is identical, i.e.,
Corollary 1 has a particularly important implication, as it allows us to generalize Theorem 1 (which is limited to the single-tier case) to the general multi-tier case. Take a two-tier case as an example. According to Theorem 1, as long as we place the TSVs connecting the bottom tier and the package in proportion to the lumped power density in each bin (the power of the bin plus that of the same bin in the tier above), the temperature in the bottom tier is minimized. Since this optimized temperature distribution is also uniform according to Corollary 1, the bottom tier can be treated as a thermal ground. Accordingly, we can apply Theorem 1 again to the top tier, and place the TSVs connecting the top tier and the bottom tier according to the power distribution of the top tier. This observation leads to the following theorem.
Theorem 2 (Multi-Tier Case). If we denote the bottom tier (connected to the heat sink) as tier $K$ and the top tier as tier 1, then to minimize the temperature, the TSV area in bin $i$ of tier $j$ connecting to tier $j+1$ should be proportional to the lumped power in bin $i$ of tiers $1, 2, \ldots, j$. In other words, the optimal solution of problem (P1) shall satisfy
$$a_i^{j*} = \frac{\sum_{k=1}^{j} p_i^k}{\sum_{l=1}^{n}\sum_{k=1}^{j} p_l^k}\, A_{tot}^j$$
The proof follows by induction on the number of tiers using Theorem 1: the optimized temperature in a tier is uniform and can therefore be treated as a thermal ground when optimizing the tiers above it. Figure 3 shows a simple two-tier ($K = 2$) example that illustrates the theorem, where each tier is divided into four bins ($n = 4$). Similar to Corollary 1 for the single-tier case, we also have the following corollary for the multi-tier case.
Corollary 2. When the TSVs in each tier are placed in proportion to the lumped power consumption of each bin and the same bins in all the tiers above, each tier has a uniform temperature distribution. The temperature in tier $j$ can be expressed as
To summarize this section, we point out that all the theorems and corollaries rest on the assumption that TSVs conduct heat much more effectively than the dielectric layer, and accordingly we have treated the dielectric layer as an ideal heat insulator. In reality this is not the case, so the theorems are only an approximation. However, our experimental results show that they work well in practice.
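The allocation rule of Theorem 2 and the uniform per-tier temperatures of Corollary 2 can be checked on a small stack. The sketch below assumes the ideal-insulator approximation (bins connect vertically only through TSVs), tier indices 0 to K-1 from top to bottom, and that the TSVs of the bottom tier connect directly to the heat sink. Under these assumptions, stacking the single-tier result gives $T^j = \sum_{l=j}^{K} \frac{\sum_i \sum_{k=1}^{l} p_i^k}{g_{TSV} A_{tot}^l}$, which the code compares against a direct solve of $BT = P$. The helper names, lateral conductance, and all numbers are hypothetical.

```python
import numpy as np

def optimal_tsv_areas(power, A_tot):
    """Theorem 2 rule: areas[j, i] proportional to the power of bin i summed over tiers 0..j."""
    lumped = np.cumsum(power, axis=0)                   # lumped[j, i] = sum_{k<=j} power[k, i]
    return A_tot[:, None] * lumped / lumped.sum(axis=1, keepdims=True)

def solve_stack(power, areas, g_tsv=1.0, g_lat=0.5):
    """Build B for K tiers x n bins (node index j*n + i) and solve B T = P."""
    K, n = power.shape
    B = np.zeros((K * n, K * n))

    def stamp(p, q, g):                                 # q=None: conductance to the heat sink
        B[p, p] += g
        if q is not None:
            B[q, q] += g
            B[p, q] -= g
            B[q, p] -= g

    for j in range(K):
        for i in range(n):
            node = j * n + i
            if i + 1 < n:
                stamp(node, node + 1, g_lat)            # lateral conduction within tier j
            below = node + n if j + 1 < K else None     # tier j+1, or the heat sink for the last tier
            stamp(node, below, g_tsv * areas[j, i])     # lumped TSV in bin i of tier j
    return np.linalg.solve(B, power.ravel()).reshape(K, n)

power = np.array([[1.0, 3.0, 2.0, 6.0],                 # tier 0 (top)
                  [2.0, 2.0, 5.0, 1.0]])                # tier 1 (bottom, next to the heat sink)
A_tot = np.array([8.0, 12.0])                           # total TSV area per tier
areas = optimal_tsv_areas(power, A_tot)
T = solve_stack(power, areas)

# Corollary 2 under the same assumptions (g_tsv = 1): per-tier drops, accumulated bottom-up.
drops = np.cumsum(power.sum(axis=1)) / A_tot            # sum_{k<=l} P^k / A_tot^l
T_pred = np.cumsum(drops[::-1])[::-1]                   # T^j = sum_{l>=j} drops[l]
print(np.round(T, 4))
print(np.round(T_pred, 4))
```

Each row of `T` is constant and equals the corresponding entry of `T_pred`, because with uniform tier temperatures no heat flows laterally and each TSV carries exactly the lumped power of its bin.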
THERMAL-AWARE 3D PLACEMENT
Our 3D placement flow is similar to the one in [14]; in this section we mainly focus on the 3D placement step of the TSV co-placement flow. We assume the tier assignment of each cell is given, either by manual or by automatic partitioning. An automatic partitioning method based on 3D floorplanning is explained in Section 4.2. The 3D placement step is called after 3D net splitting and TSV insertion.
Thermal-Aware Cell/TSV Co-Placement
Based on the optimality condition in Theorem 2, we can effectively reduce the temperature during the 3D placement step with an analytical formulation of the following form:
$$\text{(P3)}\quad \min_{x,y}\ \mathrm{HPWL}(x,y) + \beta \cdot \mathrm{COST}(x,y) \quad \text{subject to the area density constraints}$$
where the area density constraints are used for overlap removal, $\mathrm{HPWL}(x,y)$ is the total half-perimeter wirelength serving as the objective function, the TSV distribution cost $\mathrm{COST}(x,y)$ measures the "distance" between the current solution and a thermally optimal TSV distribution, and $\beta$ is a user-defined parameter that trades off wirelength quality against temperature reduction. The TSV distribution cost is also normalized by the ratio between the gradient norm of the initial HPWL function and the gradient norm of the initial COST function.
Please refer to Chapters 7, 10, and 11 in [15] for the algorithms that solve problem (P3) by the quadratic penalty method when $\beta = 0$, and to [2] for the parameter tuning when $\beta > 0$. In this section we focus on the definition of the TSV distribution cost function $\mathrm{COST}(x,y)$.
The TSV distribution cost is constructed such that $\mathrm{COST}(x,y) = 0$ if and only if the optimality condition in Theorem 2 is satisfied. In detail, the cost is constructed as follows.
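Let $N_i^j$ be the number of TSVs in bin $i$, tier $j$. We assign a negative "power" value $p_{TSV}^j$ to every TSV on tier $j$, defined as
$$p_{TSV}^j = -\frac{\sum_{l}\sum_{k=1}^{j} p_l^k}{\sum_{l} N_l^j}$$
Under this assignment, the total negative power of the TSVs in bin $i$, tier $j$ is $N_i^j\, p_{TSV}^j$.
Therefore, the sum of the total TSV power and the lumped cell power in bin $i$, tier $j$ is $N_i^j\, p_{TSV}^j + \sum_{k=1}^{j} p_i^k$.
This quantity is zero in every bin if and only if the TSVs are optimally distributed as in Theorem 2 (all TSVs have the same size, so TSV counts are proportional to TSV areas). Thus, the TSV distribution cost can be defined as
$$\mathrm{COST}(x,y) = \sum_{j}\sum_{i} \Big( N_i^j\, p_{TSV}^j + \sum_{k=1}^{j} p_i^k \Big)^2 \quad (21)$$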
which is the sum, over all bins, of the squared combined TSV power and lumped cell power. This quadratic penalty method is an easy-to-use, common method in engineering practice for enforcing equality constraints. Since it is not easy to determine whether a solution exists that satisfies both the area density constraints and the TSV distribution constraint, we penalize the COST function only with a finite weight $\beta$ instead of pushing $\beta$ to $+\infty$.
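The cost can be evaluated directly from per-bin quantities. The sketch below assumes that all TSVs have the same size (so counts stand in for areas), that tier index 0 is the top tier, and that the negative TSV power of tier $j$ equals minus the total lumped power of tiers 1 through $j$ divided by the number of TSVs on tier $j$, which is the value that makes the cost vanish exactly at the Theorem 2 distribution. Cell power and TSV counts are given per bin rather than derived from placement coordinates $(x, y)$, and the function name `tsv_distribution_cost` is hypothetical.

```python
import numpy as np

def tsv_distribution_cost(cell_power, tsv_count, zero_tsv_power=False):
    """Sum over bins and tiers of (lumped cell power + TSV negative power)^2.

    cell_power[j, i]: cell power in bin i of tier j (tier 0 = top).
    tsv_count[j, i]:  number of TSVs in bin i of tier j.
    zero_tsv_power:   if True, use p_TSV = 0 (a pure power-distribution cost).
    """
    lumped = np.cumsum(cell_power, axis=0)               # bin power summed over tiers 0..j
    if zero_tsv_power:
        p_tsv = np.zeros(cell_power.shape[0])
    else:
        p_tsv = -lumped.sum(axis=1) / tsv_count.sum(axis=1)   # one negative value per tier
    residual = lumped + tsv_count * p_tsv[:, None]       # zero in every bin iff Theorem 2 holds
    return float(np.sum(residual ** 2))

cell_power = np.array([[1.0, 3.0, 2.0, 2.0],
                       [2.0, 1.0, 2.0, 3.0]])
optimal = np.array([[10, 30, 20, 20],                     # proportional to [1, 3, 2, 2]
                    [15, 20, 20, 25]])                    # proportional to lumped [3, 4, 4, 5]
uniform = np.full((2, 4), 20)
print(tsv_distribution_cost(cell_power, optimal))
print(tsv_distribution_cost(cell_power, uniform))
```

With `zero_tsv_power=True` the cost reduces to the sum of squared lumped bin powers; for fixed per-tier totals this is smallest when the power is spread uniformly, which mirrors the uniform-power variant used as a comparison in Section 5.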
Overall Thermal-Aware Placement Flow
To obtain the tier assignment, we apply a 3D floorplanner [6] to a coarsened circuit, which is produced by an 80-way partitioning using hMetis [13]. The number of partitions for floorplanning is determined empirically so that the runtime stays under control for circuits of various sizes. The tier assignment obtained from the 3D floorplan is locked before 3D placement.
Given the tier assignment, we split the circuit into tiers, and insert one TSV per 3D net. The cells and inserted TSVs are co-placed by solving problem (P3) with a modified placement engine of [8] . Finally, the cells and TSVs are legalized, tier-by-tier, to complete the flow using the mPL [5] detailed placement engine.
EXPERIMENTAL RESULTS
We implement the algorithm in C++ and run it on an Intel Xeon 2. Table 2, where the utility rate (Util.) is the total cell area divided by the total chip area.
We synthesize the circuits with a standard cell library for the MIT Lincoln Lab 130nm 3D SOI technology. The target 3D technology is a 4-tier 3D IC with a TSV size of 6 μm × 6 μm and a TSV pitch of 12 μm × 12 μm. The 3D chip temperature is measured by the compact model in [17], assuming that the silicon layer is 300 μm thick on the bottom tier and 25 μm thick on the other tiers.
The placement area is set as a square with 20% to 28% whitespace in total, and the I/O pins are placed uniformly along the boundaries in alphabetical order. The power dissipation of each cell is generated as follows: the circuit is partitioned into eight parts by hMetis; each part is assigned a random number between 0 and 1 as its relative power; and these relative numbers are scaled to power values such that the overall power density is on the order of 1 W/mm², which is the projected power density for high-performance chips at the 14nm generation according to ITRS [20]. We compare our thermal-aware 3D placement with other thermal optimization methods. The results are presented in Table 3, and the results for wb_conmax are visualized in Figure 4, where the x-axis shows the normalized half-perimeter wirelength (HPWL) and the y-axis shows the temperature. The "baseline" in Table 3 is a wirelength-driven placement generated by solving (P3) with $\beta = 0$.
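The power-generation procedure used in these experiments can be sketched as follows. hMetis is not called here: random partition labels stand in for its 8-way result, each cell is assumed to inherit the relative power number of its part, and the chip area, cell count, seeds, and function name `assign_cell_power` are hypothetical. The scale factor is chosen so that the total power divided by the chip area equals the target density of 1 W/mm².

```python
import numpy as np

def assign_cell_power(part_of_cell, chip_area_mm2, n_parts=8,
                      target_density_w_per_mm2=1.0, seed=0):
    """Scale per-partition random relative powers so total power / chip area hits the target."""
    rng = np.random.default_rng(seed)
    rel = rng.random(n_parts)                   # one relative power number in [0, 1) per part
    raw = rel[np.asarray(part_of_cell)]         # every cell inherits its part's number
    scale = target_density_w_per_mm2 * chip_area_mm2 / raw.sum()
    return raw * scale                          # power of each cell in watts

# Stand-in for hMetis: random 8-way labels for 1000 hypothetical cells.
parts = np.random.default_rng(1).integers(0, 8, size=1000)
power = assign_cell_power(parts, chip_area_mm2=1.5)
print(power.sum() / 1.5)                        # overall power density in W/mm^2
```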
Three thermal optimization methods are compared: uniform power, post-processing, and co-placement.
The uniform-power method mimics the thermal-aware 3D placement methods [11], [18] that do not consider the thermal effects of TSVs. Although a uniform power distribution is not a thermally optimal solution, its difference from the optimal power distribution is only a few degrees according to HotSpot [12] simulations for 3D ICs. Thus, uniform power is a fair substitute for the previous thermal-aware 3D placement methods. It can be implemented by solving problem (P3) with TSV power $p_{TSV} = 0$. In this way, the TSV distribution cost becomes purely a power distribution cost: when the per-tier total power is assumed to be constant, the minimizer of the cost function in equation (21) is a uniform per-tier power distribution. The cost weight $\beta$ is set to 1 in the implementation. The post-processing method is a direct application of Theorem 2 at the post-placement stage. After 3D global placement, an optimal TSV distribution is computed from the power distribution, regardless of overlaps. The assignment of TSVs to the TSV slots in the target distribution is computed by a linear assignment method that minimizes the wirelength overhead, and the resulting overlaps are removed by a legalization step. The co-placement method is our approach, which optimizes the TSV distribution during 3D placement. In Figure 4, the left endpoint of each curve is the result with TSV distribution cost weight $\beta = 0.00$, and the right endpoint is the result with $\beta = 1.00$.
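The linear assignment step of the post-processing method can be sketched with SciPy's `scipy.optimize.linear_sum_assignment`, which solves the (possibly rectangular) assignment problem exactly. Here the cost of moving a TSV to a target slot is assumed to be its Manhattan displacement from the global-placement position, a hypothetical proxy for the wirelength overhead since the exact cost is not specified; the coordinates and the function name `assign_tsvs_to_slots` are made up.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def assign_tsvs_to_slots(tsv_xy, slot_xy):
    """Match each TSV to a distinct target slot, minimizing total Manhattan displacement."""
    diff = np.abs(tsv_xy[:, None, :] - slot_xy[None, :, :])   # shape (n_tsv, n_slot, 2)
    cost = diff.sum(axis=2)                                   # Manhattan distance matrix
    rows, cols = linear_sum_assignment(cost)                  # rows are 0..n_tsv-1 in order
    return cols, cost[rows, cols].sum()

tsv_xy = np.array([[0.0, 0.0], [5.0, 5.0], [9.0, 1.0]])                # global-placement TSVs
slot_xy = np.array([[8.0, 0.0], [1.0, 1.0], [4.0, 6.0], [0.0, 9.0]])   # target TSV slots
slot_of_tsv, total_disp = assign_tsvs_to_slots(tsv_xy, slot_xy)
print(slot_of_tsv, total_disp)
```

Because the target slots are fixed by the power distribution rather than by the nets, the matched TSVs can still end up far from their cells, which is consistent with the congestion and wirelength penalty observed for this method.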
From Figure 4, it is clear that co-placement outperforms the other two optimization methods, achieving a larger temperature reduction for a similar amount of wirelength overhead. As discussed in Section 2, if the thermal benefits of TSVs are not considered, a uniform power distribution is not effective for temperature reduction. The "average" rows in Table 3 show the average results normalized to the baseline. Our co-placement method reduces the temperature by 34%, more than 4X the 8% reduction achieved by the uniform-power method. Although the post-processing method makes use of the thermal conductivity of TSVs, it is likely to cause congestion due to the TSV displacement; thus, its legalized results have either higher temperature or longer wirelength. Moreover, our co-placement method provides a mechanism for wirelength and temperature tradeoffs, as shown in Figure 4, where the data points are generated with the different $\beta$ values labeled in the figure. When performance is critical, the acceptable wirelength degradation is limited; even then, our method is able to reduce the temperature with a negligible amount of wirelength degradation (e.g., 2%).
Figure 4. Comparison of thermal optimizations on wb_conmax
Table 3. Results of our co-placement method and comparisons with other methods
