Through-silicon vias (TSVs) are required for transmitting signals among different dies in three-dimensional integrated circuit (3D IC) technology. The significant silicon area occupied by TSVs poses critical challenges for 3D IC placement. Most published 3D placement works only minimize the number of TSVs during placement because of limitations in their techniques. In contrast, this paper proposes a new 3D cell placement algorithm that also considers the sizes of TSVs and the physical positions for TSV insertion during placement. The algorithm consists of three stages: (1) 3D analytical global placement with density optimization and whitespace reservation for TSVs, (2) TSV insertion and TSV-aware legalization, and (3) layer-by-layer detailed placement. In particular, the global placement is based on a novel weighted-average wirelength model. It is the first model in the literature that can outperform the well-known log-sum-exp wirelength model both theoretically and empirically. Further, because the physical positions of TSVs are determined during placement, 3D routing can easily be accomplished by traditional 2D routers. Compared with state-of-the-art 3D cell placement works, our algorithm achieves the best routed wirelength, TSV counts, and total silicon area, with the shortest running time.
INTRODUCTION
The three-dimensional integrated circuit (3D IC) technology has emerged as one of the most promising solutions to the interconnect and integration-complexity challenges of modern and next-generation circuit designs. The 3D IC technology can effectively reduce global interconnect length and increase circuit performance. However, it also brings several challenges. These include through-silicon vias (TSVs), which form the interconnections among different layers, as well as thermal effects, packaging, and power delivery/density.
In a generic 3D IC structure, dies are stacked on top of one another and communicate through TSVs [7, 8, 9, 10, 11, 15]. These TSVs provide the interconnections among devices on different layers, but they can cause significant problems. Under current technologies, TSV pitches are very large compared with the sizes of regular metal wires. As a result, a large number of TSVs consumes significant silicon area and degrades the yield and reliability of the final chip. Further, TSVs are usually placed in the whitespace among macro blocks or cells, so they may affect routing resources and increase the overall chip or package area. The significant silicon area occupied by TSVs and the resulting yield and reliability issues have become critical problems for 3D IC placement.
Previous Work
The 3D IC placement problem has attracted increasing attention in the recent literature. By reusing modern 2D placement results, a folding/stacking-based 3D placement method was proposed in [9]. It performs layer re-assignment for cells to further improve 3D placement solutions. A partitioning-based approach integrates wirelength, temperature, TSV counts, and thermal effects into the min-cut objective [11]. As is known from 2D placement, however, a partitioning-based approach is not as competitive as an analytical one. A multilevel analytical placement method for 3D ICs was proposed in [7]. It relaxes the discrete layer assignment so that cells move continuously in the z-direction. Its basic idea is to use an inter-layer density penalty function to drive cells away from positions between layers. However, the area occupied by TSVs is not considered during placement. A force-directed 3D placement method was proposed recently in [15]. A partitioning process first assigns cells to different layers, and then a force-directed quadratic algorithm places cells and TSVs in the 3D IC. Two design schemes are proposed to handle TSVs:
(1) TSV-site: cells are placed with regularly placed, fixed TSVs, and a TSV assignment stage then assigns these pre-placed TSVs to cells.
(2) TSV co-placement: TSVs and cells are placed simultaneously.
Although the sizes of TSVs are considered, no physical information is used during partitioning, so the placement solution quality is usually limited.
There is also existing work on mixed-size placement that specifically handles large macros in 3D IC designs [8]. Since that work focuses on handling big macros rather than cells, it is beyond the scope of this paper.
Our Contributions
One of the common deficiencies in the previous works [7, 9, 11] is that the sizes of TSVs are not considered. As mentioned earlier, however, TSVs usually occupy significant areas and should be considered during 3D IC placement. Traditionally, TSVs are inserted during the routing stage by searching whitespace in the whole 3D IC, and thus the quality of a routing result strongly depends on the remaining whitespace after the placement stage. Figure 1 illustrates the importance of considering whitespace reservation for TSVs during placement. If whitespace is not reserved for TSVs during placement, we observe that the available whitespace for TSV insertion is usually located along the chip periphery, especially for analytical placement; consequently, the resulting placement would incur longer wirelength since a TSV might be inserted in a whitespace far from its connected cells, as shown in Figure 1(a) . In contrast, with whitespace reservation for TSVs as illustrated in Figure 1(b) , a TSV can be inserted among cells to reduce the total wirelength.
Considering the physical locations of TSVs and their sizes and counts, we develop a new analytical cell placement algorithm for 3D IC designs. The main contributions of this paper are summarized as follows:
• A new 3D cell placement algorithm that considers the sizes of TSVs and the physical positions for TSV insertion is proposed. This algorithm consists of three stages: (1) 3D global placement with density optimization and whitespace reservation for TSVs, (2) TSV insertion and TSV-aware legalization, and (3) layer-by-layer detailed placement.
• A novel weighted-average wirelength model for analytical global placement is presented. The well-known log-sum-exp wirelength model [16] has dominated modern placement research for a decade. The proposed weighted-average wirelength model is the first model in the literature that can outperform it both theoretically (with smaller estimation errors) and empirically.
• Instead of using cell areas to evaluate placement density, a new density cube is introduced to model the density of 3D placement.
• In addition to TSV count minimization, the density introduced by the sizes of TSVs is modelled in the 3D analytical placement formulation. To the best of our knowledge, this is the first work that handles the sizes of TSVs during 3D analytical placement with cell movement between layers.
• A TSV insertion algorithm based on the overlapping whitespace area between neighboring layers is proposed to determine the location of every required TSV.
• Since the physical positions of TSVs are determined during placement, 3D routing can easily be accomplished with traditional 2D routers. Compared with the state-of-the-art 3D cell placement works [7, 15], our algorithm achieves the best routed wirelength, TSV counts, and total silicon area, with the shortest runtime.
The remainder of this paper is organized as follows. Section 2 formulates the 3D placement problem. Section 3 presents the overall flow of the TSV-aware 3D analytical placement. Sections 4 and 5 detail the techniques used for the TSV-aware 3D analytical placement. Section 6 shows the experimental results. Finally, Section 7 concludes this paper.
PROBLEM FORMULATION
The 3D placement problem can be formulated as a placement problem on a hypergraph $H = (V, E)$. The vertices $V = \{v_1, v_2, \ldots, v_n\}$ represent blocks, and the hyperedges $E = \{e_1, e_2, \ldots, e_m\}$ represent nets. Let $x_i$ and $y_i$ be the x and y coordinates of the center of block $v_i$, respectively, and let $z_i$ be the device layer to which $v_i$ belongs. We are given a placement region and the number of device layers $K$. We intend to determine the optimal positions of movable blocks so that the total wirelength and the number of required TSVs are minimized, while satisfying the non-overlapping constraints among blocks and TSVs. As in traditional 2D placement, the 3D placement problem is usually solved in three stages [7]: (1) global placement, (2) legalization, and (3) detailed placement. Global placement evenly distributes the blocks and finds the best position and layer for each block to minimize the target cost (e.g., wirelength and TSV count). Then, legalization removes all cell overlaps on each layer. Finally, detailed placement refines the 3D placement solution.
TSV-AWARE 3D ANALYTICAL PLACEMENT
There are three stages in our TSV-aware 3D placement method: (1) 3D global placement with density optimization and whitespace reservation for TSVs, (2) TSV insertion and TSV-aware legalization, and (3) layer-by-layer detailed placement.

In 3D analytical global placement, we place cells in 3D and also transform the sizes of TSVs into density constraints. As a result, the whitespace required by TSVs is reserved after global placement.

In the TSV insertion and TSV-aware legalization stage, we first legalize cells with minimum displacement. We then insert TSVs at their best positions so that the overlaps between cells and inserted TSVs are minimized. The positions of TSVs are fixed after insertion, and cell legalization is then applied again to remove overlaps.

In the layer-by-layer detailed placement stage, 2D detailed placement techniques further improve the solution quality. Examples are cell matching for wirelength optimization and cell sliding for density optimization. Since cells have been assigned to layers and TSVs have been inserted, the total wirelength of a net equals the sum of the wirelengths of its sub-nets on each layer. (Note that the wirelength associated with TSVs is accounted for by the TSV count.) The placement results with inserted TSVs can then be routed layer by layer by traditional 2D routers.
TSV-AWARE 3D GLOBAL PLACEMENT
In this section, we present the 3D analytical global placement engine and discuss the whitespace reservation for TSVs.
3D Analytical Global Placement Engine
The analytical placement framework has been shown to be very effective for the 2D placement problem, so we extend it to the 3D placement problem. The placement region on each device layer is first divided into uniform, non-overlapping bins. The 3D analytical global placement problem can then be formulated as the following constrained optimization problem:

$$
\begin{aligned}
\min_{x,y,z}\quad & W(x,y) + \alpha\, Z(z)\\
\text{s.t.}\quad & D_{b,k}(x,y,z) + T_{b,k} \le t_{\mathrm{density}}\,\bigl(w_{b,k}\,h_{b,k} - P_{b,k}\bigr) \quad \text{for every bin } b \text{ on every layer } k,
\end{aligned}
\tag{1}
$$

where:
- $\alpha$ is the weight of the TSV count.
- $D_{b,k}(x,y,z)$ and $T_{b,k}$ are the densities of movable blocks and of TSVs in bin $b$ of layer $k$.
- $t_{\mathrm{density}}$ is a user-specified target density value for each bin.
- $w_{b,k}$ ($h_{b,k}$) is the width (height) of bin $b$ of layer $k$.
- $P_{b,k}$ is the area of pre-placed blocks in bin $b$ of layer $k$.

Traditional 2D placers and previous 3D placers [7, 9, 10, 11] only consider the density $D_{b,k}$ of movable blocks. Our formulation also accounts for the sizes of TSVs. Since the actual positions of TSVs are not yet determined, $T_{b,k}$ is a dynamic value during the placement process; Section 4.2 explains it in more detail.
665
35.5
Wirelength and TSV Models
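The wirelength $W(x,y)$ is defined as the total half-perimeter wirelength (HPWL):

$$W(x,y) = \sum_{e\in E}\Bigl(\max_{v_i\in e} x_i - \min_{v_i\in e} x_i + \max_{v_i\in e} y_i - \min_{v_i\in e} y_i\Bigr). \tag{2}$$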
Since the exact TSV positions are unknown during global placement, the number of TSVs must be estimated. There are two major types of TSVs: (1) via-first and (2) via-last TSVs. Via-first TSVs interfere with the device layer only. Via-last TSVs interfere with both the device and metal layers and should be aligned between neighboring device layers [15]. For both types, the number of TSVs used by a net can be approximated by the number of layer boundaries it crosses, i.e., the span of its layer indices. Consequently, the number of TSVs $Z(z)$ is estimated in the same way as the wirelength [7]:

$$Z(z) = \sum_{e\in E}\Bigl(\max_{v_i\in e} z_i - \min_{v_i\in e} z_i\Bigr). \tag{3}$$
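As a minimal sketch (not the authors' implementation), the HPWL and the TSV estimate can be computed directly from pin coordinates. Each net contributes (max x − min x) + (max y − min y) to the wirelength and (max layer − min layer) to the TSV count. The sketch makes two assumptions. First, each pin is given as a hypothetical `(x, y, layer)` tuple with an integer layer index. Second, one TSV is counted per layer boundary crossed.

```python
from typing import List, Tuple

# Hypothetical pin representation: (x, y, layer index).
Pin = Tuple[float, float, int]


def hpwl(nets: List[List[Pin]]) -> float:
    """Total half-perimeter wirelength W(x, y) summed over all nets."""
    total = 0.0
    for net in nets:
        xs = [p[0] for p in net]
        ys = [p[1] for p in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


def tsv_count(nets: List[List[Pin]]) -> int:
    """Estimated TSV count Z(z): each net crosses (max layer - min layer) boundaries."""
    return sum(max(p[2] for p in net) - min(p[2] for p in net) for net in nets)


if __name__ == "__main__":
    nets = [
        [(0.0, 0.0, 0), (4.0, 3.0, 0)],                 # same layer: no TSV
        [(1.0, 1.0, 0), (2.0, 5.0, 2), (3.0, 2.0, 1)],  # layers 0..2: two TSVs
    ]
    print(hpwl(nets))       # 7.0 + 6.0 = 13.0
    print(tsv_count(nets))  # 2
```

Both functions are linear scans over pins. They are exact but piecewise linear, which is why smooth approximations are used during optimization.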
$W(x,y)$ in Equation (2) is convex but piecewise linear, hence not differentiable everywhere. It is therefore hard to minimize directly with gradient-based methods. As a result, several smooth wirelength approximation functions have been proposed. The log-sum-exp (LSE) wirelength model,

$$W_{\mathrm{LSE}}(x,y) = \gamma \sum_{e\in E}\Bigl(\ln\sum_{v_i\in e} e^{x_i/\gamma} + \ln\sum_{v_i\in e} e^{-x_i/\gamma} + \ln\sum_{v_i\in e} e^{y_i/\gamma} + \ln\sum_{v_i\in e} e^{-y_i/\gamma}\Bigr), \tag{4}$$

was proposed in [16]. It often achieves the best results among recent 2D academic placers [6]. As $\gamma$ approaches zero, the LSE wirelength approaches the HPWL [16]. (Note that $Z(z)$ can be smoothed in the same way.) Due to limited computational precision, however, $\gamma$ cannot be made arbitrarily small without causing arithmetic overflow, so an estimation error is inevitable.

Let $x_e = \{x_i \mid v_i \in e\}$ be the set of x coordinates used to compute the wirelength of a net $e$. Let $\varepsilon_{\mathrm{LSE}}(x_e)$ be the error of the LSE model's estimate of $\max x_e$. Following [18], it is not difficult to derive the bounds $0 \le \varepsilon_{\mathrm{LSE}}(x_e) \le \gamma \ln n$, where $n$ is the number of coordinates in $x_e$. The bounds for the minimum function and for the y and z coordinates can be derived in the same way.
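The LSE estimate of a net's maximum coordinate is $\gamma \ln \sum_i e^{x_i/\gamma}$. It and its error bound can be checked numerically. The sketch below is illustrative only. It evaluates the expression with the standard max-shift rewriting, $m + \gamma \ln \sum_i e^{(x_i - m)/\gamma}$ with $m = \max_i x_i$. This form is mathematically identical, and it keeps `math.exp` from overflowing on large coordinates. The shift is an implementation choice assumed for this sketch; the paper does not describe it.

```python
import math
from typing import Sequence


def lse_max(xs: Sequence[float], gamma: float) -> float:
    """LSE estimate of max(xs): gamma * ln(sum_i exp(x_i / gamma)).

    Evaluated as m + gamma * ln(sum_i exp((x_i - m) / gamma)) with m = max(xs),
    which is algebraically equal but keeps every exponent <= 0 (no overflow).
    """
    m = max(xs)
    return m + gamma * math.log(sum(math.exp((x - m) / gamma) for x in xs))


def lse_min(xs: Sequence[float], gamma: float) -> float:
    """LSE estimate of min(xs), obtained by negating the coordinates."""
    return -lse_max([-x for x in xs], gamma)


def lse_net_length(xs: Sequence[float], gamma: float) -> float:
    """Smoothed span of one net along one axis (one x or y term of the LSE model)."""
    return lse_max(xs, gamma) - lse_min(xs, gamma)


if __name__ == "__main__":
    xs = [0.0, 3.0, 10.0]
    for gamma in (2.0, 1.0, 0.1):
        err = lse_max(xs, gamma) - max(xs)
        # The error always lies in [0, gamma * ln(n)].
        print(gamma, err, gamma * math.log(len(xs)))
    # A naive math.exp(1000.0) raises OverflowError; the shifted form does not.
    print(lse_max([1000.0, 999.0], 1.0))  # about 1000.3133
```

The LSE estimate always lies at or above the true maximum. It reaches the worst case $\gamma \ln n$ when all $n$ coordinates coincide.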
Weighted-Average Wirelength Model
In this paper, we propose a novel weighted-average (WA) wirelength model. It approximates the maximum and minimum functions in Equations (2) and (3) with smaller estimation errors than the LSE model. Let $x_e$ be the set of x coordinates used to compute the wirelength of net $e$. The weighted average is given by

$$\bar{X}(x_e) = \frac{\sum_{x_i\in x_e} x_i\,F(x_i)}{\sum_{x_i\in x_e} F(x_i)}, \tag{5}$$

where $F(x_i)$ is a non-negative weighting function of $x_i$. Clearly, $x_{\min} \le \bar{X}(x_e) \le x_{\max}$, where $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of $x_e$, respectively.
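To approximate the maximum value in $x_e$, $F(x_i)$ should grow quickly enough to separate larger values from smaller ones. To achieve this goal, we use the exponential function

$$F(x_i) = e^{x_i/\gamma}, \tag{6}$$

where $\gamma$ is the same parameter as in Equation (4). Other functions with a similar property can also be used. The estimation function for the maximum value is then defined as

$$\hat{X}_{\max}(x_e) = \frac{\sum_{x_i\in x_e} x_i\,e^{x_i/\gamma}}{\sum_{x_i\in x_e} e^{x_i/\gamma}}. \tag{7}$$

The estimation function for the minimum value is defined in the same way, with weights $e^{-x_i/\gamma}$. Therefore, the WA wirelength model is given by

$$W_{\mathrm{WA}}(x,y) = \sum_{e\in E}\left(\frac{\sum_{v_i\in e} x_i e^{x_i/\gamma}}{\sum_{v_i\in e} e^{x_i/\gamma}} - \frac{\sum_{v_i\in e} x_i e^{-x_i/\gamma}}{\sum_{v_i\in e} e^{-x_i/\gamma}} + \frac{\sum_{v_i\in e} y_i e^{y_i/\gamma}}{\sum_{v_i\in e} e^{y_i/\gamma}} - \frac{\sum_{v_i\in e} y_i e^{-y_i/\gamma}}{\sum_{v_i\in e} e^{-y_i/\gamma}}\right). \tag{8}$$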
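The following is a minimal sketch of the WA estimate with exponential weights for one net along one axis. It uses a max-shift evaluation, which is an implementation choice of this sketch and not something the paper specifies. Subtracting the largest coordinate from every exponent multiplies all weights by a common factor, and that factor cancels in the weighted-average ratio. The function names are hypothetical.

```python
import math
from typing import Sequence


def wa_max(xs: Sequence[float], gamma: float) -> float:
    """Weighted-average estimate of max(xs) with weights exp(x_i / gamma)."""
    m = max(xs)
    # exp((x - m) / gamma) = exp(x / gamma) * exp(-m / gamma); the common factor
    # cancels between numerator and denominator, and no exponent exceeds 0.
    weights = [math.exp((x - m) / gamma) for x in xs]
    return sum(x * w for x, w in zip(xs, weights)) / sum(weights)


def wa_min(xs: Sequence[float], gamma: float) -> float:
    """Weighted-average estimate of min(xs), i.e. weights exp(-x_i / gamma)."""
    return -wa_max([-x for x in xs], gamma)


def wa_net_length(xs: Sequence[float], gamma: float) -> float:
    """WA estimate of one net's span along one axis."""
    return wa_max(xs, gamma) - wa_min(xs, gamma)


if __name__ == "__main__":
    xs = [0.0, 1.0, 4.0]
    for gamma in (2.0, 0.5, 0.05):
        # The estimate never exceeds the true span (4.0) and approaches it as gamma -> 0.
        print(gamma, wa_net_length(xs, gamma))
```

The LSE estimate lies at or above the true maximum. A weighted average, by contrast, can never exceed the true maximum. As a result, the WA span approaches the HPWL from below as γ shrinks.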
The WA wirelength model converges to the HPWL in Equation (2) as $\gamma$ converges to 0. Differentiating Equation (7) with respect to the variables $x_i \in x_e$ shows that the estimated maximum function is continuously differentiable, and so is the estimated minimum function. Unlike the HPWL, however, these functions are not convex in general. For two coordinates with $d = x_1 - x_2$, Equation (7) reduces to $x_2 + d/(1 + e^{-d/\gamma})$. Its second derivative with respect to $d$ becomes negative once $d$ exceeds about $2.4\gamma$.
Since the WA wirelength model is a linear combination of the estimated maximum and minimum functions, we have the following lemma for the WA model.
Lemma 1. The WA wirelength model is continuously differentiable.
Let $\varepsilon_{\mathrm{WA}}(x_e)$ be the error of the WA model's estimate of $\max x_e$. We have the following estimation error bounds for this model:

$$0 \le \varepsilon_{\mathrm{WA}}(x_e) \le \frac{\Delta x}{1 + \frac{1}{n}\exp(\Delta x/\gamma)}, \tag{9}$$

where $\Delta x = x_{\max} - x_{\min}$. The same bounds hold for the minimum function.

Figure 3 shows the estimation errors of the LSE and WA wirelength models. As shown in Figure 3, our WA model has smaller estimation errors than the LSE model, especially as the number of variables grows. The table below lists the error bounds for several values of $n$. The WA bounds are maximized over $\Delta x$.

| $n$ | WA bound | LSE bound |
|---|---|---|
| 2 | $0.46\gamma$ | $0.69\gamma$ |
| 3 | $0.60\gamma$ | $1.10\gamma$ |
| 5 | $0.81\gamma$ | $1.61\gamma$ |
| 10 | $1.16\gamma$ | $2.30\gamma$ |
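The WA figures can be reproduced under one assumption about the form of the bound. Assume the error of the WA maximum estimate is bounded by $\Delta x / (1 + e^{\Delta x/\gamma}/n)$. Its largest value over $\Delta x$ then gives the quoted WA figures, and the LSE figures are simply $\gamma \ln n$. The sketch below is illustrative and does not come from the paper. It finds the peak in closed form up to a one-dimensional root solve, done by bisection.

```python
import math


def wa_peak_error(n: int) -> float:
    """Peak over dx >= 0 of dx / (1 + exp(dx / gamma) / n), in units of gamma.

    With t = dx / gamma, the derivative vanishes where exp(t) * (t - 1) = n,
    and the peak value there equals t - 1. Writing u = t - 1, this is
    u * exp(u) = n / e, solved by bisection (u * exp(u) increases for u >= 0).
    """
    target = n / math.e
    lo, hi = 0.0, max(1.0, math.log(n))  # hi * exp(hi) >= n / e for n >= 1
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(f"n={n:2d}  WA: {wa_peak_error(n):.2f} gamma   LSE: {math.log(n):.2f} gamma")
```

Under this assumed form, the WA peak grows roughly like $\ln n - \ln\ln n - 1$. That explains why the gap to $\gamma \ln n$ widens as nets get larger.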
Density Model
To evaluate the placement density, the overlaps between bins and blocks are calculated. Traditional 2D placers [4, 5, 6, 13, 14] and previous 3D placers [7, 9, 10, 11] usually use only horizontal and vertical overlaps to calculate the placement density of each layer. We instead introduce a density cube model to evaluate the density of 3D placement. Adding the z dimension turns rectangular bins and blocks into cubes and cuboid blocks, respectively. The density of 3D placement is then calculated from the overlaps between cubes and cuboid blocks in all of the x, y, and z directions. The density of cube $b$ of layer $k$ can be defined as

$$D_{b,k}(x,y,z) = \sum_{v\in V} P_x(b,v)\,P_y(b,v)\,P_z(b,v), \tag{10}$$

where $P_x(b,v)$, $P_y(b,v)$, and $P_z(b,v)$ are the overlaps between cube $b$ and block $v$ along the x, y, and z directions, respectively. In this way, blocks can be distributed evenly among layers under the density constraints. The bell-shaped function [13] can be extended to transform $D_{b,k}(x,y,z)$ into a smooth and differentiable function $\hat{D}_{b,k}(x,y,z)$ for our analytical placement.

The quadratic penalty method is used to solve Equation (1). That is, we solve a sequence of unconstrained minimization problems of the form

$$\min_{x,y,z}\; W(x,y) + \alpha\,Z(z) + \lambda \sum_{k}\sum_{b}\Bigl(\hat{D}_{b,k}(x,y,z) + T_{b,k} - t_{\mathrm{density}}\,(w_{b,k}h_{b,k} - P_{b,k})\Bigr)^2, \tag{12}$$

where $W$ and $Z$ denote their smooth approximations, with increasing values of $\lambda$. The solution of each problem is used as the initial solution for the next one. We solve the unconstrained problem in Equation (12) by the conjugate gradient (CG) method.
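The density cube idea can be illustrated with exact (unsmoothed) overlaps, under several assumptions:
- Layer $k$ occupies the unit-thickness slab $[k, k+1]$ in z.
- Each block is a cuboid of unit thickness whose z extent starts at its continuous z coordinate. A block halfway between two layers therefore contributes half of its area to each.
- The `Box` type, the helper names, and this z convention are all hypothetical.

The bell-shaped smoothing used in the actual placer is omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Axis-aligned cuboid [xl, xh] x [yl, yh] x [zl, zh]."""
    xl: float
    xh: float
    yl: float
    yh: float
    zl: float
    zh: float


def overlap_1d(al: float, ah: float, bl: float, bh: float) -> float:
    """Length of the intersection of intervals [al, ah] and [bl, bh]."""
    return max(0.0, min(ah, bh) - max(al, bl))


def overlap_volume(a: Box, b: Box) -> float:
    """Product of the x, y, and z overlaps of two cuboids."""
    return (overlap_1d(a.xl, a.xh, b.xl, b.xh)
            * overlap_1d(a.yl, a.yh, b.yl, b.yh)
            * overlap_1d(a.zl, a.zh, b.zl, b.zh))


def block(x: float, y: float, z: float, w: float, h: float) -> Box:
    """Block of width w and height h centered at (x, y), spanning [z, z + 1] in z."""
    return Box(x - w / 2, x + w / 2, y - h / 2, y + h / 2, z, z + 1.0)


def density_penalty(cubes: List[Box], capacity: List[float], tsv: List[float],
                    blocks: List[Box]) -> float:
    """Sum over cubes of (D_b + T_b - capacity_b)^2, using exact overlaps."""
    total = 0.0
    for cube, cap, t in zip(cubes, capacity, tsv):
        d = sum(overlap_volume(cube, v) for v in blocks)
        total += (d + t - cap) ** 2
    return total


if __name__ == "__main__":
    # Two 1x1 cubes at the same (x, y) location on layers 0 and 1.
    cubes = [Box(0, 1, 0, 1, 0, 1), Box(0, 1, 0, 1, 1, 2)]
    v = block(0.5, 0.5, 0.5, 1.0, 1.0)  # a unit block halfway between layers
    print([overlap_volume(c, v) for c in cubes])  # [0.5, 0.5]
    print(density_penalty(cubes, [0.9, 0.9], [0.0, 0.0], [v]))  # about 0.32
```

In a full placer, this penalty would be multiplied by λ and added to the smoothed wirelength and TSV terms. The overlaps would also be replaced by smooth bell-shaped functions, so that conjugate gradient can use their derivatives.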
Whitespace Reservation for TSVs
To estimate the space occupied by TSVs, the density of TSVs $T_{b,k}$ is added to the density constraints for global placement, alongside the cell density $D_{b,k}$. Since the actual positions of TSVs are not determined during global placement, $T_{b,k}$ is a dynamic value. A reasonable assumption is that a net communicates between each pair of neighboring layers through one TSV. The net-box of a net is defined as the region spanned by its pins. As in traditional 2D routing, placing a net's vias inside its net-box leads to the fewest routing detours. Given a net and its connected pins, we derive its net-box. We then distribute the space required by its TSVs evenly among the density cubes inside the net-box, so that there is enough space for TSV insertion inside this net-box.
After 3D analytical global placement, the amounts of whitespace needed for TSV insertion are reserved as much as possible.
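A minimal sketch of this reservation step follows. The resulting map plays the role of $T_{b,k}$. It rests on these assumptions:
- Every layer has a uniform bin grid of size `bin_w` × `bin_h`.
- Pins are hypothetical `(x, y, layer)` tuples.
- One TSV is needed per layer boundary crossed.
- A cube counts as "inside the net-box" if its bin intersects the net's bounding box on any layer the net spans.

The paper does not say how the reserved area is split across layers, so an even split over all spanned layers is an assumption here.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pin = Tuple[float, float, int]   # hypothetical (x, y, layer) pin tuple
Cube = Tuple[int, int, int]      # (bin column, bin row, layer)


def reserve_tsv_space(nets: List[List[Pin]], bin_w: float, bin_h: float,
                      tsv_area: float) -> Dict[Cube, float]:
    """Spread each net's TSV area evenly over the density cubes in its 3D net-box."""
    reserved: Dict[Cube, float] = defaultdict(float)
    for net in nets:
        xs = [p[0] for p in net]
        ys = [p[1] for p in net]
        zs = [p[2] for p in net]
        n_tsv = max(zs) - min(zs)  # one TSV per neighboring-layer crossing
        if n_tsv == 0:
            continue  # single-layer nets need no TSVs
        cubes = [(i, j, k)
                 for i in range(int(min(xs) // bin_w), int(max(xs) // bin_w) + 1)
                 for j in range(int(min(ys) // bin_h), int(max(ys) // bin_h) + 1)
                 for k in range(min(zs), max(zs) + 1)]
        share = n_tsv * tsv_area / len(cubes)
        for c in cubes:
            reserved[c] += share
    return dict(reserved)


if __name__ == "__main__":
    nets = [[(0.5, 0.5, 0), (2.5, 1.5, 2)],   # 3 x 2 bins, layers 0..2: 18 cubes
            [(0.2, 0.2, 1), (0.8, 0.9, 1)]]   # single layer: nothing reserved
    t = reserve_tsv_space(nets, bin_w=1.0, bin_h=1.0, tsv_area=1.0)
    print(len(t), sum(t.values()))  # 18 cubes sharing about 2.0 units of TSV area
```

Spreading the area evenly keeps $T_{b,k}$ independent of the still-unknown TSV positions. A net with a small net-box therefore concentrates its reservation, while a large net-box dilutes it.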
TSV INSERTION & TSV-AWARE LEGALIZATION
In this step, we insert TSVs among legalized cells without overlaps. The whitespace remaining after 3D global placement may not be enough for inserting TSVs. We therefore use a three-step scheme to insert the required TSVs and legalize both cells and TSVs, so that there are no cell-to-cell, TSV-to-TSV, or cell-to-TSV overlaps:
1. We legalize cells on each layer with minimum displacement, without considering TSVs.
2. A greedy TSV insertion method inserts the required TSVs while minimizing cell-to-TSV overlaps.
3. Post-legalization removes cell-to-TSV and cell-to-cell overlaps, with the TSVs fixed.

We detail each step as follows.
Layer-by-layer Standard Cell Legalization
To perform the layer-by-layer standard-cell legalization, traditional 2D legalization techniques, such as [12, 17] , can be applied to each layer. Unlike 2D legalization, however, connections among different layers need to be considered during the legalization for 3D ICs.
TSV Insertion
Given a legalized placement, we try to insert TSVs at positions with minimum overlap with legalized cells. The goal is that, after TSV insertion, the total cell movement needed for overlap removal is minimized.
Initially, each net is decomposed into 2-pin nets by a minimum spanning tree (MST) algorithm, and TSVs are inserted for one 2-pin net at a time.
TSV insertion proceeds in nondecreasing order of the net-box sizes of the 2-pin nets. A bigger net-box typically contains more whitespace for TSV insertion. The process therefore starts from the 2-pin net with the smallest net-box, whose choices are the most constrained. In the following, we detail MST generation and TSV position determination.
Minimum Spanning Tree (MST) Generation
We adopt the spanning graph approach in [19], which is efficient and effective for constructing spanning trees. All cells are projected onto a single layer, so the geometric relations in the spanning graph are captured by the planar distances between cells. Unlike traditional spanning tree construction, TSV counts are also included in the construction cost. The cost of an edge $e$ is defined as $\beta \cdot L(e) + \delta \cdot Z(e)$, where:
- $L(e)$ is the planar wirelength of $e$.
- $Z(e)$ is the number of TSVs required by $e$.
- $\beta$ and $\delta$ are user-specified parameters.

Kruskal's algorithm is then applied to construct a minimum spanning tree. Since we use only one TSV between each pair of neighboring layers for each net, a large $\delta$ is adopted. This ensures that edges between vertices on the same layer are selected first.
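The cost-weighted MST step can be sketched as follows, with several simplifications:
- The complete graph over a net's pins replaces the spanning graph of [19]. This gives O(n²) edges, which is acceptable for small nets only.
- $L(e)$ is taken as the Manhattan distance between pins projected onto one layer.
- $Z(e)$ is taken as the number of layers the edge crosses.
- The defaults β = 0.4 and δ = 0.6 are the values reported in the experiments.

Pins are hypothetical `(x, y, layer)` tuples.

```python
from typing import List, Tuple

Pin = Tuple[float, float, int]  # hypothetical (x, y, layer) pin tuple


def mst_two_pin_nets(pins: List[Pin], beta: float = 0.4,
                     delta: float = 0.6) -> List[Tuple[int, int]]:
    """Decompose one net into 2-pin nets with Kruskal's algorithm.

    Edge cost = beta * L(e) + delta * Z(e), where L(e) is the planar Manhattan
    distance (pins projected onto one layer) and Z(e) is the number of layer
    boundaries the edge crosses. Uses the complete graph, not a spanning graph.
    """
    n = len(pins)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            (xi, yi, zi), (xj, yj, zj) = pins[i], pins[j]
            planar = abs(xi - xj) + abs(yi - yj)
            tsvs = abs(zi - zj)
            edges.append((beta * planar + delta * tsvs, i, j))
    edges.sort()

    parent = list(range(n))  # union-find forest over pin indices

    def find(a: int) -> int:
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    tree: List[Tuple[int, int]] = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge joins two components: keep it
            parent[ri] = rj
            tree.append((i, j))
            if len(tree) == n - 1:
                break
    return tree


if __name__ == "__main__":
    pins = [(0.0, 0.0, 0), (1.0, 0.0, 0), (0.0, 1.0, 1), (1.0, 1.0, 1)]
    print(mst_two_pin_nets(pins))  # [(0, 1), (2, 3), (0, 2)]: one inter-layer edge
```

Because the TSV term is added per layer crossed, the tree connects each layer's pins first. It then bridges layers with as few inter-layer edges as possible.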
TSV Position Determination
After each net has been decomposed into 2-pin nets, we insert the required TSVs into the placement region. We sort the 2-pin nets in nondecreasing order of their net-box sizes, and each time the 2-pin net with the smallest net-box is chosen for TSV insertion. In this paper, we consider the alignment constraint for TSVs. As mentioned in Section 4.1.1, there are two major types of TSVs. Via-last TSVs interfere with both the device and metal layers, so they must be aligned between neighboring device layers and are more restricted. In contrast, via-first TSVs interfere with the device layer only, so they can be inserted layer by layer independently. Aligning via-first TSVs as well, however, minimizes the interconnections between TSVs of neighboring layers.
For each net, we first evaluate the whitespace on each neighboring layer spanned by its net-box. Then, we calculate the overlapping whitespace area among these layers enclosed by the net-box. The regions enclosed by the net-box are further divided into smaller bins such that at most one TSV can be inserted into a bin. Note that, if the minimum spacing constraints for TSVs are considered, the bin size should be slightly larger than the TSV size, such that TSVs can be inserted into bins without violating the spacing constraints. After calculating the overlapping whitespace area, the TSV positions are decided by searching a bin in the enclosed region such that the overlaps between cells and inserted TSVs are minimized, and there is no overlap between any two TSVs. If there is not enough whitespace in the net-box, the searched region is doubled, and the search process continues.
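The search for a single TSV position can be sketched under these assumptions:
- The whitespace of each layer is represented by a hypothetical grid `layers[k][i][j]` giving the cell area that overlaps TSV-sized bin `(i, j)`.
- The layers passed in are those that must share the TSV position under the alignment constraint.
- The net-box is given in bin indices.
- "Not enough whitespace" is interpreted as "no untaken bin in the region is completely free of cells".
- "Doubling" the region is implemented as padding each side by about half the current width.

All names and these interpretations are illustrative. The caller is assumed to process 2-pin nets in nondecreasing order of net-box size, as described above.

```python
from typing import List, Optional, Set, Tuple

Grid = List[List[float]]  # grid[i][j]: cell area overlapping TSV bin (i, j)


def find_tsv_bin(layers: List[Grid], box: Tuple[int, int, int, int],
                 taken: Set[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Choose one TSV bin for a 2-pin net whose net-box covers bins box = (i0, j0, i1, j1).

    `layers` holds the overlap grids of the layers the TSV must be aligned across.
    Bins in `taken` already hold a TSV and are skipped, so no two TSVs overlap.
    """
    nx, ny = len(layers[0]), len(layers[0][0])
    i0, j0, i1, j1 = box
    while True:
        best: Optional[Tuple[int, int]] = None
        best_cost = float("inf")
        for i in range(max(0, i0), min(nx - 1, i1) + 1):
            for j in range(max(0, j0), min(ny - 1, j1) + 1):
                if (i, j) in taken:
                    continue
                cost = sum(g[i][j] for g in layers)  # cell-to-TSV overlap on aligned layers
                if cost < best_cost:
                    best, best_cost = (i, j), cost
        covers_chip = i0 <= 0 and j0 <= 0 and i1 >= nx - 1 and j1 >= ny - 1
        if best_cost == 0.0 or covers_chip:
            if best is not None:
                taken.add(best)  # reserve the bin so later TSVs avoid it
            return best
        # Not enough whitespace: roughly double the search region around the net-box.
        pad_i = max(1, (i1 - i0 + 1) // 2)
        pad_j = max(1, (j1 - j0 + 1) // 2)
        i0, i1, j0, j1 = i0 - pad_i, i1 + pad_i, j0 - pad_j, j1 + pad_j


if __name__ == "__main__":
    lower = [[1.0] * 3 for _ in range(3)]
    lower[2][2] = 0.0  # the only free bin on the lower layer
    upper = [[1.0] * 3 for _ in range(3)]
    upper[2][2] = 0.0  # the aligned free bin on the upper layer
    taken: Set[Tuple[int, int]] = set()
    print(find_tsv_bin([lower, upper], (0, 0, 0, 0), taken))  # (2, 2)
    print(find_tsv_bin([lower, upper], (0, 0, 0, 0), taken))  # (0, 0): free bin already used
```

The search covers every aligned layer at once, and the `taken` set prevents TSV-to-TSV overlaps. Together they enforce the alignment and no-overlap requirements. Any remaining cell-to-TSV overlap is left for post-legalization.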
TSV-Aware Legalization
After TSVs are inserted, post legalization is applied to remove overlaps between cells and TSVs. The legalization techniques presented in Section 5.1 are applied again; at this time, however, TSVs are considered as fixed blockages when performing legalization.
EXPERIMENTAL RESULTS
We conducted four experiments to evaluate our algorithm. All experiments were performed on the same PC workstation with eight Intel Xeon 2.5 GHz CPUs and 26 GB of memory. We implemented our algorithm in C++ and integrated our code into NTUplace3 [6], a leading academic 2D placer. We also modified the legalizer and detailed placer of NTUplace3 to support layer-by-layer legalization and detailed placement in our 3D placement flow. The weight of TSV counts in Equation (12), α, was set to 10 as in [7]. The parameters β and δ for minimum spanning tree construction were set to 0.4 and 0.6, respectively.
We ran four experiments:
1. We examined the quality of our 3D analytical placement engine by comparing it with the state-of-the-art 3D analytical placer [7]. (The mixed-size placer [8] focuses on handling big macros and applies the 3D placer [7] for standard-cell placement. For a fair comparison, we therefore compare with [7] directly.)
2. We compared our algorithm with a recent force-directed 3D placer with TSV area consideration [15].
3. We examined the effectiveness of our whitespace reservation during global placement.
4. We compared our weighted-average (WA) wirelength model with the log-sum-exp (LSE) one.

We report the results in the following subsections.

3D Analytical Placement Comparisons
We compared with the 3D analytical placer [7] based on the IBM-PLACE benchmarks [2] used in [7] . As in [7] , a 4-layer implementation of 3D IC was assumed, and the floorplan size was scaled by dividing the original area by 4, and then each layer was enlarged to obtain 10% whitespace. Note that, since the work [7] does not consider TSV area, TSV insertion is not applied in our placer here for fair comparison. Table 1 summarizes the experimental results, where total wirelength (WL), numbers of TSVs, and runtime are compared. Columns 4 to 6 give the experimental results reported in [7] , and Columns 7 to 9 list our results. Compared to [7] , our algorithm can effectively reduce the wirelength and TSV counts by 13% and 16%, respectively. Although the work [7] was implemented on a Linux machine with AMD Opteron 1.8GHz and 8GB memory, the results also reveal that our algorithm is more efficient.
The work [7] uses an additional interlayer density function and unconnected filler cells for density control, which enlarges the placement problem size and complexity, while our algorithm directly optimizes the wirelength and TSV counts under density constraints by using the density cube model to spread blocks in the whole 3D chip. The experimental results justify the effectiveness and efficiency of our placement algorithm.
TSV-Aware Placement Comparison
Based on the IWLS [3] and industrial benchmarks used in [15], we also compared with the state-of-the-art force-directed 3D placer [15], which considers TSV sizes during placement. We followed the setup of [15]:
- 45 nm technology and via-first TSVs were used.
- The TSV cell size was set to 2.47 µm × 2.47 µm.
- A 4-layer 3D IC implementation was assumed.
- Cadence SoC Encounter [1] was used to route each layer after 3D placement.

For each layer, landing pads were inserted as top metal wires to connect to the TSVs of the layer above, so that the interconnections among layers are complete. For a fair comparison, TSV pitches are not considered, as in [15]. We note, however, that TSV pitches can easily be modelled by changing the sizes of TSVs, which does not affect our algorithm.

Table 2 summarizes the experimental results, comparing routed wirelength, number of TSVs, total silicon area, and placement runtime. Columns 4 to 7 give the results reported in [15], and Columns 8 to 11 list our results. Compared with the force-directed quadratic placement algorithm [15], our algorithm achieves on average:
- 10% shorter wirelength,
- 21% fewer TSVs,
- 18% smaller total silicon area, and
- significantly shorter runtime.

The advantage over [15] is even larger for larger circuits, implying that our algorithm scales better. (The work [15] conducted its experiments on a Linux machine with eight Intel Xeon 2.5 GHz CPUs and 16 GB of memory. This is similar to our environment, except that our machine has 26 GB of memory.) The quality differences may stem from the fact that [15] applies partitioning for layer assignment before placement. Since no physical information is available during partitioning, its placement quality is usually limited.
Whitespace Reservation Comparison
In this experiment, we examined the effectiveness of our whitespace reservation method during global placement by comparing our method with and without whitespace reservation. Table 3 lists the results.

As mentioned before, TSVs are large and are usually placed in the whitespace between macro blocks or standard cells, where they may affect routing resources. Reserving whitespace for TSVs early can therefore facilitate later routing. Our whitespace reservation effectively reserves whitespace for TSVs and thus reduces the final routed wirelength. Moreover, because it considers the density of TSVs during global placement, it also reduces the total number of required TSVs by 2% and the total silicon area by 2%. As shown in Table 3, our method with whitespace reservation requires less routing runtime and achieves 3% shorter final routed wirelength. Performing whitespace reservation during global placement inevitably increases the global placement runtime. However, it reserves appropriate whitespace for TSV insertion and thus significantly facilitates routing.

Figure 4 shows the TSV insertion results and the routing congestion maps of the second layer of circuit des_perf without and with whitespace reservation. Without whitespace reservation, TSVs are often inserted into whitespace far from the center of the placement region (see Figure 4(a)). With whitespace reservation, many more TSVs are inserted close to the center (see Figure 4(c)). As shown in Figures 4(b) and 4(d), routing congestion is substantially reduced with whitespace reservation. These results show the effectiveness of our whitespace reservation.
Wirelength Model Comparison
In Section 4.1.2, we showed that the WA wirelength model has a smaller upper bound on the estimation error than the LSE model. In this experiment, we examined the empirical results of the two models on the IWLS and industrial benchmarks. To reduce confounding factors for a fair comparison, the whitespace reservation algorithm was not applied. Table 4 lists the placement results of the LSE model. As shown in Table 4, our WA model achieves on average 1% shorter final routed wirelength and 1% fewer TSVs than the LSE model. The LSE model has been shown to achieve the best results among recent academic placers, and these placement results have been fine-tuned. Even a 1% improvement over these best known results is therefore significant.
CONCLUSIONS
We have presented a new TSV-aware placement algorithm for 3D IC designs. The global placement is based on a novel WA wirelength model. To the best of our knowledge, it is the first model in the literature that can outperform the well-known LSE wirelength model theoretically and empirically. Previous works only minimize the TSV count during placement; our algorithm additionally considers the whitespace reservation for TSVs and the physical positions for TSV insertion. Since the whitespace for TSVs is reserved and the physical positions of TSVs are determined during placement, 3D routing can easily be accomplished by traditional 2D routers. Experimental results have shown that our algorithm can achieve the best routed wirelength, TSV counts, and total silicon area among all published works, with the shortest runtime.
