Abstract - In this paper we propose a multilevel, nonlinear-programming-based 3D placement approach that minimizes a weighted sum of total wirelength and TS via number subject to area density constraints. This approach relaxes the discrete layer assignments so that they are continuous in the z-direction, which allows the problem to be solved by an analytical global placer. A key idea is to perform overlap removal and device layer assignment simultaneously by adding a density penalty function for both area and TS via density constraints. Experimental results show that this analytical placer in a multilevel framework is effective in achieving trade-offs between wirelength and TS via number. Compared with the recently published transformation-based 3D placement method [1], we achieve on average 12% shorter wirelength and 29% fewer TS vias than its best-wirelength cases, and on average 20% shorter wirelength and 50% fewer TS vias than its best-TS-via cases.
I. Introduction
Three-dimensional (3D) IC technologies can offer the potential to significantly reduce interconnect delays and improve system performance. Furthermore, the shortened wirelength, especially that of the clock net, also lessens the power consumption of the circuit. 3D IC technologies also provide a flexible way to carry out the heterogeneous system-on-chip (SoC) design by integrating disparate technologies, such as memory and logic circuits, radio frequency (RF) and mixed signal components, optoelectronic devices, etc., onto different layers of a 3D IC.
Device layers in a 3D IC are connected using through-silicon vias (TS vias). However, TS vias are usually etched or drilled through device layers by special techniques and are costly to fabricate. A large number of TS vias increases the area overhead and the cost of the final chip. Also, under current technologies, TS via pitches are very large compared to the sizes of regular metal wires, usually around 5-10 μm. In 3D IC structures, TS vias are usually placed in the whitespace between macro blocks or cells, so the number of TS vias affects not only routing resources but also the overall chip or package area. Therefore, the number of TS vias in the circuit is constrained and needs to be controlled during physical design.
In recent years, 3D IC physical design has attracted increasing attention. Along with advances in technology, several published works have targeted the 3D placement problem. A thermal-driven force-directed 3D placement method [2] was proposed, in which the temperature profile is interpreted as thermal forces that guide cell placement. In that work, a 3D force-directed placement engine places each cell in a true 3D space, with the z position being a real number. Rounding is then needed for layer assignment, which may introduce rounding errors. A folding/stacking-based 3D placement [1] was proposed to reuse 2D placement results and perform device layer assignment and other optimizations. A partition-based approach [3] was also applied to the 3D context, where thermal effects and TS via counts are modeled in the min-cut objective together with the total wirelength. Although this partition-based method is a convenient way to consider these effects, recent comparative analysis suggests that partition-based methods are not as competitive as analytical methods for modern 2D placement problems [4]. A quadratic programming approach [5] for 3D placement was also proposed, which finds an overlap-free placement by modeling the cell distribution with a cost function based on the discrete cosine transform. It shares the same problem as [2]: it only distributes the cells in a cuboid space, not into layers.
All these techniques try to explore the trade-off among wirelength, TS via number and temperature. However, their solutions may be sub-optimal in some cases. The goal of our work is first to develop a high-quality solver for the simple 3D placement problem whose objective combines wirelength and TS via number, so that it can serve as a basic engine for considering other constraints and/or objectives in 3D placement.
In particular, we develop a 3D placement approach that uses a nonlinear optimization method to handle 3D global placement. The main idea is to perform overlap removal and device layer assignment simultaneously by adding a density penalty function for both area and TS via density constraints. Minimizing this density penalty function tends to produce an overlap-free placement in the (x,y)-direction and a legal device layer assignment in the z-direction. The TS via number is also considered by adding a TS via number penalty function to the objective. The contributions of our work are as follows:
• A novel density penalty function is proposed to distribute cells in a 3D placement region. Minimizing this function results in a close-to-legal 3D global placement, and a formal proof is given. Our method alleviates the rounding problem of previous analytical 3D placement methods and thus reduces the effort needed in the detailed placement phase.
• We observe that the multilevel scheme is effective in controlling
4B-4
the TS via number, providing TS via reduction in addition to that obtained through the weighting factor.
• Experiments are performed to compare with [1] on the trade-off curves of wirelength and TS via number. The experimental results show that our method outperforms [1], achieving on average 12% shorter wirelength and 29% fewer TS vias than its best-wirelength cases, and on average 20% shorter wirelength and 50% fewer TS vias than its best-TS-via cases.
The remainder of this paper is organized as follows. Section II formulates the 3D placement problem and describes the placement flow. Section III introduces our 3D analytical placement engine and gives formal proofs of the equivalence between the minimizer of our density penalty function and a legal solution. Section IV describes the multilevel framework built around the analytical placement engine. Section V presents experimental results demonstrating the effect of our method on the trade-offs between wirelength and TS via number. Finally, Section VI concludes the paper and discusses future work.
II. Problem Formulation and 3D Placement Flow

A. Problem Formulation
Given a circuit represented as a hypergraph $H = (V, E)$, the placement region $R$ (scaled to [
Because routing information is unknown during placement, the TS via number is also estimated with a similar model: the TS via number of a net is the height of the bounding cube of the cells that belong to that net. The TS via number $v(e)$ of a net $e$ is calculated as follows:

$$v(e) = \max_{v_i \in e} z_i - \min_{v_i \in e} z_i \quad (3)$$
The weighting factor $\alpha$ is used to trade off the wirelength $l(e)$ against the TS via number $v(e)$. In modern analytical placers (e.g., those described in [6]), the non-overlap constraints are usually transformed into density constraints during global placement and are then legalized during detailed placement. The formal descriptions are given in Section III.D.
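To make the two net models concrete, the following minimal sketch evaluates the objective of problem (1). It assumes that $l(e)$ is the half-perimeter of the net's (x, y) bounding box and that $v(e)$ is the height of its bounding cube in layers, as described above. The `Cell` type and all function names are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    x: float  # horizontal position
    y: float  # vertical position
    z: int    # device layer in {1, ..., K}

def hpwl(net, cells):
    """Half-perimeter of the (x, y) bounding box of a net."""
    xs = [cells[c].x for c in net]
    ys = [cells[c].y for c in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def tsv_count(net, cells):
    """Height of the net's bounding cube in layers: max z - min z."""
    zs = [cells[c].z for c in net]
    return max(zs) - min(zs)

def objective(nets, cells, alpha):
    """Sum over nets of wirelength plus alpha times TS via number."""
    return sum(hpwl(n, cells) + alpha * tsv_count(n, cells) for n in nets)

cells = {"a": Cell(0.0, 0.0, 1), "b": Cell(3.0, 4.0, 1), "c": Cell(1.0, 1.0, 3)}
nets = [["a", "b"], ["a", "c"]]
print(objective(nets, cells, alpha=10))  # total: 7 + (2 + 10 * 2)
```

A larger $\alpha$ makes the layer span of a net more expensive than its planar extent, and the placer reacts by pulling connected cells onto the same layer.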
B. 3D Placement Flow
The overall placement flow is shown in Fig. 1. Global placement starts either from scratch or from a given initial placement. It incorporates the analytical placement engine (Section III) into the multilevel framework used in [7]. The global placement result is then processed layer by layer with the 2D detailed placer [8] to obtain the final placement.
III. Analytical Placement Engine
The analytical placement engine solves problem (1) by transforming the non-overlap constraints to density penalties. 
The wirelength $l(e)$ (Section III.B), the TS via number $v(e)$ (Section III.C), and the density penalty function $\mathrm{Penalty}(x, y, z)$ (Section III.D) are described in detail in the following subsections.
To solve this constrained problem, penalty methods [9] are usually applied:

$$\mathrm{OBJ}(x, y, z) = \sum_{e \in E} \big( l(e) + \alpha\, v(e) \big) + \mu \cdot \mathrm{Penalty}(x, y, z) \quad (5)$$

This penalized objective function is minimized at each iteration, with a gradually increasing penalty factor $\mu$ to reduce the density violations. It can be shown that, if the penalty function is non-negative, the minimizer of (5) converges to a solution of problem (4) as $\mu \to \infty$.
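The outer loop of a quadratic penalty method can be illustrated with a one-variable toy problem that is not the placement problem itself: minimize $(x-3)^2$ subject to $x = 1$, with the penalized form $(x-3)^2 + \mu (x-1)^2$. Its minimizer $(3+\mu)/(1+\mu)$ tends to the constrained optimum $x = 1$ as $\mu$ grows. The doubling schedule and the gradient-descent inner solver are illustrative choices and are not taken from this paper.

```python
def gradient_descent(grad, x, step, iters):
    """Plain gradient descent on a scalar variable."""
    for _ in range(iters):
        x -= step * grad(x)
    return x

def penalty_method(mu0=1.0, growth=2.0, rounds=12):
    """Toy problem: minimize (x - 3)^2 subject to x = 1,
    via the penalized objective (x - 3)^2 + mu * (x - 1)^2."""
    x, mu = 0.0, mu0
    history = []
    for _ in range(rounds):
        def grad(x, mu=mu):
            return 2.0 * (x - 3.0) + 2.0 * mu * (x - 1.0)
        step = 1.0 / (2.0 + 2.0 * mu)                  # 1 / curvature of the penalized objective
        x = gradient_descent(grad, x, step, iters=50)  # warm start from the previous round
        history.append((mu, x, abs(x - 1.0)))          # (penalty factor, minimizer, violation)
        mu *= growth
    return x, history

x, history = penalty_method()
for mu, xk, violation in history[:3]:
    print(f"mu={mu:g}  x={xk:.4f}  |x-1|={violation:.4f}")
```

The constraint violation shrinks like $2/(1+\mu)$. Warm-starting each round from the previous minimizer plays the same role as re-solving (5) from the current placement in the placer.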
A. Relaxation of Discrete Variables
As mentioned in Section II.A, the placement variables are represented by triples $(x_i, y_i, z_i)$.
The range of $z_i$ is relaxed from the set $\{1, 2, \ldots, K\}$ to the continuous interval $[1, K]$. After relaxation, a nonlinear analytical solver can be used in our placement engine. The relaxed solution is mapped back to discrete values before the detailed placement phase.
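The text does not specify how relaxed layer coordinates are mapped back to discrete layers. The sketch below assumes the simplest choice: clamp $z_i$ into $[1, K]$ and round to the nearest layer, with halves rounded up. The function name `snap_layers` is hypothetical. The density penalty is designed to push $z_i$ toward integer values, so near convergence this step should move most cells only slightly.

```python
import math

def snap_layers(z_relaxed, K):
    """Map relaxed layer coordinates to integer layers in {1, ..., K} (assumed rule)."""
    layers = []
    for z in z_relaxed:
        z = min(max(z, 1.0), float(K))           # clamp into [1, K]
        layers.append(int(math.floor(z + 0.5)))  # round half up (Python's round() is banker's rounding)
    return layers

print(snap_layers([0.7, 1.49, 2.5, 3.51, 4.2], K=4))  # [1, 1, 3, 4, 4]
```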
B. Log-sum-exp Wirelength
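The half-perimeter wirelength $l(e)$ defined in equation (2) is replaced by a differentiable approximation using the log-sum-exp function [10]:

$$l(e) \approx \eta \left( \ln \sum_{v_i \in e} e^{x_i/\eta} + \ln \sum_{v_i \in e} e^{-x_i/\eta} + \ln \sum_{v_i \in e} e^{y_i/\eta} + \ln \sum_{v_i \in e} e^{-y_i/\eta} \right) \quad (6)$$

where $x_i$ and $y_i$ are in the range between 0 and 1, and the parameter $\eta$ is set to 0.01 in our implementation, as in [11].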
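A minimal sketch of the log-sum-exp wirelength for a single net follows. Each max is replaced by $\eta \ln \sum e^{x_i/\eta}$ and each min by $-\eta \ln \sum e^{-x_i/\eta}$. We assume $\eta = 0.01$ and coordinates in $[0, 1]$, as stated above. Because $e^{x/\eta}$ can reach $e^{100}$, the sketch subtracts the largest exponent before exponentiating. This is the standard stable log-sum-exp trick and is not something described in the text. Function names are hypothetical.

```python
import math

def smooth_max(vals, eta):
    """eta * log(sum(exp(v / eta))): a smooth upper bound on max(vals).
    Shifting by the maximum keeps exp() from overflowing when eta is small."""
    scaled = [v / eta for v in vals]
    m = max(scaled)
    return eta * (m + math.log(sum(math.exp(s - m) for s in scaled)))

def lse_wirelength(xs, ys, eta=0.01):
    """Log-sum-exp approximation of the half-perimeter wirelength of one net."""
    return (smooth_max(xs, eta) + smooth_max([-x for x in xs], eta)
            + smooth_max(ys, eta) + smooth_max([-y for y in ys], eta))

def hpwl(xs, ys):
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

xs, ys = [0.10, 0.40, 0.35], [0.20, 0.90, 0.50]
print(hpwl(xs, ys), lse_wirelength(xs, ys))
```

Each smooth max exceeds the true max by at most $\eta \ln n$ for an $n$-pin net. The approximation therefore overestimates HPWL by at most $4\eta \ln n$, and a smaller $\eta$ tightens the bound at the cost of a stiffer objective.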
C. TS Via Number
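The TS via number estimate $v(e)$ defined in equation (3) is also replaced by the log-sum-exp approximation:

$$v(e) \approx \eta \left( \ln \sum_{v_i \in e} e^{z_i/\eta} + \ln \sum_{v_i \in e} e^{-z_i/\eta} \right) \quad (7)$$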
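The analytical solver also needs the gradient of the smoothed TS via count. The sketch below computes both the value and the gradient for one net. The derivative of $\eta \ln \sum_j e^{z_j/\eta}$ with respect to $z_i$ is the softmax weight of $z_i$, and the min term contributes the negated softmax of $-z/\eta$. The smoothing parameter used for the z-direction is not stated in the text, so `eta=0.1` here is an assumption, and the function names are hypothetical.

```python
import math

def _softmax(vals):
    m = max(vals)
    ex = [math.exp(v - m) for v in vals]
    total = sum(ex)
    return [e / total for e in ex]

def _lse(vals):
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def smooth_tsv(zs, eta):
    """Smoothed max(z) - min(z) for one net, and its gradient w.r.t. each z_i."""
    up = [z / eta for z in zs]
    down = [-z / eta for z in zs]
    value = eta * (_lse(up) + _lse(down))
    # d/dz_i of eta*lse(z/eta) is softmax(z/eta)_i; the min term contributes -softmax(-z/eta)_i.
    grad = [p - q for p, q in zip(_softmax(up), _softmax(down))]
    return value, grad

value, grad = smooth_tsv([1.0, 2.0, 4.0], eta=0.1)
print(round(value, 3), [round(g, 3) for g in grad])
```

The gradient sums to zero, so shifting a whole net vertically does not change its estimate. Only the topmost and bottommost cells receive significant forces, which pull them toward each other. Middle cells are almost unaffected.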
D. Density Penalty Function
The density penalty function handles overlap removal in both the (x,y)-direction and the z-direction. In theory, minimizing it should lead to a non-overlapping placement.
Assume that every cell $v_i$ has a legal device layer
, then we can define $K$ density functions for these $K$ device layers. Intuitively, the density function … is not aligned to any of the two device layers. We borrow the idea of the bell-shaped function [12] to define the density function for this case:

$$D_k(u, v) = \sum_{i} B(k, z_i)\, D_i(u, v), \quad \text{for } 1 \le k \le K \quad (9)$$

We call (10) the bell-shaped density projection function; it extends the density function (8) from integral layer assignments to the definition (9) for relaxed layer assignments. Clearly, (9) is consistent with (8) when the layer assignments $\{z_i\}$ are integers. Fig. 3 shows how this extension works for a 4-layer 3D placement. The x-axis is the relaxed layer assignment in the z-direction, and the y-axis indicates the fraction of area projected onto the actual device layers. The four curves colored red, green, blue and purple represent device layers 1, 2, 3 and 4, respectively. In this example, a cell is temporarily placed at $z = 2.316$ (yellow triangle), between layer 2 and layer 3. The bell-shaped density projection functions project 80% of its area onto layer 2 (green) and 20% onto layer 3 (blue). In this way, we establish a mapping from a relaxed 3D placement to the area distributions in discrete layers.

Lemma 1. Assuming the total area of cells equals the placement area, every legal placement is a minimizer of $P(x, y, z)$.

The proof of Lemma 1 is trivial and is thus omitted. Therefore, minimizing $P(x, y, z)$ provides a necessary condition for a legal placement. However, some minimizers do not form a legal placement. An example is shown in Fig. 4, where placement (b) also minimizes the density penalty function but is not legal.
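The projection of the Fig. 3 example can be sketched as follows. Since the formula of (10) is not reproduced in this text, we assume the piecewise-quadratic bell shape commonly used in analytical placement [12] with unit width: $1 - 2d^2$ for $|d| \le 1/2$, $2(|d|-1)^2$ for $1/2 < |d| \le 1$, and 0 otherwise, where $d = z - k$. Under this assumption, a cell at $z = 2.316$ is split 80%/20% between layers 2 and 3, which matches the example above. The names `bell` and `project` are hypothetical.

```python
def bell(k, z):
    """Assumed bell-shaped share of a cell at relaxed layer z projected onto layer k."""
    d = abs(z - k)
    if d <= 0.5:
        return 1.0 - 2.0 * d * d
    if d <= 1.0:
        return 2.0 * (d - 1.0) ** 2
    return 0.0

def project(z, K):
    """Fractions of a cell's area assigned to layers 1..K."""
    return [bell(k, z) for k in range(1, K + 1)]

shares = project(2.316, K=4)
print([round(s, 3) for s in shares])  # [0.0, 0.8, 0.2, 0.0]
```

For any $z$ in $[1, K]$, the two nonzero shares are $1 - 2d^2$ and $2d^2$, so they always sum to one and no cell area is lost or duplicated. An integer $z$ yields a one-hot vector, which is consistent with (8).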
Fig. 4 -Two Placements with the Same Density Penalties
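To avoid reaching such minimizers, we introduce the interlayer density function, which applies the bell-shaped projection (10) at the midpoints $k + 0.5$ between adjacent layers:

$$\hat{D}_k(u, v) = \sum_{i} B(k + 0.5, z_i)\, D_i(u, v), \quad \text{for } 1 \le k \le K - 1$$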
and also the interlayer density penalty function:
As with the density penalty function $P(x, y, z)$, the following Lemma 2 holds.

Lemma 2. Assuming the total area of cells equals the placement area, every legal placement is a minimizer of $Q(x, y, z)$.
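The role of the interlayer density is shown below with a toy case in the spirit of Fig. 4. The actual content of the figure is not reproduced here. The toy case places two unit-area cells in the same (x, y) bin with $K = 2$. We assume the same bell shape as in the previous sketch and evaluate it at $k + 0.5$ for the interlayer density. The squared-deviation penalty is a hypothetical stand-in, since the exact forms of $P$ and $Q$ are not given in this text.

```python
def bell(k, z):
    d = abs(z - k)
    if d <= 0.5:
        return 1.0 - 2.0 * d * d
    if d <= 1.0:
        return 2.0 * (d - 1.0) ** 2
    return 0.0

def densities(zs, K):
    """Layer densities (at k) and interlayer densities (at k + 0.5) in one (x, y) bin,
    for unit-area cells at relaxed layers zs."""
    layer = [sum(bell(k, z) for z in zs) for k in range(1, K + 1)]
    inter = [sum(bell(k + 0.5, z) for z in zs) for k in range(1, K)]
    return layer, inter

def squared_excess(values, target=1.0):
    """Hypothetical penalty: squared deviation from a bin capacity of 1."""
    return sum((v - target) ** 2 for v in values)

legal = densities([1.0, 2.0], K=2)     # one cell on each layer
stacked = densities([1.5, 1.5], K=2)   # both cells halfway between the layers
print(legal, stacked)                  # ([1.0, 1.0], [1.0]) ([1.0, 1.0], [2.0])
```

Both placements produce identical layer densities, so no penalty on the layer densities alone can distinguish them. The interlayer density at $z = 1.5$ is 1 for the legal placement and 2 for the stacked one, and this difference is what the additional penalty exploits.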
Combining the density penalty functions $P(x, y, z)$ and $Q(x, y, z)$, we define the following density penalty function: … $E_k(u, v)$ for differentiability. As in [11], the densities are smoothed by solving Helmholtz equations:
is used in our implementation, whose gradient is computed efficiently with the method in [14] .
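The Helmholtz smoothing step can be sketched for a small grid of bins. We assume the form $(\varepsilon I - \Delta)\,\psi = \varepsilon D$ with zero-flux (Neumann) boundaries. This is our reading of the smoothing in [11]; the exact equation is not reproduced in this text. The dense solve is only for illustration, since practical placers use fast cosine transforms. The sketch also does not reproduce the gradient method of [14]. Function names and the value of `eps` are hypothetical.

```python
import numpy as np

def neumann_second_difference(n):
    """1D second-difference matrix with reflecting (zero-flux) boundaries."""
    L = np.zeros((n, n))
    for i in range(n):
        if i > 0:
            L[i, i - 1] += 1.0
            L[i, i] -= 1.0
        if i < n - 1:
            L[i, i + 1] += 1.0
            L[i, i] -= 1.0
    return L

def helmholtz_smooth(density, eps):
    """Solve (eps*I - Laplacian) psi = eps * density on a 2D grid of bins."""
    ny, nx = density.shape
    lap = (np.kron(np.eye(ny), neumann_second_difference(nx))
           + np.kron(neumann_second_difference(ny), np.eye(nx)))
    A = eps * np.eye(nx * ny) - lap
    psi = np.linalg.solve(A, eps * density.ravel())
    return psi.reshape(ny, nx)

spike = np.zeros((9, 9))
spike[4, 4] = 1.0
smoothed = helmholtz_smooth(spike, eps=0.5)
print(smoothed.sum(), smoothed.max())
```

With zero-flux boundaries the Laplacian's columns sum to zero, so the total density is preserved while peaks are spread out. A uniform density is left unchanged. A smaller $\varepsilon$ attenuates high-frequency components more strongly and yields a smoother field.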
IV. Multilevel Framework
The optimization problem below summarizes our analytical placement engine: 
This analytical engine is incorporated into the multilevel framework in [7] , which consists of coarsening, relaxation, and interpolation.
The purpose of coarsening is to build a hierarchy for the multilevel scheme, for which we use best-choice hypergraph clustering [15].
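Best-choice clustering [15] greedily merges the cluster pair with the highest affinity score and keeps candidates in a priority queue with lazy updates. The sketch below assumes the score $\sum_{e \ni u, v} \frac{1}{|e| - 1} \big/ (a(u) + a(v))$, where $|e|$ counts the distinct clusters on net $e$ and $a(\cdot)$ is cluster area. It is a simplified, hypothetical version. The exact score, the area limits and the stopping rule of [15] and of our implementation may differ, and here clustering simply stops at a target cluster count.

```python
import heapq
from collections import defaultdict

def best_choice_clustering(nets, areas, target):
    """Greedy best-choice clustering (simplified sketch).
    nets: list of lists of cell ids; areas: cell id -> area; target: cluster count to stop at."""
    owner = {c: c for c in areas}                 # cell -> cluster id
    area = dict(areas)                            # cluster id -> total area
    nets_of = defaultdict(set)                    # cluster id -> incident net indices
    for i, net in enumerate(nets):
        for c in net:
            nets_of[c].add(i)

    def best_neighbor(u):
        """Return (neighbor, score) maximizing sum_e 1/(|e|-1) / (a(u) + a(v))."""
        conn = defaultdict(float)
        for i in nets_of[u]:
            clusters = {owner[c] for c in nets[i]}
            if len(clusters) < 2:
                continue                          # net is internal to u
            w = 1.0 / (len(clusters) - 1)
            for v in clusters - {u}:
                conn[v] += w
        best, best_score = None, 0.0
        for v in sorted(conn):                    # sorted: deterministic tie-breaking
            score = conn[v] / (area[u] + area[v])
            if score > best_score:
                best, best_score = v, score
        return best, best_score

    heap = []
    for u in areas:
        v, s = best_neighbor(u)
        if v is not None:
            heapq.heappush(heap, (-s, u))
    alive = set(areas)
    while heap and len(alive) > target:
        neg_score, u = heapq.heappop(heap)
        if u not in alive:
            continue                              # u was absorbed by an earlier merge
        v, s = best_neighbor(u)
        if v is None:
            continue
        if s < -neg_score - 1e-12:
            heapq.heappush(heap, (-s, u))         # stale entry: re-queue with fresh score
            continue
        for c in [c for c, k in owner.items() if k == v]:
            owner[c] = u                          # merge cluster v into cluster u
        area[u] += area.pop(v)
        nets_of[u] |= nets_of.pop(v)
        alive.discard(v)
        v2, s2 = best_neighbor(u)
        if v2 is not None:
            heapq.heappush(heap, (-s2, u))
    groups = defaultdict(list)
    for c, k in owner.items():
        groups[k].append(c)
    return sorted(sorted(g) for g in groups.values())

nets = [["a", "b"], ["a", "b"], ["a", "b"], ["c", "d"], ["b", "c"]]
areas = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
print(best_choice_clustering(nets, areas, target=2))  # [['a', 'b'], ['c', 'd']]
```

Dividing by the combined area discourages the formation of very large clusters. Lazy re-scoring on pop avoids updating every neighbor after each merge.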
After the hierarchy is built, one placement problem is defined per level: at a coarser level, clusters are modeled as cells and the connections between clusters are modeled as nets. The placement problem at each level is solved (relaxed) by the analytical engine (17).
These placement problems are solved in order from the coarsest level to the finest level, and the solution at a coarser level is interpolated to obtain an initial solution for the next finer level. The cell with the highest degree in a cluster is placed at the center of that cluster (a C-point), while the other cells are placed at the weighted average locations of their neighboring C-points, with weights proportional to their connectivity to those clusters.
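The interpolation step can be sketched as follows. Each cluster's highest-degree cell becomes its C-point at the cluster location. Every other cell is placed at the connectivity-weighted average of the C-points of the clusters it connects to. We assume that each net contributes $1/(|e|-1)$ per neighboring cell, that ties in degree are broken by cell id, and that an isolated cell stays at its own cluster location. None of these details is stated in the text, and the names are hypothetical.

```python
from collections import defaultdict

def interpolate(clusters, cluster_pos, nets):
    """clusters: cluster id -> member cells; cluster_pos: cluster id -> (x, y, z).
    Returns an initial (x, y, z) for every cell at the next finer level."""
    degree = defaultdict(int)
    for net in nets:
        for c in net:
            degree[c] += 1
    cluster_of = {c: k for k, members in clusters.items() for c in members}

    pos, c_points = {}, set()
    for k, members in clusters.items():
        seed = max(sorted(members), key=lambda c: degree[c])  # highest degree; ties -> smallest id
        pos[seed] = cluster_pos[k]                             # C-point sits at the cluster center
        c_points.add(seed)

    for c, own in cluster_of.items():
        if c in c_points:
            continue
        weight = defaultdict(float)                 # neighboring cluster -> connectivity
        for net in nets:
            if c in net and len(net) > 1:
                for other in net:
                    if other != c:
                        weight[cluster_of[other]] += 1.0 / (len(net) - 1)
        if not weight:                              # isolated cell: stay with own cluster
            pos[c] = cluster_pos[own]
            continue
        total = sum(weight.values())
        pos[c] = tuple(sum(w * cluster_pos[k][d] for k, w in weight.items()) / total
                       for d in range(3))
    return pos

clusters = {"A": ["a", "b"], "B": ["c"]}
cluster_pos = {"A": (0.0, 0.0, 1.0), "B": (4.0, 2.0, 3.0)}
nets = [["a", "b"], ["a", "c"], ["b", "c"]]
print(interpolate(clusters, cluster_pos, nets)["b"])  # (2.0, 1.0, 2.0)
```

The interpolated $z$ is allowed to be fractional (2.0 here lies between the two cluster layers) because the engine works with the relaxed layer coordinate.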
V. Experimental Results
Our experiments are performed on the IBM-PLACE benchmarks [16], which consist of standard-cell circuits without I/O ports, as was done in [1]. In these experiments, we assume a 4-layer 3D IC implementation. The floorplan size is scaled from [16] by dividing the original area by 4 and then enlarging it to obtain 10% whitespace. Filler cells with no connections to any other cells are added to ensure that feasible solutions exist for the equality constraints in problem (4).
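The outline arithmetic can be checked with a short sketch. We assume one possible reading of the setup: 10% whitespace is measured against the total placement area of all four layers, and each layer outline is square. Under that reading, the filler area is whatever remains after placing all cells, so that total cell area equals placement area, as Lemmas 1 and 2 require. The input value below is illustrative and not taken from the benchmarks.

```python
import math

def layer_outline(total_cell_area, K=4, whitespace=0.10):
    """Per-layer outline so that K layers hold all cells with the given whitespace fraction.
    Assumes whitespace is measured against the total placement area of all K layers."""
    layer_area = total_cell_area / (K * (1.0 - whitespace))
    side = math.sqrt(layer_area)              # assumes a square outline
    filler_area = K * layer_area - total_cell_area
    return layer_area, side, filler_area

layer_area, side, filler_area = layer_outline(3.6e6)
print(layer_area, side, filler_area)
```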
In TABLE I, we list the statistics of the 10 (out of 18) circuits in the benchmark for which experimental results are available in [1]. We also list partial results of the previous transformation-based 3D placement method [1] for later comparison. Only the results of three typical transformation schemes are listed: the local-stacking transformation "LST (r=10%)", the window-based transformation "LST (8x8 win)", and the folding transformation "Folding-2". Geometric averages are computed to measure the overall results. The listed results are for trade-offs between wirelength and TS via number without temperature awareness, so the following comparisons are valid.
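The sketch below shows one way to compute such a geometric average: the geometric mean of per-circuit ratios between two methods. Each circuit counts equally regardless of its absolute size. We assume that a statement such as "12% shorter" corresponds to a ratio of about 0.88, which is our reading of the convention and is not stated in the text. The numbers are illustrative only and are not benchmark data.

```python
import math

def geomean(values):
    """Geometric mean of positive numbers, computed in log space."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def average_ratio(ours, theirs):
    """Geometric mean of per-circuit ratios ours/theirs; about 0.88 would read as '12% smaller'."""
    return geomean([o / t for o, t in zip(ours, theirs)])

# Illustrative numbers only, not benchmark data: per-circuit ratios are 0.5 and 2.0.
print(average_ratio([1.0, 4.0], [2.0, 2.0]))
```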
TABLE II presents the results of our multilevel analytical placer with TS via weight $\alpha = 10$. For comparability and because of the page limit, we show results only for those 10 circuits of the benchmark. Three sets of results are collected, for 1-level, 2-level and 3-level placement, respectively. The 1-level placement runs the analytical placement engine directly without any clustering, while the 2-level and 3-level placements construct a 2-level or 3-level hierarchy by clustering. The wirelength after global placement (GP WL), the wirelength after detailed placement (DP WL), the number of TS vias (#TSV), and the runtime are reported.
TABLE II shows that, with the same TS via weight, the 1-level placement achieves the shortest wirelength, while the 3-level placement achieves the fewest TS vias. We compare our multilevel analytical placement method with the previous transformation-based method [1] in three pairings: our 1-level placement with "LST (r=10%)" (the best-wirelength case), our 2-level placement with "LST (8x8 win)", and our 3-level placement with "Folding-2" (the best-TS-via case). From the data in TABLE I and TABLE II, our 1-level placement achieves on average 12% shorter wirelength and 29% fewer TS vias than "LST (r=10%)"; our 3-level placement also achieves on average 30% shorter wirelength with slightly fewer TS vias than "Folding-2".
To obtain a more complete comparison, trade-off curves are generated for the circuit ibm13 in [16] (Fig. 5). In the curves "1-Level", "2-Level" and "3-Level", the TS via number is controlled by increasing the weighting factor $\alpha$ from 10 to 10000. The results of ibm13 with "LST (r=10%)", "LST (r=20%)", "LST (8x8 win)", "Folding-4" and "Folding-2" in [1] are also connected by a curve to visualize their trade-offs. In this example, the analytical placement engine clearly outperforms the transformation-based method in both wirelength and TS via number. Moreover, the multilevel framework enables a sharp reduction in TS via number without much degradation in wirelength. The results for the other circuits show similar behavior.
We also notice that, in the comparison between "Folding-2" and our 3-level placement, the latter does not achieve as few TS vias as "Folding-2" for two large circuits. This is because the analytical engine minimizes the objective $\sum_{e \in E} \big( l(e) + 10\, v(e) \big)$ … method. When the results are measured with this objective function, the 3-level placement still outperforms the "Folding-2" method.
Fig. 5 -Trade-off Curve for the Circuit ibm13
Moreover, to demonstrate the ability of our multilevel analytical 3D placer to control the TS via number, we run 3-level placement with $\alpha = 1000$.
The experimental results are shown in TABLE III. Compared with "Folding-2", this set of results achieves on average 20% shorter wirelength and 50% fewer TS vias. For reference, the table also lists the TS via number obtained by 4-way min-cut partitioning with hMetis [17], which targets TS via minimization only by minimizing the total cut size. Our TS via numbers are very close to the min-cut results, which can be viewed almost as a lower bound.
VI. Conclusions and Future Work
In this paper a multilevel analytical 3D placement algorithm is proposed, which minimizes the weighted sum of wirelength and TS via number with careful handling of overlap removal and device layer assignment in an analytical solver. The overlap removal and device layer assignment are handled by a density penalty function, whose minimizer is guaranteed to be a legal placement in the z-dimension. Experimental results demonstrate that our multilevel framework combined with the analytical engine is very effective in controlling the TS via numbers.
We note that the multilevel placement does not obtain wirelength as short as that of the flat placement. A possible explanation is that clustering limits the solution space by requiring all cells in a cluster to be in the same layer. Further study to overcome this limitation is underway.
