Abstract-Thermal analysis of integrated circuits (IC) is a high performance computing problem because the nanoscale spatiotemporal features of the problem result in a large discrete problem. In previous works, compact models of ICs were introduced to speed up the modeling process. However, such methods have limited accuracy as they approximate the underlying physics. They are also ill-suited to simulate the thermal characteristics of an IC at the cell-level. The finite element method (FEM) is an appropriate computational technique for providing both fast and accurate thermal analyses. Considering that the number of cells in modern ICs is on the order of millions, thermal analysis at this abstraction level is a formidable task. Consequently, handling the computational meshes and computing thermal profiles of an IC at the cell-level requires substantial computing power. In order to provide accurate cell-level thermal simulations at a lower computational cost, this work introduces advanced techniques that judiciously trade off mesh granularity with simulation accuracy which allows fast analysis of celllevel floorplans. The proposed cell-homogenization techniques start with a flat cell-level floorplan and a related power trace and produce reduced order meshes that accelerate thermal simulations with a negligible loss in accuracy. Results show that the proposed techniques achieve up to a 90% reduction in the number of nodes in the mesh with less than 5% error in the temperature compared to the full scale mesh. The simulation time is also reduced by an order of magnitude.
Abstract-Thermal analysis of integrated circuits (IC) is a high performance computing problem because the nanoscale spatiotemporal features of the problem result in a large discrete problem. In previous works, compact models of ICs were introduced to speed up the modeling process. However, such methods have limited accuracy as they approximate the underlying physics. They are also ill-suited to simulate the thermal characteristics of an IC at the cell-level. The finite element method (FEM) is an appropriate computational technique for providing both fast and accurate thermal analyses. Considering that the number of cells in modern ICs is on the order of millions, thermal analysis at this abstraction level is a formidable task. Consequently, handling the computational meshes and computing thermal profiles of an IC at the cell-level requires substantial computing power. In order to provide accurate cell-level thermal simulations at a lower computational cost, this work introduces advanced techniques that judiciously trade off mesh granularity with simulation accuracy which allows fast analysis of celllevel floorplans. The proposed cell-homogenization techniques start with a flat cell-level floorplan and a related power trace and produce reduced order meshes that accelerate thermal simulations with a negligible loss in accuracy. Results show that the proposed techniques achieve up to a 90% reduction in the number of nodes in the mesh with less than 5% error in the temperature compared to the full scale mesh. The simulation time is also reduced by an order of magnitude.
Index Terms-Thermal analysis, homogenization, DEF, LEF, FEM, Mesh
I. INTRODUCTION
Thermal issues are important in modern CMOS technologies due to self-heating, high power densities, and the low thermal conductivity of dielectric materials [1] , [2] , [3] . These problems lead to performance degradation and excessive leakage power that can hinder higher integration densities [4] . Emerging 3-D integration technologies enable long term performance scaling in the post device-scaling era. However, these technologies have inherent thermal issues due to the multi-tier nature of 3-D circuits which can impede heat dissipation [1] , [5] , [6] .
Consequently, thermal analysis tools are crucial for the robust and reliable operation of ICs and many academic IC thermal simulators have been introduced [7] , [8] , [9] , [10] , [11] . In general, these tools take as input a floorplan of the functional blocks of an IC and produce a computational mesh, i.e., subdivide the IC into smaller tetra or hexahedral (cuboid) elements for simulations. These tools are suitable to early stage design specification and validation. However, existing tools are impractical for post-Automatic Place and Route (APR) thermal simulations. Considering the role of temperature in the reliability and aging mechanisms of circuits, post-APR simulations are especially important [12] , [13] , [14] . Existing tools use relatively coarse (cuboid) elements to describe the physical implementation of the circuit, which fundamentally limits the granularity of the analysis. As a result, these tools do not provide sufficiently accurate analysis within reasonable computational times. In addition, cells known to potentially become hot spots (e.g., clock buffers) cannot be individually analyzed.
The computational methodologies used in these tools are ill-suited to solve the large linear systems generated from the discretization of a post-APR floorplan. The poor flexibility to simulate the complex geometries of a modern IC is another significant limitation. In order to bypass these limitations, other methods, such as the finite element method (FEM) can be used to solve the heat equation [15] . The FEM discretizes the heat equation directly, avoiding the accuracy pitfalls inherent in compact models. The sequence of resulting linear systems can then be solved with fast preconditioned iterative solvers [8] , [15] , [16] . Furthermore, FEM-based computational methodologies can model more versatile geometries, while accurately and efficiently solving the post-APR heat problem when compared to compact models.
Simulating the post-APR thermal characteristics with the FEM requires a computational mesh generated from the cell-level floorplan of a circuit. The number of vertices or nodes in this mesh, commonly referred to as degrees of freedom (DOF), is typically several orders of magnitude larger than the number of cells in the circuit. This trend can severely restrict the size of circuit that can be analyzed efficiently. For example, using standard finite element libraries, a machine with 32 GB memory can generally handle problem sizes on the order of 10 million DOFs. However, cell-level floorplans can easily generate meshes with hundreds (even thousands) of millions of DOFs, making such computations infeasible without massive parallel computing power. To reduce the number of DOFs, this paper introduces novel techniques that create a finite element mesh from a cell-level floorplan by carefully merging circuit cells to curtail the number of DOFs in the problem without compromising the accuracy of the thermal profile.
The specific contributions of this work to the problem of cell-level thermal analysis for an IC are:
• advanced cell-level homogenization (merging) techniques to effectively lower the number of DOFs of a mesh, • a tunable homogenization algorithm based on spatial and power information, • a standalone framework that can be integrated with the commercial EDA exchangeable files, such as DEF and LEF, and FEM-based thermal analyzers down to the standard cell-level, • the ability to convert the industrial standard design exchange format (DEF) and library exchange format (LEF) files to a computational mesh, which also supports future vertical integration technologies. The remainder of the paper is organized as follows. Section II reviews the FEM and motivates the proposed meshing techniques. Section III describes the cell homogenization algorithms. Section IV presents the results of the numerical experiments which illustrate the advantages of the proposed approach. Section V offers some concluding remarks.
II. BACKGROUND AND MOTIVATION
The temperature u(x, t) (in [K]) at point x = (x, y, z) and time t in a material body is governed by the following initial boundary value problem
The physical domain of the circuit is denoted by Ω and its boundary by ∂Ω. 3 ] is the power density dissipated by the active layer(s) of the system. The Robin boundary condition described by (1b) represents Newton's law of cooling, i.e., the heat flux at the boundary is proportional (with the constant heat transfer coefficient η) to the difference between the ambient temperature ua and the temperature at the wall.
At the core of the thermal analysis process is the discretization and numerical solution of (1) . In contrast to previous approaches that invoke the electrothermal duality to derive compact models to discretize the heat equation, such as [9] , [10] , the finite element method (FEM) is used [15] . The FEM is a more accurate representation of the underlying physics of the problem, i.e., the heat equation is directly discretized, rather than discretizing a model that is an approximation of (1). In addition, the FEM supports the discretization of more general geometries and handles different boundary conditions.
The FEM first partitions the domain into a union of non-overlapping elements Ω = ∪ k Ω k . The elements Ω k are typically tetra or hexahedra (cuboid) with a characteristic length h called the mesh width. The discretization of the domain into the finite elements is the mesh generation process. The finite element solution is obtained by interpolating the computed discrete nodal values at the vertices of the elements, i.e., the solution has the form u h (x, t) = 
. , un]
T and this vector of unknowns is determined by solving a linear system of n equations. Finely resolved computational meshes with many nodes translate into large (sparse) linear systems. Despite fast solution methods, working with such fine meshes and large problem sizes results in prohibitive computational costs. Therefore, reducing the order of the meshes while still maintaining the accuracy of the original finer (larger in size) meshes is paramount for effective thermal analysis.
A. Structured Meshes
The proposed cell merging techniques operate on structured grids consisting of hexahedral (cuboid) elements and are applied prior to the thermal simulation. Structured grids generally consist of hexahedral elements (though tetrahedrons are also possible) that follow a repeatable pattern. There are several advantages to using a structured grid over an unstructured (non-repeatable) grid. First, due to the regular pattern, the underlying data structures require less memory and make refinement of the mesh (uniform or adaptive) much faster. In addition, these structures yield regular sparsity patterns in the coefficient matrices which allow for better caching and faster execution of both the assembly and solution phases of the discrete problem [17] . Lastly, the regular cuboidal structures of integrated circuits are better supported by these meshes.
The proposed techniques are integrated with the mesh generator of the MTA 1 tool [18] . Ensuring quality of the mesh is extremely important and the mesh generator in the MTA offers a number of pre-simulation refinement routines for this. Certain circuit designs may produce meshes with stretched elements and reentrant corners at overhanging boundary surfaces such as heat sinks. Without additional local mesh refinement, the initially generated meshes for these structures can produce solutions with lower accuracy, e.g., unphysical damping of the temperature. As a result, the mesh generator handles the surroundings of a circuit by supporting local mesh refinement of these structures.
III. CELL-LEVEL HOMOGENIZATION
This section describes the proposed cell-level homogenization techniques for producing computationally efficient meshes with a minimal loss of accuracy in the thermal analysis. Modern CMOS technologies can be viewed as a stack consisting of multiple layers of materials in the z-direction. The device and backend layers are only a small portion of the stack and are modeled as a thin planar layer in thermal simulations. According to the design rules of a semi-automatic flow, placed cells are located on multiple parallel rows, flanked by power rails, i.e., GND and VDD. The placed cells have the same height and varying width according to their function and driving strength.
Creating a structured computational mesh for an IC with complete geometric information of the placed cells requires the processing of the DEF and LEF files, which contain the physical information and material properties of the circuit floorplan. The varying widths of the placed cells and the geometry of other components of the system (e.g., the package and/or heat sink) lead to a vast number of boundary lines for the initial mesh of the circuit floorplan. Thus, each placed circuit cell is typically associated with several grid cells (called henceforth atomic cells) where a placed cell cannot be smaller than one atomic cell. The process generating the initial grid performs the appropriate mapping of IDs between placed and atomic cells ensuring that the related power information of each placed cell is maintained during the merging and/or splitting operations.
The placement of cells into rows suggests the mechanism for grid reduction to be performed only in the x-direction (or the direction along which cells are aligned), otherwise the accuracy of thermal simulations will be significantly affected by simultaneously removing grid lines in the y-and z-directions. The following subsections introduce two grid line removal methods, one assigning spatial weights to the grid lines and the other driven by the power density of the cells adjacent to each grid line. In addition, a hybrid technique is proposed combining both of these schemes. Lastly, the critical issue of computing the power density of the merged cells is discussed.
A. Spatially Weighted Grid Lines
Maintaining the information of the placed cells is fundamental when generating a structured mesh. Each grid line is defined at least once by a placed cell or by the package components. Thus, an initial grid contains a large number of DOFs. An algorithm to reduce this number based on assigning weights to the grid lines in the x-direction is presented in this subsection.
The key idea of the method is to record the number of times each grid line is incident to placed cells and this information is used to eliminate grid lines in the x-direction, thereby decreasing the number of DOFs. A vector the size of the number of grid lines in the x-direction is initially created to record the grid line weights. These weights are defined as the number of boundary edges of placed cells associated with each grid line. Grid lines are removed whenever the weight is less than a user-supplied threshold. A low weight for a grid line means that only a few cells across the floorplan of the circuit share this grid line as a boundary. Setting this threshold depends on the characteristics of the design, such as the number and size of the placed cells. This threshold is an adjustable parameter that sets the granularity level between the block and cell-level floorplans. Moreover, by providing a list of exempted cells from merging (e.g., high power cells), the technique can maintain high accuracy at potential hot spots while reducing the DOFs in those regions of the circuit where the temperature is low. for all z in old z-grid do
for all y in old y-grid do
4:
for all x in old x-grid do 5:
if x is valid then
7:
for all S = x + 1 to old x-grid.end do
if S is valid then
shift ← S − x − 1 if shift = 0 then
17:
for all x to x + shift do 18: record(Atomic Cell ID) 19: record(distance) 20: end for Figs. 1(a) to 1(c) illustrate Algorithm 1 on an example circuit where the objective is to remove grid lines with a weight lower than 4 to construct a new reduced grid. Active and filler cells in a 2-D DEF layout are illustrated in Fig. 1(a) . Converting the circuit floorplan to a structured mesh first requires the grid space associated with the atomic cells of the circuit, which is illustrated by the grey dashed lines in Fig. 1(b) . The grid line ID numbers are given at the top of the figure. The grid lines at the boundary of the mesh are exempted from the elimination by denoting these lines with negative numbers. The weight of each grid line is denoted by the black number at the bottom of the figure. The grey numbers in parentheses are the weights before omitting filler cells. The reason to omit filler cells is that these cells contribute none or minimal energy to thermal simulation. For example, the weight of grid line 6 is 3. This value is calculated from the contributions of cells b46, d46, and d68, while omitting the contribution from the filler cell adjacent to b46. The contribution of the filler cell is reflected by the number 4 in the grey parentheses. Fig. 1(d) shows the new grid space and atomic cells. The DOFs are reduced from 84 to 54, i.e., by 35%, roughly translating to a 35% decrease in the simulation time.
In Fig. 1(c) , observe that some boundaries of the placed cells, shown in grey dashed lines, are removed. This removal indicates that the placed cells relating to these boundaries are either merged or split. For example, the placed cell a14 in Fig. 1(a) is split into two new cells. The left cell is merged with a01 to form a new placed cell (i.e., a02). The updated mesh is illustrated in Fig. 1(d) , where the newly created cells are highlighted in grey and underlined. The related power density information of the atomic cells is also shown in Fig. 1(d) by the IDs of the placed cells. The cells highlighted in grey and underlined should be assigned an appropriate power density generated from the existing power trace of the circuit. The technique to modify the power trace to match the new grid is discussed in Section III-D.
B. Power Density Weighted Grid Lines
Another technique to remove grid lines along the x-direction is based solely on the power trace information. This technique identifies which cells contribute significant power to thermal simulations in order to determine whether a grid line is eliminated or not. The core idea of this technique is that a hot spot is likely to appear in locations close to high power cells and avoiding their merging allows for the determination of potential hot spots with the highest spatial accuracy.
The merging process is similar to the method of the weighted grid lines. A vector the size of the number of grid lines in the x-direction is created to record the power density weight per grid line. However, this weight is not determined by simple enumeration of placed cells along a boundary line. Rather, each atomic cell has an associated power density from all placed cells incident to it. The power density weight is thus defined as the sum of the power densities contributed by each atomic cell attached to a grid line. The method employs the average power for all active components, which can be obtained from standard EDA tools, such as PRIMETIME PX [19] . The power of each placed cell is uniformly distributed among the atomic cells composing the placed cell. The power density of the cells along each grid line are summed and inserted at the appropriate location in the power density vector. Each entry in the vector represents the significance, in terms of power, of each grid line in the x-direction. Fig. 1(e) is an example of the power density evaluation during the labeling process. The placed cells a14 and c35 highlighted in grey and underlined have a power density (PD) of 10 and 1, respectively. While labeling the ID of a14 to its atomic cells, the process links the power density to the corresponding location in the vector. The grid line 1 and 4 contribute 10 each, but lines 2 and 3 add 20 because they are used to create atomic cells twice during the labeling process. The same mechanism is applied to the placed cell c35, where grid line 3 and 5 add 1 while 4 adds 2. The grey numbers at the bottom of the figure are the sum of power density for each grid line after the labeling process of a14 and c35.
After labeling all required atomic cells in the grid space, the vector contains the total power density for each grid line which is used to eliminate grid lines. A user-defined lower-bound power density threshold is applied to remove grid lines with a low power density. This threshold depends on the characteristics of each circuit and the desired accuracy. Setting a small threshold value produces finer meshes for more accurate thermal simulations. The algorithm that outputs the reduced mesh is similar to Algorithm 1 but with a vector of power density in the x-direction.
C. Hybrid Mode
Combining the two previous methods to eliminate grid lines can yield a more effective approach for further decreasing the grid size. The presented techniques require the user to directly input the reduction threshold. This usually requires a-priori knowledge of the circuit characteristics. The user driven approach can provide flexibility for specific applications, i.e., aggressively eliminating grid lines based on one technique over the other.
Alternatively, the hybrid mode automatically generates thresholds to reduce the number of nodes in the mesh. The related thresholds are generated based on a preset scale factor for reducing the DOFs in a mesh. For example, a scale factor 0.2 means that the mesh generator will create a new mesh that has roughly 20% of DOFs of the original, full mesh. The tradeoff between the accuracy and simulation time of the scaled meshes is reported in Section IV-A. The hybrid mode is similar to the previous methods in that it creates vectors for both weighted grid lines and summed power densities. However, this hybrid method has two additional vectors that duplicate, respectively, the original vectors of the spatial weight and power density associated with the grid lines. These newly created vectors are sorted in descending order. The technique advances both vectors simultaneously from location 0 to seek a proper threshold for both schemes. Since the proposed homogenization is based on the elimination of grid lines in the x-direction which linearly decreases the DOFs, the percentage of valid grid lines in the x-direction is the stopping criterion when searching for proper thresholds in both schemes. The convergence of the scheme is guaranteed by the analytical error bounds of the FEM. However the error can potentially be large if the mesh is too coarse or not judiciously constructed.
D. Power Trace
The cells that result from splitting and merging are labeled as new atomic cells. For example, the grey and underlined cells in Fig. 1(d) are the newly created cells. These atomic cells have no corresponding power information required for thermal simulations. In Algorithm 1, the new cells are created and their size determined based on the distance between the kept grid lines and their IDs are recorded. This information is used to interpolate the power trace and assign power values to the newly created atomic cells. Let f = f (x) denote the power density of a newly merged cell, the power density of this new cell is computed as the normalized sum of the power density from the merged atomic cells. Specifically,
where the sum is taken over the m merged atomic cells and dm is the length (in the x-direction) of atomic cell m.
IV. RESULTS AND DISCUSSION
This section discusses the quality of the meshes generated by the proposed homogenization algorithms for cell-level thermal analysis. Design space exploration is also performed presenting the tradeoffs between simulation accuracy and reduced mesh size. A LDPC circuit using a TSMC 65 nm technology is the benchmark problem investigated in the following experiments. This circuit has 67K cells and is attached to a heatsink for cooling. The meshing process of the full structure for thermal analysis leads to 9.3M DOFs where the circuit contains of 3.9M DOFs. All thermal simulations are performed with the MTA [18] . The format of the generated meshes are compatible with the open source software GMSH [20] , which is incorporated in many existing FEM-libraries. As a result, any FEM-based thermal analyzer compatible with GMSH can incorporate the proposed cell-level homogenization techniques.
A. Quality of Meshes
To demonstrate the tradeoff between the speed and accuracy of the mesh reduction techniques, the maximum point-wise error in the temperature between coinciding nodes of the original non-merged mesh (9.3M DOFs) is computed for a series of reduced order meshes at a fixed vertical level of the die, i.e., uo −ur ∞. The hybrid mode is applied to reduce the size of the meshes by 10%, 25%, 50%, and 75% for a steady state simulation. The results are reported in Table I . The error increases as the number of total DOFs (including the heatsink) decreases. For the full scale thermal simulation, the temperature range was 2.32 Centigrade. As a result, the relative error for the 10% reduced mesh is only 4.4%. In addition, the original simulation time is 466 s which decreases to 49 s for the 10% reduced mesh, i.e., a roughly ten-fold reduction in the simulation time with only a 4.4% error. Fig . 2 shows a histogram of the number of nodes at each temperature which has been normalized by the total number of DOFs of the original, full scale mesh. Observe that the values for the 75% reduced mesh are very close to 1 after normalization which indicates that most of the characteristics of the thermal profile of the full mesh are preserved. This result demonstrates that the original mesh size is efficiently reduced without sacrificing accuracy. The 50% and 25% reduced meshes are also close to 1 for most of the temperatures demonstrating that a 25% reduced mesh largely maintains the thermal characteristics of the full mesh. On the other hand, the discrepancy of the 10% reduced mesh to the full mesh shows that setting a small scale factor can produce less accurate temperature profiles. However, based on the errors reported in Table I , the point-wise error is still within 0.103 degrees. Most of the temperatures are preserved for the 75%, 50%, and 25% reduced meshes. However, there is a discrepancy between the nodes at the high and low end of the temperature range. This situation results from the application of the hybrid mode which preserves most grid lines near hotspots but eliminates grid lines near lower temperatures. A discussion of this behavior is provided in the following section.
B. Design Space Exploration
The different grid reduction methods, i.e., the spatially weighted grid lines (Grid), power density weighted grid lines (Power), and hybrid modes (Hybrid) are compared in this subsection. The effect of applying these techniques to meshes with different levels of homogenization is listed in Table II . The values reported in the table are the root-mean-square distances between the normalized number of nodes at each temperature value of the homogenized meshes to the original mesh. Note that the method of spatially weighted grid lines is close to 1 for all reduced meshes. This result indicates that this technique uniformly reduces the size of each mesh. This behavior is expected since the method is solely based on spatial information. Alternatively, the hybrid mode has a larger discrepancy from 1, especially close to the higher and lower temperatures. This phenomenon is depicted in Fig. 2 . The non-normalized distribution of the nodes at different temperatures is shown in Fig. 3(a) with an enhanced view of the higher temperature range in Fig. 3(b) . Note that the merging based on the power density and hybrid mode preserve the high temperature nodes better than the spatially weighted grid line scheme. Since it is preferable to accurately capture the thermal profile at potential hotspots, the weighted power density and hybrid methods are more suitable for reducing the mesh size while still accurately detecting the hot spots. In addition, the hybrid mode preserves the nodes at the lower end of the temperature range better than the power density driven technique. 
V. CONCLUSION
In this paper, efficient mesh reduction techniques are introduced for cell-level thermal analyses, which scale down mesh sizes while maintaining the characteristics of the thermal profile. These techniques are based on weighting grid lines based either on spatial or power density information. A hybrid mode is proposed that automates the homogenization process. This hybrid method requires no prior knowledge of the design and combines both the power and physical information of the circuit to reduce the mesh size. The experimental results show a less than 5% error to the temperature of the original, full mesh, with a 90% reduction in the size. This low error demonstrates the efficiency of the proposed cell-level homogenization algorithms. Statistical analysis shows a reduced order mesh that is 25% of the size of the original mesh that captures most of the characteristics of the thermal profile. Consequently, a 75% reduction of the simulation time is achieved with no qualitative loss of accuracy.
