Thermal analysis is crucial for determining the propagation of heat and tracking the formation of hot spots in advanced integrated circuit technologies. At the core of the thermal analysis for integrated circuits is the numerical solution of the heat equation. Prior academic thermal analysis tools typically compute temperature by applying finite difference methods on uniform grids with time integration methods having fixed time step size. Additionally, the linear systems arising from the discretized heat equation are solved using direct methods based on matrix factorizations. Direct methods, however, do not scale well as the problem size increases. Moreover, most of the tools support only 2-D or a limited number of 3-D technologies. To address these issues, this paper presents a novel thermal analyzer with the ability to model both 2-D and 3-D circuit technologies. The analyzer solves the heat equation using the finite element method for the spatial discretization coupled with implicit time integration methods for advancing the solution in time. It also offers fully adaptive spatio-temporal refinement features for improved accuracy and computational efficiency. The resulting linear systems are solved by a multigrid preconditioned Krylov subspace iterative method, which gives superior performance for 3-D transient analyses. The analyzer is shown to accurately capture the propagation of heat in both the horizontal and vertical directions of integrated systems.
INTRODUCTION
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. For advanced CMOS technologies below the 90 nm node, heat issues incurred due to higher power lead to performance degradation and excessive leakage currents [1] . Device technologies, such as FinFET and SOI, have been introduced to suppress leakage currents in order to maintain silicon technology scaling. However, they can be more susceptible to thermal issues due to self-heating and the low thermal conductivity of their constituent materials [2] . Designs of "post-Dennardian" scaling also face challenges of exponentially increasing power density [3, 4] . Emerging 3-D technologies (i.e., multi-tier systems with vertical interconnections) also suffer from thermal issues since 3-D ICs have complex heat dissipation paths and more active regions than a single tier in a package [1, 5, 6] .
To deal with these challenges, early stage thermal characterization of ICs, (e.g. microarchitectures of processors) and the corresponding physical structures are required. The aim is to provide a starting point for efficient thermal design that eliminates thermal hazards. Fine-grain, or high resolution 3-D transient thermal analysis tools are promising for the simulation of modern and future ICs. Existing tools, e.g. [7, 8, 9, 10, 11] , can be used to detect the formation of hot spots both in steady-state and transient regimes. These tools use compact models to describe heat transfer and are efficient only for a narrow class of problems, namely, simple small-scale problems where the spatial and time discretization remain fixed, resulting in linear systems that do not change at each time step. Additionally, existing simulators cannot accurately model various 3-D structures of future ICs. This paper introduces a versatile thermal analyzer, which can accurately capture the flow of heat in the various 3-D structures of an IC.
The contributions of this paper to the problem of thermal analysis for 2-D and 3-D circuits are:
• the ability to model complex 2-D and 3-D circuit geometries,
• the fully automated mesh generation of such circuit geometries from a corresponding floorplan description,
• the use of second order accurate numerical schemes both in space and time,
• the solution of the resulting linear systems using advanced preconditioned iterative methods,
• and the support of fully adaptive spatio-temporal refinement. The proposed thermal analyzer uses the finite element method (FEM) [12] in conjunction with an implicit time stepping scheme to solve the heat equation. The FEM is a standard numerical technique that is well-suited to accurately capture the flow of heat in the complex geometries of an IC. The computation of transient thermal profiles with sufficient granularity requires the solution of a large, sparse linear system at each time step, which is then solved using multigrid preconditioned Krylov subspace methods [12, 13] . Such preconditioned iterative solvers offer faster solution times and use less memory [14] than sparse direct methods based on matrix factorizations [9] . The iterative solver also offers the potential for linear scaling with problem size.
Additionally, the proposed thermal analyzer offers fully adaptive spatio-temporal refinement features. This added flexibility gives our method computational advantages over other analyzers as discussed in Section 5.2.1. The main benefit of adaptive spatial refinement is that smaller discrete problems can be solved without a loss of accuracy. In addition, with adaptive time stepping methods, the simulator can apply larger time steps during the periods of low (or no) power dissipation and smaller time steps to accurately capture the transient thermal effects during rapid changes in the power and temperature profile. This adaptation does not require user intervention.
The rest of the paper is organized as follows. Section 2 discusses related work and the motivation for the proposed analyzer. Section 3 introduces the thermal model and FEM for computing transient thermal profiles of a given 2-D or 3-D IC. Section 4 describes the implementation details, which include a description of the iterative solver and preconditioner, and the adaptive spatio-temporal features. Section 5 presents the results of the numerical experiments and a discussion of the advantages of the proposed method. Section 6 offers some concluding remarks.
RELATED WORK AND MOTIVATION
Thermal analysis tools for integrated circuits have been previously introduced, for instance HotSpot 6.0 [9] , 3D-ICE [10] , and ISAC(2) [7, 15] . These tools invoke the analogy between heat flow driven by temperature differences, and electrical current flow driven by voltage differences. The electro-thermal duality supports compact thermal models comparing lumped RC networks with heat dissipation represented as current sources. In each of these tools, the transient temperature profiles are computed using a time integration method. The time integration methods are either explicit [9] , semi-implicit [11] , or low order implicit [10] . Such approaches either restrict the maximum allowed step size, or introduce spurious damping, compromising accuracy over long time intervals. These tools offer fast transient thermal analysis, but are effective only for a limited range of problems, specifically, simple circuit geometries that lead to small-scale linear systems that do not change at each time step, i.e., the spatial and time discretizations are fixed. Practically, these tools can model 2.5-D geometries, where each layer of material cannot be freely discretized in the vertical direction. However, these tools provide limited support for modelling the surrounding structures of an IC, such as packaging and heat sink, which does not allow for design and package space exploration.
Effective exploration is enabled by appropriately modelling package and cooling materials and structures, which complicate thermal analysis and can worsen computational speed. Two common ways to improve this speed while not sacrificing accuracy are adaptive mesh refinement and adaptive time stepping methods [7, 15] . The benefit of adaptive spatial refinement is that the same level of the solution accuracy can be obtained at a lower computational cost. Similarly, adaptive time integration methods allow for the time step size to adapt to the temporal behaviour of the system without loss of accuracy.
A visual comparison of the computational features of the proposed analyzer against other available thermal analysis tools is shown in Fig. 1 .
The figures are generated based on the point system outlined in Table 1 . Each tool is compared based on the following set of attributes: computation time, the solver type and accuracy, adaptivity features, the problem complexity it handles, ease of use, and the supported circuit structures. If the tool possesses a particular property, a point is awarded.
Note that the proposed analyzer covers the largest area of the hexagon defined by the different features, and therefore covers the broadest set of computational capabilities. The only other tool that shares the same range of fully adaptive spatio-temporal refinement features is ISAC. The adaptive refinement strategy in ISAC is based on temperature differences across cell interfaces, which is in contrast to our proposed approach based on flux differences at element interfaces (cf. Section 4.5). Hotspot only offers adaptive time stepping on uniformly refined spatial grids. However, both ISAC and Hotspot employ explicit time stepping schemes and as a result their time step size selection is constrained by the spatial discretization. This constraint gives our proposed analyzer an advantage, since unconditionally stable implicit time stepping schemes are employed, which do not impose any constraints in this context [16] .
Additionally, the proposed analyzer utilizes preconditioned iterative methods for the solution of the linear systems arising from the discretized problem.
Both Hotspot and 3D-ICE use the Super-LU solver, a direct method that scales poorly with problem size. ISAC uses a geometric multigrid solver. The proposed analyzer uses state-of-the-art algebraic multigrid preconditioned iterative methods which converge rapidly, yielding improved solution times and iteration counts than the previously mentioned solvers.
More importantly, the proposed analyzer is the first academic tool with the capability to realistically model both 2-D and 3-D structures, such as multiple tiers, TSVs, and finned heat sinks. In this way, our analyzer offers a more comprehensive set of computational features and a robust solution methodology, which supports the thermal analysis of complex and advanced circuits.
THERMAL MODEL
Let T (x, t) denote the temperature (in [K]) at time t and a spatial point x = (x, y, z) in an IC. The temporal and spatial evolution of the temperature are governed by the heat equation subject to appropriate boundary and initial conditions, specifically,
The physical domain of the circuit is denoted by Ω and its boundary by ∂Ω. The function ∇T = (Tx, Ty, Tz) is the temperature gradient and ∇ · T = Tx + Ty + Tz denotes the divergence operator, where, for example, Tx = 2 K] at the boundaries. Note that cV = ρC is the volumetric specific heat. The function f (x, t) [W/m 3 ] is the power density dissipated by the active layer(s) of the system. The Robin boundary condition described by (1b) represents Newton's law of cooling, i.e., the heat flux at the boundary is proportional (with the constant heat transfer coefficient η) to the difference between the ambient temperature Ta and the temperature at the wall ∂Ω. In our model, the Robin boundary condition is applied to the top and bottom of the circuit. The four lateral walls are considered adiabatic, with zero flux (η = 0) at these boundary faces, although the proposed method can handle Robin conditions over the entire boundary ∂Ω. Furthermore, we assume the parameters C, ρ, κ, and η are independent of the temperature. The values of C and κ are assumed, however, to be different specific to the disparate floorplan components of the circuit, including, for example, TSVs and finned heat sinks.
The heat equation (1) is solved numerically using the FEM [12] . The FEM first partitions the domain into a union of non-overlapping elements Ω = ∪ k Ω k . The elements Ω k are typically tetra or hexahedra (cuboid) with a characteristic length h called the mesh width. The finite element solution is obtained by interpolating the computed discrete nodal values at the vertices of the elements, i.e., the solution has the form T h (x) = n j=1 Tjφj(x), where Tj ≈ T (xj) are the n unknown nodal values, and φj(x) is a global Lagrangian basis function that equals 1 at node xj and zero at every other node. This method is different from compact circuit-based thermal models in which the solution is constant in each thermal element, with discontinuities at inter-element boundaries.
The n unknown nodal values {Tj} n j=1 , also called degrees of freedom (DOF), can be collected into a column vector
T . This vector of unknowns is determined by solving a linear system of n equations. For the steady state (i.e., setting ∂T ∂t = 0 in (1a)), this linear system has the form
The size of the coefficient matrix K ∈ R n×n grows as the mesh is refined. The entries of the matrix K and the vector f h in (2) are
Note that the locality of the FEM basis set implies that the matrix K is sparse, since Kij = 0 if and only if node i is connected to node j. The matrix K and vector f are assembled by looping over the elements and computing these integrals via appropriate Gaussian quadrature rules [12] . For transient problems (
∂T ∂t = 0), after discretizing in space by the FEM, the heat equation (1) becomes a system of n initial value problems (IVPs) of the form
where M is a mass matrix with entries Mij = Ω φjφi dx.
To solve this system of IVPs the time derivative is approximated by a backward difference formula (BDF method). For further details on various time integration methods see, e.g., [16] .
If an implicit integration method is used, the solution of a linear system is required at each time step. Higher order methods, such as the second order BDF method, can take larger time steps than the first order method with the same level of temporal accuracy leading to appreciable computational savings in time. The computational gains of using this higher order formula are demonstrated in Section 5.
The repeated assembly and solution of these linear systems accounts for the majority of the computational effort in transient thermal analysis. Strategies to address this issue have been considered before, for example in [11] , where Alternating Direction Implicit (ADI) methods are used. If the resulting linear system does not change between time steps, i.e., there is no spatial refinement or adaptive time stepping, a single LU factorization of the system matrix can be computed at the first step. At subsequent time steps, the solves are performed using backward and forward substitution with the factors. The transient simulations in Hotspot are computed with this method, which yields fast solve times [9] . However, this approach is also the primary limitation of that strategy. In order to consider all the spatio-temporal changes a sufficiently refined spatial grid and time step size must be fixed in advance throughout the entire simulation, which can be a severe restriction and cause excessive computational overhead and accuracy issues (see Section 5.1).
The use of adaptive spatio-temporal features require the assembly and solution of linear systems at each time step.
However, the size of the resulting linear systems and the number of time steps, (i.e., the number of repeated linear solves) are typically much smaller than in Hotspot [9] for a given accuracy, leading to potentially significant computational savings. In addition, the adaptive approach systematically supports more complex structures that better fit the physical structure of planar or 3-D integrated systems.
IMPLEMENTATION DETAILS
This section describes the implementation details of the thermal analyzer. In Section 4.1, the tool flow is presented. In Section 4.2, the hierarchical XML file which encodes the geometry and thermal properties of the IC is described. In Section 4.3, the FEM implementation is presented and the corresponding linear solver is discussed in Section 4.4. The adaptive spatio-temporal refinement features of the method are described in Sections 4.5 and 4.6, respectively.
Tool Flow
The proposed thermal analyzer has five main stages in the tool chain: structure definition, mesh generation, mesh optimization, thermal simulation by the FEM solver, and visualization. In the structure definition stage, the user describes the circuit geometry as an .xml file in a hierarchical structure similar to the design hierarchy of the IC. This hierarchical structure allows the user to provide detailed information required for the simulation, or to select the required granularity level. In the mesh generation stage, the mesh file, which discretizes the computational domain for the thermal analysis, is generated using the open-source mesh tool GMSH [17] . However, the mesh generated by GMSH is not appropriate for the IC structure due to duplicate definitions of physical volumes. Thus, the mesh optimization stage, a new tool to surpass these limitations is developed, which removes the duplicates and addresses floating point rounding problems for the nodal coordinates of the mesh.
The created mesh is an initial input to the FEM solver in the thermal simulation stage. The power density source function is provided from a power trace file. The power trace file is generated by architecture simulators coupled with power trace converters, or power estimators for ICs, such as PrimetimePX [18] . In the visualization stage, results are plotted in Paraview [19] , an open-source visualization tool, which allows the users to view cross-sectional views of the IC to observe the full 3-D flow of heat.
Versatile Structures in XML
The hierarchical structure of the IC is encoded in an XML file, with the added benefit of describing the subdomain structures of the circuit. The open source XML parser [20] is extended to ensure that the physical traits and thermal properties of the disparate components of the circuit can be systematically processed to both automatically generate the computational mesh and initialize the FEM solver for thermal simulations.
The proposed analyzer differs from previous academic and open-source tools in that the mesh generation process is completely automated and able to represent advanced structures such as TSVs and microbumps. 3-D structures are natively supported by defining multiple tiers in the vertical direction. The analyzer also has the ability to support more complicated structures, such as heat sinks with fins. Furthermore, users can create their own XML files defining the physical dimensions and thermal properties of the circuits and subsequently generate the corresponding computational domain [21] .
FEM Implementation
The core of the thermal simulator uses the functionalities of the finite element library OOMPH-Lib (Object Oriented Multi-Physics Library), developed at the University of Manchester [22] . Several new features are implemented to support the capabilities of the proposed thermal analyzer. The physical domains can be discretized using either first order or second order elements. The time integration methods employed by the analyzer are first and second order implicit backward differentiation formulas. In experiments, the asymptotic order of the spatial discretization and the time integration method are matched. The first order scheme offers slightly faster simulation times but at the cost of lower accuracy. Thermal analysis for complex structures, however, requires second order methods to satisfy accuracy critera as shown in Section 5.1.
Linear Solver
The software library platform used to build the simulator is designed to solve more complicated nonlinear problems and employs Newton's method to solve the nonlinear systems of equations arising in such cases. In our case, the problem is linear and the Newton iteration converges in one step. Though it may seem counterintuitive to solve a linear problem using a nonlinear solver, in reality the problem parameters are temperature dependent, i.e., cV = ρC(T ), κ = κ(T ), making the proposed model in (1) nonlinear. Although this kind of model is beyond the scope of this paper and is left as future work, the built-in nonlinear functionality allows the proposed methodology to be extended seamlessly to cope with the more physically relevant cases of temperature-dependent thermal parameters and large temperature differences in 3-D ICs.
The resulting linear systems, i.e., (2) for steady-state analysis, or the time discretized version of (5) for transient analysis, are large, sparse and symmetric positive definite (SPD). Preconditioned Krylov subspace iterative methods are a natural choice for a solver in such cases. Specifically, the Krylov solver is the conjugate gradient (CG) method [23] with an algebraic multigrid (AMG) preconditioner [24] . The CG method only requires computationally inexpensive sparse matrix-vector products. Moreover, it is based on a three-term recurrence to create the orthogonal Krylov basis, i.e., only three vectors need to be stored at each iteration of the algorithm, giving the CG algorithm linear storage costs. The fixed storage costs make this technique more suitable for solving sparse linear systems than sparse direct methods based on matrix factorizations. To improve the convergence of the CG method, it is preconditioned with the AMG method with standard Ruge-Stüben coarsening [25] . These solves are implemented by calling the software library hypre [26] .
Adaptive Mesh Refinement
Temperatures can vary significantly across the area of a circuit. To accurately model features of the thermal profile, the computational grid must be sufficiently refined locally. For uniformly refined meshes, achieving this level of accuracy requires globally small mesh sizes, resulting in unnecessarily large linear systems that require longer assembly and solve times. To minimize computational effort without losing accuracy, the mesh can be adaptively refined only where the temperature varies significantly, i.e., where there are large temperature gradients. The elements to be refined are determined from computed error estimates.
In the proposed method, these estimates are computed by the Z2 error estimator [27] . The idea behind this estimator is to compute a more accurate representation of the heat flux (or gradient) of the finite element solution, i.e., ∇T h . The more accurate flux is computed by projecting the computed finite element flux function, ∇T h = n j=1 Tj∇φj(x), onto a set of continuous basis functions defined on small patches of the domain. The mesh is refined if the difference in the fluxes is greater than a user prescribed tolerance εmax, i.e., |∇T h − ∇T h | > εmax. In addition, already refined elements can be merged (unrefined) when the error estimate satisfies |∇T h − ∇T h | < εmin, where εmin is the user prescribed minimum error tolerance. This functionality is extremely useful in maintaining accuracy of the solution while keeping a moderate size of the linear systems.
Note that when spatial adaptivity is used, more than one linear system is solved at each time step. The fully adaptive spatio-temporal method first advances the solution to the next time step, i.e., T , where T h now denotes the solution on the adaptively refined grid. This reassembly increases the computational cost per time step, but offers the added flexibility of finding the optimal spatial discretization which produces the solution of the problem with a prescribed level of accuracy.
Adaptive Time Stepping
Another important technique for increasing the computational efficiency of transient thermal simulations is adaptive time stepping. In these methods, the time step size ∆t is determined based on the local truncation error estimate of the time integrator. This is usually done using predictor-corrector methods. Let e denote the local truncation error (LTE) [16] and εt a prescribed LTE tolerance. Then, a standard estimate for computing the adaptive time step of a second order method is
where e 2 is the standard Euclidean norm.
RESULTS AND DISCUSSION
In this section, the features of the proposed thermal analyzer are demonstrated in several numerical experiments. The first pair of experiments demonstrates asymptotic accuracy of the space and time discretization schemes used in this work, when uniform grids or time steps are used, together with the advantages of adaptive refinement or time stepping.
The second set of experiments demonstrates the performance of the analyzer for realistic 3-D circuit geometries that cannot be tackled or solved accurately by the existing solvers. For these test problems, the adaptive spatio-temporal refinement capabilities are applied. All of the experiments are performed on an Intel i7 4790 processor with 32 GB DRAM and the CentOS 7 operating system. The computed thermal profiles are visualized in the open-source software Paraview [19] .
Numerical Experiments
We first review the accuracy and efficiency of the adaptive time-stepping method. To this end, we solve an initial value problem y = (0.25 + 10ı)y, y(0) = 1 over the interval t ∈ [0, 8π]. The real part of the solution is a damped (modulated) oscillatory wave y(t) = e −0.25 t cos(10t). The problem is solved using the BDF2 method, both with fixed and variable steps and the results are summarized in Fig. 2 . We observe a second-order asymptotic behavior of the scheme (i.e. when the step size is reduced by an order of magnitude the solution error norm y(tn) − yn ∞ (maximum error) falls by two orders of magnitude. However, a more important observation is that the adaptive method gives almost an order of magnitude lower error for a similar number of steps (i.e. similar computational cost), or, equivalently, the same level of accuracy is achieved with fewer steps (for this example the same level of the solution error is obtained with half as many adaptive steps, compared to the fixed step case).
The second experiment is designed to demonstrate asymptotic spatial accuracy of the FEM discretization and the advantages of using an adaptive mesh refinement. A steady-state problem (1a) ( ∂T ∂t = 0) is solved with a known analytical solution
2 )/0.05 .
The solution (7) represents a steep Gaussian distribution curve. Such choice is relevant to the thermal IC design where the hotspots appear in small parts of the domain, with the temperature profile flat or slowly changing in large portions of the domain. In this case, Dirichlet boundary conditions are applied on the faces of the unit cube, and the domain is discretized with both linear and quadratic elements on uniform or adaptively refined grids. The accuracy of the solution T − T h ∞ is reported as a function of the discrete problem size in Fig. 3 . As in the case of temporal accuracy, adaptively refined grids produce lower errors for the same discrete problem sizes (i.e. the same level of error can be achieved with fewer unknowns than in the uniform case). The computational times also depends greatly on the type of solver, e.g. direct or iterative, that is used. In the case when no spatial and temporal adaptation is allowed and the linear model in (1a) is considered (i.e. cv and κ are constant independent on T ), the coefficient matrix does not change between different time steps. This situation means that if a direct sparse solver (such as SuperLU) is used to solve the linear systems, the coefficient matrix needs to be factorised only once (at the first time step). This factorisation, which is by far computationally the most expensive part, is reused at the subsequent time steps. This approach is utilized in Hotspot [9] . Transferring this strategy to our method, in the case of a first-order FEM scheme with 250,047 unknowns the factorization of the coefficient matrix takes over 2,500 seconds, while for the second-order FEM with 2 million unknowns the factorization was unable to complete on our target architecture, due to the memory constraints. By contrast, the AMG-preconditioned CG solver takes only 1.28 seconds in the former and 28 seconds in the latter case. However, the caveat here is that the solution phase of a direct solver (the forward and the backward substitution) is shorter that the time required for the iterative solver. Thus, if no spatio-temporal adaptivity is used, a direct solver can potentially outperform an iterative solver if sufficiently many time steps are required to complete a simulation (thus the cost of a matrix factorization is amortized). The iterative solver can be accelerated further in this context by reusing the setup (coarsening) phase over multiple time steps.
However, the main advantage of iterative solvers is the linear scaling of the execution time and memory requirements with the problem size.
When used in conjunction with spatio-temporal adaptivity, they lead to numerical schemes that are robust, versatile, accurate and sufficiently fast.
3-D Thermal Analysis
In this section, two experiments are presented that illustrate the features and functionalities of the analyzer. The first experiment considers a simple case study based on the Nehalem processor, the microarchitecture of which is assumed to be implemented at a 45 nm technology node [28] . To generate the power trace of this microarchitecture, the Sniper multicore simulator 6.1 [29] is used to simulate benchmarks of Splash-2 [30] and Parsec-2.1 [31] . McPAT 1.0 [32] converts the trace to power trace as input for the thermal simulation. Corresponding parameters of the physical structure and thermal coefficients are obtained from [9, 33, 34] . The structure of the circuit in this case study consists of 5 material layers, an active layer with different subdomains defined by the Nehalem architecture, a silicon substrate layer, thermal interface material (TIM) layer, spreader, and a heat sink. In this problem, a simple circuit geometry is assumed so that the computational domain is a rectangular cuboid. These experiments also illustrate the fully adaptive spatio-temporal features of the model. In the second experiment, a similar but more realistic structure is considered. This circuit has the same 5 layer structure, however this time both the heat spreader and heat sink are larger than the active and silicon substrate layers and overhang the circuit. Additionally, the heat sink is endowed with fin structures. This structure highlights the versatility of the analyzer to model complex 3-D structures, which is not available in existing thermal analysis tools.
Rectangular Cuboid Circuit
For this set of experiments, the thermal simulation was first run using a fixed time step size ∆t = 10 −6 s for 500 time steps (t end = 0.0005 s) with a fixed grid, consisting of 25,725 unknowns. This simple, small-scale problem serves as an illustrative example for demonstrating the adaptive capabilities of the proposed analyzer. Each circuit component of the chip is considered to operate at full power capacity when active. Snapshots of the bottom face of the active layer (located in the bottom layer) at time t = 2.5×10 The same simulation is also performed, again with 500 time steps of fixed step size ∆t = 10 −6 s, however for this case the grid was adaptively refined. The refinement parameters are max = 10 −2 and min = 10 −6 . The initial coarse mesh was the same as the first simulation, consisting of 25,725 temperature unknowns. At the final time step, the adaptively refined grid consists of 59,023 unknowns. The adaptively refined thermal map at times t = 2.5 × 10 of the circuit where the power and variation in temperature are highest.
Lastly, the same structure is simulated to a final time t end = 5.0 × 10 −4 s on a fixed grid but with adaptively chosen time steps. The fixed grid is the same as the first simulation, consisting of 25,725 unknowns. The reduction in the number of total time steps taken is presented in Table 2 for two different temporal error tolerances. As the error tolerance is increased, fewer time steps are taken, yielding improved simulation times. The minimum and maximum temperatures are also reported in both cases, illustrating that even with the large reduction in the number of time steps provided by the proposed adaptive time stepping strategy, there is almost no loss of accuracy.
Fin Structure
For this experiment, a realistic circuit geometry is investigated. A thermal simulation is run for 1,000 seconds with fixed ∆t = 1 s on a fixed grid consisting of 72,445 unknowns. The larger time steps and simulation time are chosen to fully demonstrate the vertical propagation of heat in the IC over a longer period of time. This experiment shows the detailed vertical flow of heat in the circuit and package. The results at the end of the simulation are plotted in Fig. 5 . Three views of the circuit are provided. The bottom view in Fig. 5(b) illustrates the formation of a hot spot in the active layer of the chip. The top view of the circuit in Fig. 5(a) shows how the heat from this hotspot has propagated through to the fins of the heat sink. Lastly, a cross-section along the x-direction of the circuit in Fig. 5(c) shows the vertical propagation of the hotspot through the interior of the circuit. Often the heat sink is modelled as several thermal resistors connected in parallel and collapsed at a single node on its outer face. Such a model cannot capture the detail of heat diffusion within the volume of the heat sink, which is depicted in Fig. 5(c) . 
TSV Structure
The analyzer can model detailed heat characteristics of vertical channels. An example of a thermal simulation performed on part of a 3-D circuit with signal TSVs is shown in Fig. 6 for 16 TSVs in 45 nm technology. The simulation is executed with a time step ∆t = 10 −6 s for 500 time steps, where each signal TSV has random switching activity during the thermal simulation. In Fig. 6(a) , the metal filling of the TSVs (assumed to be copper in this example) is shown by the red color and the liner of the TSVs (SiO2 in this example) is depicted by the blue color. Inverters are assumed to drive the TSVs and the load of each TSV is also a 1x drive strength inverter. A cross-section of the resulting thermal map is depicted in Fig. 6(b) . This example demonstrates that the flow of heat within each TSV (which depends on the corresponding switching activity) is captured in detail, instead of treating this part of the 3-D stack as a homogeneous material with an effective (averaged) thermal conductivity (which depends on the number of TSVs within this part of the circuit). Consequently, our tool accurately describes the thermal behavior of any TSV distribution, also enabling appropriate thermal analysis of 3-D ICs.
Complex Structure
An example of the Intel Xeon processor (Nehalem Architecture) in a FCLGA (flip-chip land grid array) package referring to the thermal reference guide [35] is shown in Fig. 7. Figs. 7(a)-7(c) illustrate the assembly of a heat sink with 72 fins and TG (thermal grease), a CPU die, and package components. The thermal simulation for this IC is performed with ∆t = 1 s for 500 time steps. A cross-section cut along x= 7.2 mm through the center of the IC at the final time step is plotted in Fig. 7(d) . The thermal map shows the secondary heat dissipation path of the package-PCB, which further illustrates the capabilities of the proposed analyzer to model complex heat dissipation paths.
CONCLUSION
In this paper, a flexible thermal simulator based on the FEM is presented, which uses fast multigrid preconditioned Krylov subspace iterative solvers.
The preconditioned iterative solvers offer near linear scaling in simulation times as the mesh is refined. Users can easily configure the simulator with the desired 3-D structure of an IC with various physical parameters and import the corresponding power information from microarchitecture simulators and power estimators. By deploying spatio-temporal adaptivity, the proposed method provides accurate thermal profiles within acceptable computation times. 
ACKNOWLEDGMENTS

