In a microelectronic device, thermal transport needs to be simulated on scales ranging from tens of nanometers to hundreds of millimeters. High accuracy multiscale models are required to develop engineering tools for predicting temperature distributions with sufficient accuracy in such devices. A computationally efficient and accurate multiscale reduced order transient thermal modeling methodology was developed using a combination of two different approaches: "progressive zoom-in" method and "proper orthogonal decomposition (POD)" technique. The capability of this approach in handling several decades of length scales from "package" to "chip components" at a considerably lower computational cost, while maintaining satisfactory accuracy was demonstrated. A flip chip ball grid array (FCBGA) package was considered for demonstration. The transient temperature and heat fluxes calculated on the top and bottom walls of the embedded chip at the package level simulations are employed as dynamic boundary conditions for the chip level simulation. The chip is divided into ten function blocks. Randomly generated dynamic power sources are applied in each of these blocks. The temperature rise in the different layers of the chip calculated from the multiscale model is compared with a finite element (FE) model. The close agreement between two models confirms that the multiscale approach can predict temperature rise accurately for scenarios corresponding to different power sources in functional blocks, without performing detailed FE simulations, which significantly reduces computational effort.
Introduction
The integrated circuits (IC) industry is driven by scaling to smaller and higher performing devices to enable lower cost and higher speed. However, major challenges exist in maintaining performance and reliability while facing fundamental scaling limitations. The current chip and package architectures are subjected to higher density of the heat dissipating elements and elevated total power generation rates. This can result in local hot-spots that are layout and/or workload dependent, leading to significant variation in the performance and leakage current of devices. Moreover, cyclic thermal events as a result of Joule heating in the metallic interconnect and transistors can lead to fatigue failure, due to the thermal expansion coefficients mismatch among different materials in the device. Thus, it is essential to develop fast and accurate multiscale models to calculate the thermal response of circuits for advanced technology nodes.
Various approaches have been proposed in the literature for predicting temperature distributions with sufficient accuracy in chips and packages [1, 2] . Among these multiscale methodologies, the traditional bottom-up approaches are extensively used for transient thermal modeling. Perhaps, the best known of this class of methods is the resistance-capacitance network, which is constructed using thermal impedances [3, 4] . The accuracy of the models decreases for complex geometries, complex boundary conditions, and nonlinearity in the heat conduction equation [5] .
Another common bottom-up approach utilizes compact models, which can be finite volume (FV) or FE based. In a traditional FE or FV analysis, the domain is discretized in a way that each element is homogeneous. It can, however, have anisotropic thermal conductivity. Compact models do not require conventional bilinear rectangular or homogeneous elements and can have elements comprising both metal and dielectric region. Some of the first compact modeling work was done by Kreuger and Bar-Cohen in 1992 [6] . They modeled a chip package with a simplified resistor network and shorter simulation times. However, the network topography of compact models becomes complex with increase in model size, also potentially compromising the accuracy of the model [7] . Another limitation of such compact models is the difficulty in handling fluid/solid interactions. In general, these bottomup approaches have primarily addressed the steady-state Joule heating in interconnects. However, pulsed currents and the resulting transient heat conduction in interconnect arrays remain a key concern in the design for reliability for the next generation highperformance chips.
Top-down approaches are another category of multiscale thermal modeling in microelectronics. A recent approach is behavioral thermal modeling, which is a combination of the generalized pencil-of-function (GPOF) [8, 9] and subspace methods [10, 11] . GPOF was developed in the communications community to estimate poles of an electromagnetic system by solving a generalized eigenvalue problem. These methods are mainly used for highperformance multicore microprocessor design. In general, they potentially suffer from a lack of predictability problems. Therefore, there is a need for the development of a new thermal simulation methodology that overcomes the challenges faced by existing thermal models.
In this study, a novel, computationally efficient, and accurate multiscale reduced order transient thermal modeling methodology is developed, which comprises two parts: (1) progressive zoom-in and (2) POD. The analyses at various length scales are integrated via the progressive zoom-in approach, which is illustrated in Fig. 1 and will be further discussed in Sec. 2.2. POD is a robust and elegant method of data analysis that provides lowdimensional but accurate descriptions of a high-dimensional system. It was first introduced by Lumley [12] in the field of turbulence; Holmes et al. [13] provided a thorough summary for applications of POD in various fields. As shown by Barabadi et al. [14] , for any linear system, the method is capable of predicting 1 transient temperature distribution regardless of the temporal or spatial dependence of the applied heat source. This feature provides the ability to predict temperature distributions for arbitrary heat inputs, by using a smaller sample set of applied heat sources and power maps, resulting in considerably decreased simulation time. Combining POD with the progressive zoom-in approach can further enhance the computational efficiency.
The proposed methodology has the capability of modeling several decades of length scale from package to "chip component" and potentially the "interconnect" (not included here) levels, at a significantly lower computational cost than currently available methods. This characteristic of the method also applies for time scales from seconds down to microseconds, corresponding to various transient thermal events. The suggested approach provides the ability to rapidly predict thermal responses under different power input patterns, based only on a few representative detailed simulations, while maintaining adequate spatial and temporal accuracy.
In this paper, an FCBGA package with an embedded die is considered for thermal modeling. Random dynamic power distributions were considered for the total chip power, as well as for the function blocks that compose the entire chip to demonstrate the capability of the POD method. To validate this methodology, the results were compared with an FE model developed in COMSOL [15] . It is demonstrated that the computational time is reduced by at least two orders of magnitude at every step of modeling.
Hybrid Scheme for Multiscale Thermal Modeling
A hybrid scheme has been developed in this paper, which combines the implementation of POD and progressive zoom-in approach, as summarized below.
2.1 Fundamentals of POD Method. POD offers an optimal set of basis functions, also known as POD modes, which are empirically determined from an ensemble of observations. These observations are obtained either experimentally or from numerical simulation, as in this study. The POD method characterizes and captures the overall behavior and complexity of a physical system by using a reduced number of degrees-of-freedom. This results in a much lower computational cost than a full-field simulation method. The most remarkable characteristic of the POD is its optimality, i.e., it provides the most efficient way of capturing the dominant components of an infinite-dimensional process, with only finite number of basis functions [13] . In developing the POD model, data sets are expanded for modal decomposition on empirically determined basis functions in a way that minimizes the least square error between the true solution and the truncated representation of the POD model. Therefore, it makes the POD method the most efficient method of capturing the dominant components of a large-dimensional system with a finite number of modes [16, 17] .
In this technique, the temperature distribution is determined from the expansion 
where T 0 is the time average of temperature (i.e., the mean vector of the observation matrix), u i (x, y, z) is the ith POD mode, and b i (t) is the ith POD coefficient [14] . A detailed procedure to generate a two-dimensional (2D) POD based reduced order model is provided in Ref. [14] . The primary steps to generate a POD based reduced order model are outlined below:
(1) Generating the observation matrix. As demonstrated in Ref. [14] , the POD coefficients, b i , can be determined by solving the discretized matrix of coupled ordinary differential equations, Eq. (2), using the sixth-order Runge-Kutta method shown below: 
Coefficients A ij , B ij , c i , and q i in Eq. (2) were derived and presented for 2D POD model in Ref. [14] . For this study, coefficients in Eq. (2) were determined for 3D analysis as 
The last two terms on the right-hand side of Eqs. (3b) and (3c) are the boundary terms. If the boundary conditions are homogeneous or insulation, these are eliminated and B ij and c i are simplified to
(4) Generating the POD temperature field.
A sufficient number of POD modes and POD coefficients need to be calculated, which can then be used in Eq. (1) for the determination of the temperature field anywhere in the domain and at any instant of time.
The number of retained POD modes is quite critical in capturing the physics of the problem. It is shown that an insufficient number of POD modes can cause significant phenomena not to be detected [18] . On the contrary, taking too many POD modes can produce unexpected behavior or make the model unstable. The last POD modes are generally associated with low energy terms in the model and have rapid localized fluctuation throughout the domain. If too many modes are considered in the POD reconstruction, the accumulation of these rapid fluctuations results in an increase in the numerical error and can potentially cause the solution to diverge [19, 20] . The energy captured by the ith basis function in the problem is relative to its corresponding eigenvalue, k i . Sorting these eigenvalues in a descending order results in an ordering of the corresponding POD modes [14] . Therefore, the first POD mode captures the largest portion of energy relative to the other basis functions. To determine the truncation degree of the POD method, the cumulative correlation energy, E m , captured by the first m POD modes is defined by Bizon et al. [21] 
To be able to generate a reliable POD model, in the present study, the number of POD modes is determined in such a way that the cumulative energy of the modes, calculated from Eq. (5), is larger than 99.9%.
Progressive Zoom-In Approach.
The progressive zoomin method integrates package and chip level analyses, acquiring the advantages of each. Figure 1 shows a flowchart of the approach used in this study for multiscale transient thermal modeling of a representative FCBGA package. The overall hybrid approach is outlined below:
(1) Thermal simulation at the package level: The first step is to model the entire structure, i.e., the package, including the surrounding mold, underfill, solder bumps, and substrate. This simulation is performed in the commercial code COM-SOL. It is important to note that at this level, the chip is modeled as a solid block with effective material and thermal properties, without considering internal details. (2) Applying POD technique to package level: Once the temperature distribution at package level is determined, a POD model is developed. The POD model provides the ability to predict dynamic temperature distribution for different power maps and types of power sources, without developing any further full-field FE models, which can significantly decrease computational cost and potentially be used to define a criterion for the optimal distribution of the current density in the domain. (3) Transferring the solution from package level to the chip level: Once the temperature distribution at the package level is obtained, a combination of temperature and heat flux at the top and bottom walls of the chip is extracted and linearly interpolated on a 2D grid with higher spatial resolution. These data were then applied as boundary conditions for the chip level simulation. (4) Chip level thermal simulation: At this level, the chip is no longer treated as a solid block. It is divided into subdomains called function blocks. Each block represents a specific component with unique functionality on the chip and consists of three sublayers: (1) top Si layer, (2) middle device layer, and (3) interconnect/dielectric multilayer (see Fig. 2 ). Function blocks were simulated based on the assigned power generation and calculated effective material/thermal properties for each layer within that block. At this level of thermal simulation, the spatial resolution is limited to the sublayers. Once the chip is divided into subdomains, the power map needs to be determined at any instant of time for each individual function block. (5) Continue to the desired resolution on the chip: This method can be continued to multiple levels, such that the desired spatial resolution is achieved. Only representative results for two steps (package and chip level) are presented in this paper.
Results and Discussion
Figure 2(a) shows the schematic of the simplified FCBGA package used in this study for the package level modeling. This model is for low power portable systems, where heat sinks and forced cooling are not employed due to the compact form factor. As described in Sec. 2, the first step is to model the package for which the material properties and dimensions are required. Table 1 lists these for the die, solder bumps, underfill, mold, and substrate. These values were mainly provided by Mentor Graphics Corporation, and the rest were chosen based on Ref. [22] . Reference [23] is used as a guideline for the dimensions of the FCBGA package. Underfill is a specially engineered epoxy that fills the area between the die and the carrier surrounding the solder bumps. Effective density and specific heat of the underfill layer are calculated based on volume averaging. It is assumed that 60% of the surface area between the die and substrate is covered with underfill and 40% is solder bumps. The effective vertical (K veff ) and horizontal (K heff ) thermal conductivity values are calculated based on thermal resistor network formulation
where 8 tot is the entire volume, 8 U and 8 S are volumes of underfill and solder bumps, respectively. Similarly, A hU and A hS are the cumulative horizontal cross-sectional areas of the underfill and solder bumps. K U and K S are the thermal conductivities of the underfill and solder bumps, respectively. Considering that the solder bumps are made of conductive material and electrically connect the chip to the underlying substrate while underfill is an insulating material, it is expected that the vertical effective thermal conductivity of the underfill layer will be significantly higher than its horizontal value. The computed values are 20.2 and 1.47 W/mÁK, respectively. The package dimensions, also listed in Table 1 , were provided by Mentor Graphics Corporation. Natural convection boundary condition is imposed on the top surface and vertical boundaries of the package with a heat transfer coefficient of h ¼ 15 W/m 2 K in the typical range for air cooling [24] . A constant temperature boundary condition is applied to the bottom surface. The initial temperature and the surrounding temperature were assumed to be equal to the room temperature T amb ¼ 300 K.
A detailed FE model is developed in COMSOL using a time step of dt ¼ 0.05 s. The convergence of the FE model is verified with respect to the solver type, time step, and time integration method. The FE model of the package consists of 75,919 elements, of which 343 are for the chip (die). This grid size is determined after performing mesh independence analysis. For the grid independence study, the mesh resolution of the model is continuously refined until there is less than 1% difference in the computed temperatures. This analysis indicated that the grid size of 75,919 elements is sufficient. Total chip power is Q ¼ 3 sin 2pt þ 3 (W), which is applied for 1 s. The temperature rise in the simulation domain is represented by DT (K) throughout the paper. Figure 3(a) shows the spatial distribution of the temperature rise in the FCBGA package extracted from the FE model after 1 s. The temperature rise of the chip is plotted separately in Fig. 3(b) . Table 2 demonstrates the numerical solution parameters and specifications used in package level FE model, POD technique, and chip level FE model.
After obtaining the transient temperature field at the package level, the POD model is developed using the algorithm demonstrated in Sec. 2.1. Twenty-six observations of the transient temperature solution were taken in the first 0.5 s using the package level FE model. These observations correspond to the temperature solutions obtained at different time instants using total chip power of Q ¼ 3 sin 2pt þ 3 (W). It is important to note that the observations are generated only for this case, and results for any different power dissipation are calculated without any new observations. In fact, the POD solutions of these transient thermal scenarios are independent of the initial observations. Essentially, for any linear system, once the solution to a sample case of chip total power is obtained, there is no need to generate new observations or perform full-field FE simulations. The ability of the POD method to predict other cases based on a smaller sample set can significantly decrease computational cost. After the observations have been generated, the POD basis functions (POD modes) are calculated. In order to build a reliable but fast reduced order model, only four POD modes are used in the present model. This is chosen such that the cumulative correlation energy, E m , Eq. (5), is greater than 99.9%. The first two modes alone capture over 96% of the energy. The results will not have the desired accuracy if the number of initial observations, n, is less than the minimum required POD modes (four in the present case).
Since the POD modes are three-dimensional, for better visualization, 2D contours of the first four POD modes at height z ¼ 1.33 mm across the center of the die are illustrated in Fig. 4 . This height is chosen because it has the highest temperature gradient, due to material inhomogeneity and the application of power source only to the die. The POD modes are normalized with the total sum of the modes for a more accurate comparison.
To have a realistic and accurate thermal simulation, a detailed dynamic power map of the embedded chip is required. However, one of the major challenges in microelectronics is the determination of the dynamic power dissipation in the chip, since power values and temperature distribution are coupled in an electrothermal loop. A randomly generated function is assumed for the dynamic chip power in this study to illustrate the application of the POD formulation. Figure 5(d) shows the randomly generated power distribution for the chip for the first 1 s. The minimum and maximum allowed values for the power were chosen to be 3 W and 18 W, respectively. In essence, there are three changes in the nature of the previously used power source (Q ¼ 3 sin 2pt þ 3 (W)) and the current random chip power:
(1) The first case is only applied for 0.5 s, whereas the second case used for POD approach models the entire 1 s. (2) The magnitude of the maximum value for the second case is 18 W versus 6 W for the initial FE simulation.
(3) The temporal behavior of the power has changed from a well-defined sinusoidal function to a randomly generated step function.
The benefit of using the POD model to predict the transient thermal profile for a different power source than the original one is that no new observation or full-field simulation is required. The POD coefficients were calculated as functions of time using the method of Galerkin projection [14] .
Once the POD modes and the b-coefficients are calculated, the transient temperature field can be determined using Eq. (1). Figure 6 (a) displays the 3D spatial distribution of temperature extracted from the POD model at 1 s. For higher precision, the domain is sliced vertically along the XZ plane and four of these slices are presented. The right-most slice is the A-A cross section across the center of the die (Fig. 6(b) ). To validate the results of the POD model, a full-field FE model with a time step of 0.05 s is developed in COMSOL using the same grid points and elements used in the POD model. The results are shown in Fig. 6(c) . It can be inferred that the POD model closely predicts the transient thermal behavior of the system, not only for the given time domain but also for projected future time (>0.5 s) using just a few POD modes. The mean absolute error between the POD and FE model is 7.2% over the entire space and time domain. Required computation time for the fullfield FE simulation is 23.7 mins versus 40 s for the POD simulations. The first POD simulation run-time is 40 s, while additional simulations with different power sources take 15 s each. The computations are performed on a workstation using an Intel (R)
Core
(TM) i7 @ 2.20 GHz with 8 GB RAM. For a more comprehensive comparison between POD and FE results, the time-dependent temperature rise at four different points in the FCBGA package (center of the mold, die, underfill, and substrate) is considered ( Fig. 7(a) ). The maximum error occurs at the center of the die between t ¼ 0.833 and t ¼ 1 s. As illustrated in Fig. 7(b) , this is the time period when the maximum 26.27 mins jump in the total chip power occurs. The dotted arrow in Fig. 7 points to the time of this maximum jump in the temperature plot. After obtaining the transient thermal solution at the package level and with the POD model, the next step in the hybrid scheme is to transfer the solution to the chip with the higher spatial resolution in the form of boundary conditions. Due to the transient nature of this analysis, temperature on the top surface and heat flux on the bottom surface of the die are extracted at ten different time intervals between 0 and 1 s (every 0.1 s). The extracted data are then applied as temporal boundary conditions for the chip level model. The four side walls of the die are assumed to be adiabatic considering the high aspect ratio of the die. The solution is linearly interpolated on a 2D grid with much higher spatial resolution at this level (268,033 elements to model the chip at this level versus 343 elements to model the chip at the package level).
At the chip level simulation, the die is no longer treated as a solid block. It is segmented into ten subdomains called function blocks. In practical applications, each block represents a specific component with unique functionality on the chip. In this study, the blocks were artificially created for illustration of the proposed methodology [25] . As demonstrated in Fig. 2(b) , each block has three layers: (1) top Si layer with the thickness of 0.249 mm, (2) middle layer which is a 5 lm-thick device layer, and (3) interconnect/dielectric multilayer at the bottom with the thickness of 16.72 lm. The third layer consists of 21 sublayers including ten metal layers.
Due to the high level of geometrical complexity, a combination of directional volume and surface averaging methods was used to determine the effective properties of the functional blocks. Table 3 indicates the calculated material properties of the blocks at the chip level simulations. Density and specific heat are calculated using the volume averaging method. In-plane thermal conductivity is determined based on the ratio of the volume of the interconnects to the total volume, due to the fact that the in-plane thermal transport is governed mainly by the interconnects. On the other hand, vias are the dominant paths of through-plane heat transfer in each block. Therefore, for the vertical thermal conductivity, the values are calculated based on the ratio of the volume of the vias to the entire volume of each block. At this stage, the spatial resolution is limited to the sublayers of the blocks. Once the chip is divided into subdomains, the dynamic power grid needs to be assigned to individual function blocks. For this study, the Joule heating produced in the third layer is neglected and the only powered layer is the device layer. Using the same method as described earlier, ten random power sources with minimum and maximum values of 0 and 3 W were generated between 0 and 1 s. The power sources for blocks 1, 2, and 10 are presented in Figs. 5(a)-5(c) as representatives. The block power sources are generated in such way that their sum will equal the total chip power used for the package level simulation as shown in Fig. 5(d) .
After allocating the power sources to the function blocks, an FE model is developed using the time step of dt ¼ 0.05 s for the final step of the hybrid scheme. As mentioned, the model consists of 268,033 elements. The computational time to run the transient simulation for 1 s is 26.27 mins. Figure 8 displays the 2D spatial distribution of temperature rise extracted from the FE solution at various times between 0 and 1 s at height z ¼ 16.72 lm, which is the plane between the device layer and interconnect/dielectric multilayer (plane between layers 2 and 3). Based on the one-dimensional simplified resistance-network model of the chip, it can be seen that the majority of heat generated at the device layer will be dissipated through the top silicon layer and only about 5% of the heat is dissipated through the underlying interconnect/dielectric multilayer; i.e., ðR through Si layer =R through interconnet=dielectric layer Þ $ 0.06.
Summary and Conclusion
In this study, a computationally efficient and accurate multiscale reduced order transient thermal model is developed which has the capability of modeling several decades of length and time scales at a considerably lower computational cost, while maintaining satisfactory accuracy. In particular, by using the proposed model, the computational time is reduced by at least two orders of magnitude at every step of zooming into the geometry. It is also shown that the hybrid scheme accurately predicts the transient thermal behavior of the system for not only the time domain considered for the initial observations, but also for time outside of the specified initial time domain. The mean absolute error between the proposed and FE model is 7.2% over the entire space and time domain.
A distinct benefit of the proposed method is that, for any linear system, the POD solution is independent of the transient power profile. In other words, once the solution to a sample power input is obtained, there is no need to generate new observations or fullfield FE simulations. This important feature can drastically decrease computational cost for parametric numerical simulations, making POD a fast and robust method for reduced order model of transient heat conduction in microelectronic devices. An additional unique characteristic of this model is that the initial observations can be obtained experimentally, which creates the ability of modeling a potentially complex system without generating any numerical model.
The hybrid scheme proposed in this study is not limited to the two levels considered in the present study and can potentially extended from package to "interconnect level." One of the strengths of this method is that the algorithm can be scaled to multiple levels and can be used to simulate more detailed structures on the chip, while taking advantage of the capabilities of POD method to avoid any further full-field simulation. In essence, without losing the desired resolution, the hybrid scheme proposes a new approach to further decrease the computational cost by orders of magnitude.
The integration of the proposed method into the commercially available software packages can create a powerful tool for both academic and industry applications. It will address the lack of physical models for multiscale thermal problems, relating potential performance variation to critical layout parameters. Another possible application of this method would be in the IC design industry. A POD model can be developed for a specific chip structure using output signals of the embedded temperature sensors on the chip as the original observations. By incorporating this model into a closed-loop on-chip control system, the possible locations of hot-spots can then be predicted and potentially avoided. 
