Abstract: Physics-based compact electrothermal macromodels of standard cells are developed for fast dynamic simulation of three-dimensional integrated circuits (3DICs). Such circuits can have high thermal densities and thermal effects often limit their performance. The macromodels developed here use fewer state-variables than a discrete transistor-level implementation while retaining transistor-level accuracy. This results in significant speed-up over transistor-level simulation for large-scale circuits. The macromodel-based methodology enables robust and significantly faster dynamic electrothermal simulation over the long times required for thermal transients to subside. Consequently, transient junction temperature can be examined in the design phase. Simulated junction and measured surface thermal transients are compared.
Introduction
Three-dimensional integrated circuits (3DICs) provide enhanced performance and device density gains [1] beyond that available from technology scaling alone. The 3DIC architecture is one of the potential circuit architectures that overcomes some of the challenges of interconnect scaling by exploiting the vertical dimension [2] . However, the increased volumetric density leads to large heat-fluxes and the 3D stack increases thermal resistances relative to that of conventional 2DICs. These result in increased on-chip temperature which can adversely affect performance, reliability and cost of packaged 3DICs [3] . In order for the technology to be adopted it must outweigh the adverse thermal impacts on design and cost [4] . Management of thermal issues early in the design phase is important for successfully designing 3DICs.
Dynamic simulations are required to determine temperatures when the power profile changes with time, and when capturing thermally induced transient variations of electrical characteristics. Since dynamic thermal simulation is more computationally expensive than steady-state simulation, the use of a computationally efficient simulation methodology is critical. In this work, a fast and accurate dynamic electrothermal simulation methodology built on the standard cell concept is presented.
The majority of publications on electrothermal circuit simulation have considered devices and small-scale integrated circuits [5] [6] [7] [8] . Some publications addressed the electrothermal simulation of very large scale integrated (VLSI) circuits presenting methods that calculated the steady-state temperature distribution [9, 10] but not the transient temperature distribution. Techniques for transient electrothermal simulation [11, 12] were directed at calculating the junction or channel temperature profiles. These techniques addressed transistor-level electrothermal modelling and the computationally efficient modelling of substrate heat conduction.
Over the past three decades two main algorithmic approaches have been pursued for performing the dynamic electrothermal simulation of ICs: the relaxation method and the direct method using simultaneous iteration. In the relaxation method, electrical and thermal simulations are performed separately with temperature updates passed from the thermal simulator to the electrical simulator and power updates passed from the electrical simulator to the thermal simulator [5, 13] . These updates occur at intervals much longer than the time duration of electrical transients, for example, hundreds of clock cycles or more for a digital circuit. Typically in the direct method [11, 14] an electrical circuit model of a thermal system is created based on a thermal -electrical analogy [15] . The electrical and thermal networks are solved simultaneously as if they were one large electrical circuit. The relaxation method is easier to implement as existing electrical and thermal simulators can be used directly, but accuracy of this method cannot be assumed in strongly coupled thermal problems [14] . Furthermore, very fast thermal changes cannot be considered [11] . The direct method requires a more complex physically consistent implementation than does the relaxation approach but is capable of handling very fast changes [16] .
In this paper, a dynamic electrothermal simulation methodology for VLSI circuits is proposed which is computationally efficient with transistor-level accuracy, and also has the ability to calculate the transient profile of channel or junction temperatures. The ability to handle fast changes is important for physically consistent dynamic electrothermal simulation. Here the direct method is used; however, an electrical analogue of the thermal system is not required. The proposed simulation methodology is implemented in the multi-physics simulator fREEDA TM [17] , which has the capability of simultaneously solving native electrical and thermal networks [18] . In the conventional direct approach there is the potential problem of mixing of electrical and thermal currents. This problem has been addressed by introducing the concept of local reference terminals [19] , with the result that the electrical and thermal problems do not need to be collapsed into a single network. Also, it is not necessary to artificially establish a single global ground as required in conventional circuit simulation of large distributed integrated circuits.
Section 2 describes the cell-based electrothermal modelling methodology. Three inter-related model-order reduction topics are covered in this paper: (i) Section 3 describes nonlinear model-order reduction using cell-level macromodels; (ii) Section 3 also describes electrothermal modelling of macromodels and (iii) Section 4 describes a full-chip 3DIC electrothermal simulation environment that achieves further lossless model-order reduction by partitioning linear and non-linear elements. Supporting this work is an example in Section 3 in which a NOR cell is modelled. Section 5 compares dynamic electrothermal simulations of a 3DIC with measured responses. The impact on stability of simulation and an integrated approach to electrothermal simulation of 3DICs is discussed in Section 6.
Dynamic electrothermal simulation methodology
The standard cell-based design methodology is the most widely used method of designing application-specific integrated circuits (ASICs). In this paper, compact electrothermal macromodels of standard cells are developed using the procedure described in Fig. 1 .
The first step is extraction of the compact electrical network of each standard cell in the standard cell library for a particular process. The electrical network of a standard cell is typically a SPICE netlist with each transistor, circuit element and parasitic modelled as a SPICE primitive. This network has a large number of state-variables including Fig. 1 Flowchart of macromodel-based dynamic electrothermal simulation methodology terminal voltages and charge variables. In the work presented here, a standard cell is considered as a single primitive. Statevariables for exponential non-linearities are parameterised so that all strong non-linearities are rendered as moderate nonlinearities [20] . Thus convergence of large networks is straight forward and continuation (or homotopy) methods are not required. The result is that the compact macromodel of a standard cell is created using transistor-level models but fewer state-variables are used than would be required with transistor primitives. This leads to runtime improvement over a transistor-level implementation. In developing the electrothermal macromodels all of the temperature-dependent device parameters are made functions of temperature. The macromodel has external electrical terminals, including an electrical reference terminal, one external thermal terminal and an internal global thermal reference terminal which is set to 0 K. The second step is the creation of the thermal network from the layout of the standard cells. A physical extractor, known as WireX [21] , is used to create resistive mesh-based thermal network from an OpenAccess layout. WireX discretises the full 3D structure of the circuit into volume elements. Each volume element is represented thermally with six resistors which are calculated based on the weighted average of the thermal conductivities of the materials which comprise that volume element. The resistors connect each edge of the volume to a central node where there is a thermal capacitor to thermal ground. The layout thus extracted is a quasi-3D implementation of the standard cell with transistors and electrical and thermal parasitics arranged on a 2D plane with layers of metallisation and vias. The instantaneous heat generated in the cell is the electrical energy dissipated which, for complimentary metal oxide semiconductor (CMOS), occurs mainly at switching events. Thus both the heat-flux and temperature have high-frequency components which, in the macromodel approach, is smoothed out in time before interfacing with the chip-level thermal network. This occurs because of the finite standard cell thickness and the resistive-capacitive (RC) thermal network that models it. The thermal network connecting the thermal terminals of each standard cell in the 3D ASIC is similarly extracted. The third step is building the target application circuitry using these standard cells.
Compact electrothermal macromodel
This section describes the development of the electrothermal macromodel of a CMOS NOR standard cell.
Modelling scope
The compact electrothermal macromodel of a standard cell is developed using a global modelling approach in which voltages and currents are expressed as functions of state-variables, their derivatives and time-delayed state variables [22] 
In (1), v NL (t) is a vector of voltages with respect to the electrical reference terminal of the macromodel, i NL (t) is the vector of currents at all external terminals other than the reference terminal, x(t) is the vector of state-variables and x D (t) is the vector of time-delayed state-variables. The constitutive equations of devices used in the macromodel define the functions in (1). In the usual compact modelling approach, as used in SPICE-like simulators, a device must be described by a model in the form of currents as a function of state-variables which are the terminal voltages plus additional quantities such as charge. Thus the mathematical model omits derivatives of state-variables above the first order, and omits delayed state-variables. This SPICE-like approach makes it difficult to electrothermally model standard cells where the state-variables could include temperature, and when delay is important.
In the electrothermal macromodels developed in this work, temperature is a state-variable and temperature-dependent device parameters, for example, mobility and threshold voltage, are formulated as a function of temperature. This captures the coupling of the thermal world to the electrical world. Coupling of the electrical world to the thermal world is via the electrical power dissipated represented as a heatflux (i.e. heat current) source. In the electrothermal macromodel, there could be one or more heat-flux sources with the heat-flux being the product of the supply voltage and the instantaneous supply current. For large cells multiple heat-flux sources need to be considered with the standard cell partitioned into regions each with a thermal terminal. The procedure for developing the electrothermal macromodel is outlined as follows: It is envisioned that macromodel development will remain a manual process and be included as part of the process design kit or library of standard cells.
Electrothermal NOR macromodel
The development of the electrothermal macromodel of a NOR gate is used as an example of macromodel development. The schematic of the electrothermal NOR macromodel is shown in Fig. 2 . In this macromodel the electrical constitutive equations are formulated in terms of four state-variables with the electrical state-variables being
The complete electrical transistor-level implementation in fREEDA (using the EKV MOSFET model [23] 
One of the constraining issues in conventional model development using direct coding is that the derivatives of constitutive equations must be implemented. The alternative here uses the Sacado [24] automatic differentiation package, which calculates perfect derivatives. Thus model development complexity is minimised. Model development proceeds as follows: The bulkreferenced drain, gate and source voltages of each transistor are represented in terms of the state-variables given in (2) . For the NMOS devices, assuming source and bulk terminals are at the same potential, the voltages are
In the NOR schematic, Fig. 2 , the VDD path has two PMOS transistors in series. In the macromodel formulation, the internal node 'x ' is eliminated by replacing the two series PMOS transistors with a single equivalent transistor with equivalent transconductance
, where b 1 and b 2 are the transconductances of the individual devices. The higher voltage of the two PMOS gate voltages is taken as the equivalent gate voltage. Depending upon the input voltages, it can be assumed that one of the PMOS transistors is conducting, that is, acting as a resistance, and the current through the other PMOS transistor is flowing through the series chain. That is
where i p is the dynamic current flowing through the PMOS transistors, and Q dP1 and Q dP2 are the drain charges of P1 and P2, respectively. Similarly, the transistors in parallel can also be merged into one equivalent transistor. In the case of parallel PMOS transistors, the smallest gate voltage (and in the case of parallel NMOS transistor the largest gate voltage) is taken as the equivalent gate voltage thus reducing the number of internal nodes. The next step is formulating the static and dynamic current entering each external terminal of the NOR gate. The output current I out shown in Fig. 2 has DC and dynamic components. The dynamic current is calculated by taking the time derivative of the corresponding charge so that
In (6), I N1 and I N2 are DC drain currents, and Q dN1 and Q dN2 are the drain charges of N 1 and N 2 , respectively. I P is the DC drain current of the equivalent PMOS transistor, and i p is the transistor's dynamic drain current given in (5) . The current at the VDD terminal is
where Q VDD is the source charge of P2 if x [2] . x [1] , otherwise it is the source charge of P 1 . The currents at the input terminals, assuming that the DC components are zero, are
where Q gN1 and Q gN2 are the gate charges of N1 and N2, respectively. Similarly, Q gP1 and Q gP2 are the gate charges of P 1 and P 2 , respectively. The DC current and charge in (5) - (8) are now formulated as functions of the state-variables
Electrothermal coupling is realised by formulating the longchannel threshold voltage, transconductance parameter, longitudinal critical field and the bulk fermi potential as functions of the temperature state-variable x[4] based on the EKV MOSFET model equations [23] . Finally, heat-flux I heat is equivalent to instantaneous power which is described as
In (10) the time derivative of charge is calculated as the product of the voltage derivative of charge (calculated using Sacado [24] ) and the time derivative of voltage. The time derivative of voltage comes from the time-discretisation method used in transient simulation and hence is a numerical derivative. The macromodel is fully parameterised in terms of process and geometry parameters using the complete set of EKV model parameters. For comparison purposes a physically consistent electrothermal EKV MOSFET transistor model was also implemented. The simulated transient characteristic of the electrothermal NOR macromodel switching at 250 MHz is compared with electrical and electrothermal transistor-level simulation in Fig. 2 using the default parameters of the publicly available 0.15 mm EKV 2.6 MOSFET model. Considering the transistor-level electrical simulation results, it is seen that the rise and fall times are increased when instantaneous thermal effects are considered. The macromodel-level electrothermal simulation waveform varies slightly from the electrothermal transistor-level waveform. This is due to the elimination of the internal node 'x' between the series PMOS transistors and hence not all of the PMOS transistor charges are considered when calculating the dynamic current through the series chain. Better agreement would be obtained retaining node 'x' but the macromodel results are sufficiently accurate to justify the computational advantage of eliminating the internal node. Fig. 3b shows fast temperature transients which correspond to electrical switching where most power dissipation and thus heat-flux occurs. The channel temperature profile shown in Fig. 3c for the electrothermal macromodel has a long-term temperature rise that is slower than an exponential and described by a fractional calculus model [25] . This is known as the long-tail effect referring to the long time it takes until steady-state conditions are reached in a dispersive medium. The increase in the rise and fall times would not be observed if the relaxation simulation method were used as the temperature would be fixed for the duration of perhaps many pulses. The thermal network, and specifically the thermal capacity, provides an additional energy storage mechanism; and this effect is captured in physically consistent electrothermal modelling.
Electrothermal simulation
The dynamic electrothermal network of an IC consists of a large number of linear elements including electrical parasitics and a large number of thermal elements (primarily thermal resistors and capacitors). The number of linear elements greatly exceeds the number of non-linear elements. In this work a partitioned state-variable transient analysis based on time-marching integration methods is used. This scheme is particularly efficient when the number of linear elements is large. The reduced-error formulation for this analysis method [22] is
where v NL and i NL are vectors of currents and voltages of the non-linear elements, x n is a vector of non-linear statevariables and s sv,n , M sv are time-invariant matrices defined as
In (12) G and C are modified nodal admittance matrices of equal rank n m , where n m is equal to the number of nonreference terminals in the circuit plus the number of additional required variables. The stamps of linear conductors are stored in G and the stamps of linear dynamic elements such as capacitors and inductors are stored in C. The contributions of independent sources are contained in the vector s f , b n21 is a vector of size n m that captures the history of the state-variables and derived from the compact model discretisation, T is the incidence matrix and M sv is constant if the time step (1/a or 2/a depending upon the integration method) is fixed. Thus M sv,n only changes if the time step changes. Equation (11) represents an algebraic system of nonlinear equations of size n s × n s which is solved using quasiNewton methods. This involves factorisation of a Jacobian matrix of size n s × n s at each iteration to solve the non-linear system of equations. This is in contrast to the method used in conventional circuit simulators, for example, SPICE, which decomposes large sparse matrices (of size n m × n m ) at every iteration to solve a non-linear system of network equations. The number of non-reference circuit elements (n m ) grows with the thermal network size and this slows down traditional circuit simulation. The simulation time of partitioned statevariable-based transient analysis is mostly dependent on n s since (G + aC) (a sparse matrix of size n m × n m ) is factorised only when the time step changes while a much smaller Jacobian matrix of size n s × n s is factorised at each non-linear iteration (typically the time step is not changed).
Since in electrothermal simulation the number of linear elements is usually much higher than the number of nonlinear elements (n s ≪ n m ), the partitioned state-variable-based transient analysis can have advantages. The Jacobian matrix, however, is a dense matrix and its factorisation requires O(n 3 s ) operations, whereas factorisation of the sparse (G + aC) matrix requires O(c 2 n m ) operations, where c is the number of non-zero elements [26] . The number of state-variables n s is reduced here using macromodels of standard cells. For the 3DIC considered in the next section, partitioned state-variable transient simulation enables a high-resolution thermal network to be efficiently used since the number of linear elements has little effect on runtime as computations required to handle the linear network occur primarily at initialisation.
The runtime of the macromodel and equivalent transistorlevel implementations are compared by running the electrothermal transient simulation in fREEDA for 10 ms with a step size of 1 ns. The simulations were performed on a 3 GHz Intel Xeon server with 32 GB of RAM. Table 1 presents the number of state-variables and transient simulation runtime of various designs comparing the runtimes of macromodelled and transistor-level electrothermal simulations. The data in Table 1 shows that the macromodel implementation reduces the number of state-variables and results in a faster simulation. Results show that as the design size increases, the speed-up owing to macromodelling increases. The D-flipflop and Shift Register macromodels reduce the number of state-variables approximately by the same factor, however, with the D-flipflop runtime is reduced by O(n 2 s ), whereas with the Shift Register runtime reduction is O(n 3 s ). The explanation for this is that for small circuits, the Jacobian factorisation time, O(n 3 s ) is not a major bottle neck in runtime, whereas the opposite is true for large circuits and thus a greater speed-up is obtained by reducing the number of statevariables. Moreover, speed-up is also dependent upon design type. For example, the SR-Latch and 2-input NOR macromodel reduces the number of state-variables by the same factor. However, the SR-Latch is a larger design than the 2-input NOR, but it shows less speed-up. This is because the SR-Latch has feedback, requiring more analysis iterations to converge at each time step in macromodel implementation. The accuracy of the electrical simulations is compared to a transistor-level implementation in Table 2 for various standard cells. In the worst case the propagation delay error is less than 10% and this could be reduced with more attention given to macromodel development. 
Electrothermal modelling of a 3DIC
This section presents the electrothermal simulation of a 3DIC [27] fabricated in the 180 nm FDSOI MIT Lincoln Laboratory 3DIC through-via process [28] . In this process three fully depleted silicon on insulator (SOI) wafers (tiers) are bonded together and integrated using 1.5 mm diameter tungsten-filled through-vias. The bottom tier is supported on a 675 mm thick silicon substrate. Heat conduction from the upper tier to the lower tier is through interlayer dielectric (SiO 2 ) and 3D through-vias. The 3DIC die is 3.76 mm × 3.76 mm with the floor plan shown in Fig. 4a [27] . Each tier is divided into a 3 × 3 array of identical tiles. Each tile consists of a chain of 15 frequency multipliers followed by 10 frequency dividers as shown in Fig. 4b . These frequency multipliers and dividers are asynchronous circuits and communicate using a handshaking protocol. The multiplier -divider interface is switching at 1 GHz approximately. In measurement this high frequency interface shows a significant rise in temperature and is the hottest region or hotspot. The thermal profile of this interface is examined below.
Hotspot modelling
The circuits in each tile of the 3DIC chip are modelled using electrothermal macromodels of standard cells. Mostly there are three types of circuits or cells: a token generator, a frequency multiplier and a frequency divider. The token generator cell is a signal generator which is modelled by an ideal pulse source. Behavioural electrothermal models of the frequency multiplier and divider were constituted using electrothermal macromodels of standard cells. The macromodels were sized to match the power consumed in a transistor-level simulation of the cell. The frequency multiplier circuit used electrothermal NAND macromodels and delay elements as shown in Fig. 5a . A thermal netlist was extracted from the frequency multiplier layout with a 
Simulation results
Fig . 6 shows the steady-state thermal image of the die obtained from thermal measurements. The die was thermally imaged using an InfraScope TM mid-wave infrared (25.4 mm) microscope from Quantum Focus Instruments Corporation, which employs emissivity correction for different materials. Thermal measurement images were captured at a resolution of 1.6 mm. While electrically disconnected, the die was raised to a temperature of 808C and imaged as a reference. The die was again imaged while running with emissivity corrections applied from the reference. More experimental setup details for 3DIC measurements are given in [29] . There are nine hotspots with each hotspot corresponding to a multiplier -divider interface. The hottest point on the die is the multiplierdivider interface on Tier C of tile (3, 3) on the bottom right-hand corner. Each of the tiers has a stand-alone multiplier -divider circuit identical to the circuits on the tiers above and below. The vertical section of this tile comprising all three tiers was modelled using electrothermal macromodels of cells. The transient junction or channel temperature of the transistors near the interface was extracted from simulations and compared to surface (3, 3)] with the measured surface temperature. The surface temperature was found by thermally imaging the top most passive layer of the 3D stack. The measurement result shows a 7.588C rise of the surface temperature and simulation predicts a 8.168C rise of the junction temperature. The passivation layer spreads the heat and so the surface temperature can be expected to be less than the temperature of the active devices. The substantial time to reach thermal steady-state implies that load profiles of a digital circuit, which is used in resistive thermal analysis, may not be a good predictor of thermal conditions. The junction temperature of transistors near the hotspot rise by 6.288C on Tier B and 2.618C on Tier A. As expected the hottest tier is Tier C which is farthest from the heatsink.
The temperature rise alters the switching frequency of the multiplier-divider interface. Fig. 8 shows the electrical-only and electrothermal response at multiplier-divider interface after simulating the entire multiplier-divider chain. The frequency at the interface decreases by approximately 48 MHz in 0.5 ms due to thermal effects. This enables the transient variations in the frequency to be captured.
The temperature rise here seems modest and this is because only a small portion of the circuit operates with a clock speed of 1 GHz. Multiplier stages preceding the final multiplier stage switch at frequencies that decrease by factors of two the further they are away from the final stage. A similar situation exists for the divider stages. However, the ability of the macromodel-based thermal simulation to capture dynamic thermal characteristics has been solidly demonstrated.
Discussion
In fREEDA, a multi-physics system is split into linear and non-linear components and the error at the interface of these parts is minimised. This partitioning is a lossless form of model-order reduction. Splitting the circuit between linear and non-linear parts is a significant advantage when it comes to scaling up to very large circuits because it allows the error function to be formulated at just the interface between the linear and non-linear components rather than considering every terminal or node of the circuit. The latter can result in a large system of equations which need to be solved on each Newton-Raphson iteration of each time step. This circuit splitting approach was initially developed primarily for Harmonic Balance analysis [30] . This partitioning approach enables reduction of the linear matrix using Kron's method [31] a form of lossless model-order reduction akin to finding the minimum Thevenin equivalent network representation of the linear network. This model-order reduction introduces no approximations and has no impact on the stability of transient analysis. fREEDA also has a more traditional variable time step analysis that does not divide the circuit into linear and non-linear components (like SPICE) and both approaches are stable for the different kinds of the circuits we have considered.
Previously authors in [32] presented a scheme of solving electrical and thermal parts of a network separately using a framework of direct-coupled electrothermal simulation. In this scheme the thermal network is not solved at all steps of the iteration. In the fREEDA framework electrical and thermal networks are solved at each time step. The major simulation speed-up was achieved through the use of macromodels which reduces the number of non-linear statevariables. Such macromodels could be incorporated in any circuit simulator as new elements. The macromodels enable a circuit representation of a block with a large number of state-variables as a block with fewer state-variables while preserving input -output behaviour. The standard cell-based electrothermal simulation methodology and macromodel formulation presented here can be a vehicle for full-chip transient electrothermal simulation. The current implementation of the simulation methodology is best suited to transient simulation of hotspots present in a circuit.
In this paper the electrothermal simulation methodology presented is generic and is also applicable to 2DICs. Currently virtually all 3DIC designs are wafer or die stacked 2DIC designs. One of the challenges of electrothermal simulation specific to 3DICs is that for accurate results the entire stack of dies in a 3DIC needs to be simulated at once rather than individually. The integrated approach to simulation is important because it considers the impact of heat sources on one tier on other devices present not only on that tier but also on other tiers. The standard cell-based technique allows for an iterative approach to identifying the heat sources on a tier of a 3DIC and distributing them to minimise their impacts not only on that tier but on other tiers as well. In the approach presented here, the whole 3DIC is extracted as various volume elements and the calculation of the thermal network associated with one volume element takes into account the materials, including interconnects, comprising the volume element. The heat source in one volume element always corresponds to a heat source in a standard cell. Since standard cell macromodels are based on transistor equations, the heat current is calculated automatically on each time step, external artificial current sources are not required.
Conclusions
In this paper a dynamic electrothermal simulation methodology based on macromodels of standard cells was presented. Physically consistent electrothermal macromodels of various standard cells including an inverter, NAND, NOR, XOR, Latch, Adder and D-flipflop were developed. These macromodels were based on transistor equations and a technique was presented that requires fewer state-variables than equivalent transistor-level implementation. This results in greater computational efficiency regardless of the approach to time-marching circuit simulation. A technique was presented for efficiently handling the large linear network that results from thermal network extraction. The macromodel-based simulation methodology was able to predict the transient junction temperature profile of a 3DIC. It was shown that it is possible to model the dynamic thermal profile of a 3DIC at buried devices where it cannot be imaged. The proposed methodology can also be used in modelling thermally induced transient variations in the electrical characteristics of the signals.
8 References
