In deep submicron technologies, increased standby leakage current in high-performance processors results in increased junction temperature. Elevated junction temperature causes a further increase in the standby leakage current. The standby leakage current is expected to increase even more under the burn-in environment, leading to still higher junction temperature and possibly thermal runaway. In this paper we investigate the thermal management of high-performance processors during burn-in.
Introduction
Stressing during burn-in (BI) accelerates the defect mechanisms responsible for early-life failures. Thermal and voltage stresses increase the junction temperature, resulting in accelerated aging. During burn-in, the elevated junction temperature causes leakage to increase further, by approximately 34x. In the nanometer regime, this may result in positive feedback leading to thermal runaway. Figure 1 shows a chip that has gone into thermal runaway. To avoid thermal runaway, it is crucial to understand and predict the junction temperature under normal and stress conditions. Junction temperature, in turn, is a function of ambient temperature (Ta), junction-to-ambient thermal resistance (Rja), and static power dissipation.
Junction Temperature Estimation Procedure
The junction temperature (Tj) of an IC is defined as the average temperature of the silicon substrate. It is given by [2]:

$$T_j = T_a + P \cdot R_{ja} \qquad (1)$$
where Ta is the ambient or set-point temperature, P is the total device power during burn-in, and Rja is the junction-to-ambient thermal resistance. The power dissipation can be subdivided into dynamic and leakage components:

$$P = P_{dynamic} + P_{leakage}$$
In Equation 4, C is the total IC switching capacitance and ftoggle is the frequency used for node toggling during BI, which is expressed as:
where Cg is the gate capacitance of a single gate and N is the number of logic stages in the critical path. To evaluate the junction temperature, Tj, under different environmental conditions, a program and a methodology have been developed (Fig 2) [3, 4].
Simulation Results
Experimental data from a 32-bit microprocessor implemented in a 130 nm dual-VT CMOS technology were used to verify the procedure. The parameters of the program were calibrated to the experimental data from the microprocessor. Air-cooled BI ovens have an Rja of 1.5°C/W. Since the total power at the BI condition (Tj = 110°C, VDD = 1.8 V) for this chip is 66 W, a 1°C/W reduction in Rja allows BI to be performed at a 66°C higher Ta. The results in Figure 3 confirm that Ta increases from 10°C to 76°C in liquid-cooled BI ovens. It should be noted that, since the ambient temperature in air-cooled ovens cannot be lower than room temperature, it is impractical to burn in this microprocessor in an air-cooled BI oven: at room-temperature ambient, the chip will eventually go into thermal runaway.
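The ambient set points quoted above follow from rearranging Equation 1 as Ta = Tj - P x Rja. The sketch below is only a back-of-the-envelope check, assuming the 66 W total power is held constant at Tj = 110°C. The function name is illustrative and is not part of the authors' program. The arithmetic gives 11°C and 77°C, within a degree of the simulated 10°C and 76°C.

```python
def ambient_setpoint(tj_target_c, power_w, rja_c_per_w):
    """Ambient temperature that holds the junction at tj_target_c.

    Rearranged form of Equation 1: Ta = Tj - P * Rja.
    Assumes the power is constant at the target junction temperature.
    """
    return tj_target_c - power_w * rja_c_per_w


# Values quoted in the text: Tj = 110 C, P = 66 W, Rja = 1.5 C/W,
# and an Rja that is 1 C/W lower (0.5 C/W).
ta_high_rja = ambient_setpoint(110.0, 66.0, 1.5)
ta_low_rja = ambient_setpoint(110.0, 66.0, 0.5)

print(f"Rja = 1.5 C/W -> Ta = {ta_high_rja:.0f} C")
print(f"Rja = 0.5 C/W -> Ta = {ta_low_rja:.0f} C")
print(f"Gain in ambient temperature: {ta_low_rja - ta_high_rja:.0f} C")
```

The 66°C gain is simply P multiplied by the 1°C/W reduction, which is why a higher-power chip benefits more from each increment of improved cooling.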
Simulations were carried out with a 10% reduction in the channel length of the transistors in the 130 nm technology to incorporate process variations. This resulted in a ~3x increase in the subthreshold current and consequently a ~3x increase in leakage power. Figure 4 illustrates the simulation results for the chip with transistors whose channel length is 10% smaller than the nominal value in the 130 nm technology. As can be seen in Figure 4, the ambient temperature must be reduced from 76°C to 30°C for the liquid-cooled BI oven to maintain Tj at 110°C. Since it is difficult to maintain Ta in BI ovens around room temperature, it is necessary to reduce the Rja of the BI oven. The next generation of BI ovens is expected to have an Rja of 0.3°C/W using refrigeration as the cooling solution and an Rja of 0.25°C/W using spray cooling. With an Rja of 0.25°C/W, this processor can be burned in at an ambient temperature of 70°C.
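As a consistency check on these figures, Equation 1 can be solved for the power implied by the liquid-cooled operating point and then reused for the lower thermal resistances. This is a sketch under the assumption that the chip power at Tj = 110°C is the same in every oven. The 62°C value for refrigeration is computed here under that assumption and is not reported in the text.

```latex
\begin{align*}
P   &= \frac{T_j - T_a}{R_{ja}} = \frac{110 - 30}{0.5} = 160\ \mathrm{W}
      && \text{(liquid cooling, } R_{ja} = 0.5\,^{\circ}\mathrm{C/W}\text{)} \\
T_a &= T_j - P\,R_{ja} = 110 - 160 \times 0.25 = 70\,^{\circ}\mathrm{C}
      && \text{(spray cooling, } R_{ja} = 0.25\,^{\circ}\mathrm{C/W}\text{)} \\
T_a &= T_j - P\,R_{ja} = 110 - 160 \times 0.30 = 62\,^{\circ}\mathrm{C}
      && \text{(refrigeration, } R_{ja} = 0.3\,^{\circ}\mathrm{C/W}\text{)}
\end{align*}
```

The 70°C result matches the ambient temperature quoted for spray cooling.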
Thermal Runaway
As mentioned before, temperature and leakage current are strongly correlated and create a positive feedback mechanism. Under BI or normal operating conditions, designers try to control Tj by removing heat from the chip. As long as the rate of heat removal is greater than or equal to the rate of heat generation, Tj remains constant at the designed operating point. When the rate of heat generation becomes greater than the rate of heat removal, Tj starts to increase and thermal runaway occurs. Figure 5 shows the transient behavior of Rja. When the chip is powered on, Rja starts to increase and reaches its steady-state value, which in a typical example is 0.6°C/W.
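The feedback loop described above can be illustrated with a simple fixed-point iteration of Equation 1, in which the power is re-evaluated at each new junction temperature. This is a minimal sketch, not the authors' electro-thermal tool. The leakage model (60 W at 110°C, doubling every 30°C), the 250°C abort limit, and all function names are assumptions chosen for illustration, and dynamic power is ignored.

```python
def leakage_power(tj_c, p_ref_w=60.0, t_ref_c=110.0, t_double_c=30.0):
    """Hypothetical exponential leakage model (assumed, not from the paper):
    p_ref_w at t_ref_c, doubling every t_double_c degrees."""
    return p_ref_w * 2.0 ** ((tj_c - t_ref_c) / t_double_c)


def settle_junction_temperature(ta_c, rja_c_per_w, t_limit_c=250.0,
                                tol_c=1e-6, max_iter=1000):
    """Iterate Tj <- Ta + Rja * P_leak(Tj), starting from Tj = Ta.

    Returns the converged Tj, or None if Tj exceeds t_limit_c
    (treated here as thermal runaway) or fails to converge.
    """
    tj = ta_c
    for _ in range(max_iter):
        tj_next = ta_c + rja_c_per_w * leakage_power(tj)
        if tj_next > t_limit_c:
            return None  # heat generation outpaces heat removal
        if abs(tj_next - tj) < tol_c:
            return tj_next
        tj = tj_next
    return None


stable = settle_junction_temperature(ta_c=80.0, rja_c_per_w=0.5)
runaway = settle_junction_temperature(ta_c=80.0, rja_c_per_w=1.5)

print("Rja = 0.5 C/W:", "runaway" if stable is None else f"{stable:.1f} C")
print("Rja = 1.5 C/W:", "runaway" if runaway is None else f"{runaway:.1f} C")
```

With the assumed model, the lower thermal resistance settles at the 110°C design point, while the higher one diverges within a few iterations. The iteration captures only the steady-state balance, not the thermal time constants of a real package.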
In Figure 6, straight lines are drawn using Equation 1, which can be expressed as:

$$P = \frac{T_j - T_a}{R_{ja}}$$
with an ambient temperature of 35°C. As time increases, the thermal resistance increases from 0°C/W and reaches its steady-state value of 0.6°C/W. Hence, the straight lines represent the transient behavior caused by the thermal resistance changing with time. On the other hand, the exponential curve is the generated leakage power, or chip leakage power, at a given Tj. The intersection of the straight line (representing the removed power) and the exponential curve (representing the leakage power) represents the steady-state operating condition of the system, where the removed heat equals the generated heat. As long as there is an intersection between the removed-power curve and the chip-power curve, thermal runaway will not occur.
In Figure 7, the leakage power of the chip with nominal leakage and the leakage power of the chip with high leakage due to a 10% shorter channel length are plotted versus Tj. It can be seen that the leakage power of the nominal-leakage chip intersects the removed-power curve at 110°C. The slope of the line is 1/(0.5°C/W) and Ta is 80°C. At temperatures above 110°C the removed power is larger than the chip leakage power, and at temperatures below 110°C the leakage power is higher than the removed power. This means that from any point in the neighborhood of 110°C the temperature will return to 110°C, which is the design point for the BI condition. On the other hand, the curve for the high-leakage chip has no intersection with the removed-power curve for an Rja of 0.5°C/W. Since the removed power is less than the leakage power at all temperatures, the BI environment will lead this particular chip into thermal runaway. To overcome the problem, the BI environment must be changed. The new environment is shown in the figure, with an Rja of 0.25°C/W and a Ta of 70°C. From this experiment, it can be concluded that for scaled chips with higher leakage power, the BI environment must evolve by reducing Ta, Rja, or both. This shifts the removed-power curve to the left so that it intersects the leakage-power curve of the IC at the designed BI condition.
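The graphical argument of Figure 7 can be mimicked numerically by scanning Tj upward from Ta and reporting the first temperature at which the removed power catches up with the leakage power. The sketch below rests on stated assumptions: the 60 W and 160 W leakage values at 110°C are inferred by evaluating the removed-power line at the two quoted design points, the 30°C doubling interval is purely hypothetical, and the function names are illustrative.

```python
def leakage_power(tj_c, p_at_110_w, t_double_c=30.0):
    """Assumed exponential leakage curve anchored at 110 C (hypothetical)."""
    return p_at_110_w * 2.0 ** ((tj_c - 110.0) / t_double_c)


def removed_power(tj_c, ta_c, rja_c_per_w):
    """Removed-power line from Equation 1: P = (Tj - Ta) / Rja."""
    return (tj_c - ta_c) / rja_c_per_w


def operating_point(p_at_110_w, ta_c, rja_c_per_w, t_max_c=200.0, step_c=0.5):
    """First grid temperature at which removed power >= leakage power.

    Scanning upward from Ta finds the lower (stable) intersection.
    Returns None when the curves never meet below t_max_c, which
    corresponds to thermal runaway in the graphical argument.
    """
    n_steps = int((t_max_c - ta_c) / step_c)
    for i in range(n_steps + 1):
        tj = ta_c + i * step_c
        if leakage_power(tj, p_at_110_w) <= removed_power(tj, ta_c, rja_c_per_w):
            return tj
    return None


nominal = operating_point(60.0, ta_c=80.0, rja_c_per_w=0.5)
leaky_old_env = operating_point(160.0, ta_c=80.0, rja_c_per_w=0.5)
leaky_new_env = operating_point(160.0, ta_c=70.0, rja_c_per_w=0.25)

for label, tj in [("nominal chip, Rja 0.5, Ta 80", nominal),
                  ("leaky chip, Rja 0.5, Ta 80", leaky_old_env),
                  ("leaky chip, Rja 0.25, Ta 70", leaky_new_env)]:
    print(label, "->", "no intersection (runaway)" if tj is None else f"{tj:.1f} C")
```

Under these assumptions the nominal chip and the high-leakage chip in the new environment both settle near 110°C, while the high-leakage chip in the original environment has no operating point. The crossing found first is stable because the line is steeper than the leakage curve there, matching the behavior described around 110°C.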
Conclusion
In this paper, we investigated the thermal management of high-performance chips in the BI environment. An electro-thermal analysis tool was developed to analyze thermal runaway. It is concluded that, in order to avoid thermal runaway, the burn-in environment must be set up such that the chip power at any temperature higher than the BI temperature is less than the removed power, so that the junction temperature converges to the BI design-point temperature.
