Introduction
Non-ideal CMOS technology scaling results in ever-increasing power density, especially local power density. With the shift towards a multi-core and even many-core design paradigm, it is likely that total power will also rise. This is caused by the increased number of cores, the improved circuit delay and higher frequency, the relatively constant die area due to manufacturing yield, and slow supply voltage scaling. Since all the power consumed is eventually dissipated as heat, the increase in both power density and total power presents severe thermal challenges to chip designers. The dissipated heat has to be removed efficiently by a thermal package to prevent circuit malfunction and thermal runaway.
Thermal packages are expensive, especially for high-performance processors, making package design for the worst-case thermal scenario prohibitive. Instead, designers usually address the typical case. As a result, to prevent thermal emergencies, dynamic thermal management (DTM) [1, 16] has become indispensable for modern thermally constrained chips. On the other hand, engaging DTM introduces performance penalties, so there is a tradeoff between the cost of the package and the DTM-induced performance degradation.
Accurate thermal characterization of the silicon die is crucial for the development of efficient thermal solutions. Compared to temperature estimation using design-time thermal models, runtime thermal characterization is preferable because it offers direct measurements with the chip running actual workloads in real time. Recently, IR thermal imaging [7, 12] has been gaining popularity in both industry and academia for characterizing thermal behavior. It has been used to design DTM and to choose temperature sensor placement and sampling rates, and it provides runtime measurements with unprecedented resolution in both time and space. Unfortunately, IR thermal imaging also needs peculiar cooling configurations, because metal is not transparent to IR. For the infrared light emitted from the silicon to reach the IR camera, one technique is to expose the IR-transparent silicon die by removing package components, such as the metal heatsink and spreader, and adding an IR-transparent liquid flow over the bare die to carry away the dissipated heat. This leads to transient behavior drastically different from that of a heatsink, and its thermal characteristics need to be closely examined.
In this paper, we investigate the thermal characteristics of two different cooling configurations: forced air flow over a copper heatsink (AIR-SINK) and laminar oil flow over a bare silicon die (OIL-SILICON). The former is now the prevailing cooling solution for high-performance processors, while the latter is generally used for real-time infrared (IR) thermal imaging of the silicon, since silicon and certain oils are both transparent to IR. We modify the HotSpot thermal tool [16] to model OIL-SILICON and add the secondary heat transfer path that spans C4 bumps in flip-chip packages, package substrates, solder balls and the printed-circuit board (PCB) for better accuracy.
This paper makes the following contributions:
1. We show that the presence or absence of a heatsink fundamentally changes the thermal behavior of the die, leading to significantly different temperature-aware design implications. The purpose of this paper is to differentiate the roles of IR and simulation. The thermal design process cannot just rely on IR measurements. Simulation and IR measurement are both needed and are complementary techniques. Even after a chip is manufactured, the design of the software interface to the hardware thermal facilities must take into account the spatial and temporal differences created in IR measurement.
2. We modify HotSpot to model OIL-SILICON and the secondary heat transfer path.
This paper is organized as follows. Section 2 provides some background and the related work. Section 3 shows the modifications of HotSpot to model the oil flow and the secondary heat transfer path. Section 4 characterizes and compares transient and steady-state thermal responses of AIR-SINK and OIL-SILICON. Section 5 presents architectural implications. Finally, Section 6 summarizes the paper and points out future work.
Background and Related Work
This section provides a taxonomy of different cooling mechanisms as well as a taxonomy of thermal characterization methods.
Chip Cooling Mechanisms
Depending on the cooling mechanism, popular thermal packages fall into one of the following categories:
• Convective Cooling: Either air or liquid can be used as coolant. The coolant flow can be forced or natural. Examples are forced air cooling over a heatsink, as in most desktop systems; natural convection for low-cost chips without a fan and a heatsink; forced water cooling in overclocked systems and some server systems; forced liquid cooling over a bare silicon die for infrared runtime thermal characterization [7, 12]; microchannel cooling in more thermally constrained chips [8], etc.
• Phase-Change Cooling: Here, heat is first absorbed to change the phase of the coolant (the liquid-to-gas phase change) and later removed by a heat exchanger to the ambient, and the coolant phase is changed back as a result of either the drop in temperature or the presence of a compressor (the gas-to-liquid phase change). There has to be a circulation mechanism to repeat the phase-change process. Examples are heatpipes in most laptop computers; droplet impingement spot cooling on a hot chip surface [5]; refrigeration in data centers; phase-change heat spreaders to achieve more uniform temperature across the die [15], etc.
• Thermoelectric Cooling (TEC): Thermoelectric cooling uses the Peltier effect to pump heat from on-chip hot spots to cooler regions by consuming electric energy. Some PC overclockers use TEC to provide additional cooling for a CPU with another power supply unit.
In this paper we focus on convective cooling, leaving other modes as future work. In particular, we compare forced air cooling over a copper heatsink and forced oil cooling over a bare silicon die. These two cooling methods are used in design-time or run-time thermal characterization and in the development of thermal management techniques.
Chip Thermal Characterization
Thermal characterization falls into two categories: design-time and run-time.
• Design-time silicon thermal characterization is usually achieved by simulations from full-chip thermal models. These models are especially useful in early design stages where the final design is not yet available. They also let the designer freely explore hypothetical designs without expensive prototypes. There has been abundant work on design-time full-chip thermal models, such as [11, 16, 17, 18]. However, existing studies do not provide flexibility in thermal package modeling, if thermal packaging is modeled at all. Usually, the only configuration is forced air convection with a heatsink. But there are now many other cooling configurations that are increasingly used in the high-performance computing community.
• Run-time thermal characterization is preferable because it offers direct measurements with chips running actual workloads, whereas the accuracy of design-time thermal models is affected by many other factors such as power estimation. Ideally, design-time models should be validated by real measurements. There are also existing studies on run-time thermal characterization methods. For example, [10, 14] provide insights into using on-chip temperature sensors. Most modern high-performance processors, e.g. the IBM POWER series processors [6], have multiple integrated temperature sensors deployed across the entire die. However, due to the discrete nature of sensors, actual hot spots may be missed. Another run-time thermal characterization method is IR thermal imaging [7, 12]. While this technique can capture the detailed thermal map in real time, the cooling configuration is significantly different from the chip's thermal package in normal operation. For example, in [12], an AMD Athlon64 chip was cooled by an IR-transparent oil flow, whereas its actual package would be a heatsink with forced air convection. While real-time IR thermal characterization can be used to derive the power consumption of microarchitecture blocks as discussed in [12], this approach can be problematic if IR measurements are directly used to predict what happens when the die is put in its actual thermal package during normal operation. The limited sampling rate of the IR camera may also filter out high-frequency transient thermal fluctuations and miss thermal violations. Therefore, run-time thermal characterizations must be examined closely: they have limits in developing DTM policies for normal operation, and care must be taken when interpreting these measurements.
In this paper, we model more thermal package configurations such as liquid or air forced convection with different flow directions. Among them, one is of particular interest: direct oil cooling of the die without a heatsink [12]. This is the only cooling configuration we can find in the literature that has the detailed information needed to make our analysis possible. By using an IR-transparent oil that flows directly over the back of the die, frames of silicon transient temperature maps can be recorded by an IR thermal camera.
Dynamic Thermal Management
There has also been extensive work investigating microarchitectural thermal management techniques, for example [1, 13, 16], to name just a few. However, all these previous papers assume a fixed thermal package and do not consider how a different package choice would affect their techniques.
There are also a few studies that closely relate to our work. For example, IR thermal measurements have been used to evaluate general tradeoffs of different dynamic thermal management techniques [4], and [9] uses IR thermal measurements to guide thermal sensor placement and calibration. Readings from the thermal sensors are then used to extract useful information about on-chip variations. It would be interesting to see how the observations in this paper would affect the results in [4] and [9].
In this paper, with the improved thermal model, we show that the choice of thermal package produces drastically different thermal responses in different parts of the die. The thermal package alone can change the choice of key thermal management parameters such as DTM engagement duration, thermal sensing frequency and granularity, and on-chip thermal sensor placement. Therefore, being able to explore the tradeoffs among different package choices at design time is desirable. The research presented in this paper suggests another interesting dimension in the design space that chip architects can explore: the thermal package choice. Because different packages cause distinct thermal behavior of the die, as we show in the following sections, chip architects have yet another design knob to tune for temperature-aware microarchitecture design.
Modeling the Oil Flow
In order to investigate the impacts of the two different cooling mechanisms, we modify the HotSpot thermal modeling tool. One major extension is to model the IR-transparent oil flow over a bare silicon die, which is similar to the cooling setup in [12] .
OIL-SILICON Model Details
The default thermal package configuration in HotSpot is forced convection over a copper heatsink. The heatsink is attached to a copper heat spreader, which is in turn attached to the silicon die via a thin layer of thermal interface material. The secondary heat transfer path is not modeled, because negligible heat is removed through that path. However, if the spreader and heatsink are removed, and a laminar viscous IR-transparent oil flow with low thermal conductivity is directly applied to the back of the silicon die, as in the case of IR thermal measurements, the heat transferred through the secondary path is no longer negligible. Therefore, we added both the oil flow and the secondary heat transfer path, as shown in Fig. 1.
The oil flow is modeled as follows. The overall convection thermal resistance at the oil-silicon boundary can be found by [3]:
$$R_{conv} = \frac{1}{h_L \, A_{chip}} \qquad (1)$$
where $h_L$ is the equivalent overall heat transfer coefficient, and $A_{chip}$ is the entire silicon area. Furthermore, $h_L$ can be calculated for laminar flow over a smooth flat surface:
$$h_L = 0.664\,\frac{k}{L}\,Re_L^{1/2}\,Pr^{1/3} \qquad (2)$$
Here $L$ is the length of silicon along the flow direction, $k$ is the thermal conductivity of the oil, $Pr$ is the Prandtl number of the fluid, and $Re_L$ is the overall Reynolds number of the flow. The transient effect of the oil flow also needs to be modeled. The overall effective thermal capacitance of the oil can be calculated by:
$$C_{th,oil} = \rho \, c_p \, A_{chip} \, \delta_t \qquad (3)$$
where $\rho$ is the density of the oil, $c_p$ is the specific heat of the oil, and $\delta_t$ is the equivalent thermal thickness of the oil that contributes to heat transfer. Close to the oil-silicon interface, there is a thin layer of oil that is hotter than the rest of the oil flow. This is governed by temperature continuity across the oil-silicon boundary: the oil temperature at the boundary is the same as the silicon temperature, and gradually decreases over a thickness $\delta_t$ to the ambient temperature of the free flow. According to [3], the thermal boundary layer thickness ($\delta_t$) can be calculated from $L$, $Re_L$ and $Pr$.
We also add the secondary heat transfer path, which includes on-chip interconnect layers, C4 bumps, package substrate, solder balls and the printed-circuit board (PCB). These layers are added in the same way as the existing layers (i.e. silicon, interface material, heat spreader and heatsink), with the PCB layer connected to another oil layer modeled similarly to the oil layer over the silicon, as illustrated in Fig. 1.
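The oil-flow part of the model can be sketched in a few lines of Python. This is a minimal sketch and not the HotSpot implementation: it assumes the textbook laminar flat-plate correlations (average Nusselt number 0.664 Re_L^{1/2} Pr^{1/3}, and a velocity boundary layer of 5L/Re_L^{1/2} scaled by Pr^{-1/3} to obtain the thermal boundary layer), and the oil property values are hypothetical placeholders rather than the properties of the oil used in [12].

```python
import math
from dataclasses import dataclass


@dataclass
class Fluid:
    """Bulk coolant properties (all values supplied by the caller)."""
    k: float    # thermal conductivity, W/(m*K)
    rho: float  # density, kg/m^3
    cp: float   # specific heat, J/(kg*K)
    nu: float   # kinematic viscosity, m^2/s

    @property
    def prandtl(self) -> float:
        # Pr = nu / alpha, with thermal diffusivity alpha = k / (rho * cp)
        return self.nu * self.rho * self.cp / self.k


def oil_layer_rc(fluid: Fluid, velocity: float, length: float, width: float):
    """Return (R_conv [K/W], C_oil [J/K], delta_t [m]) for laminar flow over a flat die.

    Uses the assumed textbook flat-plate correlations named in the lead-in.
    """
    re_l = velocity * length / fluid.nu          # overall Reynolds number
    pr_third = fluid.prandtl ** (1.0 / 3.0)
    h_l = 0.664 * (fluid.k / length) * math.sqrt(re_l) * pr_third
    area = length * width
    r_conv = 1.0 / (h_l * area)                  # convection resistance
    delta_t = 5.0 * length / (math.sqrt(re_l) * pr_third)  # assumed constant 5.0
    c_oil = fluid.rho * fluid.cp * area * delta_t           # oil layer capacitance
    return r_conv, c_oil, delta_t


# Hypothetical oil properties, for illustration only.
oil = Fluid(k=0.13, rho=850.0, cp=1900.0, nu=1.5e-5)
r_conv, c_oil, delta_t = oil_layer_rc(oil, velocity=10.0, length=0.02, width=0.02)
print(f"R_conv = {r_conv:.2f} K/W, C_oil = {c_oil:.3f} J/K, "
      f"delta_t = {delta_t * 1e6:.0f} um")
```

With these placeholder properties, the sketch gives a convection resistance on the order of 1 K/W and a thermal boundary layer on the order of 100 μm for a 20 mm × 20 mm die at 10 m/s. Because the heat transfer coefficient scales with the square root of the flow velocity, quadrupling the velocity only halves the resistance.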
OIL-SILICON Model Validation
We validate the oil flow model with ANSYS, a commercial finite-element software package that includes detailed computational fluid dynamics analysis. Note that ANSYS and HotSpot are independent of each other, and ANSYS is intrinsically more accurate yet less efficient. We put a silicon chip of size 20 mm × 20 mm × 0.5 mm in an oil flow of 10 m/s. We apply a 200 W power step at time zero uniformly across the die and probe the temperature at the chip center. Fig. 2 compares the transient thermal response at the center of the heat source. The equivalent convection thermal resistance is about 1.0 K/W. As can be seen, the time it takes the silicon to reach steady state is quite similar in both cases, indicating that our transient model is quite accurate. Notice that the thermal time constant is on the order of a second.
To further validate the steady-state response of the oil thermal model, we use the same settings as in Fig. 2, except that we reduce the heat source to 2 mm × 2 mm and 10 W at the center of the die. This way, we create a greater spatial temperature gradient across the die. Fig. 3 shows the comparison of our oil flow model with ANSYS for on-die maximum temperature ($T_{max}$), minimum temperature ($T_{min}$) and temperature difference ($dT$).
We do not use direct IR measurements to validate the oil thermal model for two reasons: an IR setup is extraordinarily expensive; and our analysis (presented later in this paper) suggests that IR misses some fundamental phenomena. However, we perform a qualitative validation by comparing the steady-state temperature maps predicted by the model and the IR measurements from [12] for an AMD processor. The floorplan is derived from the die photo of the processor, and the power numbers are extracted from [12]. We compare model results with a typical IR thermal snapshot in [12]. The hottest temperature at the "Sched" block is about the same as that in [12] (73 °C vs. ∼70 °C), and the coolest temperatures are also close (45 °C vs. ∼45 °C, excluding the blank area on the edges as in [12]). Notice that we assume uniform power per unit as opposed to the detailed IR image in [12]. The secondary heat transfer path model is also validated in this case, since it is included in the IR measurements. Fig. 4 shows the simulated temperature map of the AMD Athlon processor.
Fig. 5 isolates the effect of the secondary heat transfer path. With the oil flow, Fig. 5(a) shows that modeling the secondary path is necessary because a significant portion of heat is removed through the secondary path to the oil flow. Without the secondary path, the predicted temperatures are much higher (over 10 °C for the AMD Athlon processor). On the other hand, for forced air cooling with a copper heatsink, Fig. 5(b) shows that adding the secondary path does not make a noticeable difference (less than 1%). This is because almost all heat is transferred through the primary path, which has a much lower thermal resistance.
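The reason the secondary path matters only for the oil configuration can be illustrated with a two-resistor heat divider. This is a minimal steady-state sketch in which the primary and secondary paths are each lumped into a single resistance to ambient; all resistance values are hypothetical and are chosen only to contrast a high-resistance primary path (oil over bare silicon) with a low-resistance one (heatsink).

```python
from typing import Optional


def secondary_fraction(r_primary: float, r_secondary: float) -> float:
    """Fraction of the total heat that leaves through the secondary path
    when the two paths act as parallel resistors to the same ambient."""
    return r_primary / (r_primary + r_secondary)


def die_rise(power: float, r_primary: float,
             r_secondary: Optional[float] = None) -> float:
    """Lumped die temperature rise above ambient, with or without the secondary path."""
    if r_secondary is None:
        return power * r_primary
    r_parallel = r_primary * r_secondary / (r_primary + r_secondary)
    return power * r_parallel


# Hypothetical lumped resistances in K/W, for illustration only.
R_SECONDARY = 4.0
POWER = 20.0  # W
for name, r_primary in (("OIL-SILICON", 1.0), ("AIR-SINK", 0.03)):
    frac = secondary_fraction(r_primary, R_SECONDARY)
    without = die_rise(POWER, r_primary)
    with_path = die_rise(POWER, r_primary, R_SECONDARY)
    print(f"{name}: {frac:.1%} of heat via secondary path, "
          f"rise {without:.2f} K -> {with_path:.2f} K")
```

The heat share of the secondary path depends only on the ratio of the two resistances, so the same board-side path that is negligible next to a heatsink carries a sizeable share once the primary path resistance rises to the order of 1 K/W.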
Distinct Thermal Responses
With the modified thermal model, we investigate the impacts of AIR-SINK and OIL-SILICON on both the transient and steady-state thermal responses of the same silicon die.
Transient Response
There are two types of transient responses of interest: the warmup phase, when everything starts to heat up from ambient temperature, and the fast oscillations around steady state, where everything has reached a fairly stable temperature.
Long-term transient response during warmup
OIL-SILICON and AIR-SINK have distinct transient responses. Fig. 6 shows the warmup phase of both cooling configurations. Both have the same equivalent overall convection thermal resistance ($R_{conv}$ = 1.0 K/W). For illustrative purposes, we apply power for about 6 seconds to one hot block that occupies a small area of the die, whereas the other blocks in the die consume no power. The power density is 2.0 W/mm². We look at the temperatures of the hottest block and the coolest block. For the long-term response, OIL-SILICON reaches steady-state temperature much faster than AIR-SINK. This is because the copper heatsink and copper heat spreader have a much larger thermal capacitance than the thin layer of IR-transparent oil, hence a longer thermal RC time constant during warmup. Another observation from Fig. 6 is that the temperature difference between the hottest and the coolest block is much larger in OIL-SILICON than in AIR-SINK. This is because lateral heat spreading is much better in copper. The low thermal conductivity of the oil leads directly to a higher on-chip temperature gradient.
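The warmup argument can be checked with a single-node RC model. The sketch below is a lumped approximation, not the full HotSpot network: it assumes a 20 mm × 20 mm × 0.5 mm die with a volumetric heat capacity of 1.75e6 J/(m³·K) for silicon (both assumptions), the 1.0 K/W convection resistance used for Fig. 6, and a heatsink capacitance about 250 times that of the silicon, which is the ratio quoted in the short-term analysis of this section.

```python
import math


def step_response(t: float, power: float, r: float, c: float) -> float:
    """Temperature rise above ambient of a single-node RC network,
    t seconds after a power step applied at t = 0."""
    return power * r * (1.0 - math.exp(-t / (r * c)))


def settling_time(r: float, c: float, fraction: float = 0.95) -> float:
    """Time for the rise to reach `fraction` of its steady-state value."""
    return -r * c * math.log(1.0 - fraction)


R_CONV = 1.0                               # K/W, same for both configurations
C_SI = 1.75e6 * 0.02 * 0.02 * 0.5e-3       # J/K, assumed die volume x heat capacity
C_SINK = 250.0 * C_SI                      # J/K, using the ~250x ratio from the text

t_oil = settling_time(R_CONV, C_SI)            # oil layer capacitance neglected
t_air = settling_time(R_CONV, C_SI + C_SINK)   # heatsink dominates the capacitance
print(f"95% settling: OIL-SILICON ~{t_oil:.1f} s, AIR-SINK ~{t_air:.0f} s")
```

Under these assumptions the oil-cooled die settles in about a second, while the heatsink takes minutes, which matches the qualitative ordering described for Fig. 6.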
Short-term transient response
The short-term transient thermal response is of more interest, because high-performance systems that do not shut down frequently spend most of their time oscillating at relatively high frequencies around some stable operating points. For this experiment, we use the same floorplan as in the experiment of Fig. 6. In the power trace, we apply power to the hot block for 15 ms and then turn the power off for 85 ms. Assuming these power-on and power-off phases repeat periodically, we can use the average power derived from this power trace to solve for steady-state temperatures, and use them as the initial temperatures for the transient thermal simulation. Fig. 8 shows the short-term transient responses of OIL-SILICON and AIR-SINK. As can be seen from the figure, the short-term transient responses of the two cooling configurations are significantly different. The heat-up and cool-down phases of OIL-SILICON are more linear than those of AIR-SINK. A more significant difference is that it takes much longer for OIL-SILICON to cool down.
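The initialization described above amounts to solving the steady state for the time-averaged power and then stepping through the pulse train. A minimal single-node sketch follows; the power level, resistance and time constant are hypothetical, and an actual run would use the full HotSpot RC network rather than one lumped node.

```python
import math


def average_power(p_on: float, t_on: float, t_off: float) -> float:
    """Time-averaged power of a periodic on/off pulse train."""
    return p_on * t_on / (t_on + t_off)


def simulate_period(rise0, p_on, r, tau, t_on, t_off, dt=1e-4):
    """Temperature rise over one on/off period of a single-node RC model.

    Uses the exact first-order update for piecewise-constant power:
    rise <- target + (rise - target) * exp(-dt / tau).
    """
    steps_on = round(t_on / dt)
    steps_off = round(t_off / dt)
    decay = math.exp(-dt / tau)
    rise, trace = rise0, [rise0]
    for i in range(steps_on + steps_off):
        target = p_on * r if i < steps_on else 0.0  # steady state for current power
        rise = target + (rise - target) * decay
        trace.append(rise)
    return trace


P_ON, T_ON, T_OFF = 10.0, 0.015, 0.085   # W, s, s (power level is hypothetical)
R, TAU = 1.0, 0.35                        # K/W and s (hypothetical lumped values)

p_avg = average_power(P_ON, T_ON, T_OFF)  # 15% duty cycle
rise0 = p_avg * R                         # steady-state rise for the average power
trace = simulate_period(rise0, P_ON, R, TAU, T_ON, T_OFF)
print(f"average power {p_avg:.2f} W, start {trace[0]:.2f} K, peak {max(trace):.2f} K")
```

Starting from the average-power steady state avoids simulating the long warmup, so the trace shows only the oscillation around the operating point.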
The reason for this phenomenon is again the difference in thermal capacitances between the two cases. For AIR-SINK, since the thermal capacitance of the heatsink is orders of magnitude higher (∼250×) than that of silicon due to the heatsink size, a short-term heat pulse (i.e. one with high frequency) can only heat up the silicon, leaving the temperature of the heatsink almost unchanged. As a result, the local short-term thermal time constant is mainly determined by the silicon thermal time constant (see Fig. 7(a)):
$$\tau_{short,air} \approx R_{th,Si}\, C_{th,Si} \qquad (5)$$
On the other hand, in OIL-SILICON, the thermal layer of oil is very small in volume (about 100 μm thick for a 10 m/s oil flow), so its thermal capacitance is much smaller even than that of silicon. So for very short power pulses (i.e. very high frequency), the silicon temperature remains almost constant. For all other time scales, the thermal capacitance of the oil can be neglected due to its small value, and the thermal resistance of the silicon can also be neglected because it is usually much smaller than the convection thermal resistance. Therefore, as shown in Fig. 7(b), the only dominant thermal time constant is
$$\tau_{all,oil} = R_{conv}\,(C_{th,Si} + C_{th,oil}) \approx R_{conv}\, C_{th,Si} \qquad (6)$$
From this analysis, there are clearly two phases for the case of AIR-SINK: the short-term (milliseconds or less) thermal response is determined by $R_{th,Si} C_{th,Si}$, and the long-term (seconds to minutes) thermal response is determined by $R_{conv} C_{sink}$. In the case of OIL-SILICON, both short-term and long-term thermal responses are determined by $R_{conv} C_{th,Si}$. Comparing Eqn. (5) and Eqn. (6), because $R_{conv}$ is nearly two orders of magnitude greater than $R_{th,Si}$ (e.g. 1.042 K/W vs. 0.0125 K/W in our setup), the short-term thermal time constant of OIL-SILICON is much longer than that of AIR-SINK. It is well known that the step transient response of an RC ladder is exponential. Due to the greater short-term time constant for OIL-SILICON, over a duration of milliseconds the slow exponential curve looks linear locally. The asymmetry between the heat-up and cool-down phases in OIL-SILICON can be explained by the location of the initial temperature on the exponential curve. In Fig. 8, the initial temperature happens to be closer to the origin of the exponential curve. Therefore, the heat-up phase is steeper, whereas the cool-down phase is at the tail of the exponential curve and thus slower and flatter.
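The two time constants and the "locally linear" behaviour can be made concrete with a short calculation. The two resistances below are the values quoted in the text; the silicon capacitance of 0.35 J/K is a hypothetical value, and it cancels out of the ratio of the two time constants.

```python
import math

R_TH_SI = 0.0125   # K/W, vertical silicon resistance quoted in the text
R_CONV = 1.042     # K/W, convection resistance quoted in the text
C_TH_SI = 0.35     # J/K, hypothetical silicon thermal capacitance

tau_air_short = R_TH_SI * C_TH_SI   # short-term time constant, AIR-SINK
tau_oil = R_CONV * C_TH_SI          # dominant time constant, OIL-SILICON


def linearity_error(window: float, tau: float) -> float:
    """Relative gap at t = window between the exponential 1 - exp(-t/tau)
    and its initial tangent t/tau. Small values mean the curve looks linear."""
    exact = 1.0 - math.exp(-window / tau)
    linear = window / tau
    return (linear - exact) / exact


print(f"tau ratio (oil / air short-term): {tau_oil / tau_air_short:.0f}x")
for name, tau in (("AIR-SINK", tau_air_short), ("OIL-SILICON", tau_oil)):
    err = linearity_error(0.015, tau)
    print(f"{name}: tau = {tau * 1e3:.1f} ms, tangent error over 15 ms = {err:.1%}")
```

Over a 15 ms pulse, the slow exponential deviates from a straight line by only a few percent, while the fast one has essentially saturated, which is the difference in curve shape discussed for Fig. 8.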
Based on the above analysis, another interesting experiment is to observe scenarios where the transient hot spot location changes. For example, we run the transient simulation for a processor similar to the Alpha EV6 starting from the steady state: we first apply 2 W to IntReg for 10 ms with no power on FPMap; after 10 ms, IntReg is turned off and FPMap starts dissipating 2 W. Fig. 9 shows that at 14 ms, FPMap becomes the new hot spot in AIR-SINK, whereas in OIL-SILICON, IntReg remains the hottest spot. This can be explained by the fact that AIR-SINK has a much shorter short-term thermal time constant than OIL-SILICON, so its block temperatures follow the change in power more quickly.
To summarize the transient thermal responses: OIL-SILICON has a faster long-term response but a slower short-term response than AIR-SINK. These differences have a big impact on DTM decisions and on on-die thermal sensing and sensor placement, as we will see in Section 5.
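The hot-spot migration experiment can be mimicked with two independent first-order blocks. This is a simplified sketch under stated assumptions: each block is treated as its own RC node with no lateral coupling, temperatures are expressed as fractions of the steady-state rise, and the two time constants (4 ms and 350 ms) are hypothetical stand-ins for a short and a long short-term time constant.

```python
import math


def rise_after_switch(tau: float, t_first: float = 0.010, t_probe: float = 0.014):
    """Normalized temperature rises of two independent first-order blocks.

    Block A is powered during [0, t_first]; block B is powered from t_first on.
    Returns (rise_a, rise_b) at t_probe as fractions of the steady-state rise.
    """
    elapsed = t_probe - t_first
    a_peak = 1.0 - math.exp(-t_first / tau)      # A heats up while powered
    rise_a = a_peak * math.exp(-elapsed / tau)   # then decays after switch-off
    rise_b = 1.0 - math.exp(-elapsed / tau)      # B heats up after switch-on
    return rise_a, rise_b


for name, tau in (("fast (AIR-SINK-like)", 0.004), ("slow (OIL-SILICON-like)", 0.350)):
    rise_a, rise_b = rise_after_switch(tau)
    hottest = "B (newly powered)" if rise_b > rise_a else "A (previously powered)"
    print(f"{name}: A = {rise_a:.3f}, B = {rise_b:.3f}, hottest = {hottest}")
```

With the short time constant the newly powered block overtakes the old one within 4 ms, whereas with the long time constant the previously powered block is still hotter at the probe time.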
Steady-State Response
The major differences in steady-state thermal response between OIL-SILICON and AIR-SINK are the absolute temperatures and the across-die temperature gradient. Fig. 10 shows the steady-state temperature maps of a processor similar to the Alpha EV6 running the gcc benchmark with OIL-SILICON and AIR-SINK, respectively. As can be seen, OIL-SILICON has a maximum temperature about 30 degrees higher and an across-die temperature difference about 55 degrees larger. This is caused by the much weaker lateral heat spreading in OIL-SILICON as a result of the absence of the copper spreader and heatsink.
Another factor we have not considered in OIL-SILICON so far is the oil flow direction. Oil flow direction is important because the local convection thermal resistance along the flow over the silicon die is a strong function of location, i.e.,
$$R_{conv}(x) \propto \frac{1}{h(x)} \qquad (7)$$
Here $h(x)$ is the local heat transfer coefficient, where $x$ is the distance along the oil flow from the chip edge, and can be expressed as:
$$h(x) = 0.332\,\frac{k}{x}\,Re_x^{1/2}\,Pr^{1/3} \qquad (8)$$
where $Re_x$ is the local Reynolds number at distance $x$.
Basically, h(x) is highest at the leading edge of the die, and decreases along the flow direction. This translates into lower convection thermal resistance at the leading edge of the die, and higher convection thermal resistance along the way. Therefore, heat generated from units that are close to the leading edge is removed more efficiently. The significance of considering oil flow direction is demonstrated in Fig. 11 . Because the hottest unit is usually IntReg, which is on the top edge of the chip, an oil flow from top to bottom (last column in Fig. 11 ) is the most effective way to cool down this hot unit. In fact, IntReg is cooled so well that it is no longer the hottest unit. The new hottest unit becomes Dcache, which is further away from the leading edge of the oil flow.
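The position dependence can be sketched numerically. The code below assumes the textbook laminar flat-plate forms for the local and average coefficients (0.332 and 0.664 prefactors respectively); the oil properties are hypothetical placeholders.

```python
import math


def local_h(x: float, velocity: float, k: float, nu: float, pr: float) -> float:
    """Local heat transfer coefficient x metres from the leading edge (x > 0),
    using the assumed laminar flat-plate correlation."""
    re_x = velocity * x / nu
    return 0.332 * (k / x) * math.sqrt(re_x) * pr ** (1.0 / 3.0)


def average_h(length: float, velocity: float, k: float, nu: float, pr: float) -> float:
    """Average coefficient over a plate of the given length (same assumptions)."""
    re_l = velocity * length / nu
    return 0.664 * (k / length) * math.sqrt(re_l) * pr ** (1.0 / 3.0)


# Hypothetical oil properties and flow speed, for illustration only.
K, NU, PR, U = 0.13, 1.5e-5, 186.0, 10.0
DIE_LENGTH = 0.02  # m

for x_mm in (1, 5, 10, 20):
    h = local_h(x_mm * 1e-3, U, K, NU, PR)
    print(f"x = {x_mm:2d} mm: h(x) = {h:7.0f} W/(m^2*K)")
print(f"average over die: {average_h(DIE_LENGTH, U, K, NU, PR):.0f} W/(m^2*K)")
```

Since h(x) falls off as the inverse square root of x, a unit four times farther from the leading edge sees half the local heat transfer coefficient, and the die-average coefficient is twice the local value at the trailing edge.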
In the case of AIR-SINK, the impact of the direction of air flow is negligible. This is because the temperature distribution inside the heatsink is quite uniform thanks to the high thermal conductivity and good lateral spreading of copper. Additionally, in most forced air cooling packages, the fan is placed on top of the heatsink, providing an impinging flow that is very uniform on the heatsink surface, resulting in more uniform heat transfer coefficient over a surface that is already close to isothermal.
Implications on Architectural DTM
The impacts of OIL-SILICON and AIR-SINK on silicon thermal responses have been demonstrated in Section 4. For the same chip running the same workload, they show drastic differences in both transient and steady-state temperatures. If IR or other runtime measurements are used in the design process, being aware of these differences during the development of DTM techniques and on-chip sensor placement is crucial for the design to be efficient and free of thermal hazards. Based on the observations in Section 4, we list various aspects of temperature-aware microarchitecture design that should be carefully considered.
DTM engagement duration
Depending on how fast a functional unit or a processing core responds to a DTM policy, the duration of the DTM engagement is chosen to minimize performance penalties. For example, AIR-SINK has a faster short-term transient response than OIL-SILICON. Therefore, DTM should be engaged for a shorter duration for a chip cooled by AIR-SINK than for the same chip cooled by OIL-SILICON. Figs. 12(a)-(b) are simulated temperature trace snippets from an Alpha EV6 running gcc. We use SimpleScalar with integrated Wattch [2] and the modified HotSpot for the simulations. Fig. 12(a) shows the case of AIR-SINK, where the overall thermal convection resistance ($R_{conv}$) is 0.3 K/W; Fig. 12(b) shows the case of OIL-SILICON, where the overall thermal convection resistance at the oil-silicon interface is artificially set, for comparison purposes, to 0.3 K/W as well. Ambient temperature is set to a typical 45 °C in all cases. Notice that only the temperature traces of the five hottest blocks are plotted. The temperatures of the hot units in OIL-SILICON are much higher than in AIR-SINK because of the very high local power densities, the absence of the copper spreader and heatsink, and the low thermal conductivity of the oil. However, the overall average chip temperatures of the two cases are still about the same. This is because the L2 caches, which occupy most of the die area, are cooler in OIL-SILICON, which balances the impact of the smaller, hotter units in the core.
There are several observations from Fig. 12:
(1) As shown before, the heat-up and cool-down phases of AIR-SINK are significantly shorter than those of OIL-SILICON (∼3 ms vs. much more than 15 ms). This confirms our earlier derivation that OIL-SILICON has a much slower short-term transient response. Also notice that 3 ms is typically shorter than an IR camera's sampling interval; therefore, IR thermal measurements could miss thermal emergencies within that time scale. In Fig. 12(a), the processor with AIR-SINK spends more time in phases where the temperature is almost constant, whereas with OIL-SILICON the processor spends most of its time in transient phases, meaning that it takes longer to bring the processor out of potential thermal emergencies in OIL-SILICON. Clearly, DTM techniques are more efficient in AIR-SINK than in OIL-SILICON, and a shorter engagement duration is preferred. A quantitative DTM performance comparison is almost impossible because the transient temperature scales of AIR-SINK and OIL-SILICON are so different. Reducing $R_{conv}$ in OIL-SILICON might be a viable option to bring the peak temperatures to the same scale as that of AIR-SINK, but that only works for this particular simulated phase of gcc, so it is still not a fair comparison. Therefore, we only provide qualitative comparisons.
(2) The hottest spot in AIR-SINK is more distinct (IntReg) than in OIL-SILICON with the same $R_{conv}$. It is hard to identify which unit is hotter in the latter case. This is because in OIL-SILICON, the vertical heat transfer from silicon to oil is not efficient due to the low thermal conductivity of the oil. This makes lateral heat transfer within the silicon comparable to the vertical transfer path, heating up the closer neighboring units and making them equally hot. On the other hand, in AIR-SINK, most of the lateral spreading happens inside the copper spreader and heatsink, making the temperatures of units in silicon more distinct.
(3) In OIL-SILICON the BPred is cooler than other shown units most of the time, whereas in AIR-SINK, due to the fast transient change, the BPred can sometimes be hotter than other units, which confirms the results shown in Fig. 9 .
Cooling configuration for IR measurements of high-performance/high-power chips
As a side note, from Fig. 12 we can see that for a high-performance chip with high-power-density units, like the Alpha EV6, the hot spot temperature is prohibitively high for the oil cooling configuration, even for a low $R_{conv}$ = 0.3 K/W (the oil flow speed would have to be an unrealistic 100 m/s to achieve a 0.3 K/W convective resistance). In reality, for such chips, additional cooling mechanisms beyond the oil flow alone (e.g. thermoelectric cooling or fast coolant at extremely low temperature) might be necessary to further reduce $R_{conv}$ and hence the hot spot temperatures. In that case, since $R_{conv}$ is lower, the short-term thermal time constant would also be shorter, leading to yet different transient thermal responses from Fig. 12(b). Those cooling configurations also need thorough thermal characterization, which is beyond the scope of this paper.
Thermal sensing frequency
In terms of how frequently the on-die sensor should measure the temperature and how fast DTM should respond to a possible thermal emergency, AIR-SINK and OIL-SILICON are similar. Although the heat-up transient response in AIR-SINK is faster, OIL-SILICON has a higher base temperature. Therefore, although on a relative scale OIL-SILICON's transient response is slower, its absolute rate of change is still comparable with that of AIR-SINK. According to Fig. 12(a) and Fig. 12(b), in both cases, IntReg's temperature can increase by about 5 degrees in 3 ms. If the desired resolution is 0.1 degrees, this leads to a sampling interval of at most 60 μs.
On the other hand, if a higher oil flow speed is applied in OIL-SILICON to bring the peak transient temperature down to levels comparable to AIR-SINK, OIL-SILICON needs less frequent temperature sensing because its thermal change rate is slower. Notice that less frequent sensing does not mean less DTM performance penalty, since sensing does not necessarily trigger DTM. Only the DTM engagement duration after the trigger affects performance.
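The sampling-interval bound is simple arithmetic: the interval is the desired temperature resolution divided by the worst-case rate of temperature change. A minimal sketch, using the numbers read from Fig. 12 in the text (the function name is illustrative):

```python
def max_sampling_interval(delta_temp: float, delta_time: float,
                          resolution: float) -> float:
    """Longest sampling interval (s) such that the temperature changes by at most
    `resolution` degrees between samples, given an observed change of `delta_temp`
    degrees over `delta_time` seconds."""
    rate = delta_temp / delta_time   # degrees per second
    return resolution / rate


# 5 degrees in 3 ms, 0.1 degree resolution (values from the text).
interval = max_sampling_interval(5.0, 3e-3, 0.1)
print(f"maximum sampling interval: {interval * 1e6:.0f} us")
```

This reproduces the 60 μs bound. The bound scales linearly with the resolution, so relaxing the resolution to 0.5 degrees would allow a 300 μs interval.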
Thermal sensing granularity
OIL-SILICON has a greater cross-die temperature difference (Fig. 10), meaning a steeper temperature gradient across the die. In this case, if a thermal sensor is placed away from the hot spot location, the sensor error can be significant due to the large gradient from the actual hot spot to the sensor location. This implies that more on-chip temperature sensors are needed. On the other hand, if the same number of sensors is deployed for AIR-SINK and OIL-SILICON, then the latter needs a larger margin on sensor error to account for the bigger accuracy impact of a misplaced sensor. This extra margin in turn lowers the DTM triggering threshold, causing more frequent DTM engagements than necessary and hence a greater performance penalty.
This also implies that if on-chip thermal sensor placement is determined from IR thermal measurements, more sensors than necessary may be deployed. Additional on-chip thermal sensors add hardware complexity as well as area and power overheads.
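The relationship between gradient, sensor error margin, and DTM trigger threshold can be illustrated with a minimal sketch. All numbers below (emergency temperature, gradients, sensor offset) are hypothetical values chosen only for illustration, not measurements from this paper, and the gradient between hot spot and sensor is assumed to be linear.

```python
def effective_trigger_threshold(emergency_temp_c, gradient_c_per_mm, sensor_offset_mm):
    """Sensor reading at which DTM must trigger so that the true hot spot
    stays below `emergency_temp_c`, assuming a linear temperature gradient
    between the hot spot and the sensor (a simplifying assumption)."""
    margin_c = gradient_c_per_mm * sensor_offset_mm  # worst-case sensor underestimate
    return emergency_temp_c - margin_c


# Hypothetical example: same sensor offset, different gradients.
EMERGENCY_TEMP_C = 85.0   # hypothetical emergency threshold
SENSOR_OFFSET_MM = 1.0    # hypothetical distance from sensor to hot spot

air_sink_threshold = effective_trigger_threshold(EMERGENCY_TEMP_C, 2.0, SENSOR_OFFSET_MM)
oil_silicon_threshold = effective_trigger_threshold(EMERGENCY_TEMP_C, 6.0, SENSOR_OFFSET_MM)

print(f"AIR-SINK trigger threshold:    {air_sink_threshold:.1f} C")
print(f"OIL-SILICON trigger threshold: {oil_silicon_threshold:.1f} C")
```

With the steeper gradient, the same sensor placement forces a lower trigger threshold, so DTM engages earlier and more often than the true hot spot temperature would require.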
Sensor placement considering oil flow direction
In Section 4.2, we also consider the potentially large impact of oil flow direction on hot spot location in the OIL-SILICON scenario. This phenomenon is especially important for on-chip thermal sensor placement. If a sensor is placed without considering the flow direction, the real hottest spot can be missed. For example, for a top-to-bottom flow of IR-transparent oil over the Alpha EV6 chip, the actual hot spot is the Dcache. If flow direction is not taken into account, the steady-state hottest spot would always be IntReg. Therefore, for a chip with only a few on-chip sensors available, placing the sensor at IntReg would miss possible thermal emergencies at the Dcache.
On the other hand, if the thermal sensor placement for a chip cooled by AIR-SINK is determined from IR thermal measurements, and the oil flows from top to bottom during those measurements, the sensor would be placed at the Dcache, whereas in normal operation the hottest spot is usually IntReg. This placement could miss the actual hot spot and thus a thermal emergency.
Adding more sensors would be another option, but there are several difficulties: sensors are hard to insert into dense array structures such as caches and register files; sensors do not measure the in-situ hot spot temperature, and they also dissipate heat themselves; the speed of the sensor might limit the sampling rate; sensors require calibration; and they incur area overhead. We believe a proper approach is to combine IR measurements, sensor measurements, and thermal modeling to achieve a better thermal design.
Existing studies by Hamann et al. [7] and Renau et al. [12] take IR measurements of a chip and use them to reverse-engineer the power map of the chip. Consider a multi-core chip in which each core dissipates a similar amount of power. Under an IR camera that captures the thermal map of the chip with oil flowing from left to right across the die, the cores on the right side of the die appear hotter, which produces an artifact of higher reverse-engineered power consumption for those cores. Therefore, when mapping IR temperature measurements to detailed power estimates, the oil flow direction needs to be taken into consideration. In fact, Hamann et al. account for the flow direction to obtain more accurate power extraction in their work [7].
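The flow-direction artifact in temperature-to-power conversion can be illustrated with a toy one-dimensional sketch. This is not the extraction method of [7] or [12]. It assumes that each core's temperature rise is simply its power times a local convective resistance, ignores lateral heat conduction in the silicon, and uses the standard laminar flat-plate scaling in which the local heat transfer coefficient falls off as x^(-1/2) with distance x from the leading edge. Core positions and power values are hypothetical.

```python
import math


def local_resistance(x, r_ref=1.0, x_ref=1.0):
    """Local convective resistance at distance x from the leading edge.
    Laminar flat-plate flow gives h(x) ~ x**-0.5, so resistance ~ x**0.5.
    r_ref and x_ref are hypothetical normalization constants."""
    return r_ref * math.sqrt(x / x_ref)


# Hypothetical four-core chip; oil flows left to right, so x grows to the right.
core_positions = [0.5, 1.5, 2.5, 3.5]   # arbitrary length units
true_power = [10.0, 10.0, 10.0, 10.0]   # every core dissipates the same power

resistances = [local_resistance(x) for x in core_positions]
temp_rise = [p * r for p, r in zip(true_power, resistances)]

# Naive inversion: assume one uniform resistance for the whole die.
r_uniform = sum(resistances) / len(resistances)
naive_power = [t / r_uniform for t in temp_rise]

# Flow-aware inversion: use the position-dependent resistance.
corrected_power = [t / r for t, r in zip(temp_rise, resistances)]

for x, n, c in zip(core_positions, naive_power, corrected_power):
    print(f"x={x:.1f}: naive {n:5.2f} W, flow-aware {c:5.2f} W")
```

In this toy model the naive inversion assigns progressively more power to downstream cores even though all cores dissipate the same power, while the flow-aware inversion recovers the uniform power map.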
Conclusions and Future Work
In this paper, we characterize the different thermal responses of forced air-cooling over an attached heatsink (AIR-SINK) and laminar oil-cooling over a bare silicon die (OIL-SILICON), and investigate their implications for temperature-aware microarchitecture design. AIR-SINK is the dominant cooling solution for the majority of high-performance processors, whereas OIL-SILICON is used in infrared thermal imaging of silicon. By extending HotSpot to model oil flow and a secondary heat transfer path, we identify different transient and steady-state thermal responses for AIR-SINK and OIL-SILICON. The direct convective cooling of silicon via the oil flow and the low thermal conductivity of the oil cause OIL-SILICON to have a higher hot spot temperature, larger thermal gradients, and a slower short-term transient response. In addition, the direction of the oil flow plays an important role in determining hot spot location because of the nonuniform heat transfer coefficient along the flow path.
While IR-transparent oil cooling over bare silicon opens the way to direct and detailed runtime thermal measurements, the unique cooling configuration it requires also makes the transient and steady-state thermal responses of the silicon drastically different from those of a chip attached to a conventional air-cooled heatsink. DTM design based on OIL-SILICON measurements generally results in longer DTM engagement than necessary, increasing performance overhead. It also leads to more on-chip thermal sensors to account for the greater across-die thermal gradient, hence increasing hardware complexity. The dependence of hot spot location on flow direction also poses challenges for thermal sensor deployment and for the accuracy of temperature-to-power reverse conversion.
Note that in this paper we only present analysis and results for the architectural implications of two specific chip cooling configurations. The entire design space of thermal packages and their interaction with temperature-aware architecture-level performance needs thorough, quantitative analysis. This is an interesting direction for future work.
Another interesting future goal is to enhance a design-time thermal model to reconcile the differences among thermal packages. For example, it could be useful to ascertain the thermal response of a chip with an air-cooled heatsink based on IR measurements from an oil-cooled bare silicon die. Certain factors, such as the temperature dependence of leakage power and the feedback loop from DTM, may make such a derivation more complicated.
