become an important concern in modern processor and other microelectronic chips. The problem has become especially severe as the ability to reduce supply voltage has slowed. As a result, the number of devices per unit area is scaling up faster than the power density is scaling down. This requires more expensive cooling solutions to keep the chip and its local hotspots cool, and these challenges will be exacerbated by 3D integration, which seems imminent. Furthermore, high temperature slows integrated circuits because of degraded carrier mobility and interconnect resistivity. It also accelerates multiple chip-failure mechanisms such as electromigration and negative bias temperature instability (NBTI), because the wearout rate has an exponential temperature dependency. Static leakage power is primarily an exponential function of temperature. There's also the possibility of thermally induced security vulnerabilities, such as denial of service. 1 Unfortunately, air cooling's ability to address temperature concerns is limited by system-level power constraints, acoustic challenges, and sometimes form factors, while alternative cooling solutions are still too expensive for commodity use. Temperature-aware design can reduce these problems.
Why temperature aware, not just power aware?
Addressing these thermal concerns requires modeling at every design stage, from early, pre-RTL design exploration to post-layout timing closure. Although temperature is fundamentally a byproduct of power consumption, and temperature-aware design is intrinsically related to power-aware design, often using the same techniques, significant differences exist between the two design approaches. 2 First, temperature is proportional to power density, not just power. Therefore, in addition to incorporating better cooling solutions, methods to reduce temperature can also reduce power, increase area, or distribute power more evenly over a larger area. Low-power techniques could actually increase power density and temperature by using smaller structures and limiting activity to a smaller area. Second, temperature is a nonlinear function of time, rising and falling like an RC circuit, while power is an instantaneous value and energy is merely an integral of power over time. Third, power-aware techniques try to further reduce power consumption when usage is low, whereas temperature-aware techniques are primarily engaged when the processor's usage is high. Fourth, power-aware techniques could create more temperature swings as the chip cycles through high-performance and low-power modes, which harm chip and package reliability, whereas temperature-aware techniques generally try to alleviate these temperature swings. Finally, power-aware techniques usually seek to reduce total chip power and typically aren't concerned with localized power densities, while a major goal of temperature-aware techniques is to control local hotspot temperatures.
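The first two differences can be illustrated with a minimal lumped-model sketch. It assumes one-dimensional vertical conduction, so thermal resistance is R = t / (k A), and a single RC node for the transient. The function names and all numeric values (layer thickness, conductivity, capacitance, ambient temperature) are illustrative assumptions, not values from the article.

```python
import math

def vertical_resistance(thickness_m, conductivity_w_mk, area_m2):
    """1D conduction resistance R = t / (k * A), in K/W (assumed geometry)."""
    return thickness_m / (conductivity_w_mk * area_m2)

def step_response(power_w, r_th, c_th, t_s, t_ambient_c=45.0):
    """Temperature of one lumped RC node t_s seconds after power steps from 0 to power_w."""
    tau = r_th * c_th  # thermal time constant in seconds
    return t_ambient_c + power_w * r_th * (1.0 - math.exp(-t_s / tau))

# Same 2 W dissipated over 4 mm^2 versus 1 mm^2 (illustrative numbers only).
r_large = vertical_resistance(0.5e-3, 100.0, 4e-6)
r_small = vertical_resistance(0.5e-3, 100.0, 1e-6)
rise_large = 2.0 * r_large  # steady-state rise over ambient, in K
rise_small = 2.0 * r_small  # four times the power density, four times the rise

# The temperature approaches its steady state exponentially, not instantly.
trace = [step_response(2.0, r_small, 0.01, t) for t in (0.0, 0.05, 0.10, 0.50)]
```

Under these assumptions, concentrating the same power in a quarter of the area quadruples the steady-state temperature rise, and the response to a power step lags with time constant RC, which is why neither total power nor instantaneous power alone predicts temperature.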
Why temperature modeling?
When designing a temperature-aware processor, temperature models are needed to fully explore the large design space without expensive silicon prototypes. You might wonder whether power or power density is an appropriate proxy for temperature. It is not; explicit temperature modeling is necessary. Temperature changes gradually in time and space, while power can be a step function. In fact, heat transfer acts as a low-pass filter on power, filtering out both high temporal and high spatial frequencies. Temperature is also needed to accurately estimate power, because leakage power is exponentially dependent on temperature. None of these phenomena can be inferred without actually modeling temperature and heat transfer. However, temperature is a function of power, so the power inputs' accuracy and resolution determine the thermal models' accuracy and resolution. For example, microarchitecture-level thermal models using power inputs from microarchitecture units can't be used with any precision for transistor-level temperature estimations.
Figure 1a shows components inside a typical server system and the air flows from the inlets all the way to the fan, removing heat generated by different components. A system-level thermal model should model the airflow and account for its impact on the thermal coupling among all the components, such as the impact of hot air flowing off the processors onto the dual inline memory modules (DIMMs). One such model from academia is Mercury. 3 There are also several commercial, system-level thermal models.
Figure 1b shows a modern processor's typical package components. There are two heat-transfer paths. The primary path from silicon to heat spreader and heat sink accounts for about 90 percent of heat transfer with forced-air cooling. The thermal interface materials often have more thermal resistance than the other components, making them essential for accurate modeling. The secondary path from the silicon to C4 pads to package substrate and printed-circuit board becomes dominant with passive cooling. A thermal model for chip architecture design should capture both heat-transfer paths with reasonable details to achieve good accuracy. 4 Several thermal models focus on the chip, accounting for the temperature deltas within the chip given a particular air temperature. 5,6
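The point that temperature is needed to estimate power can be made concrete with a small fixed-point sketch. It assumes a single lumped thermal resistance and an empirical exponential leakage model; the function names and every parameter value (reference leakage, exponent, resistance, ambient temperature) are hypothetical and chosen only for illustration.

```python
import math

def leakage_w(temp_c, p_ref_w=10.0, t_ref_c=45.0, beta_per_k=0.02):
    """Assumed empirical leakage model: exponential in temperature."""
    return p_ref_w * math.exp(beta_per_k * (temp_c - t_ref_c))

def solve_operating_point(p_dynamic_w, r_th_k_per_w, t_ambient_c,
                          tol=1e-9, max_iter=10_000, t_max_c=200.0):
    """Iterate T = T_amb + R * (P_dyn + P_leak(T)) until it stops changing."""
    temp = t_ambient_c
    for _ in range(max_iter):
        new_temp = t_ambient_c + r_th_k_per_w * (p_dynamic_w + leakage_w(temp))
        if new_temp > t_max_c:
            # No stable operating point below t_max_c: thermal runaway.
            raise RuntimeError("temperature diverges for these parameters")
        if abs(new_temp - temp) < tol:
            return new_temp, leakage_w(new_temp)
        temp = new_temp
    raise RuntimeError("fixed-point iteration did not converge")

temp_c, p_leak_w = solve_operating_point(p_dynamic_w=60.0,
                                         r_th_k_per_w=0.3,
                                         t_ambient_c=45.0)
```

With these assumed numbers, evaluating leakage at the ambient temperature would underestimate it noticeably, because the chip settles well above ambient and leakage grows along the way. A power-only model has no way to capture that loop.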
Why early temperature-aware design?
Now, at which design stage should we start to include temperature considerations with the aid of thermal models? The answer is to include them as early as possible in the design cycle, before making decisions that unwittingly rule out the most effective design choices for managing temperature, or even unwittingly commit to choices with severe thermal consequences. Here, we give several examples.
Cooling solution changes chip architecture
Li et al. investigate the cooling solution's impact on multicore chip architecture. 7 For a system with a high-end cooling solution, thermal constraints are less severe. This generally favors complex cores with more power-hungry and high-power-density structures to exploit instruction-level parallelism, and also allows more cores in a chip. On the other hand, a mediocre thermal solution shifts the optimal configuration toward fewer and simpler cores with narrower issue width and shallower pipelines, because these cores have less-severe local hotspots and generally consume less power. Core type is an important lever, but if we choose the wrong type for detailed design and only later discover that it doesn't match the cooling solution's capabilities, the core count, voltage, and frequency will be the main ways to compensate, leading to severely suboptimal performance or dramatically higher cooling costs. A heterogeneous many-core design with one complex primary core and many simpler cores could suffer from the impact of local hotspots inside the large core, especially if it's boosted to run at high frequencies when the other cores are idle. On the other hand, a homogeneous many-core design has a more uniform temperature distribution, and thus depends less on the cooling solution's ability to smooth out hotspots. Recent work shows that the thermal constraint imposed by a commodity cooling solution makes performance largely insensitive to the complexity of a boostable primary core across diverse degrees of parallelism. 8
Recently, various studies have explored novel cooling solutions. One example is a heat sink design with localized cooling that provides two coolant flow paths, one for the hotspots and one for the rest of the chip, if the hotspots are known in advance. 9 Microchannel cooling (both 2D and 3D) is an example of proximity cooling, where coolant flows through microchannels cut in the silicon substrate and provides a short heat-transfer path from the heat sources to the ambient. Although Huang et al. show that a 2D microchannel can tolerate much higher power densities, 10 increasing chip sizes pose a practical limitation, as pumping coolant becomes much less efficient through longer microchannels. 11 All these advanced cooling solutions would allow significantly higher local power density and favor tightly clustered cores to reduce communication latency and enable resource sharing. On the other hand, conventional cooling solutions with a long heat-transfer path from hotspots to ambient might favor a distributed-core configuration where cores are separated from each other by last-level caches that have low power density, which can act as thermal buffers. All these examples show how the cooling solution changes the processor architecture; yet, in current practice, designers typically set the architecture before choosing the cooling solution.
Cooling power isn't free
Temperature-aware chip design can also affect system-level design. A thermally optimized processor saves precious cooling power (such as power spent in the fan) by allowing better thermal balancing between processors and other system components such as memory DIMMs and disks, as the processors are usually the thermal bottleneck. This is especially beneficial in power-capped high-end servers. For example, recent work by Shin et al. investigated power optimization between cooling fans and processors. 12 The trade-off here is between cooling fan power and processor leakage power: spinning the fan faster costs fan power but lowers temperature, and leakage power depends exponentially on temperature.
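The fan-versus-leakage trade-off can be sketched as a one-dimensional sweep. This is not the method of Shin et al.; it is a toy model under stated assumptions: fan power grows with the cube of normalized fan speed (the fan affinity law), heat-sink resistance follows a hypothetical r_min + k / speed curve, and leakage is exponential in temperature. All names and constants are illustrative.

```python
import math

def leakage_w(temp_c, p_ref_w=15.0, t_ref_c=45.0, beta_per_k=0.02):
    """Assumed empirical leakage model: exponential in temperature."""
    return p_ref_w * math.exp(beta_per_k * (temp_c - t_ref_c))

def fan_power_w(speed, p_max_w=30.0):
    """Fan power grows with the cube of normalized speed (fan affinity law)."""
    return p_max_w * speed ** 3

def sink_resistance(speed, r_min=0.15, k=0.10):
    """Hypothetical heat-sink model: resistance (K/W) falls as airflow rises."""
    return r_min + k / speed

def total_power_w(speed, p_dynamic_w=80.0, t_ambient_c=35.0, t_max_c=150.0):
    """Fan power plus leakage at the settled temperature; inf if it overheats."""
    r_th = sink_resistance(speed)
    temp = t_ambient_c
    for _ in range(10_000):
        new_temp = t_ambient_c + r_th * (p_dynamic_w + leakage_w(temp))
        if new_temp > t_max_c:
            return math.inf  # no acceptable operating point at this fan speed
        if abs(new_temp - temp) < 1e-9:
            return fan_power_w(speed) + leakage_w(new_temp)
        temp = new_temp
    return math.inf

# Sweep normalized fan speed from 0.2 to 1.0 in steps of 0.05.
speeds = [0.2 + 0.05 * i for i in range(17)]
best_speed = min(speeds, key=total_power_w)
```

Under these assumptions the minimum falls at an intermediate fan speed: running the fan flat out wastes fan power, while running it too slowly lets leakage grow and, at the lowest speed in the sweep, leaves no stable operating point at all.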
Device modeling can prevent problems
Temperature also matters during synthesis. For example, temperature can affect timing closure: circuit paths in or near hotspots might violate timing constraints, limiting potential operating frequency. As another example, many characteristics of analog and mixed-signal circuits (such as the output power of a transceiver circuit in a SiGe BiCMOS design) are sensitive to thermally induced mismatches, as observed by Gradient DA. 13 We can identify and prevent such problems with appropriate thermal modeling, before committing the resulting limitations to silicon.
3D integration increases power density
As the industry moves toward 3D integration, early-stage temperature-aware design becomes even more crucial, as 3D integration significantly increases power density. As layers from heterogeneous semiconductor processes are integrated into the same package, existing thermal challenges like those we've discussed are magnified, and new thermal-related issues arise, such as vertically overlapping hotspots.
Challenges
To excel in temperature-aware architecture design, industry and academia must address several remaining challenges. Here, we identify a few.
Dynamic thermal management is hard
If cooling is sufficient, runtime thermal management is only needed as a fail-safe to protect against extraordinary programs, extreme environmental conditions, or hardware failure. However, as power density and total power rise and hit cooling limits (whether owing to intrinsic limits, form factors, or merely cost), processors could be forced to operate below the circuits' intrinsic performance capability. This requires more sophisticated, efficient runtime thermal management with minimal performance overhead. The challenge, of course, is that thermal management is most needed when the workload is placing the greatest demand on the system and thus is most sensitive to overheads. Dynamic thermal management (DTM) has been extensively studied; see Kong et al. for a survey. 14 Most prior work has focused on throttling throughput (often via dynamic voltage and frequency scaling) or migrating between hot and cold resources to achieve a more even spatial distribution of power dissipation. The former approach, which spreads out work in time, inevitably incurs some slowdown. The latter could avoid slowdown, but spreading out units in space typically increases communication latencies, and migrating tasks incurs some slowdown as well. DTM is particularly challenging in real-time contexts, where unexpected delays can disrupt real-time schedules. DTM becomes even more complicated in the presence of manufacturing variations in both silicon and package (such as thickness variation in thermal-interface material 4). Dynamically migrating tasks from a hot core to a cold core 15 could incur more performance loss than simply throttling hot cores, because the cold core can be leakier owing to core-to-core variations. Combined consideration of variability and temperature is necessary to address this problem. 16 Proposals have addressed this problem reactively, based on temperature sensor measurement, 15 or proactively, by predicting future behavior with thermal history and taking preemptive actions.
[...] imprecise, so the challenge is to use a small number of on-chip temperature sensors that have limited accuracy to achieve sufficiently accurate thermal control. To make matters worse, hotspot locations change with workloads, and sensor circuits are difficult to place near hot structures, especially dense data paths and array structures such as caches and register files. When the sensors are too far from the actual hotspots, their accuracy falls off dramatically. These challenges are exacerbated by manufacturing variations, which can change the hotspots' location and severity. The appropriate choice of guard bands (and how rigid to make them) is also an open question, because they must protect against many failure mechanisms: timing errors, soft errors caused by thermal noise, excessive leakage, and various aging phenomena that develop at different temperature-dependent rates. A few recent studies make promising advances on these issues, 18 but more research is needed. Without sufficient accuracy, guard bands are too large, imposing high costs in performance or cooling. However, aggressive deployment of precise sensors remains costly.
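The interaction between reactive throttling, sensor error, and guard bands can be sketched with a toy simulation. It assumes a single lumped RC thermal node integrated with forward Euler, a sensor that reads low by a fixed offset, and a throttled mode that halves both power and throughput. The controller, its hysteresis, and every constant are hypothetical and are not drawn from any of the cited proposals.

```python
T_LIMIT_C = 85.0      # assumed maximum safe temperature
T_AMBIENT_C = 45.0    # assumed ambient temperature
R_TH = 0.5            # assumed thermal resistance, K/W
C_TH = 0.1            # assumed thermal capacitance, J/K (time constant 50 ms)
P_FULL_W = 100.0      # full-speed power; settles at 95 C, above the limit
P_THROTTLED_W = 50.0  # throttled power; settles at 70 C
HYSTERESIS_C = 2.0    # reading must drop this far below the trigger to resume

def simulate_dtm(guard_band_c, sensor_error_c, steps=20_000, dt_s=1e-3):
    """Return (average throughput, peak true temperature) for a reactive throttle."""
    trigger_c = T_LIMIT_C - guard_band_c
    temp, peak, work, throttled = T_AMBIENT_C, T_AMBIENT_C, 0.0, False
    for _ in range(steps):
        reading = temp - sensor_error_c  # the sensor under-reads the hotspot
        if reading >= trigger_c:
            throttled = True
        elif reading <= trigger_c - HYSTERESIS_C:
            throttled = False
        power = P_THROTTLED_W if throttled else P_FULL_W
        work += 0.5 if throttled else 1.0  # throttled mode does half the work
        # Forward-Euler step of C * dT/dt = P - (T - T_amb) / R
        temp += dt_s / C_TH * (power - (temp - T_AMBIENT_C) / R_TH)
        peak = max(peak, temp)
    return work / steps, peak

thr_tight, peak_tight = simulate_dtm(guard_band_c=2.0, sensor_error_c=3.0)
thr_ok, peak_ok = simulate_dtm(guard_band_c=5.0, sensor_error_c=3.0)
thr_wide, peak_wide = simulate_dtm(guard_band_c=10.0, sensor_error_c=3.0)
```

In this toy setting, a guard band smaller than the sensor error lets the true temperature exceed the limit, while a guard band much larger than needed keeps the chip safe at a visible cost in throughput. That is the tension the text describes: better sensor accuracy would permit a tighter guard band and recover performance.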
Thermal modeling and management need to be hierarchical
Another major challenge is the multiscale nature of heat transfer. As we've seen, we must consider temperature effects at granularities ranging from individual transistors to entire racks in a data center. On silicon, temperature varies over distances as small as several microns and over times of hundreds of microseconds to milliseconds. In comparison, the relevant scales are millimeters and seconds for packages, centimeters and minutes for servers, and meters and hours for data centers. Designs at different levels are tightly coupled together; so, to achieve a fully thermally optimized design, we also need a tightly coupled thermal model covering the range from transistors to machine rooms, and from microseconds to hours.
Additionally, most reliability problems only manifest themselves over long durations, requiring modeling and management techniques that can accommodate such long time scales without sacrificing too much precision. Brute-force modeling of fine details over long time scales is prohibitively expensive. Hierarchical thermal modeling and management is needed, requiring new ways to link models from different levels of the design hierarchy in ways that maintain accuracy at each granularity yet capture temperature evolution over long time scales. This will require collaboration from researchers at each level. There are some initial efforts on this topic at the chip level, 19 system level, 3 and data-center level. 20
Thermals are interrelated with other physical constraints
The thermal constraint is just one of several constraints facing architects. We've already mentioned manufacturing variations; in addition to thermal implications, they also affect performance, power, and reliability. Power delivery limits are another major constraint. Demand for I/O and current is rising faster than the pad count can, because the density of processing units is roughly doubling each generation (following Moore's law), while pad size can't shrink much and the number of pads scales slowly. Consequently, power delivery (more accurately, current delivery) to future processors becomes a real challenge. Large currents and current swings also cause thermally induced reliability problems at the silicon power-supply pads, as well as large on-chip voltage noise. Although the emergence of on-chip voltage regulators could alleviate these power delivery constraints to some extent, system-level power constraints are present, too (limits on the current that affordable batteries or power supplies can source, limits on power distribution within data centers, and so forth). It's still unclear which constraint will be met first: temperature or power delivery.
With technology scaling pushing the limits of affordable cooling, processor design must be temperature aware from beginning to end, and many important research questions remain.
