The 2018 award was presented to "Temperature-Aware Microarchitecture" by Kevin Skadron, Mircea R. Stan, Wei Huang, Sivakumar Velusamy, Karthik Sankaranarayanan, and David Tarjan. All the authors were at the University of Virginia at the time of publication.
This work was carried out when power dissipation in microprocessors was beginning to be regarded as a major concern by the research and industrial communities. Microprocessor power dissipation had been increasing exponentially for years, and this trend was unsustainable. Several pioneering works in the late 1990s and early 2000s observed that power dissipation would soon limit the performance gains that Moore's law should theoretically provide: the benefits of both increasing transistor density and faster switching rates would be jeopardized by the associated increase in power. These observations motivated a growing body of work on power reduction, which has remained a hot research topic ever since.
The awarded paper, building on prior work by Brooks and Martonosi [1] and by Skadron, Abdelzaher, and Stan [2], provided a new perspective on the problem by focusing on the thermal issues caused by power dissipation. Chip operating temperature has important implications for performance and energy consumption, but most importantly, it must not exceed the threshold associated with reliable operation. The paper argued that many power reduction techniques may have little effect on operating temperature because they may not reduce power density in hot spots. Until then, temperature-specific techniques had been confined to thermal package design (heat sink, fan, and so on) or to costly system-level techniques (such as stopping the clock when operating temperature exceeded a safe threshold). In contrast, this paper made a case for architecture-level thermal management techniques. These techniques can work at a finer granularity in both time and space, reacting more quickly and focusing on the particular blocks that are responsible for the hot spots at each moment.

Antonio González
Polytechnic University of Catalonia
The paper proposed a number of microarchitecture-level techniques, such as localized toggling of microprocessor blocks (dynamically adjusting the duty cycle based on the operating temperature), migrating computation to spare units located in different parts of the chip to better distribute the heat, and dynamic frequency scaling based on temperature tracking. This paper also showed the value of formal feedback control methods in managing these techniques.
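As an illustration of temperature-driven toggling under feedback control, the following Python sketch regulates one block's duty cycle with a proportional-integral (PI) controller, one common formal feedback law. Everything here is assumed for illustration: the single-node thermal model, the power and RC values, the 75 °C setpoint, and the gains `kp` and `ki`. It is not the controller configuration used in the paper.

```python
# Hypothetical single-block thermal model (illustrative numbers, not from the paper).
T_AMB = 45.0      # ambient/heat-sink temperature, deg C
R_TH = 1.0        # thermal resistance from block to ambient, K/W
C_TH = 0.05       # thermal capacitance of the block, J/K
P_ACTIVE = 40.0   # block power at 100% duty cycle, W
P_IDLE = 5.0      # block power when fully toggled off, W


def run_pi_toggling(setpoint, kp, ki, dt=1e-3, steps=2000):
    """Adjust a block's duty cycle with a PI controller so its temperature tracks setpoint."""
    temp = T_AMB
    integral = 0.0
    duty = 1.0
    for _ in range(steps):
        error = setpoint - temp               # positive while below the threshold
        u = kp * error + ki * integral        # raw PI controller output
        duty = min(1.0, max(0.0, u))          # a duty cycle must lie in [0, 1]
        # Conditional integration (anti-windup): freeze the integral while the
        # output is saturated and the error would push it further into saturation.
        if not ((u >= 1.0 and error > 0) or (u <= 0.0 and error < 0)):
            integral += error * dt
        power = P_IDLE + duty * (P_ACTIVE - P_IDLE)
        # Forward-Euler step of C dT/dt = P - (T - T_amb) / R
        temp += dt * (power - (temp - T_AMB) / R_TH) / C_TH
    return temp, duty


final_temp, final_duty = run_pi_toggling(setpoint=75.0, kp=0.05, ki=2.0)
print(f"temperature {final_temp:.2f} C, duty cycle {final_duty:.3f}")
```

Left unthrottled, this block would settle at 85 °C (45 °C plus 40 W times 1 K/W), so the controller has to throttle it. It converges to a duty cycle of about 25/35 ≈ 0.71, the point at which the block's 30 W of power exactly balances the heat flowing to ambient at the setpoint. The anti-windup guard keeps the integral term from accumulating during the initial interval when the block runs at full duty.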
However, the paper's most important contribution was a novel approach to modeling temperature at the architecture level. This approach was based on a well-known analogy between heat transfer and electrical phenomena, in which heat flow is described as a "current" passing through a thermal resistance and producing a temperature difference analogous to a "voltage." Thermal capacitance models the delay between a change in power and the resulting change in temperature. Each microarchitecture block of the system is represented as a node in this RC circuit, whose thermal resistance and thermal capacitance depend on the physical characteristics of the block, mainly its area and thickness, as well as on the physical properties of the material used for each component (such as silicon and copper). Using this approach, the authors developed a tool called HotSpot, validated it against a commercial finite-element simulator of 3D fluid and heat flow, and then made it publicly available.
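To make the RC analogy concrete, here is a minimal Python sketch of a block-level thermal RC network. It is not HotSpot's implementation. Assumptions: approximate constants for silicon, one node per block with a vertical path to ambient plus lateral resistances to neighboring blocks, a hypothetical lumped per-block package resistance `r_package`, a hypothetical two-block floorplan, and forward-Euler integration. Resistance is computed as R = t/(kA) and capacitance as C = c_v·A·t, so both follow from the block's area and thickness and from the material properties.

```python
from dataclasses import dataclass

# Approximate material constants for silicon (assumed, illustrative values).
K_SI = 100.0     # thermal conductivity, W/(m*K)
CV_SI = 1.75e6   # volumetric heat capacity, J/(m^3*K)


@dataclass
class Block:
    """One microarchitecture block, modeled as a single thermal node."""
    name: str
    area: float       # m^2
    thickness: float  # m

    def vertical_resistance(self) -> float:
        # R = t / (k * A): conduction through the die thickness
        return self.thickness / (K_SI * self.area)

    def capacitance(self) -> float:
        # C = c_v * A * t: heat stored in the block's volume
        return CV_SI * self.area * self.thickness


def lateral_resistance(distance: float, shared_edge: float, thickness: float) -> float:
    # Conduction between neighboring blocks through their shared cross-section.
    return distance / (K_SI * shared_edge * thickness)


def simulate(blocks, lateral, power, t_amb, r_package, dt, steps):
    """Forward-Euler integration of C_i dT_i/dt = P_i - (heat flowing out of node i).

    lateral maps (i, j) index pairs to the thermal resistance between blocks i and j;
    r_package is a hypothetical lumped die-to-ambient resistance added per block.
    """
    n = len(blocks)
    r_vert = [b.vertical_resistance() + r_package for b in blocks]
    caps = [b.capacitance() for b in blocks]
    temps = [t_amb] * n
    for _ in range(steps):
        # Net heat flow into each node: dissipated power minus "current" to ambient.
        flow = [power[i] - (temps[i] - t_amb) / r_vert[i] for i in range(n)]
        for (i, j), r in lateral.items():
            q = (temps[i] - temps[j]) / r   # heat flowing from block i to block j
            flow[i] -= q
            flow[j] += q
        temps = [temps[i] + dt * flow[i] / caps[i] for i in range(n)]
    return temps


# Hypothetical floorplan: two adjacent 4 mm x 4 mm blocks on a 0.5 mm die.
side, t = 4e-3, 5e-4
alu = Block("alu", side * side, t)
cache = Block("cache", side * side, t)
lat = {(0, 1): lateral_resistance(side, side, t)}
temps = simulate([alu, cache], lat, power=[10.0, 1.0], t_amb=45.0,
                 r_package=2.0, dt=1e-4, steps=10_000)
print([round(x, 1) for x in temps])
```

With a hot block next to a nearly idle one, the lateral resistance lets heat spread sideways, so the idle block runs noticeably above ambient even though it dissipates little power itself. A fuller model would represent the package layers (heat spreader, heat sink) as additional nodes instead of the single lumped `r_package` used here.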
