A programmable thermal management interface circuit for PowerPC systems has been designed, implemented, and tested for the Integrated Thermal Management (ITEM) System [1]. Instead of designing for the worst case, the ITEM approach targets nominal power dissipation and has the system actively monitor its thermal activity and control cooling mechanisms to ensure operation within specification. Using a suitable combination of hardware and software, the interface design yields fine-grained control and optimal management with little system overhead and minimal hardware requirements, and provides the flexibility to support different management algorithms. This interface circuit was fabricated in the HP 0.5 µm single-poly 3-metal process through MOSIS.
Introduction
Increases in circuit density and clock speed in modern computer systems have brought thermal issues into the spotlight of VLSI design. Local overheating [2, 3] in one part of a high-density circuit, such as a CPU or a high-speed data routing circuit, can cause a whole system to crash. Besides the use of heat sinks and other heat dissipation mechanisms, early detection of overheating and proper handling of such an event are becoming essential capabilities for avoiding system failure [4]. The ACPI (Advanced Configuration and Power Interface) specification [5] was developed to provide a standardized approach to configuring the hardware, systems, and software necessary for power and thermal management within personal computer systems.
Temperature sensors and system monitors are a core part of any reliable thermal management system. However, current desktop implementations of the ACPI standard require an external temperature sensor, whose readings lag the actual temperature because of the thermal time constant between the integrated circuit being monitored and the external sensor. Furthermore, in most ACPI hardware implementations, the embedded system monitor is hardwired, and thus inflexible, and uses a simplified control algorithm that prohibits optimal management. Pure software implementations, on the other hand, require an active daemon in the operating system, which decreases system performance [5].
In this paper, we present an implementation of a Thermal Management Interface Circuit (TMIC), a subcomponent of a thermal management chip (TMC) to be used in PowerPC systems. Using a suitable combination of hardware and software, such a system can reduce operating system overhead while achieving finer temperature control, and it provides the flexibility to support different management algorithms. Implementing the ACPI protocol is straightforward with this design because the TMIC provides programmable threshold-generated interrupts and a memory-mapped interface. This design yields fine-grained control and optimal management with less system overhead and minimal hardware requirements.
In Section 2, the design specifications and architecture of the TMIC are addressed and justified. In Section 3, the implementation flow, tools, simulation, and test results are presented. Integration and functionality in our final system are also addressed. Implementations of ACPI and other thermal management protocols using this design are discussed in Section 4.
thermal management scheme [1]. This system design allows temperature reading and thermal control to be performed locally at a node and also provides global control capability at the system host. Each node of this Integrated Thermal Management (ITEM) System contains a PowerPC 604 CPU [6], an Enhanced Router Interface (ERIF) [7], a number of DRAM devices, and a TMC. The TMC device contains an on-chip temperature sensor [8] with an integrated A/D converter [9], and a TMIC [10]. The ERIF device is a custom component that contains a network router, network interface, PPC604 bus controller, and a DRAM controller [7]. Additionally, the ERIF contains 3 embedded ring oscillators that serve as temperature indicators. These temperature sensors, as well as the analog temperature sensor in the TMC, are based on designs that have been optimized through previous research [3, 7–9, 12] and are not the focus of this paper.
The TMIC is the configuration and interface portion of the TMC. It contains configuration registers that allow system software to set the sampling rate for reading the temperature and a threshold value above which an interrupt is generated, as well as PowerPC 604 bus interface circuitry to communicate with a node processor. Fig. 1 shows a block diagram of a thermal management system for advanced computer systems. Fig. 2 shows a block diagram of the TMIC. The detailed function of each portion is described below.
• PowerPC 604 interface: supports the bus arbitration policy and four-clock burst read and write mode [6] for PowerPC 604 processors. It translates the CPU address and control signals into corresponding internal control signals in order to access (read or write) the configuration, sample, and threshold registers and monitor (read) the sampled temperature value.
• Configuration register: 4-bit register that contains 2 temperature sensor selection bits, one interrupt enable bit, and a threshold flag bit (read-only), which indicates that the sampled temperature has exceeded the specified threshold temperature. Bit assignments are optimized to reduce the number of clock cycles needed for checking the threshold flag. The sensor selection bits specify which temperature sensor is currently being sampled (the embedded one on the TMC or one of the 3 external ring oscillator temperature sensors in the ERIF). The combination of the interrupt enable bit and threshold flag allows the system programmer to implement polling-based or interrupt-driven temperature monitoring, or some combination of the two.
• Sampling register: integer value that specifies how many clock cycles elapse between temperature samplings. This value is compared with the value of a continuously incrementing counter. When the two values match, the current temperature sensor value is latched into the temperature register, and the counter is reset to 0. The appropriate value of the sampling register may differ widely between sensors and between thermal management algorithms.
• Threshold register: integer value that specifies a threshold temperature. When the temperature register value exceeds this threshold value, the threshold flag is set, and an interrupt is generated if the interrupt enable flag is asserted.
• Temperature register: 8-bit register that stores a value from the currently selected temperature sensor. The temperature register is updated at the periodic rate specified by the sampling register.
• Interrupt generator: when the temperature reading exceeds the value of the threshold register, an interrupt is generated if the interrupt enable bit in the configuration register is asserted. The capability to disable interrupts gives the system designer the option of implementing an active or passive thermal control algorithm.
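The register behavior described above can be summarized in a small behavioral model. The Python sketch below is illustrative only and is not the fabricated logic. The class and field names are hypothetical, and two details that the text leaves open are assumed: the threshold flag follows the most recent comparison, and one interrupt is counted for every sample that exceeds the threshold while interrupts are enabled.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TmicModel:
    """Behavioral sketch of the TMIC registers (names are hypothetical)."""
    sample_period: int              # sampling register: clock cycles between samples
    threshold: int                  # threshold register (temperature code)
    sensor_select: int = 0          # 2 sensor selection bits, values 0..3
    interrupt_enable: bool = False  # interrupt enable bit
    threshold_flag: bool = False    # read-only flag (assumed to track last compare)
    temperature: int = 0            # 8-bit temperature register
    counter: int = 0                # continuously incrementing cycle counter
    interrupts: int = 0             # interrupts raised so far (for illustration)

    def clock(self, sensor_values: Sequence[int]) -> None:
        """Advance one clock cycle; sensor_values holds the four sensor codes."""
        self.counter += 1
        if self.counter == self.sample_period:
            # Counter matches the sampling register: latch the selected sensor
            # into the temperature register and restart the counter.
            self.temperature = sensor_values[self.sensor_select] & 0xFF
            self.counter = 0
            self.threshold_flag = self.temperature > self.threshold
            if self.threshold_flag and self.interrupt_enable:
                self.interrupts += 1
```

With `interrupt_enable` left at `False`, software can still poll `threshold_flag` or `temperature`, which corresponds to the polling-based mode mentioned in the configuration register description.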
As mentioned above, any one of three ring oscillators on the ERIF chip or the on-chip temperature sensor can be monitored to provide the flexibility of measuring temperature from different locations in the system. For the ring oscillator temperature sensors, the TMIC contains an 8-bit down counter that is used as a frequency counter, which is calibrated by SPICE simulations and measurements. Depending on the value of the sensor selection bits in the configuration register, either this 8-bit counter value or the 8-bit digital filter output of the on-chip sensor will be stored in the temperature register. The temperature sensor selector also controls the supply power of ring oscillators. By disabling unused temperature sensors, extraneous heat and noise sources may be eliminated.
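The selection and power-gating behavior in this paragraph can be sketched as two small functions. This is a minimal sketch under stated assumptions: the encoding of the selection bits (0 for the on-chip sensor, 1 to 3 for the ring oscillators) is hypothetical, as are the function names, and only the currently selected ring oscillator is assumed to be powered.

```python
def ring_oscillator_power(sensor_select: int) -> list:
    """Return the power-enable state of the three ERIF ring oscillators.

    Assumed encoding (not from the paper): 0 selects the on-chip sensor,
    1..3 select ring oscillators 1..3. Unselected oscillators are switched
    off so they do not add heat or noise.
    """
    if not 0 <= sensor_select <= 3:
        raise ValueError("sensor_select is a 2-bit field")
    return [sensor_select == i for i in (1, 2, 3)]


def temperature_source(sensor_select: int, filter_output: int,
                       frequency_count: int) -> int:
    """Choose the 8-bit value to be latched into the temperature register."""
    if not 0 <= sensor_select <= 3:
        raise ValueError("sensor_select is a 2-bit field")
    value = filter_output if sensor_select == 0 else frequency_count
    return value & 0xFF
```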
The TMIC registers occupy two cache lines in the node memory map: one for the temperature register and the other for reads and writes of the configuration, sample, and threshold registers. The architecture parameters, such as register bit widths, are based on simulations, measurement, and previous research [2, 8, 9, 12]. In addition, careful address and bit assignments minimize the TMC pin count and the use of system resources by allowing only burst-mode memory accesses.
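The two-cache-line layout can be illustrated with a simple address decoder. The sketch below assumes the 32-byte cache line of the PowerPC 604 (four 8-byte burst beats); the base address, the order of the two lines, and the function name are hypothetical and are not taken from the TMIC design.

```python
CACHE_LINE_BYTES = 32        # PowerPC 604 cache line: four 8-byte burst beats
TMIC_BASE = 0x4000_0000      # hypothetical base address, not from the paper


def decode(address: int):
    """Map a bus address to (register group, writable), or None if outside the TMIC."""
    offset = address - TMIC_BASE
    if not 0 <= offset < 2 * CACHE_LINE_BYTES:
        return None
    if offset < CACHE_LINE_BYTES:
        # Assumed first line: read-only sampled temperature.
        return ("temperature", False)
    # Assumed second line: configuration, sample, and threshold registers.
    return ("config/sample/threshold", True)
```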
Circuit implementation
This chip was fabricated on an HP 0.5 µm single-poly 3-metal process through MOSIS. The die microphotograph is shown in Fig. 3. The temperature sensor [8], digital filter [12] and TMIC are indicated on the photo. The TMIC occupies an area of 231.3 µm × 1094.4 µm and contains 3293 transistors.
The layout of the TMIC was generated by Powerview schematic capture tools [13] and Lager synthesis tools [14] using standard cells developed for the ERIF chip. These standard cells had previously been modified to fit sub-micron processes. Functionality of the TMIC was verified by Powerview simulations using Lager standard cell VHDL models and by Berkeley IRSIM [15] at the transistor switch level. Both simulations indicated that the chip was fully functional at the required system clock of 50 MHz.
An initial lot of 5 TMC die were packaged in 40-pin DIPs for low-cost functionality testing. After this test succeeded, the remaining TMC die were packaged in 40-pin LCC packages for inclusion in the ITEM system. A final system node board is shown in Fig. 4. The TMC operates correctly at the targeted system speed of 50 MHz.
Thermal management system implementation
Different approaches to thermal management can be easily implemented with the TMIC device, since the TMIC provides flexible ways for systems to read the temperature, set the threshold value for interrupt generation, and measure temperature values from different sensors. In this section, we illustrate how the TMIC can be used to implement an ACPI-compliant protocol [5] as an example for other thermal management algorithms. A typical flow chart for an ACPI implementation [5] is shown in Fig. 5. The system acquires a temperature and first determines whether it is within the current granularity window, repositioning the window as needed. If the sampled temperature has exceeded the Passive Cooling (PSV), Active Cooling (ACX), or Critical Temperature (CRT) threshold, the corresponding defined action, such as reducing the CPU clock, activating a fan, or shutting the system down, takes place. Most software approaches require a continuous loop, while hardware approaches use interrupts. The advantage of the hardware approach is that processor time is used for thermal management only when certain situations require it.
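For reference, one pass of the software polling loop described above might look like the following sketch. It is not a transcription of Fig. 5: the function name, the window repositioning rule, the action labels, and the ordering PSV < ACX < CRT are assumptions made for illustration.

```python
def polling_step(temp, window_low, granularity, psv, acx, crt):
    """One pass of a polling-based thermal check.

    Returns the (possibly repositioned) lower edge of the granularity window
    and the list of actions triggered by this sample.
    """
    # Reposition the window when the sample falls outside
    # [window_low, window_low + granularity).
    if not (window_low <= temp < window_low + granularity):
        window_low = (temp // granularity) * granularity

    actions = []
    if temp > psv:
        actions.append("reduce_cpu_clock")   # passive cooling
    if temp > acx:
        actions.append("activate_fan")       # active cooling
    if temp > crt:
        actions.append("shutdown")           # critical temperature
    return window_low, actions
```

The CPU has to execute this step on every polling period, including the periods in which nothing has changed, which is the overhead that the interrupt-driven approach avoids.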
With the use of the TMIC, all system actions needed to implement the ACPI protocol are triggered by two actions: acquiring the temperature and generating an interrupt when the sampled temperature exceeds the threshold. These two actions can be used to implement the ACPI protocol as shown in Fig. 6. Here, there is no concept of a granularity window; all temperature values are of importance. The threshold register is initialized to the PSV value in the first step. If an interrupt is generated, it indicates that the PSV has been exceeded, so the CPU performs a predefined action for this threshold and resets the threshold register to the ACX value. This time, when an interrupt is generated, it indicates that the ACX value has been exceeded, so the CPU performs a more severe predefined action and resets the threshold register to the CRT value. Since the TMIC continuously samples the temperature and compares it against a programmable threshold, CPU resources are required only when significant events have occurred. In contrast, software implementations require CPU resources even under default circumstances because they rely on polling to sample temperature values and must then perform computation to determine whether a significant event has occurred. A further danger with software implementations is that significant events, such as temperature spikes, may easily be missed if the polling period is not sufficiently small. Compared to other hardware implementations, the TMIC reduces the number of different interrupts and requires the same amount of software cooperation. Furthermore, unlike most hardware implementations, the TMIC provides the ability for the system to actively acquire temperature readings in addition to passively waiting for critical situations.
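The threshold escalation described for Fig. 6 can be sketched as an interrupt handler that walks a ladder of thresholds. This is a minimal sketch under stated assumptions: the class and function names, the action labels, and the numeric threshold codes used in the usage note are hypothetical, and the sketch does not model lowering the threshold again once the system cools, which the text does not describe.

```python
class ThresholdLadder:
    """Interrupt-driven escalation through PSV, ACX, and CRT (names hypothetical)."""

    def __init__(self, levels):
        # levels: list of (name, threshold_code, action), in ascending order.
        self.levels = levels
        self.index = 0       # start with the PSV threshold programmed
        self.log = []        # actions performed so far

    @property
    def threshold(self):
        """Value currently programmed into the TMIC threshold register."""
        return self.levels[self.index][1]

    def on_interrupt(self):
        """Handle a threshold interrupt: act, then program the next threshold."""
        _name, _code, action = self.levels[self.index]
        self.log.append(action)
        if self.index < len(self.levels) - 1:
            self.index += 1
        return self.threshold


def simulate(samples, ladder):
    """Stand-in for the TMIC compare logic: interrupt when a sample exceeds the threshold."""
    for sample in samples:
        if sample > ladder.threshold:
            ladder.on_interrupt()
    return ladder.log
```

For example, with illustrative levels `[("PSV", 50, "reduce_cpu_clock"), ("ACX", 60, "activate_fan"), ("CRT", 70, "shutdown")]`, the handler runs only for the samples that cross the currently programmed threshold; all other samples consume no CPU time.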
The TMIC can be used to implement the ACPI protocol but is not limited to it. For instance, the temperature threshold can be set to any number of values to represent any number of critical situations. Fuzzy logic control and other algorithms requiring more levels of alerts can be applied. Also, with the capability of actively acquiring temperature measurements at any time, the CPU can verify that the temperature responds as desired when it executes a cooling action. With this feedback, actions such as increasing or reducing fan speed and clock rates can be applied in more complex management algorithms.
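Both ideas in this paragraph, multiple alert levels and feedback after a cooling action, can be sketched in a few lines. The function names, the example thresholds, and the one-step fan policy are hypothetical and serve only to illustrate the kind of algorithm the TMIC could support.

```python
from bisect import bisect_left


def alert_level(temp, thresholds):
    """Number of thresholds strictly exceeded by temp (thresholds sorted ascending)."""
    return bisect_left(thresholds, temp)


def next_fan_step(fan_step, temp_before, temp_after, max_step):
    """Raise the fan one step if the reading did not drop after the last action."""
    if temp_after >= temp_before:
        return min(fan_step + 1, max_step)
    return fan_step
```

Here `temp_before` and `temp_after` stand for two readings of the temperature register taken around a cooling action, which is possible because the TMIC lets the CPU read the temperature at any time.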
Conclusion
A TMIC for PowerPC systems was designed and implemented. This paper presents the architecture, design flow, and applications for this circuit. The designed architecture yields the balance of hardware and software that is required for the hierarchical thermal management scheme of the ITEM embedded multi-computer system. The TMIC may be used to implement the industry-standard ACPI protocol, and its flexible design enables it to implement more complex thermal management algorithms as well. The proposed architecture, combined with the fully integrated fan controller [11] used in this system, makes it possible to integrate the whole thermal management system inside the CPU with minimum external components and minimum physical size.
