This paper proposes a new temperature-constrained power management scheme for 3D MPSoCs that uses instantaneous temperature monitoring, along with information on the physical structure of the stack, to determine operating V-F levels for processing elements (PEs). The scheme implements a weighted policy that prevents PEs deep inside the stack from being turned off, keeps operating temperatures stable and within safe margins, and reduces overall execution time by up to 19.55%.
between PEs in the stack, and implemented through a weighted policy that prevents PEs on deeper layers from reaching critical temperatures and thus being turned off. The scheme is evaluated for per-core and island granularities, and is observed to keep the temperatures of all PEs stable and within safe margins when compared to the conventional 2D DVFS approach.
Thermal Modeling
A 3D integrated circuit contains multiple vertically stacked silicon layers, each containing PEs and memory modules. Most compact thermal models use resistors and capacitors to model the steady-state as well as the transient temperature response of such a circuit, analogous to electrical RC networks [8]. The thermal conductance between two PEs can be calculated using conductance equations. However, because heat flows in different directions, additional information such as the impedances in each direction and the various heat paths is required to establish a direct relation between temperature and V-F level. Figure 1 illustrates a thermal model of a section of a 3D die stack, where the resistor, capacitor and current source denote the thermal resistance, the thermal capacitance of a PE, and the heat transfer rate or power of a PE, respectively.
The heat sink is shown at the bottom of the stack, where it connects to die 1 through a thermal resistance R_hs. For a thermal model to be accurate, each thermal cell must be small enough for the temperature within it to be assumed uniform. Heat flows in all directions and through different paths. The ratio of heat flowing in the different directions depends on the ratio of the impedances seen in those directions. The difference between ΔT1 and ΔT2 is a strong function of material properties, and it increases as the thermal resistance between them increases. This becomes more cumbersome in an actual model, where a PE node is connected not only to the nodes above or below it, but also to nodes on the same plane via R_lateral.
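As a rough illustration of why PEs farther from the heat sink run hotter, the following minimal sketch computes steady-state temperatures for a single vertical column of dies. It is not the paper's thermal model: lateral resistances and thermal capacitances are left out, the heat sink is assumed to be the only path to ambient, and the function name, parameter names and numeric values are all hypothetical.

```python
def stack_steady_state(powers, r_vertical, r_hs, t_ambient):
    """Steady-state temperatures of a 1D die stack (simplified sketch).

    powers[0] is the die attached to the heat sink (die 1); later entries
    sit progressively deeper in the stack. Assumes every watt leaves
    through the heat sink, so the heat crossing each thermal resistance is
    the total power of all dies at or beyond that point.
    """
    temps = []
    t_below = t_ambient
    for i in range(len(powers)):
        heat_through = sum(powers[i:])          # power flowing toward the sink
        r = r_hs if i == 0 else r_vertical      # die 1 sees R_hs, others see the inter-die resistance
        t_below = t_below + r * heat_through    # temperature rise across this resistance
        temps.append(t_below)
    return temps


# Three dies dissipating 2 W each (illustrative values only).
temps = stack_steady_state([2.0, 2.0, 2.0], r_vertical=1.0, r_hs=0.5, t_ambient=40.0)
print(temps)
```

With equal power on every die, the temperature still rises monotonically with distance from the heat sink, which is the effect the weighted policy has to compensate for.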
Although R_lateral is often ignored, this resistance should be considered in deep stacks, as the conductivity to the ambient decreases with depth in the stack. The temperature change at 
Power Management Scheme
The temperature of a PE is primarily determined by its power dissipation, as well as by its location within the 3D stack and, for heterogeneous systems, its area. The activity factor (utilization) from PE performance counters, the temperature from PE thermal sensors, and the total chip power obtained from the system are considered as inputs by the proposed Power Management Block.

Initial Updates: At the beginning of a new control period, the difference between the total chip power and the local power budget value is computed. In the event that a new temperature check cycle has started, the difference between the actual and critical temperature of each PE is updated.
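The initial update step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function and argument names are hypothetical, a single critical temperature is assumed for all PEs, and the sign convention (budget minus measured power) is an assumption.

```python
def initial_updates(total_chip_power, power_budget, pe_temps,
                    t_critical, temp_check_due, prev_margins):
    """Return (power_gap, temperature_margins) for a new control period.

    power_gap is positive when there is headroom under the budget
    (assumed sign convention). Margins are refreshed only when a new
    temperature check cycle has started; otherwise the previous values
    are carried over unchanged.
    """
    power_gap = power_budget - total_chip_power
    if temp_check_due:
        margins = {pe: t_critical - t for pe, t in pe_temps.items()}
    else:
        margins = dict(prev_margins)
    return power_gap, margins


gap, margins = initial_updates(
    total_chip_power=95.0, power_budget=100.0,
    pe_temps={"pe0": 70.0, "pe1": 82.5},
    t_critical=85.0, temp_check_due=True, prev_margins={},
)
print(gap, margins)
```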
Thermal Runout: This step ensures that the temperature of each PE is maintained within the safety margin. Each PE is assigned a weight
where a and b are experimentally determined constants. A less active PE with a strong thermal relation to the victim is considered to have the heaviest weight, and is thus considered for V-F scale down first. If required, the next candidate PE is selected and scaled down. In the event that the temperature cannot be brought below the critical value, the victim is turned off. In order to prevent repeated fluctuations, when the V-F level of a PE is scaled down due to a victim PE, it is not reinstated until the victim is within the safe temperature margin. Such updates are performed in the initial update stage.
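The thermal runout step might be sketched as below. The weight expression itself is not reproduced in this text, so the linear form a * (1 - utilization) + b * thermal_relation is purely an assumption chosen to match the description (less active and more strongly coupled PEs weigh more). The names, the single pass over candidates, the one-level step per candidate, and the `victim_temp` callback standing in for a thermal estimate are all hypothetical.

```python
def runout_weight(util, relation, a=0.5, b=0.5):
    # Assumed linear form: low utilization and strong thermal coupling
    # to the victim both increase the weight.
    return a * (1.0 - util) + b * relation


def thermal_runout(victim, levels, utils, relation, victim_temp, t_critical):
    """Scale down neighbours of an overheating victim, heaviest weight first.

    levels: dict PE -> integer V-F level (0 is the lowest level).
    relation: dict PE -> thermal relation with the victim, in [0, 1].
    victim_temp: callable(levels) -> estimated victim temperature (hypothetical).
    Returns the set of PEs held down until the victim is safe again.
    """
    held = set()
    candidates = sorted(
        (pe for pe in levels if pe != victim),
        key=lambda pe: runout_weight(utils[pe], relation[pe]),
        reverse=True,
    )
    for pe in candidates:
        if victim_temp(levels) <= t_critical:
            break                      # victim is back within the limit
        if levels[pe] > 0:
            levels[pe] -= 1            # scale this candidate down one level
            held.add(pe)
    if victim_temp(levels) > t_critical:
        levels[victim] = None          # no candidate helped enough: turn the victim off
    return held


levels = {"v": 3, "n1": 2, "n2": 2}
held = thermal_runout(
    "v", levels,
    utils={"v": 0.9, "n1": 0.2, "n2": 0.8},
    relation={"n1": 0.9, "n2": 0.3},
    victim_temp=lambda lv: 80.0 + 2.0 * lv["n1"] + 1.0 * lv["n2"],
    t_critical=85.0,
)
print(held, levels)
```

The returned set models the rule that a PE scaled down on behalf of a victim is not reinstated until the victim is back within the safe margin.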
Convergence Check: The power value is considered converged if the total chip power is between 98% and 100% of the power budget value. If this is not the case, a V-F level pull up or pull down is required.
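The convergence rule reduces to a three-way decision, sketched here under the assumption that both window bounds are inclusive; the function name and return labels are hypothetical.

```python
def convergence_action(total_power, budget, window=0.98):
    """Classify the current power against the 98%-100% convergence window."""
    if total_power > budget:
        return "pull_down"      # budget exceeded
    if total_power >= window * budget:
        return "converged"      # inside the convergence window
    return "pull_up"            # headroom left under the budget


print(convergence_action(99.0, 100.0))
print(convergence_action(90.0, 100.0))
print(convergence_action(101.0, 100.0))
```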
Pull Up/Pull Down: To scale the system up or down depending on the allocated power budget, a weighted equation is considered:

weight = (c * Util) + (d * normalized_temp_margin) + (e * normalized_height) + (f * normalized_area)    (4)

where c, d, e and f are experimentally determined constants. A highly active PE that is cooler, situated close to the heat sink, and with a larger area is the preferred choice for V-F upscaling.
However, scaling is performed only if the new temperature after scaling is below the safety margin. This upscaling is performed iteratively until no more PEs can be pulled up, or until the total power reaches the 98% window of convergence with the budget value. In the event that the budget has been exceeded, the pull down stage is invoked in order to converge. For V-F downscaling, the PE with the smallest weight is selected, and the pull down is performed iteratively until no more PEs can be pulled down, or until the total power falls below the budget value. At each instance of pull up and pull down at a PE, the difference between its actual and critical temperature is updated.
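The pull up and pull down loops can be sketched as follows. This is an illustration under several assumptions, not the authors' implementation: the constants c to f are placeholders, every V-F step is assumed to change total power by a fixed `step_power`, `temp_after` is a hypothetical callback that predicts a PE's temperature at a new level, the normalized height is assumed to score higher for PEs nearer the heat sink, and the margin refresh after each step is omitted for brevity.

```python
def weight(pe, c=0.25, d=0.25, e=0.25, f=0.25):
    # Equation (4) with placeholder constants; all inputs normalized to [0, 1].
    return (c * pe["util"] + d * pe["temp_margin"]
            + e * pe["height"] + f * pe["area"])


def pull_up(pes, total_power, budget, step_power, temp_after,
            t_safe, max_level, window=0.98):
    """Raise V-F levels, heaviest weight first, until the window is reached."""
    while total_power < window * budget:
        for pe in sorted(pes, key=weight, reverse=True):
            if (pe["level"] < max_level
                    and temp_after(pe, pe["level"] + 1) < t_safe):
                pe["level"] += 1
                total_power += step_power
                break
        else:
            break                      # no PE can be pulled up any further
    return total_power


def pull_down(pes, total_power, budget, step_power, min_level=0):
    """Lower V-F levels, lightest weight first, until power is within budget."""
    while total_power > budget:
        for pe in sorted(pes, key=weight):
            if pe["level"] > min_level:
                pe["level"] -= 1
                total_power -= step_power
                break
        else:
            break                      # no PE can be pulled down any further
    return total_power


pes = [
    {"name": "a", "util": 0.9, "temp_margin": 0.8, "height": 1.0, "area": 0.5, "level": 0},
    {"name": "b", "util": 0.2, "temp_margin": 0.2, "height": 0.0, "area": 0.5, "level": 0},
]
power = pull_up(pes, 90.0, 100.0, step_power=5.0,
                temp_after=lambda pe, lvl: 70.0, t_safe=80.0, max_level=3)
print(power, [pe["level"] for pe in pes])
```

In this example the busier, cooler PE nearer the heat sink receives both upscaling steps, matching the preference stated for equation (4).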
Experimental Results
In order to evaluate the proposed scheme, the auto.basicmath application from the MiBench benchmark suite was used [11]. A trace of the application's activity profile was generated using SimpleScalar [12], while 3D-ICE

The proposed approach was also applied to vertical voltage islands, as shown in Figure 8. The weight of an island was taken to be the average of the weights of its constituent PEs. Table 2 provides a comparison between the per-core and voltage island based approaches.
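The island weighting can be sketched as a plain average over member PEs. The island layout, names and weight values below are hypothetical and serve only to show the calculation.

```python
def island_weight(pe_weights):
    """Weight of a voltage island: mean of its constituent PE weights."""
    return sum(pe_weights) / len(pe_weights)


# Hypothetical vertical islands, each listing the weights of its member PEs.
islands = {
    "island0": [0.5, 0.25, 0.75],
    "island1": [0.25, 0.25, 0.25],
}
weights = {name: island_weight(w) for name, w in islands.items()}
preferred = max(weights, key=weights.get)   # island considered first for upscaling
print(weights, preferred)
```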
The granularity and depth of islands can essentially be altered in a deep stack to achieve the benefits of both the island and per-core approaches. Implementations of such a scheme would also need to consider thermal relationships between islands in order to control temperatures effectively. As a result, islands higher up in the stack could achieve better performance, while considering their thermal relationships would allow V-F levels to be effectively scaled down should thermal conditions on lower dice require it.
