The main contribution of this work is an analytical model for finding an upper bound on the temperature difference between locations on the die. The proposed model can be used in many applications, such as estimating the maximum temperature variations on the die and estimating the maximum placement error in temperature sensor placement algorithms. The model also identifies the conditions under which these maximum temperature variations occur, which is helpful for generating test data for thermal stress tests and for augmenting benchmarks. Experiments show that relying on simulation alone can underestimate maximum temperature differences by as much as 9°C. Based on this model, we also propose a temperature sensor placement algorithm that guarantees a maximum temperature error due to sensor placement. Because the model estimates point-to-point maximum temperature differences, it improves the efficiency and accuracy of sensor placement, reducing the number of thermal sensors needed by about 16% on average.
INTRODUCTION
High temperatures in new generations of VLSI circuits and embedded systems require that thermal considerations be taken into account during design, manufacturing and test [2]. High temperatures and temperature variations cause several problems: they degrade reliability, slow down devices, and increase resistances and leakage power 8. Spatial and temporal temperature variations result from functional and structural differences, uneven computational activity across the chip, and workload variations over time. Temperature variations as high as 50°C across the die of a modern microprocessor are reported in [2]. Temporal and spatial temperature variations may result in performance mismatches, which can in turn lead to performance or functional failures. For example, since resistance scales with temperature, a temperature difference between two regions of the die causes a resistance difference between those regions, which may result in skew problems in clock networks. Moreover, according to [2], more than 50% of all integrated circuit failures are related to thermal issues. The comprehensive framework proposed in [5] analyzes the effects of temperature on the reliability of multi-core systems. It is shown in [4] that spatial and temporal temperature gradients determine device reliability at moderate temperatures, and that resolving thermal hotspots alone is not adequate to achieve satisfactory reliability.
Temperature variations are also important in clock tree design and optimization [19]. An increase in temperature decreases carrier mobility and increases the resistance of interconnects, which degrades device performance and increases delays in the system. Temperature variations on the die can change the timing characteristics of wires and interconnect networks. Clock signals, which are the most timing-sensitive signals, are therefore particularly vulnerable to temperature variations on the die.
These issues make the analysis of temperature variations and gradients important. To estimate the magnitude of these variations, we need to know the maximum temperature difference between various points on the die. The maximum temperature difference under different workloads can be found by extensive simulations, but these incur significant overhead, whereas our method provides an upper bound on the temperature differences with practically no overhead. For systems that interact with other systems, even the same workload can result in completely different behavior, which makes simulation of a set of benchmarks even less reliable for determining such parameters. In this paper, we propose an analytical model for finding an upper bound on the temperature difference between various locations on the die. The model yields an absolute upper bound that does not depend on the workloads or on interactions between systems.
As an application of this model, we also present a temperature sensor placement technique. An efficient sensor placement technique can affect the cost of the system and the effectiveness of dynamic thermal management (DTM) techniques. To be effective, DTM techniques need to capture the temperature changes caused by variations in power consumption due to runtime workload dynamics. Many runtime DTM techniques require accurate real-time temperature information [6]. Temperature estimates that are lower or higher than the actual temperature may cause late or early activation of thermal management, which can result in degraded reliability or degraded performance [7]. One important cause of inaccurate temperature measurement is sensor placement error. Thermal sensors are often placed at locations other than the hotspots or other locations of interest, since these areas of the die are usually also areas where silicon real estate is at a premium.
Thus, there can be a disparity between the sensor reading and the actual temperature at the location of interest [7]. Adding more sensors to the die can resolve this problem to some extent, but at significant hardware overhead. An efficient sensor placement algorithm can increase the accuracy of temperature monitoring while reducing the hardware overhead. Our sensor placement technique uses our model to find the maximum temperature difference between the point of interest and potential sensor locations. This bounds the temperature difference between the point of interest and the sensor location and guarantees the desired accuracy.
The rest of the paper is organized as follows. Section 2 discusses the related work. Section 3 explains the details of the model. Section 4 explains the sensor placement technique based on this model. Section 5 presents the experimental results and Section 6 concludes the paper.
RELATED WORK
Because of the reliability, performance and cost issues caused by high temperatures and temperature variations, accurate estimation of temperature is becoming increasingly important. Extensive simulations are usually performed to estimate the maximum temperature variations. To the best of our knowledge, no model has been proposed to estimate the temperature difference between various locations on the die. The technique proposed in [17] addresses a special case of this problem: it models the temperature at distance d from a hotspot. It estimates the maximum temperature differential between a hotspot and another location based on their distance and on processor packaging information. The model assumes that the temperature of the hotspot decays exponentially with the distance from the hotspot. For a given maximum temperature error, a maximum distance from the hotspot is calculated, and points within this distance are taken to have a temperature difference to the hotspot within the desired accuracy. Selecting the activity factor parameter is not easy and depends on the application. The results are therefore not exact, and a pessimistic distance must be chosen to guarantee the maximum error. Moreover, the calculated maximum temperature difference to the hotspot depends only on the distance from the hotspot. In other words, this model assumes the same maximum temperature difference for all points at equal distance from the hotspot. This is not correct: for points at the same distance from the hotspot, the maximum temperature difference to the hotspot can be very different, for example because other power sources affect the temperature around the hotspot.
Figure 1 shows the contour map of the maximum temperature difference relative to a point of interest in the multi-processor SoC used in our experiments, which consists of 6 XScale® cores [11]. The maximum temperature differences around the region of interest are clearly not symmetric. Even when there is only one power source, the aforementioned assumption can be wrong, for example because of the location of this power source on the chip.
Several techniques have been proposed for on-chip placement of thermal sensors. These techniques are usually based on identifying the hotspots on the die and placing the sensors such that they appropriately cover these hotspots. A systematic technique for thermal sensor allocation and placement in microprocessors is introduced in [16]. This technique identifies an optimal physical location for each sensor such that the sensor's attraction towards steep thermal gradients is maximized. The problem with this approach is that it does not consider the accuracy of the sensors and does not guarantee a maximum error in the thermal sensor readings. The technique proposed in [18] determines the number and positions of the sensors required for thermal monitoring on an FPGA for an arbitrary design that uses distributed fine-grain reconfigurable logic. This technique relies on the model introduced in [17] to calculate the range of the hotspot, which is the maximum distance from the hotspot at which a sensor can be placed while still maintaining the intended accuracy. We propose the concept of an observability area instead of the range. The observability area is the area around a point of interest in which the maximum temperature difference to that point is always within the maximum tolerable error.
Because of the location of power sources and the effect of neighboring power sources, the observability area of a point of interest is usually not circular. In these cases, treating it as a circle, as done by some previous methods, may be incorrect. The next section describes the details of our model.
DESCRIPTION OF THE MODEL
Calculating the upper bound of the maximum temperature difference between various locations on the die is an important step in the design process, as it enables better placement of temperature sensors and helps with the evaluation of potential reliability issues. Our algorithm starts by evaluating the effect each power source has on the temperature variations. This can be done by simulation or by analytical methods. Following this step, the maximum temperature difference between pairs of locations is calculated by exploiting the LTI characteristics of the system.
Temperatures at different locations on the die depend on several factors, such as the power consumption of the functional units, the layout of the chip and the characteristics of the materials used in the chip. The differential equations that describe heat flow have a form similar to those that describe electrical current. This duality is the basis for the microarchitectural-level thermal model proposed in [7] and further explained in [8], [9] and 8. The thermal network generated by the model consists of thermal resistors and capacitors. Temperature can be modeled at the level of a functional block, or the die can be divided into regular grid cells, as shown in Figure 1, to obtain more fine-grained estimates. Each grid cell has a corresponding node in the thermal network whose voltage represents the cell's temperature, while the power consumption of each component is applied to the thermal network as a current source. Lumped values of the thermal Rs and Cs can be computed to represent the heat flow between the units and from each unit to the thermal package. The method for calculating the equivalent thermal Rs and Cs is described in [9]. Given the layout and the thermal characteristics of a chip, we divide it into a grid of r rows and c columns, as shown in Figure 1.
The proper size of the grid cells and the number of rows and columns of the grid can be determined by the method proposed in 8. Since the thermal resistors and capacitors are linear components, this thermal network can be considered a linear time-invariant (LTI) dynamic system. We exploit the LTI characteristics of this system as the basis for calculating an upper bound on the temperature difference. We first explain the idea for a simple single-input single-output system, and then extend it to thermal networks with multiple inputs and outputs. Representing the power input to the thermal circuit as p(t), the temperature output of the system f(t) can be represented as:
$$f(t) = \int_{-\infty}^{\infty} h(\tau)\, p(t-\tau)\, d\tau \tag{1}$$
where h(t) is the impulse response of the system. The maximum and minimum power consumed at each functional unit ($p^{Max}$ and $p^{Min}$, respectively) are known, and the power consumed at a functional unit is always non-negative:
$$0 \le p^{Min} \le p(t) \le p^{Max} \tag{2}$$
Let $H^+$ and $H^-$ be the sets of intervals on which the impulse response h(t) takes non-negative and negative values, respectively. Equation (1) can then be rewritten as:
$$f(t) = \int_{H^+} h(\tau)\, p(t-\tau)\, d\tau + \int_{H^-} h(\tau)\, p(t-\tau)\, d\tau \tag{3}$$
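Based on (2), we can write:
$$\begin{aligned} p^{Min}\int_{H^+} h(\tau)\, d\tau &\le \int_{H^+} h(\tau)\, p(t-\tau)\, d\tau \le p^{Max}\int_{H^+} h(\tau)\, d\tau \\ p^{Max}\int_{H^-} h(\tau)\, d\tau &\le \int_{H^-} h(\tau)\, p(t-\tau)\, d\tau \le p^{Min}\int_{H^-} h(\tau)\, d\tau \end{aligned} \tag{4}$$
This results in:
$$p^{Min}\int_{H^+} h(\tau)\, d\tau + p^{Max}\int_{H^-} h(\tau)\, d\tau \le f(t) \le p^{Max}\int_{H^+} h(\tau)\, d\tau + p^{Min}\int_{H^-} h(\tau)\, d\tau \tag{5}$$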
Equation (5) provides an upper bound on the value of the output f of a single-input single-output system based on its impulse response. Now suppose a system with m power sources, represented as $p_1(t), \ldots, p_m(t)$, where $p_i^{Max}$ and $p_i^{Min}$ are the maximum and minimum input values (the maximum and minimum power consumption of the corresponding functional unit). Considering a single output, with $h_i(t)$ the response of the output to an impulse on input i, the LTI characteristics of the system imply:
$$f(t) = \sum_{i=1}^{m} \int_{-\infty}^{\infty} h_i(\tau)\, p_i(t-\tau)\, d\tau \tag{6}$$
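Therefore, with $H_i^+$ and $H_i^-$ denoting the sets of intervals on which $h_i(t)$ is non-negative and negative, the minimum and maximum values of the function satisfy:
$$\sum_{i=1}^{m}\left(p_i^{Min}\int_{H_i^+} h_i(\tau)\, d\tau + p_i^{Max}\int_{H_i^-} h_i(\tau)\, d\tau\right) \le f(t) \le \sum_{i=1}^{m}\left(p_i^{Max}\int_{H_i^+} h_i(\tau)\, d\tau + p_i^{Min}\int_{H_i^-} h_i(\tau)\, d\tau\right) \tag{7}$$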
This can be extended to systems with multiple outputs as well, since equation (7) holds for each output of the system. Our model for calculating the upper bound on the temperature difference between two points on the chip is based on equations (5) and (7). The function f is defined as the difference between the temperatures of the two locations of interest on the die. The temperature difference between points a and b is denoted Td(a,b). The impulse response of this function is obtained as shown in Figure 2.
I. $h_{(a,b),i}(t) = h_{a,i}(t) - h_{b,i}(t)$
II. Calculate $Td^{Min}$ and $Td^{Max}$ using equations (5) and (7)
Figure 2. Algorithm for calculating the maximum temperature difference between two points a and b
The simulations of step I need to be done only once, and their results can then be reused for all pairs of interest. Step II need not be carried out for all pairs of grid cells, but only for the pairs that are of interest.
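The two steps of Figure 2 can be sketched in discrete time as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `pulse_response` and `td_bounds` are hypothetical, each step response is assumed to be a uniformly sampled temperature rise for a unit power step starting from zero, the responses are assumed long enough to have settled, and power is assumed constant over each sample interval, so the integrals in equations (5) and (7) become sums over the differenced step response.

```python
def pulse_response(step_response):
    """Difference consecutive samples of a sampled step response.

    Assumes the response starts from zero before the first sample.
    """
    prev = 0.0
    out = []
    for s in step_response:
        out.append(s - prev)
        prev = s
    return out


def td_bounds(step_a, step_b, p_min, p_max):
    """Bounds (td_min, td_max) on the temperature difference T(a) - T(b).

    step_a[i], step_b[i]: sampled step responses at points a and b to a
    unit power step on source i (hypothetical input layout).
    p_min[i], p_max[i]: power limits of source i.
    """
    td_min = 0.0
    td_max = 0.0
    for sa, sb, lo, hi in zip(step_a, step_b, p_min, p_max):
        # Step I: response of the difference, h_(a,b),i = h_a,i - h_b,i
        h = pulse_response([x - y for x, y in zip(sa, sb)])
        pos = sum(v for v in h if v >= 0)  # contribution of H+
        neg = sum(v for v in h if v < 0)   # contribution of H-
        # Step II: discrete analogue of the per-source bound
        td_max += hi * pos + lo * neg
        td_min += lo * pos + hi * neg
    return td_min, td_max
```

In this sketch the step responses are computed once per power source and can be reused for every pair (a, b); only the loop inside `td_bounds` is repeated per pair.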
Here we show how to generate the power trace that leads to the maximum temperature difference. Based on (3), to maximize the value of f(t) at $t_0$, $p(t_0-\tau)$ must take its maximum value on the intervals of τ where h(τ) ≥ 0. For example, if h(τ) is non-negative on the interval $t_1 < \tau < t_2$, then p(t) must take its maximum value on the interval $t_0 - t_2 < t < t_0 - t_1$. Similarly, it can easily be shown that $p(t_0-\tau)$ must take its minimum value on the intervals of τ where h(τ) < 0. These rules allow us to generate the power trace p(t) that leads to the maximum temperature difference. Doing this for all power sources lets us detect the configurations and scenarios that lead to the maximum temperature variations between different points on the die. This information is also helpful for augmenting benchmarks and generating test data for stress tests. The next section explains the sensor placement method, which uses our model to guarantee the desired accuracy of the sensor temperature readings.
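The trace construction described above can be illustrated in discrete time. The sketch below is an assumption-laden example, not the authors' code: `worst_case_trace` and `output_at` are hypothetical names, `h` is taken to be the sampled pulse response of one power source, and the target time $t_0$ is taken to be the last sample of the trace.

```python
def worst_case_trace(h, p_min, p_max):
    """Power trace p[0..n-1] that maximizes the output at sample n-1.

    Sample t of the trace is weighted by h[n-1-t] in the convolution,
    so it is set to p_max where that weight is non-negative and to
    p_min where it is negative.
    """
    n = len(h)
    return [p_max if h[n - 1 - t] >= 0 else p_min for t in range(n)]


def output_at(h, p, t0):
    """Discrete convolution sum f[t0] = sum_k h[k] * p[t0 - k]."""
    return sum(h[k] * p[t0 - k] for k in range(t0 + 1))
```

For the pulse response `[1.0, 2.0, -1.0]` with power limits 0 and 2, the generated trace is `[0.0, 2.0, 2.0]`, and the output at the last sample equals the bound 2·(1+2) + 0·(−1) = 6.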
THERMAL SENSOR PLACEMENT
As explained earlier, thermal sensors usually cannot be placed exactly at the locations they are supposed to monitor. This causes a disparity between the temperature at the point of interest and the temperature read at the sensor, known as the sensor placement error. The objective of a thermal sensor placement technique is to find the minimum number of sensors and their locations such that the placement errors at all points of interest are always less than the required accuracies. We introduced the concept of the observability area earlier: it is the area around a point of interest a in which the maximum temperature difference to a is always less than the maximum tolerable error. To find the observability area, therefore, the maximum temperature difference between each point of interest and its neighboring grid cells must be found. The observable set of a point of interest is the set of grid cells that fall completely within its observability area.
The input to the sensor placement technique is a set of points of interest along with their desired accuracies. As shown in Figure 6, the chip is divided into a grid. First, the maximum temperature differences between each point of interest and its neighboring grid cells are calculated, which identifies the observable set of that point. If some grid cells cannot be used as sensor locations, for example because of routing limitations, those cells are removed from the observable set. Then, given the observable sets of the points of interest, a minimum set of grid cells is found such that, if sensors are placed in those cells, each point of interest has at least one sensor in its observable set.
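As a small illustration of how an observable set could be derived from the model's bounds, consider the sketch below. The function name `observable_set`, the dictionary layout, and the use of the larger magnitude of the two directional bounds are assumptions made for this example only.

```python
def observable_set(bounds, tolerance, blocked=frozenset()):
    """Grid cells whose worst-case difference to the point of interest
    stays below the tolerable error.

    bounds: dict mapping a grid cell to (td_min, td_max), the model's
            bounds on the temperature difference to the point of interest.
    blocked: cells that cannot host a sensor (e.g. routing limitations).
    """
    return {
        cell
        for cell, (lo, hi) in bounds.items()
        # Assumption: the error can go in either direction, so the
        # larger magnitude of the two bounds is compared to the limit.
        if max(abs(lo), abs(hi)) < tolerance and cell not in blocked
    }
```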
Given the observable set of each point of interest, the next step is to find the minimum number of sensors and their locations such that there is at least one sensor in the observable set of each point of interest. Let G be the set of all grid cells that are potential sensor locations, and let C be the collection of the k observable sets $O_1, \ldots, O_k$, all of which are subsets of G. A minimum-cardinality set S of grid cells must be found such that S contains at least one grid cell from the observable set of each point of interest. This guarantees that the accuracy requirements are satisfied, since at any grid cell in the observable set the temperature of the point of interest can be sensed with the desired accuracy. This is the minimum hitting set problem, which is NP-hard (its decision version is NP-complete). Different heuristic algorithms exist for this problem. We use integer linear programming (ILP) to solve the minimum hitting set problem and thereby minimize the number of sensors. Let $x_a = 1$ if a sensor is to be placed at grid cell a, and $x_a = 0$ otherwise. To minimize the number of sensors, the following cost function must be minimized:
$$\sum_{a \in G} x_a \tag{8}$$
Since each observable set must contain at least one sensor for the corresponding hotspot to be covered, the following inequality must hold for each point of interest j:
$$\sum_{a \in O_j} x_a \ge 1 \tag{9}$$
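Based on the above, the sensor minimization problem can be formulated as an ILP problem as follows:
$$\begin{aligned} \text{minimize} \quad & \sum_{a \in G} x_a \\ \text{subject to} \quad & \sum_{a \in O_j} x_a \ge 1, \quad j = 1, \ldots, k \\ & x_a \in \{0, 1\}, \quad a \in G \end{aligned} \tag{10}$$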
For each grid cell a with $x_a = 1$, a sensor is placed in that cell. To solve the ILP problem, we use lp_solve [20], a freely available integer linear programming solver based on the revised simplex method and on the branch-and-bound method for the integers.
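To make the formulation concrete, the sketch below solves small instances of the same minimum hitting set problem by exhaustive search. It is a stand-in for illustration only and does not use lp_solve or any ILP solver; the function name `min_hitting_set` is hypothetical, and the search is exponential in the number of candidate cells, so it is only suitable for tiny examples.

```python
from itertools import combinations


def min_hitting_set(observable_sets):
    """Smallest set of cells containing at least one cell from every
    observable set, or None if some observable set is empty."""
    if not observable_sets:
        return set()
    cells = sorted(set().union(*observable_sets))
    # Try candidate sets in order of increasing size, so the first
    # feasible one found has minimum cardinality.
    for size in range(1, len(cells) + 1):
        for candidate in combinations(cells, size):
            chosen = set(candidate)
            if all(chosen & o for o in observable_sets):
                return chosen
    return None
```

For example, two points of interest whose observable sets share a cell, such as `{1, 2}` and `{2, 3}`, are covered by the single sensor `{2}`, while a third point with observable set `{4}` forces a second sensor.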
EXPERIMENTAL RESULTS
To verify the effectiveness of the proposed technique, we applied it to a multi-processor SoC comprising 6 XScale® cores [11]. The layout of the chip is shown in Figure 1. MiBench Ver 1.0 [12] programs are used as benchmarks for evaluating the technique. MiBench is a free, commercially representative benchmark suite developed at the University of Michigan [13]. A set of programs from the automotive/industrial, network and telecommunications categories of MiBench is selected and run on the datasets provided by [14]. These programs were used as the workload arriving at each core. A Pareto distribution is used to introduce idle intervals between the tasks [15]. A timeout-based dynamic power management policy is applied to determine the active and low-power states that each core experiences while running these workloads. We use the power values for the active and sleep states reported in [11] to generate the power trace. HotSpot 3.0 [10] is used for thermal simulations. The package parameters are: convection capacitance 140.4 J/K, convection resistance 0.1 K/W, spreader thickness $10^{-3}$ m, and initial temperature 333 K.
The next three figures provide a real example of a case where benchmarks may not be able to generate the maximum temperature differences on the die, since doing so may require very specific conditions. Figure 3 shows a simulation slice in which the temperature difference between points a and b reaches its highest value. The dotted line shows the maximum temperature difference estimated by our model. Although the temperature difference reaches its maximum value during the simulations, it is still lower than the maximum estimated by our model. $h_{(a,b),i}(t)$ is calculated by differentiating the response of the temperature difference to a step on input i (since the simulation is in discrete time, this is done by differencing consecutive samples of the step response).
To keep the example easy to follow, only three power sources are considered and the remaining power sources are off. Based on $h_{(a,b),i}(t)$, the power traces that generate the maximum temperature differences are calculated as explained in Section 3. Applying each of these power traces leads to the corresponding temperature differences shown in the third row. The overall temperature difference due to all power sources, which reaches its maximum at time 0.9 s (time unit 9000), is shown in Figure 5. As this example shows, the maximum temperature difference occurs under very specific conditions that may be difficult to observe with standard benchmarks but can occur in real working conditions. This underlines the need for a model such as ours, which provides a provable upper bound on the temperature difference on the die. The difference between the simulations and the model in this example is not large, because of the low power consumption of the XScale® cores and because we considered only 3 cores with observation points close to each other. The error can be much more significant in modern processors with higher-powered devices.
Depending on which of the two points has the higher temperature at a given time instant, the temperature difference between them can be positive or negative. The model provides the upper bound in both the positive and negative directions. Table 1 shows that errors as high as 9°C occur in estimates of temperature differences when relying only on simulations. Even using combinations of different benchmarks does not resolve the problem. Such errors can cause significant functional and reliability issues. For example, if the maximum temperature difference between a sensor and a hotspot is underestimated, DTM may be activated late, which may result in serious reliability problems. The simulation overhead of our method is minimal.
The largest overhead occurs if simulation is used instead of analytical techniques to calculate the impulse responses. Even then, it involves simulating only one step response for each power source, compared to simulating the whole set of benchmarks when standard simulation is used.
We also compare our sensor placement with previous techniques. Previous techniques such as [17] and [18] depend on calculating the range of the hotspot, which is the maximum distance r from the hotspot at which a sensor can be placed while still maintaining the intended accuracy. This range is circular and centered at the point of interest, which limits the accuracy and the efficiency of such techniques. To guarantee a desired accuracy, this circle must be centered at the point of interest and lie completely within the observability area of the point of interest for the same accuracy. Some points that meet the accuracy requirements might therefore be missed, as demonstrated on the SoC with 6 XScale® cores shown in Figure 6. The two red x's represent the points of interest to be monitored. The observability areas of the points of interest are shown by solid green and blue lines. The circular ranges of the hotspots are shown by dotted circles. Our sensor placement technique considers the observability area of a point of interest, instead of its circular range, as the set of potential sensor locations. It can therefore identify the point marked by * in the overlapping part of the observability areas and place a single sensor there to monitor both points of interest. With circular ranges, this sensor location would not be identified because the ranges do not overlap, and two separate sensors would be required. To evaluate our sensor placement technique, we used different values for the desired accuracies of the points of interest and compared our technique with techniques such as [17] that use circular ranges.
Table 2 shows the number of sensors required to monitor 8 hotspots of the chip with specified accuracy which is the maximum tolerable error between the temperature sensor and the actual temperature at the point of interest. As Table 2 shows, our sensor placement technique needs fewer sensors to monitor the same hotspots with the same accuracy. 
