Abstract-Temperature has become an important issue in [...] been solved using efficient heuristics that can be applied to large scale applications.

Due to increasing demands on performance, embedded applications are frequently implemented on multiprocessor systems on chip (MPSoC). Very often they are required to satisfy strict timing constraints and to function with a limited energy budget. One of the preferred approaches for reducing the overall energy consumption is dynamic voltage selection (DVS). This technique exploits the available slack times by reducing the voltage and frequency at which the processors operate and, thus, achieves energy efficiency. In the context of system-level design, energy optimization should be achieved by jointly conducting task mapping, scheduling and DVS in the design flow. Some proposed algorithms [16], [21] do mapping, scheduling and DVS in successive steps and sequentially conduct each step in every iteration of the optimization loop, while others, in contrast, make simultaneous decisions for mapping, scheduling and DVS [8], [14].

The high power densities achieved in current systems on chip (SoCs) not only result in huge energy consumption but also lead to increased chip temperatures. High temperatures can impact reliability as well as cooling and package cost. Based on the development of temperature modeling and analysis tools, e.g. [5], [20], several temperature-aware system level design approaches have emerged. Wang et al. [18] proposed an approach to task scheduling under peak temperature constraints. Design space exploration for MPSoC architectures under area and thermal constraints is presented by Li et al. [9], while in [15] [...]

[...] our system and application model as well as the temperature-aware voltage selection technique. A motivational example is then given in section III. The optimization technique is described in section IV and the experimental results are presented in section V. Finally, conclusions are drawn in section VI.

II. PRELIMINARIES

A. System Model

We consider systems realized as bus based multi-processor architectures on chip. We assume that the processors can operate in several discrete execution modes. An execution mode is characterized by a pair of supply and body bias voltages: (Vdd, Vbs). Each execution mode has an associated frequency and power consumption (dynamic and leakage).

The functionality of the application is captured as a set of task graphs. In a task graph G(Π, Γ), nodes τ ∈ Π represent computational tasks, while edges γ ∈ Γ indicate data dependencies between tasks (communication). For each task, the worst case number of cycles to be executed is given. Each task is also annotated with a deadline that has to be met at run-time.

B. Temperature-Aware DVS

In [1] we have presented an approach to combined supply voltage selection and adaptive body biasing. Given a multiprocessor architecture and a mapped and scheduled application, the DVS algorithm in [1] [...] [2]. This information will be used for calculating the dynamic energy consumed by the task in a certain execution mode according to the energy model presented in [1]. Similarly, in [1] the equations are presented which are used to calculate leakage energy during the optimization process. However, since leakage strongly depends on temperature, an obvious question is which temperature to use for leakage calculations. Ideally, it should be the temperature at which the chip will work when executing the application.

[...] known for each task) the dynamic power profiles are calculated, the thermal analysis is performed, and the temperature is determined for each core in steady state. This new temperature is now used again for voltage selection and the process is repeated until the temperature converges. Convergence means that the actual temperature values used at voltage selection correspond to the temperature at which the chip will function. As shown in [2], in most of the cases, convergence is reached in less than 5 iterations.

The above DVS technique assumes that the task graph is already mapped. If this mapping, however, is performed before the DVS step, it has to be based on an approach which ignores the temperature at which cores are running. As we will show in the next section, this can result in significant energy losses. Therefore, in the rest of this paper, we will present and [...]

III. MOTIVATIONAL EXAMPLE

Let us consider the two mapping alternatives in Fig. 3. For simplicity, in this example we do not consider the energy consumed for communication; this energy, however, is calculated in our mapping optimization flow. For each of the two mapping alternatives, the following steps are performed:

1) Construct a task schedule which determines the order of execution.
2) Run the temperature unaware DVS algorithm (the one described in [1]) which will generate the voltage levels for each task. As discussed in section II-B, the DVS algorithm considers, for energy calculation, an "assumed" temperature provided by the designer (in our case 90°C).
3) Calculate the consumed energy for each task and for the whole system, considering that the application runs at the assumed temperature. This energy (denoted E1 in Table I) differs, of course, from the actual energy dissipated by the application, but it constitutes the best approximation that can be obtained without a temperature aware technique.
4) Using the voltage obtained at step 2, perform the thermal analysis of the system and determine the temperature for each core in steady state.
5) Using the temperature obtained in step 4, calculate the energy consumed by each task and by the whole system. This energy (denoted E2 in [...]

[...] However, if we consider the real energy consumption, E2, it turns out that, in reality, Mapping2 is more energy efficient and using that mapping, instead of Mapping1, reduces the energy consumption by 22% (0.0263J instead of 0.0338J). What this example shows is that temperature has to be taken into consideration when deciding on an energy efficient mapping of an MPSoC application. Such a temperature aware mapping technique is presented in the following section.

IV. MAPPING HEURISTIC

Our temperature aware mapping approach is based on a genetic algorithm (GA). By imitating and applying the principles of natural selection and "survival of the fittest" on a population pool (consisting of several solution candidates), GAs are able to gradually improve the quality of solutions and to evolve towards a close-to-optimal result [7]. Each solution candidate (individual) is encoded as a string (chromosome) and is associated with a solution quality (fitness). Based on their fitness, the individuals are ranked within the selection pool. In each iteration, the highest ranked individuals are selected for reproduction by mating (crossover) with other individuals of the population. The produced offspring replace the lowest ranked solutions in the population. Occasionally, new individuals are also generated by mutation. A mutation is realized by randomly changing the value of certain genes of a chromosome. Fig. 4 illustrates the overall flow of our temperature-aware mapping approach. [...]

In order to compute the energy, the application is scheduled, after which our temperature-aware voltage selection presented in [2] is applied, which calculates voltage levels for each task such that the total energy consumption is minimized (section II-B). Based on the calculated voltages and actual temperatures, the total consumed energy is obtained.

Based on the energy (fitness) value, the individuals for mating are selected using a roulette wheel technique [7]. According to this selection rule, high fitness individuals have a high probability to mate. For mating, multi-point (2 to 8 points) crossover is conducted (Fig. 6). The value of crossover [...]

Fig. 4. Temperature aware mapping: a) Overall Flow, b) Energy Evaluation Flow

V. EXPERIMENTAL RESULTS

Given a certain application we run the temperature-aware task mapping (later referred to as TaTM), as described in section IV (Fig. 4), and obtain the optimized solution corresponding to an energy consumption Eta (temperature aware). For the same application we run a task mapping optimization ignoring temperature (temperature unaware task mapping, later referred to as TnaTM), which is realized by running the same mapping optimization as described in section IV (Fig. 4), with the difference that a temperature unaware voltage selection is
used inside the exploration loop. As discussed in the previous section, this means that energy calculations are based on an "assumed temperature" instead of the actual temperature at which the cores run and, thus, the TnaTM is less efficient. Considering the mapping solution and the voltage levels produced by the TnaTM approach, we run the temperature analysis to obtain the real temperature at which the application will run and, finally, we calculate the consumed energy Enta (not temperature aware). By comparing Eta with Enta we can appreciate the efficiency of using a temperature aware mapping scheme.

Given a certain application, we define the energy efficiency factor G of the TaTM, compared to the TnaTM, as G = (Enta - Eta) / Enta * 100%.

Experimental results presented in this section are aimed at exploring the efficiency of being temperature-aware for task mapping, compared to previous mapping algorithms which ignore the temperature issue. For our experiments we have used both randomly generated applications and real-life examples.

We have randomly generated applications consisting of 60 to 400 tasks. The applications are mapped on an MPSoC architecture consisting of 9 identical cores. The cores run at 10 different supply voltage levels in the range [0.6V, 1.8V]. The temperature model related coefficients are the same as in [2], while the power models and associated parameters are the same as in [2], [12], [11]. The total workload of an application (later referred to as TW) is randomly generated in the interval [10^7, 9 * 10^7] cycles. The size of individual tasks is in the interval [10^3, 10^6] cycles.

We have generated three sets consisting of 50 applications each. For the first set, S1, the applications are such that 75% of the TW is realized by the tasks with sizes in the interval [10^4, 10^5] cycles, while the rest of the TW is realized by the tasks with sizes in the interval [10^3, 10^4] or [10^5, 10^6] cycles. In the second set, S2, task sizes are distributed over [...]

Fig. 8 shows the energy efficiency factor obtained for the three application sets. For each set the average and maximum value of G is indicated. It can be observed that, in the case of applications in which task sizes are very similar (set S1), the energy efficiency factor is smaller. This means that the potential gain of applying a temperature aware approach is larger for the applications with an uneven distribution of task sizes (set S3).
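As a small illustration of the energy efficiency factor G defined above, the following Python sketch evaluates G = (Enta - Eta) / Enta * 100%. The function name and argument names are hypothetical. The sample inputs are borrowed from the motivational example (0.0338J and 0.0263J) purely as illustrative numbers; they are not Enta and Eta values reported for any experiment.

```python
def energy_efficiency_factor(e_nta: float, e_ta: float) -> float:
    """Return G = (Enta - Eta) / Enta * 100, in percent.

    e_nta: energy of the temperature unaware mapping (TnaTM), in joules.
    e_ta:  energy of the temperature aware mapping (TaTM), in joules.
    """
    if e_nta <= 0:
        raise ValueError("e_nta must be positive")
    return (e_nta - e_ta) / e_nta * 100.0


if __name__ == "__main__":
    # Illustrative inputs only, taken from the motivational example.
    g = energy_efficiency_factor(e_nta=0.0338, e_ta=0.0263)
    print(f"G = {g:.1f}%")  # prints "G = 22.2%"
```

A positive G means the temperature aware mapping consumes less energy; G = 0 means both approaches produce solutions with the same energy.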
The explanation for the above phenomenon has its roots in the exponential dependence of leakage current on temperature. As a result, cores running at high temperature will dissipate a disproportionately high amount of energy. Therefore, a solution [...] S3). The TaTM approach will actively seek balanced solutions since those will produce the lowest energy. The TnaTM approach, however, which ignores the dependence of consumed energy on temperature, can end up with an unbalanced solution that, according to the energy calculation at the "assumed temperature", consumes less energy and also satisfies the time constraints. While in the case of very similar task sizes it is possible that the solution produced by the TnaTM is relatively balanced, this is much less likely in the case of applications [...]

Fig. 9. Relationship Between SD_Tna and G

[...] Fig. 8 and Fig. 9. Obviously, the further the designer's "temperature guess" is from reality, the larger Enta, the energy consumption produced by the TnaTM approach, becomes.

In Fig. 10 we show the average energy efficiency factor for the applications in set S2, for different "temperature guesses". If the temperature guess of the designer corresponds to the average temperature, as considered in our previous experiments (temperature difference = 0), the average factor G is identical with the one in Fig. 8 [...]

[...] TaTM approach as a function of the application size. As can be observed, even very large applications can be handled in reasonable amounts of time.
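The argument that balanced solutions dissipate less leakage energy follows from the exponential (and therefore convex) dependence of leakage on temperature. The Python sketch below illustrates this with a toy model. The functional form p_ref * exp(k * (T - t_ref)) and all coefficients are assumptions chosen for illustration; they are not the power or thermal models of [2], and the core temperatures are made up.

```python
import math


def leakage_power(temp_c: float, p_ref: float = 1.0,
                  t_ref: float = 60.0, k: float = 0.02) -> float:
    """Toy leakage model (hypothetical): p_ref * exp(k * (T - t_ref))."""
    return p_ref * math.exp(k * (temp_c - t_ref))


def total_leakage(core_temps) -> float:
    """Sum the toy leakage power over all cores."""
    return sum(leakage_power(t) for t in core_temps)


if __name__ == "__main__":
    # Two made-up temperature profiles with the same mean (80 degrees C).
    balanced = [80.0, 80.0, 80.0]
    unbalanced = [60.0, 80.0, 100.0]

    print(f"balanced:   {total_leakage(balanced):.3f}")
    print(f"unbalanced: {total_leakage(unbalanced):.3f}")
    # Because exp is convex, the unbalanced profile leaks more
    # even though the average temperature is identical.
    print(total_leakage(balanced) < total_leakage(unbalanced))  # True
```

In a real flow the temperatures themselves depend on the mapping and on the selected voltages, so this sketch only shows why, for a fixed average temperature, a larger spread of core temperatures increases total leakage.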
Fig. 9 shows the average energy efficiency factor G, for the same three sets of applications as in Fig. 8, as a function of SD_Tna. As we can see, the more unbalanced the core temperatures produced by the TnaTM are, the larger the gain [...]

We have investigated the efficiency of temperature aware task mapping using two real-life examples: a GSM voice codec and a multimedia MPEG4 audio-video encoder. Details regarding the two applications can be found in [17] and [13], respectively. The GSM voice codec is composed of an encoder
