Abstract
Introduction
Reducing the energy dissipation of embedded systems has emerged as a major task in system design. Portable devices (e.g. cellular phones, notebooks, PDAs) draw energy from batteries with limited capacity, which determines the operating time between recharges. Extending these intervals increases the product's value to the customer, an important purchasing factor. Environmental concerns such as global warming are also increasingly driving the design of low-power devices, including general-purpose computers and automotive embedded systems. To reduce cost and power, embedded systems are increasingly implemented on a single chip, called a system-on-a-chip (SOC). This has become possible through the development of powerful CAD tools and dramatically reduced feature sizes, which today allow more than 100 million transistors on a single die [1]. To cope with such complexity, designers adopt a core-based system design methodology in which the system is composed of Intellectual Property (IP) cores (microcontrollers, ASICs, DSPs, etc.), also called processing elements (PEs).
Recent advances in dc-dc converters [2] have made it possible to vary the supply voltage of computational PEs dynamically during system operation, so that performance can be traded off against power dissipation [3]. Variable voltage scheduling has received considerable attention from the research community over the last few years [4, 5, 6, 7, 8, 9]. Most of these publications [4, 5, 6, 8] consider the power dissipation (switching activity) of the PEs to be constant for each operation, which is not the case for many modern VLSI circuits [10] that use, e.g., gated clocks. In [7] the general variable voltage scheduling problem, allowing variations of power, was formulated as an integer linear programming (ILP) problem under the constraint that the possible supply voltages are fixed a priori. This formulation becomes impractical for two reasons. First, the ever-increasing demand for functionality increases the number of tasks describing the behaviour of the system; second, it has become possible to implement dc-dc converters on the same chip, which makes it necessary to consider nearly continuous variation of the supply voltage.
In this paper we investigate combining power variations among tasks [7] with supply voltages that are not fixed a priori [8] in order to minimise the energy consumption of a processing element.
Preliminaries
This section gives the background of variable voltage scheduling and illustrates the motivation for the proposed method on an example.
System model
The system model considers the behaviour as a set of independently executing tasks [11]. All tasks have the same period; otherwise the least common multiple (LCM) theorem is applied [9]. The functionality has to be mapped (allocation and scheduling) onto the PEs in such a way that the period (deadline) is met and the overall PE energy consumption is minimised. In our work we assume that partitioning [12] and scheduling [11] have already been performed. Each task is associated with a worst-case execution time and an average power dissipation [13]. The average power is sufficient since no preemption is allowed.
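The task model above can be captured in a few lines. The sketch below is only an illustration: it assumes integer task periods, uses hypothetical field names (`period`, `wcet`, `avg_power`), and reads the LCM step as unrolling all tasks over one common period (the hyperperiod), which is one common interpretation rather than necessarily the exact procedure of [9].

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One task of the system model (field names are illustrative)."""
    name: str
    period: int       # activation period in integer time units
    wcet: float       # worst-case execution time at V_max [s]
    avg_power: float  # average power dissipation at V_max [W]


def hyperperiod(tasks):
    """Least common multiple of all task periods (math.lcm with many args needs Python 3.9+)."""
    return math.lcm(*(t.period for t in tasks))


def instances_per_hyperperiod(tasks):
    """How many times each task is released within one hyperperiod."""
    h = hyperperiod(tasks)
    return {t.name: h // t.period for t in tasks}


tasks = [Task("A", 4, 1.0, 2.0), Task("B", 6, 2.0, 1.5), Task("C", 10, 1.5, 3.0)]
print(hyperperiod(tasks), instances_per_hyperperiod(tasks))  # prints 60 {'A': 15, 'B': 10, 'C': 6}
```

Storing a single average power per task (rather than a power trace) matches the no-preemption argument in the text: a task always runs to completion at one voltage.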
Power model
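Power dissipation P in a digital circuit [14] is caused by dynamic power P_dyn and static power P_static:

$$P = P_{dyn} + P_{static} = C \, N_{0\to1} \, V_{dd}^2 \, f + (I_{sc} + I_{leak} + I_{st}) \, V_{dd} \tag{1}$$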
where the dynamic power depends on the load capacitance C, the 0→1 switching activity N_{0→1}, the supply voltage V_dd and the frequency f at which the system runs. Static power is caused by the short-circuit current I_sc, the leakage current I_leak and the static current I_st, and is often considered negligible compared to the dynamic power. From here on only dynamic power is considered. Reducing V_dd clearly lowers the power consumption, but unfortunately it increases the circuit delay d, which is inversely proportional to the operating frequency:

$$d = k \, \frac{V_{dd}}{(V_{dd} - V_t)^2} \tag{2}$$
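where k is a constant and V_t is the threshold voltage below which the circuit does not operate. The system performance S is the ratio of the circuit delay at maximum supply voltage, d(V_max), to the circuit delay at supply voltage V_dd, d(V_dd). System performance and power as functions of the supply voltage V_dd are given by [8]:

$$S(V_{dd}) = \frac{d(V_{max})}{d(V_{dd})} = \frac{V_{max}}{V_{dd}} \left(\frac{V_{dd} - V_t}{V_{max} - V_t}\right)^2 \tag{3}$$

$$P(V_{dd}) = P(V_{max}) \, S(V_{dd}) \left(\frac{V_{dd}}{V_{max}}\right)^2 = P(V_{max}) \, \frac{V_{dd}}{V_{max}} \left(\frac{V_{dd} - V_t}{V_{max} - V_t}\right)^2 \tag{4}$$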
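From Equations (3) and (4), and since the execution time scales as t(V_dd) = t(V_max)/S, the energy vs. supply voltage relationship E(V_dd) and the energy vs. performance relationship E(S) follow as:

$$E(V_{dd}) = P(V_{dd}) \, t(V_{dd}) = P(V_{max}) \, t(V_{max}) \left(\frac{V_{dd}}{V_{max}}\right)^2 \tag{5}$$

$$E(S) = \frac{P(V_{max}) \, t(V_{max})}{V_{max}^2} \left(V_t + \frac{S V_0}{2} + \sqrt{S V_0 V_t + \left(\frac{S V_0}{2}\right)^2}\right)^2 \tag{6}$$

where P(V_max) and t(V_max) are the power dissipation and execution time at maximum supply voltage, respectively, and V_0 = (V_max − V_t)^2 / V_max. Figure 1 shows the normalised energy vs. supply voltage curve; it shows that it is always better to run a single task at a lower voltage. Note that Equation (6) differs from the one in [3], since we consider the use of gated clocks, which shut down unused resources.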
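To make the voltage/performance/energy relations concrete, here is a minimal sketch. It assumes the delay model d = k·V_dd/(V_dd − V_t)^2, the performance ratio S = (V_max/V_dd)·((V_dd − V_t)/(V_max − V_t))^2 and the energy relation E(V_dd) = P(V_max)·t(V_max)·(V_dd/V_max)^2, with V_max = 3.3 V and V_t = 0.8 V taken from the motivational example. The function names are hypothetical. `supply_voltage` inverts S(V_dd) by taking the root of the resulting quadratic that lies above V_t.

```python
import math

V_MAX, V_T = 3.3, 0.8  # [V], values used in the motivational example


def performance(v_dd):
    """S(V_dd) = d(V_max)/d(V_dd), assuming d = k*V_dd/(V_dd - V_t)**2."""
    return (V_MAX / v_dd) * ((v_dd - V_T) / (V_MAX - V_T)) ** 2


def supply_voltage(s):
    """Inverse of performance(): the root V_dd > V_t of S(V_dd) = s, for 0 < s <= 1."""
    v0 = (V_MAX - V_T) ** 2 / V_MAX
    a = s * v0
    return V_T + a / 2 + math.sqrt(a * V_T + (a / 2) ** 2)


def energy(e_vmax, v_dd):
    """E(V_dd) = P(V_max) * t(V_max) * (V_dd/V_max)**2, given E(V_max) = P*t."""
    return e_vmax * (v_dd / V_MAX) ** 2


# Stretch the 19 s workload of the example to the 22 s deadline with one voltage.
v = supply_voltage(19.0 / 22.0)
print(f"V_dd = {v:.3f} V")
print(f"fixed power:   {energy(95.0, v):.2f} J")  # 5 W * 19 s at V_max
print(f"varying power: {energy(53.0, v):.2f} J")  # 15 + 20 + 18 J at V_max
```

Under these assumed formulas, the two energies come out within a few hundredths of a joule of the 79.79 J and 44.52 J quoted in the next subsection. The exact root for V_dd lies slightly below the 3.03 V stated there.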
Motivational example
To show the main idea of the proposed technique, consider the following example. Three sequentially executing tasks, T1, T3 and T5, must run on a single processing element and require 3, 10 and 6 s, respectively, as listed in Table 1 and Figure 2. Assume that V_max of the PE is 3.3 V. The system time constraint requires that all tasks finish by the deadline t_d = 22 s. The first part of the example considers a processor model whose power dissipation is fixed and independent of the computation carried out, so the processor constantly dissipates 5 W. Since energy is the product of power and execution time, the total energy consumed by the processor is 5 W · 3 s + 5 W · 10 s + 5 W · 6 s = 95 J, if the processing element is switched off after the computation. The energy dissipated by the PE can be reduced by lowering its supply voltage at the expense of increased execution time. This increased computation time can be tolerated because of the idle time that results from the difference between the required computation time and the time constraint of the application. Here it has been assumed that the time constraint is greater than the execution time of the tasks. This is a valid assumption, since an IP core is very unlikely to match the required performance specification exactly, which leads to overdesign. Under this assumption the execution time can be increased from 19 to 22 s. From Equation (4) it follows that the energy is reduced from 95 to 79.79 J. This example has assumed that the power dissipated by the processor core is independent of the operations involved in the computation, which is only valid for certain types of processor cores [15]. However, the power dissipation of many modern IP cores does depend on the computation carried out [10], as shown, for example, in Table 1 and Figure 2b).

Here the power varies among the tasks between 2 and 5 W. Computing the energy consumption with the varying power model gives E = 5 W · 3 s + 2 W · 10 s + 3 W · 6 s = 53 J. As in the previous example, we can find a single supply voltage such that the deadline is just met by extending the execution time from 19 to 22 s, resulting in V_dd = 3.03 V and E = 44.52 J. However, this is not the optimal (energy-minimising) solution, since assigning a separate supply voltage to each task can reduce the energy consumption further, which is the main aim of the proposed technique. For example, running the tasks at V_dd1 = 2.79 V, V_dd2 = 3.12 V and V_dd3 = 3.01 V results in a total energy dissipation of E = 43.55 J while still meeting the deadline t_d = 22 s. Note that all the above results were obtained assuming a threshold voltage V_t of 0.8 V.

Table 1: Power and execution time of T1, T3 and T5 when power is fixed and variable during the execution of tasks

| Task | Fixed power: P(V_max) | Fixed power: t(V_max) | Varying power: P(V_max) | Varying power: t(V_max) |
|------|------|------|------|------|
| T1 | 5 W | 3 s | 5 W | 3 s |
| T3 | 5 W | 10 s | 2 W | 10 s |
| T5 | 5 W | 6 s | 3 W | 6 s |
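The per-task voltage assignment quoted in the example can be checked directly. The sketch below assumes the same model as the rest of the section: task i takes t_i(V_max)/S(V_dd,i) seconds and consumes P_i(V_max)·t_i(V_max)·(V_dd,i/V_max)^2 joules, where S(V_dd) = (V_max/V_dd)·((V_dd − V_t)/(V_max − V_t))^2. Variable names are illustrative.

```python
V_MAX, V_T = 3.3, 0.8  # [V], values from the example


def performance(v_dd):
    """S = d(V_max)/d(V_dd) for the delay model d = k*V_dd/(V_dd - V_t)**2."""
    return (V_MAX / v_dd) * ((v_dd - V_T) / (V_MAX - V_T)) ** 2


# (P(V_max) [W], t(V_max) [s]) for the varying-power case of Table 1
tasks = {"T1": (5.0, 3.0), "T3": (2.0, 10.0), "T5": (3.0, 6.0)}
v_dd = {"T1": 2.79, "T3": 3.12, "T5": 3.01}  # per-task voltages quoted in the text

e_vmax = sum(p * t for p, t in tasks.values())                           # all tasks at V_max
t_total = sum(t / performance(v_dd[n]) for n, (_, t) in tasks.items())   # t = t(V_max)/S
e_total = sum(p * t * (v_dd[n] / V_MAX) ** 2 for n, (p, t) in tasks.items())
print(f"E(V_max) = {e_vmax:.0f} J, time = {t_total:.2f} s, E = {e_total:.2f} J")
```

The stretched schedule should finish just under the 22 s deadline. The energy should land within a few hundredths of a joule of the quoted 43.55 J; the small gap presumably comes from the voltages being rounded to two decimals. Note that the highest-power task (T1) receives the lowest voltage, because slowing it down saves the most energy per second of slack.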
Problem formulation
As stated in subsection 2.1, the system is modelled as a set of independently executing tasks. The PEs therefore do not influence each other, and the overall energy consumption E is minimised when each PE consumes the minimum amount of energy. Hence it is sufficient to formulate the optimisation problem for a single PE.
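minimise

$$E = \sum_{i=1}^{N} E_i(S_i)$$

subject to

$$\sum_{i=1}^{N} \frac{t_i(V_{max})}{S_i} \le t_d$$

where N is the number of tasks and E_i(S_i) is the energy of task i at performance factor S_i.

Find, for all tasks of the processing element, a performance factor $S_i \in \mathbb{R}^+$ with $0 < S_i \le 1$ such that the time constraint t_d is not violated and the energy consumption is minimised.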
Given the motivational example in subsection 2.3 and the problem formulation above, the following section introduces an effective and efficient iterative improvement technique. It solves the energy optimisation problem by assigning appropriate voltages to each task while taking power variations among tasks into account.
Proposed algorithm
An iterative improvement algorithm for the variable voltage problem is formalised in this section. It assigns a voltage to each task in the schedule of the processor core such that the energy is minimised. Unlike in [4, 5, 7, 6], the selected voltages are not fixed a priori, and unlike in [4, 5, 6, 8], the average power consumption of the processor can vary among the tasks. The algorithm exploits the fact that reducing the supply voltage, i.e. slowing down the PE, always results in lower energy consumption (Figure 1). The algorithm is shown in Figure 3. We now explain it using the motivational example of subsection 2.3, where t_idle = t_d − t_exe = 3 s. In its first step the algorithm divides this idle time into Q equal time pieces, where Q is the number of iterations the algorithm uses to converge. Q is set a priori as an input to the algorithm; in this example Q = 12, resulting in ∆t = 0.25 s. In the second step an initial lookup table is generated.

Table 3: Evaluation task sets at V_max
This table contains the energy saving ∆E of each task when its execution time is extended by ∆t. In addition, the new execution times have to be stored so that ∆E can be updated during the iterations. The initial table for the example is shown in Table 2. In each iteration the algorithm goes through the three sub-steps shown in step 3 of Figure 3. The first iteration of our example selects task T1 and updates its table entry to an execution time of 3.50 s with ∆E = 1.14 J, i.e. the saving of its next extension. The loop in step 3 finishes once all Q time pieces of size ∆t have been distributed among the tasks. In the final step the algorithm calculates the supply voltage of each task from Equation (3). It uses the execution time at maximum supply voltage t_i(V_max) and the extended execution time stored in the table, t_i(V_max) + ∑∆t_i, reduced by ∆t.
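The iterative improvement loop can be sketched compactly. This is a sketch under stated assumptions, not the authors' implementation:

- The energy of a task stretched from t(V_max) to t is taken as P(V_max)·t(V_max)·(V_dd/V_max)^2.
- V_dd is obtained by inverting S = (V_max/V_dd)·((V_dd − V_t)/(V_max − V_t))^2, with V_max = 3.3 V and V_t = 0.8 V as in the example.
- The lookup table of Figure 3 is replaced by a max-heap keyed on the saving ∆E of the next ∆t slice.
- All function and parameter names are hypothetical.

Because the heap stores the current rather than the look-ahead execution time, the final "reduced by ∆t" correction is not needed here.

```python
import heapq
import math

V_MAX = 3.3  # maximum supply voltage [V] (value from the motivational example)
V_T = 0.8    # threshold voltage [V] (value from the motivational example)


def supply_voltage(s):
    """Invert S = (V_MAX/V) * ((V - V_T)/(V_MAX - V_T))**2 for the root V > V_T."""
    a = s * (V_MAX - V_T) ** 2 / V_MAX
    return V_T + a / 2 + math.sqrt(a * V_T + (a / 2) ** 2)


def task_energy(p_max, t_max, t_ext):
    """Energy of a task stretched from t_max to t_ext: P*t*(V/V_MAX)**2 (assumed model)."""
    v = supply_voltage(t_max / t_ext)
    return p_max * t_max * (v / V_MAX) ** 2


def assign_voltages(tasks, t_d, q):
    """Greedy slack distribution.

    tasks: list of (P(V_max) [W], t(V_max) [s]); t_d: deadline [s];
    q: number of slack slices (= iterations). Returns (voltages, times, energy).
    """
    t_idle = t_d - sum(t for _, t in tasks)
    if t_idle < 0 or q < 1:
        raise ValueError("deadline shorter than execution time at V_max, or q < 1")
    dt = t_idle / q                      # step 1: split the idle time into Q slices
    ext = [t for _, t in tasks]          # current (extended) execution times

    def saving(i):
        """Energy saved if task i receives one more slice of length dt."""
        p, t = tasks[i]
        return task_energy(p, t, ext[i]) - task_energy(p, t, ext[i] + dt)

    # step 2: initial "lookup table", kept as a max-heap via negated savings
    heap = [(-saving(i), i) for i in range(len(tasks))]
    heapq.heapify(heap)
    for _ in range(q):                   # step 3: hand out one slice per iteration
        _, i = heapq.heappop(heap)       # task with the largest saving
        ext[i] += dt
        heapq.heappush(heap, (-saving(i), i))  # refresh its entry
    # step 4: supply voltage of each task from its final execution time
    volts = [supply_voltage(t / e) for (_, t), e in zip(tasks, ext)]
    energy = sum(task_energy(p, t, e) for (p, t), e in zip(tasks, ext))
    return volts, ext, energy


volts, times, energy = assign_voltages([(5.0, 3.0), (2.0, 10.0), (3.0, 6.0)], t_d=22.0, q=12)
print([round(v, 2) for v in volts], times, round(energy, 2))
```

On the Table 1 data (t_d = 22 s, Q = 12), the first pick is T1 and its next-slice saving is about 1.14 J, matching the text. Under these assumptions the sketch should end with execution times of 4.5 s, 10.25 s and 7.25 s for T1, T3 and T5 and roughly 43.24 J in total. Each pop and push costs O(log N), so Q iterations take O(Q log N) time and O(N) space.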
Experimental Results
To demonstrate the efficiency of the proposed technique in reducing the energy consumption of a PE, a number of examples are given. Nine task sets (t10_1 to t100_3) with 10 to 100 tasks were generated randomly; their total execution time and energy consumption at V_max are shown in Table 3. Table 4 shows the results of applying the proposed algorithm to these task sets when the time constraint of the application is increased by 10, 20 and 50% with respect to the execution time at V_max. As can be seen, the proposed technique reduces the energy consumption of all task sets. For task set t100_1, the energy consumption was reduced from 2815 J to 2393.23 J at the expense of an increase in computation time from 560 s to 616 s. This can be tolerated due to the system-specified time constraint t_d = 560 s + 10% = 616 s. The computed supply voltages ranged from 2.79 to 3.3 V. The energy could be reduced further by relaxing the time constraint to t_d = 560 s + 50% = 840 s, resulting in E = 1635.97 J and a supply voltage range from 2.24 to 3.3 V.

The proposed iterative improvement algorithm has a time complexity polynomial in the number of tasks N, while its space complexity is linear. Figure 4 shows the convergence behaviour of the proposed algorithm, where the number of iterations reflects its computational time. On a Pentium III/500 MHz PC running Linux, each iteration took on average 1.47 to 4.59 µs for 10 and 100 tasks, respectively. Running the proposed algorithm with 1000 iterations (execution time < 5 ms) produced, for all given task sets, solutions within a 1% margin of the results shown in Table 4.

Table 4: Energy-optimised results after running the proposed algorithm with 1 million iterations on the evaluation task sets
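The relative savings behind the t100_1 figures can be recomputed from the numbers quoted above. The short sketch below uses only those values; the helper name is hypothetical.

```python
def reduction_percent(e_before, e_after):
    """Relative energy saving in percent."""
    return 100.0 * (e_before - e_after) / e_before


E_VMAX, T_VMAX = 2815.0, 560.0  # task set t100_1 at V_max: energy [J], execution time [s]
for slack, e_opt in [(0.10, 2393.23), (0.50, 1635.97)]:  # optimised energies quoted in the text
    t_d = T_VMAX * (1.0 + slack)
    print(f"t_d = {t_d:.0f} s -> E = {e_opt} J, {reduction_percent(E_VMAX, e_opt):.1f}% saved")
```

This gives about 15.0% at t_d = 616 s and 41.9% at t_d = 840 s. The latter is consistent with the upper end of the 12 to 42% range stated in the conclusion.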
Conclusion and future work
This paper has presented an efficient technique for solving the variable voltage energy minimisation problem while taking task-dependent variations in power dissipation into account. Furthermore, the possibility of continuously variable voltages was exploited to reduce the energy consumption. It was shown that energy can be reduced by 12 to 42% if the idle time varies between 10 and 50% of the execution time. Work in progress investigates the feasibility of the scheduling algorithm for different system models.
