Abstract
Introduction
Scheduling for distributed embedded systems is becoming increasingly important. Scheduling tasks on a distributed architecture is a resource allocation problem, and there are two kinds of allocation policy: static and dynamic. The dynamic policy is more efficient and simpler to implement; its disadvantage, however, is the overhead it adds in both program size and execution time. Static scheduling minimizes the overall execution time by drastically reducing overheads, but all the properties of the application, including its environment, must be known at compile time. The static policy is therefore appropriate for implementing real-time control, signal and image processing algorithms on embedded systems, where resources and time are strictly limited and where the algorithm and its environment are well known.
Today, most distributed embedded systems, such as mobile vehicles and robots, operate in open environments where both the workload and the available resources are difficult to predict [3], [5]. We present a feedback control distributed static scheduling (FC-DSS) system that combines feedback control scheduling with static scheduling. The proposed approach results in lower overhead and better resource utilization, which makes it attractive for embedded systems with limited resources.
Feedback controlled static scheduling framework
We use the AAA heuristic [2] to obtain the static schedule of the application algorithm. AAA relies on complete knowledge of the task execution times. When the workload and the execution times are difficult to determine, the system may miss its deadline. To overcome this problem, we propose to use average execution times instead of worst-case ones, together with an additional feedback control loop that handles this unestimated behavior of the system.
Algorithm, architecture and task model
The algorithm specifying the application to be implemented is modeled by a data-flow graph (denoted $G_{al}$). Each vertex represents a task and each directed edge represents a data dependence. The data-flow graph is executed periodically, and each execution is called an iteration. An example of an algorithm graph is given in Figure 1. The period of the iteration is the deadline for the completion of all the tasks. This hard real-time constraint is difficult to meet when prior information about the execution times is lacking.
The architecture is modeled by a graph (denoted $G_{ar}$) where each vertex is a processor or a communication medium (bus, link, etc.) and each non-directed edge is a connection between a processor and a medium. Processors (resp. media) sequentially execute computation (resp. communication) tasks (see Figure 1).
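The two graphs can be held in very small data structures. The following is a minimal sketch only; the class names, field names and the `is_well_formed` check are hypothetical and are not part of AAA or of the model above.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class AlgorithmGraph:
    # Hypothetical container for G_al: task name -> names of its predecessors.
    preds: dict[str, set[str]] = field(default_factory=dict)

    def add_dependence(self, src: str, dst: str) -> None:
        # A directed edge src -> dst is a data dependence: dst consumes data from src.
        self.preds.setdefault(src, set())
        self.preds.setdefault(dst, set()).add(src)


@dataclass
class ArchitectureGraph:
    # Hypothetical container for G_ar: processors, media, and non-directed links.
    processors: set[str]
    media: set[str]
    links: set[tuple[str, str]]  # each link is stored as (processor, medium)

    def is_well_formed(self) -> bool:
        # Every edge must connect a processor to a communication medium.
        return all(p in self.processors and m in self.media for p, m in self.links)
```

For instance, two processors sharing one bus would be `ArchitectureGraph({"p1", "p2"}, {"bus"}, {("p1", "bus"), ("p2", "bus")})`.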
Static scheduling heuristic
We briefly present the static scheduling heuristic implementing this solution. Let $O_{sched}(n)$ denote the list of already scheduled tasks and $O_{cand}(n)$ the list of candidate tasks built from the algorithm graph at step $n \geq 0$. A task is a candidate when all its predecessors are already scheduled. Initially, $O_{sched}(0) = \emptyset$. Using a cost function called schedule pressure, one task is chosen to be scheduled at step $n$. The heuristic first computes the "earliest start date from start" $S^{(n)}(\tau_i, p_j)$ for each task $\tau_i$ and each processor $p_j$, $j \in \{1, 2, \dots, n\}$. $S^{(n)}(\tau_i, p_j)$ takes into account the communication times between $\tau_i$ and its successors and predecessors when they are distributed on different processors. Thus the schedule pressure $\sigma$ is computed as:
where $ET_{i1,p_j}$ is the execution time of $\tau_i$ on processor $p_j$; this value is given in the characteristics lookup table of $p_j$. Note that among the $k$ versions of $\tau_i$, the longest version is used in the heuristic. At each step $n$, one task is selected and implemented on a processor, until all tasks are scheduled, according to the following algorithm:
Compute the schedule pressure for each $\tau_i \in O_{cand}(n)$ and keep the result for each operation:
Update list of candidates and scheduled tasks:
For further detail about AAA, one may refer to [2] .
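To make the candidate/scheduled loop concrete, here is a minimal greedy list-scheduling sketch. It is not the AAA implementation. It assumes that the pressure of a task on a processor is its earliest start date plus its execution time there, that each candidate is first matched with the processor minimizing this pressure, and that the candidate with the largest such pressure is scheduled first. The single constant `comm_time` and all function names are hypothetical simplifications, and every task is assumed to have an execution time on every processor.

```python
def list_schedule(preds, exec_time, comm_time):
    """Greedy list-scheduling sketch (assumptions stated in the text above).

    preds:     task -> set of predecessor tasks
    exec_time: task -> {processor: execution time of the longest version}
    comm_time: delay added when a predecessor ran on a different processor
    Returns task -> (processor, start, end).
    """
    processors = sorted({p for times in exec_time.values() for p in times})
    proc_free = {p: 0.0 for p in processors}  # date at which each processor is idle
    placed = {}                               # plays the role of O_sched

    def earliest_start(task, proc):
        ready = 0.0
        for q in preds[task]:
            q_proc, _, q_end = placed[q]
            ready = max(ready, q_end + (comm_time if q_proc != proc else 0.0))
        return max(ready, proc_free[proc])

    def pressure(task, proc):
        # Assumed cost function: earliest start date plus execution time.
        return earliest_start(task, proc) + exec_time[task][proc]

    while len(placed) < len(preds):
        # O_cand: unscheduled tasks whose predecessors are all scheduled.
        candidates = [t for t in preds
                      if t not in placed and all(q in placed for q in preds[t])]
        # Best processor for each candidate (smallest pressure).
        best = {t: min(processors, key=lambda p: pressure(t, p)) for t in candidates}
        # Most urgent candidate (largest pressure on its best processor).
        task = max(candidates, key=lambda t: pressure(t, best[t]))
        proc = best[task]
        start = earliest_start(task, proc)
        end = start + exec_time[task][proc]
        placed[task] = (proc, start, end)
        proc_free[proc] = end
    return placed
```

Ties are broken by insertion order of the tasks and by processor name, which is an arbitrary choice made only to keep the sketch deterministic.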
Imprecise computations revisited
Imprecise computation allows the programmer to identify some tasks as more important than others. It covers a number of implementation techniques, such as milestone, sieve and multiple-version, which are presented thoroughly in [4]. In the present work, the multiple-version implementation is considered: each task is composed of several versions (primary, secondary and so on). The primary version is the longest one and produces a precise result, whereas the other versions are shorter and produce imprecise results. We propose an imprecise computation model which assumes that each task $\tau_i \in G_{al}$ is described by a tuple $\{I, ET\}$. Each task $\tau_i$ has $k$ versions, i.
Since the target architecture is heterogeneous, the execution times of a given task can differ from one processor to another. Thus, each version has different execution times on the $n$ processors, i.e. $ET = \{\{ET_{i1,p_1}, \dots, ET_{i1,p_n}\}, \dots, \{ET_{ik,p_1}, \dots, ET_{ik,p_n}\}\}$ (with the assumption that $ET_{i1} > ET_{i2} > \dots > ET_{ik}$). Note that these execution times are average rather than worst-case execution times, in order to achieve higher CPU utilization in the system. The feedback control loop provides satisfactory performance even when the execution times deviate from their estimated averages at run time.
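One possible in-memory representation of such a multiple-version task is sketched below. The class and method names are hypothetical, the ordering $ET_{i1} > ET_{i2} > \dots > ET_{ik}$ is interpreted here as holding on every processor, and "degradable" is generalized from primary/secondary to any shorter version; all three are assumptions.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class MultiVersionTask:
    name: str
    # exec_times[v][proc]: average execution time of version v on processor proc.
    # Index 0 is the primary (longest, precise) version.
    exec_times: list[dict[str, float]]
    current: int = 0  # index of the version executed in the next iteration

    def et(self, proc: str) -> float:
        # Average execution time of the currently selected version on proc.
        return self.exec_times[self.current][proc]

    def is_degradable(self) -> bool:
        # A shorter (less precise) version is still available.
        return self.current < len(self.exec_times) - 1

    def is_enhanceable(self) -> bool:
        # A longer (more precise) version is still available.
        return self.current > 0

    def versions_are_ordered(self) -> bool:
        # Checks ET_i1 > ET_i2 > ... > ET_ik on every processor (assumed reading).
        procs = self.exec_times[0].keys()
        return all(
            longer[p] > shorter[p]
            for longer, shorter in zip(self.exec_times, self.exec_times[1:])
            for p in procs
        )
```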
Feedback control loop
We insert a feedback control algorithm into the application algorithm. A model of the loop is given in Figure 2. The control loop is invoked once at each iteration of the global schedule. The controlled variable, denoted $y$, is the normalized completion time of the whole algorithm, i.e. the completion time divided by the period $T$ of the iteration. In the rest of the paper, the term completion time should be interpreted as normalized completion time. The reference variable $y_{ref}$ is the desired value of $y$. We implement this feedback control loop with the algorithm graph $G_{fc}$ given in Figure 3. Our proposed model thus extends the whole algorithm to be scheduled to $G_{al} \cup G_{fc}$. We explain the Monitor, Controller and TLA parts in detail below.
Monitor
The monitor measures the completion time of the whole algorithm. For each processor, let $C_i$ denote the completion time of its last task in one iteration, normalized as $m_i = C_i / T$. Here, $m$ differs from CPU utilization in that the CPU idle time spent while tasks wait for data from their predecessors is also included in $m$. We use $m$ as the controlled variable because the completion time of the algorithm is not allowed to exceed the period $T$. Using $m$ instead of CPU utilization has another advantage: CPU utilization cannot exceed 100%, whereas $m$ can. We therefore avoid the difficulties of non-linear analysis and a limitation on $y_{ref}$ (for more explanation, please refer to [3]).
The controlled variable is the largest of these values, $y(t) = \max_i m_i(t)$, where $t \in \mathbb{N}$ denotes the discrete time index, i.e. the actual time is obtained as $T \cdot t$.
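The monitor computation is small enough to sketch directly. The function name and the dictionary-based interface are hypothetical, and the sketch assumes that the completion time of the whole algorithm is that of the processor finishing last.

```python
def monitor(completion_times, period):
    """Return (m, y) for one iteration.

    completion_times: processor -> completion time C_i of its last task
    period:           iteration period T (also the deadline)
    """
    # Normalized completion time per processor: m_i = C_i / T.
    m = {proc: c / period for proc, c in completion_times.items()}
    # Assumption: the whole algorithm completes when the slowest processor does.
    y = max(m.values())
    return m, y
```

A value of `y` above 1 means that the iteration overran its period, which is precisely the situation that CPU utilization, capped at 100%, could not express.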
Controller
We use a P (proportional) control function to compute the control output. We chose a simple P controller because its implementation introduces less overhead and, as already remarked in [5] and [3], a P control structure is flexible enough to stabilize the system.
The controller first calculates the error, executes the P control action and calls the TLA for the selected processor. The pseudo-code of the controller is given below:

    e(t) = y ref − y(t);
    if (e(t) < 0) then
        proc = Select CPU having highest m and minimum one degradable task();
    else
        proc = Select CPU having highest m and minimum one enhancable task();
    ∆u(t) = K p · e(t);
    TLA(proc, ∆u(t));

$\Delta u$ is the output of the P control action. $\Delta u > 0$ means that the requested completion time of the algorithm should be increased, and $\Delta u < 0$ means that it should be decreased. The controller then calls the task level actuator (TLA) to change the current requested completion time from $u$ to $u + \Delta u$. In other words, a negative error means that the completion time of the algorithm exceeds the desired value, so some tasks must be switched to their secondary versions to decrease it. In this case, the controller selects the processor that has the highest completion time (i.e. the processor that defines the completion time of the whole algorithm) and at least one degradable task, and then executes the control action and the TLA for that processor. A task is degradable when its version can be changed from primary to secondary so as to decrease the completion time. If the error is positive, the controller instead selects the processor that has the highest completion time and at least one enhanceable task.
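The controller step translates almost line by line into Python. The sketch below is illustrative only: the function signature, the sets of eligible processors and the `tla` callback are hypothetical, and returning `None` when no processor has an eligible task is an assumption, since that case is not covered above.

```python
def controller(y_ref, m, degradable, enhanceable, kp, tla):
    """One invocation of the P controller.

    m:           processor -> normalized completion time m_i
    degradable:  set of processors with at least one degradable task
    enhanceable: set of processors with at least one enhanceable task
    tla:         callback tla(proc, delta_u) standing in for the actuator
    """
    y = max(m.values())   # completion time of the whole algorithm
    e = y_ref - y         # error
    # Negative error: the algorithm is too slow, so look for tasks to degrade.
    eligible = degradable if e < 0 else enhanceable
    candidates = [p for p in m if p in eligible]
    if not candidates:
        return None       # assumption: nothing to actuate in this iteration
    proc = max(candidates, key=lambda p: m[p])  # highest m among eligible CPUs
    delta_u = kp * e      # P control action
    tla(proc, delta_u)
    return proc, delta_u
```

With `m = {"p1": 1.01, "p2": 0.8}`, `y_ref = 0.9` and `kp = 0.5`, the error is negative and the actuator is called for `p1`, provided `p1` has a degradable task.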
The schedule of the whole algorithm is given in Figure 4. In this schedule $y = 1.01 > y_{ref}$; moreover, the execution times of the tasks may be unpredictable. The feedback control loop therefore dynamically tries to keep $y$ at the desired level $y_{ref}$.
Task level actuator
The task level actuator (TLA) manipulates the requested completion time of the algorithm by changing the versions of the tasks. For instance, if it changes the version of task $\tau_i$ from $\tau_{ip}$ to $\tau_{ir}$, it adjusts the requested completion time by $ET_{ir} - ET_{ip}$. The TLA then sends the information about the chosen versions of the tasks to the $LC_i$ (Level Controller) tasks, which adjust the versions to be executed in the next iteration. The pseudo-code of TLA is as follows:
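As an illustration only (this is not the authors' TLA pseudo-code), the sketch below shows one way such an actuator could be written. It makes several assumptions: tasks are visited in list order, each task moves by at most one version per call, the change $ET_{ir} - ET_{ip}$ is divided by the period so that it is comparable to the normalized $\Delta u$, degradation continues until the requested decrease is covered, and an enhancement is skipped if it would exceed the requested increase. All names are hypothetical.

```python
def task_level_actuator(tasks, delta_u, period):
    """Hypothetical TLA sketch for the tasks of one processor.

    tasks:   list of dicts with keys "name", "versions" (execution times on
             this processor, longest first) and "current" (version index)
    delta_u: requested change of the normalized completion time
    period:  iteration period T
    Returns (chosen, applied): the new version index per modified task and
    the change of normalized completion time actually applied.
    """
    degrade = delta_u < 0
    remaining = delta_u
    chosen = {}
    for task in tasks:
        # Stop once the requested change has been covered.
        if (degrade and remaining >= 0) or (not degrade and remaining <= 0):
            break
        cur, versions = task["current"], task["versions"]
        new = cur + 1 if degrade else cur - 1
        if not 0 <= new < len(versions):
            continue  # no shorter (or longer) version available
        change = (versions[new] - versions[cur]) / period
        if not degrade and change > remaining:
            continue  # assumption: never enhance beyond the request
        task["current"] = new
        chosen[task["name"]] = new
        remaining -= change
    return chosen, delta_u - remaining
```

The returned `chosen` mapping corresponds to the information that would be forwarded to the level controller tasks.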
Control theory based analysis
Designing a stabilizing controller is an important concern [1], [3]. For the stability analysis, we first derive the state space form of the closed-loop system given in Figure 2. Starting from the control input, the TLA changes the estimated completion time of the algorithm for the next period at every sampling instant $t$ such that $u(t + 1) = u(t) + \Delta u(t)$. Since the precise execution time of each task is unknown and time varying, the actual completion time (denoted $y$) may differ from the estimated completion time $u(t)$ such that $y(t) = G \cdot u(t)$, where $G$ represents the maximum extent of workload variation in terms of requested completion time. We assume that the monitor measures the actual completion time precisely, i.e. $y'(t) = y(t)$.
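Then, we derive the error and the control signals as $e(t) = y_{ref} - y'(t)$ and $\Delta u(t) = K_p\, e(t)$ respectively. Substituting these into $u(t + 1) = u(t) + \Delta u(t)$ with $y'(t) = G\, u(t)$, the closed-loop system is therefore described by

$$u(t + 1) = (1 - K_p G)\, u(t) + K_p\, y_{ref}.$$

A classical stability criterion for discrete time linear systems guarantees that $u(t)$ is asymptotically stable around its equilibrium if the following condition is fulfilled:

$$|1 - K_p G| < 1, \quad \text{i.e.} \quad 0 < K_p < 2/G \ \text{ for } G > 0. \tag{3}$$

Assuming that the stability condition (3) is satisfied, the control system guarantees zero steady-state error, i.e.

$$\lim_{t \to \infty} y(t) = y_{ref}.$$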
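The behavior of the loop can be checked numerically. Combining $u(t+1) = u(t) + \Delta u(t)$, $y(t) = G \cdot u(t)$ and $\Delta u(t) = K_p\, e(t)$ gives a first-order recurrence in $u$ whose pole is $1 - K_p G$. The sketch below iterates it; the function names and the numeric values in the usage note are arbitrary illustrations, not results from the experiments.

```python
def simulate(kp, g, y_ref, u0, steps):
    """Iterate the closed loop and return the measured completion times y(t)."""
    u = u0
    ys = []
    for _ in range(steps):
        y = g * u                 # actual completion time, y(t) = G * u(t)
        ys.append(y)
        u = u + kp * (y_ref - y)  # u(t+1) = u(t) + Kp * e(t)
    return ys


def is_stable(kp, g):
    # The pole of the first-order recurrence must lie inside the unit circle.
    return abs(1.0 - kp * g) < 1.0
```

For example, with `g = 1.2` and `y_ref = 0.9`, a gain `kp = 0.5` gives a pole of 0.4 and `y` settles at `y_ref`, whereas `kp = 2.0` gives a pole of magnitude 1.4 and the loop diverges.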
Simulations
In the experiments, we used the example application algorithm and architecture given in Figure 1 . The execution times of primary versions of the tasks and the communication times are generated randomly. The secondary versions 
Conclusion
In this paper, we presented a feedback control static scheduling technique for distributed embedded systems. The static schedule is created with the AAA heuristic, and an additional feedback control algorithm handles possible unestimated behavior of the workload. We provide a stability analysis for tuning the controller parameters. The results are demonstrated through a simple experiment. The proposed scheduling system should also be tested under more realistic workloads.
