In this paper we present an approach to mapping and scheduling of distributed embedded systems for hard real-time applications, aiming at minimizing the system modification cost. We consider an incremental design process that starts from an already existing system running a set of applications. We are interested in implementing new functionality so that the already running applications are disturbed as little as possible and there is a good chance that, later, new functionality can easily be added to the resulting system. The mapping and scheduling problems are considered in the context of a realistic communication model based on a TDMA protocol.
INTRODUCTION
Distributed embedded systems with multiple processing elements are becoming common in various application areas [4]. In [12], for example, allocation of processing elements, as well as process mapping and scheduling for distributed systems, are formulated as a mixed integer linear programming (MILP) problem. A disadvantage of this approach is the complexity of solving the MILP problem. Therefore, alternative problem formulations and solutions based on efficient heuristics have been proposed [1, 2, 8, 14].
Although much of the above work is dedicated to specific [...] Another characteristic of research efforts concerning the codesign of embedded systems is that authors concentrate on the design, from scratch, of a new system optimized for a particular application. For many application areas, however, such a situation is extremely uncommon and only rarely appears in design practice. It is much more likely that one has to start from an already existing system running a certain application and the design problem is to implement new functionality (including also upgrades to the existing one) on this system. In such a context it is very important to make as few modifications as possible to the already running applications. The main reason for this is to avoid unnecessarily large design and testing times. Performing modifications on the (potentially large) existing applications increases design time and, even more, testing time (instead of only testing the newly implemented functionality, the old application, or at least a part of it, also has to be retested). However, this is not the only aspect to be considered.
Such an incremental design process, in which a design is periodically upgraded with new features, goes through several iterations. Therefore, after new functionality has been implemented, the [...] embedded systems in the context of an incremental design process as outlined above. This implies that we perform mapping and scheduling of new functionality so that certain design constraints are satisfied and:
a. already running applications are disturbed as little as possible;
b. there is a good chance that new functionality can, later, easily be added to the resulting system.
In [11] we have discussed an incremental design strategy which excludes any modifications of already running applications. In this paper we extend our approach in the sense that remapping and rescheduling of currently implemented applications are allowed, if they are needed in order to accommodate the new functionality.
In this context, we propose a heuristic which finds the set of old applications which have to be remapped together with the new one such that the disturbance on the running system (expressed as the total cost implied by the modifications) is minimized. Once this set of applications has been determined, mapping and scheduling is performed according to the requirements stated above.
Supporting such a design process is of critical importance for current and future industrial practice, as the time interval between successive generations of a product is continuously decreasing, while the complexity due to increased sophistication of new functionality is growing rapidly.
The paper is divided into 6 sections. The next section presents some preliminary discussion. Section 3 introduces the detailed problem formulation and the quality metrics we have defined. Our mapping and scheduling strategies are outlined in Section 4, and the experimental results are presented in Section 5. The last section presents our conclusions.
PRELIMINARIES
2.1 System Architecture
We consider architectures consisting of processing nodes connected by a broadcast communication channel. Communication between nodes is based on a TDMA protocol such as the TTP [7], which integrates a set of services necessary for fault-tolerant real-time systems.
The communication channel is a broadcast channel, so a message sent by a node is received by all the other nodes. Each node can transmit only during a predetermined time interval, the so-called TDMA slot $S_i$ (Figure 1).

In order to implement an application, represented as a set of process graphs, the designer has to map the processes to the system nodes and to derive a schedule such that all deadlines are satisfied. However, finding a valid schedule is not always possible, either because there are not enough available hardware resources, or because the resources are not intelligently allocated to the already running applications. Thus, in order to produce a valid solution, the resources have to be reallocated through rescheduling and remapping of some of the already running applications or, in the worst case, the architecture has to be modified by adding new resources.
In Figure 2 we consider a single processor system with three applications, A, B and C, with deadlines $D_A$, $D_B$ and $D_C$, respectively. Application C is depicted in more detail, showing the two processes P1 and P2 it is composed of. Let us suppose that the already running applications are A and B, and we have to implement C as a new application. If A and B have been mapped and scheduled as in Figure 2a, we will not be able to map application C (in particular, process P2). With a mapping of A and B as in Figure 2b and c, we are able to map both processes of C, but no schedule can be produced which meets the deadline $D_C$. If A and B are implemented as in Figure 2d, application C can be successfully implemented. Two aspects can be highlighted based on this example:
1. If applications A and B are implemented as in Figure 2a (or as in 2b or 2c), it is possible to correctly implement application C only by modifying the implementation of application B.
2. If, during implementation of application B, we had taken into consideration that sometime in the future an application like C would have to be implemented, we could have produced a schedule like the one in Figure 2d. In this case, application C could be implemented without any modification of an existing application.
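The two failure modes in this example (a process that fits in no slack, and a placement that misses the deadline) can be reproduced with a small sketch. All interval values, the horizon of 100 time units, and the names `free_slots` and `place_chain` are assumptions made for illustration; Figure 2 itself gives no numbers, and first-fit placement is a stand-in, not the scheduling strategy of the paper.

```python
def free_slots(busy, horizon):
    """Return the idle intervals (start, end) left by the busy intervals."""
    slots, t = [], 0
    for start, end in sorted(busy):
        if start > t:
            slots.append((t, start))
        t = max(t, end)
    if t < horizon:
        slots.append((t, horizon))
    return slots


def place_chain(busy, horizon, wcets, deadline):
    """First-fit, non-preemptive placement of a chain of processes.

    Each process must run inside a single idle interval and start after
    its predecessor has finished. Returns the start times, or None if a
    process fits nowhere or the chain finishes after the deadline.
    """
    starts, ready = [], 0
    for wcet in wcets:
        for slot_start, slot_end in free_slots(busy, horizon):
            begin = max(slot_start, ready)
            if begin + wcet <= slot_end:
                starts.append(begin)
                ready = begin + wcet
                break
        else:
            return None  # no idle interval is large enough
    return starts if ready <= deadline else None


# Hypothetical application C: P1 = 20, P2 = 30, deadline D_C = 90.
C, D_C = [20, 30], 90
fragmented = place_chain([(10, 30), (55, 75)], 100, C, D_C)  # P2 fits nowhere
too_late = place_chain([(0, 20), (50, 70)], 100, C, D_C)     # fits, misses D_C
contiguous = place_chain([(0, 20), (20, 40)], 100, C, D_C)   # one large slack
```

In this toy setting only the layout with a contiguous slack accepts C, which mirrors the situation described for Figure 2d.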
PROBLEM FORMULATION
We model an application $\Gamma_{current}$ as a set of process graphs $G_i \in \Gamma_{current}$, each with a period $T_{G_i}$ and a deadline $D_{G_i} \le T_{G_i}$. For each process $P_i$ in a process graph we know the set of potential nodes on which it could be mapped and its worst case execution time on each of these nodes. The underlying architecture is as presented in Section 2.1. We consider a non-preemptive static cyclic scheduling policy for both processes and message passing.
Our goal is to map and schedule an application $\Gamma_{current}$ on a system that already implements a set $\psi$ of applications, considering the two requirements stated in the introduction: a) already running applications are disturbed as little as possible; b) there is a good chance that new functionality can, later, easily be added to the resulting system.

In Figure 3 we present the graph corresponding to a set of ten applications. Applications $\Gamma_6$, $\Gamma_8$, $\Gamma_9$ and $\Gamma_{10}$, depicted in black, are frozen: no modifications are possible to them. The rest of the applications have the remapping cost $R_i$ depicted on their left. $\Gamma_7$ can be remapped with a cost of 20. If $\Gamma_4$ is to be reimplemented, this also requires the modification of $\Gamma_7$, with a total cost of 90. One of the remaining applications, although not frozen, cannot be remapped, as this would trigger the need to remap $\Gamma_6$, which is frozen. Given a subset of applications $\Omega \subseteq \psi$, the total cost of modifying the applications in $\Omega$ is $R(\Omega) = \sum_{\Gamma_i \in \Omega} R_i$.
To each application $\Gamma_i \in V$ the designer has associated a cost $R_i$ of reimplementing $\Gamma_i$. Such a cost can typically be expressed in hours needed to perform retesting of $\Gamma_i$ and other tasks connected to the remapping and rescheduling of the application. Remapping of $\Gamma_i$ and the associated rescheduling can only be performed if the process graphs that capture the applications and their deadlines are available. However, this is not always the case, and in such situations the application is considered frozen.

¹ If a set of applications has a circular dependence, such that the modification of any one implies the remapping of all the others in that set, the set will be represented as a single node in the graph.
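The cost model can be sketched as follows. The costs of $\Gamma_7$ (20) and $\Gamma_4$ (70, so that $\Gamma_4$ together with $\Gamma_7$ totals 90) and the frozen set are taken from the Figure 3 example. The application numbered 5, its cost of 40 and its dependence on $\Gamma_6$ are hypothetical stand-ins for the non-frozen application that cannot be remapped. The function names are assumptions, not part of the paper.

```python
def closure(app, depends_on):
    """All applications that must be remapped if `app` is remapped."""
    seen, stack = set(), [app]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(depends_on.get(current, ()))
    return seen


def modification_cost(subset, cost, depends_on, frozen):
    """R(Omega) over the dependency closure of `subset`.

    Returns None when the closure touches a frozen application, since
    such a subset cannot be remapped at all.
    """
    omega = set()
    for app in subset:
        omega |= closure(app, depends_on)
    if omega & frozen:
        return None
    return sum(cost[app] for app in omega)


FROZEN = {6, 8, 9, 10}
COST = {7: 20, 4: 70, 5: 40}       # entry 5 is hypothetical
DEPENDS_ON = {4: [7], 5: [6]}      # edge 5 -> 6 is hypothetical

cost_of_7 = modification_cost({7}, COST, DEPENDS_ON, FROZEN)
cost_of_4 = modification_cost({4}, COST, DEPENDS_ON, FROZEN)
cost_of_5 = modification_cost({5}, COST, DEPENDS_ON, FROZEN)
```

With these inputs the sketch reproduces the two totals quoted for Figure 3 (20 and 90) and rejects the application whose remapping would reach a frozen one.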
Characterizing Future Applications
What do we suppose to know about the family $\Gamma_{future}$ of applications which do not exist yet? Given a certain limited application area (e.g. automotive electronics), it is not unreasonable to assume that, based on the designers' previous experience, the nature of expected future functions to be implemented, profiling of previous applications, available incomplete designs for future versions of the product, etc., it is possible to characterize the family of applications which could possibly be added to the current implementation.
This is an assumption which is basic for the concept of incremental design. Thus, we consider that, concerning the future applications, we know the set $S_t = \{t_{min}, \ldots, t_i, \ldots, t_{max}\}$ of possible worst case execution times for processes, and the set $S_b = \{b_{min}, \ldots, b_i, \ldots, b_{max}\}$ of possible message sizes. We also assume that over these sets we know the probability distributions $f_{S_t}(t)$ for $t \in S_t$ and $f_{S_b}(b)$ for $b \in S_b$. For example, we might have worst case execution times $S_t = \{50, 100, 200, 300, 500\}$ ms. If there is a higher probability of having processes of 100 ms, and a very low probability of having processes of 300 ms and 500 ms, then our distribution function $f_{S_t}(t)$ could look like this: $f_{S_t}(50) = 0.20$, $f_{S_t}(100) = 0.50$, $f_{S_t}(200) = 0.20$, $f_{S_t}(300) = 0.05$, and $f_{S_t}(500) = 0.05$.
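As a quick illustration, the example distribution can be written down and checked directly. The values are those given in the text; the helper names and the use of the expected value and of random sampling are assumptions added here, not something the paper prescribes.

```python
import random

# Distribution over worst case execution times (ms), values from the text.
F_ST = {50: 0.20, 100: 0.50, 200: 0.20, 300: 0.05, 500: 0.05}


def is_distribution(f, tol=1e-9):
    """True if all probabilities are non-negative and sum to 1."""
    return all(p >= 0 for p in f.values()) and abs(sum(f.values()) - 1.0) < tol


def expected_value(f):
    """Mean of the distribution, here the mean worst case execution time."""
    return sum(value * prob for value, prob in f.items())


def sample_future_wcets(f, n, seed=0):
    """Draw n execution times for a hypothetical future application."""
    rng = random.Random(seed)
    values = list(f)
    return rng.choices(values, weights=[f[v] for v in values], k=n)


mean_wcet = expected_value(F_ST)          # 10 + 50 + 40 + 15 + 25 = 140 ms
sample = sample_future_wcets(F_ST, 10)
```

The probabilities sum to 1 and the mean execution time is 140 ms, which is dominated by the 100 ms processes, as the text suggests.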
Another piece of information is related to the period of process graphs which could be part of future applications. In particular, the smallest expected period $T_{min}$ is assumed to be given, together with the expected necessary processor time $t_{need}$ and bus bandwidth $b_{need}$ inside such a period $T_{min}$. As will be shown later, this information is used in order to provide a fair distribution of slacks.
The execution times in $S_t$, as well as $t_{need}$, are considered relative to the slowest node in the system. All the other nodes are characterized by a speedup factor relative to this slowest node.
3.3 Quality Metrics
A designer will be able to map and schedule an application $\Gamma_{future}$ on top of a system implementing $\psi$ and $\Gamma_{current}$ only if there are sufficient resources available. In our case, the resources are processor time and the bandwidth on the bus. In the context of a non-preemptive static scheduling policy, having free resources translates into having free time slots on the processors and having space left for messages in the bus slots. We call these free slots of available time on the processor or on the bus slack. It is the size and distribution of the slacks that characterizes the quality of a certain design alternative from the point of view of its potential to accommodate future applications. In this section we introduce two criteria in order to reflect the degree to which one design alternative meets the requirement b) presented at the beginning of section 3.

The first criterion reflects how well the resulting slack sizes fit a future application. The slack sizes resulting after implementation of $\Gamma_{current}$ on top of $\psi$ should be such that they best accommodate a given family of applications $\Gamma_{future}$, characterized by the sets $S_t$ and $S_b$ introduced before. Let us consider the example in Figure 2, where we have a single processor with the applications A and B implemented and a future application C which consists of the two processes P1 and P2. It can be observed that the best configuration, taking into consideration only slack sizes, is to have a contiguous slack. Such a slack, as depicted in Figure 2c and d, will best accommodate any future application. However, in reality, it is almost impossible to map and schedule the current application such that a contiguous slack is obtained. Not only is it impossible, but it is also undesirable from the point of view of the second design criterion, discussed below.

The second criterion expresses how well the slack is distributed in time. Let $P_i$ be a process with period $T_{P_i}$ that belongs to a future application, and $M(P_i)$ the node on which $P_i$ will be mapped. The worst case execution time of $P_i$ is $t_{P_i}$.
Cost Function and Exact Problem Formulation
In order to capture how well a certain design alternative meets the requirement b) stated in section 3, the metrics discussed before are combined in an objective function $C$ with four terms. $C_1^P$ and $C_2^P$ are those components of the two metrics that capture the slack properties on processors, while $C_1^m$ and $C_2^m$ are calculated for the slacks on the bus. Our mapping and scheduling strategy will try to minimize this function.
The first two terms measure how well the resulting slack sizes fit a future application (first criterion), while the second two terms reflect the distribution of slacks (second criterion). We call a valid solution that mapping and scheduling which satisfies all the design constraints (in our case the deadlines) and meets the second criterion. At this point we can give an exact formulation to our problem.
Given an existing set of applications $\psi$ which are already mapped and scheduled, and an application $\Gamma_{current}$ to be implemented on top of $\psi$, we are interested in finding the subset $\Omega \subseteq \psi$ of old applications to be remapped and rescheduled such that we produce a valid solution for $\Gamma_{current} \cup \Omega$ and the total cost of modification $R(\Omega)$ is minimized.
Once such an $\Omega$ is found, we are interested in minimizing the objective function $C$ for the set $\Gamma_{current} \cup \Omega$, considering a family of future applications characterized by the sets $S_t$ and $S_b$, the functions $f_{S_t}$ and $f_{S_b}$, as well as the parameters $T_{min}$, $t_{need}$ and $b_{need}$.
MAPPING AND SCHEDULING STRATEGY
As shown in Figure 4, our mapping and scheduling strategy (MS) has two steps. In the first step we try to obtain a valid solution for $\Gamma_{current} \cup \Omega$ so that $R(\Omega)$ is minimized. Starting from such a solution, a second step iteratively improves on the design in order to minimize the objective function $C$.
The Initial Mapping and Scheduling
The first step of MS consists of an iteration that tries subsets $\Omega \subseteq \psi$ with the intention of finding that subset $\Omega = \Omega_{min}$ which produces a valid solution for $\Gamma_{current} \cup \Omega$ such that $R(\Omega)$ is minimized. [...] The second step of MS, as shown in Figure 4, tries to improve on the design in order to minimize the objective function $C$. In a similar way as during Step 1, we iteratively improve the design by successive moves.

In [11] we introduced a heuristic with the goal of guiding the moves discussed above. Its intelligence lies in how the moves are selected.
Minimizing the Modification Cost
The first step of our mapping strategy described in Figure 4 iterates on subsets $\Omega$, searching for a valid solution which also minimizes the total modification cost $R(\Omega)$. As a first attempt, the algorithm searches for a valid implementation of $\Gamma_{current}$ without disturbing the existing applications ($\Omega = \emptyset$). If no valid solution is found, successive subsets $\Omega$ produced by the function NextSubset are considered, until a terminating condition is met. The performance of the algorithm, in terms of runtime and quality of the solutions produced, is strongly influenced by the implementation of the function NextSubset and the termination condition. They determine how the design space is explored while testing different subsets $\Omega$ of applications.
Exhaustive Search (ES)
In order to find $\Omega_{min}$, the simplest solution is to try successively all the possible subsets $\Omega \subseteq \psi$. These subsets are generated in ascending order of the total modification cost, starting from $\emptyset$. The termination condition is fulfilled when the first valid solution is generated. Since the subsets are generated in ascending order, according to their cost, the subset $\Omega$ that first produces a valid solution is the subset with the minimum modification cost. The generation of subsets is performed according to the graph that characterizes the existing applications (see section 3.1).
This approach, while finding the optimal subset $\Omega$, requires a large amount of computation time and can be used only with a small number of applications.
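A minimal sketch of such an exhaustive search is given below. It enumerates every dependency-closed subset that avoids frozen applications, sorts them by total cost, and returns the first one accepted by a validity test. The costs 20, 50, 70 and 70 for applications 7, 3, 2 and 4 and the dependence of 4 on 7 follow the Figure 3 numbers quoted in the text; the absence of other dependencies and the `is_valid` predicate (a stand-in for an actual mapping and scheduling attempt) are assumptions.

```python
from itertools import combinations


def closure(apps, depends_on):
    """Dependency closure of a collection of applications."""
    seen, stack = set(), list(apps)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(depends_on.get(current, ()))
    return seen


def exhaustive_search(cost, depends_on, frozen, is_valid):
    """Return (subset, cost) of the cheapest valid subset, or None."""
    apps = sorted(cost)
    candidates = set()
    for size in range(len(apps) + 1):
        for combo in combinations(apps, size):
            omega = closure(combo, depends_on)
            if not omega & frozen:
                candidates.add(frozenset(omega))

    def total(omega):
        return sum(cost[app] for app in omega)

    # Ascending cost; ties are broken by member list to stay deterministic.
    for omega in sorted(candidates, key=lambda s: (total(s), sorted(s))):
        if is_valid(omega):
            return set(omega), total(omega)
    return None


COST = {2: 70, 3: 50, 4: 70, 7: 20}
DEPENDS_ON = {4: [7]}
FROZEN = {6, 8, 9, 10}

# Hypothetical validity tests: a solution exists once a given app is remapped.
needs_2 = exhaustive_search(COST, DEPENDS_ON, FROZEN, lambda omega: 2 in omega)
needs_4 = exhaustive_search(COST, DEPENDS_ON, FROZEN, lambda omega: 4 in omega)
```

The number of candidate subsets grows exponentially with the number of non-frozen applications, which is why this strategy is practical only for small sets.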
Ad-hoc Solution (AH)
If the number of applications is larger, a possible ad-hoc solution could be based on a greedy strategy which, starting from $\Omega = \emptyset$, progressively enlarges the subset until a valid solution is produced. The algorithm looks at all the non-frozen applications and picks that one which, together with its dependencies, has the smallest modification cost. If the new subset does not produce a valid solution, it is enlarged by including, in the same fashion, the next application with its dependencies. This greedy expansion of the subset is continued until the set is large enough to lead to a valid solution or no application is left. For the example in Figure 3 the call to NextSubset($\emptyset$) will produce $R(\{\Gamma_7\}) = 20$, and the subset will be successively enlarged to $R(\{\Gamma_7, \Gamma_3\}) = 70$, $R(\{\Gamma_7, \Gamma_3, \Gamma_2\}) = 140$ ($\Gamma_4$ could have been picked as well in this step because it has the same modification cost of 70 as $\Gamma_2$ and its dependence $\Gamma_7$ is already in the subset), $R(\{\Gamma_7, \Gamma_3, \Gamma_2, \Gamma_4\}) = 210$, and so on.
While this approach very quickly finds a valid solution, if one exists, the total modification cost may be much higher than the optimal one.
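The greedy expansion can be sketched as below, using the costs quoted for Figure 3 (20, 50, 70 and 70 for applications 7, 3, 2 and 4, with 4 depending on 7). The tie-breaking rule (lowest application index first), the absence of other dependencies and the `is_valid` predicate are assumptions made for the sake of a runnable example.

```python
def closure(app, depends_on):
    """All applications that must be remapped if `app` is remapped."""
    seen, stack = set(), [app]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(depends_on.get(current, ()))
    return seen


def greedy_ad_hoc(cost, depends_on, frozen, is_valid):
    """Enlarge the subset greedily; return (subset, list of R values tried)."""
    omega, trace = set(), []
    if is_valid(omega):
        return omega, trace
    while True:
        best = None
        for app in sorted(cost):
            if app in omega:
                continue
            extra = closure(app, depends_on) - omega
            if extra & frozen:
                continue  # remapping this app would touch a frozen one
            added = sum(cost[a] for a in extra)
            if best is None or added < best[0]:
                best = (added, extra)
        if best is None:
            return None, trace  # nothing left to add
        omega |= best[1]
        trace.append(sum(cost[a] for a in omega))
        if is_valid(omega):
            return omega, trace


COST = {2: 70, 3: 50, 4: 70, 7: 20}
DEPENDS_ON = {4: [7]}
FROZEN = {6, 8, 9, 10}

# Hypothetical case: a valid solution requires remapping application 4.
subset, costs_tried = greedy_ad_hoc(COST, DEPENDS_ON, FROZEN,
                                    lambda omega: 4 in omega)
```

Under this assumption the greedy run passes through the costs 20, 70, 140 and 210 listed in the text and stops at 210, although the subset consisting of applications 4 and 7 would already be valid at a cost of 90.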
Subset Selection Heuristic (SH)
An intelligent selection heuristic should be able to identify the reasons due to which a valid solution has not been found. Such a failure can have two possible causes: an initial mapping which meets the deadlines has not been produced, or the second criterion is not satisfied.
Let us investigate the first reason. If an application $\Gamma_i$ is to meet its deadline $D_i$, all its processes $P_j \in \Gamma_i$ have to be scheduled inside their [ASAP, ALAP] intervals. InitialMappingScheduling (IMS) fails to schedule a process inside its [ASAP, ALAP] interval if there is not enough slack available on any processor, due to other processes scheduled in the same interval. In this situation we say that there is a conflict with processes belonging to other applications. We are interested in finding out which applications are responsible for conflicts encountered by our $\Gamma_{current}$, and not only that, but also which ones are flexible enough to move away in order to avoid these conflicts.
IMS determines a metric $\Delta_i$ that characterizes the degree of conflict and the flexibility of application $\Gamma_i$ in relation to $\Gamma_{current}$. A set of applications $\Omega$ will be characterized, in relation to $\Gamma_{current}$, by $\Delta(\Omega) = \sum_{\Gamma_i \in \Omega} \Delta_i$. The metric $\Delta(\Omega)$ will be used by our subset selection heuristic if IMS has failed to produce a solution which satisfies the deadlines. An application with a larger $\Delta_i$ is more likely to lead to a valid schedule if included in $\Omega$.

In Figure 5 we illustrate how this metric is calculated. Applications A, B and C are scheduled on three processors P1, P2 and P3, and our goal is to implement the current application D. At a certain moment IMS comes to the point of placing process $D_1 \in D$. However, it is not able to place $D_1$ inside its [ASAP, ALAP] interval $I$, because there is not enough free slack available inside $I$ on any of the processors. We are interested in determining which of the applications A, B and C are more likely to lend free slack for $D_1$ if remapped. Therefore, we calculate the slack resulting after we move away processes from the interval $I$. For example, the resulting slack available after remapping application C (moving process $C_1 \in C$ either to the left or to the right inside its own [ASAP, ALAP] interval) is of size $|I| - \min(|C_1^l|, |C_1^r|)$. Thus, we increment $\Delta_C$ with $\delta_C = |I| - \min(|C_1^l|, |C_1^r|) - |D_1|$. The increments $\delta_B$ and $\delta_A$ to be added to $\Delta_B$ and $\Delta_A$, respectively, are also presented in Figure 5. IMS continues with the other processes of application D (after assuming that process $D_1$ has been scheduled at the beginning of interval $I$). As a result of the failed attempt to map D, IMS will produce the metrics $\Delta_A$, $\Delta_B$ and $\Delta_C$.
If the initial mapping was successful, the first step of MS could fail during the attempt to satisfy the second criterion. In this case, the metric $\Delta_i$ is computed in a different way. It will capture the potential of an application $\Gamma_i$ to improve the metric $C_2$ if remapped together with $\Gamma_{current}$. Thus, for the improvement of $C_2$ we consider a total number of moves from all the non-frozen applications (determined using PotentialMoveC2()). For each move that has as subject $P_j \in \Gamma_i$, we increment the metric $\Delta_i$ with the predicted improvement of $C_2$.

Our heuristic SH will then start by finding the ad-hoc solution $\Omega_{AH}$ produced by the AH algorithm (this will succeed if there exists any solution), with a corresponding cost $R_{AH} = R(\Omega_{AH})$ and a $\Delta_{AH} = \Delta(\Omega_{AH})$. SH now continues by trying to find a solution with a more favorable $\Omega$ (a smaller total cost $R$). Therefore, the thresholds $R_{max} = R_{AH}$ and $\Delta_{min} = \Delta_{AH}/n$ (for our experiments we considered $n = 2$) are set. For generating new subsets $\Omega$, the function NextSubset now follows an approach similar to ES but in the reverse direction, towards smaller subsets, and it will consider only subsets with a total cost smaller than $R_{max}$ and a $\Delta$ larger than $\Delta_{min}$ (a small $\Delta$ means a reduced potential to eliminate the cause of the initial failure). Each time a valid solution is found, the current values of $R_{max}$ and $\Delta_{min}$ are updated in order to further restrict the search space. The heuristic stops when no subset can be found with $\Delta > \Delta_{min}$, or a certain imposed limit has been reached (e.g. on the total number of attempts to find new sets).
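The threshold-based pruning can be sketched as follows. Starting from the ad-hoc subset, only candidates that are cheaper than the current bound on the cost and whose conflict metric exceeds the current lower bound are tried, in descending order of cost. The costs follow the Figure 3 numbers quoted earlier; the per-application metric values in `DELTA`, the rule used to update the lower bound after each success (the metric of the new best subset divided by n), the `is_valid` predicate and all function names are assumptions.

```python
from itertools import combinations


def closure(apps, depends_on):
    """Dependency closure of a collection of applications."""
    seen, stack = set(), list(apps)
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(depends_on.get(current, ()))
    return seen


def candidate_subsets(cost, depends_on, frozen):
    """All dependency-closed subsets that avoid frozen applications."""
    apps, found = sorted(cost), set()
    for size in range(len(apps) + 1):
        for combo in combinations(apps, size):
            omega = closure(combo, depends_on)
            if not omega & frozen:
                found.add(frozenset(omega))
    return found


def total(omega, table):
    return sum(table[app] for app in omega)


def subset_selection(omega_ah, cost, delta, depends_on, frozen, is_valid,
                     n=2, max_attempts=50):
    """Search below the ad-hoc solution; return (subset, cost, attempts)."""
    best = frozenset(omega_ah)
    r_max, d_min = total(best, cost), total(best, delta) / n
    attempts = 0
    pool = sorted(candidate_subsets(cost, depends_on, frozen),
                  key=lambda s: (-total(s, cost), sorted(s)))
    for omega in pool:
        if attempts >= max_attempts:
            break
        if total(omega, cost) >= r_max or total(omega, delta) <= d_min:
            continue  # pruned by the current thresholds
        attempts += 1
        if is_valid(omega):
            best = omega
            r_max, d_min = total(omega, cost), total(omega, delta) / n
    return set(best), r_max, attempts


COST = {2: 70, 3: 50, 4: 70, 7: 20}
DELTA = {2: 2, 3: 1, 4: 30, 7: 5}   # hypothetical conflict metrics
DEPENDS_ON = {4: [7]}
FROZEN = {6, 8, 9, 10}

result = subset_selection({2, 3, 4, 7}, COST, DELTA, DEPENDS_ON, FROZEN,
                          lambda omega: 4 in omega)
```

In this toy run only three of the candidate subsets pass the thresholds and are actually tried, and the search ends at the subset of applications 4 and 7 with cost 90, the same result an exhaustive search would give for this validity test.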
EXPERIMENTAL RESULTS
For evaluation of the proposed strategies we first used process graphs of 80, 160, 240, 320 and 400 processes, representing the application $\Gamma_{current}$. A set of graphs was generated for each graph dimension, resulting in a total of 150 graphs. We considered an architecture consisting of 10 nodes. For the communication channel we considered a transmission speed of 256 kbps and a length below 20 meters. The maximum length of the data field in a bus slot was 8 bytes. Experiments were run on a SUN Ultra 10.
The first results concern the quality of the solution obtained with our mapping strategy MS using the search heuristic SH, compared to the case when the ad-hoc approach AH and the exhaustive search ES are used. For each of the five graph dimensions for $\Gamma_{current}$ [...] (Figure 6). We have eliminated those situations in which no valid solution could be produced by MS.
Another important aspect to be proven by experiments is the extent to which the mapping strategy proposed in the paper really facilitates the implementation of future applications. For these experiments we have considered that no modifications are allowed to the applications in $\psi$. We have used an existing set of applications $\psi$ consisting of 400 processes, with a schedule table of 6 s on each processor and a slack of about 50% of the total schedule size. Figure 7 shows the number of successful implementations in the two cases. When $\Gamma_{current}$ has been mapped with MS*, that is, using the design criteria and metrics proposed in the paper, we were able to find a valid solution for 65% of the total process graphs considered. However, using AM to map $\Gamma_{current}$ has led to a situation where IMS is able to find schedules which satisfy the deadlines in only 27.5% of the cases. When $\Gamma_{current}$ grows to 160 processes, only MS* is able to find a mapping of $\Gamma_{current}$ that supports an incremental design process, accommodating more than 60% of the future applications. If the remaining slack is very small, after we map a $\Gamma_{current}$ of 240 processes, it becomes practically impossible to map new applications without modifying the current system.
If the mapping heuristic is allowed to modify the existing system, as discussed in this paper, then we are able to increase the total number [...] The important aspect, however, is that it is obtained not by randomly selecting old applications to be remapped, but by performing this selection such that the total modification cost is minimized.
Finally, we considered an example implementing a vehicle cruise controller (CC) modeled using one process graph. The graph has 32 processes and it was to be mapped on an architecture consisting of 4 nodes, namely: Anti Blocking System, Transmission Control Module, Engine Control Module and Electronic Throttle Module. The period was 300 ms, equal to the deadline. In order to validate our approach, we have considered the following setting. The system $\psi$ consists of 80 processes generated randomly, with a schedule table of 300 ms and about 40% slack. The CC is the $\Gamma_{current}$ application to be mapped. We have also generated 30 future applications of 40 processes each with the characteristics of the CC, which are typical for automotive applications. By mapping the CC using MS* we were able to later map 21 of the future applications, while using AM only 4 of the future applications could be mapped. When modifications of the current system were allowed, using MS, we were able to map 24 of the 30 future applications.
CONCLUSIONS
We have presented an approach to the incremental design of distributed hard real-time embedded systems. Such a design process satisfies two main requirements when adding new functionality: already running applications are disturbed as little as possible, and there is a good chance that, later, new functionality can easily be mapped on the resulting system. Our approach assumes a non-preemptive static cyclic scheduling policy and a realistic communication model based on a TDMA scheme. We have introduced two design criteria with their corresponding metrics that drive our mapping strategy to solutions supporting an incremental design process. Three algorithms have been proposed to produce a minimal subset of applications which have to be remapped and rescheduled in order to implement the new functionality. ES is based on a potentially slow branch and bound strategy which finds an optimal solution. AH is very fast but produces solutions that could be of too high a cost, while SH is able to quickly produce good quality results. The approach has been validated through several experiments.
