This paper presents real-time scheduling methods supporting a prototyping language for embedded systems. Rapid prototyping of embedded systems can be accomplished using a Computer Aided Prototyping System (CAPS) and its associated Prototyping Language (PSDL) to aid the designer in handling hard real-time constraints. The prototyping language PSDL is designed based on the real-time model of CAPS. The language models time-critical operations with maximum execution times, maximum response times, minimum or fixed periods, and deadlines. Timing constraints in PSDL are described along with scheduling algorithms relative to a series of hardware models used in CAPS, which include multi-processor configurations.
INTRODUCTION
Large scale, parallel and distributed, hard real-time systems are important to both civilian and military operations. The correctness of these systems depends not only on the logical result of computation, but also on the time at which the results are produced [29]. Examples of hard real-time systems include air traffic control systems, telecommunication systems, space shuttle avionics systems, C3I systems, and future SDI systems. Most hard real-time systems are specialized and complex, require a high degree of fault tolerance, and are typically embedded in larger systems. Real-time constraints and resource limits make it very difficult to adequately specify, design, implement, and modify such systems. Prototyping can help to overcome these difficulties.
Rapid prototyping can be used to reduce the risks of producing systems that do not meet customer needs [23]. It helps customers visualize system behavior prior to detailed implementation of complex embedded control systems with hard real-time constraints, helps designers assess feasibility of implementation, and provides the means for stabilizing and validating the requirements for these systems. The Computer Aided Prototyping System (CAPS) [21] supports an iterative prototyping process characterized by exploratory design and extensive prototype evolution. It enables engineers to produce complex systems that match user needs and reduces the need for expensive modifications after delivery by providing automated decision aids for designers and customers. Demonstrations of proposed system behavior can be effective for validating system requirements, especially for new or unfamiliar application areas. Unlike traditional approaches to software development, which produce working code only near the end of the process, rapid prototyping, when utilized during the early stages of the development life cycle, allows validation of the requirements, specification, and initial design before valuable time and effort are expended on implementing software.
An embedded software system is part of some larger system, which it monitors and controls. The control functions are usually subject to timing constraints that must be met even under the worst possible operating conditions, because missing a timing constraint can cause catastrophic failures in such systems. Feasible requirements for large embedded systems are difficult to formulate, understand, and meet without extensive prototyping. Computer aid is the key to rapid construction, evaluation, and evolution of such prototypes. Analysis and measurement of prototype designs provide upper bounds on the time required to execute particular functions, and experiments with simulated environments provide information about the accuracies and response times required to keep external physical systems within desired operating regions.
An automated prototyping methodology, with CAPS tool-support for validating functional requirements and verifying design specifications early in the development life cycle, increases designer productivity and allows the customer to gain more confidence that the final software product is feasible. The automated support provided by CAPS enables rapid prototyping and practical validation of timing constraints in requirements for hard real-time systems. The CAPS tools are based on the Prototyping System Description Language (PSDL) [22] which is a high-level language designed specifically to support the conceptual modeling of real-time embedded systems, including the timing aspects of hard real-time systems in single and multi-processor hardware configurations. A good modeling strategy and declarative timing constraints help to (1) de-couple the behavioral aspects of a system from its timing properties to allow independent analysis of these two aspects, and (2) organize timing constraints in a hierarchical fashion, to allow independent consideration of smaller subsets of timing constraints.
The prototyping language PSDL supports a modeling strategy based on dataflow graphs augmented with non-procedural timing and control constraints. It provides a separation of timing and behavioral modeling, which supports computer-aided prototyping of both timing and behavior in the system development. Real-time requirements in the system development are modeled as PSDL specifications, which are semi-graphical dataflow descriptions. PSDL specifications are executable, and are analyzed via the CAPS real-time analyzers: the static and dynamic schedulers.
The static scheduler reads the PSDL source and extracts operators with critical timing constraints. It then creates a feasible static schedule if one exists. A feasible schedule guarantees that all executions of time-critical operations will meet their specified timing constraints. The static schedule is a task that executes time-critical operators at the scheduled times.
Operators without timing constraints, which are not included in the static schedule, are passed to the dynamic scheduler to create a dynamic schedule. The dynamic schedule acts as a run-time executive when exercising the system. It schedules operators without timing constraints by using spare capacity in the static schedule. It handles run-time exceptions and hardware/operator interrupts. It communicates with the user interface during prototyping runs. Thus, it acts like a miniature operating system. Although the problems involved in this subsystem are interesting, the focus of this paper is on the static scheduler that guarantees hard real-time constraints.
lines -an operation with a soft deadline can be skipped if the deadline cannot be met. The Flex language is designed to support computations that can be interrupted to deliver approximate results, where the accuracy of the results improves as more computation time is allowed [18, 27] .
The main differences between the treatment of real-time constraints in rapid prototyping and the corresponding treatment in the design and implementation of the production version of the software are the requirements for flexibility and simplicity. In CAPS, flexibility is provided by automatic procedures for scheduling and code generation and simplicity is provided by the timing constraints of PSDL, which comprise a minimal orthogonal set of constructs sufficient to express most realistic embedded systems, as described in [22]. PSDL has been designed to help the analysts determine how to adjust the proposed functions of the system, the target architecture, or both to ensure that the requirements are feasible and to provide the best service possible to the users of the proposed software. PSDL also provides a description of a proposed design that can be smoothly transformed into a final implementation after the requirements have been validated and the design has been verified. A denotational semantics for PSDL can be found in [16].
Our most refined hardware model allows different processor speeds for different processors and different latencies and data rates for links between different pairs of processors. This model is characterized by a vector of processor speeds, a matrix of latencies for communication, and a matrix of maximum data rates. The scheduling problems associated with this model are largely unexplored, but there is reason to believe that finding optimal solutions with respect to this model is computationally complex and that optimal solutions are tractable only for very small problems.
One goal of our current research is to determine whether fast heuristic methods for determining good but non-optimal solutions relative to this model can utilize the hardware more efficiently than optimal solutions relative to more tractable but less accurate hardware models. A positive result appears likely because fast heuristic methods are known for some categories of real-time scheduling problems [29].
SCHEDULING METHODS USED IN CAPS
One way for CAPS to demonstrate the schedulability of a prototype is via the generation of a static run-time schedule that enforces all hard real-time constraints under the worst case condition. CAPS first uses a Pre-Scheduler to extract the global dataflow graph from the input PSDL prototype description and generates three sets of data: the set of "time-critical-operators", the set of "link-statements" representing the edges of the global precedence graph, and the set of "non-time-critical-operators". It then locates the cycles in the global dataflow graph and updates the pipelining and mutual-exclusion information associated with the corresponding operators. The sets of time-critical-operators and link-statements are then passed to the Static Scheduler, while the set of non-time-critical-operators is passed to the Dynamic Scheduler.
The Static Scheduler first converts all sporadic operators in the "time-critical-operator" file into equivalent periodic operators. It also checks each operator-pair connected by dataflow streams to make sure that both operators have identical periods. Once all the operators are in periodic form, it divides the operators into a set of "harmonic blocks". A harmonic block is defined as a set of periodic operators, where the periods of all its component operators are exact multiples of the base period. The length of a harmonic block, denoted by HL, equals the least common multiple of the periods of all the operators contained in the block. In the uniprocessor environment, the Static Scheduler will generate only a single harmonic block that contains all operators in the "time-critical-operator" file. The Static Scheduler then checks the timing requirements of the operators in each harmonic block against a list of necessary conditions for schedulability, and calls the SCHEDULE-OPERATORS sub-module to produce a static schedule if the necessary conditions are all satisfied.
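The harmonic block length defined above can be illustrated with a short sketch. It is not taken from CAPS: the function names are hypothetical, periods are assumed to be positive integers in a common time unit, and the instance count HL / period is an assumption about how n_i is obtained.

```python
from functools import reduce
from math import gcd


def harmonic_block_length(periods):
    """Return HL, the least common multiple of the operator periods."""
    if not periods:
        raise ValueError("a harmonic block needs at least one operator")
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def instances_per_block(periods):
    """Assumed rule: an operator with period p runs HL // p times per block."""
    hl = harmonic_block_length(periods)
    return [hl // p for p in periods]
```

For periods of 50, 100, and 200 time units, this sketch gives HL = 200 and instance counts of 4, 2, and 1.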
The static schedule is a linear table giving the exact execution start time for each time-critical operator and the reserved maximum execution time within which the operator must complete its task. The input to the SCHEDULE-OPERATORS sub-module is a set of periodic atomic non-preemptive operators {o_1, o_2, ..., o_m} and a set of edges present in the global precedence graph.
Given the global precedence graph G and the length of the harmonic block L, we define the constraint-graph CG = (V, E) as follows: (1) For each operator o_i, create a vertex for each instance of o_i that must be executed inside the harmonic block. Denote the vertices created by {(o_i, 1), (o_i, 2), ..., (o_i, n_i)}, where n_i = the number of instances of o_i that must be executed inside the harmonic block. Note that the SCHEDULE-OPERATORS sub-module never constructs the constraint-graph CG explicitly. It computes the precedence constraints described by the constraint-graph dynamically as it builds the static schedule based on the global precedence graph.
A static schedule is said to be legal if the relative ordering of the tasks (i.e. vertices of CG) in the schedule satisfies the precedence constraints imposed by CG. A static schedule is said to be feasible if the schedule is legal and every task, when executed according to the schedule, meets its deadline. For each task (o_i, k), we define the tardiness of (o_i, k) as the amount of time by which (o_i, k) misses its deadline. The cost of a schedule is defined to be the maximum tardiness over all tasks in CG. Hence, any legal schedule with a non-positive cost is a feasible schedule.
For brevity, the following notations will be used throughout this section. For any vertices v, w in CG, let
We have st(o_first) = ct(o_first) = 0, and for k > 1, we have
SINGLE PROCESSOR SCHEDULING
The existing static scheduling algorithms available in CAPS for the uniprocessor environment fall into two categories: the exponential-time optimal algorithms and the fast heuristic algorithms. Algorithms in the first category (exhaustive-enumeration and branch-and-bound) guarantee finding a feasible schedule if one exists, but the exponential running time makes them useless for most problems of practical size. Algorithms in the second category (topological-sort, earliest-starting-time-first, earliest-deadline-first, and simulated-annealing), on the other hand, are very efficient, but they can fail to find a feasible schedule even if one exists. The branch-and-bound, earliest-starting-time-first, and earliest-deadline-first algorithms were adapted from those presented in [1], which were developed originally to schedule independent non-preemptive tasks without precedence constraints. The output of the single processor scheduling algorithms is a linear table S consisting of all the vertices in CG, where o_first is the first task with an actual starting time of 0. The actual starting time of a task v ≠ o_first in the schedule S equals
st(v) = max{ct(w), ready(v)}
where w = the task immediately preceding v in S.
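As an illustration of the start-time rule, the following sketch walks a linear table and derives start times and the schedule cost. The `Task` record and `evaluate` function are hypothetical names, and two details are assumptions rather than statements from the paper: that ct(v) = st(v) + met(v), and that tardiness is clipped at zero so that a feasible schedule has cost 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    ready: int      # earliest time at which the task may start
    met: int        # maximum execution time reserved for the task
    deadline: int   # absolute deadline of the task


def evaluate(schedule):
    """Return (start_times, cost) for a linear table of tasks."""
    starts = {}
    finish = 0   # ct(w) of the preceding task; 0 before the first task
    cost = 0
    for task in schedule:
        start = max(finish, task.ready)          # st(v) = max{ct(w), ready(v)}
        finish = start + task.met                # assumed: ct(v) = st(v) + met(v)
        tardiness = max(0, finish - task.deadline)
        starts[task.name] = start
        cost = max(cost, tardiness)              # cost = maximum tardiness
    return starts, cost
```

With tasks a (ready 0, met 3, deadline 5) and b (ready 2, met 4, deadline 6), the order a, b gives a cost of 1, while the order b, a gives a cost of 4.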

EXHAUSTIVE-ENUMERATION
Since the static scheduling problem is NP-hard [32, 33, 35], systematic global search is the only deterministic method that always returns a feasible static schedule if one exists. Exhaustive-enumeration is a very simple algorithm that inspects the legal schedules one by one and returns the schedule with the minimum cost at the end of the enumeration. (See, for example, [30, pp. 621-632] for exhaustive search in graphs.) Since we are only interested in obtaining a feasible schedule, we can modify the algorithm so that it stops as soon as a legal schedule with non-positive cost is encountered.
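A minimal sketch of the early-stopping variant is given below. It is an illustration, not the CAPS implementation: the task encoding as (ready, met, deadline) tuples, the function names, and the rule ct(v) = st(v) + met(v) are assumptions.

```python
def legal_schedules(tasks, edges):
    """Yield every ordering of the tasks that respects the precedence edges."""
    preds = {name: set() for name in tasks}
    for before, after in edges:
        preds[after].add(before)

    def extend(order, placed):
        if len(order) == len(tasks):
            yield list(order)
            return
        for name in tasks:
            # A task may be placed once all of its predecessors are placed.
            if name not in placed and preds[name] <= placed:
                order.append(name)
                placed.add(name)
                yield from extend(order, placed)
                placed.remove(name)
                order.pop()

    yield from extend([], set())


def cost(order, tasks):
    """Maximum tardiness of a linear schedule, clipped at zero (assumption)."""
    finish, worst = 0, 0
    for name in order:
        ready, met, deadline = tasks[name]
        finish = max(finish, ready) + met
        worst = max(worst, finish - deadline)
    return worst


def first_feasible(tasks, edges):
    """Inspect legal schedules one by one; stop at the first feasible one."""
    for order in legal_schedules(tasks, edges):
        if cost(order, tasks) <= 0:
            return order
    return None
```

The number of legal schedules grows factorially with the number of unconstrained tasks, which is why this approach is only practical for very small constraint graphs.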
BRANCH-AND-BOUND
The exhaustive-enumeration algorithm, though simple, is not useful in practice due to its exponential time complexity. Hence, branch-and-bound is introduced to cut down the running time of the exhaustive search. The modified algorithm first initializes the upper bound on the cost of the optimum schedule to some large value. It then enumerates the legal schedules implicitly like the exhaustive-enumeration algorithm, and updates the bound to the cost of a cheaper legal schedule whenever one is encountered. However, unlike the exhaustive-enumeration algorithm, the modified algorithm can eliminate a subset of the legal schedules in a single step based on an estimated cost of a partial schedule. To obtain a lower bound on the cost of a partial schedule S, we first compute, for each unscheduled task v, lower bounds on the starting time
denoted by est-cost(S), equals
As the branch-and-bound algorithm enumerates the legal schedules implicitly, it compares the est-cost(S) of every partial schedule S generated against the upper bound on the cost of the optimum schedule. If est-cost(S) is greater than or equal to the upper bound, all partial and complete schedules having S as prefix will be eliminated from the search space. Like the exhaustive-enumeration algorithm, the branch-and-bound algorithm also stops as soon as a legal schedule with non-positive cost is encountered.
To obtain the lower bounds on the estimated starting time and completion time of an unscheduled task v, we assume that all unscheduled ancestors of v in CG are executed one after another and that there is no idle time interval between them. Given a partial schedule S, with w as the last scheduled task in S, we compute the estimated tardiness of an unscheduled task (o_i, k) as follows:
(1) Let UA(o_i, k) be the set of unscheduled ancestor tasks of (o_i, k) in CG.
The exhaustive-enumeration algorithm, even with the help of branch-and-bound, is still too slow to be useful for rapid prototyping of large systems. Hence, CAPS provides four more efficient heuristic algorithms for scheduling real-time periodic tasks.
TOPOLOGICAL-ORDERING
The topological-ordering algorithm produces a static schedule by sorting the vertices in CG topologically as follows.
(1) Initialize the in-degree of each vertex v in CG to the number of incoming edges to v. Initialize the vertex set Ready to {o_first} and the vertex set Zero-Degree to empty.
(2) While the set Ready is not empty loop ... end.
It is easy to see that the schedule produced by the topological-ordering algorithm is always legal. However, since the algorithm only examines one schedule, it may fail to find a feasible schedule even if one exists.
EARLIEST-STARTING-TIME-FIRST and EARLIEST-DEADLINE-FIRST
To improve the chance of finding a feasible schedule, heuristics are added to Step (2.1) of the topological-ordering algorithm to alter the way in which vertices are removed from the set Ready. The earliest-starting-time-first algorithm works like the topological-ordering algorithm, except that it always removes the vertex with the earliest ready time among all the vertices in the set Ready in Step (2.1). The earliest-deadline-first algorithm, on the other hand, always removes the vertex with the earliest deadline among all the vertices in the set Ready in Step (2.1).
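The two heuristics differ only in the key used to pick the next vertex from Ready, which the following sketch makes explicit. It is a generic topological-sort loop written for illustration, not the CAPS code: the function names and the (ready, met, deadline) task tuples are assumptions, and ready times are treated as fixed values known in advance.

```python
def list_schedule(tasks, edges, key):
    """Topological ordering in which `key` decides which Ready vertex goes next."""
    indegree = {name: 0 for name in tasks}
    succs = {name: [] for name in tasks}
    for before, after in edges:
        succs[before].append(after)
        indegree[after] += 1

    ready_set = [name for name in tasks if indegree[name] == 0]
    order = []
    while ready_set:
        # Heuristic choice: remove the Ready vertex with the smallest key.
        chosen = min(ready_set, key=lambda name: key(tasks[name]))
        ready_set.remove(chosen)
        order.append(chosen)
        for nxt in succs[chosen]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready_set.append(nxt)
    return order


def earliest_start_first(tasks, edges):
    return list_schedule(tasks, edges, key=lambda t: t[0])   # ready time


def earliest_deadline_first(tasks, edges):
    return list_schedule(tasks, edges, key=lambda t: t[2])   # deadline
```

Because every vertex is appended only after all of its predecessors, both variants return a legal schedule; whether it is feasible still has to be checked against the deadlines.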
SIMULATED-ANNEALING
The major drawback of the previous three algorithms is that they all take a hit-or-miss approach. They only find feasible solutions for very simple constraint graphs. One way to increase the chance of finding a feasible schedule without spending exponential execution time is the use of stochastic search. CAPS provides a sixth algorithm that finds feasible schedules using simulated annealing. Simulated annealing is a search technique based upon the Metropolis Algorithm, which simulates a complex system of particles (molecules) in a heat bath. Starting from the initial legal schedule and its evaluated cost, the algorithm randomly perturbs this schedule to obtain a new one. The cost function of the new solution is computed and compared to the current solution. As in the standard local iterative improvement approach, the new solution always replaces the current one if the change in cost, ΔC, is non-positive. However, unlike the local iterative improvement approach, the new solution is accepted with probability exp(-ΔC/T) if ΔC is positive.
The control temperature, T, is a value in the same units as the cost function. It regulates the probability distribution that defines the acceptance criteria of solutions with degrading costs. At each temperature, T, the procedure attempts up to either a total of L moves or a set number of accepted moves. The temperature is then reduced by a factor R. The resulting behavior is a downward-biased random walk through the solution space, with the ability to escape local minima. Gradually decreasing the control temperature changes the exponential probability distribution. This tightens the acceptance criteria against larger degradations. The value of T for which no degradations are reasonably expected and no more improvements can be found is T_f, the freezing temperature. At this stage the algorithm returns the current solution, and halts. Since we are only interested in finding a feasible schedule, i.e. a legal schedule with non-positive cost, we have modified the simulated annealing algorithm to halt as soon as it encounters such a schedule. To use the simulated annealing algorithm, we must provide efficient and effective ways to (1) obtain the initial legal schedule, (2) perturb the existing schedule to obtain new legal schedules, and (3) choose the annealing schedule that controls the number of legal schedules being examined at each temperature T and the rate at which T is lowered.
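The acceptance rule and cooling loop described above can be sketched generically as follows. This is a simplified illustration rather than the CAPS procedure: the names `accept` and `anneal` and their parameters are hypothetical, the solution type is left abstract, and only a single per-temperature trial limit is modeled (the separate limit on accepted moves is omitted).

```python
import math
import random


def accept(delta_cost, temperature, rng=random):
    """Metropolis rule: keep non-positive changes, else accept with exp(-dC/T)."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)


def anneal(initial, cost, perturb, t_start, t_freeze, cooling, trials, rng=random):
    """Return (solution, cost); halts early once the cost is non-positive."""
    current, current_cost = initial, cost(initial)
    temperature = t_start
    while temperature > t_freeze and current_cost > 0:
        for _ in range(trials):
            candidate = perturb(current, rng)
            candidate_cost = cost(candidate)
            if accept(candidate_cost - current_cost, temperature, rng):
                current, current_cost = candidate, candidate_cost
            if current_cost <= 0:
                break                 # feasible schedule found
        temperature *= cooling        # reduce T by the factor R
    return current, current_cost
```

Passing the random source as `rng` keeps runs reproducible, for example by supplying `random.Random(seed)`.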
Although annealing can begin from any solution in the search space, empirical evidence suggests that reasonably good initial solutions can often provide better final results [11, 26]. Hence, we always run the earliest-starting-time-first algorithm before the annealing algorithm. The result generated by the earliest-starting-time-first algorithm will be used as the starting solution of the annealing process if it is not a feasible schedule.
Empirical evidence also suggests that simulated annealing performs best for problems with smooth cost function surfaces [12]. Hence, we re-define the cost of a schedule as Σ tardiness(v) (instead of max{tardiness(v)}). Summing the tardiness values over all the tasks in CG has the same effect as averaging the tardiness values, which will produce a smoother cost function surface than the previous cost definition. The cost of a feasible schedule is again zero under the new cost definition.
The method of adjusting a given schedule to generate new schedules must maintain the precedence relationships between the tasks as defined by the constraint-graph CG. The annealing algorithm has two methods to generate a new schedule based on a given legal but infeasible schedule.
(1) The algorithm starts randomly at some point in the schedule, traverses up the schedule and finds the first task with a positive tardiness. If the algorithm reaches o_first without encountering any task with a positive tardiness, it will scan the schedule one more time starting at the last task in the schedule. Assume that (o_i, k) is the first task encountered with a positive tardiness. If k > 1, the algorithm will move the task as far up the schedule as possible while maintaining the precedence constraint. It then updates ct(v) and tardiness(v) for each operator v at and after (o_i, k) in the new schedule. If k = 1, it will abandon the given schedule and generate a new one using the next method.
(2) The algorithm generates a new schedule from scratch by randomly removing vertices among all the vertices in the set Ready in Step (2.1) of the topological-ordering algorithm.
Producing a brand new schedule is a very expensive operation in terms of time, since we have to recompute the relative ordering of the tasks, as well as the deadline and tardiness of every task in the schedule. Hence, the annealing algorithm is designed to invoke method (1) more often than method (2) during the annealing process.
The choice of the annealing schedule affects both the running time and the effectiveness of the annealing algorithm. A higher initial temperature, a larger cooling factor, and a larger number of trials at each temperature all result in a more thorough search of the solution space. These parameters are normally established by trial-and-error experimentation [5]. The goal in choosing these parameters is to ensure that a sufficient, but not excessive, number of solutions are examined. In our experiments, we set the starting temperature to twice the cost of the starting solution, the freezing temperature to 1.0, the sample size to 100, and the cooling factor to 0.95.
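To give a feel for the search effort implied by these settings, the sketch below counts the temperature stages and the resulting upper bound on examined schedules. The function name is hypothetical, and it assumes that "sample size" means the number of trials at each temperature and that annealing continues while T is strictly above the freezing temperature.

```python
def annealing_budget(start_cost, t_freeze=1.0, cooling=0.95, sample_size=100):
    """Return (stages, max_trials) for the parameter choices quoted above."""
    temperature = 2.0 * start_cost   # starting temperature: twice the initial cost
    stages = 0
    while temperature > t_freeze:
        stages += 1
        temperature *= cooling
    return stages, stages * sample_size
```

Under these assumptions, a starting solution with cost 50 leads to 90 temperature stages and at most 9000 trial schedules.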
MULTIPLE PROCESSOR SCHEDULING
The next generation of CAPS will run in a multiprocessor environment. We are looking into the possibilities of extending the existing algorithms to handle multiprocessor scheduling for hard real-time systems. Our initial results are summarized in this section.
The major difference between single processor scheduling and multiple processor scheduling is that, in addition to deciding which task is to be executed next, the multiple processor scheduling algorithms must decide which processor the task should run on. Given a constraint graph
EARLIEST-STARTING-TIME/MOST-AVAILABLE-PROCESSOR-FIRST
Like the earliest-starting-time-first algorithm for single processor scheduling, this algorithm always removes the vertex v with the earliest ready time from the set Ready. For 1 ≤ j ≤ M, let last_j be the last task scheduled on processor j so far. If there exists a processor i such that ct(last_i) ≤ ready(v), then the algorithm assigns the task v to the processor i such that ct(last_i) = max{ct(last_j) | 1 ≤ j ≤ M and ct(last_j) ≤ ready(v)}. Otherwise, it assigns v to the processor i such that ct(last_i) = min{ct(last_j) | 1 ≤ j ≤ M}.
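The processor-selection rule can be sketched as a small function. The name `pick_processor` and the representation of ct(last_j) as a list indexed by processor are assumptions made for illustration; ties are broken in favor of the lowest processor index, which the text does not specify.

```python
def pick_processor(finish_times, ready):
    """Choose a processor for a task v, given finish_times[j] = ct(last_j)."""
    free_in_time = [j for j, ct in enumerate(finish_times) if ct <= ready]
    if free_in_time:
        # Among processors free by ready(v), take the one that became free latest.
        return max(free_in_time, key=lambda j: finish_times[j])
    # No processor is free by ready(v): take the one that becomes free first.
    return min(range(len(finish_times)), key=lambda j: finish_times[j])
```

Choosing the latest-finishing processor that is still free in time leaves the earlier-finishing processors open for tasks with earlier ready times.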
EARLIEST-DEADLINE/LEAST-IDLING-PROCESSOR-FIRST
Similar to the earliest-deadline-first algorithm for single processor scheduling, this algorithm always removes the task v with the earliest deadline from the set Ready in Step (2.1) of the topological-ordering algorithm. If there exists a processor i such that ct(last_i) ≤ deadline(v) - met(v), then the algorithm assigns the task v to the processor i such that ct(last_i) = max{ct(last_j) | 1 ≤ j ≤ M and ct(last_j) ≤ deadline(v) - met(v)}. Otherwise, it assigns v to the processor i such that ct(last_i) = min{ct(last_j) | 1 ≤ j ≤ M}.
SIMULATED-ANNEALING
The only difference between the simulated-annealing algorithm for multiple processor scheduling and the one for single processor scheduling is the way new schedules are generated during the annealing process. Like the one described in section 3.1.4, the simulated-annealing algorithm for multiple processor scheduling has two methods to generate a new schedule based on a given legal but infeasible schedule S = [S_1, S_2, ..., S_M].
(1) The algorithm starts randomly at some point in S_i for some processor i, traverses up the table S_i and finds the first task with a positive tardiness. If the algorithm reaches the head of S_i without encountering any task with a positive tardiness, it will scan the whole schedule [S_1, S_2, ..., S_M] sequentially one more time to locate a task with a positive tardiness. Assume that
(2) The algorithm generates a new schedule from scratch by randomly removing vertices among all the vertices in the set Ready in Step (2.1) of the topological-ordering algorithm.
Again, producing a brand new schedule is a very expensive operation in terms of time, and the annealing algorithm is designed to invoke method (1) more often than method (2) during the annealing process.
