Abstract. A highly parallel algorithm for termination of Ada tasks is suggested. The algorithm is intended for hardware implementation in order to overcome the complexity of recursive software algorithms. High efficiency and determinism are expected to be achieved with a hardware implementation. A VHDL implementation of termination units is presented together with simulations that show that the termination condition can be evaluated in constant time.
Introduction
A hard real-time system is defined as a system where deadlines must never be missed, i.e. a missed deadline means system failure. Such systems require schedulability analysis: either a static schedule is calculated, or a dynamic scheduling algorithm is applied which guarantees that no deadline is missed. In the latter case there are typically several restrictions; for example, the periodicity, CPU utilization and critical resource usage of each task must be known. In particular, before a new task is added, a new schedulability analysis must be performed. Thus dynamic task creation is not common in hard real-time systems, and as a consequence tasks are rarely required to terminate; they are expected to live throughout the execution of the system. The design of hardware support for task termination, as described in this paper, should be considered as part of a larger design, where a centralized Controller manages all tasks in a distributed system. We suggest a hardware implementation of Ada tasking as a separate node in a distributed system [2] [3]. The concept is not language dependent. Any language with a clearly defined tasking mechanism could benefit from such hardware support, but we have primarily concentrated on Ada 83 [4], since it is standardized (in this paper, Ada means Ada 83).
The most important previous work in the area of hardware implementation of Ada tasking is ATAC [5]. Full Ada 83 tasking is implemented as a co-processor. ATAC in its current version 2.0 handles 32 tasks on-chip, but the number of tasks can be expanded to 2048 with external RAM.
Another hardware solution for increasing the performance of Ada is the microprocessor Thor [6], a general purpose 32-bit RISC with support for Ada tasking.
However, if such schedulability analysis allows a certain number of dynamic tasks, there must be means to achieve efficient and predictable task creation and termination. The difficulty of achieving this has (at least) two causes: task creation requires memory to be allocated (from the heap), and task termination requires investigation of the state of other tasks directly or indirectly dependent on this task and its master(s).
There are other hardware implementations of real-time kernels that are language independent, e.g. the co-processor FASTHARD [7] , and the CPU FASTCHART [8] , but they do not address the Ada tasking mechanism.
An efficient implementation of Ada task termination for shared memory MIMD architectures is presented in [9]. The solution is highly parallel and distributed. Hardware support for Ada tasking is also discussed in [10]. A distributed version of ATAC is suggested in [11].
We present an algorithm for evaluation of the termination condition. The algorithm is intended for hardware implementation, and we give the design and implementation in a hardware description language, VHDL [1]. High efficiency is achieved by having one state machine per task evaluating the termination condition. The state machines execute in parallel. The implementation is written in a way that allows synthesis as well as simulation. Synthesized code can be downloaded and executed on a Field Programmable Gate Array (FPGA).
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
The scheduling co-processor in SPRING [12] performs on-the-fly scheduling analysis to decide if new tasks can be created with all deadlines guaranteed.
Outline
The rest of this paper is structured as follows:
Section 2 discusses Ada's task model, with the purpose of explaining and clarifying important aspects of the semantics.
Section 3 gives an example of how complex task termination is to handle in software, to give some perspective on and motivation for our hardware solution.
Section 4 is the main section. It describes our termination algorithm and its implementation in VHDL, illustrated with examples and simulation traces.
The Task Model in Ada
This section contains an overview of parts of the tasking model concerning termination. Comments and analogies are given to the LRM quotes, and some motives for certain language constructs are given.
The complex termination rules for Ada tasks cause termination to be non-deterministic. A task can have one or more dependent tasks, and also depends on a master. A task completes when its last statement is executed, when it is aborted, or when an unhandled exception is raised. A task may terminate when waiting at a terminate alternative in a select statement. The task may only terminate when all its dependents (recursively defined in a tree structure) have terminated or are waiting at a terminate alternative. All such tasks are required to terminate simultaneously, as an atomic action.
Termination is further complicated by the fact that a task waiting at a terminate alternative may be called, thus causing itself and all of its dependents to become non-eligible for termination. In a simpler process model, a process is either terminated or not. Waiting for termination is just a matter of waiting for all children to become terminated.
Execution of a select-or-terminate takes longer than a simple accept, since the termination condition must be evaluated (cooling down direct and indirect masters).
The termination rules must also be checked (warming up direct and indirect masters) at every call to a task waiting at a select-or-terminate construct. Such calls take longer than a call to a task waiting at a simple accept or a select without terminate.
Master
Each task depends on at least one master. A master is a construct that is either a task, a currently executing block statement or subprogram, or a library package.
The reason for introducing masters and task dependencies is that it allows a group of tasks to terminate when it has accomplished its mission.
Direct Dependence
A task depends on a master directly in the following two cases:
(a) The task designated by a task object that is the object, or a subcomponent of the object, created by the evaluation of an allocator depends on the master that elaborates the corresponding access type definition.
(b) The task designated by any other task object depends on the master whose execution creates the task object.
Task dependency is important both when deciding whether one completed task may terminate, and whether a group of tasks waiting at a terminate alternative may terminate simultaneously. Furthermore, the dependency is used to determine when the allocated memory of a task can be reused. The memory of a terminated task is not deallocated until all dependent tasks have terminated.
Indirect Dependence
Furthermore, if a task depends on a given master that is a block statement executed by another master, then the task depends also on this other master, in an indirect manner; the same holds if the given master is a subprogram called by another master, and if the given master is a task that depends (directly or indirectly) on another master. Dependences exist for objects of a private type whose full declaration is in terms of a task type.
A block or a subprogram can be a master, and it can be executed by another master. In this case the task depends indirectly on the other master.
There is a duality in the task model: a task may both be a master (have dependents) and have a master (be dependent).
A master is in itself not dependent on any other master; only task dependence is defined in the LRM. However, the analogy between exiting the scope of a block or subprogram and terminating a task is apparent.
Completed
A task is said to have completed its execution when it has finished the execution of the sequence of statements that appears after the reserved word begin in the corresponding body. Similarly a block or a subprogram is said to have completed its execution when it has finished the execution of the corresponding sequence of statements. For a block statement, the execution is also said to be completed when it reaches an exit, return, or goto statement transferring control out of the block. For a procedure, the execution is also said to be completed when a corresponding return statement is reached. For a function, the execution is also said to be completed after the evaluation of the result expression of a return statement. Finally the execution of a task, block statement, or subprogram is completed if an exception is raised by the execution of its sequence of statements and there is no corresponding handler, or, if there is one, when it has finished the execution of the corresponding handler.
A task, block or subprogram completes, conceptually, by reaching its "end-of-statements". Task completion and block/subprogram completion are similar, even though there exist many different ways to reach the end.
Terminated
If a task has no dependent task, its termination takes place when it has completed its execution. After its termination, a task is said to be terminated. If a task has dependent tasks, its termination takes place when the execution of the task is completed and all dependent tasks are terminated. A block statement or subprogram body whose execution is completed is not left until all of its dependent tasks are terminated.
Termination requires that the task has completed, and if there are dependents, that those tasks have terminated.
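The rule can be read as a recursion over the dependency tree. The following is a minimal sketch in Python, assuming a hypothetical `Task` record with a `completed` flag and a list of `dependents`; these names are illustrative and do not come from the LRM.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """Hypothetical task record; field names are illustrative only."""
    name: str
    completed: bool = False
    dependents: List["Task"] = field(default_factory=list)


def is_terminated(task: Task) -> bool:
    """A task is terminated once it has completed and every
    dependent task is itself terminated (checked recursively)."""
    return task.completed and all(is_terminated(d) for d in task.dependents)
```

The recursion visits every dependent in turn, which illustrates the sequential cost that a software run-time system pays for this check.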
Again the analogy with subprograms (and blocks): "Termination" of a subprogram requires that the subprogram has completed, and all dependents have terminated.
Terminate Alternative in Select Statement
The tricky part is when a group of tasks, connected by the dependencies, are about to terminate simultaneously:
Termination of a task otherwise takes place if and only if its execution has reached an open terminate alternative in a select statement (see 9.7.1), and the following conditions are satisfied:
- The task depends on some master whose execution is completed (hence not a library package).
- Each task that depends on the master considered is either already terminated or similarly waiting on an open terminate alternative of a select statement.
When both conditions are satisfied, the task considered becomes terminated, together with all tasks that depend on the master considered.
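The two conditions and the collective effect can be sketched as follows. This is only an illustration in Python under simplifying assumptions: the `Node` record, its three states and the function names are hypothetical, and a node stands for either a task or a block/subprogram acting as master.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, List


class State(Enum):
    RUNNING = "running"
    WAITING = "waiting"        # at an open terminate alternative
    TERMINATED = "terminated"


@dataclass
class Node:
    """Hypothetical node: a task, or a block/subprogram acting as master."""
    name: str
    state: State = State.RUNNING
    completed: bool = False
    dependents: List["Node"] = field(default_factory=list)


def all_dependents(master: Node) -> Iterator[Node]:
    """Yield every task depending directly or indirectly on master."""
    for d in master.dependents:
        yield d
        yield from all_dependents(d)


def try_collective_termination(master: Node) -> bool:
    """Apply both conditions; on success terminate the whole group."""
    if not master.completed:
        return False                      # first condition fails
    group = list(all_dependents(master))
    if any(t.state is State.RUNNING for t in group):
        return False                      # second condition fails
    for t in group:
        t.state = State.TERMINATED        # all become terminated together
    return True
```

Either the whole group changes state or nothing does, which mirrors the requirement that the tasks terminate together.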
The "termination" of a master requires that the master has completed, and that all its dependents have terminated or are about to terminate.
Comments
The rule for termination does not prevent a terminated task from being called. Any visible task may be called, and Tasking_Error is raised if the called task is terminated.
The opposite applies when a master has "terminated"; this also prevents any terminated tasks dependent on this master from being called. No task or subprogram that could make such a call exists any longer.
Library packages that are masters never terminate. Thus, for tasks depending on library packages, the terminate alternative is not applicable until the main program completes.
Example of Task Dependencies and Possible Calls
The example below illustrates direct and indirect dependencies of five tasks and two (subprogram) masters. The condition to terminate is indicated by comments, but we only consider the simple case where tasks complete and wait for dependents, not the select-or-terminate case. The dependencies are also shown in fig. 1. The procedure M1 is a master since it creates a new task, C, which is also a master since it creates D, etc. Tasks D and E are indirect dependents of M1. The tasks A and B are declared in M2 and created by M1 executing M2, so A and B are indirect dependents of M1.
The "dependency" of master M2 on M1 is also indicated with a thick line. Such a dependency is not defined in the language, but it is a way to show the indirect dependency of A and B on M1. An important property is that M1 cannot complete until M2 completes and returns. M2, A and B, however, can complete in any order. The local tasks A and B, which are created during the call to M2, do not exist when M2 has returned.
In the routines for accepts and entry calls, there are similar and rather complex algorithms to warm up and cool down layers. A layer is a collection of tasks that may terminate collectively. The layers are connected in linked lists that are traversed, and furthermore, when reaching the end of a list, the traversal may continue in a "master block" that has a similar linked list of layers. A task control block (TCB) and a layer are associated via pointers, and layers are also linked in lists. The "temperature" of a layer is:
WARM = not waiting on select with open terminate
COOL = waiting + has non-terminable dependents
COLD = waiting + all dependents, if any, are terminable
For example, a warm layer (the task associated with this layer) is executing, delayed, suspended on an accept, or waiting at a select with open delay. A cool layer has been suspended on a select with open terminate, but has dependents that are not ready to terminate. A cold layer has also been suspended on a select with open terminate, and all its dependents have either terminated or are likewise suspended on a select-or-terminate.
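The three temperatures amount to a small decision function. The sketch below, in Python, assumes that a layer can be summarized by two hypothetical inputs: whether its task waits on a select with open terminate, and how many of its dependents are not yet terminable.

```python
from enum import Enum


class Temperature(Enum):
    WARM = "warm"
    COOL = "cool"
    COLD = "cold"


def layer_temperature(waiting_on_open_terminate: bool,
                      nonterminable_dependents: int) -> Temperature:
    """Classify a layer according to the WARM/COOL/COLD definitions."""
    if not waiting_on_open_terminate:
        return Temperature.WARM   # executing, delayed, on accept, ...
    if nonterminable_dependents > 0:
        return Temperature.COOL   # waiting, but some dependent is not ready
    return Temperature.COLD       # waiting, all dependents terminable
```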
Nested Declare Blocks
The complexity is caused by having nested declare blocks (that are masters) and allowing a select-or-terminate statement inside the block, as in the following example. Inner completes only if the entry of A is called. Outer completes only after Inner has completed. To terminate when waiting at the select, both B and C must be terminable (completed or waiting), and the master of A completed.
Algorithm for Cooling and Warming Layers
The algorithm in the run-time system for handling a select with open terminate is as follows. Starting with task A's Most_Recent_Layer, the temperature of each layer in the linked list Link_Layer is set to COLD, until a layer with non-terminable dependents is found. If the end of the list is encountered before a layer with non-terminable dependents, then nr_nonterminable_dependents in Master_Block is decreased, and if nonzero the search continues (recursively) in Master_Block.
The search is ended either when a master has non-terminable dependents or is completed. In the first case, the temperature of the layer is set to COOL, and in the second case the task and all dependents may terminate collectively.
The algorithm for handling entry calls is as follows. The transitions are shown in the following automaton, fig. 3.
Fig 3. Task states and transitions
This is not a complete description of Ada task states, but it is sufficient for evaluation of the termination condition. For example, a task that completes (reaches the end statement) corresponds to the terminated state. In the same way, a task that is aborted or that raises an unhandled exception corresponds to the terminated state.
All tasks in the Waiting state must terminate simultaneously. When all tasks depending on a master have terminated, the allocated resources (memory etc.) of the tasks and the master may be destroyed, and then reused.
Only a little extra logic is needed to calculate the termination condition for one task, but four signals are necessary to communicate with the other tasks. The signals are named Rin, Rout, Tin and Tout. R propagates information upward to a master, and informally means "is running". T propagates a command downward to dependents, and informally means "do terminate"; see fig. 4. Term causes the task state to change to terminated, which is "immediately" propagated via Tout to any dependents.
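The reduced state model can be written as a transition table. The sketch below is in Python and rests on assumptions: the three states follow the text, but the action names (`select_terminate`, `called`, `complete`, `terminate`) are hypothetical labels for the transitions, not names taken from the figure.

```python
from enum import Enum


class State(Enum):
    RUNNING = 1
    WAITING = 2      # at a select with open terminate alternative
    TERMINATED = 3   # completed, aborted, or terminated with its group


# (state, action) -> next state; the action names are hypothetical.
TRANSITIONS = {
    (State.RUNNING, "select_terminate"): State.WAITING,
    (State.WAITING, "called"): State.RUNNING,
    (State.RUNNING, "complete"): State.TERMINATED,
    (State.WAITING, "terminate"): State.TERMINATED,
}


def step(state: State, action: str) -> State:
    """Return the next state; undefined pairs leave the state unchanged."""
    return TRANSITIONS.get((state, action), state)
```

A waiting task that is called returns to running, which is the case that makes a group non-eligible for termination again.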
Connection of Termination Units
As seen in fig. 5 , a termination unit consists of the task state represented by a state machine, and two registers to hold the identity of the master and the dependents. 
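To make the interplay of the R and T signals concrete, the following Python sketch simulates a small array of termination units in software. It is not the VHDL design: signals are assumed active-high, the two registers are modelled as plain signal indices (`master`, `depend`), the combinational network is settled by iteration, and the exact equations for Rout, Tout and Term are our reading of the informal description.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class State(Enum):
    RUNNING = 1
    WAITING = 2
    TERMINATED = 3


@dataclass
class Unit:
    """One termination unit; `master` and `depend` stand in for the two
    registers and hold master-signal indices (None = unused)."""
    state: State
    master: Optional[int]   # signal pair shared with this unit's master
    depend: Optional[int]   # signal pair shared with this unit's dependents


def settle(units: List[Unit], n_signals: int):
    """Iterate the combinational R/T network until it is stable.
    R[k]: wired-OR of 'is running' from all units whose master is k.
    T[k]: 'do terminate' driven by the unit whose dependents use k."""
    R = [False] * n_signals
    T = [False] * n_signals
    while True:
        new_R = [False] * n_signals
        new_T = [False] * n_signals
        for u in units:
            r_in = R[u.depend] if u.depend is not None else False
            t_in = T[u.master] if u.master is not None else False
            if u.master is not None:
                # Rout: this unit or something below it is still running.
                new_R[u.master] |= (u.state is State.RUNNING) or r_in
            if u.depend is not None:
                done = u.state is State.TERMINATED
                passing = u.state is State.WAITING and t_in
                # Tout: asserted only when nothing below is running.
                new_T[u.depend] = (done or passing) and not r_in
        if new_R == R and new_T == T:
            return R, T
        R, T = new_R, new_T


def clock(units: List[Unit], n_signals: int) -> None:
    """One clock edge: every waiting unit whose Term is high terminates."""
    R, T = settle(units, n_signals)
    for u in units:
        r_in = R[u.depend] if u.depend is not None else False
        t_in = T[u.master] if u.master is not None else False
        if u.state is State.WAITING and t_in and not r_in:
            u.state = State.TERMINATED
```

With a completed master and all dependents waiting, a single call to `clock` terminates the whole group, while one running dependent anywhere in the tree blocks termination for every unit. The loop in `settle` models ripple through the wired-OR network; in hardware this happens within one clock period.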
VHDL Implementation and Simulation
We use the Mentor Graphics software, which contains a predefined VHDL library MGC_Portable with definitions such as the type Qsim_Logic, a 12-valued enumeration type for simulation and synthesis of bits and buses. Resolution functions such as Qsim_State_Resolved_Or_Vector are available. The code presented is intended for synthesis as well as simulation. The synthesis subset of VHDL is not standardized, so this will be compiler dependent.
The termination unit
The termination unit is encapsulated, with communication both to the environment via DBUS and to other termination units via the master signals. The original two master signals are implemented as four vectors, since the synthesis does not allow a resolution function to be applied to inout parameters. This should be solved by specifying the master signals as bus, which allows a resolution function to be applied; however, only wired-x is allowed. This is highly compiler dependent.
In the simulation of one termination unit shown in fig. 6, the signal names match the names in the VHDL code below; however, not all signals are shown. The termination unit is dependent on master nr 1 (Reg_Master is set to 0001) and has no dependents of its own (Reg_Depend is set to 0000). To simulate this, T_In(1) is forced to 1 and R_In(0) to 0 at 1. This, together with the values of Reg_Depend and Reg_Master, affects Rin at 1 and Tin at 2 respectively. The actions written to Reg_Control always affect the state z on the following rising clock pulse, as seen at 3 and 4. The performed actions result in the correct state changes. Since there is no dependent, Rout is high only when z = running. At 5, when z = waiting, T_In(1) is forced to 0 to simulate termination initiated by a completed master. Tout is set at 6 to terminate any dependents. Term becomes high at 7, and this causes the next action Terminate at 8. The state change is effected at the next clock pulse at 9 (intentionally delayed in the figure). The next state is Terminated at 10.
An array of termination units
An array of components is created and connected with the generate statement in VHDL. Discouragingly, this statement is not implemented in the VHDL synthesis subset of the current version of our software, Mentor Graphics v8.4. The only solution is to name and connect each component "manually". Such an implementation is shown below; however, this is not feasible for a large (realistic) array.
A simulation of the termination array is shown in fig. 7. The dependencies are the same as in the example described in the following section 4.5. From 1 onward the four tasks T1 to T4 are created; they are allocated to termination units a0 to a3. From 2 onward the master (T1) completes and changes state to Terminated, and then the dependent tasks (T2 to T4) one by one change state to Waiting. When the last task changes state, the condition for termination is fulfilled and the signal Tout at 3 ripples through all dependents, so that Term is set at 4. Term going high affects the action at 5. For simplicity only a few signals from termination unit a1 are shown in the figure. All dependents are terminated simultaneously. The width of R and T is four bits each, which is the minimum number necessary with this set of tasks. If bits T0 and R0 are used to represent "no dependents", the values of the master register M and the dependents register D become:
The tasks and the dependencies are shown in fig. 8 and elaborated further in fig. 9, where the mapping of registers M and D to R and T is shown in detail for each task. The signal R0 that represents "no dependents" must be hardwired to "1", while T0 can be ignored (or at least wired-OR and then ignored). R0 and T0 are rightmost in fig. 9.
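One way to derive register values from a dependency tree is sketched below in Python. The sketch makes several assumptions that the text does not confirm: registers hold signal indices, index 0 is reserved for "no dependents", masters receive indices in breadth-first order, and the root task has no master register value (`None`). The function name and the tree used in the usage note are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def assign_registers(children: Dict[str, List[str]], root: str
                     ) -> Dict[str, Tuple[Optional[int], int]]:
    """Return {task: (M, D)}. Index 0 means 'no dependents'; each task
    that has dependents gets the next free signal index, in breadth-first
    order. The root's M is None because it has no master in this sketch."""
    regs: Dict[str, Tuple[Optional[int], int]] = {}
    next_index = 1
    queue: List[Tuple[str, Optional[int]]] = [(root, None)]
    while queue:
        task, m = queue.pop(0)
        kids = children.get(task, [])
        d = 0
        if kids:                 # this task is a master: allocate a signal
            d = next_index
            next_index += 1
        regs[task] = (m, d)
        queue.extend((k, d) for k in kids)
    return regs
```

For a hypothetical chain in which T1 creates T2, T2 creates T3 and T3 creates T4, the sketch allocates indices 1 to 3 for the three masters, so together with index 0 four signals are in use.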
Representing Procedures and Declare Blocks
A procedure or block that is a master is represented with a termination unit. Since a master does not necessarily have its own thread of execution it is necessary to keep track of the current master.
When a procedure master is created, its initial state is Running. It will remain there until end is executed, which changes its state to Terminated. The procedure is now completed and waiting only for dependents to terminate.
For a declare block there is an additional complexity: the select-or-terminate statement is allowed (accept is allowed but that does not make it more difficult). A declare block is created with an initial state Waiting and it will remain there until end is executed which changes its state to Terminated, or until it is terminated together with its master and all dependents. When executing a select-or-terminate inside a declare block it is the termination unit of the task that changes state to Waiting, the states of the (potentially) nested block's termination units are not affected.
Nested declare blocks are handled the same way as nested procedures, in the sense that all termination units are in the same state, and execution of a programmatic end only affects the last termination unit, causing it to become Terminated. The difference is that for nested blocks the state must be Waiting, and for nested procedures the state is Running (could be Waiting).
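The bookkeeping of the current master can be sketched as follows. This Python fragment is an illustration only: the class `MasterTracker`, its methods and the dictionary layout are hypothetical, and each entry stands for one termination unit allocated to a procedure or declare block.

```python
from enum import Enum


class State(Enum):
    RUNNING = 1
    WAITING = 2
    TERMINATED = 3


class MasterTracker:
    """Hypothetical per-task record of nested non-task masters."""

    def __init__(self):
        self.units = []       # each entry: {"kind", "state", "parent"}
        self.current = None   # index of the current master, if any

    def _enter(self, kind, state):
        self.units.append({"kind": kind, "state": state,
                           "parent": self.current})
        self.current = len(self.units) - 1

    def enter_procedure(self):
        self._enter("procedure", State.RUNNING)   # initial state Running

    def enter_block(self):
        self._enter("block", State.WAITING)       # initial state Waiting

    def end(self):
        """A programmatic end affects only the current (last) unit."""
        unit = self.units[self.current]
        unit["state"] = State.TERMINATED
        self.current = unit["parent"]
```

After `end`, the enclosing master becomes the current master again, while the terminated unit stays allocated until its dependents have terminated.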
Both of the following code fragments have nested masters that are not tasks. The dependencies are mapped on termination units in the same way, as shown in fig. 10 . 
Limitations
The number of master signals sets a hard limit on the number of masters in the system. The most "master consuming" configurations are:
- All tasks nested: task T1 creates T2, which creates T3, etc. The number of tasks is the same as the number of levels.
- A binary tree: task T1 creates two tasks, each of which creates two tasks, etc. If the nesting depth is n, the number of masters is $2^{n-1}$ and the number of tasks is $2^n$.
Luckily, any number of sibling tasks can be created without increasing the number of masters. It is reasonable to assume that the number of tasks in a realistic system is much larger than the number of masters.
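The effect of the tree shape on the number of masters can be checked with a few lines of Python. The sketch counts a task as a master only if it actually has dependents; the helper names `chain` and `star` are hypothetical.

```python
from typing import Dict, List


def count_masters(children: Dict[str, List[str]]) -> int:
    """A task needs a master signal only if it has at least one dependent."""
    return sum(1 for kids in children.values() if kids)


def chain(n: int) -> Dict[str, List[str]]:
    """T1 creates T2, which creates T3, and so on up to Tn."""
    return {f"T{i}": ([f"T{i + 1}"] if i < n else []) for i in range(1, n + 1)}


def star(n: int) -> Dict[str, List[str]]:
    """T1 creates n - 1 sibling tasks."""
    tree: Dict[str, List[str]] = {f"T{i}": [] for i in range(2, n + 1)}
    tree["T1"] = [f"T{i}" for i in range(2, n + 1)]
    return tree
```

With eight tasks, the fully nested chain uses seven masters, whereas eight tasks arranged as one parent with seven siblings use a single master.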
Figure legend. Block or subprogram that is a master. Master signals:
Rout = task depending directly on this master
Tin = task uses its master's signal for termination decision
Rin = task uses the wired-OR of its dependents for termination decision
Tout = task is a master and has dependents
Increased Utilization of the Master Signals
If the number of masters in a program is larger than the number of master signals, the task hierarchy will not fit on the hardware. By structuring the task hierarchy into subtrees, and the master signals in groups belonging to subtrees, it is possible to increase the utilization of the hardware. Tasks belonging to the same subtree are mapped to termination units physically close on the chip. The master signals are separated by (programmable) latches, so that the same master signal can be reused, mapped to different masters, in different subtrees without interference.
There is a simple mapping algorithm that allows any number of masters in an N-ary tree, i.e. a tree where a task may have any number of children. The tasks are enumerated in-order and then mapped directly to the termination units. The termination units are packed so that no units are "wasted". The limiting factor with this algorithm is the nesting depth.
The example in fig. 11 (although with fewer children than a binary tree) shows a tree where the tasks, for simplicity, have been named after their in-order number. The subtrees rooted at T2 and T6 must be separated, and likewise the subtrees at T4 and T7. If this tree is part of a larger system, the master signals must be cut at more places, e.g. above T1 and below T9.
An even simpler way to increase the number of masters is to cut the signals at a few fixed places, as in fig. 12. Within each subtree, termination units can be allocated arbitrarily.
Run-Time System and Compiler Impact
In our previous papers we have suggested a distributed system architecture with a centralized node managing all tasks in the system. This section describes the proposed Controller and the impact on the local run-time systems and on the compiler.
In the proposed distributed system there is a potentially large number of loosely-connected nodes, i.e. CPUs each with private memory, connected to a high-speed LAN. The nodes are general purpose computers that execute a distributed Ada program. One node, the Controller, is highly specialized for management of all Ada tasks in the system.
Each node has a smaller run-time system than a "normal" Ada program, since all task management is replaced with a protocol for communication with the Controller. Any call to the run-time system concerning tasking is transformed to a request to the Controller. Every request causes a reply that, among other things, contains information about which task the node shall execute. Messages may be sent at any time from the Controller to a node, for example when a delay expires. The Controller has all state information, effects all state changes, and schedules all tasks. For each node there must be a priority-sorted ready queue. There must be one expiration-sorted delay queue for the system. For each entry, there must be one FIFO queue. All operations such as Insert and Remove must be deterministic.
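The three kinds of queues can be illustrated in software as follows. This Python sketch only shows the required orderings; the class and method names are hypothetical, a larger priority value is assumed to mean more urgent, and the heap operations take logarithmic time, so the sketch does not by itself meet the determinism expected from a hardware Controller.

```python
import heapq
from collections import deque
from itertools import count


class ReadyQueue:
    """Priority-sorted ready queue for one node; ties are served
    in insertion order."""

    def __init__(self):
        self._heap = []
        self._seq = count()

    def insert(self, task_id, priority):
        # Negate the priority so the highest value is popped first.
        heapq.heappush(self._heap, (-priority, next(self._seq), task_id))

    def remove_highest(self):
        return heapq.heappop(self._heap)[2]


class DelayQueue:
    """Expiration-sorted delay queue for the whole system."""

    def __init__(self):
        self._heap = []
        self._seq = count()

    def insert(self, task_id, expires_at):
        heapq.heappush(self._heap, (expires_at, next(self._seq), task_id))

    def pop_expired(self, now):
        """Remove and return all tasks whose delay has expired at `now`."""
        expired = []
        while self._heap and self._heap[0][0] <= now:
            expired.append(heapq.heappop(self._heap)[2])
        return expired


def new_entry_queue():
    """FIFO queue of callers for one entry: append() and popleft()."""
    return deque()
```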
There is one state machine per task. Calls to the Controller are serialized but each call is served by parallel execution of all task state machines. The address of the state machine can be used as a global task identity.
The Controller must be able to handle, for example, creation and deletion of tasks, local and inter-node rendezvous, interrupts, and ACP, the Ada Controller Protocol.
Create task: Memory is allocated locally, to be used for the task stack as well as for some task state information, registers etc. A request for task creation is sent to the Controller, which allocates a TCB (which the termination unit is part of). The allocation algorithm should be simple: any TCB that is free. The actual choice of TCB identity must be propagated back to the requesting node.
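The "any TCB that is free" policy can be sketched with a free list. The Python class below is hypothetical and only illustrates that allocation and release can be constant-time operations; it says nothing about how the Controller stores its TCBs.

```python
class TcbPool:
    """Hypothetical pool of task control blocks with a free list."""

    def __init__(self, size):
        # Stored in reverse so that pop() hands out identity 0 first.
        self._free = list(range(size - 1, -1, -1))

    def allocate(self):
        """Return the identity of any free TCB, or None if all are in use."""
        return self._free.pop() if self._free else None

    def release(self, tcb_id):
        """Make a TCB available again after its task has terminated."""
        self._free.append(tcb_id)
```

The returned identity is what would be propagated back to the requesting node.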
Destroy task: All termination is originally programmatic; a task that completes or executes a select-or-terminate sends a request to the Controller. There will be one reply and possibly a large number of messages to other nodes that have tasks to terminate. Memory is deallocated locally.
Rendezvous: All entry calls and accepts cause a request to the Controller, which reschedules the tasks on the node (or nodes, if the called task is on another node). The tasks are removed from or inserted in the ready queue, delay queue and/or entry queue. If two nodes are involved, both a reply and a message are sent, to switch tasks. Any rendezvous parameters are transferred directly between the nodes. Both the parameter size and address must be known by the run-time system. For local calls only the address is necessary. Any exception during rendezvous is first transmitted to the Controller, which propagates it to the caller.
Interrupts: If handled by a procedure (by pragma interrupt_handler or similar), an interrupt does not cause any Controller activity, but any interrupt associated with an entry gives a request/reply to the Controller. The ATAC handles interrupts and reschedules without disturbing the CPU, but in a distributed system this is not practical.
ACP: A simple and efficient protocol for communication between the nodes and the Controller is required. The round trip time of a request/reply is a few µs on a fiber-optic network, and the Controller must be able to make any tasking decision during this short time. The Controller must implement both the protocol and the tasking in hardware to achieve this.
The impact on the compiler should be minimized, primarily by making changes in the run-time system instead. Since Ada is a standard, the run-time system interfaces of different compilers are more uniform than for non-standardized languages.
In addition to the task id, the current master must be known to the run-time system. Some operations concern the task itself, such as execution of a select statement, while other operations concern the current master, such as execution of an end statement.
Summary
Our hardware algorithm makes possible the termination of groups of Ada tasks in a deterministic manner (in fixed time), which is important not only when a task completes and terminates, but also when a task makes an entry call to another task that is waiting at a select-or-terminate. In a hard real-time system, such language constructs (that are non-deterministic) are excluded.
Compared to software implementations with potentially long lists of tasks that must be traversed sequentially (task dependencies as well as nested masters are handled), hardware implementations in general offer high speed as well as the more important determinism.
The benefit of expressing an algorithm in VHDL is that VHDL code can be tested in simulators as well as synthesized and executed on a Field Programmable Gate Array, FPGA. The FPGA can easily be reprogrammed, and thus it is relatively cheap to experiment with different hardware-implemented algorithms. The synthesis step of the termination unit will be performed as soon as possible; however, the correct version of the software for synthesis and execution has not yet been delivered. The next step is to implement the Controller in VHDL, simulate, synthesize, and execute it on an FPGA.
