In this paper, a Clustering Heuristic Scheduling Algorithm (CHSA) is presented. The CHSA consists of a Processor Selection Algorithm (PSA), a Time-slot Selection Algorithm (TSA) and a System Utilization Controller (SUC). The PSA assigns parameters to each processor according to the nature of the task that is to be scheduled. It then prioritizes the processors capable of processing the new task according to each processor's status parameters. The PSA returns a list of processors, in order of preference, which it passes to the TSA. The TSA then attempts to assign the task to a time-slot on a processor, beginning with the processor assigned the highest priority. For that purpose the TSA creates a simplified Gantt chart in the form of a link list from the schedule it maintains for each processor. The TSA then searches the link list for different patterns in their order of priority. The SUC limits the number of tasks allowed to be scheduled on the system in order to keep the ratio between code fetch requests and data fetch requests within acceptable limits. The CHSA was successfully tested with a simulator of a multiprocessor system.
Introduction
This paper focuses on a heuristic scheduling algorithm that attempts to schedule i periodic tasks with soft deadlines on a multi-processor system. The algorithm has been designed for a linear array of p processing elements. Each processor has a program and data memory.
All tasks have a time period T. All tasks are non-resident. Non-resident tasks are tasks whose code has to be loaded from an external source. The system loads data and code using a memory interface. The memory interface is the bandwidth bottleneck of the system. It is assumed that code fetch times are significantly higher than data fetch times. Therefore, the scheduler attempts to reduce repeated code fetches caused by scheduling dissimilar tasks on the same processor.
A processor's program memory is big enough to store code of only one task at a time. The time period P is [...]
Processor Selection Algorithm
The CHSA has to make two decisions when scheduling a new channel. The first decision selects the processor on which the new channel is scheduled. This is done by the Processor Selection Algorithm (PSA). The second decision selects the time slot(s) in which the new task is scheduled and processed on the selected processor.
Processors with equal priorities are arranged in descending or ascending order of their respective processor ID numbers.
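The ordering rule for the processor list can be illustrated with a small C sketch. The ProcEntry record, the convention that a larger priority value means a more preferred processor, and the choice of ascending ID for ties are all assumptions made for illustration; the text only states that ties are resolved by processor ID.

```c
#include <stdlib.h>

/* Hypothetical record: the paper does not give the PSA's data layout. */
typedef struct {
    int id;        /* processor ID number */
    int priority;  /* larger value = more preferred (assumed convention) */
} ProcEntry;

/* qsort comparator: descending priority, ties broken by ascending ID. */
static int cmp_proc(const void *a, const void *b)
{
    const ProcEntry *pa = a, *pb = b;
    if (pa->priority != pb->priority)
        return (pa->priority < pb->priority) - (pa->priority > pb->priority);
    return (pa->id > pb->id) - (pa->id < pb->id);
}

/* Sort the candidate list in place into order of preference. */
void sort_processors(ProcEntry *list, size_t n)
{
    qsort(list, n, sizeof list[0], cmp_proc);
}
```

Breaking ties on the ID keeps the ordering deterministic, so the TSA always receives the same list for the same processor state.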
Processor Prioritization

Time-slot Selection Algorithm
Once the processors are prioritized and sorted, all that remains is the selection of the time slot. Similar tasks are clustered together on the same processor in consecutive time-slots. An upper limit is defined which limits the bandwidth requirements on the memory interface in a single time slot, over all processors. This way the scheduler avoids scheduling too many channels in one time slot, which could create a bandwidth bottleneck at the external memory interface and delay the processing of tasks.
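The per-slot bandwidth limit can be sketched as a simple admission check. The SlotBandwidth structure, the abstract load units and the function names below are hypothetical; only the idea of one limit per time slot, summed over all processors, comes from the text.

```c
#include <stdbool.h>

#define NUM_SLOTS 40  /* S = 40 time-slots per period, as in the reported implementation */

/* Hypothetical bookkeeping: memory-interface load already committed in each
 * time slot, summed over all processors (units are an assumption). */
typedef struct {
    unsigned load[NUM_SLOTS];
    unsigned limit;   /* upper limit per time slot */
} SlotBandwidth;

/* True if a task needing `demand` units in each of `count` slots starting
 * at `first` stays within the limit in every one of those slots. */
bool bandwidth_ok(const SlotBandwidth *bw, int first, int count, unsigned demand)
{
    for (int i = 0; i < count; i++) {
        int s = (first + i) % NUM_SLOTS;   /* the schedule is periodic */
        if (bw->load[s] + demand > bw->limit)
            return false;
    }
    return true;
}

/* Commit the demand once the slots have been chosen. */
void bandwidth_reserve(SlotBandwidth *bw, int first, int count, unsigned demand)
{
    for (int i = 0; i < count; i++)
        bw->load[(first + i) % NUM_SLOTS] += demand;
}
```

A time-slot search would call bandwidth_ok before accepting a candidate slot and bandwidth_reserve after the task is placed.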
Link-list Generation
The first step in the Time-slot Selection Algorithm is the creation of a link-list. This link-list gives a simplified picture of the schedule of channels on a particular processor. The link-list is a circular doubly linked list.
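A minimal sketch of such a list is given here, assuming the per-processor schedule is held as an array with one task-type entry per time slot and that each link represents a run of consecutive slots with the same content. The Link layout, the IDLE marker and the function names are illustrative assumptions, not the authors' data structures.

```c
#include <stdlib.h>

#define IDLE (-1)   /* assumed marker for an unoccupied time slot */

/* One link = a run of consecutive slots holding the same task type (or idle). */
typedef struct Link {
    int task_type;      /* IDLE or a task-type identifier */
    int start;          /* first slot of the run */
    int duration;       /* run length in slots */
    struct Link *prev, *next;
} Link;

/* Build a circular doubly linked list from a per-slot schedule array.
 * Returns the link starting at slot 0, or NULL if n <= 0 or allocation fails.
 * Runs that wrap from the last slot to slot 0 are not merged in this sketch. */
Link *build_link_list(const int *slots, int n)
{
    Link *head = NULL, *tail = NULL;
    for (int i = 0; i < n; i++) {
        if (tail && tail->task_type == slots[i]) {
            tail->duration++;            /* extend the current run */
            continue;
        }
        Link *l = malloc(sizeof *l);
        if (!l) {                        /* unwind the still-linear chain */
            while (head) { Link *nx = head->next; free(head); head = nx; }
            return NULL;
        }
        l->task_type = slots[i];
        l->start = i;
        l->duration = 1;
        l->prev = tail;
        l->next = NULL;
        if (tail) tail->next = l; else head = l;
        tail = l;
    }
    if (head) {                          /* close the ring */
        tail->next = head;
        head->prev = tail;
    }
    return head;
}

/* Release every link of a ring built by build_link_list. */
void free_link_list(Link *head)
{
    if (!head) return;
    head->prev->next = NULL;             /* break the ring, then walk it */
    while (head) { Link *nx = head->next; free(head); head = nx; }
}
```

The circular form reflects the periodic schedule: the link after the last run of the period is the first run of the next period.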
Time-slot search
When scheduling a new task z of type X on a particular processor, we can encounter one of two scenarios.
1. Other tasks of type X are already scheduled on the processor.
2. Tasks of type X are not scheduled on the processor.
Task X Already In Use On Selected Processor(s)
Once the scheduler finds a valid sequence which contains at least one Idle time link for which
Duration(Idle time link) >= Duration(Task X)
it counts the total number of links in the sequence and the number of Idle time links. The duration condition ensures that the sequence contains enough Idle time to accommodate at least one more channel running task X.
If this sequence is the first sequence found on this particular Processor, the scheduler just stores it and continues its search for another sequence.
If the number of Idle time links also happens to be equal, the scheduler goes on to compare [...]
Let us assume that we end up with two sequences,
In order to select either one of the two sequences the scheduler calculates, 
<-> Task X,
If the scheduler is unable to find a single instance of the required sequence it skips to step 3. 
4. The scheduler returns to the best processor returned by the Processor Selection Algorithm, searches it for its biggest available time slot and schedules the channel in its center.
The state diagram in Figure 7 shows the sequence in which the scheduler searches for an appropriate time slot using steps 1-5 described above. 
Task X Not Already In Use On Selected Processor(s)
If none of the processors returned by the Processor Selection Algorithm already contains channels of task type X, the scheduler follows the method outlined in this section to schedule a new task z.
It finds the Processor with the biggest Idle time slot from among the processors returned by the Processor Selection Algorithm and schedules task z in the center of the biggest available Idle time slot.
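The placement rule above can be sketched as follows, assuming the schedule is a row-major array with one entry per processor and time slot and that idle slots carry an IDLE marker. The names, the array layout and the tie-breaking rule (the earlier candidate in the PSA list wins) are assumptions for illustration, and wrap-around of idle runs across the period boundary is ignored for brevity.

```c
#define IDLE (-1)   /* assumed marker for an unoccupied time slot */

typedef struct { int proc; int start; int length; } IdleRun;

/* Longest idle run on one processor's slot array (no wrap-around). */
static IdleRun longest_idle(const int *slots, int n, int proc)
{
    IdleRun best = { proc, -1, 0 };
    int run_start = 0, run_len = 0;
    for (int i = 0; i < n; i++) {
        if (slots[i] == IDLE) {
            if (run_len == 0) run_start = i;
            if (++run_len > best.length) {
                best.start = run_start;
                best.length = run_len;
            }
        } else {
            run_len = 0;
        }
    }
    return best;
}

/* Pick the candidate processor with the biggest idle run and return the first
 * slot that centres a task of `dur` slots inside it; -1 if nothing fits.
 * sched is row-major: sched[p * n + s]; cand lists processor indices in
 * order of preference. On success *proc_out receives the chosen processor. */
int place_centered(const int *sched, int n, const int *cand, int ncand,
                   int dur, int *proc_out)
{
    IdleRun best = { -1, -1, 0 };
    for (int k = 0; k < ncand; k++) {
        IdleRun r = longest_idle(sched + cand[k] * n, n, cand[k]);
        if (r.length > best.length) best = r;   /* earlier candidate wins ties */
    }
    if (dur <= 0 || best.length < dur) return -1;
    *proc_out = best.proc;
    return best.start + (best.length - dur) / 2;
}
```

Centering the task leaves idle time on both sides of it, so later tasks of the same type can be placed in adjacent time-slots.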
System Utilization Controller
It is usually desirable that processor utilization of a system be kept as high as possible. However, when the code fetch to data fetch ratio exceeds a certain limit it becomes necessary to limit the System Utilization (SU).
System Utilization
The System Utilization (SU) is calculated by,
The value of SU is updated every time a new task is scheduled or removed.
System Utilization Limit
The System Utilization Limit (SUL) is the maximum allowable value of the SU. In order to keep the System Entropy (SE) within limits, the SUL is adjusted adaptively by the System Utilization Controller (SUC).
System Entropy
The scheduler uses a metric called System Entropy (SE), which is an estimated measure of the system's entropy. The higher the SE, the greater the extent to which different tasks share the same processor and the higher the number of code fetch requests. SE is a function of the average number of different task types per processor:
SE = f(average number of different task types per processor)
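The argument of f can be computed directly from the schedule. The sketch below assumes a row-major schedule array in which negative entries denote idle slots and task types are small non-negative integers; the function f itself is not specified in the text and is deliberately left out.

```c
#define MAX_TYPES 64   /* assumed bound on task-type identifiers */

/* Number of distinct task types in one processor's slot array.
 * Negative entries are treated as idle slots (an assumption). */
static int distinct_types(const int *slots, int n)
{
    unsigned char seen[MAX_TYPES] = {0};
    int count = 0;
    for (int i = 0; i < n; i++) {
        int t = slots[i];
        if (t >= 0 && t < MAX_TYPES && !seen[t]) {
            seen[t] = 1;
            count++;
        }
    }
    return count;
}

/* Average number of different task types per processor: the argument of f.
 * sched is row-major: sched[p * nslots + s]. */
double avg_types_per_processor(const int *sched, int nproc, int nslots)
{
    if (nproc <= 0) return 0.0;
    int total = 0;
    for (int p = 0; p < nproc; p++)
        total += distinct_types(sched + p * nslots, nslots);
    return (double)total / nproc;
}
```

A value of 1 means that every busy processor runs a single task type and no repeated code fetches are needed; larger values indicate that dissimilar tasks share processors.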
Lower System Utilization Threshold
The Lower System Utilization Threshold (LSUT) is the SU value at which the SUC holds down the SUL for a certain period of time, until the SE drops below its limit.
System Utilization Controller State Machine
The SUC is a state machine which varies the SUL according to the SE. It updates its current state on every interrupt on which the scheduler is invoked. The steps followed by the SUC state machine are described below. The graphical representation of the state machine is given in Figure 8.
Implementation Results
The CHSA was implemented in C and profiled on an Intel Pentium-III 733 MHz processor. The implementation supported 2048 tasks on P = 12 processors in a time period of T = 10 msec divided into S = 40 time-slots of 0.25 msec each. New task scheduling requests were generated at the rate of 1 every 10 msec, or 100 new tasks per second.
Simulation time: 25,040 * 10 = 250,400 msec
Real time: 13,774.457 msec
Percentage of time spent in functions that will not be used in the real scheduler, i.e. file accesses and functions related to call duration counter updates: 94.1%
Percentage of time spent in functions that will be part of the final implementation of the scheduler: [...]
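The profiling figures can be turned into a rough estimate of the scheduler's own cost. The sketch below assumes that the two percentage categories together account for 100% of the measured real time, which the text does not state explicitly; under that assumption the scheduler functions take roughly 813 msec in total, or about 0.032 msec per 10 msec period on the profiled processor.

```c
/* Figures quoted in the profiling run. */
#define PERIODS        25040.0     /* simulated periods of 10 msec each */
#define PERIOD_MS      10.0
#define REAL_TIME_MS   13774.457   /* measured real time */
#define OVERHEAD_PCT   94.1        /* functions not used in the real scheduler */

/* Time spent in scheduler functions, ASSUMING the two categories sum to 100%. */
double scheduler_time_ms(void)
{
    return REAL_TIME_MS * (100.0 - OVERHEAD_PCT) / 100.0;
}

/* Average scheduler time per 10 msec period, in msec. */
double scheduler_ms_per_period(void)
{
    return scheduler_time_ms() / PERIODS;
}

/* Fraction of each period consumed by the scheduler on the profiled CPU. */
double scheduler_load(void)
{
    return scheduler_ms_per_period() / PERIOD_MS;
}
```

Under the stated assumption the scheduler would occupy well under 1% of each period on the profiled machine.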
Conclusions
From the profiling results of the CHSA code it can be seen that this algorithm can be implemented for use in real-time systems. Potential applications include multiprocessor DSP systems, e.g. radars and voice and video encoders/decoders, in which samples of data from multiple channels are processed periodically.
