ABSTRACT
INTRODUCTION
Support for digital audio and video as I/O media is an important direction of computer systems research. We call audio and video continuous media (CM) because they are perceived as continuous, in contrast with discrete media such as graphics. There are various ways to incorporate CM in computer systems; in the integrated approach, CM data (digital audio and compressed digital video) is handled by user-level programs on general-purpose operating systems such as UNIX or Mach.
On existing general-purpose OSs, integrated CM applications can suffer from poor performance. ACME [4] is one such application: a user-level I/O server that provides shared, network-transparent access to devices such as video cameras, speakers, and microphones (see Figure 1). We have implemented a prototype of ACME for a Sun SPARCstation running SunOS 4.1. It suffers from timing errors and lost data when there is concurrent system activity, even though the hardware is easily able to handle the data rates (e.g., 64 Kb/sec audio data). The server also cannot supply the low delay needed for a telephone conversation client.
These problems are partly due to the overhead of the user/kernel interaction mechanisms by which user-level programs invoke system functions such as CPU scheduling and I/O. This overhead includes user/kernel domain switches and mapping switches between different user virtual address spaces. For example, the UNIX asynchronous I/O mechanism requires up to ten domain switches and two mapping switches to read a block of data. The expense of these operations can be amortized by hysteresis and increased granularity (techniques used in mechanisms such as pipes). For CM applications, however, these techniques may increase delay excessively.

Figure 1: Audio playback is a basic integrated CM application. The client reads CM data from a file and sends it to the ACME server (bold line). The client also provides a graphical interface for making selections and controlling the playback parameters.
With the goal of better supporting integrated CM applications, we have designed two OS mechanisms for scheduling and IPC:

• Split-level scheduling. In this approach each user virtual address space (VAS) contains multiple lightweight processes (LWPs). The scheduler is partitioned into user-level and kernel-level parts, which communicate via shared memory. The information in shared memory is used to correctly prioritize LWPs in different VASs, and to avoid domain and mapping switches where possible. Split-level scheduling can be used with many scheduling policies; we discuss its use for deadline/workahead scheduling, a real-time policy designed for CM.

• Memory-mapped streams. A memory-mapped stream (MMS) is a shared-memory FIFO used for communicating CM data between user and kernel VASs. Once the MMS has been set up, no explicit kernel requests are needed to transfer data, and only a minimal number of domain switches is needed for producer/consumer synchronization and I/O initiation.
In the next section we explain the process structure of the ACME server and the deadline/workahead scheduling policy in more detail. Sections 3 and 4 describe the new mechanisms. Section 5 gives some performance estimates, and Section 6 discusses related work.
PROCESS STRUCTURE AND SCHEDULING FOR CM APPLICATIONS
To motivate subsequent sections, we sketch a typical CM application (the ACME I/O server), and describe the deadline/workahead CPU scheduling policy.
2.1. The ACME Continuous Media I/O Server

ACME (Abstractions for Continuous Media) [4] supports applications such as audio/video conferencing, editing, and browsing. ACME allows its clients to create logical devices, associate them with physical I/O devices (video display or camera, audio speaker or microphone), and do I/O of CM data over CM connections (network connections carrying CM data). The data stream on a given CM connection may be multiplexed among different logical devices. ACME provides mechanisms for synchronizing different streams.
The ACME server performs multiple concurrent activities, and it is convenient to structure it as a set of concurrent processes. Our prototype uses the following processes (see Figure 2):

• For each CM connection, a network I/O process transfers data between an internal buffer and the network. It may do software processing (e.g., volume scaling for audio streams).

• For each CM I/O device there is a device I/O process. For an output device, this process merges the data from the logical devices mapped to it and writes the resulting data to the device.

• Event-handling processes handle non-real-time events such as commands from the window server and requests for CM connection establishment.

Figure 2: A CM application such as the ACME server consists of multiple processes sharing a single address space. Some of these processes handle streams of CM data, while others handle discrete events.
The current implementation of ACME runs on the Sun SPARCstation. It is written in C++ and uses a preemptive lightweight process library. I/O is done using UNIX asynchronous I/O. The server handles telephone-quality (64 Kbps) audio I/O and video output, both compressed and uncompressed.
Deadline/Workahead Scheduling
The Deadline/Workahead Scheduling (DWS) CPU scheduling policy is designed for integrated CM [2]. In the DWS model, a process that handles CM data is called a real-time process. There are two classes of non-real-time processes: interactive (for which fast response time is important) and background.

A real-time process handles a sequence of messages, each with a logical arrival time l(m), either derived from a timestamp in the data or implicit from its position in the stream. Each real-time process has a fixed logical delay bound; the processing of each message should be finished within this amount after its logical arrival. At a given time t, a real-time process is called critical if it has an unprocessed message m with l(m) ≤ t (i.e., m's logical arrival time has passed).
Real-time processes that have pending messages but are not critical are called workahead processes.
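The definitions above can be illustrated with a minimal sketch. The class name, the field names, and the "idle" label for a process with no pending messages are assumptions made for illustration; only the critical/workahead test and the deadline rule (logical arrival time of the first unprocessed message plus the delay bound) come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RealTimeProcess:
    """Hypothetical model of a DWS real-time process (names are illustrative)."""
    name: str
    delay_bound: float  # fixed logical delay bound, in seconds
    # Logical arrival times l(m) of unprocessed messages, in stream order.
    pending: List[float] = field(default_factory=list)

    def state(self, now: float) -> str:
        """Classify the process at time `now`."""
        if not self.pending:
            return "idle"  # assumed label: nothing to process
        # Critical if some unprocessed message m has l(m) <= now.
        if min(self.pending) <= now:
            return "critical"
        return "workahead"

    def deadline(self) -> Optional[float]:
        """Logical arrival of the first unprocessed message plus the delay bound."""
        if not self.pending:
            return None
        return self.pending[0] + self.delay_bound
```

For example, a process with a 50 ms delay bound and messages arriving logically at t = 1.00 s and t = 1.02 s is a workahead process until t = 1.00 s and critical from then on.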
The DWS policy is as follows (see Figure 3). Critical processes have priority over all others, and are preemptively scheduled earliest deadline first (the deadline of a process is the logical arrival time of its first unprocessed message plus its delay bound). Interactive processes have priority over workahead processes, but are preempted when those processes become critical. Non-real-time processes are scheduled according to an unspecified policy, such as the UNIX time-slicing policy. The scheduling policy for workahead processes is also unspecified, and may be chosen to minimize context switching.

... (see Figure 4). The user-level scheduler (ULS) checks whether its VAS still contains the globally highest-priority LWP; this is done by examining an area of memory shared with the kernel. If so, the LWP context switch is done without kernel intervention. Otherwise, a kernel trap is done, and the kernel-level scheduler (KLS) decides which VAS should now execute, again based on information in shared memory.

1 The technique is applicable to multiprocessor scheduling as well. For brevity we describe only the uniprocessor case.
While split-level scheduling can be used with many scheduling policies, we focus on its implementation for the deadline/workahead (DWS) policy described in the previous section. We also describe a related mechanism for efficient mutual exclusion between LWPs. For simplicity, we consider only the scheduling of real-time processes. It is straightforward to handle interactive and background processes as well (a VAS could contain a mixture of process types).

An LWP calls this to suspend its execution until the given time; at this point it becomes runnable and C_P is set to the current time. This may be used by processes that do time-based output with no device synchronization (e.g., slow video) or for rate-based flow control.
IO_wait(DESCRIPTOR iodesc, TIME critical_time);

An LWP calls this to wait for I/O to become possible on the given I/O descriptor, which represents a file, socket, I/O device or MMS (Section 4). When data arrives on the descriptor, the process becomes runnable and its C_P is set to the given value.
The calls mask_LWP_preemption() and unmask_LWP_preemption() bracket "critical sections" within which the calling LWP cannot be preempted by an LWP in the same VAS.
3.2. Implementation of the Split-Level DWS Scheduler

In this section we first describe the control and shared memory interfaces between the ULS and the KLS (see Figure 5). We then describe the implementation of each level. We defer discussing synchronization issues (e.g., mutual exclusion on shared data structures) until Section 3.3.

Figure: The kernel-level scheduler (KLS) decides which user VAS should execute, and each VAS has a user-level scheduler (ULS) that manages the LWPs in that VAS. In this example, the KLS chooses VAS S2 to run because it has the globally earliest deadline. The ULS in that VAS executes P5, which has this deadline. User/kernel interactions can often be avoided: in this example, if P5 yields then the context switch to P6 (the next earliest deadline) can be done without a kernel call.
User/Kernel Control Interface
The control interface between a ULS and the KLS consists of system calls and user-interrupts.

The system call mechanism is the same as in UNIX-type systems: a trap instruction and return. The split-level scheduler needs one new system call: yield() yields the processor to another VAS.

User-interrupts are like UNIX signals except that the handler does not end with a system call to reset the signal mask (hence there is one domain switch rather than three). Each ULS registers the addresses of its handlers during initialization. Three types of user-interrupts are used: INT_TIMER is delivered when a timer elapses, INT_IO_READY is delivered when I/O becomes possible on an I/O descriptor, and INT_RESUME is delivered when a user VAS resumes after being preempted.
User/Kernel Shared Memory Interface
The ULS for each VAS A shares a region of physical memory with the kernel. This region consists of two parts: the usched area and the ksched area (see Figure 5). The ksched area includes, for each I/O descriptor, a ready_for_IO flag that indicates that data has arrived on that descriptor.
We use the following additional notation:
P_A: the highest-priority runnable LWP in A. If D_A is finite then P_A is the earliest-deadline runnable critical LWP. Otherwise, P_A is set to an arbitrary runnable LWP (the choice of P_A in this case depends on the policy for workahead processes, which we do not specify).

P*: the globally highest-priority LWP.

A*: the VAS containing P*.
ULS Implementation
The ULS of VAS A is responsible for scheduling LWPs in A. If the ULS detects from its ksched area that A ≠ A*, it calls yield(). Similarly, if the KLS detects from A's usched area that A ≠ A*, it preempts A.
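The test "A ≠ A*" can be sketched as a comparison of two deadlines held in shared memory. This is a minimal sketch under stated assumptions: the field names d_own and d_others are hypothetical, the usched area is assumed to hold the earliest deadline within the VAS and the ksched area the earliest deadline among the other VASs, and the tie-breaking rule (an equal deadline does not force a yield) is also an assumption.

```python
import math
from dataclasses import dataclass


@dataclass
class SchedAreas:
    """Hypothetical view of the shared region for one VAS."""
    d_own: float = math.inf     # usched area: earliest deadline inside this VAS
    d_others: float = math.inf  # ksched area: earliest deadline in other VASs


def holds_highest_priority(shared: SchedAreas) -> bool:
    """True if this VAS still contains the globally earliest deadline.

    Assumption: on a tie the running VAS keeps the processor.
    """
    return shared.d_own <= shared.d_others


def uls_action(shared: SchedAreas) -> str:
    """Decide how the ULS proceeds at an LWP context switch."""
    if holds_highest_priority(shared):
        return "user_level_switch"  # no kernel involvement needed
    return "yield"                  # trap to the kernel via yield()
```

The point of the design is that the common case (the first branch) reads only shared memory and never crosses the user/kernel boundary.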
The ULS may need to preempt the currently running LWP when the critical time of a sleeping LWP is reached or a non-running workahead LWP becomes critical. This requires an INT_TIMER user-interrupt from the kernel. To reduce the number of INT_TIMER user-interrupt deliveries, the following policy is used (see Figure 6):

Let X be the set of sleeping and workahead LWPs P in A such that D_P < D_A. Let T_critical = min(C_P : P ∈ X). Then it is sufficient for the ULS to maintain a timer for T_critical.
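The timer rule can be written directly as a filter followed by a minimum. In this sketch the Lwp record and the function name are hypothetical; returning infinity when no LWP qualifies (meaning no timer is needed) is an assumption.

```python
import math
from typing import Iterable, NamedTuple


class Lwp(NamedTuple):
    """Hypothetical record for a sleeping or workahead LWP."""
    name: str
    critical_time: float  # C_P: when the LWP wakes up or becomes critical
    deadline: float       # D_P


def timer_target(waiting: Iterable[Lwp], d_a: float) -> float:
    """Earliest critical time among LWPs that could preempt the running one.

    Only LWPs with D_P < D_A matter: an LWP with a later deadline cannot
    preempt the current LWP even after it becomes critical.
    """
    candidates = [p.critical_time for p in waiting if p.deadline < d_a]
    return min(candidates, default=math.inf)
```

With illustrative numbers in the spirit of the Figure 6 example, where a running LWP has deadline 50, a workahead LWP P1 has (C, D) = (20, 60) and P2 has (30, 40), the call timer_target([Lwp("P1", 20, 60), Lwp("P2", 30, 40)], 50) selects 30: a timer is needed for P2 but not for P1.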
In addition to the data in the usched area, the ULS maintains queues of sleeping, critical and workahead LWPs.

Figure 6: At a given time T_now, the ULS for a VAS A must have a pending INT_TIMER user-interrupt for the earliest critical time of a sleeping or workahead process P ∈ A such that D_P < D_A. In this example, P3 is critical and P1 and P2 are workahead. If P3 is still running when C_P2 arrives, P2 becomes critical and must preempt P3. On the other hand, P1 cannot preempt P3 because its deadline is greater. Therefore a timer is needed for C2 but not C1.

... respectively), then does the following (see Figure 7):

(1) For each LWP P in the workahead and sleep queues such that C_P < T_now, insert P in the critical queue. For each LWP P sleeping on an I/O descriptor for which the ready_for_IO flag is set, insert P into the workahead or critical queues as appropriate.

(2) Update D_A in the usched area.

An INT_RESUME user-interrupt is delivered to a VAS when it resumes execution after having been preempted. Between when the VAS was preempted and T_now, an indeterminate amount of time has elapsed. The same is true when the VAS returns from a yield() system call. In both cases, the ULS performs steps 1-5 above to update its state.
KLS Implementation
The KLS is responsible for updating D_Ā in the ksched area of the currently executing VAS A. If in doing so it detects that A ≠ A*, it preempts A and switches to A*.

Changes to D_Ā can occur when a sleeping LWP wakes up or a workahead LWP becomes critical. Timers have to be set for these moments. KLS timer management is analogous to that of a ULS. The KLS maintains a timer for the earliest C_P such that D_P < D_Ā; this is computed from the tables in the usched areas of all VASs not currently executing. If, when the timer expires, the current VAS A is no longer A*, the KLS preempts A and switches to the new A*. Additionally, the kernel clock interrupt handler polls T_next in the usched area of A*, delivering an INT_TIMER if necessary.
The yield() system call determines A*. It then computes the D_Ā value for A*, writes it to A*'s ksched area, and updates the pending timer if necessary. Finally, it switches to A*, either by returning from an earlier yield() system call, or by delivering an INT_RESUME user-interrupt.

3.3. Split-Level Synchronization

ULS/KLS shared memory can be concurrently accessed by multiple entities (LWPs, user-interrupt handlers and kernel interrupt handlers). We require mechanisms to synchronize access to this shared memory. By analyzing how specific shared data structures are accessed by different entities, we can obtain a set of specialized synchronization mechanisms that minimize user/kernel interactions.
First, ULS data structures such as the critical, workahead and sleeping queues are read and written by LWPs and user-interrupt handlers. To synchronize access to such structures it suffices to inhibit (or "mask") user-interrupts (since preemptive context switches within the VAS take place only in user-interrupt handlers, this inhibits LWP preemption as well). User-interrupt masking can also be used to implement mask_LWP_preemption() and unmask_LWP_preemption(), which provide mutual exclusion for client-defined data structures.

A technique called virtual user-interrupt masking provides user-interrupt masking without user/kernel interactions in the normal case. This technique uses a mask level in the usched area and a request flag in the ksched area. The request flag is a bitmap with one flag per user-interrupt type. To mask user-interrupts, the ULS increments the mask level. Whenever the kernel wants to deliver a user-interrupt and finds the mask level nonzero, it sets the corresponding bit in the request flag. When the ULS unmasks user-interrupts it decrements the mask level. If this returns to zero and the request flag is set, the ULS calls the appropriate handler to service the interrupt.
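The protocol can be illustrated with a single-threaded simulation. This is only a sketch: the class and method names are hypothetical, the kernel side is modeled as an ordinary method call, and the real races between the kernel and the ULS on the shared words are not modeled.

```python
from typing import Callable, Dict


class VirtualInterruptMask:
    """Simulated virtual user-interrupt masking (illustrative names)."""

    def __init__(self, handlers: Dict[int, Callable[[], None]]):
        self.handlers = handlers  # bit index of interrupt type -> handler
        self.mask_level = 0       # usched area: written by the ULS
        self.request_flag = 0     # ksched area: bitmap set by the kernel

    def mask(self) -> None:
        """ULS side: masking is just an increment, no system call."""
        self.mask_level += 1

    def kernel_deliver(self, irq: int) -> None:
        """Kernel side: deliver now, or record the request if masked."""
        if self.mask_level > 0:
            self.request_flag |= 1 << irq
        else:
            self.handlers[irq]()

    def unmask(self) -> None:
        """ULS side: on the outermost unmask, service deferred interrupts."""
        self.mask_level -= 1
        if self.mask_level == 0 and self.request_flag:
            pending, self.request_flag = self.request_flag, 0
            for irq in sorted(self.handlers):
                if pending & (1 << irq):
                    self.handlers[irq]()
```

Because the mask is a counter rather than a flag, masked regions nest: a deferred interrupt is serviced only when the outermost region ends.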
Second, the tables of sleeping and workahead LWPs in the usched area are written by the ULS and read by the KLS. These tables are read by the KLS only while the VAS is preempted or has yielded. If a VAS is preempted while the ULS is writing the tables, the KLS sees inconsistent data. To prevent this, we need a VAS preemption masking mechanism.

"Virtual" masking can also be used to implement this mechanism, using a preemption mask flag in the usched area and a preemption request flag in the ksched area. While the mask is nonzero, the VAS cannot be preempted by another VAS. Upon unmasking preemption, if the ULS finds the request flag set, it calls yield().
Third, several items in the ksched area (D_Ā, T_now, ready_for_IO) are written by kernel interrupt handlers (clock, I/O) and read by the ULS. It is possible to do virtual masking of kernel interrupts, but this has the drawback of requiring a system call to service interrupts that occur while kernel interrupts are masked.

By exploiting specific properties of these items, simpler solutions are possible. For example, if reading or writing a single word is atomic, then a data structure consisting of a single word (e.g., the ready_for_IO flag in an I/O descriptor) requires no synchronization mechanism. For multi-word quantities such as D_Ā and T_now that are monotonically increasing or decreasing, a consistent value can be obtained by repeatedly reading the quantity until two successive reads result in the same value.
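The repeated-read technique for multi-word monotonic quantities can be sketched as follows. The function name and the callback interface are assumptions; the reader is passed as a function so that the sketch can be exercised with a simulated sequence of torn reads.

```python
from typing import Callable, Tuple


def read_consistent(read_words: Callable[[], Tuple[int, ...]]) -> Tuple[int, ...]:
    """Re-read a multi-word quantity until two successive reads agree.

    `read_words` is a hypothetical accessor that reads the words one by one
    and may therefore observe a value in the middle of an update. The text
    applies this only to quantities that change monotonically.
    """
    previous = read_words()
    while True:
        current = read_words()
        if current == previous:
            return current
        previous = current
```

The loop needs no lock and no system call; its cost is at least two reads, plus one more for each update that happens to overlap a read.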
Finally, several items in the usched area (e.g., D_A, T_next, runnable and waiting_for_IO) are read by kernel interrupt handlers and written by the ULS. Again, we can exploit specific properties of these items to achieve simple synchronization mechanisms. Single-word flags require no synchronization if word access is atomic. For multi-word quantities (D_A and T_next) the ULS masks preemption during access. If a kernel interrupt handler finds that preemption is masked, it assumes that a multi-word quantity is inconsistent and takes appropriate action. For instance, if T_next is inconsistent, the clock interrupt handler delays checking for INT_TIMER delivery until the next clock tick; if D_A is inconsistent, the preemption request flag is set.
Discussion
Split-level scheduling introduces new protection problems: a malicious or incorrect program may keep VAS preemption masked indefinitely, or it may execute indefinitely without changing its deadline. Either of these actions would starve all other VASs. A "watchdog timer" can be used to detect such conditions, and to kill or demote the offending process.

Deadline/workahead scheduling has both "hard" and "soft" variants: the distinction is whether or not processes reserve CPU capacity in advance. In the hard variant, each new LWP specifies its workload (message rate and CPU time per message). The KLS conducts a schedulability test to determine whether the workload can be accommodated and if so, with what logical delay bound. This test involves a simulation under worst-case load, and is described in [2]. In the soft variant, no such screening is done, and it is possible for the system to fall behind schedule.

SLS is not restricted to deadline/workahead scheduling; it can be adapted to other policies, such as static priorities or usage-based timesharing policies. The policy dictates the contents of ULS/KLS shared memory; in general, the usched area contains the highest priority among runnable LWPs in the address space, while the ksched area contains the highest priority among runnable LWPs in other address spaces.
MEMORY-MAPPED STREAMS
Each real-time LWP in a CM application handles a stream of CM data. The source and sink of each stream are typically I/O devices, and CM data must be moved to or from the kernel address space. A mechanism for this user/kernel IPC has three components:

• Control and synchronization: This includes I/O initiation and producer/consumer synchronization.

• Data location transfer: If the addresses of data buffers in the user VAS change, they must be transferred from the user to the kernel (if the user determines the buffer addresses) or vice versa.

• Data transfer: The actual transfer of data, perhaps by copying or VM remapping.
Traditional user/kernel IPC mechanisms require a user/kernel interaction for one or more of the above components in every I/O operation. For example, the UNIX read() system call performs all three components.

An MMS keeps a synchronization structure in an area of memory shared between user and kernel. For concreteness, we discuss the synchronization structure and mechanism for the case when a user LWP (scheduled by a split-level scheduler) reads CM data from an MMS. The synchronization structure contains the following data:
2 The basic technique of MMS (shared-memory synchronization structures) can also be used for user/user IPC. We describe only the user/kernel case here.

3 Streams in which a storage device sources or sinks data typically have large end-to-end delay bounds (e.g., a second or more), so buffering may be used to increase the system efficiency and responsiveness. Streams that are part of an inter-human conversation or conference have low end-to-end delay bounds (tens of milliseconds) and must use smaller buffers. The buffer size may change dynamically; for example, the ACME audio output process must use a small buffer if any of the streams it is currently handling is part of a conversation; otherwise it can use a large buffer.

This race condition is avoided, however, by setting waiting_for_IO; if an I/O interrupt occurs during the critical period, it will ...

4 A CM I/O device such as a D/A converter is always active: it continually does I/O, periodically generating an interrupt when a block of data has been input or output. A file system is generally passive; I/O must be initiated by a system call; this call may trigger a chain of operations via I/O completion interrupts, but eventually another system call is needed to restart I/O. A passive stream, such as a file, can be made active by using a time-based kernel activity (e.g., polling) to restart I/O without intervention from the client.
4.3. Data and Data Location Transfer
The mechanisms for transferring data location, and the data itself, are largely independent of control and synchronization.
Some possibilities are:
• Data is passed in pages of physical memory that are statically shared between kernel and user. Data location is implicit. Data copying may still be necessary: for user writing, the kernel may need a copy of the data (e.g., for retransmission) after the page has been reused; for user reading, the client may need to write the data to another MMS.

• Data is passed in a fixed range of virtual pages that are mapped dynamically to physical pages. Data location is implicit, and copying can be avoided in some cases.

• The kernel and user share an array of "message descriptors" that contain pointers to blocks of data. Data may be transferred by remapping, by copying, or by copy-on-write.

The optimal choice of mechanism depends on factors such as remapping cost and message size. The control and synchronization mechanism described earlier may have to be slightly modified in some cases; for example, the N_read and N_write variables may need to be defined in terms of pages or messages instead of bytes.
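To make the statically shared buffer option concrete, the following is a minimal sketch of a byte-granularity FIFO over a fixed circular buffer. It is not the paper's synchronization structure: treating N_read and N_write as cumulative byte counters is an assumption, the class and method names are hypothetical, and blocking, interrupts and concurrent access are not modeled.

```python
class MemoryMappedStream:
    """Illustrative single-producer, single-consumer FIFO over a fixed buffer."""

    def __init__(self, capacity: int):
        self.buf = bytearray(capacity)  # stands in for the shared pages
        self.n_write = 0  # assumed meaning: total bytes written so far
        self.n_read = 0   # assumed meaning: total bytes released so far

    def available(self) -> int:
        """Bytes written but not yet read."""
        return self.n_write - self.n_read

    def free_space(self) -> int:
        """Bytes the producer may write without overwriting unread data."""
        return len(self.buf) - self.available()

    def write(self, data: bytes) -> int:
        """Copy in as much of `data` as fits; return the number of bytes taken."""
        n = min(len(data), self.free_space())
        for i in range(n):
            self.buf[(self.n_write + i) % len(self.buf)] = data[i]
        self.n_write += n
        return n

    def read(self, max_bytes: int) -> bytes:
        """Copy out up to `max_bytes` and release them to the producer."""
        n = min(max_bytes, self.available())
        out = bytes(self.buf[(self.n_read + i) % len(self.buf)] for i in range(n))
        self.n_read += n
        return out
```

In this arrangement each counter has a single writer (the producer advances n_write, the consumer advances n_read), which is what would let each side check for data or space by reading shared memory rather than by making a system call.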
PERFORMANCE
In this section we show by example how split-level scheduling and memory-mapped streams reduce the number of user/kernel interactions. We then compare the performance of split-level scheduled LWPs and MMSs with other alternatives for scheduling and I/O.
Example Scenario
To see how split-level scheduling and MMSs together reduce the number of user/kernel interactions, consider the following scenario (see Figure 8). An application (say the ACME server) has two real-time LWPs and one background LWP: a device I/O LWP P_D for audio output, a network I/O LWP P_N reading from a CM connection, and an event-handling LWP P_E. P_D has an MMS for output to the audio output device, which interrupts every 30 ms. This MMS's buffer is small (e.g., because the stream it is handling is part of a low-delay conversation). P_N has an input MMS from its network connection; I/O is passive and the MMS buffer is large (e.g., because the data is coming from a file). The LWPs are scheduled using a split-level scheduler. A typical sequence is as follows.
(1) At time 10 ms, P_D completes processing a block of audio data and calls MMS_write ...

In this scenario, the only user/kernel interaction is the user-interrupt at time 30 ms. No system calls for I/O or scheduling are needed. An INT_IO_READY at time 35 ms is also eliminated.
Performance Evaluation
In this section we compare the following alternatives for structuring CM applications:

(1) Split-level scheduled LWPs (SLS-LWPs) using MMSs for I/O.

(2) Threads using separate system calls for scheduling and I/O.

(3) LWPs without split-level scheduling (pure LWPs) using UNIX asynchronous, nonblocking I/O. In this case, if an LWP does not find data available when it does a non-blocking read(), it has to wait for a signal and then do a select() before calling read() again.
We have implemented prototypes of split-level scheduling and memory-mapped streams, and measured the CPU times of their basic operations on a DECstation 3100 (a 14 MIPS machine representative of current RISC workstations). For the other approaches, we measured scheduling and I/O synchronization costs on a DECstation 3100 running Mach 2.5.
Consider a thread or LWP that reads a CM message from an I/O descriptor, processes the message, and then changes its deadline to that of the next message in the stream. Table 1 shows the total scheduling and I/O synchronization overhead per message for various scenarios.

The times shown in Table 1 are a significant fraction of typical CM message processing costs. For instance, scaling a 2K block of 8-bit audio samples takes 1.0 ms, and mixing two 2K blocks takes 1.1 ms. Thus, the scheduling and I/O synchronization overhead using threads and pure LWPs is about 15-25% of the total message processing time.

... inputting an audio stream and distributing it on two CM connections. Thus, the server has five (workahead) network I/O processes and three (periodic) device I/O processes. Threads incur 4 times the overhead of SLS-LWPs at all message rates; pure LWPs are 6 times as expensive as SLS-LWPs. Pure LWPs and threads incur CPU overheads of 33% and 23%, respectively, at 200 messages per second. This message rate is realistic for some low-delay applications which need an end-to-end delay on the order of 10 ms (200 messages/sec represents a packetization delay alone of 5 ms). Moreover, such a high message rate may also be achieved instantaneously by moderate-delay applications when they are working ahead.
RELATED WORK
The work described in this paper is related to several directions of current OS research.
• User-level functionality. Modern operating systems such as Amoeba, Chorus, and Mach shift functionality from kernel to user level to improve software structure. In contrast, our work shifts functionality to user level to increase performance.

• ... well-suited to continuous media (more generally, it may not be well-suited to future distributed systems in which speed-of-light delays dominate throughput limits). MMSs provide efficient local asynchronous communication. Examples of related work include the asynchronous RPC proposed by Gifford [8] and the dataflow model of Synthesis [10].

• Efficient local data transfer. In UNIX-type systems, I/O and IPC performance is limited by the overhead of data copying. Systems such as Mach, DASH and Topaz have attacked this problem using techniques such as VM remapping and shared memory [11, 12, 14]. The MMS mechanism is complementary to this work: it attacks the overhead of control rather than data movement.

Figure: ... overhead for an ACME server as a function of message rate. The server workload consists of 5 (workahead) network I/O processes and 3 (non-workahead) device I/O processes.

SLS is closely related to recent work on multiprocessor operating system support for parallelism, including the Psyche multiprocessor system [13] and scheduler activations [3]. In Psyche, ULSs schedule LWPs on kernel-supported threads.
The kernel notifies the user VAS of events, such as blocking cross-domain invocations, that affect the ULS. User/kernel shared memory is used to efficiently communicate LWP identifiers and to request timer user-interrupts. A scheduler activation is a thread-like execution context in which an LWP can run. As in Psyche, user-interrupts notify user VASs of scheduling-related events. Scheduler activations do not use shared memory to communicate scheduling information between ULSs and the kernel.

In these two approaches, however, the kernel and the ULS scheduling policies are independent. As a result, the approaches cannot correctly prioritize LWPs across threads that may be running in different address spaces that are contending for the same processor. They also cannot exploit policy-specific information (e.g., LWP priorities) to reduce user/kernel interactions.
For CM applications, the CPU scheduling approach of Synthesis [10] represents an alternative to split-level scheduling. The Synthesis model is based on rate-control feedback. Processes make no calls to indicate their temporal progress; instead, the kernel adjusts time-slice quanta based on queue lengths. This approach is well-suited to some situations (e.g., audio DSP with little slack CPU time).
The deadline/workahead scheduling policy is derived from the earliest-deadline-first policy [9] , but differs in its allowance for workahead.
In Symunix II [~, parallel applications are implemented as a collection of UNIX processes communicating through shared memory. Processes use virtual preemption masking while holding short-duration busy-waiting locks and virtual signal masking while updating shared memory. Unlike virtual user-interrupt masking, virtual signal masking requires a new system call to handle pending signals after unmasking interrupts.

Like MMSs, DEMOS links [5] can have associated shared memory to transfer data between address spaces. However, this approach does not reduce control overhead; synchronization is necessary after each transfer.

MMSs differ from memory-mapped files in several ways. A CM stream may be larger than a VAS, and an MMS need not contain the entire CM stream; since CM streams are accessed sequentially, a small circular buffer suffices. MMSs avoid the overhead of page faults. Also, since data is "released" explicitly, page replacement algorithms are not needed.

Finally, the URPC mechanism developed by Bershad [6] uses shared memory to reduce kernel interaction in local client/server IPC on shared-memory multiprocessors. This is similar in spirit to MMS, though the setting is different.
CONCLUSION
Existing operating systems incorporate design principles that are contrary to the needs of applications directly handling real-time streams of continuous media data (digital audio and video):

• The request/reply paradigm (the basis of centralized systems as well as RPC-based and object-oriented distributed systems) is non-optimal for stream-oriented CM data.

• Assumptions about temporal locality and delay tolerance of data accesses lead to the use of caching and buffering, which are often inappropriate for CM data.

• Scheduling policies in current systems have the goals of fairness, maximum system throughput, and fast interactive response. CM applications have real-time requirements that may conflict with these goals.
Starting with the goal of supporting CM applications, we have developed two interrelated mechanisms, split-level scheduling and memory-mapped streams, for scheduling and IPC. We have described their use in a typical CM application (the ACME server) and have compared their performance with that of the analogous mechanisms in UNIX. They improve performance by reducing the number of user/kernel interactions.

Split-level scheduling is most effective when switches between LWPs within a VAS are more frequent than switches between VASs. This is typical of CM systems when there is at most one VAS with low-delay processes (such as ACME's device I/O processes) that require CPU time at frequent intervals. To best exploit split-level scheduling, the I/O server should be the only application run on the workstation. CM playback and record applications have only high-delay processes; hence compute servers and data servers may run multiple applications of this type and still benefit from split-level scheduling.
These mechanisms are applicable for purposes other than CM. For example, memory-mapped streams could be used for access to a sequential disk file or a network stream connection. Process control applications (e.g., [1]) have scheduling requirements similar to those of CM. Split-level scheduling could be used with a time-slicing policy for a situation where a VAS contains both interactive and background processes. More generally, the mechanisms may be useful in any situation where the rate of I/O and scheduling operations, and the cost of user/kernel interactions, are high.
