I. INTRODUCTION
Over the past decade there has been rapid growth in the need for reliable, robust, and high-performance communication networks. This has been driven in large part by the demands of the Internet and general data communications. New protocols, services, standards, and network applications are being developed continuously. However, the ability to deploy these in the current Internet is greatly inhibited by the need for changes in the forwarding loops of routers, which for performance reasons are usually implemented in custom logic. To overcome this obstacle, it has been proposed to place general-purpose processing engines in the data path of routers. Such network processors extend the traditional store-and-forward paradigm to a store-process-and-forward paradigm, which opens vast possibilities for applications ranging from simple quality of service (QoS) forwarding to complex payload transcoding for wireless clients.
In terms of QoS, general-purpose processing introduces an additional level of complexity into the system, since not only link bandwidth but also computational resources have to be shared among packets of competing flows. While a significant amount of work has been done on designing systems that can provide guaranteed QoS to flows competing for bandwidth, processor sharing poses several new problems in this domain. The problem we consider is aimed at routers in which packet processing is performed at the output port. The data path through the output port is shown in Fig. 1. Packets are received from the switch fabric and queued in per-flow queues. The processor scheduler then assigns packets from the n queues to the m processing engines as they become idle. After processing, packets are again queued in per-flow queues before the link scheduler assigns them to be transmitted on the link. The processor scheduler can view each processing engine as a separate resource to be scheduled if each engine individually has a capacity exceeding the requirements of any single flow. Alternatively, the scheduler can consider all the processing engines as a single processing resource, which can be scheduled using multi-server variants of single-server scheduling algorithms [1]. In either case, the essential problem reduces to designing an efficient scheduling algorithm for sharing a single processing resource.
We provide mechanisms for such a system to give guaranteed bandwidth and computational resources to incoming flows. Guarantees in these two dimensions mean that a flow always gets its reserved shares except when: 1. the flow requires computational resources in excess of its reserved capacity, so that only a fraction of its incoming traffic is processed and forwarded to the link scheduler, possibly giving the flow less than its reserved bandwidth; or, equivalently, 2. the flow exceeds its link share, so that too many of its packets are queued at the link scheduler, which forces the processor scheduler not to give the flow its processing share.
Realizing such a system is fundamentally complicated by the fact that the execution times of applications on packets are not known in advance, which limits the applicability of well-known bandwidth scheduling algorithms. Also, at the flow level, it is not clear how explicit or implicit admission control can be done, since the processing requirements of a single flow are not known.
In this paper, we first present actual execution times of various applications on packets of varying lengths, measured on a programmable router. We show that for a restricted class of network applications, the processing times are strongly correlated to the size of the data being processed (i.e., the packet length). We then use this correlation to predict packet execution times in order to perform admission control and to schedule packets for processing. We present a scheduling algorithm called Estimation-based Fair Queuing (EFQ), which, unlike bandwidth schedulers, uses estimates of packet execution times and provides better delay bounds than processor scheduling algorithms that do not use packet execution times at all.
The paper is organized as follows: Section II discusses related work. Section III demonstrates the predictability of packet processing times and shows how admission control on processing resources can be done. Section IV describes the scheduling algorithm EFQ in detail, and Section V presents the simulation results. Conclusions are drawn in Section VI.
II. RELATED WORK
A significant amount of work has been done in defining architectures for software-based programmable routers [2], [3], [4]. In particular, we have extensively used the router plugins architecture in our work [5], [4]. Most systems enforce isolation of packet processing between flows (e.g., malicious packets cannot affect the proper processing of other packets). However, QoS issues at the level of processing are addressed only in a few cases. The commonly used NodeOS specification [6] asks for the packets of each queue to be processed by individual threads to allow for an accounting mechanism. However, methods for admission control and QoS scheduling are not described. Reference [7] describes the problem of scheduling computational resources among competing flows, but relies on being able to predetermine the processing time of packets. Also, the more important issue of correlating the cycle rate of a flow to its bit rate is not addressed. There are also approaches where the expressiveness of the processing environment is restricted (e.g., no loops) to give execution time guarantees [8], which limits their usefulness to simple header-processing applications.
Packet service disciplines and their associated performance issues have been widely studied in the context of bandwidth scheduling in packet-switched networks [9]. The performance of these disciplines has been compared to Generalized Processor Sharing (GPS) [10], which is considered an ideal scheduling discipline because of its end-to-end delay bounds and fairness properties. Packet Fair Queuing (PFQ) disciplines, however, cannot be used for processor scheduling. PFQ disciplines like WFQ and WF²Q [11] use a notion of virtual time whose correct update in a processor scheduler requires precise advance knowledge of the execution times of packets. Efforts have been made to design service disciplines that isolate the scheduler properties giving rise to ideal fairness and delay behavior, without emulating GPS [12]. Notable among these is a class of schedulers called Rate Proportional Servers [13], which decouple the update of system virtual time from the finish times of packets in queues. But even these service disciplines, while avoiding the complexity of GPS emulation, schedule packets in order of predetermined finish times, which in turn requires advance knowledge of the execution times of packets.
An exception to these disciplines is Start-Time Fair Queuing (SFQ) [14], which has been deemed suitable for CPU scheduling [15]. Since SFQ does not need prior knowledge of the execution times of packets (packet lengths, in a bandwidth scheduler), it is also applicable to scheduling computational resources. However, the worst-case delay under SFQ increases with the number of flows and can in fact worsen in the presence of correlated cross-traffic, as shown in [16]. As we show in later sections, SFQ tends to favor (i.e., provide lower queuing delays to) flows that have a higher ratio of average processing time per packet to reserved processing rate.
Our work aims to provide a way of estimating the execution times of packets, which is used at the flow level for admission control and at the packet level for QoS scheduling.
III. RESERVATION OF PROCESSING RESOURCES
A key component of quality of service is the definition of the service that is requested by a flow. While this is straightforward and well understood for link resources, reservations for computational resources are not as clearly defined. This stems from the unpredictability of general-purpose processing. In principle, the undecidability of the halting problem means that it cannot be determined in general whether an arbitrary program ever terminates. Thus, the execution time of an arbitrary piece of code cannot be determined in advance, in particular when the execution time depends on data fields in the packet.
However, networking applications often require very regular, predictable processing. Our measurements, discussed below, indicate that for certain application classes, the processing times are very tightly correlated to the packet size. This holds at a per-flow granularity, where processing requirements depend on the flow bandwidth, as well as at a per-packet granularity, where the processing time depends on the packet size. This correlation can be exploited to predict the processing requirements of packets and flows and to use the prediction for admission control and scheduling.
A. Predictability of Processing Requirements

A.1 Application Types
Applications that process packets on routers can be divided into two categories: header-processing applications and payload-processing applications [17]. Header-processing applications are characterized by the fact that the processing of the packet is restricted to read and write operations in the header of the packet. This means that the processing complexity is in general independent of the size of the packet. Examples of header-processing applications are IP forwarding, transport layer classification, and QoS routing. Payload-processing applications, in contrast, are characterized by read and write operations to all the data in the packet, in particular, the payload of the packet. It is here that the processing complexity strongly correlates to the packet size. Typically, payload-processing applications also show a header-processing overhead in addition to the payload processing. Examples of payload-processing applications are IPSec encryption, packet compression, and packet content transcoding (e.g., image format transcoding).
A.2 Measurements
We have measured the processing times for four applications: IP forwarding, which is a header-processing application, and encryption (CAST), compression (Adaptive Huffman Coding), and forward error correction (Reed-Solomon), which are payload-processing applications. The packet processing times were acquired using a programmable line card [18] on the Washington University Gigabit Router [19]. Processing was performed in the Crossbow [5]/ANN [4] operating system.
The measured results over a range of packet sizes are shown in Fig. 2. The average processing times are shown as lines, and the error bars indicate the 95th-percentile range of processing times. Note that we use time as the unit of processing cost. This simplifies the description of the scheduling algorithm and its analysis. In a realistic network, processing cost should be translated to processor cycles per second and then adapted to the particular router system where the packets are processed, as described in [20].
For IP forwarding, the processing time is practically constant across all packet sizes, reflecting the fixed per-packet cost of header processing. The processing times of the three payload-processing applications, however, clearly depend on the packet size. The per-packet processing time of these applications can be obtained by extrapolating their measurements to a packet size of 0. With these observations, we can approximate the processing cost c of a packet of length l processed by application a as

$$c = \alpha_a + \beta_a \cdot l, \qquad (1)$$

where $\alpha_a$ is the per-packet processing cost and $\beta_a$ is the per-byte processing cost of application a. Thus, the processing requirements of these applications can be described by two parameters, $\alpha_a$ and $\beta_a$. These parameters for the three applications are shown in Table I.
A.3 Online Estimation
Although the parameters $\alpha_a$ and $\beta_a$ have been determined from traces, the strong correlation between packet sizes and execution times makes it possible to determine these parameters online, and in fact to improve them, using simple linear least-squares regression. As packets are processed, the router maintains the sums $\sum c_i$, $\sum l_i$, $\sum c_i^2$, $\sum l_i^2$, and $\sum c_i l_i$ for each application a. These variables are updated on the arrival of each new $(c_{n+1}, l_{n+1})$ pair. The parameters used in the estimation can then be computed as

$$\beta_a = \frac{\sum c_i l_i - \sum c_i \sum l_i / n}{\sum l_i^2 - \sum l_i \sum l_i / n}, \qquad \alpha_a = \frac{\sum c_i}{n} - \beta_a \frac{\sum l_i}{n}.$$
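The running-sum update described above can be sketched in Python as follows. This is a minimal sketch, assuming costs are measured in time units as in Fig. 2; the class name `OnlineCostEstimator` and its methods are hypothetical, not part of the router software described here.

```python
from dataclasses import dataclass


@dataclass
class OnlineCostEstimator:
    """Running least-squares fit of c = alpha + beta * l for one application.

    Only the sums listed in the text are stored, so each update is O(1).
    """
    n: int = 0
    sum_c: float = 0.0
    sum_l: float = 0.0
    sum_l2: float = 0.0
    sum_cl: float = 0.0
    sum_c2: float = 0.0  # not needed for alpha/beta; kept to mirror the text

    def update(self, c: float, l: float) -> None:
        """Add one measured (processing cost, packet length) pair."""
        self.n += 1
        self.sum_c += c
        self.sum_l += l
        self.sum_l2 += l * l
        self.sum_cl += c * l
        self.sum_c2 += c * c

    def parameters(self) -> tuple:
        """Return (alpha, beta) from the closed-form least-squares solution."""
        if self.n < 2:
            raise ValueError("need at least two samples")
        denom = self.sum_l2 - self.sum_l * self.sum_l / self.n
        if denom <= 0:
            raise ValueError("packet lengths do not vary; beta is undetermined")
        beta = (self.sum_cl - self.sum_c * self.sum_l / self.n) / denom
        alpha = self.sum_c / self.n - beta * self.sum_l / self.n
        return alpha, beta

    def estimate(self, l: float) -> float:
        """Predicted processing cost of a packet of length l (Equation 1)."""
        alpha, beta = self.parameters()
        return alpha + beta * l
```

Keeping raw sums mirrors the formulas above and makes every update constant-time. For very long-running counters, a centered (Welford-style) update is numerically more robust, because the differences in the numerator and denominator can suffer from cancellation.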
It should be noted that there are also applications whose processing time cannot be correlated to packet size as neatly as shown above. An example of such an application is MPEG encoding. For MPEG encoding, a whole video frame is required to perform effective compression. With unencoded video frames typically exceeding a packet size, processing can only be performed once several packets of a flow are buffered. In this case the processing time varies significantly between packets, but it can be expected to be more evenly distributed over frames (i.e., I-frame to I-frame). In such a case the parameters should be maintained for the group of packets constituting a single frame, which are always processed together.
B. Bandwidth Expansion
Processing of packets on routers can change the size of the packets. For many types of applications (e.g., encryption, routing lookup) the packet size is not changed, but a few applications can significantly change the bandwidth of a flow (e.g., compression, FEC). To take these changes into account, we define an expansion factor, $\gamma_a$, as the average output bandwidth divided by the input bandwidth. This factor is also shown in Table I. Note that the expansion factor can depend on packet size and packet content, as it does for the compression application.
In an environment where we want to give service guarantees to data flows, it is typically necessary to explicitly reserve resources for each flow. This happens during flow setup and allows the network to route a new flow such that enough bandwidth is available on the chosen path. Now that we have shown that the processing requirements of a stream of data can be described in a simple manner, we can integrate this information into the flow setup process.
A flow j with incoming bandwidth $B_j$ that is processed by application a needs to reserve $\gamma_a \cdot B_j$ bandwidth on the outgoing link. The amount of processing $P_j$ it requires (as a fraction of one processor) depends on the bandwidth of the flow, the average packet size $\bar{l}_j$, and the application parameters:

$$P_j = \frac{B_j \cdot c}{\bar{l}_j} = \frac{B_j}{\bar{l}_j}\left(\alpha_a + \beta_a \bar{l}_j\right).$$
Thus, flow j can be admitted to any router that has $P_j$ processing capacity and $\gamma_a \cdot B_j$ outgoing bandwidth available. How an efficient or optimal route can be found with these parameters is outside the scope of this paper. In principle, an optimal route can be found by combining processing and transmission costs into one metric [21]. Another approach is to aggregate processing availability information together with bandwidth and topology data, similar to PNNI [22].
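The two reservation conditions above translate into a simple per-router admission test, sketched below. This is a minimal sketch, assuming bandwidth in bytes per second, packet lengths in bytes, and $\alpha_a$, $\beta_a$ in seconds and seconds per byte, so that $P_j$ is a dimensionless fraction of one processor; `AppParams`, `processing_share`, and `Router.admit` are hypothetical names, and route selection is not attempted.

```python
from dataclasses import dataclass


@dataclass
class AppParams:
    alpha: float  # per-packet cost alpha_a, seconds (assumed unit)
    beta: float   # per-byte cost beta_a, seconds per byte (assumed unit)
    gamma: float  # expansion factor gamma_a = output bandwidth / input bandwidth


def processing_share(bandwidth: float, avg_len: float, app: AppParams) -> float:
    """P_j = (B_j / l_j) * (alpha_a + beta_a * l_j), a fraction of one processor."""
    packets_per_second = bandwidth / avg_len
    return packets_per_second * (app.alpha + app.beta * avg_len)


@dataclass
class Router:
    free_processing: float  # unreserved fraction of the processor
    free_link: float        # unreserved outgoing bandwidth, bytes/s

    def admit(self, bandwidth: float, avg_len: float, app: AppParams) -> bool:
        """Admit flow j only if both P_j and gamma_a * B_j are available, then reserve them."""
        p_j = processing_share(bandwidth, avg_len, app)
        out_bw = app.gamma * bandwidth
        if p_j > self.free_processing or out_bw > self.free_link:
            return False
        self.free_processing -= p_j
        self.free_link -= out_bw
        return True
```

For example, with hypothetical parameters $\alpha_a$ = 10 µs and $\beta_a$ = 0.1 µs/byte (not the values of Table I), a 1 MB/s flow of 500-byte packets needs 2000 packets/s × 60 µs = 0.12 of a processor.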
IV. PROCESSOR SCHEDULING
The choice of the packet service discipline for scheduling the processing resources is an important issue in guaranteeing end-to-end delay bounds and ensuring fair sharing of processors among competing flows.
We note that the exact processing time of an application on a packet of a given size cannot be predetermined, which precludes the use of many well-known packet scheduling algorithms. However, we can use good estimates of the execution time, based on the parameters obtained in Section III, to design a scheduling algorithm with good delay and fairness properties. In this section we describe how we build upon the class of rate-proportional servers, whose properties allow the use of these estimates, to design a processor scheduling algorithm called Estimation-based Fair Queuing (EFQ).
Rate Proportional Servers (RPS) are a class of scheduling algorithms designed according to the methodology presented in [13], which allows the designer to trade the fairness of the algorithm against implementation complexity. Generally speaking, a rate-proportional server is a work-conserving server with the following properties: 1. The server has an associated system potential, which is updated to reflect the total work done by the server. 2. Each flow in the system has an associated potential. When a flow becomes backlogged, its potential is set equal to the system potential. When a flow is already backlogged, its potential is updated to reflect the normalized service received from the server.
By imposing conditions on the potential functions as given in [13] and by serving packets from flows such that at any instant the individual potentials of all backlogged flows are equal, it can be shown that rate-proportional servers have delay and fairness properties comparable to GPS. WF²Q+ [16] is an important example of a scheduler belonging to the RPS class.
We build on this methodology in designing the EFQ processor scheduling algorithm for two important reasons. First, the methodology helps in designing algorithms with delay bounds and fairness comparable to GPS without the complexity of GPS emulation. More importantly, the methodology provides us with enough flexibility to decouple the update of system potential from the exact finish times of the packets in the queues, which addresses the problem of not knowing the exact processing times in advance.
A.2 Packet Selection Policy
A scheduling algorithm with optimal fairness would have to schedule single processing cycles according to the fluid Rate Proportional Server. However, in network processors, the smallest unit of processing is a complete packet. Context switching between packets is not considered here, because saving and recovering processing state is a relatively expensive operation compared to the short overall processing time for a packet.
Thus, to approximate a fluid RPS, packets should be scheduled in order of their finish times, earliest finish time first. While this works well for bandwidth schedulers, the lack of knowledge of the actual execution times of the packets makes an exact implementation infeasible for processor schedulers. However, to derive an approximate scheduler of this class, we can generalize the definition of a packet-by-packet RPS. Such a scheduler schedules two packets, j and k, of flows A and B, in the order in which they are more likely to finish processing, i.e., if $F_a^j$ and $F_b^k$ are random variables representing the finish times of these packets in the fluid RPS, then packet j is scheduled for service before k if

$$P(F_b^k > F_a^j) > 0.5.$$

Hence, it is the knowledge of the distributions of $F_a^j$ and $F_b^k$ that determines the accuracy with which schedulers can approximate GPS, even if they use the same potential (or virtual time) functions. Also, since the potentials of individual flows are updated according to the normalized service received by the flows from the system, the finish time $F_a^j$ is

$$F_a^j = P_a + \frac{W_a^j}{R_a},$$

where $P_a$ is the potential and $R_a$ is the rate of service reserved by flow A. While these are known in advance when determining $F_a^j$, $W_a^j$, which represents the service time required by packet j, is not. Thus, the random variable $F_a^j$ is directly determined by $W_a^j$. Start-time Fair Queuing (SFQ) [14] (with a modified system virtual time) and WF²Q+ [16] are scheduling algorithms belonging to this class that represent the two extremes with respect to the amount of knowledge of $F_a^j$. SFQ does not use any information about the service time of a packet and hence, according to the above policy, schedules packets in increasing order of $P_a$, which makes it suitable for processor scheduling. WF²Q+, on the other hand, assumes that the exact service times of all packets are known in advance and thus determines the right order of servicing packets with probability 1.
A.3 Misordering Delay
Different schedulers that use the same potential functions and order packets for execution according to the above policy can give varying delays to flows, depending on their knowledge of the random variables $W_a^j$.
To quantify these delays, assume that a scheduler of this class can be characterized by random variables $X_{a^j,b^k}$, which denote the event that the scheduler (with its knowledge of $W_a^j$ and $W_b^k$) makes a mistake in ordering packets j and k. That is, $P[X_{a^j,b^k} = 0]$ is the probability that the scheduler orders the packets of these two flows correctly, while $P[X_{a^j,b^k} = 1]$ is the probability that it makes a mistake in the ordering. Then the average misordering delay, $\delta_a$, seen by a packet of flow A is the additional delay caused by the scheduler misordering packets of flow A and flow B, i.e., the misordering probability $P[X_{a^j,b^k} = 1]$ weighted by the extra delay that a misordering causes (Equation 7). This accounts for the time spent by the server servicing additional traffic from flow B before processing the packet from flow A. It is these additional delays caused by misordering of packets that we intend to reduce, using the estimates of packet execution times derived in Section III, which improve the scheduler's knowledge of $W_a^j$.
B. Estimation-Based Fair Queuing
Estimation-based Fair Queuing (EFQ) is a scheduling discipline designed for processor schedulers that uses the estimates of packet execution times to order packets of various flows for processing. While the packet selection policy of any Rate Proportional Server can be changed to use these estimates, EFQ is derived by modifying WF²Q+, which is known to have the tightest delay bounds and low time complexity among bandwidth schedulers.
EFQ, like WF²Q+, uses a notion of system virtual time (system potential), defined by Equation (8); the estimated finish tag $EF_i^k$ of packet k of flow i is updated using Equations (9) to (11), where $E_i^k$ is the estimated number of instructions required to process packet k. This estimate is derived from the packet length $L_i^k$ and the parameters $\alpha_a$ and $\beta_a$ of the application processing the flow, using Equation 1:

$$E_i^k = \alpha_a + \beta_a L_i^k. \qquad (12)$$

When the processor finishes processing this packet, the actual finish tag $F_i^k$ is updated using feedback from the processor (Equation 13), where $A_i^k$ is the actual number of instructions required to process packet k. This ensures that each flow is correctly charged for processing time, even if the initial estimate was incorrect.
Given these tags, the EFQ scheduler schedules packets in increasing order of their estimated finish tags $EF_i^k$.
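The tag mechanics just described can be sketched as a small single-processor scheduler. This is a minimal illustration, not the authors' implementation. Because Equations 8 to 11 are not reproduced here, the system potential is assumed to follow a WF²Q+-style rule: it advances by the work done and never falls below the smallest start tag of a backlogged flow. The sketch also makes three further assumptions. It omits WF²Q+'s eligibility test, it expresses costs in time units on a processor of capacity 1 rather than in instructions, and its class and field names (`Flow`, `EFQScheduler`, `head_start`) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Flow:
    rate: float    # reserved share of the processor, R_i (capacity = 1)
    alpha: float   # per-packet cost of the flow's application (time units)
    beta: float    # per-byte cost of the flow's application (time units/byte)
    queue: deque = field(default_factory=deque)  # (packet_id, length, actual_cost)
    last_finish: float = 0.0            # actual finish tag of the last served packet
    head_start: Optional[float] = None  # start tag of the current head packet


class EFQScheduler:
    def __init__(self, flows: dict):
        self.flows = flows
        self.vtime = 0.0  # system potential (assumed WF2Q+-style update)

    def enqueue(self, name, packet_id, length, actual_cost):
        flow = self.flows[name]
        if not flow.queue:
            # Newly backlogged flow: its potential catches up with the system potential.
            flow.head_start = max(self.vtime, flow.last_finish)
        # actual_cost stands in for the processor's feedback; the scheduler
        # only reads it after the packet has been processed.
        flow.queue.append((packet_id, length, actual_cost))

    def _estimated_finish(self, flow):
        _, length, _ = flow.queue[0]
        estimate = flow.alpha + flow.beta * length     # E_i^k, Equation 12
        return flow.head_start + estimate / flow.rate  # EF_i^k

    def process_next(self):
        backlogged = {n: f for n, f in self.flows.items() if f.queue}
        if not backlogged:
            return None
        # Serve the head packet with the smallest estimated finish tag (ties: dict order).
        name = min(backlogged, key=lambda n: self._estimated_finish(backlogged[n]))
        flow = backlogged[name]
        packet_id, _, actual = flow.queue.popleft()
        # Feedback step: charge the flow for the actual cost, not the estimate.
        flow.last_finish = flow.head_start + actual / flow.rate
        flow.head_start = flow.last_finish if flow.queue else None
        # Potential advances with the work done, but not below any backlogged start tag.
        self.vtime += actual
        starts = [f.head_start for f in self.flows.values() if f.queue]
        if starts:
            self.vtime = max(self.vtime, min(starts))
        return name, packet_id
```

As a usage example, consider two flows that each reserve half the processor and receive identical estimates, where the packets of flow `a` actually take three times longer than those of flow `b`. Because each completion charges the actual cost, `b` is served three times in a row after the first packet of `a`, and the service order is a, b, b, b, a, b, a, a.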
0-7803-7476-2/02/$17.00 (c) 2002 IEEE.
C. Example
The following example illustrates the behavior of EFQ and compares it to that of SFQ and WF²Q+. Consider a set of flows, all of which send packets of the same length but at different rates, and all of which are processed by the same application. Fig. 3 shows six such flows, with flow 1 reserving 50% of the processing resource and each of the other flows reserving 10%. The size of a packet in Fig. 3 represents the actual processing time of that packet. Note, however, that the estimates for all packets are equal, since they all have the same length and are processed by the same application.
WF²Q+ achieves an optimally fair schedule, because it assumes that the scheduler knows the actual processing times. Thus, the packets of flow 1 and those of the other flows alternate (due to the rate reservations). Of flows 2-6, the packet of flow 2 is processed first, because it has the lowest actual execution time and therefore the lowest finish time.
EFQ expects all packets to have the same execution time. Thus, EFQ could pick any order of the packets of flows 2-6 to alternate with packets from flow 1. The worst case, which introduces the most misordering delay, is shown in Fig. 3. Here, the packet of flow 2 is processed after the packets of flows 6, 5, 4, and 3, all of which use more processing time than expected by the scheduler. As a result, the packet from flow 2 experiences an additional delay due to the variation in the actual processing times of these packets. However, these variations are much smaller (and bounded, for the applications under consideration) than the total processing times of the packets themselves. In particular, these delays are much smaller than those introduced by SFQ.
As shown in the example, in the worst case SFQ could delay the processing of the first packet of flow 1 until packets from all other flows are processed. This is due to all initial packets having the same start time.
In summary, EFQ processes most packets in the same order as WF²Q+. When a flow either reserves a much higher rate than the others or has greatly differing processing requirements (due to differing packet sizes or applications), the variations of the actual execution times around their estimates do not change the scheduling order. Even when the scheduling order of packets in EFQ differs from that of WF²Q+, the additional delay experienced by a packet is bounded by the variation in execution times, as opposed to the total execution times of packets as in SFQ.
D. Analysis
From the example given above, it can be seen that for N flows, in the worst case, SFQ introduces a misordering delay equal to the sum of the maximum packet processing times of the N-1 other flows:

$$\delta_a^{SFQ} = \sum_{b \neq a} \max_k W_b^k. \qquad (14)$$

This is obtained by using $\forall k: X_{a^j,b^k} = 1$, with the misordered packets being of maximum size, and using $\forall b: P_b = P_a$ in Equation 7, since the scheduler can make a mistake only when $P_b < P_a$. The results in Section V also show that SFQ actually favors (i.e., gives lower delays to) flows whose packets require greater average normalized service (i.e., a higher ratio of average processing time per packet to reserved rate).
To analyze EFQ, assume that for a given packet length, the packet execution times can be represented by uniform random variables $W_a^j$ lying in the range $[E_a^j - V_a^j, E_a^j + V_a^j]$ around the estimates $E_a^j$ obtained in Section III. The EFQ scheduler misorders packets j and k when it determines that

$$EF_b^k < EF_a^j, \qquad (15)$$

but the actual processing times are such that

$$F_a^j < F_b^k. \qquad (16)$$

In the worst case, this can only happen when the estimated finish tags of the two packets are closer than a bound determined by the maximum variations $V_a^{max}$ and $V_b^{max}$ (17). Hence, from Equation 7, the misordering delay for packet j due to packet k is limited to the misordering probability $P[X_{a^j,b^k} = 1]$ times a term proportional to $V_a^{max}$ and $V_b^{max}$ (18), and the worst-case misordering delay $\delta_a^{EFQ}$ is bounded by the sum of these terms over all other flows (19).
From the above, we can see that as the number of flows increases, $\delta^{EFQ}$ only increases with the variations in execution times, as opposed to $\delta^{SFQ}$, which increases with the total processing times. Also note that with a better estimation, e.g., by including higher-order moments in characterizing $W_a^j$, EFQ can more accurately determine the right scheduling order, resulting in a smaller $\delta^{EFQ}$ and thus more closely approximating WF²Q+.
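The misordering event defined above can be explored numerically. The sketch below is a Monte Carlo estimate of $P[X_{a^j,b^k} = 1]$ for a single pair of head packets under the stated uniform model. It uses $F = P + W/R$ from the packet selection policy, so each actual finish tag lies uniformly within $V/R$ of its estimated tag. It does not reproduce Equations 17 to 19. The function name, argument layout, trial count, and seed are hypothetical.

```python
import random


def misorder_probability(first, second, trials=100_000, seed=1):
    """Monte Carlo estimate of the probability that EFQ misorders two packets.

    `first` and `second` are (estimated_finish_tag, V, R) tuples, listed in the
    order EFQ serves them. Assumed model: the actual service time W is uniform
    in [E - V, E + V], so the actual finish tag is EF + U / R, U ~ Uniform(-V, V).
    """
    ef1, v1, r1 = first
    ef2, v2, r2 = second
    if ef1 > ef2:
        raise ValueError("EFQ serves the smaller estimated finish tag first")
    # If the tag gap reaches the combined error spans, the order can never flip.
    if ef2 - ef1 >= v1 / r1 + v2 / r2:
        return 0.0
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        f1 = ef1 + rng.uniform(-v1, v1) / r1  # actual finish tag, packet served first
        f2 = ef2 + rng.uniform(-v2, v2) / r2  # actual finish tag, packet served second
        if f1 > f2:  # the packet served second should have gone first
            wrong += 1
    return wrong / trials
```

When the estimated tags are at least $V_a/R_a + V_b/R_b$ apart, the function returns 0 without sampling, because the estimation errors cannot reverse the order. For two packets with identical estimates and error spans, the result approaches 0.5.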
V. SIMULATION EXPERIMENTS
In this section, we present simulation experiments to demonstrate the improved performance of EFQ as compared to SFQ.
A. Simulation Setup
To compare the delay characteristics of the two schedulers, we use the following simulation setup. First, we obtain traces of the actual execution times of packets from different flows processed by different applications on the programmable router. These traces are then used by a packet generator to feed the two simulated schedulers, SFQ and EFQ. The speed of the processor in the simulator is 2 GHz (about 10 times the speed of the processor on the Smart Port Card (SPC) [18] on which the actual measurements were made). The system has 32 flows with different packet sizes, which are processed by the four different applications. All flows reserve the same processing rate and adjust their sending rates to just saturate their share of the processing resource. Together, these flows require just below 100% of the system's processing resources. Thus, they can all be admitted, and the measured delays are due only to scheduling and not to queuing backlog.
B. Delay Plots
Fig. 4 shows the delays of various packets of a flow that is processed by the forwarding application. The interarrival time of the packets of the flow is approximately 163 microseconds, which is just enough to saturate the flow's share of processing resources. Note the high and bursty delays experienced by the packets of the flow when scheduled by SFQ, as shown in Fig. 4(a). Since SFQ always schedules packets with the minimum virtual time, a single packet of a flow can, in the worst case, be delayed by the sum of one packet processing time from each of the other flows. In the simulation, this translates to a worst-case misordering delay of 8218 microseconds. The maximum delay actually observed in Fig. 4(a) is about 6100 microseconds, implying an observed maximum misordering delay of 6100 - 163 = 5937 microseconds.
For EFQ, much lower delays can be seen in Fig. 4(b). This illustrates two things. First, given the small execution time of forwarding compared to the other applications, the finish times of the packets of this flow were so different from the finish times of the packets of other flows that the errors in the estimates did not change the scheduling order (i.e., Equation 17 was not satisfied for most comparisons of finish times). Second, the worst-case delay that could be experienced by these packets is only 1312 microseconds, which would occur only if the execution times of packets from all other flows simultaneously deviated maximally from their estimates. In the simulation, the maximum misordering delay observed is about 900 - 163 = 737 microseconds.
Fig. 5 shows the delays experienced by a flow processed by the CAST encryption application, with an average packet size of 200 bytes and a higher average processing time per packet than the forwarded flow. While the average delay experienced by the packets when scheduled using EFQ is close to the interarrival time of the packets, indicating a very low misordering delay, the average delays seen in Fig. 5(a) are about twice the interarrival time of the packets. Fig. 6 shows the delays experienced by a flow processed by the FEC application, which requires much greater processing time per packet than the above flows. Here, the average delays seen by the packets when scheduled by SFQ are actually less than the interarrival time of the packets, indicating an average negative misordering delay, while those under EFQ are just about the interarrival time of the packets.
Two important conclusions can be drawn from these plots: 1. SFQ gives much higher misordering delay bounds than EFQ. 2. Across flows, while the misordering delays under EFQ are on average close to zero, those under SFQ vary from high positive misordering delays (e.g., the delay of about 3.5 times the interarrival time seen by the forwarding flow) to negative misordering delays.
C. Biased Delay Bounds Due To SFQ
The second conclusion can be explained by the work-conserving nature of the two schedulers. If SFQ gives high positive misordering delays to some flows, there should be other flows in the system that get low and in fact negative misordering delays, while EFQ gives low (close to zero) average misordering delays to all flows. We show a correlation between the misordering delay experienced by the packets of a flow and the ratio of its average processing time per packet to its reserved processing rate: SFQ favors, and gives lower misordering delays to, flows with a higher ratio over flows with a lower ratio. Given a set of flows with the same potential, since SFQ can schedule them in any order, it is very likely that a packet of a flow with a higher ratio is scheduled before at least a few flows with lower ratios, resulting in lower delays for such flows. EFQ, simply by using the estimates, is able to correctly reverse this order. Fig. 7 shows the average misordering delay introduced by the two schedulers, plotted against increasing average packet execution times. Note that all the flows have the same reserved processing rate. This plot clearly shows the conjectured correlation between average misordering delay and the ratio of average processing time per packet to reserved rate.
D. Simulation Summary
In summary, the simulation shows three main results. One is that the analytically derived worst case misordering delay is almost reached by the SFQ scheduler as shown in Fig. 4(a) . Second, EFQ shows a much lower and smoother scheduling delay. This is due to the delay depending on the variance of the processing times rather than the absolute processing times as in SFQ. Third, SFQ introduces unfairness by favoring flows with high processing time to reserved rate ratios. This behavior is not shown by EFQ, which provides fairness over a wide range of processing requirements.
VI. CONCLUSIONS
In this work, we have presented an approach to providing QoS guarantees for flows that are processed on nodes in the network. We have shown that network processing applications exhibit very regular and predictable processing patterns, which help overcome the obstacle of the theoretically undeterminable computation times of arbitrary programs. The processing times can be approximated by a linear function of packet length, which we use for admission control. The Estimation-based Fair Queuing (EFQ) algorithm also uses these estimates to fairly and efficiently assign packets to processing engines. The analysis and simulation results show that EFQ performs significantly better than an SFQ scheduler in terms of misordering delay and fairness.
We believe these results are an important step in providing the type of QoS guarantees that are common for bandwidth schedulers in an environment where flows compete for processing resources.
