Abstract-In order to handle mixed-criticality flows in a real-time embedded network, switched Ethernet with Quality of Service (QoS) facilities has become a popular solution. Deficit Round Robin (DRR) is one such QoS facility. Worst-Case Traversal Time (WCTT) analysis is mandatory for such systems, in order to ensure that end-to-end delay constraints are met. Network Calculus is a classical approach to this WCTT analysis. A solution has been proposed for switched Ethernet with DRR. It computes pessimistic upper bounds on end-to-end latencies. This pessimism is partly due to the fact that the scheduling of flows by end systems is not considered in the analysis. This scheduling can be modeled by offsets between flows. This modeling has been integrated in the WCTT analysis of switched Ethernet with First In First Out (FIFO) scheduling, where it leads to a significant reduction of delay upper bounds.
I. INTRODUCTION
Switched Ethernet has become a popular solution in the context of embedded systems. For example, the Avionics Full DupleX switched Ethernet network (AFDX) is the de facto standard for the transmission of critical avionics flows. It implements a First-In First-Out (FIFO) service discipline in switch output ports. Two priority levels are actually available, but they are rarely used. In this avionics context, a Worst-Case Traversal Time (WCTT) analysis is mandatory, in order to ensure that timing constraints are respected. Network Calculus (NC) is classically used for this WCTT analysis [1]. It considers FIFO scheduling. This approach gives pessimistic upper bounds on end-to-end latencies, because network traffic is overestimated and network service is underestimated for reasons of mathematical feasibility. These upper bounds can be significantly reduced by taking into account the scheduling of flows by source end systems. Indeed, the NC approach in [1] makes no assumption on this scheduling. Thus it considers the worst-case scenario. Taking this scheduling into account amounts to associating an offset with each flow. The NC approach in [1] has been extended with offsets in [2]. Since end systems interconnected by an AFDX network are not synchronized, offsets are defined between flows generated by the same end system. The approach in [2] leads to significantly lower delay upper bounds (more than 40 % on a typical industrial configuration). This extended approach can be applied with any offset assignment algorithm. The effect of offset integration has also been studied for other networks in the literature. For instance, [3] shows the effect of offset integration with FIFO and Priority Queue scheduling in the context of CAN networks. A response time analysis for CAN messages with offsets is also given in [4], [5], [6].
The fact that worst-case scenarios have a very low probability to occur leads to a very lightly loaded network. Typically, less than 10 % of the available bandwidth is used for the transmission of avionics flows on an AFDX network embedded in an aircraft. One solution to improve the utilization of the network is to introduce Quality of Service (QoS) mechanisms. Deficit Round Robin (DRR) is such a mechanism and it is envisioned for future avionics networks.
DRR scheduling was proposed in [7] in order to achieve fair sharing of network resources among flows. It is well known for its low complexity, O(1) under specific constraints, but it also has a non-negligible latency. Significant improvements in latency and fairness have been proposed in the literature [8], [9], [10], [11], along with some implementation techniques, while still preserving O(1) complexity.
A WCTT analysis for DRR has been proposed in [9] . It is based on network calculus and it doesn't make any assumption on the scheduling of flows by end systems.
The first goal of this paper is to integrate offsets in the WCTT analysis for DRR in [9] . This integration is done in the same way as in [2] . The second goal of the paper is to evaluate the reduction brought by offsets on delay upper bounds, considering a realistic case study.
The paper is organized as follows. Section II presents the context of the study. It includes a description of the network and flow models (II-A), the DRR scheduling policy (II-B) and its latency (II-C), a recall of the NC approach (II-D) and the delay bound computation using NC (II-D3). In Section III, we present a method to integrate offsets in NC. Section IV shows results on a case study based on a realistic industrial configuration. Section V concludes the paper and gives directions for future work.
II. CONTEXT
A. Network model
This paper considers a real-time switched Ethernet network. The network interconnects a set of end systems by full-duplex links, as defined by IEEE 802.3, with a maximum transmission rate of $R$ Mbps. Each flow $v_i$ emitted by an end system is forwarded through an output port $h$ of each switch $S_j$ on its path, based on a predefined forwarding table. A set of buffers in each output port is managed by a scheduler supporting a scheduling policy such as First-In-First-Out (FIFO), Fixed Priority (FP) queuing or Round Robin (RR). In this paper, we consider that the network uses a Deficit Round Robin (DRR) scheduler at each output port. An example of such a network is shown in Figure 1. It interconnects 5 end systems ($e_1 \ldots e_5$) to transfer 13 flows ($v_1 \ldots v_{13}$) via 3 switches ($S_1 \ldots S_3$). Each link provides a bandwidth of $R = 100$ Mbps. Table I shows the temporal characteristics of each flow. Each flow $v_i$ from a source end system generates a sequence of frames according to the minimum inter-arrival duration $T_i$ imposed by a traffic shaping technique. The size of each frame of flow $v_i$ is constrained by a maximum frame length ($l_i^{max}$) and a minimum frame length ($l_i^{min}$). Each flow $v_i$ follows a predefined path $P_i$ from its source end system to its last visited output port, and then arrives at its destination end system. Table II shows the DRR scheduler configuration at each output port.
B. Deficit round robin scheduler
Deficit Round Robin (DRR) was designed in [7] to achieve a better quality of service through fair sharing of the available network bandwidth among flows. DRR is basically a variation of Weighted Round Robin (WRR) which allows sharing of server bandwidth among variable-length packets. A DRR scheduler divides the traffic into a few predefined classes and serves each class in turn, based on the presence of a pending frame in the class buffer and on the credit assigned to the class. The DRR service cycle is divided into rounds. In each round, all the active classes are served. A class is said to be active when it has at least one packet waiting in the output port buffer to be transmitted. The basic idea of DRR is to assign a credit quantum $Q_x^h$ to each class $C_x$ at each switch output port $h$. $Q_x^h$ is the number of bytes allocated to class $C_x$ in each round at port $h$. The quantum assigned to a class should be at least the maximum frame size $l_x^{max}$ of class $C_x$ flows at port $h$. At any time, an active class can receive a service of $Q_x^h$ bytes plus a deficit $\Delta_x^h$ from the previous round. Each time a class $C_x$ is selected by the scheduler, $Q_x^h$ is added to its deficit $\Delta_x^h$. As long as the $C_x$ queue is not empty and the remaining credit is larger than the size of the head-of-line packet, this packet is transmitted and the credit is reduced by this packet size. Thus the scheduler moves to the next active class when either the $C_x$ queue is empty or there is not enough credit to serve the next packet. In the former case, there is no deficit for the next round, i.e. $\Delta_x^h = 0$. In the latter case, the remaining credit is kept as the deficit $\Delta_x^h$ for the next round. The packets are sent as long as the queue is not empty and the deficit is larger than the head-of-line packet (lines 8-12). If the queue becomes empty, the deficit is reset to 0 (lines 13-14).
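The round-based behaviour described in this section can be illustrated by a minimal Python sketch. The function name `drr_schedule`, the dictionary-based queues and the packet sizes are illustrative assumptions, not part of the analysed switch implementation, and the sketch does not follow the line numbering of the listing cited in the text. A packet is assumed to be sent when its size does not exceed the remaining credit.

```python
from collections import deque

def drr_schedule(queues, quanta, rounds):
    """Simulate a DRR scheduler for a given number of rounds.

    queues: dict mapping a class name to a deque of packet sizes (bytes),
            head-of-line packet first.
    quanta: dict mapping a class name to its quantum (bytes).
    Returns the list of (class, size) transmissions and the final deficits.
    """
    deficit = {c: 0 for c in queues}
    sent = []
    for _ in range(rounds):
        for c, queue in queues.items():      # classes are visited in a fixed order
            if not queue:                     # inactive class: skipped in this round
                continue
            deficit[c] += quanta[c]           # the quantum is added to the deficit
            # Serve packets while the credit covers the head-of-line packet.
            while queue and queue[0] <= deficit[c]:
                size = queue.popleft()
                deficit[c] -= size
                sent.append((c, size))
            if not queue:                     # empty queue: no deficit is carried over
                deficit[c] = 0
    return sent, deficit

# Hypothetical example: two classes with a 200-byte quantum each.
example_queues = {"C1": deque([100, 100, 100]), "C2": deque([150, 150])}
order, left = drr_schedule(example_queues, {"C1": 200, "C2": 200}, rounds=1)
print(order, left)
```

In this example, class C2 sends one 150-byte packet in the first round and keeps the unused 50 bytes as a deficit, so its second packet can be sent in the following round.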
C. Deficit round robin scheduler latency
A DRR scheduler serving $n$ active classes at a given output port $h$ provides a long-term service rate $\rho_x^h$ to class $C_x$, which can be computed by:
$$\rho_x^h = \frac{Q_x^h}{\sum_{1 \le j \le n} Q_j^h} \times R \tag{1}$$
It is worth noting that this is a long-term service rate for class $C_x$; the actual service rate can be different over a smaller interval. The actual service is given by the staircase curve shown in Figure 2. The DRR scheduler latency $\Theta_x^h$ experienced by class $C_x$ flows at output port $h$ is defined as the delay before $C_x$ packets are served at their long-term service rate. Before class $C_x$ starts receiving service at its long-term service rate, it may wait for a cumulative latency due to the nature of DRR scheduling. This cumulative latency has been characterized in [8] and is split into two parts:
• The delay $X_x^h$ before class $C_x$ receives service for the first time.
• Another delay $Y_x^h$ before class $C_x$ receives service at its long-term service rate, if it was served at a reduced rate in the first round.
The delay $X_x^h$ is due to the fact that if a class $C_x$ frame arrives at the output port at an instant when it has just missed its turn to be served in the present round, then it must wait for the next turn. It is shown in [8] that this delay is maximized when class $C_x$ has to wait for all the other classes, each of which is served with its maximum transmission capacity. This maximum delay is computed as:
$$X_x^h = \frac{\sum_{1 \le j \le n,\ j \ne x} \left(Q_j^h + \Delta_j^{max,h}\right)}{R} \tag{2}$$
where $\Delta_j^{max,h}$ is the maximum deficit of class $C_j$ at node $h$. Since, in any DRR round, class $C_j$ packets are served as long as the remaining credit is not smaller than the size of the head-of-line packet, the maximum deficit is always smaller than the size of the largest packet $l_j^{max,h}$ of class $C_j$ flows.
The delay $Y_x^h$ is based on the fact that class $C_x$ might receive minimum service in the first round, i.e. it might be served at less than its guaranteed long-term service rate, and it then has to wait for the next turn to get its long-term service rate. According to [8], this delay is maximized when class $C_x$ receives minimum service and all the other classes receive maximum service in the first round. For a class $C_x$, the minimum service occurs when the maximum credit deficit $\Delta_x^{max,h}$ is left after its opportunity. Thus, the minimum service received by any class $C_x$ at port $h$ is computed by:
$$Q_x^h - \Delta_x^{max,h} \tag{3}$$
whereas the maximum service is received when all the credit is consumed by the given class.
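Thus, the delay $Y_x^h$ for class $C_x$ flows can be computed by:
$$Y_x^h = \frac{Q_x^h - \Delta_x^{max,h} + \sum_{1 \le j \le n,\ j \ne x} Q_j^h}{R} - \frac{Q_x^h - \Delta_x^{max,h}}{\rho_x^h} \tag{4}$$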
For further explanation on the derivation of equation (4), readers can refer to [8]. Finally, the DRR scheduler latency $\Theta_x^h$ experienced by class $C_x$ flows at output port $h$ is given by:
$$\Theta_x^h = X_x^h + Y_x^h \tag{5}$$
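As a numerical illustration of the latency terms discussed in this section, the following Python sketch computes $\rho_x^h$, $X_x^h$, $Y_x^h$ and $\Theta_x^h$ for one class. It rests on assumptions that should be checked against [8]: the maximum deficit is taken as $l^{max} - 1$ byte, $X_x^h$ is the time needed to send the quantum plus the maximum deficit of every other class at the link rate, and $Y_x^h$ is the time to send the minimum service of $C_x$ plus the quantum of every other class at the link rate, minus the time the same minimum service would take at the long-term rate. The function name, the units (bytes and bytes per microsecond) and the example values are hypothetical.

```python
def drr_latency(x, quanta, max_frame, link_rate):
    """Return (rho, X, Y, Theta) for class x at one output port.

    quanta, max_frame: dicts mapping a class name to bytes.
    link_rate: link rate in bytes per microsecond.
    """
    others = [c for c in quanta if c != x]
    # Assumption: the maximum deficit is one byte less than the largest frame.
    max_deficit = {c: max_frame[c] - 1 for c in quanta}

    # Long-term rate: share of the link proportional to the quantum.
    rho = quanta[x] / sum(quanta.values()) * link_rate

    # Delay before the first service: every other class uses its full
    # quantum plus its maximum deficit.
    X = sum(quanta[j] + max_deficit[j] for j in others) / link_rate

    # Delay due to a first round with minimum service for class x.
    min_service = quanta[x] - max_deficit[x]
    Y = (min_service + sum(quanta[j] for j in others)) / link_rate - min_service / rho

    return rho, X, Y, X + Y

# Hypothetical port with two classes on a 10 bytes/us link.
rho, X, Y, theta = drr_latency("A", {"A": 200, "B": 100}, {"A": 100, "B": 100}, 10.0)
print(rho, X, Y, theta)
```

Under these assumptions, $Y_x^h$ simplifies to $\Delta_x^{max,h} \left(\sum_j Q_j^h - Q_x^h\right) / \left(Q_x^h R\right)$, which offers a quick sanity check of the computed value.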
D. Existing network calculus approach for end-to-end delay calculation
In this section we summarize the worst case traversal time (WCTT) analysis for DRR with Network Calculus (NC) modeled in [9] .
The NC theory is based on the (min, +) algebra. It has been proposed for worst-case backlog and delay analysis in networks [12] . It models the traffic and network elements by piecewise linear curves called arrival curves and service curves respectively.
1) Arrival Curve: In NC, an arrival curve represents an over-estimation of the traffic of a flow $v_i$ at an output port $h$. At any instant $t$, an arrival curve can be used to model a flow $v_i$ at its source end system; it is determined by the maximum frame length $l_i^{max}$ and by $T_i$, the minimum inter-frame arrival time of flow $v_i$. A frame of flow $v_i$ can experience jitter, since it can be delayed by other frames before it arrives at a port $h$. This jitter can be integrated into the arrival curve by shifting the curve to the left by the jitter value. For more information on jitter integration, readers can refer to [1].
In a DRR scheduler, the arrival traffic of a class $C_x$ at an output port $h$ is due to the queuing of different flows from class $C_x$. Thus the class $C_x$ traffic can be defined by an overall arrival curve which is the sum of the individual arrival curves of each flow and is given by:
$$\alpha_{C_x}^h(t) = \sum_{v_i \in F_{C_x}^h} \alpha_i^h(t) \tag{6}$$
where $F_{C_x}^h$ is the set of $C_x$ flows traversing port $h$.
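The summation of per-flow arrival curves into a class arrival curve can be sketched in Python as follows. The per-flow curve used here is an assumption: an affine (leaky-bucket) curve with a burst of one maximum-size frame and a rate of $l_i^{max} / T_i$, a form commonly used for AFDX flows, with the jitter applied as a left shift. The function names and the example flows are hypothetical.

```python
def flow_arrival_curve(l_max, period, jitter=0.0):
    """Assumed affine arrival curve of one flow.

    l_max: maximum frame length (bytes); period: minimum inter-arrival
    time T_i; jitter: left shift of the curve (same time unit as period).
    """
    rate = l_max / period

    def alpha(t):
        # No traffic before the origin; afterwards one frame of burst plus
        # the long-term rate, evaluated on the left-shifted time axis.
        return 0.0 if t <= 0 else l_max + rate * (t + jitter)

    return alpha

def class_arrival_curve(flows):
    """Overall arrival curve of a class: sum of its per-flow curves.

    flows: iterable of (l_max, period) or (l_max, period, jitter) tuples.
    """
    curves = [flow_arrival_curve(*f) for f in flows]
    return lambda t: sum(curve(t) for curve in curves)

# Hypothetical class with two flows (bytes, microseconds).
alpha_class = class_arrival_curve([(100, 4000), (200, 4000)])
print(alpha_class(4000))
```

Without offsets, the sum starts with a burst containing one frame of every flow, which is the worst-case situation that offsets are meant to rule out.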
2) Service Curve: In a DRR scheduler, the service is shared by all the DRR classes at the output port $h$, and each class $C_x$ receives a fraction of the maximum service rate $R$ based on the assigned quantum $Q_x^h$, as shown in equation (1). Moreover, a class $C_x$ experiences the DRR scheduler latency $\Theta_x^h$ given by equation (5). Therefore, the residual service offered to each class $C_x$ is given by:
$$\beta_{C_x}^h(t) = \rho_x^h \left[t - \Theta_x^h\right]^+ \tag{7}$$
where $[a]^+$ means $\max\{a, 0\}$.
In a DRR scheduler, as the class $C_x$ flows alternate between being served and waiting for their DRR opportunity, the actual service curve is a staircase curve. For computation reasons, however, the NC approach considers the convex curve given by equation (7), which is an under-estimation of this actual staircase curve. This curve is also shown in Figure 2.
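A convex under-estimation of this kind is a rate-latency curve: zero service until the latency $\Theta_x^h$ has elapsed, then service at the long-term rate $\rho_x^h$. A minimal Python sketch, with a hypothetical function name and illustrative values, is:

```python
def drr_service_curve(rho, theta):
    """Rate-latency service curve: rho * max(t - theta, 0)."""
    def beta(t):
        return rho * max(t - theta, 0.0)
    return beta

# Illustrative values: 5 bytes/us long-term rate, 20 us scheduler latency.
beta = drr_service_curve(5.0, 20.0)
for t in (0.0, 20.0, 30.0):
    print(t, beta(t))
```

The curve guarantees nothing before the latency has elapsed, which is what makes it a safe lower bound of the staircase service.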
Fig. 2: NC DRR Service Curve
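3) End-to-end delay bound: At a switch output port $h$, the delay experienced by a class $C_x$ flow $v_i$ is bounded by the maximum horizontal difference between the arrival curve $\alpha_{C_x}^h$ and the service curve $\beta_{C_x}^h$, and it is computed by:
$$D_i^h = \sup_{s \ge 0} \left( \inf \left\{ \tau \ge 0 \;\middle|\; \alpha_{C_x}^h(s) \le \beta_{C_x}^h(s + \tau) \right\} \right) \tag{8}$$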
A dataflow computation is implemented. At each output port, the output traffic curve of each flow is obtained by shifting the input curve to the left by the jitter in the port. This jitter is the maximum waiting time in the port buffer. At the end of the process, the end-to-end delay upper bound of a class $C_x$ flow $v_i$ is computed by adding the delays in the switch output ports of its path $P_i$:
$$D_i^{ETE} = \sum_{h \in P_i} D_i^h \tag{9}$$
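For intuition, the horizontal difference has a closed form in the simplest case. The Python sketch below assumes an affine class arrival curve (burst `burst`, rate `rate`) and a rate-latency service curve (rate `rho`, latency `theta`); the bound is then `theta + burst / rho` provided that `rate` does not exceed `rho`. This is a simplification for illustration only: the curves handled by the analysis, in particular once offsets are integrated, are generally not affine. All names and values are hypothetical.

```python
import math

def port_delay_bound(burst, rate, rho, theta):
    """Horizontal distance between an affine arrival curve and a
    rate-latency service curve at one output port."""
    if rate > rho:
        # The class receives more traffic than it is served in the long
        # term, so no finite delay bound exists.
        return math.inf
    return theta + burst / rho

def end_to_end_bound(port_delays):
    """End-to-end bound: sum of the bounds at the ports of the path."""
    return sum(port_delays)

# Hypothetical two-port path (bytes, bytes/us, us).
d1 = port_delay_bound(burst=300.0, rate=0.2, rho=5.0, theta=20.0)
d2 = port_delay_bound(burst=100.0, rate=0.2, rho=5.0, theta=20.0)
print(d1, d2, end_to_end_bound([d1, d2]))
```

Reducing the burst seen at a port, which is the effect sought by taking the scheduling of flows into account, directly reduces the `burst / rho` term of this bound.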
III. INTEGRATING OFFSET IN NC FOR DRR SCHEDULER
The computation summarized in the previous section makes no assumption on the scheduling of flows by the end systems. Thus, it assumes, for any flow, the scheduling which maximizes the waiting delay in output buffers. This worst-case scheduling is modeled by equation (6), which considers a burst of traffic where there is one frame from each flow at the same instant. This situation is most of the time impossible, since an end system distributes frame generations over time in order to produce a temporal separation between frame transmissions. Such temporal separation is classically modeled by the assignment of an offset to each flow. In NC, the integration of offsets affects the computation of arrival curves. The offset integration in NC was first proposed in [2] for First-In-First-Out (FIFO) schedulers. In this paper we extend this approach to DRR schedulers.
A. DRR scheduling with offset at source end system
Scheduling of the flows emitted by a given end system is characterized by the assignment of offsets which constrain the release times of flows and, consequently, their arrivals at switch output ports.
In the context of FIFO, [2] defines two kinds of offsets: the Definite offset of a flow at its source end system and the Relative offset $O_{r,m,n}^h$ between two flows $v_m$ and $v_n$ at a port $h$. In a source end system port, the Relative offset is computed by considering all frame generations within a time interval which includes all possible situations (e.g. twice the least common multiple of the flow periods). $O_{r,m,n}^h$ is the smallest possible duration between one frame from $v_m$ and one frame from $v_n$ within this interval. For details about the computation of offsets at a source end system, readers can refer to the algorithm given in [2].
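The enumeration over a bounded interval can be sketched in Python as follows. This is not the algorithm of [2]: it is a brute-force illustration under the assumptions that the flows are strictly periodic, that times are integers (e.g. microseconds), and that the duration is measured from a frame of $v_m$ to the next frame of $v_n$ released at the same instant or later. The function name and the example offsets are hypothetical.

```python
from math import gcd

def relative_offset_at_source(off_m, period_m, off_n, period_n):
    """Smallest duration from a frame of v_m to the next frame of v_n,
    over an interval of twice the least common multiple of the periods."""
    lcm = period_m * period_n // gcd(period_m, period_n)
    horizon = max(off_m, off_n) + 2 * lcm
    # Releases of v_n, extended by one period so that every frame of v_m
    # inside the horizon has a following frame of v_n.
    releases_n = range(off_n, horizon + period_n, period_n)
    gaps = []
    for t_m in range(off_m, horizon, period_m):
        t_n = next(t for t in releases_n if t >= t_m)
        gaps.append(t_n - t_m)
    return min(gaps)

# Hypothetical definite offsets 0 and 1000 with a common period of 4000.
print(relative_offset_at_source(0, 4000, 1000, 4000))
print(relative_offset_at_source(1000, 4000, 0, 4000))
```

The two calls illustrate that the value depends on the order of the two flows: with these offsets, a frame of the second flow always follows a frame of the first one after 1000 time units, whereas the reverse gap is 3000.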
In a switch output port, the computation of the Relative offset $O_{r,m,n}^h$ has to take into account flow jitters. Typically, the delay of $v_m$ between the source end system and the considered switch output port can be longer than the delay of $v_n$, leading to a smaller Relative offset. [2] implements the offset computation on a per end system basis. Indeed, with FIFO, all the flows generated by a given end system share the same bandwidth. With DRR, each class is considered separately and gets a dedicated bandwidth. Therefore, for each end system, the effect of offsets is applied on a class-by-class basis.
It has to be noted that offsets cannot be defined between flows from different source end systems, since there is no common clock between end systems.
Let us consider the network configuration in Figure 1. For the rest of the paper, we assume the definite offsets of the flows at their respective end systems as given in Table III.
TABLE III: Definite offset computation results for source end systems in Figure 1
Figure 3 illustrates the temporal separation of the frames of class $C_2$ flows $v_6$ and $v_7$ due to the relative offset. In order to compute Relative offsets at switch output ports, we need to compute flow delays up to these ports. This computation takes into account offsets in previous ports. It is detailed in the following section.
B. Delay computation
In [2], an aggregation technique is used to integrate offsets in NC. In a DRR scheduler, the flows of each class $C_x$ coming from the same source end system can be aggregated as a single flow. This is valid because the flows of a class $C_x$ transmitted from the same source end system are affected by temporal separation and share the same bandwidth. The aggregation technique takes into account the relative offsets between the class $C_x$ flows. The delay computation is illustrated on the example of Figure 1.
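The grouping on which this aggregation relies, i.e. one subset per source end system for the flows of a given class at a given port, can be sketched in Python as follows. The function name and the tuple layout are illustrative assumptions. The class $C_2$ flows match the example of this section ($v_6$ and $v_7$ from $e_2$, $v_4$ from $e_1$), whereas the flow named "vX" of class $C_1$ is hypothetical and only shows the class filter.

```python
from collections import defaultdict

def source_subsets(flows_at_port, flow_class):
    """Group the flows of one class by source end system.

    flows_at_port: iterable of (flow name, class name, source end system).
    Returns a dict mapping each source end system to its list of flows.
    """
    subsets = defaultdict(list)
    for name, cls, source in flows_at_port:
        if cls == flow_class:          # each DRR class is handled separately
            subsets[source].append(name)
    return dict(subsets)

flows = [
    ("v6", "C2", "e2"),
    ("v7", "C2", "e2"),
    ("v4", "C2", "e1"),
    ("vX", "C1", "e1"),   # hypothetical flow of another class
]
print(source_subsets(flows, "C2"))
```

Each resulting subset is then treated as a single aggregated flow, and no offset is assumed between different subsets, since end systems do not share a common clock.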
Let us calculate the node delay at output port $S_1^1$ for class $C_2$ flow $v_6$. At an output port $h$, the overall arrival curve $\alpha_{C_x}^h$ of class $C_x$ flows can be computed as follows.
a) Step 1: Make subsets $SS_i$ of class $C_x$ flows, based on the flows sharing the same source end system. Since the arrival traffic at $S_1^1$ from class $C_2$ is due to flows $v_6$, $v_7$ and $v_4$, and since $v_6$ and $v_7$ share the same source node $e_2$, they belong to a subset $SS_1 = \{v_6, v_7\}$, whereas $v_4$, from source node $e_1$, belongs to another subset $SS_2 = \{v_4\}$.
b) Step 2: Aggregate the flows of each subset $SS_i$ as one flow and characterize its arrival curve $\alpha_{SS_i}^h$. Based on the configuration given in Table I, the definite offsets given in Table III and the relative offsets computed using equation (10), we obtain the arrival curves $\alpha_{SS_1}$ and $\alpha_{SS_2}$, as shown in Figure 6.
The service curve $\beta_{C_2}$ for class $C_2$ flows at output port $S_1^1$ can be computed using equation (7); it is also shown in Figure 6. With the overall arrival curve and this service curve, the delay of $v_6$ at $S_1^1$ is bounded by the maximum horizontal difference between the two curves.
IV. EVALUATION
We now evaluate the proposed approach on an industrial-size configuration. It includes 96 end systems, 8 switches, 984 flows and 6276 paths (due to the multicast characteristics of virtual links (VL)). The flows are divided into three classes, namely critical flows, multimedia flows and best-effort flows. Table V shows the DRR scheduler configuration at each output port. Definite offsets are generated using the algorithm in [2]. Figure 8 shows a comparison between the classical NC approach and the NC approach with integrated offsets. On the given industrial configuration, the average improvement of the E2E delay bound is 26.9%, with a maximum gain of 70.05%. This is a significant improvement. However, as shown in [2], a much higher average gain was obtained with FIFO scheduling on a similar configuration. This result is not surprising. Indeed, with DRR, only flows from the same class are offset dependent, which leads to smaller sets of flows.
It has been shown in [1] that bursts in each output port can be limited, since flows arriving from the same input link are serialized and, consequently, cannot arrive at the same time. This serialization effect can be directly integrated in arrival curves, in the same manner as in [1]. As shown in Figure 9, it leads to a further average reduction of 2.43%, with a maximum reduction of 14.75%. The reduction is small because, thanks to offsets, there are only a few bursts.
In Figures 8 and 9, the paths are sorted by decreasing order of gain in the E2E delay computation. For example, in Figure 8, there are at least 4000 flow paths for which the gain is more than 20%.
V. CONCLUSION
In this paper, we combine two existing contributions in the context of real-time switched Ethernet networks:
• worst-case traversal time analysis for the deficit round robin service discipline, based on network calculus,
• integration of offsets in worst-case traversal time analysis in the context of FIFO.
First, we show how offsets can be integrated in the WCTT analysis for DRR. Second, we evaluate the benefit of this integration. On a realistic case study, the average reduction of worst-case end-to-end latencies is 26.9%. This result shows the significant impact of the scheduling of flows at their source nodes on worst-case latencies.
As future work, we plan to optimize WCTT analysis for DRR. Indeed, the existing approach builds service curves without considering effective traffic. Thus it considers that all the classes are always active, which might not be the case.
We also plan to extend our work to other service disciplines such as Weighted Round Robin, which leads to simpler implementations in switches.
