Real-time applications such as multimedia and gaming require stringent performance guarantees, usually enforced by a tight upper bound on the maximum end-to-end delay. For FIFO-multiplexed on-chip packet-switched networks, we consider worst-case delay bounds for Variable Bit-Rate (VBR) flows under aggregate scheduling, in which multiple flows are scheduled as a single aggregate flow. VBR flows are characterized by a maximum transfer size (L), peak rate (p), burstiness (σ), and average sustainable rate (ρ). Based on network calculus, we present and prove theorems to derive per-flow end-to-end Equivalent Service Curves (ESC), which are in turn used for computing Least Upper Delay Bounds (LUDBs) of individual flows. In a realistic case study, we find that the end-to-end delay bound is up to 46.9% more accurate than the bound obtained without considering the traffic peak behavior. Results for synthetic traffic patterns show similar improvements. The proposed methodology is implemented in C++ and has low run-time complexity, enabling quick evaluation for large and complex SoCs.
INTRODUCTION
In networks-on-chip, resources like wires, buffers, and switches are shared among multiple communication flows to provide cost efficiency. At the same time, many applications have real-time requirements and, consequently, delay and throughput constraints on the communication. To guarantee maximum delay and minimum throughput for a given communication flow, the interference in the shared resources from other flows has to be analyzed and bounded. We assume that all traffic can be well characterized as flows and that flows are scheduled in aggregates, which means multiple flows are scheduled as one aggregate flow. For a given flow, we study the maximum interference of all other flows based on network calculus theory [Le Boudec et al. 2004].
In network calculus, flows are characterized by arrival curves, and the service offered to flows by a network element such as a link or a switch is abstracted as a service curve. Since the network contention for shared resources includes not only direct contention but also indirect contention, predicting the worst-case performance is extremely hard.
To calculate an accurate delay bound per flow, the main problem is to obtain the end-to-end Equivalent Service Curve (ESC) and the internal output arrival curves of individual flows in an arbitrary network of servers, in terms of the latencies of the individual schedulers in the network. Since the theorems required for calculating performance metrics of VBR traffic transmitted in FIFO order and scheduled as aggregates had not been presented before, we defined and proved them [Jafari et al. 2011, 2012] based on network calculus [Chang 2000; Le Boudec et al. 2004]. In Jafari et al. [2011], we proposed and proved the theorem for deriving the output characterization of VBR traffic under the defined system model, which gives an exact view of the output metrics used for obtaining performance bounds. In Jafari et al. [2012], the theorems for computing the end-to-end ESC and the end-to-end delay bound are defined and proved. Moreover, we presented a simple example to show how the proposed theorems can be used in a network. The method presented in Jafari et al. [2012] considers only direct contentions of a tagged flow. In this article, we use the proposed theorems [Jafari et al. 2011, 2012] to present a formal approach for performance analysis that models both direct and indirect contentions.
VBR is a class of traffic in which the rate can vary significantly from time to time and which contains bursts. Real-time compressed voice and video and time-sensitive bursty data traffic are examples of VBR traffic. Real-time VBR flows can be characterized by a set of four parameters, (L, p, σ, ρ), where L is the maximum transfer size, p the peak rate, σ the burstiness, and ρ the average sustainable rate [Le Boudec et al. 2004]. For instance, a NoC with a link data width of 32 bits and a frequency of 500 MHz has a link bandwidth of 16 Gbits/s (32 bits × 500 MHz). On such a link, an HDTV video stream can be characterized with L = 32 bits, p = 16 Gbits/s, σ = 960 Kbits, and ρ = 76 Mbits/s. Our assumption is that the application-specific nature of the network makes it possible to characterize traffic with sufficient accuracy.
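The numbers in this example can be checked with a few lines of Python. The sketch below is only an illustration: the function names are hypothetical, 1 Kbit is assumed to mean 1000 bits, and the arrival curve is the TSPEC form min(L + p t, σ + ρ t) used later in the article.

```python
def link_bandwidth(width_bits: int, freq_hz: float) -> float:
    """Raw link bandwidth in bits per second."""
    return width_bits * freq_hz


def tspec_arrival(t: float, L: float, p: float, sigma: float, rho: float) -> float:
    """TSPEC arrival curve min(L + p*t, sigma + rho*t) for t >= 0, in bits."""
    if t < 0:
        raise ValueError("t must be non-negative")
    return min(L + p * t, sigma + rho * t)


def tspec_crossover(L: float, p: float, sigma: float, rho: float) -> float:
    """Time at which the peak-rate line and the sustained-rate line meet."""
    return (sigma - L) / (p - rho)


# HDTV stream from the text (assumption: 1 Kbit = 1000 bits).
HDTV = dict(L=32.0, p=16e9, sigma=960e3, rho=76e6)

bandwidth = link_bandwidth(32, 500e6)   # 32 bits x 500 MHz
theta = tspec_crossover(**HDTV)         # seconds
```

With these parameters the peak-rate segment limits the flow only for roughly the first 60 microseconds; after that the sustained-rate segment (σ, ρ) is the tighter constraint.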
For an individual flow, called a tagged flow, we first consider resource sharing scenarios (channel sharing, buffer sharing, and channel&buffer sharing) in the routers and then build analysis models for the different resource sharing components. We assume that the routers employ round robin scheduling to share the link bandwidth. Based on these models, we can derive the intra-router ESC for an individual flow. To consider the contention that a flow may experience along its routing path, we present a recursive algorithm to classify and analyze flow interference patterns. The algorithm uses the proposed theorems to analyze the effect of contention flows on the tagged flow. Based on this algorithm, we derive the end-to-end ESC and then the Least Upper Delay Bound (LUDB) for a tagged flow under the mentioned system model. To show the potential of our method, we conduct three case studies to derive delay bounds and compare them with simulation results. It is worth mentioning that the article does not deal with back-pressure, but calculates the buffer size thresholds that ensure back-pressure does not occur in the network.
The remainder of this article is organized as follows. Section 2 gives an account of related work. In Section 3, we introduce the basics of network calculus. Section 4 discusses the underlying system model and notations in our analysis. Section 5 is devoted to the theorems required for computation of performance metrics. We present our formal method for the performance analysis and computation of LUDB in Section 6. Numerical results are reported in Section 7. Finally, Section 8 gives the conclusions and highlights directions for future work.
physics that can account for the nonstationarity observed in packet arrival processes. They also investigated the impact of the packet injection rate and the data packet sizes on the multifractal spectrum of NoC traffic.
Network calculus is a mathematical framework for deriving worst-case bounds on maximum latency, backlog, and minimum throughput in network-based systems. It is able to model all traffic patterns with bounds defined by arrival curves. In this respect, designers can capture some dynamic features of the network based on the shapes of the traffic flows [Bakhouya et al. 2011]. Network calculus can also abstract many scheduling algorithms and arrival classes at a single queue with multiplexed arrival flows by means of service curves. The service curves through a network can be convolved into a single service curve. Hence, a multi-node network analysis can be simplified to a single-node analysis. With these two features, network calculus can analyze many scheduling algorithms and arrival classes over a multi-node network in a uniform framework, while classical queuing theory models different combinations of them separately [Ciucu et al. 2012]. The probabilistic version of (deterministic) network calculus is stochastic network calculus. In some networks, such as wireless networks, the service offered by a communication channel may vary randomly over time due to channel contention and impairment. Such networks can provide only stochastic services and guarantees. For example, Rizk and Fidler [2012] use stochastic network calculus to derive per-flow end-to-end performance bounds in a network of tandem queues under open-loop fBm cross traffic, which is a model for self-similar and long-range dependent aggregate Internet traffic. Since we employ deterministic network calculus, in the rest of our article, network calculus refers to the deterministic type. A network calculus-based methodology [Bakhouya et al. 2011] analyzes and evaluates performance and cost metrics, such as latency, energy consumption, and area requirements, in on-chip interconnects.
The authors of that work compare 2D mesh, spidergon, and WK-recursive topologies using a given traffic pattern and show that WK-recursive outperforms mesh and spidergon in all considered metrics. Their model is simple: it neither considers virtual channel effects nor models all interference between flows sharing a resource in the network. Moreover, the model does not investigate the peak behavior of flows, which leads to less accurate bounds, whereas we consider performance analysis for VBR traffic in on-chip networks employing aggregate resource management.
The performance evaluation of real-time services in networks employing aggregate scheduling is particularly challenging because of its complexity. Aggregate scheduling arises in many cases. In addition to NoCs, for example, it can also be applied to obtain scalability in large networks. Differentiated Services (DiffServ) [Blake et al. 1998] is an example of an architecture based on aggregate scheduling in the Internet. Despite the research efforts, few results have appeared on this subject. A survey on the subject can be found in Bennett et al. [2002]. Charny et al. [2000] consider a closed-form delay bound for a generic network configuration under the fluid model assumption. It is extended by Jiang [2002] to consider packetization effects. However, these works can derive bounds only for small utilization factors in a generic network configuration. Martin et al. [2006, 2003] and Bauer et al. [2010] employ the Trajectory Approach (TA) to compute end-to-end delay bounds in FIFO systems. The Trajectory Approach computes all the possible trajectories of a system under constraints and then takes the maximum end-to-end delays on them. Bauer et al. [2010] compare the network calculus and Trajectory approaches on a real avionics AFDX configuration and show that the Trajectory approach computes upper bounds that are tighter than those computed by network calculus. However, they derive delay bounds by summing per-node bounds, which is not expected to yield tight bounds, although the bounds are reported to be at least close under practical conditions.
Least Upper Delay Bound for VBR Flows in Networks-on-Chip with Virtual Channels
The computation of delay bounds through network calculus in feed-forward networks under arbitrary multiplexing has already been addressed in the literature [Schmitt et al. 2008; Kiefer et al. 2010; Bouillard et al. 2010]. One of these works [Bouillard et al. 2010] describes the first algorithm that can compute the worst-case end-to-end delay for a given flow for any feed-forward network under blind multiplexing, with concave arrival curves and convex service curves. Although the problem is intrinsically difficult (NP-hard), the authors show that in some cases, like tandem networks with cross-traffic interfering along intervals of servers, the complexity becomes polynomial. The approach is then refined [Bouillard et al. 2011] to take fixed priorities into account. Bouillard et al. study networks with a fixed priority service policy, in which each flow is assigned a fixed priority, and try to take into account the pay multiplexing only once (PMOO) phenomenon. This stream of works deals with networks of arbitrary multiplexing, also known as general or blind multiplexing, which means no assumption is made about the service policy, whereas tighter bounds can be obtained by assuming an explicit multiplexing scheme like FIFO.
A related stream of works [Lenzini et al. 2006; Bisti et al. 2010] proposes a methodology that calculates delay bounds in tandem networks of rate-latency nodes traversed by leaky-bucket-shaped flows. The authors also introduce a software tool, called DEBORAH, which implements the algorithms employed in their methodology to compute delay bounds. These works consider servers in tandem or sink trees, while our proposed method computes the end-to-end delay in a generic NoC topology. Moreover, these works compute delay bounds only for the average behavior of flows and do not consider peak behavior, which results in less accurate bounds.
Boyer [2010] tries to model shaping for an end-to-end delay where each server is shared by two flows. An applicative token bucket $\gamma_{r,b}$ is shaped by the bit-rate of the link $\lambda_R$, leading to a two-slope affine arrival curve, which is similar to the one we consider for double leaky buckets. The paper investigates a simple topology, a sequence of rate-latency servers, each one shared by two flows with a FIFO policy, and a simple case of nested contentions. Moreover, the author states that the modeling is incomplete: when computing the worst-case traversal time of a flow, only the shaping of the considered flow is modeled, not that of the interfering ones (leading to the title, half-modeling of shaping). In this article, we investigate both nested and crossed contentions in general to model all flows (even interfering ones) with complex interferences in on-chip networks.
All the aforementioned works on aggregate resource management compute delay bounds in various network infrastructures, but not in on-chip networks. For NoC architectures, analytical models can be very close to the real system. For instance, a router in an on-chip network is implemented in pure hardware, which means its microarchitecture is amenable to analysis. Therefore, network calculus can provide more accurate analysis in on-chip networks. Qian et al. [2010] present analytical models for traffic flows under strict priority queueing and weighted round robin scheduling in on-chip networks. They then derive per-flow end-to-end delay bounds using these models. Like most of the mentioned works, the model proposed by Qian et al. [2010] does not deal with the peak behavior of flows, which results in less accurate bounds. The method proposed in this article considers performance analysis for VBR traffic characterized by (L, p, σ, ρ) in on-chip networks employing aggregate resource management. As such, our method achieves more accurate delay bounds.
NETWORK CALCULUS BACKGROUND
Network calculus is a mathematical framework to derive worst-case bounds and analyze performance guarantees in networks. This article uses the Traffic SPECification (TSPEC) [Wroclawski 1997] to model the average and peak characteristics of flow $f_j$ as the arrival curve $\alpha_j(t) = \min(L_j + p_j t,\ \sigma_j + \rho_j t)$, in which $L_j$ is the maximum transfer size, $p_j$ the peak rate ($p_j \ge \rho_j$), $\sigma_j$ the burstiness ($\sigma_j \ge L_j$), and $\rho_j$ the average (sustainable) rate. We denote it as $f_j \propto (L_j, p_j, \sigma_j, \rho_j)$. As shown in Figure 1
In this article, we also consider a class of curves, namely pseudoaffine curves [Lenzini et al. 2006], each of which is a multiple affine curve shifted to the right and given by $\beta = \delta_T \otimes [\otimes_{1 \le x \le n} \gamma_{\sigma_x,\rho_x}]$. A pseudoaffine curve represents the service received by single flows in tandems of FIFO multiplexing rate-latency nodes. Because affine curves are concave, it can be rewritten as
$\beta = \delta_T \otimes [\wedge_{1 \le x \le n} \gamma_{\sigma_x,\rho_x}]$, where the non-negative term $T$ is called the offset and the affine curves between square brackets are called leaky-bucket stages. It is clear that a rate-latency service curve is in fact pseudoaffine, since it can be expressed as $\beta = \delta_T \otimes \gamma_{0,R}$.
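To make the definition concrete, the following Python sketch evaluates a pseudoaffine curve pointwise. It relies on two standard network calculus facts rather than on anything specific to this article: convolving with $\delta_T$ shifts a curve to the right by $T$, and the min-plus convolution of concave affine curves equals their pointwise minimum. All function names are hypothetical.

```python
def affine(t: float, sigma: float, rho: float) -> float:
    """Leaky-bucket stage gamma_{sigma,rho}(t): sigma + rho*t for t > 0, else 0."""
    return sigma + rho * t if t > 0 else 0.0


def pseudoaffine(t: float, T: float, stages) -> float:
    """delta_T convolved with the minimum of the stages: zero up to the offset T,
    then the smallest stage value evaluated at t - T."""
    if t <= T:
        return 0.0
    return min(affine(t - T, sigma, rho) for sigma, rho in stages)


def rate_latency(t: float, R: float, T: float) -> float:
    """Rate-latency curve R * (t - T)^+."""
    return R * max(0.0, t - T)


# A rate-latency curve is the one-stage pseudoaffine curve with sigma = 0, rho = R.
samples = [0.0, 1.0, 2.0, 2.5, 5.0, 10.0]
one_stage = [pseudoaffine(t, 2.0, [(0.0, 0.5)]) for t in samples]
reference = [rate_latency(t, 0.5, 2.0) for t in samples]
```

The lists `one_stage` and `reference` coincide, which mirrors the remark that a rate-latency curve is a pseudoaffine curve with a single stage $\gamma_{0,R}$.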
Given arrival curve α and service curve β, the delay is bounded by the horizontal deviation between the arrival and service curves.
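As an illustration of the horizontal deviation, the sketch below computes it for a TSPEC arrival curve and a single rate-latency server, and compares it with the bound obtained when only (σ, ρ) is used. This is the textbook single-node case, not Theorem 1 of this article; the function names and the numeric parameters (in flits and cycles) are assumptions chosen for the example.

```python
def tspec_delay_bound(L, p, sigma, rho, R, T):
    """Horizontal deviation between min(L + p*t, sigma + rho*t) and R*(t - T)^+."""
    if R < rho:
        raise ValueError("service rate below sustained rate: delay is unbounded")
    # Breakpoint where the peak-rate line meets the sustained-rate line.
    theta = (sigma - L) / (p - rho) if p > rho else 0.0
    # For a concave piecewise-linear arrival curve, sup_t (alpha(t)/R - t)
    # is reached either as t -> 0+ or at the breakpoint theta.
    at_zero = L / R
    at_theta = (L + p * theta) / R - theta
    return T + max(at_zero, at_theta)


def leaky_bucket_delay_bound(sigma, rho, R, T):
    """Same deviation when the peak behavior is ignored (arrival curve sigma + rho*t)."""
    if R < rho:
        raise ValueError("service rate below sustained rate: delay is unbounded")
    return T + sigma / R


# Hypothetical flow: L = 1 flit, p = 1 flit/cycle, sigma = 8 flits, rho = 0.25 flits/cycle,
# served by a rate-latency node with R = 0.5 flits/cycle and T = 3 cycles.
with_peak = tspec_delay_bound(1.0, 1.0, 8.0, 0.25, 0.5, 3.0)
without_peak = leaky_bucket_delay_bound(8.0, 0.25, 0.5, 3.0)
```

For these assumed values the peak-aware bound is about 14.33 cycles against 19 cycles for the (σ, ρ)-only model, which shows in miniature why accounting for L and p tightens the bound.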
SYSTEM MODEL AND NOTATIONS
As depicted in Figure 2, we consider an NoC architecture in which every node contains a router and a core that performs its own computational, storage, or I/O processing functionality and is equipped with a Network Interface (NI). As the figure shows, buffers are arranged to construct VCs in each input channel. To characterize flows based on their defined TSPEC, we assume unbuffered leaky bucket controllers (regulators) that do not buffer the packets, but stall the traffic producers or IPs [Jafari et al. 2010].
Assumptions in this work are listed as follows.
- The NoC architecture can have different topologies.
- Packets have fixed length and traverse the network in a best-effort fashion with the virtual-cut-through switching technique using deadlock-free deterministic routing.
- Routers have only input buffers and VCs.
- Buffers are bounded and the network is lossless.
- The router can have multiple VCs per in-port. VC allocation is deterministic and each VC receives an aggregate service.
- All traffic is part of TSPEC flows $f = TSPEC(L, p, \sigma, \rho)$ at the entry into the network.
- In each node that guarantees the flow a pseudoaffine service curve $\beta = \delta_T \otimes \gamma_{\sigma_x,\rho_x}$, it is assumed that $\rho \le \rho_x$ and $p \ge \rho_x$.
- Flows are classified into a prespecified number of aggregates.
- Traffic of each aggregate is buffered and transmitted in FIFO order, denoted as FIFO multiplexing.
- Different aggregates are buffered separately and each aggregate is guaranteed a rate-latency service curve.
- We use a concrete policy, in this case round-robin arbitration, to support the assumption of a rate-latency service curve. Other arbitration policies can be used as well. We also assume a fixed word length of $L_w$ in all flows.
- The peak rate is limited by the hardware. It is always 1 flit/cycle.
NoC designers can obtain per flow end-to-end delay bound in NoC architectures by the proposed method in this article under the mentioned assumptions.
Most of the assumptions in this article have been widely used by previous models [Qian et al. 2009; Jafari et al. 2010]. The system model in this article is more general than those models [Qian et al. 2009; Jafari et al. 2010] because they consider a Constant Bit Rate (CBR) flow in NoCs, defined by (σ, ρ), which is a special case of TSPEC. Furthermore, we have relaxed a significant limitation of the previous analytical model [Jafari et al. 2010], which presumes that the number of VCs for each PC is the same as the number of flows passing through that channel.
We use the example depicted in Figure 2 to explain the terminology used in the article. The figure shows a network with 16 nodes, numbered 1, 2, . . . , 16, connected by links. There are 5 flows in the example, denoted as $f_1, \ldots, f_5$. Multiple flows that share the same buffer and channel in a router are scheduled as one flow, called an aggregate flow. For instance, $f_{\{1,2\}}$ in router 3 is an aggregate flow. A tagged flow is the flow whose delay bound we derive, and other flows that share resources with the tagged flow are contention flows. In this example, $f_1$ is the tagged flow, and $f_2$, $f_3$, and $f_4$ are contention flows. Notations in the article are listed in Table I.
We use the subindex "$(f_i, r_j)$" in notations to indicate that they are related to flow $f_i$ in router $r_j$. For example, $\alpha_{(f_1,r_2)}$ denotes the arrival curve of flow $f_1$ in router $r_2$. We also employ the subindex "$(s_i, r_j)$" to state that notations are related to $f_{s_i}$ in router $r_j$. In this case, $f_{s_i}$ can be one flow or an aggregate flow. For instance, $\beta_{(\{1,2,3\},r_2)}$ indicates the service curve of aggregate flow $f_{\{1,2,3\}}$ in router $r_2$.
PROPOSED THEOREMS
In this section, we review the earlier proposed theorems [Jafari et al. 2011, 2012], which are required for analyzing the performance of VBR flows in a FIFO multiplexing network.
Table I. Notations

| Symbol | Description |
| $L_i$ | The maximum transfer size of $f_i$ (flits) |
| $p_i$ | The peak rate of $f_i$ (flits/cycle) |
| $\sigma_i$ | The burstiness of $f_i$ (flits) |
| $\rho_i$ | The average rate of $f_i$ (flits/cycle) |
| | The source node of $f_i$ |
| $r_j$ | Router $j$ |
| $\beta_j$ | The service curve of $r_j$ |
| $R$ | The minimum service rate in a rate-latency service curve |
| $T^l$ | The maximum processing latency of the arbiter in the router (cycles) |
| $T^{HoL}$ | The maximum waiting time in the FIFO queue of the router (cycles) |
| $T^{Total}$ | The total processing delay that comes from contention flows and equals the sum of $T^l$ and $T^{HoL}$ (cycles) |
| $L_w$ | The word length in the flow (flits) |
| $C$ | The channel capacity (flits/cycle) |
| | The minimum service rate in a pseudoaffine service curve |
| | The set of contention flows of tagged flow $f_t$ in the network |
| | The set of flows that share the same buffer in router $r_j$ with flow $f_{s_i}$ |
| $\lvert V_{(s_i,r_j)} \rvert$ | The number of virtual channels whose flows share the same channel of router $r_j$ with flow $f_{s_i}$ |
| | The set of flows passing through $VC_k$ in physical channel $PC_i$ of router $r_j$ |
We first present a theorem for computing the delay bound as follows.
THEOREM 1 (DELAY BOUND). Let $\beta$ be a pseudoaffine curve with offset $T$ and $n$ leaky-bucket stages $\gamma_{\sigma_x,\rho_x}$, $1 \le x \le n$; that is, $\beta = \delta_T \otimes [\otimes_{1 \le x \le n} \gamma_{\sigma_x,\rho_x}]$
, then the maximum delay for the flow is bounded by
PROOF. We have proved it in Jafari et al. [2012]. See Appendix A in the online appendix available in the ACM Digital Library.
In the rest of the article, we apply Theorem 1 to the end-to-end ESC to calculate the LUDB for a tagged flow. In the method proposed in Section 6, obtaining the end-to-end ESC requires that we be able to subtract contention flows from a service curve. To this end, we propose Proposition 1 and Theorem 2. In Proposition 1, we derive the ESC with FIFO multiplexing where the service curve is a pseudoaffine curve. We then use Corollary 1, which is an immediate consequence of Proposition 1, to propose Theorem 2. This theorem is employed for deriving the ESC in the underlying system model.
In Proposition 1 and Theorem 2, we obtain ESC with FIFO multiplexing under different assumptions.
PROPOSITION 1 (EQUIVALENT SERVICE CURVE). Let $\beta$ be a pseudoaffine curve with offset $T$ and $n$ leaky-bucket stages $\gamma_{\sigma_x,\rho_x}$, $1 \le x \le n$; that is, $\beta = \delta_T \otimes [\otimes_{1 \le x \le n} \gamma_{\sigma_x,\rho_x}]$
PROOF. We have proved it in Jafari et al. [2012]. See Appendix B in the online appendix available in the ACM Digital Library.
The following corollary is an immediate consequence.
COROLLARY 1. Let $\beta$ be a pseudoaffine curve with offset $T$ and one leaky-bucket stage
PROOF. We can easily obtain this corollary by applying Proposition 1 for service curve β when n = 1.
We can specifically capitalize on Corollary 1 to obtain a parametric expression for the ESC of a tagged flow passing through a rate-latency node. We assume the number of flows passing through this node is K + 1. Therefore, for computing equivalent service curve for the tagged flow, we should subtract the arrival curves of other K flows. It can be calculated by iteratively applying Corollary 1 for K times. Without loss of generality, we presume that the tagged flow is flow K + 1. We now present the following theorem. 
THEOREM 2 (EQUIVALENT SERVICE CURVE FOR RATE-LATENCY SERVICE CURVE WITH K + 1 FLOWS). Consider one node with a rate-latency service curve β
PROOF. We have proved it in Jafari et al. [2012]. See Appendix C in the online appendix available in the ACM Digital Library.

Theorem 3 states how the output arrival curve of a VBR flow in a FIFO multiplexing node can be calculated.
THEOREM 3 (OUTPUT ARRIVAL CURVE WITH FIFO). Consider a VBR flow, with TSPEC $(L, p, \sigma, \rho)$, served in a node that guarantees to the flow a pseudoaffine service curve
PROOF. We have proved it in Jafari et al. [2011]. See Appendix D in the online appendix available in the ACM Digital Library.
We apply this theorem to calculate internal output arrival curves. For instance, in Section 6.2, we obtain the output arrival curve of a crossed flow when it is split into two nested flows.
FORMAL METHOD FOR LUDB DERIVATION
We have presented and proved the theorems required for deriving the LUDB for VBR flows in on-chip networks based on aggregate scheduling with multiple virtual channels. As mentioned before, to calculate the LUDB per flow, we should first obtain the end-to-end ESC that the tandem of routers provides to the flow. For calculating the end-to-end ESC, we propose the following two steps.
- Step 1: Intra-router ESC
- Step 2: Inter-router ESC

In the first step, we consider resource sharing scenarios in the routers and then build analysis models for the different resource sharing components. Based on these models, we can derive the intra-router ESC for an individual flow. In the second step, we consider the contention that a flow may experience along its routing path. To this end, we present the recursive algorithm End-to-End ESC to classify and analyze resource sharing models and flow interference patterns. Based on this algorithm, we can derive the end-to-end ESC for a tagged flow passing through the tandem of routers.
Step 1: Intra-router ESC
To compute the intra-router ESC for a tagged flow, it is necessary to investigate resource sharing. At each router, we identify three types of resource sharing, namely, channel sharing, buffer sharing, and channel&buffer sharing. Channel sharing means that multiple flows share the same out-port and thus the output channel bandwidth. Buffer sharing means that multiple flows share the same buffer but not the same channel. In channel&buffer sharing, multiple flows share both buffers and channels. They are scheduled as one flow, called an aggregate flow.

6.1.1. Channel&Buffer Sharing. Figure 4 depicts an example of flows sharing both channel and buffer in the router. As shown in the figure, we consider these flows as an aggregate flow. When an aggregate flow includes the tagged flow, it is called a tagged aggregate flow. In this case, we calculate the intra-router ESC for the tagged aggregate flow in the router instead of the tagged flow. In Section 6.2, we show how the ESC of the tagged flow is extracted from the ESC of the tagged aggregate flow by removing contention flows one by one. For simplicity, in the rest of the article, "tagged flow" refers to both a tagged flow and a tagged aggregate flow.
6.1.2. Channel Sharing. Figure 5 depicts a channel shared between three flows $f_1$, $f_2$, and $f_3$. Since the arbitration policy determines how much the flows influence each other, it has to be known. We assume that, while serving multiple flows, the routers employ round robin scheduling to share the channel bandwidth. Assuming a fixed word length of $L_w$ in all flows, round robin arbitration means that each flow $f_{s_i}$ in router $r_j$ gets at least $C / |V_{(s_i,r_j)}|$ of the channel bandwidth, where $C$ is the channel capacity and $|V_{(s_i,r_j)}|$ is the number of virtual channels whose flows share the same channel of router $r_j$ with flow $f_{s_i}$. A flow may get more if other flows use less, but this gives a worst-case lower bound on the bandwidth. Round robin arbitration has good isolation properties because the minimum bandwidth for each flow does not depend on the properties of the other flows.
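The worst-case share described above is easy to express in code. The following Python sketch computes the guaranteed rate $C / |V_{(s_i,r_j)}|$ and evaluates a rate-latency curve built from it. The function names are hypothetical, and the arbiter latency is passed in as a plain parameter because its closed form is not reproduced here.

```python
def round_robin_min_rate(channel_capacity: float, num_sharing_vcs: int) -> float:
    """Worst-case bandwidth share C / |V| of one VC under round robin (flits/cycle)."""
    if num_sharing_vcs < 1:
        raise ValueError("at least one virtual channel must use the channel")
    return channel_capacity / num_sharing_vcs


def rate_latency(t: float, rate: float, latency: float) -> float:
    """Service guaranteed by time t: rate * (t - latency)^+ (flits)."""
    return rate * max(0.0, t - latency)


# Assumed example: capacity of 1 flit/cycle shared by 4 virtual channels,
# with an arbiter latency of 2 cycles supplied by the caller.
R = round_robin_min_rate(1.0, 4)
served_after_10_cycles = rate_latency(10.0, R, 2.0)
```

Because the share depends only on the number of competing virtual channels, the resulting curve stays valid regardless of how much traffic the other flows actually inject.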
Since network calculus uses the abstraction of a service curve to model a network element processing traffic flows [Le Boudec et al. 2004], we can model a round robin arbiter in router $r_j$ for flow $f_{s_i}$ as a rate-latency server [Gebali et al. 2009] whose service curve is
$\beta_{(s_i,r_j)}(t) = R_{(s_i,r_j)} (t - T^l_{(s_i,r_j)})^+$, where $R_{(s_i,r_j)}$ is the minimum service rate and $T^l_{(s_i,r_j)}$ is the maximum processing latency of the arbiter in router $r_j$ for flow $f_{s_i}$. $R_{(s_i,r_j)}$ and $T^l_{(s_i,r_j)}$ are defined as follows:
35:12 F. Jafari et al. 
where $D_{router}$ is the delay for the packet routing decision in a router. As mentioned in Section 3, a rate-latency service curve is in fact pseudoaffine. Therefore, $\beta_{(s_i,r_j)} = \delta_{T^l_{(s_i,r_j)}} \otimes \gamma_{0,R_{(s_i,r_j)}}$.
6.1.3. Buffer Sharing. Figure 6 shows a buffer shared between two flows $f_1$ and $f_2$. In this type of sharing, in addition to the maximum processing latency for link sharing, $T^l$, we introduce the Head-of-Line delay for a tagged flow as follows.
Head-of-Line Delay (HoL). A flow that arrives at a router at time $t$ waits in the FIFO queue until time $t + T^{HoL}$ at the latest. Therefore, the total processing delay that comes from contention flows for tagged flow $f_{s_i}$ in router $r_j$, $T^{Total}_{(s_i,r_j)}$, is
$T^{Total}_{(s_i,r_j)} = T^l_{(s_i,r_j)} + T^{HoL}_{(s_i,r_j)}$.
We assume $f_1$ in Figure 6 is the tagged flow. According to Equation (7), $T^{HoL}_{(f_1,r)}$ is equal to the maximum delay for passing packets of flow $f_2$ in the buffer. According to network calculus theory [Le Boudec et al. 2004], the maximum delay for flow $f_j$ is bounded by Equation (8).
Therefore, we formulate T HoL ( f 1 ,r) as follows:
If there is more than one flow sharing the buffer with the tagged flow, as shown in Figure 7, the HoL delay for tagged flow $f_{s_i}$ in router $r_j$ is calculated as follows:
Therefore, router r j can serve flow f s i by curve
We analyze the buffer space threshold for each VC based on traffic specifications of flows passing through that VC, and also interference between them. The buffer space threshold for virtual channel V C k in physical channel PC i of router r j is given as follows:
where
is the set of flows passing through V C k in physical channel PC i of router r j .
Step 2: Inter-router ESC
We have analyzed and modeled three kinds of sharing to compute the intra-router ESC. After the per-router resource sharing analysis (intra-ESC), the effects of buffer sharing and channel sharing on the tagged flow have been accounted for, and we can work with an analysis model that keeps only channel&buffer sharing for the tagged flow. This model is called the aggregate analysis model. For example, suppose that a tagged flow $f_1$ traverses a tandem of routers and is multiplexed with contention flows as depicted in Figure 8. Now, we consider the aggregate analysis model to recognize interference patterns and remove contention flows one by one. A tagged flow directly contends with contention flows. Also, contention flows may contend with each other and then contend with the tagged flow again. To consider the inter-ESC in the aggregate analysis model, we decompose a complex contention scenario into two basic contention patterns, namely, Nested and Crossed. Figures 8, 9, 10, and 11 illustrate examples of different kinds of nested contentions, and an example of crossed contention is shown in Figure 12. In the following, we describe these examples in more detail.
We use the algebra of sets to recognize the contention scenarios. To facilitate our discussion, we define convenient notations using the example in Figure 8(b). In the example, the tandem of servers is $\{\beta_{(\{1\},r_1)}, \beta_{(\{1,2,3\},r_2)}, \beta_{(\{1,2\},r_3)}, \beta_{(\{1\},r_4)}\}$ and $S = \{s_i\} =$
- The problem is strictly transformed to the combination of two nested flows.
To remove a contention flow from a service curve and derive the new service curve from it, we apply Corollary 1 proposed in Section 5.
When $s_m$ is not unique, any of the candidates can be selected. In this article, we choose the first one from the left side in the aggregate analysis network.
In [Le Boudec et al. 2004 ]. It will be discussed in Section 6.3.
In the following, we give examples for various contention patterns.
6.2.1. Nested Flows. Four different types of nested contention are exemplified in Figures 8, 9, 10, and 11. Flow $f_3$ is nested in flow $f_2$ in Figures 8, 9, and 10, and it is nested in flow $f_4$ in Figure 11.
- Figure 8(b) shows the first type of nested flows after applying intra-ESC, in which $s_m = \{1, 2, 3\}$, $s_{Prev} = \{1\}$, and $s_{Next} = \{1, 2\}$. In this case, $s_{Prev} \subset s_{Next}$ and, according to step 2 of the contention recognition procedure, we remove flow $f_{\{1,2,3\}-(\{1,2,3\}\cap\{1,2\})} = f_{\{3\}}$ from $\beta_{(\{1,2,3\},r_2)}$ and derive $\beta_{(\{1,2\},r_2)}$, as depicted in Figure 8(c).
- The second type of nested flows in the aggregate analysis model is depicted in Figure 9. According to Figure 9(b), $s_m = \{1, 2, 3\}$, $s_{Prev} = \{1, 2\}$, and $s_{Next} = \{1\}$. In this case, $s_{Next} \subset s_{Prev}$ and flow $f_{\{1,2,3\}-(\{1,2,3\}\cap\{1,2\})} = f_{\{3\}}$ is eliminated from $\beta_{(\{1,2,3\},r_3)}$ according to step 3 of the contention recognition procedure. Figure 9(c) shows the aggregate analysis model after removing $f_3$.
- Figure 10 shows an example of the third type of nested contention. Based on the aggregate analysis model depicted in Figure 10(b) and step 4.a) of the contention recognition procedure, the case is a nested contention and flow $f_{\{1,2,3\}-(\{1,2,3\}\cap\{1,2\})} = f_{\{3\}}$ is removed from $\beta_{(\{1,2,3\},r_3)}$, as shown in Figure 10(c).
- Figure 11 shows a type of nested contention related to step 4.b) of the contention recognition procedure. Based on Figure 11, it is a nested contention, and Figure 11(c) shows that flow $f_{\{1,3,4\} - (\{1,3,4\} \cap \{1,4\})} = f_{\{3\}}$ is eliminated from $\beta_{(\{1,3,4\}, r_3)}$.

Crossed Flows.
Figure 12 shows contention flow $f_2$ crossed with $f_3$. From Figure 12(b), $s_m = \{1, 2, 3\}$, $s_{Prev} = \{1, 2\}$, and $s_{Next} = \{1, 3\}$. Since neither $s_{Prev}$ nor $s_{Next}$ is a subset of the other, and both are subsets of $s_m$, step 4.c) of the contention recognition procedure identifies this case as a crossed contention. There are two cross points, one between $r_2$ and $r_3$ and the other between $r_3$ and $r_4$. We cut $f_3$ at the second cross point, i.e., at the ingress of $r_4$, so that $f_3$ is split into two flows, $f_3'$ and $f_3''$, as shown in Figure 12(c). The problem is then strictly transformed into a combination of nested flows, such that $f_3'$ is nested in flow $f_2$ and $f_3''$ in $f_1$. It is clear that the arrival curve $\alpha_{(f_3', r_3)}$ equals $\alpha_3$ and the arrival curve $\alpha_{(f_3'', r_3)}$ equals $\alpha^*_{(f_3', r_3)}$. To compute $\alpha^*_{(f_3', r_3)}$, we need the ESC of $r_3$ for $f_3'$, $\beta_{(f_3', r_3)}$. Then we calculate the output arrival curve of $f_3'$ as $\alpha^*_{(f_3', r_3)} = \alpha_{(f_3', r_3)} \oslash \beta_{(f_3', r_3)}$ by applying the proposed Theorem 3 in Section 5. Now, nested flows $f_3'$ and $f_3''$ can be removed from the tandem as shown in Figure 12.
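The set comparisons used in the examples above can be sketched in a few lines. This is a minimal illustration, not the article's C++ implementation: the function name `classify_contention` and the label strings are hypothetical, and only the subset-based cases quoted in the text (step 3 and step 4.c) are modeled; steps 4.a) and 4.b) are not.

```python
def classify_contention(s_m, s_prev, s_next):
    """Classify the contention at a server from three flow-index sets.

    s_m    : flows sharing the server under inspection
    s_prev : flows sharing the preceding server
    s_next : flows sharing the following server
    Returns (kind, removed); removed is the set of flow indices to
    eliminate from the server's service curve (empty if not nested).
    """
    s_m, s_prev, s_next = set(s_m), set(s_prev), set(s_next)
    if s_next < s_prev:  # proper subset: nested contention
        return "nested", s_m - (s_m & s_prev)
    if s_prev < s_next:  # symmetric nested case
        return "nested", s_m - (s_m & s_next)
    neither = not (s_prev <= s_next) and not (s_next <= s_prev)
    if neither and s_prev <= s_m and s_next <= s_m:
        return "crossed", set()  # the crossing flow must be cut first
    return "unclassified", set()  # cases this sketch does not model


# Sets taken from the Figure 9 and Figure 12 discussions.
print(classify_contention({1, 2, 3}, {1, 2}, {1}))
print(classify_contention({1, 2, 3}, {1, 2}, {1, 3}))
```

For the Figure 9 sets the sketch reports a nested contention with flow 3 removed, and for the Figure 12 sets it reports a crossed contention, which matches the text.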

End-to-End ESC
Figure 13 shows a high-level analysis flow for deriving the end-to-end ESC. We then present the end-to-end ESC algorithm in more detail, together with an example.
To calculate the end-to-end ESC, we first obtain the intrarouter ESC for the tagged flow in each router. Then we use the theorem of concatenation of network elements [Le Boudec et al. 2004] to model sequentially connected nodes, each offering a service curve $\beta_{(s_i, r_j)}$, $j = 1, 2, \ldots, n$, to the same aggregate flows, as a single server as follows:

$$\beta_{(s_i, r_{1,\ldots,n})} = \beta_{(s_i, r_1)} \otimes \beta_{(s_i, r_2)} \otimes \cdots \otimes \beta_{(s_i, r_n)}.$$
In the next step, we calculate inter-router ESC by applying contention recognition stages and removing contention flows as described in Section 6.2. After that, the concatenation theorem is applied again to find more equivalent servers and reduce the number of service curves. For instance, after removing contention flow f 3 in Figure 8(c) , the service curve of subtandem {r 2 , r 3 } for aggregate flow f {1,2} is computed as β ({1,2},r 2,3 ) = β ({1,2},r 2 ) ⊗ β ({1,2},r 3 ) . If we repeat contention recognition steps, the next contention flow is f 2 nested in f 1 . If we similarly remove it from β ({1,2},r 2,3 ) and calculate convolution β ({1},r 1,2,3 ) = β ({1},r 1 ) ⊗ β ({1},r 2,3 ) , the end-to-end ESC of tagged flow f 1 is obtained.
Algorithm 1 explains the procedure of calculating the end-to-end ESC in more detail.
-Joining node. In Lines 2-8, the algorithm checks whether the source node of a contention flow $f_i$ is one of the nodes along the tagged flow's path. If it is not, we must calculate the input TSPEC of contention flow $f_i$ at the point where it joins the tagged flow's route (point A in Figure 14 when $f_1$ is the tagged flow). We obtain this point with the function JoiningPoint($f_i$) and call it the joining node.

[Algorithm 1: end-to-end ESC. Surviving steps of the listing: line 5: Calculate X = ESC($f_j$, Src($j$), joiningnode); line 7: end if; line 8: end for; line 9: Calculate intrarouter ESC based on Section 6.1.]

We give an example in Figure 15 to show how to derive an aggregate analysis model and obtain the end-to-end ESC by following the proposed algorithm.
Assuming the tagged flow is f 1 , line 1 of the algorithm finds CF t which is { f 2 , f 3 , f 4 } in the example.
-Loop 1 in the algorithm (Lines 2-8): In Lines 3-4, the algorithm obtains the joining node for each contention flow whose source node is not one of the nodes along the tandem. Then, the end-to-end ESC of flow $f_j$ from its source node to the joining node is derived by recursively calling ESC($f_j$, Src($j$), joiningnode) in Line 5. Line 6 computes the output arrival curve, which is the input arrival curve to the joining node, and the input TSPEC is extracted from it. In the example of Figure 15(a), all source nodes of contention flows are on the tagged flow's route, so lines 4-6 are skipped for them.
Line 9 obtains the intrarouter ESC for the tagged flow as described in Section 6.1. Figure 15(b) shows the aggregate analysis model for the example. According to line 10, $\beta_{(\{1,2\}, r_{3,4})} = \beta_{(\{1,2\}, r_3)} \otimes \beta_{(\{1,2\}, r_4)}$. Figure 15(c) depicts the example at this step. According to line 11, $s_m = \{1, 2, 3\}$.
-Loop 2 in the algorithm (Lines 12-32). In Lines 13-29, we consider the different contention scenarios along the route using the algebra of sets. In this step, we remove contention flows one by one according to their effects on the tagged flow, as described in Section 6.2. Lines 13-21 handle nested contentions and lines 22-28 crossed ones.
-Nested contention in the example. From Figure 15 (c), s m = {1, 2, 3}, s Prev = {1}, and s Next = {1, 2}. Since s Prev ⊂ s Next , due to line 13, flow f {1,2,3}−({1,2,3}∩{1,2}) = f 3 is removed from β ({1,2,3},r2) as shown in Figure 15 (d).
Lines 30-31 are the same as lines 10-11, which calculate concatenation of the nodes on the same aggregate flows and then obtain new s m , which result in β ({1,2},r 2,3,4 ) = β ({1,2},r 2 ) ⊗ β ({1,2},r 3,4 ) , and s m = {1, 2, 4} (Figure 15(e) ).
-Crossed contention in the example. If we repeat the contention recognition steps in Loop 2, the next contention in the example is crossed. From Figure 15, the algorithm takes the else part (lines 22-28). As shown in Figure 15(e), contention flow $f_2$ is crossed with $f_4$. There are two cross points, one between $r_{2,3,4}$ and $r_5$ and the other between $r_5$ and $r_6$. Following the algorithm, we cut $f_4$ at the second cross point, i.e., at the ingress of $r_6$, so that $f_4$ is split into two flows, $f_4'$ and $f_4''$, as shown in Figure 15.
LUDB Derivation
To compute the delay bound for a flow passing a series of nodes, one simple way is to calculate the summation of delay bounds at each node. However, this results in a loose total delay bound. To tighten the worst-case delay bound along the network, the end-to-end service curve of the flow is used as stated in corollary Pay Bursts Only Once [Le Boudec et al. 2004] . Hence, we first calculate the end-to-end ESC of the tagged flow based on the proposed algorithm and then obtain LUDB according to Theorem 1. We have implemented algorithms employed in our methodology.
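The gap between the two approaches can be illustrated with a deliberately simplified sketch: a single $(\sigma, \rho)$ flow crossing latency-rate servers with no contention. The function names and the server parameters are hypothetical and are not taken from the case studies; the per-node and end-to-end formulas are the standard network calculus results for a leaky-bucket flow with $\rho \le R$ at every server.

```python
def per_node_sum(sigma, rho, servers):
    """Sum of per-node delay bounds for a (sigma, rho) flow.

    servers is a list of (R, T) latency-rate parameters in path order.
    After each node the burstiness grows to sigma + rho * T.
    """
    total = 0.0
    burst = sigma
    for R, T in servers:
        total += T + burst / R  # delay bound at this node
        burst += rho * T        # burstiness of the output flow
    return total


def end_to_end(sigma, rho, servers):
    """Delay bound from the concatenated (end-to-end) service curve."""
    R = min(R for R, _ in servers)  # bottleneck rate
    T = sum(T for _, T in servers)  # accumulated latency
    return T + sigma / R


# Hypothetical two-node tandem; the flow uses sigma = 8, rho = 0.128.
servers = [(1.0, 1.0), (1.0, 2.0)]
print(per_node_sum(8, 0.128, servers), end_to_end(8, 0.128, servers))
```

In this setting the burst term is paid at every node in the first function but only once in the second, which is why the end-to-end bound is the smaller of the two.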
EXPERIMENTS
Experimental Setup
To evaluate the capability of our method, we applied it to a synthetic traffic pattern and a realistic one. Throughout the experiments, we assume an SoC with 500 MHz frequency in which packets traverse the network using the XY routing algorithm. Each node guarantees the service curve $\beta_{R,T}(t) = \delta_T \otimes \gamma_{0,R}$, where the serving rate $R$ is $C$ flit/cycle and $T$ is the latency.
We have implemented the proposed analytical model in C++ to automate analysis steps.
Synthetic Traffic Pattern
We synthesize a simple traffic pattern, shown in Figure 16, to follow the analytical approach step by step and derive numerical results. The figure depicts a network with four flows and four routers that serve flows in FIFO order. $f_1$ is the tagged flow, and $f_2$ and $f_4$ are contention flows.

Step 1. We first calculate the intrarouter ESC for the tagged flow in each node. Then, we can model a flow passing through a series of routers as a series of concatenated pseudo-affine servers.
It is worth mentioning that TSPEC of each flow f j mentioned before is the TSPEC of the input flow to its source node, for example f 2 ∝ (L 2 , p 2 , σ 2 , ρ 2 ), which means ρ ( f 2 ,r 1 ) = ρ 2 and other characteristics can be obtained as well.
-In router $r_1$. From Equations (6) and (7), the ESC for aggregate flow $f_{\{1,2\}}$ in node 1 is given by $\beta_{(f_{\{1,2\}}, r_1)} = \delta_0 \otimes \gamma_{0,C}$.
-In router r 2 . F B ( f 1 ,r 2 ) = { f 2 } and due to Equations (6) and (7), R ( f 1 ,r 2 ) = C and T l ( f 1 ,r 2 ) = 0. Furthermore, T + D router , because two VCs (one transmits f 2 and the other f 3 ) are sharing the ejection channel of router r 2 . In Equation (14), we should obtain TSPEC of input flow f 2 to r 2 , which is TSPEC of output flow f 2 from r 1 . Since TSPEC is derived from arrival curve, we obtain arrival curve of output flow f 2 from r 1 by applying the proposed Theorem 3 in Section 5. We assumed
1 ) where ρ ( f 2 ,r 1 ) = ρ 2 and σ ( f 2 ,r 1 ) = σ 2 . In this respect, we can say α ( f 2 ,r 2 ) = γ σ 2 +ρ 2 T ( f 2 ,r 1 ) ,ρ 2 . For deriving T ( f 2 ,r 1 ) , we should first obtain ESC for flow f 2 in router r 1 , β ( f 2 ,r 1 ) , as follows.
From Equation (13), β ( f{ 1,2} ,r 1 ) = δ 0 ⊗ γ 0,C . We then remove f 1 from aggregate flow f {1,2} according to Corollary 1 in Section 5, β ( f 2 ,r 1 ) is given by.
In this respect Equation (14) is rewritten as follows:
As mentioned before, r 2 ) . Therefore, the ESC for tagged flow f 1 in router 2 is given by.
-In router r 3 . Since VC of f 1 is sharing the ejection channel of r 3 with VC of f 4 , due to Equations (6) and (7),
Thus, the ESC for tagged flow f 1 in router 3 is given by:
Step 2. Now we can compute the per-flow ESC provided by the tandem of routers that the tagged flow passes. Figure 17 depicts the steps of computing the end-to-end ESC for tagged flow $f_1$. After calculating the intrarouter ESCs as described in Step 1, we have the aggregate analysis model shown in Figure 17(b). Since the effect of flow $f_2$ on tagged flow $f_1$ in router $r_2$ was already accounted for when we calculated $\beta_{(f_1, r_2)}$ in Step 1, $f_2$ is removed from $r_2$ in Figure 17(b). Similarly, $f_3$ and $f_4$ are eliminated from $r_2$ and $r_3$, respectively. We then obtain the end-to-end ESC for tagged flow $f_1$ by following Algorithm 1. Following the algorithm, $\beta_{(\{1\}, r_{2,3})}$ in Figure 17(c) is calculated as $\beta_{(\{1\}, r_2)} \otimes \beta_{(\{1\}, r_3)}$.
We use the theorem of concatenation of network elements [Le Boudec et al. 2004]. Consider two sequentially connected nodes, each offering a latency-rate service curve $\beta_{R_i,T_i}$, $i = 1, 2$. These nodes can be represented as a single latency-rate server as follows:

$$\beta_{R_1,T_1} \otimes \beta_{R_2,T_2} = \beta_{\min(R_1,R_2),\, T_1+T_2}.$$
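As a sanity check on the concatenation rule for latency-rate servers, the sketch below compares the closed form (minimum rate, summed latency) against a brute-force min-plus convolution on a sampling grid. All names and the parameter values are illustrative assumptions, not taken from the article's implementation.

```python
def latency_rate(R, T):
    """Latency-rate service curve: beta(t) = R * max(t - T, 0)."""
    return lambda t: R * max(t - T, 0.0)


def concatenate(servers):
    """Closed form for a tandem of latency-rate servers (R, T)."""
    rates, latencies = zip(*servers)
    return min(rates), sum(latencies)


def min_plus_conv(f, g, t, steps=1000):
    """Brute-force (f (x) g)(t) = min over 0 <= s <= t of f(s) + g(t - s)."""
    return min(f(t * k / steps) + g(t - t * k / steps)
               for k in range(steps + 1))


# Hypothetical servers: (R1, T1) = (1.0, 1.0) and (R2, T2) = (0.5, 2.0).
R, T = concatenate([(1.0, 1.0), (0.5, 2.0)])
combined = latency_rate(R, T)
f, g = latency_rate(1.0, 1.0), latency_rate(0.5, 2.0)
for t in (0.0, 1.0, 3.0, 5.0, 10.0):
    print(t, combined(t), min_plus_conv(f, g, t))
```

The two columns agree up to the grid resolution, which is what the theorem predicts for this pair of servers.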
Therefore, β ({1},r 2,3 ) is given by
In Figure 17(c), $s_m = \{1, 2\}$, $s_{Prev} = \{\}$, and $s_{Next} = \{1\}$. The algorithm then removes flow $f_2$ from aggregate flow $f_{\{1,2\}}$ in router $r_1$. To this end, we apply the proposed Corollary 1 to obtain the ESC $\beta_{(\{1\}, r_1)}$ by subtracting the arrival curve $\alpha_2$ from $\beta_{(\{1,2\}, r_1)}$, as follows:
Figure 17(c) depicts the example after removing arrival curve of flow f 2 from β ({1,2},r 1 ) . Now, end-to-end ESC can be calculated by
Suppose that flows follow TSPEC, f 1 ∝ (1, 1, 8, 0.128), f 2 ∝ (1, 1, 2, 0.032), f 3 ∝ (1, 1, 2, 0.008), and f 4 ∝ (1, 1, 4, 0.128). Therefore, θ j is computed for each flow f j as θ 1 = (σ 1 − L 1 )/( p 1 − ρ 1 ) = (8 − 1)/(1 − 0.128) = 8.027, θ 2 = 1.033, θ 3 = 1.008, and θ 4 = 3.44. Also, we assume serving rate C = 1 flit/cycle, L w = 1 flit, and D router = 1 cycle. We then replace the variables in Equation (22) by numbers as follows:
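The $\theta_j$ values quoted above can be reproduced directly from the TSPEC tuples $(L, p, \sigma, \rho)$. The following is a small check, with the dictionary layout and function name chosen here purely for illustration; the sum of the sustainable rates is also computed, since it is the injected load of this example.

```python
def theta(L, p, sigma, rho):
    """Time at which the peak-rate line L + p*t meets sigma + rho*t."""
    return (sigma - L) / (p - rho)


# TSPEC (L, p, sigma, rho) of the four flows in the synthetic example.
tspec = {
    "f1": (1, 1, 8, 0.128),
    "f2": (1, 1, 2, 0.032),
    "f3": (1, 1, 2, 0.008),
    "f4": (1, 1, 4, 0.128),
}

thetas = {name: theta(*spec) for name, spec in tspec.items()}
total_rate = sum(spec[3] for spec in tspec.values())

for name, value in thetas.items():
    print(name, f"{value:.4f}")
print("sum of sustainable rates:", f"{total_rate:.3f}")
```

The computed values agree with the quoted ones to within the last printed digit; the quoted figures appear to be truncated rather than rounded.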
7.2.2. Computation of LUDB. According to Theorem 1 and Equation (22), the maximum delay for flow $f_1$ is bounded by the horizontal deviation $h(\alpha_1, \beta^{eq}_{f_1})$ between its arrival curve and its end-to-end ESC. In what follows, we evaluate the accuracy of our proposed analytical method with the BookSim simulator [Jiang et al. 2013] and then compare it with methods that do not consider the traffic peak rate behavior [Lenzini et al. 2006].

7.2.3. Computation of Buffer Size Thresholds. As routers are assumed to be input-buffered, we derive the buffer size threshold for each input channel in each router by following Equation (12). In the example of Figure 16, we have assumed one VC per PC. The resulting buffer size thresholds are presented in Table II. Thresholds marked by "-" belong to buffers that are not used by any flow and are thus not relevant for the threshold calculation. The buffer size threshold of a channel depends on the traffic load on that channel, which is affected by the number of flows passing through the channel, their traffic specifications, and the contention between them.
It is worth mentioning that while the calculated delay is an upper bound, the calculated buffer size threshold is a lower bound on the buffer size needed to avoid back-pressure and buffer overflow. Therefore, the buffer sizes in simulations are set equal to or larger than the corresponding calculated thresholds. In more detail, Table III shows the delay bounds derived from both the analytical model and simulation for the tagged flow $f_1$ versus different buffer sizes. As all routers are assumed to have the same buffer space, the buffer size in the table should be equal to or larger than the maximum calculated threshold so that no buffer size threshold is violated. From Table II, the maximum threshold in this example is 11 flits. As can be seen from Table III, when the buffer size is equal to or larger than 11 flits, the worst-case delay observed in simulation is fixed and very close to the analytical result. Otherwise, the simulation results cannot be compared with the analytical results, because back-pressure and buffer overflow may occur, which invalidates the delay bound calculated from the model.
Simulation Result.
We investigate the accuracy of the proposed analytical model using BookSim, a cycle-accurate simulator [Jiang et al. 2013]. The simulation uses the same assumptions as the analytical model. We consider a 2 × 2 mesh on-chip interconnect, as shown in Figure 16, and input-buffered routers with 12 flits in each input channel. It takes 1 clock cycle to pass a flit within a router and 1 clock cycle to transmit a flit over the wires between neighboring routers. We also use the XY routing algorithm to route data packets among cores.
The simulation results show that the worst-case delay for tagged flow $f_1$ in this system is 19 cycles, which is below the LUDB of 20 cycles predicted by our model.
As a further experiment, we change the value of $\sigma_2$ from 2 to 4. The LUDB calculated by our analytical model for tagged flow $f_1$ is then 24 cycles, and the result from the simulation is also 24 cycles, which again does not exceed the analytical LUDB.
7.2.5. Comparison. If we use $(\sigma, \rho)$ instead of TSPEC, each flow $j$ is constrained by the arrival curve $\alpha_j(t) = \sigma_j + \rho_j t = \gamma_{\sigma_j,\rho_j}(t)$. The flows in the example are then represented as $f_1 \propto (8, 0.128)$, $f_2 \propto (2, 0.032)$, $f_3 \propto (2, 0.008)$, and $f_4 \propto (4, 0.128)$. We then follow the stages of computing individual delay bounds for a tagged flow as stated before. For this purpose, we can easily revise our proposed theorems for $(\sigma, \rho)$ flows by substituting $\sigma$ and $\rho$ for $L$ and $p$, respectively, in all formulas. We can also apply the method presented in Lenzini et al. [2006]. Both approaches yield the same value, $h(\alpha_1, \beta^{eq}_{f_1}) = 26$. Thus, our proposed method, which calculates $\bar{D}_{VBR}$, improves the accuracy of the delay bound by 23% over the method with CBR flows.
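The 23% figure can be checked with a one-line calculation. The sketch assumes that the improvement is measured as the reduction of the bound relative to the CBR bound, which is the reading consistent with the quoted numbers (a CBR bound of 26 cycles and the VBR LUDB of 20 cycles reported for this example); the function name is hypothetical.

```python
def improvement_percent(d_cbr, d_vbr):
    """Relative reduction of the delay bound, in percent of the CBR bound.

    Assumption: eta = (D_CBR - D_VBR) / D_CBR * 100.
    """
    return 100.0 * (d_cbr - d_vbr) / d_cbr


eta = improvement_percent(26, 20)
print(f"{eta:.1f}%")
```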
To analyze delay sensitivity, Table IV decreasing by reducing R, while the end-to-end processing delays and delay bounds are increasing as well. Also, the improvement percentage ($\eta$) decreases because of the reduction of $R^{eq}_{f_1}$ and the increase of $T^{eq}_{f_1,CBR}$ and $T^{eq}_{f_1,VBR}$. This follows from the relation between these parameters, which we elaborate in the following. Figure 18 shows $\bar{D}_{CBR}$ and $\bar{D}_{VBR}$ for $R^{eq} = 0.5$, where $p \ge R^{eq}$ and the end-to-end ESCs are of the form $\delta_{T^{eq}} \otimes \gamma_{0,R^{eq}}$. According to network calculus theory [Le Boudec et al. 2004], $\eta$ is calculated as follows:
To analyze the behavior of η, we compute the derivative of function η in terms of R eq as follows:
From Figure 18 , it is obvious that L + pθ ≥ σ and 
Realistic Traffic Pattern
We consider a real-time multimedia application with a random mapping to the tiles of a 4 × 4 mesh on-chip network. Figure 19 shows the task graph and flow mapping of a Video Object Plane Decoder (VOPD) [Bertozzi et al. 2002], in which each block corresponds to an IP and the numbers near the edges represent the bandwidth (in MBytes/sec) of the data transfer for a 30 frames/sec MPEG-4 movie with 1920 × 1088 resolution [Van der Tol et al. 2002]. There are 21 communication flows characterized by TSPEC. We assume that $L_i$ and $p_i$ are the same for all flows and equal to 1 flit and 1 flit/cycle, respectively. $\rho_i$ is determined in flits/cycle from the bandwidth associated with flow $f_i$ in Figure 19, and $\sigma_i$ varies between 8 and 128 flits for different flows. Figure 20 shows the delay bounds derived from the proposed analytical model, $\bar{D}_{f_i,VBR}$, and from the BookSim simulator, $\bar{D}_{f_i,Sim}$, for the whole set of flows. To give better insight into the proposed model, we calculate the relative error of each delay bound with respect to the simulation result. The maximum and average relative errors are about 12.1% and 6.8%, respectively, which supports the accuracy of the proposed model.
As can be observed from Figure 20, a flow may have a larger (like $f_7$) or smaller (like $f_{14}$) worst-case delay bound than the other flows, depending on its traffic specification (TSPEC) and its situation in the network. For example, if the worst-case delay bound of a particular flow is large, (1) the flow is probably more limited by its TSPEC parameters when injecting into the network, (2) the flow may have a longer path from its source to its destination, or (3) the flow may have more contentions (both direct and indirect) with other flows along its path.

(25) to show the effectiveness of our model. Compared to previous models with two parameters, the proposed method improves the accuracy of the delay bounds by up to 46.9% and by more than 37% on average over all flows.
Transpose Traffic Pattern
To investigate a larger network, we experiment with an 8 × 8 mesh network under the transpose traffic pattern, with 56 communication flows characterized by TSPEC. In this traffic pattern, the node with binary address $a_{n-1}, a_{n-2}, \ldots, a_1, a_0$ communicates with the node $\bar{a}_{n/2-1}, \ldots, \bar{a}_0, \bar{a}_{n-1}, \ldots, \bar{a}_{n/2}$. For all traffic flows, we assume the same values for $L_i$ and $p_i$, which are 1 flit and 1 flit/cycle, respectively. For different flows, $\rho_i$ varies between 0.001 and 0.03 flits/cycle, and $\sigma_i$ between 2 and 128 flits. Table VII presents the source and destination of the flows along with the index assigned to them.
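The flow count can be cross-checked with a short enumeration. The sketch below assumes the plain form of the transpose pattern, in which the upper and lower halves of the node address are swapped without complementing any bit; this reading is an assumption made here, chosen because it yields exactly 56 flows on 64 nodes (the 8 diagonal nodes map to themselves and generate no flow). The function name and node numbering are illustrative.

```python
def transpose_dest(src, bits):
    """Swap the upper and lower halves of a bits-wide node address."""
    half = bits // 2
    mask = (1 << half) - 1
    low = src & mask    # lower half of the address
    high = src >> half  # upper half of the address
    return (low << half) | high


BITS = 6  # 64 nodes in an 8 x 8 mesh
flows = [(s, transpose_dest(s, BITS))
         for s in range(1 << BITS)
         if transpose_dest(s, BITS) != s]

print(len(flows))
```

With node id `y * 8 + x`, the mapping sends node (x, y) to node (y, x), so only off-diagonal nodes inject traffic.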
As in the previous case studies, delay bounds from the proposed analytical model, $\bar{D}_{f_i,VBR}$, and from the BookSim simulator, $\bar{D}_{f_i,Sim}$, are derived for all flows and presented in Figure 23. As can be seen from this figure, all delays observed in simulations are below the LUDB but not far from it, suggesting that the analytical bound is fairly tight, since the simulation typically does not exercise the worst case.
To consider the accuracy of the analytical model, the relative errors with respect to simulation results are computed. The calculations show that the maximum and average relative errors are about 33.3% and 13%, respectively.
We also calculate per-flow delay bounds from our proposed method, $\bar{D}_{f_i,VBR}$, and from the CBR analytical model, $\bar{D}_{f_i,CBR}$, as depicted in Figure 24, and compare the results by computing the improvement percentage per flow, $\eta_{f_i}$. As shown in Figure 25, our proposed analytical model is up to 39.3% more accurate than the CBR analytical model, and more than 31% more accurate on average over all flows.
The runtime of the proposed method in C++ is typically in the order of a few seconds. It is about 0.58 sec and 1.02 sec for the VOPD application and transpose traffic pattern, respectively.
Discussion about Other Metrics
Although the article targets an analytical model for latency bound, we briefly consider evaluating other metrics including throughput, communication load, energy consumption, and area requirements.
The network throughput is the sum of the data rates delivered to all ejection channels in a network, and the communication load is estimated by the utilized bandwidth, calculated as the sum of the data rates injected into the network. As the article models a network that is not saturated, the throughput and the communication load have the same value. This value is 0.296 flits/cycle for the synthetic example in Section 7.2 and 0.73 flits/cycle for the VOPD application in Section 7.3.

Network calculus does not directly evaluate energy consumption and area requirements. However, we can present a comparative discussion between VBR and CBR analyses, which is the main contribution of this work. Since we study the classic input-queuing virtual-channel router, nothing is new or changed in the structure and design details of the routers. In terms of area, the difference lies in the calculated backlog, which determines the buffer size thresholds. In network calculus, the upper bound on backlog along the network is computed as the sum of the individual bounds on every element [Le Boudec et al. 2004]. Thus, the total required buffer for flow $i$ is bounded by

$$\bar{B}_i = \sum_{j \in L_{f_i}} \bar{B}_{ij}, \quad (26)$$

where $\bar{B}_{ij}$ is the upper bound on the buffer size for flow $i$ in each channel $j \in L_{f_i}$, and $L_{f_i}$ is the set of channels along the path of flow $i$. $\bar{B}_{ij}$ for VBR traffic flows, $\bar{B}^{VBR}_{ij}$, and for CBR traffic flows, $\bar{B}^{CBR}_{ij}$, are given by Equation (27) and Equation (28), respectively.

$$\bar{B}^{VBR}_{ij} = \sigma_i + \rho_i T_j + \left(\frac{\sigma_i - L_i}{p_i - \rho_i} - T_j\right)^+ \left[(p_i - R_j)^+ - p_i + \rho_i\right] \quad (27)$$

$$\bar{B}^{CBR}_{ij} = \sigma_i + \rho_i T_j \quad (28)$$

In Equations (27) and (28), $R_j$ and $T_j$ denote the rate and latency of the service curve of channel $j$, and $(x)^+ = \max(x, 0)$. In Section 7.2.3, we calculated the required buffer size (buffer size threshold) in each input port of the routers for the synthetic example. The sum of these values is the total required buffer size, $\bar{B}_{VBR}$, which is equal to 42 flits. If we calculate the total required buffer size for CBR analysis, $\bar{B}_{CBR}$, by Equations (26) and (28), it is equal to 51 flits, which is about 21.4% larger than $\bar{B}_{VBR}$. Similarly, $\bar{B}_{VBR}$ is calculated for the VOPD application as a realistic traffic pattern by summing the buffer size bounds derived in Section 7.3. The calculations show that $\bar{B}_{VBR} = 1673$ flits and $\bar{B}_{CBR} = 2827$ flits. Therefore, VBR analysis leads to about a 40.8% reduction of the total required buffers. We have also derived $\bar{B}_{CBR}$ and $\bar{B}_{VBR}$ for the case study presented in Section 7.4, an 8 × 8 mesh network under the transpose traffic pattern. There, $\bar{B}_{CBR} = 18256$ flits and $\bar{B}_{VBR} = 12556$ flits, which shows that VBR analysis reduces the total required buffers by about 31.2%. As a result, under the same network and application, VBR analysis gives a tighter backlog bound than CBR analysis and can thus give more accurate bounds on the buffer requirements. From the design perspective, the tighter backlog bounds lead to area savings in the router buffers.

Regarding power consumption, the network power comprises router power (buffer, switch, control circuit) and link power, which are traffic dependent. Although VBR analysis derives tighter delay bounds, it does not change the packet transfer behavior, because it only derives more accurate analytical delay bounds without any change in design features of the router such as switching, control, and link traversal. Therefore, the router design decision that our analysis affects is, for instance, buffer dimensioning.
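The per-channel backlog bounds and the reported reductions can be sketched as follows. The VBR expression is the one printed as Equation (27); the CBR expression $\sigma + \rho T$ is the standard bound for a $(\sigma, \rho)$ flow on a latency-rate server and is used here as an assumption about Equation (28). Function names and the single-channel parameters ($R = 1$, $T = 1$) are hypothetical; the totals fed to `reduction_percent` are the ones quoted in the text.

```python
def pos(x):
    """Positive part: (x)+ = max(x, 0)."""
    return max(x, 0.0)


def backlog_vbr(L, p, sigma, rho, R, T):
    """Backlog bound of a TSPEC flow on a latency-rate server (Eq. 27)."""
    theta = (sigma - L) / (p - rho)
    return sigma + rho * T + pos(theta - T) * (pos(p - R) - p + rho)


def backlog_cbr(sigma, rho, T):
    """Backlog bound of a (sigma, rho) flow (assumed form of Eq. 28)."""
    return sigma + rho * T


def reduction_percent(b_cbr, b_vbr):
    """Reduction of the total buffer requirement relative to CBR."""
    return 100.0 * (b_cbr - b_vbr) / b_cbr


# Flow f1 = (1, 1, 8, 0.128) on a hypothetical channel with R = 1, T = 1.
print(backlog_vbr(1, 1, 8, 0.128, 1.0, 1.0), backlog_cbr(8, 0.128, 1.0))
# Totals quoted for VOPD and for the transpose pattern.
print(reduction_percent(2827, 1673), reduction_percent(18256, 12556))
```

For the hypothetical channel, the VBR bound collapses to $L + pT$ because the peak rate does not exceed the service rate, while the CBR bound still pays the full burst $\sigma$.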
Assuming the same system model, VBR analysis can indeed derive tighter bounds on buffer requirements than CBR analysis, leading to savings in power consumption. Following a power model for the buffers based on Orion [Shi et al. 2002], we can safely assume that the power consumption of the buffers decreases proportionally to the buffer size.
CONCLUSIONS
In this work, we have derived an analysis procedure to investigate per-flow delay bounds. To this end, we have given theorems to calculate the end-to-end ESC and internal output arrival curves in a FIFO multiplexing network. Based on the proposed analysis technique, we have conducted case studies of worst-case performance analysis, evaluated the accuracy of the proposed model through simulation, and compared it with a method that does not consider the traffic peak behavior. We have developed algorithms to automate the analysis steps. The algorithms run in about a second for the case studies considered and can be applied to larger networks with more flows. In the future, we plan to develop network calculus models for different scheduling policies and compare them. We also plan to extend the proposed analytical method to the case of back-pressure in the network. There are some network calculus-based analytical models [Qian et al. 2009; Zhao et al. 2013] that analyze worst-case delay bounds for CBR flows under back-pressure. It would be interesting to derive possibly tighter delay bounds for VBR flows. For this, we have to extend the analytical models to a given fixed buffer size rather than a to-be-determined bounded buffer size.
ELECTRONIC APPENDIX
The electronic appendix to this article is available in the ACM Digital Library.
