Buffer-aware bounds to multi-point progressive blocking in priority-preemptive NoCs by Soares Indrusiak, Leandro et al.
This is an author produced version of Buffer-aware bounds to multi-point progressive 
blocking in priority-preemptive NoCs.
White Rose Research Online URL for this paper:
http://eprints.whiterose.ac.uk/124747/
Proceedings Paper:
Soares Indrusiak, Leandro, Burns, Alan and Nikolic, Borislav (2018) Buffer-aware bounds 
to multi-point progressive blocking in priority-preemptive NoCs. In: Proceedings of the 
2018 Design, Automation & Test in Europe Conference (DATE). Design, Automation and 
Test in Europe, 19-23 Mar 2018, Dresden. (In Press)
Buffer-aware bounds to multi-point progressive
blocking in priority-preemptive NoCs
Leandro Soares Indrusiak∗, Alan Burns∗, Borislav Nikolić†
∗Real-Time Systems Group, Department of Computer Science, University of York, York, UK
†CISTER Research Centre, ISEP/IPP, Porto, Portugal
{leandro.indrusiak, alan.burns}@york.ac.uk, borni@isep.ipp.pt
Abstract—This paper aims to reduce the pessimism of the
analysis of the multi-point progressive blocking (MPB) problem
in real-time priority-preemptive wormhole networks-on-chip. It
shows that the amount of buffering on each network node can
influence the worst-case interference that packets can suffer
along their routes, and it proposes a novel analytical model that
can quantify such interference as a function of the buffer size.
It shows that, perhaps counter-intuitively, smaller buffers can
result in lower upper-bounds on interference and thus improved
schedulability. Didactic examples and large-scale experiments
provide evidence of the strength of the proposed approach.
I. INTRODUCTION
Networks-on-chip (NoCs) with priority-preemptive arbitra-
tion have been widely studied for their ability to provide hard
real-time guarantees [2], [11], [10] and support for mixed-
criticality traffic [3]. Such guarantees are based on analytical
models that are able to show that, even in the worst-case
scenario, packet latencies will not exceed their deadlines.
Over the years, many analytical models of increasing com-
plexity have attempted to calculate upper-bounds to the latency
of packets injected in such a NoC [11], [7]. Those models
make assumptions about the traffic generated by the real-
time applications running on the NoC (e.g. bounds on packet
inter-arrival interval, jitter, size) as well as the NoC itself
(e.g. deterministic routing). As the state-of-the-art advances,
the assumptions behind each analytical model become more
realistic. The most recent development in this area was the
identification of the multi-point progressive blocking (MPB)
problem by Xiong et al. in [12]. Their observation has shown
that an assumption made by all previous analyses, namely that
each flit of a packet can cause interference on another packet
at most once, was not valid. In Section III we look into that
problem in further detail, showing that most analytical models
produce optimistic latency upper-bounds in MPB scenarios,
except for the analysis reported by Xiong et al. in [13]. We
then show in Section IV that their analysis is unnecessarily
pessimistic, and propose a novel approach that significantly
reduces the pessimism while still producing safe upper-bounds
even in the case of MPB. The paper closes with extensive
experimental work with realistic and synthetically-generated
benchmarks, aiming to show the reduced pessimism of the
proposed approach.

Fig. 1: Wormhole on-chip network with 2D mesh topology (nodes πa to πp,
routers ξ1 to ξ16) and detail of a router with priority-driven virtual
channels (routing and flow control on the input side; routing and
transmission control on the output side, which for each output selects the
highest priority input channel with remaining credit).
II. SYSTEM MODEL
Figure 1 shows some details of the internal structure of a
router in a priority-preemptive NoC. It follows the architectural
templates first presented in [2], where each router includes a
flow controller based on priority-preemptive virtual channels
(VCs). By assigning priorities to packets, and by allowing high
priority packets to preempt the transfer of low priority ones,
network contention scenarios become more predictable and an
upper bound to the packet latency can be found. In each input
port, a different FIFO buffer stores flits of packets arriving
through different virtual channels (one for each priority level).
The router assigns an output port for each incoming packet
according to its destination. A credit-based approach [1]
guarantees that a router only forwards data to the next one when
there is enough buffer space to hold it in the downstream
router. At any time, a flit of a given packet will be sent
through its respective output port if it has the highest priority
among the packets routed to that port, and if it has at least one
credit. If the highest priority packet cannot send data because
it is blocked elsewhere in the network and its buffers are full
(i.e. no credit), the next highest priority packet can access the
output link.
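The arbitration rule described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the router's actual implementation: the `VirtualChannel` class and the `select_vc` function are hypothetical names, and each VC is assumed to be described only by its priority (1 is the highest), its number of pending flits and its remaining credits.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class VirtualChannel:
    """Hypothetical view of one input VC competing for an output port."""
    priority: int       # 1 is the highest priority
    pending_flits: int  # flits waiting to be forwarded to this output
    credits: int        # free buffer slots in the downstream router


def select_vc(vcs: Sequence[VirtualChannel]) -> Optional[VirtualChannel]:
    """Pick the highest-priority VC that has data to send and at least one credit."""
    ready = [vc for vc in vcs if vc.pending_flits > 0 and vc.credits > 0]
    return min(ready, key=lambda vc: vc.priority) if ready else None


# The priority-1 packet is blocked downstream (no credit), so the
# priority-2 packet gets access to the output link.
winner = select_vc([
    VirtualChannel(priority=1, pending_flits=5, credits=0),
    VirtualChannel(priority=2, pending_flits=3, credits=2),
])
```

In this sketch a blocked high-priority packet simply drops out of the candidate list, which mirrors the behaviour that later allows a lower-priority flow to advance while a higher-priority one is stalled.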
Let us model such a network as a set of nodes
$\Pi = \{\pi_a, \pi_b, \ldots, \pi_z\}$, a set of routers
$\Xi = \{\xi_1, \xi_2, \ldots, \xi_m\}$, and a set of unidirectional links
$\Lambda = \{\lambda_{a1}, \lambda_{1a}, \lambda_{12}, \lambda_{21}, \ldots, \lambda_{zm}, \lambda_{mz}\}$.
The function $vc(\xi_i)$ denotes the number of VCs supported by router
$\xi_i$, which in this model also means the number of priority levels it
is able to distinguish. The function $buf(\xi_i)$ denotes the
FIFO buffer size implementing a single VC of that router. A
network router is able to transmit flits over its links at a fixed
rate. The amount of time taken by a router $\xi_i$ to transmit a
flit over any of its links is represented by the link latency
function $linkl(\xi_i)$. The routing of a packet header flit by a
router, i.e. the routing logic to decide which of its output
ports should arbitrate and transmit the flits of the input VC
of that packet, also introduces a latency which is likewise
represented by the routing latency function $routl(\xi_i)$. In
the case of a homogeneous network, i.e. all the routers are
identical, all the functions defined over a specific router (e.g.
$buf(\xi_i)$, $routl(\xi_i)$) are also defined over the complete set
(i.e. $buf(\Xi)$, $routl(\Xi)$) with the same meaning.
The route between any two nodes of the network is given
by the function $route(\pi_a, \pi_b) = \{\lambda_a, \ldots, \lambda_b\}$, denoting the
totally ordered subset of $\Lambda$ used to transfer packets from node
$\pi_a$ to node $\pi_b$ (including the links connecting a node to its
respective router). The number of links of a route is given by
$|route_i|$. We then define the function $order(\lambda_a, route_i)$ to
denote the order of a link $\lambda_a$ over a route $route_i$ (i.e. 1 for first,
2 for second, etc.), and the respective convenience functions
$first(route_i)$ and $last(route_i)$ to single out the first and last
links of $route_i$.
To model the traffic load injected to the network, we define
a set $\Gamma$ of $n$ real-time traffic-flows (or just flows for short)
$\Gamma = \{\tau_1, \tau_2, \ldots, \tau_n\}$. Each flow $\tau_i$ gives rise to a potentially
unbounded sequence of packets. A flow has a set of properties
and timing requirements which are characterised by a set of
attributes: $\tau_i = (P_i, C_i, T_i, D_i, J_i, \pi^s_i, \pi^d_i)$. All the flows
which require timely delivery are either periodic or sporadic.
The lower-bound interval on the time between releases of
successive packets is called the period ($T_i$) for the flow. Each
real-time flow also has a relative deadline ($D_i$) which is the
upper-bound restriction on network latency, assumed to be
$D_i \leq T_i$ (so that the possibility of interference between
packets of the same flow can be dismissed). Any flow can
suffer a release jitter $J_i$, which denotes the maximum deviation
of successive packet releases from the flow’s period. That is,
a packet from $\tau_i$ will be released for transmission at most
$J_i$ time units after its periodic tick, e.g. due to the time it
takes for its source node to generate it. Each flow also has
a priority $P_i$; the value 1 denotes the highest priority and
larger integers denote lower priorities. It also has source and
destination nodes on the network ($\pi^s_i$ and $\pi^d_i$). Considering
the routes of any two packet flows $\tau_i$ and $\tau_j$, we define a
contention domain $cd_{ij}$ as the ordered set of links shared
by those flows: $cd_{ij} = route_i \cap route_j$. We assume that a
contention domain will never be a disjoint set of links (i.e. the
shared links are always contiguous), which
is the case in all NoCs with dimension-order routing (e.g. XY).
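To make the notions of route, link order and contention domain concrete, the following Python sketch builds XY routes on a mesh and intersects them. It is only an illustration: the coordinate scheme, the string names given to nodes and routers, and the function names `xy_route`, `order` and `contention_domain` are assumptions made for this example and do not come from the model above.

```python
from typing import List, Tuple

Coord = Tuple[int, int]
Link = Tuple[str, str]  # (source element, destination element)


def xy_route(src: Coord, dst: Coord) -> List[Link]:
    """Ordered list of links from the node at src to the node at dst (X first, then Y)."""
    def router(c: Coord) -> str:
        return f"r{c[0]},{c[1]}"

    def node(c: Coord) -> str:
        return f"n{c[0]},{c[1]}"

    links = [(node(src), router(src))]        # node-to-router link
    x, y = src
    while x != dst[0]:                        # travel along X
        nx = x + (1 if dst[0] > x else -1)
        links.append((router((x, y)), router((nx, y))))
        x = nx
    while y != dst[1]:                        # then along Y
        ny = y + (1 if dst[1] > y else -1)
        links.append((router((x, y)), router((x, ny))))
        y = ny
    links.append((router(dst), node(dst)))    # router-to-node link
    return links


def order(link: Link, route: List[Link]) -> int:
    """1-based position of a link along a route."""
    return route.index(link) + 1


def contention_domain(route_i: List[Link], route_j: List[Link]) -> List[Link]:
    """Links shared by both routes, kept in the order of route_i."""
    shared = set(route_j)
    return [link for link in route_i if link in shared]


route_i = xy_route((0, 0), (3, 0))
route_j = xy_route((1, 0), (3, 1))
cd_ij = contention_domain(route_i, route_j)
```

Here both routes have five links (including the two node links) and share the two contiguous links between routers (1,0), (2,0) and (3,0), which is consistent with the assumption that contention domains under XY routing are never disjoint.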
The maximum zero-load network latency ($C_i$) is the maximum
latency experienced by a packet of that flow, from the
release of its first flit to the reception of its last, when no flow
contention exists over the network. This value is a function
of the maximum number of flits $L_i$ of the packet, and the
length of its route. For convenience, we extend the notation
of the function $route$ to also represent the route of a packet
from its source node to its destination:
$route(\tau_i) = route_i = route(\pi^s_i, \pi^d_i)$. We can then formulate $C_i$ as follows:

$$C_i = routl(\Xi) \cdot (|route_i| - 1) + linkl(\Xi) \cdot |route_i| + linkl(\Xi) \cdot (L_i - 1) \tag{1}$$
Equation 1 shows that Ci is equal to the zero-load latency
of the header flit plus one additional link latency cycle per
payload flit (since they follow the header in a pipeline fashion).
The zero-load latency of the header is the time routl(Ξ) it
takes to be routed at each hop (in a route with |routei| −
1 routers, since the hop count includes the links connecting
nodes to their respective routers) plus the time linkl(Ξ) it
takes to cross each of the |routei| links along its way. Once
the header reaches the destination, the payload of the packet
takes one additional link latency time linkl(Ξ) for each of its
Li − 1 flits. Other formulations of Ci are also possible [12],
but that does not impact the approach presented here.
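Equation 1 can be checked numerically with a short Python sketch. The function name `zero_load_latency` is hypothetical, and the default values `routl = 0` and `linkl = 1` are assumptions: they are not stated in the text, but they are the values under which Equation 1 reproduces the C column of Table I from its (L, |route|) column.

```python
def zero_load_latency(route_len: int, flits: int, routl: int = 0, linkl: int = 1) -> int:
    """Equation 1: header latency over the route plus one link latency per payload flit."""
    header = routl * (route_len - 1) + linkl * route_len
    payload = linkl * (flits - 1)
    return header + payload


# (L, |route|) pairs taken from Table I; routl and linkl are assumed values.
c1 = zero_load_latency(route_len=3, flits=60)
c2 = zero_load_latency(route_len=7, flits=198)
c3 = zero_load_latency(route_len=5, flits=128)
```

With these assumed latencies the three results are 62, 204 and 132, matching Table I. A non-zero routing latency would add `routl` once per router crossed by the header.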
The goal of all the approaches reviewed in Section III, and
of the one we propose, is to use (part of) the model presented
above to calculate the worst-case latency Ri for each flow τi ∈
Γ. Ri is the highest latency experienced by a packet produced
by flow τi, and takes into account the packet’s own zero-load
latency plus the worst possible delays resulting from blocking
and preemptions from higher priority packets. A system is then
said to be schedulable if Ri ≤ Di for every τi ∈ Γ.
III. RELATED WORK
In [11], Shi and Burns proposed an analytical model (referred
to as SB) that calculates the upper-bound interference
suffered by a given traffic flow $\tau_i$ considering both direct
and indirect interferences from other flows. Following Kim
et al. [9], they define a direct interference set $S^D_i$ of $\tau_i$ as the
set of flows that have higher priority than $\tau_i$ (i.e. a smaller
priority value, following Section II) and that share
with it at least one network link (i.e. a non-empty contention
domain): $S^D_i = \{\tau_j \in \Gamma \mid P_j < P_i, cd_{ij} \neq \emptyset\}$. Similarly, the
indirect interference set $S^I_i$ of $\tau_i$ is the set of flows that are not
in $S^D_i$, but that interfere with at least one flow in that set (i.e.
interfere with the flows that interfere with $\tau_i$, but not directly
with $\tau_i$ itself):
$S^I_i = \{\tau_k \in \Gamma \mid \tau_k \in S^D_j, \tau_j \in S^D_i, \tau_k \notin S^D_i\}$.
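The two interference sets can be computed directly from flow priorities and routes. The Python sketch below is a minimal illustration using the convention of Section II that 1 is the highest priority; the function names, the flow labels `t1` to `t3` and the link labels `L0` to `L6` are all hypothetical and chosen only for this example.

```python
from typing import Dict, List, Set


def direct_set(i: str, prio: Dict[str, int], routes: Dict[str, List[str]]) -> Set[str]:
    """Direct interference set of flow i: higher-priority flows sharing a link with it."""
    return {
        j for j in prio
        if prio[j] < prio[i] and set(routes[i]) & set(routes[j])
    }


def indirect_set(i: str, prio: Dict[str, int], routes: Dict[str, List[str]]) -> Set[str]:
    """Indirect interference set of flow i: flows outside its direct set
    that directly interfere with some flow in that set."""
    sd_i = direct_set(i, prio, routes)
    result: Set[str] = set()
    for j in sd_i:
        result |= direct_set(j, prio, routes) - sd_i
    return result


# Hypothetical three-flow scenario: t1 shares links with t2 only,
# and t2 shares links with t3.
prio = {"t1": 1, "t2": 2, "t3": 3}
routes = {
    "t1": ["L4", "L5", "L6"],
    "t2": ["L1", "L2", "L3", "L4", "L5"],
    "t3": ["L0", "L1", "L2", "L3"],
}
sd_t3 = direct_set("t3", prio, routes)
si_t3 = indirect_set("t3", prio, routes)
```

In this scenario `t2` is the only direct interferer of `t3`, while `t1` only reaches `t3` indirectly through `t2`.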
In the case of direct interference, they assume that a packet
of $\tau_i$ may suffer interference from all packets of every flow
$\tau_j \in S^D_i$. The amount of interference on each “hit” of a $\tau_j$
packet on $\tau_i$ is upper-bounded by $C_j$, and the number of “hits”
is bounded by the number of packets of $\tau_j$ appearing during
the lifetime of $\tau_i$ (which can be found by the ceiling of the
ratio between $R_i$ and $T_j$). Indirect interference is handled as
the increased interference a packet from $\tau_i$ can suffer from
two subsequent packets of a flow $\tau_j \in S^D_i$. This can happen
if $\tau_j$ itself suffers interference from a flow $\tau_k \in S^I_i$, delaying
the first of its packets to the point that it interferes on $\tau_i$
right before the second one causes interference (the so-called
“back-to-back hit”).
Kashif and Patel proposed SLA [7], [8], aiming to reduce
the pessimism in SB, i.e. reduce the difference between the
upper-bounds provided by the model and the actual worst-
case behaviour of the NoC. They did that by calculating
interference on a link-by-link basis, and claimed that their
approach will always be tighter and upper-bounded by SB.
Experimental results show that their bounds are the same as
SB with minimal buffer sizes, and get increasingly tighter in
cases with larger buffer storage per VC.
Xiong et al. [12] have found a significant shortcoming in
both SB and SLA. They have identified using simulations that
downstream indirect interference can sometimes cause a single
packet of τj to directly interfere on τi by more than its basic
latency Cj , disproving one of the assumptions made by those
models. Specifically, they stated that a flit of a packet of τj
may interfere multiple times on a packet of τi over multiple
shared links, in case τj (1) suffers interference from a packet of
τk that does not interfere with τi and (2) shares links with τk
downstream from the links it shares with τi. This is referred to
as multi-point progressive blocking (MPB), and both SB and
SLA produce unsafe latency bounds under such scenarios.
To account for the MPB problem, Xiong et al. proposed
a slightly different partitioning of indirect interference sets.
They define the upstream indirect interference set $S^{up_j}_{I_i}$ as the
set of flows $\tau_k \in S^I_i$ that interfere with the flows $\tau_j \in S^D_i$
before $\tau_j$ interferes with $\tau_i$. Similarly, the downstream indirect
interference set $S^{down_j}_{I_i}$ is the set of flows $\tau_k \in S^I_i$ that
interfere with the flows $\tau_j \in S^D_i$ after $\tau_j$ interferes with $\tau_i$.
The notion of “before” and “after” used here refers to whether
the contention domain between $\tau_k$ and $\tau_j$ (i.e. the links they
share) appears upstream or downstream in $\tau_j$, in comparison
with the contention domain between $\tau_i$ and $\tau_j$. For clarity, we
review Xiong et al.’s definition of those two sets using the
notation introduced in Section II:

$$S^{up_j}_{I_i} = \{\tau_k \in S^I_i \cap S^D_j \mid order(last(cd_{jk}), route_j) < order(first(cd_{ij}), route_j)\}$$

$$S^{down_j}_{I_i} = \{\tau_k \in S^I_i \cap S^D_j \mid order(first(cd_{jk}), route_j) > order(last(cd_{ij}), route_j)\}$$
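The upstream and downstream conditions above compare link positions along the route of τj. A minimal Python sketch of that comparison follows; the names `contention_domain` and `classify` are hypothetical, routes are assumed to be ordered lists of link labels, and the caller is assumed to have already checked that τk is an indirect interferer of τi and a direct interferer of τj.

```python
from typing import List


def contention_domain(route_a: List[str], route_b: List[str]) -> List[str]:
    """Links shared by both routes, kept in the order of route_a."""
    shared = set(route_b)
    return [link for link in route_a if link in shared]


def classify(route_i: List[str], route_j: List[str], route_k: List[str]) -> str:
    """Where the links shared by j and k sit relative to those shared by i and j,
    measured along route_j: 'upstream', 'downstream' or 'neither'."""
    cd_ij = contention_domain(route_j, route_i)   # ordered along route_j
    cd_jk = contention_domain(route_j, route_k)
    if not cd_ij or not cd_jk:
        return "neither"
    pos = {link: n for n, link in enumerate(route_j, start=1)}  # order(link, route_j)
    if pos[cd_jk[-1]] < pos[cd_ij[0]]:
        return "upstream"
    if pos[cd_jk[0]] > pos[cd_ij[-1]]:
        return "downstream"
    return "neither"


# Hypothetical routes: k meets j after the links that j shares with i.
route_j = ["L1", "L2", "L3", "L4", "L5"]
downstream_case = classify(["L0", "L1", "L2", "L3"], route_j, ["L4", "L5", "L6"])
upstream_case = classify(["L3", "L4"], route_j, ["L1"])
```

The first call corresponds to the MPB-prone situation, where the interference on τj happens downstream of its contention domain with τi.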
Based on those two sets, Xiong et al. defined two worst-case
interference terms $I^{up}_{ji}$ and $I^{down}_{ji}$ to denote the worst-case
interference $I_{kj}$ suffered by $\tau_j$ from flows $\tau_k$ that interfere with
it, respectively, upstream or downstream from its contention
domain with $\tau_i$:

$$I^{up}_{ji} = \sum_{\tau_k \in S^{up_j}_{I_i}} I_{kj} \tag{2}$$

$$I^{down}_{ji} = \sum_{\tau_k \in S^{down_j}_{I_i}} I_{kj} \tag{3}$$
Their formulation for the worst-case response time $R_i$
bounds upstream indirect interference by $I^{up}_{ji}$ and models
downstream indirect interference suffered by every $\tau_j$ as
direct interference over $\tau_i$ (i.e. by adding $I^{down}_{ji}$ to $C_j$):

$$R_i = C_i + \sum_{\tau_j \in S^D_i} \left\lceil \frac{R_i + J_j + I^{up}_{ji}}{T_j} \right\rceil (C_j + I^{down}_{ji}) \tag{4}$$
Indrusiak et al. [6] show with a counter-example that such a
formulation is unsafe, as the use of $I^{up}_{ji}$ as an interference jitter
term in Equation 4 is unable to properly capture all possible
upstream indirect interference effects, and thus can produce
optimistic results. They also propose a fix to the analysis by
using $J^I_j = R_j - C_j$ instead of $I^{up}_{ji}$, as was the case in
the SB model. A corrected version of the analysis, using the
fix proposed in [6], has appeared in [13], which we refer to as
XLWX and consider to be the current state-of-the-art:

$$R_i = C_i + \sum_{\tau_j \in S^D_i} \left\lceil \frac{R_i + J_j + J^I_j}{T_j} \right\rceil (C_j + I^{down}_{ji}) \tag{5}$$
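Because $R_i$ appears on both sides of Equation 5, it is usually solved by fixed-point iteration starting from $C_i$. The Python sketch below shows one way to do that; it is an illustration rather than the authors' tool. The function name `response_time` and the dictionary keys are hypothetical, and the example values `JI = 124` and `Idown = 124` are assumptions consistent with Tables I and II (with $R_2 = 328$) rather than values derived in this block.

```python
from math import ceil
from typing import Dict, Optional


def response_time(c_i: int, deadline: int, interferers: Dict[str, dict]) -> Optional[int]:
    """Smallest fixed point of Equation 5, or None if it exceeds the deadline.

    Each entry of `interferers` describes one directly interfering flow with keys
    C, T, J, JI (interference jitter R_j - C_j) and Idown (its downstream term).
    """
    r = c_i
    while r <= deadline:
        nxt = c_i + sum(
            ceil((r + f["J"] + f["JI"]) / f["T"]) * (f["C"] + f["Idown"])
            for f in interferers.values()
        )
        if nxt == r:
            return r
        r = nxt
    return None


# Flow parameters from Table I; JI and Idown are assumed as explained above.
r2 = response_time(204, 4000, {"t1": {"C": 62, "T": 200, "J": 0, "JI": 0, "Idown": 0}})
r3 = response_time(132, 6000, {"t2": {"C": 204, "T": 4000, "J": 0, "JI": 124, "Idown": 124}})
```

The iteration is monotonically non-decreasing, so it either converges or crosses the deadline, in which case the flow is deemed unschedulable by this analysis.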
IV. PROPOSED ANALYSIS
The key motivation for the approach presented in this paper
is the treatment of the MPB problem in the XLWX analysis.
While Xiong et al. have clearly identified a type of interference
that has not been considered in the previous approaches, we
argue that their analysis approach does not properly address
the indirect interference effects that happen in wormhole
networks. Their handling of downstream indirect interference
as if it were direct interference is unnecessarily pessimistic,
so we aim to provide a tighter analysis by considering more
carefully the impact of MPB.
Let us carefully revisit that problem, caused by the downstream
indirect interference identified in [12]: a single packet
of $\tau_j$ can directly interfere on $\tau_i$ by more than its basic latency
$C_j$ when it suffers interference from a packet of $\tau_k$ that does
not interfere with $\tau_i$, and shares links with $\tau_k$ downstream from
the links it shares with $\tau_i$. In this situation, every time $\tau_j$ is
blocked by $\tau_k$, it can allow $\tau_i$ to flow through the network and
potentially overtake $\tau_j$ flits that had already blocked it earlier.
XLWX analysis correctly takes into account that the amount
of additional interference that $\tau_i$ can suffer from $\tau_j$ is
upper-bounded by the amount of time that $\tau_i$ is allowed to overtake
$\tau_j$ (and subject itself to additional interference), which is in
turn upper-bounded by the downstream indirect interference
that $\tau_j$ can suffer from any $\tau_k$ (which is expressed by $I^{down}_{ji}$,
as shown in Equation 3).
Such scenario can be better understood through a simple
example with only three flows τi, τj and τk, as shown in Figure
2, aiming to clearly depict the nature of the MPB problem.
Assume that τi and τj have much larger periods and longer
packets (therefore larger C) than τk, and that τk’s releases are
not in phase with the other two. The priority order has τi with
the lowest and τk with the highest priority. In Figure 2(a), τi
and τj are released at the same time from node a, and the
higher priority τj gains access to the network, blocking τi.
In Figure 2(b), a packet of τk is then released and interferes
with τj (downstream from its contention domain cdij with τi).
Since τk has the highest priority, it stops τj’s flits from using
the link between routers 3 and 4, which generates backpressure
on all subsequent flits of that packet of τj, forcing them to stay
buffered along the route (depicted as stacked dots) all the way
to the source in node a. Once τj flits stop using the links
on τi’s route, τi then becomes the highest priority flow with
buffer credits so the routers start transmitting its flits.

Fig. 2: Downstream indirect interference. Panels (a) and (b) show nodes a
to d attached to routers 1 to 4, with flows τi and τj in (a) and the
additional flow τk in (b).
When τk finishes, the scenario returns to the situation
depicted in Figure 2(a), where only τj flows through the
network. However, before new flits of τj can flow out of
node a, its buffered flits must first make way and release the
backpressure along the route. This is key to the MPB problem:
it is those buffered flits of τj , which have already caused
interference on τi when they were first released out of node a,
that will again cause interference and as a consequence will
delay τi by more than τj’s zero-load latency Cj . We refer to
this effect as buffered interference, which in turn causes MPB.
Using examples like this one, Xiong et al. show in [12]
and [13] that SB and SLA analyses do not capture the MPB
problem caused by downstream indirect interference, and thus
produce optimistic results, while XLWX analysis provides an
upper bound in all cases.
By understanding the notion of buffered interference, one
can clearly see the intuition behind XLWX analysis, and why
its upper bound does not suffer from the same issues as SB
and SLA: the interference beyond Cj imposed by τj on τi will
never be larger than the amount of downstream interference
that τj suffers from τk, since that is the maximum amount
of interference from τj that could be buffered along its way.
Thus, by adding the maximum downstream interference $I^{down}_{ji}$
to $C_j$, Xiong et al. effectively provide a safe upper-bound to
the multiple times $\tau_j$ can interfere with $\tau_i$.
We claim, however, that such upper bound is unnecessarily
pessimistic, given that the amount of buffered interference will
also be upper-bounded by the maximum amount of buffer
space along the route of τj . Furthermore, we claim that the
amount of buffered interference of a single packet of τj that
can interfere multiple times with τi is proportional to the
length of their contention domain cdij . The intuition behind
our claims is based on the following observations regarding the
behaviour of a τj packet which is blocked due to a downstream
interference “hit” by τk:
- Flits of τj stored in buffers of routers that are downstream
to the contention domain cdij will not cause any further
interference on τi, so they will not contribute to MPB.
- Flits of τj stored in buffers within the contention domain
cdij are the only ones that will contribute to MPB.
- If τj does not suffer upstream interference, its flits arrive into
the contention domain cdij in a perfect pipelined transmission
(i.e. no gaps between flits), and if the packet is long enough
it will also be buffered over routers upstream from cdij .
When the downstream interference by τk is over, τj starts
flowing again. If there are flits stored upstream, the amount
of supplied flits into the contention domain is equal to the
amount of departed flits, so no buffering occurs, or if there
was buffering, the amount of buffered flits stays constant. When
τj is preempted once more by another τk “hit” downstream,
the build-up of its flits in the contention domain reoccurs.
Each downstream hit by an indirectly interfering flow τk can
cause at most one full contention domain worth of buffered
interference.
Based on that, we can define a formulation for the maximum
buffered interference over the contention domain $cd_{ij}$:

$$bi_{ij} = buf(\Xi) \cdot linkl(\Xi) \cdot |cd_{ij}| \tag{6}$$

We then use that value to propose a new upper-bound for the
downstream indirect interference:

$$I^{down}_{ji} = \sum_{\tau_k \in S^{down_j}_{I_i}} \left\lceil \frac{R_j + J_k}{T_k} \right\rceil bi_{ij} \tag{7}$$
The ceiling function in Equation 7 determines the number of
hits suffered by τj from every τk in the downstream indirect
interference set of τi, which is multiplied by the buffered
interference of each hit calculated by Equation 6, i.e. the time
it takes for the flits of τj buffered along cdij to flow and
potentially hit τi again. That time is given by the product of
the amount of buffer space per router buf(Ξ) on the virtual
channel of τj , the time it takes for each one of the buffered flits
to cross a network link, given by linkl(Ξ), and the number of
links in the contention domain of τj and τi given by |cdij |.
While the proposed upper bound in Equation 7 is often
tighter than the one presented by Xiong et al., that is not
always the case. In the cases that the downstream interference
on $\tau_j$ is not large enough to generate backpressure to fill up all
the buffers along the contention domain $cd_{ij}$, it is likely that
the maximum buffered interference $bi_{ij}$ could be larger than
the maximum downstream interference $C_k + I^{down}_{kj}$, making
the XLWX analysis tighter. Therefore, we rewrite Equation
7 to use, for every downstream interference hit, the smallest
value between $bi_{ij}$ and $C_k + I^{down}_{kj}$:

$$I^{down}_{ji} = \sum_{\tau_k \in S^{down_j}_{I_i}} \left\lceil \frac{R_j + J_k}{T_k} \right\rceil \min(bi_{ij}, C_k + I^{down}_{kj}) \tag{8}$$
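Equations 6 and 8 translate into a few lines of Python. The sketch below is illustrative only: the function names and dictionary keys are hypothetical, and the example uses $linkl = 1$ and $|cd_{ij}| = 3$, which are assumptions (the contention domain length is inferred from the results in Table II, not stated in the text). The values $R_j = 328$, $C_k = 62$ and $T_k = 200$ come from Tables I and II.

```python
from math import ceil
from typing import List


def buffered_interference(buf: int, linkl: int, cd_len: int) -> int:
    """Equation 6: time for the flits buffered along the contention domain to drain."""
    return buf * linkl * cd_len


def idown_ibn(r_j: int, bi_ij: int, downstream: List[dict]) -> int:
    """Equation 8: each downstream hit costs at most min(bi_ij, C_k + Idown_kj)."""
    return sum(
        ceil((r_j + k["J"]) / k["T"]) * min(bi_ij, k["C"] + k["Idown"])
        for k in downstream
    )


tau_k = {"C": 62, "T": 200, "J": 0, "Idown": 0}  # single downstream interferer
results = {
    buf: idown_ibn(328, buffered_interference(buf, linkl=1, cd_len=3), [tau_k])
    for buf in (2, 10, 100)
}
```

With two hits of τk during $R_j$, the downstream term grows from 12 (2-flit buffers) to 60 (10-flit buffers). With 100-flit buffers the `min` selects $C_k$, and the term is capped at 124.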
The upper bound in Equation 8 can be optimistic in cases
when τj suffers from both upstream and downstream indirect
interference. In such cases, its packets can be “chopped-up” by
interfering flows and thus arrive in waves into the contention
domain cdij . If that happens, the amount of supplied flits into
the contention domain will not be equal to the amount of
departed flits, causing variations to the buffered interference.
Therefore, the proposed analysis (which we refer to as IBN)
is applied as follows:
• Equation 8 calculates $I^{down}_{ji}$ when computing downstream
indirect interference caused by flows that do not
themselves suffer from both upstream and downstream
interference. This can make the proposed analysis tighter,
but never less tight than XLWX.
• Equation 3 calculates $I^{down}_{ji}$ when computing downstream
indirect interference caused by flows that do suffer
from both upstream and downstream interference. In such cases,
the proposed analysis is exactly the same as XLWX.
• The appropriate values of $I^{down}_{ji}$ are fed into Equation 5
to calculate the worst case response time of each flow.
Fig. 3: Flow routes (routes of τ1, τ2 and τ3 over nodes a to f and routers 1 to 6)
V. DIDACTIC EXAMPLE
Let us consider a small didactic example to compare the
proposed analysis against XLWX and SB analyses. We assume
three flows τ1, τ2 and τ3 with sources, destinations and routes
shown in Figure 3, and with the flow parameters shown in
Table I, chosen to highlight the effects of the downstream
indirect interference of τ1 over τ3 through τ2.
We applied SB, XLWX and IBN analyses to this example,
which produced latency upper-bounds R for each flow. To
provide evidence that the proposed analysis can capture the
influence of the buffer and contention domain sizes on the
downstream indirect interference, we tabulate the results of
the proposed analysis considering different buffer sizes (2 and
10-flit buffers per VC), which are identified by the subscript
b = buf(Ξ). We also produced cycle-accurate simulation re-
sults for the same scenarios, and tabulated the worst observed
latency for each flow (using the same subscripts to identify
the buffer sizes used in each simulation scenario).
The results in Table II show, as expected, that both the
proposed analysis and XLWX provide upper-bounds to the
values found using simulation while SB provides optimistic
bounds. They also show that the proposed analysis has much
tighter results than XLWX for τ3 (348 vs 460 for 2-flit buffer
networks, or 396 vs 460 for 10-flit buffer networks). This
happens because in this example the amount of buffered inter-
ference limits the amount of additional interference caused by
MPB, showing the real extent of the pessimism introduced by
XLWX in its accounting of that problem. The results for the
proposed analysis using different buffer sizes show that the
common practice of using small buffers in wormhole NoCs is
also advantageous in terms of time predictability, since smaller
buffers allow the proposed analysis to have tighter bounds
because of the limited amount of buffered interference that
can build up in the network.
TABLE I: Flow parameters
flow   C     (L, |route|)   T      D      J   P
τ1     62    (60, 3)        200    200    0   1
τ2     204   (198, 7)       4000   4000   0   2
τ3     132   (128, 5)       6000   6000   0   3

TABLE II: Analysis and simulation results
flow   R^SB   R^XLWX   R^IBN (b=10)   R^IBN (b=2)   R^sim (b=10)   R^sim (b=2)
τ1     62     62       62             62            62             62
τ2     328    328      328            328           324            324
τ3     336    460      396            348           352            336
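The analytical columns of Table II can be reproduced with a short Python sketch of the fixed-point iteration. This is a reconstruction under stated assumptions, not the authors' tool: the link latency of 1, the contention domain length of 3 links between τ3 and τ2, and the use of "number of hits times $C_1$" as the downstream interference of τ1 on τ2 are all inferred from Tables I and II, and the helper name `solve` is hypothetical.

```python
from math import ceil


def solve(c_i, hits, limit=10**6):
    """Fixed point of R = C_i + sum(ceil((R + J + JI) / T) * cost) over `hits`.

    Each element of `hits` is a tuple (T, J, JI, cost) for one direct interferer.
    """
    r = c_i
    while r <= limit:
        nxt = c_i + sum(ceil((r + j + ji) / t) * cost for (t, j, ji, cost) in hits)
        if nxt == r:
            return r
        r = nxt
    raise RuntimeError("no fixed point below limit")


C1, T1 = 62, 200
C2, T2 = 204, 4000
C3 = 132
CD_32 = 3   # assumed number of links shared by tau3 and tau2
LINKL = 1   # assumed link latency

r1 = solve(C1, [])
r2 = solve(C2, [(T1, 0, r1 - C1, C1)])
hits_on_2 = ceil(r2 / T1)   # hits of tau1 on tau2 (J1 = 0)
ji2 = r2 - C2               # interference jitter of tau2

r3_sb = solve(C3, [(T2, 0, ji2, C2)])
r3_xlwx = solve(C3, [(T2, 0, ji2, C2 + hits_on_2 * C1)])
r3_ibn = {
    b: solve(C3, [(T2, 0, ji2, C2 + hits_on_2 * min(b * LINKL * CD_32, C1))])
    for b in (2, 10)
}
```

Under these assumptions the sketch yields 336 (SB), 460 (XLWX), 396 (IBN with 10-flit buffers) and 348 (IBN with 2-flit buffers) for τ3, matching the analytical columns of Table II.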
VI. LARGE-SCALE QUANTITATIVE EVALUATION
We now provide additional evidence on the tightness of
the proposed analysis. First, we performed a large-scale com-
parison using synthetically-generated flow sets of increasing
load. We used two configurations of a priority-preemptive
wormhole network-on-chip platform: a 16-core (4x4) and a
64-core (8x8). We used flow sets of increasing workload
by varying the number of flows in each set. The flows in
each set have the following characteristics: periods
uniformly distributed between 0.5 ms and 0.5 s, maximum
packet lengths uniformly distributed between 128 and 4096
flits, and deadlines equal to the respective periods. Sources
and destinations of packet flows are randomly selected, so the
average route is longer in the larger platform. Rate-monotonic
priority assignment is used despite sub-optimality, given that
no optimal assignment is known for this problem.
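A flow set with the characteristics listed above could be generated along the following lines. This Python sketch is only an approximation of the experimental setup: the function name `generate_flow_set`, the dictionary layout, the seeding, the choice of distinct source and destination nodes and the tie-breaking between equal periods are all assumptions that the text does not specify.

```python
import random
from typing import List


def generate_flow_set(n_flows: int, mesh: int, seed: int = 0) -> List[dict]:
    """Random flow set on a mesh x mesh NoC with rate-monotonic priorities."""
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(mesh) for y in range(mesh)]
    flows = []
    for _ in range(n_flows):
        src, dst = rng.sample(nodes, 2)        # assumed: distinct source and destination
        period = rng.uniform(0.5e-3, 0.5)      # seconds, between 0.5 ms and 0.5 s
        flows.append({
            "T": period,
            "D": period,                       # deadline equal to the period
            "L": rng.randint(128, 4096),       # maximum packet length in flits
            "src": src,
            "dst": dst,
        })
    # Rate-monotonic assignment: the shorter the period, the higher the
    # priority (1 is the highest).
    for prio, flow in enumerate(sorted(flows, key=lambda f: f["T"]), start=1):
        flow["P"] = prio
    return flows


flows = generate_flow_set(n_flows=40, mesh=4)
```

Each generated set would then be passed to the analyses under comparison, and a set counts as schedulable only if every flow meets its deadline.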
Figure 4 shows the percentage of cases in which each of the
analyses is able to guarantee full schedulability: the proposed
analysis considering network routers with 2-flit buffers per VC
(referred to as IBN2), with 100-flit buffers per VC (referred to as
IBN100), the unsafe SB analysis and the safe baseline XLWX.
Each point represents the percentage of schedulable flow sets
using each analysis out of a set of 100 flow sets, each of
them with the number of flows indicated over the X-axis.
Fig. 4: Schedulability results for the proposed analysis against the
SB and XLWX baselines, for (a) 4x4 and (b) 8x8 NoCs. Y-axis: % schedulable
flow sets (0 to 100); X-axis: # flows per flow set (40 to 420 in (a),
80 to 520 in (b)); series: SB, XLWX, IBN2, IBN100.
The lines of IBN2 and IBN100 are very close, but a careful
look reveals a difference of up to 8%. This corroborates
the statement made in the previous section that large buffers
can decrease the predictability of the network because of
the more significant buffered interference effects they can
produce. We have performed the same experiments with a
range of different buffer sizes between 2 and 100, but did
not include them in Figure 4 to avoid cluttering the plots. We
have consistently observed that, in every case, the analysis was
able to guarantee schedulability of a smaller number of flow
sets when considering routers with larger buffers.
More importantly, both plots show that the difference in
schedulability between XLWX and IBN can be up to 58% in
the 4x4 case and up to 45% in the 8x8, showing how much
tighter the proposed analysis can be.
We then performed additional experiments using the au-
tonomous vehicle (AV) benchmark from [5] and using a larger
variety of NoC topologies, aiming to show the generality of
the proposed approach under a realistic scenario. We randomly
generated 100 mappings of the AV benchmark onto each of
the 26 chosen NoC topologies (from 4 to 100 nodes), and
applied the proposed analyses IBN2 and IBN100 as well as
the XLWX baseline to determine how many of those mappings
are deemed fully schedulable by each of them. Figure 5 shows
the results, and again the proposed analysis is shown to be
significantly better than XLWX for all topologies: its improved
tightness allows it to provide schedulability guarantees to
more mappings (up to 67% more). The comparison between
both variations of the proposed analysis shows that IBN2 can
provide schedulability guarantees to up to 6% more mappings
than IBN100.
VII. CONCLUSIONS
In this paper, we have reviewed the latest developments
in real-time analyses of priority-preemptive NoCs, focusing
specifically on the newly-identified problem of multi-point
progressive blocking (MPB). We claim that XLWX, which is the only analysis that
is known to be safe under MPB, is unnecessarily pessimistic as
it treats indirect interference as if it were direct interference.
In practice, this means that it could deem unschedulable a
large number of network configurations that are in reality
schedulable and viable. We then propose a novel analysis that
takes into account buffering bounds, achieving tighter results
than XLWX while still safe under MPB scenarios, therefore
establishing the new state-of-the-art in this area. Extensive
experimental evidence backs our claim, and also shows a
counter-intuitive trade-off between buffer sizes and predictabil-
ity, as large buffers (which are known to provide improvements
on average-case performance) can result in more pessimistic
worst-case latencies using the proposed analysis.
We chose to provide intuitions, insight and experimental
evidence on the proposed analysis and its improvements, rather
than theorems or proofs. We claim that this does not reduce the
value of our contribution, since the analyses behind SB [11],
SLA [8] and the original XLWX [12] were all backed by
theorems and proof sketches, but that did not prevent each
of them from being subsequently found to be unsafe. We
therefore leave as future work the formalisation of such a
proof, as well as the evaluation of proof assistance approaches
(such as those addressed in [4]) which could prevent such analyses
from being shown unsafe. At this point, our only claim is that
ours is the tightest analysis that has not been proven optimistic
by a counter-example.

Fig. 5: Schedulability results for the proposed analysis against the
XLWX baseline for the AV benchmark mapped onto the topologies
indicated over the X-axis. Y-axis: % schedulable mappings (0 to 100);
X-axis: network topology (2x2, 3x2, 3x3, 4x3, 4x4, 5x4, 6x4, 5x5, 7x4,
6x5, 7x5, 6x6, 8x5, 7x6, 8x6, 7x7, 9x6, 8x7, 9x7, 8x8, 10x7, 9x8, 10x8,
9x9, 10x9, 10x10); series: XLWX, IBN2, IBN100.
REFERENCES
[1] T. Bjerregaard and S. Mahadevan. A survey of research and practices
of Network-on-chip. ACM Comput Surv, 38(1):1, 2006.
[2] E. Bolotin, I. Cidon, R. Ginosar, and A. Kolodny. QNoC: QoS
architecture and design process for network on chip. J Syst Arch, 50(2-
3):105–128, 2004.
[3] A. Burns, J. Harbin, and L.S. Indrusiak. A Wormhole NoC Protocol
for Mixed Criticality Systems. In IEEE Real-Time Systems Symposium,
pages 184–195, 2014.
[4] F. Cerqueira, F. Stutz, and B. B. Brandenburg. PROSA: A Case for
Readable Mechanized Schedulability Analysis. In ECRTS Conf, pages
273–284, 2016.
[5] L. S. Indrusiak. End-to-end schedulability tests for multiprocessor
embedded systems based on networks-on-chip with priority-preemptive
arbitration. J Syst Arch, 60(7):553–561, 2014.
[6] L. S. Indrusiak, A. Burns, and B. Nikolic. Analysis of buffering
effects on hard real-time priority-preemptive wormhole networks.
arXiv:1606.02942 [cs], 2016.
[7] H. Kashif, S. Gholamian, and H. Patel. SLA: A Stage-Level Latency
Analysis for Real-Time Communication in a Pipelined Resource Model.
IEEE Trans Comput, 64(4):1177–1190, 2015.
[8] H. Kashif and H. Patel. Buffer Space Allocation for Real-Time Priority-
Aware Networks. In RTAS Symposium, pages 1–12, 2016.
[9] B. Kim, J. Kim, S. Hong, and S. Lee. A real-time communication
method for wormhole switching networks. In Int Conf on Parallel
Processing, pages 527–534, 1998.
[10] M. Liu, M. Becker, M. Behnam, and T. Nolte. Tighter time analysis for
real-time traffic in on-chip networks with shared priorities. In IEEE/ACM
NOCS Symposium, 2016.
[11] Z. Shi and A. Burns. Real-Time Communication Analysis for On-Chip
Networks with Wormhole Switching. In IEEE/ACM NOCS Symposium,
pages 161–170, 2008.
[12] Q. Xiong, Z. Lu, F. Wu, and C. Xie. Real-Time Analysis for Wormhole
NoC: Revisited and Revised. In GLSVLSI Symposium, pages 75–80,
2016.
[13] Q. Xiong, F. Wu, Z. Lu, and C. Xie. Extending Real-Time Analysis for
Wormhole NoCs. IEEE Trans Comput, 66(9), 2017.
