Real-Time Analysis of Priority-Preemptive NoCs with Arbitrary Buffer Sizes and Router Delays by Nikolic, Borislav et al.
This is a repository copy of Real-Time Analysis of Priority-Preemptive NoCs with Arbitrary 
Buffer Sizes and Router Delays.
White Rose Research Online URL for this paper:
http://eprints.whiterose.ac.uk/131704/
Version: Accepted Version
Article:
Nikolic, Borislav, Tobuschat, Sebastian, Soares Indrusiak, Leandro 
orcid.org/0000-0002-9938-2920 et al. (2 more authors) (2019) Real-Time Analysis of 
Priority-Preemptive NoCs with Arbitrary Buffer Sizes and Router Delays. Real-Time 
Systems. pp. 63-105. ISSN 1573-1383 
https://doi.org/10.1007/s11241-018-9312-0
Real-Time Analysis of Priority-Preemptive NoCs
with Arbitrary Buffer Sizes and Router Delays
Borislav Nikolić¹ · Sebastian Tobuschat¹ · Leandro Soares Indrusiak² ·
Rolf Ernst¹ · Alan Burns²
Abstract Contemporary multiprocessor platforms predominantly use a
Network-on-Chip (NoC) architecture as the interconnect medium, due to its
good scalability and performance. During the last decade, NoCs have received a
significant amount of attention from the real-time community. One promising
category of approaches suggests employing existing hardware features
called virtual channels and dedicating them exclusively to individual
communication traffic flows. In this way, NoCs become more amenable to
real-time analysis, which is an essential requirement for providing both safe and tight
worst-case analysis methods, and consequently for deriving real-time guarantees.
In this manuscript, we present an approach which falls in the aforementioned
category. Specifically, we propose a novel method for the worst-case
analysis of the NoC traffic, assuming the existence of per-flow dedicated virtual
channels. Compared to the state-of-the-art techniques, our approach yields
substantially tighter upper-bounds on the worst-case traversal times (WCTTs)
of communication traffic flows. By employing the proposed method, resource
over-provisioning can be mitigated to a large extent, and significant design-
cost reductions can be achieved. Moreover, we implemented a cycle-accurate
simulator of the assumed NoC architecture, and used it to assess the tightness
of derived WCTT bounds. Finally, we reached an interesting conclusion that
bigger virtual channel buffers do not necessarily lead to better results, and in
many cases can be counter-productive, which is a very important finding for
system designers.
Keywords Real-Time Systems · Embedded Systems · Network-on-Chip ·
Wormhole Switching · Virtual Channels · Priority-Preemptive Arbitration
1 Institute of Computer and Network Engineering, Technische Universität Braunschweig, Germany
2 Real-Time Systems Group, Department of Computer Science, University of York, York, UK
Borislav Nikolić (✉)
bnikolic@ida.ing.tu-bs.de
1 Introduction
The Network-on-Chip (NoC) architecture (Benini and De Micheli (2002)) is
the predominant choice of interconnect medium in contemporary multiprocessor
platforms. The popularity of NoCs can be largely attributed to
their good performance and scalability potential (Kavaldjiev and Smit (2003)).
NoCs can vary considerably in their design choices. One example
is the network topology. Currently available multiprocessors employ a ring
(e.g. Intel (2013)), a 2-D torus (e.g. Kalray (2014)) or a 2-D mesh (e.g. Tilera
(2012) and Intel (2010)) approach. Moreover, NoCs employ different switching
mechanisms. For example, a store-and-forward technique is a viable strategy,
although more popular for off-chip networks than for NoCs. A more promising
mechanism is the wormhole switching technique (Ni and McKinley (1993)),
due to its good throughput and small buffering requirements (Kavaldjiev and
Smit (2003)). With this method, the communication packet is, prior to send-
ing, divided into small elements of fixed size, called flits. Flits are sequentially
injected into the NoC, and they travel in parallel, which is called the pipelined
traversal. The first flit is called the header flit, and it usually contains the
relevant information for the traversal of the packet across the NoC.
Another important design choice for NoCs is the routing mechanism. Of
interest for real-time systems are static routing algorithms, of which the most
popular one is the dimension-ordered routing method called the X-Y rout-
ing technique. With this approach, all flits constituting one packet first travel
on the X-axis, and once they reach the X coordinate of the destination, the
transfer continues on the Y-axis. This method is highly regarded in both
academia and industry, primarily because of its relatively easy implementation
and its deadlock-free property (Hu and Marculescu (2003)). However, recent insights
(e.g. Nikolić et al (2016b) and Nikolić and Pinho (2017)) suggest that
this method may not be the most efficient routing approach. Some alternative
strategies are to encode the entire path of the packet inside the header
flit (e.g. Kalray (2014)), or to preconfigure routers with the relevant routing
information (e.g. Stefan et al (2012)).
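The X-Y routing rule described above can be illustrated with a minimal sketch. It assumes that routers are addressed by hypothetical (x, y) mesh coordinates, which is a convenience of this example and not the numbering used elsewhere in the manuscript.

```python
def xy_route(src, dst):
    """Return the routers visited from src to dst under X-Y routing.

    Routers are identified by hypothetical (x, y) mesh coordinates.
    """
    x, y = src
    path = [(x, y)]
    # Phase 1: travel along the X-axis until the destination column is reached.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Phase 2: continue along the Y-axis until the destination row is reached.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))
```

Because the route depends only on the source and destination coordinates, it is static and can be computed at design time, which is the property of interest for real-time analysis.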
Yet another design choice is the flow-control strategy. This aspect is very
important, because its main purpose is to prevent buffer overflows and packet
drops. One of the most prominent approaches is the credit-based flow control,
which allows a flit transfer only if there are available credits in a given router.
Initially, all routers have credits. The amount of credits corresponds to the
available space in buffers. Each flit transfer (downstream) is followed by a
credit transfer in the opposite direction (upstream). Some alternative flow-
control mechanisms are back suction (Diemer and Ernst (2010)) and a source
router traffic shaping (Kalray (2014)), while some architectures do not use
any flow control mechanism and buffer overflows are prevented by design (e.g.
Schoeberl et al (2015)).
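The credit mechanism can be sketched as a simple counter kept by the upstream router. The class and method names below are hypothetical, and the model ignores the timing of credit transfers; it only illustrates that a flit is sent when a credit is available and that a credit returns when a downstream buffer slot is freed.

```python
from collections import deque


class CreditedLink:
    """Upstream view of one downstream buffer under credit-based flow control."""

    def __init__(self, buffer_size):
        self.credits = buffer_size  # initially, credits equal the free buffer slots
        self.downstream = deque()   # flits currently stored in the downstream buffer

    def try_send(self, flit):
        """Transfer a flit downstream only if a credit is available."""
        if self.credits == 0:
            return False            # no free slot: the flit waits upstream
        self.credits -= 1
        self.downstream.append(flit)
        return True

    def drain_one(self):
        """The downstream router forwards one flit and returns a credit upstream."""
        flit = self.downstream.popleft()
        self.credits += 1
        return flit


link = CreditedLink(buffer_size=2)
print(link.try_send("header"), link.try_send("body1"), link.try_send("body2"))
```

With a buffer of two flits, the third transfer is refused until the downstream router drains a flit, so the buffer can never overflow.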
NoCs can also vary in terms of employed arbitration mechanisms. When
several packets compete for a shared NoC resource (e.g. a common output
link), these mechanisms determine the order of access. One of the mainstream
options is the round-robin mechanism. Alternative approaches, which are also more
real-time oriented, advocate prioritisation, where packets can have static (e.g. Indrusiak
et al (2016)) or dynamic (e.g. Nikolić and Petters (2014a)) priorities.
Fig. 1 Router with priority-preemptive arbitration and credit-based flow control
Finally, NoCs can differ in terms of additional hardware features, such
as virtual channels (Dally (1992), Dally and Seitz (1987)). A virtual channel
is nothing more than a buffer dedicated to a given port of a given router.
Virtual channels allow flits from different traffic flows to be buffered simultaneously.
This can significantly mitigate negative effects of some infamous contention
scenarios which may cause severe performance deterioration, e.g. head-of-line
blocking (Dally (1992)). Another benefit of virtual channels is that when two
packets compete for the same resource, the higher-priority one can be granted
the permission to progress, while the lower-priority one can be stored inside its
virtual channel and in that way delayed until the next arbitration event (Song
et al (1997)). This gives the possibility to enforce priority-preemptive strategies
for NoCs (Shi and Burns (2008b)). Such a router architecture is illustrated in
Figure 1. This type of NoC is amenable to real-time analysis, because
worst-case analysis methods can be efficiently applied to this model. For
these reasons, priority-preemptive NoCs are currently considered to be a
promising approach for the interconnect medium in the forthcoming generation
of real-time oriented multiprocessors.
Contribution: In this manuscript, we present a novel analysis method for
computing the worst-case traversal times (WCTTs) of communication traf-
fic flows. The proposed technique is applicable to workloads deployed upon
priority-preemptive NoCs with the wormhole switching mechanism and per-
traffic flow dedicated virtual channels. Compared to the state-of-the-art ap-
proaches, the proposed method obtains significantly tighter upper-bounds.
This aspect is very important during the design phase of real-time systems,
because it helps mitigate over-provisioning and achieve substantial design-cost
reductions. Moreover, we implemented a cycle-accurate simulator of the
assumed NoC architecture, and used it to assess the tightness of derived upper
bounds on WCTTs of traffic flows. Finally, the experimental evaluation led us
to an interesting finding that bigger virtual channel buffers do not necessarily
yield better results, and in many cases can be counter-productive, which is a
very important discovery for system designers.
2 Related Work
In the real-time analysis of NoCs, there are two different strategies. One category
of approaches advocates a design-time temporal and/or spatial allocation
of NoC resources to given communication traffic flows. In this way, any
contention for shared resources can be avoided, and these methods are called
contentionless approaches. One popular strategy to achieve this is to arbitrate
the access to the NoC and its resources in a time-division-multiplexing
(TDM) manner, while some others revolve around reserving all resources on
the entire path of the flow prior to its release, often called the virtual cir-
cuit method. Some notable approaches have been proposed by Millberg et al
(2004), Goossens et al (2005), Schoeberl (2007), Paukovits and Kopetz (2008),
Stefan et al (2012), Schoeberl et al (2015), Kasapaki et al (2016).
On the other hand, there are methods which allow contentions among
traffic flows, termed contention-aware approaches. For NoCs with round-robin
arbitration some relevant works have been developed around the recursive
calculus (e.g. Dasari et al (2013), Liu et al (2017)) and the network calculus
theory (e.g. Ferrandiz et al (2011), de Dinechin et al (2014a), de Dinechin et al
(2014b)).
In scenarios where a NoC has multiple virtual channels, preemptions among
traffic flows can be performed (Song et al (1997)). Shi and Burns (2008b) developed
this approach further, employed several additional assumptions (constrained
deadlines, distinctive priorities, per-priority virtual channels) and proposed
the analysis method for computing WCTTs of traffic flows. Shi et al
(2010) then extended the method and made it applicable to flow-sets with arbitrary
deadlines. Nikolić et al (2013) reduced hardware requirements of this
model by demonstrating that with a thoughtful allocation of virtual channels,
their number can be reduced from the total number of priorities (flows), to the
maximum number of contentions for any port. Regarding the same model, Shi
and Burns (2008a) and Liu et al (2015a) developed heuristic-based exhaustive
search methods for priority assignment. Nikolić and Petters (2014a) proposed
an arbitration policy based on the earliest-deadline-first (EDF) methodology,
and derived the accompanying WCTT analysis method for flows. Nikolić et al
(2016b) and Nikolić and Pinho (2017) explore the routing flexibility of priority-preemptive
NoCs, and propose a method to derive flow routes, which allows
platform resources to be utilised more efficiently than with the X-Y routing policy. Liu
et al (2015b) focus on the stochastic response time analysis. Moreover, Nikolić
et al (2016a) and Liu et al (2016b) discovered that the analysis pessimism can
be reduced by considering only parts of paths shared by interfering flows.
The area of workload mapping has also been extensively studied, with the
main incentive to map the workload in such a way that existing resources
can be utilised more efficiently, and consequently additional traffic flows can
be accommodated. Shi and Burns (2010) use a task swapping strategy, Mesidis
and Indrusiak (2011) and Racu and Indrusiak (2012) employ the genetic
algorithms metaheuristic, while Nikolić et al (2013) focus on the simulated annealing
metaheuristic. Sayuti and Indrusiak (2013) study the mapping process
with an emphasis on the NoC power consumption, while Nikolić and Petters
(2014b) proposed a heuristic-based application mapping approach for the Limited
Migrative Model (LMM).
In terms of joint computation and communication guarantees (also called
end-to-end guarantees), Indrusiak (2014) proposed a schedulability analysis
method for a fully-partitioned many-core system, while Nikolić et al (2014)
derived the worst-case analysis approach for LMM. Additionally, Burns et al
(2014) and Indrusiak et al (2015) demonstrated that priority-preemptive NoCs
can accommodate the mixed-criticality workload.
All the aforementioned approaches rely on the assumption that, during
the worst-case analysis, entire flow routes are treated as indivisible resources.
Kashif et al (2014) proposed a different approach called SLA (stage level anal-
ysis), where the worst-case analysis is performed iteratively, by considering in-
dividual route elements in a sequential manner. One limitation of this method
is an unrealistic assumption that virtual channels should have sufficient ca-
pacity to store entire packets (Kashif and Patel (2014)). Later, Kashif and
Patel (2016) proposed an improved version of SLA which takes into account
buffer sizes, and demonstrated that their method derives upper bounds on flow
traversal times which are always equal to, or tighter than those produced by
the method of Shi and Burns (2008b).
The aforementioned studies consider flit-level preemptions. Recently, Liu
et al (2016a) proposed a method for the worst-case analysis assuming limited
preemptions via non-preemptive regions. This approach is an initial step to-
wards understanding flow interactions on the packet-level for priority-aware
NoCs. Additionally, Shi and Burns (2010) and Liu et al (2016b) analyse
priority-preemptive NoCs with shared virtual channels.
Yet another relevant approach is the Compositional Performance Analysis
(CPA) method, introduced by Henia et al (2005). Similar to SLA, CPA also
applies an iterative approach where network elements are analysed indepen-
dently. After that, the output events of each element are used as input events
for neighbouring elements, and the analysis is performed again. This process
is repeated until a converging point is reached (if one exists). Rambo and
Ernst (2015) proposed the CPA-based method for the worst-case analysis of
priority-preemptive NoCs.
Recently, Xiong et al (2016) discovered that the effect called backpressure
has a significant impact on the worst-case analysis of flow traversal times,
and that it was largely neglected by the community. In fact, their discovery
rendered the aforementioned approaches related to priority-preemptive NoCs
optimistic. One exception is a scenario where virtual channel buffers are large
enough to store entire packets. In this case, backpressure effects are of no
significance for the analysis. However, in practical settings, virtual channel
buffers have a limited size, and the backpressure effects cannot be neglected.
Therefore, Xiong et al (2016) proposed a novel analysis method to compute
WCTTs of flows. Subsequently, Indrusiak et al (2016) demonstrated that the
aforementioned approach is also optimistic, and proposed a new approach.
That work also has a limitation with an unsafe treatment of flows with both
upstream and downstream indirect interference. Then, Xiong et al (2017) re-
vised their approach and made it safe. Here, in this work, we refer to that
approach as SOTA (short for State-Of-The-Art). Yet, most recently, Indru-
siak et al (2018) revised their approach and made it safe, hereafter referred to
as SOTA+. Note, that SOTA+ always produces equal or tighter bounds than
SOTA. Nonetheless, for the sake of completeness of this work, we will compare
our approach against both SOTA and SOTA+. A more detailed explanation
of these methods is given in Section 5, and for further details the reader is
advised to consult the work of Indrusiak et al (2018).
Following the aforementioned discoveries, Tobuschat and Ernst (2017) de-
veloped a CPA-based worst-case analysis method which takes into account the
backpressure effects. However, this is an initial backpressure-aware CPA-based
approach, and it is only applicable to NoCs with a single channel. Therefore,
it cannot be compared with SOTA, SOTA+, or the method
proposed in this work.
3 System Model
3.1 Platform
The platform $\theta$ considered in this work is a multiprocessor system with $m \times n$
processing elements (cores) $\{\pi_1, \pi_2, \ldots, \pi_{nm}\}$, interconnected via a 2-D mesh
NoC. The NoC is composed of $m \times n$ routers $\{\rho_1, \rho_2, \ldots, \rho_{nm}\}$ (one per core),
as illustrated in Figure 2. Note that in Figure 2 and the remaining figures, cores
and respective core-to-router input/output links have been omitted for better
clarity.
All routers are synchronised, identical, and, depending on its position, each
of them is connected with 2, 3 or 4 neighbouring ones. The routing delay is
denoted with dR, and it represents the time that it takes for a header flit to
be transferred within the router, usually from the input port to the output
port. We assume that the header size is equal to the size of one flit. The
routing process is typically performed in several pipelined stages. We assume
an arbitrary number of pipeline stages, each taking one clock cycle. Note, that
remaining flits do not suffer routing delay. Moreover, all flows are routed with
the X-Y routing policy.
A connection between each pair of adjacent routers is established via two
unidirectional links, while all links of the NoC have identical physical characteristics.
The link traversal delay is denoted with $d_L$, and all flits experience
the same delay when travelling between two adjacent routers. It is assumed
that $d_L$ takes one clock cycle. Additionally, each router $\rho_i$ is connected via
two unidirectional links with its local core $\pi_i$. Core-to-router links are identical
to router-to-router links, i.e. the traversal delay of one flit is $d_L$.
Fig. 2 Assumed NoC interconnect with 2-D mesh topology and $m \times n$ identical routers, numbered row by row from 1 to $nm$ (for
better clarity, cores and respective core-to-router input/output links have been omitted)
The data transfer is performed with the wormhole switching mechanism
and the credit-based flow control is used. It is assumed that the transfer of
credits takes less time than the transfer of flits, which is a reasonable assump-
tion, because of the considerable difference in the amount of transferred data.
Additionally, we assume that the platform provides hardware support for data
transfer, in the form of virtual channels (VCs), and that the number of VCs is
at least equal to the maximum number of contentions for any port of the NoC.
This assumption ensures that, at each router, each flow will have a dedicated
virtual channel (Nikolić et al (2013)). Moreover, we do not put any restrictions
on the sizes of individual VC buffers, and each VC can have a capacity to store
an arbitrary number of flits. Similarly, we do not put any restrictions on the
duration of the routing delay $d_R$, which makes our approach applicable to a
wide range of NoC architectures with priority-preemptive arbitration.
The assumed router architecture is illustrated in Figure 1. Worth noting
is that input ports are connected to output ports via a crossbar and flits
from the same input port may travel to different output ports in parallel.
The arbitration happens at each output port, and consequently, flits from the
highest priority input buffer with credits are transferred to the downstream
router. This implies that two flows may interfere only if they share the same
output port.
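The arbitration rule at a single output port can be sketched as follows. This is a simplified, per-cycle illustration under two assumptions that are made only for this example: each requesting virtual channel is described by a hypothetical tuple (priority, has_flit, has_credit), and a larger priority value denotes a higher priority.

```python
def arbitrate(candidates):
    """Select the flow whose flit is forwarded at one output port in this cycle.

    candidates: list of (priority, has_flit, has_credit) tuples, one per
    virtual channel requesting this output port. A larger priority value is
    assumed to mean a higher priority (an assumption of this sketch).
    Returns the winning priority, or None if no flit can be sent.
    """
    eligible = [priority for (priority, has_flit, has_credit) in candidates
                if has_flit and has_credit]
    return max(eligible) if eligible else None


# The highest-priority flow has no credit, so the next one is served.
print(arbitrate([(3, True, False), (2, True, True), (1, True, True)]))
```

Repeating this decision every cycle yields flit-level preemption: a lower-priority flow progresses only in cycles where no higher-priority flow at the same output port has both a flit and a credit.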
3.2 Workload
In this work, we take a communication-centric approach, and assume that a
workload is comprised of a collection of sporadic communication traffic flows
$\mathcal{F} = \{f_1, f_2, \ldots, f_z\}$, hereafter also referred to as the flow-set. Each flow $f_i \in \mathcal{F}$
is a source of a potentially infinite sequence of packets, and it has the
following characteristics: (i) its source core/router $\pi_i^{src}/\rho_i^{src}$, (ii) its destination
core/router $\pi_i^{dst}/\rho_i^{dst}$, (iii) a set of traversed links between the source and
the destination (including the one connecting $\pi_i^{src}$ with $\rho_i^{src}$, and the one
connecting $\rho_i^{dst}$ with $\pi_i^{dst}$), termed $\mathcal{L}_i$, and also called the flow path, which
conforms with the X-Y routing policy, (iv) its size $\sigma_i$ (including the header),
expressed in the number of flits, (v) its minimum inter-arrival time $T_i$ between
two successive packets, (vi) its constrained deadline $D_i \le T_i$, (vii) a unique
fixed priority $P_i \in \{1, 2, \ldots, |\mathcal{F}|\}$, where $|\mathcal{F}|$ symbolises the number of flows in
the flow-set, also called the cardinality of $\mathcal{F}$, and finally (viii) a release jitter
$J_i^R$.
During each inter-arrival period, a flow may release one packet. Let $R_i^*$ be
the observed WCTT of any packet of $f_i$, e.g. via simulations. If $R_i^*$ is less than
$D_i$, it can be conjectured that $f_i$ will never miss its deadline. However, this
cannot be guaranteed unless extensive simulations are performed, so as to
capture all possible system states. This can be prohibitively expensive both in
terms of computational capacity and time. In this work, we take a different
approach and derive an analytical upper bound on the WCTT of flow $f_i$,
termed $R_i$. Thus, if we prove that the derived $R_i$ is a safe upper bound on the
WCTT of $f_i$ and that $R_i \le D_i$, we have also proved that $f_i$ will never miss its
deadline. If a flow never misses its deadline, it is considered schedulable. If all
flows of the flow-set are schedulable, the flow-set itself is considered schedulable.
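The flow characteristics and the schedulability condition can be captured in a small sketch. The record layout and the function name are hypothetical, the path-related attributes are omitted for brevity, and the bounds passed in are assumed to come from some safe analysis; the values in the example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    size: int        # sigma_i, in flits (header included)
    period: int      # T_i, minimum inter-arrival time
    deadline: int    # D_i, constrained: D_i <= T_i
    priority: int    # P_i, unique within the flow-set
    jitter: int = 0  # J_i^R, release jitter


def flow_set_schedulable(flows, bounds):
    """Check R_i <= D_i for every flow; bounds maps a flow name to R_i."""
    return all(bounds[f.name] <= f.deadline for f in flows)


flows = [Flow("f1", 4, 100, 50, 2), Flow("f2", 8, 200, 120, 1)]
print(flow_set_schedulable(flows, {"f1": 30, "f2": 120}))
```

The check is only as trustworthy as the bounds: it establishes schedulability when each value is a safe analytical upper bound, not when it is an observed maximum from simulation.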
4 Problem Formulation
The research problem which is tackled in this work can be summarised as
follows. Given the platform θ and the workload F , provide an analysis method
to examine the schedulability of F on θ, by obtaining safe and tight upper
bounds on the worst-case traversal times of all flows from F .
5 Background and Preliminaries
We start this section by summarising the aspects of interest in Table 1. Variable
$\mathcal{L}^{CD}_{i,j}$ denotes the set of shared links between flows $f_i$ and $f_j$, which is hereafter
referred to as the contention domain. Similarly, $\mathcal{L}^{PRE}_{i,j}$ and $\mathcal{L}^{POST}_{i,j}$ denote sets
of links on the path of $f_i$ before and after its contention domain with $f_j$,
respectively.
Now we define the traversal delay of flow $f_i$ in isolation, also called the
basic network latency, or the zero-load latency (Equation 1).
$$C_i = \overbrace{(h_i - 1) \cdot d_R}^{\text{header routing}} + \overbrace{h_i \cdot d_L}^{\text{header traversal}} + \overbrace{(\sigma_i - 1) \cdot d_L}^{\text{payload traversal}} \qquad (1)$$
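Table 1 Platform and flow characteristics
Symbol                                 Description
$\beta$                                VC buffer size (in flits)
$d_R$                                  Routing delay of a header flit
$d_L$                                  Link traversal delay of one flit
$\mathcal{L}_i$                        Set of links traversed by $f_i$
$h_i$                                  Number of elements in $\mathcal{L}_i$ (hops of $f_i$)
$\alpha(\mathcal{L}_i)$                The first element (link) of $\mathcal{L}_i$
$\omega(\mathcal{L}_i)$                The last element (link) of $\mathcal{L}_i$
$order(\ell_i, \mathcal{L}_j)$         The order of link $\ell_i$ on $\mathcal{L}_j$, e.g. $order(\alpha(\mathcal{L}_i), \mathcal{L}_i) = 1$ and $order(\omega(\mathcal{L}_i), \mathcal{L}_i) = h_i$
$\mathcal{L}^{CD}_{i,j}$               $\mathcal{L}_i \cap \mathcal{L}_j$
$\mathcal{L}^{PRE}_{i,j}$              $\{\ell_k \in \mathcal{L}_i \mid order(\ell_k, \mathcal{L}_i) < order(\alpha(\mathcal{L}^{CD}_{i,j}), \mathcal{L}_i)\}$
$\mathcal{L}^{POST}_{i,j}$             $\{\ell_k \in \mathcal{L}_i \mid order(\ell_k, \mathcal{L}_i) > order(\omega(\mathcal{L}^{CD}_{i,j}), \mathcal{L}_i)\}$
$P_i$                                  Priority of $f_i$, where $1 \le P_i \le |\mathcal{F}|$ (unique priorities)
$T_i$                                  Minimum inter-arrival time of $f_i$
$D_i$                                  Deadline of $f_i$, where $D_i \le T_i$
$\sigma_i$                             Size of $f_i$, including the header (in flits)
$J_i^R$                                Release jitter of $f_i$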
The first term in Equation 1 is the cumulative routing delay of the header
flit in all routers from $\rho_i^{src}$ to $\rho_i^{dst}$ (including them). The second term is the
cumulative traversal delay of the header flit across all links (including the one
connecting $\pi_i^{src}$ with $\rho_i^{src}$, and the one connecting $\rho_i^{dst}$ with $\pi_i^{dst}$). The last
term is the delay of the traversal of the remaining flits across the last link, due to
the pipelined transmission.
Equation 1 is often simplified with the $d_L = d_R = 1$ assumption, and then
it becomes Equation 2.
$$C_i = 2 \cdot h_i + \sigma_i - 2 \qquad (2)$$
A further common simplification is $d_L = 1$ and $d_R = 0$, and then
Equation 1 becomes Equation 3.
$$C_i = h_i + \sigma_i - 1 \qquad (3)$$
Definition 1 (Basic network latency) Regardless of how it is computed
(Equations 1-3), $C_i$ is referred to as the basic network latency, or the isolation
latency of $f_i$.
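As a quick numerical check of Equations 1-3, the sketch below evaluates the basic network latency for illustrative values. The function and parameter names are hypothetical, and the numbers are chosen only for this example.

```python
def basic_latency(hops, size, d_r=1, d_l=1):
    """Equation 1: C_i for a flow traversing `hops` links with `size` flits."""
    header_routing = (hops - 1) * d_r    # header routed in hops - 1 routers
    header_traversal = hops * d_l        # header crosses every link
    payload_traversal = (size - 1) * d_l # remaining flits cross the last link
    return header_routing + header_traversal + payload_traversal


h, sigma = 5, 10
print(basic_latency(h, sigma))           # d_L = d_R = 1, Equation 2
print(basic_latency(h, sigma, d_r=0))    # d_L = 1, d_R = 0, Equation 3
```

For $h_i = 5$ and $\sigma_i = 10$, the two calls agree with $2 \cdot h_i + \sigma_i - 2 = 18$ and $h_i + \sigma_i - 1 = 14$, respectively.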
Due to synchronised routers and the pipelined manner of the routing pro-
cess, as well as the fact that events occur with the granularity of one cycle, the
analysed flow cannot suffer any interference from lower priority flows. How-
ever, it can be preempted by some higher priority flows. Therefore, we need
to define them.
Definition 2 (Directly interfering flow) Flow $f_j$ is a directly interfering
flow of flow $f_i$ iff (if and only if) $P_j > P_i$ and $\mathcal{L}^{CD}_{i,j} \neq \emptyset$.
$\mathcal{F}_D(f_i)$ denotes the set of all directly interfering flows of $f_i$. In the general case, the
interference that the analysed flow $f_i$ might suffer from $f_j \in \mathcal{F}_D(f_i)$ does not
depend only on $f_j$, but also on the directly interfering flows of $f_j$. Therefore, it
is necessary to further classify the flows from $\mathcal{F}_D(f_i)$.
Definition 3 (Directly interfering flow without indirect interference)
Flow fj is a directly interfering flow of flow fi without indirect interference iff
all directly interfering flows of fj are also directly interfering flows of fi.
$\mathcal{F}^{\emptyset}_D(f_i)$ denotes the set of all directly interfering flows of $f_i$ without indirect
interference. Each $f_j \in \mathcal{F}^{\emptyset}_D(f_i)$ we call difo of $f_i$.
Definition 4 (Directly interfering flow with only upstream indirect
interference) Flow fj is a directly interfering flow of flow fi with only up-
stream indirect interference iff the following two conditions are fulfilled:
– There exists at least one flow fk which is a directly interfering flow of fj ,
but not of fi, and the contention domain of fj and fk is located on the
path of fj upstream from the contention domain of fj and fi.
– There exists no flow fm which is a directly interfering flow of fj , but not
of fi, and the contention domain of fj and fm is located on the path of fj
downstream from the contention domain of fj and fi.
$\mathcal{F}^{U}_D(f_i)$ denotes the set of all directly interfering flows of $f_i$ with only
upstream indirect interference. Each $f_j \in \mathcal{F}^{U}_D(f_i)$ we call difu of $f_i$.
Definition 5 (Directly interfering flow with only downstream indi-
rect interference) Flow fj is a directly interfering flow of flow fi with only
downstream indirect interference iff the following two conditions are fulfilled:
– There exists at least one flow fk which is a directly interfering flow of fj ,
but not of fi, and the contention domain of fj and fk is located on the
path of fj downstream from the contention domain of fj and fi.
– There exists no flow fm which is a directly interfering flow of fj , but not
of fi, and the contention domain of fj and fm is located on the path of fj
upstream from the contention domain of fj and fi.
$\mathcal{F}^{D}_D(f_i)$ denotes the set of all directly interfering flows of $f_i$ with only
downstream indirect interference. Each $f_j \in \mathcal{F}^{D}_D(f_i)$ we call difd of $f_i$.
Definition 6 (Directly interfering flow with both upstream and down-
stream indirect interference) Flow fj is a directly interfering flow of flow
fi with both upstream and downstream indirect interference iff the following
two conditions are fulfilled:
– There exists at least one flow fk which is a directly interfering flow of fj ,
but not of fi, and the contention domain of fj and fk is located on the
path of fj downstream from the contention domain of fj and fi.
– There exists at least one flow fm which is a directly interfering flow of fj ,
but not of fi, and the contention domain of fj and fm is located on the
path of fj upstream from the contention domain of fj and fi.
$\mathcal{F}^{U+D}_D(f_i)$ denotes the set of all directly interfering flows of $f_i$ with both
upstream and downstream indirect interference. Each $f_j \in \mathcal{F}^{U+D}_D(f_i)$ we call
difud of $f_i$.
Flow relations are formally summarised in Table 2.
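Table 2 Flow relations
$\mathcal{F}_D(f_i)$             $\forall f_j \in \mathcal{F} \mid P_j > P_i \wedge \mathcal{L}_i \cap \mathcal{L}_j \neq \emptyset$
$\mathcal{F}^{\emptyset}_D(f_i)$ $\forall f_j \in \mathcal{F}_D(f_i) \mid \mathcal{F}_D(f_j) \setminus \mathcal{F}_D(f_i) = \emptyset$
$\mathcal{F}^{U}_D(f_i)$         $\forall f_j \in \mathcal{F}_D(f_i), \exists f_k \in \mathcal{F}_D(f_j), \nexists f_m \in \mathcal{F}_D(f_j) \mid$
                                 $order(\omega(\mathcal{L}^{CD}_{j,k}), \mathcal{L}_j) < order(\alpha(\mathcal{L}^{CD}_{j,i}), \mathcal{L}_j) \wedge order(\alpha(\mathcal{L}^{CD}_{j,m}), \mathcal{L}_j) > order(\omega(\mathcal{L}^{CD}_{j,i}), \mathcal{L}_j)$
$\mathcal{F}^{D}_D(f_i)$         $\forall f_j \in \mathcal{F}_D(f_i), \exists f_k \in \mathcal{F}_D(f_j), \nexists f_m \in \mathcal{F}_D(f_j) \mid$
                                 $order(\alpha(\mathcal{L}^{CD}_{j,k}), \mathcal{L}_j) > order(\omega(\mathcal{L}^{CD}_{j,i}), \mathcal{L}_j) \wedge order(\omega(\mathcal{L}^{CD}_{j,m}), \mathcal{L}_j) < order(\alpha(\mathcal{L}^{CD}_{j,i}), \mathcal{L}_j)$
$\mathcal{F}^{U+D}_D(f_i)$       $\forall f_j \in \mathcal{F}_D(f_i), \exists f_k \in \mathcal{F}_D(f_j), \exists f_m \in \mathcal{F}_D(f_j) \mid$
                                 $order(\alpha(\mathcal{L}^{CD}_{j,k}), \mathcal{L}_j) > order(\omega(\mathcal{L}^{CD}_{j,i}), \mathcal{L}_j) \wedge order(\omega(\mathcal{L}^{CD}_{j,m}), \mathcal{L}_j) < order(\alpha(\mathcal{L}^{CD}_{j,i}), \mathcal{L}_j)$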
Obviously: $\mathcal{F}_D(f_i) = \mathcal{F}^{\emptyset}_D(f_i) \cup \mathcal{F}^{U}_D(f_i) \cup \mathcal{F}^{D}_D(f_i) \cup \mathcal{F}^{U+D}_D(f_i)$.
Figure 3 gives an example of flows that demonstrates these relationships.
The interfering sets of flow $f_7$ are as follows:
$\mathcal{F}_D(f_7) = \{f_3, f_4, f_5, f_6\}$,
$\mathcal{F}^{\emptyset}_D(f_7) = \{f_3\}$, $\mathcal{F}^{U}_D(f_7) = \{f_4\}$, $\mathcal{F}^{D}_D(f_7) = \{f_5\}$, $\mathcal{F}^{U+D}_D(f_7) = \{f_6\}$
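The classification of Definitions 2-6 can be sketched in a few lines. The flow paths below are hypothetical link sequences invented so that the result matches the sets listed for $f_7$; they are not the routes drawn in Figure 3. The sketch also assumes that a larger priority value means a higher priority and that every contention domain is a contiguous segment of a path.

```python
def direct_interferers(i, flows):
    """F_D(f_i): higher-priority flows sharing at least one link with f_i."""
    return {j for j in flows
            if flows[j]["prio"] > flows[i]["prio"]
            and set(flows[j]["path"]) & set(flows[i]["path"])}


def classify(j, i, flows):
    """Classify f_j in F_D(f_i) as 'difo', 'difu', 'difd' or 'difud'."""
    path_j = flows[j]["path"]
    # Positions on the path of f_j that belong to its contention domain with f_i.
    cd_ji = [n for n, link in enumerate(path_j) if link in flows[i]["path"]]
    first, last = cd_ji[0], cd_ji[-1]
    up = down = False
    # Only flows that interfere with f_j but not with f_i matter here.
    for k in direct_interferers(j, flows) - direct_interferers(i, flows):
        cd_jk = [n for n, link in enumerate(path_j) if link in flows[k]["path"]]
        up = up or cd_jk[-1] < first    # contention domain upstream of f_i's
        down = down or cd_jk[0] > last  # contention domain downstream of f_i's
    return {(False, False): "difo", (True, False): "difu",
            (False, True): "difd", (True, True): "difud"}[(up, down)]


# Hypothetical paths; priorities follow P1 > P2 > ... > P7.
flows = {
    "f1": {"prio": 7, "path": ["u1", "u3"]},
    "f2": {"prio": 6, "path": ["d2", "d3"]},
    "f3": {"prio": 5, "path": ["m1", "m2"]},
    "f4": {"prio": 4, "path": ["u1", "u2", "m2", "x4"]},
    "f5": {"prio": 3, "path": ["m3", "d1", "d2"]},
    "f6": {"prio": 2, "path": ["u3", "m4", "d3"]},
    "f7": {"prio": 1, "path": ["m1", "m2", "m3", "m4"]},
}
for j in sorted(direct_interferers("f7", flows)):
    print(j, classify(j, "f7", flows))
```

In this hypothetical layout, $f_1$ meets $f_4$ and $f_6$ upstream of their contention domains with $f_7$, while $f_2$ meets $f_5$ and $f_6$ downstream, which reproduces the four categories.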
After defining flow relations, now we present the state-of-the-art analysis
methods for obtaining WCTTs, namely SOTA and SOTA+. The worst-case
traversal time of a flow can be computed by solving Equation 4.
Fig. 3 Example of interfering flows, where $P_1 > P_2 > P_3 > P_4 > P_5 > P_6 > P_7$
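$$R_i = C_i + \sum_{f_j \in \mathcal{F}_D(f_i)} \left\lceil \frac{R_i + J^R_j + J^I_{j \to i}}{T_j} \right\rceil \cdot (C_j + B_{j \to i}) \qquad (4)$$
In Equation 4, $J^I_{j \to i}$ is the interference jitter, which can be computed by
solving Equation 5.
$$J^I_{j \to i} = \begin{cases} R_j - C_j, & \text{if } f_j \in \mathcal{F}^{U}_D(f_i) \cup \mathcal{F}^{D}_D(f_i) \cup \mathcal{F}^{U+D}_D(f_i) \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$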
Both SOTA and SOTA+ use Equation 4 and Equation 5. However, they
differ in the computation of the term $B_{j \to i}$ in Equation 4. The
term $B_{j \to i}$ accounts for additional interference that could occur due to backpressure,
hereafter referred to as the buffering interference. In SOTA, it can be
computed by solving Equation 6.
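$$B_{j \to i} = \sum_{\substack{f_k \in \mathcal{F}_D(f_j) \,\wedge\, f_k \notin \mathcal{F}_D(f_i) \,\wedge \\ order(\alpha(\mathcal{L}^{CD}_{j,k}), \mathcal{L}_j) > order(\omega(\mathcal{L}^{CD}_{j,i}), \mathcal{L}_j)}} \left\lceil \frac{R_j + J^R_k + J^I_{k \to j}}{T_k} \right\rceil \cdot (C_k + B_{k \to j}) \qquad (6)$$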
In SOTA+, it can be computed by solving Equation 7.
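$$B_{j \to i} = \sum_{\substack{f_k \in \mathcal{F}_D(f_j) \,\wedge\, f_k \notin \mathcal{F}_D(f_i) \,\wedge \\ order(\alpha(\mathcal{L}^{CD}_{j,k}), \mathcal{L}_j) > order(\omega(\mathcal{L}^{CD}_{j,i}), \mathcal{L}_j)}} \left\lceil \frac{R_j + J^R_k + J^I_{k \to j}}{T_k} \right\rceil \cdot \begin{cases} \min\left\{ C_k + B_{k \to j},\ \beta \cdot d_L \cdot |\mathcal{L}^{CD}_{i,j}| \right\}, & \text{if } f_j \in \mathcal{F}^{D}_D(f_i) \\ C_k + B_{k \to j}, & \text{otherwise} \end{cases} \qquad (7)$$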
From both Equation 6 and Equation 7 we see that the additional buffering
interference is caused by each flow fk which is directly interfering with fj
downstream from the contention domain of fj and fi.
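Since $R_i$ appears on both sides of Equation 4, it is typically solved by fixed-point iteration. The sketch below is one possible way to do this, not the implementation used by SOTA or SOTA+. It assumes that the terms $J^I_{j \to i}$ and $B_{j \to i}$ have already been computed (via Equation 5 and Equation 6 or 7), that all quantities are integers, and that the iteration may stop once the deadline is exceeded; the dictionary keys and the numeric values are hypothetical.

```python
def wctt(c_i, interferers, deadline):
    """Iterate Equation 4 until a fixed point is reached.

    interferers: one dict per f_j in F_D(f_i) with keys C, T, JR, JI and B,
    where JI and B are assumed to be precomputed.
    Returns R_i, or None if the iteration exceeds the deadline.
    """
    r = c_i
    while True:
        r_next = c_i + sum(
            # Integer ceiling of (r + JR + JI) / T, avoiding floating point.
            -(-(r + f["JR"] + f["JI"]) // f["T"]) * (f["C"] + f["B"])
            for f in interferers)
        if r_next == r:
            return r
        if r_next > deadline:
            return None  # stopping rule assumed for this sketch
        r = r_next


higher = [{"C": 5, "T": 12, "JR": 0, "JI": 0, "B": 0}]
print(wctt(10, higher, deadline=100))
```

In this example the iteration visits 10, 15 and 20 before settling at 20, because the second release of the interfering flow falls inside the window once $R_i$ exceeds its period of 12.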
There are three limitations of SOTA and SOTA+:
– It is assumed that a flow causes/suffers direct interference during its entire
traversal, whereas interference may occur only while interfering/interfered
flits traverse the contention domain.
– The buffering interference is unconditionally considered for all difd and
difud flows, while in some cases it may not occur.
– In cases where buffering interference does occur, SOTA and SOTA+ may
substantially overestimate it.
All three of these issues are addressed in the next section, and as already
mentioned, in this work we will compare the proposed approach against both
SOTA and SOTA+.
6 Proposed Approach
In this section, we will present a novel method to compute the WCTT of
a traffic flow. First, we will start by analysing only difo flows (Section 6.1),
and then gradually extend the analysis (Sections 6.2-6.4) until covering all
interfering flow categories identified in the previous section.
6.1 Interference from difo $F^{\emptyset}_D$ flows
The common property of all difo flows of fi is that they can suffer interference
only from each other, i.e., there are no other flows which can cause
interference to them, but not to fi. This allows us to analyse the system in a very
similar way to uniprocessors, where each difo flow can be treated as a higher
priority task. Shi and Burns (2008b) noticed that in these scenarios indirect
interference cannot occur, and therefore interference jitters of all difo flows are
zero, i.e. $\forall f_j \in F^{\emptyset}_D(f_i) : J^I_{j \to i} = 0$ (see Equations 4-5).
As already mentioned, both approaches SOTA and SOTA+ consider that
flows cause/suffer interference during their entire traversal. First, we will prove
that the analysed flow cannot suffer interference from a higher-priority flow
while its flits are not within the contention domain (Lemma 1).
Lemma 1 Consider two flows fi and fj, and let $f_j \in F^{\emptyset}_D(f_i)$. Moreover, let
us assume that a packet of fi was released at the time instant 0. Flow fi cannot
suffer any interference from fj during the following time intervals:
$[0 : \gamma^{PRE}_{i,j}]$ and $[R_i - \gamma^{POST}_{i,j} : R_i]$, where $\gamma^{PRE}_{i,j}$ and $\gamma^{POST}_{i,j}$ can be
computed by solving Equation 8 and Equation 9, respectively.
$$\gamma^{PRE}_{i,j} = \left(\left|L^{PRE}_{i,j}\right| - 1\right) \cdot d_R + \left|L^{PRE}_{i,j}\right| \cdot d_L \qquad (8)$$
and
$$\gamma^{POST}_{i,j} = \left|L^{POST}_{i,j}\right| \cdot d_L \qquad (9)$$
Proof Proven directly. In order to suffer interference from fj, flits of fi need
to be inside the respective contention domain. After the release of fi, at least
$\gamma^{PRE}_{i,j}$ time units will pass before the header of fi reaches $\alpha(L^{CD}_{i,j})$ (the
first link on the contention domain of fi and fj). Therefore, during that time
fi cannot suffer interference from fj. A very similar reasoning can be applied
to the last $\gamma^{POST}_{i,j}$ time units after the tail of fi departs from $\omega(L^{CD}_{i,j})$ (the last
link on the contention domain of fi and fj) and reaches the destination. Upon
its departure from $\omega(L^{CD}_{i,j})$, fi cannot suffer any interference from fj. ⊓⊔
Fig. 4 Detailed analysis of a single preemption, where P1 > P2, dR = 2 · dL and β ≥ 4
To summarise, fi cannot suffer interference from fj before its header flit
reaches the contention domain and after its tail flit departs from the contention
domain. Terms $\gamma^{PRE}_{i,j}$ and $\gamma^{POST}_{i,j}$ provide lower bounds on the duration of
those intervals, respectively.
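As a small illustration of Equations 8 and 9, the two interference-free intervals can be evaluated directly from the link counts. This is only a sketch: the function and parameter names are hypothetical, the link counts $|L^{PRE}_{i,j}|$ and $|L^{POST}_{i,j}|$ are assumed to be given, and at least one link is assumed to precede the contention domain.

```python
def gamma_pre(n_pre_links, d_R, d_L):
    """Equation 8: time before the header of f_i can reach the contention domain.

    n_pre_links stands for |L^PRE_{i,j}| (assumed >= 1 in this sketch).
    """
    return (n_pre_links - 1) * d_R + n_pre_links * d_L

def gamma_post(n_post_links, d_L):
    """Equation 9: time the tail of f_i needs after leaving the contention domain.

    n_post_links stands for |L^POST_{i,j}|.
    """
    return n_post_links * d_L
```

For instance, with d_L = 1, d_R = 2, three links before the contention domain and two after it, gamma_pre evaluates to 2 · 2 + 3 · 1 = 7 and gamma_post to 2.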
Now, let us analyse the interference that a single packet of a higher-priority
flow can cause to the analysed packet. Lemma 2 provides an upper bound.
Lemma 2 Consider two flows fi and fj, and let $f_j \in F^{\emptyset}_D(f_i)$. The maximum
interference caused to fi by a single packet of fj can be at most $I_{j \to i}$, where
$I_{j \to i}$ can be computed by solving Equation 10.
$$I_{j \to i} = \overbrace{\sigma_j \cdot d_L}^{\text{single link traversal}} + \overbrace{\left(\left|L^{CD}_{i,j}\right| - 1\right) \cdot \min\{d_R,\ \beta \cdot d_L,\ \sigma_j \cdot d_L\}}^{\text{auto-buffering}} \qquad (10)$$
Proof Proven directly. Let us analyse the scenario where a single higher-
priority packet causes interference to another packet. Figure 4 illustrates such
a scenario. Figure 4(a) shows the moment when the header of f1 started being
routed in router ρ3, and in the next 2 · dL intervals, 2 flits of f1 arrive in ρ3.
During the next dL interval, the header finally progresses to ρ4, while another
flit comes to ρ3 (illustrated in Figure 4(b)).
As observed, f1 can cause interference to f2 while all of its flits traverse
a single shared link (the first term in Equation 10). Moreover, due to the
routing delay, f1 may cause a self-imposed buffering, hereafter called auto-
buffering, in each traversed router. Buffered flits are important, because while
they are being stalled, e.g. due to the routing of f1, already preempted flits
of f2 may reach those routers, and get preempted again when buffered flits
start progressing. Thus, buffered flits may cause the interference twice (before
being buffered and after being de-buffered), while other flits can cause the
interference only once. From the perspective of f2, the auto-buffering of f1
matters only inside routers between $\alpha(L^{CD}_{1,2})$ and $\omega(L^{CD}_{1,2})$, e.g. only a single
router ρ3 in Figure 4. The auto-buffering of f1 in the shared router before
the contention domain need not be considered, because the routing of f1 in
ρ2 does not affect the traversal of f2, and f1 affects f2 only when it becomes
eligible for transmission (starts competing for a common output link). Also, the
auto-buffering of f1 in the shared router after the contention domain (ρ4) need
not be considered, because f1 and f2 do not compete for a common output
link any more. Therefore, the maximum auto-buffering interference from f1 to
f2 is equal to $\left(\left|L^{CD}_{i,j}\right| - 1\right) \cdot d_R$, which corresponds to the cumulative routing
delay of the header flit along the contention domain. If the routing delay inside
a single router exceeds the time it takes to fill/empty a corresponding buffer,
the auto-buffering interference is then limited by the time it takes to fill/empty
the entire buffer, $\left(\left|L^{CD}_{i,j}\right| - 1\right) \cdot \beta \cdot d_L$. Finally, if a packet is so small that it
can entirely fit inside a single buffer, the auto-buffering interference is limited
by its traversal across the contention domain in a store-and-forward manner,
$\left(\left|L^{CD}_{i,j}\right| - 1\right) \cdot \sigma_j \cdot d_L$.
Thus, the total interference that a packet of f1 can cause to a packet of
f2 can be computed by summing up the interference caused by all flits of
f1 traversing a single link, and adding the auto-buffering interference (Equa-
tion 10). ⊓⊔
Note that Equation 10 is often simplified with the assumption dL = dR = 1,
and then it becomes Equation 11.
$$I_{j \to i} = \sigma_j + \left|L^{CD}_{i,j}\right| - 1 \qquad (11)$$
A further common simplification is dL = 1 and dR = 0, and then
Equation 10 becomes Equation 12.
$$I_{j \to i} = \sigma_j \qquad (12)$$
Also note that in architectures where, for a given output port, the header
routing of one flow and the flit transfers of other flows cannot be performed in
parallel, Equation 10 becomes Equation 13.
$$I_{j \to i} = \sigma_j \cdot d_L + \left|L^{CD}_{i,j}\right| \cdot d_R \qquad (13)$$
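The per-packet bound of Equation 10 and its two common simplifications can be checked numerically. The sketch below is illustrative only; the function name and argument names are assumptions, with `n_cd_links` standing for $|L^{CD}_{i,j}|$ and `sigma_j` for the packet size in flits.

```python
def packet_interference(sigma_j, n_cd_links, d_R, d_L, beta):
    """Equation 10: maximum interference of one packet of f_j on f_i."""
    single_link_traversal = sigma_j * d_L
    # Per-router auto-buffering is capped by the routing delay, the time to
    # fill one buffer, and the time to move the whole packet over one link.
    auto_buffering = (n_cd_links - 1) * min(d_R, beta * d_L, sigma_j * d_L)
    return single_link_traversal + auto_buffering
```

With d_L = d_R = 1 the result reduces to sigma_j + n_cd_links - 1 (Equation 11), and with d_R = 0 it reduces to sigma_j (Equation 12), provided that beta and sigma_j are at least 1.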
Now, Lemma 3 proves that the findings of Lemma 2 can be extended to
three flows.
Lemma 3 Consider three flows fi, fj and fk, and let $F^{\emptyset}_D(f_i) = \{f_j, f_k\}$, and
$P_k > P_j$. The interference caused to the packet of fi by single packets of fj
and fk can be at most $I_{j \to i} + I_{k \to i}$.
Proof Proven directly. If packets of fj and fk cannot interfere with each other
(e.g. f1 and f2 in Figure 5(a)), or can interfere (e.g. f1 and f2 in Figure 5(b))
but traverse at distinctive time intervals, then the results of Lemma 2 apply
and the maximum interference that f3 can suffer is equal to I1→3 + I2→3.
Let us consider the case where the higher-priority flows interfere with each
other. First, let f2 cause interference to f3, while f1 has not been released yet
(Figure 6(a)). Now, let f1 preempt f2 and cause its buffering (Figure 6(b)).
Fig. 5 Example of 3 interfering flows, where P1 > P2 > P3
Fig. 6 Detailed contention analysis of 3 flows, where P1 > P2 > P3, dR = 2 · dL and β = 4
After f2 is fully buffered, f3 is allowed to progress until $\alpha(L^{CD}_{1,3})$ and then it
also gets preempted by f1 (Figure 6(c)). After f1 stops interfering with f2 and
f3 (Figure 6(d)), the former continues its progress, and after that f3 finally
completes its transfer.
Of interest is the interval while f1 traverses. Notice, that its traversal can
cause the preemption and buffering of f2, where each flit of f1 can cause the
buffering of exactly one flit of f2 (that is, an interfering flit of f1 introduces a
disruption into the pipeline of f2 for exactly one flit traversal time dL). Also
notice, that flits of f1 which cause buffering of f2 cannot at the same time
cause interference to f3, due to (i) the flow positioning, and (ii) the fact that
f1 is progressing and f2 is being buffered. Thus, every flit of f1 can either cause
a buffering of f2 or a direct interference to f3, but not both. This means that
if f1 boosts the buffering interference that f2 causes to f3 by x time units, the
interference that f1 itself can cause to f3 is at most $I_{1 \to 3} - x$. This implies that
the maximum interference a packet of fi can suffer from individual packets of
two difo flows fj and fk is at most the sum of the maximum interferences that
they can individually cause. ⊓⊔
In order to generalise the findings of Lemmas 2-3, we have to prove that
fa, which is a difo flow of the analysed flow fi, cannot generate interference
greater than $I_{a \to i}$, even though it may cause simultaneous buffering of multiple
other difo flows of fi with intermediate priorities. This case is covered with
Lemma 4.
Fig. 7 Example of 4 flows with simultaneous buffering, where P1 > P2 > P3 > P4, dR =
2 · dL and β = 4
Lemma 4 Consider the flow-set $F = \{f_1, f_2, ..., f_{i-1}, f_i\}$, where $P_1 > P_2 >
... > P_{i-1} > P_i$ and $F^{\emptyset}_D(f_i) = \{f_1, f_2, ..., f_{i-2}, f_{i-1}\}$. Moreover, consider that
each flow releases a single packet. Any higher-priority flow fa cannot generate
interference to fi (either by preempting, or causing the buffering of other flows)
larger than $I_{a \to i}$.
Proof Proven directly. We claim that, although fa can cause simultaneous
buffering of multiple flows, it cannot induce interference to fi larger than
Ia→i. Consider an example illustrated in Figure 7. Notice that f1 causes the
buffering of f2 which in turn causes the buffering of f3. Although the traversal
of one flit of f1 causes two buffered flits (one of f2 and one of f3), this does
not imply that one flit of f1 generated two flits of interference to f4. In fact,
due to necessary flow positioning to invoke this scenario, as well as the fact
that f2 and f3 are being simultaneously buffered, flits of f4 could not suffer
interference from flits of f2 before those flits get buffered (due to f3 being
buffered simultaneously with it), and hence these buffered flits of f2 can cause
interference to f4 only once, while being de-buffered. Thus, buffering of f2 has
no effect on f4. Therefore, although the traversal of f1 can cause simultaneous
buffering of f2 and f3, the buffering of the former is irrelevant for f4, while
the effects of buffering of the latter have been covered with Lemma 3. ⊓⊔
Now we generalise the findings of the aforementioned lemmas with Theo-
rem 1.
Theorem 1 Consider the flow-set $F = \{f_1, f_2, ..., f_{n-1}, f_n\}$. Let fi be the
analysed flow, and let $F_D(f_i) = F^{\emptyset}_D(f_i)$, i.e., fi has only difo flows. The maximum
interference that fi can suffer is at most $I^{\emptyset}_i$, where $I^{\emptyset}_i$ can be computed by
solving Equation 14.
$$I^{\emptyset}_i = \sum_{\forall f_j \in F^{\emptyset}_D(f_i)} \left\lceil \frac{R_i + J^R_j - \gamma^{PRE}_{i,j} - \gamma^{POST}_{i,j}}{T_j} \right\rceil \cdot I_{j \to i} \qquad (14)$$
Proof Proven directly. If each flow emits only a single packet, then the proof
straightforwardly follows from Lemmas 1-4. Now, let us analyse the scenario
where each flow releases multiple packets. This case can be perceived as if each
flow is a set of unrelated same-priority flows, each releasing a single packet.
Due to constrained deadlines, if a flow-set is schedulable, at any time instant
there will be at most one flow of each priority. Lemmas 1-4 hold for that model,
and therefore hold for the model assumed in this work. ⊓⊔
Note that Equation 14 includes the term Ri, which for now can be considered
a constant; once all interference scenarios are covered, we will show
how to compute it (Equation 27).
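To make the effect of the $\gamma$ terms in Equation 14 concrete, the sum can be evaluated for a fixed value of Ri. This is a minimal sketch under stated assumptions: the function name and dictionary keys are hypothetical, Ri is passed in as a constant, and the per-packet interference of each flow (Equation 10) is assumed to be precomputed.

```python
import math

def difo_interference(R_i, difo_flows):
    """Equation 14 for a given R_i.

    difo_flows: list of dicts with the hypothetical keys
      "T" (period), "JR" (release jitter),
      "gamma_pre", "gamma_post" (Equations 8 and 9),
      "I" (per-packet interference, Equation 10).
    """
    total = 0
    for f in difo_flows:
        # Only the part of the traversal spent inside the contention
        # domain is exposed to packets of this flow.
        window = R_i + f["JR"] - f["gamma_pre"] - f["gamma_post"]
        total += math.ceil(window / f["T"]) * f["I"]
    return total
```

For example, with Ri = 35 and one flow with T = 30, no release jitter, gamma_pre = 7, gamma_post = 2 and I = 10, the window is 26 and a single packet is counted, giving 10. Without the two gamma terms the window would be 35 and two packets would be counted.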
6.2 Interference from difu $F^{U}_D$ flows
Flows with only upstream indirect interference are very similar to the previ-
ously covered category of interfering flows. In fact, upstream indirect interfer-
ence can cause the following three effects, of which only the first two are of
relevance for analysed flow fi:
– Due to upstream indirectly interfering flows, each flow $f_j \in F^{U}_D(f_i)$ can
appear and preempt fi twice during a time interval which is shorter than
Tj. In the literature, this is often called back-to-back interference.
– Due to upstream indirectly interfering flows, each flow $f_j \in F^{U}_D(f_i)$ can be
divided into smaller pieces (sub-packets) that are not mutually pipelined.
We call this packet splitting.
– Due to upstream indirectly interfering flows, each flow $f_j \in F^{U}_D(f_i)$ can be
buffered inside routers which are upstream of $\alpha(L^{CD}_{i,j})$ on the path of fj,
but that has no effect on fi.
The first effect is taken into account by introducing the interference jitter
component $J^I_{j \to i}$, as identified in the initial work of Shi and Burns (2008b). In
this work, we treat the indirect interference in the same way as all
state-of-the-art approaches, by modelling it with the interference jitter (Equation 5).
Now, let us analyse the second effect and check if packet splitting of higher-
priority flows affects the analysed flow. This is covered with Lemma 5.
Lemma 5 Consider two flows fi and fj and let $F^{U}_D(f_i) = \{f_j\}$. The maximum
interference that a single packet of fj can cause to fi is the same as if
fj were a directly interfering flow without the indirect interference, and it
is equal to $I_{j \to i}$, where $I_{j \to i}$ can be computed by solving Equation 10.
Proof Proven directly. The only difference of the packet of fj from any packet
of any flow from $F^{\emptyset}_D$ is that it can be split into several pieces (sub-packets).
Let us assume that it was split into x sub-packets of equal size $\sigma'_j$, that is
$\sigma_j = x \cdot \sigma'_j$. Now, let us compute the total interference that these x
sub-packets can cause. For clarity purposes, in the following computation we will
assume that $d_R < \min\{\beta \cdot d_L, \sigma_j \cdot d_L\}$.
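$$\sum_{j'=1,...,x} I_{j' \to i} = \overbrace{\sigma'_j \cdot d_L + \left(\left|L^{CD}_{i,j}\right| - 1\right) \cdot d_R}^{\text{first sub-packet suffers routing overhead}} + \overbrace{(x - 1) \cdot \sigma'_j \cdot d_L}^{\text{remaining } (x-1) \text{ sub-packets do not suffer routing overhead}}$$
$$= x \cdot \sigma'_j \cdot d_L + \left(\left|L^{CD}_{i,j}\right| - 1\right) \cdot d_R = \sigma_j \cdot d_L + \left(\left|L^{CD}_{i,j}\right| - 1\right) \cdot d_R = I_{j \to i}$$
⊓⊔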
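The derivation in the proof of Lemma 5 can be replayed numerically. The sketch below is only an illustration of that arithmetic, under the same assumption used in the proof (the routing delay is the smallest of the three auto-buffering caps) and the additional assumption that the packet splits into x equal sub-packets; the function and parameter names are hypothetical.

```python
def split_packet_interference(sigma_j, x, n_cd_links, d_R, d_L):
    """Total interference of x equal sub-packets of one packet of f_j.

    Follows the proof of Lemma 5: only the first sub-packet pays the
    routing overhead along the contention domain.
    """
    if x < 1 or sigma_j % x != 0:
        raise ValueError("sketch assumes x >= 1 equal-sized sub-packets")
    sub_size = sigma_j // x
    first = sub_size * d_L + (n_cd_links - 1) * d_R
    remaining = (x - 1) * sub_size * d_L
    return first + remaining
```

For a packet of 12 flits, a contention domain of 3 links, d_R = 2 and d_L = 1, every admissible split (x = 1, 2, 3, 4, 6, 12) yields the same total of 12 + 2 · 2 = 16, which is the unsplit bound.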
Now, we can formulate the maximum interference that the analysed flow
can suffer from both difo and difu flows (Theorem 2).
Theorem 2 Consider the flow-set $F = \{f_1, f_2, ..., f_{n-1}, f_n\}$. Let fi be the
analysed flow, and let $F_D(f_i) = F^{\emptyset}_D(f_i) \cup F^{U}_D(f_i)$, i.e., fi has only difo and difu
flows. The maximum interference that fi can suffer is at most $I^{\emptyset}_i + I^{U}_i$, where
$I^{\emptyset}_i$ can be computed by solving Equation 14, and $I^{U}_i$ can be computed by solving
Equation 15.
$$I^{U}_i = \sum_{\forall f_j \in F^{U}_D(f_i)} \left\lceil \frac{R_i + J^R_j + \overbrace{(R_j - C_j)}^{J^I_{j \to i}} - \gamma^{PRE}_{i,j} - \gamma^{POST}_{i,j}}{T_j} \right\rceil \cdot I_{j \to i} \qquad (15)$$
Proof Proven directly. From Lemma 5 it follows that individual packets of difu
flows can cause the same amount of interference as individual packets of difo
flows. By applying the existing results on the indirect interference modelling
(Shi and Burns (2008b)), we can take into account the indirect interference
as the interference jitter (Equation 5). Finally, similar to Theorem 1, we can
elevate the conclusions from a packet level to a flow level. ⊓⊔
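Equation 15 differs from Equation 14 only in the interference jitter $R_j - C_j$ added to the window. The following sketch evaluates it for a fixed Ri; as before, the function name and dictionary keys are assumptions for illustration, and the WCTT of every higher-priority flow is assumed to be already known.

```python
import math

def difu_interference(R_i, difu_flows):
    """Equation 15 for a given R_i.

    difu_flows: list of dicts with the hypothetical keys
      "T", "JR", "R" (WCTT of f_j), "C" (basic latency of f_j),
      "gamma_pre", "gamma_post", "I" (Equation 10).
    """
    total = 0
    for f in difu_flows:
        interference_jitter = f["R"] - f["C"]  # Equation 5 for difu flows
        window = (R_i + f["JR"] + interference_jitter
                  - f["gamma_pre"] - f["gamma_post"])
        total += math.ceil(window / f["T"]) * f["I"]
    return total
```

With Ri = 35, T = 30, gamma_pre = 7, gamma_post = 2 and I = 10, a flow with Rj = 12 and Cj = 8 gives a window of exactly 30 and one counted packet, whereas Rj = 13 pushes the window to 31 and a second (back-to-back) packet is counted.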
Note that in the presence of both difo and difu flows, if individually
observed, difo flows may indeed cause more interference to the analysed flow fi
than $I^{\emptyset}_i$, and similarly difu flows may cause more interference than $I^{U}_i$. This
is because of the additional buffering difus can cause to difos and vice versa.
However, the total interference that they can jointly cause to fi cannot exceed
the sum of individual terms, as proven by Theorem 2.
6.3 Interference from difd $F^{D}_D$ flows
Unlike previously covered interfering flow categories, difd flows can cause
additional interference to the analysed flow. In both SOTA and SOTA+, the
additional interference that a difd flow fj can cause to fi is
modelled with the term $B_{j \to i}$ (see Equation 4, Equation 6 and Equation 7).
In SOTA and SOTA+ it is always assumed that each difd flow will cause
this additional buffering interference, although that may not always be possible.
Therefore, in this section we will first derive the conditions which are necessary
for this additional interference to occur.
Fig. 8 Example of flows with downstream indirect interference, where
min{P1, P2, P3, P4, P5, P6} > P7 > P8
6.3.1 Necessary conditions for additional buffering interference
Figure 8 illustrates traffic flows where f8 may suffer additional buffering
interference from flow f7 due to flows {f1, f2, f3, f4, f5, f6}. Now, we analyse the
conditions which cause buffering interference from f7 to f8. A starting
assumption is that the WCTTs of all flows with priorities higher than that of the
analysed flow are already computed (in our example R1, R2, R3, R4, R5, R6 and
R7). We perform the assessment by executing Algorithm 1 for our example.
Starting from $\omega(L^{CD}_{7,8})$, get the first link p downstream on the path of f7. Check
if there exists a flow fk which traverses p, such that $f_k \in F_D(f_7) \wedge f_k \notin F_D(f_8)$
(lines 2-3 in Algorithm 1). In the example from Figure 8 it is f6. This flow
is added to the set of flows $\overline{F}_D(f_7)$, which contains all flows interfering with
f7 downstream of $L^{CD}_{7,8}$ (line 4 in Algorithm 1). If multiple flows fulfil this
condition, all are added to the aforementioned set.
Then, the number of intermediate routers is computed (line 6 in Algorithm 1).
In our example it is only one router ρ3. Now, we test if f7 can be
fully buffered inside ρ3 by evaluating the following condition:
Condition 1: $numRouters \cdot \beta \geq \sigma_7$
If Condition 1 is fulfilled, then f7 cannot cause buffering interference to f8
as a consequence of its downstream indirect interference, and therefore it can
be treated as a flow with only upstream indirect interference (lines 7-8 in
Algorithm 1). If Condition 1 is not fulfilled, it should be tested whether f6 can
generate enough interference to cause buffering of f7 all the way upstream to
$L^{CD}_{7,8}$ (lines 11-14 in Algorithm 1). This is tested with Condition 2:
Condition 2: $numRouters \cdot \beta \cdot d_L \geq \sum_{\forall f_k \in \overline{F}_D(f_7)} inf(f_k, f_7, p)$,
where $inf(f_k, f_7, p)$ represents the maximum interference that fk may
cause to f7 on the part of the path currently analysed by Algorithm 1 (until
the link p, including it). This value can for now be considered as a constant,
and after covering all flow interference scenarios, we will explain how it can
be obtained (Equation 26). In our example, $\overline{F}_D(f_7)$ has only one flow f6.
Algorithm 1: BufferingInterferenceExists(fi, fj)
input : fi, fj
 1  $\overline{F}_D(f_j) = \emptyset$;  // Initialise set of flows interfering with fj downstream of $L^{CD}_{i,j}$
 2  foreach ($p \in L_j : order(p, L_j) > order(\omega(L^{CD}_{j,i}), L_j)$) do
 3      foreach ($f_k \in F_D(f_j) : p \in L_k \wedge f_k \notin F_D(f_i) \wedge f_k \notin \overline{F}_D(f_j)$) do
 4          add($f_k$, $\overline{F}_D(f_j)$);  // Add to flows interfering with fj downstream of $L^{CD}_{i,j}$
 5      end
 6      numRouters = $order(p, L_j) - order(\omega(L^{CD}_{j,i}), L_j)$;  // Routers for buffering
 7      if ($numRouters \cdot \beta \geq \sigma_j$) then
 8          return false;  // Condition 1 fulfilled, buffering cannot occur
 9      else
10          downstreamInf ← 0;  // Variable to compute interference from $\overline{F}_D(f_j)$
11          foreach ($f_k \in \overline{F}_D(f_j)$) do
12              downstreamInf ← downstreamInf + $inf(f_k, f_j, p)$;
                // $inf(f_k, f_j, p)$ is the maximum interference from fk to fj until p
13          end
14          if ($numRouters \cdot \beta \cdot d_L \geq downstreamInf$) then
                // Condition 2 fulfilled, still inconclusive
15          else
16              return true;  // Condition 2 failed, buffering occurs
17  end
18  return false;  // Condition 2 always fulfilled, buffering cannot occur
If Condition 2 is not fulfilled, then f7 can cause additional buffering
interference to f8 and in the next section we will show how to compute it (line 16
of Algorithm 1). Conversely, if Condition 2 is fulfilled, that means only that
f6 alone cannot cause buffering of f7 to affect f8. Thus, this is not a sufficient,
but only a necessary condition for the absence of buffering interference, and
in order to deduce if buffering interference does exist, it is necessary to check
further downstream on the path of f7.
The process continues by searching further downstream on the path of f7 for
new flows that may cause buffering of f7 (a new loop at line 2 in Algorithm 1).
In our example, there exist 2 new flows, f4 and f5. Then, Condition 1 is tested
again (for routers ρ3 and ρ4), and if it is fulfilled the buffering cannot occur.
Otherwise, it is tested whether Condition 2 is fulfilled (this time
$\overline{F}_D(f_7) = \{f_6, f_5, f_4\}$). If it is not fulfilled, the algorithm stops, because f7 can cause
buffering interference to f8. If it is fulfilled, new iterations are started, covering
bigger and bigger portions of the path of f7. If Algorithm 1 investigates the
entire path of f7 and has Condition 2 always fulfilled, this means that f7
cannot cause additional buffering interference to f8 (line 18 in Algorithm 1).
In summary, by iteratively applying Algorithm 1 to all downstream links on
the path of a preempting flow, starting with the first link after the contention
domain:
– If Condition 1 is fulfilled at least once ⇒ no buffering interference.
– If Condition 2 is always fulfilled ⇒ no buffering interference.
– Otherwise ⇒ buffering interference exists.
Fig. 9 Buffering interference check for flows f7 and f8 from Figure 8
Figure 9 demonstrates how Algorithm 1 is applied to the example of flows
from Figure 8.
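The control flow of Algorithm 1 can be sketched in a few lines of Python. This is an illustrative sketch and not the authors' implementation: the topology is abstracted into an ordered list of the links of fj after the contention domain, the mapping `new_flows_on` (link to the flows of $F_D(f_j)$ that are not in $F_D(f_i)$ and traverse that link) and the callable `inf` (the maximum interference from fk to fj up to link p) are hypothetical inputs assumed to be available.

```python
def buffering_interference_exists(downstream_links, new_flows_on, inf,
                                  beta, sigma_j, d_L):
    """Sketch of Algorithm 1 for one pair (f_i, f_j).

    downstream_links: links of f_j after the contention domain, in path order.
    new_flows_on:     dict link -> flows that interfere with f_j (but not f_i)
                      on that link (hypothetical precomputed input).
    inf:              callable (f_k, p) -> max interference of f_k on f_j
                      until link p (hypothetical precomputed input).
    """
    downstream_flows = []  # flows interfering with f_j downstream so far
    for num_routers, p in enumerate(downstream_links, start=1):
        for f_k in new_flows_on.get(p, ()):
            if f_k not in downstream_flows:
                downstream_flows.append(f_k)
        # Condition 1: f_j fits entirely into the intermediate routers.
        if num_routers * beta >= sigma_j:
            return False
        # Condition 2: downstream interference cannot fill those buffers.
        downstream_inf = sum(inf(f_k, p) for f_k in downstream_flows)
        if num_routers * beta * d_L < downstream_inf:
            return True   # Condition 2 failed, buffering occurs
    return False          # Condition 2 held on the whole path
```

A usage example loosely modelled on Figure 8, with made-up interference values: for links `["p1", "p2", "p3"]`, `new_flows_on = {"p1": ["f6"], "p2": ["f5", "f4"]}`, beta = 4, sigma_j = 20, d_L = 1 and every flow contributing 3 time units, Condition 2 holds on the first link (4 ≥ 3) but fails on the second (8 < 9), so the function returns True.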
6.3.2 Interference from $F^{DN}_D$ flows (no additional buffering interference)
We have seen in the previous section that difd flows for which Algorithm 1
returns a negative reply cannot cause additional buffering interference. This
implies that these flows can be treated as difu flows in the worst-case analysis.
Therefore, we can formulate the method to compute the WCTT of a flow
in the presence of difo, difu and difd flows for which Algorithm 1 returns a
negative reply (Theorem 3). Let us first divide difd flows into two categories:
those for which the buffering interference cannot occur, $F^{DN}_D(f_i)$, and those for
which the buffering interference can occur, $F^{DB}_D(f_i)$.
Theorem 3 Consider the flow-set $F = \{f_1, f_2, ..., f_{n-1}, f_n\}$. Let fi be the
analysed flow, and let $F_D(f_i) = F^{\emptyset}_D(f_i) \cup F^{U}_D(f_i) \cup F^{D}_D(f_i)$, i.e., fi has difo, difu
and difd flows. Moreover, let $F^{D}_D(f_i) = F^{DN}_D(f_i)$, i.e. Algorithm 1 produces a
negative result for each difd flow. The maximum interference that fi can suffer
is at most $I^{\emptyset}_i + I^{U}_i + I^{DN}_i$, where $I^{\emptyset}_i$ can be computed by solving Equation 14, $I^{U}_i$
can be computed by solving Equation 15, and $I^{DN}_i$ can be computed by solving
Equation 16.
$$I^{DN}_i = \sum_{\forall f_j \in F^{DN}_D(f_i)} \left\lceil \frac{R_i + J^R_j + \overbrace{(R_j - C_j)}^{J^I_{j \to i}} - \gamma^{PRE}_{i,j} - \gamma^{POST}_{i,j}}{T_j} \right\rceil \cdot I_{j \to i} \qquad (16)$$
Proof Follows directly from Theorem 2, Algorithm 1 and the discussion in
Section 6.3.1. ⊓⊔
6.3.3 Interference from $F^{DB}_D$ flows (with additional buffering interference)
In this section we focus on difd flows for which Algorithm 1 returned a positive
reply. These flows cause additional buffering interference to the analysed flow,
and they were termed $F^{DB}_D$ in the previous section. Notice that flits of
$f_j \in F^{DB}_D(f_i)$ which cause the buffering interference to fi start grouping in the
routers between $\omega(L^{CD}_{i,j})$ and $\alpha(L^{CD}_{i,j})$, from back to front (from the former to
the latter). Given this observation, let us investigate what is the maximum
interference that already buffered flits of fj can cause to fi (Lemma 6).
Lemma 6 Consider two flows fi and fj. Let $f_j \in F^{DB}_D(f_i)$. The maximum
interference that fi can suffer from n already buffered flits of fj cannot exceed
the traversal time of those flits through a single link (i.e. $n \cdot d_L$), regardless of
their traversal pattern.
Proof Proven directly. We analyse two scenarios:
Scenario (i): The n buffered flits leave the contention domain one by one,
with enough cycles between departures to stabilise the contention domain
(no flits can progress), thus effectively preventing the pipelined departure of
buffered flits.
Figure 10 gives a detailed view of this scenario. The highest priority flow f1
preempts f2 with 4 flits, and then has a pause for the duration of one dL due to
its upstream interference (not illustrated in Figure 10). Let us assume that this
pattern repeats indefinitely. This allows only a single flit of f2 to depart from
$L^{CD}_{2,3}$, thus effectively preventing buffered flits of f2 from establishing a pipeline.
Notice that after 5 · dL time units the system returns to the initial state.
During this time, flow f3 suffered interference for only one dL, i.e. its arrival
at the destination router has been disrupted only during one link traversal
time (at the moment 3 · dL after the initial state there is no flit of f3 in ρ4).
This corresponds exactly to one flit of f2 leaving $L^{CD}_{2,3}$. This process could
continue until either of the flows completes its transfer. Regardless, we see
that for this scenario the claim of Lemma 6 holds.
Scenario (ii): The n buffered flits leave the contention domain as soon as
possible, in a pipelined manner.
Figure 11 gives a detailed view of this scenario. After the existing flits of
f1 leave the network, no new flits of f1 will appear. This will cause a pipelined
transmission of flits of f2. The full pipeline of f2 is established 4 · dL time
units after the initial state. During that time, 2 flits of f2 departed from $L^{CD}_{2,3}$,
and that corresponds to the interference which f3 suffered during that time
(its arrival at ρ4 was interrupted for 2 · dL time units). After that moment,
f2 continues its pipelined transmission, and f3 can suffer the interference of
at most one flit transmission time per each flit of f2 departing from $L^{CD}_{2,3}$.
Therefore, the claim of Lemma 6 holds for this scenario as well. ⊓⊔
Notice that even if f1 would appear again in ρ4, exactly 3 · dL time cycles
after that the system would return back to the initial state, and the interference
suffered by f3 during the entire interval of observation would be again equal
to the number of flits of f2 which departed from the contention domain, as
proven with Lemma 6.
Fig. 10 Buffered flits of f2 individually depart from $L^{CD}_{2,3}$, where P1 > P2 > P3 and β = 4
(panels: (a) initial state; (b)-(e) 2 · dL, 3 · dL, 4 · dL and 5 · dL time units after the
initial state, the last being identical to the initial state)
So far, we have seen that buffered flits grow from the end of the contention
domain towards the beginning, and that each buffered flit can cause the
interference of at most 1 · dL. Now we will discuss the three upper bounds on the
amount of buffering interference that fi suffers from $f_j \in F^{DB}_D(f_i)$, namely
the size bound $B^{S}_{j \to i}$, the interference bound $B^{I}_{j \to i}$ and the buffer bound $B^{B}_{j \to i}$.
Size bound $B^{S}_{j \to i}$
This bound is developed around the observation that the maximum addi-
tional buffering interference that a flow can cause is limited by its size (Theo-
rem 4).
Theorem 4 Consider two flows fi and fj. Let $f_j \in F^{DB}_D(f_i)$. The maximum
buffering interference caused by fj to fi has an upper bound which is equal to
the traversal across a single link of all of its flits, except those that could not
be buffered inside the routers within $L^{CD}_{i,j}$.
Fig. 11 Buffered flits of f2 depart from $L^{CD}_{2,3}$ in pipeline, where P1 > P2 > P3 and β = 4
(panels: (a) initial state; (b)-(d) 2 · dL, 3 · dL and 4 · dL time units after the initial state)
Proof Follows straightforwardly from the previous discussion and Lemma 6.
The number of potentially buffered flits is equal to the size of the flow, reduced
(at least) by what could be buffered in one router. This is because fj has
downstream indirect interference, and hence its flits could be buffered in the
router immediately after $L^{CD}_{i,j}$, without impacting fi. Thus, the maximum
buffering interference has an upper bound equal to $B^{S}_{j \to i}$ (Equation 17).
$$B^{S}_{j \to i} = (\sigma_j - \beta) \cdot d_L \qquad (17)$$
⊓⊔
Interference bound $B^{I}_{j \to i}$
Now we focus on the second bound, which has been identified by Xiong
et al (2017). This bound is developed around the observation that buffering
interference which fi suffers from $f_j \in F^{DB}_D(f_i)$ occurs due to the existence of
flow fk, which causes downstream indirect interference to fi via fj. Thus, one
flit of fj is buffered when one flit of fk progresses. Therefore, the maximum
buffering interference of fj is limited by the maximum interference that fj itself
can suffer from fk and from other flows which are interfering indirectly downstream.
We call this the bound by interference, and it is equal to $B^{I}_{j \to i}$ (Equation 18).
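$$B^{I}_{j \to i} = \sum_{\substack{\forall f_k \in F_D(f_j) \,\wedge\, f_k \notin F_D(f_i) \,\wedge \\ order(\alpha(L^{CD}_{j,k}), L_j) > order(\omega(L^{CD}_{i,j}), L_j)}} \left\lceil \frac{R_j + J^R_k + J^I_{k \to j} - \gamma^{PRE}_{j,k} - \gamma^{POST}_{j,k}}{T_k} \right\rceil \cdot (I_{k \to j} + B_{k \to j}) \qquad (18)$$
where
$$B_{k \to j} = \begin{cases} B^{DB}_{k \to j}, & \text{if } f_k \in F^{DB}_D(f_j) \\ B^{U+DB}_{k \to j}, & \text{otherwise} \end{cases} \qquad (19)$$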
(19)
Notice that this bound requires the buffering interference from fk to fj to be
already computed (see the term $B_{k \to j}$ in the above equations). Similar
to Rj, for now this term, as well as its individual parts in Equation 19, can be
considered as constants, and once we cover all interference scenarios, we will
show how to obtain them (Equation 21 and Equation 24).
Buffer bound $B^{B}_{j \to i}$
Now we propose the third bound, which we call the buffer bound. It is
developed around the following observation. Since every $f_j \in F^{DB}_D(f_i)$ has
only downstream indirect interference, there exists no upstream interfering
flow of fj which could potentially interrupt the supply of fj's flits into the
contention domain. This is exploited in Theorem 5.
Theorem 5 Consider two flows fi and fj. Let $f_j \in F^{DB}_D(f_i)$. The maximum
buffering interference caused by fj to fi has an upper bound which is equal
to the traversal across a single link of its flits which could be simultaneously
buffered inside the routers within $L^{CD}_{i,j}$.
Proof Proven directly. From Lemma 6 we know that each buffered flit can
cause additional interference of at most one dL. We also know that buffering
starts at the end of the contention domain and grows towards the beginning.
Since fj does not have upstream indirect interference, it cannot be split into
separate sub-packets, but travels continuously, unless preempted by another
directly interfering flow, which does not have any effect (as discussed before).
While it traverses, the number of buffered flits may vary between the full
buffers along the contention domain, when the flow does not progress (see
flow f2 in Figure 10(a) and Figure 11(a)), and all buffers along the contention
domain having one flit less than the capacity, when the flow does progress in
a pipelined manner (see flow f2 in Figure 11(d)).
With Lemma 6 it was proven that no additional interference is generated
during these fluctuations of the buffer occupancy along the contention domain,
but each departed flit causes the interference of exactly one flit traversal. Let
us analyse the time interval between two initial states. Assume that during
that time n flits of fj departed from the contention domain, and hence cause
the buffering interference of n ·dL to fi. However, notice that when the system
again reaches the initial state, the departed flits were already replaced by the
new ones, which have not caused any interference so far. This implies that each
of those flits can cause interference for at most one flit traversal. By continuing
this reasoning, we see that with several consecutive fluctuations of buffered
flits, eventually all flits that could cause interference twice left the contention
domain, and the buffers are now occupied with flits that have not caused any
interference yet. From this, we can conclude that only the first buffered flits
which could completely fill the buffers along the contention domain have the
possibility to cause interference twice, while any other flit that appears later
can cause the interference only once.
Note, that this holds only as long as buffer fluctuations do not cause more
substantial changes, the extreme case being an entire emptying/refilling of
buffers along the contention domain. For difd flows, the buffer fluctuations
are limited between the maximum occupancy (see f2 in Figure 11(a)) and
each buffer having one flit less than the maximum (see f2 in Figure 11(d)).
Therefore, the maximum buffering interference caused by $f_j$ to $f_i$ has an
upper bound which is equal to the traversal across a single link of flits of
$f_j$ which could be simultaneously buffered inside the shared routers within
$L^{CD}_{i,j}$, and it is equal to $B^{B}_{j \to i}$.
$$B^{B}_{j \to i} = \left( |L^{CD}_{i,j}| - 1 \right) \cdot \beta \cdot d_L \qquad (20)$$
□
Since all these bounds are safe, we can take the minimum of them:
$$B^{DB}_{j \to i} = \min\{B^{S}_{j \to i},\; B^{I}_{j \to i},\; B^{B}_{j \to i}\} \qquad (21)$$
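As a minimal sketch of Equations 20 and 21, the Python snippet below evaluates
the buffer bound and then takes the minimum of the three bounds. The function
and parameter names are illustrative assumptions, and the size bound and the
interference bound are treated as precomputed inputs, because their
definitions lie outside this passage.

```python
import math


def buffer_bound(cd_links, beta, d_l):
    """Equation 20: (|L_CD| - 1) * beta * d_L.

    cd_links is |L_CD|, the number of links in the contention domain,
    beta is the VC buffer size in flits (math.inf for unlimited buffers),
    d_l is the link traversal delay.
    """
    if cd_links < 1:
        raise ValueError("a contention domain contains at least one link")
    if cd_links == 1:
        return 0  # avoids evaluating 0 * inf when beta is unlimited
    return (cd_links - 1) * beta * d_l


def difd_buffering_interference(size_bound, interference_bound, buf_bound):
    """Equation 21: all three bounds are safe, so the smallest one is used."""
    return min(size_bound, interference_bound, buf_bound)
```

With unlimited buffers the buffer bound becomes infinite, so the minimum is
then decided by the other two bounds.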
Now, we can compute the interference that the analysed flow suffers from
difo, difu and difd flows (Theorem 6).
Theorem 6 Consider the flow-set $F = \{f_1, f_2, \ldots, f_{n-1}, f_n\}$. Let
$f_i$ be the analysed flow, and let
$F_D(f_i) = F^{\emptyset}_D(f_i) \cup F^{U}_D(f_i) \cup F^{D}_D(f_i)$, i.e.,
$f_i$ has difo, difu and difd flows. The maximum interference that $f_i$ can
suffer is at most $I^{\emptyset}_i + I^{U}_i + I^{DN}_i + I^{DB}_i$, where
$I^{\emptyset}_i$ can be computed by solving Equation 14, $I^{U}_i$ can be
computed by solving Equation 15, $I^{DN}_i$ can be computed by solving
Equation 16, and $I^{DB}_i$ can be computed by solving Equation 22.
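$$I^{DB}_i = \sum_{\forall f_j \in F^{DB}_D(f_i)} \left\lceil \frac{R_i + J^{R}_j + \overbrace{(R_j - C_j) - \gamma^{PRE}_{i,j} - \gamma^{POST}_{i,j}}^{J^{I}_{j \to i}}}{T_j} \right\rceil \cdot \left( I_{j \to i} + B^{DB}_{j \to i} \right) \qquad (22)$$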
Fig. 12 Example to show that $B^{B}$ is not safe for difud flows ($P_1 > P_2 > P_3 > P_4 > P_5$)
Proof Follows directly from Theorem 4, Theorem 5 and the discussion in
Section 6.3.3. □
6.4 Interference from difud $F^{U+D}_D$ flows
In this section, the focus is on directly interfering flows with both upstream
and downstream indirect interference. Similar to difd flows, Algorithm 1 can be
applied to assess whether the buffering interference can occur. Therefore, let
us divide difud flows into two groups: those for which the buffering
interference cannot occur, $F^{U+DN}_D$ (Algorithm 1 returns false), and those
for which the buffering interference can occur, $F^{U+DB}_D$ (Algorithm 1
returns true).
For the former category $F^{U+DN}_D$, the interference can be computed in a
similar way to difu (Equation 15) and difd (Equation 16) flows which have the
same Algorithm 1 result. Equation 23 covers this case.
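$$I^{U+DN}_i = \sum_{\forall f_j \in F^{U+DN}_D(f_i)} \left\lceil \frac{R_i + J^{R}_j + \overbrace{(R_j - C_j) - \gamma^{PRE}_{i,j} - \gamma^{POST}_{i,j}}^{J^{I}_{j \to i}}}{T_j} \right\rceil \cdot I_{j \to i} \qquad (23)$$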
For the latter category $F^{U+DB}_D$, both the size bound and the interference
bound can be applied. However, as indicated in the previous section, the buffer
bound cannot be used in this case. This is because upstream indirectly
interfering flows can cause severe fluctuations of the buffering along the
contention domain, the most drastic one being a complete emptying/refilling of
buffers along the contention domain. Therefore, the buffer bound in the general
case does not apply to flows from $F^{U+DB}_D$.
Here we provide an illustrative example to demonstrate this point (Figure 12).
Flow $f_5$ suffers interference from flow $f_4$, which has both upstream
(flow $f_1$) and downstream (flow $f_3$) indirect interference.
If we compute the buffer bound from $f_4$ to $f_5$, assuming $\beta = 5$, we
have that $B^{B}_{4 \to 5} = (|L^{CD}_{4,5}| - 1) \cdot \beta \cdot d_L = 15 \cdot d_L$.
When we consider the flow parameters given in Table 3¹, with $d_R = 0$ and
$d_L = 1$, and perform the simulations on a cycle-accurate simulator, we see
that the interference which $f_4$ causes to $f_5$ exceeds
$I_{4 \to 5} + B^{B}_{4 \to 5}$, which is 1015. In fact, our experiments
reported a response time of 3130 cycles for $f_5$, which implies that the
interference it can suffer from $f_4$ is 1125. This is because the existence
of $f_1$, $f_2$ and $f_3$ periodically causes a complete buffer filling and
emptying of flow $f_4$ along $L^{CD}_{4,5}$, and as discussed before, in
scenarios with such fluctuations of buffered flits, the buffer bound is not
safe.

¹ Other examples with different platform and workload parameters are also possible.

Table 3 Flow parameters for Figure 12
Flow   σ      J^R   D     T
f1     50     0     100   100
f2     50     0     100   100
f3     1000   0     ∞     ∞
f4     1000   0     ∞     ∞
f5     2000   0     ∞     ∞
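The arithmetic behind this counter-example can be laid out step by step. The
values $|L^{CD}_{4,5}| - 1 = 3$ and $I_{4 \to 5} = 1000$ are not stated
explicitly in the text; they are inferred here from the reported results
$15 \cdot d_L$ and 1015, so they should be read as assumptions.

```latex
\begin{align*}
B^{B}_{4 \to 5} &= \left(|L^{CD}_{4,5}| - 1\right) \cdot \beta \cdot d_L
                 = 3 \cdot 5 \cdot 1 = 15 \\
I_{4 \to 5} + B^{B}_{4 \to 5} &= 1000 + 15 = 1015 \\
\text{observed interference} - \text{bound} &= 1125 - 1015 = 110 > 0
\end{align*}
```

The observed interference exceeds the value obtained with the buffer bound by
110 cycles, which is why the buffer bound cannot be used for difud flows.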
Therefore, the buffering interference from difud flows can be computed as
follows:
$$B^{U+DB}_{j \to i} = \min\{B^{S}_{j \to i},\; B^{I}_{j \to i}\} \qquad (24)$$
And finally, the maximum interference that a flow fi can suffer from all its
interfering flows is covered with Theorem 7.
Theorem 7 Consider the flow-set $F = \{f_1, f_2, \ldots, f_{n-1}, f_n\}$. Let
$f_i$ be the analysed flow. The maximum interference that $f_i$ can suffer is
at most
$I^{\emptyset}_i + I^{U}_i + I^{DN}_i + I^{DB}_i + I^{U+DN}_i + I^{U+DB}_i$,
where $I^{\emptyset}_i$ can be computed by solving Equation 14, $I^{U}_i$ can
be computed by solving Equation 15, $I^{DN}_i$ can be computed by solving
Equation 16, $I^{DB}_i$ can be computed by solving Equation 22, $I^{U+DN}_i$
can be computed by solving Equation 23, and $I^{U+DB}_i$ can be computed by
solving Equation 25.
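$$I^{U+DB}_i = \sum_{\forall f_j \in F^{U+DB}_D(f_i)} \left\lceil \frac{R_i + J^{R}_j + \overbrace{(R_j - C_j) - \gamma^{PRE}_{i,j} - \gamma^{POST}_{i,j}}^{J^{I}_{j \to i}}}{T_j} \right\rceil \cdot \left( I_{j \to i} + B^{U+DB}_{j \to i} \right) \qquad (25)$$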
Proof Follows directly from the aforementioned discussion. □
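Equations 22, 23 and 25 share one structure: for every interfering flow, the
number of its releases that fit into the analysed window is multiplied by the
interference of a single release. The Python sketch below evaluates one such
sum. The class and field names are hypothetical, and the ceiling operator is
an assumption made here, following common practice in response-time analysis.
For flows without buffering interference (Equation 23), the buffering term is
simply set to zero.

```python
import math
from dataclasses import dataclass


@dataclass
class Interferer:
    """Parameters of one directly interfering flow f_j (names are illustrative)."""
    period: float          # T_j
    release_jitter: float  # J^R_j
    wctt: float            # R_j
    basic_latency: float   # C_j
    gamma_pre: float       # gamma^PRE_{i,j}
    gamma_post: float      # gamma^POST_{i,j}
    direct: float          # I_{j->i}
    buffering: float = 0.0 # B_{j->i}; 0 when buffering interference cannot occur


def group_interference(r_i, interferers):
    """Sum over one group of interfering flows for a candidate WCTT r_i."""
    total = 0
    for f in interferers:
        window = (r_i + f.release_jitter
                  + (f.wctt - f.basic_latency) - f.gamma_pre - f.gamma_post)
        releases = math.ceil(window / f.period)
        total += releases * (f.direct + f.buffering)
    return total
```

For example, an interferer with period 100, WCTT 60, basic latency 50, direct
interference 50 and buffering interference 5 contributes three releases of 55
cycles each to a window with `r_i = 250`.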
After defining all interference types, now we can describe the computation
of the interference term of Condition 2 for the absence of buffering interfer-
ence. Recall, the objective is to compute the maximum interference that a
preempting flow may suffer, and test whether that interference is sufficient
to cause buffering interference to the analysed flow. Let fi be the analysed
flow, fj be the preempting flow for which we test the existence of buffering
interference, and fk be the higher-priority flow whose contention domain with
fj starts on the link p (e.g. flows f8, f7, f6 and link between ρ3 and ρ4 in
Figure 8). Moreover, let p be the currently analysed link in Algorithm 1. The
term inf(fk, fj , p) can be obtained as follows:
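$$\mathit{inf}(f_k, f_j, p) =
\begin{cases}
\text{Contribution of } f_k \text{ to } I^{\emptyset}_j \text{ (from Equation 14)} & \text{if } f_k \in F^{\emptyset}_D(f^{p}_j) \\
\text{Contribution of } f_k \text{ to } I^{U}_j \text{ (from Equation 15)} & \text{if } f_k \in F^{U}_D(f^{p}_j) \\
\text{Contribution of } f_k \text{ to } I^{DN}_j \text{ (from Equation 16)} & \text{if } f_k \in F^{DN}_D(f^{p}_j) \\
\text{Contribution of } f_k \text{ to } I^{DB}_j \text{ (from Equation 22)} & \text{if } f_k \in F^{DB}_D(f^{p}_j) \\
\text{Contribution of } f_k \text{ to } I^{U+DN}_j \text{ (from Equation 23)} & \text{if } f_k \in F^{U+DN}_D(f^{p}_j) \\
\text{Contribution of } f_k \text{ to } I^{U+DB}_j \text{ (from Equation 25)} & \text{if } f_k \in F^{U+DB}_D(f^{p}_j)
\end{cases}
\qquad (26)$$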
Note, that in Equation 26, $f_j$ should be treated as if it terminated at the
router after $p$, hence the term $f^{p}_j$. The only exception is that the
total interference of $f_k$ to $f_j$ is computed assuming the interval of
interest corresponding to the traversal of $f_j$ across its entire path,
$R_j$.
Finally, the WCTT of flow fi (Equation 27) can be obtained by summing
up its traversal delay and the interference it suffers.
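$$R_i = \overbrace{C_i}^{\text{traversal delay}}
      + \overbrace{I^{\emptyset}_i}^{\text{difo interference}}
      + \overbrace{I^{U}_i}^{\text{difu interference}}
      + \overbrace{I^{DN}_i + I^{DB}_i}^{\text{difd interference}}
      + \overbrace{I^{U+DN}_i + I^{U+DB}_i}^{\text{difud interference}}
\qquad (27)$$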
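Because the interference terms in Equation 27 depend on $R_i$ itself, the
equation is typically solved iteratively. The Python sketch below shows one
conventional fixed-point iteration; the starting value, the deadline-based
stopping rule, and all names are assumptions rather than details taken from
the analysis above. The callable `interference` stands for the sum of all six
interference terms evaluated for a candidate value of $R_i$.

```python
import math


def solve_wctt(c_i, interference, deadline, max_iter=10_000):
    """Iterate R = C_i + interference(R) until it stabilises.

    Returns the converged WCTT, or None if the value exceeds the deadline
    (flow deemed unschedulable) or no fixed point is found in max_iter steps.
    """
    r = c_i  # start from the interference-free traversal delay
    for _ in range(max_iter):
        r_next = c_i + interference(r)
        if r_next > deadline:
            return None
        if r_next == r:
            return r
        r = r_next
    return None


def example_interference(r):
    """Toy interference: one interferer with period 100 costing 20 per release."""
    return math.ceil(r / 100) * 20
```

A tighter interference function makes the sequence stabilise at a smaller
value, and usually after fewer iterations.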
7 Experimental Evaluation
In this section, we conduct an experimental evaluation of the proposed method.
First, we compare it against the following state-of-the-art techniques: the
method of Xiong et al (2017) (referred to as SOTA), and the method of In-
drusiak et al (2018) (referred to as SOTA+). The comparison is performed for
various platform and workload configurations, while the aspects of interest are
schedulability guarantees (Experiment 1), WCTT bounds and runtime com-
plexities (Experiment 2). Then, we investigate how different VC buffer sizes
affect schedulability guarantees (Experiment 3). After that, we assess the ef-
ficiency of the proposed approach by comparing the obtained WCTT bounds
against corresponding values observed via simulations for synthetic workloads
(Experiment 4) and for a use-case of an autonomous driving vehicle ap-
plication (Experiment 5). Finally, we conclude the experimental evaluation
by analysing the hardware requirements of the proposed approach (Experi-
ment 6).
7.1 Experimental Setup
The analysis and simulation parameters are summarised in Table 4. An asterisk
sign denotes a randomly generated value, assuming a uniform distribution.
Table 4 Analysis and simulation parameters
Parameter                                  Value
NoC topology                               2-D mesh
NoC size                                   8 × 8
Routing method                             X-Y routing
Router frequency                           2 GHz
Routing latency + link traversal latency   3 + 1 cycles
Link width                                 4 B
Flit size                                  4 B
Flow source router coordinates             〈[1 - 8]*, [1 - 8]*〉
Flow destination router coordinates        〈[1 - 8]*, [1 - 8]*〉
Flow size                                  [1 - 128]* KB
Flow priority assignment policy            Rate monotonic
Flow deadline = Flow period                [0.01 - 1]* msec
Simulated time                             1 sec
Simulator                                  SPARTS (extended), Nikolić et al (2011)
Hardware                                   Intel i5-2520M, with 4 GB of RAM
Note, that if during the creation of a flow its source and destination routers
have the same coordinates, the destination router coordinates are generated
again.
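The workload generation implied by Table 4 can be sketched in Python as
follows. This is only an illustration under stated assumptions: the
dictionary layout and function names are hypothetical, flow sizes and periods
are drawn as continuous values, and priority 1 is taken to be the highest.

```python
import random


def generate_flow(rng, mesh=8):
    """Draw one flow with uniformly distributed parameters (see Table 4)."""
    src = (rng.randint(1, mesh), rng.randint(1, mesh))
    dst = src
    while dst == src:  # regenerate the destination until it differs from the source
        dst = (rng.randint(1, mesh), rng.randint(1, mesh))
    period_ms = rng.uniform(0.01, 1.0)
    return {
        "src": src,
        "dst": dst,
        "size_kb": rng.uniform(1, 128),
        "period_ms": period_ms,
        "deadline_ms": period_ms,  # deadline equals period
    }


def generate_flow_set(n, seed=0):
    """Generate n flows and assign rate-monotonic priorities (1 = highest)."""
    rng = random.Random(seed)
    flows = [generate_flow(rng) for _ in range(n)]
    flows.sort(key=lambda f: f["period_ms"])  # shorter period, higher priority
    for priority, flow in enumerate(flows, start=1):
        flow["priority"] = priority
    return flows
```

Seeding the generator makes a flow-set reproducible, which is convenient when
the same flow-sets are reused across experiments.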
7.2 Experiment 1: Schedulability Guarantees
In this experiment, we perform a comparison of the proposed approach against
both SOTA and SOTA+. The aspects of interest are schedulability guaran-
tees. The comparison is conducted in the form of a sensitivity analysis. Assum-
ing a certain configuration with platform θ and workload F , the schedulability
test is performed. If a flow-set is schedulable, the sizes of all flows are uniformly
increased, and the test is performed again. Similarly, if a flow-set is unschedu-
lable, the sizes of all flows are uniformly decreased, and the test is performed
again. This process is repeated until a threshold value is found (called the
schedulability threshold ST ), where a flow-set is schedulable, however, any in-
crease in sizes of flows would render it unschedulable. Of course, the bigger
the ST value is, the more efficient the method is.
Let $ST_{sota}$ be the schedulability threshold value obtained for SOTA, and
let $ST_{new}$ be the schedulability threshold value obtained for the proposed
approach. Then, the following metric is used to assess the improvements of
the proposed method over SOTA:
$$imp = \frac{ST_{new} - ST_{sota}}{ST_{sota}} \cdot 100\%$$
Similarly, let $ST_{sota+}$ be the schedulability threshold value obtained for
SOTA+. Then, the following metric is used to assess the improvements of the
proposed method over SOTA+:
$$imp = \frac{ST_{new} - ST_{sota+}}{ST_{sota+}} \cdot 100\%$$
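The sensitivity analysis and the improvement metric can be sketched in Python
as follows. The text only states that flow sizes are uniformly increased or
decreased until the threshold is found; the doubling-plus-bisection search
below is one possible realisation and is an assumption, as is the premise that
schedulability is monotone in the scaling factor. The callable
`is_schedulable` is a hypothetical stand-in for any of the compared
schedulability tests.

```python
def schedulability_threshold(is_schedulable, tol=1e-3, max_scale=1e9):
    """Largest scaling factor of all flow sizes that keeps the set schedulable.

    is_schedulable(scale) must return True when the flow-set with all sizes
    multiplied by `scale` passes the test, and is assumed to be monotone.
    """
    lo, hi = 0.0, 1.0  # scale 0 (empty flows) is taken to be schedulable
    while is_schedulable(hi):  # grow until the first unschedulable scale
        lo, hi = hi, hi * 2
        if hi > max_scale:
            raise ValueError("no unschedulable scale found below max_scale")
    while hi - lo > tol:  # bisect between schedulable lo and unschedulable hi
        mid = (lo + hi) / 2
        if is_schedulable(mid):
            lo = mid
        else:
            hi = mid
    return lo


def st_improvement(st_new, st_ref):
    """Improvement metric (in %) of the new threshold over a reference one."""
    return (st_new - st_ref) / st_ref * 100.0
```

The returned value is always a scale at which the flow-set is still
schedulable, so it underestimates the true threshold by at most `tol`.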
Fig. 13 Experiment 1: ST improvements over SOTA (method of Xiong et al (2017)). Panels: (a) β = 2, (b) β = 100, (c) β = ∞. x-axis: flow-set size (100 to 500); y-axis: ST improvement (in %).
Since the proposed approach dominates both methods (i.e. always produces
the same or bigger ST values), the improvements have only positive values.
We repeated the aforementioned comparison for varying platform and work-
load configurations. Namely, we used 3 different values for buffer sizes of VCs
(recall the symbol β in Table 1): (i) each buffer can store only 2 flits, (ii) each
buffer can store at most 100 flits, and (iii) each buffer can store an entire
packet (buffer size set to the maximum packet size). Additionally, we varied
the flow-set size, and observed the improvements for flow-sets with 100, 200,
300, 400 and 500 flows. For each of these unique configurations we generated
1000 flow-sets and used the above metrics to compute the improvements over
both SOTA and SOTA+. The results are illustrated in Figure 13² and Figure 14,
respectively.
Figure 13 demonstrates that the improvements curve is always convex, and
that the gains grow with the increasing flow-set size. This is expected, because
more flows lead to more complex contention scenarios, which the proposed
method handles efficiently, while the state-of-the-art method significantly
over-approximates. It is also visible that the improvements grow with growing
β. This can be explained with the fact that for larger values of β the proposed
approach efficiently identifies scenarios where buffering interference does not
occur, while the state-of-the-art method unconditionally considers it. Thus,
the biggest improvements are observed for 500 flows and β = ∞, where the
proposed method, on average, can accommodate a workload which is 9 times
bigger than the one which could be accommodated by SOTA. The biggest observed
improvement ratio for an individual flow-set is slightly less than 30 (3000%
in Figure 13(c)).

Fig. 14 Experiment 1: ST improvements over SOTA+ (method of Indrusiak et al (2018)). Panels: (a) β = 2, (b) β = 100, (c) β = ∞. x-axis: flow-set size (100 to 500); y-axis: ST improvement (in %).

² In Figures 13-21, the box-edges represent the 25th percentile (q1) and the
75th percentile (q3), while every data input more than an interquartile range
away from the box (i.e. less than q1 − (q3 − q1), or greater than
q3 + (q3 − q1)) is considered as an outlier. Additionally, the blue lines
connect the mean values of the respective categories.
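The box-plot convention used in Figures 13-21 (box edges at the 25th and 75th
percentiles, outliers lying more than one interquartile range beyond the box,
and a line through the means) can be sketched in Python as follows. The
linear-interpolation percentile rule and all names are assumptions; the
plotting tool used for the figures may compute quartiles differently.

```python
def percentile(sorted_values, p):
    """Percentile p (0 to 100) of already sorted data, by linear interpolation."""
    if not sorted_values:
        raise ValueError("percentile of empty data")
    rank = (len(sorted_values) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def box_summary(values):
    """Quartiles, mean and outliers, with whiskers one IQR beyond the box."""
    data = sorted(values)
    q1 = percentile(data, 25)
    q3 = percentile(data, 75)
    iqr = q3 - q1
    low, high = q1 - iqr, q3 + iqr
    return {
        "q1": q1,
        "q3": q3,
        "mean": sum(data) / len(data),
        "outliers": [v for v in data if v < low or v > high],
    }
```

Note that this rule uses a factor of one interquartile range, as stated for
the figures, rather than the factor of 1.5 found in many plotting defaults.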
Figure 14 shows the improvements against SOTA+. The trends are very
similar to the previous case, in the sense that bigger flow-set sizes and
bigger β both contribute to more significant improvements. The improvements
against SOTA+ are on average 40% smaller than the improvements against
SOTA, which can be attributed to the more efficient treatment by SOTA+ of
flows with only downstream indirect interference (see the difference between
Equation 6 and Equation 7 in Section 5). Nonetheless, the improvements of our
method against SOTA+ are still substantial. Again, the best results are
observed for 500 flows and β = ∞, where the proposed method, on average, can
accommodate a workload which is 6 times bigger than the one which could be
accommodated by SOTA+. The biggest observed improvement ratio for an
individual flow-set is 15.75 (1575% in Figure 14(c)).
7.3 Experiment 2: WCTT Improvements and Scalability
In this experiment, we evaluate the improvements of the proposed method
against both SOTA and SOTA+ with respect to WCTTs of individual flows.
Assuming a given flow-set, first we obtained the ST for SOTA. Then, we
adjusted the sizes of all flows by the obtained ST. This was done to make sure
that the tested flow-sets will indeed be schedulable with both the proposed
method and the state-of-the-art methods used for comparison. After that,
we derived and compared the WCTTs of all flows in the following way. Let
$WCTT_{sota}$ be the WCTT of one flow obtained with SOTA, and let
$WCTT_{new}$ be the WCTT of the same flow obtained with the proposed method.
Then, the following metric is used to describe the improvement of the new
method over SOTA:
$$imp = \frac{WCTT_{sota} - WCTT_{new}}{WCTT_{sota}} \cdot 100\%$$
This process was repeated for all flows of the flow-set.
Similarly, we repeated the aforementioned procedure for SOTA+. The
following metric is used to describe the improvement of the new method over
SOTA+:
$$imp = \frac{WCTT_{sota+} - WCTT_{new}}{WCTT_{sota+}} \cdot 100\%$$
We repeated the experiment for 1000 flow-sets, each with 500 flows. For
better visualisation, the results were organised in priority groups, e.g. [1 −
25], [26 − 50], ..., [476 − 500] (a smaller value denotes a higher priority). We
also repeated the same experiment for different values of β. The results of a
comparison against SOTA are illustrated in Figure 15, while the results of a
comparison against SOTA+ are illustrated in Figure 16.
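The per-flow WCTT improvement and its aggregation into priority groups of 25
flows can be sketched in Python as follows. The function names are
hypothetical, and the sketch reports only the mean of each group, whereas the
figures show full box plots.

```python
from statistics import mean


def wctt_improvement(wctt_ref, wctt_new):
    """Improvement (in %) of the new WCTT bound over a reference bound."""
    return (wctt_ref - wctt_new) / wctt_ref * 100.0


def group_by_priority(improvements, group_size=25):
    """Mean improvement per priority group.

    improvements[k] is the value for the flow with priority k + 1, where a
    smaller number denotes a higher priority. Keys are (first, last) priorities.
    """
    groups = {}
    for start in range(0, len(improvements), group_size):
        chunk = improvements[start:start + group_size]
        groups[(start + 1, start + len(chunk))] = mean(chunk)
    return groups
```

For a flow-set with 500 flows this yields the 20 groups [1 − 25],
[26 − 50], ..., [476 − 500].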
From Figure 15 we conclude that as flow priorities decrease, the improve-
ments become more apparent. The gains are the biggest for the lowest priority
flows, and for all 3 tested values of β they asymptotically converge towards
100%. This is expected, because lower-priority flows suffer more interference
and contention scenarios are more complex, which the proposed approach ef-
ficiently handles. It is visible that the improvement curve is concave across
the entire domain for all 3 tested configurations of β. As in the previous ex-
periment, bigger values of β yield more improvements, and the best results
are achieved for β = ∞ (Figure 15(c)). This coincides with the finding of
Experiment 1.
Fig. 15 Experiment 2: WCTT improvements over SOTA (method of Xiong et al (2017)). Panels: (a) β = 2, (b) β = 100, (c) β = ∞. x-axis: priority groups (1−25 to 476−500); y-axis: WCTT improvement (in %).

Figure 16 demonstrates the improvements of the proposed method against
SOTA+. The conclusions are similar to the previous case: the gains grow
with decreasing priorities. It is also visible that for β = 2 (Figure 16(a)) and
β = 100 (Figure 16(b)) the improvements curve is slightly convex for higher
priorities, and slightly concave for lower priorities, with the inflection point
near the middle of the domain (priorities around 250). Conversely, for β = ∞
(Figure 16(c)), the improvement curve is concave across the entire domain.
Again, it is visible that bigger values of β yield more improvements.
Finally, for all evaluated methods we recorded execution times, so as to
assess their runtime complexities and discuss their scalability potentials. The
results are shown in Figure 17, which plots the distribution of execution
times for the proposed method, together with the mean values for all three
approaches.
From Figure 17 it is visible that for β = 2 and β = 100, the proposed
method takes longer to compute WCTTs. This is expected, due to the
fact that the proposed approach is indeed computationally more complex.
However, the time penalty is small. This can be explained with
the fact that the proposed method produces substantially tighter results than
SOTA and SOTA+, and consequently requires fewer iterations to converge
to WCTT values. This effect is especially pronounced for β = ∞, where the
proposed method, despite its higher complexity, computes WCTTs faster than
SOTA and SOTA+.
Fig. 16 Experiment 2: WCTT improvements over SOTA+ (method of Indrusiak et al (2018)). Panels: (a) β = 2, (b) β = 100, (c) β = ∞. x-axis: priority groups (1−25 to 476−500); y-axis: WCTT improvement (in %).
Moreover, even for flow-sets with 500 flows, the proposed method derives
WCTT values, on average, in 200 milliseconds, while the maximum observed
computation time is slightly less than 600 milliseconds. This implies that the
proposed method is indeed scalable and applicable to workloads consisting of
hundreds of flows.
7.4 Experiment 3: Effect of Buffer Sizes on Schedulability Guarantees
The objective of this experiment is to assess the effects of VC buffer sizes on
derived schedulability guarantees. Assuming a given flow-set, we varied the
number β in the following range [2, 10, 100, 1000, 10000,∞], and observed how
ST values change. First, let $ST_x$ be an ST value obtained for β from the
set {2, 10, 100, 1000, 10000}. Similarly, let $ST_\infty$ be the corresponding
value for β = ∞. Then, the following metric is used to assess the decrease
(penalty) in the derived schedulability guarantees of the approach with
limited β, against the one with β = ∞ (used as a baseline):
$$pen = \frac{ST_x}{ST_\infty} \cdot 100\%$$
Fig. 17 Experiment 2: Scalability and runtime complexity of the proposed approach, SOTA (method of Xiong et al (2017)) and SOTA+ (method of Indrusiak et al (2018)). Panels: (a) β = 2, (b) β = 100, (c) β = ∞. x-axis: flow-set size (100 to 500); y-axis: duration (in milliseconds). Legend: proposed method, SOTA, SOTA+.
The experiment was performed for 1000 flow-sets, each with 500 flows. The
results are illustrated in Figure 18. It is visible that, on average, the negative
effect of using smaller VC buffers is around 11%. With an increase in the buffer
sizes, counter-intuitively, the obtained STs drop. One explanation might be
that any increase in buffer sizes renders the buffer bound less applicable. At
the same time, the increase in buffer sizes is not so significant to nullify the
buffering interference. Hence, the derived STs slightly drop. This interesting
finding and the corresponding explanation will be revisited in Experiment 4. It
is also visible in Figure 18 that after a certain threshold (in our experiment it
is β = 10000), the buffer sizes were such that almost all buffering interference
could be avoided, and hence we see a significant increase in derived STs.
The results suggest that, if it is not possible to provide a platform with VC
buffer sizes which allow buffering interference to be avoided in the majority
of cases, it is more efficient to use a less resourceful platform with only
β = 2. Please note, that in our experiment the threshold for the performance
jump was so high (β = 10000) because we designed our experiment in such a way
as to test the limits of the proposed method, and hence loaded the NoC with
the maximum load which it could sustain while still guaranteeing the
schedulability. This approach caused all flow sizes to be significantly
inflated during the search for ST, and hence β = 10000 was the threshold
point. Also note, that in realistic scenarios flows can be significantly
smaller, and hence the threshold point would be reached for smaller values
of β.

Fig. 18 Experiment 3: STs for varied buffer sizes, relative to STs for unlimited buffers. x-axis: buffer size (2, 10, 100, 1000, 10000); y-axis: ST w.r.t. unlimited buffers (in %).
7.5 Experiment 4: Method Efficiency
The focus of this experiment is on estimating the efficiency of the proposed
method with respect to derived WCTTs. To do so, we implemented a cycle-
accurate simulator of the platform described in Section 3.1, by extending the
simulator SPARTS (Nikolić et al (2011)). We assessed the tightness of derived
WCTT bounds by comparing them against the corresponding WCTT values
observed during simulations.
The experiment was conducted as follows. First, we used the same flow-
sets and ST values from Experiment 2, and corrected flow sizes accordingly.
Then, we simulated the execution of 1 second, which, on average, took 5 hours
per flow-set. After the simulation was completed, we collected the observed
WCTTs of all flows. Then we compared them with the WCTT bounds obtained
by the proposed method in Experiment 2. Let WCTTsim be the observed
WCTT of one flow, and let WCTTnew be the WCTT bound obtained with
the proposed method. The tightness of the derived bound is expressed with
the following metric:
$$tightness = \frac{WCTT_{sim}}{WCTT_{new}} \cdot 100\%$$
Due to the fact that simulations take much longer to finish, we collected
the results for 20 flow-sets. Again, for better visualisation, the results were
organised in priority groups, e.g. [1−25], [26−50], ..., [476−500]. We repeated
the experiment for different values of β. The results are illustrated in Figure 19.
Fig. 19 Experiment 4: Observed WCTTs (via simulation), relative to analytical WCTTs. Panels: (a) β = 2, (b) β = 100, (c) β = ∞. x-axis: priority groups (1−25 to 476−500); y-axis: measured vs computed WCTT (in %).

From Figure 19 it is visible that the tightness of bounds decreases with
decreasing priorities, which is an expected result, because complex contention
scenarios which are associated with the lower-priority flows are less likely to
be captured during a limited simulation time. The decrease in tightness is
exponential and asymptotically converges to 25%; however, with a longer
simulation time this value could be improved. Therefore, a more extensive
experimental evaluation is a potential future activity. Moreover, it is visible
that the results are very similar for different values of β, which implies that
the method scales with respect to VC buffer sizes, and is equally applicable to
platforms with small buffers (e.g. β = 2) and huge buffers (e.g. β = ∞).
If we combine the findings of Experiment 3 (increasing buffer sizes may
have a negative effect on derived guarantees) with the findings of this ex-
periment (tightness scales with β), we can conclude that even the simulation
results for larger values of β are worse than the corresponding ones for smaller
values of β. This clarifies that the phenomenon observed in Experiment 3 is
not a property of the proposed analysis method, but in fact the inherent char-
acteristic of priority-preemptive NoCs. This finding supports the conclusions
from Experiment 3, and it is of crucial importance for system designers, be-
cause it suggests that there are two viable strategies: (i) use platforms with
small buffer sizes (e.g. β = 2), or (ii) use platforms with sufficiently large
buffers which allow to (almost) completely mitigate the effects of the buffering
interference (e.g. β = 10000 in our experiments). Any intermediate solutions
would be more expensive than the former one (more hardware resources), and
at the same time would provide worse results.
7.6 Experiment 5: Use-Case of Autonomous Driving Vehicle Application
In this experiment, we also assess the efficiency of the proposed method. We
do it in the same way as in Experiment 4, by comparing analytically obtained
WCTT values against the corresponding ones observed via simulations. But
instead of using a synthetic workload, the workload is modelled after a use-case
of an autonomous driving vehicle application (Shi et al (2010)). The use-case
consists of 33 functionalities producing 38 traffic flows in total. For a more
detailed description of the use-case, the reader is advised to consult the work
of Shi et al (2010).
The experiment was conducted in the following way. First, WCTT bounds
were obtained by the proposed method for β = 2. Then, we simulated the
execution of 100 seconds, and for each flow we collected the following val-
ues: (i) the observed worst-case traversal time, (ii) the observed average-case
traversal time, and (iii) the observed best-case traversal time. Then, the same
process was repeated for β = 100 and β =∞. The values of interest are plotted
in Figure 20.
From Figure 20 it is visible that in the majority of cases, the average and
the best case are almost identical. This implies that flows usually traverse
without any contention. However, in scenarios where contention does occur, the
traversal times are significantly inflated. From Figure 20 we can also observe
that in most cases the proposed method derives tight WCTT bounds (a small
gap between the analytically obtained and the corresponding measured worst-
case), and that trend is evident for all configurations of β. The remark from
the previous experiment is also valid here; the simulations were performed for
only a limited amount of time (100 seconds of simulated time), and longer
simulation runs would even further reduce the aforementioned gap.
We also computed the maximum number of port contentions (a necessary
requirement to guarantee a dedicated per-port VC to each flow), and found
out that the workload from this use-case can be accommodated by a platform
with only 4 VCs.
7.7 Experiment 6: Hardware Requirements
In this experiment, we assess the hardware requirements of the proposed
model. Recall, that the number of available VCs within the platform should
be at least equal to the maximum number of contentions for any port, which is
a requirement that guarantees a dedicated per-port VC to each flow (Nikolić
et al (2013)). We used the flow-sets from Experiment 1 (randomly generated
sources and destinations) and computed the number of needed VCs for each
of them. This process was repeated for varying flow-set size. The results are
illustrated in Figure 21.
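One way to estimate the number of needed VCs for a flow-set under X-Y routing
is to count, for every directed link, how many flows traverse it, and to take
the maximum. The Python sketch below does this under stated assumptions: the
names are hypothetical, each directed router-to-router link stands for one
output port, and the local injection and ejection ports are only counted when
`include_local_ports` is set, since the text here does not say whether they
are included.

```python
from collections import Counter


def xy_route(src, dst):
    """Directed links visited by X-Y routing: first along X, then along Y."""
    x, y = src
    links = []
    while x != dst[0]:
        next_x = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (next_x, y)))
        x = next_x
    while y != dst[1]:
        next_y = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, next_y)))
        y = next_y
    return links


def needed_vcs(flows, include_local_ports=False):
    """Maximum number of flows sharing any single port (0 for no flows).

    flows is an iterable of (source, destination) router coordinate pairs.
    """
    usage = Counter()
    for src, dst in flows:
        links = xy_route(src, dst)
        if include_local_ports:
            links = [("inject", src)] + links + [("eject", dst)]
        usage.update(links)
    return max(usage.values(), default=0)
```

Because every flow is counted on each link of its route, the result depends
on both the mapping of sources and destinations and the routing policy.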
R
ea
l-T
im
e
A
n
a
ly
Fig. 20 Experiment 5: Traversal times for the autonomous driving vehicle
application (analytical worst case, measured worst case, measured average
case and measured best case). Panels: (a) β = 2, (b) β = 100, (c) β = ∞.
Each panel plots the traversal time (in µs, axis range 0 to 70) of every flow
of the application, from FBU3 → VOD1 to STAC → TPRC.
Fig. 21 Experiment 6: Hardware requirements. Axes: flow-set size (100 to 500)
and number of needed VCs (5 to 35).
Figure 21 shows that the number of needed VCs scales linearly with the
number of flows in the flow-set. This is expected, because a 2-D mesh is a
scalable NoC topology. Additionally, we see that, even for the massive
workloads of 500 flows, only 25 VCs are needed on average. Please note that
this result is based on randomly generated traffic sources and destinations,
assuming the X-Y routing policy. It has already been demonstrated that with
a thoughtful mapping (Nikolić et al (2013)) and a thoughtful routing
(Nikolić and Pinho (2017)) this number can be significantly reduced (on
average, by 25% and by 40%, respectively). Given that there already exist
platforms with 8 VCs (e.g. Intel (2010)), we can expect that the forthcoming
generations of real-time oriented many-cores with priority-preemptive NoCs
will have a dozen or more VCs. With thoughtful mapping and routing, such
platforms could successfully accommodate workloads comprising several
hundred traffic flows.
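The arithmetic behind this expectation can be sketched as follows. This is
only a back-of-the-envelope illustration, not part of the proposed analysis:
the function names are hypothetical, the proportional model (a line through
the origin fitted to the single reported point of 25 VCs for 500 flows) is an
assumption, and so is the multiplicative composition of the two reported
average reductions (25% for mapping, 40% for routing), which the cited works
report separately.

```python
import math

# Figures quoted in the text.
BASELINE_FLOWS = 500      # largest evaluated flow-set size
BASELINE_VCS = 25.0       # average VCs needed at that size (random traffic, X-Y routing)
MAPPING_REDUCTION = 0.25  # average reduction reported for thoughtful mapping
ROUTING_REDUCTION = 0.40  # average reduction reported for thoughtful routing


def estimated_vcs(flows, mapping=False, routing=False):
    """Rough average VC demand for a flow-set of the given size.

    ASSUMPTIONS (not from the paper): demand is proportional to the number
    of flows, and the two reductions compose multiplicatively.
    """
    vcs = BASELINE_VCS * flows / BASELINE_FLOWS
    if mapping:
        vcs *= 1.0 - MAPPING_REDUCTION
    if routing:
        vcs *= 1.0 - ROUTING_REDUCTION
    return vcs


def fits(flows, available_vcs, mapping=False, routing=False):
    """True if the estimated average demand does not exceed the available VCs."""
    # Small tolerance guards against floating-point rounding in the estimate.
    return estimated_vcs(flows, mapping, routing) <= available_vcs + 1e-9


if __name__ == "__main__":
    for label, kwargs in [
        ("baseline", {}),
        ("mapping only", {"mapping": True}),
        ("routing only", {"routing": True}),
        ("mapping and routing", {"mapping": True, "routing": True}),
    ]:
        need = estimated_vcs(500, **kwargs)
        print(f"{label}: about {need:.2f} VCs, fits in 12: {fits(500, 12, **kwargs)}")
```

Under these assumptions, a 500-flow workload exceeds a dozen VCs when only
one of the two optimisations is applied, and fits within a dozen only when
both are combined.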
8 Conclusions and Future Work
In this work, we proposed a novel method for the worst-case analysis of
traversal times of network traffic flows deployed upon a priority-preemptive
NoC. Compared to the state-of-the-art techniques, our approach renders more
flow-sets schedulable, and also yields substantially tighter upper bounds on
the worst-case traversal times (WCTTs). By employing the proposed method,
resource over-provisioning can be mitigated to a large extent, and
significant design-cost reductions can be achieved. Moreover, we implemented
a cycle-accurate simulator of the assumed NoC architecture, and used it to
assess the tightness of the derived WCTT bounds. Finally, we found that
larger virtual channel buffers do not necessarily lead to better results, and
in many cases can be counter-productive, which is an important finding for
system designers.
As future work, we plan to extensively evaluate the proposed approach with
additional use-cases and benchmarks. Extending the method so that it applies
to flow-sets with arbitrary deadlines and to platforms with fewer virtual
channels is another promising direction. Finally, how to (i) map flows to
cores, (ii) assign priorities to flows and (iii) assign paths to flows are
relevant problems which remain to be addressed.
References
Benini L, De Micheli G (2002) Networks on chips: a new soc paradigm. IEEE
Computer 35(1):70–78
Burns A, Harbin J, Indrusiak L (2014) A wormhole noc protocol for mixed
criticality systems. In: Proceedings of the 35th IEEE Real-Time Systems
Symposium
Dally W (1992) Virtual-channel flow control. IEEE Transactions on Parallel
and Distributed Systems 3(2):194–205
Dally W, Seitz C (1987) Deadlock-free message routing in multiprocessor in-
terconnection networks. IEEE Transactions on Computers
Dasari D, Nikolić B, Nelis V, Petters SM (2013) Noc contention analysis using
a branch and prune algorithm. ACM Transactions on Embedded Computing
Systems
Diemer J, Ernst R (2010) Back suction: Service guarantees for latency-sensitive
on-chip networks. In: International Symposium on Networks-on-Chip
de Dinechin BD, van Amstel D, Poulhiès M, Lager G (2014a) Time-critical
computing on a single-chip massively parallel processor. In: Proceedings of
the 17th Conference on Design Automation and Test in Europe
de Dinechin BD, Durand Y, van Amstel D, Ghiti A (2014b) Guaranteed ser-
vices of the noc of a manycore processor. In: Proceedings of the International
Workshop on Network on Chip Architectures
Ferrandiz T, Frances F, Fraboul C (2011) A network calculus model for
spacewire networks. In: Proceedings of the 17th IEEE Conference on Em-
bedded and Real-Time Computing and Applications
Goossens K, Dielissen J, Radulescu A (2005) Aethereal network on chip: con-
cepts, architectures, and implementations. IEEE Design & Test of Comput-
ers
Henia R, Hamann A, Jersak M, Racu R, Richter K, Ernst R (2005) System
level performance analysis - the symta/s approach. IEE Proceedings - Com-
puters and Digital Techniques 152(2)
Hu J, Marculescu R (2003) Energy-aware mapping for tile-based noc architec-
tures under performance constraints. In: Proceedings of the 8th Asia and
South Pacific Design Automation Conference
Indrusiak LS (2014) End-to-end schedulability tests for multiprocessor
embedded systems based on networks-on-chip with priority-preemptive
arbitration. Journal of Systems Architecture
Indrusiak LS, Harbin J, Burns A (2015) Average and worst-case latency im-
provements in mixed-criticality wormhole networks-on-chip. In: Proceedings
of the 27th Euromicro Conference on Real-Time Systems
Indrusiak LS, Burns A, Nikolić B (2016) Analysis of buffering effects on
hard real-time priority-preemptive wormhole networks. Technical report
arXiv:1606.02942
Indrusiak LS, Burns A, Nikolić B (2018) Buffer-aware bounds to multi-point
progressive blocking in priority-preemptive nocs. In: Proceedings of the 21st
Conference on Design Automation and Test in Europe
Intel (2010) Single-Chip-Cloud Computer.
www.intel.com/content/dam/www/public/us/en/documents/technology-briefs/intel-labs-single-chip-cloud-article.pdf
Intel (2013) Intel® Xeon Phi™.
http://www.intel.com/content/www/us/en/processors/xeon/xeon-phi-detail.html
Kalray (2014) MPPA-256 Manycore Processor.
www.kalrayinc.com/kalray/products/#processors
Kasapaki E, Schoeberl M, Sørensen RB, Müller C, Goossens K, Sparsø J (2016)
Argo: A real-time network-on-chip architecture with an efficient gals
implementation. IEEE Transactions on Very Large Scale Integration Systems
Kashif H, Patel H (2014) Bounding buffer space requirements for real-time
priority-aware networks. In: Proceedings of the 19th Asia and South Pacific
Design Automation Conference
Kashif H, Patel H (2016) Buffer space allocation for real-time priority-aware
networks. In: Proceedings of the 22nd IEEE Real-Time and Embedded Tech-
nology and Applications Symposium
Kashif H, Gholamian S, Patel H (2014) Sla: A stage-level latency analysis for
real-time communication in a pipelined resource model. IEEE Transactions
on Computers 99
Kavaldjiev NK, Smit GJM (2003) A survey of efficient on-chip communications
for soc. In: Proceedings of the 4th Symposium on Embedded Systems
Liu M, Becker M, Behnam M, Nolte T (2015a) Improved priority assignment
for real-time communications in on-chip networks. In: Proceedings of the
23rd International Conference on Real-Time Networks and Systems
Liu M, Behnam M, Nolte T (2015b) A stochastic response time analysis for
communications in on-chip networks. In: Proceedings of the 21st IEEE Con-
ference on Embedded and Real-Time Computing and Applications
Liu M, Becker M, Behnam M, Nolte T (2016a) Scheduling real-time pack-
ets with non-preemptive regions on priority-based nocs. In: Proceedings of
the 22nd IEEE Conference on Embedded and Real-Time Computing and
Applications
Liu M, Becker M, Behnam M, Nolte T (2016b) Tighter time analysis for real-
time traffic in on-chip networks with shared priorities. In: International Sym-
posium on Networks-on-Chip
Liu M, Becker M, Behnam M, Nolte T (2017) A tighter recursive calculus to
compute the worst case traversal time of real-time traffic over nocs. In: Pro-
ceedings of the 22nd Asia and South Pacific Design Automation Conference
Mesidis P, Indrusiak L (2011) Genetic mapping of hard real-time applications
onto noc-based mpsocs – a first approach. In: 6th International Workshop
on Reconfigurable Communication-centric Systems-on-Chip
Millberg M, Nilsson E, Thid R, Jantsch A (2004) Guaranteed bandwidth using
looped containers in temporally disjoint networks within the nostrum
network on chip. In: Proceedings of the 7th Conference on Design Automation
and Test in Europe, vol 2, pp 890–895
Ni LM, McKinley PK (1993) A survey of wormhole routing techniques in direct
networks. IEEE Computer 26
Nikolić B, Petters SM (2014a) Edf as an arbitration policy for
wormhole-switched priority-preemptive nocs – myth or fact? In: Proceedings
of the 14th International Conference on Embedded Software
Nikolić B, Petters SM (2014b) Real-time application mapping for many-cores
using a limited migrative model. Real-Time Systems Journal
Nikolić B, Pinho LM (2017) Optimal minimal routing and priority assignment
for priority-preemptive real-time nocs. Real-Time Systems Journal
Nikolić B, Awan MA, Petters SM (2011) SPARTS: Simulator for power aware
and real-time systems. In: Proceedings of the 8th IEEE International
Conference on Embedded Software and Systems
Nikolić B, Ali HI, Petters SM, Pinho LM (2013) Are virtual channels the
bottleneck of priority-aware wormhole-switched noc-based many-cores? In:
Proceedings of the 21st International Conference on Real-Time Networks
and Systems
Nikolić B, Yomsi PM, Petters SM (2014) Worst-case communication delay
analysis for many-cores using a limited migrative model. In: Proceedings of
the 20th IEEE Conference on Embedded and Real-Time Computing and
Applications
Nikolić B, Indrusiak LS, Petters SM (2016a) A tighter real-time
communication analysis for wormhole-switched priority-preemptive nocs.
Technical report arXiv:1605.07888
Nikolić B, Pinho LM, Indrusiak LS (2016b) On routing flexibility of
wormhole-switched priority-preemptive nocs. In: Proceedings of the 22nd IEEE
Conference on Embedded and Real-Time Computing and Applications
Paukovits C, Kopetz H (2008) Concepts of switching in the time-triggered
network-on-chip. In: Proceedings of the 14th IEEE Conference on Embedded
and Real-Time Computing and Applications, pp 120–129
Racu A, Indrusiak L (2012) Using genetic algorithms to map hard real-time
on noc-based systems. In: 7th International Workshop on Reconfigurable
Communication-centric Systems-on-Chip
Rambo EA, Ernst R (2015) Worst-case communication time analysis of
networks-on-chip with shared virtual channels. In: Proceedings of the 18th
Conference on Design Automation and Test in Europe
Sayuti M, Indrusiak L (2013) Real-time low-power task mapping in networks-
on-chip. In: IEEE Computer Society Annual Symposium on VLSI
Schoeberl M (2007) A time-triggered network-on-chip. In: Proceedings of the
17th International Conference on Field-Programmable Logic and Applica-
tions
Schoeberl M, Abbaspour S, Akesson B, Audsley N, Capasso R, Garside J,
Goossens K, Goossens S, Hansen S, Heckmann R, Hepp S, Huber B, Jordan
A, Kasapaki E, Knoop J, Li Y, Prokesch D, Puffitsch W, Puschner P, Rocha
A, Silva C, Sparsø J, Tocchi A (2015) T-crest: Time-predictable multi-core
architecture for embedded systems. Journal of Systems Architecture
Shi Z, Burns A (2008a) Priority assignment for real-time wormhole commu-
nication in on-chip networks. In: Proceedings of the 29th IEEE Real-Time
Systems Symposium
Shi Z, Burns A (2008b) Real-time communication analysis for on-chip net-
works with wormhole switching. In: International Symposium on Networks-
on-Chip
Shi Z, Burns A (2010) Schedulability analysis and task mapping for real-time
on-chip communication. Real-Time Systems Journal
Shi Z, Burns A, Indrusiak LS (2010) Schedulability analysis for real time
on-chip communication with wormhole switching. International Journal on
Embedded and Real-Time Communication Systems
Song H, Kwon B, Yoon H (1997) Throttle and preempt: a new flow control
for real-time communications in wormhole networks. In: Proceedings of the
1997 International Conference on Parallel Processing
Stefan RA, Molnos A, Goossens K (2012) daelite: A tdm noc supporting qos,
multicast, and fast connection set-up. IEEE Transactions on Computers
63(3)
Tilera (2012) TILE64™ Processor.
www.mellanox.com/repository/solutions/tile-scm/docs/UG130-ArchOverview-TILE-Gx.pdf
Tobuschat S, Ernst R (2017) Real-time communication analysis for networks-
on-chip with backpressure. In: Proceedings of the 20th Conference on Design
Automation and Test in Europe
Xiong Q, Lu Z, Wu F, Xie C (2016) Real-time analysis for wormhole noc:
Revisited and revised. In: Proceedings of the 26th ACM Great Lakes Sym-
posium on VLSI
Xiong Q, Wu F, Lu Z, Xie C (2017) Extending real-time analysis for wormhole
nocs. IEEE Transactions on Computers 66(9)
