Simultaneous Progressing Switching Protocols for Timing Predictable
  Real-Time Network-on-Chips by Ueter, Niklas et al.
Niklas Ueter1, Georg von der Brüggen1, Jian-Jia Chen1, Tulika Mitra2, and Vanchinathan Venkataramani2
1TU Dortmund University, Germany
2National University of Singapore, Singapore
Abstract—Many-core systems require inter-core communica-
tion, and network-on-chips (NoCs) have been demonstrated
to provide good scalability. However, not only the distributed
structure but also the link switching on NoCs poses
a great challenge for the design and analysis of real-time
systems. With scalability and flexibility in mind, the existing
link switching protocols usually consider each single link to be
scheduled independently, e.g., the worm-hole switching protocol.
The flexibility of such link-based arbitrations allows each packet
to be distributed over multiple routers but also increases the
number of possible link states (the number of flits in a buffer)
that have to be considered in the worst-case timing analysis.
For achieving timing predictability, we propose less flexible
switching protocols, called Simultaneous Progressing Switching
Protocols (SP2), in which the links used by a flow either all
simultaneously transmit one flit (if it exists) of this flow or
none of them transmits any flit of this flow. Such an all-or-
nothing property of the SP2 relates the scheduling behavior on the
network to the uniprocessor self-suspension scheduling problem.
We provide rigorous proofs which confirm the equivalence of
these two problems. Moreover, our approaches are not limited to
any specific underlying routing protocols, which are usually con-
structed for deadlock avoidance instead of timing predictability.
We demonstrate the analytical dominance of the fixed-priority
SP2 over some of the existing sufficient schedulability analyses
for fixed-priority wormhole-switched network-on-chips.
Keywords: Network-on-Chip, Real-Time Scheduling, Si-
multaneous Progressing Switching Protocols
I. INTRODUCTION
Power dissipation has constrained the performance scal-
ing of single-core systems in the past decade. Instead of
increasing the operational frequency of a processor, the chip
manufacturers have shifted their design focus towards chips
with multiple or many cores that operate at lower voltages and
frequencies than their single-core counterparts. In a multi- or
many-core system, communication and synchronization of the
applications executed on different cores have to be designed
efficiently from both hardware and software perspectives.
The communication fabric of a multi- or many-core platform
must scale with the number of cores. Otherwise, the compu-
tation capacity of the cores may be wasted if they are waiting
for the communication, synchronization, or memory access.
One possible approach to achieve good scalability of the
communication is the Network-on-Chip (NoC) architecture,
in which a switched network with routers is used to provide
the interconnection of the physical cores on a chip. The NoC
architecture allows parallel inter-core communication with
moderate hardware costs. NoCs are the prevalent choice of
interconnection due to their overall good performance and
scalability potential, as reported by Kavaldjiev et al. [22].
The efficiency of a NoC highly depends on many design
factors, including topology, routing protocols, flow control,
switching arbitration protocols, etc. The currently available
multi-core platforms based on NoCs have employed different
topologies, e.g., a ring in the Intel Xeon Phi 3120A, a 2-D
torus in MPPA Manycores by Kalray, and a 2-D mesh in Tilera
TILE-Gx8036. Moreover, many different communication pro-
tocols and communication topologies have been proposed and
evaluated in the literature. In particular, flit-based network-on-chips
(NoCs) have been proposed with the goal of decreasing
production cost and increasing energy efficiency, owing to less
complex routers and smaller buffers compared to
other approaches.
Real-time system design is concerned with the construction
of systems that can be formally verified to satisfy timeliness
constraints. Such real-time constraints are prevalent in, e.g.,
timing-sensitive applications in embedded mobile platforms,
automotive, and aerospace applications. To construct a hard
real-time system on a NoC, each hard real-time message
(defined as an instance of a sporadic/periodic flow) in the
NoC has to be successfully transmitted from its source to its
destination before its deadline.
The approaches for real-time systems on a NoC apply two
general strategies. One is to utilize time-division-multiplexing
(TDM) to ensure that the timing constraints are satisfied
by constructing the transmission schedule statically with a
repetitive table, e.g., in [11], [12], [18], [28], [31], [36], [37],
[39]. Another is to apply a priority-based dynamic scheduling
strategy in the routers (and switches) to arbitrate the flits in the
network, e.g., in [13], [15], [16], [19], [20], [23], [27], [29],
[38], [42], [43]. The difficulty of the TDM strategy is to con-
struct the TDM schedule and the global clock synchronization,
whilst the difficulty of the priority-based scheduling strategy
is to validate the schedulability, i.e., whether all messages can
meet their deadlines in the worst case.
Specifically, for dynamic scheduling strategies, the
wormhole-switched fixed-priority NoC with preemptive
virtual channels has recently been studied in a series of
papers. The first attempts to tackle the schedulability analysis
were in 1994 in [29] and 1997 in [13]. Both of them were
found to be flawed in 1998 by Kim et al. [23], whose analysis
was later found to be erroneous in 2005 by Lu et al. [27].
The series of erroneous analyses continued in [13], [23], [27],
[29] until a seemingly correct result by Shi and Burns [38]
published in 2008. Eight years later, Xiong et al. [42] pointed
out the analytical flaw in [38] and disproved the safe bounds
in [19], [20]. The erroneous patch in [42] was later fixed by
the authors in their journal revision in [43] in 2017. In the
meantime, Indrusiak et al. [15], [16] presented new analyses,
but they “chose to provide intuitions, insight and experimental
evidence on the proposed analysis and its improvements,
rather than theorems or proofs.” They supported their
analyses by evaluating concrete cases, i.e., whether there was
any observed case which was claimed to meet the deadlines
but in fact missed the deadlines. However, such case studies
cannot validate the correctness of their analyses, as also
stated by Indrusiak et al. [15], [16]. Specifically, among the
10 results [13], [15], [19], [20], [23], [27], [29], [38], [42],
[43] published up to 2017, Table VII in [15] shows that eight
of them were already disproved by counterexamples, and
two of them are probably safe, as no counterexamples have
been given. Nikolić et al. [30] published the most recent
result on worst-case response-time analysis.
The fact that almost all proposed analyses for the problem
discussed in the previous paragraph have been found flawed
suggests that the scheduling algorithm and architecture may
be too complex to be analyzed correctly. These
approaches have adopted the well-known worst-case response
time analysis for uniprocessor sporadic real-time tasks under
fixed-priority uniprocessor scheduling developed in [17], [25],
but they have never shown the connection between worm-hole
switching and uniprocessor scheduling.
Another research line to analyze the worst-case response
time of wormhole-switched fixed-priority NoC with preemp-
tive virtual channels is to apply Network Calculus and Com-
positional Performance Analysis (CPA), or their extensions,
to analyze the transmission on the links in a compositional
manner, e.g., [1], [3], [5], [32], [33], [40]. The worst-case
response time of a flow is the sum of the worst-case response
times on individual links, which can be pessimistic. One
exception is the analysis from Giroudot and Mifdaoui [9],
which applies the Pay Multiplexing Only Once principle by
Schmitt et al. [35].
Contributions: In this paper, we revisit the fundamental
algorithmic problem of flit-based NoC arbitration protocols
with respect to real-time constraints and highlight its
fundamental algorithmic complexity. In addition, we identify
the analytic pessimism of wormhole-switched fixed-priority
arbitration protocols, which stems from the large degree of
uncertainty in system behaviour that makes timeliness properties hard to verify.
This paper intends to answer a few unsolved fundamental
questions for real-time networking switching in a NoC and
has the following contributions:
• What is the fundamental difficulty of worst-case timing
analysis when flit-based transmissions are handled by
switch-based (link-based) scheduling? We will show that
the difficulty is mainly due to the explosion of the pos-
sible progression states of a message in the transmission
path. The existing analyses did not intend to prove the
coverage of all possible progression states. Moreover, we
will also argue why priority-based scheduling without
controlling the number of possible progression states is
difficult to analyze and optimize.
• Is there any equivalence between existing uniprocessor
scheduling and NoC switching? Yes, but, to the best of
our knowledge, such a protocol has never been designed.
For achieving timing predictability, we propose less flexi-
ble switching protocols, called Simultaneous Progressing
Switching Protocols (SP2), in which the links used by
a message either all simultaneously transmit one flit
of this message or none of them transmits any flit of
this message. Such an all-or-nothing property of the
SP2 relates the scheduling behavior on the network to
the uniprocessor self-suspension scheduling problem. We
provide rigorous proofs which confirm the equivalence of
these two problems.
• We demonstrate the analytical dominance of the (work-
conserving) fixed-priority version of our approach over
the existing sufficient schedulability analyses for fixed-
priority wormhole switched NoCs in [16], [43]. More-
over, our approaches are not limited to any specific un-
derlying routing protocols, which are usually constructed
for deadlock avoidance instead of timing predictability.
II. SYSTEM MODEL AND PROBLEM DEFINITION
Network-on-Chips (NoCs) are characterized by the topol-
ogy, routing protocol, arbitration, buffering, flow control mech-
anism, and switching protocol.1 In this paper, we define a NoC
as a collection of cores A, routers V, and links Λ. Each router
is connected to at least one other router by two physically
separate links, i.e., up-link and down-link. Figure 1 illustrates
a meshed network with 9 cores, A = {Ai | i = 1, 2, . . . , 9}, 9
routers V = {Vi | i = 1, 2, . . . , 9}, and 24 links in Λ between
Vi and Vj for some i and j and 18 links between Ai and Vi
for i = 1, 2, . . . , 9. We assume that all the cores, routers, and
links are homogeneous. Therefore, the transmission rate and
processing capability are identical.
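The link count stated for Figure 1 can be checked with a short script. This is a minimal sketch, assuming that routers V1, ..., V9 and cores A1, ..., A9 are numbered row by row (the exact layout of the figure is an assumption here) and that a link is a directed pair of node names; `mesh_links` is a hypothetical helper, not part of any NoC toolkit.

```python
from itertools import product


def mesh_links(rows: int, cols: int):
    """Directed links of a rows x cols mesh NoC with one core per router.

    Assumption: routers V1..V(rows*cols) and cores A1..A(rows*cols) are
    numbered row by row; each link is a (from_node, to_node) pair.
    """
    def idx(r: int, c: int) -> int:
        return r * cols + c + 1

    router_links = []
    for r, c in product(range(rows), range(cols)):
        # connect each router to its right and lower neighbour
        for dr, dc in ((0, 1), (1, 0)):
            nr, nc = r + dr, c + dc
            if nr < rows and nc < cols:
                a, b = f"V{idx(r, c)}", f"V{idx(nr, nc)}"
                router_links += [(a, b), (b, a)]  # two physically separate links
    core_links = []
    for k in range(1, rows * cols + 1):
        core_links += [(f"A{k}", f"V{k}"), (f"V{k}", f"A{k}")]
    return router_links, core_links


router_links, core_links = mesh_links(3, 3)
print(len(router_links), len(core_links))  # 24 18
```

A 3x3 mesh has 12 router adjacencies, and each one contributes an up-link and a down-link, which gives the 24 router-to-router links; the 9 core attachments give the remaining 18.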
A. Switching Mechanisms
With respect to switching, three different switching proto-
cols have been established, namely, circuit switching, store-
and-forward switching, and wormhole switching.
1) Circuit Switching: A packet (transmission unit) is for-
warded by the routers through dedicated routes that are re-
served/allocated until the transmission is finished. Therefore,
each transmission can only be preempted while a route is being
established. An advantage of this approach is that no buffers
are required and, consequently, arbitrary routing is deadlock-free,
which allows for optimized and adaptive routing schemes.
However, the overhead to establish the routes may render this
approach infeasible when small packets are injected frequently.
1Our notation of flows is equivalent to tasks and our notation of messages
is equivalent to jobs in the classical notation of real-time systems community.
Fig. 1: Exemplary 3x3 mesh NoC with cores A1, ..., A9, each attached to its router V1, ..., V9. Each application is connected to a source-router where it injects messages.
2) Store-And-Forward Switching: Routers can only forward
a packet once it is completely received and stored, which
implies that the routers must provide sufficient buffer capacity
to store a complete packet. Fortunately, the arbitration protocol
is suitable for real-time analysis, since a packet may compete
for at most one link at each point in time.
3) Wormhole-Switching: Each packet is divided into
smaller transmission units, called flits, always including a
designated header and a designated tail flit which are used
for control and routing. That is, each payload flit follows
the output port of the header flit. In fixed-priority wormhole-
switched NoCs, each router contains virtual channels, i.e.,
separated buffers that contain flits of a single packet. Once the
tail flit is transmitted and removed from the buffer, the virtual
channel can be used for flits of another packet. Furthermore,
the highest-priority flit is scheduled to transmit over the link at
each router. In this approach, complete packets do not have to
be buffered, which allows smaller buffers in the hardware
design. On the downside, each packet may be distributed over
multiple routers and subsequently compete for multiple links
at the same time, which makes the timing analysis complex.
Additionally, the limited number of virtual channels and full
buffers on the receiving router introduce additional interference,
which further complicates the analysis.
B. Messages and Periodic/Sporadic Flows
A periodic (sporadic) flow fi generates an infinite sequence
of flow instances, called messages, and has the following
parameters:
• Ti is the minimum inter-arrival time or period of the
flow fi, i.e., for a periodic flow one message is released
exactly every Ti time units and for a sporadic flow two
subsequent messages are separated by at least Ti.
• Λi is the static routing path of the flow fi, i.e.,
$\lambda_{i,1}, \lambda_{i,2}, \ldots, \lambda_{i,\eta_i}$ is the sequence of the ηi links that a
message of fi has to be transmitted on. We assume that
a physical link cannot be used more than once in the
static routing path Λi for any i.
• Ci is the worst-case number of flits of a message of fi.
• Di is the relative deadline of the flow fi. That is, when
a message is injected at time t, its absolute deadline is
t + Di. Our protocols are not restricted to any specific
relation between the minimum inter-arrival time Ti and the relative
deadline Di. However, our timing analysis will focus on
the constrained-deadline cases, where Di ≤ Ti ∀i.
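The flow parameters above can be captured in a small record type. The sketch below is hypothetical: the `Flow` class, its field names, and the representation of a link as a pair of node names are assumptions for illustration. It only encodes the constraints stated in the text: no physical link is repeated on the path, the absolute deadline of a message injected at t is t + Di, and the analysis targets the constrained-deadline case Di ≤ Ti.

```python
from dataclasses import dataclass
from typing import Tuple

Link = Tuple[str, str]  # (from_node, to_node); this encoding is an assumption


@dataclass(frozen=True)
class Flow:
    """Sporadic/periodic flow f_i with the parameters of Section II-B."""
    T: int                   # minimum inter-arrival time (period), in time units
    path: Tuple[Link, ...]   # static routing path Lambda_i, source to destination
    C: int                   # worst-case number of flits of a message
    D: int                   # relative deadline

    def __post_init__(self):
        # a physical link must not appear more than once on the routing path
        if len(set(self.path)) != len(self.path):
            raise ValueError("a link is used more than once in the routing path")

    @property
    def eta(self) -> int:
        """Number of links eta_i on the routing path."""
        return len(self.path)

    def absolute_deadline(self, release: int) -> int:
        """A message injected at time `release` must arrive by release + D."""
        return release + self.D

    def is_constrained_deadline(self) -> bool:
        return self.D <= self.T


f = Flow(T=100, path=(("A1", "V1"), ("V1", "V2"), ("V2", "A2")), C=10, D=80)
print(f.eta, f.absolute_deadline(25), f.is_constrained_deadline())  # 3 105 True
```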
C. Problem Definition
The scheduler design problem studied in this paper is
defined as follows: We are given a NoC, defined as a
collection of cores A, routers V, and links Λ. For a given
set F of sporadic or periodic flows on the NoC, the objective
is to design a switching mechanism (scheduling algorithm)
that can ensure that all messages (instances of the flows)
meet their deadlines.
The schedulability test problem studied in this paper is
defined as follows: We are given a NoC, defined as a
collection of cores A, routers V, and links Λ. For a given set
F of sporadic or periodic flows on the NoC and a switching
mechanism, the objective is to validate whether the messages
(instances of the flows) can meet their deadlines.
We assume that the cores and routers are synchronized
perfectly with respect to time. That is, there is no clock drift
in the NoC. Otherwise, the clock drift must be considered
carefully. One solution is to introduce additional delays and
interferences to pessimistically bound the impact due to clock
drifts. Moreover, we assume discrete time, i.e., the NoC
operates in the granularity of a fixed time unit and the finest
granularity is the flit.
III. EXISTING ANALYTICAL APPROACHES FOR
WORM-HOLE SWITCHING
In this section, we will first summarize the existing ana-
lytical approaches for the worm-hole switching mechanism
in Section III-A. Then, we will explain the mismatch of the
existing analyses and the underlying uniprocessor scheduling
in Section III-B.
A. Summary of Existing Analyses
A first analytical approach to determine the worst-case
response time of sporadic traffic flows in wormhole-switched
fixed-priority network-on-chips was given by Mutka [29] and
Hary and Ozguner [13]. Both of them are based on the
schedulability analysis for uniprocessor sporadic real-time
tasks under fixed-priority scheduling developed in [17], [25].
To analyze the worst-case response time of the flow fi, they
considered the complete path Λi as a single shared resource,
i.e., a uniprocessor. This shared resource may not always be
available for fi, and they modeled the unavailability by only
considering the higher-priority flows that use any link in Λi,
called direct interference. They concluded that the problem is
equivalent to the fixed-priority uniprocessor scheduling, which
was disproved by Kim et al. [23], who showed that the flow
fi can suffer from the interference due to flow fj even if Λi
and Λj have no intersection, called indirect interference. By
extending the notion of interference sets developed by Kim
et al. [23], Lu et al. [27] proposed to discriminate between
flows that could not interfere with each other to reduce the
pessimism of the analysis.
However, both of the approaches in [23], [27] assume
that the synchronous release of the first messages of the
sporadic real-time flows is the worst-case, i.e., similar to the
critical instant theorem in classical uniprocessor fixed-priority
scheduling proposed by Liu and Layland [26]. This statement
was later disproved in 2008 by Shi and Burns [38], where
jitter terms were added to model the asynchronous release of
the first messages of the sporadic real-time flows. Based on
the results of this work, Kashif and Patel proposed a link-
based analysis called stage-level analysis [19], [20] to achieve
a tighter analysis. Both analyses were shown to be unsafe
by Xiong et al. [42] by means of simulations. It was discovered
that a flit of a higher-priority flow may induce interference
more than once, i.e., on multiple routers, thus rendering the
conjectures made by Shi and Burns [38] and Kashif and
Patel [19], [20] false. This behavior is referred to as multi-
point progressive blocking by Indrusiak et al. [16]. The state
of the art with respect to fixed-priority wormhole-switched
networks-on-chips with infinite buffers is represented by [15],
[43]. Unfortunately, the infinite buffer assumption is infeasible
in real systems, thus back-pressure effects that occur due to
limited buffer sizes in the routers have to be considered. In the
work of Indrusiak et al. [16], the authors incorporate buffer
sizes into the worst-case response time analysis. They “chose
to provide intuitions, insight and experimental evidence on the
proposed analysis and its improvements, rather than theorems
or proofs.” Thus, further counterexamples may be found. The
fact that almost all proposed analyses have been found to be
flawed suggests that the scheduling algorithm and architecture
are too complex to be reasonably analyzed. Further evidence
for this claim is that in the analyses provided by Indrusiak
et al. [16], increased buffer sizes lead to increased
worst-case response times. Nikolić et al. [30] presented an improved
analysis over the results in [16], [43].
Motivated by this, we revisit the fundamental algorith-
mic problem of packet-based network-on-chip scheduling and
identify the analytic pessimism incurred by the complexity of
link-based arbitration as harmful to routing and to verification.
All the above results in [13], [15], [16], [19], [20], [23],
[27], [29], [30], [38], [42], [43] made an assumption that the
schedulability analysis is somehow related to a corresponding
uniprocessor fixed-priority scheduling problem. However, this
has never been formally proved. We will explain the mismatch
by using the progression model in Section III-B.
B. Progression Model
In this subsection, we detail why the link-based arbitration
problem does not match the uniprocessor scheduling model
and illustrate the subsequent problems in response-time anal-
yses using uniprocessor scheduling theory. To explain the
mismatch, we will focus on the possible buffer states of one
instance (i.e., message) of a flow fi under analysis. Suppose that
there are ηi links involved in Λi. Let $\vec{B}_i$ denote the state
vector of the number of flits that are buffered in the cores and
routers involved in the path Λi; hence, $\vec{B}_i$ has ηi + 1 elements.
The first element of $\vec{B}_i$ denotes the number of flits of the
message of fi that are still to be sent from the source core, and
the last element denotes the number of flits that have already
been received at the destination core.
Recall our assumption in Section II-C that the NoC is
assumed to be fully synchronized in time. Therefore, in each
time unit, a buffered flit can be forwarded to the next node
(a core or a router). Since the NoC works in discrete time,
we can observe how the vector $\vec{B}_i$ changes over time when
considering only the time units in which the message is sent.
When a flit is sent over the j-th link in Λi in a time unit,
the buffer of the j-th node is reduced by 1 and the buffer
of the (j+1)-th node in Λi is increased by 1. For notational
brevity, let $\vec{y}_j$ be a vector of ηi + 1 elements in which all
elements are 0 except the j-th element, which is −1, and the
(j+1)-th element, which is 1. For example, $\vec{y}_1 + \vec{y}_2$ implies that
the first and the second links send one flit forward in this time
unit, and $\vec{y}_1 + \vec{y}_3$ implies that the first and the third links
send one flit forward in this time unit.
Definition 1 (Progression). Consider a buffer state $\vec{B}_i$ at
time t, in which all elements of $\vec{B}_i$ are non-negative integers.
Suppose that each $z_j$ is either 0 or 1 for $j = 1, 2, \ldots, \eta_i$ and
that at least one of them is 1, where $z_j = 1$ means that the j-th
link sends one flit forward. Let
$$\vec{Y} = \sum_{j=1}^{\eta_i} z_j \vec{y}_j.$$
For a buffer state $\vec{B}_i$, values $z_j$ for $j = 1, 2, \ldots, \eta_i$, and the
vector $\vec{Y}$ defined above, the change of the buffer state is valid if
• all elements of $\vec{B}_i + \vec{Y}$ are non-negative integers, and
• the (j+1)-th element of $\vec{B}_i$ is at least $z_{j+1}$ for $j = 1, 2, \ldots, \eta_i - 1$, i.e., a router can only forward a flit that it already buffered at the beginning of the time unit.
If the change of the buffer state is valid, we say that the flow
makes a progression in this time unit, i.e., one clock cycle.
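Definition 1 can be made concrete by enumerating all vectors $(z_1, \ldots, z_{\eta_i})$ for a given buffer state. The Python sketch below uses 0-based indices and implements the second condition as "the (j+1)-th element of $\vec{B}_i$ is at least $z_{j+1}$", i.e., a router only forwards flits it held at the start of the time unit; this reading is an assumption, chosen because it reproduces the transitions shown in Figure 2. The function names are hypothetical.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple


def progression_vector(zs: Sequence[int]) -> Tuple[int, ...]:
    """Y = sum_j z_j * y_j, where y_j moves one flit from node j to node j+1."""
    Y = [0] * (len(zs) + 1)
    for j, z in enumerate(zs):
        Y[j] -= z
        Y[j + 1] += z
    return tuple(Y)


def is_valid_progression(B: Sequence[int], zs: Sequence[int]) -> bool:
    """Check the conditions of Definition 1 (0-based indices)."""
    if not any(zs):                              # at least one link must send
        return False
    Y = progression_vector(zs)
    if any(b + y < 0 for b, y in zip(B, Y)):     # no buffer may become negative
        return False
    # routers forward only flits they already held at the start of the time unit
    return all(B[j] >= zs[j] for j in range(1, len(zs)))


def successors(B: Tuple[int, ...]) -> Iterator[Tuple[int, ...]]:
    """All buffer states reachable from B by exactly one progression."""
    for zs in product((0, 1), repeat=len(B) - 1):
        if is_valid_progression(B, zs):
            yield tuple(b + y for b, y in zip(B, progression_vector(zs)))


print(sorted(successors((9, 1, 0, 0))))
# [(8, 1, 1, 0), (8, 2, 0, 0), (9, 0, 1, 0)]
print(sorted(successors((9, 0, 1, 0))))
# [(8, 1, 0, 1), (8, 1, 1, 0), (9, 0, 0, 1)]
```

The first call yields the three states of the worked example with Ci = 10, and the second yields the three children of (9, 0, 1, 0) drawn in Figure 2. Note that the pair $z = (1, 1, 0)$ from (9, 0, 1, 0) is rejected, since the first router holds no flit at the start of the time unit.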
In a time unit, a link may or may not be used to send
one flit of fi in the switching mechanism. Therefore, there are
$2^{\eta_i}$ combinations of the vectors $\vec{y}_j$. Note that progressions
do not have to take place in two consecutive time units: if
the message is not sent in the next time unit at all, there is
no progression of the message. As an illustrative example,
consider Ci = 10 and a message that is sent from core A1
via R1 and R2 to core A2 in the NoC illustrated in Figure 1.
After the first flit is sent, we get $\vec{B}_i = (C_i - 1, 1, 0, 0)$.
Now, there are three possibilities for the next time unit in which
the NoC transmits one or multiple flits of the message:
• $\vec{B}_i = (C_i - 1, 0, 1, 0)$: That is, A1 does not send any
flit but R1 sends a flit to R2. The progression is due to
$\vec{Y} = (0, -1, 1, 0)$.
• $\vec{B}_i = (C_i - 2, 2, 0, 0)$: That is, A1 sends one flit to R1
but R1 does not send a flit to R2, i.e., $\vec{Y} = (-1, 1, 0, 0)$.
• $\vec{B}_i = (C_i - 2, 1, 1, 0)$: That is, A1 sends one flit to R1 and
R1 sends a flit to R2, which means that the progression is
due to $\vec{Y} = (-1, 1, 0, 0) + (0, -1, 1, 0) = (-1, 0, 1, 0)$.
Fig. 2: Progressions of a message which involves 2 cores and 2 routers, i.e., 3 links, when Ci = 10. The numbers associated with an edge indicate the change of the buffers in the vector $\vec{B}_i$. The red dashed path illustrates the beginning of a fastest progression and the blue dashed path illustrates the beginning of a slowest progression. [Figure: tree of buffer states rooted at (10, 0, 0, 0), followed by (9, 1, 0, 0), then (8, 2, 0, 0), (9, 0, 1, 0), and (8, 1, 1, 0); the children of (9, 0, 1, 0) are (9, 0, 0, 1), (8, 1, 1, 0), and (8, 1, 0, 1).]
The first three levels of the tree illustrated in Figure 2 show
the above example. In each of the above states, the next
progression has to be considered. Due to space limitations, we
only further illustrate the progressions that are possible when
$\vec{B}_i$ is (9, 0, 1, 0).
Definition 2 (A Series of Progressions). A series of progressions
is a sequence of progressions as defined in Definition 1,
one after another, starting from $\vec{B}_i = (C_i, 0, 0, \ldots, 0)$ and
ending at $\vec{B}_i = (0, 0, \ldots, C_i)$.
A safe analysis of the worst-case response time or the
schedulability for sending the message should consider all
possible series of progressions of fi starting from
$\vec{B}_i = (C_i, 0, 0, \ldots, 0)$ to $\vec{B}_i = (0, 0, \ldots, C_i)$. If we only account for
the number of time units in which the message of fi is transmitted,
it is not difficult to see that the slowest series only sends one
flit forward per progression, in which case the switching mechanism
results in Ci × ηi progressions. Moreover, the
fastest series sends one flit (if available) forward from every core
and router involved in the path Λi per progression, in which case
the switching mechanism results in Ci + ηi − 1
progressions.
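For small messages, both extremes can be checked by exhaustive search over all series of progressions. The sketch below is a hypothetical helper, not an analysis from the literature; it assumes the reading of Definition 1 in which every link that fires must start from a node that already holds a flit (so a flit advances at most one link per time unit), and it memoizes the fewest and the most progressions needed from each buffer state.

```python
from functools import lru_cache
from itertools import product


def successors(B):
    """Valid next buffer states: any non-empty set of links may fire, but each
    firing link needs a flit at its upstream node at the start of the time unit."""
    n = len(B) - 1
    for zs in product((0, 1), repeat=n):
        if any(zs) and all(B[j] >= zs[j] for j in range(n)):
            nxt = list(B)
            for j, z in enumerate(zs):
                nxt[j] -= z
                nxt[j + 1] += z
            yield tuple(nxt)


@lru_cache(maxsize=None)
def progression_bounds(B):
    """(fewest, most) progressions until all flits reach the destination core."""
    if not any(B[:-1]):  # every flit has arrived at the destination
        return 0, 0
    results = [progression_bounds(nxt) for nxt in successors(B)]
    return (1 + min(lo for lo, _ in results),
            1 + max(hi for _, hi in results))


C, eta = 10, 3
print(progression_bounds((C,) + (0,) * eta))  # (12, 30)
```

For Ci = 10 and ηi = 3 the search returns Ci + ηi − 1 = 12 and Ci × ηi = 30. The number of reachable buffer states grows quickly with Ci and ηi, so this brute force is only meant for toy sizes.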
All the results in [13], [15], [16], [19], [20], [23], [27], [29],
[30], [38], [42], [43] made an assumption that the correspond-
ing uniprocessor scheduling problem can use Ci+ηi−1 as the
worst-case execution time of the corresponding sporadic task
to represent the flow fi. This assumption implicitly implies
that the flow fi takes the fastest series of progressions
explained above. Such uniprocessor analyses are only valid when
the other series of progressions are accounted for correctly.
However, the fastest series of progressions for fi is not always
possible in the worst case. To ensure the correctness of the
analysis, some additional time units should be included. Many
patches have been provided to account for such additional time
units after the series of flaws found in [13], [19], [20], [23],
[27], [29], [38], [42].
Informally speaking, the researchers in [13], [15], [16],
[19], [20], [23], [27], [29], [30], [38], [42], [43] have tried
to construct their analyses by linking the problem to a cor-
responding uniprocessor scheduling problem. Most of them
were later found flawed, e.g., [13], [19], [20], [23], [27],
[29], [38], [42], or without a formal proof, e.g., [15], [16],
[30].2 However, none of them has seriously considered all
the possible progressions for transmitting fi. Whether the
worst-case response time analysis for preemptive worm-hole
switching is equivalent to any specific form of uniprocessor
scheduling problem remains an open question.
We strongly believe that the worst-case response time
analysis or the schedulability analysis for wormhole-switched
fixed-priority network-on-chip is highly complex, as the timing
behavior is not uniprocessor equivalent, as reported in the
literature. In a uniprocessor system, if a job is executed for
δ time units, the execution time of the job is reduced by δ
time units. However, sending δ flits can have different series
of progressions in the NoC.
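The contrast with the uniprocessor case can be made concrete. The sketch below (hypothetical helper names; it assumes the same one-link-per-time-unit reading of Definition 1) collects every buffer state reachable after exactly δ progressions and computes the fewest progressions still needed from each. For Ci = 10, ηi = 3, and δ = 2, the remaining demand is not a single number, whereas a uniprocessor job would have exactly its execution time minus δ left.

```python
from functools import lru_cache
from itertools import product


def successors(B):
    """Next buffer states: any non-empty set of links may fire, but a link can
    only forward a flit its upstream node held at the start of the time unit."""
    n = len(B) - 1
    for zs in product((0, 1), repeat=n):
        if any(zs) and all(B[j] >= zs[j] for j in range(n)):
            nxt = list(B)
            for j, z in enumerate(zs):
                nxt[j] -= z
                nxt[j + 1] += z
            yield tuple(nxt)


@lru_cache(maxsize=None)
def fewest_remaining(B):
    """Fewest further progressions until every flit reaches the destination."""
    if not any(B[:-1]):
        return 0
    return 1 + min(fewest_remaining(nxt) for nxt in successors(B))


def states_after(B0, delta):
    """All buffer states reachable from B0 by exactly delta progressions."""
    frontier = {B0}
    for _ in range(delta):
        frontier = {nxt for B in frontier for nxt in successors(B)}
    return frontier


C, eta, delta = 10, 3, 2
remaining = {B: fewest_remaining(B) for B in states_after((C,) + (0,) * eta, delta)}
for B in sorted(remaining):
    print(B, remaining[B])
# (8, 1, 1, 0) 10
# (8, 2, 0, 0) 11
# (9, 0, 1, 0) 11
```

After the same two progressions, the message may still need 10 or 11 progressions, depending on which series was taken.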
If we would like to consider the complete path Λi as a
single shared resource, like in [13], [15], [16], [19], [20],
[23], [27], [29], [30], [38], [42], [43], and analyze the worst-
case behavior by utilizing the corresponding instance of the
uniprocessor scheduling problem, the mapping from the series
of the progressions to the uniprocessor problem must be
formally proved.
Please note that we do not claim that such a mapping is
impossible. We only point out the mismatch. Such mappings are
potentially very difficult to achieve precisely due to the large
search space. However, safe approximations and upper bounds
are also missing in the literature. In both cases, a correct
proof should explain how to safely account for the number
of iterations in the progressions of the flows and map them to
the corresponding execution time in the constructed instance
of the uniprocessor scheduling problem.
IV. SIMULTANEOUS PROGRESSING SWITCHING
PROTOCOLS
To the best of our knowledge, there is no formal proof to
demonstrate the connection between the progressions of the
messages on a NoC and the corresponding instance of the
uniprocessor scheduling problem. We note that the analytical
difficulty is potentially due to the flexibility introduced in the
switching mechanism. In the position papers by Wilhelm et
al. [41] and Axer et al. [2], the authors state that system
properties that are subject to predictability constraints should
already be considered and guaranteed in the design. Since
the worm-hole switching protocol was not designed with
predictability constraints in mind, designing new protocols that
can be safely analyzed without losing too much flexibility or
efficiency can be an alternative.
2The proofs in [30] did not consider the equivalence of the worst-case
response time analysis adopted in uniprocessor systems and the analysis
of a NoC. Instead, they emphasized the quantification of different types of
interferences. However, in many places in the proofs, e.g., the building blocks
from Lemmas 3, 4, and 6 in [30], the derivation is based on examples.
Instead of proving the complex scenarios in the standard
worm-hole switching, we propose another protocol which has
only one series of progressions. These less flexible switching
protocols, called Simultaneous Progressing Switching Protocols
(SP2), achieve timing predictability by enforcing that a
flow fi is eligible to transmit on its route if and only if all
links in Λi can be allocated to it in parallel. In other words, the
links used by a flow fi either all simultaneously transmit one
flit of this flow (if it exists) or none of them transmits any flit
of this flow. As a result, for a progression of fi in a time unit,
some links in Λi may be reserved even though there is no flit
to be transmitted over them in this time unit, a behaviour
similar to processor spinning.
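Under the SP2, a message of fi follows a single series of progressions: whenever fi is granted, every link in Λi forwards one flit if its upstream node holds one, and links without a flit are still reserved. The sketch below assumes that fi is granted in every time unit (no interference) and only counts progressions; interference would delay progressions but, because of the all-or-nothing rule, would not change the series itself. The function names are hypothetical.

```python
def sp2_progression(B):
    """One SP2 progression: all links of the path are granted at once; a link
    moves a flit only if its upstream node held one at the start of the time
    unit (otherwise the link is reserved but idle, i.e., 'spinning')."""
    n = len(B) - 1
    zs = [1 if B[j] > 0 else 0 for j in range(n)]
    nxt = list(B)
    for j, z in enumerate(zs):
        nxt[j] -= z
        nxt[j + 1] += z
    return tuple(nxt)


def sp2_series(C, eta):
    """The unique series of buffer states of one message under the SP2."""
    B = (C,) + (0,) * eta
    series = [B]
    while any(B[:-1]):  # some flit has not yet reached the destination
        B = sp2_progression(B)
        series.append(B)
    return series


series = sp2_series(10, 3)
print(len(series) - 1)  # 12, i.e., C + eta - 1
print(series[:4])       # [(10, 0, 0, 0), (9, 1, 0, 0), (8, 1, 1, 0), (7, 1, 1, 1)]
```

The series coincides with the fastest series of progressions in the worm-hole setting, which is why a granted time unit under the SP2 behaves like one unit of execution of a task with execution time Ci + ηi − 1.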
In order to formally define the schedules and analyze the
schedulability, we use the following definition.
Definition 3 (Schedules). A schedule $S_{\lambda_j}(t)$ is a function that
maps time t in the time domain to the flow that is scheduled
on the link $\lambda_j$ at that time. Further, $S_{\lambda_j}(t) = \infty$ if $\lambda_j$ is idle
at time t.
We use the Iverson bracket to indicate whether a flow
fi is scheduled on a link λj at time t.3 For convenience,
we use $S_{\Lambda_j}(t)$ to indicate the ordered set of schedules
$\{S_{\lambda_k}(t) \mid \lambda_k \in \Lambda_j\}$. Moreover, we abbreviate our notation to
a single value, i.e., $S_{\Lambda_j}(t) = i$ denotes that all links in $\Lambda_j$
schedule flow fi at time t.
Definition 4. A schedule S implements the SP2 if, for all t ≥ 0
and for the static routing path Λi of every flow fi, the following
implication holds:
$$\left(S_{\lambda_k}(t) = i \text{ for some } \lambda_k \in \Lambda_i\right) \implies S_{\Lambda_i}(t) = i$$
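Definition 4 can be checked mechanically for a single time instant. In this hypothetical sketch, a schedule snapshot is a dictionary from links to flow ids, `None` stands in for an idle link ($S_{\lambda_j}(t) = \infty$), and each flow's routing path is given explicitly; none of these representations come from the paper.

```python
from typing import Hashable, Mapping, Sequence

IDLE = None  # plays the role of S_lambda(t) = infinity (idle link)


def implements_sp2_at(schedule_t: Mapping[Hashable, object],
                      paths: Mapping[object, Sequence[Hashable]]) -> bool:
    """Check Definition 4 at one time instant t.

    schedule_t: link -> flow id scheduled on that link at time t (missing = idle)
    paths:      flow id i -> static routing path Lambda_i (list of links)
    """
    for i, path in paths.items():
        used_somewhere = any(schedule_t.get(link, IDLE) == i for link in path)
        used_everywhere = all(schedule_t.get(link, IDLE) == i for link in path)
        if used_somewhere and not used_everywhere:
            return False  # all-or-nothing property violated for flow i
    return True


paths = {
    1: [("A1", "V1"), ("V1", "V2"), ("V2", "A2")],
    2: [("A3", "V3"), ("V3", "V2"), ("V2", "A2")],
}
ok = {("A1", "V1"): 1, ("V1", "V2"): 1, ("V2", "A2"): 1}
bad = {("A1", "V1"): 1, ("V1", "V2"): 1, ("V2", "A2"): 2,
       ("A3", "V3"): 2, ("V3", "V2"): 2}
print(implements_sp2_at(ok, paths), implements_sp2_at(bad, paths))  # True False
```

In `bad`, the shared link ("V2", "A2") serves flow 2 while flow 1 holds its other links, which is exactly the per-link arbitration that the SP2 forbids.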
In general, the SP2 can be implemented with different
strategies. The focus of this paper is not the implementation
or design of scheduling strategies to meet the schedule defined
in Definition 4. In the upcoming two subsections, we discuss
two potential scheduling strategies that can be used to derive
such schedules.
A. Links to Gang Scheduling
To meet the deadline of a message of a flow fi that arrives
at time t, the SP2 requires Ci + ηi − 1 time units before
t + Di in which all the links in Λi are used simultaneously.
3$[S_{\lambda_j}(t) = i]$ is 1 if fi is scheduled on λj at time t and is 0 otherwise.
This is similar to the rigid gang scheduling problem [10],
which can be defined as follows:
• We are given a set of periodic/sporadic tasks to be
executed on the given machines.
• Each task has to simultaneously use a subset of the given
machines as a gang. Whenever the task is executed, all
of its required machines must be exclusively allocated to
the task.
That is, we can consider that each of the links in Λ is a
machine and each flow is a task. The links in Λi form a gang
for flow fi and the execution time is Ci + ηi − 1.
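To make this mapping concrete, the following Python sketch (not taken from the paper) computes the gang execution time Ci + ηi − 1 and checks the all-or-nothing condition of Definition 4 on a finite, slotted schedule. It assumes that ηi is the number of links in Λi (which is consistent with Example 1), that a link is identified by its (source, destination) router pair, and that an idle link is represented by math.inf, mirroring Definition 3; the function names are hypothetical.

```python
from math import inf
from typing import Dict, Sequence, Set, Tuple

Link = Tuple[str, str]  # hypothetical encoding: (source router, destination router)


def gang_execution_time(num_flits: int, num_links: int) -> int:
    """C_i + eta_i - 1, assuming eta_i is the number of links in Lambda_i."""
    return num_flits + num_links - 1


def implements_sp2(schedule: Sequence[Dict[Link, float]],
                   routes: Dict[int, Set[Link]]) -> bool:
    """Check Definition 4 on a finite schedule.

    schedule[t] maps each link to the index of the flow it serves in time
    unit t; links that are missing (or mapped to inf) are idle.
    """
    for slot in schedule:
        for i, route in routes.items():
            served = [slot.get(link, inf) == i for link in route]
            # all-or-nothing: if some link of Lambda_i serves f_i, all of them must
            if any(served) and not all(served):
                return False
    return True


# Routes of Example 1 (link names are illustrative)
A, B, C = ("V1", "V2"), ("V2", "V3"), ("V3", "V6")
ROUTES = {1: {A}, 2: {A, B}, 3: {B, C}}

ok = implements_sp2([{A: 1, B: 3, C: 3}], ROUTES)   # f1 and f3 progress on full routes
bad = implements_sp2([{A: 1, B: 2, C: 3}], ROUTES)  # f2 and f3 each hold part of a route
```

Here `ok` is True, while `bad` is False because f2 holds V2 → V3 without V1 → V2 (and f3 holds V3 → V6 without V2 → V3).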
Finding the optimal schedule for the rigid gang scheduling
problem has been shown to be NP-hard in the strong sense even
when all the tasks have the same period and the same
deadline [24]. Moreover, even special cases, like three ma-
chines [4] or unit execution time per task [14], are also NP-
hard in the strong sense. The rigid gang scheduling problem
for implicit-deadline periodic real-time task systems (i.e.,
Di = Ti for every task τi) has been recently studied by
Goossens and Richard [10]. They presented two algorithms,
one based on linear programming and one heuristic
algorithm. Moreover, Harde et al. [12] constructed
static schedules by using a constraint-programming or an
integer-linear-programming (ILP) approach for harmonic
real-time task systems. Another version of gang scheduling is the
so-called global gang scheduling problem [8], [21], [34], in
which the set of machines used by a gang task is not fixed.
A gang task requires a certain number of machines, and these
machines can be dynamically relocated at runtime. We note
that the global gang scheduling problem is unrelated to the
SP2, since the links used by a flow are fixed by its routing
path from the source node to the destination node.
B. Work-Conserving Priority-Based Schedules
Instead of optimizing the scheduling strategies in the routers
in a NoC for the SP2, we will focus on the work-conserving
priority-based SP2. Such strategies can be dynamic-priority
algorithms (i.e., two messages of flow fi may have different
priorities) or fixed-priority scheduling algorithms (i.e., all
messages of flow fi have the same priority).
That is, each message has a priority. When two messages
intend to use one link at the same time, the higher-priority
message is scheduled and the lower-priority message is sus-
pended. Whenever a message is suspended in one of its links,
it is suspended in all of its links.
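The arbitration rule above can be sketched for a single time unit as follows. This is an illustrative Python sketch, not the authors' router design: flows are visited in priority order (a smaller index means a higher priority, as assumed in Section V), and each flow receives either all links of its route or none of them. The function name `arbitrate` and the link encoding are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Link = Tuple[str, str]  # hypothetical encoding: (source router, destination router)


def arbitrate(active: Iterable[int],
              routes: Dict[int, Set[Link]]) -> Dict[Link, int]:
    """One time unit of priority-based SP2 arbitration.

    Returns the flow granted on each link; links absent from the result are idle.
    A suspended flow reserves nothing, so lower-priority flows may still use
    its links (work-conserving).
    """
    grant: Dict[Link, int] = {}
    for i in sorted(active):  # highest priority (smallest index) first
        if all(link not in grant for link in routes[i]):
            for link in routes[i]:
                grant[link] = i  # all-or-nothing: take every link of Lambda_i
        # otherwise f_i is suspended on all of its links in this time unit
    return grant


A, B, C = ("V1", "V2"), ("V2", "V3"), ("V3", "V6")
routes = {1: {A}, 2: {A, B}, 3: {B, C}}
g0 = arbitrate([1, 2, 3], routes)  # all three flows of Example 1 active
g1 = arbitrate([2, 3], routes)     # after f1 has finished
```

With all three flows active, f1 obtains V1 → V2 and f2 is suspended on both of its links, so f3 obtains V2 → V3 and V3 → V6. Once f1 has finished, f2 takes V1 → V2 and V2 → V3, and f3 is suspended on both of its links, including V3 → V6.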
In the remainder of this paper, we focus on fixed-priority
scheduling and on the theoretical benefits of the Simultaneous
Progressing Switching Protocols, starting with the schedulability
analysis in Section V.
V. SP2 SCHEDULING ANALYSIS
For a scheduling protocol A, a real-time schedulability analysis
of a flow set F validates that no flow in the flow set
misses its deadline under any valid schedule generated by A.
A flow set validated in this way is called feasibly schedulable
by A, and infeasible otherwise. Furthermore, a schedulability
test for an algorithm A is called sufficient
if all flow sets that are deemed to be feasibly schedulable
by this test are actually feasibly schedulable. Likewise, the
test is called necessary if every flow set that is schedulable
by algorithm A is verified to be feasibly schedulable by
the corresponding schedulability test. A test that is both
necessary and sufficient is called exact. In the remainder
of this paper, we derive a sufficient schedulability test.
For each flow under analysis, we partition all other flows
into a direct contention domain and an indirect contention
domain. The (direct) contention domain of the flow under
analysis is the set of higher-priority flows that share at
least one link with it and can thus directly interfere with it.
The (indirect) contention domain consists of the flows that do
not share a link with the flow under analysis but interfere with
flows in its (direct) contention domain. In the remainder of this
paper, we implicitly assume that the flow set is indexed such that a
flow fi has higher priority than a flow fj if i < j.
In this section, we explain how the schedulability
analysis for preemptive fixed-priority Simultaneous
Progressing Switching Protocols can be related to the
work-conserving fixed-priority preemptive uniprocessor
self-suspension scheduling problem. In real-time scheduling
theory, self-suspension denotes the property of an executable
entity to be exempted from the scheduling decisions for a
specified amount of time, i.e., the self-suspension time. In our
analysis, we use self-suspension to model a flow fj that is
eligible to be scheduled using any work-conserving algorithm
at a given time t due to being the highest-priority flow and
being active (released and not yet finished), but is not trans-
mitted due to indirect contention. We prove that this (indirect)
contention can be related to self-suspension behaviour in
uniprocessor real-time scheduling theory and thus show the
applicability of existing self-suspension aware schedulability
analyses. We formally define and prove the self-suspension
equivalent property of fixed-priority scheduling using the SP2.
We only briefly introduce the uniprocessor self-suspension
problem and refer the reader to the existing literature [6], [7]
for further information. In short, self-suspension refers to the
exemption of a ready schedulable entity from the scheduling
decision for a certain amount of time. This exemption behavior
is modeled as dynamic self-suspension and multi-segment self-
suspension in the literature. In the former, the suspension
pattern can be arbitrary and is only parametrically upper
bounded by the total self-suspension time. This flexibility
comes at the cost of more pessimism in the timing analyses.
In the latter model, an upper bound of the duration and the
number of a task’s suspension intervals is known and can thus
be used in the timing analyses. In this paper, we consider the
dynamic self-suspension model and the corresponding timing
analyses.
Example 1 (Self-Suspension Behaviour). Figure 3 shows three
priority-ordered flows f1, f2, and f3 that transmit 20, 19, and 29
flits, respectively, through the subset of routers V1, V2, V3, and
V6 illustrated in Figure 1. Flow f1 has the highest priority
and f3 has the lowest priority. Flow f1 transmits from router V1 to
router V2, flow f2 transmits from router V1 through V2 to
V3, and flow f3 transmits from V2 through router V3
to V6. In this example, all flows are released synchronously
and are scheduled according to Simultaneous Progressing
Switching Protocols using fixed-priority scheduling. Note that
an actual transmission of a flit is denoted by a darkened box,
whereas the unfilled areas indicate the flows that are granted
the link. On the link from V1 to V2, flow f1 precedes flow
f2 due to its higher priority. Since f3 does not share any link
with f1, flow f3 can transmit on its links. After f1 finishes,
flow f2 attempts to transmit on its links and preempts
flow f3 on the link from V2 to V3. Due to the SP2, f3 is not
eligible to transmit on the link from V3 to V6 despite being
the highest-priority flow on that link. Since the preemption by
f2 is transparent on that link, this behaviour is similar to the
self-suspension property of real-time tasks.
[Figure 3: timeline from 0 to 60 time units for the links V1 → V2 (f1, then f2), V2 → V3 (f3, f2, f3), and V3 → V6 (f3, suspended, f3).]
Fig. 3: An exemplary self-suspension instance for three flows
using the SP2 and fixed-priority scheduling. The empty rectangles
are the time allocated for f2 and f3 but not used for
transmission since there is no available flit yet.
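The timeline of Example 1 can be reproduced with a small discrete-time simulation. The Python sketch below is illustrative only; it assumes that a flow needs Ci + ηi − 1 time units on its gang, with ηi taken as the number of links in its route, and that in each time unit links are granted in priority order with the all-or-nothing rule. Under these assumptions, f1, f2, and f3 finish at times 20, 40, and 50, and f3 is held back on the link V3 → V6 during [20, 40) although no higher-priority flow uses that link.

```python
from typing import Dict, List, Set, Tuple

Link = Tuple[str, str]  # hypothetical encoding: (source router, destination router)
A, B, C = ("V1", "V2"), ("V2", "V3"), ("V3", "V6")
ROUTES: Dict[int, Set[Link]] = {1: {A}, 2: {A, B}, 3: {B, C}}  # Lambda_1..Lambda_3
FLITS = {1: 20, 2: 19, 3: 29}  # flits per message in Example 1


def simulate(routes: Dict[int, Set[Link]], flits: Dict[int, int], horizon: int = 1000):
    """Synchronous release at t = 0, fixed-priority SP2 arbitration per time unit."""
    # assumed gang demand C_i + eta_i - 1, with eta_i = number of links
    remaining = {i: flits[i] + len(routes[i]) - 1 for i in routes}
    finish: Dict[int, int] = {}
    timeline: List[Dict[Link, int]] = []
    for t in range(horizon):
        active = sorted(i for i in routes if remaining[i] > 0)
        if not active:
            break
        grant: Dict[Link, int] = {}
        for i in active:  # smaller index = higher priority
            if all(link not in grant for link in routes[i]):
                for link in routes[i]:
                    grant[link] = i
                remaining[i] -= 1
                if remaining[i] == 0:
                    finish[i] = t + 1
        timeline.append(grant)
    return finish, timeline


finish, timeline = simulate(ROUTES, FLITS)
# f3 is active and nobody holds V3 -> V6, yet f3 does not use it: suspension-like
suspended = [t for t, g in enumerate(timeline) if t < finish[3] and C not in g]
```

The time units collected in `suspended` are exactly those in which f2 occupies V2 → V3, which matches the ordering shown in Figure 3.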
To formalize the direct contention domain of a flow fi under
analysis, we denote the set of higher-priority flows of fi that
share at least one link with fi as sharei = {fj ∈ F | j < i, Λj ∩
Λi ≠ ∅}. Under the fixed-priority SP2, each flow has a priority,
which we assume to be unique. In our analysis in this section, we
assume work-conserving fixed-priority arbitration:
• a flow transmits further if none of its links is used by any
higher-priority flow, and
• a flow does not transmit further if one of its links is
allocated to a higher-priority flow.
Note that we analyze the schedulability of each flow indi-
vidually under the assumption that the schedulability of all
higher-priority flows has already been validated. In order to
formally quantify the times t such that a flow fj is eligible
to transmit on all links in Λi in parallel using Simultaneous
Progressing Switching Protocols but is not transmitting, i.e.,
self-suspension behaviour, we give the following definition.
Definition 5. A flow fj is said to be Λi-self-suspended at a
time t, if the following conditions are satisfied:
Prop. 1: fj is active, i.e., released and not yet finished at time t;
Prop. 2: fj is the highest-priority flow on all the links in Λi,
i.e., min{SΛi(t)} > j;
Prop. 3: Λj ⊄ Λi and Λi ∩ Λj ≠ ∅, i.e., Λj shares at least one
link with Λi but also uses a link outside Λi;
Prop. 4: Sλk(t) ≠ j for all λk ∈ Λi, i.e., fj is not
scheduled on any of the links in Λi.
Note that we require Λj ⊄ Λi, since the self-suspension-like
behaviour can only occur due to contention that is transparent
to the flow under analysis, i.e., contention on a link in Λj \ Λi.
In the following theorem, we
formally bound the set of flows that exhibit self-suspension
behaviour in order to safely account for the additional inter-
ference.
Lemma 1. Let a flow fj satisfy fj ∈ sharei and Λi ∩ Λn ≠ ∅
for all fn ∈ sharej. Then every fn ∈ sharej is also in sharei.
Proof. This follows directly from the definitions. By the first
property, j < i and Λj ∩ Λi ≠ ∅. The second property implies
that every fn with n < j and Λn ∩ Λj ≠ ∅ also satisfies
Λi ∩ Λn ≠ ∅. Since n < j < i, this implies fn ∈ sharei.
Theorem 1. In a schedule that is generated by any
fixed-priority algorithm using Simultaneous Progressing Switching
Protocols, the set of flows that are Λi-self-suspending is a
subset of
$$SS(\Lambda_i) = \{ f_\ell \in \mathit{share}_i \mid \exists f_n \in \mathit{share}_\ell : \Lambda_n \cap \Lambda_i = \emptyset \}.$$
Proof. We prove this theorem by contraposition, i.e., we show
that if fj ∉ {fℓ ∈ sharei | ∃fn ∈ shareℓ : Λn ∩ Λi = ∅}, then
fj is not Λi-self-suspending. Therefore, we must analyze the
following two cases:
1) fj ∉ sharei,
2) fj ∈ sharei and Λi ∩ Λn ≠ ∅ for all fn ∈ sharej.
In the first case, let fj ∉ sharei with j < i (only higher-priority
flows are of interest), and thus by definition Λj ∩ Λi = ∅.
Therefore, by Prop. 3, fj is not Λi-self-suspending at any time t.
In the second case, assume for contradiction that there exists a
time instant t∗ at which fj is Λi-self-suspended and fj satisfies
the properties stated in the second case. Then fj is active at
time t∗ (Prop. 1), and for all λk ∈ Λi, either λk is idle, i.e.,
Sλk(t∗) = ∞, or Sλk(t∗) > j (Prop. 2). Since fj is active but
not scheduled (Prop. 4), the work-conserving fixed-priority
arbitration implies that some link of Λj is allocated to a
higher-priority flow fn ∈ sharej at time t∗. By the properties
stated for the second case and Lemma 1, fn ∈ sharei, i.e.,
Λn ∩ Λi ≠ ∅. Since fn is scheduled at time t∗ and, by the
all-or-nothing property, on all links in Λn, we have
Sλk(t∗) = n < j for every λk ∈ Λn ∩ Λi, which contradicts
Prop. 2.
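The sets sharei and SS(Λi) of Theorem 1 depend only on priorities and routes, so they can be computed directly. The following Python sketch is illustrative; it reuses the routes of Example 1 and adds a hypothetical fourth flow f4 that only uses the link V3 → V6. For f4, flow f3 lies in SS(Λ4), because f3 can be preempted by f2 on V2 → V3, a link that f4 does not use.

```python
from typing import Dict, Set, Tuple

Link = Tuple[str, str]  # hypothetical encoding: (source router, destination router)


def share(i: int, routes: Dict[int, Set[Link]]) -> Set[int]:
    """share_i: higher-priority flows (smaller index) whose route intersects Lambda_i."""
    return {j for j in routes if j < i and routes[j] & routes[i]}


def ss(i: int, routes: Dict[int, Set[Link]]) -> Set[int]:
    """SS(Lambda_i): flows f_l in share_i that have some f_n in share_l
    whose route is disjoint from Lambda_i (contention invisible to f_i)."""
    return {ell for ell in share(i, routes)
            if any(not (routes[n] & routes[i]) for n in share(ell, routes))}


A, B, C = ("V1", "V2"), ("V2", "V3"), ("V3", "V6")
routes = {1: {A}, 2: {A, B}, 3: {B, C}, 4: {C}}  # f4 is hypothetical

print(ss(4, routes), ss(3, routes), ss(2, routes))
```

This prints `{3} {2} set()`: f2 is in SS(Λ3) because f1 contends with f2 on V1 → V2, which f3 does not use, whereas f2 has no such indirect contender.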
For further analysis, it is required to bound the maximal
amount of time a flow fj may be Λi-self-suspending.
Theorem 2. Let each higher-priority flow fj of the flow
under analysis fi be feasibly schedulable using Simultaneous
Progressing Switching Protocols. Then, the cumulative amount
of time that fj is Λi-self-suspending is at most Rj−Cj , where
Rj denotes the worst-case response-time of flow fj .
Proof. We consider the following two cases:
1) Flow fj is never Λi-self-suspending. Then its Λi-self-suspension
time is 0 and thus trivially upper bounded by Rj − Cj.
2) There exists at least one point in time t ≥ 0 such that fj
is Λi-self-suspending.
Let Sj = {t ∈ [tj, tj + Rj) | fj is Λi-self-suspending at t}, where
tj denotes the release of a packet of fj. By Definition 5,
fj satisfies Prop. 1 to Prop. 4 at every t ∈ Sj. Furthermore,
since fj is not scheduled on any link in Λi (Prop. 4), Λi ∩ Λj ≠ ∅
(Prop. 3), and by the all-or-nothing property of Simultaneous
Progressing Switching Protocols, fj is not scheduled on any link
in Λj either, i.e., [SΛj(t) = j] = 0.
Since, by assumption, the schedulability of each higher-priority
flow has already been validated, fj is feasibly schedulable, i.e.,
$$\int_{t_j}^{t_j+R_j} \left(1 - [S_{\Lambda_j}(\tau) = j]\right) d\tau = R_j - C_j.$$
Consequently, for all t ∈ Sj, both 1 − [SΛi(t) = j] and
1 − [SΛj(t) = j] are equal to 1. Since Sj ⊆ [tj, tj + Rj), we have
$$\int_{\tau \in S_j} \left(1 - [S_{\Lambda_i}(\tau) = j]\right) d\tau \le \int_{\tau \in S_j} \left(1 - [S_{\Lambda_j}(\tau) = j]\right) d\tau \le R_j - C_j,$$
which concludes the proof.
Corollary 1. A sporadic constrained-deadline flow set F
is fixed-priority schedulable using Simultaneous Progressing
Switching Protocols if, for each flow fi, the transformed
higher-priority flow set
$$f'_j = \begin{cases} (C_j, T_j, D_j, S_j) & \text{if } f_j \in SS(\Lambda_i), \\ (C_j, T_j, D_j) & \text{otherwise,} \end{cases} \tag{1}$$
is schedulable, where Sj = Rj − Cj.
The worst-case response time and schedulability of each
flow fk has to be verified under the assumption that the
higher-priority flows f1, f2, . . . , fk−1 are already verified to
be schedulable. Based on Corollary 1, any schedulability
test that verifies the schedulability of sporadic constrained-
deadline self-suspending task sets on uniprocessor systems
with preemptive fixed-priority scheduling can be used, e.g.,
the state-of-the-art tests by Chen et al. [6].
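Building the transformed flow set of Eq. (1) is mechanical once the response times Rj of the higher-priority flows and the set SS(Λi) are known. The Python sketch below is illustrative; it represents a flow that does not suspend by a suspension bound of 0, which is an assumption of this sketch rather than notation from the paper, and the type names are hypothetical.

```python
from typing import Dict, NamedTuple, Set


class Flow(NamedTuple):
    C: int  # transmission time
    T: int  # minimum inter-arrival time
    D: int  # relative deadline, D <= T (constrained deadline)
    R: int  # worst-case response time, already verified


class SuspendingTask(NamedTuple):
    C: int
    T: int
    D: int
    S: int  # dynamic self-suspension bound; 0 means no suspension


def transform(hp: Dict[int, Flow], ss_i: Set[int]) -> Dict[int, SuspendingTask]:
    """Eq. (1): f_j in SS(Lambda_i) gets S_j = R_j - C_j; all others get S_j = 0."""
    return {j: SuspendingTask(f.C, f.T, f.D, f.R - f.C if j in ss_i else 0)
            for j, f in hp.items()}


# made-up higher-priority flows; f2 is assumed to be in SS(Lambda_i)
hp = {1: Flow(C=2, T=10, D=10, R=2), 2: Flow(C=1, T=5, D=5, R=4)}
tasks = transform(hp, ss_i={2})
```

The resulting tasks can then be handed to any uniprocessor test for dynamic self-suspending tasks under preemptive fixed-priority scheduling.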
VI. ANALYTICAL ADVANTAGES OF SP2
In this section, we briefly compare our fixed-priority SP2
and schedulability analysis with some of the state-of-the-art
fixed-priority schedulability analyses for wormhole-switching
NoCs with virtual channels proposed by Xiong et al. [43] and
Indrusiak et al. [16]. Unfortunately, we were not able to
comprehend the analyses presented by Nikolić et al. [30] and
are thus unable to compare with their methods analytically.
We note that we do not intend to directly compare the
state-of-the-art analyses, due to the different protocols and
models, but only to exemplify the analytical gains and benefits
of the SP2 compared to one-hop scheduling protocols.
Let sharek be the set of higher-priority flows whose routing
path intersects with the routing path of fk. Let
$$\mathit{share}^1_k = \mathit{share}_k \setminus \{ f_\ell \in \mathit{share}_k \mid \mathit{share}_\ell \setminus \mathit{share}_k = \emptyset \}.$$
This notation follows [30]. In plain words, share¹k consists of the
flows in sharek, excluding those flows fℓ whose intersecting
higher-priority flows, i.e., shareℓ, are all also in sharek. This
means that if fℓ is in share¹k, then fℓ is in sharek and there
exists at least one flow fn that is in shareℓ but not in sharek.
Since n < ℓ < k, fn ∉ sharek is equivalent to Λn ∩ Λk = ∅.
Therefore, this notation is exactly SS(Λk) defined in Theorem 1, i.e.,
$$SS(\Lambda_k) = \mathit{share}^1_k$$
According to Eq. (4) and Eq. (5) in [30], the worst-case
response time for preemptive wormhole switching according to
the analyses in [16], [43] is the minimum t > 0 that satisfies
the following equation:4
$$t = C_k + \sum_{f_j \in \mathit{share}_k} \left\lceil \frac{t + JI_{j \to i}}{T_j} \right\rceil \cdot (C_j + B_{j \to i}) \tag{2}$$
where Bj→i ≥ 0 is the interference due to buffering, i.e.,
backpressure, and
$$JI_{j \to i} = \begin{cases} R_j - C_j & \text{if } f_j \in \mathit{share}^1_k, \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$
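Eq. (2) is a fixed-point equation and can be solved by the usual response-time iteration [17]: start from t = Ck and re-evaluate the right-hand side until it no longer changes. The following Python sketch is illustrative; the flow parameters are made up, the buffering terms Bj→i are taken as given inputs, and returning None once t exceeds a limit (e.g., the deadline) is a choice of this sketch.

```python
from typing import NamedTuple, Optional, Sequence


class HPFlow(NamedTuple):
    C: int            # C_j
    T: int            # T_j
    R: int            # verified R_j
    B: int            # buffering interference B_{j->i} >= 0 (given)
    in_share1: bool   # whether f_j is in share^1_k


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)  # exact integer ceiling for b > 0


def jitter(f: HPFlow) -> int:
    return f.R - f.C if f.in_share1 else 0  # Eq. (3)


def rta_eq2(C_k: int, hp: Sequence[HPFlow], limit: int) -> Optional[int]:
    """Smallest t > 0 solving Eq. (2), or None if the iteration exceeds limit."""
    t = C_k
    while t <= limit:
        nxt = C_k + sum(ceil_div(t + jitter(f), f.T) * (f.C + f.B) for f in hp)
        if nxt == t:
            return t
        t = nxt
    return None


hp = [HPFlow(C=2, T=10, R=2, B=0, in_share1=False),
      HPFlow(C=1, T=5, R=4, B=1, in_share1=True)]
r = rta_eq2(3, hp, limit=20)                               # with backpressure
r0 = rta_eq2(3, [hp[0], hp[1]._replace(B=0)], limit=20)    # all B_{j->i} = 0
```

For these numbers, the iteration visits 3, 9, 11, 13, 15 and returns r = 15; without the buffering term it returns r0 = 7.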
Now, we consider the following response-time analysis from
Chen et al. [6]:5
$$t = C_k + S_k + \sum_{i=1}^{k-1} \left\lceil \frac{t + Q_i^{\vec{x}} + (1 - x_i)(R_i - C_i)}{T_i} \right\rceil C_i \tag{4}$$
where
$$Q_i^{\vec{x}} = \sum_{j=i}^{k-1} (S_j \times x_j)$$
for a certain binary assignment of xi ∈ {0, 1} for i = 1, 2, . . . , k − 1.
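Eq. (4) can be evaluated in the same way for any fixed binary vector x. The sketch below is illustrative Python and not a reference implementation of the framework of Chen et al. [6]: the higher-priority tasks are listed in priority order, the numbers are a made-up two-flow example, and the vector x used in the usage line (xj = 0 exactly for the suspending flow) mirrors the assignment discussed in the text.

```python
from typing import NamedTuple, Optional, Sequence


class SuspTask(NamedTuple):
    C: int  # C_i
    T: int  # T_i
    R: int  # verified R_i
    S: int  # suspension bound S_i (0 if the task does not suspend)


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)  # exact integer ceiling for b > 0


def rta_eq4(C_k: int, S_k: int, hp: Sequence[SuspTask], x: Sequence[int],
            limit: int) -> Optional[int]:
    """Smallest t > 0 solving Eq. (4) for the given x, or None beyond limit.
    hp[0] is f_1 (highest priority), hp[-1] is f_{k-1}."""
    n = len(hp)
    # Q_i^x = sum_{j=i}^{k-1} S_j * x_j
    Q = [sum(hp[j].S * x[j] for j in range(i, n)) for i in range(n)]
    t = C_k + S_k
    while t <= limit:
        nxt = C_k + S_k + sum(
            ceil_div(t + Q[i] + (1 - x[i]) * (f.R - f.C), f.T) * f.C
            for i, f in enumerate(hp))
        if nxt == t:
            return t
        t = nxt
    return None


hp = [SuspTask(C=2, T=10, R=2, S=0), SuspTask(C=1, T=5, R=4, S=3)]
r = rta_eq4(3, 0, hp, x=[1, 0], limit=20)
```

With x = [1, 0], every product Sj × xj is 0, and the result r = 7 is what Eq. (2) yields for the same flows when all Bj→i are 0.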
According to Corollary 1 and SS(Λk) = share¹k, we have
Sj = Rj − Cj if fj ∈ share¹k and Sj = 0 if fj ∉ share¹k.
Moreover, since fk is not in SS(Λk), we have Sk = 0. If fj is
in share¹k, we set xj to 0; otherwise, we set xj to 1. With this
setting of x, every product Sj × xj vanishes, so the value Qi
in Eq. (4) is always 0. Together with Sk = 0 and the definition
in Eq. (3), the analysis in Eq. (4) becomes:
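$$
\begin{aligned}
t &= C_k + \sum_{f_j \in SS(\Lambda_k)} \left\lceil \frac{t + R_j - C_j}{T_j} \right\rceil C_j + \sum_{f_j \in \mathit{share}_k \setminus SS(\Lambda_k)} \left\lceil \frac{t}{T_j} \right\rceil C_j \\
&= C_k + \sum_{f_j \in \mathit{share}^1_k} \left\lceil \frac{t + R_j - C_j}{T_j} \right\rceil C_j + \sum_{f_j \in \mathit{share}_k \setminus \mathit{share}^1_k} \left\lceil \frac{t}{T_j} \right\rceil C_j \\
&= C_k + \sum_{f_j \in \mathit{share}_k} \left\lceil \frac{t + JI_{j \to i}}{T_j} \right\rceil C_j \\
&\le C_k + \sum_{f_j \in \mathit{share}_k} \left\lceil \frac{t + JI_{j \to i}}{T_j} \right\rceil (C_j + B_{j \to i}) = \text{RHS of Eq. (2)}
\end{aligned}
$$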
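Since, under this assignment, the right-hand side of Eq. (4) is no
larger than that of Eq. (2) for every t > 0 (and, by induction over
the priority levels, the Rj values used in Eq. (4) are no larger than
those used in Eq. (2), while both right-hand sides are non-decreasing
in Rj), the smallest t > 0 that satisfies Eq. (4) is no larger than
the one that satisfies Eq. (2). Therefore, the worst-case response
time analysis from [16], [43] is dominated by our analysis from
Corollary 1 when the suspension-aware response time analysis
from Chen et al. [6] is applied.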
VII. IMPLEMENTATION CONSIDERATIONS
The architectural implementation of the priority-based
Simultaneous Progressing Switching Protocols (SP2) providing
the all-or-nothing property requires rethinking previous router
designs. In the state-of-the-art wormhole switching protocols,
the decision at each router is local, i.e., each router simply
chooses the highest-priority flow on any outgoing link. In
contrast, the all-or-nothing property requires global decision
making.
4 The term JRj in Eq. (4) in [30] is removed here since we consider sporadic
flows without release jitter.
5 For notational consistency with [6], we here use the notation from [6] for
self-suspending task systems and assume that there are k − 1 higher-priority
flows in sharek.
The SP2 is a general concept, and there could be different
possible realizations. One possibility is to use centralized arbi-
tration which decides and dispatches the priority information
to the routers. However, this may incur high hardware cost.
Possible implementations are subject of future research efforts
and beyond the conceptual scope of this paper.
VIII. CONCLUSION
In this paper, we discuss the fundamental difficulty of worst-
case timing analysis of flit-based pipelined transmissions over
multiple links in parallel in network-on-chips. The space of
possible progression states that need to be covered by an
analysis hints at the mismatch with uniprocessor scheduling
theory and its assumptions, thus making analyses complex
and prone to being optimistic. To that end, we propose
Simultaneous Progressing Switching Protocols (SP2), in which
the links used by a message either all simultaneously transmit
one flit of this message or none of them transmits any flit
of this message. For this family of protocols, we formally
prove that they match the assumptions of uniprocessor scheduling
theory. Furthermore, we show the relation between the
uniprocessor self-suspension scheduling problem and the SP2
scheduling and provide formal proofs to confirm this relation.
In addition, we provide a sufficient schedulability analysis for
fixed-priority SP2 scheduling.
We note that the existing link-based switching and the
proposed SP2 are in fact two extreme scenarios with respect to
the series of progressions. The SP2 approach always results in
the fastest series of progressions, but it sacrifices the
possibility to send part of a message whenever even a single
link is blocked by another higher-priority message/flow. The
link-based switching mechanism (worm-hole protocol) allows
the flexibility to send only one flit of a flow forward at a
time unit in the NoC, but it has to potentially consider the
slowest series of progressions of the flow in the worst case.
It may be possible to design timing predictable systems with
good average-case performance by pruning unnecessary pro-
gressions that have to be considered in the protocol. However,
this alternative was not considered in this paper.
We strongly believe that a timing-predictable switching
protocol in NoCs should be carefully designed so that NoC-
based many-core systems can yield predictable performance.
This paper provides protocols that can be implemented with
different strategies. We believe that our proposals can be a
first step towards predictable switching protocols of NoCs.
In our future work, we will explore possible design options
and their tradeoffs in the schedulability analyses and design
complexity/cost.
REFERENCES
[1] L. Abdallah, M. Jan, J. Ermont, and C. Fraboul. Wormhole networks
properties and their use for optimizing worst case delay analysis of
many-cores. In Proc. of the 10th IEEE International Symposium on
Industrial Embedded Systems (SIES 2015), pages 1–10, June 2015.
[2] P. Axer, R. Ernst, H. Falk, A. Girault, D. Grund, N. Guan, B. Jonsson,
P. Marwedel, J. Reineke, C. Rochange, M. Sebastian, R. V. Hanxleden,
R. Wilhelm, and W. Yi. Building timing predictable embedded systems.
ACM Trans. Embed. Comput. Syst., 13(4):82:1–82:37, Mar. 2014.
[3] H. Ayed, J. Ermont, J.-l. Scharbarg, and C. Fraboul. Towards a
unified approach for worst-case analysis of tilera-like and kalray-like
noc architectures. In World Conf. on Factory Communication Systems
(WFCS), WiP Session, Aveiro, Portugal, May 2016. IEEE.
[4] J. Błażewicz, P. Dell’Olmo, M. Drozdowski, and M. Speranza. Corrigendum
to: Scheduling multiprocessor tasks on three dedicated processors.
Inf. Process. Lett., 49(5):269–270, 1994.
[5] M. Boyer, B. Dupont de Dinechin, A. Graillat, and L. Havet. Computing
routes and delay bounds for the network-on-chip of the kalray mppa2
processor. In Proc. of the 9th European Congress on Embedded Real
Time Software and Systems (ERTS2 2018), 2018.
[6] J.-J. Chen, G. Nelissen, and W.-H. Huang. A unifying response time
analysis framework for dynamic self-suspending tasks. In Euromicro
Conference on Real-Time Systems (ECRTS), pages 61–71, 2016.
[7] J.-J. Chen, G. Nelissen, W.-H. Huang, M. Yang, B. Brandenburg,
K. Bletsas, C. Liu, P. Richard, F. Ridouard, N. Audsley, R. Rajkumar,
D. de Niz, and G. von der Brüggen. Many suspensions, many problems:
a review of self-suspending tasks in real-time systems. Real-Time
Systems, Sep 2018.
[8] Z. Dong and C. Liu. Analysis techniques for supporting hard real-time
sporadic gang task systems. In IEEE Real-Time Systems Symposium,
RTSS, pages 128–138, 2017.
[9] F. Giroudot and A. Mifdaoui. Buffer-aware worst-case timing analysis
of wormhole nocs using network calculus. In Real-Time and Embedded
Technology and Applications Symposium, (RTAS), pages 37–48, 2018.
[10] J. Goossens and P. Richard. Optimal scheduling of periodic gang tasks.
LITES, 3(1):04:1–04:18, 2016.
[11] K. Goossens, J. Dielissen, and A. Radulescu. Aethereal network on
chip: concepts, architectures, and implementations. IEEE Design Test
of Computers, 22(5):414–421, Sep. 2005.
[12] T. Harde, M. Freier, G. von der Brüggen, and J.-J. Chen. Configurations
and optimizations of TDMA schedules for periodic packet communica-
tion on networks on chip. In International Conference on Real-Time
Networks and Systems, RTNS, pages 202–212, 2018.
[13] S. L. Hary and F. Ozguner. Feasibility test for real-time communication
using wormhole routing. IEE Proceedings - Computers and Digital
Techniques, 144(5):273–278, 1997.
[14] J. Hoogeveen, S. van de Velde, and B. Veltman. Complexity of
scheduling multiprocessor tasks with prespecified processor allocations.
Discrete Appl. Math., 55(3):259–272, 1994.
[15] L. S. Indrusiak, A. Burns, and B. Nikolic. Analysis of buffering
effects on hard real-time priority-preemptive wormhole networks. CoRR,
abs/1606.02942, 2016.
[16] L. S. Indrusiak, A. Burns, and B. Nikolic. Buffer-aware bounds to
multi-point progressive blocking in priority-preemptive nocs. In Design,
Automation & Test in Europe Conference & Exhibition, DATE, pages
219–224, 2018.
[17] M. Joseph and P. Pandya. Finding Response Times in a Real-Time
System. The Computer Journal, 29(5):390–395, May 1986.
[18] E. Kasapaki, M. Schoeberl, R. B. Sorensen, C. T. Muller, K. Goossens,
and J. Sparsø. Argo: A real-time network-on-chip architecture with an
efficient GALS implementation. IEEE Trans. VLSI Syst., 24(2):479–492,
2016.
[19] H. Kashif, S. Gholamian, and H. Patel. Sla: A stage-level latency
analysis for real-time communication in a pipelined resource model.
IEEE Transactions on Computers, PP, April 2014.
[20] H. Kashif and H. Patel. Buffer space allocation for real-time priority-
aware networks. In 2016 IEEE Real-Time and Embedded Technology
and Applications Symposium (RTAS), pages 1–12, April 2016.
[21] S. Kato and Y. Ishikawa. Gang EDF scheduling of parallel task systems.
In IEEE Real-Time Systems Symposium, RTSS, pages 459–468, 2009.
[22] N. Kavaldjiev and G. Smit. A survey of efficient on-chip communi-
cations for soc. In 4th PROGRESS Symposium on Embedded Systems,
pages 129–140. STW Technology Foundation, Oct. 2003.
[23] B. Kim, J. Kim, S. Hong, and S. Lee. A real-time communication
method for wormhole switching networks. In International Conference
on Parallel Processing, pages 527–534, 1998.
[24] M. Kubale. The complexity of scheduling independent two-processor
tasks on dedicated processors. Inf. Process. Lett., 24(3):141–147, 1987.
[25] J. P. Lehoczky, L. Sha, and Y. Ding. The rate monotonic scheduling
algorithm: Exact characterization and average case behavior. In IEEE
Real-Time Systems Symposium’89, pages 166–171, 1989.
[26] C.-L. Liu and J. W. Layland. Scheduling algorithms for multiprogram-
ming in a hard-real-time environment. Journal of the ACM, 20(1):46–61,
1973.
[27] Z. Lu, A. Jantsch, and I. Sander. Feasibility analysis of messages for
on-chip networks using wormhole routing. In Asia and South Pacific
Design Automation Conference (ASP-DAC), volume 2, pages 960–964,
2005.
[28] M. Millberg, E. Nilsson, R. Thid, and A. Jantsch. Guaranteed bandwidth
using looped containers in temporally disjoint networks within the
nostrum network on chip. In Design, Automation and Test in Europe
Conference (DATE), pages 890–895, 2004.
[29] M. W. Mutka. Using rate monotonic scheduling technology for real-
time communications in a wormhole network. In Second Workshop on
Parallel and Distributed Real-Time Systems, pages 194–199, 1994.
[30] B. Nikolić, S. Tobuschat, L. S. Indrusiak, R. Ernst, and A. Burns. Real-time
analysis of priority-preemptive nocs with arbitrary buffer sizes and
router delays. Real-Time Systems, 55(1):63–105, Jan 2019.
[31] C. Paukovits and H. Kopetz. Concepts of switching in the time-
triggered network-on-chip. 2008 14th IEEE International Conference on
Embedded and Real-Time Computing Systems and Applications, pages
120–129, 2008.
[32] Y. Qian, Z. Lu, and W. Dou. Analysis of worst-case delay bounds
for best-effort communication in wormhole networks on chip. In
International Symposium on Networks-on-Chip, pages 44–53, 2009.
[33] E. A. Rambo and R. Ernst. Worst-case communication time analysis of
networks-on-chip with shared virtual channels. In Design, Automation
& Test in Europe Conference, (DATE), pages 537–542, 2015.
[34] P. Richard, J. Goossens, and S. Kato. Comments on “gang EDF
schedulability analysis”. CoRR, http://arxiv.org/abs/1705.05798, 2017.
[35] J. B. Schmitt, F. A. Zdarsky, and I. Martinovic. Improving performance
bounds in feed-forward networks by paying multiplexing only once. In
14th GI/ITG Conference - Measurement, Modelling and Evaluation of
Computer and Communication Systems, pages 1–15, 2008.
[36] M. Schoeberl. A time-triggered network-on-chip. In FPL 2007, In-
ternational Conference on Field Programmable Logic and Applications,
Amsterdam, The Netherlands, 27-29 August 2007, pages 377–382, 2007.
[37] M. Schoeberl, S. Abbaspour, B. Akesson, N. C. Audsley, R. Capasso,
J. Garside, K. Goossens, S. Goossens, S. Hansen, R. Heckmann,
S. Hepp, B. Huber, A. Jordan, E. Kasapaki, J. Knoop, Y. Li, D. Prokesch,
W. Puffitsch, P. P. Puschner, A. Rocha, C. Silva, J. Sparsø, and A. Toc-
chi. T-CREST: time-predictable multi-core architecture for embedded
systems. Journal of Systems Architecture - Embedded Systems Design,
61(9):449–471, 2015.
[38] Z. Shi and A. Burns. Real-time communication analysis for on-chip
networks with wormhole switching. In Proceedings of the Second
ACM/IEEE International Symposium on Networks-on-Chip (NOCS),
pages 161–170, 2008.
[39] R. Stefan, A. Molnos, A. Ambrose, and K. Goossens. A tdm noc
supporting qos, multicast, and fast connection set-up. In Proceedings of
the Conference on Design, Automation and Test in Europe, DATE ’12,
pages 1283–1288, 2012.
[40] S. Tobuschat and R. Ernst. Real-time communication analysis for
networks-on-chip with backpressure. In Design, Automation & Test in
Europe Conference & Exhibition, (DATE), pages 590–595, 2017.
[41] R. Wilhelm, D. Grund, J. Reineke, M. Schlickling, M. Pister, and
C. Ferdinand. Memory hierarchies, pipelines, and buses for future
architectures in time-critical embedded systems. IEEE Trans. on CAD
of Integrated Circuits and Systems, 28(7):966–978, 2009.
[42] Q. Xiong, Z. Lu, F. Wu, and C. Xie. Real-time analysis for wormhole
noc: Revisited and revised. In 2016 International Great Lakes Sympo-
sium on VLSI (GLSVLSI), pages 75–80, May 2016.
[43] Q. Xiong, F. Wu, Z. Lu, and C. Xie. Extending real-time analysis for
wormhole nocs. IEEE Transactions on Computers, 66(9):1532–1546,
Sept 2017.
