nDimNoC: Real-Time D-dimensional NoC
Yilian Ribot González
CISTER Research Centre, ISEP, Polytechnic Institute of Porto, Portugal
Geoffrey Nelissen
Eindhoven University of Technology, The Netherlands
Eduardo Tovar
CISTER Research Centre, ISEP, Polytechnic Institute of Porto, Portugal
Abstract
The growing demand for powerful embedded systems that perform advanced functionalities has led to a
large increase in the number of computation nodes integrated in systems-on-chip (SoCs). In this
context, network-on-chips (NoCs) emerged as a new standard communication infrastructure for
multi-processor SoCs (MPSoCs). In this work, we present nDimNoC, a new D-dimensional NoC
that provides real-time guarantees for systems implemented upon MPSoCs. Specifically, (1) we
propose a new router architecture and a new deflection-based routing policy that use the properties
of circulant topologies to ensure bounded worst-case communication delays, and (2) we develop a
generic worst-case communication time (WCCT) analysis for packets transmitted over nDimNoC. In
our experiments, we show that the WCCT of packets decreases when we increase the dimensionality
of the NoC using nDimNoC’s topology and routing policy. By implementing nDimNoC in Verilog and
synthesizing it for an FPGA platform, we show that a 3D-nDimNoC requires ≈5 times less silicon
than routers that use virtual channels (VCs). We computed the maximum operating frequency of a
3D-nDimNoC with Xilinx Vivado. Increasing the number of dimensions in the NoC improves the WCCT
at the cost of more complex routing logic, which may result in a reduced operating clock frequency.
2012 ACM Subject Classification Computer systems organization → Real-time systems; Networks
→ Network on chip
Keywords and phrases Real-Time Embedded Systems, Systems-on-Chips, Network-on-Chips, Worst-
Case Communication Time
Digital Object Identifier 10.4230/LIPIcs.ECRTS.2021.5
Funding This work was partially supported by National Funds through FCT/MCTES (Por-
tuguese Foundation for Science and Technology), within the CISTER Research Unit (UIDP/UIDB
04234/2020); also by FCT and the ESF (European Social Fund) through the Regional Operational
Programme (ROP) Norte 2020, under PhD grant 2020.06898.BD.
1 Introduction
These days, SoCs include more and more heterogeneous processing elements that execute
dedicated functions in parallel. Traditional shared communication buses, which used to
connect all the computation nodes together, are a major performance bottleneck of modern
SoCs. Therefore, NoCs emerged as a new standard communication infrastructure for SoC as
they present a scalable and versatile solution for systems with a high level of parallelism [2, 15].
The literature on NoCs is extensive. However, real-time systems add new constraints on
the NoC infrastructures. In addition to ensuring that messages arrive at their destination in
a correct fashion, real-time NoCs must guarantee that packet transmissions respect strong
timing constraints [16]. Over the years, there have been several attempts to design real-
time NoCs by considering different approaches. A large body of solutions consider a mesh
topology and rely on wormhole switching with VCs. That strategy leads to powerful NoC
infrastructures with bounded WCCT but they rely extensively on buffers and virtual channels
to provide timing guarantees. This makes them expensive to implement in terms of silicon
footprint, and increases their power consumption.
© Yilian Ribot González, Geoffrey Nelissen, and Eduardo Tovar;
licensed under Creative Commons License CC-BY 4.0
33rd Euromicro Conference on Real-Time Systems (ECRTS 2021).
Editor: Björn B. Brandenburg; Article No. 5; pp. 5:1–5:22
Leibniz International Proceedings in Informatics
Schloss Dagstuhl – Leibniz-Zentrum für Informatik, Dagstuhl Publishing, Germany
In recent years, buffer-less NoCs have gained popularity as an alternative to VC-based
NoCs. Buffer-less NoCs are compact; their implementation cost and power consumption are
lower than traditional approaches. Therefore, they are more suitable to (embedded) systems
with area and/or power consumption constraints. In [39] and [31] two novel buffer-less
deflection-based real-time NoCs called HopliteRT and HopliteRT* were proposed. They
ensure predictable timing behavior, accommodate dynamic workloads and have an extremely
low hardware resource footprint. Notably, HopliteRT* uses the characteristics of a
circulant topology to ensure bounded worst-case communication delays.
NoCs are an attractive and promising alternative to traditional shared buses. Yet,
most of the existing literature for real-time systems focuses on 2-Dimensional NoCs (2D-
NoCs), i.e., where routers are connected according to a mesh or torus topology for example.
However, in a non-real-time setting, Romanov [32] shows that circulant topologies possess
better characteristics over traditional mesh and torus topologies. Circulant topologies are a
type of n-dimensional topologies for networks. Thus, in this work, we explore the design of
n-dimensional NoC architectures compatible with the requirements of real-time systems.
This line of research is also motivated by the recent evolution in the integrated circuit
(IC) industry. Indeed, three-dimensional integrated circuits (3D-ICs) seem to be the future
of ICs [19, 5, 20, 33, 29]. 3D-ICs achieve higher performance, while reducing average
interconnection length; provide higher packing density thanks to the added third dimension;
reduce power consumption; and enhance computation bandwidth. Hence, there is currently
a drive towards creating new powerful NoC solutions that meet the requirements of future
large-scale MPSoCs by combining the advantages of 3D integration and NoC architecture.
Contribution. We propose nDimNoC, a new D-dimensional NoC that provides real-time
guarantees for systems implemented upon MPSoCs, reduces average network communication
latency and provides greater flexibility compared to more traditional 2D NoCs. The main
contributions of our work are: (1) to design a new buffer-less router architecture that allows
synthesizing D-dimensional NoCs; (2) to propose a new deflection-based routing policy that
uses the characteristics of D-dimensional circulant topologies to ensure bounded worst-case
communication delays; (3) to develop a generic WCCT analysis for packets transmitted over
nDimNoC; (4) to implement a 3D version of nDimNoC in Verilog (a hardware description
language) that can be instantiated on a real FPGA platform; and (5) to assess our new design
against related works in terms of computed WCCT bounds and hardware requirements.
2 Related work
Most 2D-NoC solutions rely on wormhole switching with virtual channels (VCs) (e.g.,
CONNECT [27], IDAMC [37]). In [34], Shi et al. propose an analysis of the worst-case
network latency for a new real-time fixed priority preemptive wormhole NoC in which each
priority level is assigned its own VC. Several variations of that approach were proposed over
the years [36, 7, 37, 6, 30], for instance, handling the case where several flows share the
same priority [21], changing the routing policy to EDF [25] or supporting communication
flows with different criticality levels [3, 18]. The complexity of those designs and their
routing policies led to complex WCCT analyses inspired by both the classic real-time system
theory [41, 42, 17, 26] and Network Calculus [10, 11].
In [24], a new type of NoC called SBT-NoC was proposed. In this work, Nikolic et al.
introduced a global arbitration protocol inspired by the CAN protocol. Theoretical results
are promising but this NoC solution has not been implemented in a real platform yet.
Recently, Wasly et al. in [39] proposed a new buffer-less NoC for real-time systems. Their
NoC is called HopliteRT. The design of HopliteRT ensures that the WCCT of packets is
upper-bounded. HopliteRT* is an evolution of HopliteRT proposed in [31]. It introduces a
notion of quality of service in the routing policy and uses a circulant topology in order to
improve the packets’ WCCT in comparison to HopliteRT.
In [32], various routing strategies, i.e., table routing, Clockwise routing and Adaptive
routing, were studied for two-dimensional ring circulant networks. The author shows that
several characteristics of NoCs are improved in comparison to mesh and torus topologies
when circulant graphs are used as a topological basis.
From a 3D-NoC perspective, Park et al. [28] proposed a Multi-layered on-chip Interconnect
Router Architecture (MIRA). Their approach assumes 3D processor designs (i.e., processor
cores partitioned into multiple layers), and is therefore inadequate for existing highly optimized
2D processor designs. In [9], Ghidini et al. presented a 3D-NoC mesh architecture called Lasio
relying on wormhole switching with FIFO queues. In order to minimize packet communication
latency and NoC area, Tiny 3D mesh NoC was later proposed in [22]. Tiny NoC reduces the
number of routers and links in the network by connecting multiple programming elements to
the same router. This solution reduces the total NoC area compared to the Lasio NoC;
however, the average packet latency improves only when there are few flows and/or under a low
packet injection rate. In [4], a 3D NoC architecture based on De-Bruijn graph was proposed.
Tree-based interconnect architectures have also been considered in some works [13, 14, 12].
However, they are very complex to implement due to their irregular and complex network
topologies. In [8], a NoC/Bus-based hybrid 3D architecture was proposed, but the approach
suffers from low throughput due to inefficient hybridization between the NoC and bus media.
To the best of our knowledge, none of the 3D-NoC solutions developed so far targets
real-time systems. Therefore, they do not provide guaranteed upper bounds on the packets'
WCCT, and do not come with a WCCT analysis. In this work, we develop a new real-time
D-dimensional NoC (with D ≥ 2) and its associated timing analysis.
3 System model
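In this paper, we assume a system composed of N programming elements {π1, ..., πN}. Each
programming element πq is connected to a different router Rq of a D-dimensional NoC. The coordinates
of router Rq in the network are denoted by $(r^q_1, r^q_2, \ldots, r^q_D)$. Each programming
element πq injects a set of nq communication flows $F^q = \{f^q_1, f^q_2, \ldots, f^q_{n_q}\}$ into the network.
Each flow fj is characterized by the tuple $\{(s^j_1, \ldots, s^j_D), (d^j_1, \ldots, d^j_D), C_j, T_j\}$.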
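A communication flow fj generates a potentially infinite number of packets that are injected
by the source programming element with coordinates $(s^j_1, \ldots, s^j_D)$ and sent to the destination
programming element with coordinates $(d^j_1, \ldots, d^j_D)$. fj respects a minimum inter-arrival time Tj between the generation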
of every two packets. Each packet sent by flow fj is divided in Cj flits that are sequentially
injected in the network. Each flit has a size Sflit (in bits). We assume that all the routing
information is encoded in each flit of the packet, i.e., there is no distinction between header,
body or tail flits. The routing information is the coordinates of the destination programming
element of the associated flow.
In the rest of this paper, we use the notations Rorig(fj) and Rdest(fj) to refer to the
origin and destination routers of flow fj, respectively.
Figure 1 nDimNoC's topology and router architecture: (a) circulant topology C(16; 1, 2, 4);
(b) equivalent 4x2x2 grid-based 3D network; (c) nDimNoC router architecture.
4 nDimNoC architecture
In this section, we present nDimNoC. More specifically, we describe: (1) the network topology,
(2) the router architecture, and (3) the routing policy. We later provide the timing analysis
for nDimNoC in Section 5.
4.1 NoC topology
Consider a network composed of N routers R0 to RN−1. In nDimNoC, the routers are
connected according to a ring circulant topology C(N; g1, g2, ..., gD) where g1 = 1, N
is the total number of routers, D is the dimensionality of the network, and g1, g2, ..., gD
are the generatrices of the network. We assume that the generatrices satisfy
1 = g1 < g2 < ... < gD < N, and that their values are harmonic (i.e., for any pair
of generatrices gi and gj such that i < j, gi is a divisor of gj). Under the circulant topology
C(N; 1, g2, ..., gD), all routers have D inputs I1, I2, ..., ID and D outputs O1, O2, ..., OD for
inter-router communication. All routers are connected by a single unidirectional ring using
one of their inputs and one of their outputs (see blue line in Figure 1a). Each router
is also connected to the routers that are g2, g3, ..., gD hops away on the ring (see red, green
and black lines in Figure 1a). Formally, for each router Rq (with 0 ≤ q < N), its uth
output port Ou (1 ≤ u ≤ D) is connected to the uth input port Iu of the router R(q+gu) mod N.
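The connection rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `check_generatrices` and `circulant_links` are hypothetical (they do not come from the paper), routers are identified by their index q, and the u-th output port is taken to reach router (q + gu) mod N exactly as in the formal statement.

```python
def check_generatrices(n, gens):
    """Return True if 1 = g_1 < g_2 < ... < g_D < N and the values are harmonic."""
    increasing = (
        gens[0] == 1
        and all(a < b for a, b in zip(gens, gens[1:]))
        and gens[-1] < n
    )
    # For increasing values, divisibility between consecutive generatrices
    # implies divisibility between any pair g_i, g_j with i < j.
    harmonic = all(b % a == 0 for a, b in zip(gens, gens[1:]))
    return increasing and harmonic


def circulant_links(n, gens):
    """Map each router index q to the routers reached through O_1, ..., O_D."""
    return {q: [(q + g) % n for g in gens] for q in range(n)}


links = circulant_links(16, [1, 2, 4])
print(links[0])   # routers reached from R0
print(links[15])  # routers reached from R15 (wraps around the ring)
```

For C(16; 1, 2, 4), router R0 reaches R1, R2 and R4, and router R15 wraps around to R0, R1 and R3.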
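A circulant network C(N; 1, g2, ..., gD) may also be represented as an S1xS2x...xSD grid-based
D-dimensional network, where S1, S2, ..., SD are the sizes of the network on dimensions
$\vec{D}_1, \vec{D}_2, \ldots, \vec{D}_D$, respectively. The size of the network on each dimension can
be computed as follows: $S_1 = N / g_D$, $S_2 = g_D / g_{D-1}$, ..., $S_D = g_2 / g_1$.
The vector of coordinates $(r^q_1, r^q_2, \ldots, r^q_D)$ of a router Rq defines the position of the router Rq in the grid representation.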
As an example, Fig. 1a shows the circulant network C(16; 1, 2, 4). In Fig. 1b, we provide
the equivalent representation as a 4x2x2 grid-based 3-Dimensional network of the circulant
network shown in Fig. 1a. The red, green, and blue links in Fig. 1a correspond to the red,
green, and blue links in Fig. 1b, respectively.
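As a quick check of the grid representation, the following Python sketch computes the grid sizes from the generatrices. It assumes that the sizes follow S1 = N/gD, S2 = gD/gD−1, and so on down to SD = g2/g1; the function name `grid_sizes` is hypothetical and only used for illustration.

```python
def grid_sizes(n, gens):
    """Return [S_1, ..., S_D] for the circulant network C(n; g_1, ..., g_D)."""
    bounds = list(gens) + [n]  # g_1, ..., g_D, N
    # S_1 uses the largest generatrix, so walk the list from the end.
    return [bounds[i + 1] // bounds[i] for i in reversed(range(len(gens)))]


sizes = grid_sizes(16, [1, 2, 4])
print(sizes)  # grid sizes of C(16; 1, 2, 4)
```

For C(16; 1, 2, 4) this yields the 4x2x2 grid of Fig. 1b, and the product of the sizes equals the number of routers.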
In the rest of this paper, we often reason about the position posq of a router Rq on the
main unidirectional ring of the circulant topology. That position can be inferred from the
coordinates of Rq as follows:
$pos_q = \sum_{k=1}^{D} r^q_k \times g_{D-k+1}$ (1)
To simplify some of our further discussions, we define the helper function dist(Rq, Rm)
as the distance between routers Rq and Rm on the main ring, i.e.,
dist(Rq, Rm) = (posm − posq + N) mod N (2)
Note that the following properties hold for circulant topologies.
▶ Property 1. Let Rl be a router at which flit p is located. After one hop on dimension $\vec{D}_u$
of the network, flit p reaches a router Rm located $g_{D-u+1}$ steps further on the main ring of
the network, i.e., dist(Rl, Rm) = $g_{D-u+1}$.
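The position, distance and hop-length definitions can be exercised with a short Python sketch. It assumes that the position of a router is the weighted sum of its coordinates, with coordinate r_k weighted by g_{D−k+1}, and that one hop on dimension u advances g_{D−u+1} steps on the main ring (Property 1). The helper names are hypothetical.

```python
def position(coords, gens):
    """pos_q = sum over k of r_k * g_{D-k+1}, with coords = (r_1, ..., r_D)."""
    return sum(r * g for r, g in zip(coords, reversed(gens)))


def dist(pos_q, pos_m, n):
    """Distance from R_q to R_m along the unidirectional main ring (Equation (2))."""
    return (pos_m - pos_q + n) % n


def hops_on_dimension(src, dst, u, gens, n):
    """Hops from src to dst using only dimension u, or None if dst is not
    a whole number of such hops away."""
    step = gens[len(gens) - u]  # g_{D-u+1}, by Property 1
    d = dist(position(src, gens), position(dst, gens), n)
    return d // step if d % step == 0 else None


gens, n = [1, 2, 4], 16
src, dst = (0, 0, 0), (2, 0, 0)
hops = {u: hops_on_dimension(src, dst, u, gens, n) for u in (1, 2, 3)}
print(position(dst, gens), hops)
```

In C(16; 1, 2, 4), router (2; 0; 0) sits 8 steps after router (0; 0; 0) on the main ring, which takes 2 hops on the first dimension, 4 hops on the second and 8 hops on the third.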
Finally, we define ringu(Rq) as the set of routers that are on the same ring of dimension $\vec{D}_u$
as Rq. That is,
$ring_u(R_q) = \{R_l \mid \forall b \in [u + 1, D],\ r^l_b = r^q_b\}$ (3)
As an example, let R0 be the router at coordinates (0; 0; 0) in Figure 1a, then all the
routers connected by the green links are in ring1(R0), and all the routers connected by red
links are in ring2(R0).
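The ring membership test can be sketched as follows, assuming routers are identified by their coordinate tuples (r_1, ..., r_D) and that two routers share a ring of dimension u when their coordinates u+1 to D match. The function name `ring` and the enumeration of the grid with `itertools.product` are illustrative choices, not part of the paper.

```python
from itertools import product


def ring(u, ref, sizes):
    """Coordinates of all routers whose components u+1..D equal those of ref."""
    ref = tuple(ref)
    all_routers = product(*(range(s) for s in sizes))
    # Index u in a 0-based tuple is coordinate u+1 in the paper's notation.
    return [c for c in all_routers if c[u:] == ref[u:]]


sizes = (4, 2, 2)
ring1 = ring(1, (0, 0, 0), sizes)
ring2 = ring(2, (0, 0, 0), sizes)
print(ring1)
print(len(ring2))
```

In the 4x2x2 example, ring1 of router (0; 0; 0) contains the four routers (x; 0; 0), and ring2 contains the eight routers whose third coordinate is 0.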
4.2 Router architecture
In order to reduce implementation cost in terms of hardware resource utilization and network
analysis complexity, nDimNoC does not use VCs and does not rely on extensive buffering.
As discussed in the previous section, nDimNoC routers have D input ports (i.e., I1, I2, ...,
ID) and D output ports (i.e., O1, O2, ..., OD) connected to neighboring routers to allow for
inter-router communication (see Fig. 1c). In addition, all routers also have D input ports
(i.e., $I^{PE}_1, I^{PE}_2, \ldots, I^{PE}_D$) that may be used by the programming element to inject packets
into the network. Therefore, in total, each router has 2 × D input ports and D output ports.
The programming element injects packets on dimensions $\vec{D}_1, \vec{D}_2, \ldots, \vec{D}_D$ of the
network by using the input ports $I^{PE}_1, I^{PE}_2, \ldots, I^{PE}_D$, respectively. Therefore, several packets
may be injected on different dimensions simultaneously. Thus, the waiting time suffered by
the packets inside the programming elements decreases. Indeed, in solutions that support a
single input port to inject packets into the network, all packets compete for the same input
port. In nDimNoC, however, a packet that is waiting to be injected into the network only
conflicts with the subset of packets that must be injected through the same input port $I^{PE}_u$.











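▶ Property 2. A packet of a flow fj with source coordinates $(s^j_1, \ldots, s^j_D)$ and destination
coordinates $(d^j_1, \ldots, d^j_D)$ is injected in the network using port $I^{PE}_u$ if
and only if $s^j_u \neq d^j_u$ and $\forall x \mid u < x \leq D,\ s^j_x = d^j_x$.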
From Property 2, we get that all the packets of a given flow will be injected using the
same input port.
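The injection rule amounts to picking the highest dimension on which the source and destination coordinates differ. A minimal Python sketch, in which the function name `injection_port` and the use of `None` for a flow whose source equals its destination are assumptions made for illustration:

```python
def injection_port(src, dst):
    """Return u such that s_u != d_u and s_x == d_x for all x > u,
    or None when source and destination coincide."""
    for u in range(len(src), 0, -1):  # scan from dimension D down to 1
        if src[u - 1] != dst[u - 1]:
            return u
    return None


print(injection_port((0, 0, 1), (3, 1, 0)))
print(injection_port((0, 0, 0), (2, 0, 0)))
```

Because the result depends only on the source and destination coordinates of the flow, every packet of the same flow gets the same injection port.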
The ports O1, O2, ..., OD of a router are connected to the ports I1, I2, ..., ID of its
neighboring routers, but also serve as inputs to the programming elements. That is, the
programming element connected to a router can read packets from all the output ports
O1, O2, ..., OD. We show this property (i.e., that the programming element has read access
to all output ports of the router) using the notations $O^{PE}_1, O^{PE}_2, \ldots, O^{PE}_D$ in Fig. 1c. We
assume that a programming element can read packets from several different output ports
simultaneously. This may be done by considering that each programming element has a
FIFO queue connected to each port $O^{PE}_u$ (with 1 ≤ u ≤ D). We assume that those FIFO
queues are large enough to prevent back pressure in the network. Although this design
solution may lead to increased router programming logic complexity, it avoids the extra cost
of implementing expensive exit multiplexers.

Table 1 Routing policy of a D-dimensional nDimNoC.
Rule | Request | Conflicting request | Routing decision | Explanation
1 | ID → OD | None | ID → OD | No contention over OD.
2 | ID → O1 | Any | ID → O1 | ID → O1 always wins.
3 | Iu → Ou | None | Iu → Ou | No contention over Ou.
4 | Iu → Ou | Iu−1 deflected to Ou | Iu → Ou+1 | Flows coming from the Iu−1 and Iu ports conflict over Ou. Iu−1 → Ou always wins over Iu → Ou. The flow coming from the Iu port is deflected to the Ou+1 port.
5 | Iu → O1 | None | Iu → O1 | No flow entering by a port on a higher dimension than Iu requests O1. Iu → O1 wins.
6 | Iu → O1 | Iv (v > u) | Iu → Ou+1 | A flow entering by a port on a higher dimension than Iu wins O1. The flow coming from the Iu port is deflected to the Ou+1 port.
7 | $I^{PE}_u$ → Ou | None | $I^{PE}_u$ → Ou | There is no flow coming from another port that requests Ou. The flow on $I^{PE}_u$ is injected in the network via Ou.
8 | $I^{PE}_u$ → Ou | Flow from a neighboring router using Ou | None | The flow waiting on the $I^{PE}_u$ port conflicts over the Ou port with flows coming from neighboring routers. Since flows from $I^{PE}_u$ have the lowest priority, the flow waiting on the $I^{PE}_u$ port is not injected in the network.
We consider that each programming element also has a FIFO queue connected to each
port $I^{PE}_u$. These queues store flits that are pending injection in the network. Note that
the FIFO queues connected to each $O^{PE}_u$ and $I^{PE}_u$ port could be implemented in software or
hardware. Their specific implementation is irrelevant to the matter discussed in this work.
We assume that no traffic injection regulator exists at the programming elements. Therefore,
they can inject flits into the network as fast as possible. Nonetheless, we assume that
each flow fj can have a maximum of one packet in the FIFO queue pending to be injected in
the network at any moment in time. That is, only after a packet is injected can a new packet
from the same flow fj be stored in the FIFO queue. The implicit assumption is that the
minimum inter-arrival time Tj between the generation of every two packets of fj is greater than or
equal to the worst-case packet injection time wcitj of that flow, i.e., ∀fj, Tj ≥ wcitj. Note
that the restriction is only related to the content of the FIFO queues at the injection ports
and does not limit the number of in-flight packets in the network. That is to say, several
packets from the same flow fj can be traveling around the network at the same time. Also
note that this assumption is less constraining than those made in many works on real-time
NoCs that assume periods larger than the worst-case communication time (of which the
injection time is just one component).

Figure 2 nDimNoC's routing policy example: (a) packet request to use I1; (b) situation after
routing arbitration and new packet requests; (c) situation after routing arbitration and new
packet requests; (d) situation after routing arbitration.
4.3 Routing policy
Consider a flit that must travel from router (0;0;0) to router (2;0;0) in the example network
of Figure 1a. It will reach its destination faster if it travels on the green links than if it hops
on the red or blue links. Since the green, red and blue links correspond to dimensions
$\vec{D}_1$, $\vec{D}_2$ and $\vec{D}_3$, respectively, this is equivalent to saying that it is faster for the flit to travel on
a dimension of lower order. nDimNoC’s routing policy simply builds upon that property.
Additionally, it uses the idea of deflection routing [1] to avoid the cost of packet buffering.
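The approach is as follows.
Consider a flit of a flow fj with source coordinates $(s^j_1, \ldots, s^j_D)$ and destination
coordinates $(d^j_1, \ldots, d^j_D)$. As mentioned in Section 4.2, the programming element
injects that flit on port $I^{PE}_u$ such that $s^j_u \neq d^j_u$ and $\forall x \mid u < x \leq D,\ s^j_x = d^j_x$.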
If the flit were transmitted in isolation (i.e., without any interfering flow), it would
travel along the dimension $\vec{D}_u$ of the network by entering each router by input port Iu
and requesting output port Ou. Then, when it reaches the first router Rk with the same
coordinates $r^k_2, r^k_3, \ldots, r^k_D$ as its destination (i.e., $r^k_b = d^j_b, \forall b \in [2, D]$), it would request the
output port O1 of Rk and travel along dimension $\vec{D}_1$ until reaching its destination. As a result,
flits entering by input port Iu (such that 2 ≤ u ≤ D) may only request the output port Ou
or O1. Flits entering by the input port I1 may only request the output port O1.
If there is interfering traffic, nDimNoC’s routing policy allows flits to be “deflected” to
make place for “higher priority” traffic. Two such scenarios may happen:
1. If multiple flits entering by different input ports request the output port O1 at the same
time, nDimNoC always gives the highest priority to the flit that entered by the input port
with the highest dimension (i.e., ID wins over ID−1, which wins over ID−2, etc.). Consider
two flits entering by ports Iu and Iv such that u < v and that request output port O1.
Then, the flit entering by Iv exits through O1, and the flit entering by Iu exits through
Ou+1. We say that the flit that entered by Iu is deflected to dimension $\vec{D}_{u+1}$.
2. A flit entering by port Iu that was deflected to the output port Ou+1 may now conflict
for port Ou+1 with a flit coming from Iu+1 that requests Ou+1 at the same time.
Under this contention scenario, the flit coming from Iu that was deflected towards
Ou+1 wins the right to use Ou+1 and the flit coming from Iu+1 is deflected towards the
output port Ou+2.
Note that deflections redirect flits on longer paths towards their destination.
However, the topology presented in Section 4.1 ensures that a deflected flit still progresses towards its
destination router. Therefore, nDimNoC’s routing policy is deadlock-free and livelock-free.
Furthermore, after each deflection, a flit’s priority to request output port O1 in a future
router increases (since flits traveling on higher dimensions have higher priorities). Therefore,
its probability to be able to later travel on a shorter route increases too.
Finally, flits injected by the programming element (i.e., flits entering by any port $I^{PE}_u$)
always have the lowest priority and must wait for the respective port Ou to be free. Table 1
summarizes the routing policy of a D-dimensional nDimNoC.
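The arbitration performed in a single router during one cycle can be sketched in Python as follows. This is one possible reading of the routing policy, not the authors' Verilog implementation. It assumes at most one flit per input port, that each flit entering by Iu requests either Ou or O1, and that a pending injection on dimension u succeeds only if Ou is left unused. The function name `arbitrate` and its data layout are hypothetical.

```python
def arbitrate(requests, d, pending=()):
    """Arbitrate one router for one cycle.

    requests: {u: requested output} for flits on inputs I_u (output is u or 1).
    pending:  dimensions u on which the programming element wants to inject.
    Returns ({u: granted output}, [injected dimensions]).
    """
    for u, out in requests.items():
        assert out in (u, 1), "a flit on I_u may only request O_u or O_1"
    # The highest-dimension requester of O_1 always wins it.
    winner = max((u for u, out in requests.items() if out == 1), default=None)
    grants = {}
    displaced = False  # True when the flit from I_{u-1} was deflected onto O_u
    for u in range(1, d + 1):
        if u not in requests:
            displaced = False
        elif u == winner:
            grants[u] = 1
            displaced = False
        elif requests[u] == 1 or displaced:
            grants[u] = u + 1  # loses O_1 or O_u, deflected one dimension up
            displaced = True
        else:
            grants[u] = u
            displaced = False
    used = set(grants.values())
    injected = [u for u in pending if u not in used]  # lowest priority
    return grants, injected


# Contention among I1 -> O1, I2 -> O2 and I3 -> O1 in a 3-dimensional router.
grants, _ = arbitrate({1: 1, 2: 2, 3: 1}, d=3)
print(grants)
```

In this sketch, the flit on I3 obtains O1, the flit on I1 is deflected to O2, and the flit on I2, displaced in turn, is deflected to O3. The deflection chain always stops at the input port that won O1, since its own output port is left free.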
Example. Consider a 4x2x2 3-dimensional nDimNoC (i.e., D = 3) (see Figures 2a-2d). Each
3D-nDimNoC router has six input ports (I1, I2, I3, $I^{PE}_1$, $I^{PE}_2$, and $I^{PE}_3$) and three output
ports (O1, O2, and O3). Consider also a flit of a flow fj (yellow flit in Figure 2a) with
origin and destination coordinates (0; 0; 1) and (3; 1; 0), respectively. Since $s^j_3 \neq d^j_3$, the flit
is injected via input port $I^{PE}_3$ (see Figure 2a). The flit then travels along dimension $\vec{D}_3$
until it reaches router Rk with the same coordinates $r^k_2, r^k_3, \ldots, r^k_D$ as its destination, i.e.,
$r^k_2 = d^j_2 = 1$ and $r^k_3 = d^j_3 = 0$ (see Figure 2a). In Rk, the flit enters by input port I3 and
requests output port O1 to travel along dimension $\vec{D}_1$ until its destination (see Figure 2b).
According to rule 2 of nDimNoC’s routing policy (see Table 1), it has the highest priority to
use O1 and therefore enters the router (1; 1; 0) (the router after Rk on dimension $\vec{D}_1$) by its
port I1, and requests port O1 (see Figure 2b). If a flit enters by the I2 port (blue flit in
Figure 2b) and/or the I3 port (pink flit in Figure 2b) and requests O1 at the same time as the
yellow flit, then the yellow flit is deflected to the output port O2 (see Figure 2c and rule 6
in Table 1). Thus, it must now travel along dimension $\vec{D}_2$ until it reaches the same router
as it would have reached if it had used the O1 port instead. Note that the yellow flit may
still suffer additional deflections to dimension $\vec{D}_3$ in any router it reaches while traveling
along dimension $\vec{D}_2$, as is the case in Figure 2c where both a flit entering by the I1 port
(violet flit) and a flit entering by the I3 port (orange flit) request the O1 port. Then, the
request I3 → O1 wins over the other requests and the flits entering by the I1 and I2 ports
are deflected to the O2 and O3 ports, respectively (see Fig. 2d and rule 4 in Table 1).
5 Bound on the worst-case communication time
In Section 4, we presented nDimNoC’s design. In this section, we present an analysis of
the worst-case communication time (WCCT) between two programming elements connected
with nDimNoC. The WCCT of a packet is defined as the sum of the maximum amount of
time wcit during which the last flit of the packet must wait in the programming element
before being injected into the network, and the maximum amount of time wctt taken by
any flit of the packet to traverse the network and reach its destination. We refer to those
as the worst-case injection time (wcitj) and the worst-case traversal time (wcttj) of flow fj,
respectively. Then, the WCCT of a packet of a flow fj is defined as:
wcctj = wcitj + wcttj (4)
5.1 Worst-case and best-case traversal time
In this section, we compute bounds on the worst- and best-case traversal time of a flit p
(abbreviated WCTT and BCTT, respectively). A bound on the WCIT is later derived in
Section 5.2.
As discussed in the previous section, a flit p of flow fj that travels through nDimNoC
can be deflected in any router on its path to its destination, but there is only a limited set
of routers in which it can actively request to change the dimension it travels along. Those
routers are: (i) the origin router of fj with coordinates $(s^j_1, \ldots, s^j_D)$, and (ii) every
router Rk on the path of p such that its coordinates respect $r^k_b = d^j_b, \forall b \in [2, D]$. We formally
denote this set of routers by R where
$R = \{R_k \mid \forall l \in [1, D],\ r^k_l = s^j_l\} \cup \{R_k \mid \forall b \in [2, D],\ r^k_b = d^j_b\}$ (5)
As will be shown later in this section, the routing decisions in the routers in R are the
only ones that must be analyzed to get a bound on the BCTT and WCTT of a flit p.
We use a directed acyclic graph (DAG) G to compute the WCTT and BCTT of a flit of
a flow fj that traverses a D-dimensional nDimNoC. A DAG G = (V, E) is formed by a set
of vertices V and a set of edges E. Each edge e ∈ E connects two vertices u and v in V. We
write e = (u, v). Each edge is assigned a weight w(u, v).
The DAG compactly represents all the routes that the flit p may potentially follow (from
its origin to its destination) when it traverses nDimNoC. Each vertex v in the DAG G
represents one input port of a router in R. Let R(u) and I(u) be the router and the input
port of the router represented by vertex u in the graph, then we note u = (R(u), I(u)). Each
edge e = (u, v) ∈ E connecting vertices u and v represents a possible path taken by the flit
from input port I(u) of router R(u) to the input port I(v) of another router R(v) on its
path. The weight of the edge e = (u, v) is the maximum number of hops from I(u) to I(v)
according to that path. Additionally, we label each edge e = (u, v) with the specific output
port taken by the flit in router R(u).
Example. Figure 3 shows the DAG of the example of Section 4.3 (Figure 2). It shows
the potential paths that a flit of a flow fj may follow from the origin router (0; 0; 1) to the
destination router (3; 1; 0) when it traverses a 4x2x2 3-dimensional nDimNoC. The source
vertex v0 at level 0 of the graph represents the input port $I^{PE}_3$ of the origin router Rs at
which the flit is injected by the programming element. Since the flit p is injected by $I^{PE}_3$,
it can only exit by output port O3 of Rs. Rk is the first router p reaches after leaving Rs
where p may request output port O1. Flit p may only enter Rk by the input port I3. Vertex
v1 on level 1 represents input port I3 of Rk. The weight w1 is the number of hops the flit p
makes from the O3 port of Rs to the I3 port of Rk. In router Rk, the flit enters by the I3 port
and requests the O1 port to travel along dimension $\vec{D}_1$. According to rule 2 of nDimNoC’s
routing policy (see Table 1), the routing decision for that request is always I3 → O1 (because
any flit entering by the I3 port has the highest priority to use the O1 port in a 3-dimensional
nDimNoC). Therefore, p reaches router Rk+1 in one hop, and certainly enters Rk+1 by port
I1. Since Rk+1 is also in R, it is represented by vertex v2. According to Table 1, two different
routing decisions may be taken in Rk+1: (1) p is routed to the O1 port if there is no conflict
over O1 (see rule 5 in Table 1); or (2) p is deflected to the O2 port if there is one or more
flows coming from other ports that request the O1 port at the same time as p (see rule 6
in Table 1). If situation (1) happens (i.e., the flit under analysis is routed to the O1 port),
it enters the router Rk+2 using port I1. We represent the I1 port of Rk+2 as vertex v3 in
level 3. If situation (2) occurs (i.e., the flit is deflected to the O2 port), it may enter router
Rk+2 from: (1) the I2 port if it suffers no further deflection before reaching Rk+2 (see rule 3 in
Table 1) or (2) the I3 port if it suffers more deflections on its path to Rk+2 (see rule 4 in
Table 1). We represent the I2 and I3 ports of Rk+2 as vertices v4 and v5 in level 3 of the
graph, respectively. Considering the potential routing decisions to which the flit p may be
subjected after it enters Rk+2 by the ports I1, I2 or I3, p may reach its destination router
Rd by input ports I1, I2 or I3 (vertices v6, v7, and v8 on level 4) in a maximum number of
hops represented by the weights of the edges connecting the vertices of level 3 to those of
level 4. Note that the flit will always be received by the programming element regardless of
the routing decision taken in the destination router.

Figure 3 DAG of the potential trajectories that a flit may take from the origin (0; 0; 1) to the
destination (3; 1; 0) when it traverses a 4x2x2 3-D nDimNoC.
After building the graph G as exemplified above, the WCTT of flit p is the weight of the longest
weighted path in graph G, and its BCTT is the weight of the shortest weighted path in G. For the example
of Figure 3, the WCTT is thus equal to 8 and corresponds to the case where the flit p follows
the path represented by vertices v0, v1, v2, v4 and v8. Similarly, taking the shortest weighted
path, we get that the BCTT of p is 4 in that example. Note that the WCTT may not always
be obtained when the flit experiences its maximum number of deflections, hence the need for
building the full graph G.
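Computing the longest and shortest weighted paths of a DAG takes one pass over the vertices in topological order. The Python sketch below illustrates this with the standard `graphlib` module (Python 3.9 or later). The function name `path_bounds` is hypothetical, and the small graph used here is a made-up toy example with arbitrary weights; it does not reproduce the weights of Figure 3.

```python
from graphlib import TopologicalSorter


def path_bounds(edges, source, sinks):
    """Return (longest, shortest) weighted path length from source to any sink.

    edges: iterable of (u, v, weight) tuples describing a DAG.
    """
    preds = {}
    for u, v, w in edges:
        preds.setdefault(u, [])
        preds.setdefault(v, []).append((u, w))
    graph = {v: [u for u, _ in ps] for v, ps in preds.items()}
    longest, shortest = {source: 0}, {source: 0}
    # static_order() yields every vertex after all of its predecessors.
    for v in TopologicalSorter(graph).static_order():
        for u, w in preds[v]:
            if u in longest:  # ignore vertices unreachable from the source
                longest[v] = max(longest.get(v, float("-inf")), longest[u] + w)
                shortest[v] = min(shortest.get(v, float("inf")), shortest[u] + w)
    return max(longest[s] for s in sinks), min(shortest[s] for s in sinks)


# Toy DAG with hypothetical weights (not the graph of Figure 3).
toy_edges = [("a", "b", 1), ("b", "c", 1), ("b", "d", 2),
             ("c", "e", 1), ("d", "e", 4), ("d", "f", 3)]
wctt, bctt = path_bounds(toy_edges, "a", ["e", "f"])
print(wctt, bctt)
```

Several sinks are allowed because, in the DAG described above, the destination router is represented by one vertex per input port through which the flit may arrive.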
Following the reasoning above, the graph G can systematically be built using Algorithm 1.
Algorithm 1 uses Lemmas 1 to 5 to compute the set of input and output ports to which the
flit p may be routed in each router in R, and to compute the weight of each edge. We now
present and prove those lemmas.
In the following, we denote by Rcur and Rnext any two routers in R such that Rnext is
the first router in R reached by p after leaving Rcur.
▶ Lemma 1. A flit p of a flow fj that enters router Rcur by port $I^{PE}_u$ will be routed to the
output port Ou.
Proof. By rule 7 of nDimNoC’s routing policy (Table 1). ◀
Algorithm 1 Building the DAG of the potential trajectories.
Input: flow fj
Output: V, E
V ← ∅; E ← ∅;
Build set R according to Equation (5);
Rcur ← source router of fj;
Iu ← input port by which fj is injected in its source router according to Property 2;
Create vertex vcur = (Rcur, Iu);
V ← V ∪ {vcur};
ΓI ← {Iu};
ΓInew ← ∅;
while Rcur is not the destination router of fj do
    Rnext ← first router in R reached by any flit of fj after it leaves Rcur;
    foreach Icur ∈ ΓI do
        vcur ← get vertex (Rcur, Icur) in V;
        ΓO ← set of output ports to which the flit may be routed if it enters Rcur by the
             input port Icur; // use Lemmas 1 and 2
        foreach Ocur ∈ ΓO do
            ΓInext ← set of input ports by which the flit may enter Rnext if it exits Rcur by
                     the output port Ocur; // use Lemmas 3 and 4
            foreach Inext ∈ ΓInext do
                if Inext ∉ ΓInew then
                    ΓInew ← ΓInew ∪ {Inext};
                    Create vertex vnext = (Rnext, Inext);
                    V ← V ∪ {vnext};
                else
                    vnext ← get vertex (Rnext, Inext) in V;
                end
                Create edge e = (vcur, vnext) with weight $h^{R_{cur} \to R_{next}}_{O_{cur} \to I_{next}}$; // use Lemma 5
                E ← E ∪ {e};
            end
        end
    end
    ΓI ← ΓInew;
    ΓInew ← ∅;
    Rcur ← Rnext;
end
▶ Lemma 2. A flit p that enters router Rcur by port Iu may be routed to any of the output
ports belonging to the set ΓOu, such that
\[ \Gamma^{O}_{u} = \begin{cases} \{O_1\} & \text{when } u = D \\ \{O_1\} \cup \{O_{u+1}\} & \text{when } u \neq D \end{cases} \tag{6} \]
Proof. According to rule 2 in Table 1, a flit entering by the ID port has the highest priority
to use the O1 port and will never be deflected to any other output port. This proves the first
case of Equation (6). If the flit enters the router by an input port Iu such that u < D and
requests output port O1, two scenarios may happen according to Table 1: (1) it wins port
O1 (see rule 5 in Table 1), or (2) it is deflected to port Ou+1 (see rule 6 in Table 1). This
proves the second case of Equation (6). ◀
▶ Lemma 3. Let Ou be the output port by which flit p exits the router Rcur. If Rnext is only
one hop further on dimension $\vec{D}_u$, then flit p enters Rnext by its port Iu.
Proof. Since, according to the network topology defined in Section 4.1, the output port Ou
of Rcur is connected to the input port Iu of Rnext, and because flit p exits Rcur by port Ou,
the lemma follows. ◀
▶ Lemma 4. Let Ou be the output port by which flit p exits the router Rcur. If Rnext is
more than one hop away from Rcur on dimension $\vec{D}_u$, then the flit p will enter Rnext
by one of the input ports belonging to the set ΓIu, such that
\[ \Gamma^{I}_{u} = \{ I_v \mid u \le v \le D \} \tag{7} \]
Proof. If Rnext is more than one hop away from Rcur on dimension $\vec{D}_u$, flit p must hop
through at least one other router between Rcur and Rnext. First, we note that by definition,
Rnext is the first router after Rcur on flit p’s path where p may request output port O1.
Thus, according to nDimNoC’s routing policy (Section 4.3), p may only continue to travel
along the same dimension (see rule 3 in Table 1) or be deflected to a higher order dimension
while traveling between Rcur and Rnext (see rule 4 in Table 1).
If no deflection happens in the routers located between Rcur and Rnext, flit p will enter
Rnext by input port Iu. However, according to rule 4 of nDimNoC’s routing policy (Table 1),
if u < D, the flit p may also be deflected to dimension $\vec{D}_{u+1}$ in one of those
intermediate routers. If no further deflection happens until reaching Rnext, p will then enter
Rnext by the input port Iu+1. Yet, if u + 1 < D, Table 1 states that the flit p may still be
deflected to dimension $\vec{D}_{u+2}$ while traveling along dimension $\vec{D}_{u+1}$.
Repeating this reasoning, we get that flit p may enter Rnext by any input port Iv such that
u ≤ v ≤ D. ◀
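The port sets of Lemmas 1 to 4 can be sketched as two small helper functions. This is only an illustration: ports are represented by their integer index, and the function names and the `from_pe` and `single_hop` flags are hypothetical conveniences, not notation from the analysis.

```python
def output_ports(u, D, from_pe=False):
    """Indices of the output ports a flit may take in Rcur (Lemmas 1 and 2)."""
    if from_pe:        # Lemma 1: flit injected through the PE port associated with u
        return {u}
    if u == D:         # Lemma 2, first case: never deflected
        return {1}
    return {1, u + 1}  # Lemma 2, second case: wins O1 or is deflected to O(u+1)


def input_ports(u, D, single_hop):
    """Indices of the input ports by which a flit leaving by Ou may enter Rnext."""
    if single_hop:     # Lemma 3: Ou of Rcur is wired to Iu of Rnext
        return {u}
    return set(range(u, D + 1))  # Lemma 4: any Iv with u <= v <= D


print(output_ports(1, 3), input_ports(2, 4, single_hop=False))
```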
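▶ Lemma 5. The maximum number of hops from the output port Ou of Rcur to the input
port Iv of Rnext (with u ≤ v) is upper bounded by
\[ h^{R_{cur} \to R_{next}}_{O_u \to I_v} = (v - u) + \frac{(pos_{next} - pos_{cur'} + N) \bmod N}{g_{D-v+1}} \tag{8} \]
where
\[ pos_{cur'} = \Bigl( pos_{cur} + \sum_{k=u}^{v-1} g_{D-k+1} \Bigr) \bmod N \tag{9} \]
and poscur and posnext are computed with Equation (1).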
Proof. According to nDimNoC’s routing policy and following the same explanation as in
the proof of Lemma 4, a flit that exits Rcur by port Ou and enters Rnext by port Iv must
have been deflected exactly (v − u) times.
According to Property 1, flit p bypasses gD−k+1 routers on the main ring of the network
with every hop it does on dimension $\vec{D}_k$. Because, by definition of our circulant
topology, we have gD−k+1 > gD−k for all k, the flit p will make its maximum number of hops
when it suffers its (v − u) deflections as early as possible and thus travels as long as
possible along the highest order dimension, i.e., along $\vec{D}_v$.
In that case, the flit performs one hop on each of the dimensions $\vec{D}_u$ to
$\vec{D}_{v-1}$ and thereby bypasses $\sum_{k=u}^{v-1} g_{D-k+1}$ routers on the network’s
main ring. Thus, after the (v − u) initial hops, the flit reaches router Rcur′ situated
$\sum_{k=u}^{v-1} g_{D-k+1}$ steps further than Rcur on the main ring. That is, the position
of Rcur′ on the main ring is given by Equation (9).
Since the network contains N routers on its main ring, the routers Rcur′ and Rnext are still
(posnext − poscur′ + N) mod N steps away from each other on that ring. However, since the
flit p only travels along dimension $\vec{D}_v$ after it reached Rcur′, it bypasses gD−v+1
routers of the main ring with each hop. It thus needs
\[ \frac{(pos_{next} - pos_{cur'} + N) \bmod N}{g_{D-v+1}} \]
hops from port Ov of Rcur′ to port Iv of Rnext. Therefore, after its (v − u) initial hops,
the flit performs at most that many additional hops to reach Rnext, hence proving
Equation (8). ◀
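To make the bound concrete, the sketch below evaluates it numerically, reading Equation (8) as (v − u) plus the remaining main-ring distance from Rcur′ to Rnext divided by gD−v+1, with the position of Rcur′ as described in the proof. The `STRIDE` table (mapping a dimension k to gD−k+1) and the positions are hypothetical values for a 64-router network, and the divisibility check is an assumption of this sketch rather than a statement of the analysis.

```python
def max_hops(u, v, pos_cur, pos_next, n_routers, stride):
    """Hop bound from Ou of Rcur to Iv of Rnext, following the proof of Lemma 5.

    stride[k] is the number of main-ring routers bypassed per hop on
    dimension k, i.e. g_{D-k+1} in the notation of the text.
    """
    if not u <= v:
        raise ValueError("the bound is only defined for u <= v")
    # Main-ring position reached after the (v - u) early deflections.
    skipped = sum(stride[k] for k in range(u, v))
    pos_mid = (pos_cur + skipped) % n_routers
    # Remaining main-ring distance, covered with hops on dimension v only.
    remaining = (pos_next - pos_mid + n_routers) % n_routers
    hops, rest = divmod(remaining, stride[v])
    if rest:  # assumption of this sketch: the distance is a multiple of the stride
        raise ValueError("remaining distance is not a multiple of the stride")
    return (v - u) + hops


# Hypothetical 3D network with N = 64 routers on its main ring.
STRIDE = {1: 16, 2: 4, 3: 1}
for v in (1, 2, 3):
    print(v, max_hops(1, v, pos_cur=0, pos_next=32, n_routers=64, stride=STRIDE))
```

With these values the bound grows with v, which matches the intuition that early deflections force the flit onto dimensions with a smaller stride.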
▶ Corollary 6. If u = v, then h^{Rcur→Rnext}_{Ou→Iv} is an exact bound on the number of hops
between the output port Ou of Rcur and the input port Iv of Rnext.
Proof. According to nDimNoC’s routing policy, any deflection of a flit p between Rcur and
Rnext would result in p entering Rnext by an input port Iv such that v > u. Therefore, if
u = v, flit p must not have suffered any deflection and must have traveled along dimension
$\vec{D}_u$ only. Because (posnext − poscur + N) mod N is the distance between Rcur and Rnext
on the main ring of the network, and because for every hop on dimension $\vec{D}_u$, flit p
bypasses gD−u+1 routers on the main ring (by Property 1), we have that p reaches Rnext in
exactly
\[ \frac{(pos_{next} - pos_{cur} + N) \bmod N}{g_{D-v+1}} \]
hops. Note that this last expression is equal to h^{Rcur→Rnext}_{Ou→Iv} when v = u, which
proves the claim. ◀
We now prove that the WCTT and BCTT of a flit of flow fj are bounded by the longest
and the shortest weighted path of the graph G returned by Algorithm 1, respectively. To do
so, we first prove that the graph G built using Algorithm 1 contains all routes that may be
taken by packets of flow fj between its origin and destination.
▶ Lemma 7. The DAG built using Algorithm 1 contains one edge for each possible path
between any two routers in R that may be successively traversed by any flit of flow fj.
Proof. Algorithm 1 iterates over all routers in R that are on the path of fj from its origin
to its destination router (Lines 3, 9, 10 and 31). For each pair of routers Rcur, Rnext, the
algorithm computes the set ΓI of all input ports by which fj may enter Rcur. For each such
input, it uses Lemmas 1 and 2 to compute the set ΓO of all output ports by which fj may exit
Rcur (Line 13). For each output port Ocur ∈ ΓO, it then uses Lemmas 3 and 4 to compute
the set ΓInext of all input ports by which fj may enter Rnext (Line 15). It finally creates an
edge for every path between Ocur and the input ports in ΓInext (Line 24). Since Lemmas 1 to
4 were all proven correct, we have that Algorithm 1 creates an edge for every possible path
between any two routers Rcur and Rnext in R, i.e., one edge for any combination of output
and input port of Rcur and Rnext that may be successively traversed by a packet of fj . ◀
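The loop structure described in this proof can be sketched as follows. This is a skeleton only: the three callbacks stand in for Lemmas 1 to 5, and the toy callbacks, router names and weights used in the example are hypothetical. Parallel edges between the same pair of vertices are kept as a list of weights.

```python
def build_dag(routers, first_port, out_ports, in_ports, weight):
    """Skeleton of Algorithm 1.

    routers    : ordered list R, from the source to the destination router
    out_ports  : (r_cur, i_cur) -> output ports             (Lemmas 1 and 2)
    in_ports   : (r_cur, r_next, o_cur) -> input ports      (Lemmas 3 and 4)
    weight     : (r_cur, r_next, o_cur, i_next) -> hops     (Lemma 5)
    """
    vertices = {(routers[0], first_port)}
    edges = {}  # (v_cur, v_next) -> list of weights, one per parallel edge
    gamma_i = {first_port}
    for r_cur, r_next in zip(routers, routers[1:]):
        gamma_i_new = set()
        for i_cur in gamma_i:
            for o_cur in out_ports(r_cur, i_cur):
                for i_next in in_ports(r_cur, r_next, o_cur):
                    gamma_i_new.add(i_next)
                    vertices.add((r_next, i_next))
                    key = ((r_cur, i_cur), (r_next, i_next))
                    edges.setdefault(key, []).append(
                        weight(r_cur, r_next, o_cur, i_next))
        gamma_i = gamma_i_new
    return vertices, edges


# Toy callbacks for a hypothetical 2D network (not the real routing rules).
D = 2

def toy_out(r, i):
    if i == "PE":  # stand-in for Lemma 1
        return {1}
    return {1} if i == D else {1, i + 1}

def toy_in(r, r_next, o):
    return set(range(o, D + 1))  # pretend every pair is several hops apart

def toy_weight(r, r_next, o, i):
    return 1 + (i - o)  # invented weight

vertices, edges = build_dag(["A", "B", "C"], "PE", toy_out, toy_in, toy_weight)
print(len(vertices), len(edges))
```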
Lemma 7 has the following corollary as a direct consequence.
▶ Corollary 8. The DAG built using Algorithm 1 contains all possible paths taken by flow fj
from its origin to its destination router.
▶ Theorem 9. The longest weighted path of the DAG built with Algorithm 1 is an upper
bound on the WCTT of any flit of flow fj.
Proof. By Lemma 7 and Corollary 8, the DAG built with Algorithm 1 contains all possible
routes from the origin to the destination of fj encoded as a different path in the graph.
Furthermore, by Lemma 5 and Line 24 of Algorithm 1, the weight of every edge in the graph
is an upper bound on the number of hops on the longest path between the output and input
ports of the two routers represented by the vertices connected by that edge. Thus, the longest
weighted path in the graph is an upper bound on the number of hops between all routers on
the path of fj from its origin to its destination. This proves the Theorem. ◀
▶ Theorem 10. The shortest weighted path of the DAG built with Algorithm 1 is the BCTT
of any flit of flow fj.
Proof. According to nDimNoC’s routing policy and its discussion in Section 4.3, a flit p of
flow fj performs its minimum number of hops between its origin and destination router when
it does not suffer any deflection.
Since the DAG built with Algorithm 1 contains all possible routes from the origin to the
destination of fj encoded as a different path in the graph (by Lemma 7 and Corollary 8), it
also contains the path where the flit of fj does not suffer any deflection. Furthermore, by
Corollary 6 and Line 24 of Algorithm 1, the weight of every edge corresponding to a path
where p does not suffer deflection is equal to the exact number of hops performed by p on
that path. Therefore, the shortest weighted path in the graph is an exact bound on the
BCTT of fj from its origin to its destination. This proves the Theorem. ◀
5.2 Worst-case injection time
In the previous section, we explained how to compute bounds on the BCTT and WCTT of
any flit of a packet injected by a flow fj . In this section, we derive a bound on the worst-case
injection time WCIT of any packet of fj (see Theorem 12).
First, we recall a bound on the maximum number of packets that may be injected in the
network by any flow fj . This bound was already proven in [31].
▶ Lemma 11. In any time interval of length ∆t, the flow fj can inject in the network at
most λj(∆t) flits, where the expression of λj(∆t) is that established in Lemma 14 of [31].
Proof. The proof is similar to that of the maximum workload that can be executed by a task
with minimum inter-arrival time Tj and release jitter wcitj . The complete proof is provided
in Lemma 14 of [31]. ◀
Then, we prove an upper bound on the WCIT of any packet of fj using Theorem 12.
To prove that theorem, we use the following notation: flow fj is injected in router Rinj via
input port IP Einj (i.e., Rinj is the origin router of fj). We define Finj as the set of all flows
(including fj) injected in the same input port IP Einj of the same router Rinj as fj . Note that
this set of flows is a property of the system and thus we assume that it is given as an input
to the analysis. We also define ΓCinj as the set of all flows originating from other routers
than Rinj and that may conflict with the injection of flow fj in router Rinj. The content of
ΓCinj is computed using Lemmas 13 and 17 proven later in this section.
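▶ Theorem 12. The WCIT wcitj of any packet of flow fj is given by the smallest positive
solution to
\[ wcit_j = \Bigl( \sum_{\forall f_k \in F_{inj}} C_k \Bigr) - 2 + \sum_{\forall f_l \in \Gamma^{C}_{inj}} \lambda_l(wcit_j + 1 + J_l) \tag{10} \]
where C_k is the number of flits in a packet of flow fk, and Jl = wctt′l − bctt′l is the
difference between the worst-case and the best-case traversal time of flow fl until router
Rinj (computed with Theorems 9 and 10).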
Proof. Let p be the last flit of any packet of flow fj . According to nDimNoC’s routing policy,
the flit p will be injected in the network as soon as: 1) all flits ahead of p in the FIFO queue
of the input port by which it is injected in the network have been injected, and 2) there is
one clock cycle during which no packet entering Rinj from other input ports conflicts for the
same output port as p.
Let nflits be the maximum number of flits ahead of p in the FIFO queue, and let Wu(∆t)
be the maximum number of flits entering into Rinj by another input port than p and
requesting the same output port as p in a time interval of length ∆t. Then, conditions 1)
and 2) are met as soon as
∆t + 1 ≥ nflits + Wu(∆t + 1) (11)
for ∆t ≥ 0. The solution to Equation (11) is thus equal to the WCIT wcitj of flow fj .
To solve the above equation, we first derive a bound on nflits, and then derive a bound
on Wu(∆t + 1).
Bound on nflits: Section 4.2 explains that each flow in Finj may have at most one
packet in the FIFO queue of the input port by which it is injected in the network.
Therefore, in the worst-case scenario, there is one packet of each other flow injected via the
same port IP Einj as fj ahead of p in the FIFO queue. Since p is the last flit of fj ’s packet,
we have
\[ n_{flits} \le \Bigl( \sum_{\forall f_k \in F_{inj}} C_k \Bigr) - 1 \tag{12} \]
where C_k is the number of flits in a packet of flow fk.
Bound on Wu(∆t + 1): Let fl be a flow entering the router Rinj by another input
port than p but requesting the same output port as p (i.e., fl ∈ ΓCinj). In the best and worst
case scenarios, a flit from fl takes bctt′l and wctt′l clock cycles to reach Rinj , respectively.
Therefore, the first flit of fl that may conflict with the injection of p must have been injected
no earlier than wctt′l clock cycles before the beginning of the period during which p is
interfered with. Conversely, the last flit of fl that may conflict with the injection of p must
have been injected no later than bctt′l clock cycles before the end of the interference with p.
Therefore, the length of the interval during which fl may inject flits that conflict with the
injection of p is
∆t + 1 + wctt′l − bctt′l = ∆t + 1 + Jl. (13)
Lemma 11 states that a flow fl may inject at most λl(∆t + 1 + Jl) flits in that time interval.
Therefore, the total number of flits from all conflicting flows fl ∈ ΓCinj is upper bounded as
\[ W_u(\Delta t + 1) \le \sum_{\forall f_l \in \Gamma^{C}_{inj}} \lambda_l(\Delta t + 1 + J_l) \tag{14} \]
Injecting Equations (12) and (14) in Equation (11), we prove the theorem. ◀
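The smallest ∆t satisfying Equation (11) can be found by a simple incremental search, as sketched below. The function takes the bound on nflits and, for each conflicting flow, a pair made of its injection bound λl and its jitter Jl. The helper `periodic_bound` is a hypothetical stand-in used only to make the example executable; it is not the bound of Lemma 11, and the `horizon` cut-off is likewise an assumption of this sketch.

```python
import math


def wcit(n_flits_ahead, conflicting, horizon=100_000):
    """Smallest dt >= 0 such that dt + 1 >= nflits + W(dt + 1), or None.

    conflicting: list of (lam, jitter) pairs, where lam(length) bounds the
    number of flits a conflicting flow injects in an interval of that length.
    """
    for dt in range(horizon + 1):
        interference = sum(lam(dt + 1 + jitter) for lam, jitter in conflicting)
        if dt + 1 >= n_flits_ahead + interference:
            return dt
    return None  # no solution within the horizon: injection is not bounded here


def periodic_bound(flits_per_packet, period):
    # Hypothetical stand-in for lambda_l, NOT the expression of Lemma 11.
    return lambda length: math.ceil(length / period) * flits_per_packet


print(wcit(3, [(periodic_bound(2, 10), 1)]))
```

A linear search is used here for clarity; since the right-hand side is non-decreasing in ∆t, a fixed-point iteration would reach the same value in fewer steps.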
Theorem 12 requires knowing the set ΓCinj of all flows that may conflict with the injection
of fj in router Rinj. The content of that set depends on the specific output port requested
by the packets of fj. Lemmas 13 and 17 below provide a means to compute the content of
ΓCinj when fj requests output port O1 or Ou (with u ≠ 1), respectively.
▶ Lemma 13. The set of flows coming from other routers than Rinj and that may be routed
to output port O1 of Rinj is given by
\[ \Gamma^{1}_{inj} = \{ f_l \mid \forall b \in [2, D],\ d^{l}_{b} = r^{inj}_{b} \ \wedge\ dist(R_{orig}(f_l), R_{inj}) \le dist(R_{orig}(f_l), R_{dest}(f_l)) \} \tag{15} \]
Proof. According to nDimNoC’s routing policy, a flow fl may request the output port O1
of Rinj only if (i) fl may hop through router Rinj, and (ii) ∀b > 1, $d^{l}_{b} = r^{inj}_{b}$.
Condition (i) requires that Rinj is located between the origin and destination router of fl,
i.e., dist(Rorig(fl), Rinj) ≤ dist(Rorig(fl), Rdest(fl)). Combining both (i) and (ii) proves
the lemma. ◀
To compute the set ΓCinj for the case where fj requests output port Ou (with u ≠ 1), we
must first prove some intermediate results using Lemmas 14 to 16.
To prove those lemmas, we define Fuinj as the set of all flows that may enter router Rinj
by input port Iu. That set can easily be built by checking all paths that may be taken by
each flow in the network according to Table 1. All those that have at least one path in which
they enter Rinj by input port Iu are then added to the set Fuinj .
▶ Lemma 14. The set of flows that may enter Rinj by input port Iu and request output port
Ou is given by
\[ \Gamma^{u \to u}_{inj} = F^{u}_{inj} \setminus \{ f_l \mid \forall b \in [2, D],\ r^{inj}_{b} = d^{l}_{b} \} \]
Proof. According to Table 1, a flow entering by input port Iu and requesting output port O1
cannot be routed to output port Ou (only to O1 or Ou+1) (see rules 2, 5, and 6 in Table 1).
Therefore, the set of flows entering by Iu that may be routed to Ou is the set of flows that
enter Rinj by Iu (i.e., Fuinj) minus those that request O1, i.e., all the flows fl that have
a destination such that ∀b ∈ [2, D], $r^{inj}_{b} = d^{l}_{b}$ (see routing policy explained
in Section 4.3). This proves the lemma. ◀
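▶ Lemma 15. Let defq be a boolean equal to true if a deflection may happen in router Rq,
and equal to false otherwise. Then, we have
\[ def_q = \begin{cases} \text{true} & \text{if } \exists u, v \text{ with } u \neq v \mid \exists f_l \in F^{u}_{q},\ \exists f_m \in F^{v}_{q} \text{ s.t. } l \neq m \wedge \forall b \in [2, D],\ d^{l}_{b} = d^{m}_{b} = r^{q}_{b} \\ \text{false} & \text{otherwise} \end{cases} \tag{16} \]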
Proof. According to Table 1, a deflection may happen in router Rq only if at least two
different flows compete to access the output port O1. For that situation to happen, there must
exist at least two different flows fl and fm entering by two different input ports (i.e., ∃u, v
with u ≠ v | ∃fl ∈ Fuq, ∃fm ∈ Fvq s.t. l ≠ m) that both request output port O1. According
to the routing policy explained in Section 4.3, this happens only if
∀b ∈ [2, D], $d^{l}_{b} = d^{m}_{b} = r^{q}_{b}$. This proves the lemma. ◀
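The condition of Equation (16) can be checked mechanically, as in the sketch below. The input format (a dictionary mapping each input port index u to the destination coordinates of the flows in Fuq) and all values are hypothetical.

```python
def may_deflect(flows_by_port, router_coords):
    """Evaluate def_q: True if two distinct flows, entering by two different
    input ports, may both request output port O1 of the router.

    flows_by_port: {u: {flow_name: dest_coords}}, one entry per input port Iu.
    """
    # Flows of each port whose destination matches the router on dimensions 2..D.
    wants_o1 = {
        u: {name for name, dest in flows.items()
            if dest[1:] == router_coords[1:]}
        for u, flows in flows_by_port.items()
    }
    ports = sorted(wants_o1)
    for idx, u in enumerate(ports):
        for v in ports[idx + 1:]:
            if any(l != m for l in wants_o1[u] for m in wants_o1[v]):
                return True
    return False


print(may_deflect({1: {"a": (3, 1)}, 2: {"b": (0, 1)}}, (1, 1)))
```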
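▶ Lemma 16. The set of flows that may enter Rinj by input port Iu−1 and be routed to
output port Ou is given by
\[ \Gamma^{u-1 \to u}_{inj} = \begin{cases} F^{u-1}_{inj} & \text{if } def_{inj} = \text{true} \\ \emptyset & \text{if } def_{inj} = \text{false} \end{cases} \tag{17} \]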
Table 2 Resource utilization of different NoCs in Kintex-7 FPGAs.
NoC                  LUTs      Resource utilization of the platform (%)
8x8 ProNoC           100000    20%-150%
8x8 IDAMC            83000     18%-127%
8x8 CONNECT          96000     20%-147%
8x8 HopliteRT*       5632      1.1%-8.5%
4x4x4 3D-nDimNoC     18560     3.9%-28%
Proof. According to Table 1, all flows that enter router Rinj by input port Iu−1 (i.e., those in
Fu−1inj ) may be deflected to output port Ou (see rules 4 and 6 in Table 1). Thus, the set of flows
entering by Iu−1 and routed to Ou is given by all flows in Fu−1inj if a deflection may happen
in Rinj , i.e., if definj = true. If no deflection may happen in Rinj (i.e., definj = false), then
Table 1 states that none of the flows entering by Iu−1 may be routed to Ou. This proves
both cases of Equation (17). ◀
▶ Lemma 17. The set of flows coming from other routers than Rinj and that may be routed
to output port Ou of Rinj (with u ≠ 1) is given by
\[ \Gamma^{u}_{inj} = \Gamma^{u \to u}_{inj} \cup \Gamma^{u-1 \to u}_{inj} \tag{18} \]
Proof. According to Table 1, only flows that enter a router by its input ports Iu or Iu−1
can be routed to output port Ou. Since, according to Lemmas 14 and 16, Γu→uinj and Γu−1→uinj
contain all the flows entering Rinj by input ports Iu and Iu−1, respectively, that may be
routed to output port Ou, their union contains all flows that may come from other routers
than Rinj and may be routed to output port Ou of Rinj . ◀
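Lemmas 14, 16 and 17 combine into a few set operations, sketched below. The dictionaries map hypothetical flow names to destination coordinates and play the role of Fuinj and Fu−1inj; the `deflection_possible` flag is the value of definj from Lemma 15, passed in by the caller.

```python
def gamma_u(flows_u, flows_u_minus_1, router_coords, deflection_possible):
    """Flows from other routers that may be routed to output port Ou (u != 1).

    flows_u, flows_u_minus_1: {flow_name: dest_coords} for the flows that may
    enter the router by Iu and by I(u-1), respectively.
    """
    # Lemma 14: flows entering by Iu, minus those that request O1.
    same_dimension = {
        name for name, dest in flows_u.items()
        if dest[1:] != router_coords[1:]
    }
    # Lemma 16: flows entering by I(u-1), only if a deflection may happen.
    deflected = set(flows_u_minus_1) if deflection_possible else set()
    # Lemma 17: union of both sets.
    return same_dimension | deflected


F_U = {"a": (3, 1), "b": (0, 2)}   # hypothetical content of F^u_inj
F_U_MINUS_1 = {"c": (2, 1)}        # hypothetical content of F^{u-1}_inj
print(gamma_u(F_U, F_U_MINUS_1, (1, 1), deflection_possible=True))
```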
The content of ΓCinj is thus equal to Γ1inj if fj requests output port O1 (Lemma 13), and
to Γuinj if it requests any other output port (Lemma 17). Using ΓCinj in Theorem 12 we can
now bound the WCIT of fj .
6 Experimental results
6.1 Implementation of nDimNoC
We implemented a 3D-nDimNoC with the hardware description language Verilog. We
synthesized a single router of 3D-nDimNoC for flits of 64 bits. The target platform was a
Xilinx Virtex-7 485T FPGA. It required 290 LUTs and 202 Flip-Flops (FFs) in total. This
corresponds to only 0.1% and 0.03% of the total number of LUTs and FFs available in the
target FPGA, respectively.
Figure 4 Experimental results for a random traffic pattern: (a) maximum WCTT, (b) average
WCTT, (c) maximum WCIT, (d) average WCIT, (e) maximum WCCT, (f) average WCCT.
We compared the hardware cost of a 3D-nDimNoC with that of HopliteRT* [31] and of three
NoCs based on virtual channels (VCs): ProNoC [23], IDAMC [37], and CONNECT [27]. The target
platform was a Xilinx Kintex-7 FPGA. Kintex-7 is a mid-range family of FPGAs that contains
approximately between 65,600 and 477,760 LUTs depending on the device. Table 2 shows the
synthesis results. A ProNoC router with two VCs required 1574 LUTs, a HopliteRT* router
required 88 LUTs, and according to [38] and [27], an IDAMC and a CONNECT router require
approximately 1300 and 1500 LUTs, respectively. Then, as reported in Table 2, 8x8 ProNoC,
IDAMC, and CONNECT NoCs require ≈100,000, ≈83,000, and ≈96,000 LUTs, respectively, consuming
a large portion (if not all in some cases) of the logic available in the FPGA. This leaves
limited resources available for any computation logic. Therefore, those solutions are poorly
suited to systems implemented over mid-range FPGAs. On the other hand, a 4x4x4 3D-nDimNoC
requires 18,560 LUTs, i.e., between 3.9% and 28% of the Xilinx Kintex-7 resources. It is about
three times more expensive than HopliteRT* (which requires 5632 LUTs) but approximately 5 times
cheaper than ProNoC, IDAMC, and CONNECT NoCs in terms of LUT utilization. We
thus conclude that 3D-nDimNoC is a suitable solution for such FPGA platforms.
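The network-level figures follow from the per-router LUT counts and the number of routers (64 for both an 8x8 and a 4x4x4 network). The short sketch below redoes that arithmetic, using the per-router counts and the device range quoted in the text; the function and constant names are ours.

```python
ROUTERS = 64                      # 8x8 and 4x4x4 networks both have 64 routers
DEVICE_LUTS = (65_600, 477_760)   # smallest and largest device, as quoted above
LUTS_PER_ROUTER = {"ProNoC": 1574, "HopliteRT*": 88, "3D-nDimNoC": 290}


def noc_cost(luts_per_router, routers=ROUTERS):
    """Return (total LUTs, % of largest device, % of smallest device)."""
    total = luts_per_router * routers
    return (total,
            100 * total / DEVICE_LUTS[1],
            100 * total / DEVICE_LUTS[0])


for name, luts in LUTS_PER_ROUTER.items():
    total, low, high = noc_cost(luts)
    print(f"{name}: {total} LUTs, {low:.1f}% to {high:.1f}% of the device")

ratio = noc_cost(290)[0] / noc_cost(88)[0]
print(f"3D-nDimNoC / HopliteRT*: {ratio:.1f}x")
```

The exact products differ slightly from the rounded values of Table 2 (for instance 100,736 LUTs for ProNoC against ≈100,000), which explains small differences in the percentages.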
Finally, we connected the nDimNoC router to a MicroBlaze soft-core and synthesized a
4x2x2 3D-network for a Virtex-7 485T. We computed the maximum operating frequency of the
network with Xilinx Vivado and obtained ≈210 MHz for a 4x2x2 3D-nDimNoC against ≈275 MHz
for an equivalent 4x4 HopliteRT* NoC. This degradation in terms of maximum operating
frequency may be explained by the fact that (1) an nDimNoC router requires more complex
logic to route packets from its input to its output ports, and (2) the additional dimensions
increase the number of wires between routers, which increases the complexity of the placement
and routing during the logic synthesis.
6.2 Analyses results
In this section, we provide experimental results by computing the WCTT, WCIT, and WCCT
of sets of communication flows that traverse NoCs of different dimensionalities.
As a starting point, we generated sets of communication flows for a 16x16 2D-NoC
according to a random traffic pattern. The origin and destination coordinates of each flow
were randomly generated using a uniform probability distribution. The number of flits
of packets released by a communication flow was randomly chosen between 1 and 5, and
their inter-arrival times were generated as in [36]. Then, we made a one-to-one mapping
of the routers in the 16x16 2D-NoC to the routers of a 4x8x8 3D-nDimNoC, a 4x4x4x4
4D-nDimNoC, a 2x2x4x4x4 5D-nDimNoC, and a 2x2x2x2x4x4 6D-nDimNoC. The origin
and destination of each flow were accordingly updated for each network topology.
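The text does not detail how the one-to-one mapping is built. One simple possibility, shown here purely as an assumption, is to linearize the 2D coordinates in row-major order and to decompose the resulting index in the mixed radix given by the target topology. All listed topologies contain 256 routers, so the mapping is a bijection.

```python
def to_coords(index, shape):
    """Mixed-radix decomposition of a linear index for a network of that shape."""
    coords = []
    for size in reversed(shape):
        index, c = divmod(index, size)
        coords.append(c)
    if index:
        raise ValueError("index does not fit in the given shape")
    return tuple(reversed(coords))


def remap(xy, shape_2d, shape_nd):
    """Map a router (x, y) of the 2D NoC to coordinates in the nD topology."""
    x, y = xy
    linear = x * shape_2d[1] + y   # row-major linearization (an assumption)
    return to_coords(linear, shape_nd)


SHAPES = [(4, 8, 8), (4, 4, 4, 4), (2, 2, 4, 4, 4), (2, 2, 2, 2, 4, 4)]
for shape in SHAPES:
    print(shape, remap((15, 15), (16, 16), shape))
```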
In Figs. 4a and 4b, we show the maximum and average packet WCTT for an increasing
number of flows in NoCs of different dimensionalities. The results were computed by using
the analysis of HopliteRT [39, 40] and HopliteRT* [31] (assuming a 16x16 2D-NoC), and the
analysis presented in Section 5.1 for the 2D, 3D, 4D, 5D and 6D-nDimNoC topology. To
establish a fair comparison, we assume one priority level (i.e., all flows were assigned the
highest priority) for the analysis proposed in [31]. Each point in the plot is the result of 100
repetitions (100 different random flow sets). We varied the number of generated flows from
10 to 300 by steps of 10.
In Fig. 4a, we observe that the maximum WCTT is slightly worse with nDimNoC as compared
to HopliteRT*. Nonetheless, Fig. 4b shows that the average traversal time improves with
nDimNoC as the dimensionality of the network increases. This can be explained by the
fact that new routes, possibly shorter and faster, are made available between pairs of routers
when a new dimension is added to the network. Moreover, the number of interfering flows,
and therefore the number of deflections that flows may suffer on each link, decreases since
the number of routers on each dimension decreases. Note that the average packet WCTT is
reduced by ≈40% and ≈60% with a 5D-nDimNoC and a 6D-nDimNoC, respectively, against
HopliteRT*. We also show that the maximum and average worst-case traversal times are
noticeably reduced with nDimNoC as compared to HopliteRT.
In Figs. 4c and 4d, we show the maximum and average WCIT of flows using the analysis
of HopliteRT* and nDimNoC. We also computed the maximum and average WCIT by using
the analysis of HopliteRT, but we do not show them on the graphs as they are so pessimistic
that they would compress all other curves together and render the plots unreadable.
As shown, the packets see their WCIT drastically reduced in nDimNoC in comparison to
HopliteRT*. This is expected since nDimNoC allows the programming element connected to
a router to inject packets simultaneously via as many input ports as there are dimensions
in the network. A router of HopliteRT*, on the other hand, can inject at most one flit per
cycle in the network (on either of the router output ports). Furthermore, the number of
communication flows that may interfere with the injection of a packet at a router decreases
since more routes are available in the network, and thus less traffic uses each individual route.
In Figs. 4e and 4f, we show the maximum and average packet WCCT (which we recall to
be equal to the sum of the WCTT and WCIT of those packets). We varied the number of
generated flows from 10 to 100 by steps of 10. The results were obtained by using the analysis
of nDimNoC, and the analyses proposed in [31] and [21] for HopliteRT* and a VC-based
real-time NoC, respectively. The analysis presented in [21] by Liu et al. improves on
that proposed in [36, 35] by Shi and Burns. To establish a fair comparison, we
assume one VC (i.e., one priority level) for the analysis presented in [21]. As shown in
Figure 4e, for almost all configurations, the WCCT returned by the analysis of nDimNoC
outperforms that returned by the analyses of [31] and [21]. The average WCCT is only better
with the analysis by Liu et al. when the network is heavily underloaded and very few
flows are traversing the network (i.e., fewer than 30 flows). Note also that [21] considers
that each flow may only have one packet traveling across the network at a time,
whilst nDimNoC supports the transmission of several packets from the same
communication flow simultaneously.
The average WCCT with nDimNoC improves when the network’s dimensionality increases
and is barely impacted by the number of flows. Therefore, we conclude that increasing the
dimensionality of nDimNoC has a positive impact from an average performance perspective
for a limited impact on the worst-case performance of the flows.
7 Summary and conclusion
In this paper, we presented nDimNoC, a new and flexible real-time D-dimensional NoC
that uses the properties of circulant topologies to provide real-time guarantees to the flows
transmitted over that NoC. We proposed a timing analysis for nDimNoC. We also fully
implemented a 3D-nDimNoC in Verilog. Experimental results show
improvements in terms of network communication latency in comparison to existing 2D
solutions.
References
1 P. Baran. On distributed communications networks. IEEE Transactions on Communications
Systems, 12(1):1–9, 1964. doi:10.1109/TCOM.1964.1088883.
2 L. Benini and G. De Micheli. Networks on chip: a new paradigm for systems on chip design.
In Design, Automation and Test in Europe Conference and Exhibition, pages 418–419, March
2002.
3 Alan Burns, James Harbin, and Leandro Soares Indrusiak. A wormhole NoC protocol for
mixed criticality systems. In IEEE Real-Time Systems Symposium, pages 184–195, 2014.
4 Yiou Chen, Jianhao Hu, Xiang Ling, and Tingting Huang. A novel 3d noc architecture based
on de bruijn graph. Computers & Electrical Engineering, 38(3):801–810, 2012.
5 Shamik Das, Andy Fan, Kuan-Neng Chen, Chuan Seng Tan, Nisha Checka, and Rafael
Reif. Technology, performance, and computer-aided design of three-dimensional integrated
circuits. In Proceedings of the 2004 International Symposium on Physical Design, ISPD
’04, pages 108–115, New York, NY, USA, 2004. Association for Computing Machinery. doi:
10.1145/981066.981091.
6 Dakshina Dasari, Borislav Nikolić, Vincent Nélis, and Stefan M Petters. NoC contention
analysis using a branch-and-prune algorithm. ACM Transactions on Embedded Computing
Systems, 13(3s):113, 2014.
7 Jonas Diemer, Jonas Rox, Mircea Negrean, Steffen Stein, and Rolf Ernst. Real-time
communication analysis for networks with two-stage arbitration. In 9th ACM International
Conference on Embedded Software. IEEE, 2011.
8 Feihui Li, C. Nicopoulos, T. Richardson, Yuan Xie, V. Narayanan, and M. Kandemir. Design
and management of 3d chip multiprocessors using network-in-memory. In 33rd International
Symposium on Computer Architecture (ISCA’06), pages 130–141, 2006. doi:10.1109/ISCA.
2006.18.
9 Yan Ghidini, Thais Webber, Edson Moreno, Fernando Grando, Rubem Fagundes, and César
Marcon. Buffer depth and traffic influence on 3d nocs performance. In 2012 23rd IEEE
International Symposium on Rapid System Prototyping (RSP), pages 9–15. IEEE, 2012.
10 Frédéric Giroudot and Ahlem Mifdaoui. Buffer-aware worst-case timing analysis of wormhole
NoCs using network calculus. In IEEE Real-Time and Embedded Technology and Applications
Symposium, 2018.
11 Frederic Giroudot and Ahlem Mifdaoui. Tightness and computation assessment of worst-case
delay bounds in wormhole networks-on-chip. In 27th International Conference on Real-Time
Networks and Systems, 2019.
12 C. Grecu, P. P. Pande, A. Ivanov, and R. Saleh. A scalable communication-centric soc
interconnect architecture. In International Symposium on Signals, Circuits and Systems.
Proceedings, SCS 2003. (Cat. No.03EX720), pages 343–348, 2004. doi:10.1109/ISQED.2004.
1283698.
13 R. I. Greenberg and Lee Guan. An improved analytical model for wormhole routed networks
with application to butterfly fat-trees. In Proceedings of the 1997 International Conference
on Parallel Processing (Cat. No.97TB100162), pages 44–48, 1997. doi:10.1109/ICPP.1997.
622554.
14 P. Guerrier and A. Greiner. A generic architecture for on-chip packet-switched interconnections.
In Proceedings Design, Automation and Test in Europe Conference and Exhibition 2000 (Cat.
No. PR00537), pages 250–256, 2000. doi:10.1109/DATE.2000.840047.
15 Jörg Henkel, Wayne Wolf, and Srimat Chakradhar. On-chip networks: A scalable,
communication-centric embedded system design paradigm. In 17th International Conference
on VLSI Design. IEEE, 2004.
16 S. Hesham, J. Rettkowski, D. Goehringer, and M. A. Abd El Ghany. Survey on real-time
networks-on-chip. IEEE Transactions on Parallel and Distributed Systems, 28(5):1500–1517,
May 2017. doi:10.1109/TPDS.2016.2623619.
17 Leandro Soares Indrusiak, Alan Burns, and Borislav Nikolić. Buffer-aware bounds to
multi-point progressive blocking in priority-preemptive NoCs. In 2018 Design, Automation &
Test in Europe Conference & Exhibition (DATE), pages 219–224. IEEE, 2018.
18 Leandro Soares Indrusiak, James Harbin, and Alan Burns. Average and worst-case latency
improvements in mixed-criticality wormhole networks-on-chip. In 27th Euromicro Conference
on Real-Time Systems. IEEE, 2015.
19 J. W. Joyner, P. Zarkesh-Ha, and J. D. Meindl. A stochastic global net-length distribution for
a three-dimensional system-on-a-chip (3d-soc). In Proceedings 14th Annual IEEE International
ASIC/SOC Conference (IEEE Cat. No.01TH8558), pages 147–151, 2001. doi:10.1109/ASIC.
2001.954688.
20 C. C. Liu, I. Ganusov, M. Burtscher, and Sandip Tiwari. Bridging the processor-memory
performance gap with 3d ic technology. IEEE Design Test of Computers, 22(6):556–564, 2005.
doi:10.1109/MDT.2005.134.
21 Meng Liu, Matthias Becker, Moris Behnam, and Thomas Nolte. Tighter time analysis for
real-time traffic in on-chip networks with shared priorities. In 10th IEEE/ACM International
Symposium on Networks-on-Chip, 2016.
22 César Marcon, Ramon Fernandes, Rodrigo Cataldo, Fernando Grando, Thais Webber, Ana
Benso, and Letícia B Poehls. Tiny noc: A 3d mesh topology with router channel optimization
for area and latency minimization. In 2014 27th International Conference on VLSI Design
and 2014 13th International Conference on Embedded Systems, pages 228–233. IEEE, 2014.
23 Alireza Monemi, Jia Tang, Maurizio Palesi, and Muhammad Nadzir Marsono. ProNoC: A
low latency network-on-chip based many-core system-on-chip prototyping platform.
Microprocessors and Microsystems, 54, September 2017. doi:10.1016/j.micpro.2017.08.007.
24 B. Nikolic, Robin Hofmann, and R. Ernst. Slot-based transmission protocol for real-time nocs
- sbt-noc. In ECRTS, 2019.
25 B. Nikolić and S. M. Petters. EDF as an arbitration policy for wormhole-switched
priority-preemptive NoCs - myth or fact? In International Conference on Embedded Software,
pages 1–10, October 2014.
26 Borislav Nikolić, Sebastian Tobuschat, Leandro Soares Indrusiak, Rolf Ernst, and Alan Burns.
Real-time analysis of priority-preemptive nocs with arbitrary buffer sizes and router delays.
Real-Time Systems, 55(1):63–105, 2019.
27 M. K. Papamichael and J. C. Hoe. CONNECT: Re-examining conventional wisdom for
designing NoCs in the context of FPGAs. In ACM/SIGDA International Symposium on Field
Programmable Gate Arrays, FPGA ’12, pages 37–46, New York, NY, USA, 2012. ACM.
28 D. Park, S. Eachempati, R. Das, A. K. Mishra, Y. Xie, N. Vijaykrishnan, and C. R. Das. MIRA:
A multi-layered on-chip interconnect router architecture. In 2008 International Symposium on
Computer Architecture, pages 251–261, 2008. doi:10.1109/ISCA.2008.13.
29 Vasilis F Pavlidis, Ioannis Savidis, and Eby G Friedman. Three-dimensional integrated circuit
design. Newnes, 2017.
30 Eberle A Rambo and Rolf Ernst. Worst-case communication time analysis of networks-on-chip
with shared virtual channels. In Design, Automation & Test in Europe Conference & Exhibition,
2015.
31 Y. Ribot González and G. Nelissen. HopliteRT*: Real-time NoC for FPGA. IEEE Transactions
on Computer-Aided Design of Integrated Circuits and Systems, 39(11):3650–3661, 2020.
doi:10.1109/TCAD.2020.3012748.
32 Aleksandr Yu Romanov. Development of routing algorithms in networks-on-chip based on
ring circulant topologies. Heliyon, 5(4):e01516, 2019.
33 Abbas Sheibanyrad, Frédéric Pétrot, Axel Jantsch, et al. 3D integration for NoC-based SoC
Architectures. Springer, 2011.
34 Zheng Shi and Alan Burns. Real-time communication analysis for on-chip networks with
wormhole switching. In Second ACM/IEEE International Symposium on Networks-on-Chip,
2008.
35 Zheng Shi and Alan Burns. Improvement of schedulability analysis with a priority share policy
in on-chip networks. In 17th International Conference on Real-Time and Network Systems,
pages 75–84, 2009.
36 Zheng Shi and Alan Burns. Real-time communication analysis with a priority share policy in
on-chip networks. In 21st Euromicro Conference on Real-Time Systems, pages 3–12. IEEE,
2009.
37 S. Tobuschat, P. Axer, R. Ernst, and J. Diemer. IDAMC: A NoC for mixed criticality systems.
In IEEE 19th International Conference on Embedded and Real-Time Computing Systems and
Applications, 2013. doi:10.1109/RTCSA.2013.6732214.
38 Sebastian Tobuschat. Predictable and Runtime-Adaptable Network-On-Chip for Mixed-critical
Real-time Systems. PhD thesis, TU Braunschweig, 2019.
39 S. Wasly, R. Pellizzoni, and N. Kapre. HopliteRT: An efficient FPGA NoC for real-time
applications. In International Conference on Field Programmable Technology, pages 64–71,
December 2017.
40 Saud Wasly, Rodolfo Pellizzoni, and Nachiket Kapre. Worst case latency analysis for Hoplite
FPGA-based NoC. Technical report, University of Waterloo, 2017.
41 Qin Xiong, Zhonghai Lu, Fei Wu, and Changsheng Xie. Real-time analysis for wormhole NoC:
Revisited and revised. In Proceedings of the 26th edition on Great Lakes Symposium on VLSI,
pages 75–80, 2016.
42 Qin Xiong, Fei Wu, Zhonghai Lu, and Changsheng Xie. Extending real-time analysis for
wormhole NoCs. IEEE Transactions on Computers, 66(9):1532–1546, 2017.
