Runtime connection-oriented guaranteed-bandwidth network-on-chip with extra multicast communication service by Samman, Faizal Arya
Microprocessors and Microsystems 38 (2014) 170–181
journal homepage: www.elsevier.com/locate/micpro
Faizal Arya Samman
Universitas Hasanuddin, Fakultas Teknik, Jurusan Teknik Elektro, Jl. Perintis Kemerdekaan Km. 10, Makassar 90245, Indonesia
Tel.: +62 411588111. E-mail address: faizalas@unhas.ac.id
0141-9331/$ - see front matter © 2013 Elsevier B.V. All rights reserved.
http://dx.doi.org/10.1016/j.micpro.2013.07.006
Article info
Article history:
Available online 3 September 2013
Keywords:
Network-on-chip
Wormhole Cut-Through Switching
Quality of service
Guaranteed-bandwidth service
Best-effort service
Connection-oriented protocol
Multicast communication
Abstract
This paper presents a flexible runtime connection-oriented guaranteed-bandwidth Network on Chip
(NoC). Compared with a standard time-division multiplexing (TDM) method, our local ID-based method
provides better flexibility for establishing dynamic runtime connections. A specific pre-designed algorithm
for finding a conflict-free schedule, as commonly used in TDM-based methods, is not needed. The
contention problem is solved in hardware by means of a locally organized message identity
(ID): flits belonging to the same stream packet have the same unique local identity-tag (ID-tag)
on each communication link. The ID-tag of each stream varies from link to link
and is updated by ID-tag mapping management units. Routing is organized
by a runtime-programmable routing reservation table. In addition, the proposed methodology
also supports a deadlock-free multicast routing service.
© 2013 Elsevier B.V. All rights reserved.
1. Introduction
On-chip interconnection networks are an attractive alternative
communication infrastructure that brings a new paradigm to the design
and development of Multiprocessor System-on-Chip (MPSoC) and Chip-level
Multiprocessor (CMP) systems. Bus systems, which are traditionally
used as communication fabrics for SoC systems, tend to
become a bottleneck, especially when they are loaded with a large number
of high-speed processing elements. Instead of traditional
bus systems as the communication medium among processing
element (PE) units, on-chip interconnection networks are proposed,
reflecting the concept of a scalable shared communication medium.
Several multimedia applications for MPSoCs contain
communication edges that must be served with a certain communication
bandwidth. Video/audio streaming data transmission
from a core to one or many cores needs a constant transmission
rate. Performance degradation at one of the communication edges
could reduce the overall application performance or could even
break the multimedia application. Hence, a specific network
service is required to guarantee the data bandwidth of the
video/audio streaming. This paper presents a methodology by
which the bandwidths of unicast and multicast communication
edges in an on-chip radio system application benchmark [9] can
be guaranteed. The methodology proposes a runtime virtual
circuit configuration technique, where connections are established
during application execution time. Moreover, the technique
also supports a runtime multicast communication service.
Guaranteed-bandwidth (GB) service can be implemented with
an end-to-end connection establishment technique, where a
stream header reserves the required bandwidth during a connection
setup phase before the video/audio stream is sent. By further
applying a policy under which the considered traffic can never
consume more than the maximum capacity of any network link, a
(long-term) saturated network condition can be avoided
(non-blocking traffic flow is guaranteed). Guaranteed service can be
implemented by allowing multiple packets to share the same link.
The link sharing can be realized with a data multiplexing technique.
The shared link configuration in a network is also commonly
called a switched virtual circuit (SVC) configuration. However, the
guaranteed-bandwidth method is only suitable for NoC-based
multicore processor systems in which processors intensively send
and receive streaming data that require constant end-to-end
communication rates.
This paper proposes an SVC method that uses a dynamic
local identity (ID) assignment technique. With this technique,
a message is not split into packets; instead it is split into flow control
digits (flits), each carrying extra bit fields for a dynamic unique/local ID and
a flit-type control label. The idea results in a novel routing
paradigm, where the network routes flits instead of packets
("routes flits not packets").
The rest of the paper is organized as follows. Section 2
presents the state of the art of switched virtual circuit
configuration methods and compares our method with the
TDMA-based data multiplexing technique. Section 3 presents the
contributions of the paper. The basic infrastructure required to
implement the proposed method, i.e. the required standard packet
format and the microarchitecture, is described in Section 4.
Section 5 describes the connection-oriented multicast communication
protocol. The ID-based routing mechanism and the ID management
scheme are described in detail in Section 6. Section 7
shows experimental results, in which we evaluate our
switched virtual circuit configuration method on a radio system
application benchmark from Nokia [9]. The synthesis results of
the NoCs are presented in Section 8. Implementation issues of the
multiple access methods are discussed briefly in Section 9.
Section 10 concludes the work and presents
the main advantageous features of the proposed method.
2. State-of-the-art of data multiplexing techniques for NoCs
2.1. NoCs with TDMA technique
A commonly used method to provide guaranteed service for
NoCs is pipelined circuit switching based on the Time-Division
Multiple Access (TDMA) method. Æthereal [12], the asynchronous
clockless MANGO NoC [1], the circuit-switched PNoC [4], Sonics [19],
DSPIN [11], Nostrum [10] and a NoC SMT Switch [8] are examples of
NoCs that use this methodology. Fig. 1(a) presents a conceptual
view of the TDMA method. The link connecting the input and output
ports of the routers is shared by four packets, i.e. stream packets
A, B, C and D. Each packet establishes a virtual circuit configuration
based on time slot allocation on the outgoing port. In the figure,
we assume that the link has 8 time slots. The more time slots are
allocated to a packet, the more bandwidth (BW) it reserves. Thus,
packets A, D, B and C reserve 50%, 25%, 12.5% and 12.5% of the
maximum link BW capacity, respectively. A packet allocated to
time slot $S_t$ on a link must be allocated to time slot $S_{t+1}$ on the next
link. In Fig. 1(a), for instance, packet D, allocated to time slots
$S_1$ and $S_2$ on the link, must be allocated to time slots $S_2$ and $S_3$ on the
next link.
2.2. NoCs with IDMA technique
In this section, we introduce a concept based on the local identity
(ID) division multiple access (IDMA) technique. Fig. 1(b) presents
the concept, in which each local ID slot can be reserved by a single data
stream as its ID-tag. The local ID-tag appears on every flit and is
updated every time the data stream acquires the next link. Flits
belonging to the same stream always have the same local ID on a given link.
To guarantee a correct routing function, an ID Management
Unit must index every reserved ID slot by the previous
ID-tag of the stream/message that reserves the slot and by the port
from which the stream/message comes.
Fig. 1. State-of-the-art of the data multiplexing techniques for NoCs.
In Fig. 1(b), for instance, packet D is allocated to local
ID slot (IDN) number 1 (its new ID-tag) and is identified by the ID
Slot Table in router R1 as a packet from input port 5 with previous
ID-tag 0. Message D also reserves 25% of the maximum
link bandwidth capacity (Bmax). In the next router R2, the
streams/messages are routed based on their current (new) ID-tags. Thus,
packet D flows from output Port 1 (East) of router R1 to
input Port 3 (West) of the next router R2 with the new ID-tag 1.
The number of available ID slots determines the maximum number
of streams/messages allowed to form switched virtual circuit
configurations on the link. The bandwidth can be guaranteed by
further implementing a connection-oriented communication
protocol, where the requested BW carried in the header flit bit
fields is used to reserve the expected end-to-end communication
bandwidth over the network links.
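The indexing performed by the ID Management Unit can be illustrated with a minimal Python sketch. It is not the hardware design: the class name `IdSlotTable`, the lowest-free-slot policy and the default of 16 slots per link are assumptions made for illustration, and the first stream in the usage lines is hypothetical.

```python
from typing import Dict, Optional, Tuple

N_SLOTS = 16  # assumed number of ID slots per link (4-bit ID-tag field)


class IdSlotTable:
    """ID slots of one outgoing link, indexed by (input port, previous ID-tag)."""

    def __init__(self) -> None:
        self.slots: Dict[Tuple[int, int], int] = {}

    def reserve(self, in_port: int, prev_tag: int) -> Optional[int]:
        """Reserve a local ID slot for a stream; None if no slot is free."""
        key = (in_port, prev_tag)
        if key in self.slots:            # stream already registered on this link
            return self.slots[key]
        used = set(self.slots.values())
        for tag in range(N_SLOTS):       # assumption: pick the lowest free slot
            if tag not in used:
                self.slots[key] = tag
                return tag
        return None

    def lookup(self, in_port: int, prev_tag: int) -> Optional[int]:
        """New ID-tag for a later flit of an already registered stream."""
        return self.slots.get((in_port, prev_tag))


east_link = IdSlotTable()
east_link.reserve(in_port=3, prev_tag=2)          # hypothetical earlier stream
tag_d = east_link.reserve(in_port=5, prev_tag=0)  # packet D of the Fig. 1(b) example
print(tag_d, east_link.lookup(5, 0))
```

Every flit of a stream arrives with the same input port and previous ID-tag, so the lookup returns the same new ID-tag for all of them, which is what keeps the flits of one stream together on the link.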
2.3. Comparisons of the multiple access methods
TDMA-based switching requires a time-slot allocation algorithm
at design time to achieve conflict-free routing and scheduling.
The UMARS+ algorithm [3], the TDM-based Virtual Circuit Configuration
(VCC) method [9] and a time-slot allocation algorithm made for the
μSpidergon NoC [2] are examples of such time-slot
allocation algorithms. The IDMA-based method does not need a
time-slot allocation algorithm, because the local ID slot on each
outgoing link is reserved and allocated autonomously by the header
flits of a data stream during application execution time (flexible
runtime autonomous switched virtual circuit reconfiguration). The
same technique could certainly be applied to the TDMA-based
method, but the probability that a header flit fails to establish
a connection is very high, especially under very high traffic.
The need for a time-slot allocation algorithm in the
TDMA-based method is due to the conflict-free requirement.
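The conflict-free requirement can be made concrete with a small Python sketch of the slot-shift rule from Section 2.1. The function names and the wrap-around modulo the slot count are assumptions of this sketch, not details given in the paper.

```python
N_TIME_SLOTS = 8  # slots per link, as assumed for Fig. 1(a)


def next_link_slots(slots):
    """Slots a stream must occupy on the next link: S_t becomes S_(t+1).

    The wrap-around modulo N_TIME_SLOTS is an assumption of this sketch.
    """
    return {(s + 1) % N_TIME_SLOTS for s in slots}


def can_extend(slots_here, busy_on_next_link):
    """True only if every shifted slot is still free on the next link."""
    return next_link_slots(slots_here).isdisjoint(busy_on_next_link)


# Packet D holds S1 and S2, so it needs exactly S2 and S3 on the next link.
print(sorted(next_link_slots({1, 2})))
# If another stream already owns S3 there, the setup fails even though
# other time slots on that link may be free.
print(can_extend({1, 2}, busy_on_next_link={3}))
```

Because the slot position on the next link is dictated by the slot position on the current link, a runtime header has no freedom to choose. In the IDMA method, by contrast, any free ID slot on the next link can be taken.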
A NoC design methodology based on three kernels, i.e.
traffic classification, flit-based switching, and path pre-assignment
and link-BW setting, has been introduced in [7]. The traffic is
classified into guaranteed-latency (GL), guaranteed-bandwidth (GB) and
best-effort (BE) traffic. GL traffic has a stringent maximum
delay requirement from data injection until data acceptance. GB
traffic requires constant end-to-end communication bandwidth,
while BE traffic has neither a bandwidth requirement nor a
stringent data transfer latency requirement. The link allocation
(path assignment) for GL and GB traffic is static, i.e. computed off-line at
design time [7]. For a new application, the path assignment must be
done again at design time. Therefore, the methodology proposed in [7] is
not suitable for application mapping where the applications become
known only after chip manufacturing.
In our method, an extra bit field for the dynamic local/unique ID-tag and an extra
field identifying the flit type are attached to each flit.
The extra bit fields guarantee that flits belonging to
the same packet have the same local ID-tag on every local
communication channel. The following advantages can then be
achieved.
• The concept of Wormhole Cut-Through Switching can be introduced,
where wormhole packets can be interleaved flit-by-flit,
i.e. can cut through at flit level, resulting in interesting and
unique performance characteristics [16] compared to the traditional
wormhole switching method.
• The concept of a Hold–Release Tagging Mechanism for Multicast
Scheduling Policy can be introduced [13,14,17], which theoretically
results in a unique solution for a runtime deadlock-free
multicast routing method [15].
• The concept of Flexible Runtime Connection-Oriented Guaranteed-
Bandwidth for Quality-of-Service can be introduced, which is
explained in this paper, where the switched virtual circuit
configuration can be made during application execution time.
Other multiple access methods for NoCs have also
been introduced, such as the Spatial-Division Multiple Access
(SDMA) [6] and Code-Division Multiple Access (CDMA) [18] methods.
However, since both methods play a minor role in NoC research,
we neither discuss nor compare them in this paper.
3. Contribution
The state of the art of switched virtual circuit configuration
methods, commonly called multiple access methods, is
presented in this paper. The multiple access technique based on the
locally organized message identity (ID) for multicast-enabled NoCs was
introduced in our previous works [13,14]. However, the previous
works implement the IDMA technique with a best-effort
communication protocol without guaranteed-bandwidth service.
When the available ID slots run out, the previous works must
allow a packet dropping mechanism to avoid packet stalls. Hence,
data must be retransmitted when such a situation
occurs, which is time-consuming. Therefore, this paper
introduces the concept of a guaranteed-bandwidth multicast NoC
with a runtime connection-oriented communication protocol.
With the connection-oriented protocol, messages are not
sent into the NoC before a virtual circuit (connection) that
guarantees the expected average end-to-end data throughput has been
successfully established. The advantages of this concept are that
(1) packet dropping never happens, (2) guaranteed-bandwidth
service can be further implemented, and thus (3) the network
will not be saturated, since the considered traffic is not allowed
to consume more than the maximum bandwidth capacity of a link.
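The saturation-avoidance argument in point (3) can be written as a per-link admission condition. The following is a sketch in notation introduced here for illustration only: $S(\ell)$ denotes the set of streams holding a connection on link $\ell$, $\mathrm{ReqBW}_s$ is the bandwidth requested by stream $s$, and $B_{max}$ is the maximum link bandwidth capacity.

```latex
\sum_{s \in S(\ell)} \mathrm{ReqBW}_s \;\le\; B_{max} \qquad \text{for every link } \ell
```

Under this reading, a new header requesting $\mathrm{ReqBW}_{new}$ is admitted on link $\ell$ only if the sum including $\mathrm{ReqBW}_{new}$ still satisfies the inequality.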
A simple data transfer technique that applies local addresses
(labels) has also been presented [5]. However, the local labels in
that work must be pre-computed at design time for each
traffic flow in an application by using a path-based labeling algorithm.
The routing paths for a target application are also statically
computed, using a precount procedure to count the maximum number
of used communication pairs. We propose a runtime
(dynamic) local message identity (ID) technique, which offers
flexibility without a pre-processing algorithm. Our runtime dynamic
local labeling method is suitable for post-manufacture
MPSoC application mapping, where in general the applications are
unknown before SoC fabrication.
4. Basic Infrastructure for the IDMA method
4.1. Packet format
Fig. 2(a) shows a data stream (message), which consists of many
flow control digits (flits). Each flit comprises a 7-bit control field
and data/header information in a 32-bit data word (note: a message
in our NoC is not partitioned into several packets). Fig. 2(b) shows
the detailed packet format used in the NoC. Each flit can be identified
through its 3-bit flit type field as a stream header (head), a stream
databody (dbod), a tail (tail) or a response/status (resp) flit, and through its
4-bit local ID-tag. The source and target addresses of the router nodes
are attached to the header flits. The "ReqBW" information
attached to the headers is used for BW reservation
(BW allocation). The "ReqBW" information attached to the tail flit
is used for BW deallocation.
A unicast message has only one header flit, while a multicast
message for N target/destination nodes contains N
header flits. As shown in Fig. 2(b), the multicast message
has one header flit for each of the N multicast destination
nodes $(X_{t_k}, Y_{t_k})$, $k \in \{1, 2, \ldots, N\}$.
The Ext field consists of extension bits that
can be used for other requirements in the future. Each target node
k sends back a response flit with ID-tag number "1111" to
report the status of the guaranteed-bandwidth connection made
by the header flit for target node $(X_{t_k}, Y_{t_k})$. Table 1 shows the binary
encoding of the flit types.
Definition 4.1. Each flit coming from input port n can be formalized
as $F_n(tag, type)$, where $type \in \{head, dbod, tail, resp\}$ and
$tag \in \{0, 1, 2, \ldots, N_{slot} - 1\}$; $N_{slot}$ is the number of available
ID slots per link. Because the ID-tag field has 4 bits, we
have $N_{slot} = 2^4 = 16$.
Besides the routing information, i.e. the source $(X_s, Y_s)$ and target
$(X_{t_k}, Y_{t_k})$ addresses, the requested communication bandwidth
(ReqBW) is attached to each header flit. The 12 least-significant
bits of the header and tail flits identify the
requested communication bandwidth (ReqBW) of the data
stream/packet. The ReqBW information attached to a tail flit is used
to remove the bandwidth reservations. Header and tail flits that
belong to the same multicast packet have equal ReqBW values.
When a header is routed to a new outgoing link, an amount of
bandwidth (BW) equal to the ReqBW value and an ID slot are
reserved for the data stream/packet. The other header flits belonging
to the same message do not reserve
new BW space or a new ID slot again when they enter the same
outgoing link.
The response flit type shown in Fig. 2(b) contains a connection
status field. This field indicates whether the connection establishment
made by the header flit was
successful or unsuccessful. The response flit, assigned ID-tag
"1111", is used to send the connection status back from a target node
$(X_{t_k}, Y_{t_k})$ to the source node $(X_s, Y_s)$. If the connection establishment is
successful, i.e. an ID slot and the ReqBW amount of bandwidth are
successfully reserved on every link of the routing path between the sender node
$(X_s, Y_s)$ and the receiver node $(X_{t_k}, Y_{t_k})$, then payload/databody flits
can be injected from the sender node at the constant injection rate
ReqBW given in the header flit. The tail flit removes the
BW and ID slot reservations after the end of the data injection.
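The flit format described in this subsection can be sketched in Python as follows. The field widths (3-bit type, 4-bit ID-tag, 32-bit word, 12-bit ReqBW) and the type codes of Table 1 come from the text; the bit ordering (type in the most significant bits, then ID-tag, then data word) and all function names are assumptions made for illustration.

```python
from enum import IntEnum


class FlitType(IntEnum):
    """Flit type codes as listed in Table 1."""
    NONE = 0b000
    HEAD = 0b100
    DBOD = 0b101
    TAIL = 0b110
    RESP = 0b111


def pack_flit(ftype: FlitType, id_tag: int, word: int) -> int:
    """Pack a flit as [type(3) | ID-tag(4) | word(32)]; this ordering is assumed."""
    if not 0 <= id_tag < 16:
        raise ValueError("ID-tag must fit in 4 bits")
    if not 0 <= word < 2 ** 32:
        raise ValueError("data word must fit in 32 bits")
    return (int(ftype) << 36) | (id_tag << 32) | word


def unpack_flit(flit: int):
    """Split a 39-bit flit back into (type, ID-tag, word)."""
    return FlitType((flit >> 36) & 0b111), (flit >> 32) & 0xF, flit & 0xFFFFFFFF


def req_bw(word: int) -> int:
    """ReqBW: the 12 least-significant bits of a header or tail word."""
    return word & 0xFFF


header = pack_flit(FlitType.HEAD, id_tag=3, word=0x00120040)
print(unpack_flit(header), req_bw(0x00120040))
```

The positions of the address and Ext fields inside the 32-bit word are not modelled here, since only the ReqBW position is stated in the text.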
4.2. Generic microarchitecture
The generic microarchitecture of our NoC router (switch) is
presented in Fig. 3. The figure gives an overview of the router's components
and of the data paths and control paths that connect them.
The microarchitecture is described in this
subsection.
Definition 4.2. The set of input–output ports is $N_{IO} = \{1, 2, \ldots, N\}$,
where N is the total number of ports. The port number is defined as
$n \in N_{IO}$.
Fig. 2. A data stream and the NoC packet format.
Table 1
Flit type encoding for BE and GB packet services.
Hex  Binary  Flit type
0    "000"   Not data
4    "100"   Header for GB streams
5    "101"   Databody for GB streams
6    "110"   Tail flit for GB streams
7    "111"   Response/status flit
Fig. 3. XHiNoC's generic microarchitecture.
In our current microarchitecture, we have N = 5 such that
$n \in N_{IO} = \{1, 2, 3, 4, 5\}$. In general, the router consists of 4 main
component types located at every input and output port n. At every
input port n there are 2 components, i.e. a FIFO buffer ($F_n$) and a
Routing Engine with Data Buffering (REB) ($R_n$). In the current
microarchitecture, a static XY routing algorithm is implemented
in the REB unit. At each output port n there are also 2 components,
i.e. a Multiplexor with ID Management (IDM) Unit (MIM) ($M_n$) and an
arbiter unit ($A_n$).
Definition 4.3. The set of components at input and output port n
is defined as $W_n = \{F_n, R_n, M_n, A_n\}$. Since N = 5, the total set of
components in a router is $\bigcup_{n=1}^{5} W_n$.
4.2.1. Buffer slots
The depth of the FIFO buffers in our current microarchitecture is
set to only 2 registers. Besides the buffer slots in the FIFO
buffer, we also implement a single buffer slot in the REB (Routing
Engine with Buffer) unit. The purpose of this buffer slot is
to improve router performance. With the buffer slot, each
data flit is delayed for only one stage, i.e. the output arbitration stage.
Since the data is buffered as soon as a routing direction has been
computed, it can be switched out to an output port in the
cycle after the output arbitration stage. Without the buffer slot, the data
must be delayed for two stages, i.e. the route compute stage and the
output arbitration stage, before the output switch or
link traversal stage.
4.2.2. Routing engine
Besides the single buffer slot, a runtime-programmable routing
reservation table and a routing state machine are implemented
in the REB unit. In other words, the XHiNoC's routing engine
combines a routing look-up table and a routing machine. This
combination allows us to establish connections at runtime, which is
also enabled by the extra bit fields
for the dynamic local ID and the flit type label attached to every flit
of the streaming data.
In our current router microarchitecture, we use a static/deterministic
XY (X-first) routing algorithm, which is implemented in
the routing state machine. Flits of stream packets are first
routed in the X-direction (east or west) and then in the Y-direction
(north or south).
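The X-first decision made for a header flit can be sketched in Python as follows. The port numbers follow the assignment given in Section 6 (East 1, North 2, West 3, South 4, Local 5); the coordinate orientation (x grows towards east, y grows towards north) and the function name are assumptions of this sketch.

```python
EAST, NORTH, WEST, SOUTH, LOCAL = 1, 2, 3, 4, 5  # port numbers from Section 6


def xy_route(current, target):
    """Output port chosen by X-first routing for a header at node `current`.

    Assumption: x increases towards east and y increases towards north.
    """
    (x, y), (xt, yt) = current, target
    if xt > x:
        return EAST
    if xt < x:
        return WEST
    if yt > y:       # X offset is zero, so resolve the Y offset
        return NORTH
    if yt < y:
        return SOUTH
    return LOCAL     # header has arrived at its target node


print(xy_route((1, 2), (3, 1)), xy_route((3, 2), (3, 1)), xy_route((3, 1), (3, 1)))
```

Only header flits need this computation. Later flits of the same stream can follow the entry that the header left in the routing reservation table.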
4.2.3. Routing and arbitration control signals
Besides the datapaths, there are also two control paths, i.e. the routing
request signal path and the arbitration signal path, which are
interconnected in the crossbar switch of the router. The paths are used to
control and synchronize the data flows and data switching.
The routing request signals are formally defined
as follows.
Definition 4.4. r(i, j) is a single-bit routing request signal from $R_i$ to
$A_j$, where $i, j \in N_{IO} = \{1, 2, 3, 4, 5\}$. Since r(i, j) is a digital (binary) signal,
$r(i, j) \in \{0, 1\}$.
According to Definition 4.4, we can further define the routing
request binary vector from input port i to the 5 output ports as
r(i) = [r(i, 1) r(i, 2) r(i, 3) r(i, 4) r(i, 5)]. For example, when a flit from
input port 2 requests routing to output port 4, we have
r(2, 4) = 1, i.e. the routing request vector is r(2) = [0 0 0 1 0]. When
the request from input port 2 is a multicast request to output
ports 1, 3 and 4, for instance, the routing request vector is
r(2) = [1 0 1 1 0].
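The two examples above can be reproduced with a short Python sketch. The function name is hypothetical, and the list representation of the bit vector is chosen only for readability.

```python
N_PORTS = 5


def request_vector(out_ports):
    """Routing request vector r(i): element j-1 holds r(i, j) for output port j."""
    return [1 if j in out_ports else 0 for j in range(1, N_PORTS + 1)]


print(request_vector({4}))        # unicast request to output port 4
print(request_vector({1, 3, 4}))  # multicast request to output ports 1, 3 and 4
```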
Another control path used in the router is the routing arbitration
path, whose signal is generated by the arbiter unit and sent to the
REB unit to grant a routing request. The routing arbitration signals
are formally defined as follows.
Definition 4.5. a(i, j) is a single-bit arbitration signal from $A_j$ to $R_i$,
where $i, j \in N_{IO} = \{1, 2, 3, 4, 5\}$, acting as a feedback signal that grants the
routing request r(i, j). Since a(i, j) is a digital (binary) signal,
$a(i, j) \in \{0, 1\}$.
When the REB component at input port i sends routing request
signals r(i, j) to the arbiter units, it receives routing
acknowledge signals a(i, j) from those arbiter units. A data flit
buffered in the FIFO is routed and then buffered again in the
REB unit. The routing request signal is sent to the arbiter unit located at
the requested output port. Once the REB unit receives a routing
acknowledge signal from the arbiter unit, the flit is switched out
to the outgoing link in the next cycle.
According to Definition 4.5, we can further define the routing
arbitration control vector from output port j as
a(j) = [a(1, j) a(2, j) a(3, j) a(4, j) a(5, j)]^T. For example, when a flit
from input port 2 requests routing to output port 4, the arbiter
unit generates the arbitration control signal a(2, 4) = 1, i.e. the
arbitration control vector is a(4) = [0 1 0 0 0]^T. When there is more
than one request to output port 4, for example 3 requests from
input ports 1, 2 and 5, the arbiter at output port 4
successively generates 3 arbitration control vectors, i.e. a(4) = [1 0 0 0 0]^T
for the request from input port 1, a(4) = [0 1 0 0 0]^T for the request from
input port 2 and a(4) = [0 0 0 0 1]^T for the request from input port 5. The
arbitration (input selection) is rotated by the arbiter unit
flit-by-flit, port-by-port. Note: only one element of the arbitration
vector can be set to 1; otherwise the flits will conflict and the
routed data flit will not be valid.
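The rotating grant can be illustrated with a minimal round-robin arbiter in Python. The text only states that the input selection is rotated flit-by-flit, port-by-port; the exact priority order used here (start at port 1, then continue after the last granted port) and the class name are assumptions of this sketch.

```python
N_PORTS = 5


class RoundRobinArbiter:
    """Arbiter A_j of one output port; grants at most one input port per cycle."""

    def __init__(self) -> None:
        self.last = N_PORTS  # so that input port 1 is considered first

    def grant(self, requests):
        """requests: set of input ports i with r(i, j) = 1.

        Returns a(j) as a list with at most one element set to 1.
        """
        vector = [0] * N_PORTS
        for step in range(1, N_PORTS + 1):
            i = (self.last + step - 1) % N_PORTS + 1  # next port in rotation
            if i in requests:
                vector[i - 1] = 1
                self.last = i
                break
        return vector


arbiter_4 = RoundRobinArbiter()
for _ in range(3):
    print(arbiter_4.grant({1, 2, 5}))  # requests from input ports 1, 2 and 5
```

With the three requests kept pending, the arbiter serves ports 1, 2 and 5 in turn, and each returned vector is one-hot, which matches the note that only one element of the arbitration vector may be set.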
Based on the microarchitecture and control vectors presented in
this section, our XHiNoC router can switch 5 ﬂits from 5 different
input–output ports in parallel. In general, the formal description
of the control path matrices for the routing and arbitration control
signals is presented in Deﬁnition 4.6.
Definition 4.6. Based on Definitions 4.4 and 4.5, a time-dependent
routing request matrix R and an arbitration matrix A can also be
defined as follows:
\[
R = \begin{pmatrix}
r(1,1) & r(1,2) & r(1,3) & r(1,4) & r(1,5) \\
r(2,1) & r(2,2) & r(2,3) & r(2,4) & r(2,5) \\
r(3,1) & r(3,2) & r(3,3) & r(3,4) & r(3,5) \\
r(4,1) & r(4,2) & r(4,3) & r(4,4) & r(4,5) \\
r(5,1) & r(5,2) & r(5,3) & r(5,4) & r(5,5)
\end{pmatrix} \tag{1}
\]
\[
A = \begin{pmatrix}
a(1,1) & a(1,2) & a(1,3) & a(1,4) & a(1,5) \\
a(2,1) & a(2,2) & a(2,3) & a(2,4) & a(2,5) \\
a(3,1) & a(3,2) & a(3,3) & a(3,4) & a(3,5) \\
a(4,1) & a(4,2) & a(4,3) & a(4,4) & a(4,5) \\
a(5,1) & a(5,2) & a(5,3) & a(5,4) & a(5,5)
\end{pmatrix} \tag{2}
\]
5. Connection-oriented multicast protocol
The runtime connection-oriented guaranteed-bandwidth multi-
cast routing protocol implemented in our NoC consists of four main
phases, i.e. connection establishment, connection status response,
data transmission and connection termination (see Fig. 4).
5.1. Connection establishment
The first phase is connection establishment, where the data
producer node requests a data transmission by injecting
multicast header flits one by one towards the multiple destination nodes.
If a data producer core in the NoC sends a multicast packet to
T data consumer cores, then T header flits
must be injected one by one by the data producer core, where
each header carries the address of one data
consumer core and the requested communication BW (ReqBW).
As presented in Fig. 4(a), core A at node (1,2) sends 3 header
flits as a request for multicast connection setup. The headers
h1, h2 and h3 are sent to the destination cores at nodes (1,1), (2,2) and
(3,1), respectively. As shown in the figure, header h3, for example,
flows from core A to core F through the network and reserves ID
slots 2, 3 and 1 on the links $\ell_3$, $\ell_4$ and $\ell_{14}$, respectively.
In our current NoC, the ID-tag field is 4 bits wide, resulting in 16
available local ID slots per link. The highest ID-tag number,
M = 15 (binary "1111" or 0xF), is reserved. Every destination node j can
inspect the ID-tag of the incoming header flit hj to determine whether
the connection setup was successful. If a header flit of stream packet
A fails to reserve an ID slot on a certain link, the header flit is allocated to
ID slot M = 15 (i.e. it uses ID-tag number M = "1111"). Hence, ID-tag
15 is reserved as an escape tag: failed header
flits use this tag, and normal header flits must not be
allocated to this ID slot number.
The BW reservation can be unsuccessful for 2 reasons:
there is no available (free) ID slot on the link $\ell_j$ (case 1),
or the free BW space is not sufficient to meet the expected
communication BW (case 2). Once a header flit has been allocated to ID
slot M, it is always routed with ID slot M
from the link on which it failed until it reaches the
destination node.
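The reservation step on one outgoing link, including both failure cases and the escape tag, can be sketched in Python as follows. The class `OutLink`, its method name, the abstract bandwidth units and the lowest-free-slot policy are assumptions made for illustration; the escape tag 15, the two failure cases and the rule that further headers of the same message do not reserve again are taken from the text.

```python
ESCAPE_TAG = 15  # ID-tag M reserved for failed headers


class OutLink:
    """Reservation state of one outgoing link (a sketch, not the hardware)."""

    def __init__(self, b_max: int) -> None:
        self.b_max = b_max   # maximum link bandwidth, abstract units
        self.used_bw = 0
        self.streams = {}    # (input port, previous ID-tag) -> new ID-tag

    def on_header(self, in_port: int, prev_tag: int, req_bw: int) -> int:
        """Return the ID-tag that the header carries on this link."""
        if prev_tag == ESCAPE_TAG:        # already failed on an earlier link
            return ESCAPE_TAG
        key = (in_port, prev_tag)
        if key in self.streams:           # further header of the same message
            return self.streams[key]      # no second BW or ID slot reservation
        used = set(self.streams.values())
        free = [t for t in range(ESCAPE_TAG) if t not in used]
        if not free:                      # case 1: no free ID slot
            return ESCAPE_TAG
        if self.used_bw + req_bw > self.b_max:  # case 2: not enough free BW
            return ESCAPE_TAG
        self.streams[key] = free[0]
        self.used_bw += req_bw
        return free[0]


link = OutLink(b_max=100)
print(link.on_header(5, 0, 50), link.on_header(3, 2, 60), link.used_bw)
```

A destination node only needs to compare the ID-tag of the arriving header with 15 to learn whether every link on the path accepted the reservation.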
5.2. Connection status response
The second phase is the response phase, where every destination
node that receives a header flit hj from the source node analyzes
the header flit to determine whether the header has successfully established
the multicast connection or has failed. Afterwards, the processor or
hardware core at destination node j sends back a response flit sj.
The data consumer node informs the data producer node about
the status of the connection establishment made previously by
the header hj (see Fig. 4(b)) by writing a status
code in the response flit. As shown in the figure, all status
response flits flow through the network with ID-tag number 15
(binary "1111" or 0xF). A successful connection implies that
the expected data rate or communication bandwidth (BW) of the
multicast stream packet can be guaranteed.
As presented in Fig. 4(b), the target nodes, i.e. core D at node
(1,1), core B at node (2,2) and core F at node (3,1), receive the header
flits h1, h2 and h3, respectively. Afterwards, each target node
analyzes the accepted header flit and sends back a response flit to
the source core A, where the response flits flow through the NoC
links with ID-tag M = 15. The response flit indicates whether
the connection establishment was successful.
Fig. 4. Connection-oriented multicast routing protocol.
5.3. Data transmission
The third phase is the data transmission. Once the producer
knows that the multicast connection has been set up successfully and
the bandwidth on each reserved communication link is guaranteed,
it sends the multicast stream packet into the NoC
through the same paths set up previously by the header flits, as
presented in Fig. 4(c). The payload data flits are sent with the same
ID-tag as the previously routed header, such that the data flits can
follow the paths configured by the headers.
If one of the multicast connections is not successfully established,
the producer node first terminates the multicast
connection by sending a tail flit. Afterwards, it injects new
header flits to establish new connections. However, this runtime
connection-oriented communication can potentially cause a livelock
situation when a connection is never established.
To guarantee freedom from livelock in such a situation, an
additional protocol must be implemented in the network interface or in
the application layers. The protocol could, for example, limit the
number of attempts to repeat the connection setup. For instance,
in the application layer, the software protocol makes only 5
attempts to establish a connection. Afterwards, a node that has made 5
attempts and has always failed to establish the connection could
broadcast special header flits to all nodes. The special header flits
could be marked, for example, by a "ReqBW" field that
contains all-zero bits.
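A software-layer version of this bounded-retry idea might look like the following Python sketch. Both callbacks are hypothetical placeholders: `try_setup` stands for one complete attempt (inject headers, collect the response flits, send a tail flit on failure) and `broadcast_request` stands for sending the special header flits with an all-zero "ReqBW" field.

```python
from typing import Callable

MAX_ATTEMPTS = 5  # limit used in the example in the text


def establish_with_retry(try_setup: Callable[[], bool],
                         broadcast_request: Callable[[], None],
                         max_attempts: int = MAX_ATTEMPTS) -> bool:
    """Try to set up a connection a bounded number of times.

    Returns True as soon as one attempt succeeds. After `max_attempts`
    failures it calls `broadcast_request` once and gives up, so the node
    cannot retry forever.
    """
    for _ in range(max_attempts):
        if try_setup():
            return True
    broadcast_request()
    return False


outcomes = iter([False, False, True])  # hypothetical: third attempt succeeds
print(establish_with_retry(lambda: next(outcomes), lambda: None))
```

Bounding the number of attempts turns an open-ended retry loop into a finite one. What the other nodes do after the broadcast is left to the software protocol layer.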
After receiving the special header flits, the other nodes could, for
example, renew and reduce their allocated bandwidth. Hence, the
failed node could use the freed BW space, although the free BW
space is probably lower than its requirement. This decision should
be made wisely in the software protocol layer. When the BWs of all
communications are scaled down, the application performance will
degrade, but the application could probably still run well enough.
In the case of ID slot exhaustion, all nodes could also divide their
streams into several partitions. Thus, the stream of the failed node
also has a chance to use a free ID slot for some cycles and then
release it again or share it with other stream flows. This
challenging issue could also be solved with other solutions. Since it
is more suitably implemented in the application layers, we do not
discuss it in depth in this paper.
5.4. Connection termination for successful connection establishment
In the fourth phase, as presented in Fig. 4(d), the tail flit is
injected at the end of the data sending phase to close (terminate) the
multicast connection from the source to the target nodes. The tail
flit removes the bandwidth allocation and ID slot reservation of
the considered stream packet, which were previously allocated and
reserved by the considered header flit.
5.5. Connection termination for unsuccessful connection establishment
The previous subsection explained the connection termination mechanism
for the case where the connection establishment is successful.
For unsuccessful connection establishment,
a specific mechanism is also provided. When a header flit
fails to reserve a free ID slot or the expected BW space in an
intermediate node between the source and target node, the failed
header flit always uses ID-tag 15 from the intermediate
node until the target node, without making new ID-tag and
routing table reservations.
After the failed header flit reaches the target node with ID-tag 15 in its
ID-tag field (meaning that it failed to establish the connection), the
target node sends a response flit to inform the source node that the
connection setup has failed. This information is carried by the binary
label in the connection status field of the response flit. After the
response flit reaches the source node, the source node sends a tail flit to
tear down the connection from the source node up to the intermediate node
only. The tail flit is then dropped at the considered outgoing port of the
intermediate node. A few cycles after sending the tail flit, the source
node can start sending a new header flit to establish a new connection.

6. ID-based routing and ID management
Every stream packet reserves one ID slot as its ID-tag on each
communication link, and ﬂits belonging to the same message will
have the same unique ID-tag on the same link. In our NoC router,
there are five input–output ports, i.e. East (E), North (N), West (W),
South (S) and Local (L), which are assigned port numbers 1, 2, 3,
4 and 5, respectively. The ID-based routing and ID management
mechanism can be divided into the following procedures.
1. ID-tag updating at the output port made by a header ﬂit (Sec-
tion 6.1, Fig. 5).
2. Routing table reservation at the input port made by a header ﬂit
(Section 6.2, Fig. 5).
3. ID-tag indexing made by data body (payload) ﬂits (probably by
a header ﬂit) at the output port (Section 6.3, Fig. 6).
4. Routing table indexing made by data body (payload) ﬂits at the
input port (Section 6.4, Fig. 6).
5. ID-tag and routing table terminations made by tail ﬂits at the
input and output ports (Section 6.5).
Routing table reservation (Section 6.2) and routing table indexing
(Section 6.4) are parts of the ID-based routing mechanism, while
ID-tag updating (Section 6.1) and ID-tag indexing (Section 6.3)
are parts of the ID management mechanism.

6.1. ID-tag updating
This mechanism is visually presented in Fig. 5. At the East output
port of the router node R1, a header flit with ID-tag 1 coming
from the West (W) input port is being switched out to the East (E) output
port. The ID-tag updating procedure is explained in the following
items.
1. As presented in Fig. 5, the header flit with ID-tag 1 is coming
from the West (W) input port.
2. The Arbiter unit from East output port selects the ﬂit by giving
W signal arbitration to the multiplexor such that the header ﬂit
is switched out to the East output port.
3. When the ID-Management (IDM) unit in the multiplexor
detects the type of the flit (header type in this case), it
looks for a free ID slot in the ID Slot Table and finds that
ID slot number 2 is free ('F'). The nID column in the ID Slot
Table represents the reservable ID slot that can be used by a flit
as its new ID-tag. The Sta column is the status of the slot number,
which can be set to free ('F') or used ('U').
4. At the same cycle, the IDM unit writes the previous ID-tag num-
ber of the header, i.e. 1, and from which port the header ﬂit
comes, i.e. W input port, into the slot number 2 of the column
ID and From of the Table, respectively.
5. Also at the same cycle, the UsedBW register is updated
(UsedBW ← UsedBW + ReqBW) by adding the required BW of the packet,
carried in the header flit field, to the current value of the used BW.
6. Slot number 2 is then used by the header flit as its new ID-tag
to flow on the link, and the status of the ID slot changes from
free ('F') to used ('U').
Note: If another header ﬂit coming from W input port with ID-
tag 1 is switched to the East output port in the next time periods
(which means that it belongs to the same packet with the header
ﬂit shown in Fig. 5), then the header ﬂit will not reserve a new
ID-tag, but it will directly be allocated to ID-slot 2. This procedure
will be explained in Section 6.3.
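The ID-tag updating steps above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the router's RTL: the class names `IdSlot` and `OutputPort`, the number of reservable slots, the per-slot `req_bw` bookkeeping, and the order of the BW check and the slot search are all hypothetical. Only the ID, From, nID and Sta columns, the UsedBW update, and the failure tag 15 come from the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional

FAIL_TAG = 15    # ID-tag used by a header flit whose reservation failed (Section 5.5)
NUM_SLOTS = 15   # assumption: ID slots 0..14 are reservable, 15 marks a failure


@dataclass
class IdSlot:
    used: bool = False               # Sta column: False = 'F' (free), True = 'U' (used)
    old_id: Optional[int] = None     # ID column: ID-tag carried on the previous link
    from_port: Optional[str] = None  # From column: input port the flit came from
    req_bw: int = 0                  # assumption: reserved BW is remembered per slot


@dataclass
class OutputPort:
    max_bw: int                      # link capacity in ReqBW units (assumed)
    used_bw: int = 0                 # UsedBW register
    slots: List[IdSlot] = field(
        default_factory=lambda: [IdSlot() for _ in range(NUM_SLOTS)])

    def reserve(self, old_id: int, from_port: str, req_bw: int) -> int:
        """Return the new ID-tag (nID) for a header flit, or FAIL_TAG."""
        # A repeated header of an already registered stream reuses its slot.
        for n_id, slot in enumerate(self.slots):
            if slot.used and (slot.old_id, slot.from_port) == (old_id, from_port):
                return n_id
        # Refuse the stream if the link has no free BW space left.
        if self.used_bw + req_bw > self.max_bw:
            return FAIL_TAG
        # Search for a free slot, record (ID, From) and update UsedBW.
        for n_id, slot in enumerate(self.slots):
            if not slot.used:
                self.slots[n_id] = IdSlot(True, old_id, from_port, req_bw)
                self.used_bw += req_bw
                return n_id
        return FAIL_TAG              # all ID slots are in use


port = OutputPort(max_bw=0x7FF)
port.reserve(0, "L", 100)            # hypothetical earlier streams occupy slots 0 and 1
port.reserve(3, "N", 100)
new_tag = port.reserve(1, "W", 500)  # header flit of Fig. 5: ID-tag 1 from West
print(new_tag, port.used_bw)         # 2 700
```

In this model the pair (ID, From) identifies a stream at an output port, so a second header flit of the same stream is mapped to the slot reserved earlier instead of reserving a new one.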
6.2. Routing table reservation
This mechanism is also visually presented in Fig. 5. At the West
input port of the router node R2, now the header ﬂit is buffered in
the FIFO buffer at the West (W) input port. The routing reservation
procedure is explained in the following items.
1. As presented in Fig. 5, the header ﬂit with ID-tag 2 is now buf-
fered in the data buffer of the REB unit and is being routed by
the REB.
2. When the REB detects the type of the ﬂit (header ﬂit in this
case), then the Routing State Machine (RSM) and Routing Reser-
vation Table (RRT) are activated.
3. Routing slot number 2 in the RRT is selected in accordance with
the ID-tag of the header ﬂit.
4. When the header type is detected, then the routing information
will be selected from the RSM unit.
5. At the same cycle, the address information (Xt, Yt) attached to the
header flit is fed to the RSM to make a routing decision.
In this example, the routing direction is S (South).
Hence, the column S in the RRT is set such that the routing
information '0 0 0 1 0' is obtained, as shown in the figure.
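The routing table reservation can be sketched in the same behavioural style. The routing function below is a stand-in for the RSM: dimension-ordered XY routing and the orientation of the y axis are assumptions, as are the names `xy_route` and `RoutingReservationTable` and the node coordinates used in the example. The column order E, N, W, S, L follows Fig. 5.

```python
from typing import List

PORTS = ("E", "N", "W", "S", "L")   # column order of the RRT in Fig. 5


def xy_route(x_cur: int, y_cur: int, x_t: int, y_t: int) -> str:
    """Stand-in for the RSM: dimension-ordered XY routing (an assumption).

    The y axis is assumed to grow towards the South.
    """
    if x_t > x_cur:
        return "E"
    if x_t < x_cur:
        return "W"
    if y_t > y_cur:
        return "S"
    if y_t < y_cur:
        return "N"
    return "L"


class RoutingReservationTable:
    def __init__(self, num_slots: int = 16) -> None:
        self.rows: List[List[int]] = [[0] * len(PORTS) for _ in range(num_slots)]

    def reserve(self, id_tag: int, direction: str) -> List[int]:
        """Header flit: set the column of the chosen direction in slot id_tag."""
        self.rows[id_tag][PORTS.index(direction)] = 1
        return list(self.rows[id_tag])

    def lookup(self, id_tag: int) -> List[int]:
        """Databody flit: fetch the stored routing information."""
        return list(self.rows[id_tag])


rrt = RoutingReservationTable()
direction = xy_route(x_cur=1, y_cur=1, x_t=1, y_t=3)   # hypothetical coordinates
print(direction, rrt.reserve(2, direction))            # S [0, 0, 0, 1, 0]
```

Because `reserve` sets a bit rather than overwriting the row, several header flits with the same ID-tag could accumulate more than one direction in a slot, which is one plausible way to model a multicast branch.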
Fig. 5. ID-tag update and routing table reservation made by a header flit.
[Figure: the East output port of router node R1 (Arbiter, ID Slot Table with ID, From, nID and Sta. columns, ReqBW/UsedBW check) connected over a NoC communication link to the West input port of router node R2 (FIFO queue, Routing State Machine, Routing Reservation Table with columns E, N, W, S, L). Pipeline stages: 1. Output Arbitration, 2. Switch/Link Traversal, 3. Buffer Write, 4. Read Buffer + Routing Compute.]

6.3. ID-tag indexing
This mechanism is visually presented in Fig. 6. At the East output
port of the router node R1, a payload flit with ID-tag 1 coming
from the West (W) input port is being switched out to the East (E) output
port. The ID-tag indexing procedure is explained in the following
items.
1. As presented in Fig. 6, the databody flit with ID-tag 1 is coming
from the West (W) input port.
2. The Arbiter unit from East output port selects the ﬂit by giving
W signal arbitration to the multiplexor such that the databody
ﬂit is switched out to the East output port.
3. When the ID-Management (IDM) unit in the multiplexor
detects the type of the flit (databody type in this case), it
looks for the combination (ID, From) in the ID and From columns
of the ID Slot Table. In this case, the databody flit gives the
combination (1, W).
4. The IDM unit finds the combination in slot number 2.
Hence, slot number 2 is used by the databody flit as
its new ID-tag to flow on the link.
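The lookup performed for a databody flit can be sketched as a search keyed by the pair (ID, From). The dictionary layout and the function name `index_id_tag` are hypothetical, and the sequential loop is only a software model; the hardware may well compare all slots in parallel.

```python
from typing import Dict, Optional, Tuple

# ID Slot Table of one output port: nID -> (ID, From) for every used slot.
IdSlotTable = Dict[int, Tuple[int, str]]


def index_id_tag(table: IdSlotTable, old_id: int, from_port: str) -> Optional[int]:
    """Return the nID whose (ID, From) entry matches the flit, or None."""
    for n_id, key in table.items():
        if key == (old_id, from_port):
            return n_id
    return None


# State of the East output port after the reservation of Section 6.1;
# the entries in slots 0 and 1 are hypothetical.
east_table: IdSlotTable = {0: (0, "L"), 1: (3, "N"), 2: (1, "W")}

print(index_id_tag(east_table, 1, "W"))   # 2
print(index_id_tag(east_table, 1, "N"))   # None
```

The second call shows why the input port is part of the key: ID-tags are only unique per link, so the same ID-tag arriving from a different input port belongs to a different stream.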
6.4. Routing table indexing
This mechanism is visually presented in Fig. 6. At the West input
port of the router node R2, the payload flit is now buffered in
the FIFO buffer at the West (W) input port. The routing indexing
procedure is explained in the following items.
1. As presented in Fig. 6, the databody flit with ID-tag 2 is now
buffered in the data buffer of the REB unit and is being routed
by the REB.
2. When the REB detects the type of the ﬂit (databody ﬂit in this
case), then only the Routing Reservation Table (RRT) is acti-
vated, and the routing information will be selected from the
RRT unit.
3. Routing slot number 2 in the RRT is selected in accordance with
the ID-tag of the databody flit. Hence, the routing decision is
obtained by fetching the routing information from slot
number 2, i.e. '1 1 0 1 0', as shown in the figure.

6.5. Bandwidth, ID-tag and routing terminations
When a tail flit of a packet is switched out to an output port,
the tail flit performs the ID-tag indexing procedure explained
in Section 6.3. In the last cycle, however, the bandwidth reservation
and the ID and From columns of the slot that corresponds to the ID-tag
number of the tail flit are set free. The status of that ID slot
is also changed from used ('U') to free ('F').
When a tail flit of a packet is routed from an input port, the
tail flit performs the routing-indexing procedure explained in
Section 6.4. In the last cycle, however, the routing information in the
routing slot that corresponds to the ID-tag number of the tail flit
is set free.

Fig. 6. ID-tag indexing and routing table indexing made by a databody flit.
[Figure: the East output port of router node R1 looks up (ID, From) in the ID Slot Table and replaces the ID-tag of the databody flit; the West input port of router node R2 reads the Routing Reservation Table slot selected by the ID-tag.]
Fig. 7. Node-to-node traffic flow for a radio system.
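The two termination steps of Section 6.5 can be sketched as follows. This is a minimal model under assumptions: the dictionary-based slot layout and the function names are hypothetical, and the text does not say how the router knows how much BW to give back, so the sketch assumes the reserved amount is stored with the slot.

```python
from typing import Dict, List, Tuple

FREE_SLOT = {"used": False, "id": None, "from": None, "bw": 0}


def release_id_slot(slots: List[Dict], used_bw: int,
                    old_id: int, from_port: str) -> Tuple[int, int]:
    """Output port: index the tail flit like a databody flit, then free its slot.

    Returns the ID-tag the tail flit carries on the link and the new UsedBW.
    """
    for n_id, slot in enumerate(slots):
        if slot["used"] and (slot["id"], slot["from"]) == (old_id, from_port):
            used_bw -= slot["bw"]          # give the reserved BW back
            slots[n_id] = dict(FREE_SLOT)  # clear ID and From, set Sta to 'F'
            return n_id, used_bw
    raise LookupError("tail flit matches no reserved ID slot")


def release_routing_slot(rrt: List[List[int]], id_tag: int) -> List[int]:
    """Input port: route the tail flit with the stored entry, then clear it."""
    route = list(rrt[id_tag])
    rrt[id_tag] = [0] * len(route)
    return route


slots = [dict(FREE_SLOT) for _ in range(4)]
slots[2] = {"used": True, "id": 1, "from": "W", "bw": 500}
rrt = [[0, 0, 0, 0, 0] for _ in range(4)]
rrt[2] = [0, 0, 0, 1, 0]

tag, used_bw = release_id_slot(slots, used_bw=700, old_id=1, from_port="W")
route = release_routing_slot(rrt, tag)
print(tag, used_bw, route)   # 2 200 [0, 0, 0, 1, 0]
```

In both functions the entry is read before it is cleared, mirroring the statement that the tail flit is first indexed like a databody flit and the reservation is released only in the last cycle.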
7. Experimental results
In this section, the connection-oriented multicast method is
verified on a radio system application benchmark from Nokia
[9]. The simulation is performed at the register-transfer level (RTL)
using an HDL (hardware description language) simulator.
The allocation of each task after application mapping onto a 2D 4 × 4
mesh topology is presented in Fig. 7. The application
consists of 11 communication edges. Two of them, communications
a and h, are multicast data communications. Communication
j is a broadcast (one-to-all) data communication, in which the core
at node1 broadcasts data to all other cores. Communication k is
an all-to-one data communication, in which the core at node1 receives
data from all other cores.
The injection rate in our NoC is controlled by inserting jitters
between two consecutive data flits. A jitter is a zero flit that occupies
one cycle period. The more jitters inserted between two consecutive
data flits, the lower the bandwidth (BW) setpoint. If $N_{jit}$ jitters
are inserted between consecutive data flits, the BW setpoint is
$I_{rate} = \frac{1}{N_{jit}+1}$ flit per cycle (or word per cycle);
that is, one data flit is injected every $N_{jit}+1$ cycles.
In this simulation, the data frequency is set to 1 GHz, so 1 flit/cycle
is equal to 1 GHz × 4 bytes = 4000 MB/s (1 word consists of 4 bytes).
However, the maximum BW capacity of our NoC link is only
$\frac{1}{2}$ flit/cycle. Hence, the maximum data BW is
$B_{max} = 4000 \times \frac{1}{2} = 2000$ MB/s. Because our NoC router
can perform 5 simultaneous I/O connections, the maximum BW capacity of
the NoC router is 5 × 2000 MB/s = 10 GB/s. Thus, if we expect a BW of
512 MB/s, we can set $N_{jit} = 6$, resulting in a BW setpoint of
$4000 \times \frac{1}{6+1} = 571.43$ MB/s. If we set $N_{jit} = 7$, the
BW setpoint will be $4000 \times \frac{1}{7+1} = 500$ MB/s. Finer BW
granularity can be controlled by a compute element, which actually
produces the data sent to the network interface.
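The setpoint arithmetic above can be checked with a few lines of Python. The constants restate the values given in the text (1 GHz data frequency, 4-byte flits); the helper `jitters_for`, which picks the largest number of jitters that still meets a required bandwidth, is a hypothetical convenience and not part of the paper.

```python
import math

DATA_FREQ_MHZ = 1000                      # 1 GHz data frequency used in the simulation
FLIT_BYTES = 4                            # one 32-bit word per flit
PEAK_MB_S = DATA_FREQ_MHZ * FLIT_BYTES    # 4000 MB/s at 1 flit/cycle


def bw_setpoint(n_jit: int) -> float:
    """BW setpoint in MB/s when n_jit jitters separate consecutive data flits."""
    return PEAK_MB_S / (n_jit + 1)


def jitters_for(required_mb_s: float) -> int:
    """Largest n_jit whose setpoint is still at least required_mb_s.

    The result is clamped to n_jit >= 1, because a link carries at most
    1/2 flit per cycle; requirements above 2000 MB/s cannot be met.
    """
    n_jit = math.floor(PEAK_MB_S / required_mb_s) - 1
    return max(n_jit, 1)


print(jitters_for(512))             # 6
print(round(bw_setpoint(6), 2))     # 571.43
print(bw_setpoint(7))               # 500.0
print(bw_setpoint(1))               # 2000.0
```

The example reproduces the case in the text: a requirement of 512 MB/s falls between the setpoints for 7 jitters (500 MB/s) and 6 jitters (571.43 MB/s), so 6 jitters is the coarsest setting that still satisfies it.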
Fig. 8 shows the expected, setpoint and actual measured BW for the
on-chip radio system application. The minimum bandwidth requirements
of all communication edges are met. The expected bandwidth is obtained
from Fig. 7. The setpoint BW is set by the source node through the
number of inserted jitters, as explained before. The actual bandwidth
is measured at the destination node.
For the sake of simplicity, the ID slot reservation results of
communications a–j and of communication k are shown separately. Fig. 9(a)
presents one of many possible local ID slot reservations made
autonomously by the header flits for communication edges a to
j. If communication j is performed first, the multicast header
flits are sent flit-by-flit to all other nodes. Thus, this broadcasting
communication edge reserves the first local ID slot (i.e. ID slot 0) on
every communication medium. The other communication
edges (communications a–i) then reserve the remaining ID slots
that are not reserved by communication j. Fig. 9(b) depicts one
of many possible combinations of the local ID slot reservations for
communication edge k, when this connection is set up after all
other communication edges have established their connections.
This communication edge reserves the local ID slots that
have not been reserved by communications a–j. The bottom part of
the figure shows the ID slot reservation in the Local output port
of router node1. The runtime local ID slot reservation is very
flexible, because the header flits, which reserve the ID slots
autonomously, simply check the ID Slot Table for available (free) ID
slots that can be used as their ID-tag.
Reserved BW space on an output port is defined as the total
accumulated BW reserved by the streams on that output port.
When a stream with a certain BW flows into an output port, the
stream's BW is allocated in the MIM unit at the output port
in accordance with the requested BW space indicated in the header
flit of the stream (see the packet format shown in
Fig. 2 in Section 4.1). The more streams flow into an output port,
the more BW space is reserved at that output port.
Fig. 10 presents the reserved BW spaces on every output port of
all 16 router nodes. Fig. 10(a) shows the BW reservation for communication
edges a to j, while Fig. 10(b) presents the BW reservation
for communication k. The figures represent the congestion situation on
each router node and the hotspot locations in the network. In Fig. 10(a),
we can see that the hotspot occurs at node6, where the total BW
consumption of all output ports in the node is about 4100 MB/s, or about
41% of the total maximum BW capacity (10 GB/s) of the router. In
Fig. 10(b), we can see clearly that the hotspots are located in the local
output port of router node1, and in the south output ports of router
node5 and router node9.

Fig. 8. Setpoint and actual measurements of the communication bandwidth.
[Bar chart: bandwidth (MB/s), 0 to 800, per communication pair a–k; series: expected, setpoint, actual/measured.]

8. Synthesis results
The synthesis results of the XHiNoC prototype with multicast
guaranteed-bandwidth (GB) service are compared with those of the
Æthereal NoC in Table 2. The synthesis
is performed using a 65-nm CMOS standard-cell technology library from
Taiwan Semiconductor Manufacturing Company (TSMC). The NoC
with GB service is synthesized with a target frequency of 1.47 GHz
(0.68 ns clock period).
For a general overview, consider the synthesis result of the Æthereal
NoC [12]. The maximum frequency to transfer data in the
Æthereal NoC, which combines the BE and GT services with a 32-bit
word size, is 500 MHz using a 130-nm standard-cell technology,
resulting in an aggregate bandwidth of 5 × 500 MHz ×
32 bits = 80 Gbit/s. Hence, the bisection bandwidth for the 32-bit data
width of the Æthereal link is 2 × 80 Gbit/s = 160 Gbit/s. The total
logic area of the Æthereal NoC is 0.2600 mm².
Synthesized with the 65-nm CMOS technology, the XHiNoC with
guaranteed-bandwidth (GB) service has a logic cell area of about
0.0457 mm² with a 32-bit data width and can be clocked at up to
1.47 GHz. With the selected 1 GHz data frequency, the aggregate
bandwidth of the XHiNoC router (static routing, 2-depth FIFO, 32-bit
word size, 5 I/O ports) is 5 × 1000 MHz × 32 bits × $\frac{1}{2}$ = 80 Gbit/s.
Hence, the bisection bandwidth (for 32-bit data width and 1 GHz
data frequency) of the XHiNoC link is 2 × 80 Gbit/s = 160 Gbit/s.
The aggregate bandwidth is divided by two because of the two-cycle
delay between consecutive flits during the data link traversal pipeline
stage; that is, the maximum data rate per link in XHiNoC is
$\frac{1}{2}$ flit/cycle. Therefore, in order to attain the same
bisection bandwidth, the XHiNoC should be clocked twice as fast as the
Æthereal NoC.
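The two aggregate-bandwidth figures can be reproduced with the short calculation below. The function name and its parameters are illustrative; the numbers are those quoted in the text, and the bisection bandwidth is taken as twice the aggregate bandwidth, as in the text.

```python
def aggregate_bw_gbit(ports: int, freq_mhz: float, word_bits: int,
                      flits_per_cycle: float = 1.0) -> float:
    """Aggregate router bandwidth in Gbit/s."""
    return ports * freq_mhz * word_bits * flits_per_cycle / 1000.0


aethereal = aggregate_bw_gbit(ports=5, freq_mhz=500, word_bits=32)
xhinoc = aggregate_bw_gbit(ports=5, freq_mhz=1000, word_bits=32,
                           flits_per_cycle=0.5)

print(aethereal, 2 * aethereal)   # 80.0 160.0
print(xhinoc, 2 * xhinoc)         # 80.0 160.0
```

The factor `flits_per_cycle=0.5` is what forces the XHiNoC to run at twice the data frequency to match the 500 MHz Æthereal figure.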
Table 3 shows the power estimation of the XHiNoC router with
connection-oriented guaranteed-bandwidth service. The power
analysis is made at the target frequency of 1.47 GHz with maximum
switching activity, i.e. all input ports are actively switched at a
1.47 GHz data frequency.

9. Discussions
The finest BW granularity provided by the IDMA-based multiple
access method is independent of the number of available local ID slots.
In our current implementation, we use 12 bits of the header
flit bit field for the requested BW of every stream/message, which
provides $2^{12} = 4096$ BW levels. In our current microarchitecture,
we set the maximum link BW of 2000 MB/s as 0x7FF. With a TDMA method,
by contrast, the BW granularity depends on the number
of time slots on each link.
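To give a feeling for the resulting granularity, the sketch below maps a bandwidth in MB/s onto the 12-bit ReqBW field. The linear scale and the rounding up are assumptions made for illustration; the text only states that the field has 12 bits and that 2000 MB/s corresponds to 0x7FF.

```python
import math

REQ_BW_BITS = 12        # width of the requested-BW field in the header flit
MAX_LINK_CODE = 0x7FF   # code assigned to the maximum link BW
MAX_LINK_MB_S = 2000    # maximum link BW in MB/s


def encode_req_bw(mb_s: float) -> int:
    """Map a bandwidth to a ReqBW code, rounding up so that the reserved
    BW is never below the request (linear scale is an assumption)."""
    if not 0 <= mb_s <= MAX_LINK_MB_S:
        raise ValueError("bandwidth outside the link capacity")
    return math.ceil(mb_s * MAX_LINK_CODE / MAX_LINK_MB_S)


def decode_req_bw(code: int) -> float:
    """Bandwidth in MB/s represented by a ReqBW code."""
    return code * MAX_LINK_MB_S / MAX_LINK_CODE


print(encode_req_bw(2000) == MAX_LINK_CODE)   # True
print(encode_req_bw(500))                     # 512
print(round(decode_req_bw(1), 3))             # 0.977
```

Under this assumed scale one code step corresponds to slightly less than 1 MB/s, independent of how many ID slots a link offers.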
In the TDMA NoC version with a pre-processing time-slot scheduling
algorithm, a central reconfigurator unit is required to program
and allocate a time slot for each message on every port of
the routers to guarantee a conflict-free routing configuration. A
global network view is required in this case. Thus, the pre-computed
time slot allocation algorithm must be processed before running
the real application. The XHiNoC concept does not require
such a pre-processing scheduling algorithm, because the connections
can be established at runtime during application execution.
The central reconfigurator unit is also no longer required; it is
replaced by the runtime-programmable ID management table
and the runtime-programmable routing reservation table, which are
implemented in the router hardware.

Fig. 9. One of many possible runtime local ID slot reservation configurations for each communication.
Fig. 10. Number of bandwidth reservations at each outgoing port of all 16 network nodes.

Table 2
Comparison of the XHiNoC and Æthereal NoC.
                         XHiNoC                               Æthereal NoC
Technology size          65 nm                                130 nm
Technology vendor        TSMC                                 No info
Target frequency         1.47 GHz                             500 MHz
Total logic cell area    0.0457 mm²                           0.2600 mm²
Bisection BW (32-bit)    160 Gbit/s @ 1 GHz data frequency    160 Gbit/s

Table 3
Power estimation of the XHiNoC using 65-nm CMOS technology library.
                           Synthesis result
Target frequency           1.47 GHz
Est. net switch. power     1.9800 mW
Est. cell intern. power    39.3734 mW
Est. cell leakage power    2.2853 µW

10. Conclusions
A guaranteed-bandwidth multicast-enabled NoC with a
runtime connection establishment method has been presented in
this paper. By implementing connection-oriented data
communication, the total accepted bandwidth on each communication
link in the NoC will not exceed its maximum bandwidth
capacity. Each stream flow must register its bandwidth requirement
with a BW management unit at each outgoing port of the
NoC routers. A stream is not allowed to flow on a link that
has no more free BW space. In this situation, the NoC will not be
saturated, and the communication rate of each stream flow can be
guaranteed accordingly. An input port does not need to differentiate
between incoming data streams with different bandwidth requirements.
This is also the reason why the guaranteed-BW
communication can work with a simple 2-slot FIFO buffer.
The advantageous feature of the NoC is the
simplicity of designing the guaranteed-bandwidth router. The unicast
or multicast connection configuration and termination are flexible
and made autonomously at runtime by header flits. The experimental
result presented in this paper has shown that the IDMA
method works well and can be an alternative solution to improve
the quality of service of NoCs.
Acknowledgments
The author gratefully acknowledges the comments, helpful suggestions
and constructive criticism of the anonymous reviewers,
and the DAAD (Deutscher Akademischer Austausch-Dienst, German Academic
Exchange Service), which awarded the author a DAAD scholarship
to obtain his doctor of engineering (Dr.-Ing.) degree at
Technische Universität Darmstadt in Germany. The author would
also like to thank LOEWE-Zentrum AdRIA at Fraunhofer Institute
LBF Darmstadt for further cooperation within Project AdRIA
(Adaptronik-Research, Innovation, Application), funded by the Hessian
Ministry of Science and Arts, and Prof. Thomas Hollstein at Tallinn
University of Technology for valuable discussions on
networks-on-chip.
References
[1] T. Bjerregaard, J. Sparsø, Implementation of guaranteed services in the MANGO
clockless network-on-chip, IEE Proceedings Computers and Digital Techniques
153 (4) (2006) 217–229.
[2] S. Evain, J.-P. Diguet, D. Houzet, NoC design ﬂow for TDMA and QoS
management in a GALS context, EURASIP Journal on Embedded Systems 6
(2006) 1–12.
[3] A. Hansson, K. Goossens, A. Radulescu, Uniﬁed approach to mapping and
routing on a network-on-chip for both best-effort and guaranteed service
trafﬁc, VLSI Design, Journal of Hindawi Publishing Corp 7 (2007) 1–16.
[4] C. Hilton, B. Nelson, PNOC: a ﬂexible circuit-switched NoC for FPGA-based
systems, IEE Proceedings Computers and Digital Techniques 153 (3) (2006)
181–188.
[5] M. Koibuchi, K. Anjo, Y. Yamada, A. Jouraku, H. Amano, A simple data transfer
technique using local address for networks-on-chip, IEEE Transactions on
Parallel and Distributed Systems 17 (12) (2006) 1425–1437.
[6] A. Leroy, D. Milojevic, D. Verkest, F. Robert, F. Catthoor, Concepts and
implementation of spatial division multiplexing for guaranteed throughput
in networks-on-chip, IEEE Transactions on Computers 57 (9) (2008) 1182–
1195.
[7] S. Lin, L. Su, G. Zhou, D. Jin, L. Zeng, Design networks-on-chip with latency/
bandwidth guarantees, IET Computers & Digital Techniques 3 (2) (2009) 184–
194.
[8] J. Liu, L.-R. Zheng, H. Tenhunen, Interconnect intellectual property for
Network-on-Chip (NoC), Elsevier Journal of Systems Architecture 50 (2–3)
(2004) 65–79.
[9] Z. Lu, A. Jantsch, TDM virtual-circuit configuration for network-on-chip, IEEE
Transactions on Very Large Scale Integration (VLSI) Systems 16 (8) (2008)
1021–1034.
[10] M. Millberg, E. Nilsson, R. Thid, A. Jantsch, Guaranteed-bandwidth using
looped containers in temporally disjoint networks within the nostrum
network on chip, in: Proc. Design Automation and Test in Europe (DATE’04),
2004, pp. 890–895.
[11] I.M. Panades, A. Greiner, A. Sheibanyrad, A low cost network-on-chip with
guaranteed service well suited to the GALS approach, in: Proc. the 1st Int’l
Conf. and Workshop on Nano-Networks, 2006, pp. 1–5.
[12] E. Rijpkema, K. Goossens, A. Radulescu, J. Dielissen, J. van Meerbergen, P.
Wielage, E. Waterlander, Trade offs in the design of a router with both
guaranteed and best-effort services for networks on chip, Proceedings IEE
Computers & Digital Techniques 150 (5) (2003) 294–302.
[13] F.A. Samman, T. Hollstein, M. Glesner, Multicast parallel pipeline router
architecture for network-on-chip, in: Proc. Design Automation and Test in
Europe (DATE’08), 2008, pp. 1396–1402.
[14] F.A. Samman, T. Hollstein, M. Glesner, Adaptive and deadlock-free tree-based
multicast routing for networks-on-chip, IEEE Transactions on Very Large Scale
Integration (VLSI) Systems 18 (7) (2010) 1067–1080.
[15] F.A. Samman, T. Hollstein, M. Glesner, New theory for deadlock-free multicast
routing in wormhole-switched virtual-channelless networks-on-chip, IEEE
Transactions on Parallel and Distributed Systems 22 (4) (2011) 544–557.
[16] F.A. Samman, T. Hollstein, M. Glesner, Wormhole cut-through switching: ﬂit-
level messages interleaving for virtual-channelless network-on-chip, Elsevier
Science Journal, Microprocessors and Microsystems – Embedded Hardware
Design 35 (3) (2011) 343–358.
[17] F.A. Samman, T. Hollstein, M. Glesner, Planar adaptive network-on-chip
supporting deadlock-free and efﬁcient tree-based multicast routing method,
Elsevier Science Journal, Microprocessors and Microsystems – Embedded
Hardware Design 36 (6) (2012) 449–461.
[18] X. Wang, T. Ahonen, J. Nurmi, Applying CDMA technique to network-on-chip,
IEEE Transactions on Very Large Scale Integration (VLSI) Systems 15 (10)
(2007) 1091–1100.
[19] W.D. Weber, J. Chou, I. Swarbrick, D. Wingard, A quality-of-service mechanism
for interconnection networks in system-on-chips, in: Proc. Design Automation
and Test in Europe (DATE’05), 2005, pp. 1232–1237.
Faizal Arya Samman was born in Makassar, Indonesia.
He received his Bachelor of Engineering degree in
Electrical Engineering from Gadjah Mada University at
Yogyakarta, Indonesia in 1999. In 2002, he received his
Master of Engineering degree with Scholarship Award
from Indonesian Ministry of National Education in
Control and Computer System Laboratory and in Inter-
University Center for Microelectronics Research, at
Bandung Institute of Technology in Indonesia. In 2002,
he was appointed to be a research and teaching staff at
Hasanuddin University in Makassar, Indonesia. From
2006 until 2010 he received Scholarship Award from
DAAD (Deutscher Akademischer Austausch Dienst – German Academic Exchange
Service) to pursue doctoral degree at Technische Universität Darmstadt in Germany.
From 2010 until 2012, he was a postdoctoral fellow within the research cooperation
framework between Darmstadt University of Technology and Fraunhofer Institute
LBF in Darmstadt in LOEWE-Zentrum AdRIA (Adaptronik-Research, Innovation,
Application). He is now a research and teaching staff at Department of Electrical
Engineering, Universitas Hasanuddin in Makassar. His research interests include
network-on-chip microarchitecture, adaptive multiprocessor systems, program-
ming models for multiprocessor systems, design and implementation of analog and
digital electronic circuits for adaptronic and control system application as well as
energy harvesting systems, wireless sensor nodes, wired/wireless distributed con-
trol systems.
