Runtime Contention and Bandwidth-Aware Adaptive Routing Selection Strategies for Networks-on-Chip by Samman, Faizal Arya et al.
Runtime Contention and Bandwidth-Aware
Adaptive Routing Selection Strategies
for Networks-on-Chip
Faizal Arya Samman, Member, IEEE, Thomas Hollstein, Member, IEEE, and
Manfred Glesner, Fellow, IEEE
Abstract—This paper presents adaptive routing selection strategies suitable for network-on-chip (NoC). The main prototype
presented in this paper uses contention information and bandwidth space occupancy to make routing decisions at runtime during
application execution time. The performance of the NoC router is compared to other NoC routers with queue-length-oriented adaptive
routing selection strategies. The evaluation results show that the contention- and bandwidth-aware adaptive routing selection
strategies are better than the queue-length-oriented adaptive selection strategies. Messages in the NoC are switched with a wormhole
cut-through switching method, where different messages can be interleaved at flit-level in the same communication link without using
virtual channels. Hence, the head-of-line blocking problem can be solved effectively and efficiently. The routing control concept and the
VLSI microarchitecture of the NoC routers are also presented in this paper.
Index Terms—Network on chip, bandwidth-aware adaptive routing, contention-aware adaptive routing, congestion-aware adaptive
routing
1 INTRODUCTION AND MOTIVATION
NETWORK-ON-CHIP (NoC) is a feasible communication infrastructure for many-core processor systems because
of its scalable bandwidth capacity. Currently, there are many research challenges in the field of many-core
processor systems, ranging from the abstract application layer down to the physical network layer. In the
network layer, the challenging topics are the optimum design of the network and router architecture in
terms of cost (logic area, power, etc.) and performance (network bandwidth capacity, router latency, etc.)
[13]. Among the topics around router architecture, the main part of a network communication
infrastructure, switching methods, routing algorithms, quality-of-service and flow control have been
extensively discussed in the literature. The routing algorithm in particular affects both the area and the
network performance.
In general, a routing algorithm can be deterministic (static) or adaptive. Network designers
are motivated to design adaptive routing algorithms by two main objectives, i.e., to avoid entering
hotspot links such that communication performance can be increased, and to avoid entering faulty
network components (faulty switch or link). The works in [18], [11], and [16], for instance, propose
fault-tolerant adaptive routing algorithms. Network faults can turn a regular network into a
nonregular network. The work in [10] presents a fault-tolerant routing algorithm by balancing
traffic over network faults and the nonregularity due to the network component faults.
The main issue related to adaptive routing algorithms is the deadlock configuration problem due to
cyclic dependency. The works in [4] and [5] present the theory of deadlock-free routing algorithms
and formal descriptions of deadlock configurations. Turn models can in principle be used to design
a deadlock-free adaptive routing algorithm [6]. A deadlock-free adaptive routing method that
addresses the placement of oversized IP components in an irregular mesh-based network is
presented in [23].
Most routing implementations made at design time use routing tables to route messages (packets).
The contents of the routing tables are programmed at design time, and adaptive routing paths are
then assigned to every routing table in the network nodes by some technique. The work in [19], for
example, presents an offline (design-time) routing method called “Application-Specific Routing
Algorithm” (APSRA), which is used to increase the degree of routing adaptivity for hotspot
avoidance. The “Segment-based Routing” (SR) presented in [14] is also an offline routing method,
in which the network is segmented into subnets and restrictions are applied to avoid deadlock
configurations. Another method is the dynamic routing protocol in [12], which is used to balance
the distribution of traffic in NoCs. However, since the aforementioned routing methods [19], [14],
[12] use a static or offline routing design approach, they cannot be classified as pure adaptive
routing methods.
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 24, NO. 7, JULY 2013 1411
. F.A. Samman is with Universitas Hasanuddin, Fakultas Teknik, Jurusan
Teknik Elektro, Jl. Perintis Kemerdekaan Km. 10, Makassar 90245,
Indonesia. E-mail: faizalas@unhas.ac.id.
. T. Hollstein is with the Department of Computer Engineering, Dependable
Embedded Systems Group, Tallinn University of Technology, Raja 15,
12618 Tallinn, Estonia. E-mail: thomas@ati.ttu.ee.
. M. Glesner is with the Institut für Datentechnik, Technische Universität
Darmstadt, Forschungsgruppe Mikroelektronische Systeme, Merckstr. 25,
64283 Darmstadt, Germany. E-mail: glesner@mes.tu-darmstadt.de.
Manuscript received 26 Feb. 2012; revised 8 May 2012; accepted 31 May
2012; published online 22 June 2012.
Recommended for acceptance by J. Flich.
For information on obtaining reprints of this article, please send e-mail to:
tpds@computer.org, and reference IEEECS Log Number TPDS-2012-02-0147.
Digital Object Identifier no. 10.1109/TPDS.2012.200.
1045-9219/13/$31.00 © 2013 IEEE Published by the IEEE Computer Society
In most embedded system applications, where most NoC platforms are likely to be used, the intercore
communication patterns are known. Therefore, offline (static) congestion-avoiding techniques can be
used, resulting in a much simpler router. A runtime (dynamic) adaptive routing method is, however,
an interesting approach for future NoC-based multicore embedded systems, where applications may
not be known in advance. Indeed, some embedded IC vendors in the multicore era could potentially
market not only IP cores but also system architectures [3], onto which many applications can be
mapped. The implementation of runtime adaptive routing will therefore simplify embedded system
production, because the designers will no longer need to configure the routing information on the
on-chip router. In this context, however, the runtime techniques will need extra area cost and
complexity.
In lookup-table-based routing algorithms, the size of the tables increases with the network size,
since all entries must be added to the tables. Some works therefore propose different techniques to
reduce the size of the routing tables. The work in [15] presents a region-based routing algorithm
aimed at reducing the size of routing tables for NoCs by grouping destinations into network
regions. The work in [8] shows a simple data transfer technique that applies local addresses
(labels), which are computed offline for each flow in an application (a design-time routing
approach).
With the same background as in [8], our proposed methodology can also be viewed implicitly as a
technique to reduce the number of entries in the routing tables, based on a runtime variable
(dynamic) local message identity (ID) technique that will be explained later. In our experiments,
all considered traffic can still be routed under several scenarios, although the number of
available ID slots per link is set lower than the number of node entries in the NoC. Our
methodology can be classified as a runtime distributed routing approach, where the routing is made
locally in every NoC router at runtime during application execution time.
The remaining sections are organized as follows. Section 2 presents the state of the art of the
adaptive routing selection strategies that have been proposed so far for NoCs. Section 3 briefly
describes the main contribution of this work. A 2D planar adaptive routing algorithm for the mesh
NoC platform is presented in Section 4. Section 4 also shows different adaptive routing selection
functions and the VLSI microarchitecture of the contention- and bandwidth-aware (CBWA) and
queue-length-oriented NoC routers. Section 5 shows the performance evaluations of the different
adaptive routing selection functions under different traffic scenarios and different network sizes.
Sections 6 and 7 present the synthesis results and concluding remarks, respectively.
2 STATE OF THE ART OF ADAPTIVE ROUTING
SELECTION STRATEGIES
2.1 Selection Based on FIFO Queue Occupancy
(FQO)
A commonly used adaptive routing policy is based on buffer occupancy, where the “congestion
information” (CI) of the set of admissible output ports connected to the downstream (next-hop)
routers is traced back to the upstream routers. The CI data can be represented as the length of the
data queue in the FIFO buffer, which can be indicated by a multiple-bit signal, or as the buffer
status (free or busy), which can be indicated by a single-bit signal. These CI signals are used by
a packet at the current router to select the best routing direction among the alternative
downstream outgoing links at any instant. A “stress value,” which indicates how many packets come
into the downstream outgoing links per unit time [25], can also be used as alternative CI data for
packet-switched routers to make routing decisions. Many works have used this queue-length-oriented
adaptive routing selection, such as [9], [17], [7], and [25].
A specific technique to drain messages from hotspot areas, called “Contention-Aware Input
Selection” (CAIS), is presented in [24]. Rather than adaptively selecting less congested outgoing
links in downstream directions, the CAIS method focuses on the selection of input ports from
upstream (backtrace) directions. When two or more input ports request the same output port, an
arbiter unit at the output port selects the input port having more waiting packets in its upstream
direction. It seems that the adaptive routing path selection is made by the arbitration unit rather
than by the routing engine unit.
The work in [2] presents an interesting method that makes the adaptive routing selection based on
the number of free buffer slots and the availability of buffers in the two-hop adjacent neighbors,
the “Neighbor-on-Path Routing Selection Strategy.” However, the main criticism of such a
methodology is the problem of unpredictable traffic situations, as shown in Fig. 1.
Packet A will be routed from node (1,1) to (3,3). By measuring the two-hop neighbor CI, packet A at
node (1,1) can survey four alternative paths, i.e., the node-to-node paths (1,1)→(1,2)→(1,3),
(1,1)→(1,2)→(2,2), (1,1)→(2,1)→(3,1), and (1,1)→(2,1)→(2,2). However, packet header A has only two
alternative output selections, i.e., the North or the East output port, as depicted in Snapshot 1
of the figure. All four adjacent neighbors send back CI, i.e., the length of the data queues in the
FIFO buffers and the buffer status (“free” or “busy”). At the same time, packet headers B and C
come to node (2,1) via the Local and East input ports, respectively, a situation of which packet A
is certainly unaware. In this case, packet A will not use the (1,1)→(1,2)→(1,3) path. As shown in
Snapshot 2, we assume that the routing engine finally decides to route packet header A to the East
output port, and at the same time at node (2,1), packet headers B and C are routed to West and
North, respectively. Now, an unexpected situation occurs,
Fig. 1. Problem in the unpredictable two-hop neighbor-on-path conges-
tion measurement.
where packet A has selected a nonoptimal path because of
the unpredictable traffic situation. The same situation
could also happen, when packet A would be routed to
the North output port.
2.2 Selection Based on Bandwidth-Space
Occupancy
Another approach to making adaptive routing decisions is the strategy based on bandwidth-space
occupancy. Fig. 2 illustrates the different orientations of two adaptive routing strategies: the
strategy based on FQO, which can be called a Congestion-Aware Adaptive Routing Selection Strategy,
and the strategy based on bandwidth-space occupancy, which can be called a Bandwidth-Aware (BWA)
Adaptive Routing Selection Strategy. Fig. 2 presents a snapshot of the network situation where the
FQO strategy reads the CI traced back from the possible two-hop downstream neighbor routers.
In Fig. 2, packet A coming to the West input port at node (1,2) will be routed to node (3,1). We
can see that packet A has three alternative paths to reach node (3,1), i.e., the node-to-node paths
(1,2)→(2,2)→(3,2)→(3,1), (1,2)→(2,2)→(2,1)→(3,1), and (1,2)→(1,1)→(2,1)→(3,1). The header of
packet A (flit A1) has two alternative output ports at this instant, i.e., East and South. While
the header flit of packet A is coming to node (1,2), at the same router node the payload flit of
packet B is coming from the North input port and the payload flit of packet C is coming from the
South input port. They have acquired the South and East output ports in advance and have reserved
50 and 100 percent of the maximum BW space (Bmax) of the output ports, respectively.
As presented in Fig. 2, the two-hop CI signals are sent back to node (1,2). If packet A reads the
two-hop CI signals and buffer availability (like the strategy used in [2]), then packet A will
select the South output port as the best output path, because the South output port presents a CI
signal value of 1, whereas the East output port presents CI signal values of 1 and 2 for its two
consecutive paths, i.e., the paths formed when packet A is routed to the East port. But if only
one-hop CI signals are considered (not presented in the figure), then the hotspot situation looks
the same from the viewpoint of packet A, because both the East and South neighbors send back the
same queue length (data queue occupancy), i.e., 1 data queue.
The situation presented in Fig. 2 is actually the main functionality of using the N-hop neighbor
CI, without considering its drawback mentioned in Section 2.1. However, if packet A just reads the
actual bandwidth (BW) space occupancy of the two alternative output ports at the current node
(1,2), then packet A can see the difference in the hotspot situation. Hence, packet A will also be
routed to the South output port when using the BWA strategy, because that port has more free BW
space.
3 CONTRIBUTION
The work in [20] presents a BWA adaptive routing method. However, this method computes adaptive
routing paths offline (at design time). A runtime BWA adaptive routing function called AdNoC is
presented in [1]. That method selects the output port having more free bandwidth space. The BWA
adaptive routing selection of our NoC, called eXtendable Hierarchical NoC (XHiNoC), follows the
same strategy as AdNoC. However, AdNoC uses virtual channel (VC) buffers, leading to a large extra
area overhead, while XHiNoC does not use them. The “bandwidth/contention/congestion look-ahead”
method used in XHiNoC can compute a routing decision immediately within one cycle period.
Meanwhile, the AdNoC implementation requires 4 cycle periods to make a routing decision, leading to
routing computation time overhead.
Moreover, the XHiNoC can also implement many of the adaptive output selection strategies or
combinations of them. The specific work presented in this paper is the evaluation of the
performance of the congestion-aware, contention-aware and BWA adaptive routing selection
strategies, where routing decisions are made at runtime during application execution time. A
concept of adaptive routing with the capability to interleave different messages at flit-level in
the same communication link has been introduced in [21]. However, that work makes adaptive routing
decisions based on the contention information between alternative output ports.
4 ALGORITHMS AND MICROARCHITECTURE
4.1 Two-Dimensional Planar Adaptive Routing
Algorithm
Fig. 3a shows a 2D mesh-planar topology, where the NoC is divided into two subnets, i.e., the
$X^{+}$ (increment) subnetwork depicted in solid lines and the $X^{-}$ (decrement) subnetwork
depicted in dashed lines. If the target node offset of a packet is
$x_{\mathrm{offset}} = x_{\mathrm{target}} - x_{\mathrm{source}} \geq 0$, then the packet will be
routed through the $X^{+}$ subnetwork, while if its target node offset is
$x_{\mathrm{offset}} \leq 0$, then it will be routed through the $X^{-}$ subnetwork. Once a packet
is routed to a subnetwork, it will not move to the other subnet. By using such a routing rule, the
minimal planar adaptive routing algorithm is free from cyclic dependency (free from a deadlock
configuration).
The main advantage of this NoC topology architecture compared to the turn-model approach commonly
used in the standard mesh structure is that minimal adaptive routing can be made in all nonzero
offset directions with at most two alternative routing directions. As shown in Fig. 3a, for
example, when the targets of packets are located
Fig. 2. A situation of two-hop CI trace back and actual link bandwidth
consumption.
in North-East area (node 1 to 12), South-East area (node 21
to 18), North-West area (node 5 to 8), or South-West area
(node 25 to 14), then the packets can use one of three
possible paths adaptively to reach the target nodes.
Algorithm 1 presents the 2D planar adaptive routing algorithm used for the 2D mesh planar multicast
router. The routing algorithm is divided into two subrouting codes for the $X^{+}$ and $X^{-}$
subnetworks in the 2D mesh planar topology. In the $X^{+}$ subnet, the set of output ports that can
be selected is {EAST, SOUTH1, NORTH1, LOCAL}. In the $X^{-}$ subnet, the set of output ports that
can be selected is {WEST, SOUTH2, NORTH2, LOCAL}.
Algorithm 1. 2D Planar Adaptive Routing Algorithm
Xoffs = Xtarget - Xsource
Yoffs = Ytarget - Ysource
while Packet is in Subnet X+, i.e., (Xoffs >= 0) do
  if Xoffs = 0 and Yoffs = 0 then
    Routing = LOCAL
  else if Xoffs = 0 and Yoffs > 0 then
    Routing = NORTH1
  else if Xoffs = 0 and Yoffs < 0 then
    Routing = SOUTH1
  else if Xoffs > 0 and Yoffs = 0 then
    Routing = EAST
  else if Xoffs > 0 and Yoffs > 0 then
    Routing = Select(NORTH1, EAST)
  else if Xoffs > 0 and Yoffs < 0 then
    Routing = Select(SOUTH1, EAST)
  end if
end while
while Packet is in Subnet X-, i.e., (Xoffs <= 0) do
  if Xoffs = 0 and Yoffs = 0 then
    Routing = LOCAL
  else if Xoffs = 0 and Yoffs > 0 then
    Routing = NORTH2
  else if Xoffs = 0 and Yoffs < 0 then
    Routing = SOUTH2
  else if Xoffs < 0 and Yoffs = 0 then
    Routing = WEST
  else if Xoffs < 0 and Yoffs > 0 then
    Routing = Select(NORTH2, WEST)
  else if Xoffs < 0 and Yoffs < 0 then
    Routing = Select(SOUTH2, WEST)
  end if
end while
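The decision made by Algorithm 1 at a single router can be sketched in Python as follows. This is an illustrative sketch only, not the authors' hardware implementation: the function names `planar_route` and `first_of` are hypothetical, the offsets are assumed to be computed relative to the router that currently holds the packet, and the subnet is assumed to be fixed when the packet is injected. The `select` argument stands in for the Select function of Algorithm 1.

```python
def first_of(a, b):
    """Placeholder for Select(): always returns the first candidate (assumption)."""
    return a


def planar_route(x_cur, y_cur, x_tgt, y_tgt, subnet, select=first_of):
    """Return the output port chosen at router (x_cur, y_cur) for target (x_tgt, y_tgt).

    subnet is "X+" or "X-"; a packet never changes subnet once injected.
    """
    x_offs = x_tgt - x_cur
    y_offs = y_tgt - y_cur

    if subnet == "X+":
        if x_offs < 0:
            raise ValueError("X+ subnet requires a non-negative X offset")
        horizontal, north, south = "EAST", "NORTH1", "SOUTH1"
    elif subnet == "X-":
        if x_offs > 0:
            raise ValueError("X- subnet requires a non-positive X offset")
        horizontal, north, south = "WEST", "NORTH2", "SOUTH2"
    else:
        raise ValueError("subnet must be 'X+' or 'X-'")

    if x_offs == 0 and y_offs == 0:
        return "LOCAL"                      # packet has arrived
    if x_offs == 0:
        return north if y_offs > 0 else south
    if y_offs == 0:
        return horizontal
    # Both offsets are nonzero: two minimal directions are admissible,
    # so the adaptive selection function decides between them.
    vertical = north if y_offs > 0 else south
    return select(vertical, horizontal)
```

Passing a different `select` callable changes only the adaptive choice between the two admissible ports; the deterministic cases are unaffected.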
4.2 Local ID-Based Data Multiplexing
We use a wormhole cut-through switching technique [22],
where flits of different messages can be interleaved at flit-
level, and share the same communication media based on
the locally organized message identity. Flits belonging to
the same message will always have the same local ID-tag
when acquiring a communication medium (network link).
The wormhole messages can be interleaved at flit-level
because every flit has a unique local ID-tag, which
dynamically changes, to differentiate it from other flits that
belong to different packets in the same link. This switching
scheme results in a special routing paradigm, in which the
on-chip switch “routes flits instead of packets.”
The local ID-tag of a message is updated by an ID management (IDM) unit implemented at the output
port when the message enters a new communication channel. By using this kind of wormhole switching,
the head-of-line blocking problem that commonly occurs in traditional wormhole switching can be
solved without implementing VCs. The ID-based switching method performs equally to the VC-based
method, where the VC flow control interleaves packets from different VCs. Compared to traditional
VC-based wormhole switching, the ID-based method demands less area, because it enables us to
implement a single 2-depth buffer per port. Data buffers significantly increase not only logic area
but also power dissipation. A discussion of the area comparison between the ID-based router and
VC-based routers can also be found in our previous paper [22]. That paper shows a very significant
area overhead of a NoC with VCs compared to our XHiNoC with the ID-based method (without VCs) when
using the same CMOS technology size.
The concept of the wormhole switching is depicted in Fig. 4, where different messages can be
interleaved in the same buffer pool or can virtually cut through at flit level. Each message
reserves one ID slot in order to be able to use the link. Based on this, contention information at
the output port can be obtained by counting the number of
Fig. 3. Mesh-planar-based network and possible minimal planar
adaptive routing paths.
the reserved ID slots in the link. Moreover, if the ID slot
reservation is followed by bandwidth reservation, then a
BWA adaptive routing selection strategy can be also
implemented in our NoC.
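The ID slot bookkeeping described above can be illustrated with a small Python sketch. It is a simplified software model under stated assumptions, not the IDM hardware: the class name `IdSlotTable`, its methods, and the choice to key each slot by the pair (input port, incoming ID-tag) are hypothetical. It shows how a message obtains a new local ID-tag on a link and how the contention information follows from counting reserved slots.

```python
class IdSlotTable:
    """Toy model of the ID slot table of one output link (assumed structure)."""

    def __init__(self, num_slots=16):
        # owner[k] is the (input_port, incoming_id) pair holding local ID k, or None.
        self.owner = [None] * num_slots

    def reserve(self, input_port, incoming_id):
        """Return the local ID-tag for a message, reserving a free slot if needed."""
        key = (input_port, incoming_id)
        if key in self.owner:
            return self.owner.index(key)    # flits of one message keep one ID
        for k, current in enumerate(self.owner):
            if current is None:
                self.owner[k] = key
                return k
        return None                         # no free ID slot on this link

    def release(self, input_port, incoming_id):
        """Free the slot held by a message (for example when its tail flit passes)."""
        key = (input_port, incoming_id)
        if key in self.owner:
            self.owner[self.owner.index(key)] = None

    def used_ids(self):
        """Contention information: number of messages currently sharing the link."""
        return sum(current is not None for current in self.owner)
```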
As presented in Fig. 4b, two example cases are exhibited. The first case (upper figure) shows a
flit interleaving where the total bandwidth consumption of all messages is 100 percent of the
maximum link bandwidth capacity (Bmax), i.e., each of the four messages consumes 25 percent of
Bmax. In the second case, 57.5 percent of Bmax has been consumed by all packets, i.e., packet A
consumes 20 percent, and packets B, C, and D consume 12.5 percent of Bmax each. Thus, 42.5 percent
of the BW is still free and can be used by other wormhole packets coming to the link.
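The arithmetic of the two cases can be checked with a few lines of Python. The helper name `free_bandwidth_percent` is hypothetical and the function only restates the sums given in the text.

```python
def free_bandwidth_percent(reserved_percent):
    """Return the share of Bmax still free, given per-message reservations in percent."""
    used = sum(reserved_percent)
    if used > 100:
        raise ValueError("reservations exceed the link capacity Bmax")
    return 100 - used


# Case 1: four messages at 25 percent each leave no free bandwidth.
case_1 = free_bandwidth_percent([25, 25, 25, 25])

# Case 2: packet A at 20 percent, packets B, C and D at 12.5 percent each.
case_2 = free_bandwidth_percent([20, 12.5, 12.5, 12.5])
```

Here `case_1` evaluates to 0 and `case_2` to 42.5, matching the free BW quoted for the second case.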
Fig. 4c shows the flow of data when using the traditional packet switching method. Each packet
header is blocked and must wait until all flits of the previously switched packet have been
forwarded. Due to this head-of-line blocking problem, the latency in traditional packet switching
will tend to increase exponentially as the packet size or the number of packets is increased. This
situation does not happen in our NoC, where the flits of different packets can be interleaved with
each other as depicted in Fig. 4b. This results in a better network latency characteristic, where
the latency could tend to increase linearly as the number of flits is increased for certain traffic
scenarios.
The mechanisms to reserve a local ID slot from the ID
slot table and to program a routing output direction in the
routing reservation table (RRT) by the wormhole packets
are made at runtime during application execution time.
Therefore, the XHiNoC uses a special packet format for the
wormhole packets by introducing a flit type bit field (beside
the ID-tag bit field) in every flit of the wormhole packets to
enable such mechanisms. The ID-based architecture will
give more significant benefits if packets are very long.
4.3 Adaptive Routing Selection Functions
Five router implementations are presented, which differ in the information considered when making
routing decisions and are described from the viewpoint of our NoC microarchitecture. The three
kinds of information considered are described in the following:
. Identity (ID) slot occupancy (the number of free ID slots). This information can also be called
the Contention Information of an output port, i.e., the number of messages that have contended
(competed) so far to access the output port. Since our router can interleave different wormhole
messages at flit-level in the same link without using VCs, the number of reserved ID slots
represents the number of wormhole messages that have been mixed in the outgoing link.
. BW space occupancy (the number of free BW spaces). This information can also be called the
BW-Reservation Information of an output port, i.e., the number of BW spaces that have been reserved
by messages to access the output port.
. Buffer space occupancy (the number of data queued in a FIFO buffer). This information can also be
called the CI of an output port, i.e., the queue length in the FIFO buffer at the input port of the
next neighbor switch connected directly to the output port.
4.3.1 BW-ID Version
This prototype uses two information signals to make routing decisions. The first prioritized signal
is the number of reserved bandwidth spaces, and the second one is the number of used ID slots (ID
slot occupancy). This adaptive routing strategy can be called a Contention- and BWA Adaptive
Routing Selection Strategy.
Messages are routed to the output direction having fewer reserved bandwidth spaces. If the numbers
of reserved BW spaces of the two alternative output ports are equal, then the second prioritized
signal is used, i.e., the number of reserved ID slots, and the messages are routed to the output
direction having fewer reserved ID slots.
4.3.2 FQ-ID Version
This prototype also uses two information signals to make routing decisions. The first prioritized
signal is the number of used buffer spaces, and the second one is the ID slot occupancy. This
adaptive routing strategy can be called a Contention- and Congestion-Aware (CCA) Adaptive Routing
Selection Strategy.
Messages are routed to the output direction having fewer utilized buffer spaces. If the numbers of
used buffer spaces (FIFO queue occupancies) of the two alternative output ports are equal, then the
second prioritized signal (the number of reserved ID slots) is used, and the messages are routed to
the output direction having fewer reserved ID slots.
Fig. 4. Local ID-based data multiplexing and traditional packet switching.
4.3.3 BW Version
This prototype uses a single information signal to make routing decisions. This adaptive routing
strategy can be called a BWA Adaptive Routing Selection Strategy. The BW version of the adaptive
routing selection function is simpler than the BW-ID version because the usedID signals from both
alternative output ports are removed from the selection mechanism. Messages are routed to the
output direction having fewer reserved BW spaces.
4.3.4 FQ Version
This prototype uses a single information signal to make routing decisions. Messages are routed to
the output direction having fewer used FIFO buffer spaces. This adaptive routing strategy can be
called a Congestion-Aware Adaptive Routing Selection Strategy.
4.3.5 ID Version
This prototype also uses a single information signal to make routing decisions. This adaptive
routing strategy can be called a Contention-Aware Adaptive Routing Selection Strategy. In this
router prototype, messages are routed to the output direction having fewer reserved local ID tags.
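The five selection functions differ only in which signals are compared and in what priority, which the following Python sketch makes explicit. The names `PortState`, `SELECTION_KEYS` and `select_port` are hypothetical, and the behavior on a complete tie (the first candidate wins) is an assumption, since the text does not state how such ties are resolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortState:
    """Signals observed for one candidate output port (illustrative model)."""
    name: str
    used_bw: int     # reserved BW spaces on the output port
    used_id: int     # reserved ID slots on the output port
    queue_len: int   # FIFO queue length at the next-hop input port


# Each version maps a port to a tuple; tuples compare lexicographically,
# so the first element is the first prioritized signal.
SELECTION_KEYS = {
    "BW-ID": lambda p: (p.used_bw, p.used_id),
    "FQ-ID": lambda p: (p.queue_len, p.used_id),
    "BW": lambda p: (p.used_bw,),
    "FQ": lambda p: (p.queue_len,),
    "ID": lambda p: (p.used_id,),
}


def select_port(version, a, b):
    """Pick the less loaded of two admissible ports; ties go to `a` (assumption)."""
    key = SELECTION_KEYS[version]
    return a if key(a) <= key(b) else b
```

For the situation of Fig. 2, where South has 50 percent and East 100 percent of the BW reserved while both report the same queue length, the BW version picks South whereas the FQ version cannot distinguish the two ports.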
4.4 Router Microarchitecture
The microarchitectures of the NoC routers that use the CBWA and the CCA adaptive routing selection
strategies are presented in Fig. 5. For the sake of simplicity, only the router components in the
East input port and in the West output port are depicted. The router is designed based on a 2D
mesh-planar topology, where each router has seven IO ports, i.e., the East, North1, North2, West,
South1, South2 and Local ports. The crossbar interconnect is customized to optimize the logic area
of the router based on the allowed turns in the 2D planar adaptive routing algorithm. The remaining
internal IO connections of the router, which represent the prohibited turns, are removed from the
architecture.
The set of components at each input and output port n comprises the FIFO buffer, the Routing Engine
with Data Buffering (REB), the Multiplexor with ID Management unit (MIM) and the Arbiter unit.
Based on the crossbar interconnects shown in Fig. 5, each port name is assigned to a port number as
follows: East (1), North1 (2), West (3), South1 (4), North2 (5), South2 (6), and Local (7).
The set of subcomponents in the REB module at output port n comprises the Routing State Machine
(RSM), the Route Buffer, the RRT and the Grant Controller (GC). In the REB unit, the combination of
the RSM, in which the planar adaptive routing algorithm (Algorithm 1) is implemented, and the RRT
supports the runtime adaptive routing mechanism. The GC unit is used to control the read operation
of the FIFO Buffer, and the Route Buffer is used to store data that will be routed to an output
port.
Only a small effort is needed to reconfigure the microarchitecture of the XHiNoC at design time
from the BW-ID to the FQ-ID version. The required modifications are
1. add new output and input ports for the queue-length signal from the FIFO buffer,
2. replace the b(i,j) (BW) signal paths with q(i,j) (QL) signal paths,
3. replace the RSM with a new RSM, and
4. remove the BW accumulator unit.
4.5 Packet Format
The detailed packet format and the control bits used in the XHiNoC architecture for the CBWA and
BWA adaptive routers are presented in Fig. 6. In our NoC, rather than splitting a message into
packets, it is split into several flits. Hence, a single message (short, long or even very long)
can be regarded as a single packet, which consists of a single header flit, payload data flits and
a single tail flit. At the bottom of Fig. 6, we can see examples of a short message (four flits)
and a very long message (N flits). Even if the
Fig. 5. Switch microarchitectures for routers with BW-ID adaptive routing strategy. The output port of the FQ-ID adaptive routing strategy is shown in
the right side.
message is a very long stream of data, it has only single
header and tail flit, and it is not divided into packets.
Formally, each flit can then be defined as $F(\mathit{type}, k)$, where
$\mathit{type} = \{\mathit{header}, \mathit{databody}, \mathit{tail}\}$. A single flit is 39 bits
wide: 32 bits for the data word plus 7 extra bits, i.e., a 3-bit field to define the type of the
flit and a 4-bit field to determine the local identity label or ID-tag $k$ of a message. With a
4-bit ID field, we can have $2^{4} = 16$ ID-tags, such that $k = \{0, 1, 2, \ldots, M\}$ with
$M = 15$.
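As an illustration of these field widths, the Python sketch below packs a 3-bit type, a 4-bit ID-tag and a 32-bit data word into one 39-bit integer and unpacks it again. The field order (type in the most significant bits) and the numeric type codes in `FLIT_TYPES` are assumptions made for this example; the actual bit positions are those defined in Fig. 6.

```python
# Hypothetical type encodings; the real codes are defined by the XHiNoC packet format.
FLIT_TYPES = {"header": 0b001, "databody": 0b010, "tail": 0b011}

DATA_BITS = 32
ID_BITS = 4


def pack_flit(flit_type, id_tag, data):
    """Pack (type, ID-tag, data word) into a 39-bit integer: [type | id | data]."""
    if not 0 <= id_tag < 2 ** ID_BITS:
        raise ValueError("ID-tag must fit in 4 bits")
    if not 0 <= data < 2 ** DATA_BITS:
        raise ValueError("data word must fit in 32 bits")
    type_code = FLIT_TYPES[flit_type]
    return (type_code << (ID_BITS + DATA_BITS)) | (id_tag << DATA_BITS) | data


def unpack_flit(flit):
    """Split a 39-bit flit back into (type name, ID-tag, data word)."""
    type_code = (flit >> (ID_BITS + DATA_BITS)) & 0b111
    id_tag = (flit >> DATA_BITS) & (2 ** ID_BITS - 1)
    data = flit & (2 ** DATA_BITS - 1)
    name = next(n for n, code in FLIT_TYPES.items() if code == type_code)
    return name, id_tag, data
```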
An extra 12-bit field in the header and tail flits is used to
present the expected communication bandwidth. When a
header flit of a message flows through an output port of a
multiplexor, the value in this field will be used to reserve
BW for the message on the output port. The tail flit is used
to remove the BW reservation.
4.6 BW Management
How BW reservation can be managed at runtime is briefly explained in this section. When multiple
applications are running concurrently, there will be situations where the network is saturated and,
further, the BW accumulator (BW occupancy) register on a link cannot cover the total considered BW
of the packets flowing through the link. For $N_{\mathrm{app}}$ applications, there is a
probability that the total BW occupancy on a link exceeds the maximum value of the BW accumulator.
BW management should be performed to guarantee that this problem cannot happen. A simple solution
is to set the maximum data rate of communications in an application equal to the maximum value of
the BW accumulator divided by the total number of applications. However, the final optimum solution
for this issue is not discussed further in this paper.
The width of the required BW (ReqBW) field determines the
resolution of the BW space at each outgoing port. For a q-bit
ReqBW field, the BW space has $2^q$ levels. Hence,
when $q = 12$ as set in Fig. 6, the number of BW
values that messages can use to reserve BW
space at the output port is $2^{12} = 4,096$. If megabytes
per second (MB/s) is used as the unit of the required BW and
the maximum capacity of the link is, for instance, 4,096
MB/s, then a required BW of 80 MB/s
is encoded as the binary ReqBW value [000001010000].
However, for practical use in a specific application that
does not require an accurate BW resolution, the width of the
ReqBW field can be reduced to a reasonable width. This
reduction also reduces the logic area and static power of
the considered on-chip router.
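The encoding example above, and the effect of a narrower ReqBW field, can be sketched as follows. The function names are hypothetical, and the rounding-up and clamping policy used for the reduced width is an assumption of this sketch rather than something the paper specifies.

```python
import math


def encode_req_bw(req_bw, q=12):
    """Return ReqBW as a q-bit binary string, most significant bit first."""
    if not 0 <= req_bw < (1 << q):
        raise ValueError(f"ReqBW must lie in 0..{(1 << q) - 1} for q = {q}")
    return format(req_bw, f"0{q}b")


def quantize_req_bw(rate, link_capacity, q):
    """Map a rate onto a coarser q-bit scale.

    Rounding up (so a reservation never understates the request) and
    clamping to the largest code are assumptions of this sketch.
    """
    step = link_capacity / (1 << q)  # BW represented by one code step
    return min(math.ceil(rate / step), (1 << q) - 1)


print(encode_req_bw(80))               # 000001010000
print(quantize_req_bw(80, 4096, q=8))  # 5 (one step is 16 MB/s)
```

With an 8-bit field and a 4,096 MB/s link, each code step stands for 16 MB/s, so the 80 MB/s request still maps exactly; requests that are not multiples of the step would be over-reserved under the assumed rounding.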
4.7 Routing Slot Reservation
Fig. 7a shows in detail how the header flit of a packet
reserves a routing slot in the RRT unit at the West input
port. The header with ID-tag 3 arrives at the West input port.
The REB unit routes and buffers the header flit in its data
register. In the same cycle, the RSM unit computes the
requested routing direction based on the target address written
in the header bit fields. The RSM unit selects one of
two alternative output ports based on two signals indicating
the number of used ID slots (usedID) and used BW
spaces (usedBW) at the two alternative output ports. The RSM
unit concatenates both signals (usedBW&usedID).
The output port with the smaller concatenated value
is selected as the best output direction. The routing decision
made by the RSM unit is then stored in slot number 3
of the RRT unit (in accordance with the ID-tag of the
header flit). The routing output is selected by
the type field of the flit via a two-input multiplexor. If the flit
type is a header, the routing decision is computed by and
fetched from the RSM unit.
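The selection rule of the RSM unit can be sketched as a comparison of concatenated values. Reading usedBW&usedID as a VHDL-style concatenation, usedBW is assumed to occupy the most significant bits, so it dominates the comparison and usedID only breaks ties. The 5-bit width of usedID and the tie-breaking order are assumptions of this sketch.

```python
ID_COUNT_BITS = 5  # assumed width: enough to count 0..16 used ID slots


def selection_key(used_bw, used_id):
    """Concatenate usedBW & usedID, with usedBW in the upper bits."""
    return (used_bw << ID_COUNT_BITS) | used_id


def select_output(candidates):
    """Pick the candidate port with the smallest concatenated value.

    `candidates` maps a port name to (usedBW, usedID). Ties go to the
    first port listed, which is an arbitrary choice for this sketch.
    """
    return min(candidates, key=lambda port: selection_key(*candidates[port]))


# East has less reserved BW, so it wins even though more ID slots are used.
print(select_output({"N": (300, 2), "E": (220, 5)}))  # E
# Equal reserved BW: the port with fewer used ID slots wins.
print(select_output({"N": (220, 2), "E": (220, 5)}))  # N
```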
When a databody flit belonging to the same message
flows through the REB component, as presented in Fig. 7b,
the databody flit uses its ID-tag to index the routing
direction, which is thus fetched directly
from the RRT. The header with ID-tag 3 and the databody flit
with ID-tag 3 belong to the same message because they
carry the same ID-tag number. When a tail flit flows through
the REB component, the same index operation takes place
as for a databody flit, but in the
same cycle the routing direction is removed from the RRT.
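The three cases described in this section (header reserves, databody looks up, tail looks up and releases) can be summarized in a small behavioral model. This is a sketch of the described behavior only, not of the VLSI implementation; the class name, method name, and the `rsm_decision` argument are hypothetical.

```python
class RRT:
    """Toy model of one input port's RRT, indexed by ID-tag (0..15)."""

    def __init__(self, num_slots=16):
        self.slots = [None] * num_slots  # None marks a free routing slot

    def route(self, flit_type, id_tag, rsm_decision=None):
        """Return the output direction for a flit with the given ID-tag."""
        if flit_type == "header":
            if rsm_decision is None:
                raise ValueError("a header flit needs a decision from the RSM")
            self.slots[id_tag] = rsm_decision  # reserve the routing slot
            return rsm_decision
        direction = self.slots[id_tag]  # databody and tail index the table
        if direction is None:
            raise LookupError(f"no routing slot reserved for ID-tag {id_tag}")
        if flit_type == "tail":
            self.slots[id_tag] = None  # tail releases the slot as it passes
        return direction


rrt = RRT()
rrt.route("header", 3, rsm_decision="N")  # stored in slot 3
rrt.route("databody", 3)                  # fetched from slot 3
rrt.route("tail", 3)                      # fetched, then slot 3 is freed
```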
4.8 Bandwidth Space and ID-Slot Reservations
At the output port, there are two main components, i.e., an
arbiter unit and a crossbar multiplexor with MIM. The
IDM consists of an ID slot table, a BW accumulator, and an ID
accumulator unit. Fig. 8a shows how the IDM unit
allocates a new local ID slot to a header flit
as its new ID tag. The ID tag of the header flit is 0, and it
requests a communication rate of 80 MB/s.
First, when a header flit type is detected, a free ID slot is
searched for. As shown in Fig. 8a, the new ID
slot (IDN) 3 is found free, and it is then used as the new
ID tag for the header flit. In the same cycle, the previous
ID tag of the header and the port from which the header flit
comes are written in slot number 3. The select signal set
by the arbiter unit will determine from which port the
header flit comes. The BW accumulator unit increments
the actual reserved BW space (the increment is equal to
the required BW of the header). Meanwhile, the ID
accumulator unit also increments the number of reserved ID
slots from 2 to 3 ($usedID \leftarrow usedID + 1$).
Fig. 6. Packet format for the CBWA and BWA adaptive routing selection
strategy.
Fig. 7. ID-based routing table reservation and assignment.
When a databody flit (ID-tag 0) belonging to the same
message as the previous header flows through the
MIM component, as presented in Fig. 8b, the IDM unit
checks the current ID-tag of the databody flit and the port
from which it comes. As shown in Fig. 8b, the pair of
both signals (ID-tag 0, from L port) is detected in ID slot
number 3; thus, the databody flit is also assigned ID-tag
number 3, the same ID given to the header that was
previously switched (see Fig. 8a). When a tail flit flows
through the MIM component, the same operation takes
place as for a databody flit, but in the
same cycle the BW reservation is subtracted
from the BW accumulator unit and the ID reservation is
removed from the ID slot table.
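The ID slot table and the two accumulators can be modeled behaviorally as below. This is a minimal sketch under stated assumptions: the lowest-index search for a free slot is an assumption (the text does not say how a free slot is chosen), and the class and method names are hypothetical. The tail flit is given the BW value to release because the text states that the 12-bit BW field is present in both header and tail flits.

```python
class IDManager:
    """Toy model of the IDM unit at one output port."""

    def __init__(self, num_slots=16):
        self.slot_table = [None] * num_slots  # slot -> (old ID-tag, input port)
        self.used_bw = 0  # BW accumulator
        self.used_id = 0  # ID accumulator

    def forward(self, flit_type, id_tag, in_port, req_bw=0):
        """Return the new ID-tag for a flit arriving as (id_tag, in_port)."""
        key = (id_tag, in_port)
        if flit_type == "header":
            new_id = self.slot_table.index(None)  # ValueError if no slot is free
            self.slot_table[new_id] = key
            self.used_bw += req_bw
            self.used_id += 1
            return new_id
        new_id = self.slot_table.index(key)  # ValueError for an unknown message
        if flit_type == "tail":
            self.slot_table[new_id] = None  # release the ID slot
            self.used_bw -= req_bw          # release the BW reservation
            self.used_id -= 1
        return new_id


idm = IDManager()
for old_id, port, bw in [(5, "N", 100), (1, "E", 50), (2, "S", 20)]:
    idm.forward("header", old_id, port, bw)     # occupies slots 0, 1, 2
print(idm.forward("header", 0, "L", 80))        # 3
print(idm.forward("databody", 0, "L"))          # 3
print(idm.forward("tail", 0, "L", 80), idm.used_bw, idm.used_id)  # 3 170 3
```

In this run three slots are occupied beforehand so that the lowest-index search lands on slot 3; this setup differs from the occupancy shown in Fig. 8a and is only meant to exercise the mechanism.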
5 EXPERIMENTAL RESULTS
In this section, five NoC prototypes with different
adaptive routing selection strategies are simulated. The
adaptive routing in all five prototypes is minimal, meaning
that messages are never routed away from their
destination node. Thus, a message has at most
two alternative routing directions at an intermediate node.
Two performance metrics are used to evaluate the NoCs,
i.e., the average bandwidth and the tail flit
acceptance latency measured at each target node. We also measure
the injection and acceptance rates in every cycle for each
communication pair (at both the source and target nodes). We
also present the distribution of the BW reservations for each
scenario to give an overview of the hotspot locations in the
network during simulation. These performance measurements are
of interest in our NoC context because of the specific
packet format and the use of bandwidth-oriented
adaptive routing. Thus, the transient and steady-state behaviors
of the NoC can be analyzed in detail.
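The statement that minimal routing leaves at most two alternative directions can be illustrated with a short sketch for a 2D mesh. Coordinates are assumed to be (x, y) with x growing eastward and y growing northward, and "L" denotes the local port. The sketch ignores any turn restrictions a deadlock-free routing function would add, so it only shows the upper bound on the candidate set.

```python
def minimal_output_ports(current, target):
    """Return the output ports that move a message closer to `target`.

    Both arguments are (x, y) tuples. At most one port per dimension is
    productive, so the result never has more than two entries.
    """
    (cx, cy), (tx, ty) = current, target
    ports = []
    if tx > cx:
        ports.append("E")
    elif tx < cx:
        ports.append("W")
    if ty > cy:
        ports.append("N")
    elif ty < cy:
        ports.append("S")
    return ports or ["L"]  # already at the target: deliver locally


print(minimal_output_ports((1, 1), (3, 4)))  # ['E', 'N']
print(minimal_output_ports((2, 3), (2, 1)))  # ['S']
```

When two ports are returned, a selection strategy such as the ones compared in this section chooses between them; when only one is returned, there is nothing to adapt.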
5.1 Transpose Scenario in a 4 × 4 Mesh Network
Fig. 9 shows the tail flit acceptance (latency) measurement
in clock cycles under the transpose scenario in a 4 × 4 mesh
network, in which a source node located at $(i, j)$ sends
packets to a target node located at $(j, i)$, where $i \neq j$. Hence,
there are 12 communication pairs with 12 tail flit latency
measurements ($L_k$, $k \in \{1, 2, \ldots, 12\}$). In the simulation, we
measure the number of clock cycles needed to receive the tail flit at
each target node. The average latency is then formulated as
$\frac{1}{12}\sum_{k=1}^{12} L_k$.
The measurements are made for eight different injection
rates, i.e., $1/16$, $1/8$, $1/6$, $1/5$, $1/4$, $1/3$, $1/2$, and 1 flit per
cycle (fpc). A rate of $1/S$ fpc means that one flit is injected
at each source node every $S$ cycles. We inject 500 flits from
each source node and measure the NoC performance for each
injection rate. Thus, the latency of the 500th flit is measured
and presented in the figure. In this case, we set node $(1, 1)$ at
the south-west corner as node number 1, and node $(4, 4)$ at
the north-east corner as node number 16. Hence, node 1,
node 6, node 11, and node 16, whose node addresses satisfy $i = j$,
show zero acceptance latency because these nodes
neither send nor receive messages.
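The transpose traffic pattern and the node numbering can be enumerated in a few lines. The row-major numbering used here is an assumption; it is one mapping that is consistent with node (1, 1) being number 1 and node (4, 4) being number 16, and the diagonal nodes come out the same under a column-major mapping.

```python
N = 4  # mesh dimension


def node_number(i, j):
    """Map 1-based coordinates (i, j) to a node number (row-major, assumed)."""
    return (j - 1) * N + i


# Every off-diagonal node (i, j) sends to its transposed position (j, i).
pairs = [((i, j), (j, i))
         for i in range(1, N + 1)
         for j in range(1, N + 1)
         if i != j]

# Diagonal nodes (i == j) would target themselves, so they stay idle.
idle_nodes = sorted(node_number(i, i) for i in range(1, N + 1))

print(len(pairs))   # 12
print(idle_nodes)   # [1, 6, 11, 16]
```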
As depicted in Fig. 9, when the NoC is not
saturated (the injection rate is lower than the rate at which
the NoC becomes saturated), the latency tends to
increase as the injection rate decreases. However, it tends to
converge to a certain value as the injection rate becomes higher
and starts to saturate the NoC. When the NoC
is saturated, the link-level flit flow control used in our NoC
keeps data injection at the source node continuous
and dynamically follows the variable data rate
at the target node. This unique performance
characteristic is achieved by the specific
wormhole cut-through switching method [22], which switches
and routes packets flit by flit and interleaves them
with each other at flit level. From Fig. 9, we can also see that
the performance of the congestion-aware (FQ-version)
adaptive routing technique is lower than that of the other
adaptive routing strategies.
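One hedged way to read the unsaturated trend is through the injection time of the tail flit. Assuming, as an idealization introduced here, that a source injects its $n$th flit at cycle $(n-1)S$ and that latency is counted from the first injection, the tail flit of an $N_{\mathrm{flit}}$-flit message cannot be accepted before it has been injected:

```latex
L_{\mathrm{tail}} \;\ge\; (N_{\mathrm{flit}} - 1)\, S
% With N_flit = 500:  S = 16 -> 7984 cycles,  S = 8 -> 3992 cycles,  S = 2 -> 998 cycles
```

Under this assumption a lower injection rate (larger $S$) raises the bound, which matches the direction of the trend described for the unsaturated region. The values in the comment are bounds computed from the formula, not measurements from Fig. 9.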
The average tail flit latency under the transpose scenario in
the 4 × 4 mesh NoC for different workload sizes is shown in
Fig. 10. The initial injection time on every data producer
node is set randomly for the three scenarios. Fig. 10a shows
the simulation result when the injection rate is set to $1/8$ or
0.125 fpc. This scenario represents a case where the network
is not saturated. For the saturating condition, we
also analyze the average network latency using an injection
rate of $1/2$ or 0.5 fpc, as presented in Fig. 10b. In Fig. 10c,
injection rates of $1/2$, $1/3$, $1/4$, and $1/5$ fpc are applied randomly
to the source nodes.
As shown in the three subfigures, the latency increases
linearly as the workload size increases. The same behavior
is observed for the different settings of the data injection rate.
This is again a unique performance characteristic of our
NoC, which uses the novel wormhole cut-through switching
method [22].
The bandwidth space occupancy for every output port of
all network nodes is presented in Fig. 11. The simulation
result is obtained by randomly applying different injection
rates to the source nodes. The set of injection rates is $1/2$,
$1/3$, $1/4$, and $1/5$ fpc. We can see that the distribution of the
bandwidth hotspots in this scenario varies with the
adaptive routing selection strategy.
1418 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 24, NO. 7, JULY 2013
Fig. 8. Local ID slot reservation.
Fig. 9. The tail flit acceptance measurements under transpose scenario
in a 4 × 4 mesh network.
Different initial injection times could give different
performance evaluation results, because the correct decision
on an optimal routing direction depends strongly
on the dynamic neighbor states, i.e., the FIFO buffer
occupancy, bandwidth space, and ID slot reservation of
the link at a certain instant, as explained in Section 2.
The experimental result presented in this section is one of
many simulations that could be run to test the performance
of the five selected adaptive router prototypes. The
following section will describe another simulation result
in a larger network with a bit-complement data
distribution scenario.
6 SYNTHESIS RESULTS
The synthesis results of the five adaptive NoC routers with
different routing selection functions are presented in Table 1.
The NoC routers are synthesized using a 130-nm CMOS
standard-cell library from Faraday Technology. The target
data frequency for the five adaptive NoC router prototypes
is 1 GHz. The table presents the total logic cell area and the
estimated dynamic power (net switching and cell internal
power). The table shows that the BW-ID version of
the BWA adaptive routers has a larger logic cell area and
higher power than the other prototypes.
Table 1 also shows that the BW-version of the
BWA adaptive router has a larger logic cell area than the
FQ-version. The area overhead is due to the
bandwidth accumulator unit, which is integrated in each
crossbar multiplexor component of the router together with
the IDM unit. As presented in the table, the ID-version of
the adaptive NoC router has the smallest logic cell area
among the adaptive NoC prototypes.
7 CONCLUSIONS
The CBWA adaptive NoC routers, which select the best
outgoing port at runtime based on the bandwidth occupancy
and the number of free reservable ID slots, have been
presented in this paper.
The awareness of the routing engine units of the number
of free bandwidth spaces at alternative outgoing ports is
aimed at avoiding congestion situations in which the
bandwidth capacity of communication channels is overloaded.
In any case, the BWA adaptive routing selection
strategy helps to balance the utilization of
the total NoC bandwidth capacity provided by the
communication channels. The CBWA adaptive routing
method considers not only the bandwidth space occupancy
but also the number of messages contending to acquire the
alternative output ports. Hence, the CBWA adaptive
routing method would theoretically balance
the distribution of traffic on the NoC links.
The BWA adaptive routing selection strategy could
potentially be used in heterogeneous NoC-based
multiprocessor systems, especially in a
case where several processing element cores inject
data into the NoC at different injection rates. The
differences are due to the application requirements, or to the
maximum rate of the tile processors and the complexity of
the tasks executed on each tile processor.
Fig. 10. Average tail flit acceptance latency under transpose scenario in a 4 × 4 mesh NoC.
Fig. 11. Bandwidth space reservation at each output port under
transpose scenario in a 4 × 4 mesh NoC.
TABLE 1
Synthesis Results of the Adaptive Router Prototypes
ACKNOWLEDGMENTS
The authors would like to thank the reviewers for their
comments on and criticism of the paper's content and
presentation, and DAAD (Deutscher Akademischer Austausch-Dienst,
German Academic Exchange Service) for awarding a DAAD
scholarship to Faizal Arya Samman to pursue a doctoral
degree at Technische Universität Darmstadt in Germany.
The authors would also like to thank LOEWE-Zentrum
AdRIA at Fraunhofer Institute LBF Darmstadt for further
cooperation and for the possible implementation of the concept
and the switch architecture to design adaptive
multiprocessing systems for adaptronic systems within Project
AdRIA (Adaptronik-Research, Innovation, Application),
funded by the Hessian Ministry of Science and Arts under grant
number IIIL4 - 518/14.004 (2008).
REFERENCES
[1] M.A. Al Faruque, T. Ebi, and J. Henkel, “Run-time Adaptive on-
Chip Communication Scheme,” Proc. IEEE/ACM Int’l Conf.
Computer-Aided Design (ICCAD ’07), pp. 26-31, 2007.
[2] G. Ascia, V. Catania, M. Palesi, and D. Patti, “Implementation and
Analysis of a New Selection Strategy for Adaptive Routing in
Networks-on-Chip,” IEEE Trans. Computers, vol. 57, no. 6, pp. 809-
820, June 2008.
[3] M. Coppola, M.D. Grammatikakis, R. Locatelli, G. Maruccia, and
L. Pieralisi, Design of Cost-Efficient Interconnect Processing Units:
Spidergon STNoC. CRC Press, 2009.
[4] W.J. Dally and C.L. Seitz, “Deadlock-Free Message Routing in
Multiprocessor Interconnection Networks,” IEEE Trans. Computers,
vol. C-36, no. 5, pp. 547-553, May 1987.
[5] J. Duato, “A New Theory of Deadlock-Free Adaptive Routing in
Wormhole Networks,” IEEE Trans. Parallel and Distributed Systems,
vol. 4, no. 12, pp. 1320-1331, Dec. 1993.
[6] C.J. Glass and L.M. Ni, “The Turn Model for Adaptive
Routing,” Proc. 19th Int’l Symp. Computer Architecture, pp. 278-
287, 1992.
[7] J. Hu and R. Marculescu, “DyAD: Smart Routing for Networks-
on-Chip,” Proc. 41st Ann. Design Automation Conf. (DAC ’04),
pp. 260-263, 2004.
[8] M. Koibuchi, K. Anjo, Y. Yamada, A. Jouraku, and H. Amano,
“A Simple Data Transfer Technique Using Local Address for
Networks-on-Chip,” IEEE Trans. Parallel and Distributed Systems,
vol. 17, no. 12, pp. 1425-1437, Dec. 2006.
[9] M. Li, Q.-A. Zeng, and W.-B. Jone, “DyXY: A Proximity
Congestion-Aware Deadlock-Free Dynamic Routing Method for
Network on Chip,” Proc. 43rd Ann. Design Automation Conf.
(DAC ’06), pp. 849-852, 2006.
[10] S.-Y. Li, C.-H. Huang, C.-H. Chao, K.-H. Huang, and A.-Y. Wu,
“Traffic-Balanced Routing Algorithm for Irregular Mesh-Based
On-Chip Networks,” IEEE Trans. Computers, vol. 57, no. 9,
pp. 1156-1168, Sept. 2008.
[11] D.H. Linder and J.C. Harden, “An Adaptive and Fault Tolerant
Wormhole Routing Strategy for k-ary n-Cubes,” IEEE Trans.
Computers, vol. 40, no. 1, pp. 2-12, Jan. 1991.
[12] P. Lotfi-Kamran, M. Daneshtalab, C. Lucas, and Z. Navabi,
“BARP: A Dynamic Routing Protocol for Balanced Distribution of
Traffic in NoCs,” Proc. Conf. Design, Automation and Test in Europe
(DATE ’08), pp. 1408-1413, 2008.
[13] R. Marculescu, U.Y. Ogras, L.-S. Peh, N.E. Jerger, and Y. Hoskote,
“Outstanding Research Problems in NoC Design: System,
Microarchitecture, and Circuit Perspectives,” IEEE Trans.
Computer-Aided Design of Integrated Circuits and Systems, vol. 28,
no. 1, pp. 3-21, Jan. 2009.
[14] A. Mejia, J. Flich, and J. Duato, “On the Potentials of Segment-
Based Routing for NoCs,” Proc. 37th Int’l Conf. Parallel Processing,
pp. 594-603, 2008.
[15] A. Mejia, M. Palesi, J. Flich, S. Kumar, P. López, R. Holsmark, and
J. Duato, “Region-Based Routing: A Mechanism to Support
Efficient Routing Algorithms in NoCs,” IEEE Trans. Very Large
Scale Integration Systems, vol. 17, no. 3, pp. 356-369, Mar. 2009.
[16] S. Murali, D. Atienza, L. Benini, and G. De Micheli, “A Method for
Routing Packets across Multiple Paths in NoCs with In-Order
Delivery and Fault-Tolerance Guarantees,” J. Hindawi Pub. VLSI
Design, vol. 2007, pp. 1-11, 2007.
[17] E. Nilsson, M. Millberg, J. Oberg, and A. Jantsch, “Load
Distribution with the Proximity Congestion Awareness in a
Network on Chip,” Proc. Design and Test in Europe, Conf. and
Exhibition (DATE ’03), pp. 1126-1127, Mar. 2003.
[18] J.L. Nunez-Yanez, D. Edwards, and A.M. Coppola, “Adaptive
Routing Strategies for Fault-Tolerant On-Chip Networks in
Dynamically Reconfigurable Systems,” IET Computers and Digital
Techniques, vol. 2, no. 3, pp. 184-198, 2008.
[19] M. Palesi, R. Holsmark, S. Kumar, and V. Catania, “Application
Specific Routing Algorithms for Networks on Chip,” IEEE
Trans. Parallel and Distributed Systems, vol. 20, no. 3, pp. 316-330,
Mar. 2009.
[20] M. Palesi, G. Longo, S. Signorino, R. Holsmark, S. Kumar, and V.
Catania, “Design of Bandwidth Aware and Congestion Avoiding
Efficient Routing Algorithms for Networks-on-Chip Platforms,”
Proc. Second ACM/IEEE Int’l Symp. Networks-on-Chip (NOCS ’08),
pp. 97-106, 2008.
[21] F.A. Samman, T. Hollstein, and M. Glesner, “Adaptive and
Deadlock-Free Tree-Based Multicast Routing for Networks-on-
Chip,” IEEE Trans. Very Large Scale Integration Systems, vol. 18,
no. 7, pp. 1067-1080, July 2010.
[22] F.A. Samman, T. Hollstein, and M. Glesner, “Wormhole Cut-
through Switching: Flit-Level Messages Interleaving for Virtual-
Channelless Network-on-Chip,” Microprocessors and Microsystems,
vol. 35, no. 3, pp. 343-358, May 2011.
[23] M.K.-F. Schäfer, T. Hollstein, H. Zimmer, and M. Glesner,
“Deadlock-Free Routing and Component Placement for Irregular
Mesh-based Network-on-Chip,” Proc. IEEE/ACM Int’l Conf.
Computer-Aided Design (ICCAD ’05), pp. 238-245, 2005.
[24] D. Wu, B.M. Al-Hashimi, and M.T. Schmitz, “Improving Routing
Efficiency for Network-on-Chip through Contention-Aware Input
Selection,” Proc. Asia and South Pacific Design Automation Conf.
(ASP-DAC ’06), pp. 36-41, 2006.
[25] T.T. Ye, L. Benini, and G. De Micheli, “Packetization and Routing
Analysis of On-Chip Multiprocessor Networks,” J. System
Architecture, vol. 50, nos. 2/3, pp. 81-104, 2004.
Faizal Arya Samman received the bachelor of
engineering degree in electrical engineering
from Universitas Gadjah Mada, Yogyakarta, in
1999 and the master of engineering degree from
Institut Teknologi Bandung (Control and Com-
puter System Laboratory) in 2002 with Scholar-
ship Award from Indonesian Ministry of National
Education. In 2002, he was appointed as a
research and teaching staff member at Universitas
Hasanuddin in Makassar, Indonesia. He received
the PhD degree in 2010 from Technische Universität Darmstadt,
Germany, with a scholarship award (2006-2010) from Deutscher
Akademischer Austausch-Dienst (DAAD, German Academic Exchange
Service). He is currently conducting postdoctoral research at
LOEWE-Zentrum AdRIA (Adaptronik-Research, Innovation, Application)
within the research cooperation framework between Technische
Universität Darmstadt and Fraunhofer Institut LBF in Darmstadt. His
research interests include network on-chip (NoC) microarchitecture,
NoC-based multiprocessor system-on-chip, design and implementation
of analog and digital electronic circuits for control system applications on
FPGA/ASIC as well as energy harvesting systems and wireless sensor
networks. He is a member of the IEEE.
Thomas Hollstein received the graduation
degree in electrical engineering/computer en-
gineering from the Darmstadt University of
Technology in 1991. In 1992, he joined the
research group of the Microelectronic Systems
Lab at Darmstadt University of Technology. He
was involved in several research projects in
neural and fuzzy computing and industrial
VHDL-based design. In 1995, he focused his
research on hardware/software codesign and in
2000 he received the PhD degree on “Design and interactive Hardware/
Software Partitioning of complex heterogeneous Systems” at Darmstadt
University of Technology. Since 2000, he has been a senior researcher,
leading a research group focusing on System-on-Chip communication
architectures, the design of reconfigurable HW/SW Systems-on-Chip,
and integrated SoC test and debug methodologies. His current research
interests are in the fields of Networks-on-Chip, Hardware/Software
Co-Design, Systems-on-Chip design, printable organic and inorganic
electronics, and RFID circuit and system design. Furthermore, he gives
lectures on VLSI design and CAD methods. Since 2001, he has been
a member of a leadership team initiating and establishing a new international
master programme in “Information and Communication Engineering” at
Darmstadt University of Technology. In 2010, he was appointed as a
professor at Tallinn University of Technology in the Department of
Computer Engineering, Dependable Embedded Systems Group. He is a
member of the IEEE.
Manfred Glesner received the diploma degree
and the PhD degree from Saarland University,
Saarbrücken, Germany, in 1969 and 1975,
respectively. His doctoral research was based
on the application of nonlinear optimization
techniques in computer-aided design of electronic
circuits. He received three Doctor Honoris
Causa degrees from Tallinn Technical University,
Estonia (1996), Polytechnical University of
Bucharest, Romania (1997), and Mongolian
Technical University, Ulan Bator (2006). Between 1969 and 1971, he
conducted research on radar signal development at the Fraunhofer Institute
in Werthoven/Bonn, Germany. From 1975 to 1981, he was a lecturer in
the areas of electronics and CAD with Saarland University. In 1981, he
was appointed as an associate professor in electrical engineering with
the Darmstadt University of Technology, Germany, and in 1989 was
appointed as a full professor for microelectronic system design. His
current research interests include advanced design and CAD for micro-
and nanoelectronic circuits, reconfigurable computing systems and
architectures, organic circuit design, RFID design, mixed-signal circuit
design, and process variations robust circuit design. With the EU-based
TEMPUS initiative, he built up several microelectronic design centers in
Eastern Europe. Between 1990 and 2006, he acted as a speaker of two
DFG-funded graduate schools. He is a member of several technical
societies and he is active in organizing international conferences. Since
2003, he has been the vice-president of the German Information
Technology Society (ITS) in VDE and also a member of the DFG
decision board for electronic semiconductors, components, and inte-
grated systems. He was a recipient of the honor/decoration of “Palmes
Academiques” in the order of Chevalier by the French Minister of
National Education (Paris) for distinguished work in the field of education
in 2007/2008. He is a fellow of the IEEE.
