Designing Message-Dependent Deadlock Free Networks on Chips for Application-Specific Systems on Chips by Murali, Srinivasan et al.
Designing Message-Dependent Deadlock Free Networks on Chips for
Application-Specific Systems on Chips
Srinivasan Murali, Paolo Meloni§, Federico Angiolini‡, David Atienza†+, Salvatore Carta¶,
Luca Benini‡, Giovanni De Micheli†, Luigi Raffo§
CSL, Stanford University, Stanford, USA, smurali@stanford.edu
§DIEE, University of Cagliari, Cagliari, Italy, {paolo.meloni@diee.unica.it, luigi@diee.unica.it}
‡DEIS, University of Bologna, Bologna, Italy, {fangiolini@deis.unibo.it, lbenini@deis.unibo.it}
¶DMI, University of Cagliari, Cagliari, Italy, salvatore@unica.it
† LSI, EPFL, Lausanne, Switzerland, {david.atienza, giovanni.demicheli}@epfl.ch
+DACYA, Complutense University of Madrid (UCM), Madrid, Spain.
Abstract
Networks on Chips (NoCs) have emerged as the paradigm
for designing scalable communication architectures for
Systems on Chips (SoCs). Avoiding the conditions that can lead
to deadlocks in the network is critical for using NoCs in
real designs. Methods that can lead to deadlock-free
operation with minimum power and area overhead are
important for designing application-specific NoCs. A major
class of deadlocks that occur in NoCs is due to the
dependencies among the resources shared by different
message types. In this work, we consider the problem of
avoiding message-dependent deadlocks during the NoC topology
synthesis phase. We show that by considering this issue
during topology synthesis, we can obtain a significantly better
NoC design than traditional methods, where the deadlock
avoidance issue is dealt with separately. Our experiments
on several SoC benchmarks show that our proposed scheme
provides a large reduction in NoC power consumption (an
average of 38.5%) and NoC area (an average of 30.7%) when
compared to traditional approaches.
Keywords: Networks on Chips, Systems on Chips,
Message-dependent deadlocks, routing-dependent dead-
locks, topology, synthesis.
1 Introduction
Today’s Systems on Chips (SoCs) consist of a large number
of computing and storage cores that are interconnected
by means of single or multiple layers of buses. In order
to cope with the large communication demands of such
SoCs, a modular, scalable interconnect based on Networks
on Chips (NoCs) is needed [1]-[6].
Designing a custom-tailored interconnect that satisfies
the performance and design constraints of the SoC is impor-
tant to achieve efficient NoC designs [27]-[32]. A critical,
but often neglected issue when designing NoCs is that they
have to guarantee deadlock-free operation. If the NoC has
no support to either avoid or recover from deadlocks, then
correct functionality of the system cannot be guaranteed.
This can lead to system crashes and unexpected system be-
havior, which is clearly unacceptable for SoCs. Designing
efficient methods that avoid such a situation with minimum
power and area overhead is an important research area in
the NoC domain.
The deadlocks that can occur in NoCs can be broadly
categorized into two classes: routing-dependent deadlocks
and message-dependent deadlocks [33], [7]-[12]. Routing-
dependent deadlocks occur when there is a cyclic depen-
dency of resources created by the packets on the various
paths in the network. For regular topologies (such as the
mesh, torus), the use of restricted routing functions based on
turn models is an effective way to avoid routing-dependent
deadlocks [9], [10]. For custom application-specific NoCs,
obtaining deadlock-free paths is a bigger challenge. In fact,
recently there have been several works that have addressed
deadlock-free path selection mechanisms for custom NoC
designs [12], [22], [32]. While there are several works
that target freedom from routing-dependent deadlocks in
NoCs, relatively few works exist on obtaining freedom from
message-dependent deadlocks. The major focus of this pa-
per is to address this important issue of obtaining message-
dependent deadlock-free network operation.
Message-dependent deadlocks occur when interactions
and dependencies are created between different message
types at network endpoints, when they share resources in
the network. Even when the underlying network is designed
to be free from routing-dependent deadlocks, the message-
level deadlocks can block the network indefinitely, thereby
affecting the proper system operation. An example situation
where a message-dependent deadlock occurs is presented in
Figure 1(a). In this example, two of the cores are masters
and two other cores are slaves. In this system, we assume
two kinds of messages: request and response. Consider
the following situation: Master 1 sends a request to Slave
1 (Req 1), Slave 1 is replying to a previously issued request
to Master 1 (Resp 1) and at the same time, Slave 2 sends
a response to Master 2 (Resp 2). When requests and responses
share the same links, Resp 2 waits for link 1,
which is used by Req 1, and Resp 1 waits for link 4, used by
Resp 2. Meanwhile, Req 1 is waiting for Slave 1, the operation
of which has been stalled as Resp 1 could not complete.
Thus, none of the messages can move ahead, leading
to a deadlock situation. An interesting point to note here is
that message-level deadlocks can be avoided if the receivers
have infinitely large buffering or if they have perfectly ideal
operation (consuming all received data instantly), which
would avoid queuing of the packets in the network.
Obviously, such a solution is not feasible in practice.

[Figure 1: (a) Example of deadlock; (b) Logically separate networks (2 virtual channels: one for request, other for response); (c) Physically separate networks (request network and response network)]

3-901882-19-7 2006 IFIP
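The circular wait in this example can be made explicit with a wait-for graph. The following Python sketch is only illustrative: the graph encoding, the node names and the `find_cycle` helper are assumptions introduced here and are not part of the synthesis flow. An edge A -> B means that A cannot progress until B does.

```python
# Wait-for graph for the scenario of Figure 1(a).
WAITS_FOR = {
    "Req 1": ["Resp 1"],   # Req 1 waits for Slave 1, which is stalled on Resp 1
    "Resp 1": ["Resp 2"],  # Resp 1 waits for link 4, held by Resp 2
    "Resp 2": ["Req 1"],   # Resp 2 waits for link 1, held by Req 1
}


def find_cycle(graph):
    """Return one cycle as a list of nodes, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:                      # back edge: a cycle is closed
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color.get(node, WHITE) == WHITE:
            found = visit(node)
            if found:
                return found
    return None


print(find_cycle(WAITS_FOR))

# If Resp 2 no longer has to wait for a resource held by a request,
# the circular wait disappears.
SEPARATED = {"Req 1": ["Resp 1"], "Resp 1": ["Resp 2"], "Resp 2": []}
print(find_cycle(SEPARATED))
```

The second graph anticipates the remedy discussed next: once responses never wait on resources held by requests, no cycle can form in this example.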
In traditional multi-processor interconnection networks,
the most common ways to avoid message-dependent dead-
locks are the use of separate logical or physical networks for
the different message types [13]-[21]. This would ensure
that the different message types do not share the network
components, thereby guaranteeing freedom from message-
dependent deadlocks. The most common method to achieve
separate logical networks is the use of separate virtual
channels for the different message types [13]. For the ex-
ample design presented in Figure 1(a), each router input will
need two virtual channels: one for the request messages and
the other for the response messages (refer to Figure 1(b)). This
separation of message types is maintained at all the switches
in the network. In the case of separate physical networks,
the request network is built separately from the response
network, an example of which is shown in Figure 1(c). This
is the most commonly used solution in complex bus designs
such as the STBus from STMicroelectronics [19] and sev-
eral multi-processor designs [20], [21].
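Both forms of separation enforce the same invariant: no network resource carries more than one message type. A minimal Python sketch of a check for this invariant is given below. The routes, the resource naming and the `mixed_type_resources` helper are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict


def mixed_type_resources(flows):
    """Return {resource: message types} for resources carrying more than one type.

    flows: iterable of (message_type, [resources used by the flow]).
    """
    types_on = defaultdict(set)
    for msg_type, resources in flows:
        for resource in resources:
            types_on[resource].add(msg_type)
    return {res: types for res, types in types_on.items() if len(types) > 1}


# Hypothetical routes on a shared network: requests and responses share links.
shared = [
    ("request", ["link 1"]),              # Req 1
    ("response", ["link 4"]),             # Resp 1
    ("response", ["link 4", "link 1"]),   # Resp 2
]

# Logical separation: one virtual channel per message type on every link,
# so the resource becomes the pair (link, message type).
per_type_vc = [(t, [(link, t) for link in links]) for t, links in shared]

print(mixed_type_resources(shared))       # link 1 carries both message types
print(mixed_type_resources(per_type_vc))  # empty: nothing is shared across types
```

Physical separation can be modeled in the same way by giving the request and response networks disjoint sets of link names.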
In this work, we show that by mapping the different
message types onto different network resources during the
topology mapping and synthesis phase, we can achieve
much better NoC designs (in terms of power consumption
and network area) than traditional approaches. We present a
topology synthesis algorithm that specifically considers the
message types and ensures the creation of a network that
is free from message-dependent deadlocks. We also im-
plement the common methods of deadlock avoidance: hav-
ing separate virtual channels and having physically separate
networks for the message types. For all the schemes, we
make the underlying network operation free from routing-
dependent deadlocks by applying existing methods from
[12]. We perform experiments on several SoC designs,
which show that our proposed scheme provides large reduc-
tion in the NoC power consumption (an average of 38.5%)
and area (an average of 30.7%) when compared to the tra-
ditional approaches.
2 Previous Work
The motivation for the use of NoCs has been established
in several works [1]-[6]. The use of turn models to avoid
deadlocks in mesh and torus networks has been presented
in [10]. There has been a large body of work that has focused
on developing routing-dependent deadlock-free operation
for interconnection networks [9]-[12]. Several other
works exist in the area of recovering from deadlocks in net-
works [7], [8].
The design of application specific NoCs has been ex-
plored in several works [24]-[30]. None of these works ad-
dress the issue of message-level deadlock avoidance, which
is critical for proper system operation. Avoiding routing-
dependent deadlocks for mesh topologies has been consid-
ered in [24]. Avoiding routing deadlocks for custom NoC
topologies has been presented in [22], [31], [32].
The use of logically separated networks to avoid
message-dependent deadlocks has been utilized in sev-
eral industrial multi-processors, such as [14]-[17]. The
use of physically separated networks to remove message-
dependent deadlocks is used in many designs, such as [20],
[21]. In [5], message-level deadlock freedom is achieved
by a different mechanism than using logically or physi-
cally separated networks. It utilizes an end-to-end flow
control scheme, which ensures that messages are sent from
the sender only when the receiver has enough buffering re-
sources to store them. This is coupled together with a net-
work design that uses time division multiplexing to divide
the network resources among the various communicating
elements, providing guaranteed throughput to connections.
This leads to a buffering-free network for such connections
and freedom from message-level deadlocks. The deadlock
avoidance mechanism using their protocol is presented in
[23]. As we target general NoC designs that need not sup-
port such end-to-end flow control mechanisms, we do not
compare such a scheme with our method presented here.
3 Topology Synthesis with Message-Level
Deadlock Freedom
We implement our message-dependent deadlock-free
path selection routine as a plug-in to an established NoC
topology synthesis flow. We assume that the application
kernels are parallelized and mapped onto different processors
and hardware cores using existing tools, as done in
earlier works [24]-[32]. The communication traffic flow
between the various cores is represented by a core graph,
which is taken as the input to the topology synthesis flow.
The core graph for a small filter example (Figure 1) is shown
in Figure 2. The edges of the core graph are annotated with
the sustained rate of traffic flow, multiplied by the criticality
level of the flow, as done in [26].

[Figure 1. Example filter application: ARM, Memory, Filter, FFT, IFFT and Display cores, with edges labeled by sustained traffic rates]
[Figure 2. Core graph with sustained rates and critical streams: vertices v1 to v6; the critical stream is weighted by 10]
[Figure 3. Algorithm Examples: (a) Min-cut partitions (p1, p2, p3) with request and response messages; (b) Path selection with edge costs between p1, p2 and p3]
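As an illustration of this annotation, the sketch below builds a core graph whose edge weights are the sustained rate multiplied by the criticality level. The `Flow` class, the `core_graph` helper and the particular endpoints chosen for each flow are assumptions made for this example; only the core names, the rates and the weighting factor of 10 are taken from the filter example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src: str
    dst: str
    rate: float           # sustained rate, in MB/s
    criticality: int = 1  # weighting factor applied to critical streams

    @property
    def weight(self) -> float:
        """Edge annotation: sustained rate multiplied by criticality level."""
        return self.rate * self.criticality


def core_graph(flows):
    """Map each directed (src, dst) pair to its annotated edge weight."""
    return {(f.src, f.dst): f.weight for f in flows}


flows = [
    Flow("ARM", "Memory", 200),
    Flow("Filter", "FFT", 100),
    Flow("ARM", "Display", 10, criticality=10),  # hypothetical critical stream
]
graph = core_graph(flows)
print(graph[("ARM", "Display")])  # 10 MB/s weighted by 10
```

Weighting the critical stream raises its edge to the same value as a 100 MB/s flow, so the min-cut partitioning treats it as equally important to keep local.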
Before presenting the message-dependent deadlock-free
path selection routine, we first present the basic topology
synthesis flow. In the topology synthesis procedure (Algo-
rithm 1), we synthesize several topologies: starting from
a topology where all the cores are connected to a single
switch to a topology where each core is connected to a separate
switch. For the chosen switch count, the input core
graph is partitioned into that many min-cut partitions (refer
to step 2 of Algorithm 1). At this point, the communication
traffic flows within a partition have been resolved.
Now, we integrate in the main flow the core contribution
of this work (in step 4 of Algorithm 1), i.e. an algorithm
(PATH COMPUTE) that maps the communication flows to
physical paths while guaranteeing deadlock freedom. This
algorithm is explained in detail in the following paragraphs.
Once the paths for a topology are selected, Algorithm 1 re-
sumes, where the design area, power consumption and wire-
length for the topologies are obtained. Then, the topology
that best optimizes the user objectives and satisfies all the
design constraints is chosen. The topology synthesis flow,
without considering freedom from message level deadlocks,
has been presented by us in detail in [32].
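The outer loop of this flow can be sketched in Python as follows. This is a skeleton under stated assumptions, not the implementation from [32]: the partitioner, the path computation, the feasibility check and the cost function are passed in as hypothetical callables, and the toy stand-ins used in the usage example do not perform real min-cut partitioning or power estimation.

```python
def synthesize_topology(cores, partition, path_compute, evaluate, feasible):
    """Sweep the switch count and keep the best feasible design point.

    Returns (cost, switch_count, design), or None if no design is feasible.
    """
    best = None
    for switch_count in range(1, len(cores) + 1):
        partitions = partition(cores, switch_count)   # min-cut partitioning
        design = path_compute(partitions)             # inter-switch connectivity
        if design is None or not feasible(design):    # constraint checks
            continue
        cost = evaluate(design)                       # user objective
        if best is None or cost < best[0]:
            best = (cost, switch_count, design)
    return best


# Toy stand-ins (hypothetical) to exercise the loop.
cores = ["v1", "v2", "v3", "v4", "v5", "v6"]


def toy_partition(cores, k):
    return [cores[i::k] for i in range(k)]   # round-robin, not a real min-cut


def toy_path_compute(partitions):
    return {"switches": len(partitions)}


def toy_feasible(design):
    return design["switches"] >= 2


def toy_cost(design):
    return abs(design["switches"] - 3)       # pretend 3 switches is optimal


best = synthesize_topology(cores, toy_partition, toy_path_compute,
                           toy_cost, toy_feasible)
print(best)
```

Passing the stages as callables keeps the sweep independent of how partitioning, path selection and floorplanning are actually carried out.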
Now, we explain the path selection mechanism (Al-
gorithm 2) that guarantees message-dependent deadlock-
free operation of the NoC. In the first step of the
PATH COMPUTE algorithm, the flows are ordered in de-
creasing rate requirements, such that bigger flows are as-
signed first. The heuristic of assigning bigger flows first
has been shown to provide better results (such as lower
power consumption and more easily satisfying bandwidth
constraints) in several earlier works [25], [31]. Then, for
each flow in order, we first evaluate the message type of
the flow (step 2 of Algorithm 2). The message types can
either be fed explicitly by the user, or can be implicitly
considered by the tool. As an example for implicitly con-
sidering the type, in shared memory systems, all the traf-
fic flows that originate from processors and terminate into
memory devices are of request type, while those that originate
from the memories and terminate in the processors are
of response type. Note that in shared memory systems, all
inter-processor communication occurs through the memory
devices. Also, if the connection between any pair of
cores carries multiple message types, then each message
type needs to be treated as a separate traffic flow.

Algorithm 1 Topology Design Algorithm
1: Vary the number of switches in the design from 1 to the
total number of cores in the design. Repeat steps 2 to 7
for each switch count.
2: For the chosen switch count, find that many min-cut
partitions of the communication graph. Cores in each
partition are attached to the same switch.
3: Check for bandwidth constraint violations when establishing
the switches. The bandwidth of each link is the
product of the NoC operating frequency and link width,
which are inputs to the flow.
4: Find the connectivity between the switches using the
function PATH COMPUTE (presented in Algorithm 2).
5: Evaluate the switch power consumption and average
hop-delay based on the selected paths.
6: Perform floorplanning of the design. Obtain the design area and
wire-lengths. Check for timing violations on the wires
and evaluate the power consumption on wires.
7: If the solution minimizes the objective and satisfies all constraints,
note the design point and the topology.
8: Choose the best topology and design point based on the
user objectives.
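The implicit classification rule for shared memory systems can be sketched as below. The function name `infer_message_type` and the small core sets are hypothetical, chosen only to illustrate the rule. How a tool should treat flows that match neither pattern is not specified here, so the sketch simply rejects them.

```python
def infer_message_type(src, dst, processors, memories):
    """Classify a traffic flow from the roles of its endpoints."""
    if src in processors and dst in memories:
        return "request"
    if src in memories and dst in processors:
        return "response"
    raise ValueError(f"message type of flow {src} -> {dst} must be given explicitly")


processors = {"Proc 0", "Proc 1"}
memories = {"PM 0", "PM 1", "SHM"}

# Each direction of a processor-memory connection is a separate traffic flow.
connection = [("Proc 0", "SHM"), ("SHM", "Proc 0")]
typed_flows = [(s, d, infer_message_type(s, d, processors, memories))
               for s, d in connection]
print(typed_flows)
```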
Next, we evaluate the amount of power that will be dis-
sipated across each of the switches, if the traffic for the cho-
sen flow uses that switch (steps 3-5 of Algorithm 2). This
power dissipation value on each switch depends on the size
of the switch, the amount of traffic already routed on the
switch and the frequency of operation. It also depends on
how the switch is reached (from which other switch) and
whether an already existing physical channel will be used
to reach the switch or a new physical channel will have to
be opened. The last information is needed, because open-
ing a new physical channel increases the switch sizes and
hence the power consumption of this flow and others that
are routed through the switch.
In our NoC architecture, we permit the instantiation of
multiple physical links between any two switches. When
finding whether a switch is reachable from another switch
for the current traffic flow, we evaluate whether any physical
links between the switches have already been established. If
so, we check the message type of the traffic flows that have already
been routed on the links. From the set of established
links, we choose a link that supports the same message type
as the current traffic flow and has enough bandwidth avail-
able to support the current flow. If no such link is available
between the switches, we evaluate the cost of opening up a
physical link for the current traffic flow.
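The link selection rule described above can be sketched as follows. The `Link` class, its fields and the `pick_link` helper are assumptions made for this illustration, and the bandwidth figures are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Link:
    msg_type: str       # message type assigned to this physical link
    capacity: float     # link bandwidth, in MB/s
    used: float = 0.0   # bandwidth already allocated to routed flows

    def can_carry(self, msg_type, rate):
        return self.msg_type == msg_type and self.used + rate <= self.capacity


def pick_link(links, msg_type, rate):
    """Return a reusable link between a switch pair, or None if a new
    physical link has to be opened for the current flow."""
    for link in links:
        if link.can_carry(msg_type, rate):
            return link
    return None


# Links already established between two switches (hypothetical values).
established = [Link("request", capacity=400, used=350)]

print(pick_link(established, "request", 50))    # same type, fits: reuse
print(pick_link(established, "request", 100))   # same type, no bandwidth: None
print(pick_link(established, "response", 50))   # different type: None
```

A `None` result corresponds to the case where the cost of opening a new physical link must be evaluated instead.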
The process of evaluating the power consumption for the
current traffic flow is repeated for all pairs of switches. Fi-
nally (in step 6 of Algorithm 2), the set of links from the
source to destination of the flow that has the least power
consumption is chosen. When choosing the paths, routing-
dependent deadlocks are also avoided by applying an exist-
ing method from [12]. Now physical connections are actu-
ally established on the chosen path and the message type of
the current flow is assigned to the links that have been used
for the flow.
Algorithm 2 PATH COMPUTE
1: For each traffic flow in decreasing order of the band-
width requirements, perform steps 2 to 6.
2: Find the message type supported by the chosen traffic
flow.
3: For i1 from 1 to number of switches in the current de-
sign and j1 from 1 to number of switches in the current
design, repeat steps 4 and 5.
4: If one or more physical links exists between the
switches i1 and j1, evaluate whether any link exists that
has already been supporting the current message type
& has bandwidth to support the current flow. If so, find
the marginal power consumption to re-use this existing
link.
5: Else find the marginal power consumption for opening
and using the link for this traffic flow.
6: Find the least cost path (path with least power con-
sumption) across the switches. For any links that were
newly established for this traffic flow, associate the
message type of this flow to the links. When select-
ing paths, choose only those paths that have turns not
prohibited for removing routing-dependent deadlocks
(based on the method from [12]).
7: Return the chosen paths, new switch sizes, connectivity
between switches and the type of message supported by
each of the links.
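A compact Python sketch of the structure of Algorithm 2 is given below. It is a simplification under several assumptions that are not part of the original method: the marginal power model is replaced by two hypothetical constants (`reuse_cost` and `open_cost`), every link has the same capacity, each flow fits on a newly opened link, and the turn prohibition of [12] is omitted. The least cost path is found with Dijkstra's algorithm.

```python
import heapq


def path_compute(flows, n_switches, capacity, reuse_cost=0.1, open_cost=1.0):
    """flows: list of (src_switch, dst_switch, rate, msg_type).

    Returns (paths, links), where links[(i, j)] is a list of
    [message type, used bandwidth] entries, one per physical link.
    """
    links = {}
    paths = []

    def reusable(i, j, msg_type, rate):
        # Step 4: an existing link of the same type with spare bandwidth.
        for link in links.get((i, j), []):
            if link[0] == msg_type and link[1] + rate <= capacity:
                return link
        return None

    # Step 1: route the bigger flows first.
    for src, dst, rate, msg_type in sorted(flows, key=lambda f: -f[2]):
        # Steps 3 to 6: least cost path over all switch pairs.
        dist, prev, done = {src: 0.0}, {}, set()
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if u in done:
                continue
            done.add(u)
            if u == dst:
                break
            for v in range(n_switches):
                if v == u or v in done:
                    continue
                step = reuse_cost if reusable(u, v, msg_type, rate) else open_cost
                if d + step < dist.get(v, float("inf")):
                    dist[v], prev[v] = d + step, u
                    heapq.heappush(heap, (d + step, v))
        path = [dst]
        while path[-1] != src:
            path.append(prev[path[-1]])
        path.reverse()
        # Establish or reuse links and record their message type.
        for i, j in zip(path, path[1:]):
            link = reusable(i, j, msg_type, rate)
            if link is None:
                link = [msg_type, 0.0]
                links.setdefault((i, j), []).append(link)
            link[1] += rate
        paths.append(((src, dst, msg_type), path))
    return paths, links


flows = [(0, 1, 100, "request"), (0, 1, 80, "response"), (0, 1, 50, "request")]
paths, links = path_compute(flows, n_switches=3, capacity=200)
print(links)
```

In this toy run the response flow opens a second physical link between switches 0 and 1 rather than sharing the request link, while the smaller request flow reuses the link opened by the larger one.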
Example 1 Let us consider the example from Figure 3(a). The
input core graph has been partitioned into 3 partitions. We assume
2 different message types: request and response for the various
traffic flows. Each partition pi corresponds to the cores attached
to the same switch. Let us consider routing the flow with a bandwidth
value of 100 MB/s between the vertices v1 and v2, across
the partitions p1 and p2. The traffic flow is of the message type
request. Initially no physical paths have been established across
any of the switches. If we have to route the flow across a link be-
tween any two switches, we have to first establish the link. The
cost of routing the flow across any pair of switches is obtained.
We annotate the edges between the switches by the cost (marginal
increase in power consumption) of sending the traffic flow through
the switches (Figure 3(b)). The costs on the edges from p2 are different
from the others due to the difference in initial traffic rates
within p2 when compared to the other switches. This is because
the switch p2 has to support flows between the vertices v2 and
v3 within the partition. The least cost path for the flow, which is
across switches p1 and p2 is chosen. Now we have actually es-
tablished a physical path and a link between these switches. We
associate the message type request for this particular link. This
is considered when routing the other flows and only those traffic
flows that are of request type can use this particular physical link.
We also note the size and switching activity of these switches that
have changed due to the routing of the current flow.
4 Experimental Results
In this section, we present detailed experimental stud-
ies of our approach (which we further refer to as INT-TOP
meaning message-dependent deadlock avoidance integrated
with topology synthesis process) and compare it with tradi-
tional approaches:
(1) Using logically separate networks (L-SEP): In this
scheme, we use separate buffers at each input, with as many
buffers as the different message types, modeling the vir-
tual channel based approach to remove message-dependent
deadlocks.
(2) Using physically separate networks (P-SEP): In this
scheme, we design physically different networks for each
message type. For both these schemes we apply our topol-
ogy synthesis procedure to obtain the network topologies.
(3) With a design that has no support to avoid message-
dependent deadlocks (ORIG). Note that this base system
cannot be employed in SoCs, as it cannot guarantee proper
system operation. We present the experimental results for
this scheme to only evaluate the overhead incurred in the
other schemes to support deadlock-free operation.
4.1 Experimental Platform and Models
We utilize the ×pipes [36] NoC architecture to imple-
ment the synthesized topologies. We built accurate analyti-
cal models for the power consumption and area of the net-
work components. To get the power estimates, the place-
ment and routing of the components is performed using
Cadence SoC Encounter [35] and accurate wire capaci-
tances and resistances are obtained, as back-annotated information
from the layout, with a 0.13 µm technology library.
The switching activity in the network components is varied
by injecting functional traffic. The capacitance, resistance
and the switching activity report are combined to estimate
power consumption using Synopsys PrimePower [34].
4.2 Comparison on SoC designs
We apply the deadlock prevention methods to five different
SoC designs: Multi-media system (MULT-30 cores),
IMage Processing application (IMP-27 cores), Video PROCessor
(VPROC-42 cores), MPEG4 decoder (12 cores) and
Video Object Plane Decoder (VOPD-12 cores). The communication
characteristics of some of these benchmarks are
presented in [37]. There are two types of messages that
are supported in each design: request and response. Each
design consists of an almost equal number of request and response
traffic flows. This is because every processor core
communicates through the memory core, necessitating two-
way communication (hence a request and response traffic
flow) between the processors and memories. To make a fair
comparison of the different schemes, we use the same syn-
thesis approach and design constraints for synthesizing the
topologies.
[Figure 4. Core graph and designed topology for IMP: (a) Core graph; (b) Designed topology]
[Figure 5. Power consumption of different schemes: network power (mW) for MULT, IMP, VPROC, MPEG and VOPD under ORIG, L-SEP, P-SEP and INT-TOP]
The communication pattern (core graph) for one of the
applications (IMP) and the best synthesized topology for
our proposed scheme (INT-TOP) are presented in Figures
4(a) and 4(b). The design consists of 12 processors (Proc
0 to Proc 11), a private memory for each processor (PM 0
to PM 11), a shared memory (SHM), a semaphore memory
(SMM) and an interrupt device (INT). In the application, all
communication from the processors is of request message
type and communication to the processors is of response
message type. In Figure 4(b), those links that support re-
quest message type are in bold and those links that support
response message type are dashed.
The network power consumption, based on the func-
tional traffic for the various designs using the different
schemes is presented in Figure 5. As seen from this figure,
the INT-TOP scheme presented in this work outperforms
the two conventional message-dependent deadlock avoid-
ance schemes: L-SEP and P-SEP. Our proposed scheme
leads to an average of 38.5% reduction in NoC power
consumption when compared to the state-of-the-art dead-
lock avoidance schemes. When compared to our INT-TOP
scheme, the L-SEP scheme has larger buffering requirements,
as each virtual channel needs separate buffering resources.
The P-SEP scheme requires more switches than
the INT-TOP scheme, as the request and response mes-
sages utilize different networks. Interestingly, our proposed
scheme incurs only a 2.5% increase in power consumption
when compared to the ORIG scheme, where no message-
dependent deadlock avoidance support is provided. This
is mostly due to the efficient allocation of links to the dif-
ferent message types by our topology synthesis procedure.
The switch area for the different schemes for the SoC designs,
normalized with respect to the area of the base system
(ORIG), is presented in Figure 6. The proposed method
results in an average of 30.68% reduction in area when com-
pared to the state-of-the-art schemes.
4.3 Effect of Different Number of Message Types
In this sub-section, we examine the power consumption
of the proposed scheme, when the number of different mes-
sage types is varied. The number of message types in a sys-
tem depends on the underlying computation architecture.
Cache coherent systems typically support several different
message types. As an example, the S-1 multi-processor sup-
ports 4 different message types [18] and each type must be
mapped onto different resources in the network. In [17],
a more sophisticated protocol is used, which leads to seven
different message types. To see the impact of the number of
different message types, we created a synthetic benchmark
having the traffic characteristics of the VPROC design. In
this benchmark, around 80 different traffic flows exist, each
one representing a message. We fixed the number of mes-
sages and varied the number of message types in the design
from 1 to 7. The network power consumption for our pro-
posed scheme, for the different number of message types is
presented in Figure 7. This figure shows that our proposed
scheme results in efficient designs, even for a large number
of message types. Moreover, the rise in power consump-
tion with an increasing number of message types saturates
(designs with 6 and 7 message types have nearly the same
power consumption), as most messages are already mapped
onto unique links in the network.
4.4 Frequency Trade-offs
The algorithm presented here can also be used to per-
form frequency selection for a certain design. In this case,
the frequency of operation of the NoC can be varied and the
best topology can be synthesized for each frequency point.
A higher operating frequency results in links having more
bandwidth. Thus a smaller NoC can satisfy the design con-
straints. A trade-off curve for frequency vs power consump-
tion of the network for the VPROC is presented in Figure 8.
From such a curve, the most power-efficient operating fre-
quency can be chosen for the design.
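The frequency sweep can be sketched as below. The link bandwidth follows the product of operating frequency and link width stated in step 3 of Algorithm 1. The 32-bit link width, the helper names and the toy power curve are assumptions for illustration only and do not reproduce the measured data of Figure 8. In the real flow, the power value for each frequency would come from synthesizing the best topology at that frequency.

```python
def link_bandwidth_mb_s(freq_mhz, width_bits):
    """Link bandwidth = operating frequency x link width, in MB/s."""
    return freq_mhz * width_bits / 8   # MHz x bytes per cycle = MB/s


def pick_frequency(freqs_mhz, power_at):
    """Return the frequency point with the lowest network power."""
    return min(freqs_mhz, key=power_at)


def toy_power_mw(freq_mhz):
    # Hypothetical curve: small topologies at high frequency pay in dynamic
    # power, large topologies at low frequency pay in switch count.
    return 80 + (freq_mhz - 600) ** 2 / 1000


freqs = [400, 500, 600, 700, 800]
print(link_bandwidth_mb_s(400, 32))         # bandwidth of an assumed 32-bit link
print(pick_frequency(freqs, toy_power_mw))  # lowest-power point of the toy curve
```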
5 Conclusions
For Networks on Chips (NoCs) to be used in indus-
trial designs, NoCs should guarantee proper system oper-
ation under all conditions. Achieving deadlock-free opera-
tion of the network with minimum power consumption and
area overhead is critical for application-specific NoCs. In
this work, we have focused on addressing the major is-
sue of avoiding message-dependent deadlocks during the
network operation. We have shown that by mapping the
different message types onto different network resources
during the topology mapping and synthesis phase, we can
achieve large reductions in network power consumption
and network area when compared to the state-of-the-art
approaches. In future work, we plan to compare deadlock
recovery schemes with the proposed scheme for NoCs.

[Figure 6. Normalized switch area for the different schemes (ORIG, L-SEP, P-SEP, INT-TOP)]
[Figure 7. Effect of number of message types: network power (mW) versus number of message types (1 to 7)]
[Figure 8. Power consumption variation with frequency: network power (mW) versus network frequency (400 to 800 MHz)]
6 Acknowledgments
This work is supported by the US National Science
Foundation (NSF, contract CCR-0305718) for Stanford
University. It is also supported by the Swiss National
Science Foundation (FNS, Grant 20021-109450/1) and the
Spanish Government Research Grant TIN2005-5619. The
work is also supported by a grant from Semiconductor Re-
search Corporation (SRC project number 1188) and a grant
by STMicroelectronics for DEIS.
References
[1] L. Benini and G. De Micheli, “Networks on Chips: A New SoC Paradigm”, IEEE
Computer, pp. 70-78, Jan. 2002.
[2] M. Sgroi et al., “Addressing the System-on-a-Chip Interconnect Woes Through
Communication-Based Design”, Proc. DAC, pp. 667-672, June 2001.
[3] S. Kumar et al., “A Network on Chip Architecture and Design Methodology”,
Proc. ISVLSI, pp. 117-122, April 2002
[4] P. Guerrier, A. Greiner, “A generic architecture for on-chip packet-switched in-
terconnections”, Proc. DATE, pp. 250-256, March 2000.
[5] K. Goossens et al., “The Aethereal network on chip: Concepts, architectures, and
implementations”, IEEE Design and Test of Computers, Vol. 22(5), pp. 21-31,
Sept-Oct 2005.
[6] W. Dally, B. Towles, “Route Packets, not Wires: On-Chip Interconnection Net-
works”, Proc. DAC, pp. 684-689, June 2001.
[7] Y. H. Song, T. M. Pinkston, “A Progressive Approach to Handling Message-
Dependent Deadlock in Parallel Computer Systems”, IEEE TPDS, Vol. 14(3),
pp. 259-275, March 2003.
[8] Y. Choi, “Deadlock Recovery Based Router Architectures for High Performance
Networks”, PhD Dissertation, University of Southern California, June 2001.
[9] G. Chiu, “The Odd-Even Turn Model for Adaptive Routing”, IEEE TPDS, Vol.
11(7), pp. 729-738, July 2000.
[10] C. Glass, L. Ni, “The turn model for adaptive routing”, Proc. ISCA, pp. 278-
287, 1992.
[11] J. Duato, “A New Theory of Deadlock-Free Adaptive Routing in Wormhole
Networks”, IEEE TPDS, Vol. 8(8), pp. 790-802, Aug 1997.
[12] D. Starobinski et al., “Application of network calculus to general topologies
using turn-prohibition”, IEEE/ACM Transactions on Networking, Vol. 11, Issue
3, pp. 411-421, June 2003.
[13] W. J. Dally, H. Aoki, “Deadlock-Free Adaptive Routing in Multi-computer Net-
works Using Virtual Channels”, IEEE TPDS, Vol. 4(4), pp. 466-475, April 1993.
[14] S. Scott, G. Thorson, “Optimized Routing in the Cray T3D”, Proc. Workshop
Parallel Computer Routing and Comm., pp. 281-294, May 1994.
[15] S. Scott, G. Thorson, “The Cray T3E Network: Adaptive Routing in a High
Performance 3D Torus”, Proc. Symp. Hot Interconnects IV, pp. 147-156, Aug.
1996.
[16] J. Carbonaro, F. Verhoorn, “Cavallino: The Teraflops Router and NIC”, Proc. Symp. Hot
Interconnects IV, pp. 157-160, Aug. 1996.
[17] S.S. Mukherjee et al., “The Alpha 21364 Network Architecture”, Proc. Symp.
HOT Interconnects 9, pp. 113-117, Aug. 2001.
[18] L. Widdoes, S. Correll, “The S-1 Project: Developing High Performance Computers”,
Proc. COMPCON, pp. 282-291, Spring 1980.
[19] “http://www.st.com/stonline/prodpres/dedicate/soc/cores/stbus.htm”.
[20] J. Laudon, D. Lenoski, “The SGI Origin: A ccNUMA Highly Scalable Server”,
Proc. ISCA, pp. 241-251, June 1997.
[21] D. Lenoski et al., “The Directory-Based Cache Coherence Protocol for the
DASH Multiprocessor”, Proc. ISCA, pp. 148-159, 1990.
[22] A. Hansson, K. Goossens, A. Radulescu, “UMARS: A Unified Approach
to Mapping and Routing on a Combined Guaranteed Service and Best-Effort
Network-on-Chip Architecture”, Technical Report 2005/00340, Philips Re-
search, April 2005.
[23] B. Gebremichael et al., “Deadlock Prevention in the Aethereal Protocol”, Proc.
Working Conference on Correct Hardware Design and Verification Methods
(CHARME), Oct 2005.
[24] J. Hu, R. Marculescu, “Exploiting the Routing Flexibility for Energy/Performance
Aware Mapping of Regular NoC Architectures”, Proc. DATE,
March 2003.
[25] S. Murali, G. De Micheli, “SUNMAP: A Tool for Automatic Topology Selec-
tion and Generation for NoCs”, Proc. DAC 2004.
[26] S. Murali et al., “Mapping and Physical Planning of Networks on Chip Archi-
tectures with Quality-of-Service Guarantees”, Proc. ASPDAC 2005.
[27] A.Pinto et al., “Efficient Synthesis of Networks on Chip”, ICCD 2003, pp. 146-
150, Oct 2003.
[28] W.H.Ho, T.M.Pinkston, “A Methodology for Designing Efficient On-Chip In-
terconnects on Well-Behaved Communication Patterns”, HPCA 2003, pp. 377-
388, Feb 2003.
[29] T. Ahonen et al., “Topology Optimization for Application Specific Networks on
Chip”, Proc. SLIP 04.
[30] K. Srinivasan et al., “An Automated Technique for Topology and Route Gener-
ation of Application Specific On-Chip Interconnection Networks”, Proc. ICCAD
’05.
[31] A. Hansson et al., “A unified approach to constrained mapping and routing on
network-on-chip architectures”, pp. 75-80, Proc. ISSS 2005.
[32] S. Murali et al., “Designing Application-Specific Networks on Chips using
Floorplan Information”, to appear in ICCAD 2006.
[33] W. J. Dally, B. Towles, “Principles and Practices of Interconnection Networks”,
Morgan Kaufmann , Dec 2003.
[34] “http://www.synopsys.com/products/power/primepower ds.pdf”.
[35] “http://www.cadence.com/products/digital ic/soc encounter/index.aspx”.
[36] S. Stergiou et al., “×pipesLite: a Synthesis Oriented Design Library for Net-
works on Chips”, pp. 1188-1193, Proc. DATE 2005.
[37] D. Bertozzi et al., “NoC Synthesis Flow for Customized Domain Specific
Multi-Processor Systems-on-Chip”, IEEE Transactions on Parallel and Dis-
tributed Systems, Feb 2005.
