Generalized Fault-Tolerance Topology Generation for Application Specific
  Network-on-Chips by Chen, Song et al.
arXiv:1908.00165v1 [cs.AR] 1 Aug 2019
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. , NO. 1
Generalized Fault-Tolerance Topology Generation for
Application Specific Network-on-Chips
Song Chen, Member, IEEE, Mengke Ge, Zhigang Li, Jinglei Huang, Qi Xu, and Feng Wu, Fellow, IEEE
Abstract—The Network-on-Chips based communication archi-
tecture is a promising candidate for addressing communication
bottlenecks in many-core processors and neural network proces-
sors. In this work, we consider the generalized fault-tolerance
topology generation problem, where the link (physical channel)
or switch failures can happen, for application-specific network-on-
chips (ASNoC). With a user-defined maximum number of faults,
K, we propose an integer linear programming (ILP) based method
to generate ASNoC topologies, which can tolerate at most K
faults in switches or links. Given the communication requirements
between cores and their floorplan, we first propose a convex-
cost flow based method to solve a core mapping problem for
building connections between the cores and switches. Second,
an ILP based method is proposed to solve the routing path
allocation problem, where K+1 switch-disjoint routing paths are
allocated for every communication flow between the cores. Finally,
to reduce switch sizes, we propose sharing the switch ports for the
connections between the cores and switches and formulate the
port sharing problem as a clique-partitioning problem, which is
solved by iteratively finding the maximum cliques. Additionally,
we propose an ILP-based method to simultaneously solve the core
mapping and routing path allocation problems when only physical
link failures are considered. Experimental results show that the
power consumptions of fault-tolerance topologies increase almost
linearly with K because of the routing path redundancy for fault
tolerance. When both switch faults and link faults are considered,
port sharing can reduce the average power consumption of fault-
tolerance topologies with K = 1, K = 2 and K = 3 by 18.08%,
28.88%, and 34.20%, respectively. When considering only the
physical link faults, the experimental results show that compared
to the FTTG (fault-tolerant topology generation) algorithm, the
proposed method reduces power consumption and hop count by
10.58% and 6.25%, respectively; compared to the DBG (de Bruijn
Digraph) based method, the proposed method reduces power
consumption and hop count by 21.72% and 9.35%, respectively.
Index Terms—Network-on-Chip, Fault Tolerance, Path Alloca-
tion, Application-Specific Network-on-Chips
This work was partially supported by the National Natural Science Foundation
of China (NSFC) under grant Nos. 61874102 and 61732020, Beijing
Municipal Science & Technology Program under Grant Z181100008918013,
and the Fundamental Research Funds for the Central Universities under grant
No. WK2100000005. The authors would like to thank Information Science
Laboratory Center of USTC for the hardware & software services.
S. Chen, M. Ge, Z. Li, and F. Wu are with the School of Microelectronics,
University of Science and Technology of China (USTC), China (email:
songch@ustc.edu.cn). S. Chen and F. Wu are also with USTC Beijing Research
Institute, Beijing, China.
J. Huang is with State Key Laboratory of Air Traffic Management System
and Technology, China (email: huangjl@mail.ustc.edu.cn).
Q. Xu is with the School of Electronic Science and Applied Physics, Hefei
University of Technology, China (email: xuqi@hfut.edu.cn).
I. INTRODUCTION
With the constant scaling of semiconductor manufacturing
technologies, hundreds to thousands of processing cores can
be easily integrated on a single chip [1]. Network-on-Chips
(NoCs) have emerged as an attractive solution to the intercon-
nection challenges of heterogeneous System-on-Chip designs
[2] [3] [4] and neuromorphic computing systems [5] [6] [7]
because NoCs have good scalability and enable efficient and
flexible utilization of communication resources when compared
to the traditional point-to-point links and buses. NoCs convey
messages (in packets) through a distributed system of routers
(sometimes called switches in ASNoCs) interconnected by links,
and these routers may include network interfaces for connecting
cores to routers. In this work, we focus on ASNoCs, where the
customized irregular network topologies are used because of
their low energy consumption and low area overhead [8], [9].
With successive technology node shrinking, the transistor size
on chips has been scaled down to a few nanometers, where
radiation, electromagnetic interference, electrostatic discharge,
aging, process variation and dynamic temperature variation
are the major causes of failures in MOSFET based circuits
[10] [11] [12]. It is extremely difficult for a heterogeneous
system to guarantee long-term product reliability because of a
combination of these factors. To maintain network connectivity
and correct packet-switching operations, we consider fault-tolerance
issues of the network components in ASNoCs. NoCs
with regular topologies can achieve fault tolerance by providing
alternative routing paths when messages or packets encounter
faulty network components. However, in ASNoCs, the path
diversity is greatly reduced for lowering the energy and area
overhead of the network components. Consequently, we have
to introduce structural redundancies, such as switches, ports,
links, and network interfaces, to address these faults [13]–[16].
Then, alternative routing paths are used for the packet switching
between the cores, thus bypassing the faulty region [17]. Note
that, generally, fault control in NoC involves two phases: fault
diagnosis and fault tolerance [18]. Our research mainly focuses
on the fault tolerance.
There are many previous works addressing the synthesis of
ASNoC topologies [9], [19]–[30]. However, these works rarely
consider fault tolerance in the NoC topologies. In particular, the
ASNoCs have low path diversities and cannot work normally
if any hardware faults occur in the switches or links. In
[31], a fault tolerant NoC architecture was proposed, where
the cores were linked to two switches instead of one, and a
dynamically reconfigured routing algorithm was used to bypass
faulty switches. Chatter et al. [32] proposed a fault-tolerant
method based on router redundancy. They allocated a spare
router for each router for fault tolerance, which increased the
consumed power and area. In [33], the authors placed a spare
router for each 2×2 router block in the mesh topology and used
multiplexers to switch the faulty routers to the intact routers,
which could thereby decrease the power and area overheads
compared to [32]. However, this method cannot be applied to
ASNoC designs.
Tosun et al. [34] proposed a fault-tolerant topology generation
(FTTG) method for ASNoC, which focused on permanent
link and switch port failures. The authors attempted to add
a minimum number of extra switches and links and use the
min-cut algorithm to ensure that each switch and link were on
a cycle, which provided at least two alternative routing paths
to achieve adequate fault-tolerance. The NoC topologies are
generated in two phases. In the first phase, the links between
the switches (switch topologies) are constructed, and in the
second phase, the links from the cores to switches are built (core
mapping). However, the switch topology and the core mapping
strongly depend on each other; consequently, it is challenging
for the FTTG method to effectively explore the design space
of the network topologies. Additionally, the FTTG can only
generate one-fault tolerant topologies and cannot be applied
toward generating multiple-fault tolerant network topologies.
Motivated by these arguments, we propose a method for
generating ASNoC topologies with consideration of both switch
faults and physical link faults, when given the communication
requirements between the cores, floorplan of the cores, and
maximum number of tolerable faults, K. The main contributions
of this work are as follows.
1) We propose a generalized fault-tolerant topology genera-
tion method with consideration of both switch faults and
link faults. A convex-cost flow based method is used to
solve the core mapping problem for building connections
between the cores and switches, and an ILP based method
is proposed to allocate K+1 switch-disjoint routing paths
for each communication flow.
2) To reduce the switch sizes, we propose sharing the switch
ports for the connections between the cores and switches,
and prove the conditions for port sharing on a switch.
The port sharing problem on a switch is formulated as
a clique-partitioning problem and heuristically solved by
iteratively finding a set of maximum cliques and solving
a maximum cardinality matching problem. Moreover, we
propose a heuristic method, where a series of maximum
independent set problems are solved for removing the
conflicts caused by port sharing on multiple switches.
3) Additionally, we also propose an ILP-based method to
simultaneously solve the core mapping and routing path
allocation problems when only the physical link failures
are considered.
Experimental results show that the power consumptions of
fault-tolerance topologies increase almost linearly with K because
of the routing path redundancies (see Fig. 12). When
both switch faults and link faults are considered, port sharing
can respectively reduce the average power consumptions of the
fault-tolerance topologies with K = 1, K = 2 and K = 3 by
18.08%, 28.88%, and 34.20%. When considering only the phys-
ical link faults, the experimental results show that, compared to
the FTTG, the proposed method reduces power consumption
and hop count by 10.58% and 6.25%, respectively; compared
to the DBG based method, the proposed method reduces power
consumption and hop count by 21.72% and 9.35%, respectively.
The remainder of this paper is organized as follows. Section
II formulates the K-fault-tolerant ASNoC topology generation
problem. The overview of the proposed framework is shown
in Section III. The generalized K-fault-tolerant topology generation
methodology is discussed in Sections IV-A, IV-B, and
V. Section VI discusses the generation method for link-fault-tolerance
topologies. The experimental results are provided in
Section VII, followed by the conclusions in Section VIII.
II. PRELIMINARIES & PROBLEM FORMULATION
A. NoC Architecture
In this work, the ASNoC architectures are assumed to sup-
port packet-switched communications with source routing and
wormhole flow control [35]. In the application-specific design,
the communication characteristics are known a priori, and
hence, a deterministic routing strategy is used; that is, the rout-
ing path for the communications is preallocated, which accord-
ingly determines the topology of the NoC. The ASNoC topology
architecture consists of two main components: switches and
customized electrical links. The switches are used to route
packets from the source to the destination, and the routing
information is included in the packet to specify the address of
the output port, to which the packet should be forwarded. Given
the communication characteristics of an application, this work
focuses on the generation of network topologies by preallocating
the routing paths for the communication flows.
B. Problem Definition
Let $V_c = \{c_i \mid 1 \le i \le n_{core}\}$ be the set of cores in an application.
The communication requirements (or communication flows
in this work) between the cores can be represented as a directed
graph, $G_{cc}$, defined as follows.
Definition 1: $G_{cc} = (V_c, E_{cc})$ is a directed graph. An edge $(c_i, c_j) \in E_{cc}$
represents the communication from $c_i$ to $c_j$. The
bandwidth requirement of the communication flow from $c_i$ to
$c_j$ is given by $w_{i,j}$.
Fig. 1 shows an example of Gcc.
Fig. 1: Gcc of MP3EncMP3Dec encoder application.
In ASNoCs, the switches will be shared among the cores
for data communications. If only link failures between the
switches are considered [34], the mapping from the cores to
the switches is a many-to-one relationship, which is the same as
the clustering problem in the traditional ASNoC synthesis [26]
[20]. However, the mapping from cores to switches is a many-
to-many relationship if both switch failures and link failures
are considered. Let Vs = {si|1≤ i≤ nsw} be the set of switches.
We use the Cartesian products Vc×Vs, Vs×Vc, and Vs×Vs to
represent all possible connections from the cores to the switches,
from the switches to the cores, and from switches to switches,
respectively.
The problem of generalized fault-tolerance topology genera-
tion for ASNoCs is defined as follows.
Problem statement.
Given a core communication graph Gcc, number of switches
nsw, floorplan of the cores, and number of tolerable faults K,
we attempt to determine the placement of the switches and
construct a K-fault-tolerant ASNoC topology with minimization
of the power consumption of the ASNoC under the following
constraints:
• the latency constraint $l_{i,j}$ (number of hops) for each communication
flow $(c_i, c_j) \in E_{cc}$,
• the switch size constraint $max\_size$, which is the maximum
number of ports that a switch could support given the NoC
operating frequency,
• and the bandwidth constraint BWmax for the physical links,
which is the product of the NoC frequency and bit-width
of the physical links.
The ASNoC topology can be represented as a directed graph
$G_{NT}(V_{NT}, E_{NT})$, where $V_{NT} = V_c \cup V_s$, and $E_{NT}$ includes two
types of edges:
• a subset of edges Lcs ⊆ Vc×Vs ∪Vs×Vc determined by
solving the core mapping problem, corresponding to the
connections between the cores and switches,
• and a subset of edges Lss ⊆Vs×Vs determined by solving
the routing path allocation problem, corresponding to the
physical links between the switches.
In $G_{NT}$, there are $K+1$ switch-disjoint paths for each
communication flow $(c_i, c_j)$ in $G_{cc}$ when both switch failures
and link failures are considered.
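The switch-disjointness requirement can be checked mechanically once the paths of a flow are known. The following is a minimal sketch, assuming each path is stored as a list of switch names with the common source and sink cores left out; the function name `switch_disjoint` is introduced here for illustration only.

```python
from itertools import combinations


def switch_disjoint(paths):
    """Return True if no switch appears on more than one of the given paths.

    Each path is a list of switch names. The source and sink cores are
    deliberately omitted, since every path of a flow touches both of them.
    """
    return all(set(p).isdisjoint(q) for p, q in combinations(paths, 2))


# Flow c12 -> c10 with K = 1, using the two paths listed in Table I.
paths_c12_c10 = [["s2"], ["s3", "s0", "s1"]]
print(switch_disjoint(paths_c12_c10))           # True
print(switch_disjoint([["s2"], ["s3", "s2"]]))  # False: s2 is reused
```

A K-fault-tolerant topology needs this check to succeed for the K+1 paths of every flow in Gcc.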
In the K-fault tolerance structures, K times more switch
ports are connected to each core for introducing routing path
redundancies. These switch ports greatly increase the area and
power consumption of switches. To reduce the number of the
switch ports, we also solve a port sharing problem for each
switch, which will be discussed in detail in Section V.
III. OVERVIEW OF THE PROPOSED FRAMEWORK
Given the floorplan of ncore cores, their communication
requirements represented by Gcc, and the number of switches
nsw, the placement of the switches is determined using the
method in [36].
As discussed in Section II-B, the NoC topology generation
problem mainly includes two subproblems: core mapping (CM)
and routing path allocation (PA). We first map the cores to
the switches using a min-cost-max-flow algorithm in Section
IV-A. Second, the routing path allocation is solved using an
ILP-based method in Section IV-B. If we fail to find K + 1
switch-disjoint paths for all the communication flows under
the given constraints, the number of switches is increased by
one and the generalized fault-tolerance topology generation
problem is solved again. This procedure is repeated until all the
communication flows have K+1 switch-disjoint routing paths.
To generate the K-fault-tolerant topology, we connect each
core to at least K+ 1 switches by K+ 1 ports, which greatly
increases the power and area overheads of the switches. However,
for each flow of each core, only one of the K+1 ports
works for data communication. Consequently, the switch ports
connecting different cores could be shared using multiplexers.
In Section V, we prove the conditions for port sharing on a
switch and propose a clique-partitioning formulation for the
problem, which is solved using a heuristic method. Moreover, a
heuristic method is proposed to remove the conflicts of routing
path selection, caused by port sharing on multiple switches.
Fig.2 illustrates the overall flow of generating fault-tolerance
ASNoC topologies.
[Flowchart; box labels: INPUT: Gcc, nsw, floorplanning of cores; Calculate the initial position of switches; Floorplanning of switches; Core Mapping & Path Allocation; Are all flows solved?; Add a switch; Port Sharing; OUTPUT: Fault-Tolerant ASNoC topology]
Fig. 2: Overview of the proposed framework.
IV. FAULT-TOLERANCE TOPOLOGY GENERATION
A. Core Mapping
To generate K-fault-tolerant topology, we connect each core
to at least K+1 switches, and many switch ports are introduced
for connecting the cores, accordingly. For a switch, the area
increases quadratically with the port number, and the power
increases superlinearly with the switch size. Consequently, a
convex-cost flow based method is used to generate a core
mapping with evenly distributed core–switch connections.
In the core mapping stage, we build connections from the
source cores of the communication flows to the switches and
connections from the switches to the sink cores of the commu-
nication flows. Here, we consider building connections from the
source cores to the switches. The connections from the switches
to the sink cores are built similarly.
To build a convex-cost flow model, we construct a directed
graph Gcs(Vcs,Ecs), where Vcs = Vc ∪Vs ∪ {b, t} and Ecs =
Vc×Vs∪{Vs → t}∪{b→Vc}.
The capacity of an edge (b,ci) ∈ {b→ Vc} is set to K+ 1
if there is an outgoing communication flow from ci and is set
to 0 otherwise. The capacities of the edges in $\{V_s \to t\}$ are set
to $N_{cs} = \lfloor n_{core} \cdot (K+1)/n_{sw} \rfloor + 1$, which is close to the average
number of input ports that are used to connect cores on a switch.
All the other edges have a capacity of 1.
The edges in $\{b \to V_c\}$ have zero cost. For an edge
$(c_i, s_j) \in V_c \times V_s$, the cost is defined as
$E_{bit} \times D_{c_i,s_j} \times \sum_{k:(c_i,c_k) \in E_{cc}} w_{i,k}$, where $D_{c_i,s_j}$ is the distance between
core $c_i$ and switch $s_j$ and $E_{bit}$ (set to 0.5 in this work) is the
bit energy of unit wire length (1 mm).
To make the connections from cores to switches evenly
distributed among the switches, the cost of each edge in
$\{V_s \to t\}$ is defined as a function $c_{(s_j,t)}(x)$ of the number of flows $x$ on
the edge, which corresponds to a piecewise linear and
convex function for the flow costs. Let $0 = d_0 \le d_1 \le \cdots \le d_{N_{cs}}$
denote the breakpoints of the piecewise function; the costs
vary linearly in each interval $[d_{i-1}, d_i]$, $1 \le i \le N_{cs}$. In this work,
the edge cost function is $c_{(s_j,t)}(x) = 10x$ and the interval between
adjacent breakpoints is set to 1. Consequently, the flow cost
is calculated as $10x^2$. Such a convex-cost flow problem can be
easily transformed into a traditional min-cost flow problem [37].
According to the solution to the convex cost flow model, the
edges in Vc×Vs that have non-zero flow will be selected as the
connections from cores to switches.
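The transformation into an ordinary min-cost flow can be sketched as follows. Each edge (s_j, t) with convex cost 10x^2 is replaced by N_cs parallel unit-capacity edges whose costs are the increments 10, 30, 50, and so on; a plain successive-shortest-path solver then fills the cheap units first. This is an illustrative sketch, not the authors' implementation: the solver class, the function names, and the distances and bandwidths in the example are all assumptions.

```python
from collections import deque


def unit_costs(n_cs, cost=lambda x: 10 * x * x):
    """Split a convex cost into per-unit increments cost(i) - cost(i - 1)."""
    return [cost(i) - cost(i - 1) for i in range(1, n_cs + 1)]


class MinCostFlow:
    """Successive shortest paths on a residual graph, one unit per augmentation."""

    def __init__(self):
        self.adj = {}    # node -> indices into self.edges
        self.edges = []  # [head, residual capacity, cost]; edge i ^ 1 reverses edge i

    def add_edge(self, u, v, cap, cost):
        index = len(self.edges)
        self.adj.setdefault(u, []).append(index)
        self.edges.append([v, cap, cost])
        self.adj.setdefault(v, []).append(index + 1)
        self.edges.append([u, 0, -cost])
        return index

    def solve(self, s, t):
        flow, total = 0, 0
        while True:
            dist, prev, queue = {s: 0}, {}, deque([s])
            while queue:  # queue-based Bellman-Ford; residual costs may be negative
                u = queue.popleft()
                for i in self.adj[u]:
                    v, cap, w = self.edges[i]
                    if cap > 0 and dist[u] + w < dist.get(v, float("inf")):
                        dist[v], prev[v] = dist[u] + w, i
                        queue.append(v)
            if t not in dist:
                return flow, total
            v = t
            while v != s:  # push one unit back along the shortest path
                i = prev[v]
                self.edges[i][1] -= 1
                self.edges[i ^ 1][1] += 1
                v = self.edges[i ^ 1][0]
            flow, total = flow + 1, total + dist[t]


def map_cores(sources, switches, dist, out_bw, k, n_core, e_bit=0.5):
    """Build the flow graph of Section IV-A and return (mapping, total cost)."""
    n_cs = n_core * (k + 1) // len(switches) + 1
    net, chosen = MinCostFlow(), {}
    for c in sources:
        net.add_edge("b", c, k + 1, 0)  # K+1 connections per source core
        for s in switches:
            chosen[c, s] = net.add_edge(c, s, 1, e_bit * dist[c, s] * out_bw[c])
    for s in switches:
        for inc in unit_costs(n_cs):    # parallel unit edges model 10x^2
            net.add_edge(s, "t", 1, inc)
    _, total = net.solve("b", "t")
    mapping = {c: [s for s in switches if net.edges[chosen[c, s]][1] == 0]
               for c in sources}
    return mapping, total


# Hypothetical instance: two source cores, three switches, K = 1.
dist = {("c0", "s0"): 1, ("c0", "s1"): 2, ("c0", "s2"): 30,
        ("c1", "s0"): 1, ("c1", "s1"): 2, ("c1", "s2"): 9}
out_bw = {"c0": 2, "c1": 2}
mapping, total = map_cores(["c0", "c1"], ["s0", "s1", "s2"], dist, out_bw,
                           k=1, n_core=2)
```

In this instance both cores are closest to s0 and s1, but the convex switch cost makes it cheaper to move one connection of c1 to s2, which is the balancing effect the cost function is meant to produce.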
After we map the cores to switches, we have determined
the connections between the cores and the switches, denoted as
$L_{cs}$ ($\subseteq V_c \times V_s \cup V_s \times V_c$). Hereafter, for a communication flow
$(c_i, c_j)$ in $G_{cc}$, we define the switch ports connected to the
source core $c_i$ as core inports and the switch ports connected
to the sink core $c_j$ as core outports.
An example of core mapping for Gcc in Fig. 1 is shown in
Fig. 3, where K = 1. Each source core is connected to multiple
core inports through a demultiplexer (DEMUX), and each sink core is
connected to multiple core outports through a multiplexer (MUX).
B. ILP based Path Allocation
To generate K-fault tolerant topologies, we have to find K+1
alternative switch-disjoint (node-disjoint) routing paths in a
complete graph of switches Gs(Vs,Vs×Vs) for each communica-
tion flow considering the costs of switches and links. To reduce
the internally node-disjoint paths problem to an edge-disjoint
paths problem [38], which is easily formulated as a constrained
min-cost multi-flow problem, we perform node splitting on Gs
and extend the graph for routing path allocation. Each switch
node $u \in V_s$ is split into two nodes $u$ and $u'$. A directed graph
$G_{pa}(V_{pa}, E_{pa})$ is constructed as follows.
• $V_{pa} = V_c \cup V_s \cup V'_s$, where $V'_s$ is the split node set of $V_s$.
• $E_{pa} = L_{cs} \cup E_{split} \cup E_{link}$, where
$E_{split} = \{(u,u') \mid u \in V_s \wedge u'$ is the corresponding split node of $u\}$ and
$E_{link} = \{(u',v) \mid (u,v) \in V_s \times V_s \wedge u'$ is the corresponding split
node of $u\}$. If there is a directed edge from $u$ to $v$ in $V_s \times V_s$,
a corresponding directed edge from $u'$ to $v$ is added in $E_{pa}$.
In the following, we discuss how to find K+1 edge-disjoint
routing paths in Gpa for all the communication flows.
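The node-splitting construction can be sketched in a few lines. The code below is an illustration under two stated assumptions that the text does not spell out: self-loops (u, u) are left out of the switch graph, and an edge from a switch to a sink core is attached to the split node u', so that a path ending at switch u still crosses the split edge (u, u'). The function name `build_gpa` is hypothetical.

```python
def build_gpa(switches, core_in, core_out):
    """Return (E_split, E_link, core-switch edges) of the extended graph.

    core_in  : (core, switch) pairs, connections from source cores to inports
    core_out : (switch, core) pairs, connections from outports to sink cores
    """
    split = {u: u + "'" for u in switches}  # u -> u'
    e_split = [(u, split[u]) for u in switches]
    # Assumption: the complete switch graph is used without self-loops.
    e_link = [(split[u], v) for u in switches for v in switches if u != v]
    # Assumption: switch -> core edges leave from the split node u'.
    e_cs = list(core_in) + [(split[u], c) for (u, c) in core_out]
    return e_split, e_link, e_cs


e_split, e_link, e_cs = build_gpa(["s0", "s1", "s2"],
                                  core_in=[("c3", "s2")],
                                  core_out=[("s0", "c0")])
```

Because every path through a switch u must use the single edge (u, u'), two paths that are edge-disjoint on the split edges are automatically switch-disjoint in the original graph.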
1) Computation of Switch Power and Link Power: The switch
power depends on the switch size, which includes the number
of input ports and output ports. To calculate the size of
switches, we introduce two types of binary variables, $x^{i,j,k}_{uv}$ and
$d_{uv}$. $x^{i,j,k}_{uv} = 1$ indicates that the $(k+1)$-th routing path of the
communication flow $(i,j)$ goes through the edge $(u,v) \in E_{pa}$,
$0 \le k \le K$; otherwise, $x^{i,j,k}_{uv} = 0$. $d_{uv} = 1$ indicates that there
is at least one communication flow going through the edge
$(u,v) \in E_{pa}$, and accordingly a physical link exists between the
switches $u$ and $v$; otherwise, $d_{uv} = 0$. $d_{uv}$ is calculated based on
the binary variables $x^{i,j,k}_{uv}$ as follows [39].
$$d_{uv} = \min\Big\{ \sum_{(i,j) \in E_{cc}} \sum_{k=0}^{K} x^{i,j,k}_{uv},\ 1 \Big\}, \quad \forall (u,v) \in E_{link}. \qquad (1)$$
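The min operator in Eq. (1) is not linear as written. One standard way to express it with linear constraints over binary variables is given below; this particular form is an assumption for illustration, since the text defers the details to [39].

$$d_{uv} \ge x^{i,j,k}_{uv}, \quad \forall (i,j) \in E_{cc},\ \forall k \in [0,K]; \qquad d_{uv} \le \sum_{(i,j) \in E_{cc}} \sum_{k=0}^{K} x^{i,j,k}_{uv}; \qquad d_{uv} \in \{0,1\}.$$

If any path uses the edge, the first family of constraints forces $d_{uv} = 1$; if no path uses it, the second constraint forces $d_{uv} = 0$.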
Then, the size of a switch $u$, including the input port number
$ip_u$ and output port number $op_u$, can be calculated as follows.
$$ip_u = \sum_{v:(v,u) \in E_{link}} d_{vu} + cip_u; \qquad op_u = \sum_{v:(u,v) \in E_{link}} d_{uv} + cop_u. \qquad (2)$$
$cip_u$ and $cop_u$ respectively represent the numbers of core
inports and core outports, which have been determined in the
core mapping stage. Consequently, the power consumption of
the switches is estimated using Orion 3.0 [40] as follows [29].
$$P_{sw}(ip_u, op_u) = T_{sw}[ip_u] + C_{sw} \cdot op_u \qquad (3)$$
where $T_{sw}$ is a table mapping the input port number to power
consumption, and $C_{sw}$ is a constant.
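As a small worked example of Eqs. (2) and (3), the sketch below counts the ports of one switch from the set of opened links and looks up its power. The table `t_sw`, the constant `c_sw`, and the port counts are made-up numbers for illustration only; they are not Orion 3.0 output.

```python
def port_counts(u, open_links, cip, cop):
    """Eq. (2): links entering/leaving u plus the core inports/outports of u."""
    ip = cip[u] + sum(1 for (a, b) in open_links if b == u)
    op = cop[u] + sum(1 for (a, b) in open_links if a == u)
    return ip, op


def switch_power(ip, op, t_sw, c_sw):
    """Eq. (3): table lookup on the input ports plus a linear output-port term."""
    return t_sw[ip] + c_sw * op


# Hypothetical values, for illustration only.
t_sw = {1: 1.0, 2: 2.5, 3: 4.5, 4: 7.0}
c_sw = 0.5
open_links = {("s2", "s0"), ("s1", "s3")}  # links (u, v) with d_uv = 1
cip, cop = {"s0": 3}, {"s0": 2}

ip, op = port_counts("s0", open_links, cip, cop)
power = switch_power(ip, op, t_sw, c_sw)
```

Here s0 has three core inports plus one incoming link, so ip = 4, and two core outports with no outgoing link, so op = 2.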
The link power of a communication flow $(i,j)$ on an edge
$(u,v) \in E_{link}$ is determined by the communication requirement
$w_{i,j}$ and the physical distance between $u$ and $v$, $D_{u,v}$. Considering
the cost of opening new physical links, an extra cost $C_{pl}$ is
introduced if the physical link between $u$ and $v$ does not exist.
Therefore, the link power $P_{link}$ is calculated as follows
[41].
$$P_{link} = \sum_{(u,v) \in E_{link}} \sum_{(i,j) \in E_{cc}} \big( E_{bit} \cdot w_{i,j} \cdot D_{u,v} \cdot x^{i,j,0}_{uv} + d_{uv} \cdot C_{pl} \big), \qquad (4)$$
where $E_{bit}$ represents the bit energy of the electrical link.
2) ILP Formulation for Routing Path Allocation: To simplify
the discussion, the communication flows in $G_{cc}$ are relabeled as
$(i,j) = (c_i, c_j) \in E_{cc}$ (Definition 1). The routing path allocation
for ASNoCs with K-fault tolerance can be formulated as a
constrained multiple flow problem on $G_{pa}$ as follows.
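$$\text{Min} \quad \sum_{u \in V_s} P_{sw}(ip_u, op_u) + P_{link} \qquad (5)$$
s.t.
$$\sum_{v:(u,v) \in E_{pa}} x^{i,j,k}_{uv} - \sum_{v:(v,u) \in E_{pa}} x^{i,j,k}_{vu} = \begin{cases} 1, & \text{if } u = c_i; \\ 0, & \text{if } u \in V_s; \\ -1, & \text{if } u = c_j; \end{cases} \quad \forall k \in [0,K],\ \forall (i,j) \in E_{cc}, \qquad (5a)$$
$$\sum_{k=0}^{K} x^{i,j,k}_{uu'} \le 1, \quad \forall (u,u') \in E_{split},\ \forall (i,j) \in E_{cc}, \qquad (5b)$$
$$\sum_{(u,u') \in E_{split}} x^{i,j,k}_{uu'} \le l_{i,j}, \quad \forall k \in [0,K],\ \forall (i,j) \in E_{cc}, \qquad (5c)$$
$$\sum_{(i,j) \in E_{cc}} \sum_{k=0}^{K} w_{i,j} \cdot x^{i,j,k}_{uv} \le BW_{max}, \quad \forall (u,v) \in E_{link}, \qquad (5d)$$
$$ip_u \le max\_size, \quad op_u \le max\_size, \qquad (5e)$$
$$x^{i,j,k}_{uv} \in \{0,1\}, \quad k \in [0,K],\ \forall (u,v) \in E_{pa},\ \forall (i,j) \in E_{cc}. \qquad (5f)$$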
In the formulation above, the set of constraints (5a) defines
K+1 unit flows (paths) for each communication flow. The con-
straint (5b) ensures that the K+1 paths of any communication
flow (i, j) are edge-disjoint. The constraints (5c) ensure that the
latency constraint is satisfied for all K+1 routing paths of each
communication flow. The set of constraints (5d) represents the
limited bandwidth of physical links and the set of constraints
(5e) denotes the maximum port number for each switch.
The objective (5) is to minimize the total power consumption
of switches and the total link power consumption of the default
routing paths (k=0) for all communication flows.
The required runtime is unacceptable when directly solving
the above large-scale ILP problem. To reduce the runtime of
routing path allocation, we process the communication flows
one by one in descending order of bandwidth requirements,
which yields sub-optimal solutions but greatly reduces the runtime.
Additionally, before processing each communication flow,
the switch size constraint and the bandwidth constraint can be
preprocessed by removing, from the graph Gpa, the vertices
corresponding to the switches that have reached the maximum size and
the edges corresponding to the links that do not have enough
bandwidth. Therefore, the ILP formulation for the routing
path allocation of a single communication flow (i, j) ∈ Ecc is
simplified as follows.
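$$\text{Min} \quad \sum_{u \in V_s} P_{sw}(ip_u, op_u) + \sum_{(u,v) \in E_{link}} P_{lk}(i,j,u,v) \qquad (6)$$
s.t. (6a)
$$\sum_{v:(u,v) \in E_{pa}} x^{i,j,k}_{uv} - \sum_{v:(v,u) \in E_{pa}} x^{i,j,k}_{vu} = \begin{cases} 1, & \text{if } u = c_i; \\ 0, & \text{if } u \in V_s; \\ -1, & \text{if } u = c_j; \end{cases} \quad \forall k \in [0,K], \qquad (6b)$$
$$\sum_{(u,u') \in E_{split}} x^{i,j,k}_{uu'} \le l_{i,j}, \quad \forall k \in [0,K], \qquad (6c)$$
$$x^{i,j,k}_{uv} \in \{0,1\}, \quad k \in [0,K],\ \forall (u,v) \in E_{pa}, \qquad (6d)$$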
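where the cost of links $P_{lk}(i,j,u,v)$ can be simply defined as
follows.
$$P_{lk}(i,j,u,v) = \begin{cases} E_{bit} \times w_{i,j} \times D_{u,v}, & \text{if physical link } (u,v) \text{ exists;} \\ E_{bit} \times w_{i,j} \times D_{u,v} + C_{pl}, & \text{otherwise.} \end{cases} \qquad (7)$$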
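To make the single-flow problem concrete, the sketch below evaluates the link cost of Eq. (7) and picks the cheapest set of K+1 switch-disjoint paths for one flow. It is a stand-in for the ILP, not the ILP itself: it enumerates paths exhaustively, ignores the switch power term of objective (6), and is only practical for a handful of switches. The function names, the unit distances, the bandwidth, and the value of C_pl are assumptions; the inport and outport switches of flow (c3, c0) are taken from Table I.

```python
from itertools import combinations, permutations


def link_cost(w, dist, existing, u, v, e_bit=0.5, c_pl=10.0):
    """Eq. (7): wire energy, plus an opening cost if link (u, v) is new."""
    base = e_bit * w * dist[u, v]
    return base if (u, v) in existing else base + c_pl


def candidate_paths(inports, outports, switches, max_switches):
    """Simple switch sequences from an inport switch to an outport switch."""
    for n in range(1, max_switches + 1):
        for seq in permutations(switches, n):
            if seq[0] in inports and seq[-1] in outports:
                yield seq


def allocate(w, inports, outports, switches, dist, existing, k, max_switches):
    """Cheapest K+1 switch-disjoint paths by exhaustive search (no ILP solver)."""
    def cost(path_set):
        links = {(u, v) for p in path_set for u, v in zip(p, p[1:])}
        return sum(link_cost(w, dist, existing, u, v) for u, v in links)

    paths = list(candidate_paths(inports, outports, switches, max_switches))
    feasible = [c for c in combinations(paths, k + 1)
                if all(set(p).isdisjoint(q) for p, q in combinations(c, 2))]
    return min(feasible, key=cost, default=None)


# Flow (c3, c0) with K = 1: c3 has inports on s2 and s3, c0 has outports on
# s0 and s3 (consistent with Table I). Distances and bandwidth are hypothetical.
switches = ["s0", "s1", "s2", "s3"]
dist = {(u, v): 1.0 for u in switches for v in switches if u != v}
result = allocate(w=100, inports={"s2", "s3"}, outports={"s0", "s3"},
                  switches=switches, dist=dist, existing=set(),
                  k=1, max_switches=2)
```

With at most two switches per path, the only feasible pair is the single-switch path through s3 and the path s2 → s0, which matches the allocation described for this flow in the example that follows.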
3) An example: As mentioned above, we solve an ILP model
for each communication flow (ci, cj) in Gcc. As shown in Fig. 3,
we first allocate the routing paths of the communication flow
(c3, c0), and a physical link from switch s2 to s0 is added, where
the default path includes only one switch, s3, and the alternative
path is from s2 to s0. After we allocate routing paths for another
communication flow (c4, c3), a physical link from switch s1 to
s3 is added, where the default path includes only one switch,
s2, and the alternative path is from s1 to s3. Fig. 3 shows the
final network topology with one-fault tolerance, and the routing
paths for all communication flows are shown in Table I.
TABLE I: Routing Paths for One-Fault Tolerance
Flows        Default Path    Alternative Path
c1 → c0      s0 → s0         s3 → s3
c2 → c1      s0 → s0         s3 → s3
c3 → c0      s3 → s3         s2 → s0
c4 → c3      s2 → s2         s1 → s3
c7 → c5      s0 → s0         s1 → s1
c7 → c6      s0 → s0         s1 → s3
c8 → c7      s1 → s0         s2 → s3
c9 → c8      s1 → s1         s0 → s2
c10 → c9     s1 → s1         s2 → s2
c11 → c8     s1 → s1         s2 → s2
c12 → c4     s2 → s2         s3 → s1
c12 → c10    s2 → s2         s3 → s0 → s1
c12 → c11    s2 → s2         s3 → s1
Fig. 3: Final Network Topology.
V. SWITCH PORT SHARING
In a K-fault tolerance structure, K + 1 core inports and/or
K + 1 core outports are required for each core on no fewer than
K + 1 switches, which greatly increases the area and power
consumption of switches. Many switch ports are not used
simultaneously because only one out of K + 1 routing paths
is used for each communication flow at a time. To reduce the
switch size, we propose the sharing of switch ports (on the same
switch) between the routing paths from different communication
flows, using multiplexers. Given the routing path allocation
of communication flows, we prove a necessary and sufficient
condition for the port sharing on a switch, and formulate the
problem as a clique partitioning problem.
The port sharing aims at the sharing of core inports/core outports
on the same switch. In this section, we first propose a two-stage
method to solve the core inport sharing problem, which
is also applicable to core outport sharing. Second, we propose
a method to remove the conflicts caused by port sharing on
multiple switches. Finally, an independent set based formulation
is proposed for selecting routing paths for communication flows.
A. Conditions for Port Sharing on a Switch
In this subsection, given a network topology with K-fault-tolerance
and the corresponding routing path allocation, we
derive the conditions for port sharing on one switch only.
For clarity, the conditions for the inport sharing in one-fault-tolerance
and K-fault-tolerance are respectively discussed in
Section V-A1 and Section V-A2.
1) Port Sharing in One-Fault-Tolerance Topologies: Suppose
there are two core inports, $IP_1$ and $IP_2$, on a switch $s_n$, and
two communication flows, $f_1$ and $f_2$, which are shown in
Fig. 4(a). $f_1$ has two routing paths, $p^{f_1}_1$ and $p^{f_1}_2$, and $p^{f_1}_1$ goes
through $IP_1$. $f_2$ has two routing paths, $p^{f_2}_1$ and $p^{f_2}_2$, and $p^{f_2}_1$ goes
through $IP_2$. Fig. 4(b) illustrates port sharing.
Fig. 4: (a) Two independent core inports. (b) Port sharing.
Lemma 1: The two inports on $s_n$, $IP_1$ and $IP_2$, respectively
used by $p^{f_1}_1$ and $p^{f_2}_1$, can be merged into a single port without
violating the property of one-fault tolerance if and only if $p^{f_1}_2$
and $p^{f_2}_2$ are vertex-disjoint.
Proof. IF. In the case that $p^{f_1}_2$ and $p^{f_2}_2$ are vertex-disjoint, after
$IP_1$ and $IP_2$ are merged into a single port, the routing paths for $f_1$
and $f_2$ can always be found when one fault occurs. Because $p^{f_1}_2$
and $p^{f_2}_2$ cannot go through $s_n$, the one fault could be on $s_n$, $p^{f_1}_2$,
or $p^{f_2}_2$. If the fault is on switch $s_n$, $f_1$ and $f_2$ can use $p^{f_1}_2$ and $p^{f_2}_2$
for communication, respectively. If the fault is on $p^{f_1}_2$ (or $p^{f_2}_2$),
$f_1$ and $f_2$ can respectively use $p^{f_1}_1$ and $p^{f_2}_2$ (or $p^{f_1}_2$ and $p^{f_2}_1$) for
communication. It is concluded that $p^{f_1}_1$ and $p^{f_2}_1$, respectively
going through $IP_1$ and $IP_2$, are not used simultaneously while
the topology keeps the property of one-fault tolerance.
ONLY IF. We show that if $p^{f_1}_2$ and $p^{f_2}_2$ are not vertex-disjoint,
merging $IP_1$ and $IP_2$ will violate the one-fault tolerance. $p^{f_1}_2$ and
$p^{f_2}_2$ must go through one common switch vertex if they are not
vertex-disjoint. Consequently, if the common switch is faulty,
then $f_1$ and $f_2$ have to use $p^{f_1}_1$ and $p^{f_2}_1$ for communication,
which causes conflicting use of core inports. Proof END.
Fig. 5 shows the intersection relations between the routing
paths of the communication flows $f_1$ and $f_2$, which cause a core
inport conflict if $IP_1$ and $IP_2$ are merged and one fault occurs
at the intersection point of routing paths $p^{f_1}_2$ and $p^{f_2}_2$.
2) Port Sharing in Multiple-Fault Tolerance Topologies: In
this subsection, we give a generalized necessary and sufficient
condition for port sharing in K-fault-tolerant ($K \ge 1$) structures.
Suppose there are two core inports, $IP_1$ and $IP_2$, on a switch
$s_n$, and two communication flows, $f_1$ and $f_2$. $f_1$ has $K+1$
vertex-disjoint routing paths, $p^{f_1}_0, p^{f_1}_1, \cdots, p^{f_1}_K$, of which $p^{f_1}_0$
goes through $IP_1$. $f_2$ has $K+1$ vertex-disjoint routing paths,
$p^{f_2}_0, p^{f_2}_1, \cdots, p^{f_2}_K$, of which $p^{f_2}_0$ goes through $IP_2$.
Fig. 5: The intersection relation violates the one-fault tolerance
considering the routing paths of $f_1$ and $f_2$.
We construct a bipartite graph, $IG(V,E)$, to represent the
intersection relations between the routing paths of $f_1$ and $f_2$
as follows.
The vertex set includes all the routing paths of $f_1$ and
$f_2$ except for $p^{f_1}_0$ and $p^{f_2}_0$. Let $P_{f_1} = \{p^{f_1}_i, i = 1, \cdots, K\}$ and
$P_{f_2} = \{p^{f_2}_i, i = 1, \cdots, K\}$. $V_{IG} = P_{f_1} \cup P_{f_2}$. Further, if two routing
paths from $P_{f_1}$ and $P_{f_2}$ have a common vertex, there is
an edge in $E_{IG}$. That is, $E_{IG} = \{(p^{f_1}_i, p^{f_2}_j) \mid p^{f_1}_i \in P_{f_1},\ p^{f_2}_j \in P_{f_2}$,
and they have a common vertex$\}$. It is obvious that $IG(V,E)$
is a bipartite graph. Let $C(IG)$ be the cardinality of a maximum
matching in $IG(V,E)$. Then, we have the following conclusion.
Theorem 1: The two core inports on sn, IP1 and IP2, respectively used by $p^{f_1}_0$ of f1 and $p^{f_2}_0$ of f2, can be merged into a single port (port sharing) without violating the property of K-fault tolerance if and only if C(IG) < K.
Proof. IF. We show that if C(IG) < K, the merging of IP1 and IP2 will not violate the K-fault tolerance. Note that the routing paths from $P_{f_1}$ (or $P_{f_2}$) are vertex-disjoint (except for the source and sink core vertices). One fault causes at most two faulty paths, of which one path is from $P_{f_1}$ and the other is from $P_{f_2}$, and they have a common vertex. We have two situations considering $p^{f_1}_0$ and $p^{f_2}_0$.
(1). $p^{f_1}_0$ and $p^{f_2}_0$ are correct but only one of them can be used since they share one core inport. Let $K_1$ and $K_2$ respectively be the number of faulty paths in $P_{f_1}$ and $P_{f_2}$. Without loss of generality, we suppose $K_1 \ge K_2$. When $K_1 < K$, there must be at least one correct routing path in both $P_{f_1}$ and $P_{f_2}$ and, hence, K faults are tolerated. When $K_1 = K$, we can conclude that $K_2 < K$. This is because $K_2 = K_1 = K$ would indicate that C(IG) = K, which contradicts C(IG) < K. Accordingly, we have at least one correct path in $P_{f_2}$ for communication flow f2, and $p^{f_1}_0$ can be used for f1. Consequently, the network topology maintains K-fault tolerance.
(2). $p^{f_1}_0$ and $p^{f_2}_0$ are faulty when sn is faulty. In this case, the paths in $P_{f_1}$ and $P_{f_2}$ are able to construct a (K − 1)-fault-tolerant structure since the given topology is K-fault tolerant. Consequently, the network topology also keeps K-fault tolerance.
ONLY IF. We show that if C(IG) = K, then merging IP1 and IP2 will violate the K-fault tolerance. There will be a perfect matching in IG(V,E) if C(IG) = K. In the perfect matching, each edge corresponds to an intersection of two routing paths from $P_{f_1}$ and $P_{f_2}$. All 2K paths in $P_{f_1}$ and $P_{f_2}$ are faulty when the K faults occur exactly on the K intersection vertices. Consequently, $p^{f_1}_0$ and $p^{f_2}_0$ are respectively the only available routing paths for f1 and f2, which causes a conflicting use of the merged core inport if we merge IP1 and IP2. Proof END.
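The merging test of Theorem 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: each alternative path is assumed to be given as a list of switch names (core vertices excluded), the function names are hypothetical, and the matching is computed with a plain augmenting-path search.

```python
def intersection_graph(alt_f1, alt_f2):
    """adj[i] lists the indices j such that alternative path i of f1 and
    alternative path j of f2 share at least one switch (an edge of IG)."""
    switch_sets_f2 = [set(path) for path in alt_f2]
    return [[j for j, other in enumerate(switch_sets_f2) if other & set(path)]
            for path in alt_f1]


def max_matching_size(adj, n_right):
    """C(IG): cardinality of a maximum bipartite matching."""
    owner = [None] * n_right  # owner[j] = f1 path currently matched to f2 path j

    def augment(i, visited):
        for j in adj[i]:
            if j in visited:
                continue
            visited.add(j)
            if owner[j] is None or augment(owner[j], visited):
                owner[j] = i
                return True
        return False

    return sum(augment(i, set()) for i in range(len(adj)))


def can_merge_inports(alt_f1, alt_f2):
    """Theorem 1 test: merging is safe if and only if C(IG) < K."""
    k = len(alt_f1)
    if len(alt_f2) != k:
        raise ValueError("both flows need K alternative paths")
    return max_matching_size(intersection_graph(alt_f1, alt_f2), k) < k


# K = 2 with hypothetical switch names.
print(can_merge_inports([["s1", "s2"], ["s3"]], [["s2", "s4"], ["s5"]]))  # True
print(can_merge_inports([["s1", "s2"], ["s3"]], [["s2"], ["s3", "s6"]]))  # False
```

In the first call only one pair of alternative paths intersects, so C(IG) = 1 < 2; in the second call the intersections form a perfect matching, which is exactly the case excluded by the theorem.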
Fig. 6: The intersection relation violates the 2-fault tolerance considering the routing paths of f1 and f2 where $p^{f_1}_1$ intersects $p^{f_2}_1$ at router sn.
Further, Fig. 6 shows an intersection relation that violates the 2-fault tolerance considering the routing paths of f1 and f2, where $p^{f_1}_1$ intersects $p^{f_2}_1$ at router sn.
3) Inport with Multiple Communication Flows: For each core inport, there may be one or more outgoing communication requirements. Accordingly, there may be one or multiple paths going through one core inport.
Suppose there are two core inports, IP1 and IP2, on a switch sn and there are respectively m and n communication flows going through IP1 and IP2, as shown in Fig. 7.
Fig. 7: Multiple paths of the inports.
Corollary 1: The two inports IP1 and IP2 can be merged if
and only if all the m× n pairs of communication flows satisfy
Theorem 1.
4) Multiple Port Sharing: Suppose there are J core inports, IP1, IP2, · · · , IPJ, on a switch sn, and J communication flows, f1, f2, · · · , fJ. Each flow fi, i = 1, · · · , J, has K + 1 vertex-disjoint routing paths, $p^{f_i}_0, p^{f_i}_1, \cdots, p^{f_i}_K$, of which $p^{f_i}_0$ goes through IPi.
Theorem 2: The J core inports on sn, IPj, j = 1, · · · , J, respectively used by J communication flows, f1, f2, · · · , fJ, can be merged into a single inport without violating the property of K-fault tolerance if the merging relations between the J inports form a clique, which indicates that, for all pairs of (IPi, IPj), i ≠ j, and 1 ≤ i, j ≤ J, IPi and IPj can be merged according to Theorem 1.
Proof. Let $P_{f_j} = \{p^{f_j}_k, k = 1, \cdots, K\}$ be the set of vertex-disjoint routing paths of fj except for $p^{f_j}_0$. Similarly, we have two situations considering $p^{f_i}_0$, i = 1, · · · , J.
(1). $p^{f_j}_0$, j = 1, · · · , J, are correct but only one of them can be used since they share one core inport. Let $K_j$, j = 1, · · · , J, respectively be the number of faulty paths in $P_{f_j}$. Without loss of generality, we suppose that $K_j \ge K_{j+1}$, j = 1, · · · , J − 1. When $K_1 < K$, there must be at least one correct routing path in $P_{f_j}$, j = 1, · · · , J, and, hence, K faults are tolerated. When $K_1 = K$, we can conclude that $K_j < K$, j = 2, · · · , J. This is because $K_j = K_1 = K$ indicates that there will be a perfect matching considering the intersection relations between the K paths from $P_{f_1}$ and the K paths from $P_{f_j}$ and, hence, IP1 and IPj cannot be merged according to Theorem 1, which is a contradiction. Accordingly, we have at least one correct path in $P_{f_j}$ for communication flow fj, j = 2, · · · , J, and $p^{f_1}_0$ can be used for f1. Consequently, the network topology maintains K-fault tolerance.
(2). $p^{f_j}_0$, j = 1, · · · , J, are faulty when sn is faulty. In this case, the paths in $P_{f_j}$, j = 1, · · · , J, are able to construct a (K − 1)-fault-tolerant structure since the given topology is K-fault tolerant. Consequently, the network topology also maintains K-fault tolerance. Proof END.
Suppose there are J core inports, IP1, IP2, · · · , IPJ, on a switch sn, and there are $m_i$ communication flows going through IPi, i = 1, · · · , J.
Corollary 2: The J core inports on sn, IPj, j = 1, · · · , J, respectively used by $m_j$, j = 1, · · · , J, communication flows, can be merged into a single inport without violating the property of K-fault tolerance if all the $\prod_{i=1}^{J} m_i$ combinations of communication flows satisfy Theorem 2.
Based on the above theorem and corollaries, we conclude the
following theorem.
Theorem 3: The J core inports on sn, IPj, j = 1, · · · , J, respectively used by $m_j$, j = 1, · · · , J, communication flows, can be merged into a single inport without violating the property of K-fault tolerance if the merging relations between the J inports form a clique.
B. Clique Partitioning for Port Sharing on a switch
Given a network topology and all routing paths for K-fault tolerance, we formulate the port sharing problem on a switch as a clique partitioning problem, where the number of cliques is minimized to reduce the switch size, according to Theorem 3.
For a switch, a graph Gps(Vps, Eps) is constructed to represent the possible sharing relations between the core inports. The vertex set Vps = {IPi, i = 0, · · · , N} represents the set of core inports. An edge, (IPi, IPj), is added to Eps if IPi and IPj can be shared with each other according to Corollary 1. Fig. 8 shows an example of Gps and two of its clique partitionings. The solid edges are the clique edges. In Fig. 8(a), two inports are required, respectively corresponding to a 2-vertex clique and a 3-vertex clique. In Fig. 8(b), three inports are required, respectively corresponding to two 2-vertex cliques and one 1-vertex clique.
Because the clique partitioning problem is an NP-hard problem, we propose a heuristic to find the clique partitioning of Gps. Algorithm 1 shows the key steps of the heuristic for port sharing. Firstly, we find a maximum clique Q(Vq, Eq) in Gps using an ILP based method [42]. If |Vq| is greater than 2, we merge the inports in Vq into a single port and remove Q(Vq, Eq) from Gps to update a new Gps. The operation of finding the maximum clique is repeated until |Vq| ≤ 2, where the clique partitioning problem can be solved by finding a maximum cardinality matching. Secondly, we find a maximum cardinality matching in the rest of Gps, where the edges in the maximum cardinality matching correspond to two-port sharing.
Fig. 8: An example of Gps, with two clique partitionings shown in panels (a) and (b).
Algorithm 1 PS on a switch (sn)
Require: fault tolerant paths of communication requirements
Ensure: results of port sharing within a switch
  Construct a possible port-sharing graph Gps for sn;
  for each pair of inports (or outports) (IPi, IPj) on sn do
    for each path $p^{f^i_m}_1$ through IPi do
      for each path $p^{f^j_n}_1$ through IPj do
        Construct a bipartite graph IG;
        Calculate C(IG) of IG;
        if C(IG) ≥ K then
          goto cont;
        end if
      end for
    end for
    Add an edge (IPi, IPj) to Eps;
    cont: Continue;
  end for
  repeat
    Find a maximum clique Q(Vq, Eq) in Gps and the IPs denoted by Vq share a single port;
    Remove Q(Vq, Eq) from Gps;
  until |Vq| ≤ 2
  Find maximum cardinality matching on Gps;
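The second half of Algorithm 1 (repeated maximum-clique extraction followed by a matching) can be sketched as follows. This is a minimal illustration under stated assumptions: the paper finds maximum cliques with an ILP [42], whereas this sketch swaps in exhaustive search, which is only practical for the few inports of a single switch; the function names and the five-vertex sharing graph are hypothetical and are not taken from Fig. 8.

```python
from itertools import combinations


def max_clique(vertices, adj):
    """Exhaustive maximum-clique search over the remaining inports."""
    for size in range(len(vertices), 0, -1):
        for cand in combinations(sorted(vertices), size):
            if all(v in adj[u] for u, v in combinations(cand, 2)):
                return set(cand)
    return set()


def max_matching(vertices, adj):
    """Exhaustive maximum matching: branch on the smallest remaining vertex."""
    if not vertices:
        return []
    v = min(vertices)
    rest = vertices - {v}
    best = max_matching(rest, adj)  # option: leave v unmatched
    for u in sorted(adj[v] & rest):
        cand = [(v, u)] + max_matching(rest - {u}, adj)
        if len(cand) > len(best):
            best = cand
    return best


def share_ports(inports, sharable_pairs):
    """Partition inports into groups; each group is merged into one port."""
    adj = {p: set() for p in inports}
    for a, b in sharable_pairs:
        adj[a].add(b)
        adj[b].add(a)
    remaining, groups = set(inports), []
    while True:
        clique = max_clique(remaining, adj)
        if len(clique) <= 2:  # the rest is handled by matching
            break
        groups.append(clique)
        remaining -= clique
    pairs = max_matching(remaining, adj)
    groups += [{a, b} for a, b in pairs]
    matched = {p for pair in pairs for p in pair}
    groups += [{p} for p in sorted(remaining - matched)]
    return groups


# Hypothetical sharing graph on five inports.
groups = share_ports([1, 2, 3, 4, 5], [(1, 2), (2, 3), (3, 4), (3, 5), (4, 5)])
print(len(groups), "ports after sharing:", groups)
```

For this assumed graph the triangle on inports 3, 4, and 5 is merged first and the remaining edge (1, 2) is picked up by the matching, so five inports shrink to two.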
C. Port Sharing on Multiple Switches
The conditions in Section V-A ensure the fault tolerance when port sharing is considered on one switch only. However, port sharing on multiple switches may cause conflicts in the routing path selection. In this section, we present a method for removing some port sharing to maintain the fault tolerance when port sharing on all switches is considered.
To select routing paths for communication flows, we can construct a graph Gpc(Vpc, Epc) to represent the conflict relations between routing paths and solve an independent set problem on Gpc. Let $p^{k}_{(i,j)}$ be the k-th routing path of communication flow (i, j). $V_{pc} = \{p^{k}_{(i,j)} \mid (i,j) \in E_{cc},\ 0 \le k \le K\}$ represents the routing paths of all communication flows. Epc includes two types of edges. One represents two routing paths from the same communication flow and the other represents port-sharing relations between two routing paths:
$E_{pc} = \{(p^{k_1}_{(i,j)}, p^{k_2}_{(i,j)}) \mid 0 \le k_1, k_2 \le K \text{ and } k_1 \ne k_2\} \cup \{(p^{k_1}_{(i_1,j_1)}, p^{k_2}_{(i_2,j_2)}) \mid (i_1,j_1) \ne (i_2,j_2),\ 0 \le k_1, k_2 \le K, \text{ and } p^{k_1}_{(i_1,j_1)} \text{ and } p^{k_2}_{(i_2,j_2)} \text{ go through a common core inport or core outport}\}$.
The selection of routing paths can be achieved by finding
a maximum independent set of Gpc, denoted as IND(Gpc). If
there is no port sharing, that is, there is no second type of
edges in Epc, we may choose any correct routing path for
each communication flow, and, hence |IND(Gpc)| = |Ecc|. If
port sharing is considered on one switch only, the conditions in
Section V-A ensure |IND(Gpc)|= |Ecc| for any K switch faults.
However, |IND(Gpc)| < |Ecc| could happen if we have two or
more switches with shared core inports/outports, that is, we
cannot find enough routing paths for the communication flows.
Fig. 9 shows an example. We have three communication flows f1, f2, and f3 (|Ecc| = 3), and each flow has two switch-disjoint routing paths, $p^{f_i}_1$ and $p^{f_i}_2$, for one-fault tolerance. As shown in the figure, the routing paths have two shared ports respectively on the switches sm and sn, and $p^{f_1}_2$ and $p^{f_3}_2$ go through a common switch sl. The corresponding Gpc is also shown in the figure. When sl is broken, we have four correct paths after removing $p^{f_1}_2$ and $p^{f_3}_2$. Obviously, |IND(Gpc)| = 2 < 3 = |Ecc|, which means that one flow has no routing path. The conflict can be solved by removing the port sharing on any switch.
Fig. 9: Example for the conflicts of routing paths with port sharing on two switches.
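The conflict described for Fig. 9 can be reproduced on a small instance. The sketch below is an assumption-laden illustration: the exact wiring of Fig. 9 is not recoverable here, so the path-to-switch assignment and the two sharing edges are hypothetical choices that match the textual description, and the maximum independent set is found by exhaustive search rather than by the ILP used in the paper.

```python
from itertools import combinations


def max_independent_set(vertices, edges):
    """Exhaustive search; adequate only for tiny conflict graphs."""
    edge_set = {frozenset(e) for e in edges}
    for size in range(len(vertices), 0, -1):
        for cand in combinations(vertices, size):
            if not any(frozenset(pair) in edge_set
                       for pair in combinations(cand, 2)):
                return list(cand)
    return []


# Assumed instance: (flow, path index) -> the switch that path traverses.
switch_of = {("f1", 1): "sm", ("f1", 2): "sl",
             ("f2", 1): "sm", ("f2", 2): "sn",
             ("f3", 1): "sn", ("f3", 2): "sl"}
same_flow = [(a, b) for a, b in combinations(switch_of, 2) if a[0] == b[0]]
shared_port = [(("f1", 1), ("f2", 1)),   # shared port on sm
               (("f2", 2), ("f3", 1))]   # shared port on sn


def selectable_flows(faulty_switch, sharing):
    """|IND(Gpc)| after removing the paths through the faulty switch."""
    alive = [p for p, s in switch_of.items() if s != faulty_switch]
    edges = [e for e in same_flow + sharing if e[0] in alive and e[1] in alive]
    return len(max_independent_set(alive, edges))


print(selectable_flows("sl", shared_port))      # 2: one flow has no usable path
print(selectable_flows("sl", shared_port[:1]))  # 3: sharing on sn removed
print(selectable_flows("sl", shared_port[1:]))  # 3: sharing on sm removed
```

With both sharing edges in place, the surviving paths form a chain of conflicts and only two flows can be served; dropping either sharing edge restores an independent set of size |Ecc| = 3.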
Here, we propose a heuristic to deal with conflicts of port sharing for all the switches. Let Vf be the set of faulty routers. Algorithm 2 shows the key steps.
In Algorithm 2, we first generate the core inport/outport sharing for each switch by calling Algorithm 1. Second, for each subset of K possibly faulty switches, the fault tolerance is verified by solving a maximum independent set problem, where |IND(Gpc)| is computed by solving an ILP-based formulation. When |IND(Gpc)| < |Ecc|, a port-sharing edge is removed to increase the size of the maximum independent set. Basically, we prefer removing a port-sharing edge between core outports while keeping as many sharing edges between core inports as possible, because an input port generally has a flit buffer with large area costs and power overhead.
Fig. 10 shows the port sharing of the network topology in Fig. 3; the corresponding Gcc is shown in Fig. 1. Fig. 11 shows the graph Gpc corresponding to Fig. 10. Table II shows the available routing paths when the switch s0 is broken down, where the faulty paths are displayed in gray color and the conflict paths are displayed in gray color and marked using c.

Algorithm 2 PS on multiple switches
Require: fault tolerant paths of communication requirements
Ensure: results of port sharing
  for each switch sn in Vs do
    Call PS on a switch(sn) to generate core inport sharing;
    Call PS on a switch(sn) to generate core outport sharing;
  end for
  Construct a conflict relation graph Gpc between routing paths;
  for each subset of switches Vf ⊂ Vs and |Vf| = K do
    Temporarily remove from Gpc the routing paths going through any switch si ∈ Vf;
    Find a maximum independent set of Gpc;
    if |IND(Gpc)| = |Ecc| then
      continue;
    end if
    repeat
      Choose a communication flow (i, j) having no routing path in IND(Gpc);
      Permanently remove one of the port-sharing edges related to (i, j);
      Find a maximum independent set of Gpc;
    until |IND(Gpc)| = |Ecc|
    Restore Gpc, except for the permanently removed edge;
  end for

Fig. 10: Result of Port Sharing.

TABLE II: Communication paths for one-switch-fault tolerance with s0 broken down

Flows        Default Path    Alternative Path
c1 → c0      s0 → s0         s3 → s3
c2 → c1      s0 → s0         s3 → s3
c3 → c0      s3 → s3         s2 → s0
c4 → c3      s2 → s2         s1 → s3 [c]
c7 → c5      s0 → s0         s1 → s1
c7 → c6      s0 → s0         s1 → s3
c8 → c7      s1 → s0         s2 → s3
c9 → c8      s1 → s1         s0 → s2
c10 → c9     s1 → s1         s2 → s2
c11 → c8     s1 → s1         s2 → s2
c12 → c4     s2 → s2         s3 → s1 [c]
c12 → c10    s2 → s2         s3 → s0 → s1
c12 → c11    s2 → s2         s3 → s1 [c]

Fig. 11: Gpc for selecting routing paths in Fig. 10.
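The verification loop of Algorithm 2 can be sketched as below. This is a simplified illustration rather than the authors' implementation: the maximum independent set is found exhaustively instead of by an ILP, no preference is given to outport-sharing edges over inport-sharing edges, and the data layout (`paths`, `sharing`) as well as the three-flow test instance are hypothetical.

```python
from itertools import combinations


def max_independent_set(vertices, edges):
    """Exhaustive search over a small conflict graph (edges are frozensets)."""
    for size in range(len(vertices), 0, -1):
        for cand in combinations(vertices, size):
            if not any(frozenset(pair) in edges
                       for pair in combinations(cand, 2)):
                return list(cand)
    return []


def prune_port_sharing(paths, sharing, k):
    """paths: {(flow, index): set of switches on that path};
    sharing: iterable of path pairs that share a core port.
    Returns the sharing edges kept after checking every k-subset of faults."""
    flows = {flow for flow, _ in paths}
    switches = sorted(set().union(*paths.values()))
    kept = {frozenset(e) for e in sharing}
    for faulty in combinations(switches, k):
        # Temporary removal: only paths avoiding all faulty switches survive.
        alive = sorted(p for p, used in paths.items() if not used & set(faulty))
        while True:
            edges = {frozenset(e) for e in combinations(alive, 2)
                     if e[0][0] == e[1][0]}                    # same-flow edges
            edges |= {e for e in kept if e <= set(alive)}      # sharing edges
            chosen = max_independent_set(alive, edges)
            if len(chosen) == len(flows):
                break
            starved = min(flows - {flow for flow, _ in chosen})
            related = sorted((e for e in kept if e <= set(alive)
                              and any(p[0] == starved for p in e)), key=sorted)
            if not related:
                raise ValueError(f"flow {starved} has no path under {faulty}")
            kept.discard(related[0])                           # permanent removal
    return kept


# Hypothetical one-fault instance with two sharing edges (on sm and on sn).
paths = {("f1", 1): {"sm"}, ("f1", 2): {"sl"},
         ("f2", 1): {"sm"}, ("f2", 2): {"sn"},
         ("f3", 1): {"sn"}, ("f3", 2): {"sl"}}
sharing = [(("f1", 1), ("f2", 1)), (("f2", 2), ("f3", 1))]
kept = prune_port_sharing(paths, sharing, k=1)
print(kept)
```

On this assumed instance the failure of sl starves one flow, so one sharing edge is dropped permanently; the remaining sharing edge then survives the checks for the other two single-switch failures.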
D. Selecting Routing Paths after Port Sharing
The selection of routing paths can be achieved by finding an
|Ecc|-size independent set of Gpc. If link faults or switch faults
occur, we can just remove the routing paths that go through the
faulty links and the faulty switches from Gpc and find a new
set of routing paths for all the communication flows by solving
an |Ecc|-size independent set problem on Gpc and update the
routing tables of the core communications.
E. Cost Analysis for Multiplexers and Demultiplexers
To develop a fault-tolerance topology, we introduce demultiplexers (DEMUX) and multiplexers (MUX) for the source cores and the sink cores, respectively, of the communication flows. The routing paths can be selected by sending control signals to the DEMUXs and MUXs.
We assume a source-routing strategy, where the routing paths are stored in a routing table on the source core side of the communication flow. Each digit of the routing information is used in turn to select the output port at each step of the route, as if the address itself was the routing header determined from a source-routing table [35]. Hence, for a demultiplexer with the input from a core (for example, the DEMUX connected to c7 in Fig. 10), the control signals can be taken from the routing bits, and for a demultiplexer with the input from a switch (for example, the DEMUX connected to switch 1 in Fig. 10), the control signals can also be generated by the switch according to the routing digit in the head flit of the packet. For a multiplexer, we can send a one-bit enable signal along with each input data, and the control signals of the multiplexer can be generated based on the enable signals using a simple logic circuit, which includes several logic gates. For K-fault tolerance, we need K + 1 enable signals. In practical designs, K will be very small (K ≤ 3 in this work), and hence, the power and area overhead is very small. Notice that the MUX and DEMUX only exist at the starting point and ending point of routing paths.
In the following, we analyze the costs of DEMUX and MUX. We set the bit-width to 32 bits and synthesize DEMUX and MUX with different sizes based on the 65nm process technology using commercial logic synthesis tools. The power consumption is shown in Table III. From the table, we can see that the power consumption of DEMUX and MUX is at least two orders of magnitude less than that of the switch. Consequently, the power consumption from introducing one more DEMUX and MUX port is much less than the power consumption saved by removing a switch port.
TABLE III: Power consumption of DEMUX and MUX

Size          Leakage power (µW)    Total power (mW)
MUX 2:1       1.1269e-02            2.2274e-03
MUX 3:1       1.2971e-02            2.9082e-02
MUX 4:1       2.1191e-02            5.3280e-02
MUX 5:1       2.6081e-02            6.3278e-02
MUX 6:1       3.0442e-02            7.1425e-02
DEMUX 1:2     1.9218e-02            1.8354e-03
DEMUX 1:3     2.6286e-02            3.4385e-02
DEMUX 1:4     1.6736e-02            4.8588e-02
DEMUX 1:5     1.5607e-02            5.0520e-02
DEMUX 1:6     1.7771e-02            6.0453e-02
VI. LINK-FAULT TOLERANCE
In this section, we consider the generation of K-link-fault-
tolerance network topology, which is a special case of the gen-
eralized fault-tolerance topology. If only link failures between
switches are considered, the mapping from cores to switches
exhibits a many-to-one relationship. We propose an ILP based
method to simultaneously solve the core mapping problem and
the routing path allocation problem to improve the quality of
the solutions.
Here, we define a routing path graph Grp(Vrp, Erp) to represent the possible connections between the cores and the switches and the possible physical links between the switches.
Definition 2: Routing Path Graph: Grp(Vrp, Erp) is directed, and $V_{rp} = V_c \cup V_s$ and $E_{rp} = (V_s \times V_s) \cup (V_c \times V_s) \cup (V_s \times V_c)$.
In the following, we give an ILP formulation to find K + 1 edge-disjoint routing paths in Grp for all the communication flows. The objective is to minimize the power consumption of the NoC topology with a switch size constraint, bandwidth constraints, and latency constraints.
The initial number of switches nsw is determined using a method similar to the one in [34]. The binary variables $x^{(i,j,k)}_{uv}$ and $d_{uv}$ are defined similarly to those in the formulation (5). The K-link-fault-tolerant topology generation problem can be formulated as the following integer programming problem:
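\begin{align}
\min\ & \sum_{u \in V_s} P_{sw}(ip_u, op_u) + P_{link} \tag{8} \\
\text{s.t.}\ & \sum_{v:(u,v) \in E_{rp}} x^{(i,j,k)}_{uv} - \sum_{v:(u,v) \in E_{rp}} x^{(i,j,k)}_{vu} =
  \begin{cases} 1, & \text{if } u = c_i; \\ 0, & \text{if } u \in V_s; \\ -1, & \text{if } u = c_j; \end{cases}
  \quad \forall (i,j) \in E_{cc},\ k \in [0,K] \tag{8a} \\
& \sum_{k \in [0,K]} x^{(i,j,k)}_{uv} \le 1, \quad \forall (u,v) \in E_{rp},\ (i,j) \in E_{cc} \tag{8b} \\
& \sum_{u \in V_c} d_{uv} \le max\_size - 1, \quad \forall v \in V_s \tag{8c} \\
& ip_u \le max\_size, \quad op_u \le max\_size \tag{8d} \\
& \sum_{(u,v) \in V_s \times V_s} x^{(i,j,0)}_{uv} + 1 \le l_{i,j}, \quad \forall (i,j) \in E_{cc} \tag{8e} \\
& \sum_{(i,j) \in E_{cc},\ k \in [0,K]} x^{(i,j,k)}_{uv} \cdot w_{i,j} \le BW_{max}, \quad \forall (u,v) \in V_{rp} \tag{8f} \\
& \sum_{v:(u,v) \in V_c \times V_s} d_{uv} = 1, \quad \forall u \in V_c \tag{8g} \\
& \sum_{v:(v,u) \in V_s \times V_c} d_{vu} = 1, \quad \forall u \in V_c \tag{8h} \\
& x^{(i,j,k)}_{uv} \in \{0,1\}, \quad \forall (u,v) \in E_{rp},\ (i,j) \in E_{cc},\ k \in [0,K] \tag{8i} \\
& d_{uv} \in \{0,1\}, \quad \forall (u,v) \in E_{rp} \tag{8j}
\end{align}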
$P_{sw}(ip_u, op_u)$ and $P_{link}$ are computed using a method similar to the one in Section IV-B1. The constraint (8a) defines a path from s = ci to t = cj. The constraint (8b) ensures that the K + 1 paths are link-disjoint. Next, we use the constraint (8c) to ensure that each switch has at least one port for connecting to other switches, and the constraint (8d) bounds the numbers of input and output ports of each switch by max_size. The constraint (8e) is the latency constraint, which means that the default path (k = 0) passes through at most $l_{i,j}$ switches. The constraint (8f) means that the total bandwidth requirement of the communication flows going through the physical link (u, v) must not exceed $BW_{max}$. The constraints (8g) and (8h) ensure that each core connects to exactly one switch, and the constraints (8i) and (8j) define the binary variables.
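Whether a candidate set of switch-to-switch links can support K + 1 link-disjoint paths at all can be checked without an ILP solver. The sketch below is not the formulation (8); it swaps in a unit-capacity maximum-flow computation (Edmonds-Karp), whose value equals the number of link-disjoint paths by Menger's theorem. It assumes that only links between switches can fail, and the function name and the four-switch ring are hypothetical.

```python
from collections import deque


def count_link_disjoint_paths(links, src, dst):
    """Maximum number of link-disjoint directed paths from src to dst."""
    cap, adj = {}, {}
    for u, v in links:
        cap[(u, v)] = cap.get((u, v), 0) + 1
        cap.setdefault((v, u), 0)          # residual (reverse) capacity
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    flow = 0
    while True:
        parent = {src: None}
        queue = deque([src])
        while queue and dst not in parent:  # BFS in the residual graph
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if dst not in parent:
            return flow
        v = dst
        while parent[v] is not None:        # push one unit along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


# Hypothetical ring of four switches with links in both directions.
switches = ["s0", "s1", "s2", "s3"]
ring = [(switches[i], switches[(i + 1) % 4]) for i in range(4)]
links = ring + [(v, u) for u, v in ring]
n_paths = count_link_disjoint_paths(links, "s0", "s2")
print(n_paths)            # 2
print(n_paths >= 1 + 1)   # True: enough for K = 1
print(n_paths >= 2 + 1)   # False: not enough for K = 2
```

Such a check only answers feasibility for one pair of switches; the ILP additionally chooses the core mapping and minimizes power under the port, latency, and bandwidth constraints.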
VII. EXPERIMENT
The proposed algorithms have been implemented using C++ on a Linux 64-bit workstation (Intel 2.0 GHz, 64 GB RAM). All the ILP-based formulations are solved using Gurobi [43]. In the first set of experiments, we analyzed the hardware cost of fault-tolerant topologies. The second set of experiments shows the effectiveness of the port sharing. In the third set of experiments, we compared the proposed method for generating link-fault-tolerant topologies with those of previous studies.
A. Hardware Cost Analysis of Fault Tolerance
In this experiment, the bandwidth constraint BWmax is set at 3000MB/s and the maximum number of ports on switches, max_size, is set to 10. ORION 3.0 [40] was used for estimating the switch power and the model from [41] was used for estimating the link power.
Table IV shows the comparison between the non-fault-tolerant topologies and the fault-tolerant topologies with K = 1, K = 2 and K = 3, respectively. The columns SwitchNum, LinkNum, and Power denote the number of switches, number of links, and the power, respectively. The column Time denotes the running time of the program. Additionally, the columns NFT and FT respectively represent the non-fault-tolerant topologies and fault-tolerant topologies, and the column Inc shows the relative increase of FT over NFT.

TABLE IV: Comparison between Non-Fault-Tolerance & K-Fault-Tolerance

                  SwitchNum            LinkNum              Power (mW)
K      Bench.     NFT  FT   Inc        NFT  FT   Inc        NFT       FT         Inc        Time (s)
K = 1  D_36       8    10   25.00%     18   23   27.78%     256.426   516.979    101.61%    2.908
       D_43       10   13   30.00%     26   32   23.08%     330.912   622.819    88.21%     5.102
       D_50       12   14   16.67%     26   34   30.77%     341.88    692.739    102.63%    8.066
       D_70       16   22   37.50%     48   71   47.92%     574.434   1128.925   96.53%     34.634
       Average    -    -    27.29%     -    -    32.39%     -         -          97.24%     -
K = 2  D_36       8    15   87.50%     18   39   116.67%    256.426   810.358    216.02%    19.632
       D_43       10   19   90.00%     26   49   88.46%     330.912   932.885    181.91%    36.554
       D_50       12   21   75.00%     26   59   126.92%    341.880   1099.778   221.69%    68.142
       D_70       16   36   125.00%    48   117  143.75%    574.434   1686.243   193.55%    674.757
       Average    -    -    94.38%     -    -    118.95%    -         -          203.29%    -
K = 3  D_36       8    19   137.50%    18   48   166.67%    256.426   1068.731   316.78%    51.054
       D_43       10   26   160.00%    26   69   165.38%    330.912   1252.793   278.59%    162.595
       D_50       12   30   150.00%    26   78   200.00%    341.88    1414.200   313.65%    465.502
       D_70       16   47   200.00%    48   157  227.08%    574.434   2240.320   290.00%    5565.602
       Average    -    -    170.00%    -    -    189.78%    -         -          299.76%    -
As K increases from 1 to 3, the average power consumption increases by 97.24%, 203.29%, and 299.76%, respectively, compared to NFT (K = 0). Fig. 12 shows that the increase in the power consumption is approximately linear with K for all benchmarks, because the power mainly comes from the switches and communication traffic. The increase in the number of both switches and links is also approximately linear with K.
[Plot: power (mW), from 0 to 2500, versus K+1, from 1 to 4, with one curve for each of the benchmarks D_36, D_43, D_50, and D_70.]
Fig. 12: Power in different K-fault-tolerant topologies.
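As a quick arithmetic check of the near-linear trend, the average power increases reported in Table IV can be differenced and fitted with a line through the origin. The snippet below only restates the table's averages; the variable names are illustrative.

```python
# Average power increase over NFT (in percent), taken from Table IV.
avg_increase = {0: 0.0, 1: 97.24, 2: 203.29, 3: 299.76}

# Increase contributed by each additional tolerated fault.
steps = [round(avg_increase[k + 1] - avg_increase[k], 2) for k in range(3)]

# Least-squares slope of a line through the origin: sum(k*y) / sum(k*k).
slope = (sum(k * y for k, y in avg_increase.items())
         / sum(k * k for k in avg_increase))

print(steps)            # [97.24, 106.05, 96.47]
print(round(slope, 2))  # 100.22
```

Each additional tolerated fault therefore costs roughly one more non-fault-tolerant power budget, which matches the approximately linear growth stated in the text.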
Fig. 13 shows the result of floorplan and topology for the testbench MP3EncMP3Dec.
Fig. 13: One-switch-fault-tolerant topology for MP3EncMP3Dec: (a) floorplan. (b) topology.
B. Analysis of Port Sharing
Table V shows the effectiveness of the port sharing. The columns SwitchNum, Input port, Output port, and Power denote the number of switches, the number of input ports, the number of output ports, and the power, respectively. The columns NPT and PT represent the results without and with port sharing, respectively, and the column Dec shows the reduction of PT compared to NPT.
For one-fault-tolerance, the number of input ports, output
ports, and the power can be reduced by 13.55%, 12.37% and
18.08%, respectively. As K increases, the results show more
reduction in the switch power consumption, which demonstrates
the effectiveness of port sharing.
C. Link-fault Tolerance
1) One-link-fault tolerance: In this subsection, we compare
the proposed framework to the FTTG method in [34], which
includes a one-link-fault switch topology generation followed
by a simulated annealing based core mapping, and the de
Bruijn Digraph (DBG) based method [44] for the one-link-fault
tolerance case.
To compare our results to those of previous studies [34] [44], the link power of the communication flow (i, j) on the edge (u, v) ∈ Erp is evaluated by $P_{lk}(i,j,u,v) = E_{bit} \cdot w_{i,j} \cdot D_{u,v}$. Notice that there are no extra costs introduced for opening a new physical link.
a. Comparison to the FTTG method: In this work, we stipulate that physical links are directed whereas the FTTG method uses bi-directional physical links. Hence, we reduce the directed graph to an undirected graph by adding to the ILP formulation (8) the following constraint:
$d_{uv} - d_{vu} = 0, \quad \forall (u,v) \in E_{rp}$ (9)
TABLE V: Comparison between NonPortSharing and PortSharing with different K

                             Input port           Output port          Power (mW)
K      Bench.     SwitchNum  NPT  PT   Dec        NPT  PT   Dec        NPT        PT         Dec
K = 1  D_36       10         87   74   14.94%     89   77   13.48%     516.979    414.609    19.80%
       D_43       13         106  91   14.15%     114  101  11.40%     622.819    506.410    18.69%
       D_50       14         118  101  14.41%     124  108  12.90%     692.739    558.18     19.42%
       D_70       22         187  167  10.70%     197  174  11.68%     1128.925   966.166    14.42%
       Average    -          -    -    13.55%     -    -    12.37%     -          -          18.08%
K = 2  D_36       15         135  107  20.74%     138  114  17.39%     810.358    583.529    27.99%
       D_43       19         160  122  23.75%     172  142  17.44%     932.885    641.725    31.21%
       D_50       21         185  140  24.32%     194  158  18.56%     1099.778   746.669    32.11%
       D_70       36         291  238  18.21%     306  260  15.03%     1686.243   1277.790   24.22%
       Average    -          -    -    21.76%     -    -    17.11%     -          -          28.88%
K = 3  D_36       19         176  133  24.43%     180  143  20.56%     1068.731   717.322    32.88%
       D_43       26         217  165  23.96%     233  162  30.47%     1252.793   844.043    32.63%
       D_50       30         246  170  30.89%     258  186  27.91%     1414.200   844.494    40.28%
       D_70       47         389  297  23.65%     409  314  23.23%     2240.32    1545.64    31.01%
       Average    -          -    -    25.73%     -    -    25.54%     -          -          34.20%

Additionally, the objective function in FTTG is to minimize the energy consumption of the network topology, which is estimated based on the shortest paths (called the default path in [34]). To make a fair comparison to the FTTG method, we use the same objective function as follows.
$\min \sum_{(i,j) \in E_{cc}} \{R_{i,j} \cdot E_{Rbit} + L_{i,j} \cdot E_{Lbit}\} \cdot w_{i,j}$ (10)
In the formula, $E_{Rbit}$ represents the bit energy consumption of the switches and $E_{Lbit}$ represents the bit energy consumption of the unit length links [34]. $L_{i,j} = \sum_{(u,v) \in E_{rp}} d_{uv}$ is the number of physical links used by the shortest routing path of the communication flow (i, j) and $R_{i,j} = L_{i,j} - 1$ is the number of switches on the shortest routing path.
Six widely used benchmarks were used for the comparisons. The port number of switches max_size was set to four, which is the same as that in [34]. Because we cannot obtain the value of the bandwidth constraint in [34], the bandwidth constraint fmax is set to 3000MB/s, which corresponds to a 32 bit physical link operating at 750MHz. The proposed ILP-based method is applied to generate a one-fault-tolerant topology with the switch number varying from rmin to rmax. For fair comparisons, we use the energy estimation model in [34]. The bit energy consumptions of the switches and links are set at 3.20 pJ/Kb and 4.78 pJ/Kb/nm, respectively, and the length of the links is set to 1 mm.
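The objective (10) is simple enough to evaluate by hand; the sketch below does so for two made-up flows. The bit energies are the values quoted above, every link is taken to have unit length, and the flow list (number of links on the default path, bandwidth) is purely hypothetical, so the resulting number carries no physical meaning beyond illustrating the formula.

```python
E_SWITCH_BIT = 3.20  # switch bit energy quoted in the text
E_LINK_BIT = 4.78    # link bit energy per unit length quoted in the text


def default_path_energy(flows):
    """Evaluate objective (10) for flows given as
    (number of links on the default path, bandwidth) pairs."""
    total = 0.0
    for n_links, bandwidth in flows:
        n_switches = n_links - 1  # R = L - 1, as defined after Eq. (10)
        total += (n_switches * E_SWITCH_BIT + n_links * E_LINK_BIT) * bandwidth
    return total


# Two hypothetical flows: 2 links at bandwidth 100 and 3 links at bandwidth 50.
energy = default_path_energy([(2, 100), (3, 50)])
print(round(energy, 2))
```

Because only the default (shortest) path of each flow enters the sum, the alternative paths added for fault tolerance do not change this objective; they only constrain which topologies are feasible.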
TABLE VI: Comparisons to FTTG [34]

              Energy (mJ)                  Average Hop Count
Benchmark     FTTG     Ours     redu       FTTG   Ours   redu
MPEG4         61.10    50.13    17.95%     1.23   1.23   0
VOPD          65.15    49.55    23.94%     0.95   0.87   8.421%
MWD           19.14    15.68    18.08%     0.58   0.54   6.897%
263Dec        0.297    0.285    4.04%      0.93   0.71   23.655%
263Enc        3.74     3.76     -0.54%     0.75   0.83   -10.667%
MP3Enc        0.254    0.254    0          0.76   0.69   9.211%
Average       -        -        10.58%     -      -      6.25%
The experimental results are listed in Table VI. The first column is the name of the benchmark. Columns 2 and 3 provide the energy consumption of the fault-tolerant network topology generated using the FTTG algorithm and the proposed ILP based method, respectively. The energy is calculated based on the shorter of the two alternative paths, which is the default routing path [34]. The benchmark MP3Enc refers to MP3EncMP3Dec because of the limited space in the table. Columns 5, 6, and 7 compare the two methods on the average hop count (AHC). Furthermore, the columns redu present the reduction of the performance index compared with the FTTG algorithm. Compared with the FTTG algorithm, the proposed method reduces the energy consumption by 10.58% on average. In addition, the proposed method also reduces the hop count by 6.25% on average. This is because the switch topology and the core mapping strongly depend on each other and we formulate the two subproblems as a single ILP model, in which we can determine the best solution for the whole problem.
b. Comparison to the DBG based method: The DBG method [44] generates a link-fault-tolerance topology without considering the positions of the cores and switches. Hence, we set the distances Duv to 1 mm to calculate the power consumption.
We implemented the 2-D topology generation algorithm (DBG) in [44] and made a comparison. The port number of switches max_size is set to 10. Six widely used benchmarks and three synthetic benchmarks, D_36, D_43, and D_50, are used in the experiments.
The experimental results are listed in Table VII, where the average hop count is equal to the number of switches along a path plus one. The proposed method can reduce the power and average hop count by 21.72% and 9.35%, respectively. The synthetic benchmarks use one more switch compared to the DBG method; however, the switch size is smaller, which results in a power reduction. On the other hand, the number of links can be reduced by 45.46% on average. In the DBG method, each switch has two links for connecting to other switches. This results in many redundant links. Actually, some pairs of communicating cores are allocated on the same router and do not experience the link-fault-tolerance issue. In the proposed method, we consider only the necessary links between the switches. As the number of switches is increased, the diameter of the DBG graph increases even faster, which causes the average hop count to become much larger, especially for the synthetic benchmarks D_36, D_43, and D_50.
Fig. 14 shows the result of the floorplan and topology for the testbench MP3EncMP3Dec.
TABLE VII: Comparisons to DBG [44]

                SwitchNum                LinkNum                  Power (mW)                      Average Hop Count
Benchmark       DBG  Ours  reduction     DBG  Ours  reduction     DBG       Ours      reduction   DBG    Ours   reduction
VOPD            3    2     33.33%        6    3     50%           107.914   85.985    20.32%      2.20   2.13   3.18%
MPEG4           3    3     0             6    3     50%           107.914   83.704    22.43%      2.15   2.31   -7.44%
MWD             3    3     0             6    3     50%           107.905   83.690    22.44%      2.21   2.15   2.71%
MP3EncMP3Dec    3    3     0             6    3     50%           113.810   84.590    25.67%      2.29   2.08   9.17%
263Dec          3    3     0             6    3     50%           120.370   94.450    21.53%      2.33   2.14   8.15%
263Enc          3    3     0             6    3     50%           107.900   78.680    27.08%      2.15   2.17   -0.93%
D_36            4    5     -25%          8    5     37.5%         294.581   238.200   19.14%      2.77   2.42   12.64%
D_43            5    6     -20%          10   7     30%           355.900   288.426   18.96%      2.95   2.28   22.71%
D_50            6    7     -16.67%       12   7     41.67%        399.593   328.175   17.87%      3.62   2.39   33.98%
Average         -    -     -6.85%        -    -     45.46%        -         -         21.72%      -      -      9.35%
Fig. 14: One-link-fault tolerance for MP3EncMP3Dec: (a) Result of floorplan; (b) Result of topology.
2) Multiple-link-fault tolerance: The proposed ILP-based
method can be applied to generate K-link-fault tolerance net-
work topologies.
Table VIII shows the results. The benchmark MP3E/D refers
to MP3EncMP3Dec. The columns SwitchNum, LinkNum,
and Power denote the number of switches, the number of
links, and the power, respectively. The columns NFT, 1FT,
2FT, and 3FT denote the results of non-fault tolerance,
one-fault tolerance, two-fault tolerance, and three-fault tolerance,
respectively.
Compared to NFT, the one-fault tolerance topologies use the
same number of switches and up to three times as many links
(2.25 times on average), and their power overhead is 11.9%.
Compared to NFT, the power consumption increases on average
by 23.8% in 2FT and by 52.1% in 3FT, because more switches
and links are used to generate the fault-tolerance topologies.
The increase in power consumption is approximately linear
with K.
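The power overheads quoted above correspond to the Average row of Table VIII. A minimal Python sketch follows, assuming that the Average row is the ratio of each column sum to the NFT column sum (this reading reproduces the listed values, but it is an assumption); the names are illustrative.

```python
# Power (mW) per benchmark from Table VIII, ordered as
# MPEG4, VOPD, MWD, 263Dec, 263Enc, MP3E/D.
power_mw = {
    "NFT": [74.836, 71.157, 67.018, 83.600, 71.136, 77.040],
    "1FT": [78.532, 79.137, 85.98, 89.63, 82.103, 82.41],
    "2FT": [93.246, 82.340, 86.067, 106.25, 93.592, 89.300],
    "3FT": [106.269, 111.381, 111.389, 116.110, 116.370, 114.930],
}


def normalized_to_nft(table):
    """Column sum of each configuration divided by the NFT column sum."""
    base = sum(table["NFT"])
    return {name: sum(values) / base for name, values in table.items()}


ratios = normalized_to_nft(power_mw)
for name, ratio in ratios.items():
    overhead = 100.0 * (ratio - 1.0)
    print(f"{name}: ratio {ratio:.3f}, overhead {overhead:.1f}%")
```

The same helper applied to the LinkNum columns gives the 2.25, 3.75, and 5.75 entries of the Average row.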
VIII. CONCLUSIONS
In this paper, we presented a K-fault-tolerant topology
generation method for ASNoC with physical link failures and
switch failures. First, a convex-cost flow and ILP based method
was proposed to generate a network topology in which each
communication flow has at least K+1 switch-disjoint routing
paths, which provide K-fault tolerance. Second, to reduce the
switch sizes, we proposed sharing the switch ports for the
connections between the cores and switches, and proposed
heuristic methods to solve the port sharing problem. Finally,
we proposed an ILP-based method to simultaneously solve
the core mapping and routing path allocation problems when
only the physical link failures are considered. The experimental
results showed the effectiveness of the proposed method.
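The claim that K+1 switch-disjoint routing paths provide K-fault tolerance can be checked by brute force on small instances. The sketch below is only an illustration of that argument, not the ILP formulation used in this work; the switch names and the example paths are hypothetical.

```python
from itertools import combinations


def are_switch_disjoint(paths):
    """True if no switch appears twice, within one path or across paths."""
    seen = set()
    for path in paths:
        switches_on_path = set(path)
        if len(switches_on_path) != len(path) or seen & switches_on_path:
            return False
        seen |= switches_on_path
    return True


def survives_all_faults(paths, switches, k):
    """True if at least one path stays intact for every set of k failed switches."""
    for failed in combinations(switches, k):
        failed = set(failed)
        if not any(failed.isdisjoint(path) for path in paths):
            return False
    return True


# Hypothetical flow with K = 2, hence K + 1 = 3 switch-disjoint paths.
K = 2
paths = [("s0", "s1"), ("s2",), ("s3", "s4")]
switches = ["s0", "s1", "s2", "s3", "s4", "s5"]

print(are_switch_disjoint(paths))                    # True
print(survives_all_faults(paths, switches, K))       # True
print(survives_all_faults(paths, switches, K + 1))   # False
```

Because the paths share no switch, each failed switch can disable at most one path, so K failures leave at least one of the K+1 paths usable; with K+1 failures, one switch per path can be hit and the flow is disconnected.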
REFERENCES
[1] S. Borkar, “Thousand core chips: A technology perspective,” in Proceed-
ings of the 44th Annual Design Automation Conference, 2007, pp. 746–
749.
[2] W. J. Dally and B. Towles, “Route packets, not wires: On-chip intercon-
nection networks,” in Proc. 38th Annual Design Automation Conference,
2001, pp. 684–689.
[3] T. Bjerregaard and S. Mahadevan, “A survey of research and practices of
network-on-chip,” ACM Computing Surveys, vol. 38, no. 1, June 2006.
[4] R. Marculescu, U. Y. Ogras, L. S. Peh, N. E. Jerger, and Y. Hoskote,
“Outstanding research problems in noc design: System, microarchitecture,
and circuit perspectives,” IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems, vol. 28, no. 1, pp. 3–21, Jan 2009.
[5] F. Akopyan, J. Sawada, A. Cassidy, R. Alvarez-Icaza, J. Arthur, P. Merolla,
N. Imam, Y. Nakamura, P. Datta, G. J. Nam, B. Taba, M. Beakes,
B. Brezzo, J. B. Kuang, R. Manohar, W. P. Risk, B. Jackson, and D. S.
Modha, “Truenorth: Design and tool flow of a 65 mw 1 million neuron
programmable neurosynaptic chip,” IEEE Transactions on Computer-
Aided Design of Integrated Circuits and Systems, vol. 34, no. 10, pp.
1537–1557, Oct 2015.
[6] B. V. Benjamin, P. Gao, E. McQuinn, S. Choudhary, A. R. Chan-
drasekaran, J. M. Bussat, R. Alvarez-Icaza, J. V. Arthur, P. A. Merolla,
and K. Boahen, “Neurogrid: A mixed-analog-digital multichip system for
large-scale neural simulations,” Proceedings of the IEEE, vol. 102, no. 5,
pp. 699–716, May 2014.
[7] X. Liu, W. Wen, X. Qian, H. Li, and Y. Chen, “Neu-noc: A high-efficient
interconnection network for accelerated neuromorphic systems,” in 23rd
Asia and South Pacific Design Automation Conference (ASP-DAC), Jan
2018, pp. 141–146.
[8] S. Tosun et al., “Application-specific topology generation algorithms
for network-on-chip design,” IET Computers & Digital Techniques, 2012.
[9] S. Murali, P. Meloni, F. Angiolini, D. Atienza, S. Carta, L. Benini,
G. De Micheli, and L. Raffo, “Designing application-specific networks on
chips with floorplan information,” in Proceedings of the 2006 IEEE/ACM
International Conference on Computer-aided Design, pp. 355–362.
[10] C. Constantinescu, “Trends and challenges in vlsi circuit reliability,” IEEE
Micro, vol. 23, no. 4, pp. 14–19, July 2003.
[11] S. Borkar, “Designing reliable systems from unreliable components: the
challenges of transistor variability and degradation,” IEEE Micro, vol. 25,
no. 6, pp. 10–16, Nov 2005.
[12] J. Keane and C. H. Kim, “Transistor aging,” IEEE Spectrum, vol. 48,
no. 5, pp. 28–33, 2011.
[13] Y. Ren, L. Liu, S. Yin, J. Han, Q. Wu, and S. Wei, “A fault tolerant noc ar-
chitecture using quad-spare mesh topology and dynamic reconfiguration,”
Journal of Systems Architecture, vol. 59, no. 7, pp. 482–491, 2013.
[14] Y.-C. Chang, C.-T. Chiu, S.-Y. Lin, and C.-K. Liu, “On the design
and analysis of fault tolerant noc architecture using spare routers,” in
Proceedings of the 16th Asia and South Pacific Design Automation
Conference. IEEE Press, 2011, pp. 431–436.
[15] Y. Ren, L. Liu, S. Yin, Q. Wu, S. Wei, and J. Han, “A vlsi architecture
for enhancing the fault tolerance of noc using quad-spare mesh topology
and dynamic reconfiguration,” in Proc. IEEE Int. Symp. Circuits Syst.,
2013, pp. 1793–1796.
TABLE VIII: Results for K-link-fault (K = 0, 1, 2, 3) tolerance
Benchmark | SwitchNum              | LinkNum                | Power (mW)                         | Time (s)
          | NFT   1FT   2FT   3FT  | NFT   1FT   2FT   3FT  | NFT      1FT      2FT      3FT     | NFT       1FT       2FT       3FT
MPEG4     | 3     3     4     5    | 1     3     5     7    | 74.836   78.532   93.246   106.269 | 1.222     10.92     62.402    1091.638
VOPD      | 3     3     4     5    | 1     3     5     7    | 71.157   79.137   82.340   111.381 | 1.005     7.355     39.522    315.652
MWD       | 3     3     4     5    | 1     3     5     7    | 67.018   85.98    86.067   111.389 | 1.102     5.754     124.97    1396.625
263Dec    | 3     3     4     5    | 1     3     5     9    | 83.600   89.63    106.25   116.110 | 1.216     38.389    952.437   4314.34
263Enc    | 3     3     4     5    | 1     3     5     7    | 71.136   82.103   93.592   116.370 | 0.785     3.266     36.619    192.64
MP3E/D    | 3     3     4     5    | 3     3     5     9    | 77.040   82.41    89.300   114.930 | 0.810     6.235     134.183   3121.806
Average   | 1     1     1.33  1.67 | 1     2.25  3.75  5.75 | 1        1.119    1.238    1.521   | -         -         -         -
[16] N. Chatterjee, S. Chattopadhyay, and K. Manna, “A spare router based
reliable network-on-chip design,” in Proc. IEEE Int. Symp. Circuits Syst.,
2014, pp. 1957–1960.
[17] A. Hosseini, T. Ragheb, and Y. Massoud, “A fault-aware dynamic routing
algorithm for on-chip networks,” in IEEE International Symposium on
Circuits and Systems, 2008, pp. 2653–2656.
[18] P. Ren, X. Ren, S. Sane, M. A. Kinsy, and N. Zheng, “A deadlock-free
and connectivity-guaranteed methodology for achieving fault-tolerance in
on-chip networks,” IEEE Transactions on Computers, vol. 65, no. 2, pp.
353–366, Feb 2016.
[19] K. Srinivasan, K. S. Chatha, and G. Konjevod, “Linear-programming-
based techniques for synthesis of network-on-chip architectures,” IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14,
no. 4, pp. 407–420, 2006.
[20] S. Yan and B. Lin, “Application-specific network-on-chip architecture
synthesis based on set partitions and steiner trees,” in Proceedings of the
2008 Asia and South Pacific Design Automation Conference, pp. 277–282.
[21] C. Seiculescu, S. Murali, L. Benini, and G. De Micheli, “Sunfloor 3d: A
tool for networks on chip topology synthesis for 3-d systems on chips,”
IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems, vol. 29, no. 12, pp. 1987–2000, 2010.
[22] B. Yu, S. Dong, S. Chen, and S. Goto, “Floorplanning and topology
generation for application-specific network-on-chip,” in 2010 15th Asia
and South Pacific Design Automation Conference (ASP-DAC), 2010, pp.
535–540.
[23] J. Cong, Y. Huang, and B. Yuan, “Atree-based topology synthesis for on-
chip network,” in 2011 IEEE/ACM International Conference on Computer-
Aided Design (ICCAD), pp. 651–658.
[24] B. Huang, S. Chen, W. Zhong, and T. Yoshimura, “Application-specific
network-on-chip synthesis with topology-aware floorplanning,” in 2012
25th Symposium on Integrated Circuits and Systems Design (SBCCI),
2012, pp. 1–6.
[25] W. Zhong, S. Chen, B. Huang, T. Yoshimura, and S. Goto, “Floorplanning
and topology synthesis for application-specific network-on-chips,” IEICE
Transactions on Fundamentals of Electronics, Communications and Com-
puter Sciences, vol. 96, no. 6, pp. 1174–1184, 2013.
[26] W. Zhong, T. Yoshimura, B. Yu, S. Chen, S. Dong, and S. Goto, “Cluster
generation and network component insertion for topology synthesis of
application-specific network-on-chips,” IEICE transactions on electronics,
vol. 95, no. 4, pp. 534–545, 2012.
[27] K. S.-M. Li, “Cusnoc: Fast full-chip custom noc generation,” IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21,
no. 4, pp. 692–705, 2013.
[28] V. Todorov, D. Mueller-Gritschneder, H. Reinig, and U. Schlichtmann,
“Deterministic synthesis of hybrid application-specific network-on-chip
topologies,” IEEE Transactions on Computer-Aided Design of Integrated
Circuits and Systems, vol. 33, no. 10, pp. 1503–1516, 2014.
[29] J. Huang, S. Chen, W. Zhong, W. Zhang, S. Diao, and F. Lin, “Floorplan-
ning and topology synthesis for application-specific network-on-chips with
rf-interconnect,” ACM Transactions on Design Automation of Electronic
Systems (TODAES), vol. 21, no. 3, p. 40, 2016.
[30] P. Mukherjee, S. D’souza, and S. Chattopadhyay, “Area
constrained performance optimized asnoc synthesis with thermal-aware
white space allocation and redistribution,” Integration, the VLSI
Journal, vol. 60, pp. 167–189, 2018. [Online]. Available:
http://www.sciencedirect.com/science/article/pii/S0167926017301165
[31] A. E. Zonouz, M. Seyrafi, A. Asad, M. Soryani, M. Fathy, and R. Berangi,
“A fault tolerant noc architecture for reliability improvement and latency
reduction,” in IEEE 12th Euromicro Conference on Digital System Design,
Architectures, Methods and Tools, 2009, pp. 473–480.
[32] N. Chatterjee, S. Chattopadhyay, and K. Manna, “A spare router based
reliable network-on-chip design,” in Circuits and Systems (ISCAS), 2014
IEEE International Symposium on. IEEE, 2014, pp. 1957–1960.
[33] Y. Ren, L. Liu, S. Yin, J. Han, Q. Wu, and S. Wei, “A fault tolerant noc ar-
chitecture using quad-spare mesh topology and dynamic reconfiguration,”
Journal of Systems Architecture, vol. 59, no. 7, pp. 482–491, 2013.
[34] S. Tosun, V. B. Ajabshir, O. Mercanoglu, and O. Ozturk, “Fault-tolerant
topology generation method for application-specific network-on-chips,”
IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems, vol. 34, no. 9, pp. 1495–1508, 2015.
[35] W. J. Dally and B. Towles, Principles and Practices of Interconnection
Networks. San Francisco: Morgan Kaufmann, 2004.
[36] W. Zhong, S. Chen, et al., “Floorplanning and topology synthesis for
application-specific network-on-chips,” IEICE Trans. Fund., 2013.
[37] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin, “Network flows: theory,
algorithms, and applications,” 1993.
[38] A. Schrijver, Combinatorial Optimization: Polyhedra and Efficiency.
Berlin: Springer Science & Business Media, 2002, vol. 24.
[39] Q. Xu, S. Chen, X. Xu, and B. Yu, “Clustered fault tolerance tsv planning
for 3d integrated circuits,” IEEE Transactions on Computer-Aided Design
of Integrated Circuits and Systems, 2017.
[40] A. B. Kahng, B. Lin, and S. Nath, “ORION3.0: A comprehensive noc
router estimation tool,” IEEE Embedded Systems Letters, vol. 7, no. 2, pp.
41–45, June 2015.
[41] J. Huang, W. Zhong, Z. Li, and S. Chen, “Lagrangian relaxation-based
routing path allocation for application-specific network-on-chips,” Inte-
gration, the VLSI Journal, 2017.
[42] P. M. Pardalos and G. P. Rodgers, “A branch and bound algorithm for the
maximum clique problem,” Computers & operations research, vol. 19,
no. 5, pp. 363–375, 1992.
[43] Gurobi Optimization, Inc., “Gurobi optimizer reference manual,” 2015.
[Online]. Available: http://www.gurobi.com
[44] K. S.-M. Li and S.-J. Wang, “Design methodology of fault-tolerant
custom 3d network-on-chip,” ACM Transactions on Design Automation
of Electronic Systems (TODAES), vol. 22, no. 4, p. 63, 2017.
