Abstract-The Network-on-Chips based communication architecture is a promising candidate for addressing communication bottlenecks in many-core processors and neural network processors. In this work, we consider the generalized fault-tolerance topology generation problem, where the link (physical channel) or switch failures can happen, for application-specific network-on-chips (ASNoC). With a user-defined maximum number of faults, K, we propose an integer linear programming (ILP) based method to generate ASNoC topologies, which can tolerate at most K faults in switches or links. Given the communication requirements between cores and their floorplan, we first propose a convex-cost flow based method to solve a core mapping problem for building connections between the cores and switches. Second, an ILP based method is proposed to solve the routing path allocation problem, where K + 1 switch-disjoint routing paths are allocated for every communication flow between the cores. Finally, to reduce switch sizes, we propose sharing the switch ports for the connections between the cores and switches and formulate the port sharing problem as a clique-partitioning problem, which is solved by iteratively finding the maximum cliques. Additionally, we propose an ILP-based method to simultaneously solve the core mapping and routing path allocation problems when only physical link failures are considered. Experimental results show that the power consumptions of fault-tolerance topologies increase almost linearly with K because of the routing path redundancy for fault tolerance. When both switch faults and link faults are considered, port sharing can reduce the average power consumption of fault-tolerance topologies with K = 1, K = 2 and K = 3 by 18.08%, 28.88%, and 34.20%, respectively.
When considering only the physical link faults, the experimental results show that compared to the FTTG (fault-tolerant topology generation) algorithm, the proposed method reduces power consumption and hop count by 10.58% and 6.25%, respectively; compared to the DBG (de Bruijn Digraph) based method, the proposed method reduces power consumption and hop count by 21.72% and 9.35%, respectively.
I. INTRODUCTION
With the constant scaling of semiconductor manufacturing technologies, hundreds to thousands of processing cores can be easily integrated on a single chip [1]. Network-on-Chips (NoCs) have emerged as an attractive solution to the interconnection challenges of heterogeneous System-on-Chip designs [2] [3] [4] and neuromorphic computing systems [5] [6] [7] because NoCs have good scalability and enable efficient and flexible utilization of communication resources when compared to the traditional point-to-point links and buses. NoCs convey messages (in packets) through a distributed system of routers (sometimes called switches in ASNoCs) interconnected by links, and these routers may include network interfaces for connecting cores to routers. In this work, we focus on ASNoCs, where the customized irregular network topologies are used because of their low energy consumption and low area overhead [8], [9].
With successive technology node shrinking, the transistor size on chips has been scaled down to a few nanometers, where radiation, electromagnetic interference, electrostatic discharge, aging, process variation and dynamic temperature variation are the major causes of failures in MOSFET based circuits [10] [11] [12]. It is extremely difficult for a heterogeneous system to guarantee long-term product reliability because of a combination of these factors. To maintain network connectivity and correct packet-switching operations, we consider fault-tolerance issues of the network components in ASNoCs. NoCs with regular topologies can achieve fault tolerance by providing alternative routing paths when messages or packets encounter faulty network components. However, in ASNoCs, the path diversity is greatly reduced to lower the energy and area overhead of the network components. Consequently, we have to introduce structural redundancies, such as switches, ports, links, and network interfaces, to address these faults [13] - [16]. Then, alternative routing paths are used for the packet switching between the cores, thus bypassing the faulty region [17]. Note that, generally, fault control in NoCs involves two phases: fault diagnosis and fault tolerance [18]. Our research mainly focuses on fault tolerance.
There are many previous works addressing the synthesis of ASNoC topologies [9] , [19] - [30] . However, these works rarely consider fault tolerance in the NoC topologies. In particular, the ASNoCs have low path diversities and cannot work normally if any hardware faults occur in the switches or links. In [31] , a fault tolerant NoC architecture was proposed, where the cores were linked to two switches instead of one, and a dynamically reconfigured routing algorithm was used to bypass faulty switches. Chatter et al. [32] proposed a fault-tolerant method based on router redundancy. They allocated a spare router for each router for fault tolerance, which increased the consumed power and area. In [33] , the authors placed a spare router for each 2 × 2 router block in the mesh topology and used multiplexers to switch the faulty routers to the intact routers, which could thereby decrease the power and area overheads compared to [32] . However, this method cannot be applied to ASNoC designs.
Tosun et al. [34] proposed a fault-tolerant topology generation (FTTG) method for ASNoC, which focused on permanent link and switch port failures. The authors attempted to add a minimum number of extra switches and links and use the min-cut algorithm to ensure that each switch and link were on a cycle, which provided at least two alternative routing paths to achieve adequate fault-tolerance. The NoC topologies are generated in two phases. In the first phase, the links between the switches (switch topologies) are constructed, and in the second phase, the links from the cores to switches are built (core mapping). However, the switch topology and the core mapping strongly depend on each other; consequently, it is challenging for the FTTG method to effectively explore the design space of the network topologies. Additionally, the FTTG can only generate one-fault tolerant topologies and cannot be applied toward generating multiple-fault tolerant network topologies.
Motivated by these arguments, we propose a method for generating ASNoC topologies with consideration of both switch faults and physical link faults, when given the communication requirements between the cores, floorplan of the cores, and maximum number of tolerable faults, K. The main contributions of this work are as follows.
1) We propose a generalized fault-tolerant topology generation method with consideration of both switch faults and link faults. A convex-cost flow based method is used to solve the core mapping problem for building connections between the cores and switches, and an ILP based method is proposed to allocate K + 1 switch-disjoint routing paths for each communication flow.
2) To reduce the switch sizes, we propose sharing the switch ports for the connections between the cores and switches, and prove the conditions for port sharing on a switch. The port sharing problem on a switch is formulated as a clique-partitioning problem and heuristically solved by iteratively finding a set of maximum cliques and solving a maximum cardinality matching problem. Moreover, we propose a heuristic method, where a series of maximum independent set problems are solved for removing the conflicts caused by port sharing on multiple switches.
3) We also propose an ILP-based method to simultaneously solve the core mapping and routing path allocation problems when only the physical link failures are considered.
Experimental results show that the power consumptions of fault-tolerance topologies increase almost linearly with K because of the routing path redundancies (see Fig. 12). When both switch faults and link faults are considered, port sharing can reduce the average power consumptions of the fault-tolerance topologies with K = 1, K = 2 and K = 3 by 18.08%, 28.88%, and 34.20%, respectively. When considering only the physical link faults, the experimental results show that, compared to the FTTG, the proposed method reduces power consumption and hop count by 10.58% and 6.25%, respectively; compared to the DBG based method, the proposed method reduces power consumption and hop count by 21.72% and 9.35%, respectively.
The remainder of this paper is organized as follows. Section II formulates the K-fault-tolerant ASNoC topology generation problem. The overview of the proposed framework is shown in Section III. The generalized K-fault-tolerant topology generation methodology is discussed in Sections IV-A, IV-B, and V. Section VI discusses the generation method for link-fault-tolerance topologies. The experimental results are provided in Section VII, followed by the conclusions in Section VIII.
II. PRELIMINARIES & PROBLEM FORMULATION

A. NoC Architecture
In this work, the ASNoC architectures are assumed to support packet-switched communications with source routing and wormhole flow control [35] . In the application-specific design, the communication characteristics are known a priori, and hence, a deterministic routing strategy is used; that is, the routing path for the communications is preallocated, which accordingly determines the topology of the NoC. The ASNoC topology architecture consists of two main components: switches and customized electrical links. The switches are used to route packets from the source to the destination, and the routing information is included in the packet to specify the address of the output port, to which the packet should be forwarded. Given the communication characteristics of an application, this work focuses on the generation of network topologies by preallocating the routing paths for the communication flows.
B. Problem Definition
Let V c = {c i |1 ≤ i ≤ n core } be the set of cores in an application. The communication requirements (or communication flow in this work) between the cores can be represented as a directed graph, G cc , and defined as follows.
Definition 1: G cc = (V c , E cc ) is a directed graph. An edge (c i , c j ) ∈ E cc represents the communication from c i to c j , and the bandwidth requirement of the communication flow from c i to c j is given by w i, j .
Fig. 1 shows an example of G cc . In ASNoCs, the switches will be shared among the cores for data communications. If only link failures between the switches are considered [34], the mapping from the cores to the switches is a many-to-one relationship, which is the same as the clustering problem in the traditional ASNoC synthesis [20], [26]. However, the mapping from cores to switches is a many-to-many relationship if both switch failures and link failures are considered. Let V s = {s i |1 ≤ i ≤ n sw } be the set of switches. We use the Cartesian products V c × V s , V s × V c , and V s × V s to represent all possible connections from the cores to the switches, from the switches to the cores, and from switches to switches, respectively.
The problem of generalized fault-tolerance topology generation for ASNoCs is defined as follows. Problem statement.
Given a core communication graph G cc , number of switches n sw , floorplan of the cores, and number of tolerable faults K, we attempt to determine the placement of the switches and construct a K-fault-tolerant ASNoC topology with minimization of the power consumption of the ASNoC under the following constraints:
• the latency constraint l i, j (number of hops) for each communication flow (c i , c j ) ∈ E cc ,
• the switch size constraint max size, which is the maximum number of ports that a switch could support given the NoC operating frequency,
• and the bandwidth constraint BW max for the physical links, which is the product of the NoC frequency and bit-width of the physical links.
The ASNoC topology can be represented as a directed graph G NT (V NT , E NT ), where V NT = V c ∪ V s , and E NT includes two types of edges:
• a subset of the edges in (V c × V s ) ∪ (V s × V c ) determined by solving the core mapping problem, corresponding to the connections between the cores and switches,
• and a subset of edges L ss ⊆ V s × V s determined by solving the routing path allocation problem, corresponding to the physical links between the switches.
In G NT , there are K + 1 switch-disjoint paths for each communication flow (c i , c j ) in G cc when both switch failures and link failures are considered.
In the K-fault tolerance structures, K times more switch ports are connected to each core to introduce routing path redundancies. These switch ports greatly increase the area and power consumption of switches. To reduce the number of the switch ports, we also solve a port sharing problem for each switch, which will be discussed in detail in Section V.
III. OVERVIEW OF THE PROPOSED FRAMEWORK
Given the floorplan of n core cores, their communication requirements represented by G cc , and the number of switches n sw , the placement of the switches is determined using the method in [36].
As discussed in Section II-B, the NoC topology generation problem mainly includes two subproblems: core mapping (CM) and routing path allocation (PA). We first map the cores to the switches using a min-cost-max-flow algorithm in Section IV-A. Second, the routing path allocation is solved using an ILP-based method in Section IV-B. If we fail to find K + 1 switch-disjoint paths for all the communication flows under the given constraints, the number of switches is increased by one and the generalized fault-tolerance topology generation problem is solved again. This procedure is repeated until all the communication flows have K + 1 switch-disjoint routing paths.
To generate the K-fault-tolerant topology, we connect each core to at least K + 1 switches by K + 1 ports, which greatly increases the power and area overheads of the switches. However, for each flow of each core, only one of the K + 1 ports works for data communication at a time. Consequently, the switch ports connecting different cores could be shared using multiplexers. In Section V, we prove the conditions for port sharing on a switch and propose a clique-partitioning formulation for the problem, which is solved using a heuristic method. Moreover, a heuristic method is proposed to remove the conflicts of routing path selection caused by port sharing on multiple switches. Fig. 2 illustrates the overall flow of generating fault-tolerance ASNoC topologies.
[Fig. 2: Overall flow of generating fault-tolerance ASNoC topologies. Recoverable flowchart labels: Floorplanning of Switches; Core Mapping & Path Allocation; Are all flows solved?]
IV. FAULT-TOLERANCE TOPOLOGY GENERATION
A. Core Mapping
To generate a K-fault-tolerant topology, we connect each core to at least K + 1 switches, and many switch ports are accordingly introduced for connecting the cores. For a switch, the area increases quadratically with the port number, and the power increases superlinearly with the switch size. Consequently, a convex-cost flow based method is used to generate a core mapping with evenly distributed core-switch connections.
In the core mapping stage, we build connections from the source cores of the communication flows to the switches and connections from the switches to the sink cores of the communication flows. Here, we consider building connections from the source cores to the switches. The connections from the switches to the sink cores are built similarly.
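To build a convex-cost flow model, we construct a directed graph G cs (V cs , E cs ), where V cs = {b, t} ∪ V c ∪ V s and E cs = {b → V c } ∪ (V c × V s ) ∪ {V s → t}. Here, b and t are the source and sink nodes of the flow network, respectively.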
The capacity of an edge (b, c i ) ∈ {b → V c } is set to K + 1 if there is an outgoing communication flow from c i and is set to 0 otherwise. The capacities of the edges in {V s → t} are set to N cs = ⌊n core * (K + 1)/n sw ⌋ + 1, which is close to the average number of input ports that are used to connect cores on a switch. All the other edges have a capacity of 1.
The edges in {b → V c } have zero cost. For an edge (c i , s j ) ∈ V c × V s , the cost is determined by D c i ,s j and E bit ,
where D c i ,s j is the distance between core c i and switch s j and E bit (set to 0.5 in this work) is the bit energy of unit wire length (1mm).
To make the connections from cores to switches evenly distributed among the switches, the costs of the edges in {V s → t} are defined to be a function c (s j ,t) (x) of the number of flows x on the edges, which corresponds to a piecewise linear and convex function for the flow costs: the costs vary linearly in the interval between each pair of adjacent breakpoints.
In this work, the edge cost function c s j ,t (x) = 10x and the interval between adjacent breakpoints is set to 1. Consequently, the flow cost is calculated as 10x^2. Such a convex-cost flow problem can be easily transformed into a traditional min-cost flow problem [37].
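The transformation into a traditional min-cost flow problem can be illustrated with a minimal Python sketch. It assumes the standard arc-expansion technique (the text only cites [37] for the transformation): each convex-cost edge is replaced by parallel unit-capacity arcs whose costs are the marginal costs. The function names are hypothetical.

```python
def flow_cost(x, unit_cost=10):
    """Total cost of x units on an edge (s_j, t): x * c(x) = 10 * x^2."""
    return unit_cost * x * x


def expand_convex_edge(capacity, cost=flow_cost):
    """Replace one convex-cost edge by `capacity` parallel unit-capacity arcs.

    The k-th arc carries the marginal cost cost(k) - cost(k - 1). Because the
    marginal costs are non-decreasing, a min-cost flow uses the cheapest arcs
    first, so sending x units costs exactly cost(x).
    """
    return [cost(k) - cost(k - 1) for k in range(1, capacity + 1)]


arcs = expand_convex_edge(4)
print(arcs)           # marginal costs of the four parallel arcs
print(sum(arcs[:3]))  # cost of attaching three connections to one switch
```

Since each additional connection to the same switch is more expensive than the previous one, the min-cost solution tends to spread the core-switch connections evenly.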
According to the solution to the convex cost flow model, the edges in V c × V s that have non-zero flow will be selected as the connections from cores to switches.
After we map the cores to switches, the connections between the cores and the switches are determined. We define the switch ports connected to a source core c i as core inports and the switch ports connected to a sink core c j as core outports.
An example of core mapping for G cc in Fig. 1 is shown in Fig. 3, where K = 1. Each source core is connected to multiple core inports through a demultiplexer (DEMUX), and each sink core is connected to multiple core outports through a multiplexer (MUX).
B. ILP based Path Allocation
To generate K-fault tolerant topologies, we have to find K + 1 alternative switch-disjoint (node-disjoint) routing paths in a complete graph of switches G s (V s ,V s ×V s ) for each communication flow considering the costs of switches and links. To reduce the internally node-disjoint paths problem to an edge-disjoint paths problem [38] , which is easily formulated as a constrained min-cost multi-flow problem, we perform node splitting on G s and extend the graph for routing path allocation. Each switch node u ∈ V s is split into two nodes u and u ′ . A directed graph G pa (V pa , E pa ) is constructed as follows.
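The vertex set V pa includes the switch nodes in V s and the split nodes {u ′ | u ∈ V s ∧ u ′ is the corresponding split node of u}, and each switch node u is connected to its split node u ′ by a directed edge in E pa .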
If there is a directed edge from u to v in V s × V s , a corresponding directed edge from u ′ to v is added in E pa . In the following, we discuss how to find K + 1 edge-disjoint routing paths in G pa for all the communication flows. [39] .
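The node-splitting reduction can be sketched in Python as follows. This is only an illustration, not the ILP of this work: it counts switch-disjoint paths with a plain augmenting-path max-flow on the split graph, assuming unit capacities, and it ignores costs, latency, and bandwidth. The names split_graph and count_disjoint_paths, the "SRC"/"SNK" terminals, and the example data are hypothetical.

```python
from collections import defaultdict


def split_graph(switches, links):
    """Split each switch u into (u, 'in') -> (u, 'out') with capacity 1.

    A link (u, v) between switches becomes an arc (u, 'out') -> (v, 'in').
    """
    cap = defaultdict(int)
    for u in switches:
        cap[((u, "in"), (u, "out"))] = 1
    for u, v in links:
        cap[((u, "out"), (v, "in"))] = 1
    return cap


def count_disjoint_paths(switches, links, sources, sinks):
    """Number of switch-disjoint paths from the switches attached to the
    source core (`sources`) to the switches attached to the sink core."""
    cap = split_graph(switches, links)
    src, snk = "SRC", "SNK"
    for s in sources:
        cap[(src, (s, "in"))] = 1
    for t in sinks:
        cap[((t, "out"), snk)] = 1
    adj = defaultdict(set)
    for a, b in list(cap):
        adj[a].add(b)
        adj[b].add(a)  # reverse direction for residual arcs

    def augment(node, seen):
        if node == snk:
            return True
        seen.add(node)
        for nxt in adj[node]:
            if nxt not in seen and cap[(node, nxt)] > 0 and augment(nxt, seen):
                cap[(node, nxt)] -= 1
                cap[(nxt, node)] += 1
                return True
        return False

    flow = 0
    while augment(src, set()):
        flow += 1
    return flow


# Hypothetical instance: the source core attaches to s3 and s2,
# the sink core attaches to s3 and s0, and one link s2 -> s0 exists.
print(count_disjoint_paths([0, 1, 2, 3], [(2, 0)], sources=[3, 2], sinks=[3, 0]))
```

Because every switch contributes a single unit-capacity internal arc, two paths found this way can never pass through the same switch, which is the property required for K + 1 switch-disjoint routing paths.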
Then, the size of a switch u, including the input port number ip u and output port number op u , can be calculated as follows.
cip u and cop u respectively represent the number of core inports and core outports, which have been determined in the core mapping stage. Consequently, the power consumption of the switches is estimated using Orion 3.0 [40] as follows [29].
where T sw is a table mapping the input port number to power consumption, and C sw is a constant. The link power of a communication flow (i, j) on an edge (u, v) ∈ E link is determined by the communication requirement w i, j and the physical distance between u and v, D u,v . Considering the cost of opening new physical links, an extra cost C pl is introduced if the physical link between u and v does not exist. Therefore, the link power P lk (i, j, u, v) is calculated as follows [41].
where E bit represents the bit energy of the electrical link.
2) ILP Formulation for Routing Path Allocation: To simplify the discussion, the communication flows in G cc are relabeled as (i, j) = (c i , c j ) ∈ E cc (Definition 1). The routing path allocation for ASNoCs with K-fault tolerance can be formulated as a constrained multiple flow problem on G pa as follows.
In the formulation above, the set of constraints (5a) defines K + 1 unit flows (paths) for each communication flow. The constraint (5b) ensures that the K + 1 paths of any communication flow (i, j) are edge-disjoint. The constraints (5c) ensure that the latency constraint is satisfied for all K + 1 routing paths of each communication flow. The set of constraints (5d) represents the limited bandwidth of physical links and the set of constraints (5e) denotes the maximum port number for each switch.
The objective (5) is to minimize the total power consumption of switches and the total link power consumption of the default routing paths (k=0) for all communication flows.
The required runtime is unacceptable when directly solving the above large-scale ILP problem. To reduce the runtime of routing path allocation, we process the communication flows one by one in descending order of bandwidth requirements, which causes sub-optimal solutions but greatly reduces the runtime. Additionally, before processing each communication flow, the switch size constraint and the bandwidth constraint can be preprocessed by removing, from the graph G pa , the vertices corresponding to the switches that have reached the maximum size and the edges corresponding to the links that do not have enough bandwidth. Therefore, the ILP formulation for the routing path allocation of a single communication flow (i, j) ∈ E cc is simplified as follows.
where the cost of links P lk (i, j, u, v) can be simply defined as follows.
3) An example: As mentioned above, we solve an ILP model for each communication flow (c i , c j ) in G cc . As shown in Fig. 3, we first allocate the routing paths of the communication flow (c 3 , c 0 ), and a physical link from switch s 2 to s 0 is added, where the default path includes only one switch s 3 and the alternative path is from s 2 to s 0 . After we allocate routing paths for another communication flow (c 4 , c 3 ), a physical link from switch s 1 to s 3 is added, where the default path includes only one switch s 2 and the alternative path is from s 1 to s 3 . Fig. 3 shows the final network topology with one-fault tolerance, and the routing paths for all communication flows are shown in Table I.
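The flow ordering and the preprocessing step of this subsection can be sketched as follows. This is a minimal illustration under simplifying assumptions: the data layout (dictionaries keyed by flows, links, and switches) and the names order_flows and prune_links are hypothetical, and the per-flow ILP itself is not shown.

```python
def order_flows(flows):
    """flows: {(src_core, dst_core): bandwidth}. Heaviest flows come first."""
    return sorted(flows, key=flows.get, reverse=True)


def prune_links(links, used_bw, switch_size, demand, bw_max, max_size):
    """Keep only the candidate links that a flow with bandwidth `demand` may use.

    A link is dropped if one of its end switches already has the maximum size
    or if the remaining link bandwidth cannot carry the flow.
    """
    full = {s for s, size in switch_size.items() if size >= max_size}
    return [
        (u, v)
        for (u, v) in links
        if u not in full and v not in full
        and used_bw.get((u, v), 0) + demand <= bw_max
    ]


# Hypothetical data for illustration only.
flows = {("c3", "c0"): 100, ("c4", "c3"): 300}
print(order_flows(flows))
print(prune_links([(0, 1), (1, 2), (2, 0)], {(0, 1): 900},
                  {0: 3, 1: 3, 2: 4}, demand=200, bw_max=1000, max_size=5))
```

Pruning before solving keeps each per-flow model small, at the price of the sub-optimality already noted above for the sequential processing.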
V. SWITCH PORT SHARING
In a K-fault tolerance structure, K + 1 core inports and/or K + 1 core outports are required for each core on no fewer than K + 1 switches, which greatly increases the area and power consumption of switches. Many switch ports are not used simultaneously because only one out of K + 1 routing paths is used for each communication flow at a time. To reduce the switch size, we propose the sharing of switch ports (on the same switch) between the routing paths from different communication flows, using multiplexers. Given the routing path allocation of communication flows, we prove a sufficient and necessary condition for the port sharing on a switch, and formulate the problem as a clique partitioning problem.
The port sharing aims at the sharing of core inports/core outports on the same switch. In this section, we first propose a two-stage method to solve the core inport sharing problem, which is also applicable to core outport sharing. Second, we propose a method to remove the conflicts caused by port sharing on multiple switches. Finally, an independent set based formulation is proposed for selecting routing paths for communication flows.
A. Conditions for Port Sharing on a Switch
In this subsection, given a network topology with K-fault-tolerance and the corresponding routing path allocation, we derive the conditions for port sharing on one switch only. For clarity, the conditions for the inport sharing in one-fault-tolerance and K-fault-tolerance are respectively discussed in Section V-A1 and Section V-A3.
1) Port Sharing in One-Fault-Tolerance Topologies: Suppose there are two core inports, IP 1 and IP 2 , on a switch s n , and two communication flows, f 1 and f 2 , which are shown in Fig. 4(a). Fig. 4(b) illustrates port sharing. We construct a bipartite graph, IG(V, E), to represent the intersection relations between the routing paths of f 1 and f 2 as follows.
The vertex set includes all the routing paths of f 1 and f 2 except for $p_0^{f_1}$ and $p_0^{f_2}$, the routing paths that use IP 1 and IP 2 . Further, if two routing paths from P f 1 and P f 2 have a common vertex, there is an edge in E IG . That is, E IG = {(p, q) | p ∈ P f 1 , q ∈ P f 2 , and they have a common vertex}. It is obvious that IG(V, E) is a bipartite graph. Let C(IG) be the size of the maximum cardinality matching in IG(V, E). Then, we have the following conclusion.
Theorem 1: The two core inports on s n , IP 1 and IP 2 , respectively used by $p_0^{f_1}$ and $p_0^{f_2}$, can be merged into a single inport without violating the K-fault tolerance if and only if C(IG) < K.
Proof: IF. We show that if C(IG) < K, then merging IP 1 and IP 2 will not violate the K-fault tolerance. Note that the routing paths from P f 1 (or P f 2 ) are vertex-disjoint (except for the source and sink core vertices). One fault causes at most two faulty paths, of which one path is from P f 1 and the other is from P f 2 and they have a common vertex. We have two situations considering $p_0^{f_1}$ and $p_0^{f_2}$.
(1) $p_0^{f_1}$ and $p_0^{f_2}$ are correct but only one of them can be used since they share one core inport. Let K 1 and K 2 respectively be the number of faulty paths in P f 1 and P f 2 . Without loss of generality, we suppose K 1 ≥ K 2 . When K 1 < K, there must be at least one correct routing path in both P f 1 and P f 2 and, hence, K faults are tolerated. When K 1 = K, we can conclude that K 2 < K, because K 2 = K 1 = K would require a matching of size K in IG(V, E), which contradicts C(IG) < K. Accordingly, there is at least one correct path in P f 2 for f 2 , and $p_0^{f_1}$ can be used for f 1 .
(2) $p_0^{f_1}$ and $p_0^{f_2}$ are faulty when s n is faulty. In this case, the paths in P f 1 and P f 2 are able to construct a K − 1 fault-tolerance structure since the given topology is K-fault tolerant. Consequently, the network topology also keeps K-fault tolerance.
ONLY IF. We show that if C(IG) = K, then merging IP 1 and IP 2 will violate the K-fault tolerance. There will be a perfect matching in IG(V, E) if C(IG) = K. In the perfect matching, each edge corresponds to an intersection of two routing paths from P f 1 and P f 2 . All 2K paths in P f 1 and P f 2 are faulty when exactly K faults occur on the K intersected vertices. Consequently, $p_0^{f_1}$ and $p_0^{f_2}$ are both required, but only one of them can be used since they share one core inport, which violates the K-fault tolerance.
Proof END.
(1) $p_0^{f_j}$, j = 1, · · · , J, are correct but only one of them can be used since they share one core inport. Let K j , j = 1, · · · , J, respectively be the number of faulty paths in P f j . Without loss of generality, we suppose that K j ≥ K j+1 , j = 1, · · · , J − 1. When K 1 < K, there must be at least one correct routing path in P f j , j = 1, · · · , J, and, hence, K faults are tolerated. When K 1 = K, we can conclude that K j < K, j = 2, · · · , J. This is because K j = K 1 = K indicates that there will be a perfect matching considering the intersection relations between the K paths from P f 1 and the K paths from P f j and, hence, IP 1 and IP j cannot be merged according to Theorem 1, which is a contradiction. Accordingly, we have at least one correct path in P f j for communication flow f j , j = 2, · · · , J, and $p_0^{f_1}$ can be used for f 1 . Consequently, the network topology maintains K-fault tolerance.
(2) $p_0^{f_j}$, j = 1, · · · , J, are faulty when s n is faulty. In this case, the paths in P f j , j = 1, · · · , J, are able to construct a K − 1 fault-tolerance structure since the given topology is K-fault tolerant. Consequently, the network topology also maintains K-fault tolerance. Proof END.
Suppose there are J core inports, IP j , j = 1, · · · , J, on a switch s n . Based on the above theorem and corollaries, we conclude the following theorem.
Theorem 3: The J core inports on s n , IP j , j = 1, · · · , J, respectively used by m j , j = 1, · · · , J, communication flows, can be merged into a single inport without violating the property of K-fault tolerance if the merging relations between J inports form a clique.
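The pairwise test behind Theorem 1 (two inports can be merged only when the maximum matching of the intersection graph is smaller than K) can be sketched in Python. The sketch assumes that each flow is given by its K alternative routing paths, that is, the paths that do not use the inport under consideration, and that each path is a set of switch names without the source and sink cores. The augmenting-path matching and the function names are illustrative choices, not the implementation used in this work.

```python
def max_bipartite_matching(left, right, adjacent):
    """Size of a maximum matching between two lists (augmenting paths)."""
    match = {}  # index in `right` -> index in `left`

    def try_assign(i, seen):
        for j in range(len(right)):
            if adjacent(left[i], right[j]) and j not in seen:
                seen.add(j)
                if j not in match or try_assign(match[j], seen):
                    match[j] = i
                    return True
        return False

    return sum(try_assign(i, set()) for i in range(len(left)))


def can_merge(alt_paths_f1, alt_paths_f2, K):
    """True if the two inports may share one port, i.e. C(IG) < K."""
    def share_switch(p, q):
        return bool(set(p) & set(q))

    return max_bipartite_matching(alt_paths_f1, alt_paths_f2, share_switch) < K


# Hypothetical paths for K = 2.
print(can_merge([{"s1"}, {"s2"}], [{"s1"}, {"s3"}], K=2))
print(can_merge([{"s1"}, {"s2"}], [{"s1", "s2"}, {"s2"}], K=2))
```

In the second call every alternative path of one flow can be paired with an intersecting path of the other flow, so K faults placed on the intersections would disable all alternatives of both flows at once.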
B. Clique Partitioning for Port Sharing on a switch
Given a network topology and all routing paths for K-faulttolerance, we formulate the port sharing problem on a switch as a clique partitioning problem, where the clique number is minimized to reduce the switch size, according to Theorem 3.
For a switch, a graph G ps (V ps , E ps ) is constructed to represent the possible sharing relations between the core inports. The vertex set V ps = {IP i , i = 0, · · · , N} represents the set of core inports. An edge, (IP i , IP j ), is added to E ps if IP i and IP j can be shared with each other according to Corollary 1. Fig. 8 shows an example of G ps and two of its clique partitions. The solid edges are the clique edges. In Fig. 8(a), two inports are required, respectively corresponding to a 2-vertex clique and a 3-vertex clique. In Fig. 8(b), three inports are required, respectively corresponding to two 2-vertex cliques and one 1-vertex clique.
Because the clique partitioning problem is NP-hard, we propose a heuristic to find a clique partitioning of G ps . Algorithm 1 shows the key steps of the heuristic for port sharing. First, we find a maximum clique Q V q ,E q in G ps using an ILP based method [42]. If |V q | is greater than 2, we merge the inports in V q into a single port and remove Q V q ,E q from G ps to obtain an updated G ps . The operation of finding the maximum clique is repeated until |V q | ≤ 2, at which point the clique partitioning problem can be solved by finding a maximum cardinality matching. Second, we find a maximum cardinality matching in the rest of G ps , where the edges in the maximum cardinality matching correspond to two-port sharing.
Algorithm 1: Port sharing (PS) on a switch s n .
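The two stages of the heuristic described above can be sketched in Python. The sketch replaces the ILP based maximum clique method [42] and the maximum cardinality matching solver with exhaustive searches, which is only reasonable for the handful of core inports on one switch. All function names and the example graph are hypothetical.

```python
from itertools import combinations


def max_clique(vertices, adj):
    """Brute-force maximum clique among `vertices` (stand-in for the ILP)."""
    for size in range(len(vertices), 0, -1):
        for cand in combinations(sorted(vertices), size):
            if all(b in adj[a] for a, b in combinations(cand, 2)):
                return set(cand)
    return set()


def max_matching(vertices, adj):
    """Exhaustive maximum-cardinality matching on a small graph."""
    vs = sorted(vertices)
    if not vs:
        return []
    v, rest = vs[0], set(vs[1:])
    best = max_matching(rest, adj)  # option: leave v unmatched
    for u in rest:
        if u in adj[v]:
            cand = [(v, u)] + max_matching(rest - {u}, adj)
            if len(cand) > len(best):
                best = cand
    return best


def share_ports(vertices, edges):
    """Partition inports into groups; each group becomes one shared port."""
    adj = {v: set() for v in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    remaining, groups = set(vertices), []
    while True:  # stage 1: peel off maximum cliques larger than 2
        q = max_clique(remaining, adj)
        if len(q) <= 2:
            break
        groups.append(q)
        remaining -= q
    for a, b in max_matching(remaining, adj):  # stage 2: two-port sharing
        groups.append({a, b})
        remaining -= {a, b}
    groups.extend({v} for v in remaining)  # unshared inports
    return groups


# Hypothetical sharing graph with five core inports.
print(share_ports(range(5), [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]))
```

The number of returned groups is the number of core inports that remain on the switch after sharing.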
C. Port Sharing on Multiple Switches
The conditions in Section V-A ensure the fault tolerance when port sharing is considered on one switch only. However, port sharing on multiple switches may cause conflicts of routing path selection. In this section, we present a method for removing some port sharing to maintain the fault tolerance when port sharing on all switches is considered.
To select routing paths for communication flows, we can construct a graph G pc (V pc , E pc ) to represent the conflict relations between routing paths and solve an independent set problem on G pc . Let p k (i, j) be the k-th routing path of communication flow (c i , c j ). The routing paths form the vertex set V pc , and E pc includes two types of edges. The first type connects routing paths of the same communication flow, and the second type connects routing paths of different communication flows that share a port, that is, {(p k 1 (i 1 , j 1 ), p k 2 (i 2 , j 2 )) | p k 1 (i 1 , j 1 ) and p k 2 (i 2 , j 2 ) go through a common core inport or core outport}.
The selection of routing paths can be achieved by finding a maximum independent set of G pc , denoted as IND(G pc ). If there is no port sharing, that is, there is no second type of edges in E pc , we may choose any correct routing path for each communication flow, and hence |IND(G pc )| = |E cc |. If port sharing is considered on one switch only, the conditions in Section V-A ensure |IND(G pc )| = |E cc | for any K switch faults. However, |IND(G pc )| < |E cc | could happen if we have two or more switches with shared core inports/outports, that is, we cannot find enough routing paths for the communication flows. Fig. 9 shows an example with three communication flows. Here, we propose a heuristic to deal with conflicts of port sharing for all the switches. Let V f be the set of faulty routers. Algorithm 2 shows the key steps.
In Algorithm 2, we first generate the core inport/outport sharing for each switch by calling Algorithm 1. Second, for each subset of possible K faulty switches, the fault-tolerance is verified by solving a maximum independent set problem, where |IND(G pc )| is computed by solving an ILP-based formulation. When |IND(G pc )| < |E cc |, a port-sharing edge is considered to be removed for increasing the size of the maximum independent set. Basically, we consider removal of a port-sharing edge between core outports while keeping as many sharing edges between core inports as possible, because an input port generally has a flit buffer with large area costs and power overhead.
[Algorithm 2, recoverable steps: for each subset of K faulty switches, repeat (permanently remove one of the port-sharing edges related to (i, j); find a maximum independent set of G pc ) until |IND(G pc )| = |E cc |; restore G pc , except for the permanently removed edge; end for.]
Fig. 10 shows the port sharing of the network topology in Fig. 3, which corresponds to G cc in Fig. 1. Fig. 11 shows the graph G pc corresponding to Fig. 10. When the switch s 0 is broken down, the faulty paths are displayed in gray color and the conflict paths are displayed in gray color and marked using c.
D. Selecting Routing Paths after Port Sharing
The selection of routing paths can be achieved by finding an |E cc |-size independent set of G pc . If link faults or switch faults occur, we can just remove the routing paths that go through the faulty links and the faulty switches from G pc and find a new set of routing paths for all the communication flows by solving an |E cc |-size independent set problem on G pc and update the routing tables of the core communications.
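The path selection after a fault can be sketched in Python. The sketch enumerates one routing path per communication flow and rejects selections in which two paths use the same shared port, which is equivalent to looking for an |E cc |-size independent set in G pc . It is an exhaustive stand-in for the ILP-based solver, and the data layout and the name select_paths are hypothetical.

```python
from itertools import product


def select_paths(paths, faulty=frozenset()):
    """Pick one usable routing path per flow so that no shared port is reused.

    paths: {flow: [(switches_on_path, shared_ports_on_path), ...]}
    Returns {flow: path_index}, or None if no |E_cc|-size selection exists.
    """
    flows = sorted(paths)
    usable = [
        [(k, ports) for k, (switches, ports) in enumerate(paths[f])
         if not set(switches) & set(faulty)]  # drop paths through faulty switches
        for f in flows
    ]
    for choice in product(*usable):  # one candidate path per flow
        used = [port for _, ports in choice for port in ports]
        if len(used) == len(set(used)):  # no shared port is claimed twice
            return {f: k for f, (k, _) in zip(flows, choice)}
    return None


# Hypothetical data: the default paths of f1 and f2 share inport "s0.in0".
paths = {
    "f1": [(("s0",), ("s0.in0",)), (("s2",), ())],
    "f2": [(("s0",), ("s0.in0",)), (("s1",), ())],
}
print(select_paths(paths))
print(select_paths(paths, faulty={"s1"}))
```

The returned path indices would then be written into the routing tables of the source cores.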
E. Cost Analysis for Multiplexers and Demultiplexers
To develop a fault-tolerance topology, we introduce demultiplexers (DEMUX) and multiplexers (MUX) for the source cores and the sink cores, respectively, of the communication flows. The routing paths can be selected by sending control signals to the DEMUXs and MUXs.
We assume a source-routing strategy, in which the routing paths are stored in a routing table on the source-core side of the communication flow. Each digit of the routing information is used in turn to select the output port at each step of the route, as if the address itself were the routing header determined from a source-routing table [35]. Hence, for a demultiplexer with the input from a core (for example, the DEMUX connected to c7 in Fig.10), the control signals can be taken from the routing bits, and for a demultiplexer with the input from a switch (for example, the DEMUX connected to switch 1 in Fig.10), the control signals can be generated by the switch according to the routing digit in the head flit of the packet. For a multiplexer, we can send a one-bit enable signal along with each input data, and the control signals of the multiplexer can be generated from the enable signals using a simple logic circuit consisting of several logic gates. For K-fault tolerance, we need K + 1 enable signals. In practical designs, K is very small (K ≤ 3 in this work), and hence the power and area overhead is very small. Notice that the MUXs and DEMUXs exist only at the starting points and ending points of routing paths.
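The control scheme can be modeled behaviorally as follows. This is a sketch under stated assumptions, not the synthesized logic: the header is represented as a tuple of port digits, at most one enable signal is assumed to be asserted at a time, and the function names `next_hop` and `mux_select` are hypothetical.

```python
def next_hop(header):
    """Split a source-routing header into (output port for this hop, rest of header)."""
    if not header:
        raise ValueError("routing header exhausted before reaching the sink")
    return header[0], tuple(header[1:])


def mux_select(enables):
    """Derive the MUX select index from K + 1 one-bit enable signals.

    Returns None when no input is enabled (the MUX is idle).
    Assumption: the inputs are one-hot, i.e. only one path is active at a time.
    """
    active = [idx for idx, bit in enumerate(enables) if bit]
    if len(active) > 1:
        raise ValueError("more than one enable asserted in the same cycle")
    return active[0] if active else None


# Hypothetical route: leave the DEMUX on port 1, then use switch ports 0 and 2.
header = (1, 0, 2)
ports = []
while header:
    port, header = next_hop(header)
    ports.append(port)

# With K = 2 there are three enable lines; the second routing path is active.
selected_input = mux_select([0, 1, 0])
```

Each hop consumes one digit of the header, and the MUX at the sink only needs to detect which of its K + 1 enable lines is asserted.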
In the following, we analyze the costs of DEMUX and MUX. We set the bit-width to 32 bits and synthesize DEMUXs and MUXs of different sizes in a 65nm process technology using commercial logic synthesis tools. The power consumption is shown in Table III. From the table, we can see that the power consumption of a DEMUX or MUX is at least two orders of magnitude less than that of the switch. Consequently, the power added by introducing one more DEMUX and MUX port is much less than the power saved by removing a switch port.
VI. LINK-FAULT TOLERANCE
In this section, we consider the generation of a K-link-fault-tolerance network topology, which is a special case of the generalized fault-tolerance topology. If only link failures between switches are considered, the mapping from cores to switches exhibits a many-to-one relationship. We propose an ILP based method to simultaneously solve the core mapping problem and the routing path allocation problem to improve the quality of the solutions.
Here, we define a routing path graph G rp (V rp , E rp ) to represent the possible connections between the cores and the switches and the possible physical links between the switches.
Definition 2: Routing Path Graph:
In the following, we give an ILP formulation to find K + 1 edge-disjoint routing paths in G rp for all the communication flows. The objective is to minimize the power consumption of the NoC topology with a switch size constraint, bandwidth constraints, and latency constraints.
The initial number of switches n sw is determined using a method similar to the one in [34]. The binary variables x (i, j,k) uv and d uv are defined similarly to those in the formulation (5). The K-link-fault-tolerant topology generation problem can be formulated as the following integer programming problem:
P sw (ip u , op u ) and P link are computed using a method similar to the one in Section IV-B1. The constraint (8a) defines a path from s = c i to t = c j . The constraint (8b) ensures that the K + 1 paths are link-disjoint. The constraint (8c) ensures that each switch has at least one port for connecting to other switches, and the constraint (8d) defines the maximum size of each switch. The constraint (8e) is the latency constraint, which means that the default path (k = 0) passes through at most l i, j switches. The constraint (8f) means that the total bandwidth requirement of the communication flows going through the physical link (u, v) must be less than BW max . The constraints (8g) and (8h) ensure that each core connects to exactly one switch, and the constraints (8i) and (8j) define the binary variables.
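To make the roles of the constraints concrete, the following sketch checks a candidate set of routing paths against the conditions described for (8a), (8b), (8e), and (8f). It is a validator for illustration, not the ILP itself. Several points are assumptions: disjointness is checked on switch-to-switch links only (each core connects to a single switch), every allocated path is counted toward the link load, the bandwidth bound is treated as inclusive, and the data layout and function names are hypothetical.

```python
from collections import defaultdict


def switch_links(path, switches):
    """Directed switch-to-switch links on a path (core-switch hops are skipped)."""
    return [(u, v) for u, v in zip(path, path[1:]) if u in switches and v in switches]


def check_candidate(flows, switches, k, bw_max):
    """Return a list of violated conditions; an empty list means the candidate passes.

    flows: dict (src, dst) -> {"bw": float, "max_switches": int, "paths": [node lists]}
    The first path of each flow is taken as the default path (k = 0).
    """
    errors = []
    load = defaultdict(float)
    for (src, dst), flow in flows.items():
        paths = flow["paths"]
        if len(paths) != k + 1:
            errors.append(f"{src}->{dst}: expected {k + 1} paths, got {len(paths)}")
        seen = set()
        for p in paths:
            if p[0] != src or p[-1] != dst:  # cf. (8a): path from c_i to c_j
                errors.append(f"{src}->{dst}: path has wrong endpoints")
            links = switch_links(p, switches)
            if seen.intersection(links):  # cf. (8b): link-disjoint paths
                errors.append(f"{src}->{dst}: paths share a switch-to-switch link")
            seen.update(links)
            for link in links:
                load[link] += flow["bw"]
        if paths:
            n_sw = sum(1 for node in paths[0] if node in switches)
            if n_sw > flow["max_switches"]:  # cf. (8e): latency on the default path
                errors.append(f"{src}->{dst}: default path crosses {n_sw} switches")
    for link, total in load.items():
        if total > bw_max:  # cf. (8f): bandwidth per physical link
            errors.append(f"link {link}: load {total} exceeds {bw_max}")
    return errors


switches = {"s0", "s1", "s2"}
good = {("c0", "c1"): {"bw": 100.0, "max_switches": 2,
                       "paths": [["c0", "s0", "s1", "c1"],
                                 ["c0", "s0", "s2", "s1", "c1"]]}}
bad = {("c0", "c1"): {"bw": 100.0, "max_switches": 2,
                      "paths": [["c0", "s0", "s1", "c1"],
                                ["c0", "s0", "s1", "c1"]]}}
good_errors = check_candidate(good, switches, k=1, bw_max=3000.0)
bad_errors = check_candidate(bad, switches, k=1, bw_max=3000.0)
```

The first candidate passes, whereas the second reuses the link (s0, s1) on both paths and is rejected by the disjointness check.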
VII. EXPERIMENT
The proposed algorithms have been implemented using C++ on a Linux 64-bit workstation (Intel 2.0 GHz, 64 GB RAM). All the ILP-based formulations are solved using Gurobi [43]. In the first set of experiments, we analyze the hardware cost of fault-tolerant topologies. The second set of experiments shows the effectiveness of the port sharing. In the third set of experiments, we compare the proposed method for generating link-fault-tolerant topologies with those of previous studies.
A. Hardware Cost Analysis of Fault Tolerance
In this experiment, the bandwidth constraint BW max is set to 3000MB/s and the maximum number of ports on switches, max size, is set to 10. ORION 3.0 [40] was used for estimating the switch power and the model from [41] was used for estimating the link power. Table IV shows the comparison between the non-fault-tolerant topologies and the fault-tolerant topologies with K = 1, K = 2 and K = 3, respectively. The columns SwitchNum, LinkNum, and Power denote the number of switches, the number of links, and the power, respectively. The column Time denotes the running time of the program. Additionally, the columns NFT and FT represent the non-fault-tolerant topologies and the fault-tolerant topologies, respectively, and the column Inc shows the ratios.
As K increases from 1 to 3, the power consumption increases by 97.24%, 203.29%, and 299.76%, respectively, compared to NFT (K = 0). Fig.12 shows that the increase in the power consumption is approximately linear with K for all benchmarks, because the power mainly comes from the switches and communication traffic. The increase in the number of both switches and links is also approximately linear with K.
B. Analysis of Port Sharing
Table V shows the effectiveness of the port sharing. The columns SwitchNum, InportNum, OutportNum, and Power denote the number of switches, the number of input ports, the number of output ports, and the power, respectively. The columns NPT and PT represent the results without and with port sharing, respectively, and the column Dec shows the reduction of PT compared to NPT .
For one-fault tolerance, the number of input ports, the number of output ports, and the power can be reduced by 13.55%, 12.37% and 18.08%, respectively. As K increases, the results show a larger reduction in the switch power consumption, which demonstrates the effectiveness of port sharing.
C. Link-fault Tolerance
1) One-link-fault tolerance:
In this subsection, we compare the proposed framework to the FTTG method in [34] , which includes a one-link-fault switch topology generation followed by a simulated annealing based core mapping, and the de Bruijn Digraph (DBG) based method [44] for the one-link-fault tolerance case.
To compare our results to that of previous studies [34] [44], the link power of the communication flow (i, j) on the edge
Notice that there are no extra costs introduced for opening a new physical link.
a. Comparison to the FTTG method: In this work, we stipulate that physical links are directed whereas the FTTG method uses bi-directional physical links. Hence, we degenerate the directed graph to an undirected graph by adding to the ILP formulation (8) the following constraint:
Additionally, the objective function in FTTG is to minimize the energy consumption of the network topology, which is estimated based on the shortest paths (called the default path in [34] ). To make a fair comparison to the FTTG method, we use the same objective function as follows.
In the formula, E R bit represents the bit energy consumption of the switches and E L bit represents the bit energy consumption of unit-length links [34]. Six widely used benchmarks were used for the comparisons. The port number of switches, max size, was set to four, which is the same as that in [34]. Because we cannot obtain the value of the bandwidth constraint in [34], the bandwidth constraint f max is set to 3000MB/s, which corresponds to a 32-bit physical link operating at 750MHz. The proposed ILP-based method is applied to generate a one-fault-tolerant topology with the switch number varying from r min to r max . For fair comparisons, we use the energy estimation model in [34]. The energy consumption of the switches and links is set to 3.20 pJ/Kb and 4.78 pJ/Kb/nm, respectively, and the length of the links is set to 1 mm.
The experimental results are listed in Table VI. The first column is the name of the benchmark. Columns 2 and 3 provide the energy consumption of the fault-tolerant network topology generated using the FTTG algorithm and the proposed ILP based method, respectively. The energy is calculated based on the shorter of the two alternative paths, which is the default routing path [34]. The benchmark MP3Enc refers to MP3EncMP3Dec because of the limited space in the table. Columns 5, 6, and 7 compare the two methods in terms of the average hop count (AHC). The column redu presents the reduction of each performance index compared with the FTTG algorithm.
Compared with the FTTG algorithm, the proposed method reduces the energy consumption by 10.58% on average and the hop count by 6.25% on average. This is because the switch topology and the core mapping strongly depend on each other, and we formulate the two subproblems as a single ILP model, in which we can determine the best solution for the whole problem.
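The bandwidth setting and the shape of the energy estimate can be checked with a short calculation. The sketch below is an assumption-laden illustration: the energy function follows the common bit-energy form (a per-switch term plus a per-unit-length link term along the default path) and is not claimed to be the exact objective of [34]; the function names are hypothetical, and the caller is responsible for using consistent units.

```python
def link_bandwidth_mb_s(width_bits, freq_mhz):
    """Peak bandwidth in MB/s of a link moving width_bits per cycle at freq_mhz."""
    return width_bits * freq_mhz / 8.0


def default_path_energy(volume, n_switches, link_lengths, e_r_bit, e_l_bit):
    """Energy estimate along the default path (assumed bit-energy form).

    volume: amount of data sent by the flow, in the unit matching e_r_bit
    n_switches: number of switches on the default path
    link_lengths: lengths of the links on the default path
    e_r_bit, e_l_bit: switch energy and per-unit-length link energy
    """
    return volume * (n_switches * e_r_bit + sum(link_lengths) * e_l_bit)


# A 32-bit link at 750 MHz, matching the bandwidth constraint used above.
f_max = link_bandwidth_mb_s(32, 750)

# Hypothetical flow: unit volume, two switches, one unit-length link.
energy = default_path_energy(1.0, 2, [1.0], e_r_bit=3.20, e_l_bit=4.78)
```

The first call reproduces the 3000MB/s figure, since 32 bits per cycle at 750MHz is 24000 Mbit/s.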
b. Comparison to the DBG based method:
The DBG method [44] generates a link-fault-tolerance topology without consideration of the position of the cores and switches. Hence, we set the distances D uv to 1 mm to calculate the power consumption.
We implemented the 2-D topology generation algorithm (DBG) in [44] and made a comparison. The port number of switches, max size, is set to 10. Six widely used benchmarks and three synthetic benchmarks, D 36, D 43, and D 50, are used in the experiments.
The experimental results are listed in Table VII, where the average hop count is equal to the number of switches along a path plus one. The proposed method reduces the power and the average hop count by 21.72% and 9.35%, respectively. For the synthetic benchmarks, the proposed method uses one more switch than the DBG method; however, the switch size is smaller, which results in a power reduction. On the other hand, the number of links can be reduced by 45.46% on average. In the DBG method, each switch has two links for connecting to other switches, which results in many redundant links. In fact, some pairs of communicating cores are allocated to the same router and therefore do not experience the link-fault-tolerance issue. In the proposed method, we consider only the necessary links between the switches. As the number of switches increases, the diameter of the DBG graph increases even faster, which causes the average hop count to become much larger, especially for the synthetic benchmarks D 36, D 43, and D 50.
Table VIII shows the results. The benchmark MP3E/D refers to MP3EncMP3Dec. The columns SwitchNum, LinkNum, and Power denote the number of switches, the number of links, and the power, respectively. The columns NFT , 1FT , 2FT , and 3FT denote the results of non-fault tolerance, one-fault tolerance, two-fault tolerance, and three-fault tolerance, respectively.
Compared to NFT , one-fault-tolerance topologies use the same number of switches and three times as many links on average, and their power overhead is 11.9%. Compared to NFT , the power consumption increases by 23.8% in 2FT and by 52.1% in 3FT on average, because more switches and links are used to generate the fault-tolerance topologies. The increase in power consumption is approximately linear with K.
VIII. CONCLUSIONS
In this paper, we presented a K-fault-tolerant topology generation method for ASNoC with physical link failures and switch failures. First, a convex-cost flow and ILP based method was proposed to generate a network topology in which each communication flow has at least K + 1 switch-disjoint routing paths, which provide K-fault tolerance. Second, to reduce the switch sizes, we proposed sharing the switch ports for the connections between the cores and switches, and proposed heuristic methods to solve the port sharing problem. Finally, we also proposed an ILP-based method to simultaneously solve the core mapping and routing path allocation problems when only the physical link failures are considered. The experimental results showed the effectiveness of the proposed method.
