I. INTRODUCTION
Modern device scaling results in deep sub-micron noise, which makes interconnect errors more dominant and harder to predict [1, 2, 3, 4, 5, 6, 10, 11], and also gives rise to new error sources [2, 3]. The need for efficient low-power design techniques, along with aggressive voltage scaling and higher integration, makes interconnects even more susceptible to errors [1, 6, 11]. In this paper, we focus on efficient solutions for interconnect reliability in the context of Networks-on-Chip (NoCs).
Traditional designs enhance interconnect reliability at the physical layer, using worst-case design margins such as aggressive inter-wire spacing, insertion of repeaters, and shielding of link wires [5, 7, 10] . Unfortunately, all these techniques incur high area and power costs [3, 9] . Moreover, they require knowledge of the circuit layout, thus inflicting design complexity [3, 6] . Furthermore, in novel technologies, the efficiency of these techniques decreases because transient errors are becoming harder to predict [10] .
A promising alternative to the traditional physical layer solutions is to add reliability at the data-link layer of the NoC, using error detection codes, as suggested in [6, 11] . Whereas error protection at the physical layer involves circuit design techniques that rely on specific device parameters, data link solutions are technology-independent [6] .
Coding methods add redundant parity bits to the packet, which increase the NoC's cost by requiring either additional wires or extra transmissions. Our goal is to provide any desired level of error detection, while reducing the number of redundant bits, as we specify in Section 2.
In Section 3, we present Parity Routing (PaR), a novel method for error protection in NoC. The main idea behind our approach is to take advantage of the multiplicity of routing paths between nodes. Path diversity was exploited in the past in order to achieve load-balancing, by routing some traffic XY and remaining traffic YX [8] . Here, we use it for the first time for error detection, and achieve better load balancing as a favorable side effect of this approach. For example, if one bit error detection is required, then the traditional approach is to add a single parity bit to the packet. In PaR, we save the redundant bit by selecting the routing path according to the parity of the data. As in [6] , errors are detected at every hop: routers along the path can identify parity errors by observing that the packet is on the wrong path. We illustrate this in Fig. 1 , where a packet is transferred from a source node, S, to a destination node, D, on a regular mesh NoC. The data parity determines the routing path: 0 for XY routing and 1 for YX. In Fig. 1 , the data is 0101, so the parity bit is 0, which indicates XY routing. While transferring the packet from S to the adjacent horizontal node (according to XY routing), one error occurs, changing the data to 0111. At the receiving node, the calculated parity bit is then 1 which indicates YX routing. Since the edge the packet arrives on is not on the expected path, an error is deduced.
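The Fig. 1 example can be traced with a few lines of Python. This is only an illustrative sketch: it assumes even parity (XOR of the data bits) as the parity function, and the helper names `parity_bit` and `expected_first_edge` are hypothetical, not part of the paper's pseudo-code. It also assumes S and D do not share a row or column, so that the XY and YX paths leave S on different edges.

```python
def parity_bit(bits: str) -> int:
    """Even parity of a bit string: XOR of all data bits (assumed parity function)."""
    return bits.count("1") % 2


def expected_first_edge(parity: int) -> str:
    """Parity 0 selects XY routing (first hop horizontal, 'h');
    parity 1 selects YX routing (first hop vertical, 'v')."""
    return "h" if parity == 0 else "v"


# Sender side: data 0101 has parity 0, so the packet leaves S on a horizontal edge.
sent = "0101"
first_edge = expected_first_edge(parity_bit(sent))

# One bit is flipped on the link, so the neighbour receives 0111 on that horizontal edge.
received = "0111"

# Receiver side: the recomputed parity is 1, which would require a vertical first hop.
error_detected = expected_first_edge(parity_bit(received)) != first_edge
```

Here the receiver needs no transmitted parity bit: the orientation of the incoming edge plays that role.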
A single parity bit can be saved whenever there are at least two available paths between the source and destination nodes. However, this may not always be the case if we wish to employ shortest-path routing: if the source and the destination nodes share one coordinate (either X or Y), there is only one shortest routing path. In such cases, PaR adds an extra parity bit to the packet.
In the general case, where the reliability demand is r redundant parity bits, we expand this method for error protection using the multiple routing paths between S and D. Some of the paths share edges, and therefore we save redundant bit transmissions on some of the edges within the routing paths, but not all. We have verified the correctness of PaR using exhaustive state exploration for all source and destination pairs on NoC grids of up to 5x5 hops, and reliability requirements of 1 to 10 parity bits.
In Section 4, we analyze and simulate the savings achieved by PaR. Our analysis shows that for a reliability demand of one redundant parity bit, we save 50% of the redundant bit transmissions on a 2x2 mesh NoC, and 75% on a 4x4 mesh NoC. For a reliability demand of 2 parity bits, we save ~40% on a 4x4 mesh NoC, and ~60% on an 8x8 mesh NoC. For any number of desired parity bits, the savings increase asymptotically to 100% with the size of the network. In addition, PaR can yield power savings, as it saves bit transmissions and simplifies the error detection decoding process.
II. GOAL AND DEFINITIONS
We tackle the problem of hop-by-hop error detection. The required reliability level is expressed as the number, r, of redundant bits. An externally provided function (or circuit), parity(data), returns r parity bits for protecting data. Any parity function can be used, e.g., CRC [4]. We denote the r redundant parity bits as $p_1, p_2, \ldots, p_r$.
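As an illustration of the parity(data) interface, the sketch below returns r parity bits using interleaved parity, where bit $p_i$ is the XOR of the data bits at positions congruent to i-1 modulo r. This particular function is an assumption chosen for brevity; the scheme accepts any parity function, such as a CRC, and the signature with an explicit `r` argument is hypothetical.

```python
from typing import List


def parity(data: str, r: int) -> List[int]:
    """Return r parity bits [p_1, ..., p_r] for a bit string.

    Assumed scheme (interleaved parity): p_i is the XOR of the data bits
    whose position j satisfies j % r == i - 1.
    """
    bits = [0] * r
    for j, ch in enumerate(data):
        bits[j % r] ^= int(ch)
    return bits
```

With r = 1 this reduces to ordinary single-bit parity, which is the case used by PaR-1.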
A. Problem Definition
Our goal is to design an error detection algorithm that reduces the transmission of redundant bits, yet has low encoder and decoder circuit overheads and low design complexity. Consider a packet sent from a source node S to a destination node D, in a regular mesh NoC. We require that the routing from S to D be on one of the shortest paths. A coding solution consists of two components, an encoder and a decoder. The encoder and decoder circuits are placed at each node, providing hop-by-hop error detection. We denote the current node where encoding/decoding occurs as H. The encoder's and decoder's functions are defined as follows:
1. Encoder: Given H, S, D, and the packet's data, the encoder decides which edge is next on the packet's routing path, and whether there is a need to add redundant parity bits to the packet.
2. Decoder: Given H, S, D, the packet, and the incoming edge, the decoder determines whether an error has occurred.
The flow of information among the different components is shown in Fig. 2. We denote concatenation by commas, e.g., data,p[1] represents the data with one parity bit appended at the end. If pack=data,p[1] then we denote data=pack p[1]. Note that since the encoder determines the routing path, this approach is applicable for transmission units that carry the source and destination addresses. In case the NoC employs wormhole routing [1], typically only the header flit carries these addresses. In such cases, our scheme can be used either for the entire packet (with checking at the destination node) or only for the header flit, which is the most important flit. For the remainder of this paper, we simply refer to the protected transmission unit as a packet.
B. Definitions
We now introduce some notations that will be used throughout the paper.
We denote For an edge e, the orientation orient(e) is h if e is horizontal, and v if it is vertical. We define the diagonal distance, , min , , , 1 
III. PARITY ROUTING ALGORITHM
In this section, we develop the PaR algorithm. For clarity of the exposition, we first present the special case of a reliability demand of one parity bit, called PaR-1, and then expand the algorithm for r redundant parity bits.
A. PaR-1: One-bit Error Protection
Consider the case of a reliability demand of one bit error detection. We use the given parity function to calculate the parity bit of the packet. If the parity bit is 0, PaR-1 routes the packet XY, and in case the parity bit is 1, the routing is YX, as shown in Fig. 4 . If S and D are located on the same row or column, then the parity bit is sent along with the packet.
The pseudo-code of PaR-1 encoder is shown in Fig. 5 . The pseudo-code of PaR-1 decoder is shown in Fig. 6 .
Fig. 6. PaR-1 decoder: PaR-1_Decode (S, D, packet, incoming_edge).
The property that allows us to detect the parity bits according to the routing path is the fact that the XY and YX paths between every S and D that do not share a coordinate are edge-disjoint.
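The PaR-1 behaviour described in this subsection can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the pseudo-code of Fig. 5 and Fig. 6: nodes are modelled as (x, y) tuples, edges as ordered node pairs, the parity function is single-bit even parity, and the names `path`, `par1_encode`, `par1_decode` and `xy_yx_edge_disjoint` are hypothetical. The last function checks the edge-disjointness property by exhaustive enumeration on a small mesh.

```python
from itertools import product


def path(S, D, order):
    """Shortest path from S to D as a list of directed edges.
    order is "XY" (X leg first) or "YX" (Y leg first)."""
    pos, edges = list(S), []
    for axis in order:
        i = 0 if axis == "X" else 1
        while pos[i] != D[i]:
            prev = tuple(pos)
            pos[i] += 1 if D[i] > pos[i] else -1
            edges.append((prev, tuple(pos)))
    return edges


def shares_coordinate(S, D):
    return S[0] == D[0] or S[1] == D[1]


def par1_encode(H, S, D, data):
    """Return (next_edge, packet) at the current node H (H != D, H on the route).
    The parity bit is appended only when S and D share a row or column."""
    p = data.count("1") % 2
    same_line = shares_coordinate(S, D)
    route = path(S, D, "XY" if (p == 0 or same_line) else "YX")
    next_edge = next(e for e in route if e[0] == H)
    return next_edge, (data + str(p) if same_line else data)


def par1_decode(S, D, packet, incoming_edge):
    """Return True if an error is detected at the receiving node."""
    if shares_coordinate(S, D):
        data, p = packet[:-1], int(packet[-1])   # parity bit travels with the packet
        return data.count("1") % 2 != p
    p = packet.count("1") % 2                    # parity bit is implied by the path
    return incoming_edge not in path(S, D, "XY" if p == 0 else "YX")


def xy_yx_edge_disjoint(n):
    """Check that XY and YX paths share no edge for every S, D on an n x n mesh
    that do not share a coordinate."""
    nodes = list(product(range(n), repeat=2))
    return all(
        not set(path(S, D, "XY")) & set(path(S, D, "YX"))
        for S in nodes for D in nodes
        if not shares_coordinate(S, D)
    )
```

For instance, with S = (0, 0) and D = (2, 1), data 0101 leaves S on the edge to (1, 0); if the neighbour receives 0111 on that edge, `par1_decode` reports an error because that edge is not on the YX path.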
B. PaR-r: r-bit Error Protection
Generally speaking, in order to provide a detection level of r parity bits without sending redundant bits, we need to distinguish between $2^r$ edge-disjoint routing paths. Since there are at most 2 edge-disjoint paths between every pair of nodes, PaR-r strives to achieve disjointedness on as many edges as possible, by choosing paths with minimal overlap. If one parity bit is missing (i.e., r-1 are sent with the packet), then H should be either on the same diagonal with S, or one hop away from such a diagonal, or on the same row or column with S or D. In the first case, the missing parity bit is 0 if the message arrives on a vertical edge, and 1 otherwise. In the second and third cases, it is 0 for a horizontal edge, and 1 otherwise (see paths in Fig. 8). [...] distance on the Y axis to S for a horizontal edge, or from the distance on the X axis to D for a vertical edge. When more than one parity bit is missing from the packet, the missing parity bits are deduced according to the binary representation of the distance from H to S or the distance from H to D, according to the orientation of the incoming edge. When all parity bits have been decoded, we compare them to the parity bits calculated from the received data using the given parity(data) function. A mismatch between the parity bits indicates an error.
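The gap between path diversity and edge-disjointness can be made concrete with a short sketch. It assumes nodes are (x, y) tuples on a regular mesh and uses hypothetical helper names. The number of shortest paths is the standard binomial count. The bound on edge-disjoint shortest paths follows from the fact that each such path must leave S on a different edge that moves toward D, and at most two such edges exist.

```python
from math import comb


def shortest_path_count(S, D):
    """Number of distinct shortest (Manhattan) paths between S and D on a mesh."""
    dx, dy = abs(S[0] - D[0]), abs(S[1] - D[1])
    return comb(dx + dy, dx)


def max_edge_disjoint_shortest_paths(S, D):
    """Each edge-disjoint shortest path needs its own outgoing edge of S that
    moves toward D, so the count is the number of axes on which S and D differ."""
    return int(S[0] != D[0]) + int(S[1] != D[1])


def fully_disjoint_possible(S, D, r):
    """True only if 2**r mutually edge-disjoint shortest paths exist."""
    return 2 ** r <= max_edge_disjoint_shortest_paths(S, D)
```

For S = (0, 0) and D = (3, 3) there are 20 shortest paths but only 2 that are mutually edge-disjoint, so for r = 2 the four required paths must overlap on some edges, which is where redundant bits are still transmitted.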
IV. ANALYSIS
PaR achieves savings in two respects: first, it saves network traffic and interconnect dynamic power due to the reduced redundant bit transmission, and second, it saves dynamic power by avoiding the need to operate the original error protection decoder block (whose complexity is likely to grow exponentially while the number of parity bits grows linearly). We now analyze the savings in redundant bit transmissions. We begin, in Section 4.1, by analyzing PaR-1, and then generalize the analysis to PaR-r in Section 4.2. Finally, we present an example of the power reduction achieved by PaR-1 in Section 4.3.
For simplicity, our analysis assumes a uniform traffic model, where an equal number of messages are transmitted between all source-destination pairs. We measure the percentage of redundant bit transmissions on an edge-by-edge basis. For example, if a parity bit is sent on two edges in a four-hop path, the savings on this path are 50%. We analyze the average savings over all paths.
A. PaR-1 Analysis
Consider 
In case the network is symmetric, i.e., N=M, we get:
For example, in case of a 4x4 network, the cost reduction is 75% of the redundancy bits. We observe that as we increase the network (in both dimensions equally) the savings increase asymptotically to 100%. In order to show this, we simplify the analysis and prove that the percentage of paths on which parity bits are sent is asymptotically zero. Since these paths are, on average, shorter than paths where no parity bits are sent (as shown above), this simpler result implies that the savings increase asymptotically to 100%. We observe that the percentage of pairs for which we save the redundant bit transmissions is:
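The PaR-1 figures quoted in this paper can be cross-checked by brute force. The sketch below is an independent numeric check, not the authors' derivation. It assumes uniform traffic over all ordered pairs S ≠ D, shortest-path routing, and a parity bit transmitted on every hop only when S and D share a row or column. The function name `par1_savings` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def par1_savings(n, m):
    """Return (edge_savings, saved_pair_fraction) for PaR-1 on an n x m mesh.

    edge_savings: fraction of hops on which the parity bit is not transmitted.
    saved_pair_fraction: fraction of S-D pairs that need no transmitted parity bit.
    """
    nodes = list(product(range(n), range(m)))
    total_hops = parity_hops = pairs = saved_pairs = 0
    for S, D in product(nodes, nodes):
        if S == D:
            continue
        hops = abs(S[0] - D[0]) + abs(S[1] - D[1])   # shortest-path length
        total_hops += hops
        pairs += 1
        if S[0] == D[0] or S[1] == D[1]:
            parity_hops += hops                       # bit sent on every hop
        else:
            saved_pairs += 1
    return 1 - Fraction(parity_hops, total_hops), Fraction(saved_pairs, pairs)


if __name__ == "__main__":
    for n in (2, 4, 6, 8):
        edge_savings, pair_fraction = par1_savings(n, n)
        print(n, float(edge_savings), float(pair_fraction))
```

Under these assumptions the edge-level savings come out as 1/2 on a 2x2 mesh, 3/4 on a 4x4 mesh and 5/6 on a 6x6 mesh, which matches the 50% and 75% figures and leaves about 17% of transmissions carrying a parity bit on a 6x6 mesh. The fraction of pairs that need no parity bit is (N-1)/(N+1) on an NxN mesh in this model, which approaches 1 as N grows.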
B. PaR-r Analysis
We now analyze the general case of r parity bits, PaR-r. Consider an NxM NoC, a reliability demand of r redundant bits, and two given nodes S and D. Without using the PaR algorithm, we have to transmit r redundant parity bits on all edges in the path.
Assume that PaR-r can transmit the packet without the redundant r parity bits, i.e., [...]
To compute the average savings percentage in a given NoC, we ran a numeric computation that iterates over all S-D pairs, sums the savings, and divides the sum by the number of pairs. The results for different r requirements are shown in Fig. 10. [...] bits, savings are over 50% (more than one bit), and for 3 bits, more than 30% on small NoCs.
C. Power Reduction Example
We demonstrate the power saving achieved by PaR-1 on an NxN regular mesh NoC with 5 mm long, 8-bit wide links. The hardware design is implemented in a 0.18 µm Tower process and synthesized with Synopsys Design Compiler. Interconnect power consumption is measured using a SPICE model, assuming random data and traffic patterns.
Measurements of the redundant bits' switching power, along with the power of the parity circuits and the PaR circuits, are referred to as power consumption and are shown in Fig. 11. The measurements were made on 2x2, 3x3 and 4x4 regular mesh NoCs. We observe that the power saving increases with the size of the NoC. We expect the savings to grow as more parity bits are used, both because of reduced redundant network traffic and because more complex error protection decoders are avoided.
V. CONCLUSIONS
Achieving interconnect reliability is already a difficult task facing chip designers and manufacturers today, and can be anticipated to become an even more serious problem in years to come. A key challenge in this context is providing high reliability at a low power cost. While error detection codes provide a promising approach towards achieving reliability, they do expend additional power in redundant bit transmissions. In this paper, we have tackled the problem of ensuring error detection, while reducing the need for redundant transmissions.
We presented PaR (Parity Routing), a low-overhead error detection solution for networks on chip. PaR can be used to provide any predefined error protection requirement. It exploits NoC path diversity, and selects routing paths based on parity bits. It thus saves actual transmissions of these bits, along with the associated power penalty. PaR uses simple, low-complexity encoding and decoding circuits. We have analyzed the savings achieved by PaR, and have shown that it yields significant savings even on small NoCs (for example, saving 75% of redundant bit transmissions on a 4x4 mesh NoC), and that its savings asymptotically converge to 100% with the size of the NoC. We showed that PaR can yield power savings (for example, saving 35% of redundant power consumption on a 3x3 mesh NoC).
We believe that our novel parity routing approach opens interesting opportunities that may be explored in future work. One such interesting future direction is related to wire (capacity) allocation. By eliminating the need to transmit redundant parity bits most of the time, PaR may allow for wire reductions in NoC design. For example, if a parity bit is sent with every packet, the NoC designer is likely to add a wire for parity bits to all the links in the NoC. On the other hand, if less than 20% of transmissions carry redundant bits (as occurs, e.g., with PaR-1 on a 6x6 NoC), then it might be more cost-effective not to add a parity wire, and transmit the parity bit after the data when needed. A study of the optimal wire allocation for NoCs that use PaR is an interesting topic for future research. Beyond this example, another interesting question for future research is how to extend the PaR approach to also allow for error correction. Though in current day VLSI technology the bit error rates render error detection and retransmission more power-efficient than error correction [2] , this situation may change in future technologies, where one may therefore wish to employ error correction.
