As technology scales into the nanometer regime, network-on-chip (NOC) has become a reasonable solution for connecting many IP blocks on a single chip. However, an NOC suffers from both crosstalk effects and single event upsets (SEUs); crosstalk-induced delay in particular may constrain its overall performance. In this paper, we introduce a reliable NOC design using a code capable of both crosstalk avoidance and single error correction. This code, named selected crosstalk avoidance code (SCAC) in our previous work, joins crosstalk avoidance code (CAC) and error correction code (ECC) through codeword selection from an original CAC codeword set. It can handle errors caused by either crosstalk effects or SEUs. In the reliable NOC, data are encoded into SCAC codewords and can be transmitted across the NOC rapidly and reliably. Experimental results show that the NOC design with SCAC achieves higher performance and tolerates single errors. Compared with previous crosstalk avoidance methods, SCAC reduces wire overhead, power dissipation and total delay. When SCAC is used in an NOC, it saves about 20% area overhead and 49% power dissipation, both measured relative to a primary router without fault tolerance.
Introduction
Network-on-chip (NOC) has been proposed as a reasonable framework for multi-core systems. However, the overall performance of an NOC is constrained by the delay on long interconnects. The International Technology Roadmap for Semiconductors (ITRS 2007) [1] gives the RC delay of global and intermediate wires per 1 mm of Cu wire, assuming width-dependent scattering, for the near-term years, as shown in the accompanying figure, and this increasing trend of RC delay will continue. Moreover, because wire width shrinks continuously, engineers have to increase the height/width ratio of wires to limit wire resistance [2]. The coupling capacitance between wires therefore grows rapidly, which results in serious crosstalk effects. Because of crosstalk, the real delay of an interconnect can be several times the RC delay.
Because the clock cycle usually has to cover the worst case, the delay under the worst crosstalk on interconnects constrains the system clock. Without an effective solution, the clock cycle has to be enlarged, and system performance degrades.
Encoding schemes have been widely used for bus problems [3−5], and they can also be applied to crosstalk effects on buses. An encoding method for crosstalk was first presented in [6], which shows that bus delay can be bounded by forbidding certain signal transitions. Then a crosstalk avoidance code (CAC), which generates a codeword set used to encode messages, was proposed in [7]. With this method, no two adjacent wires have opposite signal transitions, so the large delay induced by crosstalk never occurs. Another method, proposed in [8], avoids large crosstalk-induced delay by forbidding certain bit patterns in codewords. Finally, since some combinations of signal transitions can also speed up data transmission, the bus encoding method in [9], which permits only these combinations, was proposed for high-speed interconnects. In summary, bus encoding methods are suitable for dealing with delay on long interconnects, because they bound the delay by a certain upper limit instead of the worst case.
Besides crosstalk effects, energetic particles interact with the silicon substrate and cause single-event upsets (SEUs), which may cause errors in both packages and NOC controlling registers. When crosstalk effects and SEUs occur together, it is difficult to make a network-on-chip reliable. Although a previous method [10] handles crosstalk effects and SEUs on transmitted data through delayed triple sampling (TS) and Hamming code, respectively, it lowers the overall operating frequency.
In this paper, we introduce a reliable NOC design using a code capable of both crosstalk avoidance and single error correction. This code, named selected crosstalk avoidance code (SCAC) in our previous work [11], joins CAC and error correction code (ECC) through codeword selection from an original CAC codeword set. Compared with the method of directly adding Hamming code to CAC [12], SCAC has no independent checking bus, so it needs no shielding wires for a checking bus and avoids the secondary crosstalk effect caused by late transitions on the checking bus [11]. When SCAC is used to design a reliable router in an NOC [13], experimental results show that it saves area overhead and reduces power dissipation. Most importantly, it relieves the frequency constraint caused by interconnect delay. This paper systematically combines our previous work [11, 13] and gives more details on the design of SCAC and of the reliable NOC. The paper is organized as follows. Section 2 covers related work on CAC and NOC fault tolerance methods. Section 3 introduces the main idea of the selected CAC method. Reliable NOC design with selected CAC is presented in Section 4. Section 5 analyzes the performance of selected CAC with a delay upper limit of 1 + 2λ and shows the experimental results for the design of a reliable NOC. Finally, Section 6 concludes the paper.
Related Work

Crosstalk Avoidance Code
Because of the coupling capacitance between adjacent wires, signal transitions cause two types of crosstalk effects on interconnects, delay and glitch [14]; this paper focuses on the major effect, delay. Because crosstalk-induced delay can easily corrupt several data bits at once, it cannot be handled by a single-error-correcting ECC alone. For example, if the pattern "0101" is sent after the pattern "1010", the large delay on the second and third bits causes two errors. Therefore, a new code is needed for crosstalk-induced delay.
Table 1. Relative Delay for Transition Patterns [15]
Group   Delay   Transition Pattern
As shown in Table 1, different transition patterns result in different delays, where λ is the ratio of coupling capacitance to bulk capacitance, and ↓, ↑ and − denote a falling transition, a rising transition and a stable signal, respectively [15]. The delays of groups 1 and 2 may cause delay faults and are called large delay in this paper. The delay of group 3 is often used as a reference and is called normal delay. The remaining groups are speedup cases. Based on Table 1, researchers have designed many crosstalk avoidance codes (CAC), which keep the bus delay under a certain upper limit by forbidding the corresponding signal transitions in Table 1.
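The numerical entries of Table 1 are not reproduced here. As a sketch, assuming the widely used normalized delay model in which a switching wire picks up 0, 1 or 2 units of λ from each adjacent wire that switches in the same direction, stays stable, or switches in the opposite direction (with a wire at the bus edge having only one neighbor), the delay of a victim wire can be computed as below. The function names are hypothetical. Under this assumption the model reproduces the groups named above: 1 + 4λ when both neighbors switch the opposite way, 1 + 2λ (normal delay) when both are stable, and speedup cases when the neighbors switch along with the victim.

```python
# Sketch of an assumed normalized crosstalk-delay model (not the paper's Table 1 data).


def transitions(prev, curr):
    """Per-wire transition: +1 rising, -1 falling, 0 stable."""
    return [c - p for p, c in zip(prev, curr)]


def victim_delay(prev, curr, i):
    """Return (a, b) so that wire i's delay is (a + b * lambda) * tau0.

    Each neighbor adds 0 (same direction), 1 (stable) or 2 (opposite) units
    of lambda; edge wires have only one neighbor.
    """
    d = transitions(prev, curr)
    if d[i] == 0:
        return (0, 0)  # a stable wire has no transition delay
    neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(d)]
    b = sum(1 - d[i] * d[j] for j in neighbors)
    return (1, b)


def worst_lambda_coeff(prev, curr):
    """Largest lambda coefficient over all wires for one bus transition."""
    return max(victim_delay(prev, curr, i)[1] for i in range(len(prev)))


# The example from the text: "1010" followed by "0101".
print(victim_delay([1, 0, 1, 0], [0, 1, 0, 1], 1))  # (1, 4), i.e. 1 + 4*lambda
print(worst_lambda_coeff([1, 0, 1, 0], [0, 1, 0, 1]))  # 4
```

The second and third wires of the example both reach 1 + 4λ, which is why that transition produces the two errors mentioned above.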
CAC with D = 1 + 3λ. This type of CAC requires that the upper limit of delay on the bus be 1 + 3λ. The code meets this requirement by forbidding the signal transitions (↓, ↑, ↓) and (↑, ↓, ↑) on any three adjacent wires. Because it only avoids the worst crosstalk-induced delay on the bus, it is suitable for applications with low performance requirements.
CAC with D = 1 + 2λ. This type of CAC requires that the upper limit of delay on the bus be 1 + 2λ. The code meets this requirement by forbidding the signal transitions (↓, ↑, ↓), (−, ↑, ↓), (↓, ↑, −), (↑, ↓, ↑), (−, ↓, ↑) and (↑, ↓, −). Two methods have been proposed, forbidden transition code (FTC) [7] and forbidden pattern code (FPC) [8]. Their delay performance is the same as that of inserting shielding wires.
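The rule above can be checked mechanically. The sketch below, with hypothetical function names, maps each wire of a bus transition to ↑, ↓ or − and flags the transition if any three adjacent wires form one of the six forbidden triples listed in the text.

```python
# The six triples forbidden by a D = 1 + 2*lambda CAC (taken from the text).
FORBIDDEN = {
    ("↓", "↑", "↓"), ("−", "↑", "↓"), ("↓", "↑", "−"),
    ("↑", "↓", "↑"), ("−", "↓", "↑"), ("↑", "↓", "−"),
}


def symbols(prev, curr):
    """Map every wire to '↑' (rising), '↓' (falling) or '−' (stable)."""
    return ["↑" if c > p else "↓" if c < p else "−" for p, c in zip(prev, curr)]


def violates_2lambda(prev, curr):
    """True if any three adjacent wires show a forbidden triple."""
    s = symbols(prev, curr)
    return any(tuple(s[i:i + 3]) in FORBIDDEN for i in range(len(s) - 2))


print(violates_2lambda([1, 0, 1], [0, 1, 0]))  # True: (↓, ↑, ↓)
print(violates_2lambda([0, 1, 1], [1, 0, 0]))  # False: (↑, ↓, ↓) is allowed
```

The triple (↑, ↓, ↓) passes because its middle wire has one opposite and one same-direction neighbor, which is a normal-delay case rather than a large-delay one.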
CAC with D = 1 + λ. This type of CAC requires that the upper limit of delay on the bus be 1 + λ [9]. The code permits only a few signal transitions. When the victim wire has a signal transition, only six signal combinations, (↑, ↑, ↑), (−, ↑, ↑), (↑, ↑, −), (↓, ↓, ↓), (−, ↓, ↓) and (↓, ↓, −), may occur on the three adjacent wires. Owing to the crosstalk speedup effect, the adjacent wires help the victim wire switch quickly. Therefore, this code speeds up signal transitions and achieves better performance.
Crosstalk- and SEU-Tolerance Method
In an NOC system, long interconnects between two routers may suffer from crosstalk effects, especially crosstalk-induced delay, while memory cells in a router may be affected by SEUs. Both effects may result in package-routing errors or payload errors [10] in the NOC, which change the target address or the data in a package. When an SEU hits state or controlling registers, router crash errors may occur: the router cannot work properly and cannot recover without a reset. To make an NOC work reliably, a fault tolerance scheme is necessary to handle all these possible errors.
Delayed Triple Sample and Hamming Code
Generally, state and controlling registers should be protected by triple modular redundancy (TMR), because any fault on them may cause the NOC to fail. Researchers introduced a TS-HC-TMR hardware scheme [10, 16], as shown in Fig.2. Besides TMR on the state and controlling registers, delayed triple-sampling registers (TSs) are placed at the data inputs to remove crosstalk effects from the NOC channels. They sample the data three times at interval d, and the sampled results are checked by a majority voter. The output of the voter is then encoded into Hamming code (HC) and stored in the input buffer. Since TS and HC handle crosstalk faults and SEUs independently, the NOC can still transmit correct data even if crosstalk faults and SEUs occur together.
However, since interconnect delay will be a major constraint on system frequency in high-speed circuits, and TS has to sample data late, it inevitably degrades overall performance. Specifically, if the first register samples data at the normal delay T_n = 1 + 2λ, the second must sample data at the worst delay T_w = 1 + 4λ, which ensures that the voter gives the right output even in the worst case. With samples spaced by the interval d = T_w − T_n, the third register samples data at 2T_w − T_n, so the clock period must be no less than 2T_w − T_n to make sure data reach the third register. In addition, TS samples the input signals three times within one clock cycle and requires three registers, so it consumes considerable power and area.
Fig.2. TS-HC-TMR scheme [10].
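In normalized units (multiples of the intrinsic delay τ0), the third sample of the TS scheme lands at 2T_w − T_n = 1 + 6λ. The small sketch below, with a hypothetical function name, evaluates the three sampling instants for the λ range reported for SMIC 130 nm in Section 5.

```python
def ts_sampling_times(lam):
    """Delayed triple-sampling instants in units of tau0.

    T_n = 1 + 2*lam (normal delay), T_w = 1 + 4*lam (worst delay); the third
    sample falls one interval d = T_w - T_n later, at 2*T_w - T_n.
    """
    t_n = 1 + 2 * lam
    t_w = 1 + 4 * lam
    return t_n, t_w, 2 * t_w - t_n


for lam in (3, 4, 5):  # lambda range reported for SMIC 130 nm in Section 5
    t1, t2, t3 = ts_sampling_times(lam)
    print(f"lambda={lam}: samples at {t1}, {t2}, {t3}")
```

With λ = 3 the samples fall at 7, 13 and 19 units, so the TS clock period must cover 19 units, whereas a code that bounds the delay at 1 + 2λ only has to cover 7.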
CAC with Error Correction
If a crosstalk avoidance code also has error correction capability, it can tolerate both crosstalk effects and SEUs, even when they occur together: the code avoids the major crosstalk effects, and SEU errors can be corrected before data are sent onto the interconnects. Two types of enhanced CAC with error correction capability have been proposed: the joint code [12] and the selected CAC [11]. As shown in Fig.3, the joint code method directly appends k checking bits to the CAC codeword to form a Hamming code for single error correction. Since crosstalk-induced delay on the checking bus may exceed the upper limit of the CAC, extra wires are used to protect the checking bits, which yields the LXC code [12]. In summary, the joint code, denoted CAC + HC(c, l + t), needs (l + t − c) extra wires, where c, l and t are the bit widths of the message bus, the CAC bus and the protected checking bus, respectively.
In the joint code method, the checking bits lag behind the message bits because they are generated from the message bits, as shown in Fig.3. As a result, the clock period needs to be enlarged, which lowers system performance. Moreover, because the two buses switch at different times, a secondary crosstalk effect may occur on certain message wires and degrade the performance of the CAC. The selected CAC method combines CAC and ECC through codeword selection from an original CAC codeword set, as shown in Fig.4. This method uses a checking matrix to find a maximum CAC subset capable of single error correction. The resulting code, denoted SCAC(c, n), needs (n − c) extra wires. Compared with the joint code method, it has no separate checking bus and therefore introduces no difference in switching time among code bits. As the experimental results show, it also requires less wire overhead.
This paper uses SCAC to design a reliable NOC. The next section will discuss the SCAC method in detail.
Selected Crosstalk Avoidance Code
Theorem 1 [17−18]. A code can provide k-bit error correction if and only if its code distance is no less than 2k + 1.
The codeword selection method finds a fault-tolerant subset of the original CAC codeword set and then builds a mapping between messages and codewords. For single error correction (k = 1 in Theorem 1), the code distance of the subset must be at least 3. The codeword selection procedure is illustrated in the following subsections.
Codeword Selection Example
The codeword selection procedure has three steps. First, a CAC codeword set is generated. Second, a uniform checking matrix is built that defines a group code with code distance 3. Finally, every CAC codeword is checked against the matrix, and the codewords that pass form the codebook.
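The three steps can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: FPC (no "010" or "101") is used as the CAC in step 1, and the 3×5 checking matrix is a hypothetical one chosen only to agree with the worked error-correction example later in this section (h3 + h4 = h5 = (0, 0, 1)^T); the actual matrix of Fig.5(b) is not reproduced. Columns are stored as 3-bit integers so that the GF(2) sum of columns is a bitwise XOR.

```python
from itertools import product

# Hypothetical 3x5 checking matrix, column by column (h1..h5) as 3-bit ints.
H_COLUMNS = [0b100, 0b010, 0b110, 0b111, 0b001]


def fpc_codewords(n):
    """Step 1 (assumed CAC): n-bit words containing neither '010' nor '101'."""
    words = ("".join(bits) for bits in product("01", repeat=n))
    return [w for w in words if "010" not in w and "101" not in w]


def syndrome(word, columns):
    """H . X^T over GF(2): XOR of the columns where the word has a 1."""
    s = 0
    for bit, col in zip(word, columns):
        if bit == "1":
            s ^= col
    return s


def select_codewords(cac_set, columns):
    """Step 3: keep only CAC words that are also codewords of the group code."""
    return [w for w in cac_set if syndrome(w, columns) == 0]


codebook = select_codewords(fpc_codewords(5), H_COLUMNS)
print(codebook)  # ['00000', '00111', '11100']
```

With this particular matrix only three of the sixteen 5-bit FPC words survive (the group codeword 11011 is dropped because it contains "101"), which illustrates why the choice of matrix matters in the Algorithm Design subsection.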
As shown in 
Verification
This subsection proves that the codeword selection method can select a set with code distance 3 from a CAC set. First, two results about the checking matrix are introduced; then the necessary properties of the checking matrix are presented.
Matrix Demonstration
The weight of a codeword X, denoted W(X), is the number of 1 bits in the codeword. Because the sum (bitwise modulo-2 addition) of two codewords X and Y in a group code is also a codeword Z of that group code, the Hamming distance between X and Y equals W(Z). This leads to the following theorem.
Theorem 2 [17−18]. The code distance dis_code of a group code C_n, i.e., the minimum Hamming distance d_min(C_n) between any two codewords, equals the minimum weight of the nonzero codewords in C_n, as shown in (1):
$$\mathrm{dis}_{code} = d_{\min}(C_n) = \min_{X \in C_n,\ X \neq 0} W(X) \qquad (1)$$
If a code provides single error correction, its code distance is at least 3 (Theorem 1), so for a group code the minimum weight of its nonzero codewords is also at least 3. All these definitions and theorems are taken from [17−18].
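Theorem 2 can be checked numerically on a small group code. The sketch below builds the code as the null space of a checking matrix (all X with H · X^T = 0^T over GF(2)) and compares the minimum pairwise Hamming distance with the minimum nonzero weight. The matrix, whose columns are the seven nonzero 3-bit vectors (the classic (7,4) Hamming code), and the helper names are hypothetical examples, not taken from the paper.

```python
from itertools import combinations, product


def null_space(columns, n):
    """All n-bit X (as tuples) with H . X^T = 0 over GF(2)."""
    code = []
    for x in product((0, 1), repeat=n):
        s = 0
        for bit, col in zip(x, columns):
            if bit:
                s ^= col
        if s == 0:
            code.append(x)
    return code


def hamming(x, y):
    """Number of positions in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


# Hypothetical example: the seven nonzero 3-bit columns (a (7,4) Hamming code).
H = [1, 2, 3, 4, 5, 6, 7]
code = null_space(H, 7)
d_min = min(hamming(x, y) for x, y in combinations(code, 2))
w_min = min(sum(x) for x in code if any(x))
print(len(code), d_min, w_min)  # 16 3 3
```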
Corollary 1. There exists a group code with code distance 3 if and only if there are 3 column vectors in the uniform checking matrix whose sum is 0^T, and no sum of fewer than 3 column vectors is 0^T. Every codeword X of the group code satisfies H · X^T = 0^T. Assume that codeword X has the minimum weight in the group code, namely 3. The multiplication H · X^T selects the 3 columns of H at the positions of the 1 bits of X, so their sum must be 0^T. Conversely, if a group of fewer than 3 columns summed to 0^T, C_n would contain a codeword of smaller weight, which contradicts the assumption. Therefore, a checking matrix that satisfies Corollary 1 can be used to select a single-error-correcting subset from a CAC codeword set.
Matrix Property
According to Corollary 1, to make sure that the code distance is no less than 3, the matrix should satisfy the following properties.
Property 1. The matrix contains no identical column vectors and no 0^T column vector.
Property 2. There is a group of 3 column vectors in the matrix whose sum is 0^T.
Properties 1 and 2 make sure that the code distance equals 3. By Property 1, a single column is never 0^T and two columns sum to 0^T only if they are identical, so no codeword has weight 1 or 2; Property 2 guarantees a codeword of weight 3. Property 1 also implies that an n-bit code needs n distinct nonzero column vectors. Since there are only 2^m − 1 nonzero m-bit vectors, the dimension m of the column vectors must satisfy n ≤ 2^m − 1, i.e., m ≥ log2(n + 1).
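A checking matrix can be screened against the two properties and the dimension bound with a few lines of code. In this sketch the function name and both example matrices are hypothetical; columns are m-bit integers, so the GF(2) sum of three columns is their XOR.

```python
from itertools import combinations
from math import log2


def check_matrix(columns, m):
    """Return (Property 1, Property 2, m >= log2(n + 1)) for m-bit columns."""
    n = len(columns)
    p1 = 0 not in columns and len(set(columns)) == n  # nonzero and distinct
    p2 = any(a ^ b ^ c == 0 for a, b, c in combinations(columns, 3))
    return p1, p2, m >= log2(n + 1)


# Hypothetical matrix consistent with the example in the Error Correction subsection.
print(check_matrix([0b100, 0b010, 0b110, 0b111, 0b001], 3))  # (True, True, True)
print(check_matrix([0b100, 0b100, 0b011], 3))  # (False, False, True)
```

The second matrix fails Property 1 because of its repeated column, so it would admit a codeword of weight 2.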
Algorithm Design
Many matrices satisfy the two properties, and different matrices H yield different numbers of codewords. It is desirable to find a checking matrix that selects as many codewords as possible, so that there are enough codewords to map the original message. If too few codewords are selected, the bit width of the CAC must be increased to obtain a larger original codeword set. For example, Fig.6 shows that different matrices can generate codeword sets of different sizes for a 7-bit CAC. The line with squares stands for CAC using FTC, whose maximum number of selected codewords is 6 and minimum is 3; the line with points stands for CAC using FPC, whose maximum is 10 and minimum is 4. Therefore, finding a good matrix reduces the required wire overhead.
The SCAC method should therefore find a suitable matrix for codeword selection and obtain an optimized codebook. Fig.7 shows the framework of the codeword selection algorithm. For an M-bit message, the program first produces an N-bit CAC. It then checks whether there is a new matrix for codeword selection. If not, N is increased by 1 and the program returns to the first step. Otherwise, a codebook is selected from the CAC with that matrix. Next, the program checks whether the codebook is large enough to map the M-bit message. If it is, the codebook is found. Otherwise, a new matrix satisfying the matrix properties is generated, either by rearranging the column vectors of the matrix or by selecting a new vector to form the matrix, and the new matrix is used to select codewords again. After a proper checking matrix is found, an encoder is designed to build a mapping between the message and the codebook.
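A brute-force sketch of the Fig.7 loop is given below. It is an illustration only, practical for very small message widths; the paper's tool is not described in enough detail to reproduce. Assumptions: FPC is the CAC, the column dimension is the smallest allowed m = ⌈log2(N + 1)⌉, and "new matrices" are all orderings of N distinct nonzero m-bit columns (Property 1). Property 2 is not enforced separately, because a matrix without it yields a code distance above 3, which still corrects single errors. All names are hypothetical.

```python
from itertools import permutations, product
from math import ceil, log2


def fpc_codewords(n):
    """Assumed CAC: n-bit FPC words (no 010 or 101), as tuples of bits."""
    bad = ((0, 1, 0), (1, 0, 1))
    return [w for w in product((0, 1), repeat=n)
            if all(w[i:i + 3] not in bad for i in range(n - 2))]


def select(cac_set, columns):
    """Keep CAC words X with H . X^T = 0 over GF(2)."""
    chosen = []
    for w in cac_set:
        s = 0
        for bit, col in zip(w, columns):
            if bit:
                s ^= col
        if s == 0:
            chosen.append(w)
    return chosen


def search_scac(msg_bits, max_n=8):
    """Brute-force sketch of the Fig.7 loop: widen the CAC until some H works."""
    need = 2 ** msg_bits
    for n in range(msg_bits, max_n + 1):
        cac_set = fpc_codewords(n)
        m = ceil(log2(n + 1))  # smallest column dimension allowed
        # Every ordering of n distinct nonzero m-bit columns (Property 1).
        for columns in permutations(range(1, 2 ** m), n):
            codebook = select(cac_set, columns)
            if len(codebook) >= need:
                return n, columns, codebook[:need]
        # No matrix produced enough codewords: N is increased by 1.
    return None


n, columns, codebook = search_scac(2)
encoder = dict(enumerate(codebook))  # message value -> codeword
print(n, columns)
print(encoder)
```

For a 2-bit message the search stops at N = 5: with N = 4 any valid matrix leaves at most two codewords, while some ordering of 5 columns admits four FPC codewords. The final dictionary is one possible message-to-codeword mapping; the paper's encoder design is not specified here.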
Error Correction
Unlike Hamming code, which stores error information in separate checking bits, the SCAC code has no checking bits. Error correction therefore still depends on the uniform checking matrix. For example, suppose an error occurs on the last bit of the codeword 00111 during transmission, so 00110 is received. If H · X^T is computed for this word, columns h3 and h4 of H are selected, and their sum is (0, 0, 1)^T. This sum equals the last column vector of H, as shown in Fig.5(b). In general, if an error occurs on any single bit of a codeword, the result of H · X^T equals the column of H at the erroneous position and therefore indicates the error position.
Therefore, the error correction circuit includes two units, as shown in Fig.8. First, the Predict Matrix unit computes H · X^T for the received word. Then the Correct Decoder unit translates the result into error information, according to which the erroneous bit is corrected. For example, in the previous case, if 00111 is received, the result of the Predict Matrix unit is 0^T, so the codeword is received correctly and no bit needs to be inverted. If 00110 is received, the result is (0, 0, 1)^T, which equals the last column vector, so the signal on the last wire is inverted to correct the error.
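The two units can be modeled in software as follows. The sketch uses a hypothetical 3×5 matrix chosen only to agree with the worked example (h3 + h4 = h5 = (0, 0, 1)^T); the matrix actually shown in Fig.5(b) is not reproduced, and the function names simply mirror the unit names in Fig.8.

```python
# Hypothetical H consistent with the worked example (h3 + h4 = h5 = (0,0,1)^T);
# the paper's actual matrix in Fig.5(b) is not reproduced here.
H_COLUMNS = [0b100, 0b010, 0b110, 0b111, 0b001]


def predict_matrix(word):
    """'Predict Matrix' unit: syndrome H . X^T over GF(2), as a 3-bit int."""
    s = 0
    for bit, col in zip(word, H_COLUMNS):
        if bit == "1":
            s ^= col
    return s


def correct_decoder(word):
    """'Correct Decoder' unit: invert the bit whose column equals the syndrome."""
    s = predict_matrix(word)
    if s == 0:
        return word  # received correctly, nothing to invert
    if s not in H_COLUMNS:
        raise ValueError("syndrome matches no column: not a single-bit error")
    pos = H_COLUMNS.index(s)  # columns are distinct, so the position is unique
    flipped = "1" if word[pos] == "0" else "0"
    return word[:pos] + flipped + word[pos + 1:]


print(correct_decoder("00110"))  # 00111: error on the last bit corrected
print(correct_decoder("00111"))  # 00111: syndrome is zero
```

Because Property 1 makes all columns distinct and nonzero, a single-bit error always produces a syndrome that matches exactly one column. A syndrome that matches no column indicates a multi-bit error that the code cannot correct.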
Reliable Network-on-Chip Design with SCAC
A package-based NOC interconnect architecture [19] is used in this paper; it is based on worm-hole switching and a deterministic rotate source-based routing method. As shown in Fig.9(a), the router contains five I/O ports and one switch control unit; one port (the local port) handles communication between the router and the local core, and the other four handle communication with adjacent routers. In every I/O port, the input port includes a buffer with a handshake state machine, which stores input data and handshakes with other units. The switch control unit has an arbitration state machine and allocates channels for every input request. In a worm-hole based NOC, there are two types of packages, head packages and payload packages, used for routing and data transmission, respectively, so the buffers handshake with them separately.
There are two parts to designing a reliable NOC based on the SCAC-TMR scheme [13]. First, to protect the NOC from package-routing errors and payload errors, crosstalk effects on channels and SEUs in buffers should be handled. As shown in Fig.9(b), an SCAC encoder/decoder is built into the local network interface (NI) ports, and SCAC correction circuits are added on the output ports of a router. A payload package to be transmitted is first translated into SCAC code at the local NI and then moves across the network. The CAC property guarantees that the delay on a channel is acceptable and cannot cause crosstalk-induced delay faults. If an SEU or a crosstalk-induced glitch affects an SCAC codeword, the correctors on the output ports of a router correct it. When the package reaches the target core, the code is decoded back into a package at the local NI. Since the local port is close to the network interface, the error correction circuit for the output of the local port is merged into the decoder.
Meanwhile, before sending a head package, a router must send the request signal h, and the channel is held in block mode until the response is received. Subsequently, before sending payload packages, it must first send the request signal ack av, and the channel is again in block mode. Therefore, a head package sent between the two request/acknowledge signals does not suffer from crosstalk effects and cannot affect the following data transmission. A head package can thus be encoded simply as a Hamming code, which avoids adding an extra SCAC decoder in the switch control unit; the switch control unit contains an HC decoder that can correct a single error in a head package. In conclusion, packages are encoded and decoded only in the NI when they are sent into or received from the NOC, instead of at every channel, which is very useful for high-speed circuits.
Second, to avoid router crash errors, SEUs on state or controlling registers should be handled. In this NOC architecture [19], there is a handshake state machine in each buffer and an arbitration state machine in the switch control unit. In the buffer, the handshake state machine contains a state register, pointer registers (first and last), a counter register and an output signal register, all of which need to be protected with triple modular redundancy. In the arbitration state machine, the two state registers, the source and target registers for ports, and the output signal registers are also protected by TMR.
Suppose that in an NOC the head package has 7 bits, of which 4 bits are used for the target address and 3 bits for the payload size. For functional verification, an SCAC-TMR router [13] and the local NI for cores are designed, and a TS-HC-TMR router is also designed for comparison. When the SCAC-TMR scheme is used, as shown in Fig.10(c), an encoder/decoder is added at the local NI. If h = 1, a head package is to be sent; it is translated into Hamming code by HC (7, 11) and sent into the network. Since 11 bits are enough to encode a 5-bit message into SCAC, the payload package is designed to contain 5 bits. If h = 0, a 5-bit payload package is to be sent: it is translated into an 11-bit SCAC codeword by the encoder SFPC (5, 11) and then sent into the network. At the target core, it is converted back by the decoder SFPC (11, 5). Finally, 11-bit SFPC correctors are placed at the output channels of each router, and an HC corrector is placed in its switch control unit for error correction.
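The width of HC (7, 11) follows from the usual single-error-correcting Hamming condition: k data bits need the smallest r with 2^r ≥ k + r + 1 check bits. The sketch below (function name hypothetical) reproduces the 11-bit head package from the 4-bit address and the 3-bit payload size.

```python
def hamming_check_bits(k):
    """Smallest r with 2**r >= k + r + 1 (single-error-correcting Hamming code)."""
    r = 0
    while 2 ** r < k + r + 1:
        r += 1
    return r


head_bits = 4 + 3  # 4-bit target address + 3-bit payload size
r = hamming_check_bits(head_bits)
print(f"HC({head_bits}, {head_bits + r})")  # HC(7, 11)
```

The head package therefore occupies the same 11 wires as an SFPC (5, 11) payload codeword, so both package types share one 11-bit channel.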
When the TS-HC-TMR scheme is used, a 7-bit delayed triple-sampling register is added before every input buffer of a router, and the output of the TS register is encoded into Hamming code by the encoder HC (7, 11). The input package is stored in the 11-bit input buffer. Both schemes protect their state and controlling registers by triple modular redundancy.
Experimental Results
Performance Analysis of SCAC with D = 1 + 2λ
In this subsection, selected CACs are designed with the delay upper limit D = 1 + 2λ. Wire overhead, power dissipation and delay-avoidance performance are considered. The SMIC 130 nm technology library is used in the experiments, and VDD lines are used as shielding wires. For this library, the ratio of coupling capacitance to bulk capacitance is between 3 and 5.
Wire Overhead
This subsection presents the results of codeword selection from the FTC and FPC codeword sets (SFTC and SFPC). Compared with the joint code, SFTC and SFPC avoid extra shielding wires and therefore have lower wire overhead.
Selected FTC. Every FTC codeword is generated from a basic codeword with alternating 1s and 0s, such as 10101. If a bit of the new codeword differs from the corresponding bit of the basic codeword, the next bit must be the same as in the basic codeword; if it is the same, the next bit may be either different or the same. Therefore, no two adjacent wires have opposite signal transitions, so the large-delay transitions of groups 1 and 2 in Table 1 will not happen and the crosstalk-induced delay will be no more than 1 + 2λ. Then a good matrix, which generates a group code with code distance 3, is used to select codewords from the FTC set. Fig.10(a) compares the wire overhead of SFTC using a good matrix or a bad matrix with that of the joint code method (FTC+HC). SFTC with a good matrix always achieves the lowest wire overhead. Compared with the joint code, this method saves 6% to 33% of the wires for different numbers (≥ 2) of message bits, as listed in Table 2. The wires that SFTC saves approximately equal the shielding wires required by the joint code. For example, encoding a 16-bit bus requires a 30-bit code, denoted SFTC (16, 30).
Table 2. Number of wires of the joint code and selected CAC methods, and wire savings (%)
Message bits   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
FTC+HC         –   –   –   –   –  18  22  23  25  26  27  29  30  32  33
SFTC           6   9  11  13  14  17  18  20  22  23  24  26  28  29  30
Saving (%)    33  10  21  13  18   6  18  13  12  12  11  10   7   9   9
FPC+HC         9  10  13  15  16  18  19  23  24  26  27  28  30  31  33
SFPC           5   7   9  11  13  15  17  19  21  22  23  25  27  28  30
Saving (%)    44  30  31  27  19  17  11  17  13  15  15  11  10  10   9
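The generation rule can be read as follows: XOR a candidate word with the alternating basic codeword and forbid two adjacent 1s in the result. This reading is an assumption; the original FTC construction in [7] may differ in detail. Under it, if two adjacent wires switched in opposite directions between codewords x and y, one of the two words would differ from the basic codeword in both positions, which the rule forbids. The sketch below (hypothetical names) checks this exhaustively for a 5-bit bus.

```python
from itertools import product


def ftc_codewords(n, first_bit=1):
    """FTC words built from the alternating base word (e.g. 10101): a bit may
    differ from the base only if the previous bit did not."""
    base = [(first_bit + i) % 2 for i in range(n)]
    words = []
    for flips in product((0, 1), repeat=n):
        if any(flips[i] and flips[i + 1] for i in range(n - 1)):
            continue  # two consecutive bits differ from the base: rejected
        words.append(tuple(b ^ f for b, f in zip(base, flips)))
    return words


def opposite_neighbors(x, y):
    """True if two adjacent wires switch in opposite directions for x -> y."""
    d = [b - a for a, b in zip(x, y)]
    return any(d[i] * d[i + 1] == -1 for i in range(len(d) - 1))


words = ftc_codewords(5)
print(len(words))  # 13
print(any(opposite_neighbors(x, y) for x in words for y in words))  # False
```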
Selected FPC. Any codeword containing the bit pattern "101" or "010" is removed from the FPC codebook.
Without these bit patterns, the signal transitions (↓, ↑, ↓), (−, ↑, ↓), (↓, ↑, −), (↑, ↓, ↑), (−, ↓, ↑) and (↑, ↓, −) cannot occur, so the delay never exceeds 1 + 2λ. Then a good matrix, which generates a group code with code distance 3, is used to select codewords from the FPC set. Fig.10(b) compares the wire overhead of SFPC using a good matrix or a bad matrix with that of the joint code. With FPC, the choice of matrix is even more critical. Compared with the joint code, the SFPC method saves 9% to 44% of the wires for different numbers (≥ 2) of message bits, as listed in Table 2. Encoding a 16-bit bus also requires a 30-bit code, denoted SFPC (16, 30).
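The 1 + 2λ bound for FPC can be confirmed by brute force under the same assumed delay model as before (each neighbor contributes 0, 1 or 2 units of λ when it switches the same way, stays stable, or switches the opposite way; edge wires have one neighbor). The sketch, with hypothetical names, enumerates every transition between FPC codewords for 5- and 7-bit buses and reports the largest λ coefficient.

```python
from itertools import product


def fpc_codewords(n):
    """n-bit FPC words: neither '010' nor '101' appears."""
    words = ("".join(bits) for bits in product("01", repeat=n))
    return [w for w in words if "010" not in w and "101" not in w]


def worst_lambda_coeff(x, y):
    """Largest lambda coefficient over all wires for the transition x -> y.

    Assumed model: a switching wire gets 0 / 1 / 2 units of lambda from each
    neighbor that switches the same way / is stable / switches the other way.
    """
    d = [int(b) - int(a) for a, b in zip(x, y)]
    worst = 0
    for i, di in enumerate(d):
        if di == 0:
            continue
        nbrs = [d[j] for j in (i - 1, i + 1) if 0 <= j < len(d)]
        worst = max(worst, sum(1 - di * dj for dj in nbrs))
    return worst


for n in (5, 7):
    words = fpc_codewords(n)
    worst = max(worst_lambda_coeff(x, y) for x in words for y in words)
    print(n, len(words), worst)  # prints "5 16 2", then "7 42 2"
```

A coefficient of 2 (reached, for example, by 01100 followed by 10011) means the delay never exceeds 1 + 2λ under this model.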
There are other methods that avoid crosstalk and provide error correction, such as shielding Hamming code (SHC), which adds shielding wires to a Hamming code and therefore requires much more wire overhead.
Power Dissipation
In this subsection, we calculate the average power dissipation when the signals on the bus change from one codeword to another, according to the bus power model in [20]. All power dissipation values in this paper are relative values, which are independent of supply voltage and capacitance.
Specifically, the power dissipation of each CAC is calculated as the ratio of the total power consumption over all possible signal transitions to the number of signal transitions. The power of each signal transition is calculated with (2), where e_i is the i-th column vector of the identity matrix E and C_T is the capacitance matrix [20].
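Equation (2) and the matrix C_T are not reproduced in this text. The sketch below assumes a commonly used bus energy model of this type: the normalized energy drawn from the supply for a transition x → y is yᵀ C_T (y − x), with C_T tridiagonal in units of the bulk capacitance (1 + λ for edge wires, 1 + 2λ for inner wires, −λ between neighbors, no shield wires). Summing per-wire terms y_i · e_iᵀ C_T (y − x) gives the same total. Whether (2) has exactly this form is an assumption, and all names are hypothetical. As in the text, the averaging divides the total over all transitions by the number of transitions; counting self-transitions is a further assumption.

```python
def capacitance_matrix(n, lam):
    """Assumed C_T in units of the bulk capacitance C_L (no shield wires):
    1 + lam on edge wires, 1 + 2*lam on inner wires, -lam between neighbors."""
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbors = (i > 0) + (i < n - 1)
        c[i][i] = 1.0 + neighbors * lam
        if i > 0:
            c[i][i - 1] = c[i - 1][i] = -lam
    return c


def transition_energy(x, y, lam):
    """Normalized energy y^T C_T (y - x), i.e. E / (Vdd^2 * C_L), for x -> y."""
    n = len(x)
    c = capacitance_matrix(n, lam)
    delta = [b - a for a, b in zip(x, y)]
    return sum(y[i] * sum(c[i][j] * delta[j] for j in range(n)) for i in range(n))


def average_energy(codebook, lam):
    """Mean over all ordered codeword pairs, self-transitions included."""
    total = sum(transition_energy(x, y, lam) for x in codebook for y in codebook)
    return total / len(codebook) ** 2


print(transition_energy([0, 1], [1, 0], 3))  # 7.0, i.e. 1 + 2*lambda
```

Because the result is divided by V_DD² · C_L, it is a relative value, which matches the normalization described above.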
When λ is set to 3, Fig.11(a) shows the average power consumed per signal transition by the joint code and the codeword selection method for different numbers of message bits. Compared with the joint code, the proposed method reduces power consumption by 5% to 18% when using FTC and by 10% to 20% when using FPC. The codeword selection method therefore saves nearly 10% of power consumption when encoding a bus.
For different values of λ, Fig.11(b) shows the power dissipation for a 16-bit message. Compared with the joint code, SFTC saves more than 7% of power consumption, and SFPC saves 8%.
Delay Analysis
The delay-avoidance performance of the joint code method (FTC+HC) and the SFTC method is evaluated with HSPICE. In this experiment, the wires are routed on metal2, and their physical sizes are set according to the feature size of the SMIC 130 nm technology library, as shown in Fig.12. The dielectric constant above the wires is set to 2.5 and the one below to 10.0. The wire length is 1 mm. According to the formula C = εS/(4πkd), the ratio λ of coupling capacitance to bulk capacitance is 3. The physical parameters are extracted by the Field Solver in the experimental environment and used in the following experiments.
In this experiment, a five-wire model with pronounced crosstalk effects is built, in which the third wire is the victim wire for falling delay, and the register threshold of signal zero is 0.4μ. We then consider the worst crosstalk effect on the message bus. When SFTC is used, the transition from 01110 to 11011 produces the worst crosstalk effect.
When FTC+HC is used, the transitions on the checking bus occur later than those on the message bus. This interval between transition times can cause a secondary crosstalk effect on the victim wire. To observe this effect, the interval in the experiment is varied from 0.05 ns to 0.2 ns. Since the last message wire has the worst delay effect, it is placed as the third wire; the fourth wire is a shielding wire (VDD), and the fifth is a checking wire. The transition from 01110 to 11011 still produces the worst crosstalk effect, and the delay caused by this transition without any crosstalk avoidance strategy is used as the reference. As shown in Fig.13, because of the secondary crosstalk effect, all output signals of FTC+HC are later than those of SFTC, and the worst-case falling delay increases by 30 ps compared with SFTC. Therefore, the late signal transitions on the checking bits seriously affect delay avoidance on the message bits, and the SFTC method achieves better delay-avoidance performance.
The total delay of signal propagation on interconnects is analyzed further. As shown in (3), it consists of the wire delay and the CODEC delay, taking into account the driver strength and the driver (load) capacitance. The driver resistance and load capacitance are set to 3 kΩ and 100 pF in this paper. As shown in (4) [21], the wire delay depends not only on the physical parameters obtained in the previous experiment, but also on the delay upper limit imposed by the signal transitions, captured by the coefficients λ_i and μ_i. Their values are obtained by running sweeps with the circuit simulator SPECTRE [21], as shown in Table 3 [21]. Since the original bus has a delay upper limit of 1 + 4λ, its λ_i and μ_i are set to 1.51 and 2.20, and its maximum delay over 1 mm is 8.82 ns. For SCAC, CAC+HC and SHC, λ_i and μ_i are set to 0.57 and 0.65, because these methods avoid large delay and their delay upper limit is 1 + 2λ; their maximum delay is therefore 4.92 ns. Three bus systems, FPC+HC, SFPC and SHC, are designed to analyze the real performance of the FPC+HC code, the SFPC code and the shielding Hamming code, where each bus system contains an encoder, a 1 mm bus and a decoder with error correction. The wire overhead, CODEC area, CODEC delay and total delay of the three methods are shown in Table 4. In terms of wire overhead, SFPC saves the wires used to protect the checking bus in FPC+HC, so it requires fewer wires; SHC has to shield every wire, so it requires 40%∼57.1% more wires than SFPC. In terms of CODEC overhead, the SFPC encoding and decoding logic is much easier to optimize than that of FPC, so SFPC reduces CODEC area by about 31.4%∼80.7% compared with FPC+HC. Because the SFPC method only needs to correct errors on some output wires, the error detection logic can be optimized and can reuse logic in the decoding circuit. The SFPC decoder therefore requires only a small area, even less than an HC decoder.
As shown in Table 4, the CODEC area of SFPC is nearly equal to that of SHC. In terms of delay, since most of the CODEC delay comes from the decoder with error correction and SFPC has a compact decoder, the SFPC CODEC introduces less delay than the others. In Table 4, the CODEC delay of SFPC is about 90∼130 ps shorter than that of SHC, depending on the bit width. In conclusion, SFPC performs better than the other methods in both area overhead and delay.
Evaluation of the Reliable Router
In this subsection, the area overhead, maximum delay and power dissipation of two reliable 11-bit routers (referred to as the SCAC router and the TS router) are evaluated, and a primary 7-bit router without any fault-tolerant circuit (the primary router) serves as a reference. The NOC circuits of the three schemes are synthesized with Design Compiler using the SMIC 130 nm technology library. The TS scheme circuit includes the router but not the new clock generation logic, while the SCAC scheme circuit contains the SCAC-TMR router and the SFPC encoder/decoder in the network interface (NI) of a core. As shown in Table 5, the areas of the primary router, the SCAC scheme and the TS scheme are 15300 μm², 39243 μm² and 42270 μm², respectively. In the SCAC scheme, the fault-tolerant circuit for data transmission, which consists of an encoder/decoder in the local NI, four error correctors on the four output ports other than the local port, and an HC corrector in the switch control unit, occupies 3671 μm². In the TS router, five TS registers, five HC encoders/decoders and an HC corrector in the switch control unit are used to protect data, and they occupy 7153 μm². Thus the SCAC scheme needs only about half the area that the TS scheme uses to protect data from errors. In both reliable routers, the triple modular redundancy (TMR) circuits for the state and control registers dominate the area overhead, adding 20272 μm² and 19817 μm², respectively, which is about 130% of the primary router's area. In conclusion, the SCAC-TMR scheme saves about 3027 μm² compared with the TS scheme, which is about 20% of the primary router's area.
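The area figures above are mutually consistent: each reliable router's total equals the primary router plus its data-protection circuit plus its TMR circuit. The following sketch checks that arithmetic and derives the relative figures quoted in the text, assuming this additive breakdown (an interpretation inferred from the numbers, not a statement from Table 5 itself); the names `ADDED` and `total_area` are hypothetical.

```python
PRIMARY_AREA = 15300  # um^2, primary 7-bit router (Table 5)

# Area added by each reliable scheme, in um^2 (figures quoted in the text)
ADDED = {
    "SCAC": {"data_protection": 3671, "tmr": 20272},
    "TS": {"data_protection": 7153, "tmr": 19817},
}

def total_area(scheme):
    """Assumed additive model: primary router + data protection + TMR."""
    parts = ADDED[scheme]
    return PRIMARY_AREA + parts["data_protection"] + parts["tmr"]

scac, ts = total_area("SCAC"), total_area("TS")
saving = ts - scac
print(scac, ts)                                 # 39243 42270
print(saving, f"{saving / PRIMARY_AREA:.1%}")   # 3027 19.8%

# Data-protection area of SCAC relative to TS (the "about half" claim)
ratio = ADDED["SCAC"]["data_protection"] / ADDED["TS"]["data_protection"]
print(f"SCAC/TS data protection: {ratio:.1%}")

# TMR overhead relative to the primary router (the "about 130%" claim)
for name, parts in ADDED.items():
    print(name, f"TMR = {parts['tmr'] / PRIMARY_AREA:.0%} of primary")
```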
Next, stimulus files for simulation are generated, in which head packets keep every channel active, payload packets are generated randomly, and the clock period is set to 10 ns. The synthesized routers (with a local NI in the case of the SCAC scheme) are then simulated with these stimulus files to generate VCD files. Finally, the VCD files and the synthesized routers are loaded into PrimePower to evaluate power consumption. As shown in Table 5, the dynamic power consumption of the primary router, the SCAC scheme and the TS scheme is 0.69 mW, 1.61 mW and 1.95 mW, respectively. Compared with the TS router, the SCAC scheme saves 0.34 mW, which is about 49% of the power dissipation of the primary router. Assuming that the interconnects between routers are 1 mm long and the interconnect between a router and its local core is very short, the minimum clock periods of the three routers are analyzed; each period must cover the maximum delay of both the channels and the router. As shown in Table 5, the delays of the primary router, the SCAC router and the TS router are 0.64 ns, 1.41 ns and 1.44 ns, respectively. The channel delay depends on the longest channel and the path delay through the additional circuits. The primary router has no additional fault-tolerant circuit, so its channel delay equals that of the original bus, 8.82 ns. For the SCAC scheme, the channel delay includes both the delay of an SFPC corrector and the delay in the channel. Since the corrector delay is 0.67 ns and the channel delay is 4.92 ns, the total delay is 5.59 ns. For the TS scheme, assuming the first register samples data at the normal delay T_n (4.92 ns), the second register must sample data at the worst-case delay T_w (8.82 ns) to guarantee a correct output, so the sampling interval is T_w − T_n (3.90 ns), and the third register samples data at 2T_w − T_n (12.72 ns).
To ensure that data arrive at the third register, the clock period must be no less than 2T_w − T_n (12.72 ns). Consequently, because of the delayed data sampling, the clock period of the TS scheme is 44.2% longer than the 8.82 ns period of the primary router, so it is unsuitable for high-speed circuits. In contrast, the clock period of the SCAC router (5.59 ns) is even shorter than that of the primary router, by 36.6%. SCAC is therefore suitable for designing high-speed, reliable networks-on-chip.
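The timing and power comparison can be reproduced from the delays and power figures quoted above. This sketch assumes, as the text does, that the clock period is set by the channel delay, since every router delay in Table 5 (at most 1.44 ns) is well below it. It also shows that the 44.2% and 36.6% figures are changes in clock period relative to the primary router; the corresponding frequency changes are about −30.7% and +57.8%. The helper `ts_sample_times` is hypothetical.

```python
# Delays in ns, quoted from the text
T_N = 4.92          # normal (crosstalk-avoided) delay of a 1 mm channel
T_W = 8.82          # worst-case delay of the unencoded 1 mm channel
T_CORRECTOR = 0.67  # SFPC corrector on the SCAC channel path

def ts_sample_times(t_n, t_w):
    """Sampling instants of the three TS registers, spaced by t_w - t_n."""
    step = t_w - t_n
    return [t_n + k * step for k in range(3)]  # T_n, T_w, 2*T_w - T_n

periods = {
    "primary": T_W,                       # no fault-tolerant logic on the channel
    "SCAC": T_N + T_CORRECTOR,            # corrector + crosstalk-avoided channel
    "TS": ts_sample_times(T_N, T_W)[-1],  # must wait for the third register
}

for name, p in periods.items():
    period_change = p / periods["primary"] - 1  # relative clock-period change
    freq_change = periods["primary"] / p - 1    # relative frequency change
    print(f"{name:8s} {p:5.2f} ns  period {period_change:+.1%}  freq {freq_change:+.1%}")

# Dynamic power in mW (Table 5); the saving is expressed relative to the primary router
POWER = {"primary": 0.69, "SCAC": 1.61, "TS": 1.95}
power_saving = POWER["TS"] - POWER["SCAC"]
print(f"power saving {power_saving:.2f} mW = {power_saving / POWER['primary']:.0%} of primary")
```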
Conclusion
In this paper, we introduce a reliable NOC design using the selected crosstalk avoidance code (SCAC), which provides both crosstalk avoidance and single error correction. Data are encoded into SCAC codewords and can be transmitted rapidly and reliably across the NOC. State and control registers in routers are protected by TMR. Experimental results show that the NOC design with SCAC achieves higher performance and tolerates single errors. Compared with previous crosstalk avoidance methods, SCAC reduces wire overhead, power dissipation and total delay. When SCAC is used in an NOC, compared with the previous TS-HC-TMR scheme, it saves area and dynamic power equal to about 20% and 49% of the area and power of a primary router, respectively. The SCAC router even shortens the clock period of a primary router by 36.6%. The TMR circuits for protecting the control and state registers account for most of the area overhead in the reliable router; they may be replaced by a more effective method in the future.
