component of this work in other works. Area-Efficient Synthesis of Fault-Secure NoC Switches by Atefe Dalirsani et al.
General Copyright Notice
This article may be used for research, teaching and private study purposes. Any substantial or systematic 
reproduction, re-distribution, re-selling, loan or sub-licensing, systematic supply or distribution in any form to 
anyone is expressly forbidden.
Preprint
Area-Efficient Synthesis of Fault-Secure NoC Switches
A. Dalirsani, M.A. Kochte and H.-J. Wunderlich
Proc. of the 20th IEEE International On-Line Testing Symposium 
(IOLTS'14), Platja d'Aro, Catalunya Spain, July 7-9
This is the author's "personal copy" of the final, accepted version of 
the paper published by IEEE. 
IEEE COPYRIGHT NOTICE
© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, 
in any current or future media, including reprinting/republishing this material for advertising or promotional 
purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted 
component of this work in other works. Area-Efﬁcient Synthesis of Fault-Secure NoC Switches
Atefe Dalirsani, Michael A. Kochte, Hans-Joachim Wunderlich
Institut f¨ ur Technische Informatik, Universit¨ at Stuttgart, Germany
email: fdalirsani,kochteg@iti.uni-stuttgart.de, wu@informatik.uni-stuttgart.de
Abstract—This paper introduces a hybrid method to synthesize
area-efﬁcient fault-secure NoC switches to detect all errors
resulting from any single-point combinational or transition fault
in switches and interconnect links. Firstly, the structural faults
that are always detectable by data encoding at ﬂit-level are
identiﬁed. Next, the fault-secure structure is constructed with
minimized area such that errors caused by the remaining faults
are detected under any given input vector.
The experimental evaluation shows signiﬁcant area savings
compared to conventional fault-secure schemes. In addition, the
resulting structure can be reused for test compaction. This
reduces the amount of test response data and test time without
loss of fault coverage or diagnostic resolution.
Index Terms—Network-on-Chip, self-checking, fault-secure,
online testing, concurrent error detection.
I. INTRODUCTION
A Network-on-Chip (NoC) is a communication alternative
for many-core System-on-Chips. In current deep sub-micron
technologies, latent defects, wear-out, soft errors, cross-talk,
power supply noise and radiation effects affect the reliability
of the system [1], [2] as well as the NoC structure. Having a
reliable communication through the NoC is essential because
an erroneous message transfer over the NoC may lead to
an erroneous data delivery to cores and consequently wrong
system operation. Furthermore, it may generate spurious trafﬁc
leading to a deadlock for example. Therefore, it is crucial to
concurrently detect a fault as soon as it causes an erroneous
operation of NoC switches or interconnect links. This way
fault tolerance mechanisms can be activated immediately to
avoid error propgation in the system.
Concurrent error detection (CED) techniques are used in
safety-critical applications to detect errors caused by perma-
nent and transient faults during the operation of the circuit
[3]. Conventionally, duplication with comparison has been
used to detect any single- and multi-bit error. In order to
decrease duplication overhead, Error Detecting Codes (EDC)
are employed. Fig. 1 presents the general structure for CED
using EDC. Check bits over output bits of the circuit are
computed in the checker. The checker compares the check
bits to those generated by the prediction logic and signals
an error in case of a mismatch. A circuit is fault-secure for
Checker
Output 
bits
Check
bits
Error
Input 
bits
Circuit
Prediction logic
Fig. 1: General structure of Concurrent Error Detection (CED)
a set of faults if for every considered fault that produces an
error at the circuit outputs, wrong check bits are generated [4].
Moreover, a circuit is self-testing, if for every considered fault,
there exists at least one input vector for which the error at the
outputs produces wrong check bits. Finally, a circuit is Totally
Self-Checking (TSC) if it is self-testing and fault-secure.
To conduct CED in NoCs, hardware redundancy schemes
(such as duplication or triplication) have been examined [5]–
[7]. Since the NoC may integrate hundreds of switches in a
single chip, these methods impose a huge area overhead. Error
detecting codes can also be exploited to synthesize a fault-
secure NoC switch. However, considering the switch without
regard to its datapath elements also leads to an unacceptable
area cost.
In NoCs, data encoding using error detecting codes in
combination with data retransmission is typically used to
detect and correct errors during the system operation [6],
[8]–[13]. Check bits are computed and appended to data bits
which are transported over the network. Several researchers
have investigated error detecting/correcting codes in terms of
silicon area, encoder/decoder delay, performance, and energy
consumption to trade-off costs and reliability [11]–[13]. Up to
now, these methods are mainly devoted to detect faults in the
inter-switch links and intra-switch datapath elements such as
multiplexers.
Faults in the datapath elements mainly cause data corruption
which may be detected and corrected by an error correcting
code [14] even at the system-level. However, a part of the
switch logic is dedicated to manage the ﬂow of data, for
example the routing algorithm, scheduling, and congestion
control. Faults in this part can be even more severe and have
uncorrectable effects leading to a system crash. For example,
a fault in the routing logic may cause a misrouting of packets
leading to a deadlock.
This paper proposes a hybrid method to synthesize an
area efﬁcient fault-secure NoC switch for any single point
combinational or transition delay fault in the switch or inter-
connect links, irrespective of its temporal nature (permanent,
transient or intermittent fault). The method incorporates error
detecting codes for data ﬂits and a low-area concurrent error
detecting structure to handle faults not covered by the ﬂit
encoding. Firstly, structural faults that are always detectable
by data encoding at the ﬂit-level are identiﬁed. Next, the fault-
secure structure is constructed such that the rest of the faults
become detectable under any given input vector using a parity
based code. A Boolean satisﬁability (SAT) based approach
examines the fault-secureness property of the data encoding
scheme as well as the ﬁnal fault-secure structure. The methodis architecture independent and can be applied to arbitrary
switches of any desired NoC topology with arbitrary routing
function. To enable efﬁcient online and ofﬂine testing of the
switch, the parity trees of the fault-secure switch are reused
to reduce the test response data volume and the test time.
The rest of the paper is organized as follows: Section
II describes the fault-secure architecture and introduces the
synthesis ﬂow. Sections III and IV explain the SAT model
construction and the synthesis algorithm in detail. Section V
describes the structure reuse for test compaction. Experimental
results are presented in section VI, followed by a conclusion.
II. OVERVIEW
A. Fault-Secureness
A circuit is fault-secure for a set of faults F if and only if:
8f 2 F : 8i 2 In : C(i) = Cf(i) _ (C(i)) 6= (Cf(i)): (1)
For In being the set of possible input vectors of n bits, C(i)
denotes the circuit response for input vector i in the fault
free case, while Cf(i) is the response under fault f.  is a
function that computes the check bits of the circuit response
for the selected error detecting code. The formula expresses
that for any input vector, the fault either is not propagated to
the circuit outputs, i.e. C(i) = Cf(i), or it is detected by the
check bits, i.e. (C(i)) 6= (Cf(i)) [4]. If a fault propagates
to the circuit outputs but it is not detectable by the check bits,
Silent Data Corruption (SDC) occurs.
B. Fault-Secure Architecture for Error Detection at Flit-Level
The NoC switch incorporates multiple input/output ports,
crossbar multiplexers and control logic to manage the dataﬂow
between input and output ports. Each switch is connected to its
neighboring switches and a network interface via interconnect
links. Each interconnect link consists of parallel data bits in
the width of a ﬂit (ﬂow control unit) and handshake signals.
The fault-secure switch presented here uses data encoding at
the ﬂit-level. As depicted in Fig. 2 for a sample ﬁve port
switch of a 2D mesh topology, ﬂit checkers (Flit chk) are
positioned at output ports of the switch over the data bits.
Port 03
P
o
r
t
 
0
2
P
o
r
t
 
0
4
Port 00
F
l
i
t
 
c
h
k
Fault-secure
critical region
Port 01
Flit chk
Critical 
region Prediction
... CB
Encoder Π
Comparator
Flit:
Error
Errorc
Dual Rail 
Checker
D
Error1
Flit chk
Error3
F
l
i
t
 
c
h
k
E
r
r
o
r
4
E
r
r
o
r
2
Flit chk
Error 0
Data 
bits
Check bits
Fig. 2: Fault-secure structure of the NoC switch
Error0 to Error4 are the error signals of the ﬂit checkers
of port 00 to 04. Each ﬂit includes a number of check bits
appended to its content. Check bits are generated by the sender
once a packet is injected into the network. Upon passing a ﬂit
through the ﬂit checker, the encoder recomputes the check bits
and compares them against the check bits stored in the ﬂit. A
mismatch causes the assertion of the error signal. Depending
on the EDC, a subset of faults in the incoming data links and
in the switch datapath are detectable by the ﬂit checkers. In
case of an end-to-end error control, e.g. [10], and omission
of the ﬂit checkers of the switch, the fault-secureness of the
switch cannot be ensured anymore because a fault effect might
be masked in upcoming switches.
Since ﬂit checkers are constructed over a subset of switch
outputs, i.e. data outputs, they are not sufﬁcient to ensure the
fault-secureness in the switch [15]. Some errors may propagate
either to internal states of the switch or to other outputs than
the data outputs. Moreover, certain faults may cause an error
on the data outputs that is not detectable by the error detecting
code (EDC) at ﬂit-level. As an example, let us consider a 2-
to-1 multiplexer as part of the crossbar in the switch datapath
and assume a single parity bit is added to the ﬂit as the check
bit. In the 2-to-1 multiplexer of Fig. 3, z = :s  x + s  y.
For  that computes the parity check bit of a ﬂit, when s is
zero, (z) = (x) and when s is one, (z) = (y). Assume
that there exist a stuck-at-1 fault at line a of the multiplexer
as shown in Fig. 3. In this case, the multiplexer function for
every bit i changes to: zi = xi+syi. When s is zero, z carries
the value of x, and the fault is not detectable. However, when
s = 1, z carries a bit-wise OR of the data on the lines x and
y (z = x + y), and therefore (z) = (x + y). In this case,
erroneous data on z is not necessarily detected by the parity
bit.
0
1
x
y
z
s
x1
y1
s
z1
a Stuck-at-1
x2
y2
z2
Fig. 3: 2-to-1 multiplexer: Stuck-at-1 fault violating fault-secureness
C. Synthesis Flow for Fault-Secureness
Figure 4 depicts the overall ﬂow of the synthesis scheme.
Initially, the switch including the ﬂit checkers is analyzed
to identify which faults propagate to data outputs and are
detectable by ﬂit checkers under any given input vector. These
faults are named covered faults. The rest of the faults are
categorized as uncovered faults, Funcov (details in Section III).
With respect to the uncovered faults, a sub-block of the
circuit is extracted, which is called critical region. It includes
only the parts that inﬂuence the propagation of errors caused
by uncovered faults. A CED structure is synthesized to make
the critical region fault-secure. The method concentrates only
on the uncovered faults in order to reduce the area overhead
(details in Section IV).Per switch, one error signal is generated by the disjunction
over the error signals of the ﬂit checkers and the error signal
of the fault-secure critical region. It is sent to neighboring
switches and the network interface via a dedicated error port
in order to invoke an error recovery mechanism. The resulting
switch, including ﬂit checkers and CED circuitry of the critical
region is fault-secure. If the original switch is non-redundant,
it is also totally self-checking.
Analysis of the faults in the switch 
with flit checkers
Covered 
faults
Uncovered 
faults
Extract critical region
Yes  No
Fault-secure synthesis of critical region 
to enable CED of the uncovered faults
Fault always detectable 
by flit checkers? 
Fig. 4: Overall ﬂow of the synthesis scheme
III. IDENTIFICATION OF UNCOVERED FAULTS USING
BOOLEAN SATISFIABILITY
The switch with ﬂit checkers is fault-secure, if the effect of
faults that propagate to switch outputs is never masked at ﬂit
checkers under any input vector. Conducting exhaustive fault
simulation to check this property for all faults in the switch is
not possible due to the huge input vector space, In, when
n = #switch input ports  (ﬂit size + #handshake signals).
However, to identify the faults that violate fault-secureness
(i.e. uncovered faults Funcov), it is sufﬁcient to ﬁnd one input
vector (i 2 In) for which the fault effect is propagated to
switch outputs and masked at ﬂit checkers. This is deﬁned as
a Boolean satisﬁability (SAT) problem. Fault f violates fault
secureness if and only if:
9i 2 In : C(i) 6= Cf(i) ^
V
0j<m (Dj(i)) = (D
f
j (i)): (2)
C(i) is the switch response (including data outputs) for input
vector i, and Cf(i) is the switch response under fault f. For
a switch with m output ports, Dj(i) are the data outputs of
port j in the fault-free case, and D
f
j (i) are the data outputs
under fault f.  computes the check bits over the data bits of
each port.
Inspired by statement (2), Fig. 5 shows the structure of
the SAT instance to identify uncovered faults. The instance
is satisﬁable if and only if statement (2) is true. The com-
binational netlist of the switch is extracted by removing
ﬂipﬂops and replacing the input/output of ﬂipﬂops by pseudo
primary output/input ports respectively. As shown in Fig. 5, the
outputs of the switch and the faulty copy, including primary
and pseudo primary outputs, are compared bit-wise 1 . The
encoder structures  compute the check bits which are then
compared to those in the faulty instance 2 .
The SAT instance f is a Boolean formula in Conjunctive
Normal Form (CNF) of the characteristic function of the
switch and the faulty copy, the encoder structures, and the
functions required for comparison:
f = CNF(S) ^ CNF(Sf) ^ (C 6= Cf)^
CNF() ^ CNF(f) ^
^
0j<m
(Dj = 
f
Dj):
It includes the characteristic equations of the gates in the fault-
free and faulty switch, CNF(S) and CNF(Sf). CNF()
and CNF(f) represent the encoder structure over the data
outputs, C and Cf represent switch outputs, and Dj and

f
Dj represent encoder outputs of each switch port j in the
fault-free and faulty copy, respectively. For each fault, a new
instance f is constructed. f is satisﬁable if there exists one
input assignment such that the outputs of the switch differ
(C 6= Cf) and the encoder outputs are equal. In this case,
the fault is an uncovered fault which is not detectable by ﬂit
checkers for at least the satisfying input vector found by the
SAT solver.
P
o
r
t
 
0
 
.
.
 
m
D
f
0
.
.
.
D
f
m
Non-data 
outputs
Encoder Π
Equal?
P
o
r
t
 
0
 
.
.
 
m
D0
Switch
Faulty 
Copy
f
I
n
p
u
t
s
n
.
.
.
Encoder Π
Dm
Non-data 
outputs
.
.
.
Encoder Π
Encoder Π
.
.
.
Not 
Equal?
f ϵ Funcov
.
.
.
.
.
.
S
f
S
2
1
Fig. 5: Schematic of the SAT instance to identify uncovered faults
IV. FAULT-SECURE SYNTHESIS FOR UNCOVERED FAULTS
We deﬁne the critical region C of the switch as the part
of the logic that may inﬂuence the propagation of uncovered
faults in the switch:
C := [f2Funcov support(f):
The support of a signal support(f) is the union of the gates
in the input cones of all outputs reachable from f.
To ensure that uncovered faults do not violate fault-
secureness by Silent Data Corruption (SDC), a multi-bit parity
code is constructed over the outputs of the critical region. The
outputs are distributed among the parity groups. A parity tree
generates the parity bit over the outputs in the same group. To
avoid masking of fault effects in parity trees, outputs in the
same parity group must not share any logic in their input cone
[16].
Figure 6 depicts the synthesis ﬂow of the critical region.
It starts with a topological analysis to construct initial parity
groups. A SAT instance is constructed to identify the remain-
ing faults in Funcov that still cause SDC. The initial parity
groups are split iteratively until all cases of SDC are resolved.
A. Topological Analysis
To reduce the area overhead of a parity-based fault-secure
circuit, the number of parity groups must be minimized.Topological analysis for initial 
parity group construction 
Group splitting
More 
SDC? Done
Yes 
No
Fig. 6: Code synthesis for fault-secureness of the critical region
Therefore, the initial parity groups are constructed based on a
topological analysis such that every uncovered fault effect is
not masked in at least one parity group.
The topological analysis only considers the uncovered
faults. If the gate of a fault f 2 Funcov is not shared among
the input cones of the outputs in at least one parity group, it is
allowed to be shared among the outputs in the other groups.
This is clariﬁed with an example in Fig. 7. The striped area
S is in the input cone of v2, v3, and v4. Let us assume S
contains an uncovered fault. Since v2 is encoded in group 1
and is the only output in group 1 containing S in its input cone,
it is allowed to put v3 and v4 in the same group, although they
share the logic in S. Considering only the uncovered faults, the
initial parity groups are constructed using the greedy algorithm
given in [17].
The topological analysis reduces the probability of SDC
and serves as a starting point to synthesize the fault-secure
structure.
Parity tree of 
Group 1
Parity tree of
Group 2
.
 
.
 
. Inputs
.
 
.
 
.
.
 
.
 
. Outputs
Error Error
v1 v2 v3 v4
Shared logic S
Input 
cone of 
output v1
Fig. 7: Topological analysis for constructing initial parity groups
B. Resolving Silent Data Corruption
The topological analysis cannot guarantee the fault-
secureness yet, since it only ensures that a fault effect is not
masked in one parity group. There may exist input vectors
that propagate the fault effect only to the groups where error
masking is not inhibited, i.e. SDC occurs.
The SAT instance, f constructed to ﬁnd the uncovered
faults, is extended to ﬁnd faults in Funcov that cause SDC in
the switch, which includes ﬂit checkers and parity trees of the
critical region. The clauses to represent the parity trees over
the outputs of the critical region in the good copy CNF(PV ),
the parity trees over outputs in the faulty copy CNF(PV f),
as well as the clauses to compare the parity bits in the good
and faulty copy, i.e. P(V ) and P(V f), are added to the SAT
instance:

f
SDC = f ^ CNF(PV ) ^ CNF(PV f) ^ P(V ) = P(V f): (3)
The SAT instance 
f
SDC is satisﬁable if there exists one input
vector if such that the fault is observable at switch outputs
(C 6= Cf, deﬁned in f), and it is not detectable by neither
the ﬂit checkers nor the parity groups of the critical region.
Fault f is simulated with input vector if to ﬁnd the outputs
and parity groups to which the fault effect propagates. Then,
using the group splitting algorithm of [17], the existing groups
are partitioned so that the number of faults carry SDC is
minimized. The splitting reﬁnes the parity groups iteratively
until fault-secureness is achieved, i.e. any SDC is resolved.
V. STRUCTURE REUSE FOR TEST COMPACTION
The parity trees of the fault-secure switch can be reused for
test response compaction to reduce the storage and bandwidth
requirements for manufacturing and in-ﬁeld testing. For the
compaction, the largest parity tree with n inputs is used.
The scan chain is restructured (Fig. 8) such that in each
shift-out cycle, the output bits included in a parity group are
compacted by the largest parity tree. To compact and shift-out
the responses of k parity groups, k cycles are required. The
scan ﬂip-ﬂops of the outputs encoded at ﬂit checkers are also
included into the scan chains. A further reduction of the shift-
out cycles is possible, if these ﬂip-ﬂops are used to balance
the length of the parallel scan chains.
.
.
.
Chain 1
Chain n
Group 1
Group 2
.
.
.
...
Flit checker groups k Groups (critical region)
Parity tree of 
the largest 
group
Fig. 8: Scan restructuring for test compaction
VI. EXPERIMENTAL RESULTS
The fault-secure synthesis scheme is evaluated for a typ-
ical NoC switch. Firstly, we introduce the characteristics of
the considered switch. Next, different error detecting codes
(EDCs) for data encoding at ﬂit-level are implemented and
compared in terms of fault coverage and area overhead. Then,
the result of the fault-secure switch is presented. Finally, we
discuss the impact of the proposed fault-secure structure on
test data compaction and diagnosis.
A. Switch Characteristics
The considered switch, designed for 2D mesh topology,
consists of ﬁve input/output ports. It implements a wormhole
XY routing and processes input ports in a round-robin fashion.
The switch is synthesized using Synopsys Design Compiler.
The target library lsi10k is constrained to basic gate primitives.
Because memory elements are usually equipped with advanced
BIST/BISR features [18], the storage elements of input ports
are not considered in the fault-secure synthesis ﬂow. Still, the
error detecting code (EDC) of the ﬂits partially protects the
memory elements. The memory elements are also excluded
from the area computations.
B. Evaluation of Flit Checkers
The choice of the code used for ﬂit checkers inﬂuences
both the number of covered faults and the area overhead. A
single-bit parity code needs only a single check bit per ﬂit and
because of the simple encoder structure, it imposes a smallhardware overhead. It detects single bit errors and multi-bit
errors of odd multiplicity. On the other hand, more complex
EDCs such as cyclic codes are better suited to detect burst
errors but impose a larger area or performance overhead.
The parity, Hamming, Berger, Hsiao, and CRC(4) (i.e.
an unrolled cyclic code with a generator of degree 4) are
implemented as the EDC in ﬂit checkers with a ﬂit width of
eight bits. The SAT instances f are constructed for each code
to compute the number of covered/uncovered faults. Fig. 9
compares the codes in terms of the area overhead of the ﬂit
checker with respect to the switch area and the portion of
uncovered faults. A structural analysis shows that 6931 faults
(56.85%) out of 12191 collapsed faults in the switch are not
propagated to data outputs. Therefore, they are in principle
not detectable by ﬂit checkers. In all cases, around 60% of
faults are recongnized as uncovered faults, that is they are not
detectable by ﬂit checkers for at least one input vector. Among
the remaining 5260 faults that are propagated to data outputs,
only the ones which are detectable by ﬂit checkers belong to
covered faults. In order to evaluate the detectability of EDCs
in ﬂit checkers, parameter  is deﬁned as:
 =
number of covered faults
number of faults propagated to data outputs
 100%:
 quantiﬁes the portion of faults that are detectable by ﬂit
checkers over the number of faults which are propagated
to data outputs. Fig. 9 compares parameter  of the se-
lected EDCs. The Berger code has the highest detectability
of 99.38%, while the other codes have a detectability of
approximately 87%. In general, the ﬂit checkers can detect
around 40% of the faults of the 8-bit switch with a maximum
area overhead of 15%.
To investigate the ﬂit width impact on the effectiveness
of ﬂit checkers, the number of covered/uncovered faults are
computed for the switch with the ﬂit width of 16, 32, 64, and
128 bits while parity is selected as the EDC. Table I depicts the
results. The second column reports the cell area of each switch
next to the ﬂit width. The third and forth column report the
number of collapsed faults and the number of uncovered faults
in each switch. The ﬁfth column shows the number of faults
which are propagated to data outputs and the next column
3,70% 3,70%
15,54% 11,10% 8,14%
62,23% 62,23%
57,12%
62,23% 62,17%
87,55% 87,55%
99,38%
87,55% 87,67%
0,00%
20,00%
40,00%
60,00%
80,00%
100,00%
Parity Hamming Berger Hsiao Cyclic (4)
Flit checker overhead (%)
Uncovered faults (%)
γ
Fig. 9: Error detecting codes: ﬂit checker overhead - uncovered faults - 
reports .
The result reveals that in the switch with the larger ﬂit
width, the portion of uncovered faults is less. This is because
more faults are propagated to switch data outputs and the
fraction of faults detected by the ﬂit checkers increases. When
the ﬂit width increases, the width of datapath elements such
as crossbar multiplexers increases. Moreover, parameter 
indicates that as the ﬂit width increases, a bigger portion of
faults which are propagated to data outputs become detectable
by ﬂit checkers. In fact, the number of faults which are not
detectable by ﬂit checkers remained unchanged (940 faults).
These faults are located at some of the control signals, for
example multiplexer select signals, and resulting errors are
propagated to data outputs. Changing the ﬂit width does not
inﬂuence the portion of control logic in the considered switch.
The time required to compute the uncovered faults and the
corresponding critical region is reported in the last column
of Table I. In bigger switches more faults are propagated to
the data outputs that must be traced by SAT for SDC. But
the time does not increases linearly because the size of the
switch, which determines the complexity of the SAT instance,
increases as well. Despite that, the method can still generate
the fault-secure switch in a reasonable time.
C. Result of Fault-Secure Synthesis
Considering parity as the EDC in ﬂit checkers and based on
the list of uncovered faults, the critical region is constructed
and the synthesis process is performed to enable concurrent
error detection of uncovered faults.
Table II summarizes the result. As shown in the ﬁrst row, by
constructing only ﬂit checkers with 3.7% area overhead, 7871
faults still cause SDC. According to the proposed synthesis
procedure, by constructing 43 extra parity groups over outputs
of the critical region, fault-secureness is ensured (i.e. no
SDCs). The resulting fault-secure switch has 52% area over-
head compared to the original switch. The overhead includes
the ﬂit checkers, prediction logic and dual rail checkers for
the critical region. Any single transient and permanent fault
that generates erroneous bits in the switch is detectable by this
fault-secure structure.
As shown in the third row of Table II, duplication with
comparison for the entire switch with 1040 primary and
pseudo primary outputs imposes a huge area overhead of
more than 370% due to the large number of outputs and the
overhead of dual rail checkers which are required to ensure
fault-secureness. Even without dual rail chackers duplication
imposes an area overhead of 204%. It reveals that for bigger
switches with even more outputs the area overhead of dupli-
cation will increase drastically.
The critical region occupies 41% of the switch area and
contains 185 outputs. As shown in the last row of Table II,
using duplication with comparison of just the critical region
imposes an area overhead of 89% (including dual-rail checkers
for output comparison). The presented fault-secure switch
saves almost 37% area cost compared to duplication with
comparison for the critical region and signiﬁcantly reducesTABLE I: Number of covered/uncovered faults with respect to the switch ﬂit width
Flit
Width
Cell
Area
# Collapsed
Faults
Uncovered
Faults
Faults propagated
to data outputs  (%) Time (s)
16-bit 7259 18901 10261 (54.29%) 9580 (50.69%) 90.19 784
32-bit 11979 32341 15061 (46.57%) 18220 (56.34%) 94.84 1924
64-bit 21419 59221 24661 (41.64%) 35500 (59.94%) 97.35 7266
128-bit 39017 112991 43871 (38.83%) 70060 (62.00%) 98.66 42963
the area overhead compared to conventional duplication with
comparison for the entire switch.
TABLE II: Result of fault-secure synthesis - 8-bit switch
Synthesis technique # groups # SDCs overhead
Switch + ﬂit checkers 5 7871 3.70%
Proposed fault-secure switch 48 0 52.01%
Duplicated switch - 0 371.29%
Duplicated critical region - 0 89.09%
D. Test Compaction and Diagnosis
Table III presents the result of test pattern generation for
the original switch and the switch with parity trees (fault-
secure switch) using a commercial ATPG tool. The result
shows that with only 9% of test response data 100% stuck-at
fault coverage is achieved in the fault-secure switch.
TABLE III: Comparison of test and diagnosis results
Switch Fault-secure switch
# Test patterns 372 394
Fault coverage (%) 100 100
Test response volume [bit] 386880 34672
Diagnostic success (%) 98.76 98.50
The compacted test responses can still be used for the
diagnosis of the switch [19]. Using the algorithm in [20],
diagnosis is performed for the switch and the switch with
parity trees. The last row of Table III compares the diagnostic
success (i.e. the top-ranked fault suspect corresponds to the
defect [20]) of the uncompacted response of the original switch
with the diagnosis of the parity streams in the fault-secure
switch targeting stuck-at faults. Even with the compacted test
response, the diagnostic success rate is approximately the same
as the uncompacted response. Indeed, without loss of fault
coverage and diagnostic success, the parity trees of the fault-
secure switch can be used to reduce the amount of response
data as well as the response shifting cycles by a factor of 11x.
VII. CONCLUSION
This paper presented a hybrid method to synthesize a fault-
secure NoC switch employing data encoding at ﬂit-level and
concurrent error detection with multiple parity trees. Boolean
satisﬁability is used to identify faults that cause silent data
corruption and to direct the construction of an error detecting
code. With an area overhead of only 52%, the resulting fault-
secure switch is able to detect any single combinational and
transition delay fault in the switch and interconnect links. The
structure can be reused for test compaction, which reduces the
amount of response data as well as test time without loss of
fault coverage and diagnosis success.
VIII. ACKNOWLEDGMENT
This work was supported by the German Research Founda-
tion (DFG) under grant WU 245/12-1 (ROCK).
REFERENCES
[1] G. Gielen et al., “Emerging yield and reliability challenges in nanometer
CMOS technologies,” in Proc. Design, Automation and Test in Europe
(DATE), 2008, pp. 1322–1327.
[2] S. Borkar, “Designing reliable systems from unreliable components:
the challenges of transistor variability and degradation,” IEEE Micro,
vol. 25, no. 6, pp. 10–16, 2005.
[3] M. Nicolaidis and Y. Zorian, “On-line testing for VLSI - A compendium
of approaches,” Journal of Electronic Testing – Theory and Applications,
vol. 12, no. 1-2, pp. 7–20, 1998.
[4] P. K. Lala, Self-Checking and Fault Tolerant Digital Design. Morgan
Kaufmann, 2001.
[5] Y. Zhang, H. Li, and X. Li, “Reliable network-on-chip router for
crosstalk and soft error tolerance,” in Proc. IEEE Asian Test Symp. (ATS),
2008, pp. 438–443.
[6] A. P. Frantz et al., “Dependable network-on-chip router able to simulta-
neously tolerate soft errors and crosstalk,” in Proc. IEEE Intl. Test Conf.
(ITC), 2006, pp. 1–9.
[7] A. Yanamandra et al., “Optimizing power and performance for reliable
on-chip networks,” in Proc. 15th Asia and South Paciﬁc Design Automa-
tion Conf. (ASP-DAC), 2010, pp. 431–436.
[8] A. Ghofrani et al., “Comprehensive online defect diagnosis in on-chip
networks,” in Proc. IEEE VLSI Test Symp. (VTS), 2012, pp. 44–49.
[9] T. Lehtonen et al., “Self-adaptive system for addressing permanent errors
in on-chip interconnects,” IEEE Trans. on Very Large Scale Integration
(VLSI) Systems, vol. 18, no. 4, pp. 527–540, 2010.
[10] D. Rossi, P. Angelini, and C. Metra, “Conﬁgurable error control scheme
for NoC signal integrity,” in Proc. IEEE Intl. On-Line Testing Symp.
(IOLTS), 2007, pp. 43–48.
[11] M. Palesi et al., “Data encoding schemes in networks on chip,” IEEE
Trans. CAD, vol. 30, no. 5, pp. 774–786, 2011.
[12] D. Bertozzi, L. Benini, and G. De Micheli, “Error control schemes
for on-chip communication links: the energy-reliability tradeoff,” IEEE
Trans. CAD, vol. 24, no. 6, pp. 818–831, 2005.
[13] C. Grecu et al., “Essential fault-tolerance metrics for NoC infrastruc-
tures,” in Proc. IEEE Intl. On-Line Testing Symp. (IOLTS), 2007, pp.
37–42.
[14] A. Dutta and N. A. Touba, “Reliable network-on-chip using a low cost
unequal error protection code,” in Proc. IEEE Symp. Defect and Fault-
Tolerance in VLSI Systems (DFT), 2007, pp. 3–11.
[15] M. Augustin, M. Gossel, and R. Kraemer, “Reducing the area overhead
of TMR-systems by protecting speciﬁc signals,” in Proc. IEEE Intl. On-
Line Testing Symp (IOLTS), 2010, pp. 268–273.
[16] N. Touba and E. McCluskey, “Logic synthesis of multilevel circuits with
concurrent error detection,” IEEE Trans. CAD, vol. 16, no. 7, pp. 783–
789, 1997.
[17] A. Dalirsani, M. A. Kochte, and H.-J. Wunderlich, “SAT-based Code
Synthesis for Fault-Secure Circuits,” in Proc. IEEE Symp. Defect and
Fault Tolerance in VLSI and Nanotechnology Systems (DFTS), 2013.
[18] M. Zhang et al., “Sequential element design with built-in soft error
resilience,” IEEE Trans. VLSI, vol. 14, no. 12, pp. 1368–1378, 2006.
[19] A. Dalirsani et al., “Structural test for graceful degradation of noc
switches,” in Proc. IEEE European Test Symp. (ETS), 2011, pp. 183–
188.
[20] S. Holst and H.-J. Wunderlich, “A diagnosis algorithm for extreme space
compaction,” in Proc. Design, Automation and Test in Europe (DATE),
2009, pp. 1355–1360.