Fault-Tolerance of Robust Feed-Forward Architecture Using Single-Ended and Differential Deep-Submicron Circuits Under Massive Defect Density by Stanisavljevic, Milos et al.
Fault-Tolerance of Robust Feed-Forward Architecture Using
Single-Ended and Differential Deep-Submicron Circuits Under
Massive Defect Density
Milos Stanisavljevic, Alexandre Schmid, Member, IEEE, and Yusuf Leblebici, Senior Member, IEEE
Abstract— An assessment of the fault-tolerance properties
of single-ended and differential signaling is shown in the
context of a high defect density environment, using a robust
error-absorbing circuit architecture. A software tool based on
Monte-Carlo simulations is used for the reliability analysis
of the examined logic families. A benefit of differential
circuits over standard single-ended circuits is shown in the case of complex
systems. Moreover, an analysis of the reliability of different circuits
and a discussion of the optimal granularity of redundant blocks
are presented.
I. INTRODUCTION
The increased prominence of embedded systems, which
are applied in safety-critical systems such as in-situ medi-
cal prosthetic microelectronic applications as well as space
applications, where component maintenance is virtually out
of the question, emphasizes the need for increased system reliability. At the same time, modern microelectronic systems
make use of advanced deep-submicron, and nanoelectronic
fabrication technologies, exhibiting increased rates of defect
density. The granularity of fault-tolerant “islands” must be
adapted to new rates of failure densities that occur in
nanometer-scale technologies. In this paper an architectural-
level methodology allowing significant improvement in sys-
tem reliability, which is applicable to deep-submicron as well
as nanoelectronic systems, is presented. Single-ended and
differential signaling circuits, built into the proposed feed-forward fault-tolerant architecture, are assessed from the perspective of their robustness to failure. A software tool based
on SPICE Monte-Carlo (MC) simulations, which allows a-priori data analysis of defect-sensitive deep-submicron digital
microelectronic circuits, has been specifically developed.
Differential Cascode Voltage Switch (DCVS) logic [1]
is a circuit technique which has potential advantages over
conventional static CMOS NAND/NOR logic in terms of
circuit delay, layout density, power dissipation, and logic
flexibility. In this paper we demonstrate key advantages
in terms of reliability of DCVS logic in comparison with
standard CMOS logic, both used in a four-layer robust
architecture developed for error absorption in environments with a high density
of failures. Moreover, some insight is given into
optimal redundant block sizing and design methodology.
In Section II, the applied defect modeling is described.
The reliability architecture as well as averaging circuits are
presented in Section III. The tool for analysis is described
in Section IV. The fault tolerant properties of differential
signaling, and a comparative analysis of obtained results
are shown in Sections V and VI respectively. Finally, a
discussion of optimal granularity and design methodology
is presented in Sections VII and VIII.
The authors are with the Microelectronic Systems Laboratory (LSM), Swiss Federal Institute of Technology (EPFL),
CH-1015 Lausanne, Switzerland (email: {milos.stanisavljevic,
alexandre.schmid, yusuf.leblebici}@epfl.ch).
II. DEFECT MODELING AND ANALYSIS TOOL
A major step in any design automation process consists of
simulation. In order to perform a simulation for reliability,
an accurate fault model of physical defects and fault modes
with a netlist fault description is necessary. The Stuck-At
approach which is traditionally used in fault coverage is
not sufficient to handle the analysis of various faults in
nanometer-scale devices. There are two basic approaches in
device fault modeling: Inductive Fault Analysis (IFA) [2] and
transistor level fault modeling [3], both of which are proven
to be complex problems.
The transistor-level fault modeling is applied at an ab-
straction level above physical layout. It usually incorporates
only stuck-on, stuck-off models of transistors for representing
faults. These models represent a very reduced set of possible
physical defects and therefore they are not sufficient. On the
other hand, the IFA approach has some drawbacks, mainly
the high computational complexity of the tools used, a complete dependency on geometrical characteristics, and the difficulty of
properly handling analog layouts.
A three-level hierarchical fault model is proposed in
this paper in order to overcome the shortfalls of transistor-level
fault modeling using some results of the IFA approach, and also
to cover as wide a range as possible of the impacts that device
faults have on the circuit behavior (Figure 1). The first
level consists of transistor model parameters (e.g. threshold
voltage Vth, oxide thickness tox, geometric parameters L,
W) whose process-dependent variations have a significant
influence on the dynamic behavior and can lead to “dynamic”
faults, or violation of design time constraints. Here, each
parameter can be represented by its distribution function
fi(·) and a nominal mean value.
In the second level, models for various physical defects
such as missing spot, unwanted spot, Gate Oxide Short
(GOS) with channel, floating gate coupled to a conductor,
and bridging faults are adopted [4], [5]. These models have
been developed from structural and lithography defects,
and each defect model is described in terms of electrical
parameters of its components. Thus, for simulation purposes,
physical defects are translated into equivalent electrical linear
devices such as resistors, capacitors, and nonlinear devices
such as diodes and scaled transistors. A total of sixteen
possible defects are considered for each transistor, which are
listed in Table I.
[Figure 1: three stacked levels. LV3: interconnect defects, global signals. LV2: macro transistor replacement including 16 error models. LV1: parameter fluctuations affecting time constants, with Vth = f1(Vth), tox = f2(tox), L = f3(L), W = f4(W), annotated with mean and 3σ.]
Fig. 1. Three layer fault modeling.
The third layer of the fault model represents the mapping
of interconnection defects into their electrical models (open
spots and bridging faults) [6]. The actual model is highly
dependent on geometrical characteristics of layout, where
maintaining correspondence between physical and electrical
parameters remains as a problem that needs to be solved. In
the transistor level simulations this layer can be excluded,
considering that more than 80% [7] of signal errors in
modern circuits are due to global signals stuck-at supply or
ground.
III. RELIABILITY ASSESSMENT APPROACH AND
AVERAGING CIRCUITS
The fault-tolerance property of the proposed redundant
four-layer feed-forward architecture has been applied pre-
viously to the case of single Boolean gates [8]. An array
arrangement has been proposed to fabricate a multiple-input
NOR slice, with fault-absorption capability [9]. Single-ended
and differential circuits realizing the critical third layer are
proposed in this Section.
The fault-tolerant architecture depicted in Figure 2 consists
of four layers in which the data is strictly processed in a feed-
forward manner. The first layer is denoted as the input layer,
accepting conventional Boolean (binary) signal levels. The
core operation is performed in the second layer, which con-
sists of a number of identical, redundant units implementing
the desired logic function. Fault immunity increases with the
number of redundant units, yet the operation is quite different
from the classical majority-based redundancy. The third layer
receives the outputs of the redundant logic units in the second
layer, creating a weighted average with re-scaling. Note that
the output of the third layer becomes a multiple-valued logic
level. Finally, the fourth layer is the decision layer where a
binary output value is extracted using a variable threshold.
TABLE I
LIST OF LOW-LEVEL FAILURES MODELED IN LEVEL LV2.
Acronym  Failure Type
DHO      Drain Hard Open, resulting in stuck-off fault
DSO      Drain Soft Open, resulting in partial stuck-off fault
SHO      Source Hard Open, resulting in stuck-off fault
SSO      Source Soft Open, resulting in partial stuck-off fault
FLG      FLoating Gate, resulting in disconnected input
DSHS     Drain Source Hard Short, resulting in stuck-on fault
DSSS     Drain Source Soft Short, resulting in partial stuck-on fault
DGHS     Drain Gate Hard Short, resulting in input-output bridging fault
DGSS     Drain Gate Soft Short, resulting in partial input-output bridging fault
GSHS     Gate Source Hard Short, resulting in input stuck-at fault
GSSS     Gate Source Soft Short, resulting in partial input stuck-at fault
DBHS     Drain Bulk Hard Short, resulting in excessive current flowing through the substrate
DBSS     Drain Bulk Soft Short, resulting in partial excessive current flowing through the substrate
GOS      Gate Oxide Short, resulting in an excessive current flowing through the gate oxide insulator
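[Figure 2: four layers in feed-forward order. LY1: input layer. LY2: logic layer, made of identical logic blocks with outputs x1 ... xn. LY3: averaging layer, made of weighted average blocks with weights k1 ... kn, computing $y = \frac{V_{fs}}{\sum_i k_i} \sum_i k_i x_i$. LY4: decision layer, a threshold decision block with adaptive learning, where y < Vth gives z = VDD and y > Vth gives z = GND.]
Fig. 2. Four-layer, feed-forward fault-tolerant architecture, with adaptive
final decision stage.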
If we measure the output of the third layer for all input
values, we can then construct the circuit’s transfer function surface.
The acceptance condition for a transfer function surface to
be considered as operating correctly, despite any errors
in the circuit, can be limited to critical intervals dictated by
the input noise margin of the next stage, as later depicted
in Figure 3. The condition for accepting or rejecting the
transfer function surface is dictated by the possibility to place
a threshold value Vth and its tolerance interval in a way that
permits a correct separation of Logic 1 and Logic 0 outputs.
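The averaging and acceptance steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the normalization of the logic-unit outputs to [0, 1], the equal weights, the full-scale value of 1.8 V and the idealized stuck-at faults are all assumptions made for illustration. The sketch computes the LY3 output for every input combination of three redundant NOR2 units and checks whether a threshold with a given tolerance can separate the two output classes.

```python
from typing import Optional, Sequence


def averaging_layer(x: Sequence[float], k: Sequence[float], v_fs: float) -> float:
    """LY3 output y = (V_fs / sum(k_i)) * sum(k_i * x_i).

    Assumption: each x_i is a logic-unit output normalized to [0, 1].
    """
    return v_fs * sum(ki * xi for ki, xi in zip(k, x)) / sum(k)


def find_threshold(y_low: Sequence[float], y_high: Sequence[float],
                   delta: float) -> Optional[float]:
    """Return V_th such that [V_th - delta, V_th + delta] separates the two
    groups of LY3 outputs, or None if the surface has to be rejected."""
    gap = min(y_high) - max(y_low)
    if gap < 2 * delta:
        return None
    return (min(y_high) + max(y_low)) / 2


# Three redundant NOR2 units with equal weights (hypothetical values).
V_FS = 1.8
weights = [1.0, 1.0, 1.0]
ideal_nor2 = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 0}

y_high, y_low = [], []
for inputs, expected in ideal_nor2.items():
    # Healthy unit, unit stuck at 1, unit stuck at 0 (idealized faults).
    units = [expected, 1, 0]
    y = averaging_layer(units, weights, V_FS)
    (y_high if expected else y_low).append(y)

v_th = find_threshold(y_low, y_high, delta=0.2)
print(v_th)
```

In this toy case the two faulty units compress the output range to roughly 0.6 V to 1.2 V, and a threshold near 0.9 V still separates the classes. With a tolerance of 0.4 V no valid threshold exists and the surface would be rejected.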
The variable threshold is clearly necessary in order to
select the appropriate threshold level. Also, a method allow-
ing the auto-adjustment of the threshold voltage is highly
desirable. Incorporating adjustment mechanisms into every
fault-tolerant Boolean gate would require a large amount
of extra hardware. Possible ways to explore include local
malfunction detection, and report to a central control unit,
which selectively applies learning algorithms inspired from
artificial neural network theory to adapt the threshold and
restore correct operation.
Fig. 3. Faulty transfer function surface showing significant distortion at
the output of LY3 and correctly set threshold.
Fixed-weight, single-ended and differential realizations of
the third layer, which execute the weighted average and rescaling
of their inputs, are shown in Figure 4. The circuits have been
designed to limit the circuit area. Generally speaking, the
single-ended realization should be selected to conform to
area constraints, whereas the differential realization should
be selected where higher linearity of the transfer function is
demanded. This will be discussed in detail in Section VI.
IV. STATISTICAL ANALYSIS TOOL
The proposed redundancy scheme (explained in Section III)
does not allow the extraction of a simple reliability rule, such
as the majority rule applied in Triple Modular Redundancy
(TMR) systems. In our case, every system state corresponds
to an individual combination of transistor states that manifest
themselves as degenerated DC transfer function surfaces,
some of which still operate correctly. In order to perform an
analytical assessment of the reliability of the four-layer architecture,
it is necessary to extract a rule-set which describes the combinations of transistor states that allow correct circuit operation.
For each state, knowing the probabilities of the transistors being
in that state allows the probability of correct
system operation to be calculated as a sum of products of probabilities. The
rule-set should be disjunctive and provide full coverage for
all cases of correct circuit operation.
The described method is very complex. Every constituting
element of the block, such as a MOS transistor, can be in
a number of states dictated by the fault occurring. Calling ε
the number of faults, and n the number of transistors under
consideration, the total number of system states is given as
(ε + 1)^n. For a full statistical coverage it is possible to
consider a limited number of cases, given that the redundancy
in the logic layer does cause a number of cases to appear as
identical in their DC transfer function, and also taking into
account that faults are not totally statistically independent.
Nevertheless, the actual number of states is exponentially
dependent on the number of transistors. Moreover, the extraction of a rule-set that describes correct functioning turns
out to be intricate considering that the rule-set should cover
the whole state space and that the number of rules is also
exponentially dependent on the number of transistors. Above
all, the task of mapping rules into actual probabilities is
not trivial.
[Figure 4: (a) CMOS realization, with inputs VX1 to VX4, resistor R, supply VDD and output VOUT; (b) DCVS realization, with inputs VX1 to VX3 and their complements, resistor R, supply VDD and complementary outputs VOUT.]
Fig. 4. Averaging layer LY3.
In this case, the use of a Monte Carlo based approach offers
a relevant solution. The Monte Carlo approach implies state
sampling. In this technique, a subset of states (sample) is
randomly chosen from the set of all possible states. The
states in a sample are simulated and the ratio of states where
correct operation can be extracted (working states) over all
states in the sample is used as an estimate of reliability in
the complete state set. The accuracy (or error bound) of the
estimated coverage depends on the absolute number of states
in the sample. This number is known as sample size (in our
case the necessary number of Monte Carlo iterations). The
error bound of the estimate can be reduced by increasing
the sample size. The total number of states, Np, is called the
population size. We want to determine the population fraction
R that represents working states and randomly collect a
sample of Ns states (sample size = Ns). If r is a random
variable representing probability of correct operation and
x is an estimated value of R determined by Monte Carlo
simulation, then the number of ways to obtain sample states
Nw is given in Equation 1.
$$N_w = \binom{R N_p}{x N_s} \binom{(1-R) N_p}{(1-x) N_s} \tag{1}$$
The probability of a state sample giving a value x for the
random variable r is given in Equation 2.
$$p(x) = \mathrm{Prob}(r = x) = \frac{\binom{R N_p}{x N_s} \binom{(1-R) N_p}{(1-x) N_s}}{\binom{N_p}{N_s}} \tag{2}$$
This represents the hypergeometric probability density func-
tion of a discrete-valued random variable r. When Ns
is large, r can be treated as a continuous variable and
Equation 2 is conventionally approximated by a Gaussian
probability density function with mean E(r) = R and
variance σ² as expressed in Equation 3.
$$p(x) = \mathrm{Prob}(x \le r \le x + dx) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x-R)^2}{2\sigma^2}} \tag{3}$$
Here R represents the true probability of correct operation,
as the mean (or an unbiased estimate) of r. The variance of
r is given according to [10] in Equation 4
$$\sigma^2 = \frac{R(1-R)}{N_s} \left(1 - \frac{N_s}{N_p}\right) \approx \frac{R(1-R)}{N_s} \tag{4}$$
The 3-sigma bound on the estimation error, within which x lies with a
probability of about 99.7% under the Gaussian approximation, is given in Equation 5.
$$|x - R| \le 3 \sqrt{\frac{R(1-R)}{N_s}} \tag{5}$$
For example, for R = 0.5, which maximizes R(1−R), only 1000 Monte Carlo
iterations (Ns = 1000) are needed to keep the 3-sigma error of Equation 5 below
5% (3·sqrt(0.25/1000) ≈ 0.047). This makes the Monte Carlo based approach very
suitable.
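The relation between sample size and error bound in Equations 4 and 5 can be checked numerically with a short Python sketch. The function names are hypothetical and the code only restates the two equations. The finite-population correction of Equation 4 is applied when a population size is supplied and dropped otherwise.

```python
import math
from typing import Optional


def three_sigma_bound(r: float, n_s: int, n_p: Optional[int] = None) -> float:
    """3-sigma error bound of Equation 5, using the variance of Equation 4."""
    variance = r * (1.0 - r) / n_s
    if n_p is not None:
        # Finite-population correction (1 - Ns/Np) from Equation 4.
        variance *= 1.0 - n_s / n_p
    return 3.0 * math.sqrt(variance)


def required_samples(r: float, max_error: float) -> int:
    """Smallest Ns whose 3-sigma bound (large-population form) is below max_error."""
    return math.ceil(9.0 * r * (1.0 - r) / max_error ** 2)


bound = three_sigma_bound(0.5, 1000)
n_needed = required_samples(0.5, 0.05)
print(bound, n_needed)
```

For R = 0.5 and Ns = 1000 the bound evaluates to about 0.047, and because the bound shrinks as 1/sqrt(Ns), halving it requires four times as many iterations.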
A software tool using MC, SPICE and MATLAB has
been developed in order to automate the reliability analysis
process. First, a netlist is acquired from the appropriate
schematic acquisition tool. Then, in each of the applied MC
iterations, a faulty pattern is generated. Standard BSIM3
models of the transistors that are affected by faults are re-
placed with the appropriate fault models according to the er-
ror model described in Section II. A multi-variable DC sweep
analysis for the acquired circuit netlist is then executed,
thus forming transfer function surfaces for the considered
block under failure. Subsequent Monte Carlo iterations are
run applying different failure patterns, and performing sweep
analysis in the transistor fault probability space. In a next
step, simulation results are processed and discrimination of
the transfer function surfaces where proper operation can
possibly be recovered is performed. The related probability
of correct operation with respect to the probability of fault
of a single transistor can be calculated in the next stage. In
the final step, this probability for a unit block is compared to
additional trials in order to select an appropriate redundancy
factor, a circuit topology, and a transistor size.
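The state-sampling loop at the core of this flow can be sketched in Python as follows. This is only an outline under stated assumptions, not the authors' tool: the SPICE sweep and the transfer function surface check are replaced by a user-supplied callback `is_working`, the fault models are drawn with equal probability, and `toy_is_working` is a purely hypothetical acceptance rule used so that the sketch runs on its own.

```python
import random
from typing import Callable, List


def sample_fault_pattern(n_transistors: int, p_fault: float,
                         n_fault_models: int, rng: random.Random) -> List[int]:
    """Draw one faulty pattern: 0 marks a fault-free transistor,
    1..n_fault_models the index of the injected fault model."""
    return [rng.randint(1, n_fault_models) if rng.random() < p_fault else 0
            for _ in range(n_transistors)]


def estimate_reliability(n_transistors: int, p_fault: float, n_fault_models: int,
                         is_working: Callable[[List[int]], bool],
                         n_iter: int = 1000, seed: int = 0) -> float:
    """Fraction of sampled states for which correct operation is recovered."""
    rng = random.Random(seed)
    working = 0
    for _ in range(n_iter):
        pattern = sample_fault_pattern(n_transistors, p_fault, n_fault_models, rng)
        # Stand-in for the DC sweep and transfer function surface analysis.
        if is_working(pattern):
            working += 1
    return working / n_iter


def toy_is_working(pattern: List[int]) -> bool:
    """Hypothetical rule: the block tolerates at most one faulty transistor."""
    return sum(1 for state in pattern if state != 0) <= 1


if __name__ == "__main__":
    for p in (0.0, 0.05, 0.1, 0.2):
        print(p, estimate_reliability(17, p, 16, toy_is_working))
```

Sweeping `p_fault` as in the usage loop yields a reliability curve of the kind plotted against the transistor failure probability, although the values produced by the toy rule carry no physical meaning.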
The acceptance condition for a transfer function surface
to be considered as correctly operating, despite any errors
in the circuit, can be limited to critical intervals dictated by
the input noise margin of the next stage.
The described steps are shown in Figure 5.
[Figure 5: tool flow. Circuit development: schematic circuit acquisition tool producing the netlist. Reliability simulations: SPICE circuit simulator with Monte Carlo analysis (C code), organized as a main process script and daughter process scripts, using the SPICE transistor model, the component models library and the failure model with embedded failure probability support, and producing transfer function surfaces (TFSs). Results analysis and statistics: MATLAB analysis of the TFSs, producing the statistical analysis results.]
Fig. 5. Synthetic flow-graph of the tool for the a-priori analysis of
reliability.
Fault distribution models adapted for nanometric technolo-
gies require monitoring the failure behavior of actual devices
in mass production. Also, the computational load shows an
exponential dependency with the number of input variables
as well as faulty states, using conventional models.
However, in the case of the Monte Carlo based approach
computational load is exponentially dependent only on the
number of input variables, and not on the number of faulty
states and fault modeling parameters. Moreover, faulty states
and fault modeling parameters have only a limited, approximately logarithmic
impact on the time of a single iteration.
This is an additional advantage of the Monte Carlo based
approach over any analytical approach. The total time for
simulations to be run is expressed in Equation 6.
$$T_{sim} = N_{sp}^{(N_{var}-1)} \, N_{it} \, N_{prob} \, T_{it} \tag{6}$$
with $T_{it} \approx N_{var} \log(\varepsilon)$
Here, Nsp is the number of sweep points for each variable,
Nvar the number of input variables, Nit the number of MC
iterations, Nprob the number of probability iterations, Tit
is the time of one iteration and ε the number of different
simulated fault states. Results of this reliability analysis
approach will be shown in Section VI.
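Equation 6 can be evaluated with a few lines of Python to see how the cost scales. The function names are hypothetical, the natural logarithm is assumed because the text does not state the base, and `t_unit` is an assumed proportionality constant that turns the approximate relation for Tit into a time.

```python
import math


def iteration_time(n_var: int, n_fault_states: int, t_unit: float = 1.0) -> float:
    """T_it ~ N_var * log(eps); t_unit is an assumed scale factor."""
    return t_unit * n_var * math.log(n_fault_states)


def total_simulation_time(n_sp: int, n_var: int, n_it: int,
                          n_prob: int, t_it: float) -> float:
    """T_sim = N_sp^(N_var - 1) * N_it * N_prob * T_it (Equation 6)."""
    return n_sp ** (n_var - 1) * n_it * n_prob * t_it


# Adding one input variable multiplies the sweep cost by N_sp.
t2 = total_simulation_time(n_sp=4, n_var=2, n_it=1000, n_prob=10, t_it=1.0)
t3 = total_simulation_time(n_sp=4, n_var=3, n_it=1000, n_prob=10, t_it=1.0)
print(t2, t3, t3 / t2)
```

With Tit held fixed, each additional input variable multiplies the total time by Nsp, whereas the number of fault states enters only through the logarithm in Tit.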
V. FAULT-TOLERANT PROPERTIES OF DIFFERENTIAL
SIGNALING CIRCUITS
In differential circuits, information is processed and trans-
mitted in a redundant and complementary way, intrinsically
offering an increased resistance to failures. In case of failure,
the correct output signal can still be recovered, if i) the
complement signal is available, and ii) the circuitry which
can decode this state is available.
The logic decision is based on the possibility to de-
fine a decision interval centered around a threshold value
[Vth −∆Vth, Vth + ∆Vth] separating the complementary
output line voltage values representing the Logic High and
Logic Low. The exact values of Vth and ∆Vth are dictated
by the circuit construction of the next layer input stage.
Nevertheless, some conclusions with significant practical
impact can be drawn out of the theoretical rationale.
[Figure 6: transfer functions Vout versus Vin, with the levels VDD, −VDD and VSS, and the thresholds Vth1 (at 0 V) and Vth2 marked.]
Fig. 6. Effect of stuck-at errors on the transfer function, and corresponding
adaptive value of Vth: 1) failure free (plain line, Vth1), 2) stuck-at-zero
(circled line, Vth2).
The impact of a stuck-at fault is perceived on the transfer
function surface as a compression of the analog output range,
as depicted in Figure 6. Consequently, a variable decision
threshold Vth is mandatory in order to handle all possible
combinations of failure distribution. Let the output of layer
three (LY3) be complementary lines a and b. Calling the
values on the lines a and b corresponding to Logic 1 and
Logic 0, a1, b1 and a0, b0 respectively, the value of Vth which
is appropriate to handle errors is the arithmetical average of
differential signals, taking actual values into account, i.e. any
stuck signal is assigned its actual stuck voltage value, and is
expressed as:
$$V_{th} = \frac{(a_1 - b_1) + (a_0 - b_0)}{2}$$
A thresholding circuit complying with this property is
applied in the four-layer architecture and used in the analysis
in the following Section.
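The threshold rule can be worked through numerically with a small Python sketch. The function names, the 1.8 V supply and the idealized rail-to-rail line voltages are assumptions chosen for illustration. The first case is fault free, the second has line a stuck at zero.

```python
def adaptive_threshold(a1: float, b1: float, a0: float, b0: float) -> float:
    """V_th = ((a1 - b1) + (a0 - b0)) / 2, using the actual line voltages."""
    return ((a1 - b1) + (a0 - b0)) / 2


def decide(a: float, b: float, v_th: float) -> int:
    """Logic 1 if the differential signal a - b lies above the threshold."""
    return 1 if (a - b) > v_th else 0


VDD = 1.8

# Fault free: Logic 1 -> (a, b) = (VDD, 0), Logic 0 -> (a, b) = (0, VDD).
v_th_ok = adaptive_threshold(VDD, 0.0, 0.0, VDD)

# Line a stuck at zero: a1 takes its actual stuck value of 0 V.
v_th_stuck = adaptive_threshold(0.0, 0.0, 0.0, VDD)

print(v_th_ok, v_th_stuck)
print(decide(0.0, 0.0, v_th_stuck), decide(0.0, VDD, v_th_stuck))
```

In the fault-free case the threshold sits at 0 V. With line a stuck at zero the differential range is compressed to the interval from −VDD to 0, the threshold moves to −VDD/2, and both logic values are still separated.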
VI. COMPARATIVE ANALYSIS OF OBTAINED RESULTS
Different types of circuits (basic Boolean gates, as well
as more complex ones such as full adders) and different circuit
topologies (standard CMOS logic, static DCVS logic) with various redundancy
factors have been extensively analyzed using the presented
software tool. The reliability figure has been defined as
the probability of correct operation with respect to the
probability of failure of each transistor. Some basic DCVS
gates are depicted in Figure 7.
[Figure 7: DCVS gates with inputs VA, VB and their complements, supply VDD and complementary outputs Q; (a) NOR2/OR2, (b) NAND2/AND2.]
Fig. 7. DCVS realization of Boolean gates.
The analysis of CMOS and DCVS logic realizations of a
2-input NOR gate is depicted in Figure 8. In both topologies,
blocks for averaging and thresholding (decision) were not
affected by induced faults (this condition is referred to in the
following as fault-free averaging circuit). No significant
difference in circuit reliability can be observed under the
aforementioned conditions. The reliability of the NOR2
function remains very close to 1 even for device failure rates
exceeding 10%.
[Figure 8: probability of correct decision (0 to 1) versus probability of failure for each transistor (0 to 0.8), redundancy = 3, with curves for standard CMOS and DCVS.]
Fig. 8. Comparative analysis of the 2-input NOR gate in DCVS and
standard CMOS logic with fault-free averaging circuit.
If faults are induced in the averaging circuit to represent
more realistic conditions, however, the overall system reliability drops more rapidly, and the curve does not show
saturation for low fault density. Instead, it becomes quasi-linear in the critical working range, i.e. for transistor failure
probabilities lower than 0.3, as seen in Figure 9(a) and (b)
for different redundancy factors. In this case, the use of a
differential averaging circuit shows improved reliability in
comparison to standard CMOS, with a significant difference
in the case of a larger redundancy factor. Moreover, there is no
improvement in reliability with respect to redundancy for
standard CMOS, which is due to the averaging circuit.
[Figure 9: probability of correct decision (0 to 1) versus probability of failure for each transistor (0 to 0.8), with curves for standard CMOS and DCVS, both with faulty averaging circuit; (a) redundancy factor R=3, (b) redundancy factor R=5.]
Fig. 9. Comparative analysis of the 2-input NOR gate in DCVS and
standard CMOS logic with faulty averaging circuit for a redundancy of
R=3 and R=5.
It can be seen that an overall reliability of >99% can be
maintained with the DCVS solution under device failure rates
of up to 10%, while the reliability of the CMOS solution
drops rapidly below 90% for the same device failure rate.
When larger blocks are used as redundant units, the
probability of correct operation is reduced with respect to the
size of a block. This is depicted in Figures 10 and 11, where
a gate realizing the complex 4-input function described in
the figure caption and a full adder cell are used, respectively, as
redundant blocks. Nevertheless, a clear advantage of the DCVS
logic realization in comparison with standard CMOS is
observed in both cases. In the 4-input function case there
is an improvement in reliability with respect to an increase of
redundancy for both configurations (Figures 10(a) and (b)).
In the full adder case the advantage of the differential configuration
is most pronounced in the Sum signal output, whose path has an
increased logic depth and complexity in the case of standard
CMOS when compared to DCVS. The benefits of the differential architecture are even more obvious in the case where
faults are induced into the averaging circuit (Figure 11(b)),
when compared to the fault-free averaging circuit (Figure 11(a)).
The simulation conditions and results, reported in Table II,
demonstrate the extensive use of computational resources.
Simulation time grows exponentially with the number of
input variables. This cost is, however, confined to library development, and as such does not affect the end user simulations
in phase two described in Section VIII. Multiprocessor
systems, which optimally support parallel operation of the
Monte-Carlo algorithm, can be used to limit simulation time.
[Figure 10: probability of correct decision (0 to 1) versus probability of failure for each transistor (0 to 0.2), with curves for standard CMOS and DCVS, both with faulty averaging circuit; (a) redundancy factor R=3, (b) redundancy factor R=5.]
Fig. 10. Comparative analysis of the 4-input gate realizing a function
f(x1, x2, x3, x4) = x1x4′+(x2x3)′+x1(x2x3)′+x1′x2x3x4 in DCVS
and standard CMOS logic with faulty averaging circuit for a redundancy of
R=3 and R=5.
VII. DISCUSSION ON OPTIMAL GRANULARITY OF
REDUNDANT BLOCKS
The granularity at which the cell size is to be considered
must be adapted to new rates of failure densities that occur in
nanometer-scale technologies. Besides acquiring information
related to the probability of correct operation of a block, the
optimal size of a block is also an important factor in design
methodology for unreliable architectures. Insight into the optimal
sizing of redundant blocks is provided in the following.
Because all analyses are performed at the transistor
level, optimization is only possible with respect to the transistor
count and not the block area. However, this is not a drawback,
since this information is to be used for
choosing an optimal subset of a library of reliable components
for synthesis in a given technology, as explained in
Section VIII.
[Figure 11: probability of correct decision (0 to 1) versus probability of failure for each transistor (0 to 0.25), redundancy = 3, with curves for DCVS full adder Carry, CMOS full adder Carry, DCVS full adder Sum and CMOS full adder Sum; (a) fault-free model of averaging circuit, (b) faulty model of averaging circuit.]
Fig. 11. Comparative analysis of the full adder block in DCVS and
standard CMOS logic for a redundancy of R=3 in case of fault-free and
faulty averaging circuit models.
The cost function CF for the optimal size analysis in the case
of a given defect density is chosen as:
$$C_F = P_{cor} / F_{OH}$$
where Pcor is the probability of correct operation for a given
defect density acquired using the tool and FOH is the
normalized overhead function given by:
$$F_{OH} = NT_{tot} / NT_{block}$$
with NTtot representing the total number of transistors in
a cell realized using the four-layer architecture and NTblock
representing the number of transistors in a single redundant
unit. The analysis is performed for various circuits (NOR2,
NAND2, NOR3, NAND3, 1b FA, 2b FA, 3b FA, 4b FA,
XOR2, MUX42, ISCAS-C17, AOI21), in case of redundancy
factor R=3, in order to get broad coverage. Results are
depicted in Figure 12.
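The cost function can be evaluated for a single cell with a short Python sketch. The function names are hypothetical and the probability value of 0.9 is invented for illustration. The transistor counts assume a static CMOS NOR2 unit of 4 transistors and take the 17 transistors listed for the CMOS NOR cell with R=3 in Table II as NTtot.

```python
def overhead(nt_total: int, nt_block: int) -> float:
    """Normalized overhead F_OH = NT_tot / NT_block."""
    return nt_total / nt_block


def cost_function(p_cor: float, nt_total: int, nt_block: int) -> float:
    """C_F = P_cor / F_OH; larger values indicate a better trade-off."""
    return p_cor / overhead(nt_total, nt_block)


# Hypothetical P_cor of 0.9 for a CMOS NOR2 cell with R=3.
cf_nor2 = cost_function(p_cor=0.9, nt_total=17, nt_block=4)
print(overhead(17, 4), cf_nor2)
```

Under these assumptions the overhead is 4.25 and the cost function is about 0.21. Small blocks are penalized by the fixed cost of the averaging and decision layers, while large blocks are penalized by a lower probability of correct operation, which is why an optimum block size can be expected in between.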
TABLE II
SIMULATION CONDITIONS
system      ULTRASPARC III+ 900MHz 16GB RAM
software    Solaris 9, Cadence 5.0.32 SPECTRE
technology  UMC 0.18 µm logic, 1.8V, 1P6M
Boolean NOR: CMOS vs. DCVS for R=3,5
     16 sweep pts     49 sweep pts     transistors count
R    CMOS   DCVS      CMOS   DCVS      CMOS   DCVS
3    72h    85h       85h    100h      17     27
5    80h    95h       100h   118h      22     43
4-bit ripple-carry full-adder: CMOS vs. DCVS for R=3, input variables per cell (4) = 5
     1024 sweep pts   transistors count
R    CMOS   DCVS      CMOS   DCVS
3    180h   165h      392    336
The optimal number of transistors in a block NTopt is
found as the block size for which the cost function reaches
its maximum. Despite the fact that various circuits have
been used, an optimal block size can be observed as a local
maximum, and the second-order polynomial approximation
of the curve provides satisfying correlation between the
points. For a defect density of 5%, the optimal block size
for standard CMOS cells is equal to 30 transistors, whereas
it is limited to only 15 in the case of 10% defect density.
The reliability figure improves when using DCVS circuits, as
can be observed in Figure 12(b).
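Locating the optimum from a second-order polynomial approximation can be sketched in Python as follows. This is an illustrative least-squares fit written with the standard library only, not the authors' procedure, and the data points are synthetic values generated from a parabola with its maximum at 30 transistors rather than results read from Figure 12.

```python
from typing import List, Sequence, Tuple


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit y = a*x^2 + b*x + c via the normal equations."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    matrix = [[s4, s3, s2], [s3, s2, s1], [s2, s1, n]]
    rhs = [t2, t1, t0]
    d = det3(matrix)
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column by the right-hand side.
        m = [row[:] for row in matrix]
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(det3(m) / d)
    return coeffs[0], coeffs[1], coeffs[2]


def optimal_block_size(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Transistor count at which the fitted parabola reaches its maximum."""
    a, b, _ = fit_quadratic(xs, ys)
    if a >= 0:
        raise ValueError("fitted curve has no maximum")
    return -b / (2 * a)


# Synthetic cost-function samples (hypothetical), from y = 100 - (x - 30)^2.
counts = [10, 20, 30, 40, 50]
cost = [-300, 0, 100, 0, -300]
print(optimal_block_size(counts, cost))
```

With measured cost-function values for several cell types in place of the synthetic points, the vertex of the fitted parabola gives an estimate of NTopt for the defect density under study.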
[Figure 12: probability of correct operation divided by overhead versus transistor count, with data points for 5% and 10% transistor fault probability and their second order polynomial approximations; (a) CMOS realization, (b) DCVS realization.]
Fig. 12. Optimal block size for a given technology
An increase in the defect density apparently demands a
smaller optimal block size. An excessively small optimal size
(less than 10 transistors) is an indication that the applied
reliability architecture is not appropriate to support the given
defect density with the required fault tolerance. Moreover,
the small size of the optimum block that is observed confirms
the suitability of the software tool, which addresses the analysis
of circuits of a size up to approximately 100 transistors.
[Figure 13: design flow arranged in three columns: pre-design operations, CAD framework, and designer space. The specification, extended with additional reliability specifications and constraints, feeds the HDL description, RTL synthesis and logic optimization, physical design and layout. Function libraries and cell layout libraries are complemented with reliable cell properties (RCP), provided by pre-designed function blocks with built-in fault immunity. Block size and redundancy level optimization is performed as an iterative process.]
Fig. 13. Schematic flow-graph of the proposed design-flow
VIII. DESIGN RELIABILITY EXPLORATION FRAMEWORK
In order to provide the end-user digital IC designer with a
tool allowing the exploration of the reliability design space,
an adaptation of the standard design-flow applied nowadays
is proposed according to Figure 13, which is supported by a
command line tool that is used in two distinct phases.
The software reliability assessment tool is used in a
first phase to develop fault-tolerant standard cells forming
libraries of blocks with various levels of immunity to failure.
Every standard cell is described in a number of schematic
representations, each using a different architecture (redun-
dancy factor R), design style, and variable parameters (tran-
sistor sizes). After the analysis, results are attached to every
developed cell as an extended cell model parameter (RCP in
Figure 13). In a second phase currently under development,
the end-user designer makes use of the pre-developed cells
and combines them according to their attached performance.
Intensive reliability simulations are no longer required at this
stage, allowing the designer to experiment with various scenarios.
Hence, we extend the concept of reliability by construction
to the selection of the optimal architecture.
IX. CONCLUSIONS
A circuit architecture inspired by the feed-forward artificial
neural network model is presented in order to improve
reliability of very-deep submicron and future nanoelectronic
circuits which are expected to be fabricated in technologies
prone to exhibit high density of failures. The impact of using
such failure prone fabrication technologies on the reliability
of different circuits and circuit architectures has been ex-
plored using an a-priori assessment software tool, with the
goal of constructing and delivering libraries of fault-tolerant
cells. The proposed error-absorbing, four-layer feed-forward
architecture has been demonstrated to perform better using
a differential averaging circuit. Moreover, the DCVS logic
shows benefits in comparison with standard CMOS logic in
all cases where complex gates or cells (such as full adders)
are used, demonstrating the benefits of extending differential
signaling to the development of digital cells. Clearly, bio-
inspired systems and methodologies offer original solutions
to acute reliability issues related to emerging and future
fabrication technologies.
ACKNOWLEDGMENT
The authors acknowledge the support of the Swiss Na-
tional Science Foundation under grant 200021-112291, and
the EPFL Centre SI project NANO-PLA.
REFERENCES
[1] L. G. Heller, W. R. Griffin, J. W. Davis, and N. G. Thoma, “Cascode
voltage switch logic: A differential CMOS logic family,” Proc. ISSCC
Dig. Tech. Papers, pp. 16–17, 1984
[2] J. P. Shen, W. Maly, F. G. Ferguson, “Inductive fault analysis of MOS
Integrated Circuits,” IEEE Design and Test of Computers, 2(6),
pp. 13–26, 1985
[3] T. Olbrich, J. Perez, I. Grout, A. Richardson, C. Ferrer, “Defect-
Oriented Vs Schematic-Level Based Fault Simulation for Mixed-
Signal ICs,” Proc. International Test Conference, pp. 511, 1996
[4] D. Al-Khalili, S. Adham, C. Rozon, M. Hossain, D. Racz, “Compre-
hensive defect analysis and defect coverage of CMOS circuits,” Proc.
IEEE International Symposium on Defect and Fault Tolerance in VLSI
Systems, pp. 84–92, 1998
[5] R. Rodriguez-Montanes et al., “Current vs. logic testing of gate oxide
short, floating gate short and bridging failures in CMOS,” Proc.
International Test Conference, pp. 510–519, 1991
[6] T. M. Storey and W. Maly, “CMOS bridging faults detection,” Proc.
International Test Conference, pp. 1123–1131, 1991
[7] V. Krishnaswamy, A. B. Ma, P. Vishakantaiah, “A study of bridging
defect probabilities on a Pentium (TM) 4 CPU,” Proc. International
Test Conference, pp. 688–695, 2001
[8] A. Schmid and Y. Leblebici, “Robust circuit and system design
methodologies for nanometer-scale devices and single-electron transistors,” IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol. 12, no. 11, pp. 1156–1166, 2004
[9] A. Schmid and Y. Leblebici, “Regular array of nanometer-scale devices
performing logic operations with fault-tolerant capability,” Proc. IEEE
Conference on Nanotechnology (IEEE-NANO), pp. 399–401, 2004
[10] V. D. Agrawal, “Sampling techniques for Determining Fault Coverage
in LSI Circuits,” Journal of Digital Systems, vol. V, no. 3, pp. 189–202,
1981
