On the Design of Reliable Boolean Circuits That Contain Partially Unreliable Gates  by Kleitman, Dan et al.
Journal of Computer and System Sciences 55, 385-401 (1997)
On the Design of Reliable Boolean Circuits
That Contain Partially Unreliable Gates*
Dan Kleitman, Tom Leighton, and Yuan Ma
Department of Mathematics and Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
Received September, 1995; revised July 1, 1997
We investigate a model of gate failure for Boolean circuits in which
a faulty gate is restricted to output one of its input values. For some
types of gates, the model, which we call the short-circuit model of gate
failure, is weaker than the traditional von Neumann model in which
faulty gates always output precisely the wrong value. Our model has
the advantage that it allows us to design Boolean circuits that can
tolerate worst-case faults, as well as circuits that have arbitrarily high
success probability in the case of random faults. Moreover, the short-
circuit model captures a particular type of fault that commonly appears
in practice, and it suggests a simple method for performing posttest
alterations to circuits that have more severe types of faults. A variety of
bounds on the size of fault-tolerant circuits are proved in the paper.
Perhaps, the most important is a proof that any k-fault-tolerant circuit
for any input-sensitive function using any type of gates (even arbitrarily
powerful, multiple-input gates) must have size at least
$\Omega(k \log k / \log\log k)$. Obtaining a tight bound on the size of a
circuit for computing the AND of two values if up to k of the gates are
faulty is one of the central questions left open in the paper.
© 1997 Academic Press
1. INTRODUCTION
In this paper, we investigate a model of gate failure for
Boolean circuits in which a faulty gate is restricted to output
one of its input values. We will call this model the short-
circuit fault model. For example, such a fault might occur in
an AND gate if the wire carrying one of the inputs became
stuck at 1 or became disconnected. Then the faulty AND
gate would function as if its output were short-circuited to
the other input.
For some types of gates (such as a NOT gate), the short-
circuit model of failure is similar to the classic von Neumann
model of failure in which a faulty gate outputs the comple-
ment of the correct value [16]. For other types of gates
(such as AND and OR gates), the short-circuit model differs
from the von Neumann model. For example, if all the inputs
to the gate are identical, then the gate will produce the
correct output even if it is faulty. Hence, the short-circuit
model that we consider is, in most cases, weaker than the
classic von Neumann model.
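The contrast between the two fault models can be made concrete for a single two-input AND gate. The following Python sketch is only an illustration under our own naming (the functions `short_circuit_outputs` and `von_neumann_output` are hypothetical helpers, not notation from the paper): it lists, for each input pair, the values a short-circuited gate may output and the value a von Neumann-faulty gate outputs.

```python
from itertools import product

def and_gate(a, b):
    """Fault-free two-input AND."""
    return a & b

def short_circuit_outputs(a, b):
    """Values a short-circuited gate may output: always one of its own inputs."""
    return {a, b}

def von_neumann_output(a, b):
    """A von Neumann-faulty gate outputs the complement of the correct value."""
    return 1 - and_gate(a, b)

def always_correct_when_shorted(a, b):
    """True if every possible short circuit still yields the correct AND value."""
    return short_circuit_outputs(a, b) == {and_gate(a, b)}

for a, b in product((0, 1), repeat=2):
    print(f"inputs=({a},{b}) correct={and_gate(a, b)} "
          f"short-circuit={sorted(short_circuit_outputs(a, b))} "
          f"von Neumann={von_neumann_output(a, b)}")
```

When both inputs agree, the short-circuited gate has no wrong value available to it, whereas the von Neumann-faulty gate is wrong on every input.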
The von Neumann fault model has been the model of
choice for several decades when it comes to the study of fault
tolerance in Boolean circuits. (For example, see [3-6, 8-10,
12-15] and the references contained therein.) Although the
von Neumann model is important, it does have some draw-
backs. For example, there is no circuit that is tolerant to
even one worst-case fault. This is because a gate just before
an output could be faulty, which means that the circuit will
be faulty. (It may be possible to overcome such difficulties
by expressing the output in some coded form, but then there
could not be a fault-tolerant decoding circuit.) Similarly, in
the case of random gate failures, the reliability of the overall
circuit in the von Neumann model can be no greater than
$(1-\rho)^{\Theta(m)}$, where $\rho < 0.5$ is the probability that a
component will fail and m is the number of outputs. (After all, there
is again the chance that the last gate before any output will
fail, which will cause the entire circuit to fail.) For circuits
with one output, this bound is (perhaps) tolerable. For cir-
cuits with many outputs, the bound means that it is not
possible to build large fault-tolerant circuits within the
von Neumann model.
In addition, the von Neumann model does not accurately
model all types of failures. For example, in a real circuit, a
single inadvertent power-ground fault might wipe out the
whole circuit. For such faults, the von Neumann model is
too optimistic. Alternatively, a gate might have a more
benign short circuit or miswiring, where the output of a gate
is simply ‘‘stuck at’’ a fixed value (0 or 1) or where the out-
put of a gate is ‘‘stuck at’’ the value of one of the inputs.
Stuck-at faults differ from a von Neumann fault in that a
stuck-at fault might result in a gate producing the correct
output, whereas a von Neumann fault always produces the
wrong answer. For such faults, the von Neumann model
seems to be too pessimistic. Actually, stuck-at faults may
not always be more benign than von Neumann faults. For
example, the output of a gate that is von Neumann faulty
* Research supported by AFOSR Contract F49620-92-J-0125, DARPA
Contract N00014-91-J-1698, and DARPA Contract N00014-92-J-1799.
Yuan Ma was also supported by an NSF Mathematical Sciences Postdoc-
toral Research Fellowship. His current address is Haas School of Business,
University of California, Berkeley, Berkeley, CA 94720. The authors’
e-mail: djkmath.mit.edu; ftlmath.mit.edu; yuanhaas.berkely.edu.
can be easily corrected by inverting the signal, but the same
is not true for stuck-at faults (see footnote 1). Indeed, some, but
not all, of the results known for von Neumann failures can be
shown to hold for stuck-at failures.
In practice, stuck-at and power-ground failures resulting
from short-circuits or broken connections are much more
common than von Neumann failures. However, the analysis
of von Neumann failures is still important since it does lead
to the design of circuits that can tolerate some types of
probabilistic failures with reasonable upper bounds on loss
of performance [10]. In this paper, we consider an alter-
native model of gate failure that focuses on short-circuit and
miswiring faults. Like the von Neumann model, our short-
circuit model is also restrictive in that it considers only a
specific kind of fault (but, at least, one that does commonly
occur in practice or that can be made to occur as part of a
procedure to intentionally short-circuit gates with more
severe faults).
Perhaps, the most compelling argument in support of the
short-circuit model of gate failure is the fact that the model
is robust enough to allow us to design Boolean circuits that
can tolerate an arbitrary number of worst-case faults, as
well as circuits that work with arbitrarily high probability in
the case of random faults. Hence, it may be argued that the
short-circuit failure model that we propose in the paper may
potentially be at least as useful in practice as the more tradi-
tional von Neumann model. Indeed, even if the gates in a
real circuit suffer more devastating faults than are allowed
in the short-circuit model, the work in this paper motivates
the development of a posttest procedure that would
‘‘isolate’’ each faulty gate from the circuit by disconnecting
the gate and then directly connect any input of the gate to
its output(s). In this fashion, any gate failure can be
automatically converted into a short-circuit type of failure.
The technology needed to perform such gate surgery is well
developed and has been used for many years in the context
of VLSI and wafer-scale integration [7].
We have also found the problem of designing fault-
tolerant circuits in this model to be interesting and non-
trivial from a theoretical viewpoint. Indeed, although we
prove several upper and lower bounds in this paper, we
leave many interesting questions unanswered. For example,
the following ANDOR problem seems very simple, yet it
lies at the heart of designing fault-tolerant circuits for
arbitrary functions using arbitrary gates.
The ANDOR Problem. Suppose that Charles wants
to buy an AND gate for a chip that he is building. When
Charles goes to the Gate Store to buy his gate, he is
informed that up to k OR gates were mistakenly put into the
box of AND gates. Unfortunately, Charles has no tool for
testing which gates are ORs and which are ANDs, and so to
be safe, Charles decides to buy enough gates so that he can
build a circuit that will behave like an AND gate even if up
to k of the gates in the circuit turn out to be ORs. How
many gates does Charles need to buy?
The ANDOR Problem (Alternate form). In this case,
there are no OR gates but up to k of the supposed AND
gates have a short circuit that directly connects one of the
inputs to the output.
In fact, it is fairly easy to argue that the two forms of the
AND/OR problem are equivalent. In this paper, we show
that Charles needs to buy between $\Omega(k \log k / \log\log k)$
and $O(k^{\log_2 3})$ gates in order to build a reliable AND gate
(see footnote 2).
Resolving this gap is one of the central open questions in the
paper. (We also describe a simple $\Omega(k \log k)$ lower bound if
Charles decides to build a leveled circuit. Several researchers
have communicated to us a method for extending this
bound to nonleveled circuits, but in each case, the proposed
solution has a subtle flaw.)
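Two pointwise facts make the equivalence of the two forms plausible, and they can be checked by enumeration. The sketch below is not the paper's argument; it only verifies, with helper names of our own choosing, that an OR gate always outputs one of its inputs, and that each of the two possible short circuits of an AND gate lies pointwise between AND and OR.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))

def and_gate(a, b):
    return a & b

def or_gate(a, b):
    return a | b

def short_left(a, b):
    """AND gate whose output is short-circuited to its first input."""
    return a

def short_right(a, b):
    """AND gate whose output is short-circuited to its second input."""
    return b

def or_outputs_an_input():
    """An OR gate used in place of an AND gate still outputs one of its inputs."""
    return all(or_gate(a, b) in (a, b) for a, b in INPUTS)

def shorts_lie_between_and_and_or():
    """Every short-circuited AND is pointwise at least AND and at most OR."""
    return all(and_gate(a, b) <= g(a, b) <= or_gate(a, b)
               for g in (short_left, short_right)
               for a, b in INPUTS)

print(or_outputs_an_input(), shorts_lie_between_and_and_or())
```

Since circuits built only from AND gates are monotone, the second fact suggests that an OR gate is the most damaging behavior a short-circuited AND gate can exhibit on any given input.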
Although the ANDOR problem is phrased as a ‘‘toy’’
problem (after all, Charles does not go to the Gate Store to
buy AND gates), it may be central to understanding fault
tolerance in Boolean circuits. For example, we will show in
the paper that if we need at least g(k) gates for the ANDOR
problem, then we will also need at least g(k) gates to build
any k-fault-tolerant circuit for any input-sensitive function
using any kind of gate (i.e., any function different from a
constant function and a function that always outputs one of
its inputs). For example, if we want to build a parity circuit
out of AND, OR, NOT, PARITY, and MAJORITY gates
that will tolerate k worst-case short-circuit faults, then our
result means that the circuit will need to have at least
0(k(log klog log k)) gates.
We also consider the case of random failures. Typically,
the bounds on circuit size when
(i) each gate fails independently with probability at
most $\rho$, and
(ii) the overall circuit is required to function correctly
with probability greater than $1-\epsilon$
can be found by setting $k = \log_\rho \epsilon$ in the bounds for up
to k worst-case faults. For example, the bounds for the
probabilistic version of the AND/OR problem are
$\Omega(\log_\rho \epsilon \cdot \log\log_\rho \epsilon / \log\log\log_\rho \epsilon)$
and $O((\log_\rho \epsilon)^{\log_2 3} + 1)$.
We will derive a variety of bounds for various cases,
some of which are analogous to the existing bounds for
von Neumann faults. The results of the paper can be
summarized as follows: Theorem 4.1, Theorem 5.1,
Theorem 5.2, and Corollary 5.1. (Note that our results for
the AND/OR problem are contained as special cases in these
1 This comment applies to most descriptions of the von Neumann fault
model, but it does not apply to that of [10], which contains the most recent
and perhaps the best exposition of the von Neumann fault model.
2 All logarithms in the paper are taken base 2 unless specified otherwise.
theorems.) Theorem 4.1 is similar to an upper bound of
Pippenger [8], which is currently the best upper bound for
computing an arbitrary function under the von Neumann
faults. The last equation in Theorem 5.1 is similar to a lower
bound of Gács and Gál [6], which is currently the best
lower bound for the von Neumann faults. We would like to
mention that our criterion for testing the performance of a
circuit in the presence of faults is slightly different from that
traditionally used in the literature, and we believe that our
criterion is better. Details are given in the next section.
The remainder of the paper is organized into sections as
follows. We begin in Section 2 with some definitions. In
Section 3, we study the problem of building reliable AND,
OR, and NOT gates, and we provide our partial solution to
the AND/OR problem. In Section 4, we use the constructions
from Section 3 and adapt a theorem of Pippenger
to construct efficient fault-tolerant circuits for any function.
In Section 5, we present two general lower bounds,
Theorem 5.1 and Corollary 5.1, on the size of fault-tolerant
circuits, the latter of which is established by a reduction
from lower bounding the size of a fault-tolerant circuit for
any nontrivial function to lower bounds for the AND/OR
problem (see Theorem 5.2).
2. DEFINITIONS
As defined in the Introduction, a faulty gate in the
short-circuit model always outputs one of its inputs.
Throughout the paper, we will focus on Boolean functions
with a single output, although many of the results can be
extended to functions with multiple outputs. Typically, the
output of a gate in a Boolean circuit may be input to many
other gates. Equivalently, we may imagine that a gate may
have more than one output, but all these outputs must be
identical even when the gate is faulty. Similar assumptions
have been made in the study of the von Neumann fault model.
Another way of viewing such assumptions is as follows:
each gate is assumed to contain a single output that can be
copied as many times as needed, but no fault may occur
during the copy-making procedure. The rationale of this
assumption is that only gates, which contain logic elements, can be
faulty, whereas copy-making can be realized by simpler hardware,
which may be assumed to be fault-free. In the von Neumann
fault model, a circuit without wire failures can be transformed
into a circuit of similar behavior that suffers wire failures [3].
However, this is not true in our short-circuit model. For
example, if wire failures are allowed, then it is not possible
to construct a worst-case fault-tolerant circuit since no
mechanism can prevent the output wire from failing.
The size of a circuit is defined to be the number of gates
in the circuit. For any Boolean function f, we use $S_f$ to
denote the size of the smallest fault-free circuit that computes
f with gates from {AND, OR, NOT}. Each AND/OR
gate in the paper has two inputs unless specified otherwise.
An input instance is simply a vector consisting of all values
input to a given circuit. Given an arbitrary circuit, a fault
pattern consists of all the information about which gates in
the circuit are faulty and how they are faulty (i.e., which
input of a faulty gate is short-circuited to the output). In
other words, a fault pattern completely determines the
behaviors of all gates in a circuit, and thus the behavior of
the entire circuit on all input instances.
For worst-case faults, we use k to denote the total number
of faults allowed in a circuit. A circuit C is defined to be
k-fault-tolerant for a function f if C computes f correctly on
all input instances even when C contains k or fewer faulty
gates. We use $S_f(k)$ to denote the size of the smallest
k-fault-tolerant circuit that computes f, with gates from {AND,
OR, NOT} unless specified otherwise.
For random faults, there are two basic models for the dis-
tribution of gate failures: (i) we can assume that each gate is
independently faulty with fixed probability $\rho$ or (ii) we can
assume that each gate is independently faulty with variable
probability upper bounded by a fixed $\rho$. For ease of
reference, we will refer to these two models as the fixed-
probability and variable-probability models of gate failures,
respectively. As pointed out by Pippenger [8], any circuit
that reliably computes a function f in the variable-probabil-
ity model must reliably compute f in the fixed-probability
model. Therefore, to obtain the strongest result, we will
prove all our upper bounds in the variable-probability
model and prove all our lower bounds in the fixed-probabil-
ity model. Consequently, we will not distinguish between
the two distribution models of gate failures in most
statements of our results.
We use $\rho$ to denote either the failure probability of each
gate (as in the fixed-probability model of gate failures) or an
upper bound on the failure probability of each gate (as in
the variable-probability model). As is typical in the study of
fault-tolerant circuits, if a gate can fail either with probability
larger than 1/2 or with probability smaller than 1/2, then we
will not even know if we should take majority or minority
of the outputs. As an extreme case, when the failure probability
is exactly 1/2, the output being produced is nothing but
a random signal and no meaningful calculation can ever be
carried out. Therefore, as is standard in the study of random
faults, we will assume throughout the paper that $\rho$ is at most
a constant strictly less than 1/2. When a gate G is faulty, we
further assume that an adversary can choose how to short-circuit
the inputs and output(s). We define a circuit C to be
$(\rho, \epsilon)$-fault-tolerant for a function f if a randomly faulty
version of C computes f correctly on all input instances with
probability greater than $1-\epsilon$. We use $S_f(\rho, \epsilon)$ to denote the
size of the smallest $(\rho, \epsilon)$-fault-tolerant circuit that computes
f, with gates from {AND, OR, NOT} unless specified
otherwise.
It is important to note that the probabilistic model of cir-
cuit performance that we use in this paper is somewhat
different from the traditional model used in papers on the
von Neumann fault model. In our probabilistic model of circuit
performance, we say that a circuit works with probability
greater than $1-\epsilon$ if the circuit computes correctly on all
input instances with probability greater than $1-\epsilon$. In the
traditional model, however, a circuit is said to work with
probability greater than $1-\epsilon$ if it does so on each individual input
instance. Another way of viewing the difference between the
two models is as follows: (i) In the traditional model, a test-
ing input instance is selected before the fault pattern is
chosen at random, and the faulty version of the circuit
determined by the fault pattern is considered to be ‘‘good’’
if it computes correctly on this particular testing input
instance (which may be selected by an adversary, who has
no control on the random choice of the fault pattern) even
if this faulty version of the circuit may fail on some other
input instance. (ii) In our model, the fault pattern is ran-
domly chosen at first, and the corresponding faulty version
of the circuit is considered to be ‘‘good’’ only if it computes
correctly on all possible input instances.
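The difference between the two performance criteria already shows up for a single AND gate. The following sketch is an illustration only and makes a simplifying assumption that the paper does not make: a faulty gate is short-circuited to its first or second input with equal probability (the paper lets an adversary choose). All names and the value $\rho = 1/10$ are ours.

```python
from fractions import Fraction
from itertools import product

INPUTS = list(product((0, 1), repeat=2))

def gate_output(pattern, a, b):
    """Single AND gate under a fault pattern: 'ok', 'left' or 'right'."""
    if pattern == "ok":
        return a & b
    return a if pattern == "left" else b

def pattern_distribution(rho):
    """ASSUMPTION: a faulty gate picks the shorted input uniformly at random."""
    return {"ok": 1 - rho, "left": rho / 2, "right": rho / 2}

def traditional_success(rho):
    """Worst input instance, chosen before the fault pattern is drawn."""
    dist = pattern_distribution(rho)
    return min(sum(p for pat, p in dist.items()
                   if gate_output(pat, a, b) == (a & b))
               for a, b in INPUTS)

def all_inputs_success(rho):
    """Probability that the drawn fault pattern is correct on every input."""
    dist = pattern_distribution(rho)
    return sum(p for pat, p in dist.items()
               if all(gate_output(pat, a, b) == (a & b) for a, b in INPUTS))

rho = Fraction(1, 10)
print(traditional_success(rho), all_inputs_success(rho))
```

Under this assumption the traditional criterion credits the gate with success probability $1 - \rho/2$, while the all-inputs criterion gives only $1 - \rho$, because every faulty version of the gate is wrong on some input.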
Clearly, for upper bound results, our new model of circuit
performance is stronger than the traditional model.
Moreover, it is worth noting that all our lower bounds,
except for the 0(s log1\ s) lower bound in Theorem 5.1 (see
the paragraph before the theorem for the definition of s),
hold even in the traditional model. We prefer to use our
model of circuit performance testing since a circuit used in
practice is desired to correctly compute a function on all
input instances.
It should be noted that some of the upper bounds in the
literature are readily adaptable to the stronger model of per-
formance testing [8] and, indeed, some of our upper bound
proofs are heavily based upon the techniques developed for
the weaker model of performance testing. However, the
point here is that, to the best of our knowledge, such a sub-
tle distinction in performance testing was never formally
formulated. This may be explained by the following obser-
vation: although the stronger model of performance testing
is fairly natural in our short-circuit fault model, it appears
quite unnatural in the von Neumann fault model. That is, it
is unlikely that all faulty gates in a circuit can consistently
compute wrong outputs on all possible inputs.
Finally, we remark that most of our theorems contain
two parts: one for worst-case faults and the other for ran-
dom faults. The part for random faults is typically similar to
the part for worst-case faults, with k in the case of worst-
case faults replaced by $\log_\rho \epsilon$ in the case of random faults.
This is not a coincidence, and some explanation is given in
the following simple claim.
Claim 2.1. If a circuit C is $(\rho, \epsilon)$-fault-tolerant in the
variable-probability model of gate failures, then C is also
$(\log_\rho \epsilon)$-fault-tolerant.
Proof. If C is not $(\log_\rho \epsilon)$-fault-tolerant, then there
exists a set A of at most $\log_\rho \epsilon$ gates, the failure (in a certain
pattern) of which will result in the failure of C. Then, if we
set each of the gates in A to fail with probability $\rho$ and set
each of the gates not in A to fail with probability 0, then C
will clearly fail with probability at least
$\rho^{\log_\rho \epsilon} = \epsilon$. (Recall
that once a gate is set to be faulty, its behavior is determined
by an adversary.) This contradicts our assumption that C is
$(\rho, \epsilon)$-fault-tolerant. ∎
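The arithmetic behind Claim 2.1 can be checked numerically. The sketch below uses sample values of our own choosing ($\rho = 0.1$, $\epsilon = 10^{-3}$) and hypothetical helper names; it confirms that a set of at most $\log_\rho \epsilon$ gates, each failing independently with probability $\rho$, fails simultaneously with probability at least $\epsilon$.

```python
import math

def log_base(rho, eps):
    """log_rho(eps), the number of worst-case faults matched to (rho, eps)."""
    return math.log(eps) / math.log(rho)

def all_fail_probability(rho, set_size):
    """Probability that every gate in a set of the given size fails."""
    return rho ** set_size

rho, eps = 0.1, 1e-3
k = log_base(rho, eps)
print(f"log_rho(eps) = {k:.6f}")
for size in range(0, round(k) + 1):
    # Small relative slack guards against floating-point rounding.
    ok = all_fail_probability(rho, size) >= eps * (1 - 1e-9)
    print(size, all_fail_probability(rho, size), ok)
```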
As a consequence of Claim 2.1, most of the upper bounds
in the paper only need to be proved for random faults, in the
variable-probability model. However, it is usually easier to
present and understand a proof for worst-case faults, and
the corresponding proof for random faults is often similar
but slightly more complicated. Hence, we sometimes give a
proof for worst-case faults first and then extend it to random
faults.
As another consequence of Claim 2.1, a lower bound on
worst-case faults implies a corresponding lower bound
for random faults under the variable-probability model.
However, since we are interested in proving lower bounds
for random faults under the fixed-probability model, we
need to be careful when extending our worst-case-fault
lower-bound proofs to random faults. (Note that in the
proof of Claim 2.1, we need to set each of the gates not in A
to always work, which is not possible in the fixed-probabil-
ity model.)
3. CONSTRUCTING RELIABLE GATES FROM
UNRELIABLE GATES
In this section, we present upper and lower bounds for
constructing reliable AND, OR, and NOT gates from
unreliable gates. The lower bound in Theorem 3.2 is poten-
tially the most important result in the paper since it will be
extended in Section 5 into a general lower bound for any
nontrivial function (see Corollary 5.1). As a corollary of the
upper bound result, we show that for any given function f,
there exists a worst-case-fault-tolerant circuit that computes
f and there exists a random-fault-tolerant circuit that com-
putes f with arbitrarily high success probability.
3.1. Upper Bounds
Theorem 3.1. For f = AND, OR, or NOT,
$S_f(k) = O(k^{\log_2 3})$ and
$S_f(\rho, \epsilon) = O((\log_\rho \epsilon)^{\log_2 3} + 1)$.
Proof. We will first prove the theorem for worst-case
faults, and then extend the proof to random faults.
For worst-case faults, we first prove $S_{AND}(k) = O(k^{\log_2 3})$
by an inductive construction of a k-fault-tolerant circuit for
AND. Without loss of generality, we assume that k+1 is an
integral power of 2, in which case, our construction has size
exactly $(k+1)^{\log_2 3}$.
Base case. k=1. Our circuit contains three AND gates
arranged in two levels as in Fig. 1. At the first level, two
AND gates are applied to both inputs. The outputs of these
two gates are input to a gate at the second level whose out-
put serves as the output of the whole circuit. To show that
the circuit can tolerate any single fault, we only need to
show that the circuit outputs zero when at least one of the
inputs is zero. (If the inputs are both ones, then the restric-
tion on the fault forces the circuit to output the correct value
one.) The single fault can be either at the second level or at
the first level. In the former case, since the two inputs to the
last gate (the one at the second level) are both zeros, the
final output of the circuit must be zero. In the latter case,
since one of the two gates at the first level is correct, one of
the two inputs to the last gate is zero. Hence, the output of
the circuit must be zero since the last gate must be correct.
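The base case is small enough to verify exhaustively. The following sketch, with a gate numbering and fault encoding of our own choosing, enumerates every pattern with at most one short-circuited gate and every input pair for the three-gate circuit described above.

```python
from itertools import product

def gate(a, b, fault):
    """Two-input AND; fault is None (working), 0 (output = a) or 1 (output = b)."""
    if fault is None:
        return a & b
    return (a, b)[fault]

def base_circuit(x, y, faults):
    """Two first-level AND gates on (x, y) feeding one second-level AND gate."""
    g1 = gate(x, y, faults[0])
    g2 = gate(x, y, faults[1])
    return gate(g1, g2, faults[2])

def single_fault_patterns():
    """The fault-free pattern plus all patterns with exactly one faulty gate."""
    yield (None, None, None)
    for position in range(3):
        for which in (0, 1):
            pattern = [None, None, None]
            pattern[position] = which
            yield tuple(pattern)

def is_one_fault_tolerant():
    return all(base_circuit(x, y, f) == (x & y)
               for f in single_fault_patterns()
               for x, y in product((0, 1), repeat=2))

print(is_one_fault_tolerant())
# Two faults can break it: both first-level gates shorted to x, inputs (1, 0).
print(base_circuit(1, 0, (0, 0, None)))
```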
Inductive step. In the construction given for the base
case, replace each of the three gates with a
$((k+1)/2 - 1)$-fault-tolerant circuit for AND. It is easy to show
that the resulting circuit is k-fault-tolerant by using the argument
for the base case, since at most one of the three subcircuits can
contain more than $(k+1)/2 - 1$ faults.
The size of the circuit thus constructed is determined by
the following recurrence with the boundary condition
$S_f(1) = 3$,
$$S_f(k) = 3\, S_f\left(\frac{k-1}{2}\right)$$
for $k = 2^i - 1$ with $i \ge 2$ being an integer. Solving the
recurrence, we get $S_f(k) = (k+1)^{\log_2 3}$ for k+1 being an
integral power of 2. For an arbitrary k such that k+1 is not an
integral power of 2,
$S_f(k) \le S_f(2^{\lfloor \log k \rfloor + 1} - 1) = (2^{\lfloor \log k \rfloor + 1})^{\log_2 3} \le 3 \cdot k^{\log_2 3}$.
This proves $S_{AND}(k) = O(k^{\log_2 3})$. The equality
$S_{OR}(k) = O(k^{\log_2 3})$ can
be established in the same fashion.
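The recursive construction and its size can be sketched in a few lines. The code below is our own encoding, not the paper's: `levels` counts the rounds of three-for-one substitution, so that the circuit has $3^{levels}$ gates and is meant to tolerate $k = 2^{levels} - 1$ faults. The brute-force check is practical only for very small circuits.

```python
from itertools import combinations, product

def circuit_size(levels):
    """Number of AND gates after `levels` rounds of three-for-one substitution."""
    return 3 ** levels

def evaluate(levels, x, y, faults, offset=0):
    """Evaluate the recursive circuit; faults maps gate index -> shorted input (0 or 1)."""
    if levels == 0:
        f = faults.get(offset)
        return (x & y) if f is None else (x, y)[f]
    sub = circuit_size(levels - 1)
    g1 = evaluate(levels - 1, x, y, faults, offset)
    g2 = evaluate(levels - 1, x, y, faults, offset + sub)
    return evaluate(levels - 1, g1, g2, faults, offset + 2 * sub)

def tolerates(levels, k):
    """Exhaustively test every pattern of at most k short-circuited gates."""
    n = circuit_size(levels)
    for count in range(k + 1):
        for gates in combinations(range(n), count):
            for shorts in product((0, 1), repeat=count):
                faults = dict(zip(gates, shorts))
                for x, y in product((0, 1), repeat=2):
                    if evaluate(levels, x, y, faults) != (x & y):
                        return False
    return True

for levels in (1, 2):
    k = 2 ** levels - 1
    print(levels, circuit_size(levels), tolerates(levels, k), tolerates(levels, k + 1))
```

For `levels = 2` the circuit has 9 gates, which equals $(k+1)^{\log_2 3}$ with $k = 3$.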
FIG. 1. A 1-fault-tolerant circuit for the AND function of two variables.
To construct a k-fault-tolerant circuit for NOT is slightly
more complicated, and we will make use of our
worst-case-fault-tolerant circuits for AND and OR. For any constant c,
let MAJ(c) be the majority function of c variables. Given the
established results for AND and OR and the fact that
MAJ(c) can be computed using only AND and OR gates,
we obtain
$$S_{MAJ(c)}(k) = O(k^{\log_2 3}) \qquad (1)$$
for any constant c, where the constant behind the O-notation
is dependent on c. We next inductively construct a
k-fault-tolerant circuit for NOT.
Base case. k2. We first apply five NOT gates to the
input. Then, by equality (1), we can use a constant-size
2-fault-tolerant circuit for MAJ(5) to compute the majority
of the five values output from the NOT gates. Since the cir-
cuit may contain at most two faults, the correct negation of
the original input should be output from at least three of the
five NOT gates. Finally, this correct negation must be out-
put from the 2-fault-tolerant circuit for MAJ(5). The entire
circuit has constant size, establishing
SNOT(k)=O(1) for k2. (2)
Inductive step. We first apply five $\lceil k/3 \rceil$-fault-tolerant
circuits for NOT to the input. When there are at most k
faults, at least three of the five outputs from the five circuits
will contain the correct negation of the original input. Then,
we apply a k-fault-tolerant circuit for MAJ(5), which has
size of $O(k^{\log_2 3})$ according to equality (1). The size of the
k-fault-tolerant circuit for NOT thus constructed is determined
by the following recurrence with boundary condition
given in Eq. (2):
$$S_{NOT}(k) = 5\, S_{NOT}(\lceil k/3 \rceil) + O(k^{\log_2 3}).$$
Since $\log_3 5 < \log_2 3$, this recurrence yields
$S_{NOT}(k) = O(k^{\log_2 3})$, which completes the proof for
$S_{NOT}(k)$.
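A numerical look at the recurrence for NOT supports the claimed growth rate. In the sketch below the hidden constants are assumptions of ours: the $O(k^{\log_2 3})$ term is taken to be exactly $k^{\log_2 3}$ and the base case is taken to cost 1. With these choices the ratio of the solution to $k^{\log_2 3}$ stays bounded, as expected when $\log_3 5 < \log_2 3$.

```python
import math
from functools import lru_cache

ALPHA = math.log2(3)

@lru_cache(maxsize=None)
def not_size(k):
    """T(k) = 5 T(ceil(k/3)) + k**log2(3), with T(k) = 1 for k <= 2 (assumed constants)."""
    if k <= 2:
        return 1.0
    return 5 * not_size(-(-k // 3)) + k ** ALPHA

def ratio(k):
    """Solution of the recurrence divided by k**log2(3)."""
    return not_size(k) / k ** ALPHA

print(math.log(5, 3), ALPHA)
for k in (3, 10, 100, 1000, 10000):
    print(k, round(ratio(k), 3))
```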
We have thus proved the theorem for worst-case faults.
Next, we extend the proofs to the case of random faults.
Note that we have included the "+1" term in the formula
for random faults so that it holds when $\log_\rho \epsilon$ goes to 0. To
establish the formula, however, we only need to prove it for
$$\epsilon \le \rho \qquad (3)$$
since otherwise we can simply use a single gate as our
circuit.
For the AND/OR function, we will use our previously
described k-fault-tolerant circuit for AND/OR with some
$k = O(\log_\rho \epsilon)$. A proof that such a circuit is indeed
$(\rho, \epsilon)$-fault-tolerant is given below.
Let $\epsilon_1 = \rho$, and for $i = 2^l$, $l \ge 1$, let $\epsilon_i$ be an upper
bound on the failure probability of the $(i-1)$-fault-tolerant circuit
for AND/OR that is constructed above. It is important to note
that we are working with a variable failure probability
model and, hence, $\epsilon_i$ is only an upper bound on the failure
probability, not the probability itself. We next show that
$$\epsilon_{2i} \le 3\epsilon_i^2 - 2\epsilon_i^3. \qquad (4)$$
Recall that we have three $(i-1)$-fault-tolerant circuits in
our recursive construction of the $(2i-1)$-fault-tolerant circuit.
Let $\rho_{i1}, \rho_{i2}, \rho_{i3}$ denote the true failure probabilities
(not just an upper bound) of the three involved
$(i-1)$-fault-tolerant circuits. Again, we emphasize that $\rho_{ij}$ may be
different for different j, since we are working with a variable
probability model. Moreover, $\rho_{ij}$ cannot be solely determined
by $\rho$ since we allow various failure probabilities of a
gate as long as the probabilities are upper bounded by $\rho$.
Nevertheless, $\rho_{ij}$ is completely determined by the true failure
probability of each gate in the circuit, and the existence of
$\rho_{ij}$ is enough for us to establish inequality (4). Clearly,
$$\epsilon_{2i} \le \rho_{i1}\rho_{i2}(1-\rho_{i3}) + \rho_{i2}\rho_{i3}(1-\rho_{i1}) + \rho_{i3}\rho_{i1}(1-\rho_{i2}) + \rho_{i1}\rho_{i2}\rho_{i3}. \qquad (5)$$
The derivative with respect to $\rho_{i3}$ of the right-hand side of
inequality (5) is a nonnegative term
$\rho_{i1}(1-\rho_{i2}) + \rho_{i2}(1-\rho_{i1})$. Hence, the right-hand side of
inequality (5) is nondecreasing in $\rho_{i3}$. Similarly, it is
nondecreasing in $\rho_{i2}$ and $\rho_{i1}$. Thus, inequality (4) follows
from the fact $\rho_{ij} \le \epsilon_i$ for all j.
Straightforward calculation shows that
$$3\epsilon^2 - 2\epsilon^3 \le \epsilon \quad \text{for all } \epsilon \le 1/2.$$
This, together with inequality (4) and $\epsilon_1 = \rho < 1/2$, implies
$$\epsilon_{2i} \le \epsilon_i \quad \text{for all } i. \qquad (6)$$
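Since $\rho$ is assumed to be at most a constant strictly less
than 1/2, we know that $\epsilon_1 = \rho \le 1/2 - \delta'$ for some fixed
$\delta' > 0$. Without loss of generality, we may assume $\delta' \le 1/4$
(otherwise, replace $\delta'$ with 1/4). Now, if
$1/4 \le \epsilon_i = 1/2 - \delta$ for any $\delta$, then by inequality (6)
and the definition of $\delta'$ we have $\delta' \le \delta \le 1/4$. Hence,
by inequality (4), it is easy to check that
$$\epsilon_{2i} \le 3(1/2-\delta)^2 - 2(1/2-\delta)^3 = 1/2 - (3/2)\delta + 2\delta^3 \le 1/2 - (11/8)\delta \le \epsilon_i - (3/8)\delta'.$$
The preceding analysis means that
$$\epsilon_{i_0} \le \min\{1/4, \rho\}, \qquad (7)$$
where $i_0 = 1$ if $\rho < 1/4$, or otherwise $i_0 = 2^{t_0}$ is a constant
depending on $\delta'$. Let $h_i = 3\epsilon_i$. Then, by inequalities (4) and
(7),
$$h_{2i} \le h_i^2; \qquad h_{i_0} \le \min\{3/4, 3\rho\}.$$
This recurrence yields
$$3\epsilon_{2^t} = h_{2^t} \le h_{2^{t_0}}^{2^{t-t_0}} \le (\min\{3/4, 3\rho\})^{2^{t-t_0}}.$$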
To achieve success probability greater than $1-\epsilon$, we
need $\epsilon_{2^t} < \epsilon$, which can be guaranteed by letting
$(\min\{3/4, 3\rho\})^{2^{t-t_0}} < 3\epsilon$. The last inequality holds if
$$2^t > 2^{t_0} \cdot \frac{\log_\rho 3\epsilon}{\log_\rho(\min\{3/4, 3\rho\})}. \qquad (8)$$
On the other hand, by taking the derivative with respect to
$\epsilon$ and by using inequality (3), it is easy to check that
$$c \log_\rho \epsilon \ge \frac{\log_\rho 3\epsilon}{\log_\rho(\min\{3/4, 3\rho\})} \qquad (9)$$
for some sufficiently large constant c. By inequalities (8) and
(9), for some $i = 2^t = \Theta(\log_\rho \epsilon)$, the $(i-1)$-fault-tolerant
AND or OR circuit that we have constructed is indeed
$(\rho, \epsilon)$-fault-tolerant. By our result for worst-case faults,
the size of the $(i-1)$-fault-tolerant circuit is
$O(i^{\log_2 3}) = O((\log_\rho \epsilon)^{\log_2 3})$.
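The number of recursion levels needed for a given target can also be found by simply iterating the bound of inequality (4). The sketch below is illustrative only: it treats (4) as an equality, which gives an upper bound on the failure probability at each level, and the sample values $\rho = 0.3$ and $\epsilon = 10^{-6}$ are ours.

```python
import math

def levels_needed(rho, eps):
    """Smallest t such that t rounds of e -> 3e^2 - 2e^3 from e = rho give e < eps.

    Assumes 0 < rho < 1/2 and eps > 0. Returns (t, final bound).
    """
    e, t = rho, 0
    while e >= eps:
        e = 3 * e**2 - 2 * e**3
        t += 1
    return t, e

rho, eps = 0.3, 1e-6
t, e = levels_needed(rho, eps)
print(f"t = {t}, i = 2^t = {2 ** t}, gates = 3^t = {3 ** t}")
print(f"bound on failure probability = {e:.3e}")
print(f"log_rho(eps) = {math.log(eps) / math.log(rho):.3f}")
```

The circuit obtained after t levels is the $(2^t - 1)$-fault-tolerant circuit of the worst-case construction, so its size is $3^t$.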
We now construct a $(\rho, \epsilon)$-fault-tolerant circuit for the
NOT function. Let $\epsilon_0$ be a sufficiently small constant such that
$$\epsilon_0^{0.1/2.9} < 1/20. \qquad (10)$$
Note this will ensure
$$\binom{5}{3} (\epsilon^{1/2.9})^3 < \epsilon/2 \quad \text{for all } \epsilon < \epsilon_0. \qquad (11)$$
If $\epsilon \ge \epsilon_0$, then we construct the desired circuit as follows:
(i) Take c negations of the input, where c is a constant depending
on $\rho$ and $\epsilon_0$; (ii) Apply a constant size
$(\rho, \epsilon_0/2)$-fault-tolerant majority circuit to these negations,
which can be constructed by using the random-fault-tolerant circuits for
AND/OR that we have just described. By a standard
application of the Chernoff bound [1, Theorem A.4, p. 235]
and our assumption that $\rho$ is at most a constant strictly less
than 1/2, we can easily see that when c is sufficiently large,
with probability greater than $1 - \epsilon_0/2$, most of the outputs
from the NOT gates applied to the original input are the
correct negation of the input. Hence, the whole circuit is
$(\rho, \epsilon_0)$-fault-tolerant for the NOT function, and so we have
$$S(\rho, \epsilon_0) = O(1). \qquad (12)$$
If $\epsilon < \epsilon_0$, then we adapt the construction for worst-case
faults as follows. We first apply five
$(\rho, \epsilon^{1/2.9})$-fault-tolerant circuits for NOT. Then, we use a
$(\rho, \epsilon/2)$-fault-tolerant circuit for MAJ(5), which can be built
from the random-fault-tolerant AND/OR circuits that we have described.
The size of the whole circuit is determined by the recurrence
$$S(\rho, \epsilon) = 5\, S(\rho, \epsilon^{1/2.9}) + O((\log_\rho \epsilon)^{\log_2 3}),$$
with boundary condition given in Eq. (12). This recurrence has a solution of $O((\log_\rho \epsilon)^{\log_2 3})$ since $\log_{2.9} 5 < \log_2 3$. The circuit thus constructed fails to work only if (i) at least three of the five $(\rho, \epsilon^{1/2.9})$-fault-tolerant NOT circuits fail or (ii) the $(\rho, \epsilon/2)$-fault-tolerant circuit for MAJ(5) fails. Event (i) happens with probability less than $\binom{5}{3}(\epsilon^{1/2.9})^3 < \epsilon/2$ (by inequality (11) and our assumption $\epsilon < \epsilon_0$), and event (ii) happens with probability less than $\epsilon/2$. Thus, the whole circuit fails to work with probability less than $\epsilon$. ∎
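The arithmetic behind the NOT construction can be spot-checked numerically. The following is a minimal sketch, not part of the original proof: the concrete value `EPS0` and the function names are assumptions chosen for illustration. It checks that a constant satisfying inequality (10) makes the union bound from the proof fall below $\epsilon$, and that the exponent comparison used to solve the recurrence holds.

```python
import math

# Assumed concrete choice: inequality (10) requires eps0**(0.1/2.9) < 1/20,
# i.e. eps0 < 20**(-29). Any smaller positive constant would also do.
EPS0 = 0.5 * 20.0 ** (-29)

def not_failure_bound(eps):
    """Union bound from the proof: C(5,3) * (eps^(1/2.9))^3 + eps/2."""
    return math.comb(5, 3) * (eps ** (1 / 2.9)) ** 3 + eps / 2

def recurrence_exponents():
    """Return (log_2.9 5, log_2 3); the recurrence solves to the second
    exponent because the first one is strictly smaller."""
    return math.log(5, 2.9), math.log(3, 2)
```

For any `eps` below `EPS0`, `not_failure_bound(eps)` should come out below `eps`, which mirrors the final step of the proof.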
We would like to point out that circuits (with either worst-case or random faults) for NOT as discussed in the above theorem can also be constructed using the so-called $r$-MAJ circuits (here and in the rest of this paragraph, $r = 1/65$) described in Lemma 4.3, whose correctness is independent of Theorem 3.1. We first use sufficiently many ($ck$ for worst-case faults and $c \log_\rho \epsilon$ for random faults, where $c$ is a sufficiently large constant) NOT gates. Then, we use a $(\rho, \epsilon^c)$-fault-tolerant circuit described in Lemma 4.3 to compute the $r$-MAJ function of the values output from the NOT gates. It is straightforward to see by Lemma 4.3 that such circuits have the desired sizes. For worst-case faults, the circuit is clearly $k$-fault-tolerant. For random faults, we need to be careful with the probabilistic analysis. Without loss of generality, we assume
$$\epsilon \le \rho < 1/2, \qquad (13)$$
since otherwise we can use a single NOT gate. If $\rho \le (r/6)^2$, then Lemma 4.1 (whose correctness is independent of Theorem 3.1) implies that with probability at least $1 - (6\rho/r)^{(r/2) c \log_\rho \epsilon} \ge 1 - \rho^{(r/4) c \log_\rho \epsilon} > 1 - \epsilon/2$, where we obtain the second inequality by inequality (13) and by assuming $c > 8/r$, at most a $\rho + r/2 \le r$ fraction of the values produced by the NOT gates at the first level will be the incorrect negation of the original input. In addition, by inequality (13), the $(\rho, \epsilon^c)$-fault-tolerant circuit for $r$-MAJ fails with probability less than $\epsilon/2$. Thus, the whole circuit will compute NOT correctly with probability greater than $1 - \epsilon$. However, if $\rho$ is close to $1/2$ or even if $\rho$ is very close to $r$, it is not true that with high probability, at most an $r$ fraction of the values produced by the NOT gates at the first level will be the incorrect negation of the original input. To overcome this problem, we first construct a constant size $(\rho, \rho_0)$-fault-tolerant circuit for NOT where $\rho_0 = (r/6)^2$, and then we use it as a basic component, as a NOT gate of failure probability at most $\rho_0$, in the overall circuit. By the fact that $\rho$ is at most a constant strictly less than $1/2$ and by using the Chernoff bound [1, Theorem A.4, p. 235], it is easy to see that the desired $(\rho, \rho_0)$-fault-tolerant circuit for NOT can be constructed by taking the majority (as opposed to $r$-majority) of a large (constant) number of negations of the original input, with the circuits for AND/OR that we have described.
The simple recursive nature of the constructions used in
the proof of Theorem 3.1 seems to suggest that a stronger
upper bound might be attainable by a similar approach with
a larger base case. Unfortunately, we have not been able to
find a better base upon which to build a better bound.
Improving this bound is one of the more intriguing ques-
tions left open in the paper. Resolution of this problem will
have many applications. For example, as a direct applica-
tion of Theorem 3.1, we immediately get the following
corollary, which implies that, for any function, it is possible
to construct circuits that are tolerant to worst-case faults as
well as circuits that have arbitrarily high success probability
in the case of random faults. We will show how to improve
upon this result in Section 4.
Corollary 3.1. For any function $f$, $S_f(k) = O(k^{\log_2 3} S_f)$ and $S_f(\rho, \epsilon) = O((\log_\rho(\epsilon/S_f))^{\log_2 3} S_f + S_f)$.
3.2. Lower Bounds
The main result of this section is the following theorem,
which provides a partial answer to the ANDOR problem
described in the Introduction. In Section 5, we will see that
the same lower bounds can be extended to all nontrivial
functions.
Theorem 3.2. For $f$ = AND, OR, or NOT, $S_f(k) = \Omega(k(\log k/\log\log k))$ and $S_f(\rho, \epsilon) = \Omega(\log_\rho \epsilon\,(\log\log_\rho \epsilon/\log\log\log_\rho \epsilon))$.
Remark. In the proof of Theorem 3.2, unlike in most
parts of the paper, we will not assume that the unreliable
gates used in the circuit have bounded fan-in. Hence, the
theorem holds even if gates with unbounded fan-in are
allowed in the circuit.
If we make the plausible assumption that an optimal cir-
cuit for AND or OR is well-leveled (i.e., all directed paths in
the circuit from the output to any input contain the same
number of gates), then we can prove a slightly better lower
bound of $\Omega(k \log k)$ or $\Omega(\log_\rho \epsilon \log\log_\rho \epsilon)$ on the size of
fault-tolerant circuits for AND or OR. In what follows, we
give such an argument for computing AND with worst-case
faults; the argument for OR and the argument for the case
of random faults are similar.
By Theorem 5.2 (whose correctness is independent of
Theorem 3.2) with f =AND and s=2, we only need to con-
sider a monotone circuit. Since an optimal monotone circuit
cannot contain any gate that is connected to only one of the
two inputs by a directed path, every gate in the circuit is
dependent on both of the inputs. We consider the case
where one of the two input values is 0 and the other is 1.
Clearly, the first level (the level closest to the inputs) must
contain more than k gates since otherwise an adversary
could force all gates at this level to output 1. In general, the
i th level must contain more than ki gates since otherwise an
adversary could short-circuit from the one input to all gates
at the i th level by using at most k faults. On the other hand,
the circuit must have more than k levels since otherwise an
adversary could short-circuit from the one input all the way
to the final output by using at most k faults. Summing up
the numbers of gates at all levels, we get a lower bound of
$$k\left(\frac{1}{1} + \frac{1}{2} + \cdots + \frac{1}{k}\right) = \Omega(k \log k).$$
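As a quick numeric illustration of the well-leveled bound, the sketch below sums the per-level counts $k/i$ and can be compared against $k \ln k$. The function name is an assumption made for this example only.

```python
import math

def leveled_lower_bound(k):
    """Sum of k/i over levels i = 1..k, i.e. k times the k-th harmonic number."""
    return sum(k / i for i in range(1, k + 1))
```

Since the harmonic number $H_k$ lies between $\ln k$ and $\ln k + 1$, the returned value lies between $k \ln k$ and $k(\ln k + 1)$, which is the $\Omega(k \log k)$ growth claimed above.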
Unfortunately, there seems to be no simple way to get
around the problem caused by the possibility that an
optimal circuit may not be well-leveled, and no simple
definition of a level seems to enable us to ‘‘wipe out’’ an
entire level with reasonably many faults. Moreover, this
technique does not seem to be applicable to proving lower
bounds for NOT. So we will be forced to use a more com-
plicated argument to prove Theorem 3.2.
Proof of Theorem 3.2. We first prove the lower bound
for worst-case faults. Assume that C is a k-fault-tolerant cir-
cuit for f with the minimum size Sf (k). The distance from a
register (i.e., an input or output wire to a gate) r in C to an
input of C is defined to be the number of gates in the shor-
test (directed) path from r to the input. The level of a register
r is defined to be the minimum of the two distances from r
to the two inputs of the circuit. The level of a gate is defined
to be the level of the output register of the gate. (Recall that
a gate is assumed to have only one output, which may be
copied many times.) For example, the level of a gate that is
directly connected to either input is equal to 1.
For any fixed $i \le k$, we set the behavior of the gates below level $i$ in a way such that every output at level $i$ functions as if it is just a copy of either input, as follows. List all of the gates at level $i$ (in an arbitrary order) as $G_1, G_2, \ldots, G_{n_i}$, where $n_i$ denotes the number of gates at level $i$. First, we take the shortest (directed) path from $G_1$ to either of the inputs and short-circuit all the gates along this path. This causes the output register of $G_1$ to function like a copy of an input. In general, after finishing this procedure for up to $G_{j-1}$, we proceed with $G_j$ as follows. Take the shortest (directed) path from $G_j$ to either of the inputs. From the output side to the input side along this path, we short-circuit each of the gates one by one and stop immediately when we find a gate that has been short-circuited by the procedure on some $G_l$ with $l \le j-1$. The whole procedure costs at most $i n_i$ faults.
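The procedure above can be sketched on a toy circuit. Everything in this example is an illustrative assumption: the circuit `PREDS`, the dictionary representation (each gate mapped to the registers feeding it), and the function names are hypothetical and do not come from the paper. The sketch computes gate levels, short-circuits along shortest paths from each level-$i$ gate while stopping at gates already short-circuited, and counts the faults used.

```python
# Hypothetical toy circuit: gate -> list of predecessors (gates or inputs).
INPUTS = {"x", "y"}
PREDS = {
    "g1": ["x", "y"],
    "g2": ["x", "g1"],
    "g3": ["g1", "g2"],
    "g4": ["g3", "g2"],
    "g5": ["g1", "g4"],
    "out": ["g4", "g5"],
}

def gate_levels(preds, inputs):
    """Level of a gate = number of gates on a shortest path back to an input."""
    levels = {x: 0 for x in inputs}

    def level(node):
        if node not in levels:
            levels[node] = 1 + min(level(p) for p in preds[node])
        return levels[node]

    for g in preds:
        level(g)
    return levels

def short_circuit_level(preds, inputs, i):
    """Short-circuit gates so every level-i gate copies an input.

    Returns (wired, levels), where wired maps each faulty gate to the one
    predecessor whose value it now outputs."""
    levels = gate_levels(preds, inputs)
    wired = {}
    for g in [g for g in preds if levels[g] == i]:
        node = g
        # Walk toward the inputs; stop at an input or an already-faulty gate.
        while node not in inputs and node not in wired:
            nxt = min(preds[node], key=lambda p: levels[p])
            wired[node] = nxt
            node = nxt
    return wired, levels

def source_of(node, wired, inputs):
    """Follow short-circuit links from a wired gate back to the input it copies."""
    while node not in inputs:
        node = wired[node]
    return node
```

On this toy circuit with $i = 2$ there are three level-2 gates, and the walk for `g5` stops as soon as it reaches `g1`, which was already short-circuited while processing `g3`, so the number of faults stays within the $i n_i$ budget.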
Now, every register at level i functions like a copy of an
input. By the definition of a level, the gates at level i form a
cut in circuit C that separates the output from both of the
inputs. (Note that the level of the output must be at least
k+1, which is strictly larger than i, since otherwise we
could short-circuit from an input to the output by using at
most k faults.) So the part of C beyond level i, including the
output of the whole circuit, will compute only based on the
information contained in the registers at level $i$. When $f$ = NOT, the part of $C$ beyond level $i$ must be a $(k - i n_i)$-fault-tolerant circuit for NOT, since all of the registers at level $i$ function like copies of the original input. Hence, for all $i < k$,
$$S_f(k) \ge S_f(k - i n_i) + n_1 + n_2 + \cdots + n_i, \qquad (14)$$
where we assume $S_f(j) = 0$ for any $j < 0$. When $f$ = AND or OR, all the registers at level $i$ can be divided into two sets $X$ and $Y$, the set of registers that function like copies of an input $x$ and the set of registers that function like copies of the other input $y$, respectively. If either $X$ or $Y$ is empty, then the faulty version of $C$ with $i n_i$ or fewer faults cannot compute correctly since it is impossible for $C$ to compute the AND or OR of $x$ and $y$ without knowing both $x$ and $y$. Therefore, $i n_i > k$ and recurrence (14) holds (again, we assume $S_f(j) = 0$ for any $j < 0$). If neither $X$ nor $Y$ is empty, then the part of $C$ beyond level $i$ must be a $(k - i n_i)$-fault-tolerant circuit for $f$. Again, recurrence (14) holds. In sum, recurrence (14) is true for $f$ = AND, OR, or NOT.
In order for recurrence (14) to enforce a solution that is
superlinear in k, we next show that
$$n_1 > k. \qquad (15)$$
We will focus on an arbitrary input x to function f.
A register r is called an x-register if r is simply a direct copy
of input x without passing any gate (i.e., r is connected with
x and is at level 0); r is called a non-x register otherwise.
Since C is the smallest k-fault-tolerant circuit, any AND or
OR gate of C having an x-register as an input must have a
non-x register as another input. (Otherwise, some gate with
all its inputs being an x-register can be replaced by a direct
copy of x, resulting in a smaller circuit of the same
behavior.) Based on this fact, we next construct two faulty
versions of C as follows. First, we disconnect input x from
every AND or OR gate containing an x-input by short-cir-
cuiting the gate with (one of) its non-x input register(s). Let
C1 be the resulting faulty version of C. Starting from C1 , we
construct C2 by short-circuiting every NOT gate whose
(unique) input is an x-register. (C2 is identical to C1 if C does
not have any NOT gate containing an x-input.) Now, the
behavior of C1 with x=0 is the same as that of C2 with x=1.
This means that either C1 or C2 does not compute correctly
since the correct value of $f \in \{\text{AND}, \text{OR}, \text{NOT}\}$ cannot be
obtained without knowing the value of x, at least on some
input instance. On the other hand, C1 and C2 each contains
at most n1 faults. By the assumption that C is k-fault-
tolerant, we conclude that inequality (15) holds.
Given inequalities (14) and (15), we next prove
$$S_f(k) \ge \frac{k \log k}{2 \log\log(8k)} \quad \text{for } k \ge 1. \qquad (16)$$
We first observe that for any real $x \ge 1$,
$$\frac{\log x}{\log\log(8x)} = \frac{\log x}{\log x + 3} \cdot \frac{\log(8x)}{\log\log(8x)}.$$
By taking the derivative, we can easily check that each of the two terms on the right-hand side of the preceding equation is an increasing function of $x \ge 1$. Thus, we immediately have the following.
Claim 3.1. For $x \ge 1$, $\log x/\log\log(8x)$ is an increasing function.
We now prove inequality (16) by induction on $k$. The base case where $k \le 2^6$ can be verified by the fact $S_f(k) \ge k$. In particular, by Claim 3.1, we only need to prove $1 \ge \log(2^6)/(2 \log\log(8 \cdot 2^6))$, which apparently holds. Assuming inequality (16) holds for up to $k-1$, we next prove inequality (16) for $k > 2^6$. Let
$$\alpha = \frac{\log\log k}{\log k}.$$
There are two cases:
Case 1. $n_i \ge i^{\alpha-1} k/\log k$ for all $i$ such that $1 \le i < k$. In this case, we have
$$\begin{aligned}
S_f(k) &\ge \frac{k}{\log k} \sum_{i=1}^{k-1} i^{\alpha-1} && \text{(by recurrence (14))} \\
&\ge \frac{k}{\log k} \int_1^k x^{\alpha-1}\,dx && \text{(since } \alpha - 1 < 0\text{)} \\
&\ge \frac{k}{\log k} \cdot \frac{k^\alpha - 1}{\alpha} \\
&= \frac{k}{\log k} \cdot \frac{\log k - 1}{\log\log k/\log k} && \text{(by the choice of } \alpha\text{)} \\
&\ge \frac{k \log k}{2 \log\log k} && \text{(since } \log k \ge 2\text{)} \\
&\ge \frac{k \log k}{2 \log\log(8k)}.
\end{aligned}$$
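Case 2. There exists $i$ with $1 \le i < k$ such that $n_i < i^{\alpha-1} k/\log k$. Choose the smallest $i$ such that $n_i < i^{\alpha-1} k/\log k$. Then,
$$\begin{aligned}
S_f(k) &\ge S_f(k - i n_i) + \sum_{j=1}^{i} n_j && \text{(by recurrence (14))} \\
&\ge S_f\left(\left\lceil k - \frac{i^\alpha k}{\log k} \right\rceil\right) + k + \frac{k}{\log k} \sum_{j=2}^{i-1} j^{\alpha-1} && \text{(by inequality (15) and the choice of } i\text{)} \\
&\ge S_f(\lceil k(1-\beta) \rceil) + k + \frac{k}{\log k} \int_2^i x^{\alpha-1}\,dx && \left(\text{where } \beta = \frac{i^\alpha}{\log k}\right) \\
&\ge S_f(\lceil k(1-\beta) \rceil) + k + \frac{k}{\log k} \cdot \frac{i^\alpha - 2^\alpha}{\alpha} \\
&\ge S_f(\lceil k(1-\beta) \rceil) + k + \frac{\beta k \log k}{\log\log k} - \frac{2^\alpha k}{\log\log k} && \text{(by the definitions of } \alpha \text{ and } \beta\text{)}.
\end{aligned}$$
Let $f(\alpha, \beta, k)$ denote the right-hand side of the above inequality. To complete the analysis of Case 2, we need to show that
$$f(\alpha, \beta, k) \ge \frac{k \log k}{2 \log\log 8k}. \qquad (17)$$
If $\beta > 1 - 1/k$, then inequality (17) follows from the fact that $S_f(x) \ge 0$ for all $x$ and $2^\alpha \le 2 \le \log\log k$. If $\beta \le 1 - 1/k$, then $k(1-\beta) \ge 1$. Moreover, for $k > 2^6$, $\lceil k(1-\beta) \rceil \le \lceil k(1 - 1/\log k) \rceil \le \lceil k(1 - 1/k) \rceil = k - 1$. Hence, we can apply the induction hypothesis to obtain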
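$$\begin{aligned}
f(\alpha, \beta, k) &\ge \frac{\lceil k(1-\beta) \rceil \log(\lceil k(1-\beta) \rceil)}{2 \log\log(8\lceil k(1-\beta) \rceil)} + k + \frac{\beta k \log k}{\log\log k} - \frac{2^\alpha k}{\log\log k} \\
&\ge \frac{k(1-\beta) \log(k(1-\beta))}{2 \log\log(8k(1-\beta))} + k + \frac{\beta k \log k}{\log\log k} - \frac{2^\alpha k}{\log\log k} && \text{(by Claim 3.1)} \\
&\ge \frac{k(1-\beta) \log k}{2 \log\log(8k)} + \frac{k(1-\beta) \log(1-\beta)}{2 \log\log(8k)} + k + \frac{\beta k \log k}{\log\log k} - \frac{2^\alpha k}{\log\log k} \\
&\qquad \text{(since } \log(k(1-\beta)) \ge 0 \text{ and } 0 \le \log\log(8k(1-\beta)) \le \log\log(8k)\text{)} \\
&\ge \frac{k(1-\beta) \log k}{2 \log\log(8k)} + \frac{\beta k \log k}{\log\log k} + k + \frac{k(1-\beta) \log(1-\beta)}{2 \log\log k} - \frac{2^\alpha k}{\log\log k} && \text{(note } \log(1-\beta) < 0\text{)} \\
&\ge \left(\frac{1-\beta}{2} + \beta\right) \frac{k \log k}{\log\log 8k} + \left(\log\log k - \frac{\log e}{2e} - 2^\alpha\right) \frac{k}{\log\log k} && \left(\text{since } x \log x \ge -\frac{\log e}{e} \text{ for } x > 0\right) \\
&\ge \left(\frac{1-\beta}{2} + \beta\right) \frac{k \log k}{\log\log(8k)} && \text{(since } k > 2^6 \text{ and } \alpha \le 1\text{)} \\
&\ge \frac{k \log k}{2 \log\log(8k)}.
\end{aligned}$$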
This proves inequality (17) and concludes the inductive proof of inequality (16). It is worth noting that the $\Theta(k(\log k/\log\log k))$ bound is tight for recurrence (14) with boundary condition given in inequality (15). A proof of this fact is given in Appendix A. Hence, improvement in the lower bound cannot be obtained through recurrence (14).
We have thus proved the theorem for worst-case faults. We next extend the proof to random faults. We only need to establish: (i) an analogue to inequality (14) with $k$ replaced by $\log_\rho \epsilon$ and $S_f(m)$ interpreted as the minimum size of a $(\rho, \rho^m)$-fault-tolerant circuit for $f$, where $\rho$ is assumed to be a fixed value and does not appear as a parameter of $S_f$ and $S_f(m)$ is assumed to be 0 for $m < 0$; (ii) an analogue to inequality (15) with $k$ replaced by $\log_\rho \epsilon$.
The analogue to inequality (14) follows directly from the argument for establishing inequality (14), except that whenever proper, we need to replace the expressions $k$-fault-tolerant, $(k - i n_i)$-fault-tolerant, and $k$ by $(\rho, \epsilon)$-fault-tolerant, $(\rho, \epsilon\rho^{-i n_i})$-fault-tolerant, and $\log_\rho \epsilon$, respectively.
To establish the analogue to inequality (15), we use the
argument for establishing inequality (15), except that we
cannot use the constructions of C1 and C2 directly. In the
constructions of C1 and C2 , we have let all of the gates that
do not have an x-register as an input be correct, but this is
not possible to do with high probability under our fixed-
probability model. Instead, we define a fault pattern to be
C1-type (resp., C2-type) if the fault pattern is identical to
that of C1 (resp., C2) on all gates with an x-register being an
input. On the other hand, we can group all fault patterns
according to their requirements on all gates that do not
have an x-register as an input; two fault patterns belong to
the same group if they are identical on all gates that do not
have an x-register as an input. Assuming by way of contradiction that the desired analogue does not hold, we now show that a random fault pattern will lead to an incorrect faulty version of $C$ with probability at least $\epsilon$. It suffices to prove that for any fault-pattern group $G$, a fault pattern randomly chosen from $G$ leads to an incorrect faulty version of $C$ with probability at least $\epsilon$. Clearly, a fault pattern randomly chosen from group $G$ is $C_1$-type with probability at least $(\min\{\rho, 1-\rho\})^{n_1} \ge (\min\{\rho, 1-\rho\})^{\log_\rho \epsilon} = \epsilon$, where we have used the fact that $\rho < 1/2$ and that once a gate is set to be faulty the particular behavior of the faulty gate is decided by an adversary. Similarly, such a fault pattern is $C_2$-type with probability at least $\epsilon$. On the other hand, either the unique $C_1$-type fault pattern or the unique $C_2$-type fault pattern within group $G$ determines an incorrect faulty version of $C$, since the two corresponding faulty versions of $C$ behave the same on $x = 0$ and $x = 1$. We thus conclude that a random fault pattern within group $G$ leads to an incorrect faulty version of $C$ with probability at least $\epsilon$, as desired. ∎
4. A GENERAL UPPER BOUND
In this section, we use our results from Section 3 and
adapt a theorem of Pippenger [8] to construct efficient
fault-tolerant circuits for any function. Under the von
Neumann fault model with both $\rho$ and $\epsilon$ being constants (and $\epsilon \ge c \cdot \rho$ for some constant $c$), a classic theorem states an upper bound of $O(\log S_f)$ on the redundancy needed to construct $(\rho, \epsilon)$-fault-tolerant circuits for any function $f$.
This result was first argued heuristically by von Neumann
[16], then proved by Dobrushin and Ortyukov [4] by a
probabilistic construction, and later proved by Pippenger
[8] by an explicit construction. The next theorem is an
analogue to Pippenger’s theorem, and it provides general
constructions of fault-tolerant circuits for any function f. To
prove the theorem, we will construct a circuit for a certain
majority-like function in Lemma 4.3, which may be of inde-
pendent interest.
Theorem 4.1. $S_f(k) = O(k S_f + k^{\log_2 3})$ and $S_f(\rho, \epsilon) = O((\log_\rho(\epsilon/S_f) + 1) S_f + (\log_\rho \epsilon)^{\log_2 3})$.
We remark that, unlike most of the theorems in the paper, the result for random faults in this theorem is not obtained from that for worst-case faults by replacing $k$ with $\log_\rho \epsilon$. Also, we have included the "+1" term in the second expression so that the theorem holds when $\rho \ll \epsilon/S_f$.
To prove the theorem, we need a few lemmas. Lemma 4.1
is one of the Chernoff bounds, which we state below for ease
of reference. Lemma 4.2 states that there exists a linear-size
reliable circuit that amplifies the number of vast-majority
values among all inputs. This lemma is essentially due to
Pippenger [8], and Assaf and Upfal [2]. (Although
Pippenger [8] and Assaf and Upfal [2] worked with random
faults under the von Neumann fault model, their methods
are readily applicable to random faults under the short-cir-
cuit model. Once the result for random faults is established,
the result for worst-case faults follows immediately from
Claim 2.1.) To prove the theorem, we also need to compute
some majority-like functions of 3(k) (or 3(log\(=Sf)))
inputs. Unfortunately, as we will show in Theorem 5.1, any
$k$-fault-tolerant (resp., $(\rho, \epsilon)$-fault-tolerant) circuit for the majority function of $\Omega(k)$ (resp., $\Omega(\log_\rho(\epsilon/S_f))$) inputs must contain $\Omega(k^2)$ (resp., $\Omega(\log_\rho \epsilon \log_\rho(\epsilon/S_f))$) gates. So instead of computing the majority function, we will compute a function that outputs the majority value provided that a large (constant much larger than $1/2$) fraction of the inputs are all zeros or all ones. To be more precise, for any $r < 1/2$, let
r-MAJ(n) be a function with n inputs that outputs one if at
most rn of the inputs are zeros and that outputs zero if at
most rn of the inputs are ones (we do not care how the func-
tion behaves if neither the number of zero inputs nor the
number of one inputs is less than rn). Lemma 4.3 gives an
efficient fault-tolerant circuit for r-MAJ with r less than a
sufficiently small constant. We remark that although fault-
tolerant circuits for some r-MAJ function were designed in
[2, 8], their circuits have larger size than we need here.
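Lemma 4.1. Suppose $X_1, X_2, \ldots, X_n$ are mutually independent random variables such that for all $i$, (i) $X_i \in \{0, 1\}$, and (ii) $\Pr(X_i = 1) \le \rho$. Then, $\Pr(\sum_{1 \le i \le n} X_i \ge \rho n + \mu n) \le (e\rho/\mu)^{\mu n}$.
Proof. Without loss of generality, we may assume $\mu \ge e\rho$ and $\Pr(X_i = 1) = \rho$ for all $i$. Hence, by Theorem A.12 on page 237 of [1],
$$\Pr\Big(\sum_{1 \le i \le n} X_i \ge \rho n + \mu n\Big) \le e^{-\rho n} \left(\frac{e}{\mu/\rho + 1}\right)^{(\mu/\rho + 1)\rho n} \le \left(\frac{e\rho}{\mu}\right)^{\mu n + \rho n} \le \left(\frac{e\rho}{\mu}\right)^{\mu n}.$$
∎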
Lemma 4.2. For $r = 1/65$, there exists an explicit $O(m)$-size circuit $C$ with $m$ inputs and $m$ outputs such that
(i) if $C$ contains $m$ or fewer faults, then $C$ outputs at most $(r/6)m$ zeros (resp., ones) when at most $rm$ of the inputs are zeros (resp., ones);
(ii) if each gate in $C$ is independently faulty with probability at most $\rho$, then with probability greater than $1 - \rho^{2m}$, $C$ outputs at most $(r/6)m$ zeros (resp., ones) on all input instances with at most $rm$ zero inputs (resp., one inputs).$^{3}$
Proof. By Claim 2.1, we only need to prove the lemma
for random faults. We do this by adapting Pippenger’s com-
pressor-based argument for the construction of Part B (the
‘‘restoring organ’’) in the proof of Theorem 3.1 of [8].
An $(n, d, \alpha, \beta)$-compressor is defined to be a $d$-regular bipartite multigraph of $n$ input nodes and $n$ output nodes with the following property: for any set $A$ containing at most $\alpha n$ input nodes, at most $\beta n$ output nodes are connected to at least $d/2$ input nodes in $A$.
By Corollary 3.1, we may assume without loss of generality that $m$ is at least a sufficiently large constant. Under such an assumption, it is easy to see that there exists $m' \le (1/64 - 1/65)m$ such that $m + m' = p^2$ for some integer $p$. Let $G$ be a $(p^2, 8^{17}, 1/64, 1/512)$-compressor, whose existence is proved in Lemma 3.2 of [8]. $G$ naturally defines a Boolean circuit $C$ with size $O(p^2) = O(m)$: each input node (resp., output node) of $G$ corresponds to an input (resp., output) of $C$; for each output node $x$ of $G$, assign a circuit (of $\Theta(1)$ size) in $C$ that computes the majority of the $8^{17}$ inputs whose corresponding input nodes in $G$ are connected to $x$. (Ties may be broken arbitrarily when the number of zeros is exactly $8^{17}/2$.) To make the described one-to-one correspondence possible we need to address one technical problem: we need $C$ to have exactly $m$ (as opposed to $p^2$) inputs and $m$ outputs, but $G$ has $p^2$ input nodes and $p^2$ output nodes. To ensure that $C$ will have the desired number of inputs and outputs, we make the following modification in $C$: (i) choose an arbitrary set of $m'$ inputs from the $m$ original inputs and make a second copy for each of them; (ii) first let $C$ have $p^2$ outputs and then abandon an arbitrary set of $m'$ outputs (i.e., we simply ignore them). Step (i) ensures that, after the copy-making, there will be exactly $p^2$ inputs (or input copies) in $C$, which can be made to correspond to the input nodes in $G$, and Step (ii) ensures that $C$ will have exactly $m$ outputs in the end.
After the copy-making on the input side, at most $rm + m' \le \frac{1}{64}m \le \frac{1}{64}p^2$ of the inputs (or input copies) are zeros (resp., ones) when at most $rm$ of the original inputs are zeros (resp., ones). Therefore, when all of the constituent majority circuits work correctly, by the defining property of a $(p^2, 8^{17}, 1/64, 1/512)$-compressor, at most $\frac{1}{512}p^2 \le m/500$ of the outputs of $C$ are zeros (resp., ones).
To obtain the desired fault-tolerance properties, we need
to modify C a little further. In particular, we replace each of
the majority circuits in $C$ by a $(\rho, \rho^c)$-fault-tolerant majority
circuit, where c is a constant greater than 16,000. Each such
circuit can be constructed to have constant size, according
to Corollary 3.1. So this modification only increases the size
of C by a constant factor.
By Lemma 4.1 and by the choice of $c$, with probability at least $1 - (4000e\rho^c)^{m/4000} > 1 - \rho^{cm/8000} > 1 - \rho^{2m}$, at most $\rho^c m + m/4000 \le m/2000$ of all the $(\rho, \rho^c)$-fault-tolerant majority circuits can fail. The failure of these circuits may change the values of at most $m/2000$ outputs. In addition to the $m/500$ or fewer zeros (resp., ones) output from $C$ even in the fault-free case (assuming that at most $rm$ of the inputs are zeros (resp., ones)), this means that with probability greater than $1 - \rho^{2m}$, $C$ will output at most $m/500 + m/2000 = m/400 < (r/6)m$ zeros (resp., ones) on all input instances with at most $rm$ zero inputs (resp., one inputs).
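The constants in this proof fit together with little slack, so it can help to verify them with exact rational arithmetic. In the sketch below all quantities are expressed as fractions of $m$, with $m'$ taken at its largest allowed value; the function and key names are assumptions made for this example.

```python
from fractions import Fraction as F

def lemma_4_2_constants():
    """Exact check of the fractions (of m) used in the proof of Lemma 4.2."""
    r = F(1, 65)
    m_prime = F(1, 64) - F(1, 65)        # largest allowed m' / m
    bad_inputs = r + m_prime             # bad inputs after copy-making
    p_squared = 1 + m_prime              # p^2 / m, since p^2 = m + m'
    bad_outputs_fault_free = F(1, 512) * p_squared
    bad_outputs_total = F(1, 500) + F(1, 2000)
    return {
        "bad_inputs": bad_inputs,
        "bad_outputs_fault_free": bad_outputs_fault_free,
        "bad_outputs_total": bad_outputs_total,
        "target": r / 6,
    }
```

The returned values show that the bad inputs amount to exactly $m/64$, that the fault-free bad outputs stay below $m/500$, and that the final count $m/400$ is below the target $(r/6)m = m/390$.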
We have thus completed the proof of the lemma. Finally, we would like to point out that the lemma can also be proved in a way similar to that of Theorem 4 in [2], where Assaf and Upfal used an expander-based construction to build a fault-tolerant comparator network that amplifies the fraction of vast-majority values among all inputs. This is possible since, on Boolean inputs, an unreliable comparator in Assaf and Upfal's fault model can be simulated by one AND gate and one OR gate in our short-circuit fault model. ∎

3 The assumption of this lemma in the conference version (where $\alpha$, corresponding to $r$ in the current statement, is assumed to be "less than a sufficiently small constant") is misstated: the size of the desired circuit is dependent on $\alpha$, which is not necessarily a constant according to the assumption made in the conference version. However, the difference has no consequence in the remainder of either version of the paper.
In the above proof, we did not even try to optimize the
constants involved. But it is possible to improve these con-
stants by using the expanders of Lubotzky, Phillips and
Sarnak (see [11]).
Lemma 4.3. For $r \le 1/65$, $S_{r\text{-MAJ}(k)}(k) = O(k^{\log_2 3})$ and $S_{r\text{-MAJ}(m)}(\rho, \rho^m) = O(m^{\log_2 3})$.
Proof. By Claim 2.1, we only need to prove the lemma for random faults. By the definition of $r$-MAJ, if a circuit is $(\rho, \epsilon)$-fault-tolerant for $r$-MAJ($m$), then it is certainly $(\rho, \epsilon)$-fault-tolerant for $r'$-MAJ($m$) for any $r' \le r$. So we will assume without loss of generality that
$$r = 1/65.$$
We will first inductively construct a desired circuit for $r$-MAJ($m$), and then we argue that the circuit is $(\rho, \rho^m)$-fault-tolerant.
The base case where $m$ is a constant can be handled directly by a circuit of Corollary 3.1, which has size
$$S(\Theta(1)) = O(1). \qquad (18)$$
So in what follows, we will assume $m \ge 29 \log_2 20$, which implies
$$\rho^{(0.1/2.9)m} < \frac{1}{20}. \qquad (19)$$
Our recursive construction consists of the following four
steps:
Step 1. Apply a circuit of Lemma 4.2 to all of the $m$ inputs.
Step 2. Make a second copy for each of the first $5\lceil m/2.9 \rceil - m$ outputs from Step 1. All these copies, together with the $m$ original outputs from Step 1, form a set of $5\lceil m/2.9 \rceil$ registers. Arbitrarily divide these $5\lceil m/2.9 \rceil$ registers into five groups of size $\lceil m/2.9 \rceil$.
Step 3. To each group of $\lceil m/2.9 \rceil$ registers produced in Step 2, apply a $(\rho, \rho^{\lceil m/2.9 \rceil})$-fault-tolerant circuit for $r$-MAJ($\lceil m/2.9 \rceil$).
Step 4. Apply a $(\rho, \rho^{m+2})$-fault-tolerant circuit of Corollary 3.1 that computes the majority of the five outputs from Step 3.
Clearly, the size of the whole circuit thus constructed is determined by the following recurrence with boundary condition given in Eq. (18):
$$S(m) = O(m) + 5S\left(\left\lceil \frac{m}{2.9} \right\rceil\right) + O(m^{\log_2 3}).$$
Solving the recurrence, we find that the circuit has size $O(m^{\log_2 3})$, as desired.
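The growth of this recurrence can be explored numerically. The sketch below replaces each hidden constant in the $O(\cdot)$ terms by 1 and charges unit cost for the base case; both are assumptions made purely for illustration, as are the names `size` and `ratio`. It then compares the result with $m^{\log_2 3}$.

```python
import math
from functools import lru_cache

EXPONENT = math.log2(3)
# Below this threshold the proof falls back on a constant-size circuit.
BASE = math.ceil(29 * math.log2(20))

@lru_cache(maxsize=None)
def size(m):
    """S(m) = m + 5 * S(ceil(m / 2.9)) + m^(log2 3), all constants set to 1."""
    if m < BASE:
        return 1.0  # assumed unit cost for the base case
    return m + 5 * size(math.ceil(m / 2.9)) + m**EXPONENT

def ratio(m):
    """S(m) divided by m^(log2 3); stays bounded if S(m) = O(m^(log2 3))."""
    return size(m) / m**EXPONENT
```

Because $5/2.9^{\log_2 3} < 1$, the work per recursion level shrinks geometrically relative to the top level, so `ratio(m)` stays bounded as `m` grows.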
We next show that the circuit computes correctly on all "allowable" input instances with probability greater than $1 - \rho^m$. That is, with probability greater than $1 - \rho^m$, the circuit outputs 0 (resp., 1) on all input instances with at most $rm$ ones (resp., zeros).
It is straightforward to argue that the whole circuit works correctly on all "allowable" input instances if the following conditions hold simultaneously: (i) The circuit used in Step 1 works correctly (which implies that it outputs at most $(r/6)m$ zeros (resp., ones) on all input instances with at most $rm$ zero inputs (resp., one inputs)); (ii) At least three of the five circuits used in Step 3 work correctly for $r$-MAJ($\lceil m/2.9 \rceil$); (iii) The circuit of Step 4 works correctly as a majority circuit. Clearly, condition (i) fails to hold with probability less than $\rho^{2m}$; condition (ii) fails to hold with probability less than $\binom{5}{3}\rho^{3m/2.9}$; and condition (iii) fails to hold with probability less than $\rho^{m+2}$. Thus, the whole circuit fails with probability less than $\rho^{2m} + \binom{5}{3}\rho^{3m/2.9} + \rho^{m+2} < \rho^m/4 + \rho^m/2 + \rho^m/4 = \rho^m$, where the inequality follows from inequality (19) and the fact that $\rho < 1/2$. ∎
Proof of Theorem 4.1. We prove the theorem by converting a fault-free circuit $C$ of size $S_f$ into a fault-tolerant circuit $C'$ of the desired size. We will let $r = 1/65$ throughout the proof.
For worst-case faults, the conversion from $C$ to $C'$ is extremely simple: we simply make $k/r$ copies of $C$ and then apply a $k$-fault-tolerant circuit for $r$-MAJ($k/r$) as in Lemma 4.3 to the $k/r$ outputs of these $k/r$ copies of $C$. The new circuit $C'$ clearly has size $O(k S_f + k^{\log_2 3})$. With $k$ or fewer faults, at least $(1-r)(k/r)$ copies of $C$ contained in $C'$ will compute the correct value, which will then be output by the final $k$-fault-tolerant circuit for $r$-MAJ($k/r$). This shows that $C'$ is $k$-fault-tolerant, completing the proof for worst-case faults.
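The counting argument for worst-case faults can be illustrated with a small model. This is a deliberately idealised sketch: it assumes the final $r$-MAJ stage behaves as a perfect threshold function and that each fault corrupts the output of at most one copy of $C$, whereas the real construction relies on the $k$-fault-tolerant $r$-MAJ circuit of Lemma 4.3. The names `r_maj` and `replicated_output` are hypothetical.

```python
R_INV = 65  # r = 1/65, so k / r = 65 * k copies are made

def r_maj(bits):
    """Idealised r-MAJ: returns the value held by all but at most an
    r fraction of the inputs, or None when neither value qualifies."""
    n = len(bits)
    ones = sum(bits)
    if (n - ones) * R_INV <= n:
        return 1
    if ones * R_INV <= n:
        return 0
    return None

def replicated_output(correct_value, k, corrupted_copies):
    """Output of the replicated circuit when the listed copies are wrong."""
    outputs = [correct_value] * (k * R_INV)
    for idx in corrupted_copies:
        outputs[idx] ^= 1  # a corrupted copy reports the opposite bit
    return r_maj(outputs)
```

With `k = 3` there are 195 copies, so up to three corrupted copies remain within the $r$ fraction that $r$-MAJ tolerates, while a fourth corrupted copy already leaves the idealised function undefined.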
For random faults, we may assume without loss of generality that
$$\log_\rho \frac{\epsilon}{S_f} \ge 1, \qquad (20)$$
since otherwise a circuit of size $S_f$ for computing $f$ in the fault-free case will contain no faulty gate with probability at least $1 - \rho S_f > 1 - \epsilon$. By Corollary 3.1, we can also assume without loss of generality that
$$\rho \le \left(\frac{r}{6}\right)^3; \quad S_f \ge 2. \qquad (21)$$
We next convert C into C$ by the method used in Pip-
penger’s proof of Theorem 3.1 in [8]. We first replace each
register $w$ of $C$ by a collection of $l$ registers $\{w_1, w_2, \ldots, w_l\}$, where
$$l = \lambda \log_\rho \frac{\epsilon}{S_f}$$
for some sufficiently large constant $\lambda$. For ease of reference, we refer to such a collection of registers in $C'$ corresponding to a single register in $C$ as a "cable." We next replace each gate $G$ in $C$ by a "module" $M$ in $C'$ as follows. Suppose $G$ has input registers $x$ and $y$ that correspond to cables $X = \{x_1, x_2, \ldots, x_l\}$ and $Y = \{y_1, y_2, \ldots, y_l\}$, respectively. (We assume for simplicity that $G$ has two inputs (i.e., $G$ is an AND or an OR gate); the case where $G$ has only one input (i.e., $G$ is a NOT gate) can be dealt with similarly.) Then, module $M$ consists of two parts: Part I comprises $l$ copies of $G$, the $i$th ($1 \le i \le l$) of which takes $x_i$ and $y_i$ as its inputs. Part II is simply a circuit of Lemma 4.2 with $m = l$ (and $r = 1/65$) applied to all of the $l$ outputs from Part I. Finally, we label all of the $l$ outputs from Part II in an arbitrary order and consider them as the cable corresponding to the output of $G$. To complete our description of $C'$, we apply a $(\rho, \rho^l)$-fault-tolerant circuit for $r$-MAJ($l$), as in Lemma 4.3, to the $l$ outputs produced by the last module (i.e., the module corresponding to the output gate in $C$).$^{4}$
The analysis of our circuit is similar to that in the proof
of Theorem 3.1 in [8], except that we need to be more care-
ful with the probabilistic argument; we need to show that
our circuit works with probability greater than 1&= on all
input instances, as opposed to on any fixed input instance
(see Section 2 for more discussion).
Consider an arbitrary faulty version of $C'$. A module $M$ is called nice if (i) at most $\rho l + (r/2)l$ of the $l$ constituent gates in Part I of $M$ are faulty, and (ii) the circuit in Part II of $M$ works correctly, on all "allowable" input instances. A faulty version of $C'$ is nice if all of its modules are nice and its final $(\rho, \rho^l)$-fault-tolerant circuit works correctly for $r$-MAJ($l$).
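By Lemmas 4.1 and 4.2, a particular module is not nice
with probability less than
$$\begin{aligned}
\left(\frac{2e\rho}{r}\right)^{rl/2}+\rho^{2l}
&\le \rho^{l/195}+\rho^{2l} && \text{(by inequality (21))}\\
&\le \rho^{l/200} && \text{(by inequality (20) and since $\lambda$ is sufficiently large)}\\
&\le \rho^{2\log_{\rho}(\epsilon/S_f)} && \text{(since $\lambda$ is sufficiently large)}\\
&\le \frac{\epsilon}{2S_f} && \text{(by inequality (21)).}
\end{aligned}$$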
Since there are $S_f$ modules in total, the probability that
there exists a module that is not nice is less than $\epsilon/2$.
Moreover, the $(\rho,\rho^{l})$-fault-tolerant circuit for $r$-MAJ($l$)
applied at the end of $C'$ fails with probability less than
$(\epsilon/S_f)^{\lambda}\le\epsilon/2$, where the inequality follows from inequality
(21) and the fact that $\lambda$ is sufficiently large. Hence, we conclude
that with probability greater than $1-\epsilon$, a faulty version
of $C'$ is nice.
It remains to show that a nice version, $C'^{*}$, of $C'$ must
compute $f$ correctly on all input instances. Since the final
$r$-MAJ($l$) circuit used in $C'$ is assumed to work correctly in $C'^{*}$, it
suffices to prove that the module of $C'^{*}$ simulating the output
gate of $C$ outputs at least $(1-r/6)l$ values that are identical
to what is output from the fault-free $C$ on the same input
instance. To establish this, it suffices to show that for an
arbitrary module $M$ of $C'^{*}$ simulating a gate $G$ of $C$, if at
least $(1-r/6)l$ of the numbers contained in each input
cable of $M$ are identical to the corresponding input to $G$,
then at least $(1-r/6)l$ of the numbers contained in the output
cable of $M$ are identical to the corresponding output of
$G$. This fact is easily verified as follows. At most $2(r/6)l$
gates in Part I of $M$ may have one of their (one or two) inputs different
from the corresponding input to $G$. In addition, since $M$ is
assumed to be nice, at most $\rho l+(r/2)l\le(2r/3)l$ gates in
Part I of $M$ are faulty, where the inequality follows from inequality
(21). Hence, at most $2(r/6)l+(2r/3)l=rl$ of the numbers output from Part I
of $M$ are different from the output of $G$. Since the circuit in
Part II of $M$ is assumed to work correctly, we now conclude
that at least $(1-r/6)l$ of the numbers in the output cable
of $M$ are identical to the output of $G$.
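Finally, our circuit has size
$$\begin{aligned}
&O\!\left(\left(1+\log_{\rho}\frac{\epsilon}{S_f}\right)S_f+\left(\log_{\rho}\frac{\epsilon}{S_f}\right)^{\log_2 3}\right)\\
&\quad=O\!\left(\left(1+\log_{\rho}\frac{\epsilon}{S_f}\right)S_f\right)+O\!\left((\log_{\rho}\epsilon)^{\log_2 3}\right)+O\!\left(\left(\log_{\rho}\frac{1}{S_f}\right)^{\log_2 3}\right)\\
&\quad=O\!\left(\left(1+\log_{\rho}\frac{\epsilon}{S_f}\right)S_f+(\log_{\rho}\epsilon)^{\log_2 3}\right),
\end{aligned}$$
where the second equality holds since
$(\log_{\rho}(1/S_f))^{\log_2 3}\le(\log_{1/2}(1/S_f))^{\log_2 3-1}\,\log_{\rho}(1/S_f)=O(S_f\log_{\rho}(1/S_f))$. ∎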
Footnote 4. The fault-tolerant circuit for $r$-MAJ($l$) used in [8] essentially has size
$O(l^{7})$, which is too large for our construction.

5. GENERAL LOWER BOUNDS
It would be nice if we could determine when the
redundancy used in Theorem 4.1 is indeed necessary to
obtain the desired fault-tolerance. However, this seems very
difficult in general due to the current lack of strong lower
bounds for fault-free circuits. Nevertheless, we will show in
Theorem 5.1 that the redundancy used in Theorem 4.1 is
indeed necessary, at least for some particular functions
in some special cases. In many other cases, however,
Theorem 5.1 implies a size lower bound linear in $k$ and
$\log_{\rho}\epsilon$, which does not seem to be tight. In Theorem 5.2, we
provide a general reduction from any nontrivial function to
the AND function of a certain number of variables. This
reduction, together with Theorem 3.2, yields a general size
lower bound that is superlinear in $k$ and $\log_{\rho}\epsilon$. In what
follows, a function $f$ on $n$ inputs $x_1, x_2, \ldots, x_n$ is defined
to be sensitive to $x_i$ if there exists an input instance
$x=(x_1, x_2, \ldots, x_n)$ such that $f(x)\ne f(x^{i})$, where $x^{i}$ differs
from $x$ only at input $x_i$.
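
The definition of sensitivity can be checked by brute force for small functions. The following is a minimal sketch, assuming the function is given as a Python callable on 0/1 tuples; the names `sensitive_inputs`, `and3`, and `ignores_last` are hypothetical and used only for illustration.

```python
from itertools import product


def sensitive_inputs(f, n):
    """Return the sorted 0-based indices i such that f is sensitive to x_i.

    f is sensitive to x_i if some input x satisfies f(x) != f(x^i),
    where x^i is x with only coordinate i flipped.
    """
    sensitive = set()
    for x in product((0, 1), repeat=n):
        for i in range(n):
            flipped = x[:i] + (1 - x[i],) + x[i + 1:]
            if f(x) != f(flipped):
                sensitive.add(i)
    return sorted(sensitive)


def and3(x):
    # AND of three variables: sensitive to every input
    return x[0] & x[1] & x[2]


def ignores_last(x):
    # XOR of the first two variables; the third input is never read
    return x[0] ^ x[1]


print(sensitive_inputs(and3, 3))          # [0, 1, 2]
print(sensitive_inputs(ignores_last, 3))  # [0, 1]
```

The enumeration takes time exponential in $n$, so it is only meant as a check on toy examples.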
Theorem 5.1. For any function $f$ that is sensitive to
at least $s\ge 2$ of its inputs, $S_f(k)=\Omega(sk)$, $S_f(\rho,\epsilon)=\Omega(s\log_{\rho}\epsilon)$,
and, when $\epsilon$ is less than a fixed constant smaller
than 1, $S_f(\rho,\epsilon)=\Omega(s\log_{1/\rho}s)$.
Proof. A circuit $C$ can be naturally viewed as a directed
acyclic graph, with registers considered as directed edges
and with gates (plus the original inputs and output of $C$)
considered as nodes. We say that a register $r$ fully depends
on an input $x_i$ if there exists a directed path from $x_i$ to $r$ in
the circuit and if there exists no directed path from $x_j$ to $r$ for
any $j\ne i$. A register $r$ input to a gate $G$ is defined to be critical
to $x_i$ if $r$ fully depends on $x_i$ and if $G$ has another input
register that does not fully depend on $x_i$. Without loss of
generality, we assume that $f$ is sensitive to $x_1, x_2, \ldots, x_s$. For
each $i\le s$, we define $R_i$ to be the set of all registers that are
critical to $x_i$. Clearly,
$$R_i\cap R_j=\emptyset\quad\text{for } i\ne j, \qquad (22)$$
since each register can fully depend on at most one input.
When a gate $G$ is short-circuited to its input register $r$ due
to a fault, we can imagine that $G$ is disconnected from its
other input register (if such an input register exists). The
general strategy we will use in the proof is that if $C$ does not
contain enough gates, then with at most $k$ faults or with
probability at least $\epsilon$, the output of $C$ will be disconnected
from $x_i$ for some $i\le s$ and therefore $C$ cannot compute $f$
correctly.
We first prove the lower bound for worst-case faults.
Assuming that $C$ contains fewer than $sk/2$ gates, we will
show that $C$ cannot compute $f$ correctly with up to $k$ faults.
Since each gate has fan-in at most two, we know that $C$ contains
at most $sk$ registers. Hence, by equality (22), there
exists $i\le s$ such that $|R_i|\le k$. By the definition of being critical,
each gate $G$ containing an input register $r\in R_i$ must
contain another input register $r'\notin R_i$. Therefore, by short-circuiting
$G$ with $r'$, we can think of $r$ as being removed from
the circuit. If we apply this to all the gates that contain a
register in $R_i$, all the registers in $R_i$ will be removed, at the
cost of at most $k$ faults. In the resulting faulty version of $C$,
the final output of $C$ is disconnected from $x_i$. This is because
the output of $C$ cannot fully depend on $x_i$ alone and any
path from $x_i$ to a register that does not fully depend on $x_i$
must contain a register in $R_i$. Finally, since $f$ is sensitive to
$x_i$, the faulty version of $C$, where $x_i$ is disconnected from the
output, cannot compute $f$ correctly without knowing the
value of $x_i$.
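
The notions of "fully depends" and "critical" can be made concrete on a small circuit. The sketch below is an illustration under stated assumptions, not part of the proof: the circuit encoding (inputs numbered `0..n_inputs-1`, gate `j` numbered `n_inputs + j`, gates listed in topological order) and the function names are hypothetical. A register is represented as an edge `(source_node, gate_node)`.

```python
def input_support(gates, n_inputs):
    """For every node, the set of circuit inputs with a directed path to it.

    Nodes 0..n_inputs-1 are the inputs; gate j is node n_inputs + j and
    gates[j] lists its predecessor nodes (gates are in topological order).
    """
    support = [{i} for i in range(n_inputs)]
    for preds in gates:
        support.append(set().union(*(support[p] for p in preds)))
    return support


def critical_registers(gates, n_inputs):
    """Map each input i to the list R_i of registers critical to x_i."""
    support = input_support(gates, n_inputs)
    critical = {i: [] for i in range(n_inputs)}
    for j, preds in enumerate(gates):
        preds = tuple(preds)
        for pos, p in enumerate(preds):
            if len(support[p]) != 1:
                continue  # this register does not fully depend on one input
            (i,) = support[p]
            others = preds[:pos] + preds[pos + 1:]
            # critical: the gate has another input register that does
            # not fully depend on x_i
            if any(support[q] != {i} for q in others):
                critical[i].append((p, n_inputs + j))
    return critical


# Hypothetical example: AND(x0, x1, x2) built as (x0 AND x1) AND x2.
example = [(0, 1), (3, 2)]
R = critical_registers(example, 3)
print(R)
```

In this example each $R_i$ contains exactly one register, and the sets are pairwise disjoint, as equality (22) requires.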
For random faults, the lower bound of $\Omega(s\log_{\rho}\epsilon)$ can be
proved in the same way as for worst-case faults. This is
because in the proof for worst-case faults, we never need to
specify the behavior (correct or faulty) of more than $k$ gates.
We next prove the lower bound of $\Omega(s\log_{1/\rho}s)$ for $\epsilon$ less
than a fixed constant smaller than 1. Clearly, we only need
to consider the case where $s$ is larger than a sufficiently large
constant. Assuming that $C$ contains fewer than $\frac{1}{4}cs\log_{1/\rho}s$
gates for a sufficiently small constant $c$ to be specified later,
we will prove that $C$ fails to compute $f$ with probability at
least $\epsilon$. Since $C$ contains fewer than $\frac{1}{4}cs\log_{1/\rho}s$ gates and
since each gate has fan-in at most two, $C$ contains at most
$\frac{1}{2}cs\log_{1/\rho}s$ registers. Therefore, by equality (22), there exist
at least $s/2$ indices $i\le s$ such that $|R_i|\le c\log_{1/\rho}s$. Without
loss of generality, we assume $|R_i|\le c\log_{1/\rho}s$ for $i\le s/2$.
As we have seen, a register of $C$ that is input to a gate $G$
may be removed by letting $G$ fail in a particular way. For
each $i\le s/2$, let $A_i$ denote the event that at least one register
in $R_i$ remains unremoved, as an input to its corresponding
gate, and let $\bar{A}_i$ denote the complementary event. By the argument
for worst-case faults, if event $\bar{A}_i$ happens
for some $i\le s/2$, then $C$ fails to compute $f$. Hence, we
only need to show that, with probability at least $\epsilon$, there
exists $i\le s/2$ such that $\bar{A}_i$ happens. To show this, we will
actually prove the stronger result
$$\Pr\left(\exists\, i\le s/2 \text{ such that } \bar{A}_i\right)\to 1 \qquad (23)$$
as $s\to\infty$. To prove (23), we need to be careful
with the dependence issue since a register in $R_i$ and a
register in $R_j$ may be input to the same gate and therefore
they cannot be removed simultaneously.
By the definition of being critical, it is easy to see that no
two registers in the same set $R_i$ can be input to the same
gate. Therefore, for any $i\le s/2$,
$$\begin{aligned}
\Pr(A_i)&\le 1-\rho^{|R_i|} && (24)\\
&\le 1-\rho^{c\log_{1/\rho}s}\\
&\le 1-s^{-1/3} && (25)
\end{aligned}$$
for sufficiently small constant $c$.
On the other hand, since $|R_i|\le c\log_{1/\rho}s$ for $i\le s/2$,
it is easy to see that there exist $i_1, i_2, \ldots, i_{\sqrt{s}}$ such that
$1\le i_1<i_2<\cdots<i_{\sqrt{s}}\le s/2$ and that no gate can contain
one input register from $R_{i_u}$ and another input register from
$R_{i_v}$ for $u\ne v$ and $u, v\le\sqrt{s}$. Therefore, $A_{i_1}, A_{i_2}, \ldots, A_{i_{\sqrt{s}}}$ are
mutually independent events. Hence, by inequality (25),
$$\Pr\Big(\bigwedge_{u\le\sqrt{s}}A_{i_u}\Big)\le\left(1-s^{-1/3}\right)^{\sqrt{s}}\to 0 \qquad (26)$$
as $s\to\infty$. This implies the correctness of (23). ∎
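
To see how quickly the bound in (26) vanishes, one can evaluate it numerically. This is only an illustrative sketch; the function name `all_survive_bound` is hypothetical. It also compares the bound with $e^{-s^{1/6}}$, which dominates it because $(1-x)^{n}\le e^{-xn}$ with $x=s^{-1/3}$ and $n=\sqrt{s}$.

```python
import math


def all_survive_bound(s):
    """Right-hand side of (26): (1 - s^(-1/3))^(sqrt(s))."""
    return (1.0 - s ** (-1.0 / 3.0)) ** math.sqrt(s)


for s in (10**2, 10**4, 10**6, 10**8):
    bound = all_survive_bound(s)
    envelope = math.exp(-(s ** (1.0 / 6.0)))  # since (1 - x)^n <= exp(-x n)
    print(f"s={s:>10}  bound={bound:.3e}  exp(-s^(1/6))={envelope:.3e}")
```

The printed values decrease toward 0 as $s$ grows, in line with the limit claimed in (26).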
The lower bounds in Theorem 5.1 are tight at least for
some functions in some special cases. Take the example of
the majority function of $k$ variables. Since there is an $O(k)$-size
fault-free circuit that computes the sum of $k$ 1-bit numbers,
the majority of $k$ variables can be computed with an
$O(k)$-size fault-free circuit. Hence, by Theorem 4.1, there
exists an $O(k^{2})$-size $k$-fault-tolerant circuit for the majority
of $k$ variables. Therefore, the lower bound of $\Omega(k^{2})$ implied
by Theorem 5.1 is tight in this special case. As another
example, Theorems 4.1 and 5.1 together imply a tight bound
of $\Theta(n\log n)$ on the size of $(\rho,\epsilon)$-fault-tolerant circuits for
the AND function of $n$ variables when $\rho$ and $\epsilon$ are both
constants.
In the von Neumann fault model with $\rho$ and $\epsilon$ being constants,
it has been known for a long time that for any function
$f$ there exists a $(\rho,\epsilon)$-fault-tolerant circuit with size
$O(S_f\log S_f)$ [4, 8, 16]. A theorem stated in 1977 [3] claims
a lower bound of $\Omega(S_f\log S_f)$ for a certain class of
functions. Correct proofs of this lower bound
were later given by Gács and Gál [6] and Reischuk and
Schmeltz [13] (see [12] for a partial result). Our
$\Omega(s\log_{1/\rho}s)$ lower bound in Theorem 5.1 is an analogue of
the above-mentioned lower bound of [3, 6, 13]. In fact, the
lower bound of [3, 6, 13] holds even if the circuit is only
required to compute correctly on any input instance, not
necessarily on all input instances (see Section 2 for more discussion
on the difference). It is interesting to observe that
our $\Omega(s\log_{1/\rho}s)$ lower bound for the short-circuit fault
model does not hold if the circuit is only required to compute
correctly on any input instance. An example of this
fact is given in Appendix B.
In the next theorem, we use $M_f(k)$ (resp., $M_f(\rho,\epsilon)$) to
denote the size of the smallest $k$-fault-tolerant (resp.,
$(\rho,\epsilon)$-fault-tolerant) monotone circuit for $f$. In either case, the
monotone circuit is allowed to use AND/OR gates with
unbounded fan-in.
Theorem 5.2. Let $f$ be a function that is sensitive to at
least $s$ of its inputs and let AND($s$) be the AND function
of $s$ variables. Then, $S_f(k)\ge M_{\mathrm{AND}(s)}(k)$ and
$S_f(\rho,\epsilon)\ge M_{\mathrm{AND}(s)}(\rho,\epsilon)$.
We point out that the theorem holds in the strongest
sense as follows. The circuits for f are allowed to use
arbitrarily powerful gates with unbounded fan-in (say, gates
that compute NP-complete functions in a single step), and
the monotone circuits for AND(s) only use AND gates (no
OR gates are necessary), possibly with unbounded fan-in.
(We do not know if the theorem holds when the monotone
circuits for AND(s) are only allowed to use gates with at
most two inputs, unless the circuit for f is restricted to use
gates with at most two inputs as well.) To see why this is
true, notice that the proof of the theorem does not
assume any knowledge about the gates used in the circuits
for f.
Proof of Theorem 5.2. We first prove the theorem for
worst-case faults. Let $C$ be a $k$-fault-tolerant circuit for $f$
with size $S_f(k)$. Based on $C$, we can build a monotone
circuit $C'$ that is no larger than $C$, as follows. Assume
without loss of generality that $f$ is sensitive to inputs
$x_1, x_2, \ldots, x_s$. First, we remove all the input registers for
$x_{s+1}, x_{s+2}, \ldots, x_n$. (This is done at no cost of faults; i.e., we
simply remove them from $C$.) This will result in some gates
without input. Then, we inductively remove all of the gates
without input until each of the remaining gates has at least
one input. Then, we replace each of the remaining gates by
an AND gate. Some of the AND gates thus introduced may
have only one input, and we replace them with a direct connection
from their (unique) input to their output. Let $C'$ be
the resulting circuit. Clearly, $C'$ is a circuit with $s$ inputs. We
only need to show that $C'$ is $k$-fault-tolerant for AND($s$).
Assume for contradiction that $C'$ is not $k$-fault-tolerant.
Then, we can choose $C'^{*}$ as a faulty version of $C'$ with at
most $k$ faults such that $C'^{*}$ does not compute AND($s$).
Assume that $C'^{*}$ fails to output zero on an input instance
with $x_i=0$ for some $i\le s$. (Such an input instance exists
because $C'^{*}$ must compute correctly on $(1, 1, \ldots, 1)$ due to
the fact that no NOT gate is used in $C'$ and the fact that
every faulty gate is restricted to output one of its input
values.) Now, $C'$ can be naturally viewed as a directed
acyclic graph with registers being considered as edges and
gates (plus the original inputs and output of $C'$) being considered
as nodes. Also, $C'^{*}$ corresponds to a subgraph of $C'$,
where a register $r$ input to a gate $G$ is considered to be
removed if $G$ contains a short-circuit with another input
register. By the assumption that $C'^{*}$ does not output zero
on the input instance with $x_i=0$, in the directed acyclic
graph for $C'^{*}$ that we have just described, the final output
of $C'^{*}$ must be disconnected from $x_i$ since otherwise the 0
value at $x_i$ would have been output by $C'^{*}$. (Note that each
gate in $C'^{*}$ is either an AND or behaves like a short circuit.)
By the construction of $C'$, each gate in $C'$ corresponds to
a gate in $C$. For each gate of $C$ that corresponds to a faulty
gate in $C'^{*}$, we set it to be faulty in the same way that its
counterpart is faulty in $C'^{*}$. Let $C^{*}$ be the faulty version of
$C$ thus constructed. It is straightforward to show that in the
directed acyclic graph corresponding to $C^{*}$, the final output
is disconnected from input $x_i$. This means that with at most
$k$ faults, $C$ computes without knowing the value of $x_i$. This
contradicts the fact that $C$ is $k$-fault-tolerant and that $f$ is
sensitive to $x_i$, completing the proof for worst-case faults.
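
The short-circuit behaviour of AND-only circuits used in this argument can be explored by exhaustive search on very small examples. The following is a minimal sketch under assumptions of our own: the circuit encoding (inputs `0..n_inputs-1`, gate `j` as node `n_inputs + j`, last gate is the output) and the names `evaluate` and `is_k_fault_tolerant_and` are hypothetical, and the three-gate example is just one small circuit chosen for illustration.

```python
from itertools import combinations, product


def evaluate(gates, inputs, faults):
    """Evaluate an AND-only circuit under short-circuit faults.

    gates[j] lists the predecessor nodes of gate j (topological order).
    faults maps a gate index to the position of the input register whose
    value the faulty gate simply copies to its output.
    """
    values = list(inputs)
    for j, preds in enumerate(gates):
        if j in faults:
            values.append(values[preds[faults[j]]])
        else:
            values.append(int(all(values[p] for p in preds)))
    return values[-1]


def is_k_fault_tolerant_and(gates, n_inputs, k):
    """Check every choice of at most k short-circuited gates on every input."""
    for size in range(k + 1):
        for faulty in combinations(range(len(gates)), size):
            for choice in product(*(range(len(gates[g])) for g in faulty)):
                faults = dict(zip(faulty, choice))
                for x in product((0, 1), repeat=n_inputs):
                    if evaluate(gates, x, faults) != int(all(x)):
                        return False
    return True


single = [(0, 1)]                  # one AND gate
three = [(0, 1), (0, 1), (2, 3)]   # two copies of the AND, then an AND of both
print(is_k_fault_tolerant_and(single, 2, 1))  # False
print(is_k_fault_tolerant_and(three, 2, 1))   # True
print(is_k_fault_tolerant_and(three, 2, 2))   # False
```

On the all-ones input every gate outputs 1 whether or not it is faulty, which is the observation used in the proof; failures can only occur on inputs containing a 0.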
We next extend our proof to the case of random faults.
Given a $(\rho,\epsilon)$-fault-tolerant $S_f(\rho,\epsilon)$-size circuit $C$ for $f$, we
can construct a monotone circuit $C'$ as in the case of worst-case
faults. We say that a faulty version $C^{*}$ of $C$ is similar
to a faulty version $C'^{*}$ of $C'$ if each faulty gate of $C'^{*}$
behaves in exactly the same way as its counterpart in $C^{*}$. By
the same reasoning as for worst-case faults, if $C'^{*}$ does not
compute AND($s$) correctly, then all faulty versions of $C$
similar to $C'^{*}$ fail to compute $f$, and this is true independent
of the behaviors of all gates of $C$ that either do not appear
in $C'$ or do appear in $C'$ but are correct in $C'^{*}$. As a
corollary, if $C'^{*}$ does not compute AND($s$) correctly, then
all faulty versions $C^{*}$ of $C$ with the following property must
fail to compute $f$: each gate of $C'^{*}$ behaves in exactly the
same way as its counterpart in $C^{*}$. Therefore, the probability
that a random faulty version of $C$ fails to compute $f$ is
greater than or equal to the probability that a random
faulty version of $C'$ fails to compute AND($s$). Hence, $C'$ is
a $(\rho,\epsilon)$-fault-tolerant monotone circuit for AND($s$) that has
size at most $S_f(\rho,\epsilon)$, completing the proof of the theorem
for random faults. ∎
Combining Theorems 5.2 and 3.2, we have the following
general lower bound for any nontrivial function. Formally,
a function f is defined to be nontrivial if f is not a constant
and if f is not identical to one of its inputs. Note that all trivial
functions can be computed without using any gate. Like
Theorem 5.2, the next corollary holds even if the circuits
for f are allowed to use arbitrarily powerful gates with
unbounded fan-in (recall that Theorem 3.2 holds even if
gates with unbounded fan-in are allowed).
Corollary 5.1. For any nontrivial function $f$,
$S_f(k)=\Omega(k\log k/\log\log k)$ and
$S_f(\rho,\epsilon)=\Omega(\log_{\rho}\epsilon\,\log\log_{\rho}\epsilon/\log\log\log_{\rho}\epsilon)$.
We conclude this section by pointing out that, in light of
the proof technique for Theorem 5.2, when gates with
unbounded fan-in are allowed, the ANDOR problem is
equivalent to the following extremal graph problem:
What is the minimum size of a directed acyclic
graph with two sources and one sink such that both of
the sources remain connected to the sink when any k
or fewer vertices in the graph have all but one of their
incoming edges deleted?
Along a similar direction, one may ask some other interesting
extremal graph problems. For example, we do not
know whether the answer to the above problem will
change if every vertex of the graph other than the sources is
restricted to have in-degree of 2.
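
For very small graphs, the property in the extremal problem above can be tested directly by enumerating all deletions. The sketch below is only an illustration; the graph encoding (a dictionary from each vertex to the list of its in-neighbours) and the names `reaches_sink` and `survives_k_deletions` are assumptions made here, and the example graph is hypothetical.

```python
from itertools import combinations, product


def reaches_sink(in_edges, sources, sink):
    """True if every source has a directed path to the sink."""
    seen = {sink}
    stack = [sink]
    while stack:  # walk the edges backwards from the sink
        v = stack.pop()
        for u in in_edges.get(v, []):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return all(s in seen for s in sources)


def survives_k_deletions(in_edges, sources, sink, k):
    """Check all ways to pick at most k vertices and keep one incoming edge each."""
    # Only vertices with in-degree above 1 can lose an edge.
    candidates = [v for v, preds in in_edges.items() if len(preds) > 1]
    for size in range(k + 1):
        for chosen in combinations(candidates, size):
            for kept in product(*(in_edges[v] for v in chosen)):
                damaged = dict(in_edges)
                for v, u in zip(chosen, kept):
                    damaged[v] = [u]
                if not reaches_sink(damaged, sources, sink):
                    return False
    return True


# Hypothetical example: two internal vertices fed by both sources, then a sink.
graph = {"g0": ["x", "y"], "g1": ["x", "y"], "out": ["g0", "g1"]}
print(survives_k_deletions(graph, ["x", "y"], "out", 1))  # True
print(survives_k_deletions(graph, ["x", "y"], "out", 2))  # False
```

The search is exponential in $k$ and in the number of vertices, so it is useful only for checking small candidate graphs by hand.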
APPENDIX A
In this appendix, we prove an upper bound of
$O(k\log k/\log\log k)$ on $S(k)$ determined by the recursion
$$S(k)=S(k-in_i)+n_1+n_2+\cdots+n_i\quad\text{for } 1\le i\le k, \qquad (27)$$
where $n_1=k$, $S(0)=1$, and $S(j)=0$ for $j<0$.
Let the answer to the recursion be $k\ln k/f(k)$ for some
function $f$ to be determined. Choose the sequence
$\{n_i \mid 1\le i\le k\}$ so that it can be broken into $r=r(k)$ subsequences
of the form $n_{\tau_{i-1}+1}, \ldots, n_{\tau_i}$ (where $\tau_0$ is assumed to
be 0) each summing to $k$. By the boundary condition $n_1=k$,
we immediately have
$$\tau_1=1. \qquad (28)$$
We will also ensure
$$\tau_r=k. \qquad (29)$$
We next choose $n_i$ in a way so that if we break off recursion
(27) at any $i$, then the solution will be $O(k\ln k/f(k))$, independent
of the choice of $i$. We do this by setting
$$in_i=\frac{jkf(k)}{\ln k}\quad\text{for each $i$ in the $j$th group.} \qquad (30)$$
Note that by Eq. (30), if we break off the recursion at some
$i$ in the $j$th group, then the recursion becomes
$$S(k)\le S\!\left(k-\frac{jkf(k)}{\ln k}\right)+jk,$$
which has a solution
$$S(k)=O\!\left(jk\cdot\frac{\ln k}{f(k)\,j}\right)=O\!\left(\frac{k\ln k}{f(k)}\right), \qquad (31)$$
independent of the choice of $j$, as desired.
To ensure that the sum of each subsequence is equal to $k$,
we need for $j>1$
$$k=\sum_{\tau_{j-1}<i\le\tau_j}n_i=\sum_{\tau_{j-1}<i\le\tau_j}\frac{jkf(k)}{i\ln k}=\frac{jkf(k)}{\ln k}\left(\ln\frac{\tau_j}{\tau_{j-1}}\pm O(1)\right),$$
where the second equality follows from Eq. (30). This can be
guaranteed by letting
$$\ln\frac{\tau_j}{\tau_{j-1}}=\frac{\ln k}{jf(k)}\pm O(1)\quad\text{for } 1<j\le r. \qquad (32)$$
To ensure the boundary conditions of Eqs. (28) and (29), we
only need to make $\ln(\tau_r/\tau_1)=\ln k$. Equivalently, we need to
make the sum (over all $j=2, \ldots, r$) of the right-hand side
of Eq. (32) be equal to $\ln k$, i.e.,
$$\ln k=\ln\frac{\tau_r}{\tau_1}=\pm O(r)+\frac{\ln k}{f(k)}\sum_{1<j\le r}\frac{1}{j}=\frac{\ln k\,(\ln r\pm O(1))}{f(k)}\pm O(r).$$
If we choose $r=\ln k/(2\ln\ln k)$, then $f(k)=\ln\ln k\pm o(\ln\ln k)$.
Now, by Eq. (31), $S(k)=O(k\ln k/f(k))=O(k\log k/\log\log k)$, as desired.
APPENDIX B
In this appendix, we give an $O(n(\log\log n)^{\log_2 3})$-size circuit
that computes the AND of $n$ variables correctly with
probability greater than a constant, $1-\epsilon>0$, on any (not
necessarily all) input instance, even if each gate is faulty with
probability upper bounded by a constant $\rho<1$. This example
shows that the $\Omega(s\log_{1/\rho}s)$ lower bound in Theorem 5.1
does not hold if the circuit is only required to compute
correctly on any input instance. For simplicity, we assume
that $n$ is an integer power of 2. In the fault-free case, the
AND function of $n$ variables can be computed by $n-1$
AND gates arranged in a complete binary tree. In this fault-free
circuit, we replace each AND gate by a $(\rho,\epsilon/\log n)$-fault-tolerant
circuit for the AND function of two variables.
By Theorem 3.1, we can choose each of the $(\rho,\epsilon/\log n)$-fault-tolerant
AND circuits to have $O((\log\log n)^{\log_2 3})$ size.
Hence, the size of the whole circuit is $O(n(\log\log n)^{\log_2 3})$.
We next show that the circuit thus constructed computes
the AND function of $n$ variables on any input instance with
probability greater than $1-\epsilon$. If all the inputs are ones, then
the correct output value is 1 and the circuit can never make
a mistake. Now, suppose that at least one of the $n$ inputs is
zero, say, $x_i=0$. We need to show that with probability
greater than $1-\epsilon$, the output will be zero. Clearly, there is
a path from input $x_i$ to the output of the whole circuit that
passes through exactly $\log n$ of the $(\rho,\epsilon/\log n)$-fault-tolerant
circuits for the AND function of two variables. Each of the
AND circuits in the path fails with probability less than
$\epsilon/\log n$. Hence, the probability that at least one of the AND
circuits in the path fails is less than $\epsilon$. In other words, with
probability greater than $1-\epsilon$, all the circuits along the path
work correctly and the overall circuit will output 0.
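
The counting in this argument can be sketched in a few lines. The code below is an illustration only: it assumes a heap-style numbering of the complete binary tree (AND sub-circuits at positions `1..n-1`, inputs at positions `n..2n-1`), and the names `gates_on_path` and `union_bound` are hypothetical.

```python
import math


def gates_on_path(n, leaf):
    """Positions of the AND sub-circuits on the path from input `leaf` to the root.

    Assumes n is a power of 2 and a heap numbering: internal positions
    1..n-1 hold the AND sub-circuits, positions n..2n-1 hold the inputs.
    """
    node = n + leaf
    path = []
    while node > 1:
        node //= 2
        path.append(node)
    return path


def union_bound(n, eps):
    """Union bound on the failure probability along one root path when each
    sub-circuit fails with probability at most eps / log2(n)."""
    per_circuit = eps / math.log2(n)
    return len(gates_on_path(n, 0)) * per_circuit


n = 8
print(gates_on_path(n, 5))    # path through log2(8) = 3 sub-circuits
print(union_bound(n, 0.1))    # at most eps overall
```

Only the sub-circuits on this single path need to work correctly for a fixed input with $x_i=0$, which is why a per-circuit failure probability of $\epsilon/\log n$ suffices here, whereas correctness on all inputs simultaneously is a stronger requirement.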
ACKNOWLEDGMENTS
The authors thank Richard Ehrenborg, Nicholas Pippenger, Greg
Plaxton, and Leonard Schulman for valuable conversations. Special
thanks to Ehrenborg for many stimulating discussions in the early stage of
this research and for suggesting the story of Charles in the Gate Store. We
thank the anonymous referees for their extraordinary effort and expert
suggestions.
REFERENCES
1. N. Alon and J. H. Spencer, "The Probabilistic Method," Wiley-Interscience,
New York, 1991.
2. S. Assaf and E. Upfal, Fault-tolerant sorting networks, SIAM J. Disc.
Math. 4, No. 4 (1991), 472-480; the conference version appeared in
"Proceedings of the 31st Annual IEEE Symposium on Foundations of
Computer Science, October 1990," pp. 275-284.
3. R. L. Dobrushin and S. I. Ortyukov, Lower bound for the redundancy
of self-correcting arrangements of unreliable functional elements, Prob.
Inf. Trans. 13 (1977), 59-65.
4. R. L. Dobrushin and S. I. Ortyukov, Upper bounds for the redundancy
of self-correcting arrangements of unreliable functional elements, Prob.
Inf. Trans. 13 (1977), 203-218.
5. W. Evans and L. Schulman, Signal propagation, with application to a
lower bound on the depth of noisy formulas, in "Proceedings of the
34th Annual IEEE Symposium on Foundations of Computer Science,
1993," pp. 594-603.
6. P. Gács and A. Gál, Lower bounds for the complexity of reliable
Boolean circuits with noisy gates, IEEE Transactions on Information
Theory 40 (1994), 579-583; a preliminary version by A. Gál appeared
in "Proceedings of the 32nd Annual IEEE Symposium on Foundations
of Computer Science, 1991," pp. 594-601.
7. T. Leighton and C. Leiserson, Wafer-scale integration of systolic arrays,
IEEE Transactions on Computers C-34, No. 5 (1985), 448-461.
8. N. Pippenger, On networks of noisy gates, in "Proceedings of
the 26th Annual IEEE Symposium on Foundations of Computer
Science, October, 1985," pp. 30-36.
9. N. Pippenger, Reliable computation by formulae in the presence of
noise, IEEE Trans. Inf. Theory 34 (1988), 194-197.
10. N. Pippenger, Invariance of complexity measures for networks with
unreliable gates, Journal of the ACM 36, No. 3 (1989), 531-539.
11. N. Pippenger, Developments in the synthesis of reliable organisms
from unreliable components, in "Proc. Symp. Pure Math.," Vol. 50,
Amer. Math. Soc., Providence, RI, 1990.
12. N. Pippenger, G. D. Stamoulis, and J. N. Tsitsiklis, On a lower bound
for the redundancy of reliable networks with noisy gates, IEEE Trans.
Inform. Theory 37, No. 3 (1991), 639-643.
13. R. Reischuk and B. Schmeltz, Reliable computation with noisy circuits
and decision trees: a general n log n lower bound, in "Proceedings of
the 32nd Annual IEEE Symposium on Foundations of Computer
Science, 1991," pp. 602-611.
14. D. Uhlig, Reliable networks from unreliable gates with almost minimal
complexity, in "Proceedings of Fundamentals of Computation
Theory," pp. 462-469, Lect. Notes in Comp. Science, Vol. 278,
Springer-Verlag, New York, 1987.
15. D. Uhlig, Reliable networks for Boolean functions with small complexity,
in "Proceedings of the 4th Parcella," pp. 366-371, Lect. Notes
in Comp. Science, Vol. 278, Springer-Verlag, New York, 1988.
16. J. von Neumann, Probabilistic logics and the synthesis of reliable
organisms from unreliable components, in "Automata Studies"
(C. E. Shannon and J. McCarthy, Eds.), pp. 329-378, Princeton Univ.
Press, Princeton, NJ, 1956.
