Breaking the Θ(n log² n) Barrier for Sorting with Faults, by Leighton, Tom et al.
Journal of Computer and System Sciences 54, 265-304 (1997)
Breaking the Θ(n log² n) Barrier for Sorting with Faults
Tom Leighton* and Yuan Ma*
Department of Mathematics and Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
and
C. Greg Plaxton†
Department of Computer Science, University of Texas at Austin, Austin, Texas 78712
Received July 29, 1996
In this paper, we study the problem of constructing a sorting circuit,
network, or PRAM algorithm that is tolerant to faults. For the most part,
we focus on fault patterns that are random, i.e., where the result of each
comparison is independently faulty with probability upper bounded by
some constant. All previous fault-tolerant sorting circuits, networks,
and parallel algorithms require Ω(log² n) depth and/or Ω(n log² n)
comparisons to sort n items. In this paper, we construct:
• a passive-fault-tolerant sorting circuit with O(n log n log log n)
comparators, thereby answering a question posed by Yao and Yao in
1985,
• a reversal-fault-tolerant sorting network with O(n log^(log₂ 3) n)
comparators, thereby answering a question posed by Assaf and Upfal
in 1990, and
• a deterministic O(log n)-step O(n)-processor EREW PRAM
fault-tolerant sorting algorithm, thereby answering a question posed by
Feige, Peleg, Raghavan, and Upfal in 1990.
The results are based on a new analysis of the AKS circuit, which
uses a much weaker notion of expansion that can be preserved in the
presence of faults. Previously, the AKS circuit was not believed to be
fault-tolerant because the expansion properties that were believed to be
crucial for the performance of the circuit are destroyed by random
faults. Extensions of our results for worst-case faults are also presented.
© 1997 Academic Press
1. INTRODUCTION
Sorting circuits have intrigued and challenged computer
scientists for decades. They have also proved to be very use-
ful for a variety of applications, including circuit switching
and packet routing [10]. In this paper, we study the
problem of constructing sorting circuits that are tolerant to
a potentially large number of faults.
The study of fault-tolerant sorting circuits was initiated
by Yao and Yao [20] in 1985. In particular, Yao and Yao
proposed a fault model in which a faulty comparator simply
outputs its two inputs without comparison (i.e., the items
are output in the same order in which they are input); we
will refer to such a fault as a passive fault. They defined an
n-input fault-tolerant sorting circuit to be a circuit that
remains a sorting circuit with probability at least 1 − 1/n
even if each comparator is independently faulty with prob-
ability upper bounded by a constant strictly less than 1.
They observed that any sorting circuit can be made into a
passive-fault-tolerant sorting circuit if each of the original
comparators is replicated O(log n) times. (In this paper, all
logarithms are taken base 2 unless otherwise specified.) Applied
to the AKS sorting circuit [1], which has O(log n) depth and
O(n log n) size, this immediately yields a passive-fault-tolerant
sorting circuit with O(log² n) depth and O(n log² n) size. (The
depth of a circuit or network is defined to be the number of levels in the
circuit or network, where each register is associated with at
most one comparator in each level. The size of a circuit or
network is defined as the number of comparators in the cir-
cuit or network.) Whether or not there is an alternative
approach to fault-tolerance that requires fewer comparators
has remained an interesting open question for several years.
In particular, Yao and Yao conjectured that ω(n log n)
comparators are needed to construct a fault-tolerant sorting
or merging circuit, but no proof of this conjecture has yet
been discovered.
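The replication argument of Yao and Yao can be quantified with a short calculation. Under passive faults, a block of r identical comparators on the same pair of registers acts like one correct comparator as long as at least one copy works, because faulty copies leave the pair unchanged and a sorted pair stays sorted. A block therefore fails with probability at most δ^r, and a union bound over s blocks gives failure probability at most sδ^r. The minimal sketch below, assuming independent faults with probability at most δ (the function name `replicas_needed` is hypothetical), finds the smallest r that pushes this bound below 1/n:

```python
import math

def replicas_needed(s: int, n: int, delta: float) -> int:
    """Smallest r with s * delta**r <= 1/n (union bound over s comparator blocks).

    Assumes 0 <= delta < 1 and independent passive faults; a block of r
    copies misbehaves only if all r copies are faulty.
    """
    if not 0 <= delta < 1:
        raise ValueError("delta must lie in [0, 1)")
    r = 0
    while s * delta ** r > 1 / n:
        r += 1
    return r

# For a circuit with s = 10 n log2 n comparators, r grows only like log n.
for n in (2**10, 2**20):
    s = 10 * n * int(math.log2(n))
    r = replicas_needed(s, n, delta=0.5)
    print(n, r, round(r / math.log2(n), 2))
```

Since s is polynomial in n, the required r is O(log n), which is where the extra log n factor in size comes from.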
Since Yao and Yao, many researchers have studied fault-tolerant
circuits, networks, and algorithms for sorting-related
problems in various models. (See [4, 6, 7, 12, 13,
18-20].) Despite all of these efforts, the O(log n)-gap
between the trivial upper and lower bounds has remained
open for Yao and Yao's question for both sorting and merging.
One approach to narrowing the O(log n)-gap was
investigated by Leighton, Ma, and Plaxton [13], who constructed
an O(n log n log log n)-size circuit that sorts any
permutation with probability at least 1 − 1/n. Yet, this
article no. SS971470
265 0022-000097 25.00
Copyright © 1997 by Academic Press
All rights of reproduction in any form reserved.
* Supported by AFOSR Contract F49620-92-J-0125, DARPA Contract
N00014-91-J-1698, and DARPA Contract N00014-92-J-1799. E-mail:
ftl@math.mit.edu; yuan@cs.stanford.edu.
† Supported by NSF Research Initiation Award CCR-9111591 and the
Texas Advanced Research Program under Grant No. 003658-461. E-mail:
plaxton@cs.utexas.edu.
circuit does not yield an answer to Yao and Yao’s question
because sorting any input permutation with high probabil-
ity is not sufficient to guarantee sorting all input permuta-
tions with high probability, and hence it is not sufficient to
guarantee that a faulty version of the circuit will be a sorting
circuit with high probability. In other words, for a randomly
generated fault pattern, there are likely to be some input
permutations for which the circuit of [13] fails to sort.
(Formally, a fault pattern completely specifies which com-
parators, if any, are faulty and how they are faulty. That is,
a fault pattern of a circuit or network contains all the infor-
mation needed to specify the functionality of all the com-
parators in the circuit or network.)
Since 1985, several other fault models have also been for-
mulated for the study of fault-tolerant sorting circuits [4, 13].
In the reversal fault model, a faulty comparator outputs the
two inputs in reversed order, regardless of their input order.
In the destructive fault model, a faulty comparator can output
the two inputs in reversed order, and it can also output one
of the two inputs in both of the output registers (i.e., the result
of a comparison between x and y can be (x, y), (y, x), (x, x),
or (y, y)). In order to tolerate destructive and/or reversal
faults, Assaf and Upfal [4] introduced a new computational
model for the study of the sorting problem. In their new
model, more than n registers are allowed to sort n items and
replicators are used to copy the item stored in one register to
another register. We will call this model the sorting network
model to distinguish it from the classic sorting circuit model in
which only n registers are used to sort n inputs and no
replicators are allowed. For example, a 2-input sorting
network with replicators that is tolerant to any single reversal
or destructive fault is illustrated in Fig. 1.
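The fault models above differ only in what a single comparator outputs. The sketch below encodes them for one standard comparator (smaller item to the top register); the mode labels such as `"dup_first"` are hypothetical names for the two "copy one input" behaviors of a destructive fault, and a destructive fault may also act like a reversal.

```python
from typing import Tuple

def compare(x: int, y: int, fault: str = "none") -> Tuple[int, int]:
    """(top output, bottom output) of one standard comparator on inputs (x, y).

    Hypothetical fault labels:
      "none"       -> (min, max), the correct behavior
      "passive"    -> (x, y), inputs passed through without comparison
      "reversal"   -> (max, min), regardless of the input order
      "dup_first"  -> (x, x), a destructive fault copying the first input
      "dup_second" -> (y, y), a destructive fault copying the second input
    """
    if fault == "none":
        return (min(x, y), max(x, y))
    if fault == "passive":
        return (x, y)
    if fault == "reversal":
        return (max(x, y), min(x, y))
    if fault == "dup_first":
        return (x, x)
    if fault == "dup_second":
        return (y, y)
    raise ValueError(f"unknown fault mode: {fault}")

for mode in ("none", "passive", "reversal", "dup_first", "dup_second"):
    print(mode, compare(3, 7, mode), compare(7, 3, mode))
```

On the already-sorted input (3, 7), a passive fault is harmless while a reversal fault unsorts it, which is why the two models call for different techniques.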
In [4], Assaf and Upfal described a general method for
converting any sorting circuit into a reversal-fault-tolerant
or destructive-fault-tolerant sorting network. In particular,
given an n-input sorting circuit with depth d and size s, the
fault-tolerant network produced by the Assaf-Upfal transformation
has depth O(d) and size O(s log n). (Asymptotically, it makes no
difference whether or not the replicators
are counted toward the size since an optimal network would
make a copy of an item only if the item were to be input to
a comparator.) When used in conjunction with the AKS
sorting circuit [1], this provides a reversal-fault-tolerant or
destructive-fault-tolerant sorting network with O(n log² n)
size.¹
FIG. 1. (a) A replicator. (b) A 2-input sorting network that tolerates
any single reversal or destructive fault.
The AssafUpfal method proceeds by making 3(log n)
copies of each item and replacing each comparator with
3(log n) comparators, followed by a majority-enhancing
device that is constructed from expanders. As a conse-
quence, the size of the resulting network is increased by a
3(log n) factor. Whether or not there is an alternative
approach to fault-tolerance that can avoid the 3(log n) fac-
tor blowup in size (even for the much simpler problem of
merging) was an interesting open question posed in [4].
For destructive faults, this question was recently answered
by Leighton and Ma [12] with an Ω(n log² n) lower bound
for both merging and sorting. The question concerning
reversal-fault-tolerant sorting networks remained open,
however.
The problem of sorting with faults has also been studied
in the PRAM model of computation. An algorithm is called
a fault-tolerant sorting algorithm if it sorts any n items with
probability at least 1&1n even if an answer to any com-
parison query is incorrect with probability upper bounded
by a constant strictly less than 12 . (Note that when the fault
probability is equal to 12 , we cannot obtain any useful infor-
mation from a comparison.) Feige, Peleg, Raghavan, and
Upfal [7] designed a randomized fault-tolerant sorting
algorithm that uses O(log n) expected time on an O(n)-pro-
cessor CREW PRAM. They left open the question of
whether or not there is a deterministic fault-tolerant sorting
algorithm that runs in o(log2 n) steps on O(n) processors.
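The role of the 1/2 threshold can be seen with the generic repetition idea: ask the same comparison query k times and take the majority answer. This is an illustration only, not necessarily the technique of [7] or of this paper, and `majority_error` is a hypothetical helper. For any constant fault probability p < 1/2 the error decays exponentially in k, so Θ(log n) repetitions per query make every answer reliable with high probability, which turns an O(log n)-step algorithm into an O(log² n)-step one; at p = 1/2 repetition gains nothing.

```python
from math import comb

def majority_error(p: float, k: int) -> float:
    """Probability that the majority of k independent query answers is wrong.

    Each answer is wrong independently with probability p; k must be odd so
    that the majority is always defined.
    """
    if k % 2 == 0:
        raise ValueError("use an odd number of queries")
    return sum(comb(k, j) * p**j * (1 - p) ** (k - j) for j in range(k // 2 + 1, k + 1))

for p in (0.1, 0.3, 0.5):
    print(p, [round(majority_error(p, k), 4) for k in (1, 11, 51)])
```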
In this paper, we develop a fault-tolerant sorting circuit,
network, and PRAM algorithm that beat the Θ(n log² n)
barrier in each of the preceding models, thereby partially or
wholly resolving the questions posed by Yao and Yao [20],
Assaf and Upfal [4], and Feige et al. [7]. In particular, we
construct:
(i) a passive-fault-tolerant sorting circuit with
O(log n log log n) depth and O(n log n log log n) size, which
resolves the question posed by Yao and Yao [20] to within
an O(log log n) factor,
1 In this paper, as in [4], we assume that the replicators are fault-free.
This is not a particularly unreasonable assumption since replicators can be
hard-wired and they do not contain any logic elements. In fact, some of our
results can be extended to handle a model in which replicators are also
allowed to fail.
(ii) a reversal-fault-tolerant sorting network with size
O(n log^(log₂ 3) n), which partially resolves the question of
Assaf and Upfal [4], and
(iii) a fault-tolerant sorting algorithm that runs in
O(log n) steps on an O(n)-processor EREW PRAM, which
resolves the question of Feige et al. [7].
All of these results are based on a surprisingly strong
fault-tolerance property of the AKS circuit [1]. The results
are surprising because the AKS circuit was not previously
believed to be useful in the context of fault-tolerant sorting
algorithms; when a constant fraction of the comparators fail
to work, the expansion property, which plays a central role
in the functionality of the fault-free AKS sorting circuit, is
lost and it appears that the AKS circuit cannot sort at all.
(This is perhaps the main reason that people were unable to
make progress on Yao and Yao’s question.) The novelty of
our work is to show that some loose expansion properties,
which can be preserved even in the presence of faults, are
sufficient to ‘‘approximate-sort,’’ and that approximate-
sorting can be combined with other methods to sort.
Although we will mainly focus on the study of circuits,
networks, and algorithms that are tolerant to random
faults, all of our techniques also apply to constructing circuits,
networks, and algorithms that are tolerant to worst-case
faults. Our results for worst-case faults include the first
asymptotically nontrivial upper bound for the depth of
worst-case passive-fault-tolerant sorting circuits, a worst-
case reversal-fault-tolerant sorting network, and an optimal
worst-case fault-tolerant EREW PRAM algorithm for sort-
ing. The techniques in this paper can also be applied to
other sorting-related problems. For example, we will con-
struct a passive-fault-tolerant selection circuit with the
asymptotically optimal size of O(n log n).
Throughout the paper, we use n to denote the number of
input items and δ < 1 to denote the upper bound on the
failure probability for each comparator, unless otherwise
specified. A circuit, network, or algorithm for solving a
problem Q is defined to be (δ, ε)-fault-tolerant if it satisfies
the following condition: when each comparator or
comparison is independently faulty with probability upper
bounded by δ, then with probability at least 1 − ε, a faulty
version of the circuit, network, or algorithm remains a
circuit, network, or algorithm that solves Q on all possible
input instances. When we simply refer to a circuit, network,
or algorithm as fault-tolerant, we mean that the circuit,
network, or algorithm is (δ, 1/n)-fault-tolerant for some
constant δ < 1. For any constant c, all of our constructions
can be made into constructions with success probability at
least 1 − 1/n^c with no more than a constant factor increase
in size, but we will be content to achieve a success
probability of at least 1 − 1/n in most parts of the paper.
As an alternative, we could have defined the notion of
(δ, ε)-fault-tolerance by assuming that δ is exactly the
failure probability of each comparator or comparison, as
opposed to an upper bound on the failure probability. In
general, these two definitions are not equivalent; however, it is
straightforward to show that any (δ, ε)-fault-tolerant
circuit, network, or algorithm defined in the preceding
paragraph is also (δ, ε)-fault-tolerant according to the
alternative definition.² Hence, to get the strongest possible
results, we will use the definition of the preceding paragraph
for all upper bound results, which include all but Theorems
4.1 and 4.2, and use the alternative definition for all the
lower bound results, which include Theorems 4.1 and 4.2.
Finally, we point out that all of the comparators used in
the constructions of our circuits and networks move the
smaller input to the top register and the larger input to the bottom
register. Following the terminology of Knuth [9], this
means that all of our circuits and networks are standard. All
of our lower bounds are proved for the general case, i.e., we
do not assume that the circuit is standard in the lower
bound proofs. In the fault-free case, it has been proved that
any nonstandard sorting circuit can be converted into a
standard sorting circuit with the same depth and size (see
Exercise 16 on page 239 of [9]). However, we do not know
if a similar result is true when the circuit is subject to passive
faults.
The remainder of the paper is organized as follows. In
Section 2, we prove that the AKS circuit has certain useful
fault-tolerance properties. In Section 3, we use the AKS cir-
cuit to construct an O(log n log log n)-depth passive-fault-
tolerant sorting circuit. In Section 4, we describe the rever-
sal-fault-tolerant approximate-sorting circuits and sorting
networks. In Section 5, we describe the fault-tolerant
PRAM sorting algorithm. Finally, in Section 6, we extend
our results to worst-case faults.
We remark that this paper combines the results
presented in [11, 13].
2. THE ANALYSIS OF THE AKS CIRCUIT
In this section, we show that the AKS circuit [1] has cer-
tain fault-tolerance properties under both the passive and
reversal fault models. These fault-tolerance properties will
be the cornerstone for most of the fault-tolerant sorting circuits,
networks, and algorithms in the paper. We believe
that our new analysis of the AKS circuit is of independent
interest. The section is divided into three parts:
Section 2.1 explains why the previously known analyses of
the AKS circuit are not sufficient to establish the desired
fault-tolerance properties and highlights the major dif-
ficulties in the new analysis. Section 2.2 contains a brief
description of the AKS circuit and the relevant parameter
2 A detailed discussion of this phenomenon in the context of Boolean cir-
cuits with noisy gates can be found in [17].
choices. Section 2.3 proves the main theorem of the section
and its corollary.
2.1. The Need for a New Analysis of the AKS Circuit
The key component of the AKS circuit is the ε-halver. An
m-input circuit is called an ε-halver if, on any m distinct
inputs, it outputs at most εk of the k smallest (largest)
inputs into the bottom (top) m/2 registers for any k ≤ m/2.
For any constant ε > 0, a bounded-depth ε-halver can be
built from an expander as follows. Take an (m/2) × (m/2)
d-regular (ε, (1 − ε)/ε)-bipartite expander with vertex sets A
and B, where d is a constant dependent on ε. (A bipartite
(m/2) × (m/2) graph with vertex sets A and B is called an
(α, β)-expander if any k ≤ αm nodes in A (or B) are connected
to at least βk nodes in B (or A). Explicit constructions of
expanders can be found in [15].) Assign each vertex in A to
a register in the top half of the circuit, and assign each vertex
in B to a register in the bottom half of the circuit. Partition
the edges of the expander into d disjoint matchings. Assign
a comparator between two registers at level i in the halver
if and only if the corresponding vertices are connected by an
edge in the ith matching of the expander.
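The construction can be tried out on a small scale. The sketch below builds the comparators of a halver level by level, one perfect matching per level, and computes the smallest ε for which the result is an ε-halver. It makes two assumptions that are not in the text: random perfect matchings stand in for an explicit expander (a union of d perfect matchings is automatically d-regular and already split into d matchings), and ε is computed exactly by checking all 0-1 inputs. The latter is valid because a comparator circuit commutes with thresholding, so the output positions of the k smallest items are the positions of the 0s when those items are mapped to 0 and the rest to 1. All names are illustrative.

```python
import itertools
import random
from typing import List, Tuple

Matching = List[Tuple[int, int]]  # (top register, bottom register) pairs

def random_matchings(m: int, d: int, seed: int = 0) -> List[Matching]:
    """d random perfect matchings between the top and bottom halves
    (a hypothetical stand-in for the d matchings of an explicit expander)."""
    rng = random.Random(seed)
    half = m // 2
    levels = []
    for _ in range(d):
        perm = list(range(half))
        rng.shuffle(perm)
        levels.append([(i, half + perm[i]) for i in range(half)])
    return levels

def run_halver(inputs, levels: List[Matching]) -> list:
    x = list(inputs)
    for level in levels:              # level i uses the ith matching
        for top, bottom in level:
            if x[top] > x[bottom]:    # standard comparator: smaller item to the top
                x[top], x[bottom] = x[bottom], x[top]
    return x

def halver_epsilon(m: int, levels: List[Matching]) -> float:
    """Smallest eps such that, for every input and every k <= m/2, at most eps*k
    of the k smallest (largest) inputs end up in the bottom (top) half."""
    half = m // 2
    eps = 0.0
    for bits in itertools.product((0, 1), repeat=m):
        out = run_halver(bits, levels)
        ones = sum(bits)
        zeros = m - ones
        if 1 <= zeros <= half:
            eps = max(eps, out[half:].count(0) / zeros)
        if 1 <= ones <= half:
            eps = max(eps, out[:half].count(1) / ones)
    return eps

for d in (1, 2, 4, 6):
    print(d, halver_epsilon(12, random_matchings(12, d)))
```

With a fixed seed, each call extends the previous list of matchings, and since a top register holding a 0 always keeps it, appending levels can never increase the measured ε. For such tiny m the numbers say little about the asymptotic expander bounds.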
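To see why this construction yields an ε-halver, we
assume for the purposes of contradiction that the circuit is
not an ε-halver. Without loss of generality, we can assume
that there exist m distinct inputs and an integer k ≤ m/2 such
that strictly more than εk of the k smallest inputs are output
into R, a set of strictly more than εk registers in the bottom
half of the circuit. Let R′ be the set of registers in the top half
of the circuit that are connected to some registers in R. It is
not hard to show that all registers in R′ contain outputs
with rank at most k. (Every comparator joins a top register to a
bottom register and puts the smaller item on top, so the item in a
top register can only decrease over time and the item in a bottom
register can only increase; hence each register in R′ ends up with
an item no larger than the final item of a neighbor in R.) Therefore,
all of the
$$|R| + |R'| \ge |R| + \frac{1-\varepsilon}{\varepsilon}\,\varepsilon k = |R| + (1-\varepsilon)k > k$$
output items contained in either R or R′ have rank at most k. This is a
contradiction.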
It would be nice if the ε-halver could tolerate random
faults automatically or if the ε-halver could be made
fault-tolerant with an o(log n)-factor increase in the depth. (For
example, if this were possible, Yao and Yao's question
would have an easy answer.) As we have seen in the
previous paragraph, however, the fact that |R′| ≥ (1 − ε)k is
critical to guarantee the ε-halver property, and this fact in
turn depends on the expansion property of the expander.
Unfortunately, the following observation indicates that the
expansion is lost in the presence of faulty comparators and
that the cost of achieving fault-tolerant expansion is very
high. For example, if d = Θ(1) and each comparator in the
ε-halver constructed above is independently faulty with
constant probability, then with high probability there exists a
set of k = Θ(m) registers in the bottom half of the circuit for
which all associated comparators are faulty. Hence, if the k
smallest inputs are all input to these registers, then the
inputs cannot be moved to the top half of the circuit. This
shows that the ε-halver itself cannot withstand random
faults. Moreover, even if we increase the depth of an ε-halver
by a nonconstant o(log n) factor, any constant number of
registers are together connected to only o(log n) comparators, so,
with probability ω(1/n), none of these comparators works.
(Using more careful arguments,
we can actually show that with probability approaching 1,
there exists a set of ω(1) registers that are not connected to
any working comparators.) Hence, if a constant number of
the smallest inputs are input to these registers, they cannot
be reliably moved to the top half of the halver.
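The first observation is easy to simulate, assuming every comparator of the halver is independently faulty with probability δ (the function name below is hypothetical). In the halver, each bottom register meets exactly one comparator per level and no two bottom registers share a comparator, so each one is cut off from all working comparators independently with probability δ^d. With d = Θ(1) the expected number of such registers is (m/2)δ^d = Θ(m); it only drops below 1 once d exceeds roughly log_{1/δ}(m/2).

```python
import math
import random

def isolated_bottom_registers(m: int, d: int, delta: float, seed: int = 0) -> int:
    """Count bottom-half registers all of whose d comparators are faulty.

    Each bottom register meets one comparator per level, and distinct bottom
    registers never share a comparator, so each is isolated independently
    with probability delta**d.
    """
    rng = random.Random(seed)
    return sum(all(rng.random() < delta for _ in range(d)) for _ in range(m // 2))

m, delta = 4096, 0.25
for d in (2, 4, int(math.log2(m))):
    print(d, isolated_bottom_registers(m, d, delta), (m // 2) * delta**d)
```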
Since expansion plays a central role for both the func-
tionality and the correctness proof of the AKS circuit, the
loss of such expansion in the presence of faulty comparators
is a fatal problem for all previously known analyses of the
AKS circuit. In fact, the loss of the expansion property even
makes the approach of using the AKS circuit to construct
fault-tolerant sorting circuits seem to be hopeless. The
novelty of our work is to show that, even without guaran-
teeing ‘‘local’’ expansion at any single expander, it is
possible to enforce a certain ‘‘global’’ expansion property
that is sufficient to guarantee that the AKS circuit functions
reasonably well.
2.2. The Description of a Modified AKS Circuit
In this section, we describe a modified AKS circuit that
will be shown to possess certain fault-tolerance properties.
In Section 2.2.1, we describe a (slight) modification of the
AKS circuit described by Paterson in [16]. In particular,
we modify some parameter choices and replace the so-called
separators of [16] by a new family of building blocks that
we call partitioners. In Section 2.2.2, we further modify the
AKS circuit into an l-AKS circuit, where l is a parameter to
be specified later that corresponds to the amount of fault-
tolerance attained by the circuit.
We will be content with proving that certain parameter
choices guarantee the desired fault-tolerance properties. No
attempt will be made to keep the involved constants small.
In particular, an extremely large constant (much larger than
the previously best known constant for the AKS circuit) is
hidden behind the O-notation for all of our circuits,
networks, and algorithms.
2.2.1. The AKS Circuit
In this section, we describe Paterson’s version of the AKS
circuit [16]. For simplicity, we closely follow the descrip-
tion in [16] whenever possible. In particular, we use the
same letters to denote the same parameters as in [16] unless
otherwise specified.
At a high level, the AKS circuit is structured around a
complete binary tree. For ease of notation, we will refer to
the tree to be defined as the AKS tree and refer to each of its
nodes as an AKS tree node. The AKS circuit works in stages
starting from stage 1. Each of the stages corresponds to a
constant number of consecutive levels of the AKS circuit.
From one stage to the next, each register of the AKS circuit
is moved from one AKS tree node to another. Within a
stage, a sorting-related device is applied to all registers
located at each AKS tree node. The AKS circuit is then
defined by specifying: (i) the structure of the AKS tree, (ii)
how the registers are moved within the AKS tree, and (iii)
what sorting-related device is applied and how it is applied
to registers residing at each AKS tree node.
The remainder of Section 2.2.1 is divided into four parts.
The first three parts address the three problems identified
immediately above. The last part deals with a particular
technical problem.
The AKS tree. The AKS tree is a complete binary tree,
with the root at the top. Each of the AKS tree nodes has a
dynamically changing (as a function of the stage) capacity
that specifies the maximum number of items that can be
stored in the node.³
To describe the capacity of an AKS tree node precisely,
we first choose parameters
$$A = 3, \qquad \nu = \frac{43}{48} + \varepsilon', \tag{1}$$
where ε′ is a sufficiently small positive constant. Note that ε′
has nothing to do with the parameter ε of [16], and it is
used only to bound ν away from 43/48. In [16], ν was first
chosen to be exactly 43/48, and was later slightly increased after
the effect of "integer rounding" was considered. The
particular choice of ε′ is not important. For example, we may
choose
$$\varepsilon' = 10^{-2}. \tag{2}$$
The capacity of an AKS tree node X, denoted by cap(X), is
determined by
$$\mathrm{cap}(X) = \nu^{t-1} A^{d} \left(1 - \frac{1}{4A^2}\right) n, \tag{3}$$
where t denotes the stage and d denotes the level of X in the
AKS tree (with the root considered to be at level 0).
Intuitively, such a choice of capacity means that: (i) at any
time, the capacities of all nodes at the same level of the tree
are equal, and (ii) the capacity of a node decreases geometri-
cally with the stage and increases geometrically with the
level of the node. Note that the second property ensures that
items will be squeezed downward in the AKS tree toward
their final locations.
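The capacity rule of Eq. (3) is easy to tabulate. The sketch below uses exact rational arithmetic, takes n = 1 (capacities scale linearly in n), and fixes ε′ = 10^{-2} as in Eq. (2); the names `cap`, `NU`, and `A` are illustrative.

```python
from fractions import Fraction

A = 3
NU = Fraction(43, 48) + Fraction(1, 100)   # Eq. (1) with eps' = 10**-2 from Eq. (2)

def cap(t: int, d: int, n: Fraction = Fraction(1)) -> Fraction:
    """Capacity of a node at level d of the AKS tree during stage t, per Eq. (3)."""
    return NU ** (t - 1) * Fraction(A) ** d * (1 - Fraction(1, 4 * A * A)) * n

# Capacity grows by a factor A per level and shrinks by a factor nu per stage.
assert cap(5, 3) / cap(5, 2) == A
assert cap(6, 2) / cap(5, 2) == NU
print(float(cap(1, 0)), float(cap(10, 0)), float(cap(10, 4)))
```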
Let |X| denote the number of items that are actually
contained in an AKS tree node X. By the definition of capacity,
|X| ≤ cap(X) for all X. We say that X is empty, full, or
partially full if |X| = 0, |X| = cap(X), or 0 < |X| < cap(X),
respectively. The AKS tree may be viewed to be infinite,
respectively. The AKS tree may be viewed to be infinite,
with most of its nodes empty. For convenience, we call each
of the nodes at the lowest nonempty level of the AKS tree a
leaf.
Register movements. Roughly speaking, the left subtree
of the AKS tree corresponds to items being sorted that are
smaller than the median, and the right subtree of the AKS
tree corresponds to the items larger than the median. Recur-
sively, both of the subtrees have a similar property. Initially,
(almost) all of the items are located in the root of the AKS
tree. As time progresses, (most) items move downward in
the AKS tree, and the goal is to move all of the items to their
correct locations within the AKS tree. It would be ideal if we
could exactly partition all of the items residing at each AKS
tree node in a constant number of steps, since then we could
sort n items in the desired O(log n) time. Unfortunately, it is
easy to see that exactly partitioning m items requires
Ω(log m) steps in the circuit model. Hence, any sorting
circuit based on exact partitioning has Ω(log² n) depth. One
novelty of the AKS circuit is to use a certain approximate-
partition gadget at each AKS tree node to move most of the
items residing at the node to the correct child of the node in
a constant number of steps. A detailed and rigorous treat-
ment of this gadget is given in the next part of Section 2.2.1.
Assuming that the approximate-partition gadget can be
constructed, we now describe how the registers are moved
within the AKS tree. In what follows, we will specify various
sizes of sets of registers (or items) as real numbers. How to
round these real numbers to some close integers is the sub-
ject of the last part of Section 2.2.1.
Initially, an arbitrary set of (1 − 1/(4A²))n registers are
located at the root of the tree (i.e., the root is filled to its
capacity), and all of the levels below the root are empty. In
general, at each stage, all registers residing in an AKS tree
node X are first partitioned into four parts (by a partitioner
to be defined in the next part of Section 2.2.1): FL (far-left),
CL (center-left), CR (center-right), and FR (far-right) so
that
$$|FL| = |FR| = \min\left\{\frac{\lambda}{2}\,\mathrm{cap}(X),\ \frac{|X|}{2}\right\} \tag{4}$$
and
$$|CL| = |CR| = \frac{|X|}{2} - |FL|,$$
3 More precisely, since we need to construct an oblivious circuit, as
opposed to an adaptive algorithm, we can only move the registers (not the
items themselves) among AKS tree nodes. For simplicity, however, we will
not distinguish between a register and the item contained in the register, as
long as the meaning is clear from the context. For example, an alternative
(and perhaps more rigorous) definition of the capacity of a node is the
maximum number of registers (instead of items) allowed in the node.
where
$$\lambda = \frac{\nu - \frac{1}{2A}}{2A - \frac{1}{2A}} \tag{5}$$
$$\lambda = \frac{1}{8} + \frac{6}{35} \times 10^{-2}. \tag{6}$$
(In [16], λ was first set to 1/8 and then slightly increased after
the effect of "integer rounding" was considered.) Then,
FL and FR are sent up to the parent of X, and CL and CR are
sent down to the left and right child of X, respectively.
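The parameter λ and the partition sizes can be checked with exact arithmetic. This sketch (names illustrative) recomputes λ from Eq. (5), confirms the value in Eq. (6), and evaluates Eq. (4) for a full node and for a sparsely filled one. In the second case |X|/2 is the smaller term in the minimum, so the whole node is sent up to the parent.

```python
from fractions import Fraction

A = 3
NU = Fraction(43, 48) + Fraction(1, 100)                          # Eqs. (1) and (2)
LAM = (NU - Fraction(1, 2 * A)) / (2 * A - Fraction(1, 2 * A))    # Eq. (5)

assert LAM == Fraction(1, 8) + Fraction(6, 35) * Fraction(1, 100)  # Eq. (6)

def partition_sizes(size, capacity):
    """Sizes (|FL|, |CL|, |CR|, |FR|) of the four parts of a node, per Eq. (4)."""
    fl = min(LAM / 2 * capacity, Fraction(size) / 2)
    cl = Fraction(size) / 2 - fl
    return fl, cl, cl, fl

print([float(p) for p in partition_sizes(100, 100)])  # a full node
print([float(p) for p in partition_sizes(5, 100)])    # a sparsely filled leaf
```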
The preceding description of register movement is almost
correct, except that the root does not have a parent yet. To
get around this boundary situation, we keep a subset of
registers at a special node, which we call the cold storage,
immediately above the root. In particular, we would like to
treat the cold storage as simulating half of the root’s parent
(in terms of capacity), one-quarter of its grandparent, one-eighth
of its great-grandparent, and so on. (This will be useful
in the analysis of the AKS circuit; see Case 1 of the proof
of Theorem 2.1. Also, we only let the cold storage simulate
every other level since, in Claim 2.1, it will be proved that
the levels of the AKS tree are alternately full and empty
above the leaf level.) This requires that the capacity of the
cold storage be
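$$\frac{\mathrm{cap(root)}}{2A} + \frac{\mathrm{cap(root)}}{(2A)^3} + \cdots = \frac{2A\,\mathrm{cap(root)}}{4A^2 - 1} \quad \text{at even stages},$$
$$\frac{\mathrm{cap(root)}}{(2A)^2} + \frac{\mathrm{cap(root)}}{(2A)^4} + \cdots = \frac{\mathrm{cap(root)}}{4A^2 - 1} \quad \text{at odd stages}.$$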
No comparator or partitioner is applied to the registers in
the cold storage. Registers are exchanged between the root
and the cold storage as if the cold storage is the parent of the
root. More precisely: (i) from an odd stage to an even stage,
all of the λ cap(root) registers in FL and FR of the root are
sent to the cold storage, where cap(root) denotes the
capacity of the root at the odd stage; (ii) from an even stage
to an odd stage, ((1 − λ)/(2A)) cap(root) registers are sent from
the cold storage to the root, where cap(root) denotes the
capacity of the root at the even stage. (Here, we do not have
to worry about the case where the root may be partially full,
since, by our choice of capacities, the items tend to move
downward in the AKS tree and the root is always full at odd
stages and empty at even stages.)
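Both cold-storage capacities given above are geometric series with ratio 1/(4A²), and the same computation gives the stage-1 total used in the base case of Claim 2.1 below:

$$\sum_{j \ge 0} \frac{\mathrm{cap(root)}}{(2A)^{2j+1}} = \frac{\mathrm{cap(root)}}{2A} \cdot \frac{1}{1 - \frac{1}{4A^2}} = \frac{2A\,\mathrm{cap(root)}}{4A^2 - 1},$$
$$\sum_{j \ge 1} \frac{\mathrm{cap(root)}}{(2A)^{2j}} = \frac{\mathrm{cap(root)}}{4A^2} \cdot \frac{1}{1 - \frac{1}{4A^2}} = \frac{\mathrm{cap(root)}}{4A^2 - 1}.$$

At stage 1 (an odd stage), cap(root) = (1 − 1/(4A²))n, so the root and the cold storage together hold

$$\left(1 - \frac{1}{4A^2}\right) n \left(1 + \frac{1}{4A^2 - 1}\right) = \frac{4A^2 - 1}{4A^2}\, n \cdot \frac{4A^2}{4A^2 - 1} = n.$$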
We have thus specified the rules for moving registers
within the AKS tree. The rules would not be valid unless the
capacity constraints on all AKS tree nodes can be observed
simultaneously. To establish the validity of the rules (and
for future applications), we prove the following stronger
result.
Claim 2.1. At odd stages: (i) all the nodes at odd levels
are empty, (ii) all the nodes at even levels and strictly below
the leaf level are empty, (iii) all the nodes at even levels and
strictly above the leaf level are full, and (iv) either all the
leaves are full or all the leaves are partially full. A symmetric
series of properties holds at even stages. The cold storage is
always full.
Proof. It is convenient to consider the cold storage as
simulating half of the root’s parent, one-quarter of its
grandparent, one-eighth of its great-grandparent, and so on.
Under such an assumption, the last sentence of the claim
follows from the rest of the claim since the cold storage
works as if it is a chain of ‘‘regular’’ nodes. This will enable
us to avoid the undesirable complication caused by the
boundary situation at the cold storage.
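By Eq. (5), we have
$$\nu = 2\lambda A + \frac{1 - \lambda}{2A}.$$
For example, if a node Y and all four of its grandchildren are full at
stage t, then each child of Y (empty at stage t) receives
(1 − λ)cap(Y)/2 registers from Y and λA² cap(Y) registers from each of
its own two children, for a total of A cap(Y)(2λA + (1 − λ)/(2A)) =
νA cap(Y), which is exactly the child's capacity at stage t + 1. The
correctness of the claim is then easily checked by induction on stages,
using the preceding equation and some algebraic manipulation. ∎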
In the base case of our induction proof for Claim 2.1, we
need the following facts during stage 1: (i) all registers are
either at the cold storage or at the root, and (ii) the
capacities of the root and the cold storage sum to n. Since the cold
storage holds cap(root)/(4A² − 1) registers at odd stages, the latter
fact explains why we have set cap(root) = n(1 − 1/(4A²))
for t = 1 in Eq. (3).
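Claim 2.1 can also be checked mechanically in the idealized setting where register counts are real numbers, ignoring the integer rounding deferred to the last part of Section 2.2.1. The sketch below is a hypothetical fluid simulation, not part of the construction: it tracks the per-node count at each level under the movement rules above (all nodes at a level behave identically), treats the cold storage as the root's parent, and asserts at every stage that registers are conserved, that the cold storage is full, that levels of the "wrong" parity are empty, and that levels above the leaf level are full. Exact rational arithmetic lets "full" be tested with equality; ε′ = 10^{-2} as in Eq. (2), and all names are illustrative.

```python
from fractions import Fraction

A = 3
NU = Fraction(43, 48) + Fraction(1, 100)                          # Eqs. (1), (2)
LAM = (NU - Fraction(1, 2 * A)) / (2 * A - Fraction(1, 2 * A))    # Eq. (5)

def cap(t, d, n):
    return NU ** (t - 1) * Fraction(A) ** d * (1 - Fraction(1, 4 * A * A)) * n  # Eq. (3)

def simulate_claim_2_1(stages, n=Fraction(1)):
    """Fluid simulation of register movement; returns the leaf level at each stage."""
    depth = stages + 2                     # the leaf moves at most one level per stage
    x = [Fraction(0)] * depth              # x[d] = registers in each node at level d
    x[0] = cap(1, 0, n)                    # the root starts full ...
    cold = n - x[0]                        # ... and the rest sits in cold storage
    leaves = []
    for t in range(1, stages + 1):
        caps = [cap(t, d, n) for d in range(depth)]
        assert sum(2 ** d * x[d] for d in range(depth)) + cold == n
        assert cold == caps[0] * (2 * A if t % 2 == 0 else 1) / (4 * A * A - 1)
        leaf = max(d for d in range(depth) if x[d] > 0)
        leaves.append(leaf)
        for d in range(depth):
            assert x[d] <= caps[d]
            if d % 2 == t % 2:             # odd levels at odd stages, even at even
                assert x[d] == 0
            elif d < leaf:
                assert x[d] == caps[d]     # full above the leaf level
        up = [min(LAM * caps[d], x[d]) for d in range(depth)]   # |FL| + |FR|
        down = [x[d] - up[d] for d in range(depth)]             # |CL| + |CR|
        from_cold = (1 - LAM) / (2 * A) * caps[0] if t % 2 == 0 else Fraction(0)
        new = [Fraction(0)] * depth
        for d in range(depth):
            if d > 0:
                new[d] += down[d - 1] / 2  # CL or CR of the parent
            if d + 1 < depth:
                new[d] += 2 * up[d + 1]    # FL and FR of both children
        new[0] += from_cold
        cold += up[0] - from_cold          # the root's FL and FR go to cold storage
        assert new[-1] == 0                # nothing falls off the simulated tree
        x = new
    return leaves

print(simulate_claim_2_1(20))
```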
As time progresses, the capacity of each AKS tree node
gets smaller and smaller. When the capacity of the root
becomes so small (i.e., smaller than a certain constant) that
we are about to split the AKS tree into two (unconnected)
subtrees, special care is needed to guarantee the per-
formance of the AKS circuit. In [16, Section 6], this was
handled using Batcher’s sorting circuit [5]. In our applica-
tion of the AKS circuit, however, we never need to run the
AKS circuit all the way to completion. Instead, we stop the
AKS circuit well before the capacity of the root gets close
to a constant. (See the last paragraph of Section 2.2.2
for more details.) Consequently, we do not need to worry
about this particular problem.
A $(\lambda', \beta)$-partitioner. As we have seen in the preceding part of Section 2.2.1, a sorting-related device is needed to partition the items residing at each AKS tree node into four parts: $F_L$, $C_L$, $C_R$, and $F_R$. This task was accomplished by the so-called separator in [16] (and by the so-called near-sorting circuit in [1]). Informally, a separator is slightly more powerful than a halver in the sense that a separator not only moves most of its inputs to the correct half but also moves most of its ‘‘extreme’’ inputs
close to the extreme positions. Since, as discussed in Section 2.1, we cannot build $\varepsilon$-halvers that are both efficient and fault-tolerant, we cannot construct efficient and fault-tolerant separators either.
In [16], $\varepsilon$-halvers and separators are defined in terms of their functionality. The procedure given there for building an $\varepsilon$-halver from an expander, and for building a separator from $\varepsilon$-halvers, represents only one of many possible constructions. In this paper, we are interested in the specific construction given in [16], even though that construction will likely fail to have the properties required of $\varepsilon$-halvers or separators once faults are introduced.
Throughout the paper, we choose $\beta$ as a sufficiently large constant, and we choose
$$\lambda' = \frac{1}{8}. \tag{7}$$
Note that $\lambda'$ is strictly smaller than $\lambda$, but the two parameters can be arbitrarily close if $\varepsilon'$ is arbitrarily close to 0.
We define a $\beta$-divider with $m$ inputs to be a circuit constructed by using an $(m/2) \times (m/2)$ $d$-regular bipartite $(1/(\beta+1), \beta)$-expander to connect the top half and the bottom half of the $m$ registers, in the same fashion as we used to construct the $\varepsilon$-halver in Section 2.1.⁴
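The wiring of a divider can be sketched as follows. The sketch assumes the $d$-regular bipartite graph is handed over as $d$ perfect matchings between the two halves (every $d$-regular bipartite graph decomposes this way), each matching giving one level of comparators. The random matchings used here are a hypothetical stand-in: nothing guarantees that they form a $(1/(\beta+1), \beta)$-expander, which a real $\beta$-divider requires. Following footnote 4, smaller items are sent to the top.

```python
import random

def divider_levels(m, d, rng):
    """Comparator levels of an m-input divider-style circuit (sketch).

    Registers 0..m/2-1 form the top half and m/2..m-1 the bottom half.
    Hypothetical stand-in: the d perfect matchings between the halves are
    random, so they need not form the expander a real divider needs.
    """
    assert m % 2 == 0
    half = m // 2
    levels = []
    for _ in range(d):
        perm = list(range(half))
        rng.shuffle(perm)
        # one level: top register i meets bottom register half + perm[i]
        levels.append([(i, half + perm[i]) for i in range(half)])
    return levels

def apply_levels(values, levels):
    """Run fault-free comparators; the smaller item goes to the top register."""
    out = list(values)
    for level in levels:
        for top, bottom in level:
            if out[top] > out[bottom]:
                out[top], out[bottom] = out[bottom], out[top]
    return out

rng = random.Random(0)
levels = divider_levels(16, 3, rng)
out = apply_levels(rng.sample(range(16), 16), levels)
print(out[:8], out[8:])  # top half, bottom half
```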
Comparing our $\beta$-divider with an $\varepsilon$-halver, we see that $\beta$ plays a role similar to that of $1/\varepsilon$ in [16]. In [16], $\varepsilon$ was set to a particular (small) value. In this paper, however, we will only show the existence of a sufficiently large constant $\beta$ that ensures certain fault-tolerance properties of the AKS circuit. Such a constant $\beta$ appears to be much larger than the value of $1/\varepsilon$ chosen in [16].
Given the construction of a divider, an $m$-input $(\lambda', \beta)$-partitioner at an AKS tree node $X$ is constructed by applying $\beta$-dividers in at most $1 + \log(1/\lambda')$ rounds. We first apply an $m$-input $\beta$-divider to all $m$ registers. Then, we apply an $(m/2)$-input $\beta$-divider to the top $m/2$ registers and another $(m/2)$-input $\beta$-divider to the bottom $m/2$ registers. Next, we apply an $(m/4)$-input $\beta$-divider to the top $m/4$ registers and another $(m/4)$-input $\beta$-divider to the bottom $m/4$ registers (we do nothing to the $m/2$ ‘‘middle’’ registers). We then apply another $(m/8)$-input $\beta$-divider to the top $m/8$ registers and another $(m/8)$-input $\beta$-divider to the bottom $m/8$ registers. We continue in this way until we have applied a divider to a group of at most $\lambda'\,\mathrm{cap}(X)$ registers. Altogether, we apply dividers in at most $1 + \log(1/\lambda')$ rounds. For ease of notation, we refer to the dividers applied in the $i$th round of a partitioner as the $i$th-round dividers. For $i > 1$, there are two $i$th-round dividers, and we refer to the divider applied to the top (resp., bottom) $m/2^{i-1}$ registers as the top (resp., bottom) $i$th-round divider. By convention, the only divider in the first round of a partitioner is treated as both a top and a bottom divider in that round.
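The round structure just described can be listed mechanically. In this sketch (function name hypothetical), `threshold` stands for $\lambda'\,\mathrm{cap}(X)$, each group is reported as (first register, size) with registers numbered from 0 at the top, and $m$ is assumed to be a power of two so that every halving is exact.

```python
def partitioner_rounds(m, threshold):
    """Divider groups, as (first register, size), for each round (sketch).

    Round 1 covers all m registers; each later round covers the top and the
    bottom m / 2**(i-1) registers and leaves the middle alone. Rounds stop
    once a group of at most `threshold` registers has been handled.
    """
    rounds = [[(0, m)]]
    size = m
    while size > threshold:
        size //= 2
        rounds.append([(0, size), (m - size, size)])
    return rounds

for i, groups in enumerate(partitioner_rounds(64, 8), start=1):
    print(i, groups)
```

With $m = 64$ and $\lambda'\,\mathrm{cap}(X) = 8$ (for instance $\lambda' = 1/8$ and $\mathrm{cap}(X) = 64$), this yields four rounds, matching $1 + \log_2(1/\lambda') = 4$.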
When $X$ is partially full, our construction of a partitioner differs slightly from that of a separator in [16]: (i) the number of rounds of dividers used in our partitioner depends on $\mathrm{cap}(X)$, and may be less than $1 + \log(1/\lambda')$; (ii) we do not use the so-called ‘‘virtual elements’’ that were used in Paterson’s separator construction to fill $X$ to its capacity so that the number of rounds of halvers in the separator is always $1 + \log(1/\lambda')$.
In place of Paterson’s use of ‘‘virtual elements,’’ we prove
the following simple fact that will be useful to analyze the
behavior of partially full AKS tree nodes.
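Claim 2.2. Each of the dividers in a partitioner associated with an AKS tree node $X$ has at least $\min\{|X|, (\lambda'/2)\,\mathrm{cap}(X)\}$ inputs.
Proof. If $|X| \le \lambda'\,\mathrm{cap}(X)$, then the partitioner consists of exactly one divider with $|X|$ inputs. If $|X| > \lambda'\,\mathrm{cap}(X)$, then each next-to-last-round divider has more than $\lambda'\,\mathrm{cap}(X)$ inputs, and each round halves the group size, so the number of inputs to either of the last-round dividers is at least $(\lambda'/2)\,\mathrm{cap}(X)$. ∎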
Let $\mathcal{F}_L$ (resp., $\mathcal{F}_R$) consist of all registers output to the top (resp., bottom) half of the top (resp., bottom) divider in the last round of the partitioner associated with $X$. By the partitioner construction, we immediately have $|\mathcal{F}_L| \le \min\{(\lambda'/2)\,\mathrm{cap}(X), |X|/2\}$. Hence, by Eqs. (4), (6), and (7),
$$|\mathcal{F}_L| \le |F_L|. \tag{8}$$
Let $L$ (resp., $R$) denote the set of registers output to the top (resp., bottom) half of the partitioner associated with $X$. We now partition the items at $X$ into $F_L$, $C_L$, $C_R$, and $F_R$ as follows: (i) from $L$ (resp., $R$), move all of the items in $\mathcal{F}_L$ (resp., $\mathcal{F}_R$) into $F_L$ (resp., $F_R$); (ii) move an arbitrary subset of the items remaining in $L$ (resp., $R$) to $F_L$ (resp., $F_R$) so that $F_L$ (resp., $F_R$) satisfies Eq. (4); (iii) let the items remaining in $L$ (resp., $R$) form $C_L$ (resp., $C_R$).
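Steps (i) to (iii) amount to a simple list manipulation, sketched below with hypothetical names. The target sizes come from Eq. (4), which is not reproduced in this section, so they are passed in as parameters. $L$ and $R$ are listed from top to bottom, and for step (ii), which allows an arbitrary subset, the sketch takes the registers nearest the extreme end of each half purely for definiteness.

```python
def split_node(L, R, script_FL, script_FR, target_FL, target_FR):
    """Split a node's items into (F_L, C_L, C_R, F_R) (sketch).

    L, R: items output to the top / bottom half of the partitioner, top to
    bottom. script_FL, script_FR: the partitioner's extreme sets (subsets of
    L and R). target_FL, target_FR: sizes required by Eq. (4); Inequality (8)
    is what guarantees len(script_FL) <= target_FL.
    """
    assert len(script_FL) <= target_FL <= len(L)
    assert len(script_FR) <= target_FR <= len(R)

    def one_side(half, extreme, target, from_end):
        chosen = set(extreme)
        rest = [x for x in half if x not in chosen]      # step (i) removes these
        k = target - len(extreme)
        if from_end:   # step (ii): any subset works; take the bottom-most ones
            extra, remaining = rest[len(rest) - k:], rest[:len(rest) - k]
        else:          # step (ii): take the top-most ones
            extra, remaining = rest[:k], rest[k:]
        return list(extreme) + extra, remaining          # step (iii): remainder is C

    F_L, C_L = one_side(L, script_FL, target_FL, from_end=False)
    F_R, C_R = one_side(R, script_FR, target_FR, from_end=True)
    return F_L, C_L, C_R, F_R

print(split_node([1, 2, 3, 4], [5, 6, 7, 8], [1], [8], 2, 2))
```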
Although our partitioner (resp., divider) construction is (almost) the same as the separator (resp., halver) construction of [16], a partitioner (resp., divider) is conceptually different from a separator (resp., halver): a separator (resp., halver) is defined by its input–output behavior, whereas a partitioner (resp., divider) is explicitly constructed from bipartite expanders. Of course, a fault-free partitioner (resp., divider) is one type of separator (resp., halver).
In the preceding description of a partitioner, we have treated all of the constituent dividers as having an even number of inputs. This may be a problem if the number of inputs to the partitioner is not divisible by 16. We will show how to get around this technical problem in the next part of Section 2.2.1.

⁴ In the AKS tree, we imagine that small (resp., large) items should be sent to the left (resp., right). In a circuit (such as a divider or partitioner), we imagine that small (resp., large) items should be sent to the top (resp., bottom). Thus, top (resp., bottom) in a circuit corresponds to left (resp., right) in the AKS tree.
Integer rounding. Thus far, in the description of register
movement, we have specified the ‘‘ideal’’ sizes of various sets
of registers as real numbers. In reality, we need to choose
such sizes as integers. This can be accomplished by the sim-
ple technique introduced in [16] as follows.
We first let each ideally empty AKS tree node be actually empty. Next, we consider an arbitrary nonempty AKS tree node $X$. Let $T_X$ be the subtree rooted at $X$. If the ideal number of items contained in $T_X$ is $\gamma$, then we let the actual number of items in $T_X$ be $16\lceil \gamma/16 \rceil$. Let $\gamma_0$ denote the ideal number of items contained in each subtree rooted at a grandchild of $X$, and let $x$ and $Z(x)$ denote the ideal and actual numbers of items contained in $X$, respectively. Then, we have
$$Z(x) = 16\left\lceil \frac{\gamma}{16} \right\rceil - 64\left\lceil \frac{\gamma_0}{16} \right\rceil. \tag{9}$$
Given that $x = \gamma - 4\gamma_0$, we thus have
$$-64 \le Z(x) - x \le 16. \tag{10}$$
As we will see in Section 2.2.2, $\mathrm{cap}(X) \ge \sqrt{n}$ will hold in our applications of the AKS circuit. Thus, by Claim 2.1, Eq. (10) implies $Z(x) \ge 0$ if $X$ is not a leaf. In addition, it is easy to check that $Z(x) \ge 0$ when $X$ is a leaf. Therefore, we conclude that we can indeed maintain the relationship of Eq. (9) without letting $X$ contain a negative number of registers.
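The two properties of the rounding used below, that every actual count is a multiple of 16 and that Inequality (10) holds, can be checked on random rational inputs. This sketch reads the rounding brackets in Eq. (9) as ceilings, the reading under which (10) takes the form $-64 \le Z(x) - x \le 16$; the helper names are hypothetical.

```python
import math
import random
from fractions import Fraction

def round_up16(x):
    """16 * ceil(x / 16) for a rational x."""
    return 16 * math.ceil(x / 16)

def Z(gamma, gamma0):
    """Actual number of items at X, per Eq. (9) read with ceilings."""
    return round_up16(gamma) - 4 * round_up16(gamma0)

rng = random.Random(7)
for _ in range(10_000):
    gamma0 = Fraction(rng.randrange(10 ** 6), rng.randrange(1, 1000))
    x = Fraction(rng.randrange(10 ** 6), rng.randrange(1, 1000))  # ideal count at X
    gamma = x + 4 * gamma0          # since x = gamma - 4 * gamma0
    z = Z(gamma, gamma0)
    assert z % 16 == 0              # every actual count is a multiple of 16
    assert -64 < z - x < 16         # Inequality (10), in fact strictly
print("rounding bounds verified")
```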
We point out that the coefficients 16 and 64 appearing
in the preceding paragraph are chosen differently than in
[16]. In particular, we have defined Z(x) to be a multiple of
16. This ensures that each of the dividers in our partitioner
construction has an even number of inputs, as desired. This
should be compared with Paterson’s use of the so-called
‘‘virtual elements’’ in his separator construction when the
number of inputs to the separator is not divisible by 16 [16, pp. 83–84].
Finally, we remark that the capacities of the AKS tree nodes are untouched by our integer rounding scheme, i.e., they remain fractional. Thus, an AKS tree node $X$ may be slightly overloaded, up to an additive constant, due to the effect of integer rounding. Similarly, the third and fourth properties of Claim 2.1 hold only up to an additive constant. Most importantly, however, Inequality (8) (which will be important in our analysis of the AKS circuit) still holds after the integer rounding. This can be easily checked by using the following facts: (i) $\lambda - \lambda'$ is a positive constant, and (ii) $\mathrm{cap}(X) \ge \sqrt{n}$ for all AKS tree nodes $X$ in our applications of the AKS circuit (see Section 2.2.2).
2.2.2. The l-AKS Circuit
If we were only interested in passive-fault-tolerant sorting circuits, the (slightly) modified AKS circuit described in Section 2.2.1 would suffice. However, to construct reversal-fault-tolerant circuits and networks, we need to further modify the AKS circuit into an $l$-AKS circuit by using odd–even transposition circuits, as described below.
A $2l$-input odd–even transposition circuit consists of comparators between registers $2i-1$ and $2i$ ($1 \le i \le l$) at odd levels, and between registers $2i$ and $2i+1$ ($1 \le i < l$) at even levels, where the registers in the circuit are labeled from top to bottom with $1, 2, \ldots, 2l$. It is well known that such a circuit with $2l$ levels sorts all $2l$-input permutations in the fault-free case (see, for example, [9]). In this paper, we are interested in the fault-tolerance properties of various circuits and will use odd–even transposition circuits with more than $2l$ levels.
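The comparator pattern just described can be written out directly. This sketch (function names hypothetical) numbers the registers $1, \ldots, 2l$ as in the text, builds the levels, and uses the 0–1 principle to confirm, by exhaustive search over 0–1 inputs for a small $l$, that $2l$ fault-free levels sort.

```python
from itertools import product

def oet_levels(l, depth):
    """Comparator levels of a 2l-input odd-even transposition circuit.

    Registers are numbered 1..2l from top to bottom. Odd levels compare
    (2i-1, 2i) for 1 <= i <= l; even levels compare (2i, 2i+1) for 1 <= i < l.
    """
    levels = []
    for t in range(1, depth + 1):
        if t % 2 == 1:
            levels.append([(2 * i - 1, 2 * i) for i in range(1, l + 1)])
        else:
            levels.append([(2 * i, 2 * i + 1) for i in range(1, l)])
    return levels

def run(levels, values):
    """Apply fault-free comparators; the smaller value moves to the top."""
    regs = [None] + list(values)  # 1-indexed registers
    for level in levels:
        for a, b in level:
            if regs[a] > regs[b]:
                regs[a], regs[b] = regs[b], regs[a]
    return regs[1:]

# 0-1 principle check: 2l levels sort every 0-1 input (hence every input)
l = 4
levels = oet_levels(l, 2 * l)
assert all(run(levels, s) == sorted(s) for s in product([0, 1], repeat=2 * l))
```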
For any given integer $l > 0$, we use the following general technique to modify any family of circuits $F$ into another family of circuits $F'$ with parameter $l$. In general, an $m$-input circuit $C'$ in $F'$ is constructed from an $(m/l)$-input circuit $C$ in $F$ as follows. For each $i \le m/l$, replace the $i$th register in $C$, $r_i$, by a group of $l$ registers, $r_{i1}, \ldots, r_{il}$. This group will be referred to as the block corresponding to register $r_i$. Replace each comparator in $C$ that connects registers $r_i$ and $r_j$ by a $2l$-input, $4l$-depth odd–even transposition circuit that connects $r_{i1}, \ldots, r_{il}$ and $r_{j1}, \ldots, r_{jl}$. (The reason for doing this can be found in Lemma 2.1 and in the proofs of Lemma 2.2 and Theorem 2.1.) Such a circuit $C'$ in $F'$ will be referred to as the circuit constructed from $C$ in $F$ by applying $2l$-input and $4l$-depth odd–even transposition circuits. In particular, an $m$-input $l$-AKS circuit is constructed from an $(m/l)$-input modified AKS circuit described earlier by applying $2l$-input and $4l$-depth odd–even transposition circuits. For example, a 1-AKS circuit is the modified AKS circuit described earlier with each of the comparators replicated four times. This technique is essential for obtaining reversal-fault-tolerance and will be applied again to construct other circuits, such as those in Lemma 2.2.
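The register-to-block replacement can be sketched as a transformation on comparator lists. In this sketch (hypothetical names, registers numbered from 0), each comparator $(i, j)$ of $C$ with $r_i$ above $r_j$ becomes an odd–even transposition circuit on block $i$ stacked above block $j$, so every level of $C$ expands into $4l$ levels of $C'$.

```python
def blockify(levels_of_C, l):
    """Build the levels of C' from those of C (sketch, 0-indexed registers).

    levels_of_C: list of levels, each a list of comparators (i, j) with
    register i above register j. Register i of C becomes the block
    l*i, ..., l*i + l - 1 of C'. Each comparator becomes a 2l-input, 4l-depth
    odd-even transposition circuit on block i followed by block j.
    """
    new_levels = []
    for level in levels_of_C:
        expanded = [[] for _ in range(4 * l)]
        for i, j in level:
            regs = [l * i + k for k in range(l)] + [l * j + k for k in range(l)]
            for t in range(4 * l):
                # level t+1 odd: pairs (1,2),(3,4),...; even: (2,3),(4,5),...
                for p in range(t % 2, 2 * l - 1, 2):
                    expanded[t].append((regs[p], regs[p + 1]))
        new_levels.extend(expanded)
    return new_levels

def run(levels, values):
    """Apply fault-free comparators; the smaller value moves to the top."""
    out = list(values)
    for level in levels:
        for a, b in level:
            if out[a] > out[b]:
                out[a], out[b] = out[b], out[a]
    return out

# C: a single comparator on two registers; C' with l = 3 has 6 registers
C_prime = blockify([[(0, 1)]], 3)
print(len(C_prime), sum(len(level) for level in C_prime))  # 12 30
print(run(C_prime, [6, 5, 4, 3, 2, 1]))                     # [1, 2, 3, 4, 5, 6]
```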
Assume that an $l$-AKS circuit $C'$ is constructed from a modified AKS circuit $C$ by applying $2l$-input and $4l$-depth odd–even transposition circuits. In the AKS tree for $C$, each node contains a set of registers $R$. The $l$-AKS tree for the $l$-AKS circuit $C'$ is constructed from the AKS tree for $C$ with each register $r \in R$ replaced by the block of registers corresponding to $r$ in the construction of $C'$ from $C$. The capacity of an $l$-AKS tree node $X$ for $C'$ is defined to be the maximum number of registers (not the maximum number of blocks) allowed to be contained in $X$, which is equal to $l$ times the capacity of the corresponding AKS tree node for $C$.
In all of our applications of the l-AKS circuit in subse-
quent sections, we do not run the l-AKS circuit all the way
to completion. Instead, we stop an $m$-input $l$-AKS circuit immediately before the first stage where the capacity of the root of the $l$-AKS tree is strictly less than $\sqrt{m}$. This guarantees that the capacity of any node in the $l$-AKS tree is at least $\sqrt{m}$. For ease of reference, we call such a circuit a partial $l$-AKS circuit. Moreover, we only need partial $l$-AKS circuits with $l \le m^{1/8}$ (see Theorem 2.1). This means that the corresponding capacity of each AKS tree node (as opposed to an $l$-AKS tree node) is at least $m^{3/8}$, and we do not have to worry about the case where the capacity of a certain node is too small.
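The $m^{3/8}$ figure follows from the stopping rule together with the relation, noted above, that an $l$-AKS capacity is $l$ times the corresponding AKS capacity:

$$\mathrm{cap}_{\mathrm{AKS}}(X) \;=\; \frac{\mathrm{cap}_{l\text{-AKS}}(X)}{l} \;\ge\; \frac{\sqrt{m}}{m^{1/8}} \;=\; m^{1/2 - 1/8} \;=\; m^{3/8}.$$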
2.3. The Main Theorem
In this section, we prove the main theorem of this paper,
which implies that the l-AKS circuit has certain
fault-tolerance properties. We first choose a few more
parameters and introduce some notation that will be useful
in the statement and proof of our main theorem.
As in [16], we choose
$$\mu = \frac{1}{36}. \tag{11}$$
Instead of using the parameter $\delta$ as in [16], we use a parameter $\Delta$ such that
,=65 \ 4_A&(1&12A)+
2
. (12)
In a certain sense, our parameter $\Delta$ corresponds to the parameter $1/\delta$ in [16]. As we have said in Section 2.2.1, we will only prove that a sufficiently large $\beta$ is good for our purposes, and we will not specify a particular choice of $\beta$. As a consequence, we will not specify the choice of $\Delta$ either. For now, we merely assume that $\Delta > 1$, which can be ensured by a sufficiently large choice of the constant $\beta$ in Eq. (12).
As in [16], we assign each node of the $m$-input $l$-AKS tree a natural interval as follows: the natural interval of the root is $[1, m]$; if the natural interval of a node $X$ is $[a, b]$, then the natural intervals of the left and right children of $X$ are the left and right halves of $[a, b]$, respectively. Intuitively, when a permutation of $\{1, \ldots, m\}$ is input to the $l$-AKS circuit, the natural interval of a node represents the range of numbers that the registers in the node ‘‘should’’ contain. The following concepts of content, strangeness, and potential all depend on which permutation is input to the circuit and on which level (time) of the circuit we are interested in, but we do not include the permutation or time in the notation, since we will only focus on a fixed input permutation in the proof of Theorem 2.1 and since the time will be clear from the context whenever we use these concepts. We define $c(r)$, the content of a register $r$ at time $t$, as the input contained in $r$ at time $t$. We also define $s(r)$, the strangeness of $c(r)$ (or of $r$ at time $t$), to be the number of levels that $c(r)$ needs to move upward in the $l$-AKS tree from $c(r)$’s current node to reach the first node whose natural interval contains $c(r)$. Equivalently, we say that $c(r)$ (or $r$ at time $t$) is $s(r)$-strange.
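Given any constant $\Delta > 1$, we define the potential of a register $r$ as
$$P_\Delta(r) = \begin{cases} \Delta^{s(r)-1}, & \text{if } s(r) \ge 1, \\ 0, & \text{otherwise.} \end{cases}$$
We define the potential of any set of registers $R$ as
$$P_\Delta(R) = \sum_{r \in R} P_\Delta(r).$$
In particular, the potential of a node $X$ in the $l$-AKS tree is
$$P_\Delta(X) = \sum_{r \in X} P_\Delta(r).$$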
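The definitions of natural interval, strangeness, and potential translate directly into code. In this sketch (hypothetical names), a node is identified by its depth below the root and its position from the left, and $m$ is assumed to be a power of two so that all natural intervals have integer endpoints.

```python
def natural_interval(m, level, index):
    """Natural interval of the node at depth `level` (root = 0), position
    `index` from the left; m is assumed to be a power of two."""
    width = m // 2 ** level
    return index * width + 1, (index + 1) * width

def strangeness(m, value, level, index):
    """Number of levels `value` must move up from node (level, index) to
    reach the first node whose natural interval contains it."""
    s = 0
    while True:
        lo, hi = natural_interval(m, level, index)
        if lo <= value <= hi:
            return s
        level, index, s = level - 1, index // 2, s + 1

def potential(m, registers, Delta):
    """P_Delta of a set of registers, each given as (content, level, index)."""
    total = 0
    for value, level, index in registers:
        s = strangeness(m, value, level, index)
        if s >= 1:
            total += Delta ** (s - 1)
    return total

# m = 16: node (2, 0) has natural interval [1, 4]
regs = [(3, 2, 0), (6, 2, 0), (12, 2, 0)]
print([strangeness(16, v, lv, ix) for v, lv, ix in regs])  # [0, 1, 2]
print(potential(16, regs, Delta=2))                        # 0 + 1 + 2 = 3
```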
As far as we know, potential functions have not been
used in any previous analysis of the AKS circuit. A similar
potential function was used in [14] to prove certain
fault-tolerance properties of the multibutterfly circuit for
routing. Unfortunately, our use of the potential function
here is much more complex than that in [14]. The next
theorem provides an upper bound on the number of strange
items and as such is analogous to Inequality (2) in [16].
Recall that, as discussed in Section 2.2, the capacity of an
l-AKS tree node is the number of registers (not the number
of blocks) in the node.
Theorem 2.1. Under both the passive and the reversal fault models, for any $l \le m^{1/8}$, if $\beta$ is a sufficiently large constant and $\rho > 0$ is less than a sufficiently small constant, then a randomly faulty $m$-input partial $l$-AKS circuit satisfies the following inequality with probability at least $1 - \rho^{\Theta(l \log m)}$: for all input permutations and all nodes $X$ in the $l$-AKS tree,
$$P_\Delta(X) \le \mu\,\mathrm{cap}(X). \tag{13}$$
In the theorem, $\rho$ is assumed to be less than a sufficiently small constant, say, $\rho_0$. The constant $\rho_0$ and the constant behind the $\Theta$-notation in the theorem both depend on $\beta$ and $\Delta$. Most importantly, however, $\rho$ is not necessarily a constant even though it is upper bounded by the constant $\rho_0$, and the constant behind the $\Theta$-notation of the theorem is independent of $\rho$. It should be mentioned that $\rho$ will be quite small in all of our applications of the theorem and its corollary (see Corollary 2.1).
To prove the theorem, we first need to prove a few lemmas. We define a circuit $N$ (possibly containing faulty comparators) to be a $\xi$-approximate-sorting circuit if, on all possible input permutations, $N$ outputs every item to within $\xi$ positions of its correct position.
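For a single output the definition is a one-line check. The parameter `xi` below stands for the approximation parameter of the definition. Since the definition quantifies over all input permutations, a circuit qualifies only if this check passes on the output for every input.

```python
def is_approx_sorted(output, xi):
    """True if each item of a permutation of 1..n ends within xi positions of
    its correct position (1-indexed position p should hold the value p)."""
    return all(abs(pos - value) <= xi for pos, value in enumerate(output, start=1))

print(is_approx_sorted([2, 1, 3], 1))  # True
print(is_approx_sorted([3, 1, 2], 1))  # False: item 3 is two positions away
print(is_approx_sorted([3, 1, 2], 2))  # True
```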
Lemma 2.1. Under both the passive and the reversal fault models, for any constant $\varepsilon > 0$, when $\rho$ is less than a sufficiently small constant (depending on $\varepsilon$), a randomly faulty $2l$-input, $4l$-depth odd–even transposition circuit is an $\varepsilon l$-approximate-sorting circuit with probability at least $1 - \rho^{\Theta(l)}$.
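The setting of Lemma 2.1 is easy to simulate, which can help build intuition for the statement (it is not a substitute for the proof). In this sketch (hypothetical names), each comparator is independently faulty with probability `rho`. A passive fault leaves its two inputs unchanged, and a reversal fault outputs them in the wrong order, with the larger item on top.

```python
import random

def faulty_oet(values, l, rho, rng, fault="reversal"):
    """Run a randomly faulty 2l-input, 4l-depth odd-even transposition circuit."""
    regs = list(values)
    n = 2 * l
    for t in range(4 * l):
        for a in range(t % 2, n - 1, 2):  # 0-indexed comparator on (a, a+1)
            lo, hi = min(regs[a], regs[a + 1]), max(regs[a], regs[a + 1])
            if rng.random() < rho:
                if fault == "reversal":
                    regs[a], regs[a + 1] = hi, lo
                # passive fault: leave the pair as it is
            else:
                regs[a], regs[a + 1] = lo, hi
    return regs

def max_displacement(output):
    """Largest distance of an item of a permutation of 0..n-1 from its place."""
    return max(abs(pos - v) for pos, v in enumerate(output))

rng = random.Random(1)
l = 50
perm = rng.sample(range(2 * l), 2 * l)
print(max_displacement(faulty_oet(perm, l, 0.0, rng)))   # 0: fault-free, 4l levels sort
print(max_displacement(faulty_oet(perm, l, 0.01, rng)))  # compare with l
```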
Proof. We present the proof only for reversal faults; the same proof is also valid for passive faults. In fact, for passive faults, our proof technique can be used to prove that the odd–even transposition circuit is a $(\rho, \rho^{\Theta(l)})$-passive-fault-tolerant sorting circuit, rather than merely a $(\rho, \rho^{\Theta(l)})$-passive-fault-tolerant $\varepsilon l$-approximate-sorting circuit. Throughout the proof, we assume that $\rho$ is less than a sufficiently small constant depending on $\varepsilon$. We do not know whether a similar result can be proved for $\rho$ near $1/2$.
Let C be the oddeven transposition circuit described in
Lemma 2.1, and let C* be a randomly generated faulty ver-
sion of C. By the 01 principle, we only need to show that
Pr(_ 01 sequence s such that C* does not l-approximate-
sort s)\3(l ).
Notice that the total number of possible 01 input sequen-
ces to C is at most 22l. Hence, when \ is less than a suf-
ficiently small constant, to prove the above inequality, we
only need to prove that for any fixed 01 input sequence s,
Pr(C* does not l-approximate-sort s)\3(l ). (14)
In what follows, we prove Inequality (14) for a fixed $s$. Assuming that $C^*$ does not $\varepsilon l$-approximate-sort $s$, we prove that the behavior of the comparators in $C^*$ satisfies a certain condition that holds only with probability at most $\rho^{\Theta(l)}$. Without loss of generality, we assume that on input sequence $s$, $C^*$ outputs a 0 at least $\varepsilon l$ away from its correct position. (This assumption affects the probability bound in Inequality (14) by at most a factor of 2.)
Let $k$ be the number of 0s in sequence $s$. We label the $2l$ registers in $C$ as $r_1, \ldots, r_{2l}$, from top to bottom. We focus on the positions of the 0s at each level of $C^*$ as the 0s and 1s move forward through $C^*$. As the 0s move forward through $C^*$, they gradually move upward through the 1s. In particular, a 0 in $r_i$ that is correctly compared to a 1 in $r_{i-1}$ at level $t$ will move to $r_{i-1}$. Intuitively, if most of the comparators involved work correctly, 0s will move upward as they move forward. The problem in analyzing the movement of the 0s, however, is that they can block each other’s upward movement. In particular, if one 0 moves the wrong way, it can cause a ripple effect much like a multicar collision on a highway. In the process of randomly generating $C^*$ from $C$, each of the comparators in $C$ can be faulty with probability up to $\rho$. Hence, there are likely to be many such collisions. In addition to slowing things down, such collisions also introduce dependence issues in the analysis of the probabilistic movement of the 0s.
In order to get around these difficulties, we model the moves made by the 0s with a $k \times 4l$ matrix $A = (a_{i,j})$ of random biased coins. In particular, $a_{i,j} = H$ with probability at least $1 - \rho$ and $a_{i,j} = T$ with probability at most $\rho$. The coin $a_{i,j}$ will be used to determine whether or not the comparator entered by the $i$th 0 at level $j$ is faulty. (We number the 0s, starting at 1, from top to bottom at the outset, and we never alter the relative order of the 0s in $C^*$.) Note that if two 0s enter the same comparator, the two associated coin flips could conflict in determining whether or not the comparator is faulty. However, we can assume that comparisons between two 0s are resolved according to the initial ordering of the 0s; we do not need to refer to the coin flips in such a case. Note that matrix $A$ completely determines the behavior of $C^*$ on the fixed 0–1 sequence $s$.
If at level $t$ the $i$th 0 is compared to a 1 above, then the 0 moves upward one position if and only if $a_{i,t} = H$. If at level $t$ the $i$th 0 is compared to a 1 below, then the 0 moves downward if and only if $a_{i,t} = T$. If at level $t$ the $i$th 0 is compared to a 0 above (i.e., if it is blocked from above by the $(i-1)$th 0), then the $i$th 0 stays in the same register, and we change the value of $a_{i,t}$ to $Z$ without checking whether $a_{i,t} = H$. If at level $t$ the $i$th 0 is compared to a 0 below (i.e., if it is blocked from below by the $(i+1)$th 0), then the $i$th 0 stays in the same register, and we change the value of $a_{i,t}$ to $Z'$ without checking whether $a_{i,t} = H$.
After these modifications, the matrix contains some $Z$s and $Z'$s. Call the new matrix $A^* = (a^*_{i,j})$. Note that $A^*$ completely determines the functionality of $C^*$ on the fixed $s$, and vice versa. This fact makes it possible for us to prove Inequality (14) by analyzing $A^*$.
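The coin model can be simulated directly, which makes the construction of $A^*$ concrete. The sketch below (hypothetical names, 0-indexed rows and levels) follows the four rules above for reversal faults: a 0 below a 1 moves up if and only if its coin is $H$, a 0 above a 1 moves down if and only if its coin is $T$, and when two 0s meet, their coins are overwritten with $Z$ (lower 0) and $Z'$ (upper 0) without being consulted.

```python
def run_with_coins(s, coins):
    """Simulate the coin model on a fixed 0-1 input s (sketch, reversal faults).

    s: list of 0s and 1s, top to bottom, of even length 2l.
    coins: one row per 0 (numbered from the top), each with 4l entries H or T;
    coins[i][t] governs the comparator entered by 0 number i+1 at level t+1.
    Returns the output and A*, in which coins skipped because two 0s met are
    overwritten with Z (blocked from above) or Z' (blocked from below).
    """
    regs = list(s)
    n = len(regs)
    a_star = [list(row) for row in coins]
    for t in range(2 * n):  # 4l levels
        # 0s never pass each other, so numbering them top to bottom is stable
        zero_id, count = {}, 0
        for r in range(n):
            if regs[r] == 0:
                zero_id[r] = count
                count += 1
        for a in range(t % 2, n - 1, 2):  # comparator on registers a, a+1
            b = a + 1
            if regs[a] == 0 and regs[b] == 0:
                a_star[zero_id[b]][t] = "Z"    # lower 0 blocked from above
                a_star[zero_id[a]][t] = "Z'"   # upper 0 blocked from below
            elif regs[a] == 1 and regs[b] == 0:
                if coins[zero_id[b]][t] == "H":  # 0 below a 1 moves up iff heads
                    regs[a], regs[b] = 0, 1
            elif regs[a] == 0 and regs[b] == 1:
                if coins[zero_id[a]][t] == "T":  # 0 above a 1 moves down iff tails
                    regs[a], regs[b] = 1, 0
    return regs, a_star

# all heads means no comparator misbehaves, so the 0-1 input gets sorted
out, A_star = run_with_coins([1, 1, 1, 0, 0, 0], [["H"] * 12 for _ in range(3)])
print(out)  # [0, 0, 0, 1, 1, 1]
```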
Define $t_k$ to be the last level at which the $k$th 0 was blocked by the $(k-1)$th 0. In other words, $t_k$ is the maximum integer such that $a^*_{k,t_k} = Z$. Next, define $t_{k-1}$ to be the last level at which the $(k-1)$th 0 was blocked by the $(k-2)$th 0 strictly before level $t_k$. In other words, $t_{k-1}$ is the largest integer such that $t_{k-1} < t_k$ and $a^*_{k-1,t_{k-1}} = Z$. Proceeding in a similar fashion, for $j = k-2, k-3, \ldots, 2$, define $t_j$ to be the largest integer such that $t_j < t_{j+1}$ and $a^*_{j,t_j} = Z$. (It may be that $a^*_{j,t} \ne Z$ for all $t < t_{j+1}$, in which case we set $t_j = t_{j-1} = \cdots = t_1 = 0$.) If $t_2 = 0$, let $t_1 = 0$; if $t_2 > 0$, let $t_1$ be the largest integer such that $t_1 < t_2$ and the first 0 is located at $r_1$ immediately before level $t_1$ (if the first 0 never reaches $r_1$ strictly before level $t_2$, then set $t_1 = 0$).
Let $S$ denote the string of coins
$$a_{1,t_1+1}\,a_{1,t_1+2} \cdots a_{1,t_2-1}\;a_{2,t_2+1}\,a_{2,t_2+2} \cdots a_{2,t_3-1}\;\cdots\;a_{k,t_k+1}\,a_{k,t_k+2} \cdots a_{k,4l}.$$
Let $n_H$ denote the number of heads in $S$ and $n_T$ the number of tails in $S$. It is easy to see that $S$ contains $4l - k - t_1 + 1$ coins, which implies that
$$n_T + n_H = 4l - t_1 - k + 1. \tag{15}$$
Roughly speaking, the number of upward moves of the $k$th 0 is given by $n_H - n_T$. However, this estimate is not accurate because of boundary effects (i.e., those caused by the 0s piling up against $r_1$). To be more precise, we analyze the movement of the 0s by considering two cases.
Case 1: $t_j > 0$ for all $j$ such that $1 \le j \le k$. In this case, the first 0 is located at $r_1$ immediately before level $t_1$. On the other hand, the total number of downward moves corresponding to $S$ is at most $n_T$. Hence, at the end of $C^*$, the $k$th 0 is at most $n_T$ positions away from $r_k$. By our assumption, some 0 is output at least $\varepsilon l$ away from its correct position; since the 0s never change their relative order, the $k$th 0 is then also output at least $\varepsilon l$ away from its correct position $r_k$. Hence,
$$n_T \ge \varepsilon l.$$
Case 2: $t_j = 0$ for some $j$ such that $1 \le j \le k$. By definition,
$$t_1 = 0. \tag{16}$$
In this case, when analyzing the upward moves of 0s corresponding to $S$, there is no boundary effect to consider. Therefore, the number of upward moves is given by $n_H - n_T$. Since the $k$th 0 can initially be at most $2l - k$ positions away from $r_k$, and since the $k$th 0 is output at least $\varepsilon l$ away from its correct position $r_k$, we conclude that
$$2l - k - n_H + n_T \ge \varepsilon l.$$
Adding this inequality to Eq. (15) and using Eq. (16), we find that
$$n_T \ge \frac{2+\varepsilon}{2}\,l \ge \varepsilon l,$$
where we have assumed that $\varepsilon \le 2$, since there is nothing to prove for $\varepsilon > 2$.
In both Case 1 and Case 2, we have proven that
$$n_T \ge \varepsilon l. \tag{17}$$
We next show that for a random matrix $A$, the probability that $A^*$ contains a sequence $S$ such that Inequality (17) holds is at most $\rho^{\Theta(l)}$.
Let us define $a_{i,j}$ to be next to $a_{u,v}$ in $A$ if and only if (i) $i$ is equal to $u$ or $u+1$, and (ii) $j$ is equal to $v$ or $v+1$. According to the construction of $S$, the second element of $S$ is next to the first element of $S$ in $A$, the third element of $S$ is next to the second element of $S$ in $A$, and so on. Hence, when the location of the $i$th element of $S$ in $A$ is given, there are at most three ways for the $(i+1)$th element of $S$ to be located in $A$. In addition, the number of ways that the first element of $S$ can be located in $A$ is upper bounded by $k \cdot 4l \le 2l \cdot 4l \le 8l^2$. On the other hand, $|S| \le 4l$. Hence, the number of ways of choosing the location of $S$ in $A$ is at most
$$8l^2 \cdot 3^{4l}. \tag{18}$$
By a standard Chernoff bound argument [3], when the location of $S$ in $A$ is given, the probability that Inequality (17) holds is at most
$$\rho^{\Theta(l)} \tag{19}$$
for $\rho$ less than a sufficiently small constant (depending on $\varepsilon$). Multiplying the bounds of Inequalities (18) and (19), and taking $\rho$ less than a sufficiently small constant, we find that Inequality (17) holds with probability at most $\rho^{\Theta(l)}$. This completes the proof of Inequality (14), as well as the proof of Lemma 2.1. ∎
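One crude way to see that the product of (18) and (19) is $\rho^{\Theta(l)}$, using a union bound in place of the Chernoff bound cited in the text: for a fixed location, the coins of $S$ are independent and each shows $T$ with probability at most $\rho$, so at least $\varepsilon l$ tails among at most $4l$ coins occur with probability at most $2^{4l}\rho^{\varepsilon l}$. Hence

$$\Pr\bigl[\text{(17) holds}\bigr] \;\le\; 8l^2 \cdot 3^{4l} \cdot 2^{4l}\rho^{\varepsilon l} \;=\; 8l^2\bigl(1296\,\rho^{\varepsilon}\bigr)^{l},$$

which is $\rho^{\Theta(l)}$ once $\rho$ is below a suitable constant depending on $\varepsilon$.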
In the next lemma, the circuit $N$ is the parallel union of $s$ disjoint circuits, $N_1, \ldots, N_s$. Each $N_i$ is constructed from a $\beta$-divider by replacing each register with a block of $l$ registers, and each comparator with a $2l$-input, $4l$-depth odd–even transposition circuit. By definition, each of the $\beta$-dividers is constructed from a $d$-regular bipartite expander, and thus has depth $d$, which is a constant depending on $\beta$. Hence, the depth of each $N_i$ is $4dl$, as is the depth of $N$. A block or a register will be called a bottom (top) block or register in $N$ if it is in the bottom (top) half of some $N_i$. For a set of bottom (top) registers $R$, we use $N(R)$ to denote the set of top (bottom) registers that are connected to at least one register in $R$ by some odd–even transposition circuit. In the next lemma, $n_i$ denotes the number of inputs to $N_i$, and $n$ denotes the number of inputs to $N$.
Lemma 2.2. Let $R$ be a fixed set of bottom (resp., top) registers and $b$ be the number of blocks that contain at least one register in $R$. Under both the passive and the reversal fault models, when $\beta$ is large enough, a randomly faulty version of $N$ has the following property with probability at least $1 - \rho^{\Theta(bl+|R|)}$: on all input permutations such that each $N_i$ contains at most $\frac{49}{100}n_i$ inputs with rank at most $k \le \frac{49}{100}n$ (resp., at least $k \ge \frac{51}{100}n$), if every register in $R$ contains an output with rank at most $k$ (resp., at least $k$), then at least $(\beta/4)|R|$ registers in $N(R)$ contain outputs with rank at most $k$ (resp., at least $k$).
Note that in the $1 - \rho^{\Theta(bl+|R|)}$ lower bound on the success probability claimed in the lemma, we could omit the $|R|$
term without affecting the meaning of the lemma, since $bl \ge |R|$. However, we have chosen to include the $|R|$ term for ease of future applications.
Proof of Lemma 2.2. The two claims of the lemma are symmetric, so we consider only the case where $R$ is a set of bottom registers. We make use of the 0–1 principle. Suppose that each comparator is randomly set to be faulty (with probability $\rho$ or less) ahead of time. We will show that with probability at least $1 - \rho^{\Theta(bl)}$, the resulting circuit has the following property: on all 0–1 input sequences with exactly $k$ 0s such that each $N_i$ contains at most $\frac{49}{100}n_i$ 0-inputs, if all of the registers in $R$ contain a 0-output, then at least $(\beta/4)|R|$ registers in $N(R)$ contain a 0-output.
Let
$$\theta = \frac{1}{d\beta}. \tag{20}$$
We focus on the bottom blocks containing at least one register in $R$. We say that such a block is dense if it contains strictly more than $d\theta l$ registers in $R$; we say that such a block is sparse if it contains at least one but at most $d\theta l$ registers in $R$. Note that when $d\theta l < 1$, there is no sparse block. Let $b_{i1}$ be the number of dense blocks in $N_i$, and let $b_{i2}$ be the number of sparse blocks in $N_i$. We call a top (bottom) block $B'$ a neighboring block of a bottom (top) block $B$ if there is an odd–even transposition circuit connecting $B$ and $B'$ in $N$. Note that each block has exactly $d$ neighboring blocks because each of the corresponding $\beta$-dividers has depth $d$. We call a block $B$ good if all of the (no more than $d + d^2$) odd–even transposition circuits associated with $B$ or with any of the $d$ neighboring blocks of $B$ are $\theta l$-approximate-sorting circuits. Let $b'_{i1}$ be the number of good dense blocks in $N_i$, and let $b'_{i2}$ be the number of good sparse blocks in $N_i$.
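The dense/sparse split depends only on how many registers of $R$ each block holds, compared with the threshold $d\theta l$, which by Eq. (20) equals $l/\beta$. A minimal sketch with hypothetical names:

```python
from fractions import Fraction

def classify_blocks(counts, d, beta, l):
    """Split blocks into dense and sparse ones (sketch).

    counts[k] is the number of registers of R in the k-th bottom block that
    meets R (so each entry is at least 1). theta = 1/(d*beta) as in Eq. (20),
    so the threshold d*theta*l equals l/beta.
    """
    theta = Fraction(1, d * beta)
    threshold = d * theta * l
    dense = [c for c in counts if c > threshold]
    sparse = [c for c in counts if 1 <= c <= threshold]
    return dense, sparse

print(classify_blocks([1, 2, 5, 10], d=3, beta=4, l=16))  # ([5, 10], [1, 2])
```

When $l < \beta$, the threshold is below 1 and every block touching $R$ is dense, matching the remark that no sparse block exists when $d\theta l < 1$.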
When a good block $B$ is connected to a top block $B'$ by a $\theta l$-approximate-sorting circuit $M$, the following two simple observations are straightforward to verify:
Observation 1. At most $\theta l$ 0s can be moved from $B'$ to $B$ through $M$.
Observation 2. If $B$ contains at least one 0 at the end of $M$, then $B'$ contains at least $(1-\theta)l$ 0s at the end of $M$.
The goal of our proof of Lemma 2.2 is to find a large num-
ber of 0s in N(R) in comparison with the number of 0s in R.
This goal will be achieved as follows. From Observation 1
above, each good dense block contains 0s throughout the
circuit. Therefore, we can hope to use the expansion
property of the ,-divider to find many 0s in the neighboring
blocks of a good dense block. From Observation 2, for each
good sparse block, its unique neighboring block at the end
of N contains many 0s, compared with the number of 0s
contained in the sparse block. In particular, we prove the
lemma by considering the following two cases. (Recall
that s denotes the total number of disjoint parallel sub-
circuits forming N.)
Case 1: 1isbi1(1,) 1isbi2 . By Lemma 2.1,
each dense block is good with probability at least
1&(d+d2) \3(l )=1&\3(l ), provided that \ is less than a
sufficiently small constant (specifically, we can assume \ to
be small compared with d+d2). A standard application of
the Chernoff bound [3] now implies that when \ is less than
a sufficiently small constant, the following inequality holds
with probability at least 1&\3(l 1isbi1):
:
1is
b$i1 23 :
1is
bi1 . (21)
By the assumption of Case 1, we have 1isbi1
(1(,+1)) 1is(bi1+bi2)=b(,+1). Hence, Inequality
(21) holds with probability at least 1&\3(bl)=
1&\3(bl+|R| ) (where the constant behind the 3-notation is
allowed to depend on ,, d, and %). In what follows, we need
only show that at least (,4) |R| registers in N(R) contain a
0 whenever Inequality (21) holds.
Consider any good dense block $B_{i1}$ in $N_i$. Since $B_{i1}$ contains more 0-outputs than could possibly come from its $d$ neighboring blocks through the $d$ associated $\theta l$-approximate-sorting circuits, $B_{i1}$ contains 0s throughout all the levels of $N_i$. Hence, by Observation 2, each neighboring block of $B_{i1}$ contains at least $(1-\theta)l$ 0s just after being compared with $B_{i1}$. Moreover, by Observation 1, each of these neighboring blocks of $B_{i1}$ contains at least $(1-\theta d)l$ 0s at the end of $N$, since such a block may lose at most $(d-1)\theta l$ 0s through the later $d-1$ or fewer $\theta l$-approximate-sorting circuits.
Now, assume for the purposes of contradiction that $b'_{i1} \ge n_i/(2l(\gamma+1))$ for some $i$. Then, we can choose $n_i/(2l(\gamma+1))$ good dense blocks in $N_i$. By the expansion property of the $\gamma$-divider, these blocks have at least $\gamma n_i/(2l(\gamma+1))$ neighboring blocks in the top half of $N_i$. By the discussion of the preceding paragraph, each of these neighboring blocks has at least $(1-\delta d)l$ 0-outputs at the end of $N_i$. Thus, the number of 0-outputs of $N_i$ is at least

$$\frac{\gamma n_i}{2l(\gamma+1)}\,(1-\delta d)\,l = \frac{\gamma-1}{2(\gamma+1)}\,n_i, \tag{22}$$

where we have used Eq. (20). When $\gamma$ is large enough, the quantity in Eq. (22) is strictly larger than $(49/100)n_i$, which is larger than the number of 0-inputs to $N_i$. This is a contradiction. Hence, we conclude that for all $i$,

$$b'_{i1} < \frac{n_i}{2l(\gamma+1)}. \tag{23}$$
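Here is a quick exact check of Eq. (22) and of the "$\gamma$ large enough" step. This sketch assumes that Eq. (20) amounts to $\delta d = 1/\gamma$, which is how it is used in the two displayed chains of this proof. The helper names are hypothetical. The code confirms that the left-hand side of Eq. (22) equals $(\gamma-1)n_i/(2(\gamma+1))$, and it finds the smallest integer $\gamma$ for which this exceeds $(49/100)n_i$.

```python
from fractions import Fraction

def lhs_eq22(gamma: int, d: int) -> Fraction:
    """Left side of Eq. (22) divided by n_i; the block size l cancels."""
    delta = Fraction(1, gamma * d)  # assumption: Eq. (20) gives delta * d = 1/gamma
    return Fraction(gamma, 2 * (gamma + 1)) * (1 - delta * d)

def rhs_eq22(gamma: int) -> Fraction:
    """Right side of Eq. (22) divided by n_i: (gamma - 1) / (2 (gamma + 1))."""
    return Fraction(gamma - 1, 2 * (gamma + 1))

def smallest_gamma(threshold: Fraction = Fraction(49, 100)) -> int:
    """Smallest integer gamma with (gamma - 1) / (2 (gamma + 1)) > threshold (< 1/2)."""
    gamma = 1
    while rhs_eq22(gamma) <= threshold:
        gamma += 1
    return gamma

print(smallest_gamma())  # 100
```

So any $\gamma \ge 100$ suffices for this particular step. The proof also needs $\gamma$ to be large for several other steps.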
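By Inequality (23) and the fact that each $N_i$ is constructed from a $(1/(\gamma+1), \gamma)$-expander, all of the $\sum_{1\le i\le s} b'_{i1}$ good dense blocks have at least $\gamma\sum_{1\le i\le s} b'_{i1}$ neighboring blocks of top registers in $N(R)$. By the argument used for deriving Inequality (22), we know that the number of 0s contained in these $\gamma\sum_{1\le i\le s} b'_{i1}$ top blocks of registers in $N(R)$ is at least

$$\begin{aligned}
\gamma\sum_{1\le i\le s} b'_{i1}(1-\delta d)\,l
&= (\gamma-1)\sum_{1\le i\le s} b'_{i1}\,l && \text{(by Eq. (20))}\\
&\ge \tfrac{2}{3}(\gamma-1)\sum_{1\le i\le s} b_{i1}\,l && \text{(by Inequality (21))}\\
&\ge \tfrac{1}{3}(\gamma-1)\sum_{1\le i\le s}\bigl(b_{i1}l + b_{i2}l/\gamma\bigr) && \text{(by the assumption of Case 1)}\\
&= \tfrac{1}{3}(\gamma-1)\sum_{1\le i\le s}\bigl(b_{i1}l + b_{i2}d\delta l\bigr) && \text{(by Eq. (20))}\\
&\ge \tfrac{\gamma-1}{3}\,|R| && \text{(by the definitions of } b_{i1} \text{ and } b_{i2})\\
&\ge \tfrac{\gamma}{4}\,|R| && \text{(for } \gamma \text{ sufficiently large).}
\end{aligned}$$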
Case 2: 1isbi1<(1,) 1isbi2 . By Lemma 2.1
and a standard Chernoff bound argument [3], we know
that the following inequality holds with probability at least
1&\3(1is bi2 l )=1&\3(bl+|R| ), provided that \ is suffi-
ciently small:
:
1is
b$i2 23 :
1is
bi2 . (24)
Next, we show that at least (,4) |R| registers in N(R) con-
tain a 0, provided that Inequality (24) holds.
By Observation 2, for each good sparse block, its unique
neighboring block at the end of N contains at least (1&%) l
0s. Moreover, since the dividers are constructed from
d-regular bipartite expanders, different blocks have different
neighboring blocks at the end of N. Hence, the number of
0-outputs contained in N(R) is at least
1is b$i2(1&%) l
 23 1is bi2(1&%) l
(by Inequality (24))
((1&%)3) 1is (bi2 l+bi1 ,l )
(by the assumption of Case 2)
=((1&%) ,3) 1is (bi1 l+bi2 d%l)
(by Eq. (20))
((1&%) ,3) |R|
(by the definitions of bi1 and bi2)
(,4) |R|
(for , sufficiently large). K
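Both case analyses can be sanity-checked on random inputs. The sketch below uses hypothetical helper names and again assumes $\delta d = 1/\gamma$ from Eq. (20). It does the following:

- takes per-subcircuit counts $b_{i1}, b_{i2}$;
- assumes that Inequality (21) or (24) holds with equality;
- evaluates the resulting lower bound on the number of 0s in $N(R)$;
- compares that bound with $(\gamma/4)$ times the largest $|R|$ permitted by the definitions of $b_{i1}$ and $b_{i2}$.

```python
import random
from fractions import Fraction

def zeros_lower_bound(b1, b2, gamma, d, l):
    """Lower bound on 0s in N(R), taking Ineq. (21)/(24) with equality."""
    delta = Fraction(1, gamma * d)                # assumption: delta * d = 1/gamma
    s1, s2 = sum(b1), sum(b2)
    if s1 >= Fraction(s2, gamma):                 # Case 1
        return (gamma - 1) * Fraction(2, 3) * s1 * l
    return (1 - delta) * Fraction(2, 3) * s2 * l  # Case 2

def max_R(b1, b2, gamma, d, l):
    """Largest |R| allowed: dense blocks hold <= l, sparse blocks <= d*delta*l registers of R."""
    delta = Fraction(1, gamma * d)
    return sum(b1) * l + sum(b2) * d * delta * l

def random_check(gamma=16, d=4, l=32, trials=2000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        s = rng.randint(1, 8)
        b1 = [rng.randint(0, 20) for _ in range(s)]
        b2 = [rng.randint(0, 400) for _ in range(s)]
        bound = zeros_lower_bound(b1, b2, gamma, d, l)
        if bound < Fraction(gamma, 4) * max_R(b1, b2, gamma, d, l):
            return False
    return True
```

The check passes because, for these parameter values, $(\gamma-1)/3 \ge \gamma/4$ and $(1-\delta)\gamma/3 \ge \gamma/4$.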
Lemma 2.3. Let $b_{i(1)} \le \cdots \le b_{i(p/4)}$ be a subsequence of a positive nondecreasing sequence $b_1 \le \cdots \le b_p$. Then, there exists an integer $s \ge p/8$ such that

$$\sum_{1\le j\le s} b_{i(j)} \ge \frac{1}{8}\sum_{1\le j\le i(s)} b_j.$$
Proof. Take the minimum $s$ such that

$$i(s) \le 8(s - p/8). \tag{25}$$

(Such an $s$ exists because $s = p/4$ satisfies Inequality (25).) By the minimality of $s$,

$$i(t) > 8(t - p/8) \tag{26}$$

for all $t$ such that $p/8+1 \le t \le s-1$. By the monotonicity of the sequence $b_1 \le \cdots \le b_p$ and Inequality (26),

$$b_{i(t)} \ge \frac{1}{8}\sum_{8(t-p/8-1) < j \le 8(t-p/8)} b_j \tag{27}$$

for all $t$ such that $p/8+1 \le t \le s-1$. By Inequality (25), we have $i(s) - 8(s-p/8-1) \le 8$. Hence,

$$b_{i(s)} \ge \frac{1}{8}\sum_{8(s-p/8-1) < j \le i(s)} b_j. \tag{28}$$

Adding Inequality (28) to Inequalities (27) (i.e., for all $t$), we obtain

$$\sum_{p/8 < j \le s} b_{i(j)} \ge \frac{1}{8}\sum_{1\le j\le i(s)} b_j,$$

which is actually stronger than the claimed inequality. ∎
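Lemma 2.3 is easy to test by brute force. The following sketch uses hypothetical function names, 1-based indices as in the lemma, and $p$ a multiple of 8 for simplicity. It picks $s$ exactly as in the proof, as the minimum index satisfying Inequality (25). It then checks, on random inputs, both the lemma and the stronger inequality obtained at the end of the proof.

```python
import random

def proof_s(p, idx):
    """Minimum s (1-based) with i(s) <= 8(s - p/8), Ineq. (25); idx holds i(1) < ... < i(p/4)."""
    for s in range(1, len(idx) + 1):
        if idx[s - 1] <= 8 * s - p:
            return s
    return None  # not reached: s = p/4 always satisfies (25)

def check_lemma_2_3(trials=500, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        p = 8 * rng.randint(1, 12)
        b = sorted(rng.randint(1, 50) for _ in range(p))       # positive, nondecreasing b_1..b_p
        idx = sorted(rng.sample(range(1, p + 1), p // 4))      # i(1) < ... < i(p/4)
        s = proof_s(p, idx)
        total = sum(b[: idx[s - 1]])                           # sum_{1 <= j <= i(s)} b_j
        strong = sum(b[idx[j] - 1] for j in range(p // 8, s))  # sum over p/8 < j <= s
        weak = sum(b[idx[j] - 1] for j in range(s))            # sum over 1 <= j <= s
        if not (8 * s >= p and 8 * strong >= total and 8 * weak >= total):
            return False
    return True
```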
Proof of Theorem 2.1. We focus on a particular faulty partial $l$-AKS circuit that violates Inequality (13) on a particular input permutation $\Pi$, and we prove that the faulty circuit has certain properties that can be satisfied by a randomly generated faulty partial $l$-AKS circuit with probability at most $\epsilon^{\Theta(l\log m)}$.

We choose the first stage $t$ during which Inequality (13) is not satisfied at a certain node $X$. By the minimality of $t$, we have

$$P_\sigma(Y) \le \mu\,\mathrm{cap}(Y) \tag{29}$$
for any $l$-AKS tree node $Y$ before stage $t$. Let $Y_i$ denote the number of $i$-strange items in node $Y$. Then, the potential function at node $Y$ can be written as

$$P_\sigma(Y) = \sum_{k\ge1}\ \sum_{r\in Y,\ s(r)=k}\sigma^{k-1} = \sum_{k\ge1} Y_k\,\sigma^{k-1}. \tag{30}$$

Therefore, Inequality (29) can be rewritten as

$$\sum_{i\ge1} Y_i\,\sigma^{i-1} \le \mu\,\mathrm{cap}(Y).$$

Thus,

$$\sum_{i\ge j} Y_i\,\sigma^{i-1} \le \mu\,\mathrm{cap}(Y)$$

for all $j\ge1$. Since $\sigma>1$, the previous inequality implies that

$$\frac{\sum_{i\ge j} Y_i}{\mathrm{cap}(Y)} \le \left(\frac{1}{\sigma}\right)^{j-1}\mu \tag{31}$$

for all $j\ge1$. Inequality (31) gives an upper bound on the ratio of the number of items at node $Y$ with strangeness $j$ or more to the capacity of $Y$, and it will be useful when we upper bound the number of strange items inductively. (It is analogous to Inequality (2) in [16].)
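The potential function (30) and the tail bound (31) translate directly into code. In this sketch, a node is summarized by a hypothetical dictionary that maps each strangeness $k\ge1$ to $Y_k$. Whenever $P_\sigma(Y)\le\mu\,\mathrm{cap}(Y)$, the fraction of items that are at least $j$-strange is at most $\sigma^{-(j-1)}\mu$.

```python
from fractions import Fraction

def potential(counts, sigma):
    """P_sigma(Y) = sum_{k>=1} Y_k * sigma^(k-1), Eq. (30); counts maps k -> Y_k."""
    return sum(y * Fraction(sigma) ** (k - 1) for k, y in counts.items() if k >= 1)

def tail_fraction(counts, cap, j):
    """(sum_{i>=j} Y_i) / cap(Y): fraction of capacity taken by items at least j-strange."""
    return Fraction(sum(y for k, y in counts.items() if k >= j), cap)

def tail_bound_holds(counts, cap, sigma, jmax=20):
    """Check Ineq. (31) for j = 1..jmax, with mu chosen so that Ineq. (29) is tight."""
    mu = potential(counts, sigma) / cap
    return all(tail_fraction(counts, cap, j) <= mu / Fraction(sigma) ** (j - 1)
               for j in range(1, jmax + 1))

counts = {1: 5, 2: 2, 3: 1}
print(potential(counts, 4))               # 29
print(tail_bound_holds(counts, 1000, 4))  # True
```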
On the other hand, by the assumption that $P_\sigma(X) > \mu\,\mathrm{cap}(X)$ and Eq. (30),

$$\sum_{k\ge1} X_k\,\sigma^{k-1} > \mu\,\mathrm{cap}(X),$$

where $X_k$ denotes the number of $k$-strange items in $X$. Therefore, there exists an integer $k\ge1$ such that

$$X_k > \left(\frac{1}{2}\right)^{k}\left(\frac{1}{\sigma}\right)^{k-1}\mu\,\mathrm{cap}(X), \tag{32}$$

since otherwise the left-hand side of the preceding inequality would be at most $\mu\,\mathrm{cap}(X)\sum_{k\ge1}2^{-k} = \mu\,\mathrm{cap}(X)$. We choose the minimum $k$ that satisfies Inequality (32) and analyze how these $X_k$ $k$-strange items are misplaced into $X$. By doing so, we derive some necessary properties of the faulty $l$-AKS circuit. Then, we prove that these properties can be satisfied with probability at most $\epsilon^{\Theta(l\log m)}$. We will consider two cases: $k=1$ and $k>1$. The case $k=1$ is the hard case in [16] and proceeds without much additional work once we have Lemma 2.2. Unfortunately, the case $k>1$ requires much more work than its fault-free counterpart in [16].
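The choice of $k$ via Inequality (32) is a pigeonhole argument. If every $X_k$ were at most $2^{-k}\sigma^{-(k-1)}\mu\,\mathrm{cap}(X)$, then the potential would be at most $\mu\,\mathrm{cap}(X)\sum_{k\ge1}2^{-k}=\mu\,\mathrm{cap}(X)$. The following minimal sketch implements that selection step; the names are hypothetical.

```python
from fractions import Fraction

def first_violating_k(counts, cap, mu, sigma):
    """Smallest k >= 1 with X_k > (1/2)^k (1/sigma)^(k-1) mu cap(X), Ineq. (32); None if none."""
    for k in range(1, max(counts, default=0) + 1):
        threshold = Fraction(1, 2 ** k) * Fraction(1, sigma) ** (k - 1) * mu * cap
        if counts.get(k, 0) > threshold:
            return k
    return None

def potential(counts, sigma):
    """P_sigma(X) = sum_k X_k sigma^(k-1), Eq. (30)."""
    return sum(x * sigma ** (k - 1) for k, x in counts.items())

# P_sigma(X) = 3 + 4*2 = 11 > mu*cap(X) = 10, so some k must satisfy (32); here k = 2.
print(first_violating_k({1: 3, 2: 4}, cap=100, mu=Fraction(1, 10), sigma=2))  # 2
```

Whenever the potential exceeds $\mu\,\mathrm{cap}(X)$, the function never returns `None`. This is exactly the existence claim in the text.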
Case 1: $k=1$. At the beginning of the first stage, all items are either at the root or in the cold storage, and nothing is strange. Hence, our choice of $t$ guarantees $t\ge2$. We now trace back how the $X_1$ 1-strange items at node $X$ arrived there from $P$, $C$, and $C'$, the parent and two children of $X$, respectively. It is easy to see that

$$X_1 = |\{i : i \text{ is 1-strange in } X \text{ and } i \text{ came from } C \text{ or } C'\}| + |\{i : i \text{ is 1-strange in } X \text{ and } i \text{ came from } P\}|. \tag{33}$$

Since a 1-strange item in $X$ is 2-strange in either $C$ or $C'$, we can upper bound the first term of Eq. (33) by $C_2 + C'_2$, where $C_2$ and $C'_2$ denote the number of 2-strange items in $C$ and $C'$, respectively. By Inequality (31), we have

$$C_2 \le \frac{\mu}{\sigma}\,\mathrm{cap}(C) \quad\text{and}\quad C'_2 \le \frac{\mu}{\sigma}\,\mathrm{cap}(C').$$

Hence, the first term of Eq. (33) is at most

$$\frac{\mu}{\sigma}\,\mathrm{cap}(C) + \frac{\mu}{\sigma}\,\mathrm{cap}(C') = \frac{2\mu}{\sigma}\cdot\frac{A}{\nu}\,\mathrm{cap}(X). \tag{34}$$
In what follows, we will use Paterson's argument, aided by Lemma 2.2, to upper bound the second term of Eq. (33). This is fairly complicated because we have to deal with items that become strange for the first time (in other words, some items may not be strange in $P$ but may be strange in $X$). An item can be misplaced into $X$ for one of the following two reasons: (i) the first round divider at $P$ may not divide all the items into the correct halves; (ii) $P$ may contain too many items that should go to $X$'s sibling, $X'$.
We assume that the number of items that are misplaced into $X$ due to reason (i) above is equal to

$$\varepsilon|P| \le \frac{\varepsilon}{2A\nu}\,\mathrm{cap}(X). \tag{35}$$

Here, we use $|P|$, the number of registers in $P$, instead of $\mathrm{cap}(P)$, since $P$ may not be full. We will apply Lemma 2.2 with $|P|$, instead of $\mathrm{cap}(P)$, as a parameter.
We next upper bound the number of items that are misplaced into $X$ due to reason (ii) above. Let $V$ be the set of all items of strangeness 0 with respect to $X'$ (some of these items may not be located in node $X'$). Following the terminology of Paterson, the “natural” positions for $V$ correspond to the subtree rooted at $X'$ plus one-half of $P$, one-eighth of $P$'s grandparent, and so on. (We can assume an infinite chain of ancestors for this argument due to our use of the cold storage. Note that at stage $t-1$: (i) $X'$ is empty, and (ii) by Claim 2.1, the levels of the $l$-AKS tree above $X'$ are alternately empty and full.) Ideally, if all items in
$V$ are in $V$'s “natural” positions, then $P$ cannot contain too many items that should go to $X'$. In reality, some of the items in $V$ may not be in $V$'s “natural” positions. In such a case, some of $V$'s “natural” positions are occupied by items not in $V$. In particular, it is not hard to see that the number of items that belong to $V$ but will be forced into $X$ due to the capacity constraint at $X'$ is equal to the number of items that are not in $V$ but are occupying $V$'s “natural” positions not in $P$. We next upper bound the former quantity by giving an upper bound for the latter quantity.
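Clearly, an item not in $V$ is 2 or more strange in a child of $X'$, 4 or more strange in a great-grandchild of $X'$, and so on. Since Inequality (31) holds before we find the first violation of Inequality (13) at some stage $t$, a child of $X'$ can contain at most

$$\mu\left(\frac{1}{\sigma}\right)\frac{A}{\nu}\,\mathrm{cap}(X)$$

items not in $V$, a great-grandchild of $X'$ can contain at most

$$\mu\left(\frac{1}{\sigma}\right)^{3}\frac{A^{3}}{\nu}\,\mathrm{cap}(X)$$

items not in $V$, and so on. Hence, the total number of items not in $V$ but occupying $V$'s “natural” positions (strictly) below $X'$ is upper bounded by

$$2\mu\left(\frac{1}{\sigma}\right)\frac{A}{\nu}\,\mathrm{cap}(X) + 8\mu\left(\frac{1}{\sigma}\right)^{3}\frac{A^{3}}{\nu}\,\mathrm{cap}(X) + 32\mu\left(\frac{1}{\sigma}\right)^{5}\frac{A^{5}}{\nu}\,\mathrm{cap}(X) + \cdots = \frac{2\mu A\,\mathrm{cap}(X)}{\sigma\nu\bigl(1-4(A/\sigma)^2\bigr)}. \tag{36}$$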
On the other hand, $V$'s “natural” positions strictly above $P$ may be fully occupied by items not in $V$, but the number of such positions is at most

$$\frac{\mathrm{cap}(X)}{2^3\nu A^3} + \frac{\mathrm{cap}(X)}{2^5\nu A^5} + \frac{\mathrm{cap}(X)}{2^7\nu A^7} + \cdots = \frac{\mathrm{cap}(X)}{\nu(8A^3-2A)}. \tag{37}$$
By the argument of the preceding paragraph, the number of items in $P$ that should go to $X'$ but will be forced into $X$ due to the capacity constraint is upper bounded by the sum of the quantities in Eqs. (36) and (37):

$$\frac{2\mu A\,\mathrm{cap}(X)}{\nu\sigma\bigl(1-4A^2/\sigma^2\bigr)} + \frac{\mathrm{cap}(X)}{\nu(8A^3-2A)}. \tag{38}$$
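The closed forms in Eqs. (36) and (37) are ordinary geometric series. The sketch below checks them with exact rational arithmetic, in units of $\mathrm{cap}(X)$. The sample values of $A$, $\nu$, and $\mu$ are illustrative: they reproduce the constants 43/576 and 1/35 used later in this proof, but they are not restated in this section. $\sigma$ only needs to exceed $2A$ for (36) to converge.

```python
from fractions import Fraction

def below_partial(mu, sigma, A, nu, terms):
    """Partial sum of (36): sum_i 2^(2i+1) * mu * sigma^-(2i+1) * A^(2i+1) / nu."""
    return sum(Fraction(2 * A, sigma) ** (2 * i + 1) * mu / nu for i in range(terms))

def below_closed(mu, sigma, A, nu):
    """Right side of (36) divided by cap(X)."""
    return 2 * mu * A / (sigma * nu * (1 - 4 * Fraction(A, sigma) ** 2))

def above_partial(A, nu, terms):
    """Partial sum of (37): sum_i 1 / (2^(2i+3) * nu * A^(2i+3))."""
    return sum(Fraction(1, (2 * A) ** (2 * i + 3)) / nu for i in range(terms))

def above_closed(A, nu):
    """Right side of (37) divided by cap(X)."""
    return 1 / (nu * (8 * A ** 3 - 2 * A))
```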
Now, adding the quantities in Eqs. (34), (35), and (38), we obtain an upper bound for $X_1$ in Eq. (33):

$$X_1 \le \frac{2\mu A\,\mathrm{cap}(X)}{\sigma\nu} + \frac{2\mu A\,\mathrm{cap}(X)}{\nu\sigma\bigl(1-4A^2/\sigma^2\bigr)} + \frac{\mathrm{cap}(X)}{\nu(8A^3-2A)} + \frac{\varepsilon\,\mathrm{cap}(X)}{2A\nu}.$$

(We remark that a corresponding formula in [16] contains a term of $\mu\,\mathrm{cap}(X)/(\nu A)$, which does not appear in our formula. Such a difference occurs since we count the number of items with strangeness exactly 1, whereas Paterson counted the number of items with strangeness at least 1.) By Inequality (32),

$$X_1 > \frac{\mu\,\mathrm{cap}(X)}{2}.$$
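Combining the last two inequalities, we obtain

$$\frac{2\mu A}{\sigma\nu} + \frac{2\mu A}{\nu\sigma\bigl(1-4A^2/\sigma^2\bigr)} + \frac{1}{\nu(8A^3-2A)} + \frac{\varepsilon}{2\nu A} > \frac{\mu}{2}. \tag{39}$$

Hence,

$$\varepsilon > \nu\mu A - \frac{4\mu A^2}{\sigma}\left(1+\frac{1}{1-4A^2/\sigma^2}\right) - \frac{1}{4A^2-1}. \tag{40}$$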
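By choosing $\gamma$ sufficiently large, we can ensure that

$$\frac{4\mu A^2}{\sigma}\left(1+\frac{1}{1-4A^2/\sigma^2}\right) < \frac{1}{576}.$$

By Eqs. (1) and (11) and Inequality (40),

$$\varepsilon > \frac{43}{576} - \frac{1}{576} - \frac{1}{35} > \frac{4}{100}.$$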
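The final numeric step can be reproduced exactly. The constants in this sketch are our assumption: $A=3$, $\nu=43/48$, and $\mu=1/36$ are the values for which $\nu\mu A = 43/576$ and $1/(4A^2-1) = 1/35$, matching the numbers above. Eqs. (1) and (11) themselves are not part of this excerpt. The sketch also finds the smallest integer $\sigma$ for which the middle term of Inequality (40) drops below $1/576$. The text phrases this requirement through the choice of $\gamma$.

```python
from fractions import Fraction

A, NU, MU = 3, Fraction(43, 48), Fraction(1, 36)  # assumed constants, see lead-in

def middle_term(sigma):
    """(4 mu A^2 / sigma) * (1 + 1 / (1 - 4 A^2 / sigma^2)): second term of Ineq. (40)."""
    r = Fraction(4 * A * A, sigma * sigma)
    return 4 * MU * A * A / sigma * (1 + 1 / (1 - r))

def smallest_sigma(bound=Fraction(1, 576)):
    """Smallest integer sigma > 2A with middle_term(sigma) < bound."""
    sigma = 2 * A + 1  # need sigma > 2A so that 1 - 4A^2/sigma^2 > 0
    while middle_term(sigma) >= bound:
        sigma += 1
    return sigma

def epsilon_lower_bound():
    """Ineq. (40) with the middle term replaced by its bound 1/576."""
    return MU * NU * A - Fraction(1, 576) - Fraction(1, 4 * A * A - 1)

print(epsilon_lower_bound(), float(epsilon_lower_bound()))  # 149/3360 0.04434523809523809
```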
By the choice of $\varepsilon$ in Inequality (35), the preceding inequality implies that the partitioner at $P$ outputs at least $(4/100)|P|$ of the items into the wrong half. Hence, at the end of the first round divider at $P$, at least $(4/100)|P|$ items are output into the wrong half. Without loss of generality, we assume that at the end of the first round divider, at least $(4/100)|P|$ items that belong to the top half are output to the bottom half. Among the $(4/100)|P|$ or more items that belong to the top half but are output to the bottom half, at most $(1/100)|P|$ have ranks greater than $(49/100)|P|$, and all of the others (at least $(4/100)|P| - (1/100)|P| \ge |P|/100$ items) have ranks $(49/100)|P|$ or less.
Let us define a pair (P, R) to be a bad pair if R is a set of
bottom registers at the end of the first round divider at $P$ such that $|R| = |P|/100$ and $R$ contains output items with ranks $(49/100)|P|$ or less at the end of the first round divider at $P$. The above arguments show that the faulty partial $l$-AKS circuit violating Inequality (13) on input permutation $\Pi$ has a bad pair $(P, R)$. So, to complete our analysis of Case 1, we need only show that the probability that a bad pair exists is at most $\epsilon^{\Theta(l\log m)}$.
For any fixed pair (P, R) such that |R|=|P|100, by
Lemma 2.2 with s=1, k= 49100 |P|, and , sufficiently large so
that |R|+(,4) |R|>k,
Pr((P, R) is a bad pair)\3( |R| )=\3(|P| ). (41)
Hence,
Pr(_ a bad pair (P, R))
P R/P, |R|=|P|100 Pr((P, R) is a bad pair)
P ( |P||P|100) \
3( |P| )
(by Inequality (41))
P 2|P|\3( |P| )
(since ( xy)2
x)
P \3( |P| )
(for \ sufficiently small)
O(m log m) \3( |P| ), (42)
where the last inequality holds because there are at most
O(m log m) l-AKS tree nodes. On the other hand, since
priority is given to the upward movement of registers in any
l-AKS tree node and since X is not empty at stage t, P con-
tains at least * cap(P) registers at stage t&1. Therefore,
|P|* cap(P)* - m (since any node in the l-AKS tree for
the partial l-AKS circuit has capacity at least - m). Hence,
O(m log m) \3( |P| )O(m log m) \3(* - m)\3(l log m), (43)
where the last inequality holds due to the assumption that
lm18 and the fact that * is a constant. Combining
Inequalities (42) and (43) completes the proof for Case 1.
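Two small facts used in Case 1 can be checked directly:

- with $|R| = |P|/100$ and $k = (49/100)|P|$, the requirement $|R| + (\gamma/4)|R| > k$ holds exactly when $\gamma > 192$;
- $\binom{x}{y} \le 2^x$.

A sketch with hypothetical names:

```python
from fractions import Fraction
from math import comb

def lemma22_condition(gamma: int, P: int) -> bool:
    """|R| + (gamma/4)|R| > k with |R| = |P|/100 and k = (49/100)|P|."""
    R = Fraction(P, 100)
    return R + Fraction(gamma, 4) * R > Fraction(49, 100) * P

def smallest_gamma(P: int = 1000) -> int:
    """Smallest integer gamma satisfying the Lemma 2.2 condition for node size P."""
    gamma = 1
    while not lemma22_condition(gamma, P):
        gamma += 1
    return gamma

print(smallest_gamma())  # 193
```

Because both sides scale linearly in $|P|$, the threshold does not depend on the node size.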
Case 2: $k>1$. This case is much more complicated than the preceding one. The source of the difficulty is that all previously known analyses of the AKS circuit rely on the expansion property for arbitrarily small sets of registers and, as discussed in Section 2.1, such an expansion property cannot be preserved with high probability in the presence of faults. To get around this problem, we will trace back $k-1$ stages to see how the $X_k$ $k$-strange items in $X$ are eventually moved into $X$. Strange items can come either from below or from above in the tree. In the former case, the items would be more strange one stage earlier, and we can apply Inequality (31). In the latter case, if a good number of the comparators associated with the items work correctly, we will get a certain expansion property. Our hope is to show that even under the loss of the local expansion property for possibly many small sets of registers, with high probability the circuit maintains a certain global expansion property that ensures its functionality as a sorting circuit.
Since we have $k$-strange items in the $l$-AKS tree at the beginning of stage $t$, we have $t \ge k+1$ (otherwise all the nodes with depth more than $k$ in the $l$-AKS tree would be empty at stage $t$ and no item could be $k$-strange). In what follows, we will trace backward $k-1$ stages and see how these $X_k$ $k$-strange items are misplaced into $X$. We will inductively define a sequence of sets $R_t, \ldots, R_{t-k+1}$ such that each $R_{t-i}$ is a set of registers at the beginning of stage $t-i$ with strangeness at least $k-i$. For ease of notation, let $c(R)$ be the set of all the items contained in $R$ for any set of registers $R$, and let

$$r_{t-i} = |R_{t-i}|. \tag{44}$$
Base step. Take $R_t$ as the set of all $X_k$ registers at the beginning of stage $t$ that are $k$-strange in $X$. By definition, all registers in $R_t$ have strangeness at least $k$.

Inductive step. Assuming that $R_{t-i}$ has been defined for some $0 \le i < k-1$, we now define $R_{t-i-1}$. For each register $r$ in an $l$-AKS tree node $X$, $c(r)$ may come from either the parent of $X$ or a child of $X$. In the former case, we say that $c(r)$ comes from above; in the latter case, we say that $c(r)$ comes from below. Let

$$\alpha_{t-i} = \frac{|\{\text{items in } c(R_{t-i}) \text{ that come from above}\}|}{r_{t-i}},$$
where $r_{t-i}$ is defined in Eq. (44). Given $R_{t-i}$, a set of registers with strangeness at least $k-i$ at the beginning of stage $t-i$, we construct a set of registers $R_{t-i-1}$ at the beginning of stage $t-i-1$ as follows:

Case (a): $\alpha_{t-i} \le 1/2$ (at most half of the items in $R_{t-i}$ come from above). We simply choose $R_{t-i-1}$ as the set of all $r_{t-i}$ registers at the beginning of stage $t-i-1$ that contain an item in $c(R_{t-i})$.

Case (b): $\alpha_{t-i} > 1/2$ (more than half of the items in $R_{t-i}$ come from above). We focus on the $\alpha_{t-i}r_{t-i}$ items in $c(R_{t-i})$ that come from above. By the induction hypothesis, each of these $\alpha_{t-i}r_{t-i}$ items is at least $(k-i)$-strange in a register of $R_{t-i}$. Hence, each is either too small or too large for the node in which it currently resides. Without loss of generality, we assume that more than half of the items are too small for the nodes in which they currently reside. Let
$\mathrm{rank}_i$ be the maximum of these small items, and let $c(r)$ be any one of these small items. Let $W$ be the AKS-tree node that contains $c(r)$ at stage $t-i-1$. By the choice of $c(r)$, $c(r)$ is moved from $W$ to a child of $W$ from stage $t-i-1$ to stage $t-i$. Hence, we immediately have

$$|W| \ge \lambda\,\mathrm{cap}(W). \tag{45}$$

By Inequality (8), at the end of stage $t-i-1$, $c(r)$ cannot possibly be located in the regions $F_L$ or $F_R$ of $W$. In other words, at the end of stage $t-i-1$, $c(r)$ is located in the bottom half of the top $h$th round divider at $W$ for some $h \le 1+\log(1/\lambda') = 4$.

We choose the smallest $h \le 4$ such that at least $(1/8)\alpha_{t-i}r_{t-i}$ of such $c(r)$'s are located in the bottom halves of the top $h$th round dividers at the $W$'s, each of which corresponds to a $c(r)$. (Note that different $c(r)$'s correspond to different $W$'s and that there are at least $(1/2)\alpha_{t-i}r_{t-i}$ such $c(r)$'s.) We trace back where these $(1/8)\alpha_{t-i}r_{t-i}$ or more items are located at the end of the corresponding top $h$th round dividers. Let $U_{t-i}$ be the set of registers containing these items at the end of the corresponding top $h$th round dividers. Let $u'_{t-i}$ be the number of registers in $N(U_{t-i})$ (within the top $h$th round dividers at the corresponding $W$'s) that contain an item less than or equal to $\mathrm{rank}_i$. (The meaning of $N(R)$, where $R$ denotes a set of registers in the bottom halves of a collection of dividers, can be found immediately before Lemma 2.2.) Note that such registers are at least $(k-i-1)$-strange. If $u'_{t-i} \ge r_{t-i}$, let $R_{t-i-1}$ be any set of registers at the beginning of stage $t-i-1$ that contain $\min\{\gamma, \lfloor u'_{t-i}/r_{t-i}\rfloor\}\,r_{t-i}$ of the $u'_{t-i}$ items. If $u'_{t-i} < r_{t-i}$, then we abandon everything that has been established in Case (b) and simply use the method of Case (a) to choose $R_{t-i-1}$.
Claim 2.3. If $D$ is the $h$th round divider at an AKS tree node $W$ and $D$ contains at least one register in $U_{t-i}$, then $D$ has at least $(\lambda'/2)\,\mathrm{cap}(W)$ registers.

Proof. Straightforward from Inequality (45) and Claim 2.2. ∎

By the induction hypothesis, the items in $c(R_{t-i})$ are at least $(k-i)$-strange in the registers in $R_{t-i}$. Since each register in $R_{t-i-1}$ is at most one level higher in the $l$-AKS tree than its corresponding register in $R_{t-i}$, all of the registers in $R_{t-i-1}$ are at least $(k-i-1)$-strange. This finishes our inductive construction of $R_{t-i-1}$ from $R_{t-i}$.

For $0 \le i \le k-2$, let

$$\gamma_{t-i} = r_{t-i-1}/r_{t-i}. \tag{46}$$

Claim 2.4. For $0 \le i \le k-2$, $\gamma_{t-i}$ is an integer in $[1, \gamma]$.

Proof. Straightforward from Eq. (46) and the construction of $R_{t-i-1}$ from $R_{t-i}$. ∎
For 0ik&2, the strangeness of any item in Rt&i&1 is
at least its strangeness in Rt&i minus 1. So by counting all
of the items from above or below, we have
P_(Rt&i&1),t&i_&1P_(Rt&i)
for all i, 0ik&2. If :t&i 12, then, by the fact that the
items from below should be one more strange in Rt&i&1
than in Rt&i ,
P_(Rt&i&1)
_
2
P_(Rt&i)
for all i such that 0ik&2 and :t&i 12. From the
preceding pair of inequalities, we have
P_(Rt&k+1)
P_(Rt) ‘
0ik&2, :t&i12
\_2+ ‘0ik&2, :t&i>12 \
,t&i
_ +
+ cap(X ) ‘
0ik&2, :t&i12
\_2+ ‘0ik&2, :t&i>12 \
,t&i
_ + .
(47)
Since we start from a set of registers in node $X$ and we move at most one level upward and one level downward in the $l$-AKS tree when constructing $R_{t-i-1}$ from $R_{t-i}$, $R_{t-k+1}$ is located within $k-1$ levels of $X$. Therefore, by Eq. (3), the total capacity of all nodes that can possibly contain registers in $R_{t-k+1}$ is upper bounded by

$$\mathrm{cap}(X)\,\nu^{-(k-1)}\Bigl[(2A)^{k-1}+(2A)^{k-2}+\cdots+A+1+A^{-1}+\cdots+A^{-(k-2)}+A^{-(k-1)}\Bigr] < \mathrm{cap}(X)\left(\frac{2A}{\nu}\right)^{k-1}\frac{1}{1-1/(2A)}. \tag{48}$$

By Inequality (48) and the fact that Inequality (13) is not yet violated at stage $t-k+1$,

$$P_\sigma(R_{t-k+1}) \le \mu\,\mathrm{cap}(X)\left(\frac{2A}{\nu}\right)^{k-1}\frac{1}{1-1/(2A)}. \tag{49}$$

Combining Inequalities (47) and (49), we get

$$\prod_{\substack{0\le i\le k-2\\ \alpha_{t-i}\le 1/2}}\left(\frac{\sigma}{2}\right)\prod_{\substack{0\le i\le k-2\\ \alpha_{t-i}>1/2}}\left(\frac{\gamma_{t-i}}{\sigma}\right) \le \left(\frac{2A}{\nu}\right)^{k-1}\frac{1}{1-1/(2A)}.$$
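Hence,

$$\prod_{\substack{0\le i\le k-2\\ \alpha_{t-i}>1/2}}\gamma_{t-i} \le \left(\frac{4A}{\nu}\right)^{k-1}\frac{1}{1-1/(2A)}\,\sigma^{x-y}, \tag{50}$$

where $x = |\{i : 0\le i\le k-2,\ \alpha_{t-i}>1/2\}|$ and $y = |\{i : 0\le i\le k-2,\ \alpha_{t-i}\le 1/2\}|$. From Eq. (12), the fact that $\gamma_{t-i}\ge1$ (see Claim 2.4), and Inequality (50), we obtain

$$\left(\frac{4\sigma A}{\nu(1-1/(2A))}\right)^{2z} = \left(\frac{\gamma}{65}\right)^{z} \le \prod_{\substack{0\le i\le k-2\\ \alpha_{t-i}>1/2}}\gamma_{t-i} \le \left(\frac{4A}{\nu}\right)^{k-1}\frac{1}{1-1/(2A)}\,\sigma^{x-y}, \tag{51}$$

where $z = |\{i : 0\le i\le k-2,\ \alpha_{t-i}>1/2,\ \gamma_{t-i}\ge\gamma/65\}|$. Let $\beta$ be a constant such that

$$\frac{4A}{\nu(1-1/(2A))} = \sigma^{\beta}. \tag{52}$$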
From Inequality (51) and the fact that k2, we have
_(;+1) 2z_(x& y)+;(k&1).
Therefore,
z f (;)=
;(k&1)+(x& y)
2(1+;)
. (53)
By simple calculus, we have
f $(;)=
(k&1)&(x& y)
2(1+;)2
0.
Thus, f (;) is monotonically increasing. According to
Eq. (52), we can enforce ; 12 by letting _ be large. There-
fore, by Inequality (53),
z f \12+=
(12)(k&1)+x& y
3

x& y
2
+
k&1
4
,
where the last inequality holds since, for _ sufficiently large,
Inequality (50) (together with Claim 2.4) implies
x& y&(k&1)2. Thus,
} {i : 0ik&2, :t&i>12, ,t&i<
,
65= }
=x&zx&\x& y2 +
k&1
4 +=
k&1
4
, (54)
where the last equality holds, since x+ y=k&1 by defini-
tion. Recall that for any i such that 0ik&2 and
:t&i> 12 , we have used Ut&i as an intermediate group of
registers in constructing Rt&i&1 from Rt&i . We define Ut&i
to be bad if :t&i> 12 and ,t&i<,65. Inequality (54) implies
that the number of bad Ut&i is at least (k&1)4.
Let $i(1) < \cdots < i(q)$ be the increasing sequence of all integers $i$ such that $0\le i\le k-2$ and $U_{t-i}$ is bad. We have $q \ge (k-1)/4$. Applying Lemma 2.3 to the sequences

$$r_t \le r_{t-1} \le \cdots \le r_{t-(k-2)}$$

and

$$r_{t-i(1)} \le r_{t-i(2)} \le \cdots \le r_{t-i(q)},$$

we know that there exists an integer $s \ge (k-1)/8$ such that

$$\sum_{1\le j\le s} r_{t-i(j)} \ge \frac{1}{8}\sum_{0\le j\le i(s)} r_{t-j}. \tag{55}$$

We will finish our analysis of Case 2 by proving that the probability that there exists such a sequence of bad sets $U_{t-i(1)}, \ldots, U_{t-i(s)}$ (which will be referred to as a bad sequence hereafter) is small. For $1\le j\le s$, let $B_j$ be the set of blocks that contain at least one register in $U_{t-i(j)}$, let $u_j = |U_{t-i(j)}|$, and let $b_j = |B_j|$, i.e., the number of blocks that contain at least one register in $U_{t-i(j)}$.
Claim 2.5. For a given sequence $U_{t-i(1)}, U_{t-i(2)}, \ldots, U_{t-i(s)}$,

$$\Pr\bigl(U_{t-i(1)}, U_{t-i(2)}, \ldots, U_{t-i(s)}\text{ is a bad sequence}\bigr) \le \epsilon^{\Theta(\sum_{1\le j\le s} lb_j)}.$$

Proof. If $U_{t-i(j)}$ is bad, then $\gamma_{t-i(j)} < \gamma/65$ by definition. Hence

$$\left\lfloor \frac{u'_{t-i(j)}}{r_{t-i(j)}}\right\rfloor < \frac{\gamma}{65}, \tag{56}$$

where $u'_{t-i(j)}$ is defined in the construction of $R_{t-i(j)-1}$ from $R_{t-i(j)}$. For a sufficiently large choice of the constant $\gamma$ (so that $\gamma/65+1<\gamma/64$), Inequality (56) implies

$$\frac{u'_{t-i(j)}}{r_{t-i(j)}} < \frac{\gamma}{64}. \tag{57}$$

According to our inductive construction of $R_{t-i(j)-1}$ from $R_{t-i(j)}$ and the fact that $\alpha_{t-i(j)}>1/2$ (since $U_{t-i(j)}$ is assumed to be bad), we have $|U_{t-i(j)}| \ge (1/8)\alpha_{t-i(j)}r_{t-i(j)} \ge r_{t-i(j)}/16$. Therefore, Inequality (57) implies

$$\frac{u'_{t-i(j)}}{|U_{t-i(j)}|} < \frac{\gamma}{4}. \tag{58}$$

Note that Inequality (31) holds at stage $t-i(j)$ and that all of the items in $U_{t-i(j)}$ are at least 1-strange. Hence, by Claim 2.3 and the fact that $2\mu/\lambda' < 0.48$, $U_{t-i(j)}$ satisfies the condition
on $R$ in Lemma 2.2. Note that we have chosen to compare $2\mu/\lambda'$ with 0.48 (as opposed to 0.49) so that Lemma 2.2 can be applied even if the effect of integer rounding is taken into account (recall that all of the capacities under consideration are at least $\sqrt{m}$). Thus, by applying Lemma 2.2 to all of the dividers associated with $U_{t-i(j)}$, the probability that Inequality (58) holds is at most $\epsilon^{\Theta(lb_j)}$. Hence, we have

$$\Pr\bigl(U_{t-i(j)}\text{ is bad}\bigr) \le \epsilon^{\Theta(lb_j)}.$$

The claim now follows from the independence of the $U_{t-i(j)}$'s, $1\le j\le s$. ∎
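The constants in the proof of Claim 2.5 can be checked the same way. The sketch first finds the smallest integer $\gamma$ with $\gamma/65+1<\gamma/64$. For that $\gamma$, it then spot-checks on random integer data that Inequality (56) implies (57), and that (57) together with $|U_{t-i(j)}| \ge r_{t-i(j)}/16$ implies (58). The names are hypothetical, and $|U|$ is drawn with $r/16 \le |U| \le r$.

```python
import random
from fractions import Fraction

def smallest_gamma_claim_2_5() -> int:
    """Smallest integer gamma with gamma/65 + 1 < gamma/64."""
    gamma = 1
    while not Fraction(gamma, 65) + 1 < Fraction(gamma, 64):
        gamma += 1
    return gamma

def spot_check(gamma: int, trials: int = 20_000, seed: int = 1) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        r = rng.randint(16, 5_000)              # r_{t-i(j)}
        u = rng.randint(-(-r // 16), r)         # |U_{t-i(j)}|, at least r/16
        u_prime = rng.randint(0, 80 * r)        # u'_{t-i(j)}
        if u_prime // r < Fraction(gamma, 65):  # Ineq. (56)
            if not Fraction(u_prime, r) < Fraction(gamma, 64):  # Ineq. (57)
                return False
            if not Fraction(u_prime, u) < Fraction(gamma, 4):   # Ineq. (58)
                return False
    return True
```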
In the next few claims, we upper bound the number of possible ways of choosing the sequence $U_{t-i(1)}, \ldots, U_{t-i(s)}$ and show that even after this number is taken into account, the probability that a faulty $l$-AKS circuit contains a bad sequence $U_{t-i(1)}, \ldots, U_{t-i(s)}$ is very small.

Claim 2.6. For fixed $k$, the number of ways of choosing the sequence $r_t, r_{t-1}, \ldots, r_{t-k+1}$ is at most $m\gamma^{k-1}$.

Proof. The number of ways of choosing $r_t$ is at most $m$. By Claim 2.4, when $r_{t-i}$ is given, the number of ways of choosing $r_{t-i-1}$ is at most $\gamma$. Overall, the number of ways of choosing $r_t, r_{t-1}, \ldots, r_{t-k+1}$ is at most $m\gamma^{k-1}$. ∎

Claim 2.7. When $\mathrm{cap}(X)$ and $r_t$ are both given, the number of ways of choosing $R_t$ is at most

$$O(\sqrt{m}\log m)\binom{\mathrm{cap}(X)}{r_t}.$$

Proof. Since the partial $l$-AKS circuit is run for $O(\log m)$ stages and the $l$-AKS tree has at most $m/\mathrm{cap}(\mathrm{root}) \le \sqrt{m}$ nonempty nodes at each stage, the total number of ways of choosing node $X$ is upper bounded by

$$O(\sqrt{m}\log m).$$

When $X$ is given, the number of ways of choosing $R_t$, a set of registers contained in $X$, is at most

$$\binom{\mathrm{cap}(X)}{r_t}.$$

Multiplying the quantities in the last two formulae, we obtain the desired upper bound. ∎
Claim 2.8. When the sequence $r_{t-i(1)}, \ldots, r_{t-i(s)}$ is given, the number of ways of choosing the sequence $b_1, \ldots, b_s$ is at most

$$O\Bigl(\prod_{1\le j\le s} r_{t-i(j)}\Bigr).$$

Proof. This is because $b_j \le r_{t-i(j)}$ for each $1\le j\le s$. ∎

Claim 2.9. If $R_t$ and the sequences $r_t, r_{t-1}, \ldots, r_{t-i(s)}$, $i(1), i(2), \ldots, i(s)$, and $b_1, \ldots, b_s$ are given, then the number of ways of choosing the sequence $B_1, \ldots, B_s$ is at most

$$2^{O(\sum_{1\le j\le s} lb_j)}.$$

Proof. Let $\bar B_{t-i}$ be the set of blocks that contain at least one register in $R_{t-i}$ for $i = 0, 1, \ldots, i(s)+1$. We first upper bound the number of possible sequences

$$\bar B_t, \bar B_{t-1}, \ldots, \bar B_{t-i(s)-1}.$$

(Note that by the choice of $s$ stated before Inequality (55), $i(s) \le k-2$ and, hence, $\bar B_{t-i(s)-1}$ is well-defined.) Clearly, $\bar B_t$ is fixed when $R_t$ is given. We next count the number of ways of choosing $\bar B_{t-i-1}$ when $\bar B_{t-i}$ is given. Each block in $\bar B_{t-i-1}$ is connected by a $(t-i-1)$th stage partitioner to some block in $\bar B_{t-i}$. Since each block is connected to at most $d$ blocks by a divider, it is connected to at most $d^4$ blocks by a partitioner. On the other hand, there are at most $r_{t-i}$ blocks in $\bar B_{t-i}$. Therefore, when $\bar B_{t-i}$ is given, the number of ways of choosing $\bar B_{t-i-1}$ is at most

$$\binom{d^4 r_{t-i}}{r_{t-i-1}} = \binom{d^4 r_{t-i}}{\gamma_{t-i}r_{t-i}} \le \left(\frac{ed^4}{\gamma_{t-i}}\right)^{\gamma_{t-i}r_{t-i}} \le 2^{O(r_{t-i})} \quad\text{for } i = 0, \ldots, i(s), \tag{59}$$

where the constant behind the $O$-notation is dependent on $\gamma$ and $d$ but not dependent on $\epsilon$.

On the other hand, $B_j$ corresponds to $U_{t-i(j)}$, and $\bar B_{t-i(j)-1}$ corresponds to $R_{t-i(j)-1}$. Moreover, $R_{t-i(j)-1}$ and $U_{t-i(j)}$ correspond to the same stage of the AKS tree. Hence, $B_j$ and $\bar B_{t-i(j)-1}$ correspond to registers within the same stage. Therefore, by an argument similar to the preceding paragraph, we know that when $\bar B_{t-i(j)-1}$ is given, the number of ways of choosing $B_j$ is at most

$$2^{O(r_{t-i(j)-1})} = 2^{O(r_{t-i(j)})} \quad\text{for } j = 1, \ldots, s, \tag{60}$$

where we have used Claim 2.4. Multiplying all the quantities (i.e., for $i = 0, \ldots, i(s)$) in Inequality (59) with all the quantities (i.e., for $j = 1, \ldots, s$) in Eq. (60), we obtain an upper bound on the number of ways to choose the sequence $B_1, \ldots, B_s$ under the assumption of the claim:

$$\begin{aligned}
2^{O(\sum_{0\le j\le i(s)} r_{t-j})}\cdot 2^{O(\sum_{1\le j\le s} r_{t-i(j)})}
&= 2^{O(\sum_{1\le j\le s} r_{t-i(j)})} && \text{(by Inequality (55))}\\
&= 2^{O(\sum_{1\le j\le s} lb_j)} && \text{(since } r_{t-i(j)} = O(lb_j)).
\end{aligned}$$

∎
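Claim 2.10. If the sequence $B_1, \ldots, B_s$ is given, then the probability that there exists a bad sequence $U_{t-i(1)}, \ldots, U_{t-i(s)}$ is at most

$$\epsilon^{\Theta(\sum_{1\le j\le s} lb_j)}. \tag{61}$$

Proof. For each $j$ such that $1\le j\le s$, there are $b_j$ blocks of size $l$ in $B_j$. Hence, when $B_j$ is given, the number of ways to select the items in $U_{t-i(j)}$ is at most $2^{lb_j}$. Thus, the number of ways to select the sequence $U_{t-i(1)}, \ldots, U_{t-i(s)}$ is upper bounded by

$$2^{l\sum_{1\le j\le s} b_j}.$$

Therefore,

$$\begin{aligned}
&\Pr\bigl(\exists\text{ a bad sequence } U_{t-i(1)}, \ldots, U_{t-i(s)} \text{ given } B_1, \ldots, B_s\bigr)\\
&\quad\le 2^{\sum_{1\le j\le s} lb_j}\cdot\Pr\bigl(\text{a fixed sequence } U_{t-i(1)}, \ldots, U_{t-i(s)} \text{ is bad}\bigr)\\
&\quad\le 2^{\sum_{1\le j\le s} lb_j}\,\epsilon^{\Theta(\sum_{1\le j\le s} lb_j)}\\
&\quad\le \epsilon^{\Theta(\sum_{1\le j\le s} lb_j)},
\end{aligned}$$

where the second inequality follows by Claim 2.5, and the last inequality holds for $\epsilon$ sufficiently small. ∎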
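Claim 2.11. Let $\mathrm{cap}(X)$, $k$, and the sequences $r_t, \ldots, r_{t-k+1}$ and $i(1), \ldots, i(s)$ be given. Then

$$\Pr\bigl(\exists\text{ a bad sequence } U_{t-i(1)}, \ldots, U_{t-i(s)}\bigr) \le \epsilon^{\Theta(l\log m)}.$$

Proof. Claims 2.7 to 2.10 imply that for given $\mathrm{cap}(X)$, $k$, and sequences $r_t, \ldots, r_{t-k+1}$ and $i(1), \ldots, i(s)$,

$$\begin{aligned}
&\Pr\bigl(\exists\text{ a bad sequence } U_{t-i(1)}, \ldots, U_{t-i(s)}\bigr)\\
&\quad\le O(\sqrt{m}\log m)\binom{\mathrm{cap}(X)}{r_t}\,O\Bigl(\prod_{1\le j\le s} r_{t-i(j)}\Bigr)\times 2^{O(\sum_{1\le j\le s} lb_j)}\,\epsilon^{\Theta(\sum_{1\le j\le s} lb_j)}\\
&\quad\le O(\sqrt{m}\log m)\binom{\mathrm{cap}(X)}{r_t}\,\epsilon^{\Theta(\sum_{1\le j\le s} lb_j)},
\end{aligned} \tag{62}$$

where the last inequality follows since $lb_j = \Omega(r_{t-i(j)})$ and $\epsilon$ can be chosen sufficiently small.

To prove the claim, we need to show that the quantity in Eq. (62) is at most $\epsilon^{\Theta(l\log m)}$. Note that $s \ge (k-1)/8$ and that, for each $j$ such that $1\le j\le s$, both $lb_j \ge l$ and $lb_j = \Omega(r_{t-i(j)}) = \Omega(r_t)$. Hence it suffices to show that, for $\epsilon$ sufficiently small,

$$O(\sqrt{m}\log m)\left(\frac{\mathrm{cap}(X)\,e}{r_t}\right)^{r_t}\epsilon^{\Theta((r_t+l)(k-1))} \le \epsilon^{\Theta(l\log m)}.$$

We do this by proving

$$\left(\frac{\mathrm{cap}(X)\,e}{r_t}\,\epsilon^{k-1}\right)^{r_t+l} \le \epsilon^{\Theta(l\log m)}. \tag{63}$$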
Let $C_0$ be a constant such that

$$\left(\frac{1}{2}\right)^{C_0\log m+1}\left(\frac{1}{\sigma}\right)^{C_0\log m}\mu\sqrt{m} = m^{1/4}.$$

We establish Inequality (63) by considering the following two cases.

Case (a): $k-1 < C_0\log m$. By Inequality (32) and the fact that $\mathrm{cap}(X)\ge\sqrt{m}$, we have

$$r_t = X_k > \left(\frac{1}{2}\right)^{k}\left(\frac{1}{\sigma}\right)^{k-1}\mu\,\mathrm{cap}(X) \ge \left(\frac{1}{2}\right)^{C_0\log m+1}\left(\frac{1}{\sigma}\right)^{C_0\log m}\mu\sqrt{m} = m^{1/4}.$$

Thus, $r_t + l \ge m^{1/4} \ge l\log m$ since $l \le m^{1/8}$. Therefore, in order to show Inequality (63), we need only show that, for $\epsilon$ sufficiently small,

$$\frac{\mathrm{cap}(X)\,e}{r_t}\,\epsilon^{k-1} < \epsilon^{\Theta(1)}.$$

For sufficiently small $\epsilon$, the last inequality follows since $k\ge2$ and, by Inequality (32), $\mathrm{cap}(X)/r_t = \mathrm{cap}(X)/X_k \le (2/\mu)(2\sigma)^{k-1}$.

Case (b): $k-1 \ge C_0\log m$. By Claim 2.1, it is easy to see that each nonempty node has capacity at most $O(m)$. (This is true even after the integer rounding.) Hence, by the fact that node $X$ is not empty at stage $t$, we have $\mathrm{cap}(X) = O(m)$. Therefore, for $\epsilon$ sufficiently small,

$$\frac{\mathrm{cap}(X)\,e}{r_t}\,\epsilon^{k-1} \le O(m)\,e\,\epsilon^{C_0\log m} \le \epsilon^{\Theta(\log m)},$$

establishing Inequality (63). ∎
Continuing with the proof of Theorem 2.1, the number of choices for $k$ is at most $O(\log m)$, and (as argued in Case (b) of Claim 2.11) the number of ways of choosing $\mathrm{cap}(X)$ is at most $O(m)$.
By Claim 2.6, when $k$ is given, the number of choices for the sequence $r_t, \ldots, r_{t-k+1}$ is at most

$$m\gamma^{k-1} \le O(m)\,2^{O(\log m)} = 2^{O(\log m)}.$$

Furthermore, when $k$ is given, the number of choices for the sequence $i(1), \ldots, i(s)$ is at most

$$\sum_{(k-1)/8\le s\le k-1}\binom{k-1}{s} \le 2^{k-1} \le 2^{O(\log m)},$$

since $(k-1)/8 \le s \le k-1$ and $i(1) < i(2) < \cdots < i(s)$. Therefore, the number of choices for $k$, $\mathrm{cap}(X)$, and the sequences $r_t, \ldots, r_{t-k+1}$ and $i(1), \ldots, i(s)$ is at most

$$O(\log m)\cdot O(m)\cdot 2^{O(\log m)}\cdot 2^{O(\log m)} = 2^{O(\log m)}.$$

Multiplying the above number with the quantity in Claim 2.11, we obtain the desired upper bound on the probability that a bad sequence $U_{t-i(1)}, \ldots, U_{t-i(s)}$ exists. This completes our consideration of Case 2, as well as the inductive step for proving the theorem. ∎
Corollary 2.1. Let ;=1&18 log 6 and let X be the
set of all output registers of an m-input partial l-AKS circuit
(where lm18), with , and _ being sufficiently large con-
stants. Under both the passive and the reversal fault models,
there exists a fixed partition of X into disjoint sets
[S, X1 , ..., Xm ], where m =3(m1&;), |S|=O(m34), and
|X1 |= } } } =|Xm |=3(m;) such that when \ is less than a
sufficiently small constant, a randomly faulty l-AKS circuit
has the following property with probability at least
1&\3(l log m): On all input permutations, the items in Xi are
smaller than the items in Xj for 1i< jm .
Proof. We prove the corollary by exhibiting a fixed par-
tition of X with the necessary properties. Assume that at the
last stage of the partial l-AKS circuit, the l-AKS tree has
depth d (i.e., nodes at level d are nonempty and nodes
strictly below d are all empty). If we were to run the l-AKS
circuit for another stage, then the root of the l-AKS tree
would have capacity less than - m. Hence, the capacity of
the root at the last stage of the l-AKS tree is at most - m&.
Therefore, at the last stage of the partial l&AKS circuit, the
total capacity of all nonempty nodes and the cold storage in
the l-AKS tree is at most
- m
&
+
- m
&
} 2A+ } } } +
- m
&
} (2A)d&1+
- m
&2A
+
- m
&(2A)2
+
- m
&(2A)3
+ } } }
=
(2A)d
2A&1
}
- m
&
<6d- m, (64)
where the last inequality follows from Eq. (1). By Claim 2.1
and the fact cap(root)- m, we also know that (even after
integer rounding) the number of registers in the l-AKS tree
is at least
1
2A+1
(- m+- m } 2A+ } } } +- m } (2A)d&2)
=
(2A)d&1&1
(2A+1)(2A&1)
- m
>6d&3 - m. (65)
Combining Inequalities (64) and (65) with the fact that
there are actually m registers in the l-AKS tree (including
the cold storage), we find that
$$6^{d-3}\sqrt{m} < m < 6^d\sqrt{m},$$
and hence,
$$\log_6\sqrt{m} < d < \log_6\sqrt{m} + 3. \qquad (66)$$
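For readers checking the arithmetic, the closed forms in (64) and (65) are ordinary geometric sums (using $2A > 1$), and (66) follows from the two-sided bound on $m$ by dividing through by $\sqrt{m}$ and taking logarithms to base 6:
$$\begin{aligned}
\sum_{k=0}^{d-1}(2A)^k + \sum_{k\ge 1}(2A)^{-k} &= \frac{(2A)^d-1}{2A-1} + \frac{1}{2A-1} = \frac{(2A)^d}{2A-1},\\
\sum_{k=0}^{d-2}(2A)^k &= \frac{(2A)^{d-1}-1}{2A-1},\\
6^{d-3}\sqrt{m} < m < 6^d\sqrt{m} &\iff 6^{d-3} < \sqrt{m} < 6^d \iff d-3 < \log_6\sqrt{m} < d.
\end{aligned}$$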
Let S be the set of output registers in the top Wd2X levels
of the l-AKS tree. Consider the wd4x th level of the l-AKS
tree (the root is assumed to be at level 0). Label all of the
2wd4x nodes at this level from left to right with 1, ..., m ,
where
m =2wd4x=3(m1&;). (67)
(Here, we have used Inequality (66).) For im , let Ti be the
set of registers contained in the tree rooted at the node
labeled i. Let
Xi=Ti&S.
The sets $S$ and $X_i$, $1 \le i \le \bar m$, completely determine the partition $X = S \cup X_1 \cup \cdots \cup X_{\bar m}$, and we only need to show that
the partition has the claimed property.
By calculations similar to those used to derive Eq. (64),
and by Inequality (66), we have
|S|6Wd2X - m=O(m34). (68)
For im ,
|Xi ||Ti |&|S|

m&|S|
m
&|S|
0(m;) (by Eq. (67) and Inequality (68)).
On the other hand, by Eq. (67),
$$|X_i| \le \frac{m}{\bar m} = O(m^\beta).$$
Thus,
$$|X_i| = \Theta(m^\beta),$$
as claimed.
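The exponent bookkeeping behind (67), (68), and $|X_i| = \Theta(m^\beta)$ can be spelled out from (66), which gives $\lfloor d/4\rfloor = \tfrac14\log_6\sqrt{m} + O(1)$ and $\lceil d/2\rceil = \tfrac12\log_6\sqrt{m} + O(1)$. Here we read an unsubscripted log as $\log_2$, which is the reading under which the exponent matches $\beta = 1 - 1/(8\log 6)$:
$$\begin{aligned}
\bar m = 2^{\lfloor d/4\rfloor} &= \Theta\!\left(2^{\frac18\log_6 m}\right) = \Theta\!\left(m^{\frac{1}{8\log 6}}\right) = \Theta\!\left(m^{1-\beta}\right),\\
6^{\lceil d/2\rceil}\sqrt{m} &= \Theta\!\left(6^{\frac12\log_6\sqrt{m}}\right)\sqrt{m} = \Theta\!\left(m^{1/4}\right)\sqrt{m} = \Theta\!\left(m^{3/4}\right),\\
\frac{m}{\bar m} &= \Theta\!\left(m^{1-(1-\beta)}\right) = \Theta\!\left(m^{\beta}\right).
\end{aligned}$$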
It remains to prove that with probability at least $1 - \epsilon^{\Theta(l\log m)}$, on all input permutations, $x_i < x_j$ for any pair of items $x_i$ in $X_i$ and $x_j$ in $X_j$ such that $i < j$.
By Theorem 2.1, we only need to show that the above rela-
tionship holds, provided Inequality (13) is true. Assume for
the purpose of contradiction that for a given permutation $\Pi$ and some $i < j$, there exist items $x_i$ in $X_i$ and $x_j$ in $X_j$ such that $x_i > x_j$. Then, either $x_i$ is not in the natural interval of the root of $T_i$, or $x_j$ is not in the natural interval of the root of $T_j$. Without loss of generality, assume that item $x_i$ is not in the natural interval of the root of $T_i$. Then $x_i$ is at least $(d/4)$-strange, since $x_i$ is at least $\lceil d/2\rceil - \lfloor d/4\rfloor \ge d/4$ levels away from the root of $T_i$. Let $r_i$ be the register that contains $x_i$. By the definition of the potential function, together with Inequality (66),
$$P_\sigma(R_i) \ge \sigma^{d/4-1} \ge \sigma^{(\log_6\sqrt{m})/4-1}. \qquad (69)$$
On the other hand, by Inequality (13),
$$P_\sigma(R_i) \le P_\sigma(X_i) + \mathrm{cap}(X_i) + m. \qquad (70)$$
Inequalities (69) and (70) now yield a contradiction for a sufficiently large choice of the constant $\sigma$. ∎
3. PASSIVE-FAULT-TOLERANT CIRCUITS
In this section, we use the fault-tolerance of the l-AKS
circuit to construct circuits that are tolerant to passive
faults. The most important result of this section is the
construction of a passive-fault-tolerant sorting circuit
with O(log n log log n) depth. Since a circuit contains at
most n2 comparators at each level, our circuit has
O(n log n log log n) size. This provides the first nontrivial
upper bound for sorting circuits that tolerate random
passive faults, and answers the open question posed by Yao
and Yao [20] up to an O(log log n) factor.
In [20], Yao and Yao conjectured that any passive-fault-
tolerant sorting or merging circuit has $\omega(n\log n)$ size. As an
interesting application of our technique for sorting, we
prove a tight bound of $\Theta(n\log n)$ on the size of passive-
fault-tolerant selection circuits. (A selection circuit is defined
as a circuit that outputs the median to a specified output
register.) This result would imply a separation of the com-
plexities for merging and selection if Yao and Yao’s conjec-
ture is correct. To the best of our knowledge, no such
separation result is currently known for merging and selec-
tion.
Theorem 3.1. There exists an explicit construction of a
passive-fault-tolerant sorting circuit with O(log n log log n)
depth.
Roughly speaking, we construct the circuit in the follow-
ing fashion. We first use a partial 1-AKS circuit to move all
but O(n34) special items to within O(n;) of the correct posi-
tions, where ;<1 is the constant in Corollary 2.1. (In fact,
we will only use 1-AKS circuits throughout this section.)
Then, we use an approximate-insertion circuit (to be
described below) to move the special items close to their
correct positions. After these two steps, all items are within
O(n:) of the correct position, for some constant :<1, and
we can complete the sort recursively. Given the fault-
tolerance of the partial l-AKS circuit, our construction is
slightly simpler than that of [13]: we do not need a subcir-
cuit to isolate the extreme items. More importantly, by
using the fault-tolerance properties of the partial l-AKS circuit, we can prove that our circuit works on all input permutations, whereas the circuit in [13] works only for a fixed input permutation (or for most input permutations). (For
a detailed discussion of this aspect, see [13].) The basic
approach in this section will be used again later in the paper
to construct other fault-tolerant circuits, networks, and
algorithms for sorting-related problems.
We first prove two lemmas before proving Theorem 3.1.
An $(m+1)$-input circuit $C$ is defined to be a $\delta$-approximate-insertion circuit if $C$ outputs every item to within $\delta$ of the correct position provided that the input sequence consists of a sorted list of $m$ items plus another “unknown” item to be input to a given register.
Lemma 3.1. For any $\epsilon$ less than a sufficiently small constant, there exists an explicit construction of an $(m+1)$-input $(\epsilon, \epsilon^{\Theta(\log m)})$-passive-fault-tolerant $m^{6/7}$-approximate-insertion circuit with $O(\log m)$ depth.
Proof. As mentioned in the Introduction, all the circuits
constructed in this paper are standard. So we will prove the
lemma by explicitly constructing a standard circuit with the
claimed properties. We will first construct a circuit that
receives the unknown item at the top register. A circuit that
receives the unknown item at the bottom register can be
constructed in an entirely symmetric fashion. We then use
these two circuits to construct a circuit that solves the
general approximate-insertion problem where the unknown
item is input to an arbitrary register in a general position.
In what follows, we describe an $m^{6/7}$-approximate-insertion circuit $C$ that receives the unknown item at the top
register. By the definition of approximate-insertion, the
input list consists of a sorted list, L, of length m, and an
unknown item, x, which will be input to the top register of
$C$ according to our assumption. Let $t = \lceil\log m\rceil$. Rather than solving the given approximate-insertion problem “directly,” we first partition the $m$ items of the sorted list $L$ into $r = \lfloor m/s\rfloor$ contiguous groups of size $s = 2^{t-\lfloor t/6\rfloor}$, except for the last group, which has $m - (r-1)s$ items. Clearly, the last group contains at least $s$ and strictly less than $2s$ items.
To construct circuit C, it is useful to think of all of the items
in any one group as being indistinguishable. Conceptually,
we now solve an insertion problem with only r+1 inputs
(one for each group, plus the unknown item x), and we only
need to move x into the group to which it belongs. This is
because each group contains at most 2s items and the maxi-
mum distance between any pair of items in any single group
is at most 2s2 } 2W (56) tX<4 } 2(56)Wlog mX<m67 for m
sufficiently large.
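To make the grouping concrete, here is a small Python sketch (the names `group_parameters` and `distance_bound_holds` are ours, not the paper's) that computes $t$, $s$, $r$, and the size of the last group with exact integer arithmetic, taking log to be $\log_2$. It checks the facts used in the argument: the last group has between $s$ and $2s-1$ items, $r^5 - 1 < s$, and $2s < m^{6/7}$ (tested as $(2s)^7 < m^6$ to avoid floating point).

```python
def group_parameters(m):
    """Return (t, s, r, last) for the grouping of the sorted list L.

    t = ceil(log2 m), s = 2**(t - floor(t/6)), r = floor(m/s), and the
    last group holds m - (r - 1)*s items (between s and 2s - 1 of them).
    """
    t = (m - 1).bit_length()          # exact ceil(log2 m) for m >= 2
    s = 2 ** (t - t // 6)
    r = m // s
    if m < 2 or r < 1:
        raise ValueError("m too small for this grouping (need r >= 1)")
    last = m - (r - 1) * s
    return t, s, r, last


def distance_bound_holds(m, s):
    """Exact integer test of 2s < m**(6/7), i.e. (2s)**7 < m**6."""
    return (2 * s) ** 7 < m ** 6


if __name__ == "__main__":
    for m in (10**6, 2**120):
        t, s, r, last = group_parameters(m)
        print(f"t={t} r={r} s<=last<2s: {s <= last < 2 * s} "
              f"r^5-1<s: {r**5 - 1 < s} 2s<m^(6/7): {distance_bound_holds(m, s)}")
```

At $m = 10^6$ the bound $2s < m^{6/7}$ does not yet hold ($2s = 2^{18}$, while $m^{6/7} \approx 1.4\times 10^5$), which is consistent with the qualifier "for $m$ sufficiently large"; at $m = 2^{120}$ it does.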
We call a register clean at level i if it is not the top register
and if it has not been compared with any other register
strictly before level i; we call a register unclean at level i
otherwise. Intuitively, a register that is clean at level i has no
chance of containing x at level i and a register that is
unclean at level i has a nonzero chance of containing x. We
also assign an index to each of the $m+1$ registers as follows: (i) assign index 0 to the top register; (ii) for each $j$ such that $1 \le j \le r-1$, assign index $j$ to all registers whose inputs are taken from the $j$th group, which contains items with ranks between $(j-1)s+1$ and $js$ in list $L$; (iii) assign index $r$ to registers whose inputs are taken from the last group, which contains items with ranks between $(r-1)s+1$ and $m$ in list $L$. Circuit $C$ will have $5\log r$ depth and the following properties for each $i$ such that $1 \le i \le 5\log r$: (i) the MIN (resp., MAX) input of every comparator at level $i$ is unclean (resp., clean), and (ii) each register that is unclean at level $i$ is compared with a register with a larger index that is clean at level $i$. Since $C$ has $5\log r$ depth, the total number of clean registers required in the whole process is
$$\sum_{1 \le i \le 5\log r} 2^{i-1} = 2^{5\log r} - 1 = r^5 - 1 < \left(\frac{m}{s}\right)^5 \le s,$$
which is less than or equal to the number of items contained in any single group. Hence, we cannot possibly run out of clean registers in any group.
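The clean-register count relies on $r^5 - 1 < (m/s)^5 \le s$. The first inequality holds because $r = \lfloor m/s\rfloor \le m/s$; the second is a short consequence of the choice of $s$ and $t = \lceil\log m\rceil$ (log base 2):
$$s = 2^{t-\lfloor t/6\rfloor} \ge 2^{5t/6} \ge m^{5/6} \quad\Longrightarrow\quad \frac{m}{s} \le m^{1/6} \quad\Longrightarrow\quad \left(\frac{m}{s}\right)^5 \le m^{5/6} \le s.$$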
Given these restrictions, we can complete the description
of circuit C by specifying the unique comparator at the first
level and by specifying for each register y that is output from
level $i-1$, the index of the register against which $y$ will be
compared at level i. In particular, C is inductively constructed
as follows. The unique comparator at the first level con-
nects input x and a register with index 1, which is certainly
clean at level 1. The second level of C will have two com-
parators. The MIN output of the comparator at the first
level is fed into the MIN input of a comparator whose MAX
input takes a register with index 1 that is clean at level 2;
the MAX output of the comparator at the first level is fed
into the MIN input of a comparator whose MAX input
takes a register with index 1+2=3 that is clean at level 2.
In general, for each comparator at level $i-1$ that connects two inputs with indices $j$ and $j+h$, the MAX output (which certainly has index $j+h$) is fed into the MIN input of a comparator at level $i$ whose MAX input has index $\min\{j+h+2h, r\} = \min\{j+3h, r\}$; the MIN output (which certainly has index $j$) is fed into the MIN input of a comparator at level $i$ whose MAX input has index $j + \max\{\lfloor h/2\rfloor, 1\}$. This completes the description of circuit $C$.
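The wiring rule can be enumerated mechanically. The Python sketch below (the function name `comparator_levels` is hypothetical) lists, level by level, the index pair (MIN-input index, MAX-input index) of every comparator of $C$. It records only group indices, not which clean register of a group is used. One detail is our own assumption: once a MAX output already sits at index $r$, the MIN rule would name a partner beyond $r$, so the sketch clamps every partner index at $r$.

```python
def comparator_levels(r, depth):
    """Index pairs (MIN-input index, MAX-input index) of circuit C, per level.

    Level 1 is the single comparator (0, 1): x against a register of group 1.
    A comparator (j, j + h) at level i - 1 spawns two at level i:
      MAX output (index j + h) -> partner index min(j + 3h, r)
      MIN output (index j)     -> partner index j + max(floor(h/2), 1)
    Partner indices are clamped at r (our assumption for the edge case h = 0).
    """
    levels = [[(0, 1)]]
    for _ in range(depth - 1):
        nxt = []
        for j, k in levels[-1]:
            h = k - j
            nxt.append((k, min(j + 3 * h, r)))
            nxt.append((j, min(j + max(h // 2, 1), r)))
        levels.append(nxt)
    return levels


if __name__ == "__main__":
    for i, level in enumerate(comparator_levels(8, 4), start=1):
        print(i, sorted(level))
```

Level $i$ has exactly $2^{i-1}$ comparators, each consuming one clean register, which is where the sum $\sum_{i} 2^{i-1}$ in the clean-register count comes from.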
We next show that circuit $C$ has the claimed property. By the 0-1 principle, a circuit is a $\delta$-approximate-insertion circuit if it $\delta$-approximate-sorts every input sequence of the form $0^x1^y$ or $10^x1^y$. Clearly, a standard circuit always leaves input sequences of the form $0^x1^y$ sorted, since they are already sorted and, under passive faults, no comparator can unsort them. Hence, to show that $C$ has the claimed property, we only need to prove that with probability at least $1 - \epsilon^{\Theta(\log m)}$, a randomly generated faulty version of $C$ outputs an $m^{6/7}$-approximate-sorted list on all input sequences of the form $10^x1^y$. Moreover, since there are only $m-1$ sequences of the form $10^x1^y$, for $\epsilon$ sufficiently small, it suffices (by a union bound) to show that on any fixed input sequence $s$ of the form $10^x1^y$, with probability at least $1 - \epsilon^{\Theta(\log m)}$, a randomly generated faulty version of $C$ produces an $m^{6/7}$-approximate-sorted list.
In what follows, we will focus on a particular input
sequence s of the form 10x1 y and a particular random faulty
version, C*, of C. We need to show that with probability at
least $1 - \epsilon^{\Theta(\log m)}$, $C^*$ outputs an $m^{6/7}$-approximate-sorted
list on input sequence s. We make a special mark on the 1
input to the top register, and we assume without loss of
generality that the marked 1 will always be output to the
MIN register when being compared with another 1. In addi-
tion, at each level, we mark the unique comparator that
receives the marked 1 as one of its two input items. Clearly,
the movement of the marked 1 is completely determined by
the functionality of the marked comparators in C* (it has
nothing to do with the functionality of other comparators).
Let $w$ be the index of the group to which $x$ belongs. To prove that $C^*$ $m^{6/7}$-approximate-sorts $s$ with high probability, it suffices to show that the marked 1 will be successfully moved to a register with index $w$ with high probability.
When $\epsilon$ is less than a sufficiently small constant, by a standard Chernoff bound argument [3], with probability at least $1 - \epsilon^{\Theta(\log r)} = 1 - \epsilon^{\Theta(\log m)}$, at most $(\log r)/3$ of the $5\log r$ marked comparators are faulty. In what follows, we will show that if at most $(\log r)/3$ marked comparators are faulty, then the marked 1 will be successfully moved to a register with index $w$. Given $s$ and $C^*$, we observe how the marked 1 moves within $C^*$. For each $i = 1, 2, \ldots, 5\log r$, let $d_i$ be the nonnegative integer such that the marked 1 is contained in a register with index $w - d_i$ immediately before
level $i$; let $h_i$ be the integer such that the other input register to the unique marked comparator at level $i$ has index $w - d_i + h_i$. By our construction of $C$, $h_i \ge 1$ for $i = 1, 2, \ldots, 5\log r$. We next prove that the marked 1 will be moved to the correct group of registers at the end of $C^*$ by showing that $d_{5\log r} = 0$. For any nonnegative integer $k$, we define $b(k)$ to be the number of bits in the binary representation of $k$ (assume $b(0) = 0$). Define a potential function $F$ as
$$F(i) = 3b(d_i) + |b(d_i) - b(h_i)|.$$
Claim 3.1. The following inequalities hold when $d_i \ge 1$: (i) $F(i+1) \le F(i) + 1$, and (ii) $F(i+1) \le F(i) - 1$ if the marked comparator at level $i$ is correct.
By the inequalities $|x| - |y| \le |x - y|$ and $|x + y| \le |x| + |y|$,
$$F(i+1) - F(i) \le 3\bigl(b(d_{i+1}) - b(d_i)\bigr) + |b(d_{i+1}) - b(d_i)| + |b(h_{i+1}) - b(h_i)|. \qquad (71)$$
Hence, the first inequality in the claim follows from the fact that
$$b(d_{i+1}) \le b(d_i) \qquad (72)$$
and
$$b(h_{i+1}) \le b(h_i) + 1. \qquad (73)$$
Assuming that the marked comparator at level $i$ is correct and that $d_i \ge 1$, we next prove
$$F(i+1) \le F(i) - 1. \qquad (74)$$
When $b(d_{i+1}) < b(d_i)$, Inequalities (71) and (73) immediately imply Inequality (74). Hence, by Inequality (72), we only need to check Inequality (74) under the assumption that
$$b(d_{i+1}) = b(d_i). \qquad (75)$$
Given Eq. (75) and the assumption that the marked comparator at level $i$ is correct, we conclude that $b(h_i) \ne b(d_i)$.
There are two cases to consider.
Case 1: $b(h_i) > b(d_i)$. In this case, the marked 1 will be output to the MIN output of the marked comparator at level $i$. Hence, $h_{i+1} = \max\{\lfloor h_i/2\rfloor, 1\} = \lfloor h_i/2\rfloor$, since $b(h_i) > b(d_i) \ge 1$ implies $h_i \ge 2$. Hence,
$$b(h_{i+1}) = b(h_i) - 1.$$
This, together with the assumption of Case 1 and Eq. (75), implies that
$$F(i+1) - F(i) = \bigl(b(h_{i+1}) - b(d_{i+1})\bigr) - \bigl(b(h_i) - b(d_i)\bigr) = b(h_{i+1}) - b(h_i) = -1.$$
Case 2: $b(h_i) < b(d_i)$. In this case, the marked 1 will be output to the MAX output of the marked comparator at level $i$. Hence, either $h_{i+1} = 2h_i$ or the marked 1 will be compared against a register with the largest index, $r$, at level $i+1$. In the former case, we have
$$b(h_{i+1}) = b(h_i) + 1. \qquad (76)$$
In the latter case, $h_{i+1} \ge d_{i+1}$, which implies $b(h_{i+1}) \ge b(d_{i+1}) = b(d_i) > b(h_i)$ (where we have used Eq. (75) and the assumption of Case 2). Again, Eq. (76) holds (note that $b(h_{i+1}) \le b(h_i) + 1$ holds trivially). Using Eqs. (75) and (76) and the assumption of Case 2, we have
$$F(i+1) - F(i) = \bigl(b(d_i) - b(h_i) - 1\bigr) - \bigl(b(d_i) - b(h_i)\bigr) = -1.$$
This proves Inequality (74) and concludes the proof of Claim 3.1. ∎
By Claim 3.1, we know that before $d_i$ becomes 0, each correct marked comparator decreases the potential function by at least 1, and each faulty marked comparator increases the potential function by at most 1. Since we have assumed that at most $(\log r)/3$ of the $5\log r$ marked comparators are faulty, the potential decreases by at least $\frac{14}{3}\log r - \frac{1}{3}\log r = \frac{13}{3}\log r$ unless $d_i = 0$ for some $i \le 5\log r$. Since the initial potential is $F(0) \le 3\lfloor\log r\rfloor + |\lfloor\log r\rfloor - 1| \le 4\log r$, this means that $F(5\log r)$ is negative unless $d_i = 0$ for some $i \le 5\log r$. Since $F$ is nonnegative by definition, $d_i = 0$ for some $i \le 5\log r$, which means that at a certain level, the marked 1 is moved into a register with index $w$. Since we only have passive faults, once the marked 1 is moved into the register with index $w$, it will stay there until the end of $C^*$. This proves that $C$ has the claimed property.
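The movement of the marked 1 can be replayed on small cases. The Python sketch below follows only the marked 1, in index space, under a simplifying assumption of ours: the clean partner at index $b$ holds a 0 exactly when $b \le w$ (partners drawn from group $w$ are treated as 0s), so a correct marked comparator moves the marked 1 to $b$ iff $b \le w$, while a passive fault leaves it where it is. The helper `potential` computes $F$ using Python's `int.bit_length`, which matches the definition of $b(k)$ including $b(0) = 0$. All names are illustrative, and the sketch does not attempt to verify the general bound.

```python
import math


def potential(d, h):
    """F = 3*b(d) + |b(d) - b(h)|, where b(k) is the bit length of k (b(0) = 0)."""
    return 3 * d.bit_length() + abs(d.bit_length() - h.bit_length())


def route_marked_one(r, w, faulty=frozenset(), depth=None):
    """Level at which the marked 1 first reaches index w, or None.

    Simplified model (our assumption): the clean partner at index b holds a 0
    iff b <= w, so a correct marked comparator moves the marked 1 to b iff
    b <= w; a passive fault leaves both items in place.  `faulty` is the set
    of levels whose marked comparator is faulty.  Default depth: 5*ceil(log2 r).
    """
    if depth is None:
        depth = 5 * max(1, math.ceil(math.log2(r)))
    j, h = 0, 1                       # level 1: x (index 0) vs. index 1
    for level in range(1, depth + 1):
        b = j + h                     # index of the clean MAX input
        if level not in faulty and b <= w:
            j = b                     # marked 1 exits through MAX
            if j == w:
                return level
            h = min(2 * h, r - j)     # next partner: min(j_old + 3h, r)
        else:
            h = max(h // 2, 1)        # stays at MIN; next partner j + max(floor(h/2), 1)
    return None


if __name__ == "__main__":
    print([route_marked_one(8, w) for w in range(1, 9)])
```

For $r = 8$, the fault-free runs reach index $w$ within at most 6 levels for every $w$, well under $5\log r = 15$; a single fault at level 1 with $w = 3$ only delays arrival by one level.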
By using an entirely symmetric construction, we can construct an $m^{6/7}$-approximate-insertion circuit that receives
the unknown item at the bottom register. The same argu-
ment can be used to prove that the circuit has the desired
property.
To construct an approximate-insertion circuit that
receives the unknown item at an arbitrary position, we use
the following simple technique to increase the success
probability of each comparator, as in [20]. If we apply $l$ consecutive comparators, each of which fails to work with probability upper bounded by $\epsilon$, to two registers, then the probability that the items contained in the two registers are unsorted after these $l$ consecutive comparators is at most $\epsilon^l$.
The $l$ consecutive comparators can be viewed as a comparator whose failure probability is at most $\epsilon^l$. This technique will be used in many other passive-fault-tolerant circuit constructions and will be referred to as the replication technique. For example, by using the replication technique, the approximate-insertion circuit that receives the unknown item at the top or bottom register can be made into an $O(l\log m)$-depth $(\epsilon, \epsilon^{\Theta(l\log m)})$-passive-fault-tolerant $m^{6/7}$-approximate-insertion circuit that receives the unknown item at the top or bottom register.
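A minimal Python sketch of the replication technique under the passive fault model (function names are ours). Each of the $l$ consecutive copies acting on the same pair either works or does nothing, so the pair ends unsorted only if every copy fails, which happens with probability at most $\epsilon^l$. The second helper finds the smallest $l$ with $\epsilon^l$ at or below a target failure probability. The argument is specific to passive faults: a reversal fault could unsort a pair that an earlier copy had already sorted.

```python
def replicated_pair(a, b, faults):
    """Run len(faults) consecutive comparators on the pair (a, b), a on top.

    faults[k] is True when the k-th copy suffers a passive fault and does
    nothing; a working copy puts the smaller item on top.
    """
    for faulty in faults:
        if not faulty and a > b:
            a, b = b, a
    return a, b


def replicas_needed(eps, target):
    """Smallest l with eps**l <= target (0 < eps < 1, 0 < target < 1)."""
    l, p = 0, 1.0
    while p > target:
        p *= eps
        l += 1
    return l


if __name__ == "__main__":
    print(replicated_pair(5, 2, [True, False, True]))  # one working copy suffices
    print(replicas_needed(0.25, 0.001))
```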
In general, when the unknown item has to be input to a given register $r_x$ in a general position, we can construct the desired circuit by applying the circuit that receives the unknown item at the bottom register to all of the registers above and including $r_x$, followed by the circuit that receives the unknown item at the top register to all of the registers below and including $r_x$. In one of the two circuits, we may have to use the replication technique to achieve the desired success probability if the number of inputs to that circuit is not large enough. ∎
Lemma 3.2. For any $\epsilon$ less than a sufficiently small constant, there exists an explicit construction of an $m$-input $(\epsilon, \epsilon^{\Theta(\log m)})$-passive-fault-tolerant $m^\alpha$-approximate-sorting circuit with $O(\log m)$ depth, where $\alpha < 1$ is a fixed constant.
Proof. We construct the claimed circuit as follows. First, we apply a partial 1-AKS circuit to all $m$ inputs. Let $X = S \cup X_1 \cup \cdots \cup X_{\bar m}$ be the partition of the output registers of the partial 1-AKS circuit as specified in Corollary 2.1, where $\bar m = \Theta(m^{1-\beta})$, $|S| = O(m^{3/4})$, and $|X_1| = \cdots = |X_{\bar m}| = \Theta(m^\beta)$.
For $i \le \bar m$ and $j \le |X_1|$, let $x_{ij}$ be the $j$th register in $X_i$ and let $Y_j = \{x_{ij} \mid 1 \le i \le \bar m\}$. By Corollary 2.1, with probability at least $1 - \epsilon^{\Theta(\log m)}$, the item contained in $x_{kj}$ is smaller than that contained in $x_{lj}$ for any $k < l$, and $Y_j$ thus contains a sorted list for $j \le |X_1|$. For $j \le |S|$, let $s_j$ be the $j$th register in $S$. In parallel, we apply the $(\bar m+1)$-input $\bar m^{6/7}$-approximate-insertion circuit of Lemma 3.1 to $\{s_j\} \cup Y_j$ for each $1 \le j \le |S|$. By Lemma 3.1, with probability at least $1 - |S|\,\epsilon^{\Theta(\log\bar m)} = 1 - \epsilon^{\Theta(\log m)}$, all of the items in $\{s_j\} \cup Y_j$ are at most $\bar m^{6/7}$ away from their correct positions within $\{s_j\} \cup Y_j$ for all $j \le |S|$. Since distance one within $Y_j$ corresponds to distance at most $|X_1|$ within $\bigcup_{1 \le j \le |X_1|} Y_j = \bigcup_{1 \le i \le \bar m} X_i$, distance one within $\{s_j\} \cup Y_j$ corresponds to distance at most $|X_1| + |S| = \Theta(m^\beta)$ in the whole circuit. Thus, distance $\bar m^{6/7}$ within $\{s_j\} \cup Y_j$ corresponds to distance of at most $\bar m^{6/7}\,\Theta(m^\beta) = \Theta(m^{(1-\beta)(6/7)+\beta}) \le m^\alpha$ (where $\alpha$ can be any constant strictly less than 1 and strictly greater than $(1-\beta)(6/7)+\beta$) in the whole circuit. ∎
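The admissible range for $\alpha$ can be made explicit. Substituting $1 - \beta = 1/(8\log 6)$ from Corollary 2.1 (reading log as $\log_2$):
$$(1-\beta)\cdot\frac{6}{7} + \beta = 1 - \frac{1-\beta}{7} = 1 - \frac{1}{56\log 6} \approx 0.9931,$$
so any fixed $\alpha$ with $1 - 1/(56\log 6) < \alpha < 1$ meets the requirement.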
Proof of Theorem 3.1. To construct our passive-fault-
tolerant sorting circuit with O(log n log log n) depth, we
will repeatedly apply the approximate-sorting circuit of
Lemma 3.2. In order to apply Lemma 3.2, we need to ensure that the failure probability for each comparator is upper bounded by a sufficiently small constant. This can be achieved by using the replication technique described in the proof of Lemma 3.1. In particular, for any $\epsilon$ and $\eta$, we can construct a circuit that simulates a comparator with failure probability at most $\eta$ by simply replicating a comparator with failure probability at most $\epsilon$ $\lceil\log_\epsilon\eta\rceil$ times. The circuit thus constructed will be referred to as an $(\epsilon, \eta)$-enforced comparator hereafter. In order to apply Lemma 3.2, we need only set $\eta$ to a sufficiently small constant, which in turn leads to only constant factor replication. Then, we can build the whole circuit from these $(\epsilon, \eta)$-enforced comparators. By applying this replication technique, the size and the depth of the whole circuit are both increased by only a constant factor. Ignoring this constant factor, we will assume that all of the comparators in our construction have a sufficiently small $\epsilon$ and that Lemma 3.2 can be applied.
Let $\alpha$ be the constant in Lemma 3.2. Our circuit consists of $O(\log\log n)$ rounds. The $i$th round circuit has $O(\log n)$ depth and outputs a $\delta_i$-approximate-sorted list with probability at least $1 - 1/n^2$, provided that the list produced by the $(i-1)$th round circuit is $\delta_{i-1}$-approximate-sorted, where $\delta_i$ is a parameter determined by the recurrence
$$\delta_i = (4\delta_{i-1})^\alpha \quad \text{for } i \ge 2, \qquad (77)$$
with the boundary condition
$$\delta_1 = n^\alpha. \qquad (78)$$
The reason that we need to upper bound the failure probability of each round by $1/n^2$ instead of $1/n$ is that we will have $O(\log\log n)$ rounds and we will upper bound the overall failure probability by $O((1/n^2)\log\log n) \le 1/4$ for $n$ sufficiently large.
Recurrence (77) can be rewritten as
$$a\delta_i = (a\delta_{i-1})^\alpha,$$
where $a = (1/4)^{\alpha/(1-\alpha)}$. Solving this recurrence, we find that
$$\delta_i = \Theta\bigl(n^{\alpha^i}\bigr). \qquad (79)$$
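Recurrence (77) and its solution can be checked numerically. The Python sketch below (names ours) uses $\alpha = 0.5$ purely for readability; this is not the $\alpha$ of Lemma 3.2, which must exceed $(1-\beta)(6/7)+\beta$ and is therefore close to 1. It compares the iterated $\delta_i$ with the closed form obtained from $a\delta_i = (a\delta_{i-1})^\alpha$, and shows the iterates settling just above the fixed point $4^{\alpha/(1-\alpha)}$ of (77), a constant.

```python
import math


def deltas(n, alpha, rounds):
    """delta_1 = n**alpha and delta_i = (4 * delta_{i-1})**alpha, as in (77)-(78)."""
    d = [n ** alpha]
    for _ in range(rounds - 1):
        d.append((4 * d[-1]) ** alpha)
    return d


def closed_form(n, alpha, i):
    """delta_i from a*delta_i = (a*delta_1)**(alpha**(i-1)), a = (1/4)**(alpha/(1-alpha))."""
    a = 0.25 ** (alpha / (1 - alpha))
    return (a * n ** alpha) ** (alpha ** (i - 1)) / a


if __name__ == "__main__":
    n, alpha = 2.0 ** 64, 0.5
    for i, d in enumerate(deltas(n, alpha, 8), start=1):
        print(i, d, closed_form(n, alpha, i))
    print("fixed point:", 4 ** (alpha / (1 - alpha)))
```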
In the first round of our circuit, we apply the $n^\alpha$-approximate-sorting circuit of Lemma 3.2 to all $n$ inputs. In particular, we use the replication technique to make the failure probability of each comparator sufficiently small so that the term $\epsilon^{\Theta(\log n)}$ in Lemma 3.2 is smaller than $1/n^2$. Lemma 3.2 implies that, with probability at least $1 - 1/n^2$, the outputs of the first round form a $\delta_1$-approximate-sorted list. The depth for the first round is $O(\log n)$.
For i2, we construct the i th round of the circuit so that
with probability at least 1&1n2, the outputs of the i th
round form a 2i-approximate-sorted list, provided that the
outputs from the (i&1)th round form a 2i&1-approximate-
sorted list. At the beginning of the i th round, we group all
the output registers from the $(i-1)$th round as follows: For $1 \le k \le n/(2\delta_{i-1})$, let $X_k$ be the set of registers in positions $(k-1)2\delta_{i-1}+1$ to $k\cdot 2\delta_{i-1}$. In parallel, we apply an approximate-sorting circuit of Lemma 3.2 to $X_1 \cup X_2$, another approximate-sorting circuit of Lemma 3.2 to $X_3 \cup X_4$, and so on. Then, in parallel, we apply an approximate-sorting circuit of Lemma 3.2 to $X_2 \cup X_3$, another approximate-sorting circuit of Lemma 3.2 to $X_4 \cup X_5$, and so on. We need the second set of approximate-
sorting circuits to bring items across the boundaries of the
first set of approximate-sorting circuits.
To analyze the behavior of the $i$th round circuit, we will make use of the 0-1 principle. For any 0-1 sequence $s = (s_1, s_2, \ldots, s_m)$, we define the dirty window of $s$ to be a subsequence of $s$ of the form $(s_i, s_{i+1}, \ldots, s_j)$ such that: (i) $s_i = 1$, (ii) $s_j = 0$, (iii) $s_k = 0$ for all $k < i$, and (iv) $s_k = 1$ for all $k > j$. In other words, the dirty window of $s$ is the shortest subsequence of $s$ such that every item strictly before the subsequence is 0 and every item strictly after the subsequence is 1. Intuitively, the dirty window of a 0-1 sequence $s$ is the part of $s$ that we need to work on in order to sort $s$.
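For concreteness, here is a direct Python transcription of the definition (the function name is ours; indices are 0-based, whereas the text's are 1-based). A sorted 0-1 sequence has no dirty window, which the sketch reports as `None`.

```python
def dirty_window(s):
    """0-based inclusive bounds (i, j) of the dirty window of the 0-1 list s.

    Returns None when s is already sorted, i.e. has no dirty window.
    """
    if 1 not in s or 0 not in s:
        return None
    i = s.index(1)                       # first 1: everything before it is 0
    j = len(s) - 1 - s[::-1].index(0)    # last 0: everything after it is 1
    return (i, j) if i < j else None


if __name__ == "__main__":
    print(dirty_window([0, 0, 1, 0, 1, 0, 1, 1]))
```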
We next use the 01 principle to prove that if all of the
constituent approximate-sorting circuits in the i th round
work correctly as (4 2i&1):-approximate-sorting circuits,
and if the list produced by the (i&1)th round is 2i&1-
approximate-sorted, then the i th round circuit produces a
2i -approximate-sorted list. Let us restrict our attention to
01 inputs and assume that the list produced by the
(i&1)th round is 2i&1-approximate-sorted. Then, the list
input to the i th round contains a dirty window of size
2 2i&1 or less. This dirty window is fully contained in one
of the approximate-sorting circuits involved in the i th
round. (Recall that we have two sets of (4 2i&1)-input
approximate-sorting circuits in the i th round, where the
second set is offset from the first by 2 2i&1. Also, note that
passive faults cannot increase the size of the dirty window.)
Hence, after the two sets of approximate-sorting circuits in
the i th round, the output list is (4 2i&1):-approximate-
sorted; i.e., it is 2i-approximate-sorted.
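The covering step can be sanity-checked with a small Python sketch (names, 0-based positions, and the simplifying assumption that $n$ is a multiple of $4\delta_{i-1}$ are ours). It lists the position ranges of the two offset sets of $(4\delta_{i-1})$-input circuits and confirms that every run of $2\delta_{i-1}$ consecutive positions lies inside one of them.

```python
def round_blocks(n, delta):
    """Half-open position ranges of the two sets of (4*delta)-input circuits.

    X_k = positions [2*delta*(k-1), 2*delta*k) (0-based).  The first set covers
    X1 u X2, X3 u X4, ...; the second covers X2 u X3, X4 u X5, ....
    Assumes n is a multiple of 4*delta.
    """
    size = 4 * delta
    first = [(a, a + size) for a in range(0, n, size)]
    second = [(a, a + size) for a in range(2 * delta, n - 2 * delta, size)]
    return first, second


def every_window_covered(n, delta):
    """True if each run of 2*delta consecutive positions fits in one block."""
    first, second = round_blocks(n, delta)
    blocks = first + second
    w = 2 * delta
    return all(any(lo <= a and a + w <= hi for lo, hi in blocks)
               for a in range(n - w + 1))


if __name__ == "__main__":
    print(round_blocks(64, 4), every_window_covered(64, 4))
```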
In order to make sure that with sufficiently high probability, all of the constituent approximate-sorting circuits in the $i$th round work correctly, we need to do some careful calculations. In the above construction, if we simply apply the approximate-sorting circuits of Lemma 3.2 as in the first round, then the failure probability for each of the approximate-sorting circuits is $1/(4\delta_{i-1})^2 = 1/O(n^{2\alpha^{i-1}})$, which is too large in comparison with the goal of $1/n^2$.
To overcome this difficulty, we need to make use of the replication technique again. In particular, we construct the approximate-sorting circuits from $(\epsilon, \eta)$-enforced comparators, where $\eta$ satisfies
$$\eta^{\Theta(\log(4\delta_{i-1}))} \le 1/n^3 \qquad (80)$$
and where the constant behind the $\Theta$-notation is the same as that in Lemma 3.2. By doing so, each of the approximate-sorting circuits fails with probability at most $1/n^3$. Hence, since the $i$th round contains at most $n$ such circuits, a union bound shows that the probability that all of the constituent approximate-sorting circuits in the $i$th round are good is at least $1 - 1/n^2$.
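To construct the $(\epsilon, \eta)$-enforced comparators, we replicate each of the original comparators
$$\log_\epsilon\eta = O\!\left(\frac{3\log_{1/\epsilon} n}{\log(4\delta_{i-1})}\right) = O\!\left(\frac{\log_{1/\epsilon} n}{\log n^{\alpha^{i-1}}}\right)$$
times (where the second equality follows from Eq. (79)). This replication results in an
$$O\!\left(\frac{\log_{1/\epsilon} n}{\log n^{\alpha^{i-1}}}\right)$$
blowup in the original $O(\log(4\delta_{i-1}))$-depth construction. Hence, the total depth of the $i$th round is
$$O(\log(4\delta_{i-1}))\cdot O\!\left(\frac{\log_{1/\epsilon} n}{\log n^{\alpha^{i-1}}}\right) = O\!\left(\log n^{\alpha^{i-1}}\cdot\frac{\log_{1/\epsilon} n}{\log n^{\alpha^{i-1}}}\right) = O(\log_{1/\epsilon} n).$$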
We repeatedly apply the above construction until every
item is moved to within a constant number of positions of
the correct position, i.e., we use $i$ rounds such that $\delta_i = O(1)$, which is equivalent to $n^{\alpha^i} = O(1)$ or $i = O(\log\log n)$ by Eq. (79). Note that we should not apply the above construction all the way to the end to sort every item precisely to the
correct position, since we have only established the fault-
tolerance property of the l-AKS circuit for large m. When
every item is within a constant number of positions of the
correct position, we can combine the replication technique
with any sorting circuit with a constant number of inputs
(hence also constant depth and size) to achieve exact
sorting. This costs O(log n) additional depth. Overall, we
have O(log log n) rounds each of which has O(log n) depth
and works with probability at least 1&1n2. Therefore, the
total depth is O(log n log log n) and the failure probability
is O(log log n) 1n21n for n sufficiently large. K
As we have pointed out at the beginning of the paper, the construction can be made $(\epsilon, 1/\mathrm{poly}(n))$-passive-fault-tolerant with only a constant factor increase in depth and size. In particular, the proof of Theorem 3.1 can easily be extended to establish the following corollary.
Corollary 3.1. For any constants $\epsilon < 1$ and $c$, there exists an explicit construction of an $(\epsilon, 1/n^c)$-passive-fault-tolerant sorting circuit with $O(\log n\log\log n)$ depth.
Theorem 3.2. There exists an explicit construction of
a passive-fault-tolerant selection circuit with asymptotically
optimal size of $\Theta(n\log n)$.
Proof. An $\Omega(n\log n)$ lower bound on the size of selection circuits was proved by Alekseyev [2] even in the fault-free case (see also Theorem A on pages 234-235 of [9]). In what follows, we give a passive-fault-tolerant construction with $O(n\log n)$ size.
Let $\alpha$ be the constant of Lemma 3.2, and let $\epsilon < 1$ be a constant upper bound on the failure probability of each comparator. Take an $(\epsilon, 1/n^2)$-passive-fault-tolerant $n^\alpha$-approximate-sorting circuit $C_1$ as in Lemma 3.2. Take another $(2n^\alpha+2)$-input $(\epsilon, 1/n^2)$-passive-fault-tolerant sorting circuit $C_2$ as in Corollary 3.1; e.g., we can choose $c = 3/\alpha$ in Corollary 3.1. Our passive-fault-tolerant selection circuit $C$ consists of $C_1$ followed by $C_2$ with the middle $2n^\alpha+2$ outputs of $C_1$ being the inputs of $C_2$. Clearly, the size of $C$ is $O(n\log n) + O(n^\alpha\log n\log\log n) = O(n\log n)$.
We next show that the circuit constructed is a (\, 1n2)-
passive-fault-tolerant circuit that outputs the median to the
middle output register of C2 . By the choice of C1 and C2 , the
probability that C1 is an n:-approximate-sorting circuit and
that C2 is a sorting circuit is at least 1&2n21&1n.
Hence, to show that C is a (\, 1n)-passive-fault-tolerant
selection circuit, we need only prove that when C1 is an
n:-approximate-sorting circuit and C2 is a sorting circuit;
C always outputs the median input to its middle output
register. When C1 is an n:-approximate-sorting circuit, by
the definition of approximate-sorting, the top n2&n:&1
outputs of C1 all contain items smaller than the median
input to C, and the bottom n2&n:&1 outputs of C1 all
contain items larger than the median input to C. Hence, the
median input to C will be the median among all of the
inputs to C2 . When C2 is indeed a sorting circuit, the median
input to C2 , which is identical to the median input to C, will
be output to the middle output register of C2 , which is the
same as the middle output register of C. K
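The correctness argument can be replayed on a fault-free toy instance. The Python sketch below (names ours) assumes the input is already $D$-approximately sorted, which is what $C_1$ guarantees with high probability, and stands in for $C_2$ with an exact sort of the middle window. For an odd number of distinct items it keeps the middle $2D+1$ items, slightly fewer than the $2n^\alpha+2$ used in the proof; this is a simplification for the odd case, not the paper's parameter choice.

```python
def max_displacement(items):
    """Largest distance between an item's position and its sorted position (distinct items)."""
    rank = {v: k for k, v in enumerate(sorted(items))}
    return max(abs(i - rank[v]) for i, v in enumerate(items))


def select_median(approx_sorted, D):
    """Median of an odd-length list of distinct items, each within D of its sorted position.

    Stand-in for C1 followed by C2: only the middle window is sorted exactly.
    """
    n = len(approx_sorted)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd number of items")
    c = n // 2                                  # 0-based sorted position of the median
    lo, hi = max(0, c - D), min(n, c + D + 1)   # items left of lo are smaller, from hi on larger
    window = sorted(approx_sorted[lo:hi])       # plays the role of the exact sorter C2
    return window[c - lo]


if __name__ == "__main__":
    items = []
    for start in range(0, 101, 7):              # reverse blocks of 7: displacement <= 6
        items.extend(reversed(range(start, min(start + 7, 101))))
    print(max_displacement(items), select_median(items, 6))
```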
4. REVERSAL-FAULT-TOLERANT CIRCUITS
AND NETWORKS
In this section, we present our results for reversal faults.
We consider both the circuit model and the network model.
In Section 4.1, we use the fault-tolerance of the l-AKS circuit to construct a reversal-fault-tolerant $O(\log n)$-approximate-sorting circuit (for $\epsilon$ less than a sufficiently small constant) with $O(n\log n(\log\log n)^2)$ size and $O(\log^2 n)$ depth. In addition, we present some general lower bounds for reversal-fault-tolerant approximate-sorting circuits. In Section 4.2, we use the approximate-sorting circuit of Section 4.1 to construct a reversal-fault-tolerant sorting network (for $\epsilon$ less than a sufficiently small constant) with $O(n\log^{\log_2 3} n)$ size. This provides the first $o(n\log^2 n)$-size reversal-fault-tolerant sorting network, thereby answering the open question posed by Assaf and Upfal [4]. In light of a lower bound established in [12, Theorem 2], this result separates the size complexities of sorting networks with reversal and destructive faults. In all of the upper bound results of this section (except Lemma 4.4), we will need the assumption that $\epsilon$ is less than a sufficiently small constant. Whether or not such an assumption is necessary is an open question. Note that no assumption will be made on $\epsilon$ in the lower bound results.
4.1. Approximate-Sorting Circuits
In this section, we study approximate-sorting circuits that
are tolerant to reversal faults. As defined in Section 2, a $\delta$-approximate-sorting circuit is a circuit that outputs every item to within $\delta$ of its correct position on all input permutations.
In Theorem 4.1 below, we show that any reversal-fault-tolerant $\delta$-approximate-sorting circuit has $\delta = \Omega(\log n)$. This lower bound leads us to focus our attention on the study of reversal-fault-tolerant $O(\log n)$-approximate-sorting circuits in the remainder of Section 4.1. We continue in Theorem 4.2 with an $\Omega(\log^2 n)$ lower bound on the depth of any reversal-fault-tolerant $O(\log n)$-approximate-sorting circuit. The main result of Section 4.1 is Theorem 4.3, where we construct a reversal-fault-tolerant $O(\log n)$-approximate-sorting circuit (for $\epsilon$ less than a sufficiently small constant) with $O(n\log n(\log\log n)^2)$ size and asymptotically optimal depth of $O(\log^2 n)$. The size of this circuit is of particular importance when we further modify the circuit into a network with $o(n\log^2 n)$ size in Section 4.2.
Theorem 4.1. For any positive constant $\gamma < 1$, there exists a positive constant $c$ depending on $\gamma$ such that for any $n$-input circuit, where $n$ is sufficiently large (depending on $\epsilon$ and $\gamma$), with probability at least $1 - \exp(-n^\gamma\log(1/\epsilon))$ there is an input permutation for which the circuit outputs some item at least $c\log_{1/\epsilon} n$ away from the correct position.
Proof. See Appendix A. K
When $p$ is a constant, the preceding theorem has the following immediate corollary, where the constant $c$ depends on $p$.
Corollary 4.1. There is some constant c for which no
reversal-fault-tolerant (c log n)-approximate-sorting circuit
exists.
Theorem 4.2. For any positive constants $c$, $p$, and $\delta < 1$, any $(p, n^{-c})$-reversal-fault-tolerant $n^{\delta}$-approximate-sorting circuit has $\Omega(\log^2 n)$ depth.
Proof. See Appendix B. ∎
We remark that, as indicated in the proof, the $\Omega(\log^2 n)$ depth lower bound of Theorem 4.2 actually holds for the much simpler problem of $n^{\delta}$-approximate-insertion.
Theorem 4.3. For any $p$ less than a sufficiently small constant, there exists an explicit construction of a reversal-fault-tolerant $O(\log n)$-approximate-sorting circuit with $\Theta(\log^2 n)$ depth and $O(n \log n (\log\log n)^2)$ size.
The remainder of Section 4.1 is devoted to the proof of Theorem 4.3, which relies on the following three lemmas.
Lemma 4.1. For any $p$ less than a sufficiently small constant and for $l \ge \log(m/l)$, there exists an explicit construction of an $m$-input $(p, p^{\Theta(l)})$-reversal-fault-tolerant $l$-approximate-insertion circuit with $O(l \log(m/l))$ depth and $O(ml)$ size.
Proof. Without loss of generality, we assume that both $m$ and $l$ are integral powers of 2. We label the registers $r_1, \ldots, r_m$ from top to bottom, and we assume that the unique unknown item is input to a particular register $r_x$.
For $1 \le i \le \log(m/l)$, let $R_i = \{r_j \mid j \equiv x \pmod{m/(2^i l)}\}$. Our circuit consists of $\log(m/l)$ rounds. In the $i$th round, we partition the registers of $R_i$, from top to bottom, into groups $X_{i,1}, X_{i,2}, \ldots, X_{i,2^i}$, each containing $l$ registers. First, in parallel, we apply a set of $2l$-input, $4l$-depth odd-even transposition circuits to $X_{i,1} \cup X_{i,2}$, $X_{i,3} \cup X_{i,4}$, and so on. Second, in parallel, we apply another set of $2l$-input, $4l$-depth odd-even transposition circuits to $X_{i,2} \cup X_{i,3}$, $X_{i,4} \cup X_{i,5}$, and so on. Since each $R_i$ contains $2^i$ groups of the form $X_{i,j}$, the total number of odd-even transposition circuits in the $i$th round is at most $2 \cdot 2^{i-1} = 2^i$. Hence, the total number of odd-even transposition circuits used in the entire circuit is at most
$$\sum_{1 \le i \le \log(m/l)} 2^i \le 2^{1+\log(m/l)} = \frac{2m}{l}. \qquad (81)$$
Since each of the oddeven transposition circuits has size
O(l2), the size of the entire circuit is O((ml ) l2)=O(ml ).
Also, since we have log(ml ) rounds, each of which consists
of two sets of O(l )-depth oddeven transposition circuits,
the depth of the entire circuit is O(l log(ml )).
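The register bookkeeping above can be sketched in Python as follows. The helper `insertion_rounds` is hypothetical (not from the paper); it uses 1-based register indices as in the text, builds $R_i$ and the groups $X_{i,j}$, and pairs the groups into the two sets of $2l$-input odd-even transposition circuits of each round, so that the number of circuits can be compared with the bound $2m/l$ of Inequality (81).

```python
def insertion_rounds(m, l, x):
    """Per-round register groups of the Lemma 4.1 construction (hypothetical helper).

    Registers are numbered 1..m and x is the register holding the unknown item.
    Returns one (first_set, second_set) pair per round; each entry of a set is
    the 2l registers of X_{i,j} ∪ X_{i,j+1} that feed one transposition circuit.
    """
    assert m >= 2 * l and m & (m - 1) == 0 and l & (l - 1) == 0, "powers of 2"
    rounds = []
    i = 1
    while 2 ** i * l <= m:                                   # i = 1, ..., log(m/l)
        mod = m // (2 ** i * l)
        r_i = [j for j in range(1, m + 1) if (j - x) % mod == 0]   # j ≡ x (mod m/(2^i l))
        groups = [r_i[k:k + l] for k in range(0, len(r_i), l)]     # 2^i groups of l
        first = [groups[k] + groups[k + 1] for k in range(0, len(groups) - 1, 2)]
        second = [groups[k] + groups[k + 1] for k in range(1, len(groups) - 1, 2)]
        rounds.append((first, second))
        i += 1
    return rounds


rounds = insertion_rounds(64, 4, 5)
print(len(rounds), sum(len(f) + len(s) for f, s in rounds))   # 4 26
```

With $m = 64$, $l = 4$, and $x = 5$ there are $\log(m/l) = 4$ rounds using $1 + 3 + 7 + 15 = 26 \le 2m/l = 32$ circuits, and the last round covers all $m$ registers.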
By Lemma 2.1 with its parameter set to 1/32, each of the odd-even transposition circuits used in the entire circuit is an $(l/32)$-approximate-sorting circuit with probability at least $1 - p^{\Theta(l)}$. Hence, by Inequality (81), the probability that all of the constituent odd-even transposition circuits are $(l/32)$-approximate-sorting circuits is at least
$$1 - O\!\left(\frac{m}{l}\right) p^{\Theta(l)} = 1 - p^{\Theta(l)},$$
where we have used the fact that $p$ is sufficiently small and $l \ge \log(m/l)$.
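The following Python sketch simulates one odd-even transposition circuit of the kind used above under the random reversal-fault model: every comparator independently, with probability $p$, sends the larger item to the upper register. The function name and the use of Python's `random` module are our own choices, and Lemma 2.1 itself is not reproduced, so the sketch only allows the behavior to be observed empirically rather than verifying the $p^{\Theta(l)}$ bound.

```python
import random


def odd_even_transposition(items, depth, p=0.0, rng=None):
    """Odd-even transposition circuit with independent reversal faults (a sketch).

    Level t compares registers (j, j+1) for every j with j % 2 == t % 2.
    A correct comparator puts the smaller item in the upper register; with
    probability p it suffers a reversal fault and puts the larger item there.
    """
    rng = rng or random.Random()
    a = list(items)
    for t in range(depth):
        for j in range(t % 2, len(a) - 1, 2):
            lo, hi = min(a[j], a[j + 1]), max(a[j], a[j + 1])
            if rng.random() < p:          # reversal fault on this comparator
                lo, hi = hi, lo
            a[j], a[j + 1] = lo, hi
    return a


l = 8
data = random.Random(1).sample(range(2 * l), 2 * l)
print(odd_even_transposition(data, depth=4 * l, p=0.02, rng=random.Random(2)))
```

With $p = 0$ and depth at least the number of inputs, the circuit sorts exactly; with $p = 1$ every comparator is reversed and the output comes out in descending order. Pairing the faulty runs with a displacement check gives an empirical feel for the $(l/32)$-approximate guarantee at small $p$.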
Thus, to prove the lemma, it remains to show that the circuit thus constructed is an $l$-approximate-insertion circuit when all of the constituent odd-even transposition circuits are $(l/32)$-approximate-sorting circuits. Under this assumption, we prove by induction on $i$ that the items of $R_i$ form an $(l/8)$-approximate-sorted list after the $i$th round. (This suffices to prove that our circuit is an $l$-approximate-insertion circuit, since $R_{\log(m/l)}$ contains all $m$ registers.)
The base case $i = 1$ is trivial, since the first round consists of a single $(l/32)$-approximate-sorting circuit. Assuming that the items from the $(i-1)$th round form an $(l/8)$-approximate-sorted list, we now show that the items from the $i$th round form an $(l/8)$-approximate-sorted list. By the 0-1 principle, we need only consider the case where the inputs consist of 0s and 1s. Recall that, as defined in the proof of Theorem 3.1, the dirty window of a 0-1 sequence $s$ is the shortest subsequence of $s$ such that every item strictly before the subsequence is 0 and every item strictly after the subsequence is 1. Since the sequence from the $(i-1)$th round is assumed to be $(l/8)$-approximate-sorted, the input sequence to the $i$th round is an $(l/4)$-approximate-sorted list and has a dirty window of size at most $l/2$. There are two cases to consider.
Case 1: The dirty window of the sequence input to the $i$th round is fully contained in a circuit that belongs to the first set of $(l/32)$-approximate-sorting circuits in the $i$th round. In this case, the first set of $(l/32)$-approximate-sorting circuits decreases the size of the dirty window to at most $2l/32$. The second set of $(l/32)$-approximate-sorting circuits may then increase the size of the dirty window by at most an additive term of $2l/32$. (Note that this differs from the case of passive faults, which cannot increase the size of a dirty window.) Hence, the final output sequence from the $i$th round has a dirty window of size at most $4l/32 = l/8$, and is $(l/8)$-approximate-sorted.
Case 2: The dirty window of the sequence input to the $i$th round is not fully contained in any of the circuits in the first set of $(l/32)$-approximate-sorting circuits in the $i$th round. In this case, the first set of approximate-sorting circuits in the $i$th round may increase the size of the dirty window by an additive term of $2l/32$. Hence, the sequence input to the second set of $(l/32)$-approximate-sorting circuits has a dirty window of size at most $l/2 + 2l/32 = 9l/16$. By the assumption of Case 2 and the fact that the boundaries of the first set of $(l/32)$-approximate-sorting circuits are the centers of the second set of $2l$-input $(l/32)$-approximate-sorting circuits, this dirty window of size at most $9l/16$ is fully contained in a circuit that belongs to the
second set of $(l/32)$-approximate-sorting circuits. Hence, the final output sequence produced by the second set of $(l/32)$-approximate-sorting circuits in the $i$th round has a dirty window of size at most $2l/32$ and is, thus, $(l/8)$-approximate-sorted.
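The dirty window that drives this case analysis can be computed directly. In the Python sketch below (our own helper, assuming the input is a list of 0s and 1s), the window runs from the first 1 to the last 0, so every item before it is 0 and every item after it is 1, and a sorted sequence has a window of size 0. In these terms, Case 1 bounds the window after round $i$ by $2l/32 + 2l/32 = l/8$, and Case 2 needs the intermediate window of size at most $l/2 + 2l/32 = 9l/16 < l$ to fit inside one $2l$-input circuit centered on a first-set boundary.

```python
def dirty_window(bits):
    """Size of the dirty window of a 0-1 sequence.

    The shortest window must start no later than the first 1 and end no
    earlier than the last 0; if no 0 follows a 1, the sequence is sorted.
    """
    first_one = next((i for i, b in enumerate(bits) if b == 1), len(bits))
    last_zero = max((i for i, b in enumerate(bits) if b == 0), default=-1)
    return max(0, last_zero - first_one + 1)


print(dirty_window([0, 0, 1, 0, 1, 1]))  # 2
print(dirty_window([0, 0, 1, 1]))        # 0
```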
Lemma 4.2. For $l \le m^{(1-\beta)/2}$ (where $\beta$ is the constant specified in Corollary 2.1) and for any $p$ less than a sufficiently small constant, there is a constant $\alpha < 1$ such that there exists an explicit construction of an $m$-input $(p, p^{\Theta(l\log m)})$-reversal-fault-tolerant $m^{\alpha}$-approximate-sorting circuit with $O(l\log^2 m)$ depth and $O(ml\log m)$ size.
Proof. We use a technique similar to that in the proof of Lemma 3.2. First, we apply the partial $l$-AKS circuit of Theorem 2.1. Then, we apply a set of $(p, p^{\Theta(l\log m)})$-reversal-fault-tolerant $(l\log m)$-approximate-insertion circuits of Lemma 4.1 in a fashion similar to that in the proof of Lemma 3.2. The assumption $l \le m^{(1-\beta)/2}$ implies $l \le m^{1/8}$, which makes it possible to apply Corollary 2.1 in the first step. In the second step, we can argue that every item is output to within $O(l\log m)\,\Theta(m^{\beta})$ of its correct position. The assumption $l \le m^{(1-\beta)/2}$ guarantees that every output item is within $O(l\log m)\,\Theta(m^{\beta}) < m^{\alpha}$ of its correct position for any $\alpha > (1+\beta)/2$. In particular, the constant $\alpha$ in this lemma is different from that of Lemma 3.2.
By Theorem 2.1 and Lemma 4.1, the depth of the entire circuit is
$$O(l\log m) + O\!\left(l\log m \cdot \log\frac{m}{l\log m}\right) = O(l\log^2 m),$$
and the size of the entire circuit is
$$O(ml\log m) + O(ml\log m) = O(ml\log m). \qquad ∎$$
Lemma 4.3. For any $p$ less than a sufficiently small constant and $l \ge c\log m$, where $c$ is a sufficiently large constant, there exists an explicit construction of an $m$-input $(p, p^{\Theta(l)})$-reversal-fault-tolerant $l$-approximate-sorting circuit with $O(l\log^2 m)$ depth and $O(ml\log^2 m)$ size.
Proof. We first construct a $(p, p^{\Theta(l)})$-reversal-fault-tolerant circuit that $l$-approximate-merges two $l$-approximate-sorted lists. Given two $t$-item lists $L_1$ and $L_2$, define $L_1^{\alpha}$ and $L_2^{\alpha}$ to consist of the even-index items of $L_1$ and $L_2$ (respectively), and define $L_1^{\beta}$ and $L_2^{\beta}$ to consist of the odd-index items of $L_1$ and $L_2$ (respectively). If $L_1$ and $L_2$ are $l$-approximate-sorted, then so are $L_1^{\alpha}$, $L_2^{\alpha}$, $L_1^{\beta}$, and $L_2^{\beta}$. Next, recursively $l$-approximate-merge $L_1^{\alpha}$ with $L_2^{\beta}$ to form $L^{\alpha}$, and $L_1^{\beta}$ with $L_2^{\alpha}$ to form $L^{\beta}$. By definition, $L^{\alpha}$ and $L^{\beta}$ are $l$-approximate-sorted. Next, shuffle $L^{\alpha}$ with $L^{\beta}$ to form a list $L'$ that is $2l$-approximate-sorted. Finally, partition $L'$ into contiguous blocks of $16l$ items each and apply to each block a $16l$-input, $32l$-depth odd-even transposition circuit as described in Lemma 2.1 (with its parameter set to 1/32). Finish by repartitioning the resulting list into contiguous blocks of size $16l$ so that the boundaries of the earlier blocks fall in the centers of the current blocks, and then applying a $16l$-input, $32l$-depth odd-even transposition circuit to each of the blocks.
By an argument similar to that used in the proof of Lemma 4.1, we can show that if all of the odd-even transposition circuits in the entire approximate-merging circuit work correctly as $(l/4)$-approximate-sorting circuits, then the entire circuit works correctly as an $l$-approximate-merging circuit. By Lemma 2.1 with its parameter set to 1/32, the probability that a particular odd-even transposition circuit is not an $(l/4)$-approximate-sorting circuit is at most $p^{\Theta(l)}$. On the other hand, the total number of odd-even transposition circuits in the entire circuit is upper bounded by a polynomial in $m$. Hence, the probability that all of the odd-even transposition circuits function correctly as $(l/4)$-approximate-sorting circuits is at least $1 - \mathrm{poly}(m)\cdot p^{\Theta(l)} = 1 - p^{\Theta(l)}$ when $p$ is sufficiently small and $l \ge c\log m$ for a sufficiently large constant $c$.
The depth of the $l$-approximate-merging circuit thus constructed is determined by the recurrence
$$M(t) = M\!\left(\frac{t}{2}\right) + O(l)$$
with the boundary condition $M(t) = \Theta(1)$ for $1 \le t \le l$, where $M(t)$ is defined as the depth of a circuit for $l$-approximate-merging two $t$-item lists. Solving the recurrence, we find that our circuit has depth $M(t) = O(l\log(t/l))$.
Given the approximate-merging circuit thus constructed, we can construct the claimed $l$-approximate-sorting circuit by partitioning the $m$ items to be sorted into two subsets of $m/2$ items each, $l$-approximate-sorting the subsets recursively, and then $l$-approximate-merging them. This technique leads to the following recurrence for the depth of the $l$-approximate-sorting circuit:
$$D(m) = D\!\left(\frac{m}{2}\right) + M\!\left(\frac{m}{2}\right) = D\!\left(\frac{m}{2}\right) + O\!\left(l\log\frac{m}{l}\right),$$
with the boundary condition $D(m) = \Theta(1)$ for $1 \le m \le l$. Solving the recurrence, we have $D(m) = O(l\log^2(m/l))$. Since each level of the circuit contains at most $m/2$ comparators, the size of the circuit is $O(ml\log^2(m/l))$. ∎
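The two recurrences can be evaluated numerically as a sanity check. The Python sketch below is only illustrative: it sets the constant hidden in $O(l)$ to $c = 1$ and both boundary values to 1, which are our assumptions rather than values taken from the construction.

```python
def merge_depth(t, l, c=1):
    """M(t) = M(t/2) + c*l for t > l, with M(t) = 1 for t <= l."""
    return 1 if t <= l else merge_depth(t // 2, l, c) + c * l


def sort_depth(m, l, c=1):
    """D(m) = D(m/2) + M(m/2) for m > l, with D(m) = 1 for m <= l."""
    return 1 if m <= l else sort_depth(m // 2, l, c) + merge_depth(m // 2, l, c)


# Compare with l * log2(m/l)**2 = 4 * 8**2 = 256 for m = 1024, l = 4.
print(sort_depth(1024, 4))  # 121
```

For $m = l \cdot 2^J$ these functions give $M(m) = 1 + Jl$ and $D(m) = 1 + J + lJ(J-1)/2$, which is $O(l\log^2(m/l))$ as claimed.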
Proof of Theorem 4.3. We construct the circuit in a fashion similar to that of Theorem 3.1. In $O(\log\log n)$ rounds, we repeatedly apply the approximate-sorting circuit of Lemma 4.2 with an appropriately chosen value of $l$. The $i$th round consists of two sets of approximate-sorting circuits where the boundaries of the second set of circuits fall
in the centers of the first set of circuits. We aim to show that, with probability at least $1 - 1/n^2$, the $i$th round outputs every item to within $O(n^{\alpha^i})$ of its correct position (where $\alpha$ is the constant specified in Lemma 4.2), provided that each of the outputs from the $(i-1)$th round is within $O(n^{\alpha^{i-1}})$ of its correct position. The detailed parameter choices governing the application of the approximate-sorting circuits in each round are similar to those in the proofs of Theorem 3.1 and Lemma 4.1. The detailed argument of Lemma 4.1 can be applied to show that our circuit has the desired property, provided that all of the constituent approximate-sorting circuits function correctly.
In the proof of Theorem 3.1, we used the replication technique to guarantee a success probability of at least $1 - 1/n^2$ in each round. For reversal faults, however, if we replicate a comparator many times, then the outcome is completely determined by the behavior of the last comparator. Thus, we need a new method to replace the replication technique, and Lemma 2.1 essentially plays that role. Note that we have assumed $p$ to be sufficiently small in both Lemma 2.1 and Theorem 2.1. Hence, we can prove the current theorem only under the assumption that $p$ is less than a sufficiently small constant.
Returning to the construction of the reversal-fault-tolerant $O(\log n)$-approximate-sorting circuit, let $l_i$ denote our choice of the parameter $l$ in Lemma 4.2 for the $i$th round, in which we need circuits with $\Theta(n^{\alpha^{i-1}})$ inputs each. We choose $l_i = \Theta\!\left(1/(\alpha^{i-1}\log(1/p))\right)$ so that
$$\Theta\!\left(l_i \log n^{\alpha^{i-1}}\right) = \log_{1/p} n^3, \qquad (82)$$
where the constant behind the $\Theta$-notation is the same as that of Lemma 4.2. This guarantees that the failure probability for the $i$th round is at most $n\,p^{\Theta(l_i \log n^{\alpha^{i-1}})} \le 1/n^2$. By Lemma 4.2 and Eq. (82), the depth and the size for the $i$th round are
$$O\!\left(l_i \log^2 n^{\alpha^{i-1}}\right) = O\!\left(\log n \cdot \log n^{\alpha^{i-1}}\right) \qquad (83)$$
and
$$O\!\left(n\,l_i \log n^{\alpha^{i-1}}\right) = O(n\log n), \qquad (84)$$
respectively. In order to apply Lemma 4.2, we stop this procedure immediately before step $i$ for some $i$ such that
$$l_i = \Theta\!\left(\left(n^{\alpha^{i-1}}\right)^{(1-\beta)/2}\right), \qquad (85)$$
where $\beta$ is the constant specified in Corollary 2.1. Let $i_0$ be the least $i$ satisfying Eq. (85). Simple calculations show that $l_{i_0} = \Theta(\log n)$, $n^{\alpha^{i_0-1}} = \Theta(\log^{2/(1-\beta)} n)$, and $i_0 = O(\log\log n)$. Hence every item is within $O(n^{\alpha^{i_0-1}}) = O(\log^{2/(1-\beta)} n)$ of its correct position at the end of the $(i_0-1)$th round.
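The parameter schedule can be explored numerically. The Python sketch below is a toy model, not the paper's calculation: it sets every constant hidden in the $\Theta$-notation to 1, measures logarithms in base 2, does not round $l_i$ to an integer, and uses the illustrative values $\alpha = 15/16$, $\beta = 3/4$ (chosen so that $\alpha > (1+\beta)/2$, as Lemma 4.2 requires) and $p = 1/16$. It returns the stopping round $i_0$ of Eq. (85) together with the sum of the per-round depths from expression (83).

```python
import math


def stopping_round(log2_n, alpha, beta, p):
    """Toy model of the schedule in the proof of Theorem 4.3 (all Theta-constants = 1).

    Round i works on blocks of n**(alpha**(i-1)) items and picks l_i so that
    l_i * log2(block size) = 3 * log_{1/p} n, mirroring Eq. (82).
    Returns (i_0, summed depth of rounds 1..i_0-1 as given by expression (83)).
    """
    log2_inv_p = math.log2(1 / p)
    depth = 0.0
    i = 1
    while True:
        log2_block = alpha ** (i - 1) * log2_n
        l_i = 3 * log2_n / (log2_block * log2_inv_p)
        # Eq. (85), compared in log space to avoid float overflow for large n.
        if math.log2(l_i) >= log2_block * (1 - beta) / 2:
            return i, depth
        depth += log2_n * log2_block        # expression (83) for round i
        i += 1


for log2_n in (64, 1024, 16384):
    print(log2_n, stopping_round(log2_n, 15 / 16, 3 / 4, 1 / 16))
```

Because $\log n^{\alpha^{i-1}} = \alpha^{i-1}\log n$, the summed depth is a geometric series bounded by $\log^2 n/(1-\alpha)$, while $i_0$ grows only slowly with $\log n$, consistent with $i_0 = O(\log\log n)$.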
Given this O(log2(1&;) n)-approximate-sorted list, we apply
the O(log n)-approximate-sorting circuits of Lemma 4.3
with l=O(log n) to contiguous blocks of O(log2(1&;) n)
items twice, where the boundaries of the second set of
circuits fall in the centers of the first set of circuits. (Detailed
parameter choices are similar to those obtained in the proof
of Lemma 4.1.) The depth and size of this final round
are O(log n(log log n)2) and O(n log n(log log n)2), respec-
tively.
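The pattern of applying a sorter twice, first to contiguous blocks and then to blocks centered on the earlier boundaries, recurs in Lemmas 4.1 and 4.3, in this final round, and in Section 4.2. The Python sketch below isolates it, with Python's exact, fault-free `sorted` standing in for the paper's faulty approximate-sorting circuits; `locally_shuffled` is a hypothetical test-data helper. With exact block sorting and an input in which every item is within $b/4$ of its correct position, a 0-1 argument like the one in Lemma 4.1 shows that the two passes sort completely.

```python
import random


def two_pass_blocks(a, b, sort_block=sorted):
    """Apply sort_block to contiguous blocks of size b, then to blocks offset by b // 2.

    Every boundary of the first pass (a multiple of b) is the center of a
    second-pass block; a shorter tail block is handled as is in both passes.
    """
    assert b >= 2 and b % 2 == 0
    a = list(a)
    for start in (0, b // 2):
        for lo in range(start, len(a), b):
            a[lo:lo + b] = sort_block(a[lo:lo + b])
    return a


def locally_shuffled(n, d, rng):
    """A permutation of range(n) where every item is within d of its sorted position."""
    a = list(range(n))
    for lo in range(0, n, d + 1):          # shuffle disjoint windows of d + 1 items
        chunk = a[lo:lo + d + 1]
        rng.shuffle(chunk)
        a[lo:lo + d + 1] = chunk
    return a


d = 4
data = locally_shuffled(200, d, random.Random(7))
print(two_pass_blocks(data, 4 * d) == sorted(data))  # True
```

Shuffling within disjoint windows of $d + 1$ items produces a $d$-approximate-sorted input, so blocks of size $b = 4d$ meet the $b/4$ hypothesis.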
Now, by expressions (83) and (84), the depth of the entire circuit is
$$O(\log n(\log\log n)^2) + \sum_{1 \le i \le i_0} O\!\left(\log n \cdot \log n^{\alpha^{i-1}}\right) = O(\log^2 n),$$
and the size of the entire circuit is
$$O(n\log n(\log\log n)^2) + \sum_{1 \le i \le i_0} O(n\log n) = O(n\log n(\log\log n)^2). \qquad ∎$$
If we are only interested in a circuit with $O(\log^2 n)$ depth and do not particularly care about its size, there is a much simpler construction that does not depend on Theorem 2.1. In fact, the proof technique of Lemma 2.2 enables us to construct an $m$-input $(p, 1/n^2)$-reversal-fault-tolerant $\varepsilon$-halver with $O(\log n)$ depth for $m = \Omega(\log n)$ and any positive constant $\varepsilon$. Replacing each of the $\varepsilon$-halvers in the AKS circuit by a reversal-fault-tolerant $\varepsilon$-halver constructed in this manner, we can obtain a reversal-fault-tolerant $O(\log n)$-approximate-sorting circuit with $O(\log^2 n)$ depth relatively easily. Since the fault-tolerance of the $l$-AKS circuit is very difficult to prove, the existence of this simpler construction with asymptotically optimal depth is worth mentioning. Nevertheless, we need the fault-tolerance of the $l$-AKS circuit to achieve the better size bound, which is of crucial importance in Section 4.2.
4.2. Sorting Networks
In this section, we use the reversal-fault-tolerant $O(\log n)$-approximate-sorting circuit designed in Section 4.1 to construct a reversal-fault-tolerant sorting network of $O(n\log^{\log_2 3} n)$ size (for $p$ less than a sufficiently small constant). This is the first reversal-fault-tolerant sorting network of $o(n\log^2 n)$ size, and it answers the open question posed by Assaf and Upfal [4]. As pointed out in the Introduction, we assume (as in [4]) that the replicators are fault-free. This is not an unreasonable assumption, since replicators can be hard-wired and contain no logic elements.
Theorem 4.4. For any $p$ less than a sufficiently small constant, there exists an explicit construction of a reversal-fault-tolerant sorting network with $O(n\log^{\log_2 3} n)$ size.
The next lemma addresses how to use a network to compute a certain majority-like function. (As in the case of sorting, the network consists of comparators, which are subject to reversal faults, and replicators; but unlike a sorting network, the result of the computation is output to only one of the many registers in the network.) In particular, we are interested in a network that computes the majority function correctly provided that a large fraction (much larger than 1/2) of the inputs have the same value. Formally, for any constant $r \in (1/2, 1)$, we define an r-MAJORITY function with $n$ inputs to be a Boolean function that outputs 1 if more than $rn$ of the inputs are 1s and outputs 0 if more than $rn$ of the inputs are 0s. (We do not care how the function behaves if neither the number of 0s nor the number of 1s exceeds $rn$.)
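A direct Python transcription of this definition (our own helper, operating on a list of 0s and 1s) makes the "don't care" region explicit by returning `None` there; it evaluates the function itself and says nothing about how a faulty comparator network computes it.

```python
def r_majority(bits, r):
    """r-MAJORITY of a 0-1 list, or None where the function is unconstrained.

    Returns 1 if more than r*n inputs are 1, 0 if more than r*n inputs are 0,
    and None otherwise; with 1/2 < r < 1 the first two cases cannot both hold.
    """
    assert 0.5 < r < 1
    n = len(bits)
    ones = sum(bits)
    if ones > r * n:
        return 1
    if n - ones > r * n:
        return 0
    return None


print(r_majority([1] * 9 + [0], 0.8))      # 1
print(r_majority([1] * 6 + [0] * 4, 0.8))  # None
```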
Lemma 4.4 [8]. For some constant $r \in (1/2, 1)$ and for any constant $p < 1/2$, there exists an explicit construction of a $(p, \varepsilon)$-reversal-fault-tolerant network with $O((\log_p \varepsilon)^{\log_2 3})$ size that computes the r-MAJORITY function of $O(\log_p \varepsilon)$ inputs.
Proof. See [8, Lemma 3.2]. ∎
Proof of Theorem 4.4. Our network consists of two parts. In the first part, we apply the reversal-fault-tolerant $O(n\log n(\log\log n)^2)$-size $O(\log n)$-approximate-sorting circuit of Theorem 4.3. (We actually need the circuit to be $(p, 1/(2n))$-fault-tolerant instead of $(p, 1/n)$-fault-tolerant. This requirement can be satisfied since the circuit of Theorem 4.3 is actually $(p, O(\log^2 n/n^2))$-fault-tolerant.) At the end of this part, with probability at least $1 - 1/(2n)$, every item is within $O(\log n)$ of its correct position. Observe that this part of the network uses no replicators and no more than $n$ registers.
The second part of our network moves all of the items to their correct positions with probability at least $1 - 1/(2n)$, provided that each of the inputs to this part is at most $O(\log n)$ away from its correct position. To construct the second part, it suffices to construct an $O(\log n)$-input $(p, 1/n^2)$-reversal-fault-tolerant sorting network with $O(\log n \cdot \log^{\log_2 3} n)$ size, since we can finish the second part by applying such networks to contiguous blocks of $O(\log n)$ items twice, where the boundaries of the second set of networks fall in the centers of the first set of networks.
Such an $O(\log n)$-input $(p, 1/n^2)$-reversal-fault-tolerant sorting network can be constructed by using the Assaf-Upfal method [4] with some modifications. Let $C$ be an $O(\log n)$-input sorting circuit with $O(\log\log n)$ depth, e.g., an AKS circuit with $O(\log n)$ inputs. Replace each original register $r_i$ by a block of registers $R_i = \{r_{i1}, \ldots, r_{is}\}$, where $s = O(\log n)$. Also, replace each comparator between $r_i$ and $r_j$ by $s$ comparators that connect $r_{ik}$ and $r_{jk}$ for each $k \le s$. After each set of comparators that corresponds to a single level of $C$, apply the expander-like construction of the so-called majority preserver designed in [4] to each of the blocks. For any fixed constant $r < 1$, by carefully choosing the parameters of the majority preservers, we can use the argument of [4] to show that, with probability at least $1 - 1/(2n^2)$, for all $i$, more than $rs$ of the items contained in $R_i$ are the same as the item contained in $r_i$ in the fault-free case. The details of the construction and its proof of correctness can be found in [4]. We complete the $O(\log n)$-input $(p, 1/n^2)$-reversal-fault-tolerant sorting network by applying, in parallel, the $s$-input $(p, 1/(2n^2))$-reversal-fault-tolerant r-MAJORITY network of Lemma 4.4 to each of the $O(\log n)$-size blocks of the form $R_i$. ∎
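The block-replication step of this construction can be sketched in Python as follows. Several simplifications are assumptions of the sketch rather than parts of the construction: an odd-even transposition circuit stands in for the circuit $C$, the majority preservers of [4] are omitted, and the final per-block vote is a fault-free plurality vote instead of the fault-tolerant r-MAJORITY network of Lemma 4.4. What it does show is how each register $r_i$ becomes a block of $s$ copies and each comparator becomes $s$ comparators acting on corresponding copies.

```python
import random
from collections import Counter


def oets_comparators(m):
    """Comparators (j, j+1) of an m-input, depth-m odd-even transposition circuit."""
    return [(j, j + 1) for t in range(m) for j in range(t % 2, m - 1, 2)]


def replicated_sort(items, s, p, rng):
    """Run s copies of the circuit under reversal faults, then vote per register."""
    copies = [list(items) for _ in range(s)]      # copy k holds registers r_{ik}
    for i, j in oets_comparators(len(items)):
        for a in copies:                          # s comparators replace one comparator
            lo, hi = min(a[i], a[j]), max(a[i], a[j])
            if rng.random() < p:                  # reversal fault
                lo, hi = hi, lo
            a[i], a[j] = lo, hi
    # Plurality vote over block R_i (stand-in for the r-MAJORITY network).
    return [Counter(a[i] for a in copies).most_common(1)[0][0]
            for i in range(len(items))]


print(replicated_sort([3, 1, 2, 0], s=5, p=0.0, rng=random.Random(0)))  # [0, 1, 2, 3]
```

Without the majority preservers, errors in different copies are never corrected mid-circuit, which is exactly the gap the construction of [4] fills.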
5. AN OPTIMAL EREW FAULT-TOLERANT
SORTING ALGORITHM
In this section, we design a fault-tolerant sorting algorithm for the EREW PRAM model. In the PRAM model, as pointed out in the Introduction, we assume that faults occur only as incorrect answers to comparison queries and that no item is lost in any comparison. The main result in this section is a fault-tolerant EREW PRAM sorting algorithm that runs in optimal $\Theta(\log n)$ time on $n$ processors. This answers an open question posed by Feige, Peleg, Raghavan, and Upfal [7]. The only previously known $o(\log^2 n)$-time fault-tolerant PRAM sorting algorithm on $n$ processors is a randomized algorithm [7].
Theorem 5.1. There exists an explicit deterministic fault-tolerant EREW PRAM sorting algorithm with $\Theta(\log n)$ running time on $n$ processors.
The following lemma is due to Feige, Peleg, Raghavan,
and Upfal [7].
Lemma 5.1 [7]. There exists an explicit deterministic $(p, p^{\Theta(\log m)})$-fault-tolerant EREW PRAM algorithm that selects the maximum of $m$ items in $\Theta(\log m)$ time on $m$ processors.
Proof. See [7, Theorem 20]. ∎
Proof of Theorem 5.1. We use the approach of Theorem 3.1, with modifications to achieve the claimed $\Theta(\log n)$ running time (recall that the depth bound in Theorem 3.1 is $O(\log n\log\log n)$).
For any constant $\varepsilon$, simple majority voting gives a comparison scheme with $O(1)$ running time that yields the correct answer with probability at least $1 - \varepsilon$. In our algorithm, we first use the PRAM to simulate the partial 1-AKS circuit (i.e., the partial $l$-AKS circuit with $l = 1$). In order to apply Corollary 2.1, we need to make sure that the fault probability of each comparison is upper bounded
by a sufficiently small constant. This can be achieved by the majority-vote scheme, which causes only a constant slowdown. By Corollary 2.1, all of the outputs from the PRAM simulation of the partial 1-AKS circuit can be partitioned as $X = S \cup X_1 \cup \cdots \cup X_{\bar n}$ such that all the items in $X_i$ are smaller than all the items in $X_j$ for $i < j$, where $\bar n = \Theta(n^{1-\beta})$.
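The majority-vote comparison scheme can be quantified exactly. In the Python sketch below (helper names are ours), each individual answer is assumed to be wrong independently with probability $p < 1/2$; `vote_failure_probability` computes the binomial tail for an odd number of trials, so for fixed $p$ and constant $\varepsilon$ the number of trials needed, and hence the slowdown, is a constant.

```python
import math
import random


def voted_less(a, b, p, trials, rng):
    """Answer "a < b?" by majority over `trials` independent faulty comparisons."""
    assert trials % 2 == 1, "use an odd number of trials to avoid ties"
    truth = a < b
    # Each answer equals the truth unless a fault (probability p) flips it.
    votes = sum(truth != (rng.random() < p) for _ in range(trials))
    return votes > trials // 2


def vote_failure_probability(p, trials):
    """Exact probability that more than half of `trials` answers are wrong."""
    return sum(math.comb(trials, k) * p ** k * (1 - p) ** (trials - k)
               for k in range(trials // 2 + 1, trials + 1))


print(voted_less(1, 2, 0.1, 9, random.Random(0)))
print(vote_failure_probability(0.1, 3))
```

For example, with $p = 0.1$ a single comparison errs with probability 0.1, while a majority of three errs with probability $3(0.1)^2(0.9) + (0.1)^3 = 0.028$.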
In the proof of Theorem 3.1, we went on to use the approximate-insertion circuit of Lemma 3.1. Here, in order to achieve the claimed running time, we need a better insertion scheme that avoids the boundary problem that occurred in the proof of Theorem 3.1. In particular, we use the following nonoblivious strategy. In parallel, we apply the selection algorithm of Lemma 5.1 to $X_i$ for each $i \le \bar n$. Let $M_i$ be the maximum item in $X_i$ as reported by the selection algorithm. We use all $n$ processors to sort the set
$$P = S \cup \{M_1, \ldots, M_{\bar n}\}.$$
Since we have $n$ processors and $P$ contains at most $\bar n + |S| = O(n^{3/4})$ items, we can sort $P$ with probability at least $1 - 1/n^2$ by simulating the $m$-input, $O(\log m)$-time, $O(m\log m)$-register $(p, p^{\Theta(\log m)})$-destructive-fault-tolerant sorting network designed in [4] with $m = \bar n + |S|$.
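Based on the sorted order of $P$, we can derive the approximately correct position of each item in $S$. In particular, we can partition $X$ as
$$X = \bigcup_{1 \le i \le \bar n} Y_i,$$
where $Y_i = X_i \cup \{s \in S : M_{i-1} < s \le M_i\}$ for $i < \bar n$, and $Y_{\bar n} = X_{\bar n} \cup \{s \in S : s > M_{\bar n - 1}\}$ ($M_0$ is assumed to be $-\infty$). It is easy to see that
$$\Theta(n^{\beta}) = |X_1| \le |Y_i| \le |S| + |X_1| \le O(n^{3/4}) + \Theta(n^{\beta}) = \Theta(n^{\beta}), \qquad (86)$$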
where $\beta$ is the constant specified in Corollary 2.1. For $i < j$, all items in $Y_i$ are smaller than all items in $Y_j$, provided that the simulation of the partial 1-AKS circuit and the sorting of $P$ are both done correctly. Note that this output order is "cleaner" than that obtained after the first round of the circuit of Theorem 3.1; in the present case, we have no boundary problems to cope with.
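Deriving the groups $Y_i$ from the sorted set $P$ amounts to locating each item of $S$ among the reported maxima. The Python sketch below is our own helper: it uses exact comparisons and the standard-library `bisect` module, whereas the algorithm relies on the fault-tolerant sort of $P$, and it assumes distinct keys. Each $s \in S$ goes to the index $i$ with $M_{i-1} < s \le M_i$, and everything above $M_{\bar n - 1}$ goes to the last group.

```python
from bisect import bisect_left


def partition_into_groups(pivots, s_items):
    """Assign each item of S to a group index 1..len(pivots), following Y_i.

    pivots = [M_1, ..., M_nbar] in sorted order; M_0 is treated as -infinity.
    """
    nbar = len(pivots)
    groups = {i: [] for i in range(1, nbar + 1)}
    for s in s_items:
        # bisect_left finds the first M_i >= s; items above M_nbar fall into Y_nbar.
        i = min(bisect_left(pivots, s) + 1, nbar)
        groups[i].append(s)
    return groups


print(partition_into_groups([10, 20, 30], [5, 10, 15, 25, 35]))
```

With maxima $[10, 20, 30]$, the items 5 and 10 join $Y_1$, 15 joins $Y_2$, and both 25 and 35 join $Y_3$.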
To sort the items within $Y_i$ recursively, in Theorem 3.1 we used an increasingly intensive application of the replication technique for smaller and smaller sets of items in order to guarantee that each of the $O(\log\log n)$ rounds works with probability at least $1 - 1/n^2$, instead of $1 - 1/O(n^{\alpha^i})$. To achieve the $O(\log n)$ running time, here we use an adaptive approach that avoids the $O(\log\log n)$ running-time blowup caused by this replication technique.
In parallel, we apply a $(p, 1/|Y_i|)$-fault-tolerant EREW sorting algorithm to $Y_i$ for each $i \le \bar n$. A standard Chernoff bound argument [3] shows that the number of unsorted groups of the form $Y_i$ is at most $\bar n(1/|Y_i|) + \bar n^{3/4}$ with probability at least $1 - e^{-\bar n^{1/2}/2}$. By Inequality (86), this means that with probability at least $1 - 1/n^2$, the number of unsorted groups of the form $Y_i$ is at most $O(\bar n^{3/4})$.
We now detect which groups of the form $Y_i$ remain unsorted. For each $i$ in parallel, we first assume that the order for $Y_i$ reported by the recursive sorting algorithm is correct. Then, we check the correctness of the order by repeatedly comparing adjacent items $O(\log n)$ times. With probability at least $1 - 1/n^2$, we will detect all of the $O(\bar n^{3/4})$ unsorted groups. The total number of items contained in the unsorted groups is at most $O(\bar n^{3/4} n^{\beta}) = O(n^{3(1-\beta)/4 + \beta}) = O(n^{(3+\beta)/4})$. Since we have $n$ processors to sort the $O(n^{(3+\beta)/4})$ unsorted items, we can proceed by simulating the $m$-input, $O(\log m)$-time, $O(m\log m)$-register $(p, p^{\Theta(\log m)})$-destructive-fault-tolerant sorting network designed in [4] with $m = O(n^{(3+\beta)/4})$, which succeeds with probability at least $1 - 1/n^2$.
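The detection step can be sketched as follows. The helper name and the odd repetition count `reps` (standing in for the $O(\log n)$ repetitions) are our assumptions; each comparison answer is flipped independently with probability $p$, and a group is flagged as soon as the majority of the repeated answers for some adjacent pair says that pair is out of order.

```python
import random


def flag_unsorted(groups, p, reps, rng):
    """Indices of groups whose reported order fails a repeated adjacent-pair check."""
    assert reps % 2 == 1
    flagged = []
    for idx, seq in enumerate(groups):
        for x, y in zip(seq, seq[1:]):
            # Count answers claiming y < x; each answer is flipped with probability p.
            wrong = sum((y < x) != (rng.random() < p) for _ in range(reps))
            if wrong > reps // 2:
                flagged.append(idx)
                break
    return flagged


print(flag_unsorted([[1, 2, 3], [2, 1, 3], [4, 5]], p=0.05, reps=7, rng=random.Random(0)))
```

Only the flagged groups are handed to the final simulation of the network of [4], which is what keeps the total number of re-sorted items at $O(n^{(3+\beta)/4})$.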
It is easy to see that the failure probability of the whole algorithm is $O(1/n^2) \le 1/n$. Furthermore, this construction leads to the recurrence
$$T(n) = O(\log n) + T(O(n^{\beta})),$$
where $T(m)$ denotes the time of the optimal $(p, 1/m)$-fault-tolerant EREW sorting algorithm using $m$ processors. Solving this recurrence, we find that $T(n) = O(\log n)$.
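Unrolling the recurrence shows why it solves to $O(\log n)$: the per-level costs $\log n, \beta\log n, \beta^2\log n, \ldots$ form a geometric series. The Python sketch below sets the constants hidden in $O(\log n)$ and $O(n^{\beta})$ to 1 (an assumption) and stops once the subproblem size drops to 2.

```python
import math


def time_bound(n, beta, c=1.0, base=2.0):
    """Unroll T(n) = c*log2(n) + T(n**beta) until the size is at most `base`."""
    assert 0 < beta < 1 and base > 1
    total = 0.0
    while n > base:
        total += c * math.log2(n)   # O(log n) work at this level
        n = n ** beta               # recurse on groups of about n**beta items
    return total


print(time_bound(2.0 ** 64, 0.75))
```

For $n = 2^{64}$ and $\beta = 3/4$ the unrolled sum lies between 64 and $64/(1-\beta) = 256$.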
6. WORST-CASE FAULT-TOLERANCE
In this section, we extend our results for random faults to construct worst-case fault-tolerant sorting circuits, networks, and algorithms. All previous work on worst-case faults seems to have focused on passive or destructive faults [12, 18-20]. We are not aware of any previous work on sorting networks that are tolerant to worst-case reversal faults, and no results were known for PRAM sorting algorithms that are tolerant to worst-case faults. We define a circuit, network, or algorithm to be $k$-fault-tolerant if it remains a sorting circuit, network, or algorithm even when any $k$ or fewer comparators (or comparisons, in the case of an algorithm) are faulty.
Throughout this section, we use the following simple scheme to construct $k$-fault-tolerant circuits, networks, and algorithms: take a $(p, \varepsilon)$-fault-tolerant circuit, network, or algorithm where $\varepsilon = p^{k+1}$. Despite different technical details for different fault and computational models, such a circuit, network, or algorithm tolerates up to $k$ worst-case faults. This is formally proved in the next lemma, where $Q$ should be interpreted as a sorting-related problem.
Lemma 6.1. Let $A$ be a circuit, network, or algorithm for solving a certain problem $Q$. If $A$ is $(p, \varepsilon)$-fault-tolerant, then $A$ is $k$-fault-tolerant for $k = \log_p \varepsilon - 1$.
Proof. Assume, for the purpose of contradiction, that $A$ is not $k$-fault-tolerant. Then there exists a set $S$ of $k$ or fewer comparators (or comparisons if $A$ is an algorithm) such that if all the comparators (or comparisons) in $S$ are faulty and all of the other comparators (or comparisons) are correct, then the resulting faulty version of $A$ fails to solve $Q$. On the other hand, if we make each comparator (or comparison) in $S$ faulty with probability $p$ and each comparator (or comparison) not in $S$ faulty with probability 0, then $A$ fails to solve $Q$ with probability at least
$$p^{|S|} \ge p^k = p^{\log_p \varepsilon - 1} > \varepsilon.$$
This contradicts the assumption that $A$ is $(p, \varepsilon)$-fault-tolerant. ∎
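Lemma 6.1 can be turned into a small calculation. The Python sketch below (our own helper) returns the largest integer $k$ with $p^k > \varepsilon$, which is exactly what the proof uses, since any $|S| \le k$ then gives $p^{|S|} \ge p^k > \varepsilon$; for $\varepsilon = p^{k+1}$ it agrees with $k = \log_p \varepsilon - 1$. Exact rational arithmetic from the standard `fractions` module avoids rounding errors in the comparison.

```python
from fractions import Fraction


def tolerated_faults(p, eps):
    """Largest integer k with p**k > eps (0 < p < 1, 0 < eps < 1)."""
    p, eps = Fraction(p), Fraction(eps)
    assert 0 < p < 1 and 0 < eps < 1
    k = 0                          # p**0 = 1 > eps always holds
    while p ** (k + 1) > eps:
        k += 1
    return k


print(tolerated_faults(Fraction(1, 2), Fraction(1, 1024)))  # 9
```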
The remainder of Section 6 is organized into four parts.
Sections 6.1 to 6.3 contain results for passive faults, reversal
faults, and EREW PRAM algorithms, respectively. We
conclude by pointing out in Section 6.4 that all of the results
in Section 6 can be proved independently of the fault-
tolerance of the l-AKS circuit established in Theorem 2.1.
6.1. Results for Passive Faults
In this section, we construct a $k$-passive-fault-tolerant sorting circuit with small depth. Although a tight bound on the size of worst-case passive-fault-tolerant sorting circuits was derived by Yao and Yao [20] in 1985, our result provides the first asymptotically nontrivial upper bound on the depth of such circuits and is itself asymptotically optimal over a large range of $k$.
Theorem 6.1. There exists an explicit construction of a $k$-passive-fault-tolerant sorting circuit with $O(\log n + k\log\log_k n)$ depth.
There is a trivial lower bound of $\Omega(\log n + k)$ on the depth of $k$-passive-fault-tolerant sorting circuits. The $\Omega(\log n)$ lower bound follows from the trivial $\Omega(\log n)$ lower bound for the fault-free case. In order for a circuit to tolerate $k$ passive faults, each register $r$ in the circuit has to be connected to at least $k+1$ comparators. (Otherwise, all of the comparators associated with $r$ could be faulty, and the circuit would fail if the item input to $r$ should not be output to $r$.) This implies a depth lower bound of $\Omega(k)$. Combining the $\Omega(k)$ and $\Omega(\log n)$ lower bounds, we have a depth lower bound of $\Omega(\log n + k)$. Therefore, the upper bound of Theorem 6.1 is actually tight when $k = O(\log n/\log\log n)$ or $k = \Omega(n^{\alpha})$ for any constant $\alpha > 0$.
In fact, when $k = O(\log n)$, by taking the circuit of Theorem 3.1 and applying Lemma 6.1 with $p$ set to a constant, we immediately get a $k$-passive-fault-tolerant sorting circuit with $O((\log n + k)\log\log n)$ depth. For larger $k$, we can construct a $k$-passive-fault-tolerant sorting circuit with $O((\log n + k)\log\log n)$ depth by replicating each comparator of the $O(\log n)$-passive-fault-tolerant sorting circuit $\Omega(k/\log n)$ times. However, to achieve the better depth bound claimed in Theorem 6.1, we use a slightly different approach.
Lemma 6.2. For some constant $\alpha < 1$, there exists an explicit construction of a $k$-passive-fault-tolerant $m$-input $m^{\alpha}$-approximate-sorting circuit with depth $O(\log m + k)$.
Proof. An $m$-input $(\log m)$-passive-fault-tolerant $m^{\alpha}$-approximate-sorting circuit can be easily constructed by applying Lemmas 6.1 and 3.2. This proves the lemma for the case $k \le \log m$. When $k > \log m$, the claimed circuit can be constructed by replicating each comparator of the $(\log m)$-passive-fault-tolerant circuit $\lceil k/\log m \rceil$ times. ∎
Lemma 6.3. There exists an explicit construction of an $m$-input $O(m)$-passive-fault-tolerant sorting circuit with $O(m)$ depth.
Proof. This follows from Lemmas 2.1 and 6.1. (Recall that, as pointed out in the proof of Lemma 2.1, for passive faults the circuit of Lemma 2.1 is actually a $(p, p^{\Theta(\log m)})$-fault-tolerant circuit for sorting, as opposed to approximate-sorting.)
Proof of Theorem 6.1. We first apply the $k$-passive-fault-tolerant $n^{\alpha}$-approximate-sorting circuit of Lemma 6.2 to all $n$ items. The outputs of this circuit form an $O(n^{\alpha})$-approximate-sorted list. Then, we apply two sets of $O(n^{\alpha})$-input $O(n^{\alpha^2})$-approximate-sorting circuits of Lemma 6.2 so that the boundaries of the second set of circuits fall in the centers of the first set of circuits. We can repeat this construction within smaller and smaller groups until the group size is no more than $k$, at which point we can finish up by using the circuit of Lemma 6.3. (Detailed parameter choices are similar to those given in the proof of Theorem 3.1.) This construction leads to the following formula for the depth $D(n)$:
$$D(n) = O(k + \log n) + O(k + \log n^{\alpha}) + \cdots + O(k + \log n^{\alpha^i}) + O(k),$$
where $i$ is the smallest integer such that $O(n^{\alpha^i}) \le k$. The logarithmic terms form a geometric series bounded by $\log n/(1-\alpha)$, and $i = O(\log\log_k n)$ since $n^{\alpha^i} \le k$ as soon as $\alpha^i \le \log k/\log n$. This yields $D(n) = O(\log n + k\log\log_k n)$. ∎
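To see where the $\log\log_k n$ factor comes from, the sketch below evaluates the depth recurrence numerically. It assumes, purely for illustration, that every hidden constant equals 1 and that logarithms are base 2. Each round charges $k + \log_2 g$ for groups of size $g$ and shrinks the group size to $g^{\alpha}$; once $g \le k$ the final $O(k)$ term for Lemma 6.3 is charged.

```python
import math


def depth_recurrence(n: float, k: float, alpha: float):
    """Return (rounds, total) for
    (k + log n) + (k + log n^alpha) + ... + (k + log n^(alpha^i)) + k,
    with all hidden constants taken to be 1."""
    total, size, rounds = 0.0, float(n), 0
    while size > k:                  # groups still larger than k: recurse
        total += k + math.log2(size)
        size = size ** alpha
        rounds += 1
    return rounds, total + k         # finish with the O(k) circuit


rounds, total = depth_recurrence(2.0 ** 64, 5, 0.5)
# The log terms sum to at most log2(n) / (1 - alpha); the number of
# rounds grows like log(log n / log k).
print(rounds, total)
```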
6.2. Results for Reversal Faults
We begin this section by proving that 2k for any
k-reversal-fault-tolerant 2-approximate-sorting circuit. Then,
we prove a lower bound on the depth of any k-reversal-
fault-tolerant k-approximate-sorting circuit. The most
important result in Section 6.2 is the construction of a
k-reversal-fault-tolerant k-approximate-sorting network
with small size and depth. In light of a lower bound
established in [12, Theorem 1], this result separates the size
complexities of sorting networks with worst-case reversal
and destructive faults.
Theorem 6.2. For any $k < n$, there is no $k$-reversal-fault-tolerant $(k-1)$-approximate-sorting circuit.
Proof. See Appendix C. ∎
Theorem 6.3. Any $k$-reversal-fault-tolerant $k$-approximate-sorting circuit has $\Omega(k\log(n/k))$ depth.
Proof. See Appendix D. ∎
We remark that the above proof can be extended to
obtain the same lower bound for the much simpler problem
of k-approximate-insertion. This requires slightly more
effort, and the proof is omitted.
Theorem 6.4. There exists an explicit construction of a $k$-reversal-fault-tolerant $k$-approximate-sorting circuit with $O(k\log n + k\log^2 k)$ depth and $O(n(\log n + k\log\log_k n + k\log^2 k))$ size.
Lemma 6.4. There exists an explicit construction of an
m-input k-reversal-fault-tolerant k-approximate-insertion
circuit with O(k log m) depth and O(mk) size.
Proof. By using the proof technique of Lemma 2.1, it is straightforward to prove that for any constant $c > 1$, there exists a $ck$-input odd-even transposition circuit of $O(k)$ depth that is a $k$-reversal-fault-tolerant $k$-approximate-sorting circuit. Using this fact and the argument for Lemma 4.1, it is easy to show that the circuit of Lemma 4.1 with $l = O(k)$ has the desired property. Note that the assumption $l \ge \log(m/l)$ in Lemma 4.1 was used only in the probabilistic analysis. Hence, this assumption is unnecessary when considering worst-case faults. In fact, for worst-case faults, all of the odd-even transposition circuits involved in the analysis are $k$-approximate-sorting circuits. ∎
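The odd-even transposition circuits used above are easy to simulate, which makes it possible to experiment with small worst-case fault sets. The sketch below uses hypothetical helper names and is not code from the paper: level $t$ compares registers $(i, i+1)$ with $i \equiv t \pmod 2$, and a comparator named in `faulty` suffers a reversal fault, sending the larger item to the upper register. It measures displacement; it does not prove the $k$-approximate-sorting property.

```python
from typing import AbstractSet, List, Sequence, Tuple


def odd_even_transposition(items: Sequence[int], depth: int,
                           faulty: AbstractSet[Tuple[int, int]] = frozenset()) -> List[int]:
    """Run `depth` levels of odd-even transposition on a copy of `items`.
    The comparator at level t between registers i and i+1 is named (t, i)."""
    a = list(items)
    for t in range(depth):
        for i in range(t % 2, len(a) - 1, 2):
            lo, hi = min(a[i], a[i + 1]), max(a[i], a[i + 1])
            if (t, i) in faulty:          # reversal fault: outputs swapped
                a[i], a[i + 1] = hi, lo
            else:                         # correct comparator: min goes up
                a[i], a[i + 1] = lo, hi
    return a


def max_displacement(output: Sequence[int]) -> int:
    """Largest distance between an item's final register and its rank,
    for a permutation of 0, ..., n-1."""
    return max(abs(pos - item) for pos, item in enumerate(output))


print(odd_even_transposition([3, 1, 2, 0], 4))            # [0, 1, 2, 3]
print(max_displacement(
    odd_even_transposition([3, 1, 2, 0], 4, {(3, 1)})))   # 1
```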
Lemma 6.5. For km(1&;)2 (where ; is the constant
specified in Corollary 2.1), there exists an explicit construc-
tion of an m-input k-reversal-fault-tolerant m:-approximate-
sorting circuit with O(k log m) depth and O(m(k+log m))
size, where : is some constant strictly less than 1.
Proof. By Lemma 6.1 and Corollary 2.1, for some $l = O(\max\{k/\log m, 1\})$, an $m$-input partial $l$-AKS circuit moves every item, except those in a set $S$ with $O(m^{3/4})$ items, to within $O(m^{\beta})$ of the correct position, even in the presence of up to $k$ worst-case reversal faults. This initial part of our circuit has $O(\log m + k)$ depth and $O(m(\log m + k))$ size.
We complete the claimed circuit by applying a set of $k$-approximate-insertion circuits of Lemma 6.4 in a fashion similar to that used in the proof of Lemma 3.2. This latter part of our circuit has $O(k\log m)$ depth and $O(mk)$ size. ∎
Lemma 6.6. There exists an explicit construction of an $m$-input $k$-reversal-fault-tolerant $k$-approximate-sorting circuit with $O(k\log^2 m)$ depth and $O(mk\log^2 m)$ size.
Proof. By using the argument of Lemma 4.3, it is not hard to show that the circuit of Lemma 4.3 with $l = k$ is a $k$-reversal-fault-tolerant $O(k)$-approximate-sorting circuit with $O(k\log^2 m)$ depth and $O(mk\log^2 m)$ size. (The assumption $l \ge c\log m$ was used in the probabilistic argument for Lemma 4.3; such an assumption is not needed for worst-case faults, since we can actually argue that all of the constituent odd-even transposition circuits are $k$-reversal-fault-tolerant.) Given the $O(k)$-approximate-sorted list thus produced, we can finish up by applying an odd-even transposition circuit of $O(k)$ depth to the entire list. By the argument of Lemma 2.1, we can show that after $O(k)$ steps of the odd-even transposition circuit, every item is within $k$ of the correct position. ∎
Proof of Theorem 6.4. Given the circuit of Lemma 6.5, we can construct the claimed circuit in a manner similar to that used in the proof of Theorem 4.3. In particular, we repeatedly apply the circuit of Lemma 6.5 until $m = k^{2/(1-\beta)}$, at which point we can directly apply the circuit of Lemma 6.6. The depth of the entire circuit is at most
$$\begin{aligned}
&O(k\log n) + O(k\log n^{\alpha}) + \cdots + O(k\log n^{\alpha^i}) + \cdots + O(k\log k^{2/(1-\beta)}) + O(k\log^2 k) \\
&\quad = O(k\log n + k\log^2 k).
\end{aligned}$$
The size of the entire circuit is at most
$$\begin{aligned}
&O(n(k + \log n)) + O(n(k + \log n^{\alpha})) + \cdots + O(n(k + \log n^{\alpha^i})) + \cdots \\
&\quad + O(n(k + \log k^{2/(1-\beta)})) + O(nk\log^2 k) \\
&\quad = O\!\left(n\left(\log n + k\log\frac{\log n}{\log k} + k\log^2 k\right)\right).
\end{aligned}$$
∎
Theorem 6.5. There exists an explicit construction of a $k$-reversal-fault-tolerant sorting network with $O(n(\log n + k\log\log_k n + k^{\log_2 3}))$ size.
The definition of the r-MAJORITY function in the next
lemma can be found following Theorem 4.4.
Lemma 6.7 [8]. There exists an explicit construction of a $k$-reversal-fault-tolerant network with $O(k^{\log_2 3})$ size that computes the $r$-MAJORITY function with $O(k)$ inputs for some constant $r \in (1/2, 1)$.
Proof. See [8, Lemma 3.2]. ∎
Proof of Theorem 6.5. We first apply the $k$-approximate-sorting circuit of Theorem 6.4. Then, we use the Assaf-Upfal [4] technique followed by $n$ $r$-MAJORITY networks with $O(k)$ inputs each, as described in Lemma 6.7, for some constant $r \in (1/2, 1)$. The details are similar to those in the proof of Theorem 4.4.
6.3. An Optimal EREW k-Fault-Tolerant Sorting Algorithm
This section contains an optimal k-fault-tolerant EREW
PRAM sorting algorithm.
Theorem 6.6. There exists an explicit EREW PRAM $k$-fault-tolerant sorting algorithm that runs in asymptotically optimal $\Theta(\log n + k)$ time on $n$ processors.
Proof. We first prove that any $k$-fault-tolerant PRAM sorting algorithm on $n$ processors runs in $\Omega(\log n + k)$ time. Since any sorting algorithm uses $\Omega(n\log n)$ comparisons, it is sufficient to prove a lower bound of $\Omega(k)$. In the PRAM model, each processor can make at most $c$ comparisons in a single step, where $c$ is a constant. Assume for the purposes of contradiction that a $k$-fault-tolerant sorting algorithm runs in at most $k/(2c)$ steps. Then it makes at most $cn \cdot k/(2c) = nk/2$ comparisons in total, and since each comparison involves two items, there exists an output item $x$ that is involved in $k$ or fewer comparisons. If these $k$ or fewer comparisons for $x$ are all faulty, then we actually know nothing about the rank of $x$. This is a contradiction.
According to Lemma 6.1, a $(\log n)$-fault-tolerant EREW sorting algorithm $A$ with $O(\log n)$ running time on $n$ processors can be easily constructed from the algorithm of Theorem 5.1. For $k \le \log n$, algorithm $A$ satisfies all the claimed properties. For $k > \log n$, we construct another algorithm $A'$ with the claimed properties by simulating $A$, as follows. In $A'$, we replace each comparison query of $A$ by $(2k+1)/\log n$ comparisons, and we use the majority of the answers as the final answer to that query. In order to make the answer to a comparison query in $A$ incorrect, an adversary would need to spend more than $k/\log n$ faults in $A'$. With up to $k$ faults, the adversary can therefore force at most $k/(k/\log n) = \log n$ incorrect comparison answers in $A$. Since $A$ can tolerate up to $\log n$ faults, $A'$ works correctly with up to $k$ faults. Since we simulate each comparison of $A$ by $(2k+1)/\log n$ comparisons in $A'$, the running time of $A'$ is $\Theta(k/\log n)$ times the running time of $A$. Hence, algorithm $A'$ runs in $\Theta(k)$ time. ∎
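The fault budget in the simulation of $A$ by $A'$ is simple arithmetic, sketched below under two assumptions that the text leaves open: $\log n$ is given as an integer `log_n`, and the number of repetitions per query is rounded up to $\lceil (2k+1)/\log n \rceil$. A majority vote over $t$ answers is corrupted only if at least $\lfloor t/2 \rfloor + 1$ of them are faulty. The function names are hypothetical.

```python
import math


def repetitions(k: int, log_n: int) -> int:
    """Comparisons used by A' to answer one comparison query of A."""
    return math.ceil((2 * k + 1) / log_n)


def faults_to_corrupt(t: int) -> int:
    """Faulty answers needed to flip a majority vote over t answers."""
    return t // 2 + 1


def corrupted_queries(k: int, log_n: int) -> int:
    """Most queries of A that an adversary with k faults can get wrong."""
    return k // faults_to_corrupt(repetitions(k, log_n))


k, log_n = 20, 4
print(repetitions(k, log_n), faults_to_corrupt(repetitions(k, log_n)),
      corrupted_queries(k, log_n))  # 11 6 3: fewer than log n = 4 flipped
```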
6.4. Remarks
Thus far, all of our worst-case fault-tolerant construc-
tions of circuits, networks, and algorithms have been based
on the corresponding constructions for random faults. Since
our results for random faults are dependent on the fault-
tolerance properties of the l-AKS circuit proved in
Theorem 2.1, our results for worst-case faults are also
dependent on Theorem 2.1.
Given the difficulty of proving Theorem 2.1, it is worth
pointing out that all of our results for worst-case faults can
be proved independently of Theorem 2.1. For this purpose,
the substitute for Theorem 2.1 is a lemma that is essentially
due to Yao and Yao [20], who proved the lemma for
passive faults. In what follows, we define the Hamming distance $D(x, y)$ of any two 0-1 sequences $x, y \in \{0,1\}^n$ as the number of positions where $x$ and $y$ differ. For any (possibly faulty) circuit $C$, we use the notation $C(x)$ to denote the 0-1 sequence that is output by $C$ on input $x$. Yao and Yao’s lemma states that for any circuits $C$ and $C^*$, where $C^*$ is a faulty version of $C$ with at most $k$ passive or reversal faults, $D(C(x), C^*(x)) \le 2k$, where $x$ is any 0-1 sequence in $\{0,1\}^n$. Even though Yao and Yao considered passive faults only, their proof can actually be extended to deal with reversal faults. Given Yao and Yao’s lemma, we can construct
circuits, networks, and algorithms by an approach used in
[13]. In particular, we will need circuits that isolate extreme
items into a small group of registers. Such circuits can be
built by adapting the fault-tolerant minimum-finding algo-
rithm of [7] (see Lemma 5.1), or by adapting the passive-
fault-tolerant minimum-finding circuit of [6]. The details
are not simple and are omitted from this paper.
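Yao and Yao's bound $D(C(x), C^*(x)) \le 2k$ can be checked exhaustively on a small circuit. The sketch below uses our own conventions rather than anything from the paper: a circuit is a list of comparators $(a, b)$ that send the minimum to register $a$, faults are given as a set of comparator indices, a passive fault leaves both registers unchanged, and a reversal fault swaps the outputs. The example is a 4-register odd-even transposition circuit.

```python
from itertools import combinations, product
from typing import FrozenSet, List, Sequence, Tuple

Comparator = Tuple[int, int]  # (a, b): min goes to register a, max to b


def run(circuit: Sequence[Comparator], x: Sequence[int],
        faults: FrozenSet[int] = frozenset(), mode: str = "passive") -> List[int]:
    """Output of the circuit on x; comparators whose index is in `faults` misbehave."""
    y = list(x)
    for idx, (a, b) in enumerate(circuit):
        lo, hi = min(y[a], y[b]), max(y[a], y[b])
        if idx not in faults:
            y[a], y[b] = lo, hi
        elif mode == "reversal":
            y[a], y[b] = hi, lo
        # passive fault: y[a] and y[b] are left unchanged
    return y


def hamming(u: Sequence[int], v: Sequence[int]) -> int:
    return sum(p != q for p, q in zip(u, v))


def worst_distance(circuit: Sequence[Comparator], n: int, k: int, mode: str) -> int:
    """Max Hamming distance between C(x) and C*(x) over all 0-1 inputs x
    and all choices of exactly k faulty comparators."""
    worst = 0
    for x in product((0, 1), repeat=n):
        good = run(circuit, x)
        for fs in combinations(range(len(circuit)), k):
            worst = max(worst, hamming(good, run(circuit, x, frozenset(fs), mode)))
    return worst


oets4 = [(0, 1), (2, 3), (1, 2), (0, 1), (2, 3), (1, 2)]
for mode in ("passive", "reversal"):
    print(mode, [worst_distance(oets4, 4, k, mode) for k in range(4)])
```

The exhaustive loop is only practical for tiny circuits, but it exercises both fault models against the same $2k$ bound.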
APPENDIX A: PROOF OF THEOREM 4.1
As pointed out in the Introduction, to get the strongest possible result, $p$ should be interpreted as the failure probability of each comparator in the circuit, as opposed to an upper bound on the failure probability. Throughout the proof, we further assume that $p \le 1/2$. This assumption does not affect the generality of the proof, since a comparator failure probability of $p$ can be viewed as a failure probability of $1 - p$ if the MIN/MAX output assignment is reversed. We first prove the following useful lemma.
Lemma A.1. For any $m$-input circuit $C'$, where $m$ is sufficiently large (depending on $p$), the probability that a randomly faulty version of $C'$ is a $\Delta$-approximate-sorting circuit is at most $1 - p^{2\Delta+1}$ for $\Delta < (m-1)/2$.
The lemma states that in the reversal fault model, there is
an inherent limitation on the success probability that can be
achieved by any approximate-sorting circuit. The lemma
also expresses a trade-off between the accuracy of approxi-
mation and the degree of reliability that can be achieved by
any circuit with reversal faults. Such a trade-off will be used
to prove Theorem 4.1.
For ease of notation, we next introduce the notion of a
register segment as follows. As described in the Intro-
duction, the comparators in a circuit are partitioned into
disjoint levels. We assume that the comparators nearest to
the inputs are at level 1. Also, we assume that the inputs to
a circuit are provided at level 0 and the outputs are
produced at level d+1, where d denotes the depth of the
circuit. A register segment is defined to be the part of a
register between two consecutive levels. Note that each level
is used to partition all of the registers into corresponding
register segments, even in those cases where the number of
comparators in the level is less than half the number of
registers. Hence, an m-input d-depth circuit contains
(d+1) m register segments, regardless of its size.
We next prove Lemma A.1. Since the lemma is concerned with the functionality of $C'$, and not the depth of $C'$, we may assume without loss of generality that $C'$ contains only one comparator at each level. We label the registers in $C'$ as $r_1, \ldots, r_m$, from top to bottom. Let $r_{i,j}$ be the register segment of $r_i$ between levels $j$ and $j+1$ for $j = 0, \ldots, d$, where $d$ denotes the depth of $C'$. For $i = 1, \ldots, m$ and $j = 0, \ldots, d$, define $P_{i,j}$ to be the probability that 1 is contained in $r_{i,j}$ when a uniformly chosen permutation of $\{1, \ldots, m\}$ is fed into a randomly faulty version of $C'$.
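For any index set $I \subseteq \{1, \ldots, m\}$ such that $|I| < m$, consider the inequality
$$\sum_{i \in I} P_{i,j} \le 1 - p^{|I|}. \qquad (87)$$
It is easy to see that Lemma A.1 follows from Inequality (87) with $j = d$ and $I = \{1, \ldots, 2\Delta + 1\}$: a $\Delta$-approximate-sorting circuit must output item 1 to one of $r_1, \ldots, r_{\Delta+1}$, and $2\Delta + 1 < m$. We now prove Inequality (87) by induction on $j$.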
Base case: $j = 0$. On a uniformly chosen input permutation of $\{1, \ldots, m\}$, each input register segment contains 1 with probability $1/m$. Hence, we need to verify that $|I|/m \le 1 - p^{|I|}$ for any $I$ such that $|I| < m$. It suffices to show that
$$f(x) = 1 - p^{x} - \frac{x}{m} \ge 0 \qquad (88)$$
for $0 \le x \le m-1$. Clearly, $f'(x) = -p^{x}\ln p - 1/m$, and $f'(x) = 0$ has a unique root in $[0, m-1]$ when $m$ is large. Moreover, for $m$ sufficiently large (depending on $p$), $f'(0) > 0$ and $f'(m-1) < 0$. Hence, Inequality (88) follows from: (i) $f(0) = 0$; (ii) $f(m-1) = 1 - p^{m-1} - (m-1)/m = 1/m - p^{m-1} > 0$ for $m$ sufficiently large (depending on $p$).
Inductive step: Assuming that Inequality (87) holds up to $j-1$, we next prove that Inequality (87) holds for $j$. By assumption, there is only one comparator at level $j$ in $C'$. Assume that this comparator connects two registers $r_{i_1}$ and $r_{i_2}$. Clearly, if $I$ contains both $i_1$ and $i_2$, or if $I$ contains neither $i_1$ nor $i_2$, then $\sum_{i \in I} P_{i,j} = \sum_{i \in I} P_{i,j-1}$, and the correctness of Inequality (87) for $j$ follows from the induction hypothesis. Hence, we can assume without loss of generality that $i_1 \in I$ and $i_2 \notin I$. Let $I' = I \setminus \{i_1\}$. By the definition of reversal faults and the fact that $p \le 1/2$,
$$\begin{aligned}
\sum_{i \in I} P_{i,j} &\le \sum_{i \in I'} P_{i,j-1} + (1-p)(P_{i_1,j-1} + P_{i_2,j-1}) \\
&\le \sum_{i \in I'} P_{i,j-1} + (1-p)\Big(1 - \sum_{i \in I'} P_{i,j-1}\Big) \\
&= p\sum_{i \in I'} P_{i,j-1} + 1 - p \\
&\le p\,(1 - p^{|I'|}) + 1 - p \qquad \text{(by the induction hypothesis)} \\
&= 1 - p^{|I|}.
\end{aligned}$$
This concludes the inductive proof of Inequality (87), as well as the proof of Lemma A.1.
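Because item 1 is the smallest item, its position evolves as a Markov chain, so the probabilities $P_{i,j}$ can be computed exactly for any concrete circuit. The sketch below (hypothetical helper names, registers indexed from 0, comparators processed one per level as in the proof) propagates $P_{i,j}$ under independent reversal faults and then checks Inequality (87). For a fixed size $|I| = s$, the left-hand side of (87) is largest when $I$ collects the $s$ largest probabilities, so only those sets need testing.

```python
import random
from typing import List, Sequence, Tuple


def item_one_distribution(m: int, circuit: Sequence[Tuple[int, int]], p: float) -> List[float]:
    """P[i] = Pr(item 1 ends in register i) for a uniformly random input
    permutation and independent reversal faults with probability p.
    Each comparator (a, b) is meant to send the min to a and the max to b."""
    P = [1.0 / m] * m                     # base case j = 0
    for a, b in circuit:
        s = P[a] + P[b]                   # item 1 enters this comparator
        P[a], P[b] = (1 - p) * s, p * s   # correct: to a; reversed: to b
    return P


def satisfies_87(P: Sequence[float], p: float, tol: float = 1e-12) -> bool:
    """Check sum_{i in I} P[i] <= 1 - p**|I| for every I with |I| < m."""
    q = sorted(P, reverse=True)
    return all(sum(q[:s]) <= 1 - p ** s + tol for s in range(1, len(P)))


random.seed(1)
m, p = 8, 0.2
circuit = [tuple(random.sample(range(m), 2)) for _ in range(40)]
P = item_one_distribution(m, circuit, p)
print(round(P[0], 4), satisfies_87(P, p))
```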
In what follows, let
$$m = \frac{1-\gamma}{3}\log_{1/p} n + 1 \qquad (89)$$
and
$$\Delta = \left\lfloor \frac{m-2}{2} \right\rfloor. \qquad (90)$$
We next consider an arbitrary $n$-input circuit $C$, and prove the theorem by showing that the probability that a randomly faulty version of $C$ is a $\Delta$-approximate-sorting circuit is at most $e^{-n^{\gamma}\log(1/p)}$.
Roughly speaking, we will use Lemma A.1 to prove Theorem 4.1, as follows. Consider an input sequence consisting of $n/m$ groups of size $m$ such that all items in the $i$th group are smaller than all items in the $j$th group for $i < j$. If there is no comparator between items in different groups (intuitively, such comparisons provide no additional information), then the $n/m$ groups will be sorted by $n/m$ independent smaller circuits. By Lemma A.1, the probability that all of these groups will be $\Delta$-approximate-sorted is upper bounded by
$$\begin{aligned}
(1 - p^{2\Delta+1})^{n/m} &\le (1 - p^{m-1})^{n/m} = (1 - n^{-(1-\gamma)/3})^{n/m} \\
&\le \big(e^{-n^{-(1-\gamma)/3}}\big)^{n/m} \le e^{-n^{\gamma}\log(1/p)}
\end{aligned}$$
for $n$ sufficiently large, depending on $p$ and $\gamma$. To deal with the dependence problem caused by comparisons between items in different groups, it will be convenient to use some terminology introduced by Leighton and Ma in [12].
We define a fault pattern to be all of the information needed to specify the functionality of all comparators in $C$; i.e., a fault pattern completely describes which comparators in $C$ are faulty and which are not faulty. Given a fault pattern $F$, we use $C(F)$ to denote the faulty version of $C$ in which the (correct or faulty) behavior of each comparator is determined according to $F$. Let
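$$\pi = \big(\underbrace{1, \ldots, 1}_{m}, \underbrace{2, \ldots, 2}_{m}, \ldots, \underbrace{\lceil n/m \rceil - 1, \ldots, \lceil n/m \rceil - 1}_{m}, \underbrace{\lceil n/m \rceil, \ldots, \lceil n/m \rceil}_{n - m(\lceil n/m \rceil - 1)}\big).$$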
We define a comparator of $C(F)$ to be a crossing comparator if it compares two distinct items when $C(F)$ is executed on input sequence $\pi$. We use $\mathrm{Cross}(F)$ to denote the set of all crossing comparators of $C(F)$. We label the $n$ registers as $r_1, \ldots, r_n$, from top to bottom. We next describe a procedure to decompose $C(F)$ into $\lceil n/m \rceil$ smaller circuits, $C(F)_1, \ldots, C(F)_{\lceil n/m \rceil}$. For $i = 1, \ldots, \lceil n/m \rceil$, we construct $C(F)_i$ by applying the following 4-step procedure: (i) input $\pi$ to $C(F)$ and observe how the input items move within $C(F)$; (ii) include all register segments (at all levels) of $C$ that receive the item $i$; (iii) include all comparators for which both inputs receive item $i$; (iv) for any crossing comparator that causes item $i$ to move from $r_{u,j}$ to $r_{v,j+1}$ for some $j$ and $u \ne v$, directly connect $r_{u,j}$ and $r_{v,j+1}$ in $C(F)_i$. Due to the direct connections introduced in step (iv), $C(F)_i$ may contain some "twisted" registers, and look abnormal. However, these abnormalities will not prevent us from applying Lemma A.1 to $C(F)_i$. (As a matter of fact, we can construct a circuit that looks "normal" and that is equivalent to $C(F)_i$, but we do not need this fact to apply Lemma A.1.)
We next use an argument based on conditional probabilities to show that for a random fault pattern $F$, with the probability claimed in Theorem 4.1, $C(F)$ is not a $\Delta$-approximate-sorting circuit. In particular, we will organize all of the fault patterns into groups and prove that for a randomly generated fault pattern $F$ within each group, with the claimed probability, $C(F)$ is not a $\Delta$-approximate-sorting circuit.
We organize all of the fault patterns into groups according to $\mathrm{Cross}(F)$, as follows. For any two fault patterns $F$ and $F'$, we put $F$ and $F'$ in the same group if and only if $\mathrm{Cross}(F) = \mathrm{Cross}(F')$. Within each group thus constructed, we choose an arbitrary fault pattern as a representative of that group. Then, we list all of the representatives as $F_1, \ldots, F_t$ for some $t$. To prove the theorem, we only need to show that for each $s \le t$, for a randomly generated $F$ such that $\mathrm{Cross}(F) = \mathrm{Cross}(F_s)$, the probability that $C(F)$ is a $\Delta$-approximate-sorting circuit is at most $e^{-n^{\gamma}\log(1/p)}$. The reason such a conditional probability is easier to analyze is as follows: By definition, the information of $F$ on $\mathrm{Cross}(F)$ completely determines the decomposition of $C(F)$ into $C(F)_1, \ldots, C(F)_{\lceil n/m \rceil}$. Hence, full information of $F$ on $\mathrm{Cross}(F)$ completely determines the constructions of $C(F)_1, \ldots, C(F)_{\lceil n/m \rceil}$ (although not the functionality of these circuits) and will allow us to apply Lemma A.1 to the $\lfloor n/m \rfloor$ independent smaller circuits $C(F)_1, \ldots, C(F)_{\lfloor n/m \rfloor}$.
Now, for each st, we have
Pr(C(F ) is a 2-approximate-sorting circuit,
given Cross(F )=Cross(Fs))
Pr(C(F ) i is a 2-approximate-sorting circuit
for iwnmx, given Cross(F )=Cross(Fs))
= ‘
1iwnmx
Pr(C(F ) i is a 2-approximate-sorting circuit)
(since the behaviors of the C(F ) i ’s
are mutually independent)
(1&\m&1)w nm x (by Lemma A.1 and Eq. (90))
(1&n(#&1)3)w nmx (by Eq. (89))
(e&n(#&1)3)w nm x (by the inequality 1&xe&x)
e&n#log(1\) (for n sufficiently large,
depending on \ and #).
This completes the proof of Theorem 4.1.
APPENDIX B: PROOF OF THEOREM 4.2
As pointed out in the Introduction, to get the strongest possible result, $p$ should be interpreted as the failure probability of each comparator in the circuit, as opposed to an upper bound on the failure probability. In fact, if $p$
were interpreted as such an upper bound, the correctness
of Theorem 4.2 would follow from Theorem 6.3 with
k=O(log n).
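In what follows, we assume that
$$\varepsilon < 1/2. \qquad (91)$$
Let
$$\lambda = \left\lceil \log_4 \frac{n-1}{2\Delta} \right\rceil - 1 \qquad (92)$$
and
$$\mu = \lfloor \log_p 2\varepsilon \rfloor. \qquad (93)$$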
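The parameters in Eqs. (92) and (93) are easy to get off by one when computed with floating-point logarithms. The sketch below, with hypothetical names, computes them using exact integer and rational comparisons and checks the two facts the proof relies on: $(n-1)/4^{\lambda} > 2\Delta$ (used in Eq. (94)) and $p^{\mu}/2 \ge \varepsilon$. It assumes $0 < p < 1$, $\varepsilon < 1/2$ as in Eq. (91), and $n - 1 > 2\Delta$.

```python
from fractions import Fraction


def appendix_b_parameters(n: int, delta: int, p: Fraction, eps: Fraction):
    """lambda = ceil(log_4((n-1)/(2*delta))) - 1 and mu = floor(log_p(2*eps))."""
    L = 0
    while 4 ** L * 2 * delta < n - 1:     # smallest L with 4**L >= (n-1)/(2*delta)
        L += 1
    lam = L - 1
    mu = 0
    while p ** (mu + 1) >= 2 * eps:       # largest mu with p**mu >= 2*eps
        mu += 1
    return lam, mu


n, delta = 1025, 2
p, eps = Fraction(1, 10), Fraction(1, 1000)
lam, mu = appendix_b_parameters(n, delta, p, eps)
assert Fraction(n - 1, 4 ** lam) > 2 * delta   # needed for Eq. (94)
assert p ** mu / 2 >= eps                      # y is bad with prob. >= eps
print(lam, mu)  # 3 2
```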
The correctness of Theorem 4.2 follows from the next lemma with $\varepsilon = n^{-c}$ and $\Delta = n^{\gamma}$.
Lemma B.1. Any $(p, \varepsilon)$-reversal-fault-tolerant $\Delta$-approximate-insertion circuit has depth strictly greater than $\lambda\mu$.
We now prove Lemma B.1. Consider an arbitrary circuit
C that is a (\, =)-reversal-fault-tolerant approximate-inser-
tion circuit, and apply one of the two input permutations
$(1, \ldots, n)$ and $(n, 1, 2, \ldots, n-1)$, each with probability $1/2$,
to a randomly faulty version of C. Note that the item
contained in each register segment is a random variable
depending on the random input permutation and the
random faults assigned to C. (Recall that, as defined in
Appendix A, a register segment is the part of a register
between two consecutive levels in the circuit.)
For each register segment $x$, let $X$ denote the random variable corresponding to the item received by that register segment, and let $\mathrm{label}(x) \in [1, n]$ be chosen to satisfy
$$\Pr(X \le \mathrm{label}(x)) \ge 1/2 \quad\text{and}\quad \Pr(X \ge \mathrm{label}(x)) \ge 1/2.$$
Note that $\mathrm{label}(x)$ always exists but may not be unique. Define a register segment $y$ to be a descendant of a register segment $x$ if the item received by $x$ could subsequently enter $y$ in some faulty or nonfaulty version of $C$. Also, define the descendant set $D(x, i)$ of $x$ immediately after level $i$ to be the set of all descendants of $x$ between levels $i$ and $i+1$.
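Since a comparator can route an item to either of its outputs in some faulty or nonfaulty version of $C$, descendant sets reduce to forward reachability through the comparators. The sketch below is illustrative, with hypothetical names: registers are indexed from 0, `levels[t - 1]` holds the disjoint comparators of level $t$, and the function returns the registers whose segments between levels $j$ and $j+1$ form $D(x, j)$ when $x$ is the segment of register `r` between levels $i$ and $i+1$.

```python
from typing import Sequence, Set, Tuple

Level = Sequence[Tuple[int, int]]  # disjoint comparators of one level


def descendant_registers(levels: Sequence[Level], r: int, i: int, j: int) -> Set[int]:
    """Registers of the descendants of the segment of register r after
    level i, taken immediately after level j (i <= j). Any comparator that
    touches a reachable register makes both of its outputs reachable."""
    reach = {r}
    for t in range(i + 1, j + 1):          # levels i+1, ..., j
        for a, b in levels[t - 1]:
            if a in reach or b in reach:
                reach |= {a, b}
    return reach


levels = [[(0, 1), (2, 3)], [(1, 2)], [(0, 1), (2, 3)]]
print([sorted(descendant_registers(levels, 0, 0, j)) for j in range(4)])
# [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3]]
```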
For any i such that 0i4 and for any pair of numbers,
r1 and r2 , such that
r2&r1
n&1
4i
>22,
we make the following pair of definitions. First, a register
segment x between levels i+ and i++1 is defined to be
(r1 , r2)-bad if and only if x receives an item less than or
equal to r1 with probability at least = and receives an item
greater than or equal to r2 with probability at least =.
Second, a descendant set D(x, i+) is defined to be (r1 , r2)-
bad if and only if one of the following two conditions holds:
(i) x receives an item less than or equal to r1 with
probability at least =, and the label of every register segment
in D(x, i+) is greater than or equal to r2 , or
(ii) x receives an item greater than or equal to r2 with
probability at least =, and the label of every register segment
in D(x, i+) is less than or equal to r1 .
Claim B.1. No output register segment of $C$ can be $(r_1, r_2)$-bad. Furthermore, if $C$ has depth $\lambda\mu$, then $C$ cannot contain an $(r_1, r_2)$-bad descendant set $D(x, \lambda\mu)$ for any register segment $x$.
The first part of the claim is straightforward since $C$ is a $(p, \varepsilon)$-reversal-fault-tolerant $\Delta$-approximate-insertion circuit. Now consider the second part of the claim. Assume for the purposes of contradiction that $C$ has depth $\lambda\mu$ and contains an $(r_1, r_2)$-bad descendant set $D(x, \lambda\mu)$ for some $x$. Without loss of generality, we can assume that the first condition in the definition of a bad descendant set holds (the other case can be handled by an entirely symmetric argument). Under this assumption, with probability at least $\varepsilon$, $x$ receives an item less than or equal to $r_1$. This item will eventually enter some register segment in $D(x, \lambda\mu)$. On the other hand, every register segment in $D(x, \lambda\mu)$ receives an item greater than or equal to $r_2$ with probability at least $1/2 > \varepsilon$. Hence, as required by the insertion problem, every register segment $y$ in $D(x, \lambda\mu)$ is assigned an output rank greater than or equal to $r_2 - \Delta$. But if any register segment in $D(x, \lambda\mu)$ receives an item less than or equal to $r_1$, then $C$ will fail since $(r_2 - \Delta) - r_1 > \Delta$. This proves the claim.
Claim B.2. For any i such that 0i4, the circuit C
contains either an (r1 , r2)-bad register segment between levels
i+ and i++1, or an (r1 , r2)-bad descendant set D(x, i+) for
some register segment x.
We prove the claim by induction on $i$. For the base case, $i = 0$, note that the top input register segment $x$ receives 1 with probability $1/2 > \varepsilon$, and $n$ with probability $1/2 > \varepsilon$. Thus, $x$ is $(1, n)$-bad.
Now assume that the claim holds for $i = j$, $0 \le j < \lambda$, and consider the induction step, $i = j+1$. Let $s = (n-1)/4^{j}$. We first argue that if there is a register segment $x$ between levels $j\mu$ and $j\mu + 1$, and a register segment $y$ belonging to $D(x, (j+1)\mu)$, such that $|\mathrm{label}(y) - \mathrm{label}(x)| \ge s/4$, then $y$ is either $(\mathrm{label}(x), \mathrm{label}(y))$-bad or $(\mathrm{label}(y), \mathrm{label}(x))$-bad. To see this, assume without loss of generality that $\mathrm{label}(y) - \mathrm{label}(x) \ge s/4$ (the case where $\mathrm{label}(x) - \mathrm{label}(y) \ge s/4$ can be handled by an entirely symmetric argument). Note that $y$ receives an item less than or equal to $\mathrm{label}(x)$ with probability at least $p^{\mu}/2$, which is at least $\varepsilon$ by Eq. (93). Furthermore, $y$ receives an item greater than or equal to $\mathrm{label}(y)$ with probability at least $1/2 > \varepsilon$. Hence, $y$ is $(\mathrm{label}(x), \mathrm{label}(y))$-bad, since
$$\mathrm{label}(y) - \mathrm{label}(x) \ge \frac{s}{4} = \frac{n-1}{4^{j+1}} \ge \frac{n-1}{4^{\lambda}} > 2\Delta \qquad (94)$$
by Eq. (92).
By the argument of the preceding paragraph, we now have the additional assumption that for every register segment $x$ between levels $j\mu$ and $j\mu + 1$, and every register segment $y$ belonging to $D(x, (j+1)\mu)$, $|\mathrm{label}(y) - \mathrm{label}(x)| < s/4$. Proceeding with the induction step, there are two cases to consider.
Case 1: There is an $(r_1, r_2)$-bad register segment $x$ between levels $j\mu$ and $j\mu + 1$. Let $S$ denote the set of labels associated with the register segments in $D(x, (j+1)\mu)$. By our additional assumption, every pair of labels in $S$ differ by strictly less than $s/2$. On the other hand, $r_2 - r_1 \ge (n-1)/4^{j} = s$. Hence, either every label in $S$ exceeds $r_1$ by at least $s/4$, or $r_2$ exceeds every label in $S$ by at least $s/4$. Therefore, $D(x, (j+1)\mu)$ is either $(r_1, r_1 + s/4)$-bad or $(r_2 - s/4, r_2)$-bad since $s/4 > 2\Delta$ by Eq. (94).
Case 2: There is an $(r_1, r_2)$-bad descendant set $D(x, j\mu)$. We will assume that the first condition in the definition of an $(r_1, r_2)$-bad descendant set is satisfied; the other case can be handled by an entirely symmetric argument. Each register segment $z$ in $D(x, (j+1)\mu)$ is a descendant of some register segment $y$ in $D(x, j\mu)$. By our additional assumption, $\mathrm{label}(y) - \mathrm{label}(z) < s/4$. Furthermore, by the first condition in the definition of an $(r_1, r_2)$-bad descendant set, $\mathrm{label}(y) \ge r_2$. Hence, $\mathrm{label}(z) \ge r_2 - s/4$, and $D(x, (j+1)\mu)$ is $(r_1, r_2 - s/4)$-bad. This concludes the induction step for the proof of Claim B.2.
Finally, it is immediate that any circuit of depth $d$ (not necessarily a multiple of $\mu$) can be viewed as a circuit of depth $\lceil d/\mu \rceil \mu$ with the same behavior. Hence, the correctness of Lemma B.1 follows from Claims B.1 and B.2. This completes the proof of Theorem 4.2.
APPENDIX C: PROOF OF THEOREM 6.2
We focus on an arbitrary $n$-input circuit $C$ and show that some faulty version of $C$ with at most $k$ faults cannot $(k-1)$-approximate-sort. Since the theorem is concerned with the functionality of $C$, and not the depth of $C$, we can assume without loss of generality that each level of $C$ contains only one comparator. We use the notions of fault patterns and register segments as defined in the Introduction and Appendix A, respectively. Also, we use $C(F)$ to denote the faulty version of $C$ specified by fault pattern $F$. We label the $n$ registers of $C$ as $r_1, \ldots, r_n$, from top to bottom. Let $r_{i,j}$ denote the register segment of $r_i$ between levels $j$ and $j+1$ for $j = 0, \ldots, d$, where $d$ is the depth of $C$.
In what follows, we assume that $x$ is the item that is supposed to be output to register $r_1$ when a permutation of $\{1, \ldots, n\}$ is input to $C$. Let $I_k(j)$ denote the set of all indices $i$ such that there exists a fault pattern $F$ with $k$ or fewer faults, and a permutation $\pi$ of $\{1, \ldots, n\}$, such that $r_{i,j}$ contains $x$ when $\pi$ is input to $C(F)$. We next prove that
$$|I_k(j)| \ge k + 1 \quad \text{for } k < n. \qquad (95)$$
The fact that some faulty version of $C$ does not $(k-1)$-approximate-sort follows from the preceding inequality with $j = d$, since $x$ is supposed to be output to $r_1$ and there are at most $k$ registers within $k-1$ of $r_1$.
We prove Inequality (95) by induction on j. The base case
j=0 is trivial, because each of the n input register segments
can contain item x.
Assuming that Inequality (95) holds up to $j-1$, we prove that Inequality (95) holds for $j$. Assume
$$I_k(j-1) = \{i_1, \ldots, i_s\}.$$
By the induction hypothesis, $s \ge k+1$. Assume that the unique comparator at level $j$ connects registers $r_u$ and $r_v$. Since the comparator between $r_u$ and $r_v$ is the only comparator at level $j$, we have
$$I_k(j) \supseteq I_k(j-1) \setminus \{u\} \quad\text{or}\quad I_k(j) \supseteq I_k(j-1) \setminus \{v\}. \qquad (96)$$
There are three cases to consider:
Case 1: $u \notin I_k(j-1)$ and $v \notin I_k(j-1)$. In this case, $I_k(j) \supseteq I_k(j-1)$, and Inequality (95) follows from the induction hypothesis.
Case 2: Exactly one of $u$ and $v$ is in $I_k(j-1)$. Without loss of generality, we assume that $u \in I_k(j-1)$ and $v \notin I_k(j-1)$. Hence, on some input permutation, we can force $r_{u,j-1}$ to contain $x$ by using up to $k$ faults before and including level $j-1$. This will cause either $r_{u,j}$ or $r_{v,j}$ to contain $x$. Hence, by Relation (96), either $I_k(j) \supseteq I_k(j-1)$ or $I_k(j) \supseteq (I_k(j-1) \cup \{v\}) \setminus \{u\}$. In either case, Inequality (95) holds for $j$.
Case 3: $u \in I_k(j-1)$ and $v \in I_k(j-1)$. In this case, $I_k(j-1) \supseteq I_{k-1}(j-1) \cup \{u, v\}$. Hence, if neither $u$ nor $v$ is in $I_{k-1}(j-1)$, then
$$|I_k(j-1)| \ge |I_{k-1}(j-1)| + 2 \ge k + 2,$$
where the second inequality follows from the induction hypothesis. This, together with Relation (96), implies that Inequality (95) holds for $j$. Therefore, we only need to check the case where either $u$ or $v$ is in $I_{k-1}(j-1)$. Without loss of generality, we assume that $u \in I_{k-1}(j-1)$. Under this assumption, on some input permutation, we can force $r_{u,j-1}$ to contain $x$ by spending up to $k-1$ faults before and including level $j-1$. Then, by setting the unique comparator at level $j$ to be either faulty or correct, we can force $x$ to enter either $r_{u,j}$ or $r_{v,j}$, whichever we desire. This means $u \in I_k(j)$ and $v \in I_k(j)$. Hence, by Relation (96), $I_k(j) \supseteq I_k(j-1)$, which, together with the induction hypothesis, implies the correctness of Inequality (95) for $j$. This completes the inductive proof of Inequality (95), as well as the proof of Theorem 6.2.
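For any concrete circuit, the sets $I_k(j)$ can be computed by tracking, for each register, the fewest faults needed to bring $x$ there. The sketch below uses hypothetical names, registers indexed from 0, and the convention that a correct comparator $(a, b)$ sends the smaller item to $a$. Since $x = 1$ is the smallest item, a correct comparator it meets sends it to $a$ for free and a reversed one sends it to $b$ at the cost of one fault. Inequality (95) then says that at least $k+1$ registers have cost at most $k$.

```python
import random
from typing import List, Sequence, Tuple


def min_faults_to_reach(n: int, circuit: Sequence[Tuple[int, int]]) -> List[int]:
    """cost[i] = fewest reversal faults that can put item 1 in register i
    at the output, minimized over all input permutations."""
    cost = [0] * n                      # level 0: item 1 can be input anywhere
    for a, b in circuit:                # one comparator per level, in order
        best = min(cost[a], cost[b])    # cheapest way to have item 1 here
        cost[a], cost[b] = best, best + 1   # correct: free; reversed: one fault
    return cost


def reachable_with(cost: Sequence[int], k: int) -> List[int]:
    """Registers that can hold item 1 at the output using at most k faults."""
    return [i for i, c in enumerate(cost) if c <= k]


random.seed(0)
n = 6
circuit = [tuple(random.sample(range(n), 2)) for _ in range(20)]
cost = min_faults_to_reach(n, circuit)
print([len(reachable_with(cost, k)) for k in range(n)])  # entry k is >= k + 1
```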
APPENDIX D: PROOF OF THEOREM 6.3
Let C be a k-reversal-fault-tolerant k-approximate-
sorting circuit with depth d. We will use the notion of
register segments as defined in Appendix A. In particular, let
r(i, l ) be the register segment of C that contains i between
levels l and l+1 when the identity permutation (1, ..., n) is
input to the nonfaulty version of C. For example, r(i, 0) is
the i th input register segment. For ease of notation, we
assume that r(i, l )=r(i, 0) for l<0.
Claim D.1. If circuit $C$ contains at most $k$ faults, then for any integers $l \ge 0$ and $i \in [1, n]$, $r(i, d - lk)$ can only contain an item in $[i - 2^{l}k, i + 2^{l}k]$.
We will prove the claim by induction on l. For the base
case, l=0, the claim is true since (the nonfaulty) C should
k-approximate-sort all permutations, including the identity
permutation. Assuming that the claim holds for l&1, we
now prove the claim for l.
Recall that as defined in Appendix B, a register segment y
is a descendant of a register segment x if and only if the item
contained in x could subsequently enter y in some faulty or
nonfaulty version of C. Also, D(x, j) denotes the set of
descendants of x between levels j and j+1. We next prove
that
$$D\big(r(i, d - lk),\, d - (l-1)k\big) \subseteq \big\{ r(j, d - (l-1)k) : j \in [i - 2^{l-1}k,\, i + 2^{l-1}k] \big\}. \qquad (97)$$
Assume for the purposes of contradiction that the above relation does not hold. Then, there exists
$$j \notin [i - 2^{l-1}k,\, i + 2^{l-1}k] \qquad (98)$$
such that $r(j, d - (l-1)k)$ is a descendant of $r(i, d - lk)$. Hence, there is a path from $r(i, d - lk)$ to $r(j, d - (l-1)k)$ that spans $k$ levels in $C$. On the other hand, $r(i, d - lk)$ receives $i$ when the identity permutation is input to any version of $C$ that is fault-free in the first $d - lk$ levels. By forcing each of the $k$ or fewer comparators along the path from $r(i, d - lk)$ to $r(j, d - (l-1)k)$ to be faulty or correct, as appropriate, we can force $r(j, d - (l-1)k)$ to receive $i$. However, by the induction hypothesis, $r(j, d - (l-1)k)$ can only receive an item in $[j - 2^{l-1}k, j + 2^{l-1}k]$ when $C$ contains at most $k$ faults. Hence, we have $i \in [j - 2^{l-1}k, j + 2^{l-1}k]$, which contradicts Relation (98). This proves Relation (97).
By Relation (97), if on any input permutation $r(i, d - lk)$ receives an item $j'$ when $C$ contains $k$ or fewer faults, then $j'$ is eventually moved from $r(i, d - lk)$ to a register segment $r(j, d - (l-1)k)$ for some $j \in [i - 2^{l-1}k, i + 2^{l-1}k]$. On the other hand, by the induction hypothesis, $r(j, d - (l-1)k)$ can only receive items in $[j - 2^{l-1}k, j + 2^{l-1}k]$ when $C$ contains at most $k$ faults. Thus, $j' \in [j - 2^{l-1}k, j + 2^{l-1}k] \subseteq [i - 2^{l}k, i + 2^{l}k]$. This completes the induction step for proving Claim D.1.
By Claim D.1 with l=WdkX, we find that the input
register segment r(1, 0) can only contain numbers in
[1&2lk, 1+2lk]. On the other hand, the item input to
r(1, 0) can be any number in [1, n]. Hence, we have
1+2lkn, which yields l=WdkXlog((n&1)k). This
completes the proof of Theorem 6.3.
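The last step amounts to finding the smallest integer l with $1 + 2^{l}k \ge n$. The sketch below computes it with exact integer arithmetic and checks, for small n and k, that it agrees with $\lceil \log_2((n-1)/k) \rceil$. Reading log as base 2 is our assumption, consistent with the powers of 2 in Claim D.1; the function name `min_levels` is hypothetical.

```python
import math


def min_levels(n: int, k: int) -> int:
    """Smallest integer l >= 0 with 1 + 2**l * k >= n (exact integer arithmetic)."""
    l = 0
    while 1 + 2 ** l * k < n:
        l += 1
    return l


# For n >= k + 2 this coincides with ceil(log2((n - 1) / k)).
for k in range(1, 6):
    for n in range(k + 2, 200):
        assert min_levels(n, k) == math.ceil(math.log2((n - 1) / k))

print(min_levels(17, 1), min_levels(18, 1), min_levels(10, 3))  # 4 5 2
```

The loop avoids floating point in the definition itself; the `math.log2` comparison is only a cross-check.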
ACKNOWLEDGMENTS
The authors thank Yiqun Yin for suggesting the use of the probabilistic
method for a particular problem that arose in the early stages of this
research. Her suggestion eventually motivated the use of the potential function
that plays a very important role in the analysis of the AKS circuit.
REFERENCES
1. M. Ajtai, J. Komlo s, and E. Szemere di, Sorting in clog n parallel steps,
Combinatorica 3 (1983), 119; see also the conference version, in
‘‘Proceedings, of the 15th Annual ACM Symposium on the Theory of
Computing, May 1983,’’ pp. 19.
2. V. E. Alekseyev, Sorting algorithms with minimum memory, Kibernetika 5 (1969), 99–103.
3. N. Alon and J. H. Spencer, ‘‘The Probabilistic Method,’’ Wiley
Interscience, New York, 1991.
4. S. Assaf and E. Upfal, Fault-tolerant sorting network, in ‘‘Proceedings, 31st Annual IEEE Symposium on Foundations of Computer Science, October 1990,’’ pp. 275–284.
5. K. E. Batcher, Sorting networks and their applications, in ‘‘Proceedings of the AFIPS Spring Joint Computer Conference,’’ Vol. 32, pp. 307–314, 1968.
6. P. Donejko, K. Diks, A. Pelc, and M. Piotro w, Reliable minimum
finding comparator networks, in ‘‘Proceedings, 19th Symposium on
Mathematical Foundations of Computer Science, Kos ice, Slovakia,
August 1995’’ (I. Pr@ vara, B. Rovan, and P. Ruz ic ka, Eds.), Lect. Notes
in Comput. Sci., Vol. 841, pp. 306315, Springer-Verlag, New York,
1994.
7. U. Feige, D. Peleg, P. Raghavan, and E. Upfal, Computing with unreliable information, in ‘‘Proceedings, 22nd Annual ACM Symposium on the Theory of Computing, May 1990,’’ pp. 128–137.
8. D. Kleitman, T. Leighton, and Y. Ma, On the design of Boolean circuits that contain partially unreliable gates, in ‘‘Proceedings, 35th Annual IEEE Symposium on Foundations of Computer Science, November 1994,’’ pp. 332–346.
9. D. E. Knuth, ‘‘The Art of Computer Programming. Vol. 3. Sorting and Searching,’’ Addison–Wesley, Reading, MA, 1973.
10. T. Leighton, ‘‘Introduction to Parallel Algorithms and Architectures:
Arrays, Trees, and Hypercubes,’’ Morgan-Kaufmann, San Mateo, CA,
1992.
11. T. Leighton and Y. Ma, Breaking the Θ(n log² n) barrier for sorting with faults, in ‘‘Proceedings, 34th Annual IEEE Symposium on Foundations of Computer Science, November 1993,’’ pp. 734–743.
12. T. Leighton and Y. Ma, Tight bounds on the size of fault-tolerant merging and sorting networks with destructive faults, in ‘‘Proceedings, 5th Annual ACM Symposium on Parallel Algorithms and Architectures, July 1993,’’ pp. 30–41.
13. T. Leighton, Y. Ma, and C. G. Plaxton, Highly fault-tolerant sorting circuits, in ‘‘Proceedings, 32nd Annual IEEE Symposium on Foundations of Computer Science, October 1991,’’ pp. 458–469.
14. T. Leighton and B. Maggs, Expanders might be practical: Fast algorithms for routing around faults in multibutterflies, in ‘‘Proceedings, 30th Annual IEEE Symposium on Foundations of Computer Science, October 1989,’’ pp. 458–469.
15. A. Lubotzky, R. Phillips, and P. Sarnak, Ramanujan graphs, Combinatorica 8 (1988), 261–277.
16. M. S. Paterson, Improved sorting networks with O(log N) depth, Algorithmica 5 (1990), 75–92.
17. N. Pippenger, On networks of noisy gates, in ‘‘Proceedings, 26th Annual IEEE Symposium on Foundations of Computer Science, October 1985,’’ pp. 30–36.
18. L. Rudolph, A robust sorting network, IEEE Trans. Comput. 34 (1985), 344–354.
19. M. Schimmler and C. Starke, A correction network for N-sorter, SIAM J. Comput. 18 (1989), 1179–1187.
20. A. C. Yao and F. F. Yao, On fault-tolerant networks for sorting, SIAM J. Comput. 14 (1985), 120–128.
