Computing Linear Transformations with Unreliable Components by Yang, Yaoqing et al.
Computing Linear Transformations with Unreliable
Components
Yaoqing Yang, Pulkit Grover and Soummya Kar
Abstract—We consider the problem of computing a binary
linear transformation when all circuit components are unre-
liable. Two models of unreliable components are considered:
probabilistic errors and permanent errors. We introduce the
“ENCODED” technique that ensures that the error probability
of the computation of the linear transformation is kept bounded
below a small constant independent of the size of the linear
transformation even when all logic gates in the computation
are noisy. By deriving a lower bound, we show that in some
cases, the computational complexity of the ENCODED technique
achieves the optimal scaling in error probability. Further, we
examine the gain in energy-efficiency from use of a “voltage-
scaling” scheme where gate-energy is reduced by lowering the
supply voltage. We use a gate energy-reliability model to show
that tuning gate-energy appropriately at different stages of the
computation (“dynamic” voltage scaling), in conjunction with
ENCODED, can lead to orders of magnitude energy-savings over
the classical “uncoded” approach. Finally, we also examine the
problem of computing a linear transformation when noiseless
decoders can be used, providing upper and lower bounds to the
problem.
Index terms: error-correcting codes, encoding and decoding
errors, unreliable components, energy of coding and decoding.
I. INTRODUCTION
It is widely believed that noise and variation issues in
modern low-energy and low-area semiconductor devices call
for new design principles of circuits and systems [2]–[4].
From an energy-viewpoint, an urgent motivation for studying
noise in circuits comes from saturation of “Dennard’s scaling”
of energy with smaller technology [5]. Reducing CMOS
transistor size no longer leads to a guaranteed reduction in
energy consumption. Many novel devices are being explored
to continue reducing energy consumption, e.g. [6]. However,
such emerging low-energy technologies generally lack the
reliability of CMOS. On the other hand, aggressive design
principles, such as “voltage-scaling” (which is commonly used
in modern circuits), reduce energy consumption [7], but often
at a reliability cost: when the supply voltage is reduced below
the transistor’s threshold voltage, component variability results
in reduced control of component reliability. From an area
viewpoint, as transistors become smaller and clock frequencies
become higher, the noise margin of semiconductor devices
is reduced [8]. In fact, voltage variation, crosstalk, timing
jitter, thermal noise caused by increased power density, and
quantum effects can all jeopardize reliability.1 Beyond CMOS,
circuits for many emerging technologies, such as those built
out of carbon-nanotubes [10], suffer from reliability problems,
such as wire misalignment and metallic carbon-nanotubes [11].
Thus, for a host of factors, circuit reliability is becoming an
increasingly important issue.
A preliminary version of this work [1] was presented in part at the 2016
IEEE International Symposium on Information Theory (ISIT). This work
is supported by NSF ECCS-1343324, NSF CCF-1350314 (NSF CAREER)
and NSF CNS-1702694 for Pulkit Grover, NSF ECCS-1306128, NSF CCF-1513936,
the Bertucci Graduate Fellowship for Yaoqing Yang, and by Systems
on Nanoscale Information fabriCs (SONIC), one of the six SRC STARnet
Centers, sponsored by MARCO and DARPA.
Y. Yang, P. Grover and S. Kar are with the Department of Electrical and
Computer Engineering, Carnegie Mellon University, Pittsburgh, PA, 15213,
USA. Email: {yyaoqing,pgrover,soummyak}@andrew.cmu.edu
While most modern implementations use overwhelmingly
reliable transistors, an appealing idea is to deliberately allow
errors in computation, and design circuits and systems that
“embrace randomness and statistics, treating them as oppor-
tunities rather than problems” [3]. Inspired by the triumph
of Shannon theory in dealing with noise in communication
channels [12], von Neumann initiated the study of noise
in circuits [13]. He showed that even when circuit compo-
nents are noisy, it is possible to bias the output towards
the correct output using repetition-based schemes. Repeated
computations, followed by majority voting, have been used in
some applications to make circuits error-tolerant [14]–[16]. In
fact, many functional-block-level or algorithmic error-tolerant
designs have been tested on real systems [14], [17], [18].
However, in the absence of a comprehensive understanding of
the fundamental tradeoffs between redundancy and reliability,
these designs have no guarantees on the gap from optimality.
Can use of sophisticated codes help? For storage, which can
be viewed as computing the identity function, Low-Density
Parity-Check (LDPC) codes [19] and Expander codes [20]
have been used to correct errors [21]–[24]. Closer in spirit to
computation with noisy elements, in [21]–[23], the decoders
(though not the encoders) for storage are assumed to be noisy
as well. In [25], adaptive coding is used to correct memory
faults for fault-tolerant approximate computing. Decoding with
noisy elements has become an area of active research [26]–
[30]. In [26]–[30], noisy decoders performing message-passing
algorithms are analyzed using the density evolution tech-
nique [31], [32]. The idea of using noisy decoders, and
yet achieving reliable performance, is further extended to
noisy discrete-time error-resilient linear systems in [33], where
LDPC decoding is utilized to correct state errors after each
state transition. Error control coding is also used in fault-tolerant
parallel computing [34], AND-type one-step computing
with unreliable components [35] and applied to error-resilient
systems on chips (SoCs) [36]. Unfortunately, all of the
above works on using sophisticated codes in noisy computing
have one major intellectual and practical shortcoming: while
they use noisy gates to perform some computations, they all
assume absolute reliability in either the encoding part, or the
decoding part, or both.
1e.g. in source-drain channels of small CMOS transistors, the number
of electrons can be so few that laws of large numbers may not apply [9],
increasing the component variability.
arXiv:1506.07234v3 [cs.IT] 13 May 2017
The perspective of allowing some noiseless gates in noisy
computing problems has permeated the investigation of
fundamental limits as well (e.g. [37]–[39]), where, assuming
that encoding and/or decoding are free, the authors derive
fundamental limits on required resources for computation with
noisy elements with no assumptions on the computation strat-
egy. Can one choose to ignore costs associated with encoding
or decoding? While ignoring these costs is reasonable in
long-range noisy communication problems [40], where the
required transmit energy tends to dominate encoding/decoding
computation energy, recent work shows this can yield unreal-
istically optimistic results in short-range communication [40]–
[44] and noisy computing [45], especially in the context
of energy. These works derive fundamental limits for sim-
plistic implementation models that account for total energy
consumption, including that of encoding and decoding, in
communication [41]–[44] and computing [45].
In this paper, we investigate the problem of reliable2
computation of binary linear transformations using circuits
built entirely out of unreliable components, including the
circuitry for introducing redundancy and correcting errors.
In Section III, we study the problem of computing linear
transformations using homogeneous noisy gates, all of which
are drawn from the same faulty gate model. We consider
both probabilistic error models (transient gate errors) [46] and
permanent-errors models (defective gates) [28]. The problem
formulation and reliability models are detailed in Section II.
The key to our construction is the “ENCODED” technique
(Encoded Computation with Decoders EmbeddeD), in which
noisy decoders are embedded inside the noisy encoder to re-
peatedly suppress errors (Section IV). The entire computation
process is partitioned into multiple stages by utilizing the
properties of an encoded form of the linear transformation
matrix (see Section III-A for details). In each stage, errors are
introduced due to gate failures, and then suppressed by embed-
ded noisy decoders [27], preventing them from accumulating.
Intuition on why embedded decoders are useful is provided in
Section III-B.
2Note that the notion of reliability here differs from that in Shannon theory.
The goal here is to bound the error-probability by a small constant that
depends on the gate-error probability, but does not depend on the size of
the computation.
In Sections III and IV, we show that using ENCODED
with LDPC decoders, an L×K binary linear transformation
can be computed with O(L) operations per output bit, while
the output bit error probability is maintained below a small
constant that is independent of L and K. In Section IV-C,
we use expander LDPC codes to achieve worst-case error
tolerance, while still using error-prone
decoding circuitry. We show that ENCODED can tolerate
defective-gate errors as long as the fraction of defective gates
is below a small constant. We also obtain a stronger result on
the computational complexity when the block error probability,
instead of bit error probability, is specified: by deriving a
fundamental lower bound (when the linear transform has full
row rank), we show that the computational complexity per
bit matches the lower bound in the scaling of the target error
probability, as the required block error probability approaches
zero. Interestingly, in the derivation of this lower bound, we
allow the circuit to use noiseless gates to perform decoding
operations. In Section IV-D, we use simulations to show that
using exactly the same types of noisy gates (even with the
same fan-in), the achieved bit error ratio and the number
of iterations of ENCODED are both smaller than those of
repetition-based schemes. Since computing energy is closely
related to the number of operations, this shows an energy
advantage of our ENCODED technique as well.
In Section V, we go a step further and systematically
study the effect of tunable supply voltage (“dynamic” voltage
scaling) on the total energy consumption by modeling energy-
reliability tradeoffs at gate-level. For dynamic scaling, the
gates are no longer homogeneous. We introduce a two phase
algorithm in which the first phase is similar to ENCODED
with homogeneous gates, but in the second phase, the voltage
(and hence gate-energy) is tuned appropriately, which leads
to orders of magnitude energy savings when compared with
“static” voltage scaling (where the supply voltage is kept
constant through the entire computation process). For example,
when the required output bit error probability is $p_{\mathrm{tar}}$, for
polynomial decay of gate error probability $\epsilon$ with gate energy
$E$ (i.e., $\epsilon = \frac{1}{E^c}$), the energy consumption per output bit
is $O\left(\frac{N}{K}\max\left\{L, \left(\frac{1}{p_{\mathrm{tar}}}\right)^{\frac{1}{c}}\right\}\right)$ with dynamic voltage scaling,
while it is $\Theta\left(\frac{NL}{K}\left(\frac{1}{p_{\mathrm{tar}}}\right)^{\frac{1}{c}}\right)$ for the static case (we note that
energy for ENCODED with static voltage scaling is still
smaller than “uncoded” with static voltage scaling, which
is $\Omega\left(L\left(\frac{L}{p_{\mathrm{tar}}}\right)^{\frac{1}{c}}\right)$). Finally, in Section VI, for deriving a lower
bound as well as to connect with much of the existing litera-
ture, we allow the circuit to use noiseless gates for decoding.
We derive (asymptotically) matching upper and lower bounds
on required number of gates to attain a target error-probability.
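To make the comparison concrete, the following Python sketch evaluates the three scaling expressions above with every hidden constant set to 1. This is only an illustration under that assumption: the function names and the sample parameter values are introduced here and do not appear in the paper, and the O, Θ and Ω statements say nothing about the actual constants.

```python
import math

def dynamic_energy(N, K, L, p_tar, c):
    """(N/K) * max{L, (1/p_tar)^(1/c)}, constants set to 1 (assumption)."""
    return (N / K) * max(L, (1.0 / p_tar) ** (1.0 / c))

def static_energy(N, K, L, p_tar, c):
    """(N*L/K) * (1/p_tar)^(1/c), constants set to 1 (assumption)."""
    return (N * L / K) * (1.0 / p_tar) ** (1.0 / c)

def uncoded_energy(L, p_tar, c):
    """L * (L/p_tar)^(1/c), constants set to 1 (assumption)."""
    return L * (L / p_tar) ** (1.0 / c)

# Hypothetical sample point: rate-1/2 code, L = K = 1000, c = 2.
N, K, L, p_tar, c = 2000, 1000, 1000, 1e-6, 2.0
ratio_static_over_dynamic = (static_energy(N, K, L, p_tar, c)
                             / dynamic_energy(N, K, L, p_tar, c))
print("static / dynamic :", ratio_static_over_dynamic)
print("uncoded / static :", uncoded_energy(L, p_tar, c)
      / static_energy(N, K, L, p_tar, c))
```

With constants ignored, the static-to-dynamic ratio reduces to min{L, (1/p_tar)^(1/c)}, which is where the savings attributed to dynamic voltage scaling come from.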
A. Related Work
In spirit, our scheme is similar to von Neumann’s repetition-
based construction [13] where an error-correction stage fol-
lows each computation stage to keep errors suppressed. Sub-
sequent works [47]–[49] focus on minimizing the number of
redundant gates while making error probability below a small
constant. The difference from our work here is that these
works do not allow (noiseless) precomputation based on the
knowledge of the required function, which our scheme
(ENCODED) explicitly relies on. Therefore, our results are
applicable when the same function needs to be computed
multiple times for (possibly) different inputs, and thus the one-
time cost of a precomputation is worth paying for. Thus, we do
not include the preprocessing costs of the linear transformation
matrix in the computational complexity calculation, and we
assume all preprocessing can be done offline in a noise-free
fashion3.
We note that the algorithm introduced by Hadjicostis in [50],
which is applied to finite-state linear systems, is similar to ours
in that he also uses a matrix encoding scheme. However, [50]
assumes that encoding and decoding procedures are noiseless,
which we do not assume. In [51], Cucu Laurenciu, et al.
designed a fault-tolerant computing scheme that embeds the
encoding of an error control code into the logical functionality
of the circuit. Decoding units are allowed to be noisy as
well. However, the computation size is small, and theoretical
guarantees are not provided. In [48, Theorem 4.4], Pippenger
designed an algorithm to compute a binary linear transforma-
tion with noisy gates. The algorithm requires gates with fan-in
$2^{23}$ and a gate-error probability of $35 \cdot 2^{-50}$. While the fan-in
values are unrealistically high, the gate-error probability
is also low enough that most practical computations can be
executed correctly using “uncoded” strategies, possibly the
reason why it has not received significant attention within
the circuits community. At a technical level, unlike the multi-stage
computing scheme used in our work, Pippenger uses
exhaustive enumeration of all linear combinations with length
$\frac{1}{3}\log L$ for computing, where L is the number of rows in
the binary linear transformation. Lastly, we note here that
Pippenger’s scheme only works for the case when the number
of columns K in the binary linear transformation matrix
and the code length N of the utilized LDPC code satisfy
$K = \Theta(N^3)$, while our algorithm works in a more practical
scenario where $K = \Theta(N)$.
This work builds on our earlier work [52], in which the
problem of reliable communication with a noisy encoder is
studied. In [52], noisy decoders are embedded in the noisy
encoder to repeatedly suppress errors. The noisy encoding
problem is a special case of noisy computation of a linear transformation
in which the linear transformation matrix is the generator
matrix of an error-correcting code. In [53], an augmented
encoding approach was introduced to protect the encoder from
hardware faults using extra parity bits. In [54], which considers
a similar problem, errors are modelled as erasures on the
encoding Tanner graph.
Outside information theory, fault-tolerant linear transfor-
mations and related matrix operations have been studied
extensively in algorithm-based fault tolerance [55]–[59]. The
main difference in our model is that faults happen at the
circuit-level, e.g., in AND gates and XOR gates. Instead, in
[55]–[59], each functional block, e.g. a vector inner product,
fails with a constant probability. If errors are considered at
gate level, the error probability of a vector inner product will
approach 1/2 [33] as vector size grows, and one may not
be able to use these schemes. Fault-detection algorithms on
circuits and systems with unreliable computation units have
also been studied extensively [14], [60]–[63]. However, these
algorithms assume that the detection units are reliable, which
we do not assume. Moreover, using error control coding, we
can combine the error detection and correction in the same
processing unit.
3This difference in problem formulation is also why some of our achievable
results on computational complexity might appear to beat the lower bounds
of [47]–[49].
II. SYSTEM MODEL AND PROBLEM FORMULATION
A. Circuit Model
We first introduce unreliable gate models and circuit models
that we will use in this paper. We consider two types of
unreliable gates: probabilistic gates and defective gates.
Definition 1. (Gate Model I (D, ε)) The gates in this model
are probabilistically unreliable in that they compute a deterministic
boolean function g with additional noise $z_g$:
$$y = g(u_1, u_2, \dots, u_{d_g}) \oplus z_g, \tag{1}$$
where $d_g$ denotes the number of inputs and is bounded above
by a constant D > 3, ⊕ denotes the XOR-operation and $z_g$
is a boolean random variable which takes the value 1 with
probability ε, which is assumed to be smaller than 1/2. The event
$z_g = 1$ means the gate g fails and flips the correct output.
Furthermore, in this model, all gates fail independently of each
other and the failure events during multiple uses of a single
gate are also independent of each other. We allow different
kinds of gates (e.g. XOR, majority, etc.) to fail with different
probabilities. However, different gates of the same kind are
assumed to fail with the same error probability4.
This model is similar to the one studied in [49] and the
failure event is often referred to as a transient fault. Our next
model abstracts defective gates that suffer from permanent
failures.
Definition 2. (Gate Model II (D, n, α)) In a set of n gates,
each gate is either perfect or defective. A perfect gate always
yields a correct output function
$$y = g(u_1, u_2, \dots, u_{d_g}), \tag{2}$$
where $d_g$ denotes the number of inputs and is bounded above
by a constant D > 3. A defective gate outputs a deterministic
boolean function of the correct output, $\tilde{y} = f(g(\cdot))$. This
function can be either $f(x) = \bar{x}$ (NOT function), f(x) = 0 or
f(x) = 1. The fraction of defective gates in the set of n gates
is denoted by α. We assume that measurement techniques
cannot be used to distinguish between defective gates and
perfect gates5.
From the definition, a defective gate may repeatedly output
the value 1 no matter what the input is, which is often referred
to as a “stuck-at error”. This might happen, for example, when
a circuit wire gets shorted.
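The two gate models can be mimicked in a few lines of Python. The sketch below is only a simulation aid under stated assumptions: the helper names (`noisy_gate`, `make_defective`, the mode strings) are hypothetical, and a pseudo-random generator stands in for the physical noise source.

```python
import random

def xor_gate(bits):
    """Noise-free XOR of all input bits."""
    out = 0
    for b in bits:
        out ^= b
    return out

def majority_gate(bits):
    """Noise-free majority vote over the input bits."""
    return 1 if 2 * sum(bits) > len(bits) else 0

def noisy_gate(g, inputs, eps, rng):
    """Gate Model I: y = g(inputs) XOR z, with z ~ Bernoulli(eps).

    A fresh z is drawn on every call, so repeated uses of the same
    gate fail independently, as the model assumes.
    """
    z = 1 if rng.random() < eps else 0
    return g(inputs) ^ z

def make_defective(g, mode):
    """Gate Model II: a defective gate outputs f(g(.)) for a fixed f."""
    if mode == "invert":      # f(x) = NOT x
        return lambda bits: 1 - g(bits)
    if mode == "stuck0":      # f(x) = 0
        return lambda bits: 0
    if mode == "stuck1":      # f(x) = 1 (the stuck-at error mentioned above)
        return lambda bits: 1
    raise ValueError("unknown defect mode: " + mode)

rng = random.Random(0)
print(noisy_gate(xor_gate, [1, 0, 1], 0.0, rng))       # eps = 0: never flipped
print(make_defective(majority_gate, "stuck1")([0, 0, 0]))
```

The difference between the models shows up on repeated use: a Model I gate errs on a fresh coin flip each time, while a Model II gate applies the same fixed distortion on every activation.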
Remark 1. In fact, we can generalize all the results in this
paper on the permanent error model (Gate Model II) to
an arbitrary but bounded error model, in which errors occur in
a worst-case fashion but no more than a fixed fraction. The
latter error model has been used in worst-case analyses in
coding theory and arbitrarily varying channels [67]. However,
for consistency with the existing literature on error-prone
decoding with permanent errors [23], [28], we limit our
exposition to these errors.
4A weaker assumption is that different gates fail independently, but with
different probabilities all smaller than ε, which is called ε-approximate [48].
The ENCODED technique also works for this model. Also note that our model
is limited in the sense that the error probability ε does not depend on the gate
input. This may not be realistic because the gate error probability can also
depend on the input and even the previous gate outputs, which is also noted
in [64]. However, the assumption that ε does not depend on the gate input
can be relaxed by assuming that ε is the maximum error probability over all
different input instances.
5Defective gates may result from component aging after being sold, and
examining each gate in circuitry is in practice extremely hard. The storage
units are easier to examine [65], but replacing faulty memory cells requires
replacing an entire row or column in the memory cell array [66].
Fig. 1. This figure shows an unreliable gate g (Gate Model I or II) in a
noisy circuit defined in Definition 4. A noisy circuit is constituted by many
unreliable gates, which form the set G.
The computation in a noisy circuit is assumed to proceed
in discrete steps for which it is helpful to have circuits that
have storage components.
Definition 3. (Register) A register is an error-free storage unit
that outputs the stored binary value. A register has one input.
At the end of a time slot, the stored value in a register is
changed to its input value if this register is chosen to be
updated.
Remark 2. We assume that registers are noise-free only for
clarity of exposition. It is relatively straightforward to incor-
porate in our analysis the case when registers fail probabilisti-
cally. A small increase in error probability of gates can absorb
the error probability of registers. A similar change allows us
to incorporate permanent errors in the registers as well.
Definition 4. (Noisy Circuit Model (G,R)) A noisy circuit
is a network of binary inputs $s = (s_1, s_2, \dots, s_L)$, unreliable
gates $G = \{g_1, g_2, \dots, g_S\}$ and registers $R = \{r_1, r_2, \dots, r_T\}$.
Each unreliable gate g ∈ G can have inputs that are elements
of s, outputs of other gates, or outputs of
registers. That is, the inputs to an unreliable gate g are
$s_{i_1}, \dots, s_{i_a}, y_{j_1}, \dots, y_{j_b}, r_{k_1}, \dots, r_{k_c}$, where $a + b + c = d_g$,
the total number of inputs to this gate. Each register r ∈ R
can have its single input from the circuit inputs s, outputs of
unreliable gates or outputs of other registers. For simplicity,
wires in a noisy circuit are assumed to be noiseless.
Definition 5. (Noisy Computation Model $(L, K, N_{\mathrm{comp}})$) A
computing scheme F employs a noisy circuit to compute a
set of binary outputs $r = (r_1, r_2, \dots, r_K)$ according to a set
of binary inputs $s = (s_1, s_2, \dots, s_L)$ in multiple stages. At
each stage, a subset of all unreliable gates G is activated
to perform a computation and a subset of all registers R is
updated. At the completion of the final stage, the computation
outputs are stored in a subset of R. The number of activated
unreliable gates in the t-th stage is denoted by $N^t_{\mathrm{comp}}$. Denote
by $N_{\mathrm{comp}}$ the total number of unreliable operations (one
unreliable operation means one activation of a single unreliable
gate) executed in the noisy computation scheme, which is
obtained by
$$N_{\mathrm{comp}} = \sum_{t=1}^{T} N^t_{\mathrm{comp}}, \tag{3}$$
where T is the total number of stages, which is predetermined.
The noisy computation model is the same as a sequential
circuit with a clock. The number of stages T is the number of
time slots that we use to compute the linear transform. In each
time slot t, the circuit computes an intermediate function ft(x)
using the computation units on the circuit, and the result ft(x)
is stored in the registers for the computation in the next time
slot t + 1. The overall number of stages T is predetermined
(fixed before the computation starts).
Remark 3. A computing scheme should be feasible, that is, in
each time slot, all the gates that provide inputs to an activated
gate, or a register to be updated, should be activated.
In this paper, we will only consider noisy circuits that are
either composed entirely of probabilistic gates defined in Gate
Model I or entirely of unreliable gates in Gate Model II. Note
that if we consider probabilistic gates, the noisy circuit can
be transformed into an equivalent circuit that does not have
registers. This is because, since the probabilistic gate failures
are (assumed to be) independent over operations, we can
replicate each gate in the original circuit multiple times such
that each gate in the equivalent circuit is only activated once.
This circuit transformation is used in the proof of Theorem 4.
B. Problem Statement
The problem considered in this paper is that of computing
a binary linear transformation r = s ·A using a noisy circuit,
where the input vector s = (s1, s2, ...sL), the output vector
r = (r1, r2, ...rK) and the L-by-K (linear transformation)
matrix A are all composed of binary entries. We consider
the problem of designing a feasible (see Remark 3) computing
scheme F for computing r = s·A with respect to Definition 5.
Suppose the correct output is r. Denote by $\hat{r} = (\hat{r}_1, \hat{r}_2, \dots, \hat{r}_K)$
the (random) output vector of the designed computing scheme
F. Note that the number of operations $N_{\mathrm{comp}}$ has been defined
in Definition 5. The computational complexity per bit $N_{\text{per-bit}}$
is defined as the total number of operations per output bit in
the computing scheme. That is,
$$N_{\text{per-bit}} = N_{\mathrm{comp}}/K. \tag{4}$$
For gates from Gate Model I (Definition 1), we are interested
in the usual metrics of bit-error probability
$P_e^{\mathrm{bit}} = \frac{1}{K}\sum_{k=1}^{K} \Pr(\hat{r}_k \neq r_k)$ and block-error probability
$P_e^{\mathrm{blk}} = \Pr(\hat{r} \neq r)$, averaged over uniformly distributed inputs s and noise
realizations. In addition, in the spirit of “excess distortion”
formulation in information theory [68], we are also interested
in keeping the fraction of (output) errors bounded with high
probability. This could be of interest, e.g., in approximate
computing problems. To that end, we define another metric,
$\delta_e^{\mathrm{frac}}$, the “bit-error fraction,” which is simply the Hamming
distortion between the computed output and the correct output
(per output bit). That is, $\delta_e^{\mathrm{frac}} = \max_{s} \frac{1}{K}\sum_{k=1}^{K} \mathbb{1}_{\{\hat{r}_k \neq r_k\}}$, where
$\mathbb{1}_{\{\cdot\}}$ is the indicator function. The bit-error fraction depends
on the noise, which is random in Gate Model I. Thus, we will
constrain it probabilistically (see Problem 2).6 The resulting
problems are stated as follows:
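Problem 1.
$$\min_{F} N_{\text{per-bit}}, \quad \text{s.t. } P_e < p_{\mathrm{tar}}, \tag{5}$$
where $p_{\mathrm{tar}} > 0$ is the target bit error probability, and $P_e$ could
be $P_e^{\mathrm{bit}}$ or $P_e^{\mathrm{blk}}$.
Problem 2.
$$\min_{F} N_{\text{per-bit}}, \quad \text{s.t. } \Pr(\delta_e^{\mathrm{frac}} < p_{\mathrm{tar}}) > 1 - \delta, \tag{6}$$
where $p_{\mathrm{tar}} > 0$ is the target bit-error fraction and δ is a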
small constant.
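The error metrics used in Problems 1 and 2 can be estimated from simulated runs. The sketch below is a minimal illustration, assuming each trial is recorded as a pair (computed output, correct output); the function names are hypothetical, and the maximum over inputs s in the definition of the bit-error fraction is not taken here, only the per-trial Hamming fraction.

```python
def error_fraction(r_hat, r):
    """Per-trial Hamming distortion (1/K) * sum_k 1{r_hat_k != r_k}."""
    return sum(a != b for a, b in zip(r_hat, r)) / len(r)

def empirical_metrics(trials, p_tar):
    """Estimate (P_bit, P_blk, Pr(fraction < p_tar)) from (r_hat, r) pairs.

    All outputs are assumed to have the same length K, so averaging the
    per-trial fractions equals averaging over all output bits.
    """
    fracs = [error_fraction(r_hat, r) for r_hat, r in trials]
    p_bit = sum(fracs) / len(fracs)
    p_blk = sum(f > 0 for f in fracs) / len(fracs)
    prob_frac_ok = sum(f < p_tar for f in fracs) / len(fracs)
    return p_bit, p_blk, prob_frac_ok

# Hypothetical recorded trials with K = 4 output bits each.
correct = [0, 1, 1, 0]
trials = [([0, 1, 1, 0], correct),
          ([1, 1, 1, 0], correct),
          ([0, 0, 0, 0], correct),
          ([0, 1, 1, 0], correct)]
print(empirical_metrics(trials, p_tar=0.3))
```

A block error is counted whenever at least one bit is wrong, so the block-error estimate is never smaller than the bit-error estimate.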
When we consider the Gate Model II (Definition 2), since
all gates are deterministic functions, we are interested in
the worst-case fraction of errors δfrace . Thus, the optimization
problem can be stated as follows:
Problem 3.
$$\min_{F} N_{\text{per-bit}}, \quad \text{s.t. } \max_{s,\, S^i_{\mathrm{def}} \text{ s.t. } |S^i_{\mathrm{def}}| < \alpha_i n_{F,i},\, \forall i \in W} \delta_e^{\mathrm{frac}} < p_{\mathrm{tar}}, \tag{7}$$
where s is the input vector, $S^i_{\mathrm{def}}$ is the set of defective gates of
type i, W is the set of indices of different types of noisy gates
(such as AND gates, XOR gates and majority gates), $\alpha_i$ is the
error fraction of the gates of type i, $n_{F,i}$ is the total number
of gates of type i in the implementation of F, and $p_{\mathrm{tar}} > 0$ is
the target fraction of errors. Note that $n_{F,i}$ is chosen by the
designer as a part of choosing F, while the error-fraction $\alpha_i$
is assumed to be known to the designer in advance.
Throughout this paper, we rely on the family of Bachmann-
Landau notation [69] (i.e. “big-O” notation). For any two
functions f(x) and g(x) defined on some subset of R, asymp-
totically (as x → ∞), f(x) = O(g(x)) if |f(x)| ≤ c2|g(x)|;
f(x) = Ω(g(x)) if |f(x)| ≥ c1|g(x)|; and f(x) = Θ(g(x))
if c3|g(x)| ≤ |f(x)| ≤ c4|g(x)| for some positive real-valued
constants c1, c2, c3, c4.
C. Technical Preliminaries
First we state a lemma that we will use frequently.
Lemma 1 ([19], p. 41, Lemma 4.1). Suppose $X_i$, i = 1, . . . , L,
are independent Bernoulli random variables and
$\Pr(X_i = 1) = p_i$, ∀i. Then
$$\Pr\left(\sum_{i=1}^{L} X_i = 1\right) = \frac{1}{2}\left[1 - \prod_{i=1}^{L}(1 - 2p_i)\right], \tag{8}$$
where the summation is over F2, i.e., 1 + 1 = 0.
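Lemma 1 can be checked numerically on small instances. The following sketch compares the closed form with a brute-force sum over all 2^L outcomes; the function names and the sample probabilities are illustrative choices, not taken from the paper.

```python
from itertools import product

def parity_one_prob_formula(ps):
    """Closed form of Lemma 1: 0.5 * (1 - prod_i (1 - 2 p_i))."""
    prod = 1.0
    for p in ps:
        prod *= 1.0 - 2.0 * p
    return 0.5 * (1.0 - prod)

def parity_one_prob_bruteforce(ps):
    """Sum the probability of every outcome whose GF(2) sum equals 1."""
    total = 0.0
    for bits in product([0, 1], repeat=len(ps)):
        if sum(bits) % 2 == 1:
            prob = 1.0
            for b, p in zip(bits, ps):
                prob *= p if b else 1.0 - p
            total += prob
    return total

ps = [0.1, 0.2, 0.3, 0.05]   # hypothetical error probabilities
print(parity_one_prob_formula(ps), parity_one_prob_bruteforce(ps))
```

Since every factor (1 − 2p_i) lies in (0, 1] when p_i < 1/2, the product shrinks as L grows and the probability approaches 1/2, which is why an unprotected XOR of many noisy bits becomes nearly useless.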
6We will show that the bit-error fraction is constrained probabilistically
(see Problem 2) for all input vectors s.
We will use error control coding to facilitate the computation
of the binary linear transformation. Here, we introduce
some notation related to the codes that we will use. We will
use a regular LDPC code [19], [31] with code length N,
dimension K and a K × N generator matrix G written as
$$G = \begin{bmatrix} \leftarrow g_1 \rightarrow \\ \leftarrow g_2 \rightarrow \\ \vdots \\ \leftarrow g_K \rightarrow \end{bmatrix}, \tag{9}$$
where each row $g_k$ is a length-N codeword. In the LDPC
Tanner graph, denote the degree of a variable node v by dv
and the degree of a parity check node c by dc. The embedded
decoders use either the Gallager-B decoding algorithm which
is a 1-bit hard-decision based decoding algorithm proposed
in [19] and is included for completeness in Appendix A, or
the parallel bit flipping (PBF) algorithm, which is also a hard-
decision algorithm proposed in [20]. In particular, we use the
modified parallel bit flipping algorithm defined in [70].
Definition 6. The PBF algorithm is defined as follows:
• Flip each variable node that is connected to more than
$d_v/2$ unsatisfied parity check nodes;
• Set the value of each variable node connected to exactly
$d_v/2$ unsatisfied parity-check nodes to 0 (or 1) with probability
1/2;
• Update all parity check nodes;
• Repeat the first three steps for $c_e \log N$ times, where $c_e$
is a constant.
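The steps of Definition 6 can be sketched in Python as follows. This is a noiseless illustration under stated assumptions: the code is represented as a list of parity checks (each a list of variable indices), the function names are hypothetical, and the 7-bit example built from the Fano plane is a toy (3, 3)-regular code chosen here for checkability, not one of the LDPC codes used in the paper.

```python
import random

def pbf_iteration(word, checks, rng):
    """One parallel bit-flipping iteration on a hard-decision word."""
    n = len(word)
    unsat = [0] * n    # number of unsatisfied checks touching each variable
    degree = [0] * n   # variable-node degree d_v
    for check in checks:
        syndrome = 0
        for v in check:
            syndrome ^= word[v]
        for v in check:
            degree[v] += 1
            unsat[v] += syndrome
    new_word = list(word)
    for v in range(n):
        if 2 * unsat[v] > degree[v]:
            new_word[v] ^= 1                   # more than d_v/2 unsatisfied
        elif degree[v] > 0 and 2 * unsat[v] == degree[v]:
            new_word[v] = rng.randint(0, 1)    # exactly d_v/2: fair coin
    return new_word

def pbf_decode(word, checks, iterations, rng):
    """Repeat the iteration; checks are re-evaluated at every pass."""
    for _ in range(iterations):
        word = pbf_iteration(word, checks, rng)
    return word

# Toy code: the seven lines of the Fano plane used as parity checks.
FANO_CHECKS = [[0, 1, 2], [0, 3, 4], [0, 5, 6], [1, 3, 5],
               [1, 4, 6], [2, 3, 6], [2, 4, 5]]
received = [0, 0, 0, 0, 1, 0, 0]   # all-zero codeword with bit 4 in error
print(pbf_iteration(received, FANO_CHECKS, random.Random(0)))
```

All flips are decided from the syndromes computed at the start of the iteration, which is what makes the algorithm parallel rather than sequential.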
The PBF algorithm can be used to correct a constant
fraction of errors after Θ(logN) decoding iterations when the
computing components in the decoder are noiseless and the
error fraction is small enough. However, since we will consider
noisy decoders, we will build on a more refined result, which
concerns a single decoding iteration of the algorithm (see the
following requirement (A.3) and Lemma 8).
In our main results, we may require the utilized LDPC code
to satisfy some of (not all) the following conditions.
• (A.1) Degree Bound: The variable node degree dv and
the parity check node degree dc are both less than or
equal to D, so that each majority or XOR-operation (in
the Gallager-B decoding algorithm) can be carried out
by a single unreliable gate. Moreover, we assume that
the variable node degree dv ≥ 4,∀v.
• (A.2) Large Girth: The girth $l_g = \Theta(\log N)$. An LDPC
code with the following girth lower bound is obtained
in [19], [32]:
$$l_g > \frac{2\log N}{\log((d_v-1)(d_c-1))} - 2c_g, \tag{10}$$
where $c_g = 1 - \frac{\log\frac{d_c d_v - d_c - d_v}{2 d_c}}{\log((d_v-1)(d_c-1))}$ is a constant that does
not depend on N.
• (A.3) Worst-case Error Correcting: One iteration of the
PBF algorithm using a noiseless decoder can bring down
the number of errors in the codeword from $\alpha_0 N$ to
$(1-\theta)\alpha_0 N$ for two constants $\alpha_0, \theta \in (0, 1)$, for any possible
patterns of α0N errors.
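As a rough numerical illustration of condition (A.2), the girth lower bound in (10) can be evaluated directly. The sketch below assumes natural logarithms (the first term and $c_g$ are both ratios of logarithms, so the base cancels); the function name and the sample degrees (4, 8) are choices made here for illustration.

```python
import math

def girth_lower_bound(N, dv, dc):
    """Right-hand side of (10): 2 log N / log((dv-1)(dc-1)) - 2 c_g."""
    base = math.log((dv - 1) * (dc - 1))
    c_g = 1.0 - math.log((dc * dv - dc - dv) / (2.0 * dc)) / base
    return 2.0 * math.log(N) / base - 2.0 * c_g

# Hypothetical (4, 8)-regular code at several lengths.
for N in (10**3, 10**4, 10**6):
    print(N, girth_lower_bound(N, 4, 8))
```

The bound grows only logarithmically in N and can be vacuous for short codes, which is consistent with the statement $l_g = \Theta(\log N)$.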
The requirement in (A.2) can be met by using codes introduced
in [32] or using the PEG construction proposed in [71]. The
requirement in (A.3) can be met either by using (dv, dc)-
regular random code ensembles and using the analysis in
[70], or by using regular Expanders [20]. In particular, in
Appendix D, we show that almost all codes in the (9, 18)-
regular code ensemble of sufficiently large length N can
reduce the number of errors by θ = 15% after one iteration of
the PBF algorithm, if the fraction of errors is upper-bounded
by $\alpha_0 \le 5.1 \cdot 10^{-4}$. We also show that at least 4.86% of the
(9, 18)-regular codes of length N = 50, 000 can reduce the
number of errors by θ = 15% after one iteration of the PBF
algorithm, if the number of errors satisfies α0N ≤ 20, which
is equivalent to α0 ≤ 0.0004.
III. ENCODED: ENCODED COMPUTATION WITH
DECODERS EMBEDDED
In this section, we present the main scheme that we use
for noisy computation of linear transformations. We call this
scheme “ENCODED” (Encoded Computation with Decoders
EmbeddeD). We aim to provide an overview of ENCODED
in this section. Then, this scheme will be modified to a tree-
structured scheme, ENCODED-T, in Section IV-A and further
modified to a bit-flipping-based technique ENCODED-F in
Section IV-C.
A. ENCODED: A Multi-stage Error-Resilient Computation
Scheme
Instead of computing a binary linear transformation r = s·A
without using any redundancy, we will compute
x = r ·G = s ·AG, (11)
where $G = [I, P] = [g_1; g_2; \dots; g_K]$ is the K × N generator
matrix of the chosen systematic LDPC code. The matrix
product AG is assumed to be computed offline in a noise-
free fashion. An important observation is that since all rows
in the matrix product AG are linear combinations of the rows
in the generator matrix G, the rows of AG are codewords as
well. That is,
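$$\tilde{G} = AG = \begin{bmatrix} \leftarrow \tilde{g}_1 \rightarrow \\ \leftarrow \tilde{g}_2 \rightarrow \\ \vdots \\ \leftarrow \tilde{g}_L \rightarrow \end{bmatrix} \tag{12}$$
where each row $\tilde{g}_l$, l = 1, . . . , L, is a codeword. Then, if the
computation were noiseless, the correct computation result r =
s ·A could be obtained from the combined result
$$x = [r, r \cdot P] = r \cdot G. \tag{13}$$
Since $r \cdot G = s \cdot AG = s \cdot \tilde{G}$,
$$x = s \cdot \tilde{G} = \sum_{l=1}^{L} s_l \tilde{g}_l. \tag{14}$$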
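The identities (11)-(14) can be verified on a toy instance. The sketch below works over GF(2) with plain Python lists; the matrices P and A, the input s, and the sizes (L = 4, K = 3, N = 6) are hypothetical values chosen for illustration, and the small systematic code is not an LDPC code. It also accumulates the sum in (14) one term at a time and checks that every partial sum is a codeword when no gate errors occur.

```python
def vecmat(v, M):
    """Row vector times matrix over GF(2)."""
    return [sum(v[i] & M[i][j] for i in range(len(v))) % 2
            for j in range(len(M[0]))]

def matmul(A, B):
    """Matrix product over GF(2), one row at a time."""
    return [vecmat(row, B) for row in A]

P = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
G = [[1, 0, 0] + P[0],          # systematic generator G = [I, P]
     [0, 1, 0] + P[1],
     [0, 0, 1] + P[2]]
A = [[1, 0, 1],                 # L-by-K linear transformation
     [0, 1, 1],
     [1, 1, 0],
     [0, 0, 1]]
s = [1, 0, 1, 1]

r = vecmat(s, A)                # uncoded result r = s * A
G_tilde = matmul(A, G)          # AG, assumed precomputed offline
x = vecmat(s, G_tilde)          # encoded result x = s * AG

def syndrome(word):
    """Zero iff word = [u, u * P] for some u, i.e. word is a codeword."""
    K = len(P)
    return [a ^ b for a, b in zip(vecmat(word[:K], P), word[K:])]

acc = [0] * len(G[0])
partial_is_codeword = []
for s_l, g_l in zip(s, G_tilde):
    acc = [a ^ (s_l & g) for a, g in zip(acc, g_l)]   # add s_l * g~_l
    partial_is_codeword.append(syndrome(acc) == [0, 0, 0])

print(r, x, partial_is_codeword)
```

Because the code is systematic, the first K bits of x are exactly r, so no separate decoding step is needed to read off the result in the noiseless case.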
Fig. 2. An illustration of the conceptual difference between classical noisy computing schemes (noiseless encoder, noisy computation, noiseless decoder) and the ENCODED technique (coded noisy computation with a noisy decoder).
In the following sections, we will explain how error control coding can be used to reliably compute $x = \sum_{l=1}^{L} s_l \tilde{g}_l$. The basic idea is as follows: we break the computation into L stages, so that the noiseless intermediate result after the l-th stage would be $x^{(l)} = \sum_{j=1}^{l} s_j \tilde{g}_j$. When gates are noise-free, $x^{(l)}$ is a codeword. When gates are noisy, during the l-th
stage, we first compute $x^{(l-1)} + s_l \tilde{g}_l$ using noisy AND gates (binary multiplication) and noisy XOR gates (binary addition) and then correct errors (with high probability) using an LDPC decoder or an expander decoder to get $x^{(l)}$. During the entire computing process, AND gates and XOR gates introduce errors, while the noisy decoders suppress errors. Finally, it will be proved in Theorem 1 and Theorem 3 that the error probability is maintained below a small constant. We summarize the ENCODED technique in Algorithm 1.
Algorithm 1 ENCODED (Encoded Computation with Decoders EmbeddeD)
INPUT: A binary vector $s = (s_1, s_2, \ldots, s_L)$.
OUTPUT: A binary vector $x = (x_1, x_2, \ldots, x_N)$.
INITIALIZE
Compute $\tilde{G} = AG = [\tilde{g}_1; \tilde{g}_2; \ldots; \tilde{g}_L]$. Store an all-zero vector $x^{(0)}$ in an N-bit register.
FOR l from 1 to L
• Use N unreliable AND gates to multiply $s_l$ with $\tilde{g}_l$, the l-th row of $\tilde{G}$, add this result to $x^{(l-1)}$ using N unreliable XOR gates, and store the result in the N-bit register (footnote 7).
• Use an unreliable decoder to correct errors and get $x^{(l)}$.
END
Output $x^{(L)}$ as the output x.
Compared to many
classical results [34], [35], [37], [55] on applying error control
coding to noisy computing, instead of computing after encod-
ing, the proposed scheme combines encoding and computing
into a joint module (see Fig. 2). Because there is no separation
between computing and encoding, in some sense, we encode
the computation, rather than encoding the message. We briefly
discuss the intuition underlying the ENCODED technique in
Section III-B. Note that we change the FOR-loop in Alg. 1
to a tree structure (ENCODED-T) in Section IV-A in order
to reduce error accumulation, as explained in Remark 4 in
Section IV-A.
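The stage-wise structure of Algorithm 1 can be sketched in a few lines. This is a minimal illustration, not the construction analyzed in the paper: the (7,4) Hamming code, the random matrix `A`, the independent bit-flip noise model in `noisy`, and the simple majority rule in `flip_decoder` are all assumptions made for the example.

```python
import numpy as np

def noisy(bits, p, rng):
    """Flip each bit independently with probability p (stand-in for a noisy gate)."""
    return bits ^ (rng.random(bits.shape) < p)

def flip_decoder(x, H, p, rng):
    """One parallel bit-flipping pass with noisy parity checks and noisy majority votes."""
    checks = noisy((H @ x) % 2 == 1, p, rng)               # noisy XOR gates: failed checks
    unsatisfied = H.T @ checks.astype(int)                 # failed checks touching each bit
    flip = noisy(2 * unsatisfied > H.sum(axis=0), p, rng)  # noisy majority gates
    return x ^ flip

def encoded(s, G_tilde, H, p_and=0.0, p_xor=0.0, p_dec=0.0, seed=0):
    """Accumulate s_l * g~_l stage by stage, running the decoder after every stage."""
    rng = np.random.default_rng(seed)
    x = np.zeros(G_tilde.shape[1], dtype=bool)             # x^(0): all-zero register
    for s_l, g_l in zip(s, G_tilde):
        product = noisy(g_l.astype(bool) & bool(s_l), p_and, rng)  # N noisy AND gates
        x = noisy(x ^ product, p_xor, rng)                 # N noisy XOR gates
        x = flip_decoder(x, H, p_dec, rng)                 # embedded noisy decoder
    return x.astype(int)

# Hypothetical toy setup: systematic (7,4) Hamming code and a random 5 x 4 matrix A.
P = np.array([[1, 1, 0], [1, 0, 1], [0, 1, 1], [1, 1, 1]])
G = np.hstack([np.eye(4, dtype=int), P])
H = np.hstack([P.T, np.eye(3, dtype=int)])
rng = np.random.default_rng(1)
A = rng.integers(0, 2, size=(5, 4))
s = rng.integers(0, 2, size=5)
G_tilde = (A @ G) % 2

x_clean = encoded(s, G_tilde, H)                           # all gates noiseless
x_noisy = encoded(s, G_tilde, H, p_and=0.01, p_xor=0.01, p_dec=0.01)
```

With all noise parameters set to zero, every intermediate register content is a codeword, the decoder flips nothing, and `x_clean` equals s · G̃. The Hamming code is used only because it is small; it does not have the error-suppression properties required of the LDPC codes in the analysis.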
B. Intuition underlying the Embedded Decoders
The basic idea of our proposed computing scheme is to split the computation into a multistage computation of $x = \sum_{l=1}^{L} s_l \tilde{g}_l$, and use embedded decoders inside the noisy circuit to repeatedly suppress errors as the computation proceeds. Since the noisy circuit can only be constructed using unreliable gates, the embedded decoders are also built from unreliable gates.
Why is such a multistage computation helpful? For instance, if “uncoded” matrix multiplication r = sA is carried out, each output bit is computed using an inner product, and O(L) unreliable AND and XOR-operations are required. Without repeated suppression, the error probability of each output bit approaches 1/2 as L → ∞. Intermediate and repeated error suppression alleviates this error accumulation problem. Can one use a feedback structure, as is used in Turbo codes [72] and ARA codes [73] (these codes often have a feedback structure [74] for encoding), to keep errors suppressed, instead of the LDPC codes used here? A feedback structure can be detrimental since errors persist in the feedback loop and propagate to the future, which can make the final bit error probability large. This observation motivated us to use LDPC codes.
Also note that due to the ‘last-gate’ effect in noisy circuits,
error probability cannot approach zero. Thus, our goal is not
to eliminate errors, but to suppress them so that the error
probability (or the error fraction) is kept bounded below a
target value that depends on the error probability of the last
gate.
IV. MAIN RESULTS ON COMPUTING A BINARY LINEAR
TRANSFORMATION WITH EMBEDDED DECODERS
In this section, we show that a linear transformation can
be computed ‘reliably’ (in accordance with the goals of
Problems 1-3 in Section II-B) even in presence of noise,
using error control coding. We provide three results, one
each for formulations in Problem 1 to Problem 3. These
results are obtained using two variants of ENCODED, which
we call ENCODED-T and ENCODED-F. ENCODED-T uses the
Gallager-B decoding algorithm for error suppression, while
ENCODED-F uses the PBF algorithm. The implementation
details of ENCODED-T and ENCODED-F are provided in
Section IV-A and Section IV-C, respectively. We also compare
the resource requirements of this coding-based computation with
repetition-based computation using simulations (in Section
IV-D).
A. ENCODED-T: A Scheme for Reliable Computation of
Linear Transformations under Gate Model I
The unreliable gates used in the computing scheme are AND gates, XOR gates and majority gates that are defined in Gate Model I, with error probabilities $p_{and}$, $p_{xor}$ and $p_{maj}$ respectively. We change the FOR-loop of ENCODED in Section III-A slightly and use a D-branch tree with depth M instead. We call this computing scheme “ENCODED-T” (ENCODED scheme with a Tree structure); it is conceptually illustrated in Fig. 3(a). We use this tree structure because it reduces the number of stages of the computing scheme from L in the FOR-loop in Alg. 1 to Θ(log L). This reduces the extent of information mixing caused by message-passing decoding (in comparison with ENCODED’s sequential structure), which introduces correlation among messages and makes the density evolution analysis difficult. This issue will be detailed in Remark 4.
Footnote 7. These operations are assumed to be performed noiselessly, as discussed earlier.
The message s = (s1, ..., sL) is input from the leaf nodes.
The output x = s · G˜ = (x1, ..., xN ) is computed from bottom
to top and finally obtained at the root. Note that the tree
structure is not necessarily a complete tree. Specifically, the
tree is complete from the first level to the (M − 1)-th level,
i.e., the level just above the bottom level, and the number of
nodes in the bottom level is chosen such that the number of
leaf nodes is L. An illustration of a non-complete 3-branch
tree with L = 22 leaf nodes (which are colored red) is shown
in Fig. 3(a). From Fig. 3(a), we see that by removing the leaf children-nodes of each complete non-leaf node (a non-leaf node with $d_T$ leaf children-nodes), we effectively reduce the number of leaf nodes by $(d_T - 1)$, because the $d_T$ removed leaf children-nodes (red nodes) are replaced by one non-leaf node that turns into a leaf node. Repeating this removal until only the root remains reduces the number of leaf nodes from L to 1, so the total number of non-leaf nodes is $\lceil (L-1)/(d_T-1) \rceil$.
Each of the L leaf nodes has one of the L rows of the matrix $\tilde{G}$, i.e., $\tilde{g}_1$ to $\tilde{g}_L$, stored in it. At the start of the computing process, the l-th of the L leaf nodes calculates $s_l \cdot \tilde{g}_l$ using N unreliable AND gates and stores it as an intermediate result in a register. In the upper levels, each non-leaf node performs a component-wise XOR-operation of the $d_T$ intermediate results from its $d_T$ children. Observe that if no gate errors occur, the root gets the binary sum of all $s_l \cdot \tilde{g}_l$, $l = 1, \ldots, L$, which is the correct codeword $x = s \cdot \tilde{G}$.
In order to deal with errors caused by unreliable gates, each non-leaf tree node is a compute-and-correct unit shown in Fig. 3(c). Unreliable XOR gates are used to perform the component-wise XOR-operation of the intermediate results. A noisy Gallager-B decoder (see Appendix A) is used to correct errors in the associated register after the XOR-operation. Note that the number of bits transmitted from variable nodes to parity check nodes during each Gallager-B decoding iteration is $E = d_v N$, where $d_v$ is the variable node degree of the Tanner graph and E is the total number of edges. Therefore, at a tree node, the register stores $E = d_v N$ bits as intermediate results instead of N bits as in Algorithm 1 (footnote 8). These E bits of messages can be viewed as $d_v$ copies of the corresponding N-bit codeword, with indices from 1 to $d_v$. (The technique of replicating N bits into $E = d_v N$ bits was first used in [21], [22] for the correction circuit in fault-tolerant storage, which is known as the Taylor-Kuznetsov memory. A related technique was used in [33, Pg. 216] to construct fault-tolerant linear systems. See [75] for details.) The XOR-operations at the beginning of the next stage are also done on codeword copies with the same index.
Footnote 8. We have to store E bits instead of an N-bit codeword, because we need the i.i.d. assumption of messages in the density evolution analysis. Note that by storing these E bits, the corresponding N-bit codeword can be retrieved either using a noisy majority vote or a random selection.
Fig. 3. (a) shows the tree structure of the noisy computing scheme. During the computing process, the bit error probability is bounded between two constants $p_{reg}$ and $p_{lim}$ shown in (b). (c) shows a compute-and-correct structure. The bit error probability evolution in one embedded decoder is shown in (d).
Before sending the output to the parent-node, each node performs C iterations of the message-passing decoding on the E-bit intermediate result obtained by the XOR-operations with the embedded decoder. Note that there are C iterations of decoding at each non-leaf node of the tree structure shown in Fig. 3. We will use the index i = 1, ..., C to number the decoding iterations done at a single tree node, which is different from the index m used to number the level in the tree structure. However, we will show that it suffices to use C = 1 iteration (see Lemma 2) to bring the bit error probability of the E-bit intermediate result back to $p_{maj} + \frac{1}{d_T} p_{thr}$. In the noisy decoder, the bit error probability of the intermediate result, assuming the decoding neighborhood is cycle-free for i + 1 iterations (in fact, the number of levels of the decoding neighborhood grows by 2C at each tree node, see Remark 4 for details), follows the density evolution $P_e^{(i+1)} = f(P_e^{(i)})$, and the explicit expression of the function f(·) is given in Lemma 4 in Appendix B. This evolution is illustrated in Fig. 3(d). The bit error probability follows the directed path shown in Fig. 3(d), and asymptotically approaches the fixed point of the density evolution function as the number of iterations increases if decoding neighborhoods continue to satisfy the cycle-free assumption (which no fixed good code does). However, the expression of f(·) is complicated, so we provide an upper bound $f(P_e^{(i)}) < f_0(P_e^{(i)})$ in Lemma 5 for further analysis.
During the computing process, the XOR-operations intro-
duce errors, while the Gallager-B decoding process suppresses
them. The bit error probability of the intermediate result
is reduced repeatedly at different stages and is ensured to
be bounded within the interval (pmaj, preg) (as illustrated in
Fig. 3(b)), where the parameter preg is defined as
preg = pxor + (dT + 1)pthr. (16)
Fig. 4. This figure shows the decoding neighborhoods in two adjacent levels of the tree structure in ENCODED-T. The black nodes in the decoding neighborhood denote variable nodes, while the white ones denote parity check nodes. The decoding neighborhood in the (M − 2)-th level is not cycle-free due to a short cycle of length 8. Note that the tree structure in ENCODED-T and the decoding neighborhoods are two completely different things and should not be confused.
Algorithm 2 ENCODED-T
INPUT: A binary vector $s = (s_1, s_2, \ldots, s_L)$.
OUTPUT: A binary vector $x = (x_1, x_2, \ldots, x_N)$.
NOTATION: Denote by $v_m^k$ the k-th node in the m-th level of the tree structure. Denote the stored intermediate result in node v by $y_v$ and in $v_m^l$ by $y_m^l$. Denote the $d_T$ children-nodes of $v_m^l$ by $D(v_m^l)$.
INITIALIZE
For 1 ≤ l ≤ L, use noisy AND gates to create $d_v$ copies of $s_l \cdot \tilde{g}_l$ and store them as an E-bit vector $\tilde{y}_M^l$ in the register of $v_M^l$, or in the (M−1)-th level with the appropriate index. All of these E-bit vectors are stored as the first layer of intermediate results.
FOR m from M − 1 to 1
• Each non-leaf node $v_m^k$ calculates the XOR of the outputs of its $d_T$ (or fewer if the node is not complete) children-nodes and writes the result in its own E-bit register (computations could be noisy):
$$\tilde{y}_m^k = \bigoplus_{v \in D(v_m^k)} \tilde{y}_v. \tag{15}$$
• Each node $v_m^k$ performs one iteration of the message-passing decoding.
END
Change the E-bit vector $\tilde{y}_1^1$ back to the N-bit codeword $y_1^1$ by randomly selecting one copy. Output $y_1^1$ as the output y.
Remark 4. An astute reader might notice that as the computation process proceeds, the message-passing algorithm introduces increasing correlation in messages as they are passed up the tree. This introduces a difficulty in analyzing the
error probability using density-evolution techniques, because density evolution relies on the cycle-free assumption on the decoding neighborhoods (footnote 9). As shown in Fig. 4, the decoding neighborhoods of the same level in the ENCODED-T tree have the same neighborhood structure (because all nodes use the same LDPC code and the number of decoding iterations is a constant across the same level), while the decoding neighborhood on the upper level, the (M − 2)-th level (see Fig. 4), grows by a depth of 2C compared to the adjacent lower level, the (M − 1)-th level. Note that the XOR-operations before the LDPC decoding iterations do not change the nodes in the decoding neighborhood (again, because the same code is used at all nodes). Due to the small girth of the LDPC code chosen in the example in Fig. 4 (girth = 8), the decoding neighborhood is not cycle-free, and hence the messages sent to a variable node can be correlated. In Fig. 4, the correlation between messages at the root of the decoding neighborhood is caused by the correlation between messages sent by the red-colored node ν1. We circumvent this issue (footnote 10) by choosing a code of girth large enough so that no correlations are introduced even over multiple levels of the tree. The tree structure of ENCODED-T reduces the number of levels exponentially (in comparison with ENCODED), thereby reducing the girth requirement.
To understand this, as in [31], [32], we define the number of independent iterations of an LDPC code as the maximum number of iterations of message-passing decoding such that the messages sent to any variable node or parity check node are independent (i.e., the number of iterations after which the code reaches its girth). We also use the phrase ‘overall number of decoding iterations’ in ENCODED-T to denote the sum (over levels) of the maximum number of iterations over all nodes at each level of the ENCODED-T tree. For instance, if C decoding iterations are executed at each tree node, then the “overall” number of iterations is CM, where M is the number of levels in the tree. Intuitively, this quantifies the extent of information mixing in the Tanner graph by the message-passing algorithm. We need to ensure that the overall number of decoding iterations is smaller than the number of independent iterations of the LDPC code used. For a fixed C, the tree structure of ENCODED-T requires the number of independent iterations, and hence also the code girth, to be $\Theta(\log N / \log d_T)$.
The importance of the ENCODED-tree structure also becomes clear: the FOR-loop in ENCODED (Alg. 1) has an exponentially larger number of levels, and requires a large girth of Θ(L) for L stages of the algorithm to maintain independence of messages in a decoding neighborhood at the root node.
The number of independent iterations of message-passing decoding of an LDPC code can be made as large as $\Theta\left(\frac{\log N}{\log[(d_v-1)(d_c-1)]}\right)$ (see assumption (A.2)). Thus, for $d_T$ and N large enough, the overall decoding neighborhoods are cycle-free, and density evolution analysis is valid. More details are provided in Appendix B.
In Section IV-C, we provide another way to handle this correlation issue that uses codes designed for correcting worst-case errors. For these codes, correlations do not matter as long as the number of errors does not exceed a certain threshold.
Footnote 9. In fact, the analysis of message-passing type algorithms implemented on general graphs with cycles is the key question in many real-world problems, and convergence results are only known in some limited cases [33], [76].
Footnote 10. Another possible way could be to prove convergence to cycle-free neighborhoods through randomized constructions, as is done in [26], [31].
B. Analysis of ENCODED-T
In what follows, we prove Theorem 1, which characterizes
the performance of Algorithm 2.
Theorem 1 (Error Suppression Using Gallager-B decoding for Problem 1). Using unreliable AND gates, majority gates and XOR gates from Gate Model I with respective gate error probabilities $p_{and}$, $p_{xor}$ and $p_{maj}$ that satisfy (footnote 11)
$$\begin{aligned}
p_{thr} &:= \binom{d_v-1}{\lfloor \frac{d_v-1}{2} \rfloor}^{-\frac{1}{d-1}} d_c^{-\frac{d}{d-1}} d_T^{-\frac{1}{d-1}} (d_T+1)^{-\frac{d}{d-1}},\\
p_{maj} &\le p_{thr},\\
p_{xor} &\le \frac{d_T+1}{d_c}\, p_{thr},\\
p_{and} &\le \frac{d_T+1}{d_T}\, p_{thr},
\end{aligned} \tag{17}$$
where $d = \lceil \frac{d_v-1}{2} \rceil$, and the tree width $d_T$ satisfies
$$\left\lceil \frac{\log L}{\log d_T} \right\rceil \le \frac{\log N}{2 \log[(d_v-1)(d_c-1)]}, \tag{18}$$
the binary linear transformation r = s · A can be computed with output bit error probability $P_e^{bit} \le p_{maj} + \frac{1}{d_T} p_{thr}$ using the ENCODED technique provided in Alg. 2 (see Section IV-A) that uses an LDPC code which satisfies assumptions (A.1) and (A.2). Further, the number of operations per bit satisfies
$$N_{per\text{-}bit} \le \frac{3E}{K} \cdot \left\lceil \frac{L-1}{d_T-1} \right\rceil + \frac{LE}{K} = \Theta\left(\frac{LN}{K}\right), \tag{19}$$
where E is the number of edges in the Tanner graph, K is the number of outputs and N is the code length of the utilized LDPC code.
Footnote 11. In this result, we obtain different conditions on the error probabilities of different types of gates as sufficiency conditions, instead of a uniform bound on all gates. We do so to offer maximum flexibility to the circuit designer.
Remark 5. Note that P bite remains bounded even as L is
increased. Thus, we can compute linear transforms with large
size and still obtain good error performance.
Proof of Theorem 1. We construct the computation tree of the ENCODED-T technique (see Section IV-A and Section IV-B) for computing $x = s \cdot AG = s \cdot \tilde{G}$ as described in Section IV-A. In all, there are M levels, and in the M-th (bottom) level the number of nodes is chosen such that the overall number of leaf nodes is L. Each leaf node has the codeword $\tilde{g}_l$ stored in it at the beginning of computation, where $\tilde{g}_l$ is the l-th row of $\tilde{G}$, an L-by-N matrix (see (12)). Note that M satisfies $d_T^{M-2} < L \le d_T^{M-1}$, or
$$M = \left\lceil \frac{\log L}{\log d_T} + 1 \right\rceil. \tag{20}$$
To ensure that the number of leaf nodes is L, the number of non-leaf nodes $S_{nl}$ is
$$S_{nl} = \left\lceil \frac{L-1}{d_T-1} \right\rceil. \tag{21}$$
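As a quick sanity check of (20) and (21), the tree shape can be computed with integer arithmetic. The helper name `tree_shape` is hypothetical; the example values correspond to the 3-branch tree with L = 22 leaf nodes mentioned for Fig. 3(a).

```python
def tree_shape(L, d_T):
    """Return (M, S_nl): number of levels and number of non-leaf nodes of the tree."""
    if L < 2 or d_T < 2:
        raise ValueError("need L >= 2 leaves and branching factor d_T >= 2")
    M, capacity = 1, 1                  # capacity tracks d_T ** (M - 1)
    while capacity < L:                 # smallest M with d_T^(M-2) < L <= d_T^(M-1)
        capacity *= d_T
        M += 1
    S_nl = -(-(L - 1) // (d_T - 1))     # integer ceiling of (L - 1) / (d_T - 1)
    return M, S_nl

M, S_nl = tree_shape(22, 3)             # 3-branch tree with 22 leaf nodes
```

For this example the tree has four levels: the top three levels are complete with 1, 3 and 9 nodes, seven of the nine nodes in the third level have children, and the non-leaf count is 1 + 3 + 7 = 11, in agreement with (21).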
1) Error probability analysis: Define $p_0$ to be the bit error probability at the start of the first round of the Gallager-B decoding carried out in the (M − 1)-th level of the computing tree in Fig. 3(b). Since at this time, each bit is computed by XORing $d_T$ bits (see Fig. 3(a)) where each bit is calculated from an AND-operation (see Fig. 3(b)), we know from Lemma 1 that
$$p_0 = \frac{1}{2}\left[1 - (1-2p_{xor})(1-2p_{and})^{d_T}\right] \overset{(a)}{<} \frac{1}{2}\left[1 - (1-2p_{xor})(1-2 d_T p_{and})\right] < p_{xor} + d_T p_{and}, \tag{22}$$
where step (a) is from the inequality $(1-x)^{d_T} > 1 - d_T x$, $x \in (0,1)$. Using the definition of $p_{reg}$ in (16) and condition (17),
$$p_0 < p_{xor} + (d_T+1)\, p_{thr} = p_{reg}. \tag{23}$$
In the following, we will prove that as long as the error probabilities of noisy gates satisfy (17), the noisy Gallager-B decoder can make the bit error probability fall below $p_{maj} + \frac{1}{d_T} p_{thr}$ after one iteration of decoding.
Lemma 2. Suppose all noise parameters satisfy (17). Then, as long as the bit error probability before decoding is smaller than $p_{reg}$ defined in (16), after one iteration of decoding, the bit error probability $p_e^{dec}$ satisfies
$$p_{maj} < p_e^{dec} \le p_{maj} + \frac{1}{d_T}\, p_{thr}. \tag{24}$$
Proof. See Appendix B.
Remark 6. Lemma 2 describes the behavior shown in Fig. 3(d). For noiseless LDPC decoders [31], [32], [77], the fixed point of the density evolution function is zero when the code is operated for a channel that has small enough noise. However, for noisy density evolution, it can be shown that the fixed point $p_{lim} = p_{maj} + O(p_{maj}^2 + p_{xor}^2) \approx p_{maj}$ [27, Thm 5].
Therefore, after one iteration of Gallager-B decoding, the bit error probability reduces from $p_0$ to something less than $p_{maj} + \frac{1}{d_T} p_{thr}$. Then, the corrected codeword is sent to the parent-node in the next level of the tree structure. After that, the XOR-operation of $d_T$ codewords from children-nodes is carried out. From Lemma 1, we know that the error probability after this XOR-operation is upper bounded by
$$\frac{1}{2}\left[1 - (1-2p_{xor})\left[1 - 2\left(p_{maj} + \frac{1}{d_T}\, p_{thr}\right)\right]^{d_T}\right] < p_{xor} + d_T p_{maj} + p_{thr} \overset{(a)}{<} p_{xor} + (d_T+1)\, p_{thr} \overset{(b)}{=} p_{reg}, \tag{25}$$
where (a) follows from (17) and (b) follows from (16). Therefore, we can reuse the result in Lemma 2 and carry out a new Gallager-B decoding iteration. To summarize, the bit error probability starts at $p_0$ and then oscillates between $p_{reg}$ and $p_{maj}$ during the entire computation process. This behavior is qualitatively illustrated in Fig. 3(b) and numerically illustrated through simulation results in Fig. 6.
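The bounds (22), (23) and (25) can be checked numerically. In the sketch below, `p_thr` is treated as a free placeholder rather than evaluated from (17), and the gate error probabilities are hypothetical values chosen to respect the inequalities in (17) for $d_c = 12$ and $d_T = 2$; the function name `xor_stage_error` is also an assumption.

```python
def xor_stage_error(p_xor, p_in, d_T):
    """Error probability after a noisy XOR of d_T inputs, each wrong independently w.p. p_in."""
    return 0.5 * (1.0 - (1.0 - 2.0 * p_xor) * (1.0 - 2.0 * p_in) ** d_T)

# Hypothetical numbers; p_thr is a placeholder, not computed from (17).
d_T, d_c, p_thr = 2, 12, 1e-3
p_maj = p_thr                      # p_maj <= p_thr
p_xor = 1e-4                       # p_xor <= (d_T + 1) / d_c * p_thr
p_and = 1e-3                       # p_and <= (d_T + 1) / d_T * p_thr

p_reg = p_xor + (d_T + 1) * p_thr                             # eq. (16)
p0 = xor_stage_error(p_xor, p_and, d_T)                       # left side of eq. (22)
p_after = xor_stage_error(p_xor, p_maj + p_thr / d_T, d_T)    # left side of eq. (25)
```

Both `p0` and `p_after` stay strictly below `p_reg`, so Lemma 2 can be applied again after every XOR stage.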
2) Computational complexity analysis: Each compute-and-correct unit consists of E D-fan-in noisy XOR gates, an E-bit register and a Gallager-B decoder, where E is the number of edges in the LDPC bipartite graph.
The operations required in one iteration of decoding are E XOR-operations and E majority-operations, because on each edge that connects a variable node v and a parity-check node c, two messages $m_{c \to v}^{(i)}$ and $m_{v \to c}^{(i)}$ are calculated respectively using an XOR-operation and a majority-operation. Thus, the number of operations to carry out during one iteration of decoding is 2E. Since the number of non-leaf tree nodes is $S_{nl} = \lceil (L-1)/(d_T-1) \rceil$ (see (21)), the number of operations required in all non-leaf nodes is at most $(2E + E) \lceil (L-1)/(d_T-1) \rceil = O(EL)$, where the additional E operations per node are the XOR-operations that combine the results of the children-nodes.
The operations required in the first layer (input layer) of the computing tree are LE AND-operations. The computing process outputs K bits, so the number of operations required per bit is $\frac{3E}{K} \cdot \lceil (L-1)/(d_T-1) \rceil + LE/K = O(LE/K)$. Since the number of edges $E = d_v N = \Theta(N)$, we know that the number of operations per bit is on the order of O(LN/K).
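The operation count in (19) is simple enough to evaluate directly. The following sketch assumes illustrative parameters only: a (6, 12)-regular code of length N = 1200 with K = 600 (its design rate), a two-branch tree and L = 600 inputs; the function name `ops_per_bit` is hypothetical.

```python
def ops_per_bit(L, K, N, d_v, d_T):
    """Upper bound (19) on operations per output bit for ENCODED-T."""
    E = d_v * N                                  # number of edges in the Tanner graph
    S_nl = -(-(L - 1) // (d_T - 1))              # non-leaf nodes, eq. (21)
    return (3 * E * S_nl + L * E) / K            # 3E per non-leaf node plus LE AND-operations

# Hypothetical parameters chosen for illustration.
count = ops_per_bit(L=600, K=600, N=1200, d_v=6, d_T=2)
```

The count grows linearly in L and in N/K, which is the Θ(LN/K) scaling stated in the theorem.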
C. ENCODED-F: A scheme for Reliable Computation of
Linear Transformations under Gate Model II
In this section, we consider unreliable gates defined in Definition 2, which are either perfect or defective. We construct a computing scheme using a decoding scheme different from that of ENCODED-T that is still able to attain a small error fraction in the final output. This computing scheme operates in exactly the same manner as ENCODED (see Alg. 1). However, the embedded noisy decoder is a PBF decoder. This computation scheme is named ENCODED-F (ENCODED using the flipping algorithm). We modify ENCODED as follows: we partition the entire computing process into $\lceil L/(d_s-1) \rceil$ stages, where $d_s$ is called the group size. First, we store an all-zero codeword in the N-bit register. In the l-th stage, we first use $(d_s-1)N$ AND gates to obtain the $d_s-1$ scalar-vector multiplications $s_i \cdot \tilde{g}_i$ for $i \in \{(d_s-1)(l-1)+1, (d_s-1)(l-1)+2, \ldots, (d_s-1)l\}$, where $\tilde{g}_i$ is the i-th row of the combined matrix $\tilde{G} = AG$. Then, we use N XOR gates to add the $d_s-1$ results to the N-bit register. The parameter $d_s$ is chosen so that $d_s \le D$, the maximum number of inputs to each noisy gate. After that, we use one iteration of the PBF algorithm (see Section III) to correct errors. We use P XOR gates and N majority gates in one iteration of the PBF algorithm.
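The partition of the L rows into stages can be written out explicitly. This is a small sketch with a hypothetical helper name `stage_groups`; it assumes that the last stage simply contains fewer than $d_s - 1$ rows when $d_s - 1$ does not divide L.

```python
def stage_groups(L, d_s):
    """Split row indices 1..L into consecutive groups of at most d_s - 1 rows, one per stage."""
    step = d_s - 1
    return [list(range(start, min(start + step, L + 1)))
            for start in range(1, L + 1, step)]

groups = stage_groups(L=10, d_s=4)      # four stages: three full groups and one partial group
```

The number of groups equals the ceiling of L/(d_s − 1), and each group feeds one XOR step with at most $d_s$ inputs (the $d_s - 1$ new rows plus the register content).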
We note here that the tree structure in ENCODED-T (see Alg. 2) could be used in ENCODED-F. However, we still use the FOR-loop structure in Alg. 1, because the resulting maximum tolerable gate error probability is larger using the FOR-loop structure. This is for two reasons: (i) the tree structure in ENCODED-T was motivated by induced correlations in messages as they are passed up. However, correlations do not matter when decoding using the PBF algorithm; (ii) interestingly, there is a benefit to using the FOR-loop structure: in ENCODED-T, the error probability increases by a factor of $d_T$ at every level due to XOR-ing of $d_T$ inputs. Since these XOR operations are not needed in ENCODED-F, the constraints on the gate error probability required to keep the error probability suppressed are more severe in ENCODED-T than in ENCODED-F.
In what follows, we prove Theorem 2, which quantifies
the error-tolerance of ENCODED-F. The basic tool used to
prove Theorem 2 is a modified version of the worst-case error
correcting result in the requirement (A.3), which provides the
worst-case error correcting capability of regular LDPC codes
using one iteration of noisy PBF decoding.
Theorem 2 (Error Suppression Using the PBF algorithm for Problem 3). Using unreliable AND gates, XOR gates and majority gates from Gate Model II (D, n, α) with respective error fractions $\alpha_{and}$, $\alpha_{xor}$ and $\alpha_{maj}$, and using a $(d_v, d_c)$-regular LDPC code which satisfies (A.3) to implement ENCODED-F with group size $d_s$, as long as
$$(d_s-1)\alpha_{and} + [D(1-R)+1]\alpha_{xor} + \alpha_{maj} < \theta\alpha_0, \tag{26}$$
the binary linear transformation r = sA can be computed using N AND gates, (N + P) XOR gates, and N majority gates, and the number of operations per bit is at most $\frac{2N+P}{K} \left\lceil \frac{L}{d_s-1} \right\rceil + \frac{NL}{K} = \Theta\left(\frac{LN}{K}\right)$. Further, the error fraction of the final output is at most $\alpha_0$.
Proof of Theorem 2. We use induction on the stage index l to derive an upper bound on the number of errors. In the first stage, $N(d_s-1)$ AND gates and N XOR gates introduce at most $N[(d_s-1)\alpha_{and} + \alpha_{xor}]$ errors, so the error fraction is upper bounded by
$$(d_s-1)\alpha_{and} + \alpha_{xor} < \theta\alpha_0, \tag{27}$$
which follows from (26). Suppose in the (l − 1)-th stage, after adding a set of $(d_s-1)$ codewords $s_i \cdot \tilde{g}_i$, $i \in \{(d_s-1)(l-2)+1, (d_s-1)(l-2)+2, \ldots, (d_s-1)(l-1)\}$ to the N-bit register, the number of errors is strictly less than $N\alpha_0$.
Then, according to condition (A.3), if no computation errors occur during execution of one iteration of the PBF algorithm, the number of errors can be reduced to $N\alpha_0(1-\theta)$. Whenever an XOR gate flips the corresponding parity check value during the PBF algorithm, it affects at most $d_c$ majority gates. In total, there are P XOR gates used in one iteration of the PBF algorithm, so there are at most $\alpha_{xor} P d_c$ errors due to XOR errors in the PBF algorithm. There are at most $\alpha_{maj} N$ errors due to majority gate failures. After this iteration of bit flipping, another set of $(d_s-1)$ codewords $s_i \cdot \tilde{g}_i$ is added to the N-bit register with $N(d_s-1)$ AND gates and N XOR gates. These two operations introduce at most $(\alpha_{xor} + \alpha_{and}(d_s-1))N$ errors. Therefore, since $P = (1-R)N$, the total error fraction before the next PBF iteration is upper bounded (using the union bound) by
$$\alpha_{PBF} \le \alpha_0(1-\theta) + [d_c(1-R)+1]\alpha_{xor} + \alpha_{maj} + (d_s-1)\alpha_{and}, \tag{28}$$
where R is the code rate ($R = \frac{N-P}{N}$). As long as (26) holds and $d_c \le D$, before the next bit flipping, it holds that
$$\alpha_{PBF} \le \alpha_0(1-\theta) + \theta\alpha_0 = \alpha_0. \tag{29}$$
Therefore, the induction can proceed.
In each stage, we need N + P XOR-operations and N majority-operations. During the entire computation, we need NL AND-operations. Therefore, the computational complexity per output bit, which is the total number of operations in $\lceil L/(d_s-1) \rceil$ stages divided by K bits, is $(2N+P) \lceil L/(d_s-1) \rceil / K + \frac{NL}{K}$.
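The induction can be traced numerically with a worst-case recursion on the error fraction. The sketch below rests on two assumptions that go slightly beyond the proof: requirement (A.3) is read as "any error fraction e ≤ α0 is reduced to at most (1 − θ)e by one noiseless PBF iteration", and the budget is checked with $d_c$ in place of D (valid when $d_c \le D$). The function name and all numerical values are hypothetical.

```python
def pbf_error_budget(alpha0, theta, a_and, a_xor, a_maj, d_s, d_c, R, stages):
    """Worst-case error fraction before each PBF iteration, one entry per stage."""
    injected = (d_s - 1) * a_and + (d_c * (1 - R) + 1) * a_xor + a_maj
    if injected >= theta * alpha0:
        raise ValueError("gate error fractions exceed the budget theta * alpha0")
    frac = (d_s - 1) * a_and + a_xor             # stage 1: only AND and XOR gate errors
    history = [frac]
    for _ in range(stages - 1):
        frac = (1 - theta) * frac + injected     # decode once, then add new gate errors
        history.append(frac)
    return history

# Hypothetical values: alpha0 and theta as quoted for the (9, 18)-regular ensemble.
history = pbf_error_budget(alpha0=5.1e-4, theta=0.15, a_and=1e-6, a_xor=1e-6,
                           a_maj=1e-6, d_s=4, d_c=18, R=0.5, stages=50)
```

Under these assumptions the recursion converges to `injected / theta`, which is below α0 whenever the budget condition holds, so every entry of `history` stays below α0.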
ENCODED-F can be applied to Gate Model I as well, which
is characterized in the following theorem.
Theorem 3 (Error Suppression Using PBF algorithm for Problem 2). Using unreliable AND gates, majority gates and XOR gates from Gate Model I with respective gate error probabilities $p_{and}$, $p_{xor}$ and $p_{maj}$, and using a $(d_v, d_c)$-regular LDPC code which satisfies (A.3) to implement ENCODED-F with group size $d_s$, as long as
$$\max\{p_{and}, p_{xor}, p_{maj}\} < \lambda := \frac{\theta\alpha_0/2}{(d_s-1) + [d_c(1-R)+1] + 1}, \tag{30}$$
the binary linear transformation r = s · A can be computed using $\frac{2N+P}{K} \left\lceil \frac{L}{d_s-1} \right\rceil + \frac{NL}{K} = \Theta\left(\frac{LN}{K}\right)$ operations per bit. Further, the final error fraction $\delta_e^{frac}$ satisfies
$$\Pr(\delta_e^{frac} < \alpha_0) > 1 - P_e^{blk}, \tag{31}$$
where the probability $P_e^{blk}$ satisfies
$$P_e^{blk} < 3L \exp(-\lambda^* N), \tag{32}$$
where
$$\lambda^* = D(2\lambda\|\lambda) = (2\log 2 - 1)\lambda + O(\lambda^2). \tag{33}$$
However, in probabilistic settings, the number of errors at any stage could exceed $N\alpha_0$. In what follows, we use large deviation analysis to show that the probability of exceeding $N\alpha_0$ is small. First, we review the large deviation result for the binomial distribution [78, page 502, Example 3].
Lemma 3. Let $X_i$, i = 1, ..., N be N i.i.d. binary random variables with $\Pr[X_i = 1] = p$. Then
$$\Pr\left[\frac{1}{N}\sum_{i=1}^{N} X_i > (p+\lambda)\right] < \exp\left[-D(p+\lambda\|p)N\right], \tag{34}$$
where $D(p+\lambda\|p) = (p+\lambda)\log_e\frac{p+\lambda}{p} + (1-p-\lambda)\log_e\frac{1-p-\lambda}{1-p}$. Further, if p < λ,
$$\Pr\left[\frac{1}{N}\sum_{i=1}^{N} X_i > (p+\lambda)\right] < \exp\left[-D(2\lambda\|\lambda)N\right]. \tag{35}$$
Proof. The inequality (34) is the large deviation bound for the binomial distribution and is presented in [78, page 502, Example 3]. Note that $D(p+\lambda\|p)$ is monotone non-increasing in p for p ∈ (0, λ). When p < λ, we have $D(p+\lambda\|p) > D(2\lambda\|\lambda)$. Therefore, (35) holds for p < λ.
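Then, Theorem 3 follows from Theorem 2 and Lemma 3.
Proof of Theorem 3. Using Theorem 2, we know that if the error fraction in all stages is bounded by the inequality (26), the final number of errors is at most $N\alpha_0$.
From Lemma 3, we know that
$$\Pr(\alpha_{and} > p_{and}+\lambda) < \exp\left[-D(p_{and}+\lambda\|p_{and})N\right] < \exp\left[-D(2\lambda\|\lambda)N\right], \tag{36}$$
$$\Pr(\alpha_{xor} > p_{xor}+\lambda) < \exp\left[-D(p_{xor}+\lambda\|p_{xor})N\right] < \exp\left[-D(2\lambda\|\lambda)N\right], \tag{37}$$
$$\Pr(\alpha_{maj} > p_{maj}+\lambda) < \exp\left[-D(p_{maj}+\lambda\|p_{maj})N\right] < \exp\left[-D(2\lambda\|\lambda)N\right]. \tag{38}$$
Setting $\lambda = \frac{\theta\alpha_0/2}{(d_s-1)+[d_c(1-R)+1]+1}$ as in the condition (30), we have
$$(d_s-1)p_{and} + [d_c(1-R)+1]\,p_{xor} + p_{maj} < (d_s-1)\lambda + [d_c(1-R)+1]\lambda + \lambda = \frac{\theta\alpha_0}{2}.$$
Therefore,
$$\begin{aligned}
&\Pr\left((d_s-1)\alpha_{and} + [d_c(1-R)+1]\alpha_{xor} + \alpha_{maj} > \theta\alpha_0\right)\\
&< \Pr\left((d_s-1)\alpha_{and} + [d_c(1-R)+1]\alpha_{xor} + \alpha_{maj} > (d_s-1)p_{and} + [d_c(1-R)+1]\,p_{xor} + p_{maj} + \frac{\theta\alpha_0}{2}\right)\\
&< \Pr\left((d_s-1)\alpha_{and} > (d_s-1)p_{and} + (d_s-1)\lambda\right)\\
&\quad + \Pr\left([d_c(1-R)+1]\alpha_{xor} > [d_c(1-R)+1]\,p_{xor} + [d_c(1-R)+1]\lambda\right)\\
&\quad + \Pr\left(\alpha_{maj} > p_{maj}+\lambda\right)\\
&= \Pr(\alpha_{and} > p_{and}+\lambda) + \Pr(\alpha_{xor} > p_{xor}+\lambda) + \Pr(\alpha_{maj} > p_{maj}+\lambda)\\
&< 3\exp(-N D(2\lambda\|\lambda)),
\end{aligned}$$
where
$$\begin{aligned}
D(2\lambda\|\lambda) &= 2\lambda\log 2 + (1-2\lambda)\log\frac{1-2\lambda}{1-\lambda}\\
&= 2\lambda\log 2 - (1-2\lambda)\log\left(1 + \frac{\lambda}{1-2\lambda}\right)\\
&= 2\lambda\log 2 - (1-2\lambda)\left(\frac{\lambda}{1-2\lambda} + O(\lambda^2)\right)\\
&= (2\log 2 - 1)\lambda + O(\lambda^2).
\end{aligned} \tag{39}$$
Since $(d_s-1)\alpha_{and} + \alpha_{maj} < (d_s-1)\alpha_{and} + [d_c(1-R)+1]\alpha_{xor} + \alpha_{maj}$, we also have
$$\Pr\left((d_s-1)\alpha_{and} + \alpha_{maj} > \theta\alpha_0\right) < 3\exp(-D(2\lambda\|\lambda)N).$$
Therefore, using the union bound for the L stages, the total error probability is upper bounded by $P_e^{blk} < 3L\exp(-D(2\lambda\|\lambda)N)$.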
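The quantities in (30), (32) and (33) can be evaluated for concrete numbers. The following sketch uses natural logarithms, as in Lemma 3; the function names and the parameter values (θ = 0.15, α0 = 5.1 · 10^{−4}, $d_s$ = 4, $d_c$ = 18, R = 1/2, L = 100) are assumptions chosen only for illustration.

```python
import math

def kl_bernoulli(q, p):
    """Divergence D(q || p) between Bernoulli(q) and Bernoulli(p), natural logarithm."""
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def block_error_bound(theta, alpha0, d_s, d_c, R, L, N):
    """Return (lambda, lambda*, bound) following eqs. (30), (33) and (32)."""
    lam = (theta * alpha0 / 2) / ((d_s - 1) + (d_c * (1 - R) + 1) + 1)
    exponent = kl_bernoulli(2 * lam, lam)
    return lam, exponent, 3 * L * math.exp(-exponent * N)

# Hypothetical parameter values.
lam, exponent, bound = block_error_bound(0.15, 5.1e-4, 4, 18, 0.5, L=100, N=50_000)
linear_term = (2 * math.log(2) - 1) * lam        # first-order term of eq. (39)
```

For such small λ the exponent is essentially `linear_term`, and the bound in (32) decays only once N is large compared with 1/λ*; for shorter codes the bound can exceed one and is then uninformative.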
Remark 7. The analysis of the PBF algorithm (see Ap-
pendix D) still requires randomized code constructions. An-
other method to analyze the bit flipping algorithm is to use
Expander codes (also see Appendix D). However, hardware-
friendly expander codes tend to be hard to construct and
use in practice, while many hardware-friendly LDPC codes
have been designed [79]–[82]. In fact, ENCODED-T and
ENCODED-F each have some advantages over the other, so
neither universally outperforms the other. On the one
hand, ENCODED-T works for all regular LDPC codes, but it
requires a tree structure for its theoretical analysis to work. On
the other hand, ENCODED-F does not require a tree structure,
and requires less redundancy than ENCODED-T (it does not
need to maintain dv copies of the computation), but it requires
the LDPC code to satisfy certain properties. More concretely,
it requires that each single iteration of the simple Bit Flipping
algorithm corrects a constant fraction of errors (say, 75%) for
any combination of fewer than some constant number of errors
(say, 20 errors). This property is hard to verify in practice and
only existence results can be obtained. This further makes it
hard to say which one is better than the other. This is the
reason that we keep both the density evolution analysis for
general LDPC codes in Section IV-B under the assumption of
large girth, and the PBF analysis for ENCODED-F.
The following converse result holds for all computation
schemes. Although this converse result does not match with
any of the achievable results listed above, it matches with
an achievable result when a “noiseless decoder” is available
(details will be provided in Section VI) in the scaling of the
target error probability ptar. Thus, we believe the converse
result captures the required computational complexity for the
beginning stages of the linear transform computation.
Theorem 4 (Converse result). For Gate Model I with error probability $\epsilon$, maximum fan-in $D$, and linear transformation r = s · A with A having full row rank, in order to achieve $P_e^{\text{blk}}$ smaller than $p_{\text{tar}}$, the number of operations required per bit is lower bounded as
\[
N_{\text{per-bit}} \ge \frac{L\log(1/p_{\text{tar}})}{KD\log(D/\epsilon)} = \Omega\left(\frac{L\log(1/p_{\text{tar}})}{K\log(1/\epsilon)}\right).
\]
Proof. See Appendix C.
D. Simulation Results for ENCODED Techniques
We present simulation results in Fig. 6, which shows the
variation of bit error probability during the process of imple-
menting Algorithm 2. In the simulation, we generate random
binary A matrices where each entry takes value one with
probability 1/2. The x-axis is the computing steps from the
bottom to the top of the noisy computing tree structure. The y-
axis is the bit error probability. As we have mentioned, during
the entire computing process, computation introduces errors
and decoding suppresses errors. Thus, the bit error probability
oscillates between two limits. This is exactly the expected
behavior as shown in Fig. 3(b).
This simulation uses a randomly generated (6, 12)-regular
LDPC code of length 1200. The systematic generator matrix G is computed by solving the equation $GH^{\top} = 0$ in the binary field using Gaussian elimination, where H is the parity check matrix. The tree in Algorithm 2 is set to be a two-branch tree, i.e., $d_T = 2$. The failure probability values
of different unreliable gates are set to be the same as the
threshold value computed using the condition (17) in Theorem
1: pmaj = 0.001, pxor = 0.00026 and pand = 0.002. We still
assume that each operation in the decoding process is done
by a single unreliable gate, and all gates fail independently
of each other. Notice that the error probability lower limit is
just above pmaj = 0.001, which is consistent with our analysis
in Section IV-B. The bit error probability after each decoding iteration should be confined between $p_{\text{maj}}$ and $p_{\text{maj}} + \frac{1}{d_T}p_{\text{thr}}$ in theory (see Lemma 2).
It is interesting to note that the computation scheme works
at fairly practical values of node degrees (dv = 6) and
blocklengths (N = 1200). The target error probabilities are
typically much smaller, but so are gate-error probabilities.
Moreover, the scheme works even though the choice of the tree-width $d_T$ does not satisfy the constraints (18). These observations suggest that the bounds in Theorem 1 are conservative. The moderate
blocklength of the code suggests that the scheme could be
applied in practice, but a deeper investigation is needed which
is beyond the scope of this work.
We also use simulations to compare ENCODED and
repetition-based schemes. In particular, we provide a com-
parison between ENCODED-F and a particular repetition-
based scheme called “distributed voting scheme” [33], that
is designed for pmaj > 0. This method repeats not only
the computation part, but also the majority voting part of
the repetition-based circuit. The illustration of the distributed
voting scheme is shown in Fig. 5. In this way, we can compare
the (repetition-coding based) distributed voting scheme with
ENCODED that both use noisy gates.
The performance comparison is shown in Fig. 7. In the
distributed majority scheme, we use three-time repetition or
four-time repetition. For ENCODED-F, we set dv = 4, dc = 8,
ds = 8, K = 2000, L = 2100, N = 4000. We set
pand = 0.000125, pmaj = 0.0005 and pxor = 0.001. We
set these error parameters because we assume that the error
probability of each gate is proportional to its fan-in number
(we use 2-input AND-gates, 4-input MAJ-gates and 8-input
XOR gates).

Fig. 5. Illustration of the 3-time distributed voting scheme for computing an inner product $s \cdot a_j = s_1a_{1j} + s_2a_{2j} + \ldots + s_La_{Lj}$, where s is the input to the linear transform sA, and $a_j$ is the j-th column of A. The computation is divided into L stages. In the i-th stage, the distributed voting scheme computes $x_j^{(i)} = x_j^{(i-1)} + s_ig_{ij}$ three times using three sets of AND-gates and XOR-gates, and uses three noisy majority-gates (which are called voters in [33]) to compute three copies of the majority vote. Then, the output of each majority value is sent to the corresponding copy for the computation in the next stage.

Note that the number of compute-and-correct stages in ENCODED-F should be $\lceil L/(d_s-1) \rceil = 300$. In one
compute-and-correct stage, we need N XOR-operations of fan-in $d_s = 8$ for binary addition, P XOR-operations of fan-in $d_c = 8$ for parity computation and N MAJ-operations of fan-in $d_v = 4$ for majority computation. In all 300 stages, we also need NL AND-operations of fan-in 2. Therefore the number of operations per output bit for ENCODED-F is
\begin{align}
N^{\text{ENC}}_{\text{XOR-8,per-bit}} &= \frac{N+P}{K}\left\lceil\frac{L}{d_s-1}\right\rceil = \frac{3}{7}L, \tag{40}\\
N^{\text{ENC}}_{\text{MAJ-4,per-bit}} &= \frac{N}{K}\left\lceil\frac{L}{d_s-1}\right\rceil = \frac{2}{7}L, \tag{41}\\
N^{\text{ENC}}_{\text{AND-2,per-bit}} &= \frac{NL}{K} = 2L. \tag{42}
\end{align}
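In the distributed majority voting scheme with repetition time $t_m$ ($t_m$ can be 3 or 4 when the majority gate with fan-in 4 is used), the number of operations per output bit is
\begin{align}
N^{\text{Rep}}_{\text{XOR-8,per-bit}} &= t_m\left\lceil\frac{L}{d_s-1}\right\rceil = \frac{t_m}{7}L, \tag{43}\\
N^{\text{Rep}}_{\text{MAJ-}t_m\text{,per-bit}} &= t_m\left\lceil\frac{L}{d_s-1}\right\rceil = \frac{t_m}{7}L, \tag{44}\\
N^{\text{Rep}}_{\text{AND-2,per-bit}} &= t_mL. \tag{45}
\end{align}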
Therefore, when the repetition time tm is 3 or 4, the number of
operations per output bit for ENCODED-F is always smaller
than the number of operations per output bit for the distributed
majority voting scheme.
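The counts in (40)-(45) can be reproduced with a short script. This is only a sketch under stated assumptions: the function names are hypothetical, and the number of parity checks is taken to be P = N − K = 2000, which is what the factor (N + P)/K = 3 in (40) implies for the simulation parameters.

```python
import math


def encoded_f_ops_per_bit(n: int, k: int, p: int, num_rows: int, ds: int) -> dict:
    """Per-output-bit operation counts for ENCODED-F, following (40)-(42)."""
    stages = math.ceil(num_rows / (ds - 1))
    return {
        "xor": (n + p) / k * stages,  # binary addition plus parity computation
        "maj": n / k * stages,        # majority computation
        "and": n * num_rows / k,      # AND-operations over all stages
    }


def repetition_ops_per_bit(tm: int, num_rows: int, ds: int) -> dict:
    """Per-output-bit operation counts for distributed voting, following (43)-(45)."""
    stages = math.ceil(num_rows / (ds - 1))
    return {"xor": tm * stages, "maj": tm * stages, "and": tm * num_rows}


# Simulation parameters from the text; P = N - K is an assumption.
enc = encoded_f_ops_per_bit(n=4000, k=2000, p=2000, num_rows=2100, ds=8)
rep3 = repetition_ops_per_bit(tm=3, num_rows=2100, ds=8)
rep4 = repetition_ops_per_bit(tm=4, num_rows=2100, ds=8)

for name, ops in (("ENCODED-F", enc), ("Repetition-3", rep3), ("Repetition-4", rep4)):
    print(name, ops, "total =", sum(ops.values()))
```

With these parameters the XOR counts of ENCODED-F and three-time repetition coincide, and the saving in the total count comes from the MAJ- and AND-operations.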
E. Theoretical Comparison with Repetition Coding
In this section, we show the advantage of ENCODED through theoretical analysis. An online version [83] of this paper also provides a result showing that ENCODED beats repetition-based techniques in a scaling sense.
[Figure 6 plot: bit error ratio (scale ×10^-3, range 0 to 3) versus stage index in the FOR-loop (0 to 20). Curves: average bit error ratio (simulation), theoretical lower bound $p_{\text{maj}}$, and theoretical upper bound $p_{\text{maj}} + \frac{1}{d_T}p_{\text{thr}}$.]
Fig. 6. Simulated performance of ENCODED-T using a (6,12) regular LDPC
with branch width dT = 2, and Gallager-B threshold value b = 3. The gate
error probabilities are set to pmaj = 0.001, pxor = 0.00026 and pand =
0.002. The code length N = 1200, the size of the linear transform satisfies
L = K = 600. The theoretical upper bound and lower bound are obtained
in Theorem 1. Note that the bounds on the bit error ratio are for the error
probability after the decoding stages, so they only apply for the even stages.
[Figure 7 plot: bit error ratio (range 0 to 0.3) versus stage index in the FOR-loop (0 to 600). Curves: ENCODED-F, Repetition-3 and Repetition-4; an annotation marks BER < 0.005.]
Fig. 7. In this figure, a simulation result of ENCODED-F using a (4,8) regular
LDPC with ds = 8 is shown. The code length N = 4000, the size of the
linear transform satisfies L = 2100 and K = 2000. A comparison with the
distributed majority voting schemes with repetition time 3 and 4 is also shown.
The gate error probabilities are set to pand = 0.000125, pmaj = 0.0005
and pxor = 0.001 in both ENCODED-F and the distributed majority voting
scheme.
In this paper, although we obtain results on the number of operations for ENCODED in Theorems 1-3, these results are biased for the comparison between ENCODED and repetition-based schemes, because the number of operations does not take into account the gate fan-in. Therefore, to compare the complexity of operations with different fan-in, we define a new concept called “effective number of operations”. We assume that the “effective number of operations” for an operation with fan-in c is $N_{c\text{-fan-in}} = c$ (the analysis for a different $N_{c\text{-fan-in}}$ can be done similarly). We show that if we consider the problem of finding a binary linear transform scheme that achieves target error probability $p_{\text{tar}} = 5.1 \cdot 10^{-4}$ using only noisy gates with $\max(p_{\text{xor}}, p_{\text{maj}}, p_{\text{and}}) < 1.3 \cdot 10^{-6}$, the effective number of operations of ENCODED-F is smaller than that of distributed majority voting, provided that the size of the linear transform satisfies $N = 2K > 9.85 \cdot 10^{7}$ and $L > p_{\text{tar}}/p_{\text{and}}$. We choose these parameters only to show that ENCODED can provably beat repetition-based schemes in situations when the parameters are not absurdly large, and hence the theoretical analysis here has potential to provide practical insight. Here $\max(p_{\text{xor}}, p_{\text{maj}}, p_{\text{and}})$ is interpreted as the maximum error probability over all types of different gates, which allows the same type of gates (e.g., MAJ-gates) with different fan-in to have different error probabilities.
1) Counting the effective number of operations: First, we
compare the effective number of operations in both schemes.
For ENCODED-F, we use a (9, 18) code. To make the compar-
ison fair, we allow the distributed majority voting scheme to
group several stages into one stage as well. Recall that we use
ds to denote the number of stages that ENCODED-F groups
into one stage. Therefore, we use d′s to denote the number of
stages that distributed majority voting groups into one stage.
In general, $d_s \neq d'_s$.
We compare ENCODED-F using (9,18) LDPC codes with
distributed majority voting with three-time repetition. We show
when
ds > 14, (46)
the “effective” number of operations in ENCODED-F is less
than that of distributed majority voting. Note that d′s can be
arbitrary.
The number of compute-and-correct stages in ENCODED-F is $\lceil L/(d_s-1) \rceil$, and that of distributed majority voting is $\lceil L/(d'_s-1) \rceil$. For ease of analysis, assume L is a multiple of both $d_s-1$ and $d'_s-1$. For ENCODED-F, in each compute-and-correct stage, we need N XOR-operations of fan-in $d_s$ for binary addition, P XOR-operations of fan-in $d_c$ and N MAJ-operations of fan-in $d_v$ for LDPC decoding. In all compute-and-correct stages, the overall number of AND-operations of fan-in 2 is NL. Then,
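\begin{align}
N^{\text{ENC}}_{\text{XOR-}d_s\text{,per-bit}} &= \frac{N}{K}\left\lceil\frac{L}{d_s-1}\right\rceil = \frac{2}{d_s-1}L, \tag{47}\\
N^{\text{ENC}}_{\text{XOR-}d_c\text{,per-bit}} &= \frac{P}{K}\left\lceil\frac{L}{d_s-1}\right\rceil = \frac{1}{d_s-1}L, \tag{48}\\
N^{\text{ENC}}_{\text{MAJ-}d_v\text{,per-bit}} &= \frac{N}{K}\left\lceil\frac{L}{d_s-1}\right\rceil = \frac{2}{d_s-1}L, \tag{49}\\
N^{\text{ENC}}_{\text{AND-2,per-bit}} &= \frac{NL}{K} = 2L. \tag{50}
\end{align}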
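In the distributed majority voting scheme with repetition time 3, the number of operations per output bit is
\begin{align}
N^{\text{Rep}}_{\text{XOR-}d'_s\text{,per-bit}} &= 3\left\lceil\frac{L}{d'_s-1}\right\rceil = \frac{3}{d'_s-1}L, \tag{51}\\
N^{\text{Rep}}_{\text{MAJ-3,per-bit}} &= 3\left\lceil\frac{L}{d'_s-1}\right\rceil = \frac{3}{d'_s-1}L, \tag{52}\\
N^{\text{Rep}}_{\text{AND-2,per-bit}} &= 3L. \tag{53}
\end{align}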
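Therefore, from (47)-(50), for ENCODED-F with $d_c = 18$ and $d_v = 9$, the effective number of operations is
\begin{align}
N^{\text{ENC}}_{\text{eff}} &= d_s \cdot \frac{2}{d_s-1}L + d_c \cdot \frac{1}{d_s-1}L + d_v \cdot \frac{2}{d_s-1}L + 2 \cdot 2L \notag\\
&= \frac{36+2d_s}{d_s-1}L + 4L = \frac{38}{d_s-1}L + 6L. \tag{54}
\end{align}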
From (51) to (53), for distributed majority voting, the effective number of operations is
\begin{align}
N^{\text{rep}}_{\text{eff}} &= d'_s \cdot \frac{3}{d'_s-1}L + 3 \cdot \frac{3}{d'_s-1}L + 2 \cdot 3L \notag\\
&= \frac{3d'_s+9}{d'_s-1}L + 6L = \frac{12}{d'_s-1}L + 9L > 9L. \tag{55}
\end{align}
Therefore, when $d_s > 14$,
\[
N^{\text{ENC}}_{\text{eff}} < \frac{38}{13}L + 6L < 3L + 6L = 9L < N^{\text{rep}}_{\text{eff}}. \tag{56}
\]
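The comparison in (54)-(56) can be checked numerically. The sketch below weights every operation by its fan-in, as in the definition of the effective number of operations, and reports the result per unit of L. The function names are hypothetical, and N/K = 2 and P/K = 1 are assumed, as in (47)-(50).

```python
def encoded_f_effective_ops(ds: int, dc: int = 18, dv: int = 9) -> float:
    """Effective operations per output bit, divided by L, following (54)."""
    xor_add = ds * 2 / (ds - 1)      # N/K = 2 XOR-operations of fan-in ds
    xor_parity = dc * 1 / (ds - 1)   # P/K = 1 XOR-operations of fan-in dc
    maj = dv * 2 / (ds - 1)          # N/K = 2 MAJ-operations of fan-in dv
    and_ops = 2 * 2                  # 2L AND-operations of fan-in 2
    return xor_add + xor_parity + maj + and_ops


def repetition_effective_ops(ds_prime: int, tm: int = 3) -> float:
    """Effective operations per output bit, divided by L, following (55)."""
    xor_ops = ds_prime * tm / (ds_prime - 1)  # tm XOR-operations of fan-in d's
    maj = tm * tm / (ds_prime - 1)            # tm MAJ-operations of fan-in tm
    and_ops = 2 * tm                          # tm*L AND-operations of fan-in 2
    return xor_ops + maj + and_ops


for ds in (8, 15, 30):
    print(f"ds = {ds:2d}: ENCODED-F = {encoded_f_effective_ops(ds):.3f} L")
for ds_prime in (2, 8, 100):
    print(f"d's = {ds_prime:3d}: repetition-3 = {repetition_effective_ops(ds_prime):.3f} L")
```

The repetition cost stays above 9L for every grouping $d'_s$, while the ENCODED-F cost drops below 9L once $d_s$ exceeds 14.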
2) Analyzing the probability of error: Now, we analyze the error probability of ENCODED-F for $d_s = 14$, $d_c = 18$ and $d_v = 9$. From Lemma 7 in Appendix D, using almost all codes in a $(d_v, d_c)$-regular LDPC random code ensemble with $d_v > 4$ and N large enough, after one iteration of the PBF algorithm, one can reduce the number of errors by at least $\theta\alpha_0N$ for any $\alpha_0N$ worst-case errors if $\alpha_0$ and $\theta$ are small enough. That is, using a $(d_v, d_c)$-regular LDPC code, the number of errors after one iteration of the noiseless PBF algorithm will be smaller than $\alpha_0(1-\theta)N$. Recall that this is the condition (A.3) that we require on the utilized LDPC code. In Example 1 in Appendix D, for the (9, 18)-regular LDPC code, we computed numerically the threshold value of $\alpha_0$ for $\theta = 0.15$ and obtained $\alpha_0 = 5.1 \cdot 10^{-4}$. We also obtained finite-length bounds which state that there exist (9, 18)-regular LDPC codes with length N = 50,000 that can reduce the number of errors by 15% for an arbitrary pattern of at most 20 errors, which corresponds to the case when $\alpha_0 = 4 \cdot 10^{-4}$ and $\theta = 0.15$.
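From Theorem 3, using the (9,18) code, when the maximum gate error probability $\epsilon = \max(p_{\text{xor}}, p_{\text{maj}}, p_{\text{and}})$ satisfies the condition
\[
\epsilon < \lambda = \frac{\theta\alpha_0/2}{(d_s-1) + [d_c(1-R)+1] + 1} = \frac{\theta\alpha_0/2}{(14-1) + [18(1-\frac{1}{2})+1] + 1} = \frac{\theta\alpha_0}{54}, \tag{57}
\]
ENCODED-F has bounded final error fraction with high probability, which is
\[
1 - P_e^{\text{blk}} > 1 - 3L\exp\left(-D(2\lambda\|\lambda)N\right), \tag{58}
\]
where $\lambda = \frac{\theta\alpha_0}{54}$ and $D(2\lambda\|\lambda) = (2\log 2-1)\lambda + O(\lambda^2)$.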
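In particular, if we choose $\epsilon = \frac{1}{60}\theta\alpha_0 < \frac{1}{54}\theta\alpha_0 = \lambda$, the final error fraction satisfies $\delta_e^{\text{frac}} < \alpha_0 = \frac{60}{\theta} \cdot \epsilon$ with probability $1 - 3L\exp\left(-D(2\lambda\|\lambda)N\right)$. As we have mentioned, for $\theta = 0.15$, we obtain $\alpha_0 = 5.1 \cdot 10^{-4}$. Therefore, when the gate error probabilities satisfy $\max(p_{\text{xor}}, p_{\text{maj}}, p_{\text{and}}) = \epsilon = \frac{1}{60}\theta\alpha_0 = 0.0025\alpha_0 = 1.3 \cdot 10^{-6}$, the obtained error probability is smaller than $\alpha_0 = \frac{60}{\theta} \cdot \epsilon = 400\epsilon = 5.1 \cdot 10^{-4}$ with probability $1 - 3L\exp\left(-D(2\lambda\|\lambda)N\right)$, which is approximately 1 for reasonably large N. This can be guaranteed¹² if
\[
N > \frac{50}{D(2\lambda\|\lambda)} \approx \frac{50}{(2\log 2-1) \cdot \lambda} = \frac{50}{(2\log 2-1) \cdot \frac{1}{60}\theta\alpha_0} = \frac{5.02 \cdot 10^{4}}{\alpha_0} = 9.85 \cdot 10^{7}.
\]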
Therefore, if we consider the problem “find a binary linear transform scheme that achieves target error probability $p_{\text{tar}} = \alpha_0 = 5.1 \cdot 10^{-4}$ using only noisy gates with $\max(p_{\text{xor}}, p_{\text{maj}}, p_{\text{and}}) < 1.3 \cdot 10^{-6}$”, ENCODED-F has a smaller “effective number of operations” than distributed majority voting. Additionally, one-time repetition or two-time repetition cannot obtain $p_{\text{tar}} = \alpha_0 = 5.1 \cdot 10^{-4}$ when L is reasonably large so that $\frac{1}{2}[1 - (1-2p_{\text{and}})^L] \approx Lp_{\text{and}} > \alpha_0$. Thus, we conclude that ENCODED-F beats repetition-based schemes under this circumstance. Here, we acknowledge that the problem parameters (such as $\max(p_{\text{xor}}, p_{\text{maj}}, p_{\text{and}}) < 1.3 \cdot 10^{-6}$ and $N > 9.85 \cdot 10^{7}$) are chosen to show that the theoretical analysis works even when the parameter sizes are not extremely large, and thus the theoretical analysis technique has the potential to provide practical insight.
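The arithmetic behind these parameter choices can be retraced with a few lines of code. This is a sketch only: the variable and function names are hypothetical, $\theta$ and $\alpha_0$ are the values quoted in the text, and the AND-gate error probability is set equal to the maximum gate error probability for illustration.

```python
theta = 0.15     # fraction of errors removed per PBF iteration (from the text)
alpha0 = 5.1e-4  # threshold error fraction for the (9, 18) code (from the text)

eps = theta * alpha0 / 60    # chosen maximum gate error probability
amplification = 60 / theta   # ratio alpha0 / eps


def uncorrected_chain_error(p_and: float, num_stages: int) -> float:
    """Error probability 0.5 * [1 - (1 - 2 p_and)^L] of an L-stage chain without correction."""
    return 0.5 * (1 - (1 - 2 * p_and) ** num_stages)


print(f"eps           = {eps:.4e}")
print(f"alpha0 / eps  = {amplification:.1f}")
for num_stages in (100, 1000, 10000):
    err = uncorrected_chain_error(eps, num_stages)
    print(f"L = {num_stages:5d}: uncorrected error = {err:.3e}, exceeds alpha0: {err > alpha0}")
```

Under these assumptions the uncorrected chain crosses the target $\alpha_0$ once L is a few hundred, which is the regime in which one-time or two-time repetition cannot reach the target.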
V. COMPUTING A LINEAR TRANSFORMATION RELIABLY
AND ENERGY-EFFICIENTLY WITH VOLTAGE SCALING
In this section, we consider unreliable gates with tunable failure probability [84] when supply voltage, and hence energy consumed by gates, can be adjusted to attain a desired gate-reliability. To model this property within Gate Model I in (1), we assume that the added noise $z_g \sim \text{Bernoulli}(\epsilon_g(E_v))$, in which $\epsilon_g(E_v)$ is a function that depends on the supply energy $E_v$. We assume that $E_v$ is identical for all gates at any stage of the computation, while it can vary across stages. Intuitively, $\epsilon_g(\cdot)$ should be a monotonically decreasing function, since the error probability should be smaller if more energy is used. Suppose the energy-reliability tradeoff functions of AND-gates, XOR-gates and majority-gates are $\epsilon_{\text{and}}(\cdot)$, $\epsilon_{\text{xor}}(\cdot)$ and $\epsilon_{\text{maj}}(\cdot)$ respectively. Then, the failure probabilities of these three types of gates are $p_{\text{and}} = \epsilon_{\text{and}}(E_v)$, $p_{\text{xor}} = \epsilon_{\text{xor}}(E_v)$ and $p_{\text{maj}} = \epsilon_{\text{maj}}(E_v)$.
A. Uncoded Matrix Multiplication vs ENCODED-T
In this section, we compare the required energy for ENCODED-T with that for ‘uncoded’ matrix multiplication r = sA, where the circuit voltage is maintained high to ensure that the overall error probability is smaller than the target error probability. Uncoded matrix multiplication is how almost all circuits today operate. Here, we only provide a scaling-sense comparison to show the advantage of ENCODED techniques.

¹² We believe that further optimization in code design can provide techniques for error suppression for even smaller values of N.
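Proposition 1. When $\max\{\epsilon_{\text{and}}(E_v), \epsilon_{\text{xor}}(E_v)\} < \frac{1}{2L-2}$, to achieve bit error probability $P_e^{\text{bit}} < p_{\text{tar}}$, the energy consumption per output bit is $\Omega\left(L \cdot \max\left\{\epsilon_{\text{and}}^{-1}\left(\frac{2p_{\text{tar}}}{L}\right), \epsilon_{\text{xor}}^{-1}\left(\frac{2p_{\text{tar}}}{L}\right)\right\}\right)$ for the uncoded matrix multiplication scheme, while that for ENCODED-T is $O\left(\frac{LN}{K}\max\left\{\epsilon_{\text{maj}}^{-1}\left(\frac{1}{2}p_{\text{tar}}\right), \epsilon_{\text{xor}}^{-1}\left(\frac{1}{2}p_{\text{tar}}\right), \epsilon_{\text{and}}^{-1}\left(\frac{1}{2}p_{\text{tar}}\right)\right\}\right)$ and $\Omega\left(\frac{LN}{K}\epsilon_{\text{maj}}^{-1}(p_{\text{tar}})\right)$.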
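Proof. “Uncoded” scheme: To compute each output bit using straightforward dot-product-based multiplication, one needs to compute a dot product of the message s with one column in the matrix A, which needs $2L-1$ unreliable operations (L AND-operations and $L-1$ XOR-operations). From Lemma 1, we know that the bit error probability is
\[
P_e^{\text{bit}} = \frac{1}{2}\left[1 - (1-2p_{\text{and}})^L(1-2p_{\text{xor}})^{L-1}\right]. \tag{59}
\]
Since
\[
(1-2p_{\text{and}})^L < 1 - 2Lp_{\text{and}} + 2L(L-1)p_{\text{and}}^2 \overset{(a)}{<} 1 - Lp_{\text{and}}, \tag{60}
\]
where step (a) follows from $p_{\text{and}} < \frac{1}{2L-2}$, and
\[
(1-2p_{\text{xor}})^{L-1} < 1 - 2(L-1)p_{\text{xor}} + 2(L-1)(L-2)p_{\text{xor}}^2 \overset{(b)}{<} 1 - Lp_{\text{xor}}, \tag{61}
\]
where step (b) follows from $p_{\text{xor}} < \frac{1}{2L-2}$, we get
\begin{align}
P_e^{\text{bit}} &> \frac{1}{2}\left[1 - (1-Lp_{\text{and}})(1-Lp_{\text{xor}})\right] \notag\\
&= \frac{L}{2}p_{\text{and}} + \frac{L}{2}p_{\text{xor}} - \frac{L^2}{2}p_{\text{and}}p_{\text{xor}} \notag\\
&\overset{(c)}{>} \frac{L}{2}\max\{p_{\text{and}}, p_{\text{xor}}\}, \tag{62}
\end{align}
where step (c) follows from $\max\{p_{\text{and}}, p_{\text{xor}}\} < \frac{1}{2L-2} < \frac{1}{L}$. Thus, to attain a target bit error probability $p_{\text{tar}}$, it must hold that $\max\{\epsilon_{\text{and}}(E_v), \epsilon_{\text{xor}}(E_v)\} < \frac{2p_{\text{tar}}}{L}$. Therefore, the total energy required for each output bit is $\Omega\left(L \cdot \max\left\{\epsilon_{\text{and}}^{-1}\left(\frac{2p_{\text{tar}}}{L}\right), \epsilon_{\text{xor}}^{-1}\left(\frac{2p_{\text{tar}}}{L}\right)\right\}\right)$.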
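From Theorem 1, we know that in the ENCODED-T technique, $N_{\text{per-bit}} = \Theta(L)$ is sufficient to achieve bit error probability smaller than or equal to $p_{\text{maj}} + \frac{1}{d_T}p_{\text{thr}}$. From (17), it is reasonable to make $p_{\text{xor}} = p_{\text{and}} = p_{\text{maj}} = p_{\text{thr}} = \frac{1}{2}p_{\text{tar}}$, in which case $p_{\text{maj}} + \frac{1}{d_T}p_{\text{thr}} < 2p_{\text{maj}} = p_{\text{tar}}$. Since there are $\Theta\left(\frac{LN}{K}\right)$ AND-, XOR- and majority-operations in the ENCODED-T technique (see the Computational Complexity Analysis part in the proof of Theorem 1), the total energy required for each output bit is
\[
O\left(\frac{LN}{K}\left(\epsilon_{\text{maj}}^{-1}(p_{\text{maj}}) + \epsilon_{\text{xor}}^{-1}(p_{\text{xor}}) + \epsilon_{\text{and}}^{-1}(p_{\text{and}})\right)\right) = O\left(\frac{LN}{K}\max\left\{\epsilon_{\text{maj}}^{-1}\left(\tfrac{1}{2}p_{\text{tar}}\right), \epsilon_{\text{xor}}^{-1}\left(\tfrac{1}{2}p_{\text{tar}}\right), \epsilon_{\text{and}}^{-1}\left(\tfrac{1}{2}p_{\text{tar}}\right)\right\}\right).
\]
Furthermore, $p_{\text{maj}} < p_{\text{tar}}$ due to the ‘last-gate effect’. Therefore, the total energy required for each bit is at least
\[
\Omega\left(\frac{LN}{K}\epsilon_{\text{maj}}^{-1}(p_{\text{maj}})\right) = \Omega\left(\frac{LN}{K}\epsilon_{\text{maj}}^{-1}(p_{\text{tar}})\right).
\]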
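The lower bound (62) used in the uncoded part of the proof can be checked numerically against the closed form (59). The sketch below uses hypothetical function names and illustrative values of L and of the gate error probabilities, chosen to satisfy the assumption max{p_and, p_xor} < 1/(2L − 2).

```python
def uncoded_bit_error(p_and: float, p_xor: float, num_rows: int) -> float:
    """Closed-form bit error probability (59) of an uncoded length-L dot product."""
    return 0.5 * (1 - (1 - 2 * p_and) ** num_rows * (1 - 2 * p_xor) ** (num_rows - 1))


def uncoded_lower_bound(p_and: float, p_xor: float, num_rows: int) -> float:
    """Lower bound (L / 2) * max{p_and, p_xor} from (62)."""
    return num_rows / 2 * max(p_and, p_xor)


def max_gate_error_for_target(p_tar: float, num_rows: int) -> float:
    """Necessary condition from the proof: max gate error must stay below 2 p_tar / L."""
    return 2 * p_tar / num_rows


L_rows = 100                # illustrative problem size (assumption)
p_and = p_xor = 1e-3        # satisfies p < 1 / (2L - 2) for L = 100
exact = uncoded_bit_error(p_and, p_xor, L_rows)
bound = uncoded_lower_bound(p_and, p_xor, L_rows)
print(f"exact bit error = {exact:.4f}, lower bound = {bound:.4f}")
print(f"gate error needed for p_tar = 1e-3: below {max_gate_error_for_target(1e-3, L_rows):.1e}")
```

The necessary gate error probability shrinks like 1/L, which is the source of the extra factor of L inside the inverse tradeoff function for the uncoded scheme.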
Remark 8. We show an illustrative example when $\epsilon_{\text{and}}(\cdot) = \epsilon_{\text{xor}}(\cdot) = \epsilon_{\text{maj}}(\cdot) = \epsilon_g(\cdot)$. Because $\epsilon_g^{-1}(u)$ typically decreases monotonically in u, we consider three specific cases: exponential decay, polynomial decay and sub-exponential decay. For exponential decay, we assume $\epsilon_g(u) = \exp(-cu)$, $c > 0$. Therefore, the total energy required for each output bit for the ‘uncoded’ matrix multiplication is $\Omega\left(L\log\frac{L}{p_{\text{tar}}}\right)$, while that for ENCODED-T is $\Theta\left(\frac{LN}{K}\log\frac{1}{p_{\text{tar}}}\right)$. For polynomial decay, $\epsilon_g(u) = \left(\frac{1}{u}\right)^c$, $c > 0$, the total energy required for each output bit for the ‘uncoded’ matrix multiplication is $\Omega\left(L\left(\frac{L}{p_{\text{tar}}}\right)^{\frac{1}{c}}\right)$, while that for ENCODED-T is $\Theta\left(\frac{LN}{K}\left(\frac{1}{p_{\text{tar}}}\right)^{\frac{1}{c}}\right)$. For sub-exponential decay, we assume $\epsilon_g(u) = \exp(-c\sqrt{u})$, $c > 0$. By sub-exponential we mean that the decay is slower than exponential but faster than polynomial. The sub-exponential decay model is inspired by and obtained from [85] on spintronic devices [86]–[88]. In this case, the total energy required for each output bit for the ‘uncoded’ matrix multiplication is $\Omega\left(L\left(\log\frac{L}{p_{\text{tar}}}\right)^2\right)$, while that for ENCODED-T is $\Theta\left(\frac{LN}{K}\left(\log\frac{1}{p_{\text{tar}}}\right)^2\right)$. In all cases, if $K = RN$ for some constant ‘rate’ R, the scaling of the required energy consumption of ENCODED-T is smaller than that of uncoded matrix multiplication.
In the next subsection, we will show that using ‘dynamic’ voltage scaling, we can achieve even lower energy by using a two-phase computation scheme called ENCODED-V. For example, when $\epsilon_g(u) = \left(\frac{1}{u}\right)^c$, $c > 0$, the energy consumption per output bit is $O\left(\frac{N}{K}\max\left\{L, \left(\frac{1}{p_{\text{tar}}}\right)^{\frac{1}{c}}\right\}\right)$.
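The three energy-reliability models can be made concrete by inverting each tradeoff function. The following sketch is illustrative only: the function names are hypothetical, all constants hidden by the order notation are dropped (so only growth trends are meaningful, not absolute values), and N/K = 2 is an assumed rate.

```python
import math


def inv_exponential(eps: float, c: float = 1.0) -> float:
    """Energy u solving exp(-c u) = eps."""
    return math.log(1 / eps) / c


def inv_polynomial(eps: float, c: float = 1.0) -> float:
    """Energy u solving (1 / u)^c = eps."""
    return eps ** (-1 / c)


def inv_subexponential(eps: float, c: float = 1.0) -> float:
    """Energy u solving exp(-c sqrt(u)) = eps."""
    return (math.log(1 / eps) / c) ** 2


def uncoded_energy_proxy(inv, num_rows: int, p_tar: float) -> float:
    """Order-level proxy L * inv(2 p_tar / L) for the uncoded scheme."""
    return num_rows * inv(2 * p_tar / num_rows)


def encoded_t_energy_proxy(inv, num_rows: int, p_tar: float, n_over_k: float = 2.0) -> float:
    """Order-level proxy (L N / K) * inv(p_tar / 2) for ENCODED-T."""
    return num_rows * n_over_k * inv(p_tar / 2)


for name, inv in (("exponential", inv_exponential),
                  ("polynomial", inv_polynomial),
                  ("sub-exponential", inv_subexponential)):
    u = uncoded_energy_proxy(inv, 1000, 1e-3)
    e = encoded_t_energy_proxy(inv, 1000, 1e-3)
    print(f"{name:16s}: uncoded proxy = {u:.3e}, ENCODED-T proxy = {e:.3e}")
```

Because the dropped constants matter at moderate L, the proxies should be read as a way to see how each expression grows with L and 1/p_tar rather than as a prediction of which scheme uses less energy at a given size.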
B. ENCODED-V: Low-energy Linear Transformations Using
Dynamic Voltage Scaling
In this part, we modify the ENCODED-F technique in Sec-
tion IV-C with ‘dynamic’ voltage scaling to obtain arbitrarily
small output error fraction. The gate model here is Model
I. The original ENCODED-F technique has $\lceil L/(d_s-1) \rceil$
stages, where in each stage, a noisy decoder of the utilized
LDPC code is used to carry out one (noisy) iteration of PBF
decoding. In the original ENCODED-F technique, we assumed
that gate failure probability is constant (and equal for all
gates) throughout the duration of the computation process.
Here, we partition the entire ENCODED-F technique into two
phases. In the first phase, we use constant supply energy,
while in the second phase, we increase the supply energy as
the computation proceeds, so that the gate failure probability
decreases during the computation process, in order to achieve
the required output error fraction with high probability.
For ease of presentation, we consider the case when ds = 2,
i.e., we only add ds − 1 = 1 codeword to the N -bit storage
at each stage. The extension to general ds is straightforward.
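We partition the entire ENCODED-F so that there are $L - L_{\text{vs}}$ stages in the first phase and $L_{\text{vs}}$ stages in the second phase, where $L_{\text{vs}}$ is defined as
\[
L_{\text{vs}} = \left\lceil \frac{\log\frac{1}{p_{\text{tar}}} + \log\alpha_0}{\log\frac{1}{1-\frac{1}{2}\theta}} \right\rceil, \tag{63}
\]
where $p_{\text{tar}}$ is the required final output error fraction. In the i-th stage of the last $L_{\text{vs}}$ stages, we assume that the supply energy is increased to some value to ensure that
\[
[d_c(1-R)+1]\,p_{\text{xor}}^{(i+1)} + p_{\text{maj}}^{(i+1)} + p_{\text{and}}^{(i+1)} \le \frac{1}{4}\theta\alpha_0\left(1-\frac{1}{2}\theta\right)^{i}. \tag{64}
\]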
We call this (dynamic) voltage-scaling scheme the
ENCODED-V technique.
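The schedule defined by (63) and (64) can be sketched as follows. The function names are hypothetical, natural logarithms are assumed, the values of $\theta$ and $\alpha_0$ are the ones quoted for the (9, 18) code in Section IV-E, and the target p_tar = 1e-9 is an illustrative choice.

```python
import math


def num_voltage_scaling_stages(p_tar: float, alpha0: float, theta: float) -> int:
    """Number of second-phase stages L_vs from (63)."""
    return math.ceil((math.log(1 / p_tar) + math.log(alpha0)) / math.log(1 / (1 - theta / 2)))


def stage_error_budget(i: int, alpha0: float, theta: float) -> float:
    """Right-hand side of (64): allowed weighted gate error in the i-th second-phase stage."""
    return 0.25 * theta * alpha0 * (1 - theta / 2) ** i


theta, alpha0, p_tar = 0.15, 5.1e-4, 1e-9  # p_tar is an illustrative target
l_vs = num_voltage_scaling_stages(p_tar, alpha0, theta)
print(f"L_vs = {l_vs}")
for i in (1, l_vs // 2, l_vs):
    print(f"stage {i:3d}: error budget = {stage_error_budget(i, alpha0, theta):.3e}")
```

The definition of L_vs is exactly the number of stages after which the geometric factor $(1 - \theta/2)^{i}$ has shrunk $\alpha_0$ down to the target, so the budget in (64) tightens by a constant factor per stage until that point.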
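Theorem 5. (Using dynamic voltage scaling for Problem 2) Using unreliable AND gates, majority gates and XOR gates defined from Gate Model I $(D, \epsilon)$ with maximum fan-in D and error probabilities $p_{\text{and}}$, $p_{\text{xor}}$ and $p_{\text{maj}}$, and using a regular LDPC code that satisfies assumption (A.3), the binary linear transformation r = s · A can be computed using the ENCODED-F technique with dynamic voltage scaling, with per-bit energy consumption
\begin{align}
E_{\text{per-bit}} ={}& \frac{L - L_{\text{vs}}}{K}\left[N\epsilon_{\text{and}}^{-1}(p_{\text{and}}) + N\epsilon_{\text{maj}}^{-1}(p_{\text{maj}}) + (N+P)\epsilon_{\text{xor}}^{-1}(p_{\text{xor}})\right] \notag\\
&+ \frac{N}{K}\sum_{i=1}^{L_{\text{vs}}}\epsilon_{\text{and}}^{-1}\left(p_{\text{and}}^{(i)}\right) + \frac{N}{K}\sum_{i=1}^{L_{\text{vs}}}\epsilon_{\text{maj}}^{-1}\left(p_{\text{maj}}^{(i)}\right) + \frac{N+P}{K}\sum_{i=1}^{L_{\text{vs}}}\epsilon_{\text{xor}}^{-1}\left(p_{\text{xor}}^{(i)}\right), \tag{65}
\end{align}
where $L_{\text{vs}}$, which is a function of $p_{\text{tar}}$, is defined in (63). Further, the output error fraction is below $p_{\text{tar}}$ with probability at least $1 - P_e^{\text{blk}}$, where the probability $P_e^{\text{blk}}$ satisfies
\[
P_e^{\text{blk}} < 3(L - L_{\text{vs}})\exp(-\lambda^*N) + 3\sum_{i=1}^{L_{\text{vs}}}\exp\left(-\tilde{\lambda}^{(i+1)}N\right), \tag{66}
\]
where
\begin{align}
\lambda^* &= D(2\lambda\|\lambda) = (2\log 2-1)\lambda + O(\lambda^2), \notag\\
\tilde{\lambda}^{(i+1)} &= D\left(2\lambda^{(i+1)}\middle\|\lambda^{(i+1)}\right) = (2\log 2-1)\lambda^{(i+1)} + O\left((\lambda^{(i+1)})^2\right), \notag\\
\lambda &= \frac{\theta\alpha_0/2}{[d_c(1-R)+1]+2}, \notag\\
\lambda^{(i+1)} &= \frac{\theta\alpha_0\left(1-\frac{1}{2}\theta\right)^{i}/4}{[d_c(1-R)+1]+2}. \tag{67}
\end{align}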
Proof. See Appendix E.
As in the analysis in Section V-A, we consider three specific cases of energy-reliability tradeoff: the exponential decay model $\epsilon_{\text{and}}(u) = \epsilon_{\text{xor}}(u) = \epsilon_{\text{maj}}(u) = \exp(-cu)$, $c > 0$, the polynomial decay model $\epsilon_{\text{and}}(u) = \epsilon_{\text{xor}}(u) = \epsilon_{\text{maj}}(u) = \left(\frac{1}{u}\right)^c$, $c > 0$, and the sub-exponential decay model $\epsilon_{\text{and}}(u) = \epsilon_{\text{xor}}(u) = \epsilon_{\text{maj}}(u) = \exp(-c\sqrt{u})$, $c > 0$. We evaluate the total energy consumption per output bit in these three cases under a specific choice of supply energy that ensures the condition (64).
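Corollary 1. (Using dynamic voltage scaling for Problem 1) Using a $(d_v, d_c)$ regular LDPC code that satisfies assumption (A.3) (with parameters $\alpha_0$ and $\theta$) and has length $N > \frac{1}{\theta^*}\log\left(\frac{6L}{p_{\text{tar}}}\right)$, where
\[
\theta^* = \min\left\{\lambda^*,\; D\left(2\,\frac{\theta p_{\text{tar}}\left(1-\frac{1}{2}\theta\right)/4}{[d_c(1-R)+1]+2} \,\middle\|\, \frac{\theta p_{\text{tar}}\left(1-\frac{1}{2}\theta\right)/4}{[d_c(1-R)+1]+2}\right)\right\}, \tag{68}
\]
and $\lambda^*$ is defined in (67), the ENCODED-V technique can achieve output bit error probability $p_{\text{tar}}$ with total energy consumption per bit $E_{\text{per-bit}}$ as follows. When the energy-reliability tradeoff function is $\epsilon_{\text{and}}(u) = \epsilon_{\text{xor}}(u) = \epsilon_{\text{maj}}(u) = \left(\frac{1}{u}\right)^c$, $c > 0$, $E_{\text{per-bit}} = O\left(\frac{N}{K}\max\left\{L, \left(\frac{1}{p_{\text{tar}}}\right)^{\frac{1}{c}}\right\}\right)$; when the energy-reliability tradeoff function is $\epsilon_{\text{and}}(u) = \epsilon_{\text{xor}}(u) = \epsilon_{\text{maj}}(u) = \exp(-cu)$, $c > 0$, $E_{\text{per-bit}} = O\left(\frac{N}{K}\max\left\{L, \log^2\frac{1}{p_{\text{tar}}}\right\}\right)$; when the energy-reliability tradeoff function is $\epsilon_{\text{and}}(u) = \epsilon_{\text{xor}}(u) = \epsilon_{\text{maj}}(u) = \exp(-c\sqrt{u})$, $c > 0$, $E_{\text{per-bit}} = O\left(\frac{N}{K}\max\left\{L, \log^3\frac{1}{p_{\text{tar}}}\right\}\right)$.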
Proof. See Appendix F.
We use Table I to show the energy-reliability trade-
off of “uncoded” matrix multiplication, ENCODED-T and
ENCODED-V.
VI. WHEN A NOISELESS DECODER IS AVAILABLE
The conclusion in Theorem 3 can be further tightened if
we use a noiseless PBF decoder after the noisy computation.
Although the assumption that the last step of the entire
computation process is fault-free is not valid under our Gate
Model I or Gate Model II, it is often adopted in existing
literature on computing with noisy components [33], [34],
[89].
Theorem 6 (What if we have a noiseless decoder). Suppose the unreliable gates are drawn from Gate Model I $(D, \epsilon)$. Further assume that a noiseless PBF decoder is available. Then, the linear transformation r = s · A that outputs K bits can be computed with $P_e^{\text{blk}} < p_{\text{tar}}$ using $\frac{1}{\lambda^*}\log\frac{3L}{p_{\text{tar}}} = \Theta\left(\log\frac{1}{p_{\text{tar}}}\right)$ unreliable operations per output bit and extra $\Theta\left(\log\log\frac{1}{p_{\text{tar}}}\right)$ noiseless operations per output bit, where the parameter $\lambda^*$ is defined in (33) in Theorem 3.
Proof. We use the ENCODED-F technique to do noisy linear
transformations. That is, instead of using Gallager-B decod-
ing algorithm to correct errors, we use the PBF algorithm.
Theorem 3 shows that the final error fraction can be upper
bounded by a small constant α0 with high probability as
long as (30) holds. The total number of operations per bit is $\frac{(3N+P)L}{K} \le \frac{4NL}{K} = \Theta\left(\frac{NL}{K}\right)$.
If we require the error probability ptar to be arbitrarily small,
we have to use a noiseless decoder to correct residual errors
in the final output. We can use the noiseless decoder to carry
out Θ(logN) iterations of noiseless PBF algorithms to correct
all errors, which introduces an additional Θ(logN) operations
per bit.
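The overall error probability is the same as (32). To ensure that $P_e^{\text{blk}}$ is smaller than $p_{\text{tar}}$, it suffices (see (32)) to let
\[
3L\exp(-\lambda^*N) < p_{\text{tar}}.
\]
This is satisfied when
\[
N \ge \frac{1}{\lambda^*}\log\frac{3L}{p_{\text{tar}}} = \Theta\left(\log\frac{L}{p_{\text{tar}}}\right).
\]
Thus, we need $\frac{4L}{K\lambda^*}\log\frac{3L}{p_{\text{tar}}} = \Theta\left(\frac{L}{K}\log\frac{L}{p_{\text{tar}}}\right)$ unreliable operations per bit and extra $\Theta(\log N) = \Theta\left(\log\log\frac{L}{p_{\text{tar}}}\right)$ noiseless operations per bit.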
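The block length requirement in this proof is easy to evaluate. The sketch below uses hypothetical function names and illustrative values of L, p_tar and $\lambda^*$ (none taken from the paper), and natural logarithms are assumed.

```python
import math


def min_block_length(num_rows: int, p_tar: float, lam_star: float) -> int:
    """Smallest integer N with N >= (1 / lam_star) * log(3 L / p_tar)."""
    return math.ceil(math.log(3 * num_rows / p_tar) / lam_star)


def noisy_phase_error_bound(num_rows: int, n: int, lam_star: float) -> float:
    """Bound 3 L exp(-lam_star N) on the block error probability of the noisy phase."""
    return 3 * num_rows * math.exp(-lam_star * n)


L_rows, lam_star = 1000, 1e-3  # illustrative values (assumptions)
for p_tar in (1e-6, 1e-12):
    n = min_block_length(L_rows, p_tar, lam_star)
    print(f"p_tar = {p_tar:.0e}: N >= {n}, bound = {noisy_phase_error_bound(L_rows, n, lam_star):.2e}")
```

Squaring the target error probability increases the required N by less than a factor of two here, reflecting the logarithmic dependence of N on 1/p_tar.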
Remark 9. As discussed in Remark 6, the output error proba-
bility is at least pmaj, the error probability of a majority gate,
due to the ‘last-gate effect’. Therefore, in order to achieve
arbitrarily small error probability, the noiseless operations in
Theorem 6 are necessary.
In fact, the bound in Theorem 4 is a lower bound on the number of operations that are used at the entrance stage of the computation scheme, i.e., operations that have one of the L inputs $(s_1, s_2, \ldots, s_L)$ as an argument. Therefore, Theorem 6 and Theorem 4 together assert that the number of noisy operations scales as $\Theta\left(\log\frac{1}{p_{\text{tar}}}\right)$ under the setting of Problem 1, if the ‘last-gate effect’ can be addressed using a few noiseless operations, whose number scales as $\Theta\left(\log\log\frac{1}{p_{\text{tar}}}\right)$.
VII. CONCLUSIONS AND FUTURE WORK
Can reliable computation be performed using gates that are
all equally unreliable? As we discussed, the error probability
is lower bounded by the last gate’s error probability $\epsilon$. We provide LDPC-code-based strategies (called ENCODED) that attain error probability close to $\epsilon$ (which we bound by $2\epsilon$).
Further, we show that these strategies outperform repetition-
based strategies that are commonly used today.
The key idea that ENCODED relies on is to repeatedly
suppress errors in computation process by, in a sense, encoding
the computation matrix of the linear transformation, instead
of encoding inputs (as is done in traditional communication).
Using ENCODED, both probabilistic errors and worst-case
errors can be kept suppressed.
Inspired by voltage-scaling techniques commonly used to
reduce power in circuit design, we also analyzed possible
gains attainable using ‘static’ and ‘dynamic’ voltage scaling in
conjunction with our ENCODED technique. It would be mean-
ingful to experimentally model the power-reliability tradeoffs
of voltage scaling to give more insights to the designer. On the
modeling side, it would also be important to include energy
consumed in wiring [41], [43], [44] (which can be a significant
chunk of the total energy in decoding circuits [90]) in these
models, and observe if predicted gains due to coding are
significantly reduced. Perhaps wiring energy will also motivate
design of novel coding techniques that attempt to correct errors
with local information as much as possible.
There are many coding-theoretic problems that fall out
naturally. What are practical codes that can be used to reduce
computational errors? Are there benefits to applying more
recent discoveries, such as spatially coupled LDPC codes [91],
instead of expander codes?
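TABLE I
THIS TABLE SHOWS THE ENERGY-RELIABILITY TRADEOFFS OF DIFFERENT COMPUTING SCHEMES UNDER DIFFERENT GATE ERROR PROBABILITY MODELS.

| Gate error model | uncoded | ENCODED-T | ENCODED-V |
|---|---|---|---|
| $\epsilon = \exp(-cu)$ | $\Omega\left(L\log\frac{L}{p_{\text{tar}}}\right)$ | $\Theta\left(\frac{LN}{K}\log\frac{1}{p_{\text{tar}}}\right)$ | $O\left(\frac{N}{K}\max\left\{L, \log^2\frac{1}{p_{\text{tar}}}\right\}\right)$ |
| $\epsilon = \left(\frac{1}{u}\right)^c$ | $\Omega\left(L\left(\frac{L}{p_{\text{tar}}}\right)^{\frac{1}{c}}\right)$ | $\Theta\left(\frac{LN}{K}\left(\frac{1}{p_{\text{tar}}}\right)^{\frac{1}{c}}\right)$ | $O\left(\frac{N}{K}\max\left\{L, \left(\frac{1}{p_{\text{tar}}}\right)^{\frac{1}{c}}\right\}\right)$ |
| $\epsilon = \exp(-c\sqrt{u})$ | $\Omega\left(L\log^2\frac{L}{p_{\text{tar}}}\right)$ | $\Theta\left(\frac{LN}{K}\log^2\frac{1}{p_{\text{tar}}}\right)$ | $O\left(\frac{N}{K}\max\left\{L, \log^3\frac{1}{p_{\text{tar}}}\right\}\right)$ |
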
More broadly, the problem of computation with noisy gates
is of considerable practical and intellectual interest. It is
widely accepted that biological systems operate with noisy
computational elements, and yet provide good performance at
low energy. In engineered systems, with saturation of Den-
nard’s scaling and Moore’s law, new device technologies are
being used to design circuits that are invariably error-prone. A
comprehensive understanding of reliability-resource tradeoffs
in error-correction coding in computing could give these novel
technologies (e.g. carbon nanotubes and mechanical switches)
a better chance to compete with established ones (i.e., CMOS).
To that end, it will be key to identify what causes faults in these
novel technologies so that they can be modeled and analyzed,
and appropriate codes be designed for them. Intellectually, it
is interesting (and widely acknowledged) that the remarkable
gains that coding brings to communications, especially at long-
range, are not easy to obtain in computational settings. The
theoretical reasoning for this thus far rests on simplistic models
and has rather loose bounds [45]. Improved strategies and
improved outer bounds will go a long way in characterizing
how large these gains can be.
A. Connections with coded computing with stragglers and
“exascale computing”
We note an important connection between ENCODED
and the recent works on coded computation in presence of
“stragglers” [92]–[96]. These works focus on “processor-level”
(rather than gate-level) noise, e.g. it is assumed in [92] that
the product of input s with each column of A is “erasure-
prone” with some probability (which depends on models
of time required for computation). The formulation there
is not applicable in two ways that are crucial for modern
“exascale” computing systems: (i) there is an increasing trend
in distributed systems community to consider “soft-errors”
that are undetectable [97], whereas [92]–[96] largely focus
on erasures; and more importantly, (ii) there is an increasing
need for understanding scalability when the number of (fixed
memory) processors increases for a fixed total problem size
(to understand the limits of gains with parallelization of a
problem). This is called “strong scaling” [98, Chapter 9],
whereas “weak scaling” allows for increasing problem size and
number of processors, while keeping memory of each proces-
sor fixed. The works [92]–[96] only examine a fixed number
of processors with increasing memory of each processor with
problem size, which is, strictly speaking, “weaker than weak”
scaling.
For the specific problem of matrix-vector multiplication, strong scaling allows adding more processors than the number of rows and/or columns of the matrix to increase parallelization. For example, when computing s × A, the matrix A is often split into both horizontal and vertical pieces (as used in the algorithm “SUMMA” [99]). ENCODED can be adapted to this split to suppress error propagation (horizontal decomposition is similar to (12), and vertical decomposition is imposed by using limited memory gates). In ENCODED-T, the
tree-structure helps introduce increased parallelism, speeding
up the computation, while keeping errors in check through
repeated error suppression. However, the algorithms in [92]–
[96] do not easily adapt to horizontal division. If one naively
uses strategies in [92]–[96] for strong scaling with soft errors,
errors will accumulate and cause the resulting output to be far
from the correct output.
One limitation of the technique proposed here is that it
applies only to finite fields, not to real-number coding. It is
important to extend ENCODED to real-number codes, and a
preliminary attempt is made in [100] on iterative algorithms
for logistic regression, where LDPC-type real-number coding
techniques (inspired by [101]) are used for error correction
over the reals.
VIII. ACKNOWLEDGEMENTS
We thank the SONIC center members, especially Ameya
Patil, Naresh Shanbhag, Andrew Bean, and Andy Singer, for
discussions that motivated this paper, and pointers to spin-
tronics models and connections with practice. We also thank
Shawn Blanton and Franz Franchetti for useful discussions
regarding practical aspects of the problem. We thank David
Burshtein for discussions regarding the analysis of the flipping
algorithm. We thank Tze Meng Low for useful discussions on
strong scaling and the reference to SUMMA. We also thank
Zhuo Jiacheng (Carlson) and Paul Griffioen for their careful
reading of the paper, pointing out errors and typos, and helpful
comments.
APPENDIX A
DETAILS OF GALLAGER-B DECODING ALGORITHM
Assume a variable node $v$ is connected to $d_v$ parity check
nodes in $\mathcal{N}_v$ and a parity check node $c$ is connected to $d_c$
variable nodes in $\mathcal{N}_c$. Suppose the received bits are $r = (r_1, \ldots, r_N)$.
The decoding algorithm we use is the Gallager-B algorithm:
• From variable node to check node:
– Iteration 0: $m^{(0)}_{v\to c} = r_v$ is transmitted from $v$ to
every check node $c \in \mathcal{N}_v$.
– Iteration $i$: $m^{(i)}_{v\to c}$ is transmitted from $v$ to $c \in \mathcal{N}_v$,
\[
m^{(i)}_{v\to c} =
\begin{cases}
x, & \text{if } \left|\{c' \in \mathcal{N}_v \setminus c : m^{(i-1)}_{c'\to v} = x\}\right| \ge b,\\
z, & \text{otherwise,}
\end{cases}
\tag{69}
\]
where $b = \left\lfloor \frac{d_v+1}{2} \right\rfloor$ and $z$ is a randomly generated
bit.
• From check node to variable node:
– Iteration $i$: $m^{(i)}_{c\to v}$ is transmitted from check node $c$
to variable node $v \in \mathcal{N}_c$,
\[
m^{(i)}_{c\to v} = \bigoplus_{v' \in \mathcal{N}_c \setminus v} m^{(i-1)}_{v'\to c}. \tag{70}
\]
Remark 10. Note that the updating rule $m^{(i)}_{v\to c} = z$ in (69),
which is used to break ties, is different from the original
rule $m^{(i)}_{v\to c} = y_v$ in [19], in which $y_v$ is the channel output
associated with the variable node $v$. This is because the
problem that we consider is a computing problem, instead of
a communication problem, and hence there are no channel
outputs. Note that the analysis is also done for the modified
updating rule. Although the modified updating rule is
theoretically sound, we acknowledge that the cost of generating a
random bit may be higher than that of the majority rule.
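The two message updates (69) and (70) can be sketched in a few lines of Python. This is only an illustrative, noiseless sketch: the function names, the dictionary-based message storage keyed by `(sender, receiver)`, and the adjacency maps `Nv` and `Nc` are assumptions made for the example and do not come from the paper.

```python
import random

def majority_threshold(dv):
    """Threshold b = floor((dv + 1) / 2) used in (69)."""
    return (dv + 1) // 2

def variable_to_check(c2v_prev, v, c, Nv, rng):
    """Message from variable v to check c, following (69).

    c2v_prev maps (check, variable) to the previous check-to-variable bit.
    Nv maps each variable node to the list of its neighboring checks.
    """
    b = majority_threshold(len(Nv[v]))
    incoming = [c2v_prev[(cp, v)] for cp in Nv[v] if cp != c]
    for x in (0, 1):
        # At most one value can reach the threshold, since 2b > dv - 1.
        if incoming.count(x) >= b:
            return x
    return rng.randint(0, 1)  # tie-breaking random bit z

def check_to_variable(v2c_prev, c, v, Nc):
    """Message from check c to variable v, following (70): XOR of the others."""
    out = 0
    for vp in Nc[c]:
        if vp != v:
            out ^= v2c_prev[(vp, c)]
    return out
```

With $d_v = 3$ the threshold is $b = 2$, so a variable node forwards a value only when both of the other two checks agree on it; otherwise the random bit $z$ is used.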
APPENDIX B
PROOF OF LEMMA 2
In this section, we prove that the bit error probability can be
made smaller than the small constant $p_{\mathrm{maj}} + \frac{1}{d_T} p_{\mathrm{thr}}$ after one iteration
of Gallager-B decoding. We use density evolution to analyze
the change of error probability before and after decoding.
A. Density Evolution Analysis
We examine the $m$-th level in the tree structure of
ENCODED-T. After the outputs from the $(m+1)$-th level
are obtained, they are forwarded to the $m$-th level of the tree
structure to perform a component-wise XOR-operation. The
results of this XOR-operation are stored in the E-bit registers
at a node $v_m^l$ (a compute-and-correct unit) in the $m$-th level
and are decoded using $C$ iterations of the Gallager-B algorithm.
Now, we focus on the $C$ iterations of Gallager-B decoding
done at the node $v_m^l$. For simplicity, we write the message-passing
result after the $i$-th iteration as $\tilde{x}^{(i)} = (x^{(i)}_{v\to c})$, which
is the vector constituted by the messages sent from variable
nodes to parity check nodes. The initial input $\tilde{x}^{(0)}$ is the
output of the unreliable XOR-gates in the node $v_m^l$. Denote
the correct message-passing bits by $\tilde{w}^{(i)} = (w^{(i)}_{v\to c})$, i.e., the
messages that would be sent if no computing errors were introduced
in the entire computation process (not just in iteration $i$).
We write $p^{(i)}_{v\to c}$ for the bit error probability of $x^{(i)}_{v\to c}$, that is,
\[
p^{(i)}_{v\to c} = \Pr\left(x^{(i)}_{v\to c} \neq w^{(i)}_{v\to c}\right).
\]
We want to calculate the evolution of $p^{(i)}_{v\to c}$ with $i$.
From [26], [77] we know that in density-evolution analysis,
the bit error probability does not depend on the transmit-
ted codeword, based on the check node and variable node
symmetry of the message-passing algorithm, and the channel
symmetry and the message noise symmetry [26, Def. 5]. In our
problem, the channel symmetry comes from the fact that the
AND gates flip different outputs with the same probability. The
message wire symmetry comes from the fact that the majority
gates and XOR gates flip outputs with the same probability.
Note that we do not need the source symmetry, and hence we
can use the proof of Theorem 1 in [26] to show that the bit
error probability $P_e^{\mathrm{bit}}$ does not depend on the correct codeword
at the node $v_m^l$. Therefore, we can assume without loss of
generality that the correct input (and hence output) in the linear
computation $s \cdot \tilde{G}$ is an all-zero codeword and hence
\[
p^{(i)}_{v\to c} = \Pr\left(x^{(i)}_{v\to c} \neq 0\right). \tag{71}
\]
From assumption (A.3) we know that when the number of
levels in the tree structure in the ENCODED-T technique is
smaller than or equal to $\frac{\log N}{2\log[(d_v-1)(d_c-1)]}$, we can assume that the
decoding neighborhood for each variable node is cycle-free
and all bits entering a majority-gate or an XOR-gate are
independent of each other. In our case, choosing $C = 1$ iteration at
each level, we can indeed ensure that the constraint on the number
of decoding iterations holds, since the tree structure in the
ENCODED-T technique makes the total number of decoding
iterations equal to $C \cdot (M-1) = \left\lceil \frac{\log L}{\log d_T} \right\rceil$ (see (20); we use
$M-1$ because only non-leaf nodes have embedded decoders),
and the tree-width $d_T$ can be set large enough so that (18) is
satisfied.
Therefore, based on symmetry and independence, we can
use the analysis for the noisy Gallager-B decoder in [27]
and attain the performance predicted by density evolution for
regular LDPC codes. For $b = \left\lfloor \frac{d_v+1}{2} \right\rfloor$, $l \in \mathbb{Z} \cap [1, d_v-1]$,
define four functions:
\[
\bar{\alpha}(u) := \frac{1 - (1-2u)^{d_c-1}}{2}, \tag{72}
\]
\[
\bar{\gamma}(u) := (1 - p_{\mathrm{xor}})\bar{\alpha}(u) + p_{\mathrm{xor}}(1 - \bar{\alpha}(u)), \tag{73}
\]
\[
\Lambda_l(\bar{\gamma}) := \binom{d_v-1}{l}(1-\bar{\gamma})^{l}\,\bar{\gamma}^{\,d_v-1-l}, \tag{74}
\]
\[
\bar{\eta}(\bar{\gamma}) := (1-p_0)\sum_{l=0}^{d_v-b-1}\Lambda_l(\bar{\gamma}) + p_0\sum_{l=0}^{b-1}\Lambda_l(\bar{\gamma}). \tag{75}
\]
Intuitively, $\bar{\gamma}(u)$ denotes the error probability after the
XOR-operation at a check node and $\bar{\eta}(\bar{\gamma})$ denotes the error
probability after the majority-operation at a variable node. These
functions are borrowed from [27] and are crucial for analyzing
noisy Gallager-B density evolution. Note that we change the
form of functions $\alpha(\cdot)$, $\gamma(\cdot)$, $\eta(\cdot)$ in [27] into $1-\bar{\alpha}(\cdot)$, $1-\bar{\gamma}(\cdot)$,
$1-\bar{\eta}(\cdot)$, in correspondence with the usual goal of analyzing
error probability, instead of correctness probability.
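As a numerical companion to (72)–(75), the four functions can be written directly in Python. This is a sketch under stated assumptions: the function names are illustrative, and $p_0$ is passed in as a plain parameter because its definition lies outside this appendix.

```python
from math import comb

def alpha_bar(u, dc):
    """(72): error probability of the XOR of dc - 1 independent bits."""
    return (1.0 - (1.0 - 2.0 * u) ** (dc - 1)) / 2.0

def gamma_bar(u, dc, p_xor):
    """(73): alpha_bar passed through an XOR gate that flips w.p. p_xor."""
    a = alpha_bar(u, dc)
    return (1.0 - p_xor) * a + p_xor * (1.0 - a)

def Lambda(l, g, dv):
    """(74): probability that exactly l of dv - 1 messages are correct."""
    return comb(dv - 1, l) * (1.0 - g) ** l * g ** (dv - 1 - l)

def eta_bar(g, dv, p0):
    """(75): error probability after the majority operation."""
    b = (dv + 1) // 2
    first = sum(Lambda(l, g, dv) for l in range(dv - b))   # l = 0 .. dv-b-1
    second = sum(Lambda(l, g, dv) for l in range(b))       # l = 0 .. b-1
    return (1.0 - p0) * first + p0 * second
```

For example, `gamma_bar(0.0, 18, 0.01)` returns the XOR-gate error probability itself, since error-free inputs leave only the gate noise.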
We first state a result from [27], and then simplify the result
using an upper bound. Note that the LDPC decoding rule used
in [27] is slightly different from ours, as stated in Remark 10.
We will address this issue in the proof of the upper bound.
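Lemma 4 ([27], p. 1662, Theorem 1). For regular LDPC
codes with check node degree $d_c$ and variable node degree $d_v$,
\[
\begin{aligned}
p^{(i+1)} &= f(p^{(i)})\\
&:= p_{\mathrm{maj}}\left[1 - \bar{\eta}\left(\bar{\gamma}(p^{(i)})\right)\right] + (1 - p_{\mathrm{maj}})\,\bar{\eta}\left(\bar{\gamma}(p^{(i)})\right)\\
&= p_{\mathrm{maj}} + (1 - 2p_{\mathrm{maj}})\,\bar{\eta}\left(\bar{\gamma}(p^{(i)})\right),
\end{aligned}
\tag{76}
\]
where $\bar{\eta}(\cdot)$ is defined in (75) and $\bar{\gamma}(\cdot)$ is defined in (73).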
The lower bound $p_e^{\mathrm{dec}} > p_{\mathrm{maj}}$ in (24) follows from the
fact that $p_{\mathrm{maj}} < \frac{1}{2}$ (see Gate Model I) and
$p_e^{\mathrm{dec}} = p^{(C)} = f(p^{(C-1)})$, where $C$ is the total number of iterations of
decoding at each level. In what follows, we upper-bound the
RHS of (76) by upper-bounding the functions $\bar{\gamma}(\cdot)$ and $\bar{\eta}(\cdot)$
defined in (73) and (75). The result is shown in Lemma 5.
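Lemma 5. For regular LDPC codes with check node degree
$d_c$ and variable node degree $d_v$,
\[
p^{(i+1)} < f_0(p^{(i)}) := p_{\mathrm{maj}} + \bar{\eta}_0\left(\bar{\gamma}_0(p^{(i)})\right), \tag{77}
\]
where
\[
\bar{\eta}_0(\bar{\gamma}) = \binom{d_v-1}{\left\lfloor \frac{d_v-1}{2} \right\rfloor}\bar{\gamma}^{\left\lceil \frac{d_v-1}{2} \right\rceil}, \tag{78}
\]
\[
\bar{\gamma}_0(u) = (d_c-1)u + p_{\mathrm{xor}}. \tag{79}
\]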
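Proof. First, note that the decoding algorithm used in [27] is
slightly different from ours in that the tie-breaking rule in [27]
is $m^{(i)}_{v\to c} = y_v$, which is different from our rule $m^{(i)}_{v\to c} = z$,
where $z$ is a randomly generated bit (see Remark 10). It can be
shown that, if our updating rule is used, the $\bar{\eta}$ function in the
density evolution function (76) should be changed from (75)
to
\[
\bar{\eta}(\bar{\gamma}) := \frac{1}{2}\sum_{l=0}^{d_v-b-1}\Lambda_l(\bar{\gamma}) + \frac{1}{2}\sum_{l=0}^{b-1}\Lambda_l(\bar{\gamma}). \tag{80}
\]
When $d_v$ is an even number, $b = \left\lfloor \frac{d_v+1}{2} \right\rfloor = \frac{d_v}{2} = d_v - b$.
When $d_v$ is an odd number, $b = \left\lfloor \frac{d_v+1}{2} \right\rfloor = \frac{d_v+1}{2} = d_v - b + 1$.
In both cases, we have $d_v - b \le b$. Therefore, (80) can be
upper bounded by
\[
\begin{aligned}
\bar{\eta}(\bar{\gamma}) &= \frac{1}{2}\sum_{l=0}^{d_v-b-1}\Lambda_l(\bar{\gamma}) + \frac{1}{2}\sum_{l=0}^{b-1}\Lambda_l(\bar{\gamma})\\
&\le \frac{1}{2}\sum_{l=0}^{b-1}\Lambda_l(\bar{\gamma}) + \frac{1}{2}\sum_{l=0}^{b-1}\Lambda_l(\bar{\gamma}) = \sum_{l=0}^{b-1}\Lambda_l(\bar{\gamma}).
\end{aligned}
\tag{81}
\]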
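Further,
\[
\begin{aligned}
\sum_{l=0}^{b-1}\Lambda_l(\bar{\gamma}) &= \sum_{l=0}^{b-1}\binom{d_v-1}{l}(1-\bar{\gamma})^{l}\,\bar{\gamma}^{\,d_v-1-l}\\
&= \bar{\gamma}^{\,d_v-b}\sum_{l=0}^{b-1}\binom{d_v-1}{l}(1-\bar{\gamma})^{l}\,\bar{\gamma}^{\,b-1-l}\\
&\le \bar{\gamma}^{\,d_v-b}\sum_{l=0}^{b-1}\binom{d_v-1}{b-1}(1-\bar{\gamma})^{l}\,\bar{\gamma}^{\,b-1-l}\\
&\le \bar{\gamma}^{\,d_v-b}\sum_{l=0}^{b-1}\binom{d_v-1}{b-1}\binom{b-1}{l}(1-\bar{\gamma})^{l}\,\bar{\gamma}^{\,b-1-l}\\
&= \binom{d_v-1}{b-1}\bar{\gamma}^{\,d_v-b}
\overset{(a)}{=} \binom{d_v-1}{b-1}\bar{\gamma}^{\left\lceil \frac{d_v-1}{2} \right\rceil},
\end{aligned}
\]
where step (a) follows from $d_v - b = \left\lceil \frac{d_v-1}{2} \right\rceil$, which can be
readily checked from $b = \left\lfloor \frac{d_v+1}{2} \right\rfloor$. Therefore,
\[
\bar{\eta}(\bar{\gamma}) \le \binom{d_v-1}{\left\lfloor \frac{d_v-1}{2} \right\rfloor}\bar{\gamma}^{\left\lceil \frac{d_v-1}{2} \right\rceil} = \bar{\eta}_0(\bar{\gamma}). \tag{82}
\]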
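For the function $\bar{\gamma}(u)$ in (73), we upper bound it with the
following two inequalities:
\[
\bar{\alpha}(u) = \frac{1-(1-2u)^{d_c-1}}{2} \le \frac{1-[1-(d_c-1)2u]}{2} = (d_c-1)u,
\]
\[
\bar{\gamma}(u) = \bar{\alpha}(u) + p_{\mathrm{xor}} - 2\bar{\alpha}(u)p_{\mathrm{xor}} < \bar{\alpha}(u) + p_{\mathrm{xor}}.
\]
Therefore
\[
\bar{\gamma}(u) < \bar{\alpha}(u) + p_{\mathrm{xor}} \le (d_c-1)u + p_{\mathrm{xor}} = \bar{\gamma}_0(u). \tag{83}
\]
Combining (83) and (82),
\[
\begin{aligned}
p^{(i+1)} &= p_{\mathrm{maj}}\left[1-\bar{\eta}\left(\bar{\gamma}(p^{(i)})\right)\right] + (1-p_{\mathrm{maj}})\,\bar{\eta}\left(\bar{\gamma}(p^{(i)})\right)\\
&= p_{\mathrm{maj}} + (1-2p_{\mathrm{maj}})\,\bar{\eta}\left(\bar{\gamma}(p^{(i)})\right)\\
&\le p_{\mathrm{maj}} + \bar{\eta}\left(\bar{\gamma}(p^{(i)})\right)\\
&\le p_{\mathrm{maj}} + \bar{\eta}_0\left(\bar{\gamma}(p^{(i)})\right)
\overset{(a)}{\le} p_{\mathrm{maj}} + \bar{\eta}_0\left(\bar{\gamma}_0(p^{(i)})\right),
\end{aligned}
\]
where step (a) is due to the fact that $\bar{\eta}_0(\bar{\gamma})$ is monotonically
increasing.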
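The bound of Lemma 5 can be sanity-checked numerically. The sketch below compares the density-evolution update (76), evaluated with the random tie-breaking rule (80), against the closed-form bound $f_0$ of (77)–(79). The function names and the parameter values used afterwards are illustrative assumptions, not values taken from the paper.

```python
from math import comb

def f_exact(p, dv, dc, p_xor, p_maj):
    """Update (76) with eta-bar taken from (80) (random tie-breaking)."""
    b = (dv + 1) // 2
    a = (1.0 - (1.0 - 2.0 * p) ** (dc - 1)) / 2.0          # (72)
    g = a + p_xor - 2.0 * a * p_xor                        # (73)

    def lam(l):                                            # (74)
        return comb(dv - 1, l) * (1.0 - g) ** l * g ** (dv - 1 - l)

    eta = 0.5 * sum(lam(l) for l in range(dv - b)) \
        + 0.5 * sum(lam(l) for l in range(b))              # (80)
    return p_maj + (1.0 - 2.0 * p_maj) * eta

def f_bound(p, dv, dc, p_xor, p_maj):
    """Upper bound f_0 from (77)-(79)."""
    g0 = (dc - 1) * p + p_xor                              # (79)
    d = dv - (dv + 1) // 2                                 # ceil((dv - 1) / 2)
    return p_maj + comb(dv - 1, (dv - 1) // 2) * g0 ** d   # (78)
```

Evaluating both functions on a grid of input error probabilities, for instance with $d_v = 9$, $d_c = 18$ and gate error probabilities of $10^{-3}$, shows the exact update staying below the bound, as the lemma states.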
B. Completing the Proof of Lemma 2
We need to prove that if the bit error probability before
decoding is smaller than $p_{\mathrm{reg}} = (d_T+1)p_{\mathrm{thr}} + p_{\mathrm{xor}}$, the bit
error probability after decoding is smaller than $p_{\mathrm{maj}} + \frac{1}{d_T}p_{\mathrm{thr}}$.
Using Lemma 5, we only need to prove
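\[
p_{\mathrm{maj}} + \binom{d_v-1}{\left\lfloor \frac{d_v-1}{2} \right\rfloor}\left[(d_c-1)p_{\mathrm{reg}} + p_{\mathrm{xor}}\right]^{\left\lceil \frac{d_v-1}{2} \right\rceil} < p_{\mathrm{maj}} + \frac{1}{d_T}p_{\mathrm{thr}}, \tag{84}
\]
which is equivalent to
\[
\left[(d_c-1)p_{\mathrm{reg}} + p_{\mathrm{xor}}\right]^{\left\lceil \frac{d_v-1}{2} \right\rceil} < \frac{1}{d_T}\binom{d_v-1}{\left\lfloor \frac{d_v-1}{2} \right\rfloor}^{-1} p_{\mathrm{thr}}. \tag{85}
\]
We know that
\[
\begin{aligned}
(d_c-1)p_{\mathrm{reg}} + p_{\mathrm{xor}} &\overset{(a)}{=} (d_c-1)\left[p_{\mathrm{xor}} + (d_T+1)p_{\mathrm{thr}}\right] + p_{\mathrm{xor}}\\
&= (d_c-1)(d_T+1)p_{\mathrm{thr}} + d_c\,p_{\mathrm{xor}}\\
&\overset{(b)}{\le} (d_c-1)(d_T+1)p_{\mathrm{thr}} + (d_T+1)p_{\mathrm{thr}}\\
&\le d_c(d_T+1)p_{\mathrm{thr}},
\end{aligned}
\]
where step (a) is from the definition $p_{\mathrm{reg}} = (d_T+1)p_{\mathrm{thr}} + p_{\mathrm{xor}}$
and step (b) is from the second condition in (17). Thus, to
prove (85), it suffices to prove
\[
\left(d_c(d_T+1)p_{\mathrm{thr}}\right)^{\left\lceil \frac{d_v-1}{2} \right\rceil} < \frac{1}{d_T}\binom{d_v-1}{\left\lfloor \frac{d_v-1}{2} \right\rfloor}^{-1} p_{\mathrm{thr}},
\]
which is equivalent to
\[
p_{\mathrm{thr}}^{\left\lceil \frac{d_v-3}{2} \right\rceil} < \binom{d_v-1}{\left\lfloor \frac{d_v-1}{2} \right\rfloor}^{-1} d_c^{-\left\lceil \frac{d_v-1}{2} \right\rceil}\, d_T^{-1}\,(d_T+1)^{-\left\lceil \frac{d_v-1}{2} \right\rceil}.
\]
We have defined $d = \left\lceil \frac{d_v-1}{2} \right\rceil = d_v - b$ in Theorem 1. Thus,
the above inequality is ensured by
\[
p_{\mathrm{thr}} < \binom{d_v-1}{\left\lfloor \frac{d_v-1}{2} \right\rfloor}^{-\frac{1}{d-1}} d_c^{-\frac{d}{d-1}}\, d_T^{-\frac{1}{d-1}}\,(d_T+1)^{-\frac{d}{d-1}},
\]
which is the first condition in (17).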
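To see how restrictive this condition is for a given code, the threshold on $p_{\mathrm{thr}}$ can be evaluated numerically. The following is a minimal sketch, assuming only the closed-form expression derived above; the function name and the example degrees are hypothetical.

```python
from math import comb

def pthr_upper_limit(dv, dc, dT):
    """Right-hand side of the final condition on p_thr derived above.

    Requires d = ceil((dv - 1) / 2) >= 2, i.e. dv >= 4, so that the
    exponent 1 / (d - 1) is well defined.
    """
    d = dv // 2                      # equals ceil((dv - 1) / 2) for integer dv
    if d < 2:
        raise ValueError("need dv >= 4 so that d - 1 > 0")
    binom = comb(dv - 1, (dv - 1) // 2)
    return (binom ** (-1.0 / (d - 1))
            * dc ** (-d / (d - 1))
            * dT ** (-1.0 / (d - 1))
            * (dT + 1) ** (-d / (d - 1)))

def sufficient_inequality_holds(p_thr, dv, dc, dT):
    """Check (dc (dT+1) p_thr)^d < (1/dT) * binom^{-1} * p_thr directly."""
    d = dv // 2
    binom = comb(dv - 1, (dv - 1) // 2)
    return (dc * (dT + 1) * p_thr) ** d < p_thr / (dT * binom)
```

Any `p_thr` below `pthr_upper_limit(dv, dc, dT)` satisfies the sufficient inequality, and values above it do not, which mirrors the equivalence used in the proof.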
APPENDIX C
PROOF OF THEOREM 4
Theorem 4 provides a lower bound on the number of oper-
ations by lower-bounding the operations done at the entrance
stage of the noisy circuit, i.e., operations that have one of
the L inputs (s1, s2, ...sL) as an argument. In order to prove
Theorem 4, we need the following lemma (stated implicitly
in [49, Proposition 1]) which characterizes the equivalence of
a noisy-gate model and a noisy-wire model.
Lemma 6. For each unreliable gate from Gate Model I $(D, \epsilon)$
with error probability $\epsilon$ and fan-in number $\le D$, its output
variable can be stochastically simulated by (equivalent in
distribution to) another unreliable gate $\tilde{g}$ that computes the
same function but with the following property: each input wire
flips the input independently with probability $\epsilon/D$ and the gate
has additional output noise independent of input wire noise.
Proof. For an arbitrary unreliable gate
\[
y = g(u_1, u_2, \ldots, u_d) \oplus z_g, \quad d \le D,
\]
consider another unreliable gate together with noisy wires
\[
\begin{aligned}
\tilde{y} &= \tilde{g}(u_1, u_2, \ldots, u_d) \oplus \tilde{z}_g\\
&= g(u_1 \oplus w_1, u_2 \oplus w_2, \ldots, u_d \oplus w_d) \oplus \tilde{z}_g, \quad d \le D,
\end{aligned}
\]
where $w_j$ is the noise on the $j$-th input wire and takes value 1
with probability $\epsilon/D$. The probability that all $d$ wires convey
the correct inputs is $(1-\epsilon/D)^d > 1 - d\frac{\epsilon}{D} > 1 - \epsilon$. Therefore,
if $\tilde{z}_g$ is 0 w.p. 1, the error of $\tilde{g}$ will be smaller than $\epsilon$. Thus,
using standard continuity arguments, we can find a random
variable $\tilde{z}_g$ which equals 1 w.p. $\epsilon' < \epsilon$, while making $\tilde{y}$ and
$y$ equivalent in distribution.
Based on this lemma, we know that a noisy network defined
in Section II-A can always be replaced by another network,
where each wire has an error probability $\epsilon/D$. Before a specific
input $s_k$ enters the noisy circuit, it is always transmitted along
the wires connected to the entrance stage of the gates in the
circuit. Because of the assumption that gates after the inputs
are noisy, the bit will be ‘sampled’ by the noisy wires. For
convenience of analysis, we assume each gate can only be used
once so that the number of operations is equal to the number
of unreliable gates. Now that each gate only computes once,
each noisy wire can only carry information once as well. We
assume each $s_k$ is transmitted on $T_k$ distinct wires. Then, the
probability that the message on all $T_k$ wires flips is
\[
p_k = (\epsilon/D)^{T_k}. \tag{86}
\]
Therefore, the error probability of the input bit $s_k$ satisfies
$P^k_{\mathrm{in}} > p_k$. Since matrix $A$ is assumed to have full row rank,
if the linear transformation computation is noiseless, even a
single input bit error leads to an output block error. Therefore,
even when the linear transformation computation is noiseless,
the output block error probability $P_e^{\mathrm{blk}}$ is greater than the input
error probability $P^k_{\mathrm{in}}$. Since the computation is noisy, $P_e^{\mathrm{blk}}$ is
still greater than $P^k_{\mathrm{in}}$, and hence is greater than $p_k$. Therefore,
if $(\epsilon/D)^{T_k} = p_k > p_{\mathrm{tar}}$, the block error probability
$P_e^{\mathrm{blk}} > P^k_{\mathrm{in}} > p_k > p_{\mathrm{tar}}$, which contradicts the aim of making the
block error probability smaller than $p_{\mathrm{tar}}$. Thus,
\[
p_{\mathrm{tar}} > (\epsilon/D)^{T_k},
\]
which means that for any bit $s_k$
\[
T_k > \frac{\log 1/p_{\mathrm{tar}}}{\log D/\epsilon}. \tag{87}
\]
Therefore, the number of wires connected to each input bit
must be at least $\frac{\log 1/p_{\mathrm{tar}}}{\log D/\epsilon}$. Since the number of input bits is
$L$, the total number of wires connected to all input bits is at
least $\frac{L\log 1/p_{\mathrm{tar}}}{\log D/\epsilon}$. Since we are using gates with bounded fan-in
at most $D$, the number of gates is at least $\frac{L\log 1/p_{\mathrm{tar}}}{D\log D/\epsilon}$, and so
is the number of operations. Since there are $K$ output bits,
the number of operations per output bit satisfies
$N_{\text{per-bit}} > \frac{L\log 1/p_{\mathrm{tar}}}{KD\log D/\epsilon}$.
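The counting argument above reduces to two closed-form expressions, which the short sketch below evaluates. The function names and the numerical values in the usage note are illustrative assumptions only.

```python
from math import log

def min_wires_per_input(p_tar, eps, D):
    """Lower bound (87) on T_k: log(1/p_tar) / log(D/eps)."""
    return log(1.0 / p_tar) / log(D / eps)

def ops_per_output_bit_lower_bound(L, K, p_tar, eps, D):
    """Lower bound on N_per-bit: L log(1/p_tar) / (K D log(D/eps))."""
    return L * log(1.0 / p_tar) / (K * D * log(D / eps))
```

For instance, with hypothetical values $p_{\mathrm{tar}} = 10^{-6}$, $\epsilon = 10^{-2}$ and $D = 2$, each input bit must be read through more than `min_wires_per_input(1e-6, 1e-2, 2)` wires, and at exactly that (real-valued) count the all-wires-flip probability $(\epsilon/D)^{T_k}$ equals $p_{\mathrm{tar}}$.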
APPENDIX D
CODES THAT SATISFY THE REQUIREMENT (A.3)
The existence of codes that satisfy the requirement (A.3)
follows from a result in [70]. We first present the result from
[70].
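Define $\beta_0, \beta_1, \beta_2, \beta_3$ respectively as the largest integer less
than $d_v/2$, the largest integer less than or equal to $d_v/2$, the
smallest integer greater than or equal to $d_v/2$, and the smallest
integer greater than $d_v/2$. Create four real parameters $\gamma_{12}$, $\delta_{12}$,
$\pi_0$ and $\omega_0$ that satisfy the following inequalities
\[
(1-\theta)\alpha N \le \gamma_{12}N + \delta_{12}N, \tag{88}
\]
\[
0 \le \gamma_{12}N \le \alpha N, \tag{89}
\]
\[
0 \le \pi_0(1-R)N \le \omega_0 d_v N \le \alpha d_v N, \tag{90}
\]
\[
\beta_3(\alpha-\gamma_{12})N \le \omega_0 d_v N \le \min\left(\frac{d'}{d_c}\pi_0 d_v N,\; \gamma_{12}\beta_1 N + d_v(\alpha-\gamma_{12})N\right), \tag{91}
\]
where $d'$ is the largest odd number which is less than or equal
to $d_c$, and
\[
0 \le \delta_{12}N\beta_2 \le (\pi_0-\omega_0)d_v N. \tag{92}
\]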
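Define the following polynomials
\[
F_0(x) \triangleq \sum_{j=0}^{\beta_0}\binom{d_v}{j}x^j, \tag{93}
\]
\[
F_1(x) \triangleq \sum_{j=0}^{\beta_1}\binom{d_v}{j}x^j, \tag{94}
\]
\[
F_2(x) \triangleq \sum_{j=\beta_2}^{d_v}\binom{d_v}{j}x^j, \tag{95}
\]
\[
F_3(x) \triangleq \sum_{j=\beta_3}^{d_v}\binom{d_v}{j}x^j, \tag{96}
\]
\[
G_o(x) \triangleq \sum_{j=1,3,\ldots,d'}\binom{d_c}{j}x^j, \tag{97}
\]
\[
G_e(x) \triangleq \sum_{j=0,2,\ldots,d''}\binom{d_c}{j}x^j, \tag{98}
\]
where $d''$ is the largest even number less than or equal to $d_c$.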
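Then we define
\[
\begin{aligned}
\psi(\alpha, \gamma_{12}, \delta_{12}, \pi_0, \omega_0) \triangleq{}& h(\gamma_{12}, \alpha-\gamma_{12}, \delta_{12}) + (1-R)h(\pi_0)\\
&+ t_1 + t_2 + u_1 + u_2 - d_v h(\omega_0, \alpha-\omega_0, \pi_0-\omega_0),
\end{aligned}
\tag{99}
\]
where $h(\cdot)$ is the entropy function defined as
\[
h(\tau_1, \tau_2, \ldots, \tau_i) = -\sum_{j=1}^{i}\tau_j\log\tau_j - \left(1-\sum_{j=1}^{i}\tau_j\right)\log\left(1-\sum_{j=1}^{i}\tau_j\right), \tag{100}
\]
and
\[
t_1 = \inf_{x>0}\left\{\gamma_{12}\log F_1(x) + (\alpha-\gamma_{12})\log F_3(x) - \omega_0 d_v\log x\right\}, \tag{101}
\]
\[
t_2 = \inf_{x>0}\left\{\delta_{12}\log F_2(x) + (1-\alpha-\delta_{12})\log F_0(x) - (\pi_0-\omega_0)d_v\log x\right\}, \tag{102}
\]
\[
u_1 = \inf_{x>0}\left\{\pi_0(1-R)\log G_o(x) - \omega_0 d_v\log x\right\}, \tag{103}
\]
\[
u_2 = \inf_{x>0}\left\{(1-\pi_0)(1-R)\log G_e(x) - (\alpha-\omega_0)d_v\log x\right\}. \tag{104}
\]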
The base of all the logarithms is $e$. Then, Theorem 1 of
[70] and the last paragraph on page 521 of [70] imply the
following result:
Lemma 7. ([70, Theorem 1]) Consider the random ensemble
of $(d_v, d_c)$-regular LDPC codes with $d_v > 4$ and block length
$N$. Let $\alpha_0$ be the smallest positive root of the function $f(\alpha)$
which is defined by
\[
f(\alpha) = \max_{\gamma_{12},\delta_{12},\pi_0,\omega_0}\psi(\alpha, \gamma_{12}, \delta_{12}, \pi_0, \omega_0), \tag{105}
\]
where the maximization is over all values of $\gamma_{12}, \delta_{12}, \pi_0, \omega_0$
that satisfy (88)-(92). Then, for any $\bar{\alpha}_0 < \alpha_0$, if $N$ is
sufficiently large, almost all codes in this
ensemble can correct at least $\theta\bar{\alpha}_0 N$ errors out of any arbitrary
$\bar{\alpha}_0 N$ errors using one iteration of the PBF algorithm.
Proof: Here we briefly summarize the proof in [70].
Denote by $\bar{p}_e(\bar{\alpha}_0 N)$ the fraction of (bad) codes in the
$(d_v, d_c)$-regular ensemble that cannot correct a linear fraction $\theta\bar{\alpha}_0 N$ of
all combinations of $\bar{\alpha}_0 N$ errors or less using one iteration of
the PBF algorithm. Then, according to (38) in [70], $\bar{p}_e(\bar{\alpha}_0 N)$
is upper-bounded by
\[
\bar{p}_e(\bar{\alpha}_0 N) \le \sum_{\alpha N \le \bar{\alpha}_0 N} C(\alpha N)^{11/2}e^{Nf(\alpha)}, \tag{106}
\]
where the summation is over all integer values of $\alpha N \le \bar{\alpha}_0 N$,
and $C = (2\pi)^{3/2}e^{1/3}\frac{d_v^{9/2}d_c^{3/2}}{\beta_2}$. Therefore, when $\bar{\alpha}_0$ is
sufficiently small so that $f(\alpha) < 0$ for all $\alpha < \bar{\alpha}_0$, $\bar{p}_e(\bar{\alpha}_0 N) \to 0$
as $N \to \infty$, which means that almost all codes in the
$(d_v, d_c)$-regular ensemble can correct a $\theta$ fraction of all possible
combinations of $\bar{\alpha}_0 N$ errors using one iteration of the PBF
algorithm.
Theorem 1 in [70] was stated for $\theta = 0$ and the original
constraint corresponding to the constraint (88) ($(1-\theta)\alpha N \le
\gamma_{12}N + \delta_{12}N$) was $\alpha N \le \gamma_{12}N + \delta_{12}N$. In this paper, we use
the result for $\theta = \text{constant} > 0$. This result can be obtained by
directly changing the original constraint $\alpha N \le \gamma_{12}N + \delta_{12}N$
in [70] to the new constraint $(1-\theta)\alpha N \le \gamma_{12}N + \delta_{12}N$ (this
direct change is also stated at the bottom of page 521 in [70]
after the proof of Theorem 1). A refined bound for (106) can
be obtained using (22), (23), (25) and (33) in [70], which shows
\[
\bar{p}_e(\bar{\alpha}_0 N) \le \sum_{\alpha N \le \bar{\alpha}_0 N}\left(\sum_{\gamma_{12}N,\,\delta_{12}N,\,\pi_0(1-R)N,\,\omega_0 d_v N}(2\pi N d_v)^{3/2}e^{1/3}\sqrt{\omega_0(\alpha-\omega_0)(\pi_0-\omega_0)}\;e^{N\psi(\alpha,\gamma_{12},\delta_{12},\pi_0,\omega_0)}\right), \tag{107}
\]
where the outer summation is over all integer values of $\alpha N \le
\bar{\alpha}_0 N$, and the inner summation is over all integer values of
$\gamma_{12}N, \delta_{12}N, \pi_0(1-R)N, \omega_0 d_v N$ that satisfy (88) to (92). We
will use this refined bound to obtain a finite-length result in the
following example.
Example 1. One example of the parameter choice is $d_v = 9$,
$d_c = 18$ and $\theta = 0.15$. In this case, we computed the first
positive root of $f(\alpha) = 0$ using MATLAB and obtained
$\alpha = 5.1 \cdot 10^{-4}$. This means that using one iteration of
the PBF algorithm, we can correct a fraction $\theta = 0.15$ of
$5.1 \cdot 10^{-4} \cdot N$ worst-case errors using a $(9, 18)$ regular LDPC
code when $N$ is sufficiently large. We can also use this result
to obtain finite-length bounds (computing an upper bound on
the fraction of bad codes using (107)). We obtained that at least
4.86% of $(9, 18)$ regular LDPC codes of length $N = 50{,}000$
in the random LDPC ensemble can reduce the number of
errors by 15% using one iteration of the PBF algorithm, when
the number of errors is smaller than or equal to 20, which
corresponds to the case when $\alpha_0 = 0.0004$.
The existence of codes that satisfy requirement (A.3) can
also be established using Expander LDPC codes. Here, we
review some results on expander LDPCs [20].
Definition 7. (Expander Graph) An $(N, P, d_v, \gamma, \alpha)$ bipartite
expander is a $d_v$-left-regular bipartite graph $G(V_L \cup V_R, E)$
where $|V_L| = N$ and $|V_R| = P$. In this bipartite graph, it
holds that $\forall S \subset V_L$ with $|S| \le \gamma N$, $|\mathcal{N}(S)| \ge \alpha d_v|S|$, where
$\mathcal{N}(S)$ denotes the neighborhood of the set $S$, i.e., the set of
nodes in $V_R$ connected to $S$.
An $(N, P, d_v, \gamma, \alpha)$ expander LDPC code is a length-$N$
LDPC code, where the Tanner graph of the code is the
corresponding expander graph with $V_L$ corresponding to the
set of variable nodes and $V_R$ the parity check nodes. We use
$d_c = d_v N/P$ to denote the right-degree of the expander code.
Lemma 8. ([20, Thm. 11]) Using an $(N, P, d_v, \gamma, \frac{3}{4}+e)$
regular expander LDPC code with parity check node degree
$d_c = d_v N/P$, one can use one iteration of the noiseless PBF
algorithm to bring the fraction of errors down from $\alpha$ to
$(1-4e)\alpha$ provided that the original corrupted codeword has
at most a $\gamma(1+4e)/2$ fraction of errors.
Example 2. The construction of good expander codes has
been investigated for a long time. Constructive approaches for
expander codes can be found in [102], [103]. In [104]–[106],
it is shown that random regular LDPC codes are expanders
with high probability when the code length $N \to \infty$. In [106,
Theorem 8.7] it is shown that, if $\gamma_{\max}$ is the positive
solution of the equation
\[
\frac{d_v-1}{d_v}h_2(\gamma) - \frac{1}{d_c}h_2\big(\gamma d_c(3/4+e)\big) - \gamma(3/4+e)d_c\,h_2\left(\frac{1}{(3/4+e)d_c}\right) = 0, \tag{108}
\]
then, for $3/4+e < \frac{d_v-1}{d_v}$ and $\gamma \in (0, \gamma_{\max})$, a random
regular $(d_v, d_c)$ LDPC Tanner graph is a $(d_v, d_c, \gamma, \frac{3}{4}+e)$
expander with probability $1 - O(N^{-\beta})$, where
$\beta = d_v[1-(3/4+e)] - 1$ is a constant greater than 0 when
$3/4+e < \frac{d_v-1}{d_v}$. This means that all sets of left nodes
with cardinality smaller than $\gamma N$ have an expansion factor of at
least $\frac{3}{4}+e$. For $d_v = 16$, $d_c = 32$, and $e = 0.0375$ (which is
equivalent to $4e = 0.15$, the same as $\theta = 0.15$ in Example 1),
we use MATLAB to numerically solve the above equation and
obtain $\gamma_{\max} \approx 4.1 \cdot 10^{-5}$, which means the fraction of errors
$\alpha$ can be as large as $\gamma_{\max}(1+4e)/2 = 2.3575 \cdot 10^{-5}$.
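The root of (108) can also be located with a few lines of Python instead of MATLAB. The sketch below uses plain bisection and assumes that $h_2$ is the binary entropy function with base-2 logarithms (the notation suggests this, but it is an assumption here); the function names and the search bracket are likewise illustrative choices.

```python
from math import log2

def h2(x):
    """Binary entropy in bits (assumed meaning of h_2 in (108))."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1.0 - x) * log2(1.0 - x)

def expansion_lhs(gamma, dv, dc, a):
    """Left-hand side of (108) with expansion factor a = 3/4 + e."""
    return ((dv - 1) / dv) * h2(gamma) \
        - (1.0 / dc) * h2(gamma * dc * a) \
        - gamma * a * dc * h2(1.0 / (a * dc))

def gamma_max(dv, dc, a, lo=1e-6, hi=1e-3, iters=100):
    """Bisection for the positive root of (108) inside the bracket [lo, hi]."""
    if not (expansion_lhs(lo, dv, dc, a) > 0.0 > expansion_lhs(hi, dv, dc, a)):
        raise ValueError("bracket [lo, hi] does not contain a sign change")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expansion_lhs(mid, dv, dc, a) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Calling `gamma_max(16, 32, 0.75 + 0.0375)` gives a value close to the $\gamma_{\max} \approx 4.1 \cdot 10^{-5}$ reported in Example 2, and multiplying it by $(1+4e)/2$ gives the corresponding tolerable error fraction.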
APPENDIX E
PROOF OF THEOREM 5
We tune the energy supply such that
\[
\max(p_{\mathrm{and}}, p_{\mathrm{xor}}, p_{\mathrm{maj}}) \le \lambda = \frac{\theta\alpha_0/2}{[d_c(1-R)+1]+2} \tag{109}
\]
is satisfied for the first $L - L_{\mathrm{vs}}$ stages (first phase), which
ensures that
\[
p_{\mathrm{and}} + [d_c(1-R)+1]\,p_{\mathrm{xor}} + p_{\mathrm{maj}} \le \theta\alpha_0/2 \tag{110}
\]
is satisfied. We tune the energy supply such that
\[
\max\left(p^{(i+1)}_{\mathrm{and}}, p^{(i+1)}_{\mathrm{xor}}, p^{(i+1)}_{\mathrm{maj}}\right) \le \lambda^{(i+1)} = \frac{\theta\alpha_0\left(1-\frac{1}{2}\theta\right)^{i}/4}{[d_c(1-R)+1]+2} \tag{111}
\]
is satisfied for the last $L_{\mathrm{vs}}$ stages (second phase), which
ensures that
\[
[d_c(1-R)+1]\,p^{(i+1)}_{\mathrm{xor}} + p^{(i+1)}_{\mathrm{maj}} + p^{(i+1)}_{\mathrm{and}} \le \frac{1}{4}\theta\alpha_0\left(1-\frac{1}{2}\theta\right)^{i} \tag{112}
\]
is satisfied (we have mentioned this in (64)). Since this version
of the ENCODED-V technique with dynamic voltage scaling has
the same procedure and constant supply energy during the first
$L - L_{\mathrm{vs}}$ stages (first phase) as the ENCODED-F technique,
from Theorem 3, we know that after the first $(L - L_{\mathrm{vs}})$ stages,
the output error fraction is smaller than $\alpha_0$ with probability at
least $1 - P_e^{\mathrm{blk}}$, where $P_e^{\mathrm{blk}} < 3(L - L_{\mathrm{vs}})\exp(-\lambda^{*}N)$ and $\lambda^{*}$
is defined in (33).
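We will prove that, after the $i$-th stage of the remaining $L_{\mathrm{vs}}$
stages, the error fraction is upper bounded by
\[
\alpha^{(i)}_{\mathrm{PBF}} \le \alpha_0(1-\theta/2)^{i}, \tag{113}
\]
with high probability. Thus, after $L_{\mathrm{vs}}$ iterations, we obtain
\[
\alpha^{(L_{\mathrm{vs}})}_{\mathrm{PBF}} \le \alpha_0(1-\theta/2)^{L_{\mathrm{vs}}} \le p_{\mathrm{tar}}, \tag{114}
\]
where the last step can be verified by plugging in (63).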
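The case $i = 0$ is already true, as argued above.
Suppose (113) holds for some $i \ge 0$; then we prove that (113)
also holds for the $(i+1)$-th stage of the second phase. Note
that from (112), the number of new errors
introduced during the PBF decoding at the $(i+1)$-th stage,
which is $[d_c(1-R)+1]\alpha^{(i+1)}_{\mathrm{xor}} + \alpha^{(i+1)}_{\mathrm{maj}} + \alpha^{(i+1)}_{\mathrm{and}}$, satisfies
\begin{align*}
&\Pr\left([d_c(1-R)+1]\alpha^{(i+1)}_{\mathrm{xor}} + \alpha^{(i+1)}_{\mathrm{maj}} + \alpha^{(i+1)}_{\mathrm{and}} > \frac{1}{2}\alpha_0\theta(1-\theta/2)^{i}\right)\\
&\overset{(a)}{<} \Pr\left([d_c(1-R)+1]\alpha^{(i+1)}_{\mathrm{xor}} + \alpha^{(i+1)}_{\mathrm{maj}} + \alpha^{(i+1)}_{\mathrm{and}} > [d_c(1-R)+1]p^{(i+1)}_{\mathrm{xor}} + p^{(i+1)}_{\mathrm{maj}} + p^{(i+1)}_{\mathrm{and}} + \frac{1}{4}\alpha_0\theta(1-\theta/2)^{i}\right)\\
&< \Pr\left(\alpha^{(i+1)}_{\mathrm{and}} > p^{(i+1)}_{\mathrm{and}} + \lambda^{(i+1)}\right) + \Pr\left(\alpha^{(i+1)}_{\mathrm{xor}} > p^{(i+1)}_{\mathrm{xor}} + \lambda^{(i+1)}\right) + \Pr\left(\alpha^{(i+1)}_{\mathrm{maj}} > p^{(i+1)}_{\mathrm{maj}} + \lambda^{(i+1)}\right)\\
&\overset{(b)}{<} 3\exp\left(-\tilde{\lambda}^{(i+1)}N\right), \tag{115}
\end{align*}
where step (a) follows from (112), step (b) follows from
the large deviation bound in Lemma 3, and $\tilde{\lambda}^{(i+1)}$ is
defined in (67). Therefore, with probability at least
$1 - 3\exp\left(-\tilde{\lambda}^{(i+1)}N\right)$,
\begin{align*}
\alpha^{(i+1)}_{\mathrm{PBF}} &\le \alpha^{(i)}_{\mathrm{PBF}}(1-\theta) + [d_c(1-R)+1]\alpha^{(i+1)}_{\mathrm{xor}} + \alpha^{(i+1)}_{\mathrm{maj}} + \alpha^{(i+1)}_{\mathrm{and}}\\
&\overset{(a)}{\le} \alpha_0(1-\theta/2)^{i}(1-\theta) + \frac{1}{2}\alpha_0\theta(1-\theta/2)^{i}\\
&= \alpha_0(1-\theta/2)^{i+1}, \tag{116}
\end{align*}
where step (a) can be obtained by combining (115) and (113).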
Now that we have proved (113) for the $(i+1)$-th stage, we can
carry out the mathematical induction for all $i$ satisfying $1 \le i \le L_{\mathrm{vs}}$.
If (113) holds for all $i$, the final error fraction is smaller than
$p_{\mathrm{tar}}$. Thus, the overall probability that the final error fraction
is greater than $p_{\mathrm{tar}}$ is upper bounded by the sum of
$3(L - L_{\mathrm{vs}})\exp(-\lambda^{*}N)$ for the first $L - L_{\mathrm{vs}}$ stages and the
RHS of (115) for the last $L_{\mathrm{vs}}$ stages, which is
\[
P_e^{\mathrm{blk}} < 3(L - L_{\mathrm{vs}})\exp(-\lambda^{*}N) + 3\sum_{i=1}^{L_{\mathrm{vs}}}\exp\left(-\tilde{\lambda}^{(i+1)}N\right). \tag{117}
\]
Thus, (66) is proved.
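The induction step can be checked numerically by iterating the worst-case recursion: each stage keeps a $(1-\theta)$ fraction of the previous errors and adds at most $\frac{1}{2}\alpha_0\theta(1-\theta/2)^{i}$ new ones. The sketch below is illustrative only; the function name and the parameter values used to exercise it are assumptions.

```python
def worst_case_error_fractions(alpha0, theta, L_vs):
    """Iterate alpha^(i+1) = alpha^(i) (1 - theta) + (1/2) alpha0 theta (1 - theta/2)^i.

    Returns the list [alpha^(0), ..., alpha^(L_vs)], starting from alpha^(0) = alpha0.
    """
    fractions = [alpha0]
    for i in range(L_vs):
        new_errors = 0.5 * alpha0 * theta * (1.0 - theta / 2.0) ** i
        fractions.append(fractions[-1] * (1.0 - theta) + new_errors)
    return fractions
```

When the new-error bound is met with equality at every stage, the recursion reproduces $\alpha_0(1-\theta/2)^{i}$ exactly, which is the geometric decay claimed in (113).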
Finally, we compute the overall energy consumption. The
energy consumed in the $i$-th stage can be written as
\[
E_i = N\epsilon^{-1}_{\mathrm{and}}\left(p^{(i)}_{\mathrm{and}}\right) + N\epsilon^{-1}_{\mathrm{maj}}\left(p^{(i)}_{\mathrm{maj}}\right) + (N+P)\,\epsilon^{-1}_{\mathrm{xor}}\left(p^{(i)}_{\mathrm{xor}}\right). \tag{118}
\]
By summing over all stages both in the first phase and the
second phase and normalizing by the number of outputs $K$,
the total energy consumption per output bit can be written as
in (65).
APPENDIX F
PROOF OF COROLLARY 1
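We choose
\[
p_{\mathrm{and}} = p_{\mathrm{xor}} = p_{\mathrm{maj}} = \lambda = \frac{\theta\alpha_0/2}{[d_c(1-R)+1]+2} \tag{119}
\]
in the first $L - L_{\mathrm{vs}}$ stages and
\[
p^{(i+1)}_{\mathrm{and}} = p^{(i+1)}_{\mathrm{xor}} = p^{(i+1)}_{\mathrm{maj}} = \lambda^{(i+1)} = \frac{\theta\alpha_0\left(1-\frac{1}{2}\theta\right)^{i}/4}{[d_c(1-R)+1]+2} \tag{120}
\]
in the $i$-th stage of the last $L_{\mathrm{vs}}$ stages (defined in (63)).
By plugging (119), (120) and
$L_{\mathrm{vs}} = \left\lceil \frac{\log\frac{2}{p_{\mathrm{tar}}} + \log\alpha_0}{\log\frac{1}{1-\frac{1}{2}\theta}} \right\rceil$
into the error probability expression (66), we know that the
ENCODED-V technique has output error fraction smaller than
$\alpha_0(1-\theta/2)^{L_{\mathrm{vs}}} \le \frac{1}{2}p_{\mathrm{tar}}$ with probability at least $1 - P_e^{\mathrm{blk}}$,
where $P_e^{\mathrm{blk}}$ satisfies
\[
P_e^{\mathrm{blk}} < 3(L - L_{\mathrm{vs}})\exp(-\lambda^{*}N) + 3L_{\mathrm{vs}}\exp\left(-\tilde{\lambda}^{(L_{\mathrm{vs}}+1)}N\right), \tag{121}
\]
where $\tilde{\lambda}^{(i+1)} = D(2\lambda^{(i+1)}\|\lambda^{(i+1)}) = (2\log 2 - 1)\lambda^{(i+1)} + O((\lambda^{(i+1)})^2)$
and $\lambda^{*} = D(2\lambda\|\lambda) = (2\log 2 - 1)\lambda + O(\lambda^2)$.
Since $\lambda^{(L_{\mathrm{vs}}+1)} = \frac{\theta\alpha_0\left(1-\frac{1}{2}\theta\right)^{L_{\mathrm{vs}}}/4}{[d_c(1-R)+1]+2}$
and $\alpha_0(1-\theta/2)^{L_{\mathrm{vs}}-1} > \frac{1}{2}p_{\mathrm{tar}}$, we have
$\lambda^{(L_{\mathrm{vs}}+1)} > \frac{\theta p_{\mathrm{tar}}\left(1-\frac{1}{2}\theta\right)/8}{[d_c(1-R)+1]+2}$. Therefore,
\[
P_e^{\mathrm{blk}} < 3L\exp(-\theta^{*}N), \tag{122}
\]
where
$\theta^{*} = \min\left\{\lambda^{*},\; D\left(2\,\frac{\theta p_{\mathrm{tar}}\left(1-\frac{1}{2}\theta\right)/4}{[d_c(1-R)+1]+2}\,\middle\|\,\frac{\theta p_{\mathrm{tar}}\left(1-\frac{1}{2}\theta\right)/4}{[d_c(1-R)+1]+2}\right)\right\}$.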
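Denote the output error fraction by $\delta_e^{\mathrm{frac}}$, which is a random
variable supported on $[0, 1]$. We know that
$\Pr(\delta_e^{\mathrm{frac}} > \frac{1}{2}p_{\mathrm{tar}}) < P_e^{\mathrm{blk}}$.
Thus, the output bit error probability is upper bounded
by
\begin{align*}
\mathbb{E}[\delta_e^{\mathrm{frac}}] ={}& \Pr\left(\delta_e^{\mathrm{frac}} > \tfrac{1}{2}p_{\mathrm{tar}}\right)\mathbb{E}\left[\delta_e^{\mathrm{frac}} \,\middle|\, \delta_e^{\mathrm{frac}} > \tfrac{1}{2}p_{\mathrm{tar}}\right]\\
&+ \Pr\left(\delta_e^{\mathrm{frac}} \le \tfrac{1}{2}p_{\mathrm{tar}}\right)\mathbb{E}\left[\delta_e^{\mathrm{frac}} \,\middle|\, \delta_e^{\mathrm{frac}} \le \tfrac{1}{2}p_{\mathrm{tar}}\right]\\
<{}& P_e^{\mathrm{blk}} + \frac{1}{2}p_{\mathrm{tar}}. \tag{123}
\end{align*}
When $N > \frac{1}{\theta^{*}}\log\left(\frac{6L}{p_{\mathrm{tar}}}\right)$, $P_e^{\mathrm{blk}} < \frac{1}{2}p_{\mathrm{tar}}$, and hence the output
bit error probability $\mathbb{E}[\delta_e^{\mathrm{frac}}]$ satisfies $\mathbb{E}[\delta_e^{\mathrm{frac}}] < p_{\mathrm{tar}}$.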
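In this corollary, we only examine the case when
$\epsilon_{\mathrm{and}}(u) = \epsilon_{\mathrm{xor}}(u) = \epsilon_{\mathrm{maj}}(u) = \epsilon(u)$ (either polynomial decay or
exponential decay). We also choose the same gate error
probabilities $p_{\mathrm{and}} = p_{\mathrm{xor}} = p_{\mathrm{maj}} = \lambda$ and
$p^{(i+1)}_{\mathrm{and}} = p^{(i+1)}_{\mathrm{xor}} = p^{(i+1)}_{\mathrm{maj}} = \lambda^{(i+1)}$
for different types of unreliable gates (see (119) and
(120)). Therefore, the energy consumption (65) is simplified
to
\[
E_{\text{per-bit}} \le \frac{3N+P}{K}(L - L_{\mathrm{vs}})\,\epsilon^{-1}(\lambda) + \frac{3N+P}{K}\sum_{i=1}^{L_{\mathrm{vs}}}\epsilon^{-1}\left(\lambda^{(i)}\right), \tag{124}
\]
where $L_{\mathrm{vs}} = \left\lceil \frac{\log\frac{2}{p_{\mathrm{tar}}} + \log\alpha_0}{\log\frac{1}{1-\frac{1}{2}\theta}} \right\rceil$.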
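The quantities entering (124) are easy to tabulate. The following sketch computes $L_{\mathrm{vs}}$, the gate-error schedule of (119)–(120), and the right-hand side of (124) for a user-supplied inverse energy-reliability function. All function names are hypothetical, and the inverse function `eps_inv` (mapping a gate error probability to a gate energy) is an assumption that must match the chosen energy-reliability model.

```python
from math import ceil, log

def num_scaling_stages(p_tar, alpha0, theta):
    """L_vs = ceil((log(2/p_tar) + log(alpha0)) / log(1 / (1 - theta/2)))."""
    return ceil((log(2.0 / p_tar) + log(alpha0)) / log(1.0 / (1.0 - theta / 2.0)))

def gate_error_schedule(alpha0, theta, dc, R, L_vs):
    """Return lambda from (119) and [lambda^(1), ..., lambda^(L_vs)] from (120)."""
    denom = (dc * (1.0 - R) + 1.0) + 2.0
    lam = (theta * alpha0 / 2.0) / denom
    lam_i = [(theta * alpha0 * (1.0 - theta / 2.0) ** (i - 1) / 4.0) / denom
             for i in range(1, L_vs + 1)]
    return lam, lam_i

def energy_per_bit_bound(N, P, K, L, L_vs, lam, lam_i, eps_inv):
    """Right-hand side of (124) for a given inverse function eps_inv."""
    scale = (3 * N + P) / K
    return scale * ((L - L_vs) * eps_inv(lam) + sum(eps_inv(x) for x in lam_i))
```

For a polynomial model $\epsilon(u) = (1/u)^{c}$ one would pass `eps_inv = lambda p: p ** (-1.0 / c)`; for an exponential model $\epsilon(u) = \exp(-cu)$, `eps_inv = lambda p: log(1.0 / p) / c`.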
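When the energy-reliability tradeoff function
$\epsilon_{\mathrm{and}}(u) = \epsilon_{\mathrm{xor}}(u) = \epsilon_{\mathrm{maj}}(u) = \left(\frac{1}{u}\right)^{c}$, $c > 0$, the total energy consumption
per bit
\begin{align*}
E_{\text{per-bit}} &\le \frac{3N+P}{K}\lambda^{-\frac{1}{c}}\left[(L - L_{\mathrm{vs}}) + \frac{\left(1-\frac{1}{2}\theta\right)^{-\frac{L_{\mathrm{vs}}}{c}} - 1}{\left(1-\frac{1}{2}\theta\right)^{-\frac{1}{c}} - 1}\right]\\
&\le \frac{3N+P}{K}\lambda^{-\frac{1}{c}}\left[(L - L_{\mathrm{vs}}) + \frac{\left(1-\frac{1}{2}\theta\right)^{-\frac{1}{c}}\left(\frac{p_{\mathrm{tar}}}{2\alpha_0}\right)^{-\frac{1}{c}} - 1}{\left(1-\frac{1}{2}\theta\right)^{-\frac{1}{c}} - 1}\right]\\
&= \Theta\left(\frac{N}{K}\max\left\{L, \left(\frac{1}{p_{\mathrm{tar}}}\right)^{\frac{1}{c}}\right\}\right). \tag{125}
\end{align*}
When the energy-reliability tradeoff function
$\epsilon_{\mathrm{and}}(u) = \epsilon_{\mathrm{xor}}(u) = \epsilon_{\mathrm{maj}}(u) = \exp(-cu)$, $c > 0$, the total energy
consumption per bit
\begin{align*}
E_{\text{per-bit}} &= \frac{3N+P}{cK}\left(L\log\frac{1}{\lambda} + \frac{1}{2}L_{\mathrm{vs}}(L_{\mathrm{vs}}+1)\log\frac{1}{1-\frac{1}{2}\theta}\right)\\
&= \Theta\left(\frac{N}{K}\max\left\{L, \log^{2}\frac{1}{p_{\mathrm{tar}}}\right\}\right). \tag{126}
\end{align*}
When the energy-reliability tradeoff function
$\epsilon_{\mathrm{and}}(u) = \epsilon_{\mathrm{xor}}(u) = \epsilon_{\mathrm{maj}}(u) = \exp(-c\sqrt{u})$, $c > 0$, the total energy
consumption per bit
\begin{align*}
E_{\text{per-bit}} ={}& \frac{3N+P}{K}(L - L_{\mathrm{vs}})\left(\frac{1}{c}\log\frac{1}{\lambda}\right)^{2}
+ \frac{3N+P}{K}\sum_{i=1}^{L_{\mathrm{vs}}}\left(\frac{1}{c}\log\frac{1}{\lambda} + \frac{i-1}{c}\log\frac{1}{1-\frac{1}{2}\theta}\right)^{2}\\
={}& \frac{3N+P}{K}\left[L\left(\frac{1}{c}\log\frac{1}{\lambda}\right)^{2} + \frac{2}{c}\log\frac{1}{\lambda}\cdot\frac{1}{c}\log\frac{1}{1-\frac{1}{2}\theta}\cdot\frac{(L_{\mathrm{vs}}-1)L_{\mathrm{vs}}}{2}\right.\\
&\left.+ \frac{1}{c^{2}}\log^{2}\frac{1}{1-\frac{1}{2}\theta}\cdot\frac{1}{6}(L_{\mathrm{vs}}-1)L_{\mathrm{vs}}(2L_{\mathrm{vs}}-1)\right]\\
={}& \Theta\left(\frac{N}{K}\max\left\{L, \log^{3}\frac{1}{p_{\mathrm{tar}}}\right\}\right). \tag{127}
\end{align*}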
REFERENCES
[1] Y. Yang, P. Grover, and S. Kar, “Computing linear transforms with un-
reliable components,” in Proc. IEEE Int. Symp. Inf. Theory, pp. 1934–
1938, IEEE, 2016.
[2] S. Borkar, “Designing reliable systems from unreliable components:
the challenges of transistor variability and degradation,” IEEE Micro,
vol. 25, no. 6, pp. 10–16, 2005.
[3] N. R. Shanbhag, S. Mitra, G. de Veciana, M. Orshansky, R. Marculescu,
J. Roychowdhury, D. Jones, and J. M. Rabaey, “The search for
alternative computational paradigms,” IEEE Des. Test. Comput, vol. 25,
no. 4, pp. 334–343, 2008.
[4] I. S. Haque and V. S. Pande, “Hard data on soft errors: A large-scale
assessment of real-world error rates in GPGPU,” in Proc. IEEE/ACM
Int. Conf. Cluster Cloud Grid Comput., pp. 691–696, IEEE Computer
Society, 2010.
[5] R. H. Dennard, V. L. Rideout, E. Bassous, and A. R. Leblanc, “Design
of ion-implanted MOSFET’s with very small physical dimensions,”
IEEE J. Solid-State Circuits, vol. 9, no. 5, pp. 256–268, 1974.
[6] R. Nathanael and T.-J. K. Liu, CMOS and Beyond: Logic Switches for
Terascale Integrated Circuits, ch. 11 Mechanical switches. Cambridge
University Press, 2014.
[7] P. Pillai and K. G. Shin, “Real-time dynamic voltage scaling for low-
power embedded operating systems,” ACM SIGOPS Operating Systems
Review, vol. 35, no. 5, pp. 89–102, 2001.
[8] C. Zhao, X. Bai, and S. Dey, “Evaluating transient error effects in
digital nanometer circuits,” IEEE Trans. Rel, vol. 56, no. 3, pp. 381–
391, 2007.
[9] M. Miranda, “The threat of semiconductor variability: As transistors
shrink, the problem of chip variability grows.”
http://spectrum.ieee.org/semiconductors/design/the-threat-of-semiconductor-variability,
June 2012.
[10] M. M. Shulaker, G. Hills, N. Patil, H. Wei, H. Chen, H.-S. P. Wong,
and S. Mitra, “Carbon nanotube computer,” Nature, vol. 501, no. 7468,
pp. 526–530, 2013.
[11] N. Patil, J. Deng, A. Lin, H.-S. Wong, and S. Mitra, “Design methods
for misaligned and mispositioned carbon-nanotube immune circuits,”
IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27,
no. 10, pp. 1725–1736, 2008.
[12] C. Shannon, “A mathematical theory of communication,” Bell Syst.
Tech. J., vol. 27, pp. 623–656, Oct 1948.
[13] J. von Neumann, “Probabilistic logics and the synthesis of reliable
organisms from unreliable components,” Automata Studies, vol. 34,
pp. 43–98, 1956.
[14] R. A. Abdallah and N. R. Shanbhag, “An energy-efficient ECG processor
in 45-nm CMOS using statistical error compensation,” IEEE J. Solid-
State Circuits, vol. 48, no. 11, pp. 2882–2893, 2013.
[15] J. Han, E. Leung, L. Liu, and F. Lombardi, “A fault-tolerant technique
using quadded logic and quadded transistors,” IEEE Trans. Very Large
Scale Integr. (VLSI) Syst., vol. PP, no. 99, pp. 1–1, 2014.
[16] J. Han, J. Gao, P. Jonker, Y. Qi, and J. Fortes, “Toward hardware-
redundant, fault-tolerant logic for nanoelectronics,” IEEE Des. Test.
Comput, vol. 22, pp. 328–339, July 2005.
[17] D. Ernst, N. S. Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler,
D. Blaauw, T. Austin, K. Flautner, and T. Mudge, “Razor: A low-power
pipeline based on circuit-level timing speculation,” in Proc. 36th Int’l
Symp. Microarchitecture, pp. 7–18, 2003.
[18] H. Cho, L. Leem, and S. Mitra, “ERSA: Error resilient system architec-
ture for probabilistic applications,” IEEE Trans. Comput.-Aided Design
Integr. Circuits Syst., vol. 31, no. 4, pp. 546–558, 2012.
[19] R. G. Gallager, “Low-density parity-check codes,” IRE Trans. Inf.
Theory, vol. 8, pp. 21–28, January 1962.
[20] M. Sipser and D. A. Spielman, “Expander codes,” IEEE Trans. Inf.
Theory, vol. 42, no. 6, 1996.
[21] M. G. Taylor, “Reliable information storage in memories designed from
unreliable components,” Bell Syst. Tech. J., vol. 47, no. 10, pp. 2299–
2337, 1968.
[22] A. V. Kuznetsov, “Information storage in a memory assembled from
unreliable components,” Probl. Inf. Transm., vol. 9, no. 3, pp. 100–114,
1973.
[23] S. K. Chilappagari and B. Vasic, “Fault tolerant memories based on
expander graphs,” in Proc. IEEE Inf. Theory Workshop, 2007.
[24] K. Gunnam, G. Choi, M. Yeary, S. Yang, and Y. Lee, “Next generation
iterative LDPC solutions for magnetic recording storage,” in Proc.
Asilomar Conf. Signal, Syst. Comput., pp. 1148–1152, Oct 2008.
[25] C.-H. Huang, Y. Li, and L. Dolecek, “ACOCO: Adaptive coding for
approximate computing on faulty memories,” IEEE Trans. Commun.,
vol. 63, no. 12, pp. 4615–4628, 2015.
[26] L. Varshney, “Performance of LDPC codes under faulty iterative
decoding,” IEEE Trans. Inf. Theory, vol. 57, pp. 4427–4444, July 2011.
[27] S. M. S. Tabatabaei, H. Cho, and L. Dolecek, “Gallager B decoder on
noisy hardware,” IEEE Trans. Commun., vol. 61, pp. 1660–1673, May
2013.
[28] C.-H. Huang, Y. Li, and L. Dolecek, “Gallager B LDPC decoder with
transient and permanent errors,” IEEE Trans. Commun., vol. 62, pp. 15–
28, January 2014.
[29] A. Balatsoukas-Stimming and A. Burg, “Density evolution for min-
sum decoding of LDPC codes under unreliable message storage,” IEEE
Commun. Lett., vol. 18, no. 5, pp. 849–852, 2014.
[30] E. D. C. Kameni Ngassa, V. Savin and D. Declercq, “Density evolution
and functional threshold for the noisy min-sum decoder,” IEEE Trans.
Commun., vol. 63, no. 5, pp. 1497–1509, 2015.
[31] T. Richardson and R. Urbanke, “The capacity of low-density parity-
check codes under message-passing decoding,” IEEE Trans. Inf. The-
ory, vol. 47, pp. 599–618, Feb 2001.
[32] M. Lentmaier, D. Truhachev, K. Zigangirov, and D. Costello, “An anal-
ysis of the block error probability performance of iterative decoding,”
IEEE Trans. Inf. Theory, vol. 51, pp. 3834–3855, Nov 2005.
[33] C. Hadjicostis and G. C. Verghese, “Coding approaches to fault
tolerance in linear dynamic systems,” IEEE Trans. Inf. Theory, vol. 51,
pp. 210–228, Jan 2005.
[34] D. A. Spielman, “Highly fault-tolerant parallel computation,” in Proc.
Symp. Foundations Comput. Sci., pp. 154–163, Oct 1996.
[35] E. Rachlin and J. E. Savage, “A framework for coded computation,”
in Proc. IEEE Int. Symp. Inf. Theory, pp. 2342–2346, July 2008.
[36] D. Bertozzi, L. Benini, and G. De Micheli, “Error control schemes
for on-chip communication links: the energy-reliability tradeoff,” IEEE
Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 24, pp. 818–
831, June 2005.
[37] F. Simon, “On the capacity of noisy computations,” in Proc. IEEE Inf.
Theory Workshop, pp. 185–189, Oct 2011.
[38] F. Simon, “Capacity of a noisy function,” in Proc. IEEE Inf. Theory
Workshop, pp. 1–5, Aug 2010.
[39] T. Koch, A. Lapidoth, and P. Sotiriadis, “Channels that heat up,” IEEE
Trans. Inf. Theory, vol. 55, pp. 3594–3612, Aug 2009.
[40] P. Grover, K. Woyach, and A. Sahai, “Towards a communication-
theoretic understanding of system-level power consumption,” IEEE J.
Sel. Areas Commun., vol. 29, pp. 1744–1755, September 2011.
[41] P. Grover, A. Goldsmith, and A. Sahai, “Fundamental limits on the
power consumption of encoding and decoding,” in Proc. IEEE Int.
Symp. Inf. Theory, pp. 2716–2720, IEEE, 2012.
[42] P. Grover, “Information friction and its implications on minimum
energy required for communication,” IEEE Trans. Inf. Theory, vol. 61,
pp. 895–907, Feb 2015.
[43] C. Blake and F. R. Kschischang, “Energy of decoding algorithms,” in
Canadian Workshop on Inf. Theory, pp. 1–5, June 2013.
[44] C. G. Blake and F. R. Kschischang, “Energy consumption of VLSI
decoders,” IEEE Trans. Inf. Theory, vol. 61, pp. 3185–3198, June 2015.
[45] P. Grover, “Is ‘Shannon-capacity of noisy computing’ zero?,” in Proc.
IEEE Int. Symp. Inf. Theory, pp. 2854–2858, June 2014.
[46] T. Karnik and P. Hazucha, “Characterization of soft errors caused by
single event upsets in CMOS processes,” IEEE Trans. Dependable
Secure Comput., vol. 1, pp. 128–143, April 2004.
[47] R. L. Dobrushin and S. I. Ortyukov, “Upper bound on the
redundancy of self-correcting arrangements of unreliable functional
elements,” Probl. Inf. Transm., vol. 13, no. 3, pp. 56–76, 1977.
[48] N. Pippenger, “On networks of noisy gates,” in Proc. Symp. Founda-
tions Comput. Sci., pp. 30–38, IEEE, 1985.
[49] N. Pippenger, G. Stamoulis, and J. Tsitsiklis, “On a lower bound for
the redundancy of reliable networks with noisy gates,” IEEE Trans. Inf.
Theory, vol. 37, pp. 639–643, May 1991.
[50] C. N. Hadjicostis, “Nonconcurrent error detection and correction in
fault-tolerant linear finite-state machines,” IEEE Trans. Autom. Control,
vol. 48, no. 12, pp. 2133–2140, 2003.
[51] N. C. Laurenciu, T. Gupta, V. Savin, and S. Cotofana, “Error correction
code protected data processing units,” in Proc. ACM/IEEE NANOARCH,
pp. 37–42, IEEE, 2016.
[52] Y. Yang, P. Grover, and S. Kar, “Can a noisy encoder be used to
communicate reliably?,” in Proc. Allerton Conf. on Commun., Control
and Comput., pp. 659–666, Sept 2014.
[53] E. Dupraz, V. Savin, S. K. Grandhi, E. Popovici, and D. Declercq,
“Practical LDPC encoders robust to hardware errors,” in 2016 IEEE
International Conference on Communications (ICC), pp. 1–6, IEEE,
2016.
[54] J. Hachem, I.-H. Wang, C. Fragouli, and S. Diggavi, “Coding with
encoding uncertainty,” in Proc. IEEE Int. Symp. Inf. Theory, pp. 276–
280, July 2013.
[55] K.-H. Huang and J. Abraham, “Algorithm-based fault tolerance for
matrix operations,” IEEE Trans. Comput., vol. C-33, pp. 518–528, June
1984.
[56] S.-J. Wang and N. K. Jha, “Algorithm-based fault tolerance for FFT
networks,” IEEE Trans. Comput., vol. 43, pp. 849–854, Jul 1994.
[57] C. Ding, C. Karlsson, H. Liu, T. Davies, and Z. Chen, “Matrix
multiplication on GPUs with on-line fault tolerance,” in Proc. IEEE
Int’l Symp. Parallel Distrib. Process. Appl., pp. 311–317, May 2011.
[58] C. Anfinson and F. Luk, “A linear algebraic model of algorithm-based
fault tolerance,” IEEE Trans. Comput., vol. 37, pp. 1599–1604, Dec
1988.
[59] Z. Chen, “Optimal real number codes for fault tolerant matrix op-
erations,” in Proc. Conf. High Performance Computing Networking
Storage and Analysis, SC ’09, (New York, NY, USA), pp. 29:1–29:10,
ACM, 2009.
[60] C. Radhakrishnan and A. C. Singer, “Recursive least squares filtering
under stochastic computational errors,” in Proc. Asilomar Conf. Signal,
Syst. Comput., pp. 1529–1532, 2013.
[61] K. Bowman, J. Tschanz, C. Wilkerson, S.-L. Lu, T. Karnik, V. De, and
S. Borkar, “Circuit techniques for dynamic variation tolerance,” in
Proc. 46th ACM/IEEE DAC, pp. 4–7, 2007.
[62] J. W. Choi, B. Shim, A. Singer, and N. I. Cho, “Low-power filtering via
minimum power soft error cancellation,” IEEE Trans. Signal Process.,
vol. 55, pp. 5084–5096, Oct 2007.
[63] C. N. Hadjicostis, “Non-concurrent error detection and correction in
discrete-time LTI dynamic systems,” in Proc. 40th IEEE Conf. Decision
and Control, vol. 2, pp. 1899–1904, 2001.
[64] I. Nahlus, E. P. Kim, N. R. Shanbhag, and D. Blaauw, “Energy-efficient
dot product computation using a switched analog circuit architecture,”
in Proc. Int. Symp. Low Power Electron. Des., pp. 315–318, ACM,
2014.
[65] A. van de Goor, “Using march tests to test SRAMs,” IEEE Des. Test
Comput., vol. 10, pp. 8–14, March 1993.
[66] I. Kim, Y. Zorian, G. Komoriya, H. Pham, F. P. Higgins, and J. L.
Lewandowski, “Built in self repair for embedded high density SRAM,”
in Proc. IEEE Int’l Test Conf., pp. 1112–1119, Oct 1998.
[67] R. Ahlswede and J. Wolfowitz, “The capacity of a channel with
arbitrarily varying channel probability functions and binary output
alphabet,” Z. Wahrscheinlichkeitstheorie verw. Gebiete, vol. 15, no. 3, pp. 186–194,
1970.
[68] K. Marton, “Error exponent for source coding with a fidelity criterion,”
IEEE Trans. Inf. Theory, vol. 20, pp. 197–199, March 1974.
[69] D. E. Knuth, The Art of Computer Programming, Volume 1: Fundamen-
tal Algorithms. Addison-Wesley Professional, 1997.
[70] D. Burshtein, “On the error correction of regular LDPC codes using the
flipping algorithm,” IEEE Trans. Inf. Theory, vol. 54, no. 2, pp. 517–
530, 2008.
[71] X.-Y. Hu, E. Eleftheriou, and D. M. Arnold, “Regular and irregular progressive
edge-growth Tanner graphs,” IEEE Trans. Inf. Theory, vol. 51, no. 1,
pp. 386–398, 2005.
[72] C. Berrou and A. Glavieux, “Near optimum error correcting coding
and decoding: turbo-codes,” IEEE Trans. Commun., vol. 44, pp. 1261–
1271, Oct 1996.
[73] A. Abbasfar, D. Divsalar, and K. Yao, “Accumulate-repeat-accumulate
codes,” IEEE Trans. Commun., vol. 55, pp. 692–702, April 2007.
[74] S. Lin and D. Costello, Error control coding. 2004.
[75] B. Vasic and S. K. Chilappagari, “An information theoretical framework
for analysis and design of nanoscale fault-tolerant memories based on
low-density parity-check codes,” IEEE Trans. Circuits Syst. I, Reg.
Papers, vol. 54, no. 11, pp. 2438–2446, 2007.
[76] J. Du and Y.-C. Wu, “Network-wide distributed carrier frequency
offsets estimation and compensation via belief propagation,” IEEE
Trans. Signal Process., vol. 61, no. 23, pp. 5868–5877, 2013.
[77] T. Richardson and R. Urbanke, “Efficient encoding of low-density
parity-check codes,” IEEE Trans. Inf. Theory, vol. 47, pp. 638–656,
Feb 2001.
[78] H. Chernoff, “A measure of asymptotic efficiency for tests of a hypoth-
esis based on the sum of observations,” Ann. Math. Stat., pp. 493–507,
1952.
[79] Y. Kou, S. Lin, and M. Fossorier, “Low-density parity-check codes
based on finite geometries: a rediscovery and new results,” IEEE Trans.
Inf. Theory, vol. 47, pp. 2711–2736, Nov 2001.
[80] J. Moura, J. Lu, and H. Zhang, “Structured low-density parity-check
codes,” IEEE Signal Process. Mag., vol. 21, pp. 42–55, Jan 2004.
[81] J. Thorpe, “Low-density parity-check (LDPC) codes constructed from
protographs,” IPN progress report, vol. 42, no. 154, pp. 42–154, 2003.
[82] M. Fossorier, “Quasicyclic low-density parity-check codes from circu-
lant permutation matrices,” IEEE Trans. Inf. Theory, vol. 50, pp. 1788–
1793, Aug 2004.
[83] Y. Yang, P. Grover, and S. Kar, “Computing linear transformations with
unreliable components,” arXiv:1506.07234, 2015.
[84] D. Ernst, N. S. Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler,
D. Blaauw, T. Austin, K. Flautner, and T. Mudge, “Razor: a low-power
pipeline based on circuit-level timing speculation,” in Proc. 36th Int’l
Symp. Microarchitecture, pp. 7–18, Dec 2003.
[85] A. D. Patil, S. Manipatruni, D. Nikonov, I. A. Young, and N. R.
Shanbhag, “Shannon-inspired statistical computing to enable spintron-
ics,” arXiv preprint arXiv:1702.06119, 2017.
[86] W. H. Butler, T. Mewes, C. K. A. Mewes, P. B. Visscher, W. H. Rippard,
S. E. Russek, and R. Heindl, “Switching distributions for perpendicular
spin-torque devices within the macrospin approximation,” IEEE Trans.
Magn., vol. 48, no. 12, pp. 4684–4700, 2012.
[87] J. Kim, A. Paul, P. A. Crowell, S. J. Koester, S. S. Sapatnekar, J.-P.
Wang, and C. H. Kim, “Spin-based computing: device concepts, current
status, and a case study on a high-performance microprocessor,” Proc.
IEEE, vol. 103, no. 1, pp. 106–130, 2015.
[88] S. Manipatruni, D. E. Nikonov, and I. A. Young, “Modeling and design
of spintronic integrated circuits,” IEEE Trans. Circuits Syst. I, Reg.
Papers, vol. 59, no. 12, pp. 2801–2814, 2012.
[89] S. Chilappagari, M. Ivkovic, and B. Vasic, “Analysis of one step
majority logic decoders constructed from faulty gates,” in Proc. IEEE
Int. Symp. Inf. Theory, pp. 469–473, July 2006.
[90] K. Ganesan, P. Grover, and J. Rabaey, “The power cost of over-
designing codes,” in Proceedings of IEEE Workshop on Signal Pro-
cessing Systems (SiPS), pp. 128–133, Oct 2011.
[91] S. Kudekar, T. Richardson, and R. Urbanke, “Spatially coupled en-
sembles universally achieve capacity under belief propagation,” IEEE
Trans. Inf. Theory, vol. 59, pp. 7761–7813, Dec 2013.
[92] K. Lee, M. Lam, R. Pedarsani, D. Papailiopoulos, and K. Ramchandran,
“Speeding up distributed machine learning using codes,” in Proc. IEEE
Int. Symp. Inf. Theory, pp. 1143–1147, IEEE, 2016.
[93] Y. Yang, P. Grover, and S. Kar, “Fault-tolerant parallel linear filtering
using compressive sensing,” in Proc. Int. Symp. Turbo Codes &
Iterative Information Processing, pp. 201–205, IEEE, 2016.
[94] S. Dutta, V. Cadambe, and P. Grover, “Short-dot: Computing large
linear transforms distributedly using coded short dot products,” in
Advances In Neural Information Processing Systems, pp. 2092–2100,
2016.
[95] S. Dutta, V. Cadambe, and P. Grover, “Coded convolution can provide
arbitrarily large gains in successfully computing before a deadline,”
in IEEE International Symposium on Information Theory (ISIT), July
2017.
[96] A. Reisizadehmobarakeh, S. Prakash, R. Pedarsani, and S. Avestimehr,
“Coded computation over heterogeneous clusters,” in 2017 Proc. Work-
shop Inf. Theory Appl., IEEE, 2017.
[97] F. Cappello, A. Geist, W. Gropp, S. Kale, B. Kramer, and M. Snir,
“Toward exascale resilience: 2014 update,” Supercomputing frontiers
and innovations, vol. 1, no. 1, pp. 5–28, 2014.
[98] A. Kaminsky, “BIG CPU, BIG DATA: Solving the world’s toughest
computational problems with parallel computing,” 2016.
[99] R. A. Van De Geijn and J. Watts, “SUMMA: Scalable universal ma-
trix multiplication algorithm,” Concurrency: Practice and Experience,
vol. 9, no. 4, pp. 255–274, 1997.
[100] Y. Yang, P. Grover, and S. Kar, “Fault-tolerant distributed logistic
regression using unreliable components,” in Proc. Allerton Conf. on
Commun., Control and Comput., pp. 940–947, IEEE, 2016.
[101] F. Zhang and H. D. Pfister, “Verification decoding of high-rate LDPC
codes with applications in compressed sensing,” IEEE Trans. Inf.
Theory, vol. 58, no. 8, pp. 5042–5058, 2012.
[102] M. Capalbo, O. Reingold, S. Vadhan, and A. Wigderson, “Randomness
conductors and constant-degree lossless expanders,” in Proc. 34th ACM
Symp. Theory Comput., pp. 659–668, ACM, 2002.
[103] V. Guruswami, C. Umans, and S. Vadhan, “Unbalanced expanders and
randomness extractors from Parvaresh-Vardy codes,” J. ACM, vol. 56,
no. 4, p. 20, 2009.
[104] D. Burshtein and G. Miller, “Expander graph arguments for message-
passing algorithms,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp. 782–
790, 2001.
[105] J. Feldman, T. Malkin, R. A. Servedio, C. Stein, and M. J. Wainwright,
“LP decoding corrects a constant fraction of errors,” IEEE Trans. Inf.
Theory, vol. 53, no. 1, pp. 82–89, 2007.
[106] T. Richardson and R. Urbanke, Modern coding theory. Cambridge
University Press, 2008.
