On the Total-Power Capacity of Regular-LDPC Codes with Iterative
  Message-Passing Decoders by Ganesan, Karthik et al.
Karthik Ganesan†, Pulkit Grover‡, Jan Rabaey§, and Andrea Goldsmith†
Abstract—Motivated by recently derived fundamental limits
on total (transmit + decoding) power for coded communication
with VLSI decoders, this paper investigates the scaling behavior
of the minimum total power needed to communicate over AWGN
channels as the target bit-error-probability tends to zero. We
focus on regular-LDPC codes and iterative message-passing
decoders. We analyze scaling behavior under two VLSI complex-
ity models of decoding. One model abstracts power consumed
in processing elements (“node model”), and another abstracts
power consumed in wires which connect the processing elements
(“wire model”). We prove that a coding strategy using regular-
LDPC codes with Gallager-B decoding achieves order-optimal
scaling of total power under the node model. However, we also
prove that regular-LDPC codes and iterative message-passing
decoders cannot meet existing fundamental limits on total power
under the wire model. Further, if the transmit energy-per-bit is
bounded, total power grows at a rate that is worse than uncoded
transmission. Complementing our theoretical results, we develop
detailed physical models of decoding implementations using
post-layout circuit simulations. Our theoretical and numerical
results show that approaching fundamental limits on total power
requires increasing the complexity of both the code design and
the corresponding decoding algorithm as communication distance
is increased or error-probability is lowered.
Index Terms—Low-density parity-check (LDPC) codes; Itera-
tive message-passing decoding; Total power channel capacity;
Energy-efficient communication; System-level power consump-
tion; Circuit power consumption; VLSI complexity theory.
I. INTRODUCTION
Intuitively, the concept of Shannon capacity captures how
much information can be communicated across a channel
under specified resource constraints. While the problem of
approaching Shannon capacity under solely transmit power
constraints is well understood, modern communication often
takes place at transmitter-receiver distances that are very
short (e.g., on-chip communication [3], short distance wired
communication [4], and extremely-high-frequency short-range
wireless communication [5]). Empirically, it has been observed
that at such short distances, the power required for processing
a signal at the transmitter/receiver circuitry can dominate the
power required for transmission, sometimes by orders of mag-
nitude [4], [6], [7]. For instance, the power consumed in the
decoding circuitry of multi-gigabit-per-second communication
systems can be hundreds of milliwatts or more (e.g., [4], [8]),
while the transmit power required is only tens of milliwatts [7].
† Electrical Engineering, Stanford University. ‡ Electrical and Computer
Engineering, Carnegie Mellon University. § Electrical Engineering and Com-
puter Science, University of California at Berkeley. (Email correspondence
should be addressed to karthik3@stanford.edu.)
Early results related to this paper were presented at the 2012 Allerton
Conference [1] and IEEE Globecom 2012 [2].
Thus, transmit power constraints do not abstract the relevant
power consumed in many modern systems.
Shannon capacity, complemented by modern coding-
theoretic constructions [9], has provided a framework that is
provably good for minimizing transmit power (e.g., in power-
constrained AWGN channels). In this work, we focus on a ca-
pacity question that is motivated by total power: at what max-
imum rate can one communicate across a channel for a given
total power, and a specified error-probability? Alternatively,
given a target communication rate and error-probability, what
is the minimum required total power? The first simplifying
perspective to this problem was adopted in [10], [11], where
all of the processing power components at the transmitter and
the receiver were lumped together. However, processing power
is influenced heavily by the specific modulation choice, coding
strategy, equalization strategy, etc. [4], [6]. Even for a fixed
communication strategy, processing power depends strongly
on the implementation technology (e.g., 45 nm CMOS) and
the choice of circuit architecture.
Using theoretical models of VLSI implementations [12],
recent literature has explored fundamental scaling limits [6],
[13], [14], [15] on the transmit + decoding power consumed by
error-correcting codes. These works abstract energy consumed
in processing nodes [6] and wires [14], [13], [15] in the
VLSI decoders, and show that there is a fundamental tradeoff
between transmit and decoding power.
In this work, we examine the achievability side of the
question (see Fig. 1): what is the total power that known code
families and decoding algorithms can achieve? To address this
question, we first provide asymptotic bounds (Sections IV-V)
on required decoding power. To do so, we restrict our analysis
to binary regular-LDPC codes and iterative message-passing
decoding algorithms. Our code-family choice is motivated by
both the order-optimality of regular-LDPC codes in some
theoretical models of circuit power [6], and their practical
utility in both short [16] and long [17] distance settings. Recent
work of Blake and Kschischang [18] also studied the energy
complexity of LDPC decoding circuits, and an important
connection to this paper is highlighted in Section VII.
Within these restrictions we provide the following insights:
1) Wiring power, which explicitly brings out physical con-
straints in a digital system [19], costs more in the
order sense than the power consumed in processing
nodes. Thus, the commonly used metric for decoding
complexity — number of operations — underestimates
circuit energy costs.
2) Shannon capacity is the maximal rate one can communi-
cate at with arbitrary reliability while the transmit power
is held fixed. However, when total power minimization
is the goal, keeping transmit power fixed while bit-error
probability approaches zero can lead to highly subopti-
mal decoding power. For instance, we prove that (Theo-
rems 3, 4, 5) at sufficiently low bit-error probability, it is
more total power efficient to use uncoded transmission
than regular-LDPC codes with iterative message-passing
decoding, if using fixed transmit power. However, if
transmit power is allowed to diverge to infinity, we
show that regular-LDPC codes can outperform uncoded
transmission in this total power sense.
3) We prove (Corollary 2) that a strategy using regular-LDPC
codes and Gallager-B decoding achieves
order-optimal scaling of total power when processing
power is dominated by nodes as opposed to wires (see
Section IV-C).
4) However, we also prove a lower bound (Theorem 3) that
holds for all regular-LDPC codes with iterative message-
passing decoders for the case where processing power is
dominated by wires, and we show that a large gap exists
between this lower bound and existing fundamental
limits (see Section V-C).
To obtain insights on how an engineer might choose a
power-efficient code for a given system, we then develop
empirical models of decoding power consumption of 1-bit
and 2-bit message-passing algorithms for regular-LDPC codes
(Section VI-C). These models are constructed using post-
layout circuit simulations of power consumption for check-
node and variable-node sub-circuits, and generalizing the
remaining components of power to structurally similar codes.
Shannon-theoretic analysis yields transmit-power-centric re-
sults, which are plotted as “waterfall” curves (with correspond-
ing “error-floors”) demonstrating how close the code performs
to the Shannon limit. There, the channel path-loss can usually
be ignored because it is merely a scaling factor for the term
to be optimized (namely the transmit power), thereby not
affecting the optimizing code. Since we are interested in total
power, the path-loss impacts the code choice. For simplicity
of understanding, path-loss is translated into a more relatable
metric — communication distance — using a simple model
for path-loss. The resulting question is illustrated in Fig. 1(b):
At a given data-rate, what code and corresponding decoding
algorithm minimize the transmit + decoding power for a given
transmit distance and bit-error probability?
In Section VI-C, we present optimization results for this
question in a 60 GHz communication setting using our models.
This particular setting is chosen not just because of the
short distance, but also because the results highlight another
conceptual point we stress in this paper:
5) Approaching total power capacity requires an increase
in the complexity of both the code design and the
corresponding decoding algorithm as communication
distance is increased, or bit-error probability is lowered.
The results presented in this paper have some limi-
tations. First, we only consider a limited set of coding
strategies, and while the results and models presented here
extend easily to irregular LDPC constructions, they are
Fig. 1: a). The question explored in Sections II-V: How fast does total power
diverge to ∞ as bit-error probability Pe → 0 for regular-LDPC codes and
iterative message-passing decoding algorithms? b). The question explored in
Section VI-C: what is the most power-efficient pairing of a code and decoding
algorithm for a given distance and bit-error probability?
not necessarily applicable to all decoders. Second, mod-
ern transceivers [20] contain many other processing power
sinks, including analog-to-digital converters (ADCs), digital-
to-analog converters (DACs), power amplifiers, modulation,
and equalizers, and the power requirements of each of these
components can vary¹ based on the coding strategy. While
recent works have started to address fundamental limits [21]
and modeling [22] of power consumption of system blocks
from a mixed-signal circuit design perspective, tradeoffs with
code choice of these components remain relatively unex-
plored. Hence, while analyzing decoding power is a start,
other system-level tradeoffs should be addressed in future
work. It is also of great interest to understand tradeoffs at a
network level (see [6]), where multiple transmitting-receiving
pairs are communicating in a shared wireless medium. In
such situations, one cannot simply increase transmit power
to reduce decoding power: the resulting interference to other
users needs to be accounted for as well.
The remainder of the paper is organized as follows. Sec-
tion II states the assumptions and notation used in the paper.
Sections II-C to II-G introduce theoretical models of VLSI
circuits and decoding energy. Preliminary results are stated
in Section III, which are used to analyze decoding energy in
¹For example, the resolution of ADCs used at the receiver may vary with
the code choice by virtue of the fact that changing the rate of the code may
require a change in signaling constellation (when channel bandwidth and data-
rate are fixed).
Sections IV and V, in the context of the question illustrated
in Fig. 1a) (obtaining the scaling behavior). Section VI dis-
cusses circuit-simulation-based numerical models of decoding
power, in the context of the question illustrated in Fig. 1b).
Section VII concludes the paper.
II. SYSTEM AND VLSI MODELS FOR ASYMPTOTIC
ANALYSIS
Throughout this paper, we rely on Bachmann-Landau nota-
tion [23] (i.e. “big-O” notation). We first state a preliminary
definition that is needed in order to state a precise definition
of the big-O notation that we use in this paper.
Definition 1. X ⊆ R is a right-sided set if ∀x ∈ X , ∃y ∈ X
such that y > x.
Some examples of right-sided sets include R, N, and inter-
vals of the form [a,∞), where a is a constant. We now state
the Bachmann-Landau notation for non-negative real-valued
functions defined on right-sided sets².
Definition 2. Let f : X → R≥0 and g : X → R≥0 be two
non-negative real-valued functions, both defined on a right-
sided set X . We state
1) f(x) = O(g(x)) if ∃x1 ∈ X and c1 > 0 s.t.
f(x) ≤ c1g(x), ∀x ≥ x1.
2) f(x) = Ω(g(x)) if ∃x2 ∈ X and c2 > 0 s.t.
f(x) ≥ c2g(x), ∀x ≥ x2.
3) f(x) = Θ(g(x)) if ∃x3 ∈ X and c4 ≥ c3 > 0 s.t.
c3g(x) ≤ f(x) ≤ c4g(x), ∀x ≥ x3.
We will also need a Bachmann-Landau notation for two
variable functions [24, Section 3.5]:
Definition 3. Let u : X × Y → R≥0 and v : X × Y → R≥0
be two non-negative real-valued functions, both defined on the
Cartesian product of two right-sided sets X and Y . We state
1) u(x, y) = O(v(x, y)) if ∃M ∈ R and c1 > 0 s.t.
u(x, y) ≤ c1v(x, y), ∀x, y ≥M .
2) u(x, y) = Ω(v(x, y)) if ∃M ∈ R and c2 > 0 s.t.
u(x, y) ≥ c2v(x, y), ∀x, y ≥M .
3) u(x, y) = Θ(v(x, y)) if ∃M ∈ R and c4 ≥ c3 > 0 s.t.
c3v(x, y) ≤ u(x, y) ≤ c4v(x, y), ∀x, y ≥M .
We will often apply Definitions 2 and 3 in the limit as
bit-error probability Pe → 0, where the definitions can be
interpreted as applied to a function with an argument $1/P_e$ as
it diverges to ∞. All logarithm functions log(·) are natural
logarithms unless otherwise stated.
A. Communication channel model
We assume the communication between transmitter and
receiver takes place over an AWGN channel with fixed at-
tenuation. The transmission strategy uses BPSK modulation,
and a (dv, dc)-regular binary LDPC code of design rate
$R = 1 - d_v/d_c$ [25] (which is assumed to equal the code rate).
²Bounded intervals that are open on the right such as (−1, 0) or [0, 5) are
also right-sided sets. Definition 2 can still be applied to functions restricted
to such sets, but we will not consider such functions in this paper.
The blocklength of the code is denoted by n, and the number
of source bits is denoted by k = nR. The decoder performs a
hard-decision on the observed channel outputs before starting
the decoding process, thereby first recovering noisy codeword
bits transmitted through a Binary Symmetric Channel (BSC)
of flip probability $p_0 = Q\left(\sqrt{2E_s/N_0}\right)$. Here, $E_s$ is the input
energy per channel symbol and $N_0/2$ is the noise power. $Q(\cdot)$
is the tail probability of the standard normal distribution,
$Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-u^2/2}\, du$. The transmit power $P_T$ is assumed
to be proportional to $E_s/N_0$, modeling fixed distance and constant
attenuation wireless communication. Explicitly we assume
$E_s/N_0 = \eta P_T$ for some constant $\eta > 0$. Using known bounds
on the Q-function [26], $\frac{e^{-x^2/2}}{\sqrt{2\pi}\,(x + 1/x)} \le Q(x) \le \frac{e^{-x^2/2}}{\sqrt{2\pi}\,x}$, we have
$$
\frac{e^{-\eta P_T}}{\sqrt{4\pi\eta P_T} + \sqrt{\pi/(\eta P_T)}} \;\le\; p_0 \;\le\; \frac{e^{-\eta P_T}}{\sqrt{4\pi\eta P_T}}. \tag{1}
$$
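The bounds in (1) can be checked numerically. The following is a minimal sketch, assuming only the channel model above; the function names and the sample values of ηPT are illustrative and do not come from the paper.

```python
import math

def q_function(x):
    """Tail probability of the standard normal distribution, Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def flip_probability(eta_pt):
    """BSC flip probability p0 = Q(sqrt(2 * Es/N0)) with Es/N0 = eta * P_T."""
    return q_function(math.sqrt(2.0 * eta_pt))

def p0_bounds(eta_pt):
    """Return the (lower, upper) bounds on p0 given in (1)."""
    numerator = math.exp(-eta_pt)
    lower = numerator / (math.sqrt(4.0 * math.pi * eta_pt)
                         + math.sqrt(math.pi / eta_pt))
    upper = numerator / math.sqrt(4.0 * math.pi * eta_pt)
    return lower, upper

if __name__ == "__main__":
    # Sample values of eta * P_T, chosen only for illustration.
    for eta_pt in (0.5, 1.0, 2.0, 5.0, 10.0):
        lower, upper = p0_bounds(eta_pt)
        p0 = flip_probability(eta_pt)
        print(f"eta*P_T={eta_pt:5.1f}  lower={lower:.3e}  "
              f"p0={p0:.3e}  upper={upper:.3e}")
```

Both bounds share the factor e^{−ηPT}, which is why p0 decays exponentially in the transmit power and why the two bounds tighten as ηPT grows.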
The focus of this paper is on the analysis of the “total”
power required to communicate on the above channel, as the
target average bit-error probability Pe → 0. Our simplified
notion of total power is defined below.
Definition 4. The total power, Ptotal, consumed in communi-
cation across the channel described in II-A is defined as
Ptotal = PT + PDec, (2)
where PT and PDec are the power spent in transmission and
decoding, respectively.
The channel model helps analyze the transmit power com-
ponent in (2), but a model for decoding power is also needed.
In the next section, we provide models and assumptions for
decoding algorithms and implementations that are used in the
paper. We allow PT and PDec to be chosen depending on Pe, η,
and the coding strategy. Throughout the paper, the minimum
total power for a strategy is denoted by Ptotal,min and the
optimizing transmit power by $P_T^*$.
B. Decoding algorithm assumptions
The general theoretical results of this paper (Lemma 2,
Theorem 3) hold for any iterative message-passing decoding
algorithm (and any number of decoding iterations) that satis-
fies “symmetry conditions” in [25, Def. 1] (which allow us
to assume that the all-zero codeword is transmitted). Thus, each
node only operates on the messages it receives at its inputs.
We note that the sum-product algorithm [27], the min-sum
algorithm [28], Gallager’s algorithms [29], and most other
message-passing decoders satisfy these assumptions. For the
constructive results of this paper (Corollary 1, Corollary 2,
Theorem 4, Theorem 5) we focus on the two decoding
algorithms originally proposed in Gallager’s thesis [29], that
are now called “Gallager-A” and “Gallager-B” [25]. For these
results, we will use density-evolution analysis [9] to analyze
the performance³, for which we define the term “independent
³In practice, decoding is often run for a larger number of iterations because
at large blocklengths, bit-error probability may still decay as the number of
iterations increases. In that case, density-evolution does not yield the correct
bit-error probability, as it will vary based on the code construction [30].
iterations” as follows:
Definition 5. An independent decoding iteration is a decoding
iteration in which messages received at a single variable or a
check node are mutually independent.
We will denote the number of independent iterations⁴ that an
algorithm runs as Niter. This quantity is constrained by the
girth [25] of the code, defined as the length of the shortest
cycle in the Tanner graph of the code [31], as follows: for a
code with girth g, the maximum value of Niter is $\lfloor (g-2)/4 \rfloor$.
C. VLSI model of decoding implementation
Theoretical models for analyzing area and energy costs of
VLSI circuits were introduced several decades ago in computer
science. These include frameworks such as the Thompson [12]
and Brent-Kung [32] models for circuit area and energy
complexity (called the “VLSI models”), and Rent’s rule [33],
[34]. Our model for the LDPC decoder implementation in this
paper is an adaptation of Thompson’s model [12], and it entails
the following assumptions:
1) The VLSI circuit includes processing nodes which per-
form computations and store data, and wires which
connect them. The circuit is placed on a square grid
of horizontal and vertical wiring tracks of finite width
λ > 0, and contact squares of area λ² at the overlaps of
perpendicular tracks.
2) Neighboring parallel tracks are spaced apart by width λ.
3) Wires carry information bi-directionally. Distinct wires
can only cross orthogonally at the contact squares.
4) The layout is drawn in the plane. In other words,
the model does not allow for more than two metal
layers for routing wires in the manner that modern IC
manufacturing processes do (see Section II-G1).
5) The processing nodes in the circuit have finite memory
and are situated at the contact squares of the grid. They
connect to wires routed along the grid.
6) Since wires are routed only horizontally and vertically,
any single contact has access to a maximum of 4
distinct wires. To accommodate higher-degree nodes, a
processing element requiring x external connections (for
x > 4) can occupy a square of side-length xλ on the
grid, with wires connecting to any side. No wires pass
over the large square.
λ models the minimum feature-size which is often used
to describe IC fabrication processes. We refer to this model
as Implementation Model (λ). The decoder is assumed to be
implemented in a “fully-parallel” manner [8], i.e. a processing
node never acts as more than one vertex in the Tanner
graph [31] of the code. Each variable-node and check-node of
an LDPC code is therefore represented by a distinct processing
node in the decoding circuit. As an example, Fig. 2(a) shows
the Tanner graph for a (7, 4)-Hamming code and Fig. 2(b)
shows a fully-parallel layout of a decoder for the same code.
In Sections II-E and II-F, we will describe two models⁵ of energy
⁴In our constructive results, we constrain the decoder to only perform
independent iterations. Thus, the number of independent iterations is the
same as the number of iterations for those results, but the term emphasizes the
requirement that the girth of the code be sufficiently large.
[Figure 2 graphic: (a) Tanner graph with variable nodes V1–V7 and check nodes C1–C3; (b) fully parallel layout of the same nodes on a grid of scale λ. Legend: variable node, check node, wire.]
Fig. 2: The Tanner graph (a) of a (7, 4)-Hamming code and a fully parallel
decoder (b) drawn according to Implementation Model (λ). Each vertex in
the Tanner graph corresponds to a processing node in the layout and each
edge in the Tanner graph corresponds to a wire connecting distinct nodes.
consumption for the VLSI decoder.
D. Time required for processing
In order to translate the model of II-E to a power model,
we need the time required for computation (the computation
time is measured in seconds and is different from the number
of algorithmic iterations). The computations are assumed to
happen in clocked iterations, with each iteration consisting
of two steps: passing of messages from variable to check
nodes, and then from check to variable nodes. If the decoding
algorithm requires the exchange of multi-bit messages, we
assume the message bits can be passed using a single wire.
We denote the decoding throughput (number of source bits
decoded per second) by Rdata. Because a batch of k source
bits are processed in parallel, the time available for processing
is $T_{\text{proc}} = k/R_{\text{data}}$ seconds.
E. Processing node model of decoding power
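Definition 6 (Node Model (ξnode)). The energy consumed
in each variable or check node during one decoding iteration
is $E_{\text{node}}$. This constant can depend on λ, dv and dc. The
total number of nodes at the decoder⁶ is $n_{\text{nodes}} = n + (n - k) = 2n - k$.
The total energy consumed in $\tau_{\text{iter}}$ decoding
iterations is $E_{\text{nodes}} = E_{\text{node}}\, n_{\text{nodes}}\, \tau_{\text{iter}}$. The decoding power
is
$$
P_{\text{nodes}} = \frac{E_{\text{nodes}}}{T_{\text{proc}}} = \frac{E_{\text{node}}(2n - k)\,\tau_{\text{iter}}}{k}\, R_{\text{data}} = \xi_{\text{node}}\, \tau_{\text{iter}}.
$$
Here,
$$
\xi_{\text{node}} = E_{\text{node}} \left( \frac{2}{R} - 1 \right) R_{\text{data}} = \frac{E_{\text{node}}(d_v + d_c)}{d_c - d_v}\, R_{\text{data}}.
$$
Note that $\tau_{\text{iter}}$ need not be tied to Niter.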
This model assumes that the entirety of the decoding energy
is consumed in processing nodes, and wires require no energy.
In essence, this model is simply counting the number of
operations performed in the message-passing algorithm. The
next energy model complements the node model by accounting
for energy consumed in wiring.
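As a rough illustration of the Node Model, the sketch below evaluates ξnode and the resulting decoding power for a given (dv, dc). The function names and all numeric values (energy per node, data-rate, iteration count) are hypothetical placeholders, not figures from the paper; the code assumes dv < dc so that the design rate is positive.

```python
from fractions import Fraction

def design_rate(dv, dc):
    """Design rate R = 1 - dv/dc of a (dv, dc)-regular LDPC code."""
    return 1 - Fraction(dv, dc)

def nodes_per_source_bit(dv, dc):
    """(2n - k)/k = 2/R - 1: processing nodes per source bit."""
    return 2 / design_rate(dv, dc) - 1

def xi_node(e_node, dv, dc, r_data):
    """xi_node = E_node * (2/R - 1) * R_data, in watts per iteration."""
    return e_node * float(nodes_per_source_bit(dv, dc)) * r_data

def node_power(e_node, dv, dc, r_data, tau_iter):
    """Decoding power P_nodes = xi_node * tau_iter under the Node Model."""
    return xi_node(e_node, dv, dc, r_data) * tau_iter

if __name__ == "__main__":
    # Hypothetical values: 1 pJ per node per iteration, 1 Gb/s, 10 iterations.
    print(nodes_per_source_bit(3, 6))            # exact fraction 2/R - 1
    print(node_power(1e-12, 3, 6, 1e9, 10))      # watts
```

Using exact fractions for the rate makes it easy to confirm that 2/R − 1 and (dv + dc)/(dc − dv) agree for any valid degree pair.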
F. Message-passing wire model of decoding power
Definition 7 (Wire Model (ξwire)). The decoding power is
$P_{\text{wires}} = C_{\text{unit-area}}\, A_{\text{wires}}\, V_{\text{supply}}^2\, f_{\text{clock}}$, where $C_{\text{unit-area}}$ is
the capacitance per unit-area of a wire, $V_{\text{supply}}$ is the supply
voltage of the circuit, $f_{\text{clock}}$ is the clock-frequency of the
circuit, and $A_{\text{wires}}$ is the total area occupied by the wires
in the circuit. The parameters $C_{\text{unit-area}}$ and $V_{\text{supply}}$ are
technology choices that may depend on λ, dv, and dc. The
parameter $f_{\text{clock}}$ also may depend on λ, dv, dc, $R_{\text{data}}$, and
the decoding algorithm. For simplicity, we write
$\xi_{\text{wire}} = C_{\text{unit-area}}\, V_{\text{supply}}^2\, f_{\text{clock}}$ and $P_{\text{wires}} = \xi_{\text{wire}}\, A_{\text{wires}}$.
⁵Both models can also be used simultaneously. However, for simplicity, we
present the results for the two models separately.
⁶In practice, many decoder implementations actually contain more than
n − k check nodes in order to break up small stopping-sets in the code.
However, we do not consider such decoders in this paper.
Wires in a circuit consume power whenever they are
“switched,” i.e., when the message along the wire changes
its value⁷. The probability of wire switching in a message-passing
decoder depends on the statistics of the number of
errors in the received word. These statistics depend on the flip-
probability of the channel, which is controlled by the transmit
power. Further, as decoding proceeds the messages also tend to
stabilize, reducing switching and hence the power consumed
in the wires. Activity-factor [19] could therefore be introduced
in the Wire Model via a multiplicative factor between 0 and
1 that depends on PT , η, and the decoding algorithm, but
modeling it accurately would require a very careful analysis.
G. On modern VLSI technologies and architectures
The VLSI model of Section II-C and the assumptions
made about the decoding architecture may seem pessimistic
compared to the current state-of-the-art. However, in this
section we justify our choices by explaining how many of
the architecture and technology optimizations that are helpful
in current practice have no impact on the conclusions derived
by our theoretical analysis.
1) Multiple routing layers: Modern VLSI technologies
allow for upwards of 10 metal layers for routing wires [35].
While this helps significantly in reducing routing congestion
in practice (i.e., at finite blocklengths and non-vanishing error-
probabilities), it has no impact on asymptotic bounds on total
power. As proved in [12, Pages 36-37], for a process with L
routing layers, the area occupied by wires is at least $A_{\text{wires}}/L^2$,
where Awires is the area occupied by the same circuit when
only one metal layer is used. As long as L cannot grow
with the number of vertices in the graph (it would be very
unrealistic to assume it can), it has only a constant impact on
wiring area lower bounds (see Lemmas 5, 6) and no impact
(since one can always restrict routing to a single layer) on
upper bounds (see Lemma 7). It will therefore become apparent
later in the paper that multiple routing layers do not affect
any of the theoretical results we derive.
On the other hand, having multiple active layers with
fine-grained routing between layers can lead to asymptotic
reductions in wiring area for some circuits [36]. However,
as it relates to practice, this is far beyond the reach of any
commercial foundry in existence today. Methods for designing
and fabricating such circuits (which rely on emerging nan-
otechnologies and emerging non-volatile memories [37]) are
only now starting to be considered in research settings.
2) Architectural optimizations: Fully parallel, one clock-
cycle per-iteration decoders are not commonly used in prac-
tice. Instead, serialization by dividing the number of physical
⁷Switching consumes energy because wires act as capacitors that need to be
charged/discharged. If voltage is maintained, little additional energy is spent.
nodes in the circuit by a constant factor and using time-
multiplexing to cut down on wiring is often performed [4],
[8]. This also requires a corresponding multiplication of the
clock-frequency $f_{\text{clock}}$ of the circuit to maintain the same
data-rate. Recall that dynamic power consumed in wires is
proportional to $C_{\text{wires}} V_{\text{supply}}^2 f_{\text{clock}}$ [19]. While a decrease in wire
capacitance may allow the supply to be scaled down (leading
to a reduction in power) without compromising timing, it is
not possible to scale it down indefinitely, since transistors have
a nonzero subthreshold slope [38, Section 2]. In other words,
once a lower limit on supply voltage is reached, even if $C_{\text{wires}}$
can be made to decay on the order of $1/n$, one would no longer
achieve power savings due to the corresponding increase in
$f_{\text{clock}}$. Thus, the behavior of total power in the large blocklength
limit will remain unchanged. Such architectural optimizations
do, however, have a big impact in practice (e.g., at finite
blocklengths) since changes in constants matter then.
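The serialization argument can be illustrated with a small numeric sketch. It assumes an idealized trade in which serializing by some factor divides the wire capacitance by that factor and multiplies the clock frequency by the same factor, with the supply voltage already at its lower limit. The function names and all numeric values are hypothetical.

```python
import math

def dynamic_wire_power(c_wires, v_supply, f_clock):
    """Dynamic wire power, taken proportional to C_wires * V_supply^2 * f_clock."""
    return c_wires * v_supply ** 2 * f_clock

def serialize(c_wires, f_clock, factor):
    """Idealized serialization (assumption): capacitance shrinks by `factor`
    while the clock speeds up by `factor` to hold the data-rate fixed."""
    return c_wires / factor, f_clock * factor

# Hypothetical numbers, chosen only for illustration.
C_WIRES = 1e-9   # farads
F_CLOCK = 1e8    # hertz
V_FLOOR = 0.5    # volts; assumed lower limit on the supply voltage

baseline = dynamic_wire_power(C_WIRES, V_FLOOR, F_CLOCK)

if __name__ == "__main__":
    for factor in (2, 8, 64):
        c_ser, f_ser = serialize(C_WIRES, F_CLOCK, factor)
        power = dynamic_wire_power(c_ser, V_FLOOR, f_ser)
        print(f"factor={factor:3d}  power={power:.3e} W  "
              f"baseline={baseline:.3e} W")
```

Under these assumptions the product of capacitance and frequency is unchanged, so once the voltage can no longer be lowered, serialization alters only constants and not the scaling of wire power.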
3) Leakage power: Later in the paper (Sections IV-C, V-C),
we will compare bounds on total power under the Node and
Wire Models. It will turn out that the two models lead to very
different insights, and the Wire Model results appear far more
pessimistic. Which model then is closer to reality? It turns out
that the Node Model is actually very optimistic. It assumes
that each node consumes only constant energy per-iteration,
irrespective of the clock period. From a circuit perspective,
this is equivalent to assuming that the power consumption
inside nodes is entirely dynamic [19], as the energy per-
iteration does not increase with the clock-period. This is far
from the reality in modern VLSI technologies. Transistors are
not perfect switches [39], and every check-node and variable-
node will consume a constant amount of leakage power while
the decoder is on, regardless of clock period and switching
activity. It is easy to see then, that even if the transistor leakage
is very small, the decoding power must scale as Ω (n). For
instance, even if the architecture is highly serialized, there is
still leakage in each of the Θ (n) sequential elements (e.g.,
flip-flops, latches, or RAM cells) needed to store messages.
It will become apparent later in the paper that this simple
analysis is enough to establish identical conclusions to the
lower bounds of Theorems 3, 4, and 5. Thus, the asymptotics
of total power under the Wire Model should be viewed as
much better predictions of what would actually happen inside
the circuit at infinite blocklengths.
III. PRELIMINARY RESULTS
In this section, we provide some preliminary results that
will be useful in Sections IV and V. These include general
bounds on the blocklength of regular-LDPC codes and bounds
on the minimum number of independent iterations needed for
Gallager decoders to achieve a specific bit-error probability.
A. Blocklength analysis of regular-LDPC codes
Lemma 1. For a given girth g of a (dv, dc)-regular LDPC
code, a lower bound on the blocklength n is
$$
n \ge \left[ (d_v - 1)(d_c - 1) \right]^{\lfloor (g-2)/4 \rfloor}, \tag{3}
$$
and an upper bound on the blocklength is given by
$$
n \le 2 (d_v + d_c)\, d_v d_c\, (2 d_v d_c + 1)^{\frac{3}{4} g}. \tag{4}
$$
Proof: For the lower bound, see [14, Appendix I], and
for the upper bound, see [14, Claim 2].
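To get a feel for Lemma 1, the sketch below evaluates both blocklength bounds together with the maximum number of independent iterations allowed by a given girth. It takes the exponent in (4) to be 3g/4, which is an assumption about how that bound is read. The function names are illustrative, and the code does not check that g is a valid (even, at least 4) Tanner-graph girth.

```python
def max_independent_iterations(girth):
    """Maximum N_iter for a code with girth g: floor((g - 2) / 4)."""
    return (girth - 2) // 4

def blocklength_lower_bound(dv, dc, girth):
    """Lower bound (3): n >= [(dv - 1)(dc - 1)]^floor((g - 2)/4)."""
    return ((dv - 1) * (dc - 1)) ** max_independent_iterations(girth)

def blocklength_upper_bound(dv, dc, girth):
    """Upper bound (4), read as 2(dv + dc) dv dc (2 dv dc + 1)^(3g/4)."""
    return 2 * (dv + dc) * dv * dc * (2 * dv * dc + 1) ** (0.75 * girth)

if __name__ == "__main__":
    for girth in (6, 10, 14, 18):
        print(girth,
              max_independent_iterations(girth),
              blocklength_lower_bound(3, 6, girth),
              f"{blocklength_upper_bound(3, 6, girth):.3e}")
```

Both bounds are exponential in g, so each additional independent iteration multiplies the required blocklength by at least (dv − 1)(dc − 1).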
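Lemma 2. For a (dv, dc)-regular binary LDPC code decoded
using any iterative message-passing decoding algorithm for
any number of iterations, the blocklength n needed to achieve
bit-error probability Pe is
$$
n =
\begin{cases}
\Omega\!\left( \left[ \left( \dfrac{d_v - 2}{d_v (d_v - 1)} \right)^{2} \dfrac{\log \frac{1}{P_e}}{(1 + 9\pi)\, \eta P_T} \right]^{\frac{1}{2} \left( 1 + \frac{\log(d_c - 1)}{\log(d_v - 1)} \right)} \right) & d_v \ge 3, \\[2ex]
\Omega\!\left( \left( \dfrac{1}{P_e} \right)^{\frac{1}{\eta P_T (1 + 9\pi) \left( 2 + \frac{2}{\log(d_c - 1)} \right)}} \right) & d_v = 2.
\end{cases}
$$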
Here, η > 0 is the constant attenuation in the AWGN channel
(see Section II-A).
Proof: See Appendix A.
PROOF OUTLINE: We use a technique for the finite-length
analysis of LDPC codes from [40]. First, the pairwise error-
probability for any iterative message-passing decoder is lower
bounded in terms of n, dv , dc, and ηPT using an expression for
the minimum pseudoweight (see Appendix A for definition)
of the code. Next, due to a simple relationship between
bit-error probability and pairwise error-probability for binary
linear codes over memoryless binary-input, output-symmetric
channels, the bit-error probability can be lower bounded in
terms of n, dv , dc, and ηPT . Finally, algebraic manipulations,
an application of (1), and an application of Definition 3
complete the proof.
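The dv ≥ 3 case of Lemma 2 can be explored numerically. The sketch below evaluates the expression inside the Ω(·), reading the exponent as (1/2)(1 + log(dc − 1)/log(dv − 1)); that reading and the function names are assumptions. Because Ω(·) hides a constant, only the growth of the term as Pe shrinks is meaningful, not its absolute value.

```python
import math

def lemma2_exponent(dv, dc):
    """Exponent (1/2) * (1 + log(dc - 1) / log(dv - 1)), for dv >= 3."""
    if dv < 3:
        raise ValueError("this sketch covers only the dv >= 3 case")
    return 0.5 * (1.0 + math.log(dc - 1) / math.log(dv - 1))

def lemma2_growth_term(pe, eta_pt, dv, dc):
    """Expression inside Omega(.) for dv >= 3, up to the hidden constant."""
    base = (((dv - 2) / (dv * (dv - 1))) ** 2
            * math.log(1.0 / pe)
            / ((1.0 + 9.0 * math.pi) * eta_pt))
    return base ** lemma2_exponent(dv, dc)

if __name__ == "__main__":
    # eta * P_T = 1 is an arbitrary illustrative choice.
    for pe in (1e-3, 1e-6, 1e-12):
        print(pe, lemma2_growth_term(pe, 1.0, 3, 6))
```

With PT held fixed, squaring 1/Pe doubles log(1/Pe) and therefore multiplies the term by 2 raised to the exponent, which is about 3.16 for a (3, 6) code.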
B. Approximation analysis of Gallager decoding algorithms
In this section, we bound the number of independent de-
coding iterations required to attain a specific bit-error prob-
ability with Gallager decoders. These bounds are used in
Sections IV, V-B to prove achievability results for total power.
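Lemma 3. The number of independent decoding iterations
Niter needed to attain bit-error probability Pe with Gallager-A
decoding is
$$
N_{\text{iter}} =
\begin{cases}
\Theta\!\left( \log \dfrac{1}{P_e} \right) & \text{if } P_T \text{ is held constant,} \\[2ex]
\Theta\!\left( \dfrac{\log \frac{1}{P_e}}{\eta P_T} \right) & \text{if } P_T \text{ is not held fixed.}
\end{cases}
$$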
Here, η > 0 is the constant attenuation in the AWGN channel
(see Section II-A).
Proof: See Appendix B.
PROOF OUTLINE: We first define (based on the decoding
threshold over the BSC [25]) appropriate right-sided sets for
analyzing the asymptotics of Niter as a function of $1/P_e$ and PT.
Then, we apply a first-order Taylor expansion to the recurrence
relation for bit-error probability under independent iterations
of Gallager-A decoding from [25, Eqn. (6)] and carefully
bound the approximation error. We then show that for small
enough Pe or large enough PT , the approximation error can
be bounded by a multiplicative factor between $1/2$ and 1. After
some algebraic manipulations and an application of (1), we
apply Definition 2 to establish the first case and Definition 3
to establish the second case.
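The logarithmic scaling in Lemma 3 can be observed by iterating the density-evolution recursion directly. The sketch below uses the standard Gallager-A update over a BSC, which is assumed here to match the recurrence of [25, Eqn. (6)]. The function names, the (3, 6) code, and the flip probabilities are illustrative choices, and the flip probability must lie below the decoding threshold for the loop to terminate normally.

```python
def gallager_a_step(x, p0, dv, dc):
    """One independent Gallager-A iteration: map the variable-to-check
    message error probability x to its value after one more iteration."""
    s = (1.0 - 2.0 * x) ** (dc - 1)
    # Probability that all dv - 1 incoming check messages are correct / wrong.
    all_checks_correct = ((1.0 + s) / 2.0) ** (dv - 1)
    all_checks_wrong = ((1.0 - s) / 2.0) ** (dv - 1)
    # A wrong received bit stays wrong unless every check message corrects it;
    # a correct received bit is flipped only if every check message is wrong.
    return p0 * (1.0 - all_checks_correct) + (1.0 - p0) * all_checks_wrong

def iterations_to_target(p0, target_pe, dv=3, dc=6, max_iter=10_000):
    """Number of independent iterations until the error probability <= target."""
    x = p0
    for n_iter in range(max_iter + 1):
        if x <= target_pe:
            return n_iter
        x = gallager_a_step(x, p0, dv, dc)
    raise ValueError("target not reached; p0 may exceed the decoding threshold")

if __name__ == "__main__":
    for target in (1e-3, 1e-6, 1e-9, 1e-12):
        print(target,
              iterations_to_target(0.02, target),
              iterations_to_target(0.002, target))
```

For small x the update is approximately linear in x with slope p0(dv − 1)(dc − 1), so each iteration shrinks the error by a roughly constant factor. A lower p0 (higher transmit power) gives a smaller slope and hence fewer iterations, in line with the second case of the lemma.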
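Lemma 4. The number of independent decoding iterations
Niter needed to attain bit-error probability Pe with a Gallager-B
decoder with variable node degree dv ≥ 4 is given by
$$
N_{\text{iter}} =
\begin{cases}
\Theta\!\left( \log \log \dfrac{1}{P_e} \right) & \text{if } P_T \text{ is held constant,} \\[2ex]
\Theta\!\left( \dfrac{\log \frac{\log \frac{1}{P_e}}{\eta P_T}}{\log \frac{d_v - 1}{2}} \right) & \text{if } \lim_{P_e \to 0} \dfrac{P_T}{\log \frac{1}{P_e}} = 0.
\end{cases}
$$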
Here, η > 0 is the constant attenuation in the AWGN channel
(see Section II-A).
Proof: See Appendix C. Importantly, this holds only if
dv ≥ 4, otherwise Gallager-A and Gallager-B are equivalent.
Note that in the second case, we assume $P_T$ is a function
of $\frac{1}{P_e}$, so both expressions should be interpreted with
Definition 2. Further, little generality is lost by the necessary
condition for the second case, since uncoded transmission
requires transmit power $\Theta\left(\log \frac{1}{P_e}\right)$ (see (1)).
PROOF OUTLINE: We follow exactly the same steps as the
proof of Lemma 3, but instead use a higher-order Taylor
expansion of the recurrence relation for bit-error probability
under Gallager-B decoding from [29, Eqn. 4.15].
IV. ANALYSIS OF ENERGY CONSUMPTION IN THE NODE
MODEL
In this section, we investigate the question: as Pe → 0, how
does the total power under the Node Model (see Section II-E)
scale when Gallager decoders (restricted to independent iter-
ations) are used?
A. Total power analysis for Gallager-A decoding
Corollary 1. The optimal total power under Gallager-A
decoding (restricted to independent iterations) in the Node
Model ($\xi_{\mathrm{node}}$) for a binary $(d_v, d_c)$-regular LDPC code is
$$
P_{\mathrm{total,min}} = \Theta\left(\sqrt{\log \frac{1}{P_e}}\right),
$$
which is achieved by transmit power $P_T^* = \Theta\left(\sqrt{\log \frac{1}{P_e}}\right)$.
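Proof: Applying Lemma 3 to the Node Model, if $P_T$
is held constant even as $P_e \to 0$, the power consumed by
decoding is $\Theta\left(\log \frac{1}{P_e}\right)$. Since $P_T$ is constant, the total power
is also $P_{\mathrm{total,\,bdd}\ P_T} = \Theta\left(\log \frac{1}{P_e}\right)$. If instead $P_T$ is allowed
to grow arbitrarily, the total power is given by
$$
P_{\mathrm{total}} = P_T + P_{\mathrm{Dec}} = \Theta\left(P_T + \frac{\log \frac{1}{P_e}}{\eta P_T}\right). \tag{5}
$$
Thus, optimizing the scaling behavior of the total power over
transmit power functions $P_T$,
$$
P_{\mathrm{total,min}} = \min_{P_T} \Theta\left(P_T + \frac{\log \frac{1}{P_e}}{\eta P_T}\right) = \Theta\left(\sqrt{\log \frac{1}{P_e}}\right), \tag{6}
$$
with optimizing transmit power $P_T^* = \Theta\left(\sqrt{\log \frac{1}{P_e}}\right)$.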
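The minimization in (6) can be checked numerically. The sketch below treats the argument of the $\Theta(\cdot)$ in (5) as an exact expression with unit constants, which is an assumption made purely for illustration; the function names are hypothetical.

```python
import math


def total_power(p_t, log_inv_pe, eta=1.0):
    """Argument of the Theta(.) in (5), with all hidden constants set to 1 (assumption)."""
    return p_t + log_inv_pe / (eta * p_t)


def minimize_total_power(log_inv_pe, eta=1.0):
    """Closed-form minimizer: d/dP_T [P_T + L/(eta P_T)] = 0 gives P_T = sqrt(L/eta)."""
    p_t_star = math.sqrt(log_inv_pe / eta)
    return p_t_star, total_power(p_t_star, log_inv_pe, eta)


for pe in (1e-5, 1e-10, 1e-20):
    log_inv_pe = math.log(1.0 / pe)
    p_t_star, p_min = minimize_total_power(log_inv_pe)
    print(f"Pe={pe:g}  P_T*={p_t_star:.3f}  P_total,min={p_min:.3f}")
```

At the optimum the transmit and decoding terms are equal, so both grow as the square root of $\log \frac{1}{P_e}$.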
B. Total power analysis for Gallager-B decoding
Corollary 2. The optimal total power under Gallager-B
decoding (restricted to independent iterations) in the Node
Model ($\xi_{\mathrm{node}}$) for a binary $(d_v, d_c)$-regular LDPC code is
$$
P_{\mathrm{total,min}} = \Theta\left(\log \log \frac{1}{P_e}\right),
$$
which is achieved by transmit power $P_T^* = \Theta(1)$.
Proof: If $P_T$ satisfies the condition stated in the second
case of Lemma 4, the total power in the Node Model is
$$
P_{\mathrm{total}} = P_T + P_{\mathrm{Dec}} = \Theta\left(P_T + \frac{\log \frac{\log \frac{1}{P_e}}{\eta P_T}}{\log \frac{d_v - 1}{2}}\right). \tag{7}
$$
Minimizing the scaling behavior of (7), the optimizing transmit
power is $P_T^* = \Theta(1)$. The optimal total power is then
$$
P_{\mathrm{total,min}} = \Theta\left(\log \log \frac{1}{P_e}\right). \tag{8}
$$
In this case the optimizing transmit power is bounded even as
$P_e \to 0$.
C. Comparison with fundamental limits
Can we reduce the asymptotic growth of total power under
the Node Model via a better code or a more sophisticated
decoding algorithm? After all, we limited our attention to reg-
ular LDPCs and simple one-bit message-passing algorithms.
It was shown in [6] that under the Node Model and a fully-parallelized
decoding implementation such as Implementation Model (λ), the optimal
total power is lower bounded by $\Omega\left(\log \log \frac{1}{P_e}\right)$,
matching Corollary 2. In fact, using a
code which performs close to Shannon capacity can even
reduce efficiency for this strategy: if a capacity-approaching
LDPC code is used instead of a regular LDPC code, the
infinite-blocklength performance under the Gallager-B decod-
ing algorithm equals that of regular-LDPCs with Gallager-
A decoding. In other words, the bit-error probability decays
only exponentially (and not doubly-exponentially) with the
number of iterations under Gallager-B decoding if degree-2
variable nodes are present [41], and [42] shows that degree-2
variable nodes are required in order to achieve capacity (the
fraction of degree-2 variable nodes required to attain capacity
is characterized in [42]). Thus, rather than searching for an
irregular code that approaches capacity, an engineer might
be better off using a simpler regular code that approaches
fundamental limits on total power.
V. ANALYSIS OF ENERGY CONSUMPTION IN THE WIRE
MODEL
A. Bounds on wiring area of decoders
To make use of the energy model of Section II-F, we must
characterize the total wiring area of the decoder. We rely on
techniques for upper and lower bounds on the total wire area
obtained for different computations in [12], [43], [44], [45].
We first introduce some graph-theoretic concepts that will
prove useful in obtaining similar bounds for our problem.
1) Lower bound on wiring area: We first provide a trivial
lower bound on the wiring area of the decoder for any regular-
LDPC code implemented in Implementation Model (λ).
Lemma 5. For a $(d_v, d_c)$-regular LDPC code of blocklength
$n$, the wiring area $A_{\mathrm{wires}}$ under Implementation Model (λ) is
$$
A_{\mathrm{wires}} \ge \lambda^2 d_v n.
$$
Proof: There are $d_v n$ wires. Each wire has width λ and
minimum length λ (no two wires overlap completely).
In his thesis [45], Leighton utilizes the crossing number (a
property first defined by Turán [46]) of a graph as a tool for
obtaining lower bounds on the wiring area of circuits. Crossing
numbers continue to be of interest to combinatorialists and
graph-theorists, and many difficult problems on finding exact
crossing numbers or bounds for various families of graphs
remain open [47]. We use the following two definitions to
introduce this property.
Definition 8 (Graph Drawing). A drawing of a graph G is a
representation of G in the plane such that each vertex of G is
represented by a distinct point and each edge is represented by
a distinct continuous arc connecting the corresponding points,
which does not cross itself. No edge passes through vertices
other than its endpoints and no two edges are overlapping for
any nonzero length (they can only intersect at points).
Definition 9 (Crossing Number). The crossing number of a
graph G, cr(G), is the minimum number of edge-crossings
over all possible drawings of G. An edge-crossing is any point
in the plane other than a vertex of G where a pair of edges
intersects.
For any graph G (e.g., the Tanner graph of an LDPC code),
the wiring area of the corresponding circuit under Implementation
Model (λ) is lower bounded as $A_{\mathrm{wires}} \ge \lambda^2 \mathrm{cr}(G)$. This
is due to the fact that any VLSI layout of the type described
in Section II-C can be mapped to a drawing of G in the
sense of Definition 8, by simply replacing each processing
node with a point in the plane and replacing each wire by
line segments connecting two points. Therefore, the minimum
number of wire crossings of any layout of G is cr(G). Since
every crossing has area $\lambda^2$, the inequality follows. We now
need lower bounds on the crossing number of a computation
graph. In this paper, we make use of the following result [48]
that improves on earlier results [49], [50], [45] and allows us
to tighten Lemma 5 for some codes.
Theorem 1 (Pach, Spencer, Tóth [48]). Let $G = \{V, E\}$ be a
graph with girth $g > 2\ell$ and $|E| \ge 4|V|$. Then $\mathrm{cr}(G)$ satisfies
$$
\mathrm{cr}(G) \ge k_\ell \frac{|E|^{\ell+2}}{|V|^{\ell+1}},
$$
where $k_\ell = \Omega\left(\frac{1}{\ell^2 2^{2\ell+3}}\right)$ [51].
We now obtain lower bounds on wiring area given a lower
bound on the number of independent iterations the code
allows.
Lemma 6 (Crossing Number Lower Bound on $A_{\mathrm{wires}}$). For
a $(d_v, d_c)$-regular LDPC code that allows for at least $N_{\mathrm{iter}}$
independent decoding iterations, the wiring area $A_{\mathrm{wires}}$ of a
decoder in Implementation Model (λ) is
$$
A_{\mathrm{wires}} =
\begin{cases}
\Omega\left(e^{\gamma N_{\mathrm{iter}}}\right) & \text{for any } d_v, d_c, \\
\Omega\left(e^{N_{\mathrm{iter}} \log \frac{2 d_v^2 d_c^2}{(d_v + d_c)^2}}\right) & \text{if } d_v d_c \ge 4(d_v + d_c).
\end{cases}
$$
Here, $\gamma \in [\log((d_v - 1)(d_c - 1)),\ 3\log(2 d_v d_c + 1)]$ is a
constant that depends on the code construction.
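Proof: Let $C$ be a $(d_v, d_c)$-regular LDPC code that allows
for at least $N_{\mathrm{iter}}$ independent decoding iterations. Since the
girth $g$ of $C$ must then satisfy $\lfloor \frac{g-2}{4} \rfloor \ge N_{\mathrm{iter}}$,
we have $g > 4N_{\mathrm{iter}} - 2$.
From Lemma 1, the blocklength $n$ of the code $C$ is then
$$
n = \Omega\left(e^{\gamma N_{\mathrm{iter}}}\right),
$$
where $\gamma \in [\log((d_v - 1)(d_c - 1)),\ 3\log(2 d_v d_c + 1)]$, and
from Lemma 5 we then have $A_{\mathrm{wires}} = \Omega\left(e^{\gamma N_{\mathrm{iter}}}\right)$. Now,
assume $d_v d_c \ge 4(d_v + d_c)$. This requires that $d_c > d_v \ge 5$.
Let $V_C$, $E_C$ denote the sets of vertices and edges in the Tanner
graph of $C$. The sizes are $|E_C| = n d_v$ and $|V_C| = n\left(1 + \frac{d_v}{d_c}\right)$.
Dividing the assumed inequality by $d_c$ and multiplying by $n$ gives
$$
d_v d_c \ge 4(d_v + d_c) \;\Rightarrow\; n d_v \ge 4n\left(1 + \frac{d_v}{d_c}\right).
$$
Hence, $|E_C| \ge 4|V_C|$. Using the fact that $g > 4N_{\mathrm{iter}} - 2$, we
apply Theorem 1 with $\ell = 2N_{\mathrm{iter}} - 1$:
$$
\begin{aligned}
A_{\mathrm{wires}} &= \Omega\left(\frac{\lambda^2}{(2N_{\mathrm{iter}} - 1)^2\, 4^{2N_{\mathrm{iter}} + \frac{1}{2}}} \cdot \frac{(n d_v)^{2N_{\mathrm{iter}} + 1}}{\left(n\left(1 + \frac{d_v}{d_c}\right)\right)^{2N_{\mathrm{iter}}}}\right) \\
&= \Omega\left(\lambda^2 \frac{\left(\frac{e^{\gamma}}{16}\right)^{N_{\mathrm{iter}}}}{(2N_{\mathrm{iter}} - 1)^2} \left(\frac{d_v d_c}{d_v + d_c}\right)^{2N_{\mathrm{iter}}}\right).
\end{aligned} \tag{9}
$$
Then, because $e^{\gamma} \ge (d_v - 1)(d_c - 1) = d_v d_c - (d_v + d_c) + 1 \ge 3(d_v + d_c) + 1$
(using $d_v d_c \ge 4(d_v + d_c)$), and because $d_c > d_v \ge 5$, we
must have $e^{\gamma} \ge 34$. Substituting into (9),
$$
\begin{aligned}
A_{\mathrm{wires}} &= \Omega\left(\lambda^2 \frac{\left(\frac{34}{16}\right)^{N_{\mathrm{iter}}}}{(2N_{\mathrm{iter}} - 1)^2} \left(\frac{d_v d_c}{d_v + d_c}\right)^{2N_{\mathrm{iter}}}\right) \\
&= \Omega\left(\lambda^2\, 2^{N_{\mathrm{iter}}} \left(\frac{d_v d_c}{d_v + d_c}\right)^{2N_{\mathrm{iter}}}\right),
\end{aligned}
$$
and changes-of-base complete the proof.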
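The two numerical facts used in the proof (that the degree condition forces $d_v \ge 5$, and that $e^{\gamma} \ge 34$ then holds) can be sanity-checked by brute force. The sketch below enumerates degree pairs over a finite range and assumes $d_c > d_v$, as in the proof; the search limits are arbitrary illustration choices.

```python
def satisfies_degree_condition(dv, dc):
    """Condition d_v d_c >= 4 (d_v + d_c) under which Theorem 1 is applied."""
    return dv * dc >= 4 * (dv + dc)


# Enumerate pairs with dc > dv (assumed, as in the proof) over a finite search range.
pairs = [
    (dv, dc)
    for dv in range(2, 60)
    for dc in range(dv + 1, 200)
    if satisfies_degree_condition(dv, dc)
]

min_dv = min(dv for dv, _ in pairs)
# (dv - 1)(dc - 1) is the lower end of the range allowed for e^gamma.
min_e_gamma_floor = min((dv - 1) * (dc - 1) for dv, dc in pairs)
print(min_dv, min_e_gamma_floor)
```

Because $(d_v - 1)(d_c - 1)$ increases in both degrees, the minimum over this finite range is also the minimum over all admissible pairs.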
2) Upper bound on wiring area: Since the total circuit area
is always an upper bound on the area occupied by wires, we
use an upper bound on the circuit area to obtain the following
upper bound on the wiring area based on the maximum number
of independent iterations that the code allows for.
Lemma 7 (Upper bound on $A_{\mathrm{wires}}$). For a $(d_v, d_c)$-regular
LDPC code that allows for no more than $N_{\mathrm{iter}}$ independent
decoding iterations, the decoder wiring area $A_{\mathrm{wires}}$ is
$$
A_{\mathrm{wires}} = O\left(e^{2\gamma N_{\mathrm{iter}}}\right).
$$
Here, $\gamma \in [\log((d_v - 1)(d_c - 1)),\ 3\log(2 d_v d_c + 1)]$ is a
constant that depends on the code construction.
Proof: Let $C$ be a $(d_v, d_c)$-regular LDPC code that allows
for no more than $N_{\mathrm{iter}}$ independent decoding iterations. Since
the girth $g$ of $C$ must then satisfy $\lfloor \frac{g-2}{4} \rfloor \le N_{\mathrm{iter}}$,
$$
g < 4N_{\mathrm{iter}} + 6.
$$
From Lemma 1, the blocklength of any such code can be upper
bounded in the order of $N_{\mathrm{iter}}$ as
$$
n = O\left(e^{\gamma N_{\mathrm{iter}}}\right), \tag{10}
$$
where $\gamma \in [\log((d_v - 1)(d_c - 1)),\ 3\log(2 d_v d_c + 1)]$. Then,
consider a “collinear” VLSI layout [52] of the Tanner graph of
$C$ which satisfies all the assumptions described in Section II-C.
Arrange all variable-nodes and check-nodes in the graph
along a horizontal line, leaving λ spacing between consecutive
nodes. The total length of this arrangement is then $O(n)$.
Allocate a unique horizontal wiring track for each of the $n d_v$
edges in the Tanner graph. Then, every connection in the graph
can be made with two vertical wires (one from each endpoint)
which connect to the opposite ends of the dedicated horizontal
track. The total height of this layout is then $O(n)$, and the total
area is $O(n^2)$. An example collinear layout is given in Fig. 3.
Substituting (10) for $n$, we obtain the bound.
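The gap between the trivial lower bound of Lemma 5 and the collinear construction can be made concrete with a small area calculation. This is only a sketch: the node width and the track pitch (one wire width plus one spacing of λ) are assumed constants that do not come from the paper, and the function names are hypothetical.

```python
def collinear_layout_area(n, dv, dc, lam=1.0, node_width=1.0):
    """Area of a collinear layout: nodes on one line, one horizontal track per edge.

    node_width and the 2*lam track pitch are illustrative assumptions.
    """
    num_nodes = n + (n * dv) // dc        # n variable nodes plus n*dv/dc check nodes
    num_edges = n * dv                    # one dedicated horizontal track per edge
    length = num_nodes * (node_width + lam)
    height = num_edges * 2.0 * lam
    return length * height


def trivial_lower_bound(n, dv, lam=1.0):
    """Lemma 5: n*dv wires, each of width lam and length at least lam."""
    return lam ** 2 * dv * n


for n in (12, 24, 48):
    print(n, trivial_lower_bound(n, 3), collinear_layout_area(n, 3, 6))
```

Doubling the blocklength doubles the lower bound but quadruples the collinear area, which is the $O(n^2)$ versus $\Omega(n)$ gap between Lemma 7 and Lemma 5.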
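[Figure 3 (image not reproduced): variable nodes V1 to V7 and check nodes C1 to C3 placed along a horizontal line, with a legend for variable nodes, check nodes, and wires, and a scale bar of length λ.]
Fig. 3: An example collinear layout for the same (7, 4) Hamming Code
depicted in Fig. 2.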
We note that this upper bound is crude since the
$O\left(|V|^2\right)$ layout construction applies for any graph
$G = \{V, E\}$ which satisfies $|E| = O(|V|)$. A simple
proof [45] shows that one can create a layout of area
$O\left((|V| + \mathrm{cr}(G)) \log(|V| + \mathrm{cr}(G))\right)$ for any graph. Thus, an
algorithm for drawing semi-regular graphs which can be
proven to yield sub-quadratic (in $n$) crossing numbers would
yield energy-efficient codes and decoders with short wires.
B. Total power minimization for the wire model
We now present analogues of results in Section IV, where
we instead consider decoding power described by the Wire
Model of Section II-F. We translate the wiring area bounds of
Section V-A to power bounds.
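Theorem 2 (Asymptotic bounds on $P_{\mathrm{wires}}$). Under Implementation
Model (λ) and Wire Model ($\xi_{\mathrm{wire}}$), the decoding power
$P_{\mathrm{wires}}$ for a $(d_v, d_c)$-regular binary LDPC code that allows
for exactly $N_{\mathrm{iter}}$ independent iterations is bounded as
$$
P_{\mathrm{wires}} =
\begin{cases}
\Omega\left(e^{\gamma N_{\mathrm{iter}}}\right) & \text{for any } d_v, d_c, \\
\Omega\left(e^{N_{\mathrm{iter}} \log \frac{2 d_v^2 d_c^2}{(d_v + d_c)^2}}\right) & \text{if } d_v d_c \ge 4(d_v + d_c), \\
O\left(e^{2\gamma N_{\mathrm{iter}}}\right) & \text{for any } d_v, d_c,
\end{cases}
$$
where $\gamma \in [\log((d_v - 1)(d_c - 1)),\ 3\log(2 d_v d_c + 1)]$ is a
constant that depends on the code construction.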
Proof: The result is a straightforward conclusion from
Lemma 6 and Lemma 7 applied in Definition 7.
Next, we present a general lower bound on the scaling
behavior of total power under the Wire Model for any binary
regular-LDPC code, decoded using any iterative message-
passing decoding algorithm, for any number of iterations.
Theorem 3 (Lower bound for regular-LDPCs). The optimal
total power in the Wire Model ($\xi_{\mathrm{wire}}$) for a binary $(d_v, d_c)$-regular
LDPC code with any iterative message-passing decoding
algorithm to achieve bit-error probability $P_e$ is
$$
P_{\mathrm{total,min}} = \Omega\left(\log^{\frac{1}{1 + \frac{2}{1 + \frac{\log(d_c - 1)}{\log(d_v - 1)}}}} \frac{1}{P_e}\right).
$$
Further, if $P_T$ is held fixed as $P_e \to 0$, the total power diverges
as $\Omega\left(\log^{y} \frac{1}{P_e}\right)$ where $y > 1$, which dominates the power
required by uncoded transmission.
Proof: See Appendix D.
PROOF OUTLINE: We first substitute the result of Lemma 2
into Lemma 5, and then use the resulting lower bound on
decoding power under the Wire Model in (2). Using simple
calculus, we then derive the asymptotics of the transmit power
function that minimizes the total power, and plug it back into
Lemma 2 and (2) to obtain the result.
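To get a feel for the bound, the exponent of $\log \frac{1}{P_e}$ in Theorem 3 can be evaluated for specific degrees. The sketch below uses the exponent as read from the theorem statement, $1 / (1 + 2 / (1 + \log(d_c - 1)/\log(d_v - 1)))$; the function name and the example degrees are illustrative, and $d_v \ge 3$ is assumed so that $\log(d_v - 1) > 0$.

```python
import math


def wire_model_lower_exponent(dv, dc):
    """Exponent of log(1/Pe) in the Theorem 3 lower bound (requires dv >= 3)."""
    rho = math.log(dc - 1) / math.log(dv - 1)
    return 1.0 / (1.0 + 2.0 / (1.0 + rho))


for dv, dc in ((3, 6), (4, 8), (5, 10), (100, 101)):
    print(dv, dc, round(wire_model_lower_exponent(dv, dc), 4))
```

For $d_c > d_v$ the exponent stays above 1/2, and it only approaches 1/2 when the two degrees are large and nearly equal, that is, when the code rate $1 - d_v/d_c$ is close to zero.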
1) Gallager-A decoding:
Theorem 4. The optimal total power under Gallager-A decoding
(restricted to independent iterations) in the Wire Model
($\xi_{\mathrm{wire}}$) for a binary $(d_v, d_c)$-regular LDPC code to achieve
bit-error probability $P_e$ is
$$
P_{\mathrm{total,min}} = \Theta\left(\frac{\gamma}{\eta} \cdot \frac{\log \frac{1}{P_e}}{\log \log \frac{1}{P_e}}\right),
$$
where $\eta > 0$ is the constant attenuation in
the AWGN channel (Section II-A) and
$\gamma \in [\log((d_v - 1)(d_c - 1)),\ 3\log(2 d_v d_c + 1)]$ is a constant
that depends on the code construction. Further, if $P_T$ is held
fixed as $P_e \to 0$, then total power diverges as $\Omega\left(\mathrm{Poly}\left(\frac{1}{P_e}\right)\right)$,
which is an exponential function of the power required by
uncoded transmission.
Proof: See Appendix E.
PROOF OUTLINE: We first substitute the results of Lemma 3
into Theorem 2, and then plug in the resulting bounds on
decoding power in (2). Using some calculus, we then derive
the best-case and worst-case asymptotics of the transmit power
function that minimizes the total power, and show that there
is at most a constant gap between the two. We then plug
the optimizing transmit power back into Lemma 3 and (2)
to obtain the result.
2) Gallager-B decoding:
Theorem 5. The optimal total power under Gallager-B decoding
(restricted to independent iterations) in the Wire Model
($\xi_{\mathrm{wire}}$) for a binary $(d_v, d_c)$-regular LDPC code to achieve
bit-error probability $P_e$ is bounded as
$$
P_{\mathrm{total,min}} =
\begin{cases}
\Omega\left(\log^{\frac{2}{3}} \frac{1}{P_e}\right) & \text{if } \frac{d_v d_c}{d_v + d_c} < 4, \\
\Omega\left(\log^{\frac{31}{40}} \frac{1}{P_e}\right) & \text{if } \frac{d_v d_c}{d_v + d_c} \ge 4, \\
O\left(\log^{\frac{1}{1 + \frac{\log(d_v - 1) - \log 2}{6 \log(2 d_v d_c + 1)}}} \frac{1}{P_e}\right) & \text{for any } d_v, d_c.
\end{cases}
$$
Further, if $P_T$ is held fixed as $P_e \to 0$, then total power diverges
as $\Omega\left(\log^{2.48} \frac{1}{P_e}\right)$, which is a super-quadratic function
of the power required by uncoded transmission.
Proof: See Appendix F.
PROOF OUTLINE: We first substitute the results of Lemma 4
into Theorem 2, and then plug in the resulting bounds on
decoding power in (2). We then use algebraic manipulations
to bound the exponents in Theorem 2. Next, we use calculus to
derive the best-case and worst-case asymptotics of the transmit
power function that minimizes the total power. We then plug
the optimizing transmit power into Lemma 4 and (2) to obtain
the results.
C. Comparison with fundamental limits
In [13], using a more pessimistic Wire Model,$^{8}$ it is shown
that the total power required for any error-correcting code
and any message-passing decoding algorithm is fundamentally
lower bounded by $\Omega\left(\log^{\frac{1}{3}} \frac{1}{P_e^{\mathrm{blk}}}\right)$, where $P_e^{\mathrm{blk}}$ is the
block-error probability. Theorem 3 shows that regular-LDPC codes
with iterative message-passing decoders cannot do better than
$\Omega\left(\log^{\frac{1}{2}} \frac{1}{P_e}\right)$ where $P_e$ is bit-error probability, and the
exponent $\frac{1}{2}$ can only be obtained in the limit of large degrees
and vanishing code-rate. Since block-error probability exceeds
bit-error probability, regular-LDPC codes do not achieve
fundamental limits$^{9}$ on total-power in the Wire Model.
Theorem 4 is the first constructive result that shows that
coding can (asymptotically) outperform uncoded transmission
in total power for the Wire Model. However, the gap in total
power between the two is merely a multiplicative factor of
$\log \log \frac{1}{P_e}$. While Theorem 5 proves that it is possible to
increase the relative advantage of coding to a fractional power
of $\log \frac{1}{P_e}$, the difference between the upper bound and the
power for uncoded transmission is minuscule. The exponent
of $\log \frac{1}{P_e}$ in the upper bound is an increasing function of both
$d_v$ and $d_c$, approaching 1 as either gets large. Since Gallager-B
decoding requires $d_v \ge 4$, the smallest exponent for regular
LDPCs occurs when $d_v = 4$ and $d_c = 5$. The numerical value
of the exponent for these degrees is ≈ 0.98, which suggests
little order-sense improvement over uncoded transmission.
Hence, the wiring area at the decoder (particularly, how much
better it can be than the bound of Lemma 7) is crucial in
determining how much can be gained by using Gallager-B
decoding instead of uncoded transmission. Further discussion
is provided in Section VII.
$^{8}$ The Wire Model of [13] assumes the power is proportional to $A_{\mathrm{wires}} N_{\mathrm{iter}}$.
Here it is assumed to be simply proportional to $A_{\mathrm{wires}}$.
$^{9}$ Though, this may simply mean the fundamental limits [13] are not tight.
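The quoted value of roughly 0.98 can be reproduced directly from the exponent in the upper bound of Theorem 5. The sketch below evaluates $1 / (1 + (\log(d_v - 1) - \log 2) / (6 \log(2 d_v d_c + 1)))$ as read from the theorem statement; the function name is hypothetical.

```python
import math


def gallager_b_upper_exponent(dv, dc):
    """Exponent of log(1/Pe) in the O(.) bound of Theorem 5, as read from its statement."""
    ratio = (math.log(dv - 1) - math.log(2)) / (6.0 * math.log(2 * dv * dc + 1))
    return 1.0 / (1.0 + ratio)


print(round(gallager_b_upper_exponent(4, 5), 4))
```

An exponent this close to 1 means the upper bound is nearly the $\Theta\left(\log \frac{1}{P_e}\right)$ scaling of uncoded transmission.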
VI. CIRCUIT SIMULATION BASED NUMERICAL RESULTS
At reasonable bit-error probabilities (e.g., $10^{-5}$) and short
distances (e.g., less than five meters), asymptotic bounds
cannot provide precise answers on which codes to use. For
example, consider the following problem, shown graphically
in Fig. 1b).
Problem 1. Suppose we want to design a point-to-point
communication system that operates over a given channel.
We are given a target bit-error probability Pe, communication
distance r, and system data-rate Rdata that the link must
operate at. Which code and corresponding decoding algorithm
minimize the total (i.e. transmit + decoding) power?
Since the bounds of Sections II-V are derived as Pe → 0,
they may not be applicable to many instances of Problem 1.
In this section we therefore develop a methodology for rapidly
exploring a space of codes and decoding algorithms to answer
specific instances of Problem 1. We focus on one-bit Gallager
A and B [29] and two-bit [53] decoding algorithms, restricting
the number of algorithmic iterations to $\lfloor \frac{g-2}{4} \rfloor$. Because of
the effort required in implementing or even simulating a
single decoder in hardware, we construct models$^{10}$ for power
consumed in decoding implementations of different algorithms
based on post-layout circuit simulations for simple check-node
and variable-node circuits. The models developed attempt to
capture detailed physical aspects (e.g., interconnect lengths
and impedance parameters, propagation delays, silicon area,
and power-performance tradeoffs) of implementations, in stark
contrast with their theoretical counterparts of Sections II-V. In
Section VI-C, we use these models to investigate solutions to
some instances of Problem 1.
$^{10}$ These models have been created in an open-access CMOS library [54] and
are online at [55].
A. Note on channels and constellation size
To answer Problem 1, additional physical assumptions about
the channel (e.g., bandwidth, fading, path-loss, temperature,
constellation size) are required in comparison to the model of
Section II-A. The channel is still assumed to be AWGN with
fixed attenuation. However, while Section II-A assumes BPSK
modulation for all transmissions, due to the introduction of a
data-rate constraint and fixed passband bandwidth $W$ (for fair
comparison), the constellation size is required to vary based on
the code rate. Explicitly, the transmission strategy is assumed
to use either BPSK or square-QAM modulation, mapping
codeword bits to constellation symbols. We assume that if
square-QAM modulation is used, the information bits are
mapped onto the constellation signals using a two-dimensional
Gray code as explained in [56, Section III]. We assume the
transmitter signals at a rate of $W$ symbols/s and that the
minimum square constellation size ($M$) satisfying the system
data-rate requirement is chosen: $M$ is always the smallest
square of an even integer for which
$$
M \ge 2^{R_{\mathrm{data}}/(W \times R)}.
$$
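The constellation-size rule above can be sketched in a few lines. The function below follows the text literally (smallest square of an even integer meeting the inequality) and covers only the square-QAM branch; the function and parameter names are hypothetical, with `code_rate` standing for $R$.

```python
def min_square_constellation(r_data, bandwidth, code_rate):
    """Smallest M = side^2 (side even) with M >= 2^(r_data / (bandwidth * code_rate))."""
    bits_per_symbol = r_data / (bandwidth * code_rate)
    threshold = 2.0 ** bits_per_symbol
    side = 2
    while side * side < threshold:
        side += 2
    return side * side


# Illustrative use with the 7 Gb/s data-rate and 7 GHz bandwidth of Section VI.
for rate in (0.5, 1.0 / 3.0, 0.25):
    print(rate, min_square_constellation(7e9, 7e9, rate))
```

Lower code rates need more coded bits per symbol at a fixed symbol rate, so the required constellation grows as the rate drops.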
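For calculating transmit power numbers, the thermal noise
variance used is $\sigma_z^2 = kTW$, where $k$ is the Boltzmann
constant ($1.38 \times 10^{-23}$ J/K), and $T$ is the temperature. The
power is assumed to decay according to a power-law path-loss
model $1/r^{\alpha}$, where $\alpha$ is the path-loss coefficient. The
received $\frac{E_b}{N_0}$ is obtained as a function of the system and
channel parameters:
$$
\frac{E_b}{N_0} = \frac{P_T}{kTW \left(\frac{r}{\lambda}\right)^{\alpha} \log_2(M)}, \tag{11}
$$
where $\lambda$ is the wavelength of transmission at center frequency
$f_c$ in Hz ($\lambda = 3 \times 10^{8}/f_c$). The channel flip probability for
BPSK transmissions under this model is $p_0 = Q\left(\sqrt{\frac{2E_b}{N_0}}\right)$,
and the channel flip probability for $M$-ary square QAM is [56,
Section III.B]:
$$
p_0 = \frac{1}{\log_2(\sqrt{M})} \sum_{k=1}^{\log_2(\sqrt{M})} \; \sum_{j=0}^{\left(1 - \frac{1}{2^k}\right)\sqrt{M} - 1}
\left[ (-1)^{\left\lfloor \frac{j \times 2^{k-1}}{\sqrt{M}} \right\rfloor}
\times \left( 2^{k-1} - \left\lfloor \frac{j \times 2^{k-1}}{\sqrt{M}} + \frac{1}{2} \right\rfloor \right)
\times 2Q\left( (2j + 1) \sqrt{\frac{3 \frac{E_b}{N_0} \log_2(M)}{M - 1}} \right) \right]. \tag{12}
$$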
Also, note that the asymptotic bounds derived in Sections II-V
remain unchanged, even if we substitute $M$-ary QAM for
BPSK as the signaling constellation. This follows from the
fact that the RHS of equation (12) is a linear combination of
$Q(\cdot)$ functions with argument linearly proportional to $\sqrt{\frac{E_b}{N_0}}$.
Hence, even for $M$-ary QAM, $p_0 = \Theta\left(\frac{e^{-\phi P_T}}{\sqrt{4\pi\phi P_T}}\right)$ for some
constant $\phi \ne \eta$ (see (1)). Since the difference is merely a
constant, the asymptotic analysis of Sections II-V holds.
For the results presented in Section VI-C, we assume the
decoding throughput is required to be equal to Rdata = 7
Gb/s. We assume a channel center frequency of fc = 60 GHz
and bandwidth of W = 7 GHz. The temperature T is 300 K.
The distances considered are much larger than the wavelength
of transmission (≈ 0.5 cm) so the “far-field approximation”
applies.
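With these parameters, (11) and the BPSK flip probability can be evaluated directly. The sketch below is an illustration only: the function names are hypothetical, the transmit powers passed in are arbitrary, and the distance r = 3.2 m and α = 3 are borrowed from the example of Section VI-C.

```python
import math

K_BOLTZMANN = 1.38e-23  # J/K, the value quoted in the text


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def eb_n0(p_t, r, alpha, f_c, bandwidth, temperature, m=2):
    """Received Eb/N0 from (11); m is the constellation size (m = 2 for BPSK)."""
    wavelength = 3e8 / f_c
    noise_power = K_BOLTZMANN * temperature * bandwidth
    return p_t / (noise_power * (r / wavelength) ** alpha * math.log2(m))


def bpsk_flip_probability(p_t, r, alpha, f_c=60e9, bandwidth=7e9, temperature=300.0):
    """Channel flip probability p0 = Q(sqrt(2 Eb/N0)) for BPSK."""
    return q_function(math.sqrt(2.0 * eb_n0(p_t, r, alpha, f_c, bandwidth, temperature)))


for p_t in (0.05, 0.1, 0.2):  # transmit power in watts, arbitrary illustration values
    print(p_t, bpsk_flip_probability(p_t, r=3.2, alpha=3))
```

The default arguments encode the 60 GHz center frequency, 7 GHz bandwidth, and 300 K temperature stated above.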
B. Simulation-based models of LDPC decoders
Given a code, decoding algorithm, and desired data-rate,
calculating the required decoding power is a difficult task.
Even within the family of regular LDPC codes and specified
decoding algorithms, the decoder can be implemented in myr-
iad ways. The choice of circuit architecture, implementation
technology, and even process-specific transistor options can
have a significant impact on the decoding power [8], [4]. A
comprehensive solution to Problem 1 requires optimization
of total power over not just super-exponentially many codes
and decoding algorithms, but also all decoder architectures,
implementation technologies, and process options, which could
be an impossibly hard problem. The models we present
here are based on simulations of synchronous, fully-parallel
decoding architectures in a 32/28nm CMOS process with a
high threshold voltage, and are used in Section VI-C to obtain
insights on the nature of optimal solutions. We believe that
incorporating more models of this nature and performing the
resulting optimization could be a good approach to obtain
low total power solutions. We now describe how the model
is generated.
1) Initial post-layout simulations: Our models for arbitrary-
blocklength LDPC decoders are constructed based on circuit
simulations using the Synopsys 32/28nm high threshold volt-
age CMOS process with 9 metal-layers [54]. First, post-layout
simulations of check-node and variable-node circuits for one-
bit and two-bit decoders are performed. The physical area,
power consumption, and critical-path delays of the check-
nodes and variable-nodes are used as the basis for our models.
The CAD flow used is detailed in Appendix G. The next sec-
tion details how these results are generalized to full decoders.
2) Physical model of LDPC decoding: Even within our
imposed restrictions on the LDPC code degrees, girth, and
number of message-passing bits for decoding, constructing a
decoding power model that applies to all combinations of these
code parameters requires some assumptions:
1. Decoders operate at a fixed supply voltage (chosen
as 0.78V: the minimum supply voltage of the timing
libraries included with the standard-cell library).
2. The code design space includes regular-LDPC codes
with variable-node degrees 2 ≤ dv ≤ 6, check-node
degrees 3 ≤ dc ≤ 13, and girths 6 ≤ g ≤ 10.
3. “Minimum-Blocklength” codes (found in [57]) are chosen
for a given $g$, $d_v$, $d_c$. Hence the blocklength is
expressed as a function of these parameters: $n^{(\min)}_{g,d_v,d_c}$.
4. The decoding algorithm $a$ is chosen from the set
$\{A, B, T\}$, where A, B, T correspond to Gallager-A,
Gallager-B, and Two-bit$^{11}$ [53] message-passing decoding
algorithms, respectively. We use $\#\mathrm{bits}(a)$ to refer
to the number of message bits used in algorithm $a$.
$^{11}$ With fixed decoding algorithm parameters chosen as C = 2, S = 2,
W = 1, for reasons explained in [53, Section II].
We then model the minimum-achievable clock period $T_{\mathrm{CLK}}$
and maximum-achievable decoding throughput $R_{\mathrm{Dec}}$ for each
decoder as functions of $a$, $g$, $d_v$, $d_c$:
$$
T_{\mathrm{CLK}}(a, g, d_v, d_c) = T_{\mathrm{VN}}(a, d_v) + 2T_{\mathrm{wire}}(a, g, d_v, d_c) + T_{\mathrm{CN}}(a, d_c) \tag{13}
$$
$$
R_{\mathrm{Dec}}(a, g, d_v, d_c) = \frac{n^{(\min)}_{g,d_v,d_c}\left(1 - \frac{d_v}{d_c}\right)}{\left\lfloor \frac{g-2}{4} \right\rfloor \times T_{\mathrm{CLK}}(a, g, d_v, d_c)} \tag{14}
$$
In (13), $T_{\mathrm{VN}}(\cdot, \cdot)$ and $T_{\mathrm{CN}}(\cdot, \cdot)$ are critical-path delays
through variable and check nodes respectively and
$T_{\mathrm{wire}}(\cdot, \cdot, \cdot, \cdot)$ is the propagation delay through a single
message-passing interconnect. In essence, (13) formulates
the critical-path delay for the decoder by summing up the
propagation delays of all logic stages traversed in a single
decoding iteration. Details for each component are given in
Appendix H. We model the decoding power as
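$$
P_{\mathrm{Dec}}(a, g, d_v, d_c) = n^{(\min)}_{g,d_v,d_c} \left[ P_{\mathrm{VN}}(a, d_v) + \frac{d_v P_{\mathrm{CN}}(a, d_c)}{d_c} + 2 d_v \#\mathrm{bits}(a) \times P_{\mathrm{wire}}(a, g, d_v, d_c) \right]. \tag{15}
$$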
In (15), PVN (·, ·) and PCN (·, ·) are the power consumed
in individual variable and check nodes respectively, and
Pwire (·, ·, ·, ·) is the power consumed in a single message-
passing interconnect. Note that (15) is a sum of all power
consumed in computations and wires of the decoder (the
coefficients in (15) count the number of occurrences of each
power sink in the decoder). The details of the node power
models are given in Appendix I and the details of the wire
power model are given in Appendix J.
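Equations (13) to (15) translate directly into code. In the sketch below the per-node and per-wire delays and powers are passed in as plain numbers; in the paper they come from post-layout simulations (Appendices H to J), so every numeric value in the usage lines is a hypothetical placeholder.

```python
def clock_period(t_vn, t_wire, t_cn):
    """(13): critical path through a variable node, two wire traversals, and a check node."""
    return t_vn + 2.0 * t_wire + t_cn


def decoding_throughput(n, dv, dc, girth, t_clk):
    """(14): information bits decoded per second with floor((g-2)/4) iterations."""
    iterations = (girth - 2) // 4
    return n * (1.0 - dv / dc) / (iterations * t_clk)


def decoding_power(n, dv, dc, msg_bits, p_vn, p_cn, p_wire):
    """(15): n variable nodes, n*dv/dc check nodes, and 2*dv*msg_bits wires per variable node."""
    return n * (p_vn + dv * p_cn / dc + 2 * dv * msg_bits * p_wire)


# Hypothetical component values for a (3, 6) code with n = 618 and girth 10.
t_clk = clock_period(t_vn=0.2e-9, t_wire=0.1e-9, t_cn=0.3e-9)
print(t_clk)
print(decoding_throughput(618, 3, 6, 10, t_clk))
print(decoding_power(618, 3, 6, msg_bits=1, p_vn=1e-6, p_cn=2e-6, p_wire=0.5e-6))
```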
3) Satisfying the communication data-rate: Fixing the sup-
ply voltage for a decoder and using the fastest possible clock
speed only allows for a single decoding throughput. Hence,
parallelism in order to meet the system data-rate requirement
Rdata in Problem 1 is also modeled. For example, two copies
of a single decoder can be used in parallel. Together, they
provide twice the throughput, and require twice the power
of a single decoder. In the corresponding communication
system architecture, two separate codewords are required to
be transmitted at twice the throughput of a single decoder, and
a multiplexer at the receiver must pass a separate codeword to
each of the parallel decoders, which decode the two codewords
independently. Though making such a design choice in prac-
tice would introduce additional hardware and a slight power
consumption overhead, we ignore this cost in our analysis.
In cases where integer multiples of a single decoder’s
throughput do not exactly reach Rdata, we first find the
minimum number of parallel decoders, that when combined,
exceed the required throughput. Calling this minimum number
of decoders Q, we then assume that the clock period of each of
the parallel decoders is increased until the overall throughput
of the parallel combination is exactly Rdata. Explicitly, the
formula to determine this “underclocked” period Tu is:
$$
T_u = \frac{Q \times n^{(\min)}_{g,d_v,d_c}\left(1 - \frac{d_v}{d_c}\right)}{\left\lfloor \frac{g-2}{4} \right\rfloor \times R_{\mathrm{data}}}. \tag{16}
$$
Because the decoding power is modeled as inversely proportional
to the decoder clock period (see Appendices I-J), we
multiply each individual decoder’s power by the appropriate
scaling factor $\kappa = \frac{T_{\mathrm{CLK}}(a, g, d_v, d_c)}{T_u}$, and then multiply the result
by the number of parallel decoders to get the total power of
the parallel combination:
$$
P_{\mathrm{parallel}} = Q \times P_{\mathrm{Dec}}(a, g, d_v, d_c) \times \kappa. \tag{17}
$$
We substitute (14), (16), and carry out some algebra to obtain:
$$
P_{\mathrm{parallel}} = P_{\mathrm{Dec}}(a, g, d_v, d_c) \times \frac{R_{\mathrm{data}}}{R_{\mathrm{Dec}}(a, g, d_v, d_c)}. \tag{18}
$$
Hence, we assume that any (throughput, power) pair that is a
multiple of the specifications of the original decoder can be
achieved in this manner (with the obvious exception of points
that have negative throughput and power). Therefore, in our
analysis in Section VI-C, we assume the decoding throughput
is exactly Rdata and we use the decoding power numbers
obtained via this interpolation between the modeled points.
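The parallelization and underclocking steps of (16) to (18) can be traced with a short calculation. This is a sketch under the stated model only: the function name and the numeric inputs are hypothetical, and the overhead of the multiplexer is ignored, as in the text.

```python
import math


def parallel_decoder_power(p_dec, r_dec, r_data, t_clk):
    """Return (Q, T_u, P_parallel) for Q underclocked decoders meeting r_data exactly."""
    q = math.ceil(r_data / r_dec)      # fewest decoders whose combined throughput reaches r_data
    # (16), rewritten using n(1 - dv/dc) / floor((g-2)/4) = r_dec * t_clk from (14)
    t_u = q * r_dec * t_clk / r_data
    kappa = t_clk / t_u                # power scales inversely with the clock period
    return q, t_u, q * p_dec * kappa   # (17)


# Hypothetical decoder: 2 W at 3 Gb/s with a 1 ns clock, against a 7 Gb/s requirement.
q, t_u, p_par = parallel_decoder_power(p_dec=2.0, r_dec=3e9, r_data=7e9, t_clk=1e-9)
print(q, t_u, p_par)
```

The resulting power matches the closed form (18), $P_{\mathrm{Dec}} \times R_{\mathrm{data}}/R_{\mathrm{Dec}}$, regardless of how many decoders are needed.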
4) Comparing different coding strategies: Now, given a
subset of codes and decoders, how should a system designer
jointly choose a code and decoding algorithm to minimize
the total system power? Within the channel model of Sec-
tion VI-A, consider specific instances of Problem 1: let path-
loss coefficient α and Rdata be fixed. Then, for each choice
of (r, Pe), we can compare the required total power for
each combination of code and decoding algorithm modeled
in Section VI-B2, and find the minimizing combination.
C. Example: 60 GHz point-to-point communication
An example plot which shows the minimum achievable total
power for different Pe values at a fixed distance r = 3.2m and
α = 3 is given in Fig. 4. The plot also shows the curve of the
optimizing transmit power, $P_T^*$, and the Shannon-limit [58] for
the AWGN channel. The horizontal gap between the optimiz-
ing PT curve and the total power curve in Fig. 4 corresponds
to the optimizing decoding power. As Pe decreases, this gap
increases, indicating an increase in the total power-minimizing
decoder’s complexity.
[Figure 4 (image not reproduced): horizontal axis $P_{\mathrm{total,min}}$ in Watts, from 0 to 0.35; vertical axis $\log_{10}(P_e)$, from −20 to −2; curves labeled Shannon limit, $P_T^*$, and $P_{\mathrm{total,min}}$.]
Fig. 4: A plot of $\log_{10}(P_e)$ vs. minimum achievable total power for α = 3
at a fixed distance of r = 3.2 m. The Shannon limit for the channel and the
optimizing transmit power are also shown.
1) Joint optimization over code-decoder pairs: The form
of the total power curve varies with communication distance.
For improved understanding, we use two-dimensional contour
plots in the (r, Pe) space to evaluate choices of codes and
decoders, as suggested by Fig. 1b). An example is shown
in Fig. 5, which compares code and decoding algorithm
choices for path-loss coefficient α = 3. In the top plot,
the contours represent regions in the (r, Pe) space where
specific combinations minimize total power, and in the bottom
plot, regions in the (r, Pe) space are divided based on the
value of the minimum total power. The best choices for these
instances of Problem 1 turn out to be rate-$\frac{1}{2}$ codes. Lower rate
codes require large constellations for a 7 Gb/s data-rate, thus
requiring large transmit power for the same p0, and higher
rate codes require larger decoding power due to increased
complexity and size of higher degree nodes. Some tradeoffs
between total power and code and decoder complexity can
also be observed in Fig. 5: to minimize total power, algorithm
complexity a should increase with r and code girth g should
increase with decreasing Pe.
[Figure 5 (image not reproduced): both panels plot $\log_{10}(P_e)$, from −5 to −20, against distance in meters, from 2 to 10. Region labels in the top panel, given as (n, g, dv, dc, a): (42, 6, 3, 6, A); (88, 6, 4, 8, A); (618, 10, 3, 6, A); (618, 10, 3, 6, T); (110, 6, 5, 10, A). The color scale of the bottom panel is $P_{\mathrm{total,min}}$ in dBm, from 0 to 40.]
Fig. 5: Contour plots of the optimizing code & decoding algorithm choice
(top) and the minimum total power in dBm (bottom). For these plots, α = 3.
The contours in the top plots are labeled with blocklength n, code girth g,
VN degree dv, CN degree dc, and decoding algorithm a of the optimizing
code and decoder. To interpret the plots, one can choose any point in the (r,
Pe) space and find the best coding strategy (within the search space) in the
top plot and the required total power to implement it in the bottom plot. The
plot is best viewed in color.
How does the inclusion of uncoded transmission as a
possible strategy change the picture? Contour plots with
uncoded transmission included are given in Fig. 6. Comparing
Fig. 6 with Fig. 5, we see that when uncoded transmission is
included, it overtakes areas in the (r, Pe) space where Pe is
high and r is very small. However, Fig. 6 suggests that simple
codes and decoders can still outperform uncoded transmission
at reasonably low Pe and distances of several meters or more.
VII. CONCLUSIONS AND DISCUSSIONS
In this work, we performed asymptotic analysis of the
total (transmit + decoding) power for regular-LDPC codes
with iterative message-passing decoders. While these codes
(with Gallager-B decoding) can achieve fundamental limits
in the Node Model [6], they are unable to do so for the
Wire Model [13]. This suggests that measuring complexity
of decoding by simply counting the number of operations
[Fig. 6, top panel: Distance (m), 2 to 10, versus log10(Pe), −5 to −20. Region labels: UNCODED BPSK; (n, g, dv, dc, a) = (618, 10, 3, 6, A); (618, 10, 3, 6, T).]
[Fig. 6, bottom panel: same axes; color scale Ptotal,min (dBm), −5 to 40.]
Fig. 6: Contour plots of the optimizing code & decoding algorithm choice
(top) and the minimum total power in dBm (bottom), including uncoded
transmission in the optimization space. For these plots, α = 3. The contours
in the top plots are labeled with blocklength n, code girth g, VN degree dv ,
CN degree dc, and decoding algorithm a of the optimizing code and decoder.
To interpret the plots, one can choose any point in the (r, Pe) space and
find the best coding strategy (within the search space) in the top plot and
the required total power to implement it in the bottom plot. The plot is best
viewed in color.
(e.g., [59], [60], [61]) is insufficient for understanding system-
level power consumption. In fact, for the Wire Model, even
achieving order-sense advantage over uncoded transmission
requires that both transmit and decoding power diverge to ∞
as Pe → 0, which calls into question the assumption that one
should fix the transmit power and operate near the Shannon
capacity in order to communicate reliably at a low power cost.
However, this analysis also established a result of intellectual
interest: that regular-LDPC codes can achieve an order-sense
improvement in total power over uncoded transmission as
bit-error probability tends to zero. This question only arises
from the total power perspective adopted in this work, and it
suggests that these results are only scratching the surface of a
deeper theory in this direction.
To establish some constructive results, we analyzed two
strategies where the number of decoding iterations is dictated
by the girth of the code. Although this is convenient for
proving asymptotic upper bounds on total power, this is rarely
followed in practice. Typically, combinatorial properties of
the code construction are analyzed and simulations are per-
formed [30] in order to discern the error-probability behavior
in decoding iterations beyond the girth-limit. However, we are
not sure if better asymptotics for total power can be achieved
by merely adding these additional iterations.
Our work highlights an important question that has received
little attention in coding theory literature: design of codes
that have good performance while maintaining small wiring
area (see [62], [63], [64] for some heuristic approaches to
generating Tanner graphs with low wiring complexity). For
wire power consumption, there is a significant gap between
the bounds on power consumed by regular-LDPC codes and
iterative message-passing decoders derived here, and the fun-
damental limits derived in [13]. Nevertheless, even though
regular-LDPC codes might not achieve these fundamental
limits (and the fundamental limits themselves may not be
tight), it is important to investigate wiring complexity of
other coding families, such as Polar codes [61] and Turbo
codes [65].
Recent work of Blake and Kschischang [18] studied the
limiting bisection-width [12] of sequences of bipartite graphs
with the size of the left-partite set tending to infinity, when
the limiting degree-distributions of the left and right partite
sets satisfy a certain sufficient condition [18, Theorem 1]. It
is shown that when sequences satisfying this condition are
generated by a standard uniform random configuration model
(see [18, Section IV] for definition), the resulting graphs have
a super-linear (in the number of vertices) bisection-width in
the limit of the sequence with probability 1. In Corollary 2 and
Section IV. A of [18], the authors show that the Tanner graphs
of all capacity-approaching LDPC sequences as well as some
regular-LDPC sequences generated using this method will
satisfy the sufficient conditions. A super-linear bisection width
for a graph implies that the area of wires in the corresponding
VLSI circuit must scale at least quadratically in the number
of vertices [12]. If using the decoding strategy of Theorem 5
then, such sequences of codes will have minimum total power
that is $\Theta\left(\log^{m}\frac{1}{P_e}\right)$, where 0.97 < m < 1, providing little
order-sense improvement over uncoded transmission.
The authors of [18] point out the fact that their result does
not rule out the possibility that there may exist a zero-measure
(asymptotically in n) subset of codes that has sub-quadratic
wiring area. One could try to extend the bisection-width12
approach of [18] to establish a negative result (i.e., prove
that no such zero-measure set exists). To establish a positive
result, one could try the open problem mentioned at the end
of Section V-A2, namely, construct a graph-drawing algorithm
that yields sub-quadratic crossing numbers for (even some
classes of) semi-regular graphs. In any case, a proof is needed
and heuristics such as those used in [64] (even if they work
well in practice) cannot establish guarantees.
The simulation-based estimates of decoding power presented
in Section VI confirm that coding can be useful for
minimizing total power, even at short distances. For instance,
they predict that regular-LDPC codes with simple message-
passing decoders can achieve lower bit-error probabilities than
uncoded transmission in short distance settings, while still
consuming the same total power (even at distances as low
as 2 meters). However, in these regimes, it is possible that
$^{12}$The crossing number cr(G) and bisection width bw(G) of a bounded-degree
graph G = {V,E} are related by the inequality
$\mathrm{cr}(G) + \Theta(|V|) = \Omega\left(\mathrm{bw}^2(G)\right)$ [66, Theorem 2.1].
“classical” algebraic codes (e.g., Hamming or Reed-Solomon
codes [67]) might be even more efficient, hence, they need to
be examined as well.
Finally, the results of Section VI point to a new problem,
that of “energy-adaptive codes”. The suggestion from these
results is that the code should be adapted to changing error-
probabilities and distances. Can a single code, with a single
piece of reconfigurable decoding hardware, enable adaptation
of transmit and circuit power to minimize total energy? Indeed,
some follow-up work [68] indicates this is possible, and it
could be a promising direction for future work.
ACKNOWLEDGEMENTS
This work was supported in part by the NSF Center for Science
of Information (CSoI) NSF CCF-0939370, as well as a seed grant
on “Understanding Information-Energy Interactions” from the NSF
CSoI. This work is also supported in part by Systems on Nanoscale
Information fabriCs (SONIC), one of the six SRC STARnet Centers,
sponsored by MARCO and DARPA. Grover’s research is supported by
NSF CCF-1350314 (NSF CAREER), and NSF ECCS-1343324 (NSF
EARS) awards. We thank Anant Sahai, Subhasish Mitra, Yang Wen,
Haewon Jeong, and Jianying Luo for stimulating discussions, and
Jose Moura for code constructions that we started our simulations
with. We thank the students, faculty, staff and sponsors of the
Berkeley Wireless Research Center and Wireless Foundations group
at Berkeley. In particular, Brian Richards assisted with the simulation
flow and Tsung-Te Liu advised us on modeling circuits. Finally, we
thank the anonymous reviewers whose comments helped us greatly
in improving the manuscript.
APPENDIX A
PROOF OF LEMMA 2
Proof of Lemma 2: First, note that the Ω (·) expression
in Lemma 2 contains two variables: 1/Pe and PT . We analyze
blocklength as a function n : [2,∞) × R≥0 → R≥1
(Definition 3). First consider the case where dc > dv ≥ 3.
Because the codes considered are binary and linear, the
channel is memoryless, binary-input, and output-symmetric,
and the decoding computations are symmetric with respect
to codewords, we can assume without loss of generality that
the all-zero codeword was transmitted [25], [40, Page 22].
In [40, Page 37] it is shown that for any binary regular-
LDPC code with dc > dv ≥ 3 used to transmit over an
AWGN channel, the probability that any iterative message-
passing decoder incorrectly decides on pseudo-codeword13 ω
when the all-zero codeword was transmitted is
\[
P_{0\to\omega} \ge Q\left(\sqrt{2\frac{E_s}{N_0}\, w_p^{\mathrm{AWGN}}(\omega)}\right), \tag{19}
\]
where $w_p^{\mathrm{AWGN}}(\omega)$ is called the AWGN pseudoweight of ω
and is defined [40, Definition 31] as
\[
w_p^{\mathrm{AWGN}}(\omega) =
\begin{cases}
\dfrac{\|\omega\|_1^2}{\|\omega\|_2^2} = \dfrac{\left(\sum_{i\in[n]}\omega_i\right)^2}{\sum_{i\in[n]}\omega_i^2} & \text{if } \omega \ne 0,\\[2ex]
0 & \text{if } \omega = 0.
\end{cases}
\]
$^{13}$Defined in [40] as an error-pattern for a given code C, such that the
lifting of the error-pattern is a codeword of the binary code corresponding to
some finite graph-cover of the Tanner graph of C. It is explained in [40] that
no “locally operating” iterative message-passing decoding algorithm (“locally
operating” subsumes all algorithms satisfying the assumptions of Section II-B)
can distinguish between codewords and pseudo-codewords.
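As a small illustration of the AWGN pseudoweight defined above, the following sketch evaluates it for a non-negative vector. The function name `awgn_pseudoweight` and the example vectors are our own illustrative choices and do not appear in [40]; for a 0/1 vector the value reduces to the Hamming weight.

```python
def awgn_pseudoweight(omega):
    """AWGN pseudoweight ||w||_1^2 / ||w||_2^2 of a non-negative vector.

    Returns 0.0 for the all-zero vector, matching the definition in the text.
    """
    if any(w < 0 for w in omega):
        raise ValueError("pseudo-codeword entries are assumed non-negative")
    l1_norm = sum(omega)                      # ||w||_1 for non-negative entries
    l2_norm_sq = sum(w * w for w in omega)    # ||w||_2^2
    if l2_norm_sq == 0:
        return 0.0
    return (l1_norm * l1_norm) / l2_norm_sq


if __name__ == "__main__":
    # A 0/1 vector: the pseudoweight equals the number of ones.
    print(awgn_pseudoweight([1, 1, 1, 0, 0]))
    # A non-binary vector: the pseudoweight drops below the support size.
    print(awgn_pseudoweight([2, 1, 1, 0]))
```

Non-binary pseudo-codewords can have a pseudoweight below the size of their support, which is why the minimum pseudoweight (rather than the minimum distance) governs the bound in (19).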
While the channel model of Section II-A assumed a hard-
decision on the AWGN-channel outputs, (19) holds even when
log-likelihood ratios (which have no loss in optimality) are
used in message-passing [40]. Hence, we can use (19) to obtain
a lower bound for any message-passing decoder. The minimum
pseudoweight wAWGN,minp of a parity-check matrix of a given
code is defined as the minimum AWGN pseudoweight over
all nonzero pseudo-codewords of the code [40, Definition 37].
For any regular-LDPC code of blocklength n with dv ≥ 3,
the minimum AWGN pseudoweight is upper-bounded as [40,
Proposition 49], [69, Theorem 7]:
\[
w_p^{\mathrm{AWGN,min}}(\omega) \le \left(\frac{d_v(d_v-1)}{d_v-2}\right)^{2} n^{\frac{2\log(d_v-1)}{\log\left((d_v-1)(d_c-1)\right)}}. \tag{20}
\]
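Therefore, lower bounding the word-error probability $P_e^{\mathrm{word}}$
by the pairwise error-probability and using (20) in (19):
\[
P_e^{\mathrm{word}} \ge P_{0\to\omega} \ge Q\left(\sqrt{2\frac{E_s}{N_0}\left(\frac{d_v(d_v-1)}{d_v-2}\right)^{2} n^{\frac{2\log(d_v-1)}{\log\left((d_v-1)(d_c-1)\right)}}}\right). \tag{21}
\]
Using our notation from Section II-A, $E_s/N_0 = \eta P_T$, and trivially
bounding bit-error probability Pe [70, Eqn. (2)]:
\[
\begin{aligned}
P_e \ge \frac{P_e^{\mathrm{word}}}{n}
&\ge \frac{1}{n}\, Q\left(\sqrt{2\eta P_T\left(\frac{d_v(d_v-1)}{d_v-2}\right)^{2} n^{\frac{2\log(d_v-1)}{\log\left((d_v-1)(d_c-1)\right)}}}\right)\\
&\overset{(\bullet)}{>} \frac{\frac{1}{n}\exp\left(-\eta P_T\left(\frac{d_v(d_v-1)}{d_v-2}\right)^{2} n^{\frac{2\log(d_v-1)}{\log\left((d_v-1)(d_c-1)\right)}}\right)}{n^{\frac{1}{1+\frac{\log(d_c-1)}{\log(d_v-1)}}}\, 2\sqrt{\pi\eta P_T}\left(\frac{d_v(d_v-1)}{d_v-2}\right) + \sqrt{\frac{\pi}{\eta P_T}}\,\frac{d_v-2}{d_v(d_v-1)}}\\
&\overset{(\ast)}{>} \frac{\exp\left(-\eta P_T\left(\frac{d_v(d_v-1)}{d_v-2}\right)^{2} n^{\frac{2\log(d_v-1)}{\log\left((d_v-1)(d_c-1)\right)}}\right)}{n^{1+\frac{\log(d_v-1)}{\log\left((d_v-1)(d_c-1)\right)}}\left(\frac{d_v(d_v-1)}{d_v-2}\right)\left(2\sqrt{\pi\eta P_T} + \sqrt{\frac{\pi}{\eta P_T}}\right)}
\end{aligned}
\tag{22}
\]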
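where (•) holds because of (1) and n ≥ 1, and (∗) holds
because n ≥ 1 and dv(dv − 1) > (dv − 2). It follows that
whenever PT ≥ 1/η,
\[
\begin{aligned}
P_e &> \frac{\exp\left(-\eta P_T\left(\frac{d_v(d_v-1)}{d_v-2}\right)^{2} n^{\frac{2\log(d_v-1)}{\log\left((d_v-1)(d_c-1)\right)}}\right)}{n^{1+\frac{\log(d_v-1)}{\log\left((d_v-1)(d_c-1)\right)}}\left(\frac{3 d_v(d_v-1)}{d_v-2}\right)\sqrt{\pi\eta P_T}}\\
&\overset{(\star)}{>} \frac{\exp\left(-\eta P_T(1+9\pi)\left(\frac{d_v(d_v-1)}{d_v-2}\right)^{2} n^{\frac{2\log(d_v-1)}{\log\left((d_v-1)(d_c-1)\right)}}\right)}{n}
\end{aligned}
\tag{23}
\]
where (⋆) holds because $e^{-x^2} < \frac{1}{x}$ for all x > 0. Inverting
both sides of (23), taking log(·) and then simplifying,
\[
\begin{aligned}
\frac{\log\frac{1}{P_e}}{\eta P_T(1+9\pi)\left(\frac{d_v(d_v-1)}{d_v-2}\right)^{2}}
&< n^{\frac{2}{1+\frac{\log(d_c-1)}{\log(d_v-1)}}} + \left(\frac{d_v-2}{d_v(d_v-1)}\right)^{2}\frac{\log n}{\eta P_T(1+9\pi)}\\
&\overset{P_T \ge \frac{1}{\eta}}{<} n^{\frac{2}{1+\frac{\log(d_c-1)}{\log(d_v-1)}}} + \log n
\end{aligned}
\tag{24}
\]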
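The two elementary inequalities used in (22) and (23) can be spot-checked numerically. The sketch below assumes that (1) refers to the standard Gaussian tail bound $Q(\sqrt{2y}) > e^{-y}/(2\sqrt{\pi y} + \sqrt{\pi/y})$, which is the form that appears in (22); the helper names and the sample points are illustrative choices of ours, and a finite set of checks is of course not a proof.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_lower_bound(y):
    """Lower bound on Q(sqrt(2y)) for y > 0, in the form used in (22)."""
    return math.exp(-y) / (2.0 * math.sqrt(math.pi * y) + math.sqrt(math.pi / y))


def tail_bound_holds(y_values):
    """Check Q(sqrt(2y)) > e^{-y} / (2 sqrt(pi y) + sqrt(pi / y)) on sample points."""
    return all(q_func(math.sqrt(2.0 * y)) > q_lower_bound(y) for y in y_values)


def star_step_holds(x_values):
    """Check e^{-x^2} < 1/x on sample points x > 0, the step marked by the star in (23)."""
    return all(math.exp(-x * x) < 1.0 / x for x in x_values)


if __name__ == "__main__":
    print(tail_bound_holds([0.1, 1.0, 5.0, 20.0]))
    print(star_step_holds([0.01, 0.5, 1.0, 3.0]))
```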
We have shown that (24) holds for any Pe and any PT ≥ 1/η,
hence ignoring the non-dominating term on the RHS and
then raising both sides to the power $\frac{\log\left((d_v-1)(d_c-1)\right)}{2\log(d_v-1)}$, we get
the desired result. For the case when dv = 2, because the
minimum distance of regular-LDPC codes with dv = 2 is at
most $2 + \frac{2\log\frac{n}{2}}{\log(d_c-1)}$ (see [29, Theorem 2.5]), the pairwise error-
probability that a minimum-weight nonzero codeword x′ is
decoded when the all-zero codeword was transmitted is
\[
P_{0\to x'} \ge Q\left(\sqrt{2\frac{E_s}{N_0}\left(2 + \frac{2\log\frac{n}{2}}{\log(d_c-1)}\right)}\right). \tag{25}
\]
Replacing $E_s/N_0$ by $\eta P_T$, the word-error probability is
\[
P_e^{\mathrm{word}} \ge P_{0\to x'} = Q\left(\sqrt{2\eta P_T\left(2 + \frac{2\log\frac{n}{2}}{\log(d_c-1)}\right)}\right). \tag{26}
\]
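Then applying an analysis identical to the dv ≥ 3 case, for
any bit-error probability Pe:
\[
\begin{aligned}
P_e &\ge \frac{\frac{1}{n}\exp\left(-\eta P_T\left(2 + \frac{2\log\frac{n}{2}}{\log(d_c-1)}\right)\right)}{2\sqrt{\pi\eta P_T\left(2 + \frac{2\log\frac{n}{2}}{\log(d_c-1)}\right)} + \sqrt{\dfrac{\pi}{\eta P_T\left(2 + \frac{2\log\frac{n}{2}}{\log(d_c-1)}\right)}}}\\
&\overset{n\ge 1}{>} \frac{\frac{1}{n}\exp\left(-\eta P_T\left(2 + \frac{2\log\frac{n}{2}}{\log(d_c-1)}\right)\right)}{2\sqrt{\pi\eta P_T\left(2 + \frac{2\log\frac{n}{2}}{\log(d_c-1)}\right)} + \sqrt{\dfrac{\pi\left(2 + \frac{2\log\frac{n}{2}}{\log(d_c-1)}\right)}{\eta P_T}}}.
\end{aligned}
\]
Then for any PT ≥ 1/η, we also have
\[
P_e > \frac{\exp\left(-\eta P_T\left(2 + \frac{2\log\frac{n}{2}}{\log(d_c-1)}\right)\right)}{3n\sqrt{\pi\eta P_T\left(2 + \frac{2\log\frac{n}{2}}{\log(d_c-1)}\right)}}
\overset{(\star)}{>} \frac{\exp\left(-\eta P_T(1+9\pi)\left(2 + \frac{2\log\frac{n}{2}}{\log(d_c-1)}\right)\right)}{n},
\tag{27}
\]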
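where (⋆) holds because $e^{-x^2} < \frac{1}{x}$ for all x > 0. Inverting
both sides of (27), taking log(·) and then simplifying,
\[
\begin{aligned}
\frac{\log\frac{1}{P_e}}{\eta P_T(1+9\pi)} &< 2 + \frac{2\log\frac{n}{2}}{\log(d_c-1)} + \frac{\log n}{\eta P_T(1+9\pi)}\\
\frac{\log\frac{1}{P_e}}{\eta P_T(1+9\pi)} &\overset{P_T\ge\frac{1}{\eta}}{<} 2 + \frac{2\log n - 2\log 2}{\log(d_c-1)} + \log n\\
&\overset{n\ge 1}{\le} 2 + \frac{2\log n - 2\log 2}{\log(d_c-1)} + 2\log n\\
&< \left(2 + \frac{2}{\log(d_c-1)}\right)(1 + \log n).
\end{aligned}
\tag{28}
\]
Dividing both sides of (28) by $\left(2 + \frac{2}{\log(d_c-1)}\right)$, taking $e^{(\cdot)}$ on
both sides, and simplifying:
\[
n > \frac{1}{e}\left(\frac{1}{P_e}\right)^{\frac{1}{\eta P_T(1+9\pi)\left(2 + \frac{2}{\log(d_c-1)}\right)}}, \tag{29}
\]
which completes the proof of Lemma 2.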
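To get a feel for the constants in (29), the bound for the dv = 2 case can be evaluated directly. The sketch below is only an illustration: the function name `min_blocklength_dv2` is ours, the product ηPT is passed as a single argument `eta_pt`, and natural logarithms are assumed (consistent with exponentiating with base e in the last step of the proof).

```python
import math


def min_blocklength_dv2(pe, eta_pt, dc):
    """Right-hand side of (29): (1/e) * (1/pe)^(1 / (eta_pt*(1+9*pi)*(2 + 2/log(dc-1)))).

    Valid under the conditions of the proof: 0 < pe < 1, eta_pt >= 1, dc > 2.
    """
    if not 0.0 < pe < 1.0:
        raise ValueError("pe must lie in (0, 1)")
    if eta_pt < 1.0:
        raise ValueError("the bound assumes PT >= 1/eta, i.e. eta_pt >= 1")
    if dc <= 2:
        raise ValueError("dc must exceed 2 so that log(dc - 1) > 0")
    exponent = 1.0 / (eta_pt * (1.0 + 9.0 * math.pi) * (2.0 + 2.0 / math.log(dc - 1)))
    # Evaluate in the log domain to avoid overflow for very small pe.
    return math.exp(exponent * math.log(1.0 / pe) - 1.0)


if __name__ == "__main__":
    for pe in (1e-10, 1e-20, 1e-40):
        print(pe, min_blocklength_dv2(pe, eta_pt=1.0, dc=4))
```

The constant (1 + 9π) makes the bound numerically loose at practical error probabilities; it is informative only in the asymptotic sense in which Lemma 2 is stated.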
APPENDIX B
PROOF OF LEMMA 3
Proof of Lemma 3: First, note the Θ (·) expression in
Lemma 3 contains two variables: 1/Pe and PT . We analyze
the minimum number of independent iterations as a function
Niter : [2,∞) × [∆A,∞) → R≥0 (Definition 3), where
∆A > 0 is the transmit power for which p0 is exactly the
threshold for decoding over the BSC [29, Section 4.3], [25].
Explicitly, if σA is the threshold for Gallager-A decoding over
the BSC, $Q\left(\sqrt{2\eta\Delta_A}\right) = \sigma_A$.
Note when PT < ∆A, it is not possible to force Pe → 0,
hence Niter will be infinite for all Pe below some con-
stant [25]. No further analysis is needed for such low transmit
powers, since all Pe above said constant can be achieved
with Θ (1) transmit power and O (1) decoding iterations.
From [25, Eqn. (6)], the bit-error probability after the ith
decoding iteration, pi, is
\[
p_i = p_0 - p_0\left[\frac{1 + (1-2p_{i-1})^{d_c-1}}{2}\right]^{d_v-1} + (1-p_0)\left[\frac{1 - (1-2p_{i-1})^{d_c-1}}{2}\right]^{d_v-1}. \tag{30}
\]
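The recursion (30) is easy to iterate numerically. The following sketch is an illustration only: the function names and the choice of a (3, 6)-regular code with p0 = 0.01 are our assumptions, not parameters taken from the analysis. It tracks the error probability across iterations and prints the ratio of each value to the linear term p0(dv − 1)(dc − 1)p_{i−1}, which approaches 1 once p_{i−1} is small.

```python
def gallager_a_step(p_prev, p0, dv, dc):
    """One density-evolution step of Gallager-A decoding, following (30)."""
    u = (1.0 - 2.0 * p_prev) ** (dc - 1)
    all_checks_correct = ((1.0 + u) / 2.0) ** (dv - 1)
    all_checks_wrong = ((1.0 - u) / 2.0) ** (dv - 1)
    return p0 - p0 * all_checks_correct + (1.0 - p0) * all_checks_wrong


def gallager_a_trajectory(p0, dv, dc, iterations):
    """Return [p_0, p_1, ..., p_iterations] obtained by iterating (30)."""
    ps = [p0]
    for _ in range(iterations):
        ps.append(gallager_a_step(ps[-1], p0, dv, dc))
    return ps


if __name__ == "__main__":
    p0, dv, dc = 0.01, 3, 6  # illustrative values only
    ps = gallager_a_trajectory(p0, dv, dc, 8)
    linear_gain = p0 * (dv - 1) * (dc - 1)
    for i in range(1, len(ps)):
        print(i, ps[i], ps[i] / (linear_gain * ps[i - 1]))
```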
Since the RHS of (30) is differentiable with respect to (w.r.t.)
$p_{i-1}$, by Taylor’s Theorem there exists a real function $R_1(x)$
with $\lim_{x\to 0} R_1(x) = 0$ such that:
\[
p_i = p_0(d_v-1)(d_c-1)\,p_{i-1} + R_1(p_{i-1}). \tag{31}
\]
The RHS of (31) is the first-order Maclaurin expansion of
$p_i$. Further, because the RHS of (30) is a polynomial in $p_{i-1}$,
it is twice continuously differentiable and by the mean value
theorem the remainder term $R_1(p_{i-1})$ has Lagrange form:
\[
R_1(p_{i-1}) = \frac{1}{2}\,\frac{d^2 p_i(x^*)}{dp_{i-1}^2}\,p_{i-1}^2, \tag{32}
\]
where $x^* \in (0, p_{i-1})$. It can be verified that the second
derivative of $p_i$ w.r.t. $p_{i-1}$ is minimized at $p_{i-1} = 0$ and
maximized at $p_{i-1} = \frac{1}{2}$. Solving for both cases and plugging
into (32), we find
\[
-p_0(d_v-1)(d_c-1)\left[\frac{(d_v-2)(d_c-1)}{2} + (d_c-2)\right]p_{i-1}^2 \le R_1 \le 0. \tag{33}
\]
Plugging (33) into (31) and applying the RHS recursively, the
bit-error probability after the ith decoding iteration $p_i$ is:
\[
p_0(d_v-1)(d_c-1)\,p_{i-1}\left[1 - p_{i-1}\left(\frac{(d_v-2)(d_c-1)}{2} + (d_c-2)\right)\right] \le p_i \le \left[p_0(d_v-1)(d_c-1)\right]^i. \tag{34}
\]
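Now, choose an arbitrary 0 < δ < 1/2 and choose PT (thereby
p0 as well). As explained in [29, Section 4.3], since we
are operating above the threshold, we are guaranteed that
$p_i < p_{i-1} \le p_0$ and $p_i \to 0$ as $i \to \infty$. Thus, for sufficiently small $p_i$
(thereby small $p_{i-1}$) or sufficiently large PT (thereby small
p0), (34) becomes:
\[
p_0(d_v-1)(d_c-1)(1-\delta)\,p_{i-1} \le p_i \le \left[p_0(d_v-1)(d_c-1)\right]^i. \tag{35}
\]
Applying the relation on the LHS of (35) recursively
\[
\begin{aligned}
p_0\left[\frac{p_0}{2}(d_v-1)(d_c-1)\right]^i &\overset{\delta<\frac{1}{2}}{\le} p_i \le \left[p_0(d_v-1)(d_c-1)\right]^i\\
\frac{\sqrt{\frac{\eta}{\pi}P_T}\;e^{-\eta P_T}}{2\eta P_T + 1}\left[(d_v-1)(d_c-1)\frac{\sqrt{\frac{\eta}{\pi}P_T}\;e^{-\eta P_T}}{2\,(2\eta P_T + 1)}\right]^i &\overset{(1)}{\le} p_i \overset{(1)}{\le} \left[(d_v-1)(d_c-1)\frac{e^{-\eta P_T}}{\sqrt{4\pi\eta P_T}}\right]^i.
\end{aligned}
\tag{36}
\]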
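Inverting all sides of (36), taking log(·) on all sides, replacing
$p_i$ by Pe and i by Niter, and dividing all sides by ηPT :
\[
\begin{aligned}
&N_{\mathrm{iter}}\left[1 + \frac{\log \eta P_T}{2\eta P_T} - \frac{\log\left((d_v-1)(d_c-1)\right)}{\eta P_T} + \frac{\log 2\sqrt{\pi}}{\eta P_T}\right] \le \frac{\log\frac{1}{P_e}}{\eta P_T}\\
&\quad\le N_{\mathrm{iter}}\left[1 - \frac{\log\left(\frac{\eta}{\pi}P_T\right)}{2\eta P_T} - \frac{\log\left(0.5(d_v-1)(d_c-1)\right)}{\eta P_T} + \frac{\log(2\eta P_T + 1)}{\eta P_T}\right]\\
&\qquad + 1 + \frac{\log(2\eta P_T + 1)}{\eta P_T} - \frac{\log\left(\frac{\eta}{\pi}P_T\right)}{2\eta P_T}.
\end{aligned}
\tag{37}
\]
We have shown that (37) holds for any choice of PT as long
as Pe is sufficiently small, which completes the proof of the
constant PT result. Next, set $P_T \ge \max\left\{\frac{2\log\left((d_v-1)(d_c-1)\right)}{\eta}, \frac{\pi}{\eta}\right\}$.
As explained above, (37) also holds for any Pe as long as PT
is sufficiently large. In this case (37) simplifies to
\[
\begin{aligned}
N_{\mathrm{iter}}\left[1 - \frac{\log\left((d_v-1)(d_c-1)\right)}{\eta P_T}\right] &\overset{P_T > \frac{1}{\eta} > 0}{\le} \frac{\log\frac{1}{P_e}}{\eta P_T} \overset{P_T \ge \frac{\pi}{\eta} > \frac{1}{\eta};\; d_c > d_v \ge 2}{\le} N_{\mathrm{iter}}\left[1 + \log 3\right] + 1 + \log 3\\
\frac{1}{2}N_{\mathrm{iter}} &\overset{P_T \ge \frac{2\log\left((d_v-1)(d_c-1)\right)}{\eta}}{\le} \frac{\log\frac{1}{P_e}}{\eta P_T} < \left[1 + \log 3\right]N_{\mathrm{iter}} + 3,
\end{aligned}
\tag{38}
\]
which completes the proof of the Lemma.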
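Rearranging the last line of (38) gives an explicit window for the number of independent Gallager-A iterations: writing X = log(1/Pe)/(ηPT), the inequalities read (X − 3)/(1 + log 3) < Niter ≤ 2X. The sketch below evaluates this window, taking (38) as stated; the function name, the use of a single argument `eta_pt` for the product ηPT, and the sample values are our own illustrative assumptions.

```python
import math


def niter_window(pe, eta_pt, dv, dc):
    """Bounds on N_iter implied by the last line of (38).

    Requires eta_pt >= max(2*log((dv-1)*(dc-1)), pi), as assumed in the proof.
    Returns (lower, upper) with lower < N_iter <= upper.
    """
    if not 0.0 < pe < 1.0:
        raise ValueError("pe must lie in (0, 1)")
    min_eta_pt = max(2.0 * math.log((dv - 1) * (dc - 1)), math.pi)
    if eta_pt < min_eta_pt:
        raise ValueError("transmit power too small for (38) to apply")
    x = math.log(1.0 / pe) / eta_pt
    lower = max(0.0, (x - 3.0) / (1.0 + math.log(3.0)))
    upper = 2.0 * x
    return lower, upper


if __name__ == "__main__":
    # Illustrative values: a (3, 6)-regular code with eta * PT = 5.
    print(niter_window(1e-30, 5.0, 3, 6))
```

Both ends of the window scale as log(1/Pe)/(ηPT), which is the Θ(·) behavior claimed in Lemma 3 for large transmit power.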
APPENDIX C
PROOF OF LEMMA 4
Proof of Lemma 4: We analyze the number of independent
iterations as a function Niter : [2,∞) → R≥0, since even in
the second case, the transmit power is a function of 1/Pe. Let
∆B > 0 be the transmit power for which p0 is exactly the
threshold for decoding over the BSC [29, Section 4.3], [25].
As explained in the proof of Lemma 3, we need not consider
cases where PT < ∆B . Using [29, Eqn. 4.15], for dv odd, the
bit-error probability after the ith decoding iteration follows
\[
\begin{aligned}
p_i = p_0 &- \frac{p_0}{2^{d_v-1}}\sum_{m=\frac{d_v-1}{2}}^{d_v-1}\binom{d_v-1}{m}\left[1 + (1-2p_{i-1})^{d_c-1}\right]^{m}\left[1 - (1-2p_{i-1})^{d_c-1}\right]^{d_v-1-m}\\
&+ \frac{1-p_0}{2^{d_v-1}}\sum_{m=\frac{d_v-1}{2}}^{d_v-1}\binom{d_v-1}{m}\left[1 - (1-2p_{i-1})^{d_c-1}\right]^{m}\left[1 + (1-2p_{i-1})^{d_c-1}\right]^{d_v-1-m}.
\end{aligned}
\tag{39}
\]
The RHS of (39) is a polynomial in $p_{i-1}$ and the $\frac{d_v-1}{2}$th order
Maclaurin expansion is
\[
p_i = p_0\binom{d_v-1}{\frac{d_v-1}{2}}(d_c-1)^{\frac{d_v-1}{2}}\,p_{i-1}^{\frac{d_v-1}{2}} + R_B(p_{i-1}). \tag{40}
\]
Because the RHS of (39) is a polynomial, by the mean value
theorem, the remainder has a Lagrange form (where $x^* \in (0, p_{i-1})$):
\[
R_B(p_{i-1}) = \frac{1}{\left(\frac{d_v+1}{2}\right)!}\,\frac{d^{\frac{d_v+1}{2}} p_i(x^*)}{dp_{i-1}^{\frac{d_v+1}{2}}}\,p_{i-1}^{\frac{d_v+1}{2}}. \tag{41}
\]
The $\frac{d_v+1}{2}$th derivative of $p_i$ is another polynomial; therefore
it must be bounded on the bounded interval $[0, \frac{1}{2}]$ and
\[
-c_l^B\,p_{i-1}^{\frac{d_v+1}{2}} \le R_B(p_{i-1}) \le c_u^B\,p_{i-1}^{\frac{d_v+1}{2}}, \tag{42}
\]
for some constants $c_l^B, c_u^B > 0$. Then choose PT . Since
we exceed the decoding threshold [25], $p_i < p_{i-1} \le p_0$.
Now, take i to be the final iteration. Since we assumed
$\lim_{P_e\to 0}\frac{P_T}{\log\frac{1}{P_e}} = 0$, we will also have $\lim_{P_e\to 0}\frac{p_{i-1}}{p_0} = 0$
(the number of decoding iterations used in the coding strategy
eventually exceeds 1 as Pe → 0). Hence, for sufficiently small
$p_i$ (thereby small $p_{i-1}$),
\[
-\frac{1}{2}\,p_0\binom{d_v-1}{\frac{d_v-1}{2}}(d_c-1)^{\frac{d_v-1}{2}}\,p_{i-1}^{\frac{d_v-1}{2}} \le R_B(p_{i-1}) \le \frac{1}{2}\,p_0\binom{d_v-1}{\frac{d_v-1}{2}}(d_c-1)^{\frac{d_v-1}{2}}\,p_{i-1}^{\frac{d_v-1}{2}}. \tag{43}
\]
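Plugging (43) into (40), we have
\[
p_0\left[\binom{d_v-1}{\frac{d_v-1}{2}}(d_c-1)^{\frac{d_v-1}{2}}\,\frac{1}{2}\right]p_{i-1}^{\frac{d_v-1}{2}} \le p_i \le p_0\left[\binom{d_v-1}{\frac{d_v-1}{2}}(d_c-1)^{\frac{d_v-1}{2}}\,\frac{3}{2}\right]p_{i-1}^{\frac{d_v-1}{2}}. \tag{44}
\]
Applying (44) recursively, we obtain
\[
\begin{aligned}
&p_0^{1+\cdots+\left(\frac{d_v-1}{2}\right)^{i}}\left(\binom{d_v-1}{\frac{d_v-1}{2}}\frac{1}{2}\right)^{1+\cdots+\left(\frac{d_v-1}{2}\right)^{i-1}}(d_c-1)^{\left(\frac{d_v-1}{2}\right)+\cdots+\left(\frac{d_v-1}{2}\right)^{i}} \le p_i\\
&\quad\le p_0^{1+\cdots+\left(\frac{d_v-1}{2}\right)^{i}}\left(\binom{d_v-1}{\frac{d_v-1}{2}}\frac{3}{2}\right)^{1+\cdots+\left(\frac{d_v-1}{2}\right)^{i-1}}(d_c-1)^{\left(\frac{d_v-1}{2}\right)+\cdots+\left(\frac{d_v-1}{2}\right)^{i}}.
\end{aligned}
\tag{45}
\]
Loosening the LHS and RHS of (45) and grouping like-terms
we have
\[
\begin{aligned}
&p_0^{1+\cdots+\left(\frac{d_v-1}{2}\right)^{i}}\left(\binom{d_v-1}{\frac{d_v-1}{2}}(d_c-1)\frac{1}{2}\right)^{1+\cdots+\left(\frac{d_v-1}{2}\right)^{i-1}} \overset{(i\ge 0;\; d_c>2)}{\le} p_i\\
&\quad\overset{(p_0\le 1;\; d_c>2)}{\le} \left(p_0\binom{d_v-1}{\frac{d_v-1}{2}}\frac{3}{2}\right)^{1+\cdots+\left(\frac{d_v-1}{2}\right)^{i-1}}(d_c-1)^{1+\cdots+\left(\frac{d_v-1}{2}\right)^{i}}.
\end{aligned}
\tag{46}
\]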
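Simplifying geometric progressions, inverting all sides of (46),
taking log(·) on all sides, and replacing $p_i$ by Pe and i by
Niter:
\[
\begin{aligned}
&\frac{\left(\frac{d_v-1}{2}\right)^{N_{\mathrm{iter}}} - 1}{\left(\frac{d_v-1}{2}\right) - 1}\left[\log\frac{1}{p_0} + \log\frac{2}{3\binom{d_v-1}{\frac{d_v-1}{2}}(d_c-1)}\right] \le \log\frac{1}{P_e}\\
&\quad\le \frac{\left(\frac{d_v-1}{2}\right)^{N_{\mathrm{iter}}+1} - 1}{\left(\frac{d_v-1}{2}\right) - 1}\log\frac{1}{p_0} + \frac{\left(\frac{d_v-1}{2}\right)^{N_{\mathrm{iter}}} - 1}{\left(\frac{d_v-1}{2}\right) - 1}\log\frac{2}{\binom{d_v-1}{\frac{d_v-1}{2}}(d_c-1)}.
\end{aligned}
\tag{47}
\]
Applying (1) on p0 terms in (47), dividing all sides by
ηPT , loosening the RHS by ignoring negative terms, and
simplifying:
\[
\begin{aligned}
&\frac{\left(\frac{d_v-1}{2}\right)^{N_{\mathrm{iter}}} - 1}{d_v-3}\left[2 + \frac{\log 4\pi\eta P_T - 2\log\left(\frac{3}{2}\binom{d_v-1}{\frac{d_v-1}{2}}(d_c-1)\right)}{\eta P_T}\right] \le \frac{\log\frac{1}{P_e}}{\eta P_T}\\
&\quad\le \frac{\log(d_c-1)}{\eta P_T} + \frac{\left(\frac{d_v-1}{2}\right)^{N_{\mathrm{iter}}+1}}{d_v-3}\left[2 + \frac{2\log\left(\sqrt{4\pi\eta P_T} + \sqrt{\frac{\pi}{\eta P_T}}\right)}{\eta P_T}\right].
\end{aligned}
\tag{48}
\]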
We have shown that (48) holds for any PT as long as Pe is
sufficiently small. Thus, treating PT as a constant in (48) and
taking log(·) on all sides completes the proof of the fixed PT
result. For the other case, consider the limits of the leftmost
and rightmost side of (48) as PT → ∞. For any ε > 0, for
sufficiently large PT the following holds:
\[
\left(\left(\frac{d_v-1}{2}\right)^{N_{\mathrm{iter}}} - 1\right)\frac{2-\epsilon}{d_v-3} \le \frac{\log\frac{1}{P_e}}{\eta P_T} \le \left(\frac{d_v-1}{2}\right)^{N_{\mathrm{iter}}+1}\frac{2+\epsilon}{d_v-3}. \tag{49}
\]
Taking log(·) on all sides of (49) and simplifying, we obtain:
\[
\log\left(\left(\frac{d_v-1}{2}\right)^{N_{\mathrm{iter}}} - 1\right) + \log\frac{2-\epsilon}{d_v-3} \le \log\frac{\log\frac{1}{P_e}}{\eta P_T} \le (N_{\mathrm{iter}}+1)\log\left(\frac{d_v-1}{2}\right) + \log\frac{2+\epsilon}{d_v-3}, \tag{50}
\]
which is equivalent to the desired result.
APPENDIX D
PROOF OF THEOREM 3
Proof of Theorem 3: Since we are proving a lower
bound, we can restrict ourselves to the case where dv ≥ 3
without loss of generality (the decoding power when dv = 2
grows exponentially faster). Via Lemma 2 and Lemma 5,
the total power required for a (dv, dc)-regular LDPC code
and any iterative message-passing decoder to achieve bit-error
probability Pe under the Wire Model is
\[
P_{\mathrm{total}} = \Omega\left(P_T + \left[\frac{\log\frac{1}{P_e}}{\eta P_T(1+9\pi)\left(\frac{d_v(d_v-1)}{d_v-2}\right)^{2}}\right]^{\frac{1+\frac{\log(d_c-1)}{\log(d_v-1)}}{2}}\right). \tag{51}
\]
First, it follows from (22) that if PT is kept fixed while
Pe → 0, the total power (and the decoding power) diverges as
\[
P_{\mathrm{total,\,bdd}\;P_T} = \Omega\left(\log^{\frac{1+\frac{\log(d_c-1)}{\log(d_v-1)}}{2}}\frac{1}{P_e}\right). \tag{52}
\]
The exponent of $\log\frac{1}{P_e}$ in (52) is always greater than 1 since
dc > dv for any regular-LDPC code. Next, differentiating the
expressions inside the Ω (·) of (51) w.r.t. PT , setting to zero,
and substituting the minimizing transmit power into (51) we
find that the minimum total power is:
\[
P_{\mathrm{total,min}} = \Omega\left(\log^{\frac{1}{1+\frac{2}{1+\frac{\log(d_c-1)}{\log(d_v-1)}}}}\frac{1}{P_e}\right), \tag{53}
\]
which completes the proof of the theorem.
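The exponents in (52) and (53) depend only on the node degrees, so they are simple to tabulate. The sketch below is illustrative: the function name and the degree pairs are our choices. It uses the identity that minimizing PT + (L/PT)^a over PT yields a total that scales as L^{a/(1+a)}, which coincides with the exponent written in (53) when a is the exponent of (52).

```python
import math


def wire_model_exponents(dv, dc):
    """Return (a, m): a is the exponent of log(1/Pe) in (52), m the exponent in (53)."""
    if not dc > dv >= 3:
        raise ValueError("the bound is stated for dc > dv >= 3")
    a = 0.5 * (1.0 + math.log(dc - 1) / math.log(dv - 1))
    m = a / (1.0 + a)  # equals 1 / (1 + 2 / (1 + log(dc-1)/log(dv-1)))
    return a, m


if __name__ == "__main__":
    for dv, dc in [(3, 6), (4, 8), (5, 10)]:
        a, m = wire_model_exponents(dv, dc)
        print(f"dv={dv}, dc={dc}: fixed-PT exponent {a:.3f}, optimized exponent {m:.3f}")
```

In each case a > 1 (worse than uncoded transmission when PT is bounded) while m < 1 once the transmit power is allowed to grow.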
APPENDIX E
PROOF OF THEOREM 4
Proof of Theorem 4: Let $N_{\mathrm{iter}}^{(P_e)}$ denote the minimum number
of independent Gallager-A decoding iterations required
to achieve bit-error probability Pe. Via Theorem 2, the total
power is lower bounded by $P_{\mathrm{total}} = P_T + \Omega\left(e^{\gamma N_{\mathrm{iter}}^{(P_e)}}\right)$. It
follows from Lemma 3 that if PT is kept fixed as Pe → 0, then
the required decoding power diverges at least as fast as a power
of 1/Pe, which is exponentially larger than the power required
for uncoded transmission. If instead the transmit power is
allowed to vary, it follows from Lemma 3 that
\[
P_{\mathrm{total}} = \Omega\left(P_T + \left(\frac{1}{P_e}\right)^{\frac{\gamma}{\eta P_T}}\right), \tag{54}
\]
\[
P_{\mathrm{total}} = O\left(P_T + \left(\frac{1}{P_e}\right)^{\frac{2\gamma}{\eta P_T}}\right). \tag{55}
\]
In order to find the optimizing transmit power, let $L_{P_e}(P_T)$
denote the function in the Ω (·) expression of (54) and let
$U_{P_e}(P_T)$ denote the function in the O (·) expression of (55):
\[
L_{P_e}(P_T) = P_T + \left(\frac{1}{P_e}\right)^{\frac{\gamma}{\eta P_T}}, \tag{56}
\]
\[
U_{P_e}(P_T) = P_T + \left(\frac{1}{P_e}\right)^{\frac{2\gamma}{\eta P_T}}. \tag{57}
\]
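We start by analyzing the lower bound. To find the PT which
minimizes $L_{P_e}$, we differentiate $L_{P_e}$ and set it to 0:
\[
\frac{dL_{P_e}}{dP_T} = 1 - e^{\frac{\gamma\log\frac{1}{P_e}}{\eta P_T}}\,\frac{\gamma\log\frac{1}{P_e}}{\eta P_T^2} = 0
\;\Rightarrow\; \frac{P_T^2}{\frac{\gamma\log\frac{1}{P_e}}{\eta}} = e^{\frac{\gamma\log\frac{1}{P_e}}{\eta P_T}}. \tag{58}
\]
Now, let $\mathcal{P} = \frac{P_T}{\sqrt{\frac{\gamma\log\frac{1}{P_e}}{\eta}}}$. Substituting into (58), we get
\[
\mathcal{P}^2 = e^{\frac{1}{\mathcal{P}}\sqrt{\frac{\gamma\log\frac{1}{P_e}}{\eta}}} \;\Rightarrow\; 2\mathcal{P}\log\mathcal{P} = \sqrt{\frac{\gamma\log\frac{1}{P_e}}{\eta}} \tag{59}
\]
\[
\Rightarrow\; \log\mathcal{P}\;e^{\log\mathcal{P}} = \frac{1}{2}\sqrt{\frac{\gamma\log\frac{1}{P_e}}{\eta}}. \tag{60}
\]
The positive, real valued solution to (60) is given by the
principal branch $W_0(\cdot)$ of the Lambert W function [71].
Explicitly, when x, z ∈ R≥0 satisfy the relation $x = ze^{z}$,
we say $z = W_0(x)$. Hence we can write
\[
\log\mathcal{P} = W_0\left(\frac{1}{2}\sqrt{\frac{\gamma\log\frac{1}{P_e}}{\eta}}\right) \tag{61}
\]
\[
\Rightarrow\; \mathcal{P} = \frac{\log\mathcal{P}\;e^{\log\mathcal{P}}}{\log\mathcal{P}} \overset{(60),(61)}{=} \frac{\frac{1}{2}\sqrt{\frac{\gamma\log\frac{1}{P_e}}{\eta}}}{W_0\left(\frac{1}{2}\sqrt{\frac{\gamma\log\frac{1}{P_e}}{\eta}}\right)}. \tag{62}
\]
Rewriting $\mathcal{P}$ in terms of PT we find the optimizing transmit
power
\[
P_T^{*} = \frac{\frac{\gamma\log\frac{1}{P_e}}{\eta}}{2W_0\left(\frac{1}{2}\sqrt{\frac{\gamma\log\frac{1}{P_e}}{\eta}}\right)}. \tag{63}
\]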
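The closed form (63) can be checked against the stationarity condition (58) numerically. The sketch below is an illustration under stated assumptions: it implements the principal branch of the Lambert W function with a plain Newton iteration (a library routine could be substituted), and the values of γ, η, and Pe are arbitrary placeholders rather than values from the analysis.

```python
import math


def lambert_w0(x, tol=1e-14, max_iter=100):
    """Principal branch W0(x) for x >= 0, via Newton's method on z * exp(z) = x."""
    if x < 0.0:
        raise ValueError("this sketch only handles x >= 0")
    z = math.log1p(x)  # starting point to the right of the root: W0(x) <= log(1 + x)
    for _ in range(max_iter):
        ez = math.exp(z)
        step = (z * ez - x) / (ez * (z + 1.0))
        z -= step
        if abs(step) <= tol * (1.0 + abs(z)):
            break
    return z


def optimal_transmit_power(pe, gamma, eta):
    """Transmit power P_T* from (63) for the lower-bound function L_Pe in (56)."""
    big_l = gamma * math.log(1.0 / pe) / eta
    return big_l / (2.0 * lambert_w0(0.5 * math.sqrt(big_l)))


if __name__ == "__main__":
    pe, gamma, eta = 1e-12, 2.0, 0.5  # placeholder values for illustration
    pt = optimal_transmit_power(pe, gamma, eta)
    big_l = gamma * math.log(1.0 / pe) / eta
    # Both sides of the stationarity condition (58) should agree at P_T*.
    print(pt, pt * pt / big_l, math.exp(big_l / pt))
```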
The first two terms in the asymptotic expansion of $W_0(x)$ as
x → ∞ are log(x) − log log(x) [71]. In fact, ∀x ≥ e [72]:
\[
\log(x) - \log\log(x) \le W_0(x) \le \log(x) - \frac{1}{2}\log\log(x). \tag{64}
\]
Using (64) in (63), and ignoring constant terms in the resulting
denominator, the optimizing transmit power is
\[
P_T^{*} = \Theta\left(\frac{\frac{\gamma}{\eta}\log\frac{1}{P_e}}{\log\log\frac{1}{P_e} - 2\log\log\log^{\frac{1}{2}}\frac{1}{P_e}}\right).
\]
Plugging back into $L_{P_e}$ in (56), and ignoring non-dominating
terms, we get the lower bound. An identical analysis of $U_{P_e}$
in (57) gives the upper bound and completes the proof.
APPENDIX F
PROOF OF THEOREM 5
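Proof of Theorem 5: Via Theorem 2 and Lemma 4, if the
transmit power is kept fixed as Pe → 0, the total power is
\[
P_{\mathrm{total,\,bdd}\;P_T} = \Omega\left(P_T + \log^{\gamma}\frac{1}{P_e}\right). \tag{65}
\]
If instead PT is allowed to scale as a function of Pe,
\[
P_{\mathrm{total}} \overset{\text{Theorem 2}}{=} \Omega\left(P_T + e^{\gamma N_{\mathrm{iter}}^{(P_e)}}\right) \tag{66}
\]
\[
\overset{\text{Lemma 4}}{=} \Omega\left(P_T + e^{\frac{\gamma}{\log\left(\frac{d_v-1}{2}\right)}\log\frac{\log\frac{1}{P_e}}{P_T}}\right) \tag{67}
\]
\[
= \Omega\left(P_T + \left(\frac{\log\frac{1}{P_e}}{P_T}\right)^{\frac{\gamma}{\log\left(\frac{d_v-1}{2}\right)}}\right). \tag{68}
\]
Using the upper bound from Theorem 2,
\[
P_{\mathrm{total}} = O\left(P_T + \left(\frac{\log\frac{1}{P_e}}{P_T}\right)^{\frac{2\gamma}{\log\left(\frac{d_v-1}{2}\right)}}\right). \tag{69}
\]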
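Then, considering the bounds on γ in Theorem 2, we examine
the exponents of $\log\frac{1}{P_e}$ in (65) and $\frac{\log\frac{1}{P_e}}{P_T}$ in (68):
\[
\begin{aligned}
\gamma &\ge \log(d_v-1) + \log(d_c-1) \overset{d_c>d_v>3}{\ge} \log 12\\
\Rightarrow\; \frac{\gamma}{\log\left(\frac{d_v-1}{2}\right)} &> \frac{1 + \frac{\log(d_c-1)}{\log(d_v-1)}}{1 - \frac{\log 2}{\log(d_v-1)}} \overset{d_c>d_v>3}{\ge} 2.
\end{aligned}
\tag{70}
\]
It follows that if the transmit power is kept fixed even
as Pe → 0, the total power diverges as
$P_{\mathrm{total,\,bdd}\;P_T} = \Omega\left(\log^{2.48}\frac{1}{P_e}\right)$. Moving to the unbounded case, substituting
(70) back into (68), we obtain:
\[
P_{\mathrm{total}} = \Omega\left(P_T + \left(\frac{\log\frac{1}{P_e}}{P_T}\right)^{2}\right). \tag{71}
\]
Differentiating the expression inside the Ω (·) on the RHS
of (71) w.r.t. PT and setting to zero, the total power scales
like:
\[
P_{\mathrm{total,min}} = \Omega\left(\log^{\frac{2}{3}}\frac{1}{P_e}\right).
\]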
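This lower bound tightens when dvdc ≥ 4(dv + dc). Using
Theorem 2 for this case,
\[
\begin{aligned}
P_{\mathrm{total}} &= \Omega\left(P_T + \left(\frac{\log\frac{1}{P_e}}{P_T}\right)^{\log 32}\right)\\
P_{\mathrm{total,min}} &\overset{\frac{\log 32}{\log 32 + 1} > \frac{31}{40}}{=} \Omega\left(\log^{\frac{31}{40}}\frac{1}{P_e}\right).
\end{aligned}
\]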
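Both exponents above follow from the same balancing step: minimizing PT + (L/PT)^a over PT, with L = log(1/Pe), gives a total that scales as L^{a/(1+a)}. The sketch below illustrates this under our own naming (`balanced_exponent`, `minimize_total`) and with arbitrary sample values of L; it reproduces the exponent 2/3 for a = 2 and confirms the numerical inequality log 32/(log 32 + 1) > 31/40 with natural logarithms.

```python
import math


def balanced_exponent(a):
    """Exponent of L in min over PT of PT + (L / PT)**a, namely a / (1 + a)."""
    return a / (1.0 + a)


def minimize_total(big_l, a):
    """Closed-form minimizer of PT + (L / PT)**a and the minimum value.

    Setting the derivative 1 - a * L**a * PT**(-a - 1) to zero gives
    PT = (a * L**a) ** (1 / (1 + a)).
    """
    pt = (a * big_l ** a) ** (1.0 / (1.0 + a))
    return pt, pt + (big_l / pt) ** a


if __name__ == "__main__":
    print(balanced_exponent(2.0))                      # exponent for (71)
    print(balanced_exponent(math.log(32.0)), 31 / 40)  # tightened case
    # Scaling check for a = 2: increasing L by 1000x should scale the minimum by 1000**(2/3).
    _, t1 = minimize_total(1e3, 2.0)
    _, t2 = minimize_total(1e6, 2.0)
    print(t2 / t1)
```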
Moving to the upper bound, via Theorem 2, we find that the
exponent of $\frac{\log\frac{1}{P_e}}{P_T}$ in (69) is
\[
\frac{2\gamma}{\log\left(\frac{d_v-1}{2}\right)} \le \frac{6\log(2d_vd_c+1)}{\log(d_v-1) - \log 2}. \tag{72}
\]
Then substituting (72) into (69), we get the bound
\[
P_{\mathrm{total}} = O\left(P_T + \left(\frac{\log\frac{1}{P_e}}{P_T}\right)^{\frac{6\log(2d_vd_c+1)}{\log(d_v-1) - \log 2}}\right). \tag{73}
\]
Differentiating the expressions inside the O (·) of (73) w.r.t.
PT and setting to zero, we obtain the upper bound.
APPENDIX G
CAD FLOW DETAILS
The decoding implementation models are constructed in a
hierarchical manner. First, behavioral Verilog descriptions of
variable and check nodes are mapped to standard cells using
logic synthesis14 and are then placed-and-routed using a phys-
ical design tool. The physical area of the individual circuits
is obtained. Post-layout simulation is then performed, using
extracted RC parasitics and typical corners for the Synopsys
32/28nm HVT CMOS process at a supply voltage of 0.78V.
The critical-path delays of the variable-nodes TVN(a, dv),
and check-nodes TCN(a, dc) for each decoding algorithm are
obtained using post-layout static timing analysis with the
parasitics included. Post-layout power analysis is performed
to obtain the average power consumption of variable-nodes
PVN(a, dv) and check-nodes PCN(a, dc) using a “virtual
clock” of period TVN(a, dv) or TCN(a, dc), respectively, over
a large number of uniformly random input patterns. In practice
however, the amount of switching activity at the decoder
depends on the number of errors in the received sequence
over the channel, and it thereby depends on the parameters of
the channel and communication system. For example, when
the transmit power is large and/or the path-loss and noise are
small, the expected number of errors in the received sequence
is small and the switching activity caused by bit-flips may be
much smaller than these simulations indicate. Nevertheless, we
assume (with slight overestimation) that the averaged power
numbers hold for the various check-nodes and variable-nodes.
APPENDIX H
CIRCUIT MODEL FOR CRITICAL-PATH DELAY
It is assumed that all decoders operate at the minimum clock
period TCLK(a, g, dv, dc) for which timing would be met at the
0.78V supply voltage. This minimum allowable clock period
that meets timing in flip-flop based synchronous circuits is
bounded by the setup time constraint [19] for each flip-flop.
The setup time is the minimum time it takes the incoming data
to a flip-flop to propagate through the input stages of the
flip-flop. The critical path for a full decoding iteration consists
of a CLK-Q delay of a message passing flip-flop inside a
variable-node, then an interconnect delay, then a check-node
delay TCN(a, dc), then another interconnect delay, and finally
a variable-node delay TVN(a, dv). In these models, the setup
and the CLK-Q delay are accounted for in TVN(a, dv).
Interconnect delay is assumed to be linearly proportional
to the resistance and capacitance of the interconnect, which
depend on the length and width of interconnects. Estimating
the length of interconnects requires an estimate of the de-
coder’s physical dimensions. The total area of the decoder,
ADecoder, is estimated as a sum of check-node and variable-
node areas, where the nodes are assumed to be placed in a
square arrangement. Best-case and worst-case estimates for
the average interconnect length lwire(a, g, dv, dc) are obtained
by the following equations [33]:
\[
l_{\mathrm{wire}}(a, g, d_v, d_c) =
\begin{cases}
A_{\mathrm{Decoder}}^{0.25} & \text{in best case,} \qquad (74)\\[1ex]
\dfrac{\sqrt{A_{\mathrm{Decoder}}}}{3} & \text{in worst case.} \qquad (75)
\end{cases}
\]
$^{14}$The delay, power, area, and structure of synthesized logic depend on the
constraints and mapping effort given as inputs to the synthesis tool. To allow
for a fair comparison between codes of different degrees, we only specify
constraints for minimum delay and minimum power and use the highest
possible mapping effort for each node.
Rigorous empirical and theoretical justification for the above
estimates is provided in [33] where it is shown that (74)
is a good approximation for highly-parallel logic and (75)
is the average value for randomly-placed logic on a square
array. Since the logic functions computed by the check-nodes
and variable-nodes for the decoding algorithms considered
in this paper are intrinsically parallel and we also assume
the decoders are implemented in a fully-parallel manner, we
used (74) for the results shown in this paper. However, we
note that (75) could be a better approximation, depending on
the code construction used.
Routing for decoders is assumed to use minimum-width
wires on the lower 7 metal layers of the 9-layer CMOS process$^{15}$.
The average minimum width (wavg), sheet resistance
(Rsq), and capacitance per-unit-length (Cunit)$^{16}$ for these
metal layers are calculated using design rule information [54]
and are assumed as constants. Interconnect delay is then
estimated assuming a distributed Elmore model [19]:
\[
R_{\mathrm{wire}}(a, g, d_v, d_c) = R_{\mathrm{sq}} \times \frac{l_{\mathrm{wire}}(a, g, d_v, d_c)}{w_{\mathrm{avg}}} \tag{76}
\]
\[
C_{\mathrm{wire}}(a, g, d_v, d_c) = C_{\mathrm{unit}} \times l_{\mathrm{wire}}(a, g, d_v, d_c) \tag{77}
\]
\[
T_{\mathrm{wire}}(a, g, d_v, d_c) = \frac{R_{\mathrm{sq}}\,C_{\mathrm{unit}}\,l_{\mathrm{wire}}^{2}(a, g, d_v, d_c)}{2\,w_{\mathrm{avg}}}. \tag{78}
\]
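The interconnect model of (74)–(78) amounts to a few lines of arithmetic. The sketch below is a hypothetical illustration: the function names are ours, the numeric inputs are placeholders rather than values extracted from the 32/28nm design rules, and the area is assumed to be expressed in the normalized units for which the estimates of [33] return a length.

```python
import math


def wire_length(decoder_area, best_case=True):
    """Average interconnect length: A**0.25 in the best case (74), sqrt(A)/3 in the worst case (75)."""
    if decoder_area <= 0.0:
        raise ValueError("decoder area must be positive")
    if best_case:
        return decoder_area ** 0.25
    return math.sqrt(decoder_area) / 3.0


def elmore_wire_delay(r_sq, c_unit, l_wire, w_avg):
    """Distributed Elmore delay of one wire, following (76)-(78)."""
    r_wire = r_sq * l_wire / w_avg   # (76): total wire resistance
    c_wire = c_unit * l_wire         # (77): total wire capacitance
    return 0.5 * r_wire * c_wire     # (78): R_wire * C_wire / 2


if __name__ == "__main__":
    area = 16.0  # placeholder area in normalized units
    for best in (True, False):
        length = wire_length(area, best_case=best)
        # Placeholder electrical constants, not taken from the paper.
        print(best, length, elmore_wire_delay(r_sq=0.1, c_unit=2e-16, l_wire=length, w_avg=0.05))
```

Because the delay is quadratic in the wire length, the choice between (74) and (75) has a pronounced effect on the estimated clock period for large decoders.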
APPENDIX I
CIRCUIT MODEL FOR COMPUTATION POWER
The power consumption of a logic gate consists of both
dynamic power (which is proportional to the activity-factor at
the input of the gate and the clock-frequency), and static power
(which has no dependence on the activity-factor or the clock
frequency) [19]. In post-layout simulation, the static power
consumption of variable-nodes and check-nodes at 0.78V
supply in a high threshold-voltage process is observed to be
less than 1% of the total power in check-nodes and variable-
nodes. Therefore, with little loss in accuracy, we treat the
total power consumption of check-nodes and variable-nodes as
dynamic power when considering the effect of clock-frequency
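scaling. Consequently, the power consumed after clock-frequency
scaling in variable-nodes is $P_{\mathrm{VN}}(a, d_v) \times \frac{T_{\mathrm{VN}}(a, d_v)}{T_{\mathrm{CLK}}(a, g, d_v, d_c)}$ and
in check-nodes it is $P_{\mathrm{CN}}(a, d_c) \times \frac{T_{\mathrm{CN}}(a, d_c)}{T_{\mathrm{CLK}}(a, g, d_v, d_c)}$.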
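Since the node power is treated as purely dynamic, rescaling it from the "virtual clock" period at which it was characterized to the decoder clock period is a single ratio. A minimal sketch, with a hypothetical function name and placeholder numbers:

```python
def scaled_node_power(p_node, t_node, t_clk):
    """Dynamic power of a node after slowing its clock from period t_node to t_clk.

    p_node: average power measured with a virtual clock of period t_node.
    t_clk:  decoder clock period; assumed to be at least t_node, since the
            critical path of a full iteration contains the node delay.
    """
    if t_node <= 0.0 or t_clk <= 0.0:
        raise ValueError("clock periods must be positive")
    if t_clk < t_node:
        raise ValueError("decoder clock period shorter than the node's own delay")
    return p_node * t_node / t_clk


if __name__ == "__main__":
    # Placeholder numbers: 2 mW at a 1 ns virtual clock, decoder clock of 4 ns.
    print(scaled_node_power(2e-3, 1e-9, 4e-9))
```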
APPENDIX J
CIRCUIT MODEL FOR INTERCONNECT POWER
Using the interconnect capacitance estimate
Cwire(a, g, dv, dc) and clock period TCLK(a, g, dv, dc)
from Appendix H, and assuming an activity factor of
1/2, the power consumed by a single message-passing
interconnect (Pwire(a, g, dv, dc)) in the decoder is modeled
using the formula for the dynamic power consumed in
interconnects [19]:
\[
P_{\mathrm{wire}}(a, g, d_v, d_c) = \frac{C_{\mathrm{wire}}(a, g, d_v, d_c) \times (0.78\,\mathrm{V})^{2}}{2\,T_{\mathrm{CLK}}(a, g, d_v, d_c)}. \tag{79}
\]
$^{15}$The top two metal layers are often used to construct a global power grid
for an entire chip.
$^{16}$Including parallel-plate and fringing components [19].
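Equation (79) is the usual dynamic-power expression with an activity factor of 1/2. The following sketch evaluates it; the function name and the capacitance and clock values in the example are placeholders of ours, while the 0.78 V supply is the value stated in the text.

```python
def wire_power(c_wire, t_clk, vdd=0.78, activity=0.5):
    """Dynamic power of one message-passing interconnect, following (79).

    c_wire:   wire capacitance in farads.
    t_clk:    clock period in seconds.
    vdd:      supply voltage in volts (0.78 V in the text).
    activity: switching activity factor (1/2 in the text).
    """
    if t_clk <= 0.0:
        raise ValueError("clock period must be positive")
    return activity * c_wire * vdd ** 2 / t_clk


if __name__ == "__main__":
    # Placeholder values: a 1 fF wire clocked with a 1 ns period.
    print(wire_power(1e-15, 1e-9))
```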
REFERENCES
[1] K. Ganesan, P. Grover, and A. Goldsmith, “How far are LDPC codes
from fundamental limits on total power consumption?” in 50th Allerton
Conference, Monticello, IL, Oct. 2012, pp. 671–678.
[2] K. Ganesan, Y. Wen, P. Grover, A. Goldsmith, and J. Rabaey, “Choosing
“green” codes by simulation-based modeling of implementations,” in
Proc. of GLOBECOM, Dec. 2012, pp. 3286–3292.
[3] V. Stojanović, “Channel-limited high-speed links: modeling, analysis
and design,” Ph.D. dissertation, Stanford University, 2004.
[4] Z. Zhang, V. Anantharam, M. J. Wainwright, and B. Nikolic, “An
efficient 10GBASE-T ethernet LDPC decoder design with low error
floors,” IEEE J. Solid-State Circuits, vol. 45, no. 4, pp. 843–855, April
2010.
[5] IEEE Std. 802.11ad-2012: “Wireless LAN Medium Access Control
(MAC) and Physical Layer (PHY) Specifications: Enhancements for Very
High Throughput in the 60 GHz Band,” amendment. IEEE, Dec. 2012.
[6] P. Grover, K. A. Woyach, and A. Sahai, “Towards a communication-
theoretic understanding of system-level power consumption,” IEEE J.
Select. Areas Comm., vol. 29, no. 8, pp. 1744–1755, Sept. 2011.
[7] C. Marcu et al., “A 90 nm CMOS low-power 60 GHz transceiver with
integrated baseband circuitry,” IEEE J. Solid-State Circuits, vol. 44,
no. 12, pp. 3434–3447, Dec. 2009.
[8] A. Darabiha, A. C. Carusone, and F. R. Kschischang, “Power reduction
techniques for LDPC decoders,” IEEE J. Solid-State Circuits, vol. 43,
no. 8, pp. 1835–1845, Aug. 2008.
[9] T. J. Richardson and R. L. Urbanke, Modern Coding Theory. Cambridge
University Press, 2007.
[10] S. Cui, A. Goldsmith, and A. Bahai, “Energy Constrained Modulation
Optimization,” IEEE Trans. Wireless Comm., vol. 4, no. 5, pp. 1–11,
Sept. 2005.
[11] P. Youssef-Massaad, M. Medard, and L. Zheng, “Impact of Processing
Energy on the Capacity of Wireless Channels,” in Proc. of ISITA, Parma,
Italy, Oct. 2004.
[12] C. D. Thompson, “A complexity theory for VLSI,” Ph.D. dissertation,
Carnegie Mellon University, 1980.
[13] P. Grover, A. Goldsmith, and A. Sahai, “Fundamental limits on the
power consumption of encoding and decoding,” in Proc. of ISIT,
Cambridge, MA, July 2012, pp. 2716–2720.
[14] P. Grover and A. Sahai, “Fundamental bounds on the interconnect com-
plexity of decoder implementations,” in Proc. of 45th CISS, Baltimore,
MD, Mar. 2011, pp. 1–6.
[15] C. Blake and F. R. Kschischang, “Energy consumption of VLSI de-
coders,” Dec. 2014, http://arxiv.org/abs/1412.4130.
[16] IEEE Std. 802.3an-2006: “Physical Layer and Management Parameters
for 10 Gb/s Operation, Type 10GBASE-T,” amendment to IEEE Std.
802.3-2005. IEEE, Sept. 2006.
[17] K. S. Andrews, D. Divsalar, S. Dolinar, J. Hamkins, C. R. Jones, and
F. Pollara, “The development of Turbo and LDPC codes for deep-space
applications,” Proc. IEEE, vol. 95, no. 11, pp. 2142–2156, Nov. 2007.
[18] C. Blake and F. R. Kschischang, “On the energy complexity of LDPC
decoder circuits,” Feb. 2015, http://arxiv.org/abs/1502.07999.
[19] J. M. Rabaey, A. P. Chandrakasan, and B. Nikolic, Digital Integrated
Circuits. Prentice Hall, 2002.
[20] T. H. Lee, The design of CMOS radio-frequency integrated circuits.
Cambridge University Press, 2004.
[21] T. Sundström, B. Murmann, and C. Svensson, “Power dissipation bounds
for high-speed Nyquist analog-to-digital converters,” IEEE Trans. Cir-
cuits Syst. I, vol. 56, no. 3, pp. 509–518, Mar. 2009.
[22] Y. Li, B. Bakkaloglu, and C. Chakrabarti, “A system level energy
model and energy-quality evaluation for integrated transceiver front-
ends,” IEEE Trans. VLSI, vol. 15, no. 1, pp. 90–103, Jan. 2007.
[23] D. Knuth, Art of Computer Programming Volume 1: Fundamental
Algorithms. Addison-Wesley Professional, 1997.
[24] G. Brassard and P. Bratley, Fundamentals of algorithmics. Prentice
Hall, 1996.
[25] T. J. Richardson and R. L. Urbanke, “The capacity of low-density parity-
check codes under message-passing decoding,” IEEE Trans. Inf. Theory,
vol. 47, no. 2, pp. 599–618, Feb. 2001.
[26] R. D. Gordon, “Values of Mills’ ratio of area to bounding ordinate and
of the normal probability integral for large values of the argument,” Ann.
Math. Stat., vol. 12, no. 3, pp. 364–366, Sept. 1941.
[27] F. R. Kschischang, B. J. Frey, and H. Loeliger, “Factor graphs and the
sum-product algorithm,” IEEE Trans. Inf. Theory, vol. 47, no. 2, pp.
498–519, Feb. 2001.
[28] J. Zhao, F. Zarkeshvari, and A. H. Banihashemi, “On implementation
of min-sum algorithm and its modifications for decoding low-density
parity-check (LDPC) codes,” IEEE Trans. Comm., vol. 53, no. 4, pp.
549–554, April 2005.
[29] R. G. Gallager, “Low-density parity-check codes,” Ph.D. dissertation,
MIT, 1960.
[30] L. Dolecek, P. Lee, Z. Zhang, V. Anantharam, B. Nikolic, and M. Wain-
wright, “Predicting error floors of structured LDPC codes: Deterministic
bounds and estimates,” IEEE J. Select. Areas Comm., vol. 27, no. 6, pp.
908–917, Aug. 2009.
[31] R. M. Tanner, “A recursive approach to low complexity codes,” IEEE
Trans. Inf. Theory, vol. 27, no. 5, pp. 533–547, 1981.
[32] R. P. Brent and H.-T. Kung, “The chip complexity of binary arithmetic,”
in Proc. of 12th STOC, 1980, pp. 190–200.
[33] W. E. Donath, “Placement and average interconnection lengths of
computer logic,” IEEE Trans. Circuits Syst. I, vol. 26, no. 4, pp. 272–
277, April 1979.
[34] P. Christie and D. Stroobandt, “The interpretation and application of
Rent’s rule,” IEEE Trans. VLSI, vol. 8, no. 6, pp. 639–648, Dec. 2000.
[35] Wikipedia, “Back end of line,” https://en.wikipedia.org/wiki/Back_end_of_line, 2015.
[36] C. D. Thompson, “VLSI design with multiple active layers,” Information
Processing Letters, vol. 21, no. 3, pp. 109–111, 1985.
[37] M. Aly et al., “Energy-efficient abundant-data computing: The N3XT
1,000X,” IEEE Computer, Dec. 2015.
[38] B. Zhai, D. Blaauw, D. Sylvester, and K. Flautner, “Theoretical and
practical limits of dynamic voltage scaling,” in Proc. of 41st DAC, June
2004, pp. 868–873.
[39] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, “Leak-
age current mechanisms and leakage reduction techniques in deep-
submicrometer CMOS circuits,” Proc. IEEE, vol. 91, no. 2, pp. 305–327,
2003.
[40] P. O. Vontobel and R. Koetter, “Graph-cover decoding and finite-length
analysis of message-passing iterative decoding of LDPC codes,” Dec.
2005, http://arxiv.org/abs/cs/0512078.
[41] M. Lentmaier, D. V. Truhachev, K. S. Zigangirov, and D. J. Costello,
“An analysis of the block error probability performance of iterative
decoding,” IEEE Trans. Inf. Theory, vol. 51, no. 11, pp. 3834–3855,
Nov. 2005.
[42] A. Shokrollahi, “Capacity achieving sequences,” Codes, Systems, Graph-
ical Models, vol. 123, pp. 153–166, 2001.
[43] C. E. Leiserson, “Area-efficient graph layouts (for VLSI),” in Proc. of
21st Symp. on FOCS, Oct. 1980, pp. 270–281.
[44] R. J. Lipton and R. Sedgewick, “Lower bounds for VLSI,” in Proc. of
13th Symp. on FOCS, May 1981, pp. 300–307.
[45] F. T. Leighton, “Layouts for the shuffle-exchange graph and lower bound
techniques for VLSI,” Ph.D. dissertation, MIT, 1982.
[46] K. Zarankiewicz, “On a problem of P. Turán concerning graphs,” Fund.
Math., vol. 41, pp. 137–145, 1954.
[47] J. Pach and G. Tóth, “Thirteen problems on crossing numbers,”
Geombinatorics, vol. 9, no. 4, pp. 194–207, 2000.
[48] J. Pach, J. Spencer, and G. Tóth, “New bounds on crossing numbers,”
Dis. Comp. Geo., vol. 24, no. 4, pp. 623–644, 2000.
[49] N. Alon and J. H. Spencer, The Probabilistic Method. John Wiley &
Sons, 2004.
[50] M. Ajtai, V. Chvátal, M. M. Newborn, and E. Szemerédi, “Crossing-free
subgraphs,” Ann. Dis. Math., vol. 12, pp. 9–12, 1982.
[51] L. A. Székely, “Short proof for a theorem of Pach, Spencer, and Toth,”
Contemporary Mathematics, vol. 342, pp. 281–283, 2004.
[52] A. Fernandez and K. Efe, “Efficient VLSI layouts for homogeneous
product networks,” IEEE Trans. Comput., vol. 46, no. 10, pp. 1070–
1082, Oct. 1997.
[53] L. Sassatelli, S. K. Chilappagari, B. Vasic, and D. Declercq, “Two-bit
message passing decoders for LDPC codes over the binary symmetric
channel,” Dec. 2009, http://arxiv.org/abs/0901.2090.
[54] Synopsys Inc., “32/28nm generic library,”
https://www.synopsys.com/Community/UniversityProgram/Pages/32-28nm-generic-library.aspx,
2015.
[55] K. Ganesan, “LDPC decoding power models,”
http://web.stanford.edu/~karthik3/JSACPowerModels/, 2015.
[56] K. Cho and D. Yoon, “On the general BER expression of one and two-
dimensional amplitude modulations,” IEEE Trans. Comm., vol. 50, no. 7,
pp. 1074–1080, July 2002.
[57] Y. Wang, J. S. Yedidia, and S. C. Draper, “Construction of high-
girth QC-LDPC codes,” Mitsubishi Electric Research Lab, Tech. Rep.
TR2008-061, Sept. 2008.
[58] C. E. Shannon, “A mathematical theory of communication,” Bell Syst.
Tech. J., vol. 27, pp. 379–423, 623–656, Jul./Oct. 1948.
[59] A. Viterbi, “Error bounds for convolutional codes and an asymptotically
optimum decoding algorithm,” IEEE Trans. Inf. Theory, vol. 13, no. 2,
pp. 260–269, Apr. 1967.
[60] I. Jacobs and E. Berlekamp, “A lower bound to the distribution of
computation for sequential decoding,” IEEE Trans. Inf. Theory, vol. 13,
no. 2, pp. 167–174, Apr. 1967.
[61] E. Arikan, “Channel polarization: A method for constructing capacity-
achieving codes for symmetric binary-input memoryless channels,” IEEE
Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, July 2009.
[62] M. M. Mansour and N. R. Shanbhag, “High-throughput LDPC de-
coders,” IEEE Trans. VLSI, vol. 11, pp. 976–996, Dec. 2003.
[63] M. Mohiyuddin, A. Prakash, A. Aziz, and W. Wolf, “Synthesizing
interconnect-efficient low density parity check codes,” in Proc. of 41st
DAC, June 2004, pp. 488–491.
[64] J. Thorpe, “Design of LDPC graphs for hardware implementation,” in
Proc. of ISIT, Lausanne, Switzerland, July 2002.
[65] C. Berrou and A. Glavieux, “Near optimum error correcting coding and
decoding: Turbo-codes,” IEEE Trans. Comm., vol. 44, no. 10, pp. 1261–
1271, Oct. 1996.
[66] J. Pach, F. Shahrokhi, and M. Szegedy, “Applications of the crossing
number,” Algorithmica, vol. 16, no. 1, pp. 111–117, 1996.
[67] S. Lin and D. J. Costello, Error control coding. Prentice Hall, 2004.
[68] H. Jeong and P. Grover, “Energy-adaptive codes,” in 53rd Allerton
Conference, Monticello, IL, Oct. 2015.
[69] R. Koetter and P. O. Vontobel, “Graph covers and iterative decoding of
finite length codes,” in Proc. of 3rd ISTC, Brest, France, 2003.
[70] C. Desset, M. Benoit, and L. Vandendorpe, “Computing the word-,
symbol-, and bit-error rates for block error-correcting codes,” IEEE
Trans. Comm., vol. 52, no. 6, pp. 910–921, June 2004.
[71] R. M. Corless, D. J. Jeffrey, and D. E. Knuth, “A sequence of series for
the Lambert W function,” Proc. of ISSAC, pp. 197–204, 1997.
[72] A. Hoorfar and M. Hassani, “Inequalities on the Lambert W function
and hyperpower function,” J. Inequal. Pure and Appl. Math, vol. 9, no. 2,
2008.
Karthik Ganesan received the B.S. degree in EECS
and the B.A. degree in Statistics from the University
of California at Berkeley in 2013, and the M.S.
degree in EE from Stanford University in 2015,
where he is currently pursuing the Ph.D. degree.
He is interested in some aspects of coding theory,
applied probability and ergodic theory, and their
uses in fault-tolerant computing, low-power system
design, and emerging neuroscience applications.
Pulkit Grover is an assistant professor in Electrical
and Computer Engineering at Carnegie Mellon Uni-
versity (since 2013), working on information theory,
circuit design, and biomedical engineering. His fo-
cus is on developing a new theory of information for
low-energy communication, sensing, and comput-
ing by incorporating novel circuit/processing-energy
models to add to classical communication or sensing
energy models. A common theme in his work is ob-
serving when optimal designs depart radically from
classical theoretical intuition. To apply these ideas to
a variety of problems including wearables, IoT, and novel biomedical systems,
his lab works extensively with engineers, neuroscientists, and doctors.
He is a recipient of the 2010 best student paper award at the IEEE
Conference on Decision and Control; a 2010 best student paper finalist at
the IEEE International Symposium on Information Theory; the 2011 Eli Jury
Dissertation Award from UC Berkeley; the 2012 Leonard G. Abraham award
from the IEEE Communications Society; a 2014 best paper award at the
International Symposium on Integrated Circuits; a 2014 NSF CAREER award;
and a 2015 Google Research Award.
Jan Rabaey is the Donald O. Pederson Distin-
guished Professor in the Electrical Engineering and
Computer Science Department, University of Cali-
fornia at Berkeley. He is currently the Scientific Co-
director of the Berkeley Wireless Research Center
(BWRC), the director of the Berkeley Ubiquitous
SwarmLab, and the Director of the FCRP Multiscale
Systems Research Center (MuSyC). His research
interests include the conception and implementation
of next-generation integrated wireless systems.
Dr. Rabaey is the recipient of a wide range of
awards, among which are the 2008 IEEE Circuits and Systems Society Mac
Van Valkenburg Award and the 2009 European Design Automation Associ-
ation (EDAA) Lifetime Achievement award. In 2010, he was awarded the
prestigious Semiconductor Industry Association (SIA) University Researcher
Award. He is an IEEE Fellow and a member of the Royal Flemish Academy
of Sciences and Arts of Belgium. He received his Ph.D. degree in applied
sciences from the Katholieke Universiteit Leuven, Leuven, Belgium.
Andrea Goldsmith is the Stephen Harris professor
in the School of Engineering and a professor of Elec-
trical Engineering at Stanford University. She was
previously on the faculty of Electrical Engineering
at Caltech. Her research interests are in information
theory and communication theory, and their applica-
tion to wireless communications and related fields.
She co-founded and served as Chief Scientist of
Wildfire.Exchange, and previously co-founded and
served as CTO of Quantenna Communications, Inc.
She has also held industry positions at Maxim Tech-
nologies, Memorylink Corporation, and AT&T Bell Laboratories. Dr. Gold-
smith is a Fellow of the IEEE and of Stanford, and has received several awards
for her work, including the IEEE ComSoc Edwin H. Armstrong Achievement
Award as well as Technical Achievement Awards in Communications Theory
and in Wireless Communications, the National Academy of Engineering
Gilbreth Lecture Award, the IEEE ComSoc and Information Theory Society
Joint Paper Award, the IEEE ComSoc Best Tutorial Paper Award, the Alfred
P. Sloan Fellowship, the WICE Technical Achievement Award, and the Silicon
Valley/San Jose Business Journal’s Women of Influence Award. She is author
of the book “Wireless Communications” and co-author of the books “MIMO
Wireless Communications” and “Principles of Cognitive Radio,” all published
by Cambridge University Press, as well as an inventor on 28 patents. She
received the B.S., M.S. and Ph.D. degrees in Electrical Engineering from
U.C. Berkeley.
Dr. Goldsmith has served on the Steering Committee for the IEEE Trans-
actions on Wireless Communications and as editor for the IEEE Transactions
on Information Theory, the Journal on Foundations and Trends in Commu-
nications and Information Theory and in Networks, the IEEE Transactions
on Communications, and the IEEE Wireless Communications Magazine. She
participates actively in committees and conference organization for the IEEE
Information Theory and Communications Societies and has served on the
Board of Governors for both societies. She has also been a Distinguished
Lecturer for both societies, served as President of the IEEE Information
Theory Society in 2009, founded and chaired the student committee of the
IEEE Information Theory society, and chaired the Emerging Technology
Committee of the IEEE Communications Society. At Stanford she received the
inaugural University Postdoc Mentoring Award, served as Chair of Stanford’s
Faculty Senate in 2009 and currently serves on its Faculty Senate, Budget
Group, and Task Force on Women and Leadership.
