Abstract: Sequences of randomly generated bipartite configurations are analyzed; under mild conditions almost surely such configurations have minimum bisection width proportional to the number of vertices. This implies an almost sure $\Omega(n^2/d_{\max}^2)$ scaling rule for the energy of directly-implemented low-density parity-check (LDPC) decoder circuits for codes of block length $n$ and maximum node degree $d_{\max}$. It also implies an $\Omega(n^{3/2}/d_{\max})$ lower bound for serialized LDPC decoders. It is also shown that all (as opposed to almost all) capacity-approaching, directly-implemented non-split-node LDPC decoding circuits have energy, per iteration, that scales as $\Omega(\chi^2 \ln^3 \chi)$, where $\chi = (1 - R/C)^{-1}$ is the reciprocal gap to capacity, $R$ is code rate, and $C$ is channel capacity.
I. INTRODUCTION
This paper uses an adaptation of Thompson's [1] VLSI model to derive lower bounds on the VLSI energy complexity of low-density parity-check (LDPC) codes, an important family of error control codes introduced by Gallager [2].
The first result is an "almost-sure" scaling rule for the energy complexity of LDPC decoders. In particular, we analyze ensembles generated according to a uniform configuration distribution. We show, subject to some mild conditions, that a randomly-generated bipartite configuration asymptotically almost surely has minimum bisection width proportional to the number of vertices. This implies an $\Omega(n^2/d_{\max}^2)$ lower bound on the energy of directly-implemented LDPC decoders (see Definition 5) and an $\Omega(n^{3/2}/d_{\max})$ lower bound on the energy of serialized decoders (see Definition 14).
We also show that a capacity-approaching sequence of "non-split-node directly-implemented" LDPC decoders (see Definition 8) must have energy that scales as $\Omega(\chi^2 \ln^3 \chi)$, where $\chi = (1 - R/C)^{-1}$ is the reciprocal gap to capacity, $R$ is the code rate, and $C$ is the capacity of the channel over which the code is transmitted. This lower bound contrasts with the universal lower bound of $\Omega(\chi^2 \sqrt{\ln \chi})$ derived in [3]. The $\Omega(\chi^2 \ln^3 \chi)$ result applies to decoding circuits where messages are passed on a Tanner graph induced by a parity-check matrix of the underlying code. This lower bound does not apply to decoding algorithms that use modified Tanner graphs with punctured variable nodes like those used for the non-systematic irregular repeat accumulate (IRA) codes of [4] or the compound low-density generator matrix (LDGM) codes of [5]. However, computations show that the $\Omega(n^2/d_{\max}^2)$ and $\Omega(n^{3/2}/d_{\max})$ almost sure lower bounds apply to the non-systematic IRA construction of [4] for many parameters.
We begin the paper in Section II with a discussion of prior related work. In Section III we introduce the main definitions and the circuit model considered. Then, in Section IV, after defining some properties of node-degree lists, we present the main theorem. We proceed to show how this theorem allows us to find scaling laws for the energy of LDPC decoders in Section V. In Section VI we derive a $\chi^2 \ln^3 \chi$ scaling rule for capacity-approaching sequences of non-split-node LDPC decoders. In Section VII we discuss some open problems related to this work and in Section VIII we make some concluding remarks.
II. PRIOR WORK

A. Related Work on Circuit Complexity and LDPC Codes
Ganesan et al. [6] assume that the average wire length in a VLSI instantiation of a Tanner graph is proportional to the longest wire, and that the length of the longest wire is proportional to the diagonal of the circuit upon which the LDPC decoder is laid out. The implication of these assumptions is an $n^2$ scaling rule for the area of directly-implemented LDPC decoders, which is the same result as Corollary 2 of this paper. However, these assumptions are taken as axioms without being fully justified; there certainly can exist bipartite Tanner graphs that can be instantiated in a circuit without such area. We show that, in fact, the $n^2$ scaling rule is justified for almost all directly-implemented Tanner graphs (so long as some mild conditions are satisfied).
More recently, Ganesan et al. [7] analyze the VLSI complexity of certain classes of LDPC decoding algorithms, including how the number of iterations required for such algorithms scales with block error probability. Moreover, the authors show that a judicious choice of node-degree distributions can optimize the total (transmit + decoding) power for coded communication using LDPC codes by simulating real circuits and their code performance. The Ganesan et al. paper complements our paper; we do not analyze how the number of iterations depends on target block error probability, nor do we simulate any actual circuit performance. Neither the Ganesan et al. paper nor our paper considers the performance of LDPC codes whose interconnection complexity, and not just degree distribution, is optimized. This open problem is discussed further in Section VII.
0018-9448 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
B. Related Work on Graph Theory
This paper uses a combinatorial approach to derive almost sure lower bounds on the minimum bisection width of randomly generated configurations. This contrasts with a common approach that considers a graph's Laplacian (see [8, Definition 8.6.15]). Fiedler [9] shows that the second smallest eigenvalue of a graph's Laplacian, $\lambda_2$, can be used to find a lower bound of $\lambda_2 n/4$ on the graph's minimum bisection width. Bezrukov et al. [10] find bounds on the bisection width of graphs that are related to this $\lambda_2$ value. Diaz et al. [11] provide almost sure upper bounds for the bisection width of randomly generated regular graphs. Luczak and McDiarmid [12] also study the minimum bisection width of graphs generated according to a distribution different from ours. Furthermore, our analysis is of random bipartite graphs, as opposed to random regular graphs. As well, our result makes only weak assumptions on the node degree distribution, without requiring a degree-regularity assumption, in contrast to previous work.
III. PRELIMINARIES
A. Graph Theory Definitions
The main result of our paper involves the minimum bisection width (MBW) of a graph.
Definition 1: Let $G = (V, E)$ be a graph and let $V' \subseteq V$. A subset of the edges $E_s \subseteq E$ bisects $V'$ in $G$ if removal of $E_s$ cuts $V$ into unconnected sets $V_1$ and $V_2$ in which
$$\big|\, |V_1 \cap V'| - |V_2 \cap V'| \,\big| \le 1.$$
The sets $V_1 \cap V'$ and $V_2 \cap V'$ are considered the bisected sets of vertices. A minimum bisection is a bisection of a graph whose size is minimum over all bisections. The minimum bisection width of $V'$ is the size of a minimum bisection of $V'$. The minimum bisection width of the graph is the minimum bisection width of all the vertices $V$. We let $\phi_G(V')$ denote the minimum bisection width of a set of vertices $V'$ in $G$.
Note that finding the minimum bisection width of a graph is NP-Complete [13] .
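For very small graphs the minimum bisection width of Definition 1 can still be found by exhaustive search, which may help when checking examples by hand. The following is a minimal sketch, assuming the case in which all vertices are bisected; the function name `min_bisection_width` and the edge-list representation are illustrative choices, not part of the paper.

```python
from itertools import combinations

def min_bisection_width(num_vertices, edges):
    """Brute-force minimum bisection width of a (multi)graph.

    Vertices are labeled 0..num_vertices-1 and `edges` is a list of (u, v)
    pairs (repeated pairs are treated as parallel edges). The search is
    exponential in num_vertices, so it is only usable for tiny graphs.
    """
    half = num_vertices // 2  # the smaller side of any bisection
    best = None
    for side in combinations(range(num_vertices), half):
        side = set(side)
        # Count edges with exactly one endpoint on the chosen side.
        width = sum(1 for u, v in edges if (u in side) != (v in side))
        if best is None or width < best:
            best = width
    return best

# A 4-cycle: every balanced split cuts at least two edges.
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
cycle4_width = min_bisection_width(4, cycle4)
```

Enumerating only subsets of size floor(|V|/2) is enough, because the complement of each such subset is the other side of the bisection.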
B. Circuit Model
In this paper, the definition of a circuit is adapted from Thompson [14] and is considered to be a mathematical object consistent with the circuit axioms which we specify in [3]. Readers should consult this reference to understand precisely the model that we are discussing; we briefly review the main parts of this model below, and indicate some minor simplifications introduced in this paper.
• A circuit is a grid of squares with computational nodes, wires, and wire crossings. Wires connect computational nodes in the circuit. The nodes compute functions of their binary inputs synchronously at each clock cycle. Wires are bidirectional: we assume that they may pass a message in both directions each clock cycle. Inputs are injected into input nodes and outputs appear at output nodes.
• One of the key circuit parameters we consider is the circuit area (A) which is equal to the number of grid squares occupied. Note that in [3] we define area as the product of the number of grid squares occupied times the square of the wire width. Since we are concerned with scaling rules, in this paper we just assume that the wire width is unity.
• Another key circuit parameter is τ , the number of clock cycles used in the computation.
• The energy of a computation is defined as $E = A\tau$. Note again that in [3] we borrowed notation from [15] relating energy and the area-time product by a proportionality constant; for simplicity in this paper we just assume that this proportionality constant is unity.
The modified Thompson model that we consider in this paper is meant to subsume the wiring complexity as a fundamental cost of computation. In the field of error control coding, this interconnection complexity has been shown to be a significant factor in the energy of a computation in [16] and [17]. Though this model assumes a planar implementation, [1] shows that if we allow $L$ layers, this can decrease circuit area by a factor of at most $L^2$; thus if the model allows a constant number of layers $L$, our lower bound results can be modified by a factor of $L^2$.
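The two cost quantities of the model can be written down directly. The sketch below assumes the unit wire width and unit proportionality constant adopted in this paper; the function names `energy` and `multilayer_area_bound` are hypothetical and introduced only for illustration.

```python
def energy(area, clock_cycles):
    """Energy E = A * tau, with the proportionality constant taken as unity."""
    return area * clock_cycles

def multilayer_area_bound(planar_area_bound, layers):
    """Weaken a planar area lower bound by the factor L^2 allowed for L layers."""
    if layers < 1:
        raise ValueError("the number of layers must be at least 1")
    return planar_area_bound / layers ** 2

# Example: a planar lower bound of 400 grid squares becomes 100 with 2 layers.
two_layer_bound = multilayer_area_bound(400, 2)
```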
C. Relationship Between Circuit Model and Graphs
Note that a circuit is a collection of nodes connected by wires. Each of the computational nodes of a circuit can be thought of as a vertex of a graph, $G = (V, E)$. However, since more than one wire can connect two nodes, this object may actually be a multi-graph. The wires of a circuit correspond to the edges of the graph. In particular, two vertices $v_1$ and $v_2$ are connected in the graph $G$ by an edge if and only if there is a wire connecting the two computational nodes that correspond to $v_1$ and $v_2$. We let $d_{\max}(G)$ denote the maximum node degree of a graph $G$.
D. LDPC Decoders
LDPC codes are linear codes first studied by Gallager [2] . Given a parity-check matrix H with m rows for a code of length n, the Tanner graph of H is a bipartite graph where one part of the graph contains n vertices called variable nodes and the other part is composed of m check nodes. Each check node is associated with a row of the parity-check matrix and each variable node with a column. A check node is connected to a variable node if and only if the row associated with the check node has a 1 in the column corresponding to the variable node.
Since there are many possible parity-check matrices for a given linear code, there are many possible Tanner graphs associated with that code. An LDPC decoding algorithm for a code is a message-passing procedure where messages are passed over the edges of a particular Tanner graph of the code.
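The construction of a Tanner graph from a parity-check matrix can be sketched as follows. This is an illustrative example only: the function name `tanner_graph` and the list-of-rows representation of $H$ are assumptions, and the matrix shown is one possible parity-check matrix of the (7,4) Hamming code.

```python
def tanner_graph(H):
    """Return (edges, variable_degrees, check_degrees) for parity-check matrix H.

    H is a list of m rows, each a list of n entries in {0, 1}.
    Each edge is a (check_index, variable_index) pair.
    """
    m = len(H)
    n = len(H[0]) if m else 0
    # A check node is joined to a variable node exactly where H has a 1.
    edges = [(c, v) for c in range(m) for v in range(n) if H[c][v] == 1]
    variable_degrees = [sum(H[c][v] for c in range(m)) for v in range(n)]
    check_degrees = [sum(row) for row in H]
    return edges, variable_degrees, check_degrees

# One parity-check matrix of the (7,4) Hamming code (m = 3, n = 7).
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
edges, variable_degrees, check_degrees = tanner_graph(H)
d_max = max(variable_degrees + check_degrees)  # maximum node degree
```

A different parity-check matrix for the same code would give a different Tanner graph, and hence possibly a different maximum node degree.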
We consider two possible paradigms to implement LDPC decoding algorithms with a circuit: a directly-implemented and a serialized technique. To be precise, we will use the following graph-theoretic terminology. 
$1 + d_{\max}(G)$. Note that $G$ can be obtained by contracting the vertices of $G'$. Consider a minimum bisection of the $G$-corresponding vertices of $G'$, and place one side of the bisection on the left side and the other on the right side. There will be some vertices on the left side that are descended from vertices on the right side, and vice versa. We call such vertices bisection-crossing descendants.
If such a bisection has $\omega'$ edges crossing it, then there are at most $\omega'$ $G$-corresponding vertices of $G'$ that have bisection-crossing descendants. To see this, observe that the set of descendants of a vertex must be connected by paths using only their vertices, so there must be at least one unique edge crossing the bisection for each $G$-corresponding vertex of $G'$ that has a bisection-crossing descendant.
We shall now show how to construct a bisection of $G$ with at most $\omega'(1 + d_{\max}(G))$ edges. Simply move all the bisection-crossing descendants in $G'$ to the side of their parent, while keeping the $G$-corresponding vertices of $G'$ on the same side of the bisection. Then contract all the vertices that were split in obtaining $G'$ from $G$. By Lemma 1, for each $G$-corresponding vertex $v$ of $G'$, we observe that at most $d_{\max}(G)$ edges will connect the descendants of $v$ to vertices on the opposite side of the bisection. Thus, moving the descendants to the side of their parent can add at most $\omega' d_{\max}(G)$ edges crossing the bisection, and the resulting bisection has width at most $\omega' + \omega' d_{\max}(G)$.
Thus, we have constructed a bisection of $G$ of width less than $\phi_G(V)$, which contradicts the definition of $\phi_G(V)$ as the minimum bisection width.
Note that if a graph $G'$ contains $G$ as a minor, then it contains a subgraph $G''$ that is obtained by a sequence of vertex splits of $G$. This allows us to conclude:
Lemma 4: If a graph $G'$ contains a graph $G = (V, E)$ as a minor, then the minimum bisection width of the nodes of $G'$ corresponding to $G$ is at least
$$\frac{\phi_G(V)}{1 + d_{\max}(G)}.$$
Thus, by applying Lemma 2, the circuit area of $G'$ is at least
$$A_{\min}(G') \ge \frac{\phi_G^2(V)}{4\,(1 + d_{\max}(G))^2} \ge \frac{\phi_G^2(V)}{16\, d_{\max}^2(G)}.$$
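The two area expressions in Lemma 4 are easy to evaluate numerically. The sketch below is illustrative: the function name `area_lower_bounds` is hypothetical, and it assumes $d_{\max}(G) \ge 1$, which is what makes the second (looser) bound no larger than the first.

```python
def area_lower_bounds(mbw, d_max):
    """Evaluate the two circuit-area lower bounds of Lemma 4.

    mbw   : minimum bisection width phi_G(V) of the minor G
    d_max : maximum node degree of G (assumed to be at least 1)
    Returns (tight, loose) where tight >= loose.
    """
    if d_max < 1:
        raise ValueError("d_max must be at least 1")
    tight = mbw ** 2 / (4 * (1 + d_max) ** 2)
    loose = mbw ** 2 / (16 * d_max ** 2)
    return tight, loose

# Example with phi_G(V) = 120 and d_max = 5.
tight, loose = area_lower_bounds(120, 5)
```

The looser form is the one that yields the $n^2/d_{\max}^2$ dependence once the minimum bisection width is known to grow linearly in $n$.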
E. Serialized LDPC Decoders
Not all LDPC decoders are directly-implemented. This motivates considering a more general class of LDPC decoders. Our definition of a serialized circuit includes both serialization of the message-passing step (for example, by introducing an interleaver that works over multiple clock cycles to pass messages from node to node), and serialization of computation steps (by having one computational node perform the computation for multiple check or variable nodes, but at different clock cycles). The key idea is that a serialized circuit simulates a joined Tanner graph, which we will define in this section.
To do so we first define a computation's communication multi-graph.
Definition 12: A graph obtained by first splitting the nodes of a Tanner graph $T$ and then joining nodes that are not associated with variable node inputs is a joined Tanner graph obtained from $T$.
For a joined Tanner graph $T''$, we let $j_{\max}(T'')$ be the maximum number of vertices joined to form a single vertex. Often its dependence on $T''$ will be suppressed.
Definition 13: A communication graph $K$ simulates a graph $G$ if there is a subset of vertices of $K$ in a one-to-one correspondence with the vertices of $G$, and for each edge in $G$, there is a path connecting the two corresponding vertices in $K$. Moreover, these paths are mutually edge-disjoint.
We can now define a serialized LDPC decoder.
Definition 14: A serialized LDPC decoder for a Tanner graph $T$ is a circuit that simulates a joined Tanner graph obtained from $T$ during each iteration.
Note that if a particular node of such a circuit corresponds to a vertex formed by joining j nodes then there must be at least j clock cycles performed each iteration.
In the sections that follow, we prove an $\Omega(n^2/d_{\max}^2)$ scaling rule for the energy of directly-implemented LDPC decoder circuits in Corollary 2 and an $\Omega(n^{3/2}/d_{\max})$ lower bound for serialized LDPC decoders in Theorem 2.
IV. MAIN THEOREM
Our main theorem is fundamentally graph-theoretic in nature and applies to graphs generated according to a standard uniform random configuration distribution.
Definition 15: Consider the set of bipartite graphs
Let $V_L$ be called the left nodes and $V_R$ the right nodes. Order the left nodes and right nodes in terms of increasing node degree. Let $l_i$ be the degree of the $i$th left node in the graph, and let $r_i$ be the degree of the $i$th right node in the graph. Then we say that $L = (l_1, l_2, \ldots, l_n)$ is the left node degree list and $R = (r_1, r_2, \ldots, r_m)$ the right node degree list. Note that the node degree lists are non-standard; often it is the node degree distribution that is considered. However, in Appendix A we show how to present our results in terms of the more standard node degree distributions.
Given
Note that implicitly this function takes as input the size of the input vector.
Denote the set of bipartite graphs with left and right node degree lists $L$ and $R$ as $G(L, R)$. Note that the number of edges in each particular graph in $G(L, R)$ is $|E| = \sum_{i=1}^{n} l_i = \sum_{i=1}^{m} r_i$.
For convenience of counting, we will consider not the set of graphs with a particular degree list, but rather the set of configurations with this degree list. We can associate each node in a graph with a number of sockets equal to its degree. This node and socket configuration model is a standard way to consider the set of bipartite graphs that form the Tanner graphs of LDPC ensembles, and in particular is discussed thoroughly in [18] .
Definition 16: A set of left nodes and right nodes with an ordered labeling of the sockets of each node, together with a permutation mapping the left node sockets to the right node sockets is called a configuration.
Let the set of configurations with node degree lists L and R be denoted B (L, R). Clearly, |B (L, R) | = |E|!. Since a configuration is merely a graph with a labeling of sockets for each node, graph properties, including minimum bisection width, can be extended to describe configurations in the natural way.
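Drawing from the uniform configuration distribution amounts to choosing a uniformly random permutation between left and right sockets. The following is a minimal sketch under that reading; the function name `sample_configuration` and the representation of a configuration as a list of (left node, right node) pairs are illustrative assumptions.

```python
import random

def sample_configuration(left_degrees, right_degrees, rng=random):
    """Draw a configuration uniformly at random.

    Each node contributes one socket per unit of degree. A uniformly random
    permutation maps left sockets to right sockets. The result is a list of
    (left_node, right_node) edges; parallel edges are possible.
    """
    left_sockets = [i for i, d in enumerate(left_degrees) for _ in range(d)]
    right_sockets = [j for j, d in enumerate(right_degrees) for _ in range(d)]
    if len(left_sockets) != len(right_sockets):
        raise ValueError("left and right degree lists must have equal sums")
    perm = list(range(len(right_sockets)))
    rng.shuffle(perm)  # uniform over all |E|! socket permutations
    return [(left_sockets[k], right_sockets[perm[k]]) for k in range(len(perm))]

# Example: 6 left nodes of degree 3 and 3 right nodes of degree 6 (|E| = 18).
example_edges = sample_configuration([3] * 6, [6] * 3, random.Random(0))
```

Passing a seeded `random.Random` instance keeps an experiment reproducible; the node degrees are fixed by the degree lists and only the interconnection is random.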
Given a left node degree list $L$ of length $n$ and right node degree list $R$ of length $m$, where $L$ and $R$ are ordered by increasing degree and $m \le n$, we define
We also let
For notational convenience we will abbreviate these two quantities as δ and σ and their dependence on the node degree distribution under discussion is to be implicit. Note that |E| = δn + σ n. Consider a configuration with left degree list L and right degree list R. For a given subset of vertices V we can divide this set into two disjoint sets, (1) is the sum of the highest degree left nodes. A collection of at most half these nodes cannot exceed this quantity, leading to a contradiction.
We will need the following lemma for our proof:
Lemma 6: The quantity $m!\,n!$, subject to the conditions
and that $Y$, $Z$, $m$ and $n$ are all integers, cannot exceed $Z!\,(Y - Z)!$.
Proof: See Appendix B.
We can now give the main technical lemma of this paper, which states that the set of configurations with a small bisection is small, which will imply that with high probability a Tanner graph has MBW proportional to n.
with |V L | = n and with degree lists L and R is generated according to the uniform configuration distribution, then the probability that this configuration is in the set B * a when
is upper bounded by
Proof: This follows from a counting argument that gives an upper bound. The key idea is to overcount by counting a set of objects that is larger than $B^*_a$, namely the set of "quadrant configurations" with a bisection of size $a$ or less.
Let the set of configurations in $B(L, R)$ having a bisection of size $a$ be denoted by $B_a$. Then we can say that, according to the uniform configuration distribution, the probability of the event of generating a configuration with a bisection of size $a$ is given by:
$$P(B_a) = \frac{|B_a|}{|E|!},$$
recalling that $|E|$ is the number of edges in the configurations of $B(L, R)$.
We will now bound the number of configurations in $B(L, R)$ with a bisection of size $a$, and we will assume that $a < \sigma n$. To do so, we will define a "quadrant configuration", show that the number of quadrant configurations with a bisection of size $a$ is greater than or equal to $|B_a|$, and then upper bound the number of quadrant configurations with a bisection of size $a$ or less.
A quadrant configuration is a configuration in which the vertices are divided into 4 disjoint sets: the top left nodes ($T_L$), the top right nodes ($T_R$), the bottom left nodes ($B_L$), and the bottom right nodes ($B_R$).
Vertices in $T_L$ and $T_R$ are considered to be top nodes, and similarly for the bottom nodes.
Note that every bipartite graph has at least one quadrant configuration induced by arbitrarily dividing the vertices in half, and denoting one half of these vertices top nodes and the other half bottom nodes. Thus, the set of quadrant configurations with a particular degree distribution is at least as big as the set of configurations with a particular degree distribution. Because a quadrant configuration contains a graph $G$, graph properties can be extended to describe a quadrant configuration.
Denote the set of quadrant configurations with node degree lists $L$ and $R$ in which $a$ is the number of edges connecting top nodes to bottom nodes as $Q_a$. Note that the dependence of $Q_a$ on a particular node degree distribution is implicit. Observe that every configuration with a bisection of size $a$ has a corresponding quadrant configuration in $Q_a$ created in the natural way by denoting one bisected set of vertices as the top nodes, and the other the bottom nodes.
For ease of discussion, we will assume that the total number of nodes m + n in the set of configurations under discussion is even, so that We bound the size of Q i, j a by counting all quadrant configurations with a bisection of size a that are the edges connecting top nodes to bottom nodes. In the following, for compactness, 
The proof mostly follows from simplification of these bounds and the details of the rest of the proof are given in Appendix C.
Consider a sequence of random configurations $G_1, G_2, \ldots$ where each $G_i$ in the sequence is a configuration generated according to the uniform configuration distribution, in which the $i$th configuration is drawn according to node degree lists $L_i$ and $R_i$. Note that the randomness for each element of such a sequence does not come from the degree lists: we are assuming that these lists are fixed. It is the interconnections between nodes that are random. We specifically concern ourselves with a sequence in which the number of left nodes $n$ increases without bound. For such a sequence, denote the number of left nodes of the $i$th configuration as $n_i$. We will abbreviate the quantities $\delta(L_i, R_i)$ and $\sigma(L_i, R_i)$ with the symbols $\delta_i$ and $\sigma_i$ respectively. We let $B^*_{a,i}$ be the set of configurations with node degree lists $(L_i, R_i)$ with a bisection of size $a$ or less.
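then there exists some $\beta > 0$ in which
$$\lim_{i \to \infty} P\left(G_i \in B^*_{\beta n_i, i}\right) = 0,$$
where $\{G_i \in B^*_{\beta n_i, i}\}$ is the event that the $i$th configuration has a bisection of size $\beta n_i$ or less. In particular, this is true for any $0 < \beta < \sigma$ that satisfies: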
Remark 1: Stated less precisely, this theorem says that in the limit, a random bipartite configuration will, with high probability, have no small bisections.
Proof: This proof involves algebraic manipulation of the expression in Lemma 7 and showing that if the conditions of the theorem are satisfied, the limit evaluates to 0. The details of this computation are given in Appendix D.
In the corollaries that follow, we consider a sequence of configurations generated according to the uniform configuration distribution. Let $\phi_i$ be the minimum bisection width of the $i$th configuration. Note that this symbol is a random variable. Theorem 1 now has an obvious corollary.
Corollary 1: If there is a sequence of configurations as described in Theorem 1, in which the condition in (5) is satisfied, then $\lim_{i \to \infty} P(\phi_i \ge \beta n_i) = 1$ for some $\beta > 0$.
Proof: Note that $B^*_a$ is the event that a random configuration has a bisection of size $a$ or less. The complement of this event is the event that a random configuration has no bisection of size $a$ or less, and is thus equal to the event that a random configuration has minimum bisection width greater than or equal to $a$. The corollary follows directly from this observation.
Remark 2: This corollary and the results that follow can be slightly strengthened, because we know that the probability that a bisection exists with size less than $\beta n$ approaches 0 exponentially quickly. Let $I_{\phi_i/n_i < \beta}$ be the event that the graph with $n_i$ left nodes has a bisection of size less than $\beta n_i$. We easily observe that $\sum_i P(\phi_i/n_i < \beta) < \infty$, and so by the Borel-Cantelli Lemma, the probability that a bisection of size less than $\beta n_i$ occurs infinitely often is 0. Thus, $P(\liminf_{i \to \infty} \phi_i/n_i \ge \beta) = 1$ for some $\beta > 0$.
V. ALMOST SURE BOUNDS ON SUFFICIENTLY HIGH RATE LDPC DECODER CIRCUITS
To apply our results to LDPC decoder circuits, we first define a few terms in order to make our claims precise.
Definition 17 [19]: For a given parity-check matrix $H$ for a code of block length $n$ and rate $R$, we define $\Delta(H)$ as the number of 1s in the matrix divided by $nR$, and call this quantity the density of the matrix $H$.
Definition 18: For a code of rate $R$ associated with a channel with capacity $C$, let $\chi = (1 - R/C)^{-1}$ be the reciprocal gap to capacity.
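As a numerical illustration of Definition 18, the sketch below computes the reciprocal gap to capacity and evaluates the two growth terms quoted in the introduction, $\chi^2 \ln^3 \chi$ and $\chi^2 \sqrt{\ln \chi}$, with all constants dropped. The function names are hypothetical, and the values are scaling terms only, not actual energies.

```python
import math

def reciprocal_gap(rate, capacity):
    """Reciprocal gap to capacity chi = (1 - R/C)^(-1), for 0 < R < C."""
    if not 0 < rate < capacity:
        raise ValueError("require 0 < rate < capacity")
    return 1.0 / (1.0 - rate / capacity)

def non_split_node_term(chi):
    """Growth term chi^2 * ln^3(chi) (constants omitted)."""
    return chi ** 2 * math.log(chi) ** 3

def universal_term(chi):
    """Growth term chi^2 * sqrt(ln(chi)) from the universal bound (constants omitted)."""
    return chi ** 2 * math.sqrt(math.log(chi))

# Example: rate 0.45 on a channel of capacity 0.5 gives chi close to 10.
chi = reciprocal_gap(0.45, 0.5)
```

As the rate approaches capacity, $\chi$ grows without bound, and the first term eventually dominates the second.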
Definition 19: Consider a sequence of codes and decoders for a particular channel. We let the block error probability of the $i$th code in the sequence be $P_{e,i}$. Then such a sequence is vanishing-error-probability if $\lim_{i \to \infty} P_{e,i} = 0$.
The following result, a simple implication of Sason and Urbanke [19] presented here in our notation, shows that as capacity is approached the density of a code's parity-check matrix must approach infinity. We will use this result in Corollary 2 and Theorem 3.
is the maximum node degree (possibly a function of n). Energy is bounded similarly.
Proof: Note that Lemma 8 implies that as rate approaches capacity, the parity-check matrix density must approach infinity. But this implies that the number of edges per node in the associated Tanner graph approaches infinity, and so the quantity $\delta$ must also approach infinity. We can use this observation to show that, for codes sufficiently close to capacity, the expression
must be satisfied. To see this, note that $\delta$ approaches $\infty$ for a capacity-approaching code. For $\sigma$, either (a) $\lim_{n \to \infty} \frac{\delta}{\delta + \sigma} < 1$, (b) $\lim_{n \to \infty} \frac{\delta}{\delta + \sigma} = 1$, or (c) this limit does not exist. Note that this value cannot exceed 1 because necessarily $\sigma \le \delta$.
In the case of (c), it must be that the value of $\sigma$ alternates and no limit can be defined. In this case, however, we should consider the specific subsequences of decoders in which either (a) or (b) applies. Since the appropriate scaling rule holds for each such subsequence, it must hold for the entire sequence.
In case (a): in the limit, ln → −∞, and thus in the limit (7) will also be satisfied.
If the scheme has asymptotic rate sufficiently close to capacity, then for sufficiently large block lengths in this scheme the node degree list satisfies the sufficient condition of Theorem 1, and the code's Tanner graph has MBW at least βn i for some β > 0 with probability approaching 1 
B. Serialized-Check Node Decoders
In this section we generalize our results to serialized circuits. To develop this theory, however, we need to define some new terminology. In particular, we will generalize the notion of minimum bisection width by considering collections of bipartitions of the nodes of a graph.
Definition 20: A bipartition of a set X is the partition of the set into two disjoint sets X 1 and X \ X 1 .
We will represent a bipartition by a single set contained within it.
Definition 21: Given a set of vertices $V$ of a graph $G$, a bisection of $V$ is a bipartition of $V$ into $V_1$ and $V_2$ such that $\big|\, |V_1| - |V_2| \,\big| \le 1$.
We see that a bisection is an example of a bipartition. What we will be interested in is collections of bipartitions that are "zig-zaggable". It is the zig-zaggable property of the bisections of a graph that allows Thompson to prove in [1] that $A \ge \phi_G^2(V)/4$ for a circuit with graph $G = (V, E)$ with MBW $\phi_G(V)$.
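Definition 22: Let $X$ be a nonempty finite set. If $\emptyset \subseteq A \subset B \subseteq X$, a simple chain from $A$ to $B$ is a sequence of sets $A = S_0 \subset S_1 \subset \cdots \subset S_k = B$ in which each set contains exactly one more element than the set before it.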
Consider a subset (denoted $C$) of the bipartitions of a set $X$.
Definition 23: A subset of the bipartitions of a set $X$ is zig-zaggable if the following conditions hold: 1) All simple chains from $\emptyset$ to $X$ contain an element of $C$.
Proof: A set $C$ induces a bisection of $X$ if and only if $|C| = \lfloor |X|/2 \rfloor$ or $|C| = \lceil |X|/2 \rceil$. A simple chain from $\emptyset$ to $X$ results in a sequence of bipartitions where the size of one of the sets of the bipartitions increases by 1 each time. One of these bipartitions must thus be a bisection. For property 2, suppose that a simple chain from $A$ to $B$ contains a set $C$ that induces a bisection. Then, either $A$ or $B$ are bisections, or they are not and then $|A| < \lfloor |X|/2 \rfloor$ and $|B| > \lceil |X|/2 \rceil$, and then any simple chain from $A$ to $B$ will include a bisection.
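The first step of the proof above, that every simple chain from $\emptyset$ to $X$ passes through a set inducing a bisection, can be checked empirically. The sketch below is an illustration under the assumption that a simple chain grows by exactly one element per step; the helper names `random_simple_chain` and `induces_bisection` are hypothetical.

```python
import random

def random_simple_chain(elements, rng):
    """Grow a set from empty to all of `elements`, adding one element per step."""
    order = list(elements)
    rng.shuffle(order)
    chain = [frozenset()]
    for x in order:
        chain.append(chain[-1] | {x})
    return chain

def induces_bisection(subset, universe_size):
    """True if the subset and its complement differ in size by at most one."""
    return abs(2 * len(subset) - universe_size) <= 1

# Check a batch of random chains over a 7-element set.
rng = random.Random(1)
all_hit = all(
    any(induces_bisection(s, 7) for s in random_simple_chain(range(7), rng))
    for _ in range(100)
)
```

The check succeeds for every chain because the set size takes every value from 0 to $|X|$ along the chain, so it must pass through $\lfloor |X|/2 \rfloor$.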
We will show however that a more general collection of bipartitions is zig-zaggable.
Definition 24: The width of a bipartition of a set of vertices of a graph is the number of edges connecting the vertices between the two sets of the bipartition.
Definition 25: The C-bipartition width of a graph with respect to a collection of bipartitions C is the minimum width of all bipartitions in C.
Using the definition of zig-zaggable, we can now easily adapt Thompson's proof [1] and derive the following lemma:
Lemma 10: Let C be a zig-zaggable collection of bipartitions of a graph G, and let ω C be the C-bipartition width of the graph. Then
A detailed proof is given in Appendix E that essentially follows the proof of Thompson [1, Th. 2]. Thompson constructs on the order of $\omega_C$ bisections of the nodes by drawing zig-zags across the circuit, each of which has on the order of $\omega_C$ wires crossing it. These bisections must exist precisely because of the zig-zaggable property of the bisections of the graph. Thus, this proof extends to any zig-zaggable collection of bipartitions.
Consider a joined Tanner graph as in Definition 12. Such a graph is obtained by splitting a Tanner graph $T$ to obtain $T'$ and then joining vertices. We can assign to each vertex of the joined Tanner graph a number equal to the number of $T$-corresponding vertices of $T'$ that were joined in forming it. Each of these values is the weight of the vertex. $T$-corresponding vertices that were not joined are assigned weight 1, and all other vertices are given weight 0. For a vertex $v_i$ we let $w(v_i)$ be its weight.
Definition 26: A $\kappa$-weighted bisection of a collection of non-negative weighted nodes $V$ is a bipartition $\{V_1, V_2\}$ of the vertices such that
$$\left| \sum_{v \in V_1} w(v) - \sum_{v \in V_2} w(v) \right| \le \kappa.$$
That is, it is a bipartition in which the total weights of the two sets are within $\kappa$ of being equal.
Lemma 11: The collection of $\kappa$-weighted bisections of a graph with non-negative weighted vertices with maximum weight less than or equal to $\kappa$ is zig-zaggable.
Proof: This proof follows essentially the same form as Lemma 9. The key idea is that the maximum weight of a vertex is $\kappa$, so along any simple chain between subsets of the vertices the weight of the subset increases by at most $\kappa$ at each step.
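The key idea in the proof of Lemma 11 can be illustrated numerically: when no vertex weighs more than $\kappa$, growing a set one vertex at a time cannot jump over every $\kappa$-weighted bisection. The sketch below assumes weights are given as a dictionary from vertex to non-negative weight; the function names are hypothetical.

```python
import random

def is_kappa_weighted_bisection(weights, side, kappa):
    """True if the total weights of `side` and its complement differ by at most kappa."""
    w1 = sum(w for v, w in weights.items() if v in side)
    w2 = sum(w for v, w in weights.items() if v not in side)
    return abs(w1 - w2) <= kappa

def chain_hits_weighted_bisection(weights, kappa, rng):
    """Follow a random simple chain from the empty set to all vertices.

    Returns True if some set along the chain is a kappa-weighted bisection.
    """
    order = list(weights)
    rng.shuffle(order)
    side = set()
    if is_kappa_weighted_bisection(weights, side, kappa):
        return True
    for v in order:
        side.add(v)  # each step moves one vertex across the bipartition
        if is_kappa_weighted_bisection(weights, side, kappa):
            return True
    return False

# Example weights with maximum weight 3, so kappa = 3.
example_weights = {0: 3, 1: 0, 2: 1, 3: 2, 4: 3, 5: 1}
example_kappa = max(example_weights.values())
```

Moving a vertex of weight $w$ changes the difference of the two side weights by $2w \le 2\kappa$, so the difference cannot pass from below $-\kappa$ to above $\kappa$ in a single step.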
Lemma 12: Let $T$ be a Tanner graph with maximum node degree $d_{\max}$, let $T'$ be a split Tanner graph obtained from $T$, and let $T''$ be a joined Tanner graph obtained by joining vertices of $T'$. Let the maximum number of vertices joined in a single vertex be $j_{\max}$. Let the minimum bisection width of $T$ be $\omega$. Then, the minimum $j_{\max}$-weighted bisection width of $T''$ is at least $\omega/(2d_{\max}) - j_{\max} d_{\max}$.
Proof: Suppose not, i.e., that there is a $j_{\max}$-weighted bisection of width $\omega'$ such that $\omega' < \omega/(2d_{\max}) - j_{\max} d_{\max}$. Note that, by Lemma 3, $T'$ has MBW of its $T$-corresponding vertices at least $\omega/(2d_{\max})$. We shall show how to construct a bisection of $T'$ with width less than this. Firstly, consider the $j_{\max}$-weighted bisection of $T''$. Then, unjoin all the vertices, resulting in a bipartition of $T'$. Form a bisection of the $T$-corresponding vertices of $T'$ by moving $T$-corresponding nodes one by one from the side with the most vertices to the side with the least vertices until a bisection is formed. Each time a vertex is moved it increases the edges crossing the bisection by at most $d_{\max}$. A bisection is formed by moving no more than $j_{\max}$ nodes (since the original bipartition had difference in number of nodes at most $j_{\max}$). This constructs a bisection of $T'$ with width less than $\omega/(2d_{\max})$, a contradiction.
for some $c > 0$.
Proof: From Definition 14, a serialized LDPC decoder must have a single node for each node of its joined Tanner graph. Consider a particular decoder of sufficiently large block length $n$. We consider two cases: (a) $j_{\max} \ge \sqrt{n}$ and (b) $j_{\max} < \sqrt{n}$.
Case (a): The area of the circuit is at least $n$ because there must be at least one node for each variable node. Consider the node that joined $j_{\max}$ nodes. Then $\tau \ge \sqrt{n}$ because at least $\sqrt{n}$ outputs must appear at that node. Thus $A\tau \ge n^{1.5}$.
Case (b): We consider the event that the Tanner graph of this code has MBW $\omega = cn$.
By Lemma 12, the $j_{\max}$-weighted bisection width of the joined Tanner graph is at least $c'n - \sqrt{n}\, d_{\max}$, where $c' = c/(2d_{\max})$.
Let the $j_{\max}$-weighted bisection width of the circuit (and not the associated Tanner graph) be $W$. Now consider a minimum $j_{\max}$-weighted bisection of that circuit. There must be at least
$$\tau \ge \frac{c'n - \sqrt{n}\, d_{\max}}{2W}$$
clock cycles per iteration to communicate $c'n - \sqrt{n}\, d_{\max}$ bits across the bisection (where the factor of 2 comes from the bidirectionality assumption of the wires). By Lemma 10 we have
As well, because there are at least n check nodes,
A ≥ n and so we get
where we have substituted $c' = c/(2d_{\max})$ to obtain the last inequality. The theorem is then implied by Corollary 1, which shows that the MBW of the Tanner graph is proportional to n with probability approaching 1.
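For concreteness, the chain of inequalities in case (b) can be sketched as follows. This is a reconstruction rather than the original display: it assumes that Lemma 10 gives $A \ge W^2/4$ (the form proved in Appendix E), that $c' = c/(2d_{\max})$, and that n is large enough for $c'n - \sqrt{n}\,d_{\max}$ to be nonnegative. The constants in the original equations may differ.

```latex
\begin{align*}
\tau &\ge \frac{c' n - \sqrt{n}\, d_{\max}}{2W},
\qquad A \ge \frac{W^{2}}{4} \;\Rightarrow\; W \le 2\sqrt{A},\\
A\tau &\ge A \cdot \frac{c' n - \sqrt{n}\, d_{\max}}{4\sqrt{A}}
       = \frac{\sqrt{A}\,\bigl(c' n - \sqrt{n}\, d_{\max}\bigr)}{4}\\
      &\ge \frac{\sqrt{n}\,\bigl(c' n - \sqrt{n}\, d_{\max}\bigr)}{4}
       = \frac{c\, n^{3/2}}{8\, d_{\max}} - \frac{n\, d_{\max}}{4},
\end{align*}
```

where the last inequality uses $A \ge n$. Whenever the first term dominates the second, this yields the $n^{3/2}/d_{\max}$ scaling stated in the theorem.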
C. Applicability and Limitations of Result
According to the definition of the uniform configuration distribution, it is possible that two or more edges can be drawn between the same two nodes. This type of conflict is usually dealt with by deleting even multi-edges and replacing odd multi-edges with a single edge [18, Definition 3.15]. This leads to a potential problem with the applicability of our theorem: what happens if the edges that we delete form a minimum bisection of the induced graph? In that case it is possible that the graph we instantiate on the circuit has a lower minimum bisection width than that which we calculated, and thus could possibly have less area. However, in the limit as n approaches infinity for a standard LDPC ensemble, the graph is locally tree-like [18, Th. 3.49] with probability approaching 1. This implies that the probability that the number of multi-edges in a randomly generated configuration is some fraction of n must approach 0 (or else the graph would not be locally tree-like, contradicting the theorem). Hence, even if we did delete these multi-edges from the randomly generated configuration, this could at most decrease the minimum bisection width by the number of deletions, and this number of deletions, with probability approaching 1, does not grow linearly with n. Hence, the minimum bisection width must still, with probability approaching 1, grow linearly with n, and our scaling rules are still applicable.
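The multi-edge rule described above can be illustrated with a short sketch. It assumes that sampling from the uniform configuration distribution amounts to a uniformly random pairing of left and right edge sockets; the function names are hypothetical and are not taken from [18].

```python
import random
from collections import Counter


def sample_configuration(left_degrees, right_degrees, rng):
    """Pair left and right edge sockets uniformly at random (assumed model)."""
    left_sockets = [v for v, d in enumerate(left_degrees) for _ in range(d)]
    right_sockets = [c for c, d in enumerate(right_degrees) for _ in range(d)]
    if len(left_sockets) != len(right_sockets):
        raise ValueError("left and right degree sums must match")
    rng.shuffle(right_sockets)
    # Each pair (variable node, check node) is one edge of the configuration.
    return list(zip(left_sockets, right_sockets))


def collapse_multi_edges(configuration):
    """Delete even multi-edges and replace odd multi-edges by a single edge."""
    multiplicity = Counter(configuration)
    return sorted(edge for edge, count in multiplicity.items() if count % 2 == 1)


def deleted_edge_count(configuration):
    """Number of edges removed by the collapse; this upper-bounds the loss in width."""
    return len(configuration) - len(collapse_multi_edges(configuration))
```

For example, `deleted_edge_count(sample_configuration([3] * 8, [6] * 4, random.Random(0)))` reports how many edges one small random configuration loses; the argument above only requires that this count grow sublinearly in n.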
In this paper we have considered Tanner graphs generated according to the uniform random configuration distribution, a commonly used method to analyze the performance of LDPC codes [18]. This does not mean that there do not exist good LDPC coding schemes with slower scaling laws. The scaling rule might be avoided if a different random generation rule for the Tanner graph is used. For example, perhaps the variable nodes and check nodes could be scattered uniformly over a grid and the randomly placed edges, instead of being chosen uniformly over all possible edges, could be chosen uniformly over edges connecting variable and check nodes that are "close" to each other. Whether or not such a sequence of LDPC codes would have good performance is unclear. However, in the following section we obtain scaling rules that are true for all directly-implemented capacity-approaching LDPC decoders with vanishing error probability, not just almost all.
VI. BOUNDS FOR ALL DIRECTLY-IMPLEMENTED NON-SPLIT-NODE LDPC DECODER CIRCUITS
Definition 27: A sequence of codes and decoders in which the i th code has rate R i for a channel with capacity C is vanishing-error-probability capacity-approaching if lim i→∞ R i = C and block error probability approaches 0 as i is increased.
Definition 28: The crossing number of a graph is the minimum number of edge crossings over all drawings of that graph in the plane.
Note that since a crossing takes at least one grid square in any circuit, the crossing number obviously is a lower bound on circuit area.
Ganesan et al. [7] used the following lemma from Pach et al. [20] to understand the complexity of LDPC decoding. We will also use this result to obtain a scaling rule for all, as opposed to almost all, directly-implemented LDPC decoders.
Lemma 13 [20]: Let G be a graph with |E| > 4|V| edges and girth greater than 2r for some integer r > 0. Then the crossing number of such a graph is bounded by $\mathrm{cr}(G) \ge c_r \frac{|E|^{r+2}}{|V|^{r+1}}$
for some constant $c_r > 0$.
Theorem 3: Any vanishing-error-probability capacity-approaching sequence of non-split-node directly-implemented LDPC decoders must have asymptotic energy per iteration lower bounded by $\Omega(\chi^2 \ln^3 \chi)$.
Proof: Lemma 8 implies that the number of edges in a Tanner graph, per bit, scales as $\Omega(\ln \chi)$. From [21] and [22], note that the minimum block length of any code must scale as $n \ge c_3 \chi^2$
for a constant $c_3 > 0$. We then use Lemma 13, and the observation that a Tanner graph has girth greater than 2 (as a simple bipartite graph, its shortest possible cycle has length 4, so the lemma applies with r = 1), to conclude that a non-split-node directly-implemented decoder must have at least $\Omega(n \ln^3 \chi)$ wire crossings.
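The counting step in this proof can be written out as a worked sketch. It assumes that Lemma 13 takes the form $\mathrm{cr}(G) \ge c_r |E|^{r+2}/|V|^{r+1}$, that the edge count satisfies $|E| \ge c_2\, n \ln\chi$ for a hypothetical constant $c_2 > 0$, that the number of check nodes is at most n so that $|V| \le 2n$, and that the block length satisfies $n \ge c_3 \chi^2$. The hypothesis $|E| > 4|V|$ then holds once $c_2 \ln\chi > 8$.

```latex
\begin{align*}
\mathrm{cr}(G) &\ge c_1 \frac{|E|^{3}}{|V|^{2}}
  \ge c_1 \frac{(c_2\, n \ln\chi)^{3}}{(2n)^{2}}
  = \frac{c_1 c_2^{3}}{4}\, n \ln^{3}\chi,\\
n \ge c_3 \chi^{2} \;&\Rightarrow\;
\mathrm{cr}(G) \ge \frac{c_1 c_2^{3} c_3}{4}\, \chi^{2} \ln^{3}\chi .
\end{align*}
```

Since each crossing occupies at least one grid square, the same expression lower bounds the circuit area.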
Note that if the LDPC codes are constrained to have girth greater than 2r, then this argument can be extended to show that a sequence of such decoders must have area bounded by $\Omega(\chi^2 \ln^{r+2} \chi)$. It may be that directly-implemented LDPC decoders can improve upon this lower bound by splitting up check and variable node subcircuits (and not localizing these computations in one area of the circuit). In actual VLSI design this may happen automatically by circuit design software, so this limits the applicability of this theorem.
The lower bound of Theorem 3 is applicable to all non-split-node directly-implemented LDPC decoding schemes. However, using a punctured code construction, Pfister et al. [4] construct a capacity-approaching ensemble of codes that avoids the complexity blowup of Lemma 8. Theorem 3 does not apply to such constructions. We considered the check-regular ensemble of [4, Th. 2] and computed whether this ensemble satisfies the conditions of Theorem 1. Varying the parameter ε from 0.05 to 0.3 in increments of 0.05 and the parameter p from 0.05 to 0.95 in increments of 0.05, computations show that the only values of these parameters that did not satisfy the conditions were p = 0.05 when ε = 0.15, 0.2, 0.25, 0.3. Thus, for most parameters checked we conclude that decoders based on these ensembles satisfy the almost-sure scaling rules of Corollary 2 and Theorem 2.
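The size of this parameter sweep can be tallied with a few lines of code. The sketch below only enumerates the grid and the failing points reported above; it does not reproduce the check of the conditions of Theorem 1, and the variable names are illustrative.

```python
from fractions import Fraction

STEP = Fraction(1, 20)  # increments of 0.05, kept exact to avoid float drift

epsilons = [k * STEP for k in range(1, 7)]   # 0.05, 0.10, ..., 0.30
ps = [k * STEP for k in range(1, 20)]        # 0.05, 0.10, ..., 0.95

# (epsilon, p) pairs reported in the text as not satisfying the conditions.
reported_failures = {
    (Fraction(e, 100), Fraction(5, 100)) for e in (15, 20, 25, 30)
}

grid = [(eps, p) for eps in epsilons for p in ps]
passing = [point for point in grid if point not in reported_failures]
```

Under this reading of the sweep, `len(grid)` is the number of parameter pairs examined and `len(passing)` the number for which the almost-sure scaling rules apply.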
Comparison to Universal Lower Bounds
We note that this lower bound on directly-implemented Tanner graphs contrasts with the lower bounds in [3], which show an $\Omega(\chi^2 \ln^{1/2} \chi)$ lower bound for the energy complexity of fully-parallel decoding algorithms as a function of gap to capacity. This result means that non-split-node directly-implemented LDPC decoders are necessarily asymptotically worse than this lower bound. Of course, it is not known whether the lower bounds in [3] are tight. It may also be that splitting check nodes in the circuit could overcome this lower bound and get closer to the universal lower bound, but our result does not address this case.
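To give a feel for the gap between the two scaling laws, the following sketch evaluates both expressions with all constants dropped (an assumption made purely for illustration). Their ratio is $\ln^{5/2} \chi$, which grows without bound as the gap to capacity closes.

```python
import math


def non_split_node_bound(chi):
    """chi^2 ln^3(chi): the Theorem 3 scaling, constants omitted."""
    return chi ** 2 * math.log(chi) ** 3


def universal_bound(chi):
    """chi^2 sqrt(ln(chi)): the scaling of the bound in [3], constants omitted."""
    return chi ** 2 * math.sqrt(math.log(chi))


def bound_ratio(chi):
    """Ratio of the two scalings, which simplifies to ln(chi) ** 2.5."""
    return non_split_node_bound(chi) / universal_bound(chi)
```

The ratio grows only polylogarithmically in χ, so the separation between the two bounds is real but mild.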
VII. OPEN PROBLEMS
There are still some unanswered questions related to the computational complexity of LDPC decoding, and error control coding in general. We discuss some of these problems below.
• This paper finds an "almost sure" scaling rule for the energy of VLSI LDPC decoders. However, it does not exclude the possibility that there are good LDPC codes whose decoding energy scales more slowly than this (they may simply occur with vanishing probability). Thus, there may be some good LDPC codes with "lower energy" Tanner graphs that still provide good code performance. Intuition suggests that for a given channel a code with a "lower energy" LDPC decoder may have higher probability of error. A general analysis of this fundamental tradeoff is an open question.
• The dependence on maximum node degree of our scaling rules is somewhat surprising. In our definitions of LDPC decoders, we consider a graph that contains the Tanner graph as a minor. It may be that high degree nodes can be split to decrease the minimum bisection width of a graph and thus possibly decrease circuit area. A formal analysis of how vertex splitting might decrease circuit area remains an open question.
• It may be that edges of a Tanner graph that connect vertices that are too far apart on the decoder can be modified to connect closer nodes, with a small cost in error probability. As well, there exist some algorithmic level modifications [23] that may allow energy savings. A theoretical analysis of such techniques may be informative.
• For a given application, what technique can be used to choose the "energy-optimal" error control code? Can this analysis improve the energy of real communication systems? Ganesan et al. [7] discuss this question and show how, for a reasonable system model, the performance optimal code depends on the circuit technology used and the nature of the channel.
• A theoretical energy analysis for almost any other type of error control code also remains generally unexplored (polar codes [24] and spatially coupled codes [25] may be particularly suitable for this type of analysis). Some early theoretical work on the VLSI energy complexity for polar coding exists in [26] .
• The $\Omega(n^2/d_{\max}^2)$ lower bound on directly-implemented LDPC decoders can be reached for node degree distributions of bounded degree (see for example our simple construction in [3]). However, it is not known whether the $\Omega(n^{3/2}/d_{\max})$ lower bound for serialized decoders can be reached.
VIII. CONCLUSION
The main contribution of this paper is graph theoretic. We have shown that, subject to a mild condition on node degrees, almost all Tanner graphs have a minimum bisection width that scales as $\Omega(n)$, where n is the number of left nodes. We have used this to show that almost all directly-implemented LDPC decoders must have circuit area, and thus energy, that scales as $n^2$. As well, we show that almost all serialized decoders have energy complexity, per iteration, that scales as $\Omega(n^{3/2}/d_{\max})$. We have further presented a general theorem on the area of circuits that instantiate any graph to bound the area of any sufficiently large LDPC decoder generated from a capacity-approaching distribution. Note that our results show that directly-implemented LDPC decoders cannot reach the lower bounds presented in [3], thus indicating that either the lower bound cited is not tight, or non-split-node directly-implemented LDPC codes are asymptotically not optimal from this energy perspective. It may also be that both are true, namely that known lower bounds are not tight and LDPC codes are not asymptotically optimal. This remains an open question.
APPENDIX A DEFINITION OF δ(L, R) IN TERMS OF NODE DEGREE DISTRIBUTIONS
In the discussions in this paper, we define a quantity δ in (2) in terms of node degree lists. Given a bipartite graph G and number of left nodes n, number of right nodes m, and node degree lists R and L, one can easily construct the more standard node degree distribution. This definition is adapted from [18] .
Definition 29: For a bipartite graph G, let $\rho_i$ be the fraction of right nodes in G of degree i and let $\lambda_i$ be the fraction of left nodes of degree i. Then $P = \{\rho_1, \ldots\}$ is the right node degree distribution and $\Lambda = \{\lambda_1, \ldots\}$ is the left node degree distribution.
We note that the sum of the entries of both P and Λ defined above must be 1, and that Λ and P are functions of the left and right node degree lists. Since it is more common to describe ensembles in terms of their node degree distributions, we will state the quantity δ(L, R) of (2) used in Theorem 1 in terms of Λ and P, the left and right node degree distributions.
Consider a sequence $X = \{x_1, x_2, \ldots\}$ such that $\sum_i x_i = 1$ and $0 \le x_i \le 1$ for each i. Define:
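The passage from a node degree list to a node degree distribution in Definition 29 can be sketched as follows. The function name is illustrative, and exact fractions are used so that the entries sum to exactly 1.

```python
from collections import Counter
from fractions import Fraction


def degree_distribution(degree_list):
    """Map each degree i to the fraction of nodes in degree_list of degree i."""
    if not degree_list:
        raise ValueError("degree list must be non-empty")
    counts = Counter(degree_list)
    total = len(degree_list)
    return {degree: Fraction(count, total)
            for degree, count in sorted(counts.items())}
```

Applying the function to the left degree list gives Λ and applying it to the right degree list gives P.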
Note that for a graph with right node degree distribution P, at least half of the right nodes have degree M(P) or greater.
We define
Then it is easy to see that the quantity δ from (2) can be computed in terms of the graph's node degree distribution as:
APPENDIX B PROOF OF LEMMA 6
Proof: 
implying
This implies
Thus, any product m!n! in which n ≤ m < Z and m + n = Y can be increased by increasing m by 1 and decreasing n by 1 (which still preserves m + n = Y ).
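The monotonicity claim can be checked numerically. Shifting one unit from n to m multiplies the product by (m + 1)/n, which exceeds 1 whenever n ≤ m. The sketch below verifies this by brute force; it ignores the cap Z, which only limits how far the shift may be repeated.

```python
from math import factorial


def shifted_product_is_larger(m, n):
    """Check that (m + 1)! (n - 1)! > m! n! for a given split."""
    return factorial(m + 1) * factorial(n - 1) > factorial(m) * factorial(n)


def check_all_splits(total):
    """Verify the claim for every split m + n = total with 1 <= n <= m."""
    return all(shifted_product_is_larger(total - n, n)
               for n in range(1, total // 2 + 1))
```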
APPENDIX C PROOF OF LEMMA 6 CONTINUED
For the sake of simplicity, we will further loosen these bounds by upper bounding each of the factors a, b, c, and d. Each of these bounds is easily verified: a) We note that 
We can bound |Q a | by summing over our upper bound on Q i, j a :
We of course are not concerned with the probability of a bisection of size a, but rather with the probability of a bisection of size a or less. We denote the set of configurations with a bisection of size a or less by $Q_a^*$, and since $Q_a^* = \bigcup_{i=0}^{a} Q_i$:
We will now show that the expression in (11) is a nondecreasing function of a for $0 < a \le \frac{|E|-1}{2}$. Let the right side of the expression be denoted $d_a$; then it is easy to show that
is greater than or equal to 1. It is easy to show that
Expanding the binomial coefficients in the numerator and denominator and simplifying gives us
This quantity will be greater than or equal to 1 if |E| − a ≥ a + 1 and |E| − a ≥ σn − a. Note that a < σn (an assumption of our lemma) implies 2a < 2σn ≤ |E|. Since a and |E| are both integers, this implies 2a ≤ |E| − 1, from which we can see that the first inequality is satisfied. The second is satisfied by the fact that σn ≤ |E|. We thus observe that,
We note that the number of possible multi-graphs with our given node degree distribution is at least (δn + σ n)!. We can now bound the probability of the event B * a with:
where we have simply applied the upper bound for the size of B * a of (12). 
Proof: Since lim sup n→∞ f (n) < 0 and the sequence n i increases without bound, then for sufficiently large i , f (n i ) < −c for some c > 0. Then, for sufficiently large i ,
Clearly, lim i→∞ g (n i ) exp (−cn i ) = 0 and because g (n) is positive for large enough n,
for large enough i . The limit thus follows from the squeeze theorem.
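The mechanism behind this proof is that any polynomial factor is eventually overwhelmed by a decaying exponential. A small numerical illustration follows, using the hypothetical choices $g(n) = n^5$ and c = 0.01; neither value comes from the lemma. The computation is done in the log domain so that large n does not cause floating-point underflow.

```python
import math


def log_bounded_term(n, degree=5, c=0.01):
    """Return ln(n**degree * exp(-c * n)) for the illustrative g(n) = n**degree."""
    return degree * math.log(n) - c * n
```

The returned value is the logarithm of the upper bound used in the squeeze argument; it tends to minus infinity as n grows, so the bound itself tends to 0.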
Consider first a specific random configuration in the sequence with block length n and node degree distributions that result in values for δ and σ . We will use the bounds of Lemma 7 and then apply well known approximations. Firstly, we use the well-known bounds derived from Stirling's approximation [27, Question 5.8 ] that e (1+n ln( and that
. We use base e as opposed to base 2 in order to conveniently simplify the expressions that follow. Applying these bounds appropriately to the bound in Lemma 7, and grouping terms that grow polynomially into an arbitrary polynomial term g (n) we get that:
We now let a = βn, which will satisfy the condition specified in (3) for β < σ. We substitute H(1/2) = ln 2 and |E| = δn + σ n, and combine polynomial terms into g(n), and then use algebraic manipulation to give us:
By factoring the n term and by applying Lemma 14, we see that the above expression will approach 0 if
where we recall again that the dependence on i in this expression comes from the n terms and the δ and σ terms (whose dependence on i we have suppressed). This is true if
Also note that this is the condition on β given in (6). To derive the condition in (5), we take the limit of this expression as β approaches 0, treating the other terms as constants, giving us:
where we have applied the easily verifiable facts that $\lim_{x \to 0} H(x/c) = 0$ and $\lim_{x \to 0} x \ln\frac{x}{\sigma - x} = 0$ to get rid of the second and third terms in the expression. Thus, if this condition is satisfied, by the definition of a limit, there exists a sufficiently small β for which $\lim_{i \to \infty} P(B^*_{\beta n}) = 0$.
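The entropy facts used in this appendix can be checked numerically. The sketch assumes that H denotes the binary entropy function measured in nats (base e), which is consistent with the substitution H(1/2) = ln 2, and it reads the two limits as $H(x/c) \to 0$ and $x \ln(x/(\sigma - x)) \to 0$ as $x \to 0$. The function names are illustrative.

```python
import math


def entropy_nats(x):
    """Binary entropy H(x) in nats, with H(0) = H(1) = 0 by continuity."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)


def x_log_ratio(x, sigma):
    """Evaluate x * ln(x / (sigma - x)) for 0 < x < sigma."""
    if not 0.0 < x < sigma:
        raise ValueError("need 0 < x < sigma")
    return x * math.log(x / (sigma - x))
```

Evaluating both functions at small x, for instance x = 1e-9 with σ = 0.5, gives values close to 0, as the limits require.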
APPENDIX E PROOF OF LEMMA 10
In this proof, adapted with only slight differences from Thompson's proof [1, Th. 2], we show that if a circuit's graph has a C-bipartition width ω, then at least $\omega^2/4$ grid squares of the circuit are occupied. To do so, we adapt the zig-zag argument of Thompson and construct on the order of ω curves that C-bipartition the circuit, each of which must have ω connections crossing the curve, implying that there are close to ω uncounted nodes adjacent to the curve. The details require defining sequences of curves which increase the number of nodes on their left side by 1 each step, which we call the "initial sweep" sequences and the "zig-zag raise" sequences.
Let the grid of a circuit form a Cartesian coordinate system so that all nodes occupied are in the top left quadrant. Draw the smallest rectangle aligned with the circuit grid that encloses the circuit. All points outside this rectangle are considered outside the circuit. The top of this rectangle is the top of the circuit, and the bottom is the bottom of the circuit.
Definition 30: A zig-zag of width a is a curve drawn on a circuit composed of a vertical line starting outside the circuit leading to coordinate (x, y), a horizontal line connecting
Definition 31: A curve with a left corner at coordinate (x, y) can be indented at this coordinate by replacing the curve with a new curve where the edges connecting coordinates (x, y + 1) to (x, y) and then to (x + 1, y) are replaced with two edges connecting (x, y + 1) to (x + 1, y + 1) and (x + 1, y + 1) to (x + 1, y). The point (x + 1, y) is the bottom corner of the indentation.
The reader should refer to Figure 2 to see an example of a curve that is indented, and a labelling of the bottom corner of the resulting indented curve.
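The indentation operation of Definition 31 can be made concrete with a small sketch. It assumes a curve is stored as an ordered list of integer grid points, traversed so that a left corner at (x, y) is preceded by (x, y + 1) and followed by (x + 1, y); this representation and the function name are illustrative choices.

```python
def indent(curve, corner):
    """Indent a curve at a left corner, following Definition 31.

    Returns the new curve and the bottom corner (x + 1, y) of the indentation.
    """
    x, y = corner
    before, after = (x, y + 1), (x + 1, y)
    for i in range(1, len(curve) - 1):
        if curve[i] == corner and curve[i - 1] == before and curve[i + 1] == after:
            # Replace the corner point (x, y) by (x + 1, y + 1).
            new_curve = curve[:i] + [(x + 1, y + 1)] + curve[i + 1:]
            return new_curve, after
    raise ValueError("curve has no left corner at the given coordinate")
```

Because the bottom corner returned by one call is itself a left corner of the new curve whenever the curve continues horizontally, the function can be applied repeatedly to move the corner along a horizontal segment.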
Definition 32: An initial sweep of a circuit is a sequence of curves beginning with a width 1 zig-zag with left corner at location (0, 0). The left corner of the zig-zag is successively indented until a zig-zag with left corner at the top of the circuit is obtained. Then, this process is repeated starting with a width 1 zig-zag with left corner at (1, 0). This process is continued until a zig-zag in which all occupied grid squares are to its left is obtained. Figure 3 shows an example of the sequence of curves in an initial sweep for a small circuit.
The idea of an initial sweep is that the first curve has no circuit nodes to its left, and eventually the curve has all nodes to its left; in between, the number of nodes to the left of the curve increases by at most 1 each time. This means that there will be a C-bipartition of the circuit induced by one of the curves in the initial sweep, a consequence of property 1 of a zig-zaggable set of bipartitions.
Definition 33: A curve that C-bipartitions the circuit in an initial sweep is the initial curve. We let the left line of the initial curve have x-coordinate .
Note that the left-corner (at location (x, y)) of a width a zig-zag can be indented, resulting in a new curve. The resulting bottom corner can be indented again, a times in total, and the end result is a new zig-zag of width a (where a is a positive integer), this time with left-corner at (x, y + 1) (one unit higher than the initial zig-zag). The indenting process can be performed on this new zig-zag. This can be done repeatedly until a zig-zag with left-corner at the top of the circuit is obtained.
Definition 34: We call a curve resulting from such a sequence of indentations an indented zig-zag.
Definition 35: The sequence of curves generated by this sequence of indentations is called a zig-zag raise. The curves corresponding to a zig-zag raise for a small circuit are given in Figure 4.
Fig. 5. Adapted from [1, Figs. 3 and 4]. (a) An example of an initial curve with arrows crossing grid edges that could form possible connections between the left side and the right side. For all but the gray arrow, we can conclude that if a connection exists across the edge the arrow crosses, there is a unique grid square (which in the diagram contains a circle) occupied in the boundary column. Thus, if ω edges must cross the curve, at least ω − 1 nodes in the boundary column must be occupied. (b) An example of an indented zig-zag obtained from a zig-zag raise. The grid squares with circles in them are the grid squares in the boundary columns that are adjacent to an edge of the curve. Arrows cross edges where a connection between the left side and the right side can be made. For each black arrow, if a connection is there in the circuit, then the circle to which the arrow points must contain an occupied node. Note that there are at most 4 crossings that do not involve a node in a boundary column (which are denoted by gray arrows). Thus, if ω crossings must exist across the indented zig-zag, there must be at least ω − 4 circles occupied.
Now, consider performing a zig-zag raise starting from a zig-zag of width 3 with left-corner at (0, − 1). Let $S_1$ be the left-nodes of the first curve of the zig-zag raise and $S_3$ be the left-nodes of the last curve of the zig-zag raise. We let $S_2$ be the left-nodes of the initial curve. Since obviously $S_1 \subseteq S_2 \subseteq S_3$ and $S_2 \in C$, applying Property 2 of zig-zaggable bipartitions (see Definition 23) means that there must be some curve of the zig-zag raise that is a C-bipartition.
By the same argument, for any j , we can construct a width 2 j + 1 indented zig-zag that C-bipartitions the circuit, by starting with a zig-zag with left line at x = − j and performing a zig-zag raise. A possible indented zig-zag that results from this process for j = 2 is given in Figure 5 (b).
Definition 38: Two grid squares of a circuit are connected across a grid edge if they are adjacent at the grid edge and either they contain wires that are connected or one square contains a node attached to a wire in the other square. Such a pair of grid squares is called a connection.
Since each of these curves C-bipartitions the circuit, there must be at least ω connections across each curve. Figure 5 (a) shows that the initial curve must have at least ω − 1 grid squares occupied in its boundary column.
As well, Figure 5 (b) shows a width 5 indented zig-zag and shows that if ω edges must cross the indented zig-zag, then there must be at least ω − 4 grid squares in the boundary column occupied. This is because for all but 4 of the possible connections, if they are connected then this implies a unique grid square in the boundary column is occupied. It is then easy to generalize that an indented zig-zag of width 2k + 1 must have at least ω − 2k occupied nodes in its boundary columns.
Since the boundary columns of each of the bipartitions constructed do not intersect, summing up the lower bound on the number of grid squares occupied implies that the number of grid squares occupied in the circuit is bounded as:
The last inequality flows from the fact that either . We see in both cases that the expression in (15) is greater than ω 2 /4 for ω > 2. As Thompson observed in his proof of the theorem, the case when ω = 1 is trivial.
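The summation can be sanity-checked numerically. The sketch below adopts one plausible reading of the per-curve bounds stated above, namely ω − 1 occupied squares for the initial curve plus ω − 2k for the width 2k + 1 indented zig-zag for each k ≥ 1 with ω − 2k > 0. It is not a reconstruction of the exact expression in (15), and the function name is illustrative.

```python
def occupied_lower_bound(omega):
    """Sum the per-curve bounds: omega - 1, then omega - 2k while positive."""
    total = omega - 1          # initial curve
    k = 1
    while omega - 2 * k > 0:   # width 2k + 1 indented zig-zags
        total += omega - 2 * k
        k += 1
    return total
```

Under this reading the total is at least $\omega^2/4$ for every ω ≥ 2, in agreement with the lemma, while ω = 1 has to be handled separately.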
