Energy-efficient Decoders for Compressive Sensing: Fundamental Limits
  and Implementations by Li, Tongxin et al.
Tongxin Li1 Mayank Bakshi1 Pulkit Grover2
1The Chinese University of Hong Kong 2Carnegie Mellon University
Abstract
The fundamental problem considered in this paper is “What is the energy consumed for the implementation
of a compressive sensing decoding algorithm on a circuit?”. Using the “information-friction” framework
introduced in [1], we examine the smallest amount of bit-meters¹ as a measure for the energy consumed by
a circuit. We derive a fundamental lower bound for the implementation of compressive sensing decoding
algorithms on a circuit. In the setting where the number of measurements scales linearly with the sparsity
and the sparsity is sub-linear with the length of the signal, we show that the bit-meters consumption for
these algorithms is order-tight, i.e., it matches the lower bound asymptotically up to a constant factor. Our
implementations yield interesting insights into design of energy-efficient circuits that are not captured by
the notion of computational efficiency alone.
Keywords: energy-efficiency, compressive sensing, circuit implementation
I. Introduction
Compressive Sensing has emerged as an attractive paradigm in recent years [13, 14]. Motivated by
applications where processing the dataset without exploiting the underlying sparsity is prohibitively
expensive, compressive sensing aims to reduce the cost of processing through algorithms that take sparsity
into account. Initial work on compressive sensing showed that the number of measurements required to
sketch a signal of length n and sparsity k is O(k log n) [13, 14]. Subsequently, computationally efficient
algorithms for this problem have also been discovered [15–21]. The fastest of these algorithms use a
peeling-type decoder and have running time O(k) with O(k) measurements [20, 21].
In this paper, we adopt an energy-centric view of compressive sensing. Our motivation comes from
applications such as ad-hoc wireless networks [22], where decoding energy is of critical importance. In these
applications, since the decoder is often a battery-powered device, processing the received measurements
to obtain the desired reconstruction is a fundamentally limiting aspect of the system design. Notably, the
computationally efficient algorithms of [20, 21] are no longer order-optimal when the decoding energy
is the metric of interest. Therefore, we ask the question “What is the smallest amount of energy required to
decode a signal from its compressed measurements?”
As an exploratory work, we examine the problem in the information-friction framework. This framework
was introduced in [1] for finding a trade-off between the energy consumed in encoding/decoding processes
¹ The “bit-meters” metric was first proposed in [1], as an alternative to the VLSI model introduced by Thompson and others
in [2–7] (and explored further in [8–12]) for measuring the energy consumed in a circuit.
arXiv:1411.4253v4 [cs.IT] 16 Feb 2015
and transmission power in a communication system. In practice, it is reasonable to relate bit-meters
to the energy consumed by the decoding process, as [1] elaborates through several scenarios. We
first show that for a fixed precision Q, the required bit-meters (energy) for decoding a compressed signal
can be no smaller than $\Omega\left(\sqrt{\frac{nk}{\log n}}\sqrt{\log\frac{1}{P_e^{\mathrm{blk}}}}\right)$ asymptotically in the regime $m = \Theta(k)$ and $k = n^{1-\beta}$, where
$\beta \in (0, 1)$ is a constant. We show that this asymptotic lower bound is order-tight by giving two multi-stage
algorithms, for each of which the bit-meters are $O\left(\sqrt{\frac{nk}{\log n}}\sqrt{\log\frac{1}{P_e^{\mathrm{blk}}}}\right)$.
The rest of the paper is organized as follows. We begin with describing our model in Section II. The
main results of the paper are stated in Section III. The key ideas in the proofs of these results are outlined in
Section IV. Finally, the main parts of the proofs are described in Appendices A-C.
II. Background and Definitions
In this section, we formalize the models used in this paper.
A. Compressive Sensing
For compressive sensing, the input vector $X \in \mathbb{R}^n$ is a real-valued vector of length n. The linear encoding
process is represented by an encoding matrix $A \in \mathbb{C}^{m \times n}$. The vector $Y = AX \in \mathbb{C}^m$ of length m is the
corresponding output vector. Based on Y, a recovery vector $\hat{X} \in \mathbb{R}^n$ is decoded using a decoding algorithm.
For sparsity, we consider two basic models: probabilistic and combinatorial. We assume that the length of the
input vector is large, so that the two models are asymptotically equivalent². We adopt
the probabilistic model, in which entries are independent, to simplify the analysis. For the
purpose of presentation, the combinatorial model is used to give concise insight into both the lower
and the upper bounds.
Definition 1 (Sparsity Model (n,m,p)). A bounded length-n “compressible” vector $X \in \mathbb{R}^n$ is an Input Vector
each of whose entries $X_i \in \mathbb{R}$ satisfies $|X_i| \le U$ (with a constant upper bound $U \ge 0$) and has probability³ p of being
non-zero.
Definition 2 (Sparsity Model (n,m,k)). A bounded length-n “compressible” vector $X \in \mathbb{R}^n$ is a k-sparse⁴
Input Vector if it contains exactly k non-zero entries $X_i \in \mathbb{R}$ satisfying $|X_i| \le U$ (with a constant upper bound
$U \ge 0$).
In this paper, we focus on the asymptotic regime where k = k(n) and p = p(n) are such that $k = \omega(1) \cap o(n)$
and $p = \omega(1/n) \cap o(1)$, i.e., both k and np grow sub-linearly with n. Our theorem for upper bounds in
Section III is restricted to the sub-linear regime $k = n^{1-\beta}$ where $\beta \in (0, 1)$.
We define the average block error probability based on quantization and a given norm $\|\cdot\|_{\ell_q}$ of interest for
$0 \le q \le \infty$.
Definition 3 (Reconstruction Error, Precision, Average Block Error Probability). Given the Input Vector X
and Recovery Vector $\hat{X}$, the Reconstruction Error is defined as $\|X - \hat{X}\|_{\ell_q}$ and the Relative Error is defined
further as $\|X - \hat{X}\|_{\ell_q} / \|X\|_{\ell_q}$. Let Q denote the Precision, i.e., the required number of bits for reconstructing the
input vector X. Then the Average Block Error Probability is defined by $P_e^{\mathrm{blk}} = \Pr(E_{\mathrm{blk}} = 1)$, where $E_{\mathrm{blk}} = 1$ if
the relative error satisfies $\|X - \hat{X}\|_{\ell_q} / \|X\|_{\ell_q} > 2^{-Q}$, and $E_{\mathrm{blk}} = 0$ otherwise.
² Note that by the strong law of large numbers (see, for instance, the excellent textbook [23]), or even by the weaker
Chernoff bound in [24], the number of non-zero entries of the input vector X in the probabilistic sparsity model will
lie in a constant range containing k (e.g., [k/2, 3k/2]) except with an exponentially small probability $e^{-\Theta(k)}$, which is
negligible compared with the error probability achievable by the algorithms in Section IV-B with the upper bound on
bit-meters (see Definition 9) provided in Theorem 2 in Section III.
³ We assume p = o(1) as the sparsity assumption for the Sparsity Model (n,m,p).
⁴ We assume k = o(n) as the sparsity assumption for the Sparsity Model (n,m,k).
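As a minimal sketch of Definition 3 (not taken from the paper), the block error indicator can be computed as below. The function names `lq_norm` and `block_error`, the default q = 2, the restriction to q > 0, and the handling of an all-zero input vector are assumptions made purely for illustration.

```python
import math


def lq_norm(v, q):
    """l_q norm of a real vector for q > 0; q = math.inf gives the max norm."""
    if q == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** q for x in v) ** (1.0 / q)


def block_error(x, x_hat, Q, q=2):
    """Return E_blk: 1 if the relative error exceeds 2**-Q, else 0."""
    diff = [a - b for a, b in zip(x, x_hat)]
    denom = lq_norm(x, q)
    if denom == 0:
        # Assumption: for an all-zero input, any non-zero output is an error.
        return 0 if lq_norm(diff, q) == 0 else 1
    return 1 if lq_norm(diff, q) / denom > 2.0 ** (-Q) else 0
```

Averaging `block_error` over many random draws of X would give an empirical estimate of the average block error probability.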
B. Implementation Model
A decoding circuit has two functionalities: storing the output vector and processing it to obtain the recovery
vector.
We think of a decoding circuit as a “graph” whose nodes form a subset of the points of a two-dimensional
lattice $\Lambda \subset \mathbb{R}^2$. Each node can both store one real number and perform a computation. Nodes are connected
through undirectional links that represent the wiring in the circuit. Furthermore, each node on the lattice
Λ (i.e., the circuit) has a constant packing radius ρ(Λ) > 0 to ensure sufficient distance between the nodes
for real implementation⁵. For consistency with [1], we define the generalized circuit model
succinctly, following the same order of definitions as [1].
Definition 4 (ρ–Lattice). A lattice Λ is a ρ–Lattice if it spans $\mathbb{R}^2$ and the packing radius ρ(Λ) > 0 is at least
ρ.
Definition 5 (Substrate). A Substrate V is a compact subset of $\mathbb{R}^2$.
Definition 6 (Grid (Λ, V )). A Grid (Λ, V ) is a sub-lattice $\Lambda_V$ defined as the intersection of a Lattice Λ and a
Substrate V such that $\Lambda_V = \Lambda \cap V$.
Definition 7 (Decoding Circuit, Sub-Circuit, Computational Nodes, Input-nodes, Output-nodes). The
Substrate V together with a collection $S \subset \Lambda_V$ of points (called Computational Nodes, or simply Nodes)
inside the Grid $\Lambda_V$ is called a Decoding Circuit, and is denoted by DecCkt = (V, S). A Sub-Circuit, denoted
by $\mathrm{SubCkt} = (V^{\mathrm{sub}}, S^{\mathrm{sub}})$, is a bounded subset $V^{\mathrm{sub}}$ of V together with a subset of Computational Nodes
$S^{\mathrm{sub}} \subset \Lambda \cap V^{\mathrm{sub}}$. The Output-nodes are defined as the nodes for storing the Recovery vector $\hat{X}$, and the
Input-nodes are defined as the nodes for storing the received Output vector Y. Nodes can communicate
noiselessly with each other through undirectional Links, over which binary strings of messages are
transmitted. Nodes can also implement arithmetic calculations.
Note that we assume that each node (both input-node and output-node) stores exactly one entry⁶
of the corresponding vector. Moreover, we assume timing is available for the output-nodes, which means that
only non-trivial data need to be transmitted: if no data is received during a period of time,
the output-nodes automatically declare the corresponding entries in the recovery vector to be zero⁷.
Moreover, we assume that each node, including input and output nodes, can behave as a bridge for
communication between other nodes. Thus, any computation that can be performed by an intermediate
node can, in principle, be performed by an input or output node. Therefore, exactly n+m nodes are
sufficient for any decoding circuit. We assume that the computational nodes can communicate noiselessly
with each other through unidirectional links as defined above.
Definition 8 (Communication Distance). Given two nodes x and y, the Communication Distance D (x,y) is
the Euclidean distance ||x− y||2 between nodes x and y.
⁵ While we describe a more general framework here, for most of our results, it suffices to restrict our attention to square lattices.
⁶ It is reasonable to assume the one-to-one correspondence between nodes and entries. Another possible model may be a
one-to-one correspondence between bits and nodes, which is a possible direction for future work.
⁷ The assumption about timing is critical. In fact, if no timing is available, every output-node is required to receive at least 1 bit
over a constant communication distance, which implies a lower bound on bit-meters of Ω(n). This violates the spirit of compressive
sensing, as we want the energy (bit-meters) required to be sub-linear in n in light of the sparsity of the signal.
Let $S \subset \Lambda_V$ be a collection of nodes. The number of bits communicated between x, y ∈ S is denoted by
B (x,y) ≥ 0. We then define our fundamental measure of energy below.
Definition 9 (Bit-meters). The Bit-meters or µ (·) is a non-negative real-valued measure from the power set
over collections of nodes P (S) to the extended real number line satisfying the following properties:
1) µ(∅) = 0.
2) For every x,y ∈ S, µ (x,y) = B (x,y)×D (x,y).
3) For every subset E ⊂ S, µ (E) = ∑x,y∈E B (x,y)×D (x,y)/2.
4) For every Decoding Circuit (or Sub-circuit) DecCkt = (V, S), µ (DecCkt) = µ (S).
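To make Definition 9 concrete, the following sketch computes the bit-meters of a small set of communicating nodes. The function name `bit_meters` and the dictionary representation (one entry per unordered pair of nodes) are assumptions for illustration. Storing each unordered pair once plays the role of the factor 1/2 in property 3, which compensates for counting every pair twice in the sum over x, y ∈ E.

```python
import math


def bit_meters(bits):
    """Sum of B(x, y) * D(x, y) over unordered node pairs.

    `bits` maps a pair of node coordinates ((x1, x2), (y1, y2)) to the number
    of bits B(x, y) exchanged between the two nodes; D is Euclidean distance.
    """
    total = 0.0
    for (x, y), b in bits.items():
        total += b * math.dist(x, y)
    return total


# Hypothetical example on a unit square lattice: 2 bits travel a distance of 5
# and 5 bits travel a distance of 1.
example = {((0, 0), (3, 4)): 2, ((0, 0), (1, 0)): 5}
total_cost = bit_meters(example)
```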
Our main goals are to obtain asymptotic lower and upper bounds on bit-meters for decoding circuits
implementing compressive sensing algorithms, i.e., find the bounds on µ (DecCkt) as n→∞. Note that
all the discussions are based on the implementation model summarized below.
Definition 10 (Implementation Model (ρ,µ) ). Implementation Model (ρ, µ) denotes the pair of Decoding
Circuit DecCkt defined on a ρ–Lattice and Bit-meters µ.
III. Main Results
Let R = m/n be the rate of compressive sensing. As our first result, we obtain the following general
lower bound on bit-meters.
Theorem 1 (Lower Bound). Consider the Sparsity Model (n,m, p). For any encoding matrix A and decoding
circuit DecCkt implemented on the Implementation Model (ρ, µ), we have:
1) The average block error probability P blke ≥ 0.
2) The bit-meters
µ(DecCkt) ≥ ρC0
24
√
2
n
(
1
4
−R
)
pQ
√(
1
2R
− 1−R
2 + 2R
(1 +R)2
)√
logp 10P
blk
e . (1)
As a consequence of the above theorem, we derive the following corollary that states the asymptotic
scaling of the lower bound with respect to n. This serves as a benchmark for our algorithm design
subsequently.
Corollary 1 (Scaling for Lower Bound). Consider the Sparsity Model (n,m, k). Assume the precision
Q = Θ (1). For any encoding matrix A and decoding circuit DecCkt implemented on the Implementation
Model (ρ, µ), we have:
1) The average block error probability $P_e^{\mathrm{blk}} \ge 0$.
2) The bit-meters
$$\mu(\mathrm{DecCkt}) = \Omega\left(\sqrt{\frac{k^2}{R \log n}}\,\min\left(\sqrt{k},\ \sqrt{\log\frac{1}{P_e^{\mathrm{blk}}}}\right)\right).$$
For the regime of interest, $m = \Theta(k)$, $k = n^{1-\beta}$ where $\beta \in (0, 1)$, and $P_e^{\mathrm{blk}} = e^{-\Omega(m)}$, we derive the
following upper bound, which matches the order of bit-meters in Corollary 1.
Theorem 2 (Upper Bound). Consider the Sparsity Model (n,m, k). Let $k = n^{1-\beta}$ for some $\beta \in (0, 1)$. Then there
exist an encoding matrix A and a decoding circuit DecCkt implemented on the Implementation Model (ρ, µ)
such that
1) The average block error probability $P_e^{\mathrm{blk}} = O\left(1/\sqrt{k}\right)$.
2) The number of measurements m = Θ(k).
3) The precision Q = Θ (1).
4) The bit-meters $\mu(\mathrm{DecCkt}) = O\left(\sqrt{nk}\right)$.
Fig. 1. This conceptual graph illustrates the idea of the derivation of the lower bound. The figure contains two types of sub-circuits:
locally decodable sub-circuits and non-locally decodable sub-circuits. In detail, the number of sub-circuits is L = 9 and m = 17,
thus 2m/L = 38/9, and using Lemma 3 the classification of sub-circuit types is shown in this figure using different colors.
As a result, the bit-meters are bounded from below as Theorem 1 states.
As a consequence of Corollary 1 and the above upper bound, we state the following corollary as a
conclusion.
Corollary 2 (Order-tight Bound). Consider the Sparsity Model (n,m, k). Let $k = n^{1-\beta}$ for some $\beta \in (0, 1)$
and m = Θ (k). There exist an encoding matrix A and a decoding circuit DecCkt implemented on the
Implementation Model (ρ, µ) such that
$$\mu(\mathrm{DecCkt}) = \Theta\left(\sqrt{\frac{nk}{\log n}}\sqrt{\log\frac{1}{P_e^{\mathrm{blk}}}}\right).$$
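The following numerical sketch checks that the order-tight expression of Corollary 2 agrees with the O(√(nk)) bit-meters of Theorem 2. It assumes, for illustration only, that the block error probability equals exactly 1/√k (Theorem 2 gives only its order) and that all hidden constants are 1. Under these assumptions, log(1/P) = ((1 − β)/2) log n, so the ratio of the two expressions is the constant √((1 − β)/2). The function names are hypothetical.

```python
import math


def order_tight_expression(n, beta):
    """sqrt(nk / log n) * sqrt(log(1 / P)) with k = n**(1 - beta), P = 1 / sqrt(k)."""
    k = n ** (1.0 - beta)
    p_err = 1.0 / math.sqrt(k)  # assumed constant 1 in the O(1/sqrt(k)) bound
    return math.sqrt(n * k / math.log(n)) * math.sqrt(math.log(1.0 / p_err))


def sqrt_nk(n, beta):
    """sqrt(nk) with k = n**(1 - beta), the order stated in Theorem 2."""
    return math.sqrt(n * n ** (1.0 - beta))


def scaling_ratio(n, beta):
    """Ratio of the two expressions; independent of n under the assumptions above."""
    return order_tight_expression(n, beta) / sqrt_nk(n, beta)
```

For any n, `scaling_ratio(n, beta)` should stay close to `math.sqrt((1 - beta) / 2)`, which is the sense in which the two orders coincide.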
In the next section, we give an overview of the proofs. The detailed proofs can be found in Appendices A-C.
IV. Main Ideas
A. Lower Bound
In this section, we describe the main ideas behind the derivation of the lower bound. Using the “Stencil-partition”
idea introduced by [1], we divide the entire circuit into several sub-circuits⁸ and find the
minimal number of bits communicated between sub-circuits.
Definition 11 (Stencil-partition). For any point u ∈ Λ, a Stencil(λ, η,u) on Implementation Model(ρ(Λ), µ)
consists of the following:
1) A Sub-lattice Λ0 ⊂ Λ with order of quotient |Λ/Λ0| = λ.
2) The outer parts of sub-circuits induced by the cosets u + Λ0 = {u + v : v ∈ Λ0}.
3) The inner parts of sub-circuits induced by scaling each outer part using a fractional parameter η.
Let the i-th sub-circuit have $m_i$ input-nodes and $n_i$ output-nodes within the outer part of the sub-circuit. Let the i-th
sub-circuit have $m_i^{\mathrm{inside}}$ input-nodes and $n_i^{\mathrm{inside}}$ output-nodes within the inner part of the sub-circuit⁹.
⁸ Note that since we define the circuit using a lattice framework, it is natural to define the Stencil-partition using a sub-lattice. Hence we
sometimes call each sub-circuit a “Parallelepiped” in the lemmas in Appendix A.
Fig. 2. This conceptual graph illustrates one possible centralized arrangement of input-nodes and output-nodes on an implementation
circuit, whereby the lower bound on bit-meters could be derived. Within the decoding circuit, all k input-nodes are located in the
central part and all n output-nodes are put in the surrounding area of the central part.
Figure 1 shows the geometric ideas. We first use a stencil to divide the decoding circuit into several
sub-circuits. Each sub-circuit consists of an inner part and outer bound (see Figure 11 for more details).
Next, based on the ratio of numbers of input-nodes and output-nodes inside the sub-circuit, we define two
types of sub-circuits: locally decodable sub-circuits and non-locally decodable sub-circuits. Then we argue
that the fraction of inner sub-circuits whose output-nodes can be fully decoded using the input-nodes
within itself is a constant smaller than one. We mainly focus on the second type of sub-circuits, since
these sub-circuits do not have enough information to fully decode all output-nodes from the input-nodes
within the sub-circuits. By using Fano’s inequality [25] we finally deduce that the inner parts of non-locally
decodable sub-circuits must communicate with other sub-circuits, giving a bound on bit-meters stated in
Theorem 1.
B. Upper Bounds
In this section, we explore different constructions for the implementation-circuits of compressive sensing
algorithms. The basic issues here are the locations of the two types of nodes (input and output) and
how they communicate with each other. We consider two types of algorithms: algorithms with centrally
located input-nodes, and algorithms involving a distributed arrangement of nodes.
In our regime of interest, i.e., m = Θ(k), the latter design always dominates the former. This gives
us the insight that local decoding helps significantly in reducing the energy consumed and approaching
order-optimal performance.
1) Centralized-Decoding Algorithms: Centralization of input-nodes is perhaps the simplest construction
possible. An interesting intuition here is that this design is better for algorithms that have a relatively
higher decoding complexity and a larger number of measurements and thus require a larger number of
input-nodes to talk to each other frequently. For such algorithms, centralization improves the performance in
terms of the error probability by enabling greater cooperation between nodes. Figure 2 shows the idea
of centralization.
⁹ If any computational node lies on the boundary of two outer parts of sub-circuits, then it is arbitrarily included in one of them.
Fig. 3. This graph illustrates the flow of the distributive-decoding algorithms. We decode local information stage by stage and end
with a clearing stage to improve performance. As stated in Theorem 2, under our assumptions and in our regime of interest
m = Θ(k), the bit-meters are bounded by $\Theta\left(\sqrt{nk}\right)$.
However, for algorithms with relatively sparse encoding matrices such that the average block error
probability $P_e^{\mathrm{blk}}$ is of the order $P_e^{\mathrm{blk}} = e^{-\Omega(m)}$, we claim that the centralized design involves a “gap”
between the bit-meters consumed and the scaling lower bound stated in Corollary 1, as the following
arguments indicate.
Consider the Sparsity Model (n,m, k), any encoding matrix A, and any decoding circuit DecCkt implemented
on the Implementation Model (ρ, µ), and assume precision Q = Θ (1). Let $S_{\mathrm{Input}}$ be the set containing all
input-nodes and $S_{\mathrm{Output}}$ be the set containing all output-nodes. Suppose we centralize the input-nodes so that
for any $x \in S_{\mathrm{Input}}$, there is a positive $\rho_0 = \Theta(\sqrt{m})$ such that the ball of radius $\rho_0$ centered at x contains $S_{\mathrm{Input}}$.
Since there are n output-nodes, for $x \in S_{\mathrm{Input}}$ and $y \in S_{\mathrm{Output}}$ chosen uniformly at random, the
expected communication distance is $E[D(x, y)] = \Theta(\sqrt{n})$. Since at least Θ (k) bits of information have to be
transmitted from input-nodes to output-nodes, we obtain a lower bound on bit-meters $\mu(\mathrm{DecCkt}) = \Omega(k\sqrt{n})$.
Now, based on Corollary 1, there are two cases: average block error probability $P_e^{\mathrm{blk}} = e^{-\Omega(m)}$ and
$P_e^{\mathrm{blk}} = e^{-O(m)}$. For the first case, we have $\mu(\mathrm{DecCkt}) = \Omega\left(k\sqrt{\frac{nk}{m \log n}}\right)$; for the second case, we have
$\mu(\mathrm{DecCkt}) = \Omega\left(k\sqrt{\frac{n}{m \log n}}\sqrt{\log\frac{1}{P_e^{\mathrm{blk}}}}\right)$, which implies $\mu(\mathrm{DecCkt}) = \Omega\left(k\sqrt{\frac{n}{\log n}}\right)$. Therefore, for both cases,
we conclude that the centralization of input-nodes is not able to achieve an order-tight upper bound on
bit-meters.
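The Θ(√n) expected distance used in the centralized argument can be illustrated numerically. The sketch below is a simplification, not the paper's construction: it assumes a square lattice of side √n, collapses the central cluster of input-nodes to the single center point, and charges one bit per unit of the k bits that must reach the output-nodes. All function names are hypothetical.

```python
import math


def mean_distance_to_center(side):
    """Average Euclidean distance from the center of a side x side square grid."""
    c = (side - 1) / 2.0
    total = sum(math.hypot(i - c, j - c)
                for i in range(side) for j in range(side))
    return total / (side * side)


def centralized_bit_meters_estimate(n, k):
    """Rough estimate k * E[D]: k bits sent from the center to random output-nodes."""
    side = math.isqrt(n)
    return k * mean_distance_to_center(side)
```

In this simplified model the ratio `mean_distance_to_center(side) / side` settles near a constant as the grid grows, so the estimate grows like k√n, in line with the Ω(k√n) bound in the text.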
2) Distributive-Decoding Algorithm: We propose two energy-efficient compressive sensing algorithms
shown in Figure 3. Both use the idea of local decoding to reduce the bit-meters required. Instead of
arranging all the input-nodes in the central part of the circuit, we distribute them throughout the circuit
with carefully designed algorithms. As an intuition, since a large fraction of the communication is carried
out only within a small region, the consumed energy is reduced significantly. Leaving the formal definitions to
Section V and Appendix B, we give brief descriptions of the algorithms in Section V in
a stage-by-stage manner, accompanied by schematic graphs. The analysis of the performance is given in
Appendix C.
V. Algorithms Description
A. Chain Algorithm (CA)
We describe the Chain Algorithm (CA) stage by stage as Figure 4 shows.
1) First Stage: The input vector of length n is first divided into $C_{\mathrm{CA}}k$ groups, which are compressed
separately. $C_{\mathrm{CA}}$ is a constant chosen so as to achieve a desired error probability $P_e^{\mathrm{blk}}$, with details provided
in Appendix C. Each group contains $n/(C_{\mathrm{CA}}k)$ entries. The decoding process is performed independently
for each group. Thus, the corresponding decoder only needs to process these groups locally, i.e., it only
needs to communicate within the local sub-circuits for the corresponding groups. Intuitively, this method
leads to savings in energy since the distances are greatly reduced for most of the
communication between nodes. We use a constant number c of measurements (the measurement constant) for each group.
For each group, our decoding algorithm aims to resolve the non-zero entry if the group contains exactly one
non-zero entry. As a result, a constant proportion ρ (depending on $C_{\mathrm{CA}}$ and c) of the total k non-zero
entries can be located and solved within Q bits of precision with high probability. Figure 5 illustrates
the partition and a possible way to construct the encoding matrix for finding the single non-zero entry in
the input vector.
Define $\phi = \lceil 1/(1-\rho) \rceil$ as a parameter for the remaining stages.
Fig. 4. This figure shows the first three stages of the Chain Algorithm (CA), with the number of nodes contained in each sub-circuit
increasing from 4 to 8 and 16.
Fig. 5. Assuming $C_{\mathrm{CA}} = 1$ and c = 3, this schematic graph for the 1st stage illustrates the division of the input vector and
the construction of the encoding matrix A. In the first box, colored calamine blue, we exemplify part of the Identification Phase
with a group of five left nodes and three right nodes. Note that for each node on the right, the weights of the edges connected to
it should be made unique, which is the requirement of the Identification Phase. In the Verification Phase, the connections are kept
the same, whereas the weights $e^{\iota\Theta^{V}_{i,j}}$ for non-zero entries in the encoding matrix A have phases $\Theta^{V}_{i,j}$ chosen
uniformly in $[0, \pi/2]$. (Here we keep the notation of [20], where i and j are indices of entries in the
encoding matrix A, and ι denotes the positive square root of −1 in order to avoid confusion.) The constructions are
elaborated in Section V-C on the set-up of the encoding matrix.
2) Second Stage up to the $\log_\phi(k/\log_2 k)$-th Stage: For i > 1, in the i-th stage we combine $\phi = \lceil 1/(1-\rho) \rceil$ of the
groups coming from the (i − 1)-th stage. Note that in the i-th stage each group is only processed
in a local region with area of order approximately $\phi^{i-1}n/(C_{\mathrm{CA}}k)$. Thus, in the i-th stage, $C_{\mathrm{CA}}k/\phi^{i-2}$ groups
from the (i − 1)-th stage merge into $C_{\mathrm{CA}}k/\phi^{i-1}$ new groups. Each new group contains $\phi^{i-1}n/(C_{\mathrm{CA}}k)$ entries
of the input vector. Next, the decoding algorithm from the first stage V-A1 is applied to each new
group as a whole, i.e., as in the first stage, the corresponding decoders need to handle the information
for each group only. This algorithm continues for $\log_\phi(k/\log_2 k)$ stages, using the same measurement
constant c as the number of measurements for each group. As a result, in the i-th stage approximately a ρ
proportion of the total $C_{\mathrm{CA}}k/\phi^{i-1}$ non-zero entries can be located and solved within Q bits of precision with
high probability, of order $1 - \sqrt{1/k}$. Figure 6 illustrates the combination process.
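The group bookkeeping of the first stages can be tabulated with a short sketch. This is an illustration under stated assumptions, not the paper's Algorithm 1: the function name `chain_schedule` is hypothetical, `log2 k` is read as the base-2 logarithm, the stage count is rounded down to an integer, and group counts are left as real numbers rather than rounded.

```python
import math


def chain_schedule(n, k, c_ca=1, phi=2):
    """List (stage, number_of_groups, entries_per_group) for the local stages.

    Stage i has c_ca * k / phi**(i-1) groups of phi**(i-1) * n / (c_ca * k)
    entries each, so every stage still covers all n entries.
    """
    # Assumption: number of local stages is floor(log_phi(k / log2(k))).
    num_stages = max(1, math.floor(math.log(k / math.log2(k), phi)))
    schedule = []
    for i in range(1, num_stages + 1):
        groups = c_ca * k / phi ** (i - 1)
        entries = phi ** (i - 1) * n / (c_ca * k)
        schedule.append((i, groups, entries))
    return schedule
```

For example, with n = 2**20, k = 2**10 and φ = 2 the group size doubles and the number of groups halves from one stage to the next, while the product of the two stays equal to n.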
Fig. 6. The schematic graph for the 2nd stage illustrates the combination processes in the coming stages. The groups in the left
block, colored gray, are combined two-by-two (assume ρ = 1/2 and set φ = 2) to form the new groups in the right block, colored
light blue. Note that, in the same way as the first stage exemplified in Figure 5, both the Identification Phase and the Verification
Phase are also implemented for each of the new groups with the same number of measurements c as in the previous stages.
Fig. 7. The schematic graph for the last stage illustrates the clearing process. This stage concludes the previous stages using
Σ(log2 k) measurements, and uses global communication between nodes for decoding the entire length-n input vector such that
all the remaining non-zero entries are resolved with high probability, with error probability of order $\sqrt{1/k}$.
3) $\left(\log_\phi(k/\log_2 k) + 1\right)$-th Stage (Clearing Stage): After $\log_\phi(k/\log_2 k)$ stages, the algorithm stops forming
groups of nodes and instead globally decodes the remaining unsolved non-zeros of the input vector X, given the
information from the previous stages V-A1 and V-A2 along with $\Theta(\sqrt{k})$ new measurements. In contrast to
the previous stages, each computed value is potentially communicated across the entire decoding circuit.
This helps improve the performance with respect to the error probability. Overall, the algorithm achieves
an average block error probability $P_e^{\mathrm{blk}}$ of order $\sqrt{1/k}$ and consumes bit-meters of order
$\sqrt{\frac{nk}{\log n}}\sqrt{\log\frac{1}{P_e^{\mathrm{blk}}}}$.
B. Shotgun Algorithm (SA)
Next, we describe the Shotgun Algorithm (SA) stage by stage as Figure 8 shows.
1) First Stage: In a similar way to the Chain Algorithm V-A, the input vector of length n is first divided into
$C_{\mathrm{SA}}k$ groups, which are compressed separately. $C_{\mathrm{SA}}$ is a constant for ensuring a desired error probability
$P_e^{\mathrm{blk}}$. Each group contains $n/(C_{\mathrm{SA}}k)$ entries. The decoding process is performed independently for each
group. Thus, the corresponding decoders only need to decode these groups locally, i.e., they only need to
communicate within the local sub-circuits for the corresponding groups. The number of measurements
for each group equals the measurement constant c′. As a result, a constant proportion σ of the total k
non-zero entries (i.e., a total of σk entries) can be located and solved within Q bits of precision with
high probability.
Define $\varphi = \lceil 1/(1-\sigma) \rceil$ as a parameter for the remaining stages.
Fig. 8. This figure shows the first three stages of the Shotgun Algorithm (SA), with the number of nodes contained in each sub-circuit
increasing from 4 to 8 and 16. Note that the difference between SA and CA is that the sub-circuits for SA in each stage are chosen
uniformly at random instead of by combining previous sub-circuits.
2) Second Stage up to the $\log_\varphi(k/\log_2 k)$-th Stage: In the i-th stage we combine $\varphi = \lceil 1/(1-\sigma) \rceil$ of the groups
coming from the (i − 1)-th stage by choosing them uniformly at random. Note that the combination
is performed independently for each stage, and in the i-th stage the area spanned by each group is of
order approximately $\varphi^{i-1}n/(C_{\mathrm{SA}}k)$. Thus, in the i-th stage, $C_{\mathrm{SA}}k/\varphi^{i-1}$ new groups are formed. Each new
group contains $\varphi^{i-1}n/(C_{\mathrm{SA}}k)$ entries of the input vector. The decoding algorithm for each group is the
same as that of the first stage of the Chain Algorithm (CA) in Section V-A. As in the first stage of CA,
the corresponding decoders need to handle the information locally. The algorithm continues up to the
$\log_\varphi(k/\log_2 k)$-th stage. As a result, at the end of the $\log_\varphi(k/\log_2 k)$-th stage, approximately log k unsolved
non-zero entries remain with high probability, of order $1 - \sqrt{1/k}$.
3) $\left(\log_\varphi(k/\log_2 k) + 1\right)$-th Stage (Clearing Stage): After $\log_\varphi(k/\log_2 k)$ stages, the algorithm stops combining
groups and globally decodes the remaining unsolved non-zeros in the input vector X, given the information
from the previous stages V-B1 and V-B2. Similar to the last stage of the Chain Algorithm, as all the
information is potentially communicated across the entire decoding circuit, the error probability is
decreased. In fact, overall, the algorithm achieves an average block error probability $P_e^{\mathrm{blk}}$ of order $\sqrt{1/k}$
while consuming bit-meters of order $\sqrt{\frac{nk}{\log n}}\sqrt{\log\frac{1}{P_e^{\mathrm{blk}}}}$.
To summarize, the Chain Algorithm combines local sub-circuits sequentially and accumulates the
information to resolve the input vector X. While the performance of this algorithm is better than
that of the Shotgun Algorithm (SA), a drawback of the Chain Algorithm is that as n increases, the computation
required from the central nodes within each local sub-circuit also increases. In contrast, for SA, except
for the clearing stage, every node has the same functionality and the decoder merely needs to decode
a possible single non-zero entry in each local sub-circuit. The performance of CA and SA is stated in
Theorem 2. Note that it matches the lower bound in Corollary 1 when m = Θ (k).
C. Choice of Encoding Matrices
For our Chain Algorithm (CA) introduced in Section V-A, the encoding matrix $A_{\mathrm{CA}}$ is constructed as
shown in Figure 9. Let $m_i$ denote the total number of measurements for the i-th stage.
One possible way to construct the entries of the encoding matrix is by choosing c = 2 and setting $\vec{a}_{i,j} = 0$
if the j-th item is not inside the i-th sub-circuit, and otherwise $\vec{a}_{i,j} = [e^{\iota\Theta^{I}_{i,j}}, e^{\iota\Theta^{V}_{i,j}}]^T$ with $\Theta^{I}_{i,j} = \pi i j/(2n^2)$ and $\Theta^{V}_{i,j}$
chosen uniformly at random from $[0, \pi/2]$, where ι denotes the positive square root of −1. Therefore, the
total number of measurements m satisfies $m = c\sum_{i=1}^{M} m_i$ for some measurement constant c¹⁰.
Fig. 9. This figure shows in detail the construction of the encoding matrix $A_{\mathrm{CA}}$, where φ > 1 is a constant defined in Section V-A
and the number of stages is $M = \log_\phi(k/\log_2 k) + 1$.
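To illustrate how a measurement pair of the form above can resolve a group that contains a single non-zero entry, here is a hedged sketch. It is not the decoder of [20] or the paper's Algorithm 1: the function names, the tolerance `tol`, the seeded random phases, and the sign-handling rule (a negative real value shifts the identification phase by π) are assumptions made for illustration. Indices j are 1-based, as in the text.

```python
import cmath
import math
import random


def draw_verification_phases(n, seed=0):
    """Hypothetical helper: one verification phase per entry, uniform in [0, pi/2]."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, math.pi / 2) for _ in range(n)]


def measure_group(x, group, row, n, theta_v):
    """Identification and verification measurements of the entries in `group`."""
    y_i = sum(x[j - 1] * cmath.exp(1j * math.pi * row * j / (2 * n * n)) for j in group)
    y_v = sum(x[j - 1] * cmath.exp(1j * theta_v[j - 1]) for j in group)
    return y_i, y_v


def decode_singleton(y_i, y_v, group, row, n, theta_v, tol=1e-9):
    """Return (j, value) if the pair is consistent with one non-zero entry, else None."""
    if abs(y_i) < tol:
        return None
    phase, sign = cmath.phase(y_i), 1.0
    if phase < 0:  # a negative real value shows up as a phase shifted by -pi
        phase, sign = phase + math.pi, -1.0
    j = round(phase * 2 * n * n / (math.pi * row))  # invert Theta^I = pi*row*j/(2n^2)
    if j not in set(group):
        return None
    value = sign * abs(y_i)
    # Verification: the second measurement must agree with the candidate (j, value).
    if abs(y_v - value * cmath.exp(1j * theta_v[j - 1])) > tol:
        return None
    return j, value
```

A group with two or more non-zero entries will typically fail the verification check, which is what lets a later stage or the clearing stage take over.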
For our Shotgun Algorithm (SA) introduced in section V-B, the encoding matrix ASA is generated
in a similar manner as above. The only difference between ACA and ASA is that the sub-matrices Ai
(i = 1, 2, . . . ,M ) are no longer related like ACA in Figure 9 since the covering sub-circuits are chosen
randomly.
D. Decoding Steps
For the Chain Algorithm (CA), the decoding circuit DecCkt stores m received output entries $\vec{Y}_1, \vec{Y}_2, \ldots, \vec{Y}_m$
in m input-nodes. Suppose $C_{\mathrm{CA}} = 1$. The decoding starts from $\vec{Y}_1$ by checking each group (starting with
the group $\{X_1, X_2, \ldots, X_{n/k}\}$) to determine if the group contains at most one non-zero entry. If so, the
recovery vector is updated; otherwise, the involved input-nodes transmit the corresponding entries of the output
vector to the input-nodes of a larger sub-circuit containing the current one. This continues until there is
a feasible solution for the resulting linear equations. The entire process is written formally as
Algorithm 1 in Appendix B.
On the other hand, for the Shotgun Algorithm (SA), the input-nodes need not pass information to
subsequent stages. The decoding circuit DecCkt stores m received output entries $Y_1, Y_2, \ldots, Y_m$ in m
input-nodes. Similarly to CA, we suppose $C_{\mathrm{SA}} = 1$, and the first decoding step starts with $Y_1$ by checking if
the group $\{X_1, X_2, \ldots, X_{n/k}\}$ contains at most one non-zero entry. After that, in the later stages, the size
of the sub-circuits increases by a constant factor $\varphi$ for each stage, and input-nodes check if the resulting group
contains at most one non-zero entry. The entire algorithm is written formally as Algorithm 2 in
Appendix B. The analyses of CA and SA are provided in Appendix C and follow the analysis in [20].
Appendix A
Proofs of Lower Bound
Let the packing density of a lattice Λ that spans $\mathbb{R}^2$ be $\sigma(\Lambda) = \pi\rho(\Lambda)^2/\det(\Lambda)$, with det(Λ) denoting the volume of the
fundamental parallelepiped of Λ. Based on Definition 11, we derive the following lemma stating the relationship
between the packing radii of Λ and $\Lambda_0$.
¹⁰ The measurement constant c can be tuned to achieve a desired block error probability $P_e^{\mathrm{blk}}$ by using some additional measurements
to verify the linear equations.
Fig. 10. The Proof Map of Lower Bound.
Lemma 1. Let L denote the number of sub-circuits produced by the Stencil-partition. Then the packing radius $\rho(\Lambda_0)$ of the
outer part of the sub-circuits is given by
\[
\rho(\Lambda_0) = \sqrt{\frac{\sigma(\Lambda_0)(n+m)}{\sigma(\Lambda)L}}\,\rho(\Lambda).
\]
Proof: By Definition 10 of the Implementation Model (ρ(Λ), µ), the lattice Λ has a packing radius
ρ(Λ) > 0, and since it is a 2-D lattice, its packing density is
\[
\sigma(\Lambda) = \frac{\pi\rho^2(\Lambda)}{\det(\Lambda)}.
\]
Similarly, the sub-lattice $\Lambda_0$ has a positive packing density $\sigma(\Lambda_0) = \pi\rho^2(\Lambda_0)/\det(\Lambda_0) > 0$.
Moreover, the cardinality of the quotient $\Lambda/\Lambda_0$ equals $\det(\Lambda_0)/\det(\Lambda)$. Since $L = (n+m)/|\Lambda/\Lambda_0|$,
we have $\det(\Lambda_0)/\det(\Lambda) = (n+m)/L$, and dividing the two density expressions gives
\[
\rho(\Lambda_0) = \sqrt{\frac{\sigma(\Lambda_0)(n+m)}{\sigma(\Lambda)L}}\,\rho(\Lambda).
\]
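As a sanity check of Lemma 1, the following minimal Python sketch evaluates the formula for a square lattice. The choice Λ = Z² with sub-lattice sZ² (so that ρ(Λ) = 1/2, ρ(Λ0) = s/2, det(Λ) = 1 and det(Λ0) = s²), the node count, and the function names are illustrative assumptions, not part of the Implementation Model.

```python
import math

def packing_density(radius, det):
    """Packing density sigma = pi * rho^2 / det of a 2-D lattice."""
    return math.pi * radius ** 2 / det

def sublattice_radius(sigma_sub, sigma, num_nodes, num_subcircuits, radius):
    """Right-hand side of Lemma 1: sqrt(sigma_0 (n+m) / (sigma L)) * rho."""
    return math.sqrt(sigma_sub * num_nodes / (sigma * num_subcircuits)) * radius

# Illustrative assumption: Lambda = Z^2 and Lambda_0 = s * Z^2.
s = 3
rho, det = 0.5, 1.0                 # packing radius and determinant of Z^2
rho_sub, det_sub = s / 2, s ** 2    # the same quantities for s * Z^2
num_nodes = 144                     # n + m nodes, one per lattice point
num_subcircuits = num_nodes * det / det_sub   # L = (n + m) / |Lambda / Lambda_0|

predicted = sublattice_radius(packing_density(rho_sub, det_sub),
                              packing_density(rho, det),
                              num_nodes, num_subcircuits, rho)
```

Under these assumptions `predicted` agrees with the directly computed packing radius s/2 of the sub-lattice.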
Lemma 2. Consider the Implementation Model (ρ, µ). For any fractional parameter η > 0, there exists a point
u ∈ Λ for Stencil (λ, η, u) such that the number of output-nodes covered by the Stencil is bounded from below by
\[
\sum_{i=1}^{L} n_i^{\mathrm{inside}} \ge n(1-2\eta)^2. \tag{2}
\]
Proof: Note that $n(1-2\eta)^2$ is the expected number of output-nodes covered by the Stencil if the point
u ∈ Λ is uniformly distributed. Thus there exists at least one point u that satisfies the bound in (2).
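The averaging argument can be illustrated with a small Python sketch. It assumes, purely for illustration, a square Stencil with cell side `s` on an integer grid and an inner margin of `margin = η·s` grid points on every side; the paper's Stencil is defined on general parallelepipeds, and the names below are hypothetical.

```python
import itertools
import random

def covered(nodes, s, margin, offset):
    """Count nodes that fall in the inner part of their cell for offset u."""
    ux, uy = offset
    return sum(
        margin <= (x - ux) % s < s - margin and margin <= (y - uy) % s < s - margin
        for x, y in nodes
    )

random.seed(0)
s, margin = 8, 1                     # cell side and inner margin
eta = margin / s                     # fractional parameter eta = 1/8
nodes = random.sample(list(itertools.product(range(32), repeat=2)), 200)

# Try every offset u of the Stencil and record how many nodes it covers.
counts = [covered(nodes, s, margin, u) for u in itertools.product(range(s), repeat=2)]
average = sum(counts) / len(counts)  # expectation over a uniformly random u
best = max(counts)                   # some offset does at least as well as the average
```

Each node lies in the inner part for exactly (s − 2·margin)² of the s² offsets, so `average` equals n(1 − 2η)² and the best offset covers at least that many nodes.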
Lemma 3. Consider the Implementation Model (ρ, µ). Let L be the number of sub-circuits. For any fractional
parameter η > 0 and any choice of u ∈ Λ for Stencil (λ, η, u), the number of sub-circuits satisfying
$m_i \le \min\{2m/L, n_i\}$ is at least $\min\{(1-R)/(1+R), 1/2\}\,L$, where R = m/n is the rate of compressive sensing.
Proof: Assume n > m. First, we choose the point u ∈ Λ of the Stencil such that the locations of the output-
nodes satisfy (2). Then, for this fixed choice of u ∈ Λ, we consider the worst placement of the input-nodes,
i.e., the one that minimizes the fraction of sub-circuits satisfying $m_i \le \min(2m/L, n_i)$. We call a sub-circuit non-
locally decodable if $m_i \le \min(2m/L, n_i)$ and locally decodable otherwise (see footnote 11). Figure 1 gives an example of locally
decodable and non-locally decodable sub-circuits. First, we note that the fraction of sub-circuits satisfying
$m_i \le 2m/L$ is at least 1/2. Similarly, the fraction of sub-circuits satisfying $m_i \le n_i$ is at least $(1-R)/(1+R)$.
Thus, the fraction of sub-circuits satisfying $m_i \le \min(2m/L, n_i)$ is at least $\min\{(1-R)/(1+R), 1/2\}$.
Below we prove the claim explicitly.
Note that the number of nodes in each sub-circuit is $n_i + m_i = (n+m)/L$. Let $\alpha = \min\{2m/L, (n+m)/(2L)\}$. Since
$m_i \le \alpha$ for each non-locally decodable sub-circuit, it satisfies $m_i/(m_i+n_i) \le 2m/(m+n)$ and $m_i/(m_i+n_i) \le 1/2$.
Now we consider two cases, which do not depend on the particular sub-circuit:
1) $m < n/3$ ⇒ a SubCkt is non-locally decodable if $m_i/(m_i+n_i) \le 2m/(m+n)$;
2) $m \ge n/3$ ⇒ a SubCkt is non-locally decodable if $m_i/(m_i+n_i) \le 1/2$.
Hence the fraction $f_{NLD}$ of non-locally decodable sub-circuits satisfies $f_{NLD} \ge \min(1/2, (1-R)/(1+R))$.
Next we state a lemma derived from Fano’s inequality [25].
Lemma 4. If at most H(X)/3 bits of information are available to obtain an estimate $\hat{X}$ of a variable X with
entropy H(X), then $\Pr[\hat{X} \ne X] \ge 1/9$.
Proof: Similar to the proof of Fano's inequality [25], we define the error random variable $E(X,\hat{X})$ as
follows:
\[
E(X,\hat{X}) =
\begin{cases}
1 & \text{if } \hat{X} = X, \\
0 & \text{if } \hat{X} \ne X.
\end{cases}
\]
Since the input vector X, the output vector Y and the recovery vector $\hat{X}$ form a Markov chain
$X \to Y \to \hat{X}$, we get
\[
H(X) = H(X|\hat{X}) + I(X;\hat{X}) \le H(X|\hat{X}) + I(X;Y) \le H(X|\hat{X}) + H(Y).
\]
Thus,
\[
\begin{aligned}
H(X|\hat{X}) &= H\big(E(X,\hat{X}), X \,\big|\, \hat{X}\big) \\
&= H\big(E(X,\hat{X}) \,\big|\, \hat{X}\big)
 + \Pr\big[E(X,\hat{X}) = 0\big]\, H\big(X \,\big|\, \hat{X}, E(X,\hat{X}) = 0\big)
 + \Pr\big[E(X,\hat{X}) = 1\big]\, H\big(X \,\big|\, \hat{X}, E(X,\hat{X}) = 1\big) \\
&\le h_b(P_e) + P_e H(X).
\end{aligned}
\]
Given the available information I of at most H(X)/3 bits, the error probability $P_e := \Pr(\hat{X} \ne X)$ is
lower bounded by
\[
\begin{aligned}
P_e H(X) + h_b(P_e) &\ge H(X) - H(Y) \\
&\ge H(X) - H(X)/3 \\
&= 2H(X)/3,
\end{aligned}
\]
where $h_b(\cdot)$ on the LHS is the binary entropy function (it also appears in later parts).
Then, using $h_b(P_e) \le 1$ and, since n > m > 1, $H(X) \ge 2$, we have
\[
P_e \ge \frac{2H(X)/3 - 1}{H(X)} \ge \frac{2}{3} - \frac{1}{2} > \frac{1}{9}.
\]
11 Actually, $m_i \le \min(2m/L, n_i)$ is merely a sufficient condition for a sub-circuit to be non-locally decodable; however, we will
use the term “non-locally decodable” to imply that the sub-circuit satisfies $m_i \le \min(2m/L, n_i)$.
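The last step of the proof of Lemma 4 can be checked numerically. The sketch below finds, by bisection, the smallest value of P_e compatible with the inequality P_e·H(X) + h_b(P_e) ≥ 2H(X)/3 for a few entropy values; the function names and the chosen entropy values are hypothetical and serve only as an illustration.

```python
import math

def binary_entropy(p):
    """Binary entropy h_b(p) in bits, with h_b(0) = h_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def min_error_probability(entropy_bits, iterations=200):
    """Smallest P_e with P_e * H + h_b(P_e) >= 2H/3, found by bisection.

    The left-hand side is concave in P_e, equals 0 at P_e = 0 and H at P_e = 1,
    so the feasible set is an interval [P*, 1] and bisection on [0, 1] finds P*.
    """
    target = 2 * entropy_bits / 3
    lo, hi = 0.0, 1.0            # lo is infeasible, hi is feasible
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid * entropy_bits + binary_entropy(mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi

# Entropy values (in bits) chosen for illustration; the proof uses H(X) >= 2.
thresholds = {h: min_error_probability(h) for h in (2, 4, 16, 256)}
```

For each of these entropy values the resulting threshold exceeds 1/9, in line with the lemma, and it stays below 2/3.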
Lemma 5. Consider the Sparsity Model (n, m, p). For every decoding circuit DecCkt on the Implementation
Model (ρ, µ), if the relative error satisfies $\|X-\hat{X}\|_{\ell_q}/\|X\|_{\ell_q} \le 2^{-Q}$, then there exists a constant $C_0 = C_0(X, q) < 1$
such that asymptotically at least $C_0 npQ$ bits are required by all the output-nodes.
Proof: For each i, let $Q_i$ denote the number of bits of quantization required to distinguish $\hat{X}_i$ and $X_i$
for each entry $\hat{X}_i$. Thus, we have $2^{-Q_i-1} \le |X_i-\hat{X}_i|/|X_i| \le 2^{-Q_i+1}$ for all i. Let $\|X-\hat{X}\|_{\ell_q}/\|X\|_{\ell_q} \le 2^{-Q}$. Hence,
\[
\left(\sum_{i=1}^{k} |X_i - \hat{X}_i|^q\right)^{1/q} \le 2^{-Q}\left(\sum_{i=1}^{k} |X_i|^q\right)^{1/q},
\]
which implies that
\[
\left(\frac{\sum_{i=1}^{k} 2^{-q(Q_i+1)}|X_i|^q}{\sum_{i=1}^{k} |X_i|^q}\right)^{1/q} \le 2^{-Q}.
\]
By assumption, $0 \le |X_i| \le U$ for each i for some constant U ≥ 0. By Jensen's inequality (see, for instance,
[26]), we get
\[
|U|^q \sum_{i=1}^{k} Q_i \ge \sum_{i=1}^{k} |X_i|^q Q_i \ge \sum_{i=1}^{k} |X_i|^q Q.
\]
Thus $\sum_{i=1}^{k} Q_i \ge C_0 kQ$ with $C_0 = \sum_{i=1}^{k} |X_i|^q/(k|U|^q) < 1$. The asymptotic result follows as n → ∞.
Next we combine the lemmas above to give a result connecting bit-meters and the average block error probability
$P_e^{blk}$. As mentioned before, in this lemma we call the inner parts of the sub-circuits the inner parallelepipeds
and the outer parts the outer parallelepipeds.
Lemma 6. Consider the Sparsity Model (n, m, p). Let $\mathrm{SubCkt}_i$ be a sub-circuit with $m_i \le \min(2m/L, n_i)$ that
is obtained via stencil-partitioning a decoder circuit DecCkt implemented on the Implementation Model (ρ, µ).
If $\mu(\mathrm{SubCkt}_i) \le \eta\rho(\Lambda_0)C_0\big(n_i^{\mathrm{inside}} - m_i\big)pQ/3$, then $P_e^{blk} \ge p^{2m/L}/9$, where L is the number of sub-circuits.
Proof: Consider the i-th sub-circuit $\mathrm{SubCkt}_i$ with $m_i \le \min(2m/L, n_i)$. By assumption, the number of bit-meters
used by $\mathrm{SubCkt}_i$ is at most $\eta\rho(\Lambda_0)C_0\big(n_i^{\mathrm{inside}} - m_i\big)pQ/3$. Further, the distance between the outer parallelepipeds
and the inner parallelepipeds is bounded from below by $\eta\rho(\Lambda_0)$. Therefore at most $C_0\big(n_i^{\mathrm{inside}} - m_i\big)pQ/3$
bits of information I can be communicated from outside the outer parallelepipeds to the inside of the inner
parallelepipeds.
Now since $m_i \le n_i$, if the $n_i$ output-nodes correspond to more than $m_i$ non-zero entries in the input
vector, then the decoder cannot determine all Q bits in the output-nodes. We denote this failure event by $\mathcal{L}$.
Then $\mathcal{L}$ occurs with probability at least $p^{2m/L}$ since $m_i \le 2m/L$.
Conditioning on the event $\mathcal{L}$ and applying Lemmas 4 and 5 (via Fano's inequality [25]), since the received
entropy is smaller than $C_0\big(n_i^{\mathrm{inside}} - m_i\big)pQ/3$, the average block error probability is larger than 1/9. Thus,
given the assumptions of this lemma, the (unconditional) error probability for recovering the $n_i$ entries of the
input vector X with precision Q in the i-th sub-circuit is lower bounded by $p^{2m/L}/9$. Since the average
block error probability $P_e^{blk}$ for the entire circuit is at least that of any sub-circuit, the claimed result
follows.
Fig. 11. This graph illustrates the stencil-partition of the decoding circuit. The sub-lattice, which has a larger fundamental
parallelepiped, defines the sub-circuits, and the inner parts of the sub-circuits are fixed by choosing a fractional parameter
0 < η < 1. For instance, for the sub-circuit in the upper-left corner, which contains the fundamental parallelepiped, the order of the quotient
is λ = |Λ/Λ0| = 9, and it has 6 input-nodes and 10 output-nodes. Moreover, it has 1 output-node in the inner part of the sub-circuit.
A. Proof of Theorem 1
The outer parallelepipeds (sometimes called the outer parts of the sub-circuits) of the Stencil divide the
circuit into L sub-circuits. Let the i-th sub-circuit have $m_i$ input-nodes and $n_i$ output-nodes within the outer
parallelepiped and $n_i^{\mathrm{inside}}$ output-nodes inside the inner parallelepiped. Using Lemma 2 and Lemma 3, we
can choose a fixed origin O of the Stencil such that at least a $(1-2\eta)^2$ fraction of the n output-nodes is
covered by the inner parallelepipeds. Moreover, note that the number of sub-circuits covered by the inner
parallelepipeds with $m_i \le \min\{2m/L, n_i\}$ is at least $\min\{(1-R)/(1+R), 1/2\}L$, which will be used
later.
Next, set $L = 2m/\log_p(10P_e^{blk})$ in Lemma 6. If we assume that the bit-meters used by a non-locally
decodable sub-circuit are smaller than $\eta\rho(\Lambda_0)C_0\big(n_i^{\mathrm{inside}} - m_i\big)pQ/3$, then the average block error probability
$P_e^{blk}$ is bounded from below as
\[
P_e^{blk} \ge p^{2m/L}/9 = p^{\log_p(10P_e^{blk})}/9 = 10P_e^{blk}/9.
\]
Since the above is a contradiction, each non-locally decodable sub-circuit $\mathrm{SubCkt}_i$ satisfies
\[
\mu(i) := \mu(\mathrm{SubCkt}_i) \ge C_0\big(n_i^{\mathrm{inside}} - m_i\big)pQ\eta\rho(\Lambda_0)/3.
\]
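The choice of L that produces the contradiction can be illustrated numerically. In the sketch below the values of m, p and the block error probability are arbitrary illustrative assumptions, and the function name is hypothetical; the resulting L is treated as a real number, without rounding.

```python
import math

def num_subcircuits(m, p, block_error):
    """L = 2m / log_p(10 * P_e^blk); requires 0 < p < 1 and 10 * P_e^blk < 1."""
    return 2 * m / math.log(10 * block_error, p)

m, p, block_error = 200, 0.01, 1e-3           # illustrative values only
L = num_subcircuits(m, p, block_error)
implied_lower_bound = p ** (2 * m / L) / 9    # right-hand side of Lemma 6
```

With this L, the lower bound of Lemma 6 evaluates to 10/9 times the assumed block error probability, which exceeds the assumed value and hence cannot hold.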
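We bound the total bit-meters in the decoding circuit by
\[
\begin{aligned}
\mu(\mathrm{DecCkt}) &\ge \sum_{i=1}^{L} \mu(i) \\
&\ge \sum_{m_i \le \min\{2m/L,\, n_i\}} \mu(i) + \sum_{m_i \ge 2m/L} \mu(i) \\
&\ge \sum_{m_i \le \min\{2m/L,\, n_i\}} C_0\big(n_i^{\mathrm{inside}} - m_i\big)pQ\eta\rho(\Lambda_0)/3 + \sum_{m_i \ge 2m/L} \mu(i).
\end{aligned}
\]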
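Now we define three types of sub-circuits according to the conditions $m_i \ge 2m/L$ and $m_i \le \min\{2m/L, n_i\}$.
First, let LD1 denote those values of i such that $m_i \ge 2m/L$ and $\mu(i) \ge C_0\big(n_i^{\mathrm{inside}} - m_i\big)pQ\eta\rho(\Lambda_0)/3$.
Next, let LD2 denote those values of i such that $m_i \ge 2m/L$ and $\mu(i) < C_0\big(n_i^{\mathrm{inside}} - m_i\big)pQ\eta\rho(\Lambda_0)/3$.
Finally, let NLD denote those values of i such that $m_i \le \min\{2m/L, n_i\}$. It then follows that
\[
\begin{aligned}
\mu(\mathrm{DecCkt}) &\ge \sum_{i \in LD1 \cup NLD} C_0\big(n_i^{\mathrm{inside}} - m_i\big)pQ\eta\rho(\Lambda_0)/3 + \sum_{i \in LD2} \mu(i) \\
&\overset{(a)}{\ge} \frac{1-R}{2(1+R)} \sum_{i=1}^{L} C_0\big(n_i^{\mathrm{inside}} - m_i\big)pQ\eta\rho(\Lambda_0)/3 \\
&\overset{(b)}{\ge} \frac{1-R}{2(1+R)}\, C_0\big((1-2\eta)^2 n - m\big)pQ\eta\rho(\Lambda_0)/3.
\end{aligned} \tag{3}
\]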
In the above, (a) follows from Lemma 3, which shows that the fraction of sub-circuits $\mathrm{SubCkt}_i$ with i ∈ NLD is
at least $\min\{(1-R)/(1+R), 1/2\}$ and hence at least $(1-R)/(2(1+R))$, and (b) follows from Lemma 2, which gives
$\sum_{i=1}^{L} n_i^{\mathrm{inside}} \ge n(1-2\eta)^2$.
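The comparison between the two fractions used in step (a) is elementary but left implicit; one way to spell it out, for 0 < R < 1, is the following.

```latex
\[
\frac{1-R}{1+R} \;\ge\; \frac{1-R}{2(1+R)}
\qquad\text{and}\qquad
\frac{1}{2} \;\ge\; \frac{1}{2}\cdot\frac{1-R}{1+R} \;=\; \frac{1-R}{2(1+R)},
\]
\[
\text{so that}\qquad
\min\left\{\frac{1-R}{1+R},\ \frac{1}{2}\right\} \;\ge\; \frac{1-R}{2(1+R)}.
\]
```

The second inequality uses only that (1 − R)/(1 + R) ≤ 1 whenever R ≥ 0.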
Next, by Lemma 1,
\[
\rho(\Lambda_0) \le \frac{1}{2}\rho(\Lambda)\sqrt{(n+m)/L} = \frac{1}{2}\rho(\Lambda)\sqrt{\log_p(10P_e^{blk})\left(1 + \frac{1}{2R}\right)}.
\]
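Substituting $\rho(\Lambda_0)$ into (3), we get
\[
\mu(\mathrm{DecCkt}) \ge \frac{\eta\rho(\Lambda)}{6\sqrt{2}}\, C_0\big((1-2\eta)^2 n - m\big)pQ\,\sqrt{\left(\frac{1}{2R} - \frac{1-R}{2+2R}(1+R)^2\right)}\,\sqrt{\log_p(10P_e^{blk})}.
\]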
Choosing η = 1/4 yields Theorem 1.
For the regime m = Θ(k), we derive the following order expression. This serves as a benchmark for
design of our algorithms.
B. Proof of Corollary 1
In the Sparsity Model (n, m, p), the expected number of non-zero entries in the input vector X is k = np.
By Hoeffding's inequality, the number of non-zero entries in the input vector X in the
Sparsity Model (n, m, p) lies in the range [k/2, 3k/2] with probability $1 - e^{-\Theta(k)}$. Hence asymptotically we can
substitute k = np in inequality (1) and get
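\[
\mu(\mathrm{DecCkt}) \ge \frac{\rho(\Lambda)}{24\sqrt{2}}\, C_0\left(\frac{n}{4} - m\right)\frac{kQ}{n}\,\sqrt{\left(\frac{1}{2R} - \frac{1-R}{2+2R}(1+R)^2\right)}\,\sqrt{\log_p(10P_e^{blk})}, \tag{4}
\]
which differs from the original lower bound in inequality (1) by a constant.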
Since k = o(n) by our sparsity assumption, in the regime m = Θ(k) we get R = m/n = o(1). Finally, letting
R = m/n, we can asymptotically bound the bit-meters as
\[
\mu(\mathrm{DecCkt}) = \Omega\left(\sqrt{\frac{nk^2}{\log n}}\,\min\left\{\sqrt{\frac{k}{m}},\ \sqrt{\frac{\log\frac{1}{P_e^{blk}}}{m}}\right\}\right).
\]
Appendix B
Decoding Algorithms
We give the following algorithmic descriptions of CA and SA.
Algorithm 1 Decoding Algorithm (CA)
procedure Dec(Y, $A_{CA}$)
    for i ← 1 to m do
        load c′ rows of the encoding matrix $A^{i}_{CA}$ and $\vec{Y}_i$ at the i-th input-node
        load $S_i = \{\vec{Y}_t\}_{t \in \{1,2,\ldots,i-1\}}$ received from previous input-nodes and their corresponding rows of the encoding matrix $A^{t}_{CA}$
        flag = 0
        for j ← 1 to n do
            if ∃ a real number b such that $b\,\vec{a}_{i,j} = \vec{Y}_i$ then
                $X_j = b$
                flag = 1
                send b to the j-th output-node
                update the encoding matrix $A_{CA}$
                break
            else
                continue
            end if
        end for
        if flag = 0 then
            if ∃ $c(|S_i| + 1)$ non-zero real numbers, where c is the measurement constant, as the non-zero entries of the updated
            recovery vector $\hat{X}$ such that the linear equations $\{\vec{Y}_t = A^{t}_{CA}\hat{X}\}_{t \in \{1,2,\ldots,i-1\}}$ hold then
                update the recovery vector $\hat{X}$
                update the encoding matrix $A_{CA}$
            else
                send $\vec{Y}_i$ to the input-node corresponding to the sub-circuit in the (i+1)-th stage covering the current one
            end if
        end if
    end for
    clearing stage
end procedure
Fig. 12. The Proof Map of Upper Bound.
Algorithm 2 Decoding Algorithm (SA)
procedure Dec(Y, $A_{SA}$)
    for i ← 1 to m do
        load c′ rows of the encoding matrix $A^{i}_{SA}$ and $\vec{Y}_i$ at the i-th input-node
        for j ← 1 to n do
            if ∃ a real number b such that $b\,\vec{a}_{i,j} = \vec{Y}_i$ then
                $X_j = b$
                send b to the j-th output-node
                update the encoding matrix $A_{SA}$
                break
            else
                continue
            end if
        end for
    end for
    clearing stage
end procedure
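The inner loop shared by Algorithms 1 and 2 tests whether the measurement at an input-node is a scalar multiple of a single column, which is the case when exactly one non-zero entry contributes to it. A minimal Python sketch of that test is given below. It assumes real-valued vectors, a numerical tolerance `tol`, and a non-zero coefficient b; the function name and the example data are hypothetical and are not taken from the encoding matrices of CA or SA.

```python
def find_singleton(columns, measurement, tol=1e-9):
    """Return (j, b) with b * columns[j] == measurement (up to tol), or None.

    columns[j] plays the role of the vector a_{i,j}; measurement plays the role of Y_i.
    """
    for j, column in enumerate(columns):
        # Estimate b from the largest entry of the column to limit round-off.
        pivot = max(range(len(column)), key=lambda t: abs(column[t]))
        if abs(column[pivot]) <= tol:
            continue                      # an all-zero column cannot explain the measurement
        b = measurement[pivot] / column[pivot]
        if abs(b) > tol and all(abs(b * a - y) <= tol for a, y in zip(column, measurement)):
            return j, b                   # measurement is b times column j
    return None                           # zero or several non-zero entries involved

# Hypothetical example with three 2-entry columns.
columns = [(1.0, 2.0), (1.0, 3.0), (1.0, 5.0)]
single = find_singleton(columns, (4.0, 12.0))   # 4 * columns[1]
double = find_singleton(columns, (6.0, 22.0))   # 4 * columns[1] + 2 * columns[2]
```

In this example `single` identifies column 1 with coefficient 4, while `double` is `None` because two non-zero entries are mixed; in CA the latter case is the one in which the measurement is passed on to a larger sub-circuit.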
Appendix C
Proofs of Upper Bounds
The upper bound is achieved by performing measurements according to a specially designed m × n
complex matrix $A_{m,n}$ and then performing decoding in a stage-by-stage manner. First, we state a lemma
describing some geometric properties that follow from our definitions of the models and descriptions of the
algorithms.
Lemma 7 (Properties of DecCkt). A decoding circuit DecCkt implementing the decoding steps defined by CA
and SA satisfies the following properties (here i denotes the index of stages):
• The Communication Distance $D_{\mathrm{DecCkt}}$ is bounded from above by
\[
D^{(i)}_{\mathrm{DecCkt}} =
\begin{cases}
O\big(\sqrt{\varphi^{i-1}n/k}\big) & \text{for } i = 1, 2, \ldots, \log_\varphi(k/\log^2 k), \\
O(\sqrt{n}) & \text{for } i = \log_\varphi(k/\log^2 k) + 1;
\end{cases}
\]
• The Number of Transmissions $N_{\mathrm{DecCkt}}$ is bounded by
\[
N^{(i)}_{\mathrm{DecCkt}} =
\begin{cases}
\Theta(k/\varphi^{i-1}) & \text{for } i = 1, 2, \ldots, \log_\varphi(k/\log^2 k), \\
\Theta(\sqrt{k}) & \text{for } i = \log_\varphi(k/\log^2 k) + 1;
\end{cases}
\]
• The Bit-precision required in each communication between nodes is bounded by
$B_{\mathrm{DecCkt}} = \Theta(1)$.
For the clearing stage, we use SHO-FA [20] with an appropriate parameter setting. The following
theorem states the performance guarantees of SHO-FA.
Theorem 3 (SHO-FA [20]). For the Sparsity Model (n, m, k), the SHO-FA decoding algorithm with encoding
matrix $A_{\text{SHO-FA}}$ has the following properties:
1) For every input vector $X \in \mathbb{R}^n$, with probability $1 - O(1/\sqrt{k})$ over the choice of $A_{\text{SHO-FA}}$, the algorithm produces
a recovery vector $\hat{X}$ such that $\|X - \hat{X}\|_1/\|X\|_1 \le 2^{-Q}$.
2) The number of measurements satisfies $m \le 2ck + \sqrt{k}$, where c is the measurement constant.
Lemma 8 (Error Probability: SA). For the Sparsity Model (n, m, k), the decoding circuit DecCkt for SA
implemented on the Implementation Model (ρ, µ) satisfies the following properties:
1) There is a constant $\phi > e$ such that if, from the i-th stage to the (i+1)-th stage, the area of a sub-circuit increases from
$\det(\Lambda_i)$ to $\det(\Lambda_{i+1}) = \phi\det(\Lambda_i)$, then each sub-circuit in the (i+1)-th stage contains $\phi^i n/(C_{SA}k)$
output-nodes, where $C_{SA} > 0$ is a constant.
2) In the (i+1)-th stage, for any sub-circuit, let $\{A_j\}_{j \in S}$ with $S = \{1, 2, \ldots, \phi^i n/(C_{SA}k)\}$ denote the set of events that the j-th output-
node corresponds to a non-zero entry. Then
\[
\Pr\left[\bigvee_{j \in S}\left(\bigwedge_{i \in S\setminus\{j\}} \neg A_i \wedge A_j\right)\right] \ge 1 - \frac{1}{\phi}.
\]
3) An average block error probability $P_e^{blk} = O(1/\sqrt{k})$ is achievable with a fixed precision Q under the regime
m = Θ(k) and the sub-linear regime $k = n^{1-\beta}$, where β ∈ (0, 1).
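Proof: For the first property (1), note that the event $B_j = \bigwedge_{i \in S\setminus\{j\}} \neg A_i \wedge A_j$ is the event that, within
the sub-circuit, only the j-th output-node corresponds to a non-zero entry in the input vector X. Furthermore,
$\Pr[B_i \wedge B_j] = 0$ for all i ≠ j. Therefore, by the chain rule,
\[
\Pr\left[\bigvee_{j \in S}\left(\bigwedge_{i \in S\setminus\{j\}} \neg A_i \wedge A_j\right)\right]
= \sum_{j=1}^{|S|} \Pr\left[A_j \,\Big|\, \bigwedge_{i \in S\setminus\{j\}} \neg A_i\right] \prod_{i=1}^{|S|-1}\left(1 - \Pr\left[A_i \,\Big|\, \bigwedge_{k<i} \neg A_k\right]\right). \tag{5}
\]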
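Next, using (5), we find bounds on $\Pr\big[A_j \,\big|\, \bigwedge_{i \in S\setminus\{j\}} \neg A_i\big]$ and $\Pr\big[A_i \,\big|\, \bigwedge_{k<i} \neg A_k\big]$. Note that after the i-th
stage, each event $A_j$ satisfies $\Pr[A_j] \le 1/|S|$, and $A_j$ is mutually independent of all but at most $\phi^{i-1}n/(C_{SA}k)$
other $A_j$'s, and $e \cdot \frac{\phi^{i-1}n}{k|S|} = \frac{e}{\phi} \le 1$ by choosing $\phi \ge e$. Hence, by the Lovász local lemma (see, for example, the
textbook [27]), we have $\Pr\big[A_i \,\big|\, \bigwedge_{k<i} \neg A_k\big] \le \frac{C_{SA}k}{\phi^{i-1}n}$. Further, as the sub-lattice in the (i+1)-th stage is chosen
uniformly at random, there is a constant C′ such that $\Pr\big[A_j \,\big|\, \bigwedge_{i \in S\setminus\{j\}} \neg A_i\big] \ge \Pr[A_j]/C' = \frac{C_{SA}k}{C'\phi^{i-1}n}$. Thus,
from Equation (5),
\[
\begin{aligned}
\Pr\left[\bigvee_{j \in S}\left(\bigwedge_{i \in S\setminus\{j\}} \neg A_i \wedge A_j\right)\right]
&\ge \frac{C_{SA}k}{C'\phi^{i-1}n}\,|S| \cdot \left(1 - \frac{C_{SA}k}{\phi^{i-1}n}\right)^{|S|} \\
&\ge \frac{C_{SA}k}{C'\phi^{i-1}n} \cdot \frac{\phi^{i-1}n}{C_{SA}k} \cdot \left(1 - \frac{C_{SA}k}{\phi^{i-1}n}\right)^{\phi^{i-1}n/(C_{SA}k)}.
\end{aligned} \tag{6}
\]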
Taking the limit with respect to n and using inequality (6), we have
\[
\Pr\left[\bigvee_{j \in S}\left(\bigwedge_{i \in S\setminus\{j\}} \neg A_i \wedge A_j\right)\right] \ge \frac{e}{C'}.
\]
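The fact that such a bound does not depend on the stage index can be illustrated in a simplified setting. The sketch below assumes, unlike the proof (which handles dependent events through the local lemma), that the entries of a group are independently non-zero with probability p, and it keeps the product of group size and p fixed at 1 as the group grows; the function name is hypothetical.

```python
import math

def exactly_one_nonzero(group_size, p):
    """Pr[exactly one of group_size i.i.d. Bernoulli(p) entries is non-zero]."""
    return group_size * p * (1 - p) ** (group_size - 1)

# Group sizes grow from stage to stage while group_size * p stays fixed at 1.
values = [exactly_one_nonzero(s, 1 / s) for s in (10, 1000, 100000)]
limit = math.exp(-1)     # limiting value of (1 - 1/s)^(s - 1) as s grows
```

Under this simplifying assumption the probability of a group with exactly one non-zero entry decreases towards 1/e but never falls below it, so it is bounded below by a constant for every group size.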
Note that the above lower bound is constant across all stages. Letting $\phi = C'/(C' - e)$ for an appropriate
C′ and applying concentration inequalities under the Sparsity Model (n, m, p), the probability $P_e^{1}(SA)$ of
the event that, after the first $\log_\phi(k/\log^2 k)$ stages, more than $1/\sqrt{k}$ unsolved non-zero entries remain is upper
bounded as
\[
P_e^{1}(SA) \le C''\log_\phi(k/\log^2 k)\,\exp\left(-k/\phi^{\log_\phi(k/\log^2 k)}\right) \le C''/\sqrt{k} \tag{7}
\]
for some constant C″ > 0, since there is no intersection between sub-circuits at each stage.
For the clearing stage, by Theorem 3 (proved in [20]), if we use $\Theta(\sqrt{k})$ measurements, then the
probability $P_e^{2}(SA)$ of the event that $\|X - \hat{X}\|_1/\|X\|_1 \ge 2^{-Q}$ is $P_e^{2}(SA) = O(1/\sqrt{k})$. Using the union bound,
we have $P_e^{blk} \le P_e^{1}(SA) + P_e^{2}(SA)$, implying that $P_e^{blk} = O(1/\sqrt{k})$.
Lemma 9 (Error Probability: CA). For the Sparsity Model (n, m, k), the decoding circuit DecCkt for CA
implemented on the Implementation Model (ρ, µ) achieves an average block error probability $P_e^{blk} = O(1/\sqrt{k})$ with
a fixed precision Q under the regime m = Θ(k) and the sub-linear regime $k = n^{1-\beta}$, where β ∈ (0, 1).
Proof: Using the same argument as in Lemma 8, it suffices to show that the probability $P_e^{1}(CA)$ of the
event that more than $1/\sqrt{k}$ non-zero entries are left undecoded after the first $\log_\varphi(k/\log^2 k)$ stages satisfies
$P_e^{1}(CA) = O(1/\sqrt{k})$.
Hence the only thing we need to show is that, for some choices of the encoding matrix $A_{CA}$, the event of
at most $1/\sqrt{k}$ unsolved entries at the $(\log_\varphi(k/\log^2 k) - 1)$-th stage, before the clearing stage, fails with probability
$P_e^{1}(CA) = O(1/\sqrt{k})$. Therefore, if at the $(\log_\varphi(k/\log^2 k) - 1)$-th stage there is a constant fraction $\rho \ge 1 - 1/\varphi$ of
sub-circuits which contain at most $\log_\varphi(k)$ non-zero entries (which may have been solved in former stages), we
can claim that $P_e^{1}(CA) = O(1/\sqrt{k})$ by using concentration inequalities and the fact that the sub-circuits
at the $(\log_\varphi(k/\log^2 k) - 1)$-th stage do not intersect one another. This is true because only
$\rho\,(\log_\varphi(k))^2$ non-zero entries remain undecoded. Let $B_j$ be the event that, at the $(\log_\varphi(k/\log^2 k) - 1)$-th stage,
the output-nodes of the j-th sub-circuit correspond to at most $\log_\varphi(k)$ non-zero entries. Note that,
by the definition of our Sparsity Model (n, m, p) in Definition 1, the probability that a sub-circuit has at
most $\log_\varphi(k)$ non-zero entries is bounded from above by
\[
\Pr[B_j] \le \left(1 - \frac{C_{CA}\varphi^{i}k}{n}\right)^{n/(C_{CA}\varphi^{i}k)}, \qquad \text{where } i = \log_\varphi(k/\log^2 k) - 1.
\]
Taking the limit with respect to n, we have $\lim_{n\to\infty}\Pr[\bar{B}_j] > 1 - 1/e$. Therefore, by letting $\varphi \le e$, we have
$P_e^{1}(CA) = O(1/\sqrt{k})$, and hence, using the same argument as in Lemma 8, the lemma follows.
Now we prove Theorem 2 and Corollary 2 stated in Section III.
A. Proof of Theorem 2 and Corollary 2
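By Lemma 7, the number of transmissions $N^{(i)}_{\mathrm{DecCkt}}$ decays geometrically in i. Combining this with
the communication distances and the bit-precision given in the same lemma, we conclude that the total bit-meters are bounded by:
\[
\begin{aligned}
\mu(\mathrm{DecCkt}) &= \sum_{i=1}^{\log_\varphi(k/\log^2 k)+1} D^{(i)}_{\mathrm{DecCkt}}\, N^{(i)}_{\mathrm{DecCkt}}\, B_{\mathrm{DecCkt}} \\
&= \sum_{i=1}^{\log_\varphi(k/\log^2 k)} O\left(\sqrt{nk/\varphi^{i-1}}\right) + O\left(\sqrt{nk}\right) && (8) \\
&= O\left(\sqrt{nk}\right) && (9) \\
&= O\left(\sqrt{\frac{nk}{\log n}}\sqrt{\log\frac{1}{P_e^{blk}}}\right). && (10)
\end{aligned}
\]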
We get (8) because of the assumption that the precision parameter Q is fixed. Summing up all terms in
(8) yields equation (9). By Theorem 3, Lemma 8 and Lemma 9, the average block error probability $P_e^{blk}$
satisfies $P_e^{blk} = O(1/\sqrt{k})$. Since $O(\sqrt{\log n}) = O(\sqrt{\log k})$ in the sub-linear regime $k = n^{1-\beta}$ where β ∈ (0, 1),
we have $O(\sqrt{\log n}) = O\big(\sqrt{\log(1/P_e^{blk})}\big)$, implying (10). Therefore, combining the above with Corollary 1, we
get
\[
\mu(\mathrm{DecCkt}) = \Theta\left(\sqrt{\frac{nk}{\log n}}\sqrt{\log\frac{1}{P_e^{blk}}}\right).
\]
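The passage from (8) to (9) rests on a geometric series. The Python sketch below evaluates the per-stage products of Lemma 7 with every hidden constant set to 1, the bit-precision taken as 1, natural logarithms, and "log2 k" read as the square of log k; these choices, the sample values of n, k and φ, and the function name are assumptions made for illustration only.

```python
import math

def stage_costs(n, k, phi):
    """Per-stage distance * transmissions, with all hidden constants set to 1."""
    stages = max(1, math.floor(math.log(k / math.log(k) ** 2, phi)))
    costs = [math.sqrt(phi ** (i - 1) * n / k) * (k / phi ** (i - 1))
             for i in range(1, stages + 1)]
    costs.append(math.sqrt(n) * math.sqrt(k))   # clearing stage: sqrt(n) * sqrt(k)
    return costs

n, k, phi = 10 ** 6, 10 ** 3, 3.0               # illustrative values only
total = sum(stage_costs(n, k, phi))

# Stage i costs sqrt(n k) / phi^((i-1)/2), so the sum over the stages is at most
# sqrt(n k) / (1 - phi^(-1/2)); the clearing stage adds one more sqrt(n k).
geometric_cap = math.sqrt(n * k) * (1 / (1 - phi ** -0.5) + 1)
```

Under these assumptions the total stays within a constant multiple of the square root of nk regardless of the number of stages, which is the content of (9).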
References
[1] P. Grover, “‘Information-friction’ and its impact on minimum energy per communicated bit,” in Information Theory Proceedings (ISIT), 2013 IEEE International Symposium on. IEEE, 2013, pp. 2513–2517.
[2] C. D. Thompson, “Area-time complexity for VLSI,” in Proceedings of the eleventh annual ACM symposium on Theory of computing. ACM, 1979, pp. 81–88.
[3] C. D. Thompson, “A complexity theory for VLSI,” Ph.D. dissertation, Carnegie-Mellon University, 1980.
[4] R. P. Brent and H. Kung, “The area-time complexity of binary multiplication,” Journal of the ACM (JACM), vol. 28, no. 3, pp. 521–534, 1981.
[5] B. Chazelle and L. Monier, “Towards more realistic models of computation for VLSI,” 1981.
[6] C. E. Leiserson, “Area-efficient VLSI computation.” DTIC Document, Tech. Rep., 1981.
[7] C. Mead and L. Conway, Introduction to VLSI systems. Addison-Wesley Reading, MA, 1980, vol. 1080.
[8] B. P. Sinha and P. K. Srimani, “A new parallel multiplication algorithm and its VLSI implementation,” in Proceedings of the 1988 ACM sixteenth annual conference on Computer science. ACM, 1988, pp. 366–372.
[9] M. R. Kramer and J. van Leeuwen, “The VLSI complexity of Boolean functions,” in Logic and Machines: Decision Problems and Complexity. Springer, 1984, pp. 397–407.
[10] S. N. Bhatt, G. Bilardi, and G. Pucci, “Area-time tradeoffs for universal VLSI circuits,” Theoretical Computer Science, vol. 408, no. 2, pp. 143–150, 2008.
[11] R. Cole and A. Siegel, “Optimal VLSI circuits for sorting,” Journal of the ACM (JACM), vol. 35, no. 4, pp. 777–809, 1988.
[12] C. D. Thompson, The VLSI complexity of sorting. Springer, 1981.
[13] E. J. Candès, J. Romberg, and T. Tao, “Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information,” Information Theory, IEEE Transactions on, vol. 52, no. 2, pp. 489–509, 2006.
[14] D. L. Donoho, “Compressed sensing,” Information Theory, IEEE Transactions on, vol. 52, no. 4, pp. 1289–1306, 2006.
[15] E. J. Candès, “The restricted isometry property and its implications for compressed sensing,” Comptes Rendus Mathematique, vol. 346, no. 9-10, pp. 589–592, 2008.
[16] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, “A simple proof of the restricted isometry property for random matrices,” Constructive Approximation, vol. 28, no. 3, pp. 253–263, December 2008.
[17] R. Berinde, P. Indyk, and M. Ruzic, “Practical near-optimal sparse recovery in the l1 norm,” Proceedings of the Annual Allerton conference, 2008.
[18] R. Berinde and P. Indyk, “Sequential sparse matching pursuit,” Proceedings of the Annual Allerton conference, 2009.
[19] A. Gilbert and P. Indyk, “Sparse recovery using sparse matrices,” Proceedings of IEEE, vol. 98, no. 6, pp. 937–947, 2010.
[20] M. Bakshi, S. Jaggi, S. Cai, and M. Chen, “SHO-FA: Robust compressive sensing with order-optimal complexity, measurements, and bits,” in Communication, Control, and Computing (Allerton), 2012 50th Annual Allerton Conference on. IEEE, 2012, pp. 786–793.
[21] S. Pawar and K. Ramchandran, “A hybrid DFT-LDPC framework for fast, efficient and robust compressive sensing,” in Proceedings of the 50th Allerton Conference, 2012.
[22] D. Guo, J. Luo, L. Zhang, and K. Shen, “Compressed neighbor discovery for wireless networks,” CoRR, vol. abs/1012.1007, 2010.
[23] G. Grimmett and D. Stirzaker, Probability and random processes. Oxford Univ Press, 1992, vol. 2.
[24] H. Chernoff, “A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations,” The Annals of Mathematical Statistics, pp. 493–507, 1952.
[25] T. M. Cover and J. A. Thomas, Elements of information theory. John Wiley & Sons, 2012.
[26] M. Kuczma, An introduction to the theory of functional equations and inequalities: Cauchy’s equation and Jensen’s inequality. Springer, 2008.
[27] N. Alon and J. H. Spencer, The probabilistic method. John Wiley & Sons, 2004.
