A Modulo-Based Architecture for Analog-to-Digital Conversion by Ordentlich, Or et al.
arXiv:1806.08968v1 [cs.IT] 23 Jun 2018
A Modulo-Based Architecture for Analog-to-Digital
Conversion
Or Ordentlich, Gizem Tabak, Pavan Kumar Hanumolu, Andrew C. Singer and Gregory W. Wornell
Abstract—Systems that capture and process analog signals
must first acquire them through an analog-to-digital converter.
While subsequent digital processing can remove statistical cor-
relations present in the acquired data, the dynamic range of the
converter is typically scaled to match that of the input analog
signal. The present paper develops an approach for analog-to-
digital conversion that aims at minimizing the number of bits
per sample at the output of the converter. This is attained by
reducing the dynamic range of the analog signal by performing
a modulo operation on its amplitude, and then quantizing the
result. While the converter itself is universal and agnostic of
the statistics of the signal, the decoder operation on the output
of the quantizer can exploit the statistical structure in order to
unwrap the modulo folding. The performance of this method is
shown to approach information theoretical limits, as captured by
the rate-distortion function, in various settings. An architecture
for modulo analog-to-digital conversion via ring oscillators is
suggested, and its merits are numerically demonstrated.
I. INTRODUCTION
Analog-to-digital converters (ADCs) are an essential com-
ponent in any device that manipulates analog signals in a
digital manner. While digital systems have benefited tremen-
dously from scaling, their analog counterparts have become
increasingly challenging. Consequently, it is often the case that
the ADC constitutes the main bottleneck in a system, both in
terms of power consumption and real estate, and in terms of
the quality of the system’s output. Developing more efficient
ADCs is therefore of great interest [1], [2].
The quality of an ADC is measured via the tradeoff be-
tween various parameters such as power consumption, size,
cost of manufacturing, and the distortion between the input
signal and its digitally-based representation. For the sake of
a unified, technology-independent, discussion, it is convenient
to restrict the characterization of an ADC quality to three basic
parameters: 1) The number of analog samples per second FS ;
2) The number of “raw” output bits R the ADC produces
per sample (before subsequent possible compression); 3) The
mean squared error (MSE) distortion D between the input
signal and a reconstruction that is based on the output of the
ADC.
While different applications may require different tradeoffs
between FS , R and D, it is always desirable to design the
ADC such that all three parameters are as small as possible.
O. Ordentlich is with the Hebrew University of Jerusalem, Israel (email:
or.ordentlich@mail.huji.ac.il). G. Tabak, P. K. Hanumolu and A. C. Singer
are with the University of Illinois, Urbana-Champaign, USA (emails:
{tabak2,hanumolu,acsinger}@illinois.edu). G. W. Wornell is with the Mas-
sachusetts Institute of Technology, MA, USA (email: gww@mit.edu)
The focus of this work is on the quantization rate R. For a
given sampling frequency FS , and a given target distortion
D, our goal is to design ADCs that use the smallest possible
number of raw output bits per sample.
The problem of analog-to-digital conversion can be seen
as an instance of the lossy source coding/lossy compression
problem [3]–[5], as the output of an ADC is a binary sequence,
which represents the analog source. A unique key feature of
the analog-to-digital conversion problem is that the encoding
of the source is carried out in the analog domain, while the
decoding procedure is purely digital. Given the limitations of
analog processing, it is therefore generally only practical to
exploit the source structure at the decoder. Hence, the type of
source coding schemes that are suitable for data conversion,
are those that approach fundamental limits without requiring
knowledge of the source structure at the encoder. In addition,
latency and complexity constraints in data conversion, typi-
cally preclude the use of schemes other than those based on
scalar quantization.
The input signal to an ADC is often known to have structure
that could be exploited to reduce the overall bit rate of its
representation, R. In our analysis, it will be convenient to
express this structure using a stochastic model for the input.
Consequently, throughout the paper, we will model the input
to the ADC as a stationary stochastic Gaussian process X(t),
whose power spectral density (PSD) encapsulates the assumed
structure. More generally, we will sometimes also consider the
problem of analog-to-digital conversion of a vector X(t) =
{X1(t), . . . , XK(t)} of jointly stationary stochastic Gaussian
processes, via K parallel ADCs, the input to each one of them
is one of the K processes.
Under such stochastic modeling, rate-distortion theory [3]
provides the fundamental lower bound Fs · R > RX(D) for
any ADC (and corresponding decoder) that achieves distor-
tion D, where RX(D) is the rate-distortion function of the
process X(t) in bits per second. In general, achieving the
rate-distortion function of a source requires using sophisti-
cated high-dimensional quantizers, whereas analog-to-digital
conversion is invariably done via scalar uniform quantizers.
Thus, achieving this lower bound with ADCs seems overly
optimistic. Nevertheless, as we shall see, approaching the rate-
distortion bound, up to some inevitable loss due to the one-
dimensional nature of the quantization, is sometimes possible
by a simple modification of the scalar uniform quantizer,
namely, a modulo ADC, followed by a digital decoder that
efficiently exploits the source structure.
Instead of sampling and quantizing the process X(t), a
modulo ADC samples and quantizes the process [X(t)] mod ∆,
where the modulo size ∆ is a design parameter. See
Figure 1. Equivalently, a modulo ADC can be thought of
as a standard uniform scalar ADC with step-size δ and an
arbitrarily large dynamic range/support, but that outputs only
the R least significant bits in the description of each sample,
where $2^R = \Delta/\delta$. The benefit of applying the modulo
operation on X(t) is in reducing its dynamic range/support, which
in turn enables a reduction of the number of bits per sample
produced by the ADC, without increasing the quantizer’s step-size.
This operation, which corresponds to disregarding coarse
information about X(t), will otherwise substantially degrade
the source reconstruction. However, by properly accounting for
the modulo operation and appropriately choosing its parameter
∆, we can unwrap the modulo operation with high probability
using previous samples of X(t) and exploiting the (redundant)
structure in the signal.
Fig. 1. A schematic illustration of the modulo ADC.
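The equivalence between folding before quantization and keeping only the R least significant bits of an unbounded uniform quantizer can be checked with a minimal sketch. The function names, the step size 0.25 and the rate R = 3 are illustrative assumptions, not part of the paper.

```python
import math

def lsb_of_uniform_quantizer(x: float, step: float, rate_bits: int) -> int:
    """Unbounded uniform quantizer, keeping only the R least significant bits."""
    cell = math.floor(x / step)              # cell index, can be any integer
    return cell & ((1 << rate_bits) - 1)     # discard the coarse (high) bits

def fold_then_quantize(x: float, step: float, rate_bits: int) -> int:
    """Reduce x modulo Delta = step * 2^R, then quantize with the same step."""
    delta = step * (1 << rate_bits)
    folded = x - delta * math.floor(x / delta)   # [x] mod Delta, in [0, Delta)
    return math.floor(folded / step)

# Illustrative inputs (assumed values, chosen away from cell boundaries).
SAMPLES = [-7.3, -0.1, 0.0, 0.3, 1.99, 5.5, 123.456]
lsb_codes = [lsb_of_uniform_quantizer(x, 0.25, 3) for x in SAMPLES]
fold_codes = [fold_then_quantize(x, 0.25, 3) for x in SAMPLES]
```

Both lists coincide for these inputs, although the inputs span a range far larger than ∆ = 2.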
Following standard system design methodology, in the
performance analysis of a modulo ADC, we distinguish between
two events: 1) The no-overload event $\bar{E}_{\mathrm{OL}}$, where the
decoder was able to correctly unwrap the modulo operation. We require
the MSE distortion, conditioned on this event, to be at most
D; 2) The overload event $E_{\mathrm{OL}}$, where the decoder fails in
unwrapping the modulo operation. We require the probability
of this event, $\Pr(E_{\mathrm{OL}})$, to be small, but do not concern
ourselves with the MSE distortion conditioned on the occurrence of
this event.
A. Our Contributions
This work further develops the modulo ADC framework in
three complementary directions, as specified below.
1) Oversampled Modulo ADC: We show that a modulo
ADC can be used as an alternative to Σ∆ converters. A
Σ∆ converter is based on oversampling the input process
X(t), i.e., sampling above the Nyquist rate, in conjunction
with noise-shaping, which pushes much of the energy of the
quantization noise to high frequencies, where there is no
signal content. See Figure 2. The noise shaping operation
requires incorporating an elaborate mixed signal feedback
circuit. In particular, the circuit first generates the quantization
noise, which necessitates using not only an ADC, but also
an accurately-matched digital-to-analog converter (DAC), and
then applies an analog filter. The analog nature of the signal
processing makes it challenging to use filters of high-orders,
which in turn limits performance.
We develop an alternative architecture (Section III) that
shifts much of the complexity to the decoder, whereas the
“encoder” is simply a modulo ADC. See Figure 3. The parameter
∆ in the modulo ADC, as well as the coefficients of the
prediction filter in Figure 3, depend only on the bandwidth B
of the input process X(t) and on its variance $\sigma^2$, and not on
the other details of its PSD. Similarly, the MSE distortion between
the input process and its reconstruction depends only on B
and $\sigma^2$. Thus, the developed architecture is as agnostic as Σ∆
converters to the statistics of the input process. Furthermore,
for a flat-spectrum process, the distortion is within a small gap,
due to the one-dimensionality of the encoder, from the information
theoretic limit.
Fig. 2. Schematic architecture for oversampled Σ∆ converter. {Xn} is
obtained by sampling the process X(t).
Fig. 3. Schematic architecture for oversampled modulo ADC. The same
architecture, without the low-pass filter (LPF), is also suitable for
modulo ADC for a general stationary process. {Xn} is obtained by
sampling the process X(t).
2) A Phase-Domain Implementation of Modulo ADC via
Ring Oscillators: We develop a modulo ADC implementation
that performs the modulo reduction inherently as part of the
analog signal acquisition process. As the phase of a periodic
waveform is always measured modulo 2π, a natural class of
candidates are ADCs that first convert the input voltage into
phase, and then quantize that phase. A notable representative
within this class, which has been extensively studied in the
literature [6], [7], is the ring oscillator ADC.
Consider a closed-loop cascade of N inverters, where N is
an odd number, all controlled with the same voltage Vdd = Vin,
see Figure 4. This circuit, which will be described in detail
in Section IV, oscillates between 2N states, corresponding to
the values (‘low’ or ‘high’, represented by ‘0’ or ‘1’) of each
of the N inverters. See Figure 5. The oscillation frequency is
controlled by Vdd. Due to the oscillating nature of the circuit, if
we sample its state every TS seconds, we cannot tell how many
“state changes” occurred between two consecutive samples,
but we are able to determine this number modulo 2N . Thus,
by setting Vdd to Vdd(t) = g(X(t)), where X(t) is the analog
signal to be converted to a digital one and g(·) is a function
to be specified, we obtain a modulo ADC. The input-output
relation of this modulo ADC is characterized in Section IV,
and depends on the response time of the inverters to change
in their input, as a function of Vdd.
Fig. 4. A schematic illustration of a ring oscillator with N = 5 inverters.
The states of all N inverters are measured every TS seconds.
Fig. 5. An example of the evolution of the states of the inverters in a ring
oscillator.
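The modulo behavior of a sampled ring oscillator can be illustrated with a toy sketch. It assumes, purely for illustration, that the 2N states are indexed 0, ..., 2N − 1 and visited cyclically, and that the number of state changes in each sampling interval is given; the actual input-output relation is the one characterized in Section IV, and the names below are hypothetical.

```python
from itertools import accumulate

def sampled_states(transitions_per_interval, num_inverters):
    """State index seen at each sampling instant (assumed start in state 0)."""
    num_states = 2 * num_inverters
    totals = accumulate(transitions_per_interval)
    return [0] + [total % num_states for total in totals]

def transitions_mod_2n(states, num_inverters):
    """What consecutive samples reveal: the transition count modulo 2N only."""
    num_states = 2 * num_inverters
    return [(cur - prev) % num_states for prev, cur in zip(states, states[1:])]

true_counts = [3, 12, 7, 25]                 # assumed state changes per interval
states = sampled_states(true_counts, 5)      # N = 5 inverters, 2N = 10 states
observed = transitions_mod_2n(states, 5)
```

Here the true counts 12 and 25 are observed as 2 and 5, which is exactly the folding a modulo ADC is meant to produce.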
In practice, the modulo operation realized in this way
deviates from the ideal characteristic of Figure 1 in a variety of
ways. Accordingly, we perform several numerical experiments
to evaluate and optimize the performance of an oversampled
ring oscillator modulo ADC, and compare it to the perfor-
mance of an ideal modulo ADC as well as to a Σ∆ converter.
The results demonstrate that despite the non-idealities in the
ring oscillator implementation, in some regimes, this architec-
ture holds substantial potential for improvement over existing
ADCs.
3) Modulo ADCs for Jointly Stationary Processes: In many
applications the number of sensors/antennas observing a par-
ticular process is greater than the number of degrees-of-
freedom (per time unit) governing its behavior. Thus, there is
a redundancy at the receiver that can be exploited. However,
as this redundancy can be spread across time and space,
traditional ADC architectures, as well as the modulo ADC
architectures described in Sections II-A and II-B, are insufficient.
In this part of the paper, we show how to address this problem
via a natural extension of the modulo ADC framework.
As an example we will consider the problem of wireless
communication. It is by now well established that using
receivers, as well as transmitters, with multiple antennas,
dramatically increases the achievable communication rates
over wireless channels [8], [9]. However, adding antennas
comes with the price of requiring multiple expensive and
power hungry RF chains. For traditional ADC architectures,
power and cost scale linearly with the number of receive
antennas, which motivates an alternative solution.
It is often the case that the signals observed by the different
receive antennas are highly correlated, in time and in space. As
an illustrative example, consider the case where the transmitter
has one antenna, whereas the receiver has K > 1 antennas.
We can model the signal observed at each of the antennas,
after sampling, as
$$Y^k_n = h^k_n * X_n + Z^k_n, \quad k = 1, \ldots, K, \quad n = 1, \ldots, N, \tag{1}$$
where $\{X_n\}$ is the process emitted by the transmitter, $\{h^k_n\}$ is
the kth channel impulse response, and $\{Z^k_n\}$ are independent
additive white Gaussian noise (AWGN) processes.
Since all K output processes $\{Y^1_n\}, \ldots, \{Y^K_n\}$ in (1) are
noisy and filtered versions of the same input process, they
will typically be highly correlated. However, this correlation
may be spread in time (the n-axis) and in space (the k-axis).
As an extreme example, assume $\{X_n\}$ is an iid process,
and the filters simply incur different delays, i.e., $h^k_n = \delta_{n-k}$
for $k = 1, \ldots, K$. While each individual process $\{Y^k_n\}$
is white, and each vector $(Y^1_n, \ldots, Y^K_n)$, $n = 1, \ldots, N$,
has a scaled identity covariance matrix, the vector process
$\{\{Y^1_n\}, \ldots, \{Y^K_n\}\}$ is highly correlated. One must therefore
jointly process the time and the spatial dimensions in order to
exploit this correlation.
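The delay example can be reproduced numerically with a small sketch. The number of antennas (K = 2), the noise level and the sequence length are assumed values chosen for illustration, and samples before the start of the sequence are taken as zero.

```python
import random

def delayed_observations(x, num_antennas, noise_std, rng):
    """Y^k_n = X_{n-k} + Z^k_n for k = 1..K (pure-delay channels)."""
    observations = []
    for k in range(1, num_antennas + 1):
        observations.append([
            (x[n - k] if n >= k else 0.0) + rng.gauss(0.0, noise_std)
            for n in range(len(x))
        ])
    return observations

def correlation(a, b):
    """Empirical correlation coefficient of two zero-mean sequences."""
    num = sum(u * v for u, v in zip(a, b))
    den = (sum(u * u for u in a) * sum(v * v for v in b)) ** 0.5
    return num / den

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(20000)]        # iid source
y = delayed_observations(x, num_antennas=2, noise_std=0.1, rng=rng)

same_time = correlation(y[0], y[1])                    # Y^1_n against Y^2_n
time_aligned = correlation(y[0][:-1], y[1][1:])        # Y^1_n against Y^2_{n+1}
```

At a common time index the two antennas look uncorrelated, whereas shifting one of them by a single sample reveals that they carry essentially the same signal. This is why time and space have to be processed jointly.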
This phenomenon, where the signals observed by the differ-
ent ADCs are highly correlated, is not unique to the wireless
communication setup, and appears in many other applications,
e.g., multi-array radar. It is, however, taken to the extreme in
massive MIMO [10], where the number of antennas at the
base station is of the order of tens or even hundreds, while
the number of users it supports may be substantially fewer.
In Section VI we develop an architecture that uses modulo
ADCs, one for each receive antenna, in order to exploit the
space-time correlation of the processes. We develop a low-
complexity decoding algorithm for unwrapping the modulo
operations. This algorithm combines the idea of performing
prediction in time, of the quantized vector process from its
past, with that of integer-forcing source decoding [11], which
is used for exploiting spatial correlations in the prediction error
vector. See Figure 6. In the limit of small D, the loss of the
developed analog-to-digital conversion scheme with respect
to the information theoretic lower bound on D, is shown to
reduce to that of the integer-forcing source decoder.
B. Related Work
The idea of using modulo ADCs/quantizers for exploiting
temporal correlations within the input process X(t) towards
reducing the quantization rate R, dates back, at least, to [12],
where a quantization scheme, called modulo-PCM, was intro-
duced. A decoding scheme for unwrapping the modulo oper-
ation, based on maximum-likelihood sequence detection [13],
was further proposed in [12], and a heuristic analysis was
performed, based on prediction of X(t) from its past, which
shows that modulo-PCM can approach the Shannon lower
bound under the high-resolution assumptions. In Section II-A,
we develop a more complete analysis of modulo quantization,
the details of which are required for the application we discuss
in Section III.
The architecture from Figure 3 is based on using a prediction
filter at the decoder, as a part of the modulo unwrapping
process, as was hinted at in [12] (see also [14]). In agreement
with the literature on differential pulse-code modulation
(DPCM) of the late 1970s (see e.g. [15]), the authors in [12]
proposed to design the prediction filter as the optimal one-step
predictor of the unquantized process {Xn} from its past. As
shown in [16], this design criterion is sub-optimal, and the
“correct” design criterion is to take this filter as the one-step
predictor of the quantized process from its past. The difference
between the two design criteria is significant for oversampled
processes, which are the focus of Section III, whose PSD is
zero at high frequencies, as in those frequencies the
signal-to-distortion ratio is zero, no matter how small the
quantization noise is. Our analysis in Section III reveals that
designing the modulo size ∆ and the prediction filter with respect
to a quantized flat-spectrum input process results in a universal
system. This means that this system attains the same distortion
D for all input processes that share the same support for the
PSD and the same variance.
Fig. 6. Schematic architecture for Modulo ADCs for jointly stationary processes.
The use of modulo ADCs/quantizers was also studied by
Boufounos in the context of quantization of oversampled
signals [17] (see also [18]). In particular, it is shown in [17]
that by randomly embedding a measurement vector in RK
onto an M ≫ K dimensional subspace, and using a modulo
ADC for quantizing each of the coordinates of the result,
one can attain a distortion that decreases exponentially with
the oversampling ratio, with high probability. In Section III
we consider a similar setup, where an oversampled analog
signal, with oversampling ratio L > 1, i.e. Fs is L times
greater than the Nyquist frequency, is digitized by a modulo
ADC. In the language of [17], this corresponds to embedding
X ∈ RK to an M = LK dimensional space by zero-padding
followed by interpolation, which is indeed a linear operation.
We show that for this particular “embedding” not only is the
decay of MSE distortion exponential in the oversampling ratio,
but the attained distortion is information-theoretically optimal,
up to a constant loss, which is explicitly characterized, due
to the scalar nature of the quantizer. Moreover, under this
“embedding”, a simple low-complexity decoding algorithm
exists, whereas for the random projection case studied in [17],
no computationally efficient decoding algorithm was given.
One advantage, on the other hand, of the approach from [17],
is that it is applicable to 1-bit modulo ADCs, whereas the
performance of the scheme from Section III typically becomes
attractive starting from R ≳ 2 bits per sample.
Very recently, Bhandari et al. have addressed the question
of what is the minimal sampling rate that allows for exact re-
covery of a bandlimited finite-energy signal, from its modulo-
reduced sampled version [19] (see also [20]). They have
found that a sufficient condition for correct reconstruction is
sampling above the Nyquist rate by a factor of 2πe, regardless
of the size of the modulo interval. The analysis in [19] did
not take quantization noise into account, which corresponds
to R = ∞ and D = 0 in our setup.
The merits of a modulo ADC for distributed analog-to-
digital conversion of signals correlated in space, but not in
time, were demonstrated in [11]. A low-complexity decoding
algorithm, for unwrapping the modulo operation, was pro-
posed and its performance was analyzed. It was demonstrated
via numerical experiments that the performance is usually
quite close to the information theoretic lower bounds (See
also [21]). In Section II-B, we summarize the decoding scheme
from [11] and the corresponding performance analysis, as
those will be needed in Section VI, where we develop a
modulo ADC architecture for analog-to-digital conversion of
jointly stationary processes. The decoding algorithm for this
setup, as well as its performance analysis, is inspired by the
ideas and techniques from Sections II-A and II-B.
In a broader sense, modulo quantization is closely re-
lated to Wyner-Ziv’s source coding with side information
setup and to its channel coding dual, which is the Gel’fand-
Pinsker setup [22]. In the latter context, we further note
that modulo quantization is widely used for communication
over intersymbol interference channels [23], [24]. Recently,
Hong and Caire [25] considered modulo ADCs as potential
candidates for the front end of receivers in a cloud radio access
network (CRAN), employing compute-and-forward [26] based
protocols.
Note that although the concept of modulo ADC is
reminiscent of folding ADCs [27], an important difference is
that unlike the latter, the former does not keep track of the
number of folds that occurred and, moreover, its functionality
does not depend on this number, i.e., it does not saturate
for large inputs. In unwrapping the modulo operation at the
decoder, the missing information about the number of folds is
recovered, and we are able to attain the same D with a smaller
rate.
Finally, another related line of work, is that of compressed
sampling, see, e.g., [28]–[30], where the goal is to design
universal and efficient ADCs with a small sampling frequency
FS , under the assumption that the input signal occupies only
a small portion of its total bandwidth, but the exact support is
unknown.
C. Organization
The rest of the paper is organized as follows. In Section II
we formally define the modulo ADC and study its perfor-
mance for stationary scalar input processes, and for random
vectors (spatial correlation). Section III develops the use of
oversampled modulo ADCs as a substitute for Σ∆ converters,
and analyzes the tradeoffs this architecture achieves. In Sec-
tion IV we introduce an implementation of modulo ADCs via
ring oscillators and establish the corresponding input-output
mathematical model. Numerical experiments for evaluating the
performance of ring-oscillator-based oversampled modulo
ADCs are performed in Section V. Section VI proposes to
use parallel modulo ADCs for digitizing jointly stationary
processes. The paper concludes in Section VII.
II. PRELIMINARIES ON IDEAL MODULO ADC
Let ∆ ∈ R+ be a positive number, and define the mod ∆
operation as
$$[x] \bmod \Delta \triangleq x - \Delta \left\lfloor \frac{x}{\Delta} \right\rfloor \in [0, \Delta),$$
where the floor operation ⌊x⌋ returns the largest integer
smaller than or equal to x. By definition, we have that for
any x, y ∈ R and ∆ > 0
$$\big[[x] \bmod \Delta + y\big] \bmod \Delta = [x + y] \bmod \Delta. \tag{2}$$
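As a quick illustration of the definition and of identity (2), the following sketch evaluates both sides for one choice of x, y and ∆. The function name and the numerical values are assumptions made for the example; dyadic values are used so that floating-point arithmetic is exact.

```python
import math

def mod_delta(x: float, delta: float) -> float:
    """[x] mod Delta = x - Delta * floor(x / Delta), a value in [0, Delta)."""
    return x - delta * math.floor(x / delta)

x, y, delta = -9.75, 3.5, 4.0            # illustrative values
folded_x = mod_delta(x, delta)           # [x] mod Delta
lhs = mod_delta(folded_x + y, delta)     # [[x] mod Delta + y] mod Delta
rhs = mod_delta(x + y, delta)            # [x + y] mod Delta
```

Both sides evaluate to 1.75: folding x before adding y does not change the final folded value.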
An R-bit modulo ADC with resolution parameter α, or (R,α)
mod-ADC, is defined by
$$[x]_{R,\alpha} \triangleq \big[\lfloor \alpha x \rfloor\big] \bmod 2^R \in \{0, 1, \ldots, 2^R - 1\},$$
where we have assumed that $2^R$ is an integer. In case R
itself is an integer, each sample of $[x]_{R,\alpha}$ can be represented
by R bits. Otherwise, we can buffer n consecutive samples
$[x_1]_{R,\alpha}, \ldots, [x_n]_{R,\alpha}$ and represent them by
$\lceil nR \rceil \le nR + 1$ bits, such that the average number of bits
per sample is $\le R + \frac{1}{n}$. The role of α here is to scale the
input prior to quantization. We can write $[x]_{R,\alpha}$ as
$$[x]_{R,\alpha} = \big[\alpha x + (\lfloor \alpha x \rfloor - \alpha x)\big] \bmod 2^R = [\alpha x + z] \bmod 2^R. \tag{3}$$
The error term $z = \lfloor \alpha x \rfloor - \alpha x \in (-1, 0]$ in (3) is clearly
a deterministic function of x. Nevertheless, throughout this
paper we will model this error term as additive uniform noise
$Z \sim \mathrm{Unif}((-1, 0])$ statistically independent of x, such that
the (R,α) mod-ADC will be treated as a stochastic channel
with input x and output Y, related as
$$Y = [\alpha x + Z] \bmod 2^R. \tag{4}$$
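A minimal sketch of the (R,α) mod-ADC map and of the buffering argument for non-integer R follows. The function names and the numerical values are illustrative assumptions.

```python
import math

def mod_adc(x: float, rate_bits: int, alpha: float) -> int:
    """(R, alpha) mod-ADC for integer R: [floor(alpha * x)] mod 2^R."""
    return math.floor(alpha * x) % (2 ** rate_bits)

def quantization_error(x: float, alpha: float) -> float:
    """Error term z = floor(alpha * x) - alpha * x of (3), a value in (-1, 0]."""
    return math.floor(alpha * x) - alpha * x

def bits_for_block(num_levels: int, block_length: int) -> int:
    """Bits needed for n buffered samples when 2^R = num_levels: ceil(n * R)."""
    return math.ceil(block_length * math.log2(num_levels))

code = mod_adc(3.7, rate_bits=3, alpha=4.0)
# Shifting the input by 2^R / alpha = 2 leaves the output unchanged.
code_shifted = mod_adc(3.7 + 2.0, rate_bits=3, alpha=4.0)
z = quantization_error(3.7, alpha=4.0)
block_bits = bits_for_block(num_levels=3, block_length=5)   # R = log2(3)
```

With three output levels (R = log 3 ≈ 1.585), five buffered samples fit in 8 bits, i.e., 1.6 bits per sample, which is within 1/n of R.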
The approximation of the (R,α) mod-ADC by the additive
modulo channel (4) can be made exact via the use of subtractive
dithers. Specifically, we can use a random variable
$U \sim \mathrm{Unif}([0, 1))$, statistically independent of x, which we
refer to as a dither, and feed $\tilde{x} = x + U/\alpha$ to the (R,α)
mod-ADC instead of feeding x. The output of the modulo ADC in
this case will be
$$\begin{aligned}
[\tilde{x}]_{R,\alpha} &= \big[\alpha\tilde{x} + (\lfloor \alpha\tilde{x} \rfloor - \alpha\tilde{x})\big] \bmod 2^R \\
&= \big[\alpha x + U + (\lfloor \alpha x + U \rfloor - (\alpha x + U))\big] \bmod 2^R.
\end{aligned}$$
Subtracting U from $[\tilde{x}]_{R,\alpha}$ and reducing the result modulo $2^R$,
we obtain
$$\begin{aligned}
\big[[\tilde{x}]_{R,\alpha} - U\big] \bmod 2^R
&= \Big[\big[\alpha x + U + (\lfloor \alpha x + U \rfloor - (\alpha x + U))\big] \bmod 2^R - U\Big] \bmod 2^R \\
&= \big[\alpha x + (\lfloor \alpha x + U \rfloor - (\alpha x + U))\big] \bmod 2^R,
\end{aligned}$$
where the last equality follows from the distributive law of
modulo (2). Note that for every x ∈ R, the random variable
$Z = \lfloor \alpha x + U \rfloor - (\alpha x + U)$ is uniformly distributed over
(−1, 0], and is therefore independent of x [31, Lemma 1].
Thus, with subtractive dithers, the additive noise model (4) is
exact. We note that even when dithering is not used, under
suitable conditions this approximation is quite accurate [32].
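The claim that the dithered error has the same uniform law for every input can be checked empirically. The sketch below draws the dither with Python's standard generator; the inputs, α = 4 and the sample count are assumed values.

```python
import math
import random

def dithered_error_samples(x, alpha, num_samples, rng):
    """Samples of Z = floor(alpha*x + U) - (alpha*x + U), with U ~ Unif[0, 1)."""
    errors = []
    for _ in range(num_samples):
        a = alpha * x + rng.random()
        errors.append(math.floor(a) - a)
    return errors

rng = random.Random(0)
stats = {}
for x in (0.0, 0.3, -2.71):                       # arbitrary fixed inputs
    z = dithered_error_samples(x, 4.0, 20000, rng)
    mean = sum(z) / len(z)
    var = sum((v - mean) ** 2 for v in z) / len(z)
    stats[x] = (mean, var, min(z), max(z))
```

For each input the empirical mean is close to −1/2 and the variance close to 1/12, as expected for Unif((−1, 0]). Without the dither, the error at x = 0 would be identically zero.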
Although the modulo operation entails loss of information
in general, in many situations it is possible to unwrap it, i.e.,
reconstruct αx + Z from $Y = [\alpha x + Z] \bmod 2^R$ with high
probability.1 In particular, let
$$\tilde{Y} = \left[Y + \tfrac{1}{2} 2^R\right] \bmod 2^R - \tfrac{1}{2} 2^R, \tag{5}$$
and note that conditioned on the no-overload event
$$\bar{E}_{\mathrm{OL}} \triangleq \left\{ \alpha x + Z \in \left[-\tfrac{1}{2} 2^R, \tfrac{1}{2} 2^R\right) \right\},$$
we have that $\tilde{Y} = \alpha x + Z$. Thus, if $\Pr(\bar{E}_{\mathrm{OL}})$ is close to
1, the modulo operation has no effect with high probability.
Note that $\Pr(E_{\mathrm{OL}}) = \Pr\left(|\alpha x + Z| > \tfrac{1}{2} 2^R\right)$ is identical to the
probability that a standard uniform quantizer with dynamic
range (support) $2^R/\alpha$ is in overload. Thus, when thinking of
x as a single observation, it is unclear what the advantages of
a modulo ADC are with respect to a traditional uniform ADC.
However, as we illustrate below, the modulo ADC allows
exploitation of the statistical structure of the acquired signal
in a much more efficient manner than the standard ADC.
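The cyclic shift in (5) is short enough to spell out. The sketch below, with illustrative names and values, shows one input inside the no-overload interval and one outside it.

```python
def fold(v: float, rate_bits: int) -> float:
    """Y = [v] mod 2^R, a value in [0, 2^R)."""
    return v % (2 ** rate_bits)

def unwrap_centered(y: float, rate_bits: int) -> float:
    """Eq. (5): map [0, 2^R) cyclically onto [-2^R / 2, 2^R / 2)."""
    modulus = 2 ** rate_bits
    return (y + modulus / 2) % modulus - modulus / 2

inside = unwrap_centered(fold(-2.5, 3), 3)    # -2.5 lies in [-4, 4)
outside = unwrap_centered(fold(5.0, 3), 3)    # 5.0 does not
```

The first value is recovered exactly, while the second comes back as −3.0, off by one full modulo interval. This is the overload event.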
The following lemma is proved using Chernoff’s bound, and
will be useful in the sequel for bounding Pr(EOL) in various
scenarios.
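Lemma 1 ([33, Lemma 4], [34, Theorem 7]): Consider
the random variable
$$Z_{\mathrm{eff}} = \sum_{\ell=1}^{L} \alpha_\ell Z_\ell + \sum_{k=1}^{K} \beta_k U_k,$$
where $\{Z_\ell\}_{\ell=1}^{L}$ are iid Gaussian random variables with zero
mean and some variance $\sigma_z^2$ and $\{U_k\}_{k=1}^{K}$ are iid random
variables, statistically independent of $\{Z_\ell\}_{\ell=1}^{L}$, uniformly
distributed over the interval [−ρ/2, ρ/2) for some ρ > 0. Let
$\sigma_{\mathrm{eff}}^2 \triangleq E(Z_{\mathrm{eff}}^2)$. Then for any τ ∈ R
$$\Pr(Z_{\mathrm{eff}} > \tau) = \Pr(Z_{\mathrm{eff}} < -\tau) \le \exp\left\{ -\frac{\tau^2}{2\sigma_{\mathrm{eff}}^2} \right\}.$$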
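A Monte Carlo sanity check of the bound in Lemma 1 can be sketched as follows, for a positive threshold τ. The coefficients, variances and trial count are assumed values for illustration only, and the experiment does not replace the proof.

```python
import math
import random

def chernoff_tail_bound(tau: float, sigma_eff2: float) -> float:
    """Right-hand side of Lemma 1: exp(-tau^2 / (2 * sigma_eff^2))."""
    return math.exp(-tau ** 2 / (2.0 * sigma_eff2))

def empirical_tail(alphas, betas, sigma_z, rho, tau, num_trials, rng):
    """Fraction of trials in which Z_eff exceeds tau."""
    hits = 0
    for _ in range(num_trials):
        z_eff = sum(a * rng.gauss(0.0, sigma_z) for a in alphas)
        z_eff += sum(b * rng.uniform(-rho / 2, rho / 2) for b in betas)
        hits += z_eff > tau
    return hits / num_trials

alphas, betas, sigma_z, rho = [1.0, 0.5], [2.0, 1.0], 1.0, 1.0
# Variance of the mixture: Gaussian part plus uniform part (rho^2 / 12 each).
sigma_eff2 = sum(a * a for a in alphas) * sigma_z ** 2 \
    + sum(b * b for b in betas) * rho ** 2 / 12.0
tau = 2.0 * math.sqrt(sigma_eff2)

bound = chernoff_tail_bound(tau, sigma_eff2)
estimate = empirical_tail(alphas, betas, sigma_z, rho, tau, 20000, random.Random(0))
```

At two effective standard deviations the bound equals exp(−2), and the empirical tail falls well below it.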
1 Here, the term “high probability” is used to state that this probability can
be made as high as desired by increasing R. We explicitly quantify the relation
between R and the desired “no-overload” probability.
A. Modulo ADCs for Scalar Stationary Processes
Let $\{X_n\}$ be a zero-mean discrete-time stationary Gaussian
stochastic process, obtained by sampling a stationary Gaussian
process X(t) every $T_S$ seconds. Let
$$Y_n = [\alpha X_n + Z_n] \bmod 2^R, \quad n = 1, 2, \ldots$$
be the process obtained by applying an (R,α) mod-ADC on
the process $\{X_n\}$, where $\{Z_n\}$ is a Unif((−1, 0]) iid noise,
and let
$$V_n = \alpha X_n + Z_n, \quad n = 1, 2, \ldots$$
be its non-folded version. Our goal is to design a decoder
that recovers $V_n$ from the outputs of the modulo ADC,
$\{Y_n\}$, with high probability. To that end, we assume the
decoder has access to $\{V_{n-1}, \ldots, V_{n-p}\}$, an assumption that
will be justified in the sequel, and that it knows the
auto-covariance function $C_X[r] = E[X_n X_{n-r}]$ of $\{X_n\}$. We apply
the following algorithm (see also Figure 3 for a schematic
illustration):
Inputs: $Y_n$, $\{V_{n-1}, \ldots, V_{n-p}\}$, $\{C_X[r]\}$, R, α.
Output: Estimates $\hat{V}_n$, $\hat{X}_n$ for $V_n$ and $X_n$, respectively.
Algorithm:
1) Compute the optimal linear MMSE predictor for $V_n$ from
its last p samples
$$\hat{V}^p_n = \sum_{i=1}^{p} h_i \cdot \left( V_{n-i} + \tfrac{1}{2} \right) - \tfrac{1}{2}, \tag{6}$$
where $\{h_n\}$ is a p-tap prediction filter, computed based
on $\{C_X[r]\}$ and α, and the shift by 1/2 compensates for
$E(Z_n)$.
2) Compute
$$W_n = [Y_n - \hat{V}^p_n] \bmod 2^R,$$
$$\tilde{W}_n = \left[ W_n + \tfrac{1}{2} 2^R \right] \bmod 2^R - \tfrac{1}{2} 2^R.$$
3) Output $\hat{V}_n = \hat{V}^p_n + \tilde{W}_n$, and $\hat{X}_n = \dfrac{\hat{V}_n + \frac{1}{2}}{\alpha}$.
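The three steps above can be sketched as follows. The toy experiment is an assumption-laden illustration: it uses a unit-variance Gaussian AR(1) source with coefficient 0.99, an undithered quantizer (so that V_n is an integer and the fed-back estimate can be rounded), a single tap (p = 1) computed from the covariance of the additive-noise model, and an initialization in which the first sample is known exactly. None of these choices comes from the paper.

```python
import math
import random

def centered_mod(w, modulus):
    """Cyclic shift of [0, modulus) onto [-modulus/2, modulus/2), as in (5)."""
    return (w + modulus / 2) % modulus - modulus / 2

def decode_step(y_n, past_v, taps, rate_bits, alpha):
    """One decoder step; past_v = [V_{n-1}, ..., V_{n-p}], taps = [h_1, ..., h_p]."""
    modulus = 2 ** rate_bits
    v_pred = sum(h * (v + 0.5) for h, v in zip(taps, past_v)) - 0.5  # step 1, eq. (6)
    w = (y_n - v_pred) % modulus                                     # step 2
    v_hat = v_pred + centered_mod(w, modulus)                        # step 3
    return v_hat, (v_hat + 0.5) / alpha

# Toy experiment; every number below is an illustrative assumption.
rng = random.Random(1)
a, alpha, rate_bits, num_samples = 0.99, 16.0, 5, 20000
x = [rng.gauss(0.0, 1.0)]
for _ in range(num_samples - 1):
    x.append(a * x[-1] + rng.gauss(0.0, math.sqrt(1 - a * a)))

v_true = [math.floor(alpha * xn) for xn in x]      # non-folded samples V_n
y = [v % 2 ** rate_bits for v in v_true]           # mod-ADC outputs Y_n

# One-tap predictor of the quantized process: C_V[1] / C_V[0], with
# C_V[0] = alpha^2 * C_X[0] + 1/12 and C_V[1] = alpha^2 * C_X[1].
taps = [alpha ** 2 * a / (alpha ** 2 + 1.0 / 12.0)]

v_hat = [v_true[0]]                                # assumed initialization
x_hat = [(v_true[0] + 0.5) / alpha]
for n in range(1, num_samples):
    v, _ = decode_step(y[n], [v_hat[-1]], taps, rate_bits, alpha)
    v_int = round(v)                               # V_n is an integer here
    v_hat.append(v_int)
    x_hat.append((v_int + 0.5) / alpha)

unwrap_errors = sum(vh != vt for vh, vt in zip(v_hat, v_true))
folded_samples = sum(v != yn for v, yn in zip(v_true, y))
max_abs_error = max(abs(xh - xt) for xh, xt in zip(x_hat, x))
```

In this run many samples are folded by the modulo operation, yet the decoder tracks V_n from R = 5 bits per sample and the reconstruction error stays within half a quantization step, 1/(2α).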
Remark 1: Note that {hn} is the p-tap prediction filter
for the quantized process {Vn} from its past, rather than
for {Xn} from its past. While the loss for using the lat-
ter, instead of the former, becomes insignificant when high-
resolution assumptions apply, it can be arbitrarily large for
oversampled processes, for which high-resolution assumptions
never hold [16], [35]. The filter coefficients {hn} need only
be computed once, and can then be used for all times.
The following proposition characterizes the performance of
the algorithm above. All logarithms in this paper are taken to
base 2, unless stated otherwise.
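Proposition 1: Let $\hat{V}^p_n$, $\hat{V}_n$ and $\hat{X}_n$ be as defined in the
algorithm above, and let $\sigma_p^2 = E(V_n - \hat{V}^p_n)^2$. We have that
$$\Pr(E_{\mathrm{OL}_n}) \triangleq \Pr(\hat{V}_n \neq V_n) \le 2 \exp\left\{ -\frac{3}{2}\, 2^{2\left(R - \frac{1}{2}\log(12\sigma_p^2)\right)} \right\}, \tag{7}$$
and
$$D = E\left[(X_n - \hat{X}_n)^2 \,\middle|\, \bar{E}_{\mathrm{OL}_n}\right] \le \frac{1}{12\alpha^2\left(1 - \Pr(E_{\mathrm{OL}_n})\right)}, \tag{8}$$
where the event $\bar{E}_{\mathrm{OL}_n} = \{\hat{V}_n = V_n\}$ is the complement of the
event $E_{\mathrm{OL}_n} = \{\hat{V}_n \neq V_n\}$.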
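Proof: Let $E^p_n \triangleq V_n - \hat{V}^p_n$ be the pth order prediction error
of the process $\{V_n\}$, and note that its variance $\sigma_p^2 = E(E^p_n)^2$
is invariant to n due to stationarity. We have that
$$\begin{aligned}
W_n &= [Y_n - \hat{V}^p_n] \bmod 2^R \\
&= \big[[V_n] \bmod 2^R - \hat{V}^p_n\big] \bmod 2^R \\
&= \big[V_n - \hat{V}^p_n\big] \bmod 2^R \qquad (9) \\
&= [E^p_n] \bmod 2^R,
\end{aligned}$$
where equation (9) follows from the modulo distributive
law (2), and constitutes the key advantage of the modulo
operation for exploiting temporal correlations. Note that
$\tilde{W}_n \in [-\tfrac{1}{2} 2^R, \tfrac{1}{2} 2^R)$ is a cyclically shifted version of
$W_n \in [0, 2^R)$, as in (5). Therefore, conditioned on the event
$$\bar{E}_{\mathrm{OL}_n} = \left\{ |E^p_n| < \tfrac{1}{2} 2^R \right\}$$
we have that $\tilde{W}_n = E^p_n$.
Note that $E^p_n$ is a zero-mean linear combination of
statistically independent Gaussian and uniform random variables,
such that Lemma 1 applies, and we have that
$$\begin{aligned}
\Pr(E_{\mathrm{OL}_n}) &\triangleq \Pr(\tilde{W}_n \neq E^p_n) \\
&= \Pr\left( |E^p_n| > \tfrac{1}{2} 2^R \right) \\
&\le 2 \exp\left\{ -\frac{2^{2R}}{8\sigma_p^2} \right\} \\
&= 2 \exp\left\{ -\frac{3}{2}\, 2^{2\left(R - \frac{1}{2}\log(12\sigma_p^2)\right)} \right\}. \qquad (10)
\end{aligned}$$
Whenever $\bar{E}_{\mathrm{OL}_n}$ occurs, we have that $\hat{V}_n = V_n$, and
consequently
$$\hat{X}_n = X_n + \frac{Z_n + \frac{1}{2}}{\alpha}$$
and
$$\begin{aligned}
E\left[(X_n - \hat{X}_n)^2 \,\middle|\, \bar{E}_{\mathrm{OL}_n}\right]
&= E\left[ \left( \frac{Z_n + \frac{1}{2}}{\alpha} \right)^2 \,\middle|\, \bar{E}_{\mathrm{OL}_n} \right] \\
&= \frac{1}{\alpha^2} \cdot \frac{E(Z_n + 1/2)^2 - \Pr(E_{\mathrm{OL}_n})\, E\left[(Z_n + 1/2)^2 \,\middle|\, E_{\mathrm{OL}_n}\right]}{\Pr(\bar{E}_{\mathrm{OL}_n})} \\
&\le \frac{1}{12\alpha^2\left(1 - \Pr(E_{\mathrm{OL}_n})\right)}. \qquad (11)
\end{aligned}$$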
Proposition 1 shows that we can make $\Pr(E_{\mathrm{OL}_n})$ as small
as $2e^{-\frac{3}{2} 2^{2\delta}}$ by choosing
$$R = \tfrac{1}{2}\log(12\sigma_p^2) + \delta. \tag{12}$$
For example, taking δ = 2 bits results in an overload
probability smaller than $10^{-10}$. In particular, unless we take
a very small δ, we have that $1 - \Pr(E_{\mathrm{OL}_n}) \approx 1$, and
consequently, by Proposition 1, we will have $D \approx 1/(12\alpha^2)$.
Thus, to simplify expressions in the analysis that follows, we
assume $D = 1/(12\alpha^2)$. We note the tradeoff in choosing α:
on the one hand, increasing α decreases the MSE distortion
D, but on the other hand the prediction error variance $\sigma_p^2$
of the process $V_n = \alpha X_n + Z_n$ increases with α such that
the required rate R for avoiding overload errors increases.
Thus, the tradeoff between D and the required quantization
rate is controlled through the parameter α. We now turn to
characterize the tradeoff the developed scheme achieves.
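The numbers in this paragraph are easy to reproduce. The sketch below evaluates the bound (7) under the rate choice (12) and the approximation D = 1/(12α²); the function names and the example prediction error variance are assumptions.

```python
import math

def overload_bound(delta_bits: float) -> float:
    """Bound (7) when R = 0.5 * log2(12 * sigma_p^2) + delta: 2 exp(-1.5 * 2^(2 delta))."""
    return 2.0 * math.exp(-1.5 * 2.0 ** (2.0 * delta_bits))

def required_rate(sigma_p2: float, delta_bits: float) -> float:
    """Rate choice (12), in bits per sample."""
    return 0.5 * math.log2(12.0 * sigma_p2) + delta_bits

def distortion(alpha: float) -> float:
    """Approximate no-overload MSE distortion, D = 1 / (12 alpha^2)."""
    return 1.0 / (12.0 * alpha ** 2)

p_overload = overload_bound(2.0)              # about 7.6e-11
rate = required_rate(16.0 / 12.0, 2.0)        # 12 * sigma_p^2 = 16, so R = 2 + 2
```

With δ = 2 the bound evaluates to roughly 7.6 · 10⁻¹¹, consistent with the figure of 10⁻¹⁰ quoted above.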
Let h(A) denote the differential entropy of the random
variable A, and h(A|B) the conditional differential entropy of
A given the random variable B [5]. Recall that for a stationary
Gaussian process $\{X_n\}$ with PSD $S_X(e^{j\omega})$ we have that [36]
$$h(X_n | X_{n-1}, \ldots) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{1}{2} \log\left( 2\pi e\, S_X(e^{j\omega}) \right) d\omega, \tag{13}$$
and in particular $h(X_n | X_{n-1}, \ldots) = -\infty$ if and only if
$S_X(e^{j\omega}) = 0$ over a measurable subset of [−π, π). Shannon’s
lower bound [3] states that the number of bits per sample R
produced by any quantizer that attains an MSE distortion D
must satisfy
$$R(D) \ge R_{\mathrm{SLB}}(D) \triangleq h(X_n | X_{n-1}, \ldots) - \frac{1}{2} \log(2\pi e D).$$
It is well-known that for Gaussian processes with finite
$h(X_n | X_{n-1}, \ldots)$, Shannon’s lower bound is asymptotically
tight, i.e., $\lim_{D \to 0} R(D) - R_{\mathrm{SLB}}(D) = 0$ [3].
Proposition 2: If h(Xn|Xn−1, . . .) > −∞, then
lim
D→0
lim
p→∞
1
2
log(12σ2p) = RSLB(D).
Proof: We can write
1
2
log(12σ2p) =
1
2
log

 σ2pα2
1
12α2

 = 1
2
log
(
E(E′pn )
2
D
)
. (14)
where E′pn is the pth order prediction error of the process
Xn + Zn/α = Xn +
√
DZ˜n, where Z˜n ∼ Unif([−
√
12, 0))
iid.
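For a Gaussian process $\{X_n\}$, the condition $h(X_n|X_{n-1},\ldots) > -\infty$ is equivalent to
$$\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{1}{2}\log\left(S_X(e^{j\omega})\right)d\omega > -\infty. \tag{15}$$
As a consequence of (15), we have that
$$\lim_{D\to 0}\frac{1}{2\pi}\int_{-\pi}^{\pi}\frac{1}{2}\log\left(2\pi e\left(S_X(e^{j\omega}) + D\right)\right)d\omega = h(X_n|X_{n-1},\ldots). \tag{16}$$
By Paley-Wiener’s theorem [37], we have that
$$\lim_{p\to\infty}\mathbb{E}\big(E'^{p}_n\big)^2 = 2^{\frac{1}{2\pi}\int_{-\pi}^{\pi}\log\left(S_X(e^{j\omega}) + D\right)d\omega}. \tag{17}$$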
Combining (16) and (17), we obtain that
$$\lim_{D\to 0}\lim_{p\to\infty}\mathbb{E}\big(E'^{p}_n\big)^2 = \frac{2^{2h(X_n|X_{n-1},\ldots)}}{2\pi e},$$
for processes with finite entropy rate $h(X_n|X_{n-1},\ldots)$. The result now follows by rearranging terms.
For the practically important case where $\{X_n\}$ is obtained by oversampling the process $\{X(t)\}$, which is studied in Section III, the assumption $h(X_n|X_{n-1},\ldots) > -\infty$ of Proposition 2 does not hold. Nevertheless, we will show that the modulo ADC achieves performance that is close to the information theoretic limits.
Above, we have assumed that the decoder has access to the non-folded samples $\{V_{n-1},\ldots,V_{n-p}\}$. To justify this assumption, an initialization step is needed, where the decoder acquires the first $p$ consecutive samples $\{V_1,\ldots,V_p\}$, or estimates of these samples. Once those are obtained, we can apply the algorithm described above, sample-by-sample, and assume the estimate $\hat{V}_n$ produced by the algorithm at time $n$ is correct, and can be used as an input for the algorithm in the next $p$ steps. All samples $V_{p+1},\ldots,V_T$ will be recovered correctly, as long as no overload error occurred within the $T - p$ decoding steps. Thus, by the union bound, we see that the first $T - p$ samples are recovered correctly with probability at least $1 - 2Te^{-\frac{3}{2}2^{2\delta}}$.²
One conceptually simple way of performing the initialization, i.e., obtaining $\{V_1,\ldots,V_p\}$, is by using a standard high-rate scalar quantizer for the first $p$ samples. Although the high power consumption of such a quantizer will have a negligible effect on the total power consumption, because it is used only for a small fraction of the time, this approach has the disadvantage of having to include two ADCs, a high-rate standard ADC and a modulo ADC, within the system. Alternatively, one can perform the initialization using only an $R$-bit modulo ADC in one of the two following ways:
1) Increase $\alpha$ gradually until it reaches its final value. For the first sample, $\alpha_1$ will be chosen such that $V_1 = \alpha_1 X_1 + Z_1$ is w.h.p. within the modulo interval, such that no prediction is needed. Next, we can use $V_1$ in order to predict $V_2 = \alpha_2 X_2 + Z_2$, which allows the use of $\alpha_2 > \alpha_1$ such that the prediction error is still within the modulo interval. Continuing this way, we can keep increasing $\alpha$ until convergence.
2) We can collect a long vector of outputs from the modulo
ADC, say {Y1, . . . , YK}, K > p, and unwrap the modulo
operation via the integer-forcing source coding scheme
described in the next subsection. The amount of com-
putations per sample required in this method is greater
than that of the “steady state”, i.e., after initialization is
complete, but since initialization is rarely performed, the
effect on the total complexity is negligible.
B. Modulo ADCs for Random Vectors
Let $\mathbf{X} \sim \mathcal{N}(\mathbf{0},\Sigma)$ be a $K$-dimensional Gaussian random vector with zero mean and covariance matrix $\Sigma$. Let
$$Y_k = [\alpha X_k + Z_k] \bmod 2^R, \quad k = 1,\ldots,K,$$
be obtained by applying $K$ identical $(R,\alpha)$ mod-ADCs, each applied to a different coordinate of the vector $\mathbf{X}$, where the quantization noises $Z_k \sim \mathrm{Unif}((-1,0])$, $k = 1,\ldots,K$, are iid, and let
$$V_k = \alpha X_k + Z_k, \quad k = 1,\ldots,K,$$
be its non-folded version. Our goal is to recover $\mathbf{V} \triangleq [V_1,\ldots,V_K]^T$ from the outputs $\mathbf{Y} \triangleq [Y_1,\ldots,Y_K]^T$ of the modulo ADCs with high probability.
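By definition of the modulo operation, we have that $\mathbf{V} \in \mathbf{Y} + 2^R\cdot\mathbb{Z}^K$. Consequently, the optimal decoder for $\mathbf{V}$ from the measurement $\mathbf{Y}$, in terms of minimizing $\Pr(\hat{\mathbf{V}} \neq \mathbf{V})$, is
$$\hat{\mathbf{V}}_{\mathrm{opt}} = \mathbf{Y} + \operatorname*{argmax}_{\mathbf{b}\in 2^R\cdot\mathbb{Z}^K} f_{\mathbf{V}}(\mathbf{Y} + \mathbf{b}), \tag{18}$$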
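where $f_{\mathbf{V}}$ is the probability density function (PDF) of the random vector $\mathbf{V}$. Although $f_{\mathbf{V}}$ can be expressed as the convolution of the $K$-dimensional Gaussian PDF of $\alpha\mathbf{X}$ and the cubic PDF of $\mathbf{Z} = [Z_1,\ldots,Z_K]^T$, no simpler closed-form expression is known for it. However, as $\alpha$ increases (high-resolution quantization regime), $f_{\mathbf{V}}$ approaches the PDF of a $\mathcal{N}\left(-\mathbf{\tfrac{1}{2}},\, \alpha^2\Sigma + \tfrac{1}{12}\mathbf{I}\right)$ random vector, where $\mathbf{\tfrac{1}{2}}$ is a $K$-dimensional vector with all entries equal to $\tfrac{1}{2}$ and $\mathbf{I}$ is the identity matrix. Consequently, one can use the sub-optimal (in terms of minimizing $\Pr(\hat{\mathbf{V}} \neq \mathbf{V})$) decoder
$$\hat{\mathbf{V}}_{\mathrm{Gauss}} = \mathbf{Y} + \operatorname*{argmin}_{\mathbf{b}\in 2^R\cdot\mathbb{Z}^K}\left(\mathbf{Y} + \mathbf{\tfrac{1}{2}} + \mathbf{b}\right)^T\left(\alpha^2\Sigma + \tfrac{1}{12}\mathbf{I}\right)^{-1}\left(\mathbf{Y} + \mathbf{\tfrac{1}{2}} + \mathbf{b}\right).$$

² Note that conditioning on the event that no overload error occurred until time $n$ changes the statistics of $E^p_n$. Thus, applying the union bound correctly here requires some more care. See [35] for more details.

[Fig. 7. Schematic architecture for modulo ADC for random vectors. Encoder: each of $X_1,\ldots,X_K$ passes through a Mod-ADC. Decoder: multiplication by $\mathbf{A}$, a mod $\Delta$ operation on each branch, and multiplication by $\mathbf{A}^{-1}$, producing $\hat{X}_1,\ldots,\hat{X}_K$.]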
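The matrix $\left(\alpha^2\Sigma + \tfrac{1}{12}\mathbf{I}\right)^{-1}$ is positive definite and therefore admits a Cholesky decomposition
$$\left(\alpha^2\Sigma + \tfrac{1}{12}\mathbf{I}\right)^{-1} = \mathbf{L}\mathbf{L}^T,$$
where $\mathbf{L}$ is a lower triangular matrix with strictly positive diagonal entries. Setting $\mathbf{s} = -\mathbf{L}^T\left(\mathbf{\tfrac{1}{2}} + \mathbf{Y}\right)$, we can write
$$\hat{\mathbf{V}}_{\mathrm{Gauss}} = \mathbf{Y} + 2^R\cdot\operatorname*{argmin}_{\mathbf{b}\in\mathbb{Z}^K}\left\|\mathbf{L}^T\mathbf{b} - 2^{-R}\cdot\mathbf{s}\right\|. \tag{19}$$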
Thus, the problem of finding $\hat{\mathbf{V}}_{\mathrm{Gauss}}$ is equivalent to that of finding the closest point to $2^{-R}\cdot\mathbf{s}$ in the lattice generated by the basis $\mathbf{L}^T$. Solving this problem, in general, is known to require running time exponential in $K$ [38], unless P=NP. Thus, for large $K$, finding $\hat{\mathbf{V}}_{\mathrm{Gauss}}$ is computationally prohibitive. One therefore needs to seek an alternative, low-complexity decoder for $\mathbf{V}$ from $\mathbf{Y}$. Next, we review such a decoder, proposed in [11], dubbed the integer-forcing (IF) source decoder, see Figure 7. The decoding algorithm works as follows.
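Inputs: $\mathbf{Y}$, $\Sigma$, $R$, $\alpha$.
Output: Estimates $\hat{\mathbf{V}}_{\mathrm{IF}}$ and $\hat{\mathbf{X}}_{\mathrm{IF}}$ for $\mathbf{V}$ and $\mathbf{X}$, respectively.
Algorithm:
1) Solve
$$\mathbf{A} = [\mathbf{a}_1|\cdots|\mathbf{a}_K]^T = \operatorname*{argmin}_{\substack{\bar{\mathbf{A}}\in\mathbb{Z}^{K\times K} \\ |\bar{\mathbf{A}}|\neq 0}}\ \max_{k=1,\ldots,K}\frac{1}{2}\log\left(\bar{\mathbf{a}}_k^T\left(\mathbf{I} + 12\alpha^2\Sigma\right)\bar{\mathbf{a}}_k\right), \tag{20}$$
where $|\mathbf{A}|$ denotes the absolute value of $\det(\mathbf{A})$.
2) For $k = 1,\ldots,K$, compute
$$\bar{g}_k \triangleq \left[\mathbf{a}_k^T\left(\mathbf{Y} + \mathbf{\tfrac{1}{2}}\right)\right] \bmod 2^R = [g_k] \bmod 2^R, \tag{21}$$
$$\tilde{g}_k \triangleq \left[\bar{g}_k + \tfrac{1}{2}2^R\right] \bmod 2^R - \tfrac{1}{2}2^R,$$
and set $\tilde{\mathbf{g}} = [\tilde{g}_1,\ldots,\tilde{g}_K]^T$.
3) Output $\hat{\mathbf{V}}_{\mathrm{IF}} = \mathbf{A}^{-1}\tilde{\mathbf{g}}$, and $\hat{\mathbf{X}}_{\mathrm{IF}} = \hat{\mathbf{V}}_{\mathrm{IF}}/\alpha$.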
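The three steps above can be sketched in a few lines of Python for a small example. This is a hedged illustration, not the implementation of [11]: the helper `find_integer_matrix` is a hypothetical brute-force stand-in for (20) that searches only integer vectors with entries in a small box (assumed here to be $[-4,4]$), and the covariance matrix, $\alpha$ and $R$ are arbitrary choices. The quantization noise is modeled as $Z = \lfloor\alpha X\rfloor - \alpha X \in (-1,0]$. Since $g_k = \mathbf{a}_k^T(\mathbf{V} + \mathbf{\tfrac{1}{2}})$, the product $\mathbf{A}^{-1}\tilde{\mathbf{g}}$ equals $\mathbf{V} + \mathbf{\tfrac{1}{2}}$ when no overload occurs, so under our reading the sketch subtracts $\tfrac{1}{2}$ before comparing with $\mathbf{V}$.

```python
import itertools
import numpy as np

def find_integer_matrix(G, max_entry=4):
    """Hypothetical brute-force stand-in for (20): greedily collect K linearly
    independent integer vectors with the smallest values of a^T G a."""
    K = G.shape[0]
    box = range(-max_entry, max_entry + 1)
    candidates = [np.array(a) for a in itertools.product(box, repeat=K) if any(a)]
    candidates.sort(key=lambda a: float(a @ G @ a))
    rows = []
    for a in candidates:
        # Keep a candidate only if it increases the rank of the collected rows.
        if np.linalg.matrix_rank(np.array(rows + [a])) == len(rows) + 1:
            rows.append(a)
        if len(rows) == K:
            break
    return np.array(rows)

def if_rate(A, G):
    """R_IFSC(A) from (22) in bits, with G = I + 12 * alpha^2 * Sigma."""
    return max(0.5 * np.log2(a @ G @ a) for a in A)

def if_decode(Y, A, R, alpha):
    """Steps 2 and 3 of the algorithm; Y has shape (K, number of samples)."""
    mod = 2.0 ** R
    g_bar = np.mod(A @ (Y + 0.5), mod)                 # eq. (21)
    g_tilde = np.mod(g_bar + mod / 2, mod) - mod / 2   # cyclic shift to [-2^R/2, 2^R/2)
    v_plus_half = np.linalg.solve(A, g_tilde)          # equals V + 1/2 if no overload
    return np.round(v_plus_half - 0.5), v_plus_half / alpha

def run_if_demo(n=2000, alpha=8.0, R=7, seed=0):
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)
    # Strongly correlated pair: X2 = 4*X1 + small independent noise.
    X = np.vstack([x1, 4.0 * x1 + 0.1 * rng.standard_normal(n)])
    Sigma = np.array([[1.0, 4.0], [4.0, 16.01]])
    G = np.eye(2) + 12 * alpha**2 * Sigma
    A = find_integer_matrix(G)
    V = np.floor(alpha * X)          # V = alpha*X + Z with Z in (-1, 0]
    Y = np.mod(V, 2 ** R)            # outputs of the two mod-ADCs
    V_hat, X_hat = if_decode(Y, A, R, alpha)
    return A, if_rate(A, G), X, V, Y, V_hat, X_hat

A, rate, X, V, Y, V_hat, X_hat = run_if_demo()
print("A =", A.tolist(), " R_IFSC(A) =", round(float(rate), 3), "bits")
print("all samples unwrapped correctly:", bool(np.array_equal(V_hat, V)))
```

In this example the second coordinate alone would need a noticeably larger modulo range than the one used, yet both coordinates are unwrapped because the decoder works with integer combinations that have small variance.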
Remark 2: The optimization problem (20) requires a computational complexity exponential in $K$, in general (unless P=NP). However, the problem of finding the optimal integer matrix $\mathbf{A}$ need only be solved once for each covariance matrix $\Sigma$ and $\alpha$. Thus, even if the solution to this problem is computationally expensive, its cost is amortized over the number of times this solution is used. In practice, one can apply the LLL algorithm [39] in order to obtain a sub-optimal $\mathbf{A}$ with polynomial complexity in $K$.
The next proposition, adapted from [11, Theorem 2], characterizes the performance of modulo ADCs with the decoder above.
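Proposition 3: Let $\mathbf{A} = [\mathbf{a}_1|\cdots|\mathbf{a}_K]^T$ be the matrix found in step 1 of the algorithm above, and define
$$R_{\mathrm{IFSC}}(\mathbf{A}) = \max_{k=1,\ldots,K}\frac{1}{2}\log\left(\mathbf{a}_k^T\left(\mathbf{I} + 12\alpha^2\Sigma\right)\mathbf{a}_k\right). \tag{22}$$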
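We have that
$$\Pr(E_{\mathrm{OL}}) = \Pr(\hat{\mathbf{V}}_{\mathrm{IF}} \neq \mathbf{V}) \leq 2K\exp\left\{-\frac{3}{2}\cdot 2^{2(R - R_{\mathrm{IFSC}}(\mathbf{A}))}\right\},$$
and
$$D_k = \mathbb{E}\left[\left(X_k - \hat{X}_{k,\mathrm{IF}}\right)^2\,\Big|\,\overline{E}_{\mathrm{OL}}\right] \leq \frac{1}{12\alpha^2\left(1 - \Pr(E_{\mathrm{OL}})\right)},$$
for all $k = 1,\ldots,K$, where the event $\overline{E}_{\mathrm{OL}} = \{\hat{\mathbf{V}}_{\mathrm{IF}} = \mathbf{V}\}$ is the complement of the event $E_{\mathrm{OL}} = \{\hat{\mathbf{V}}_{\mathrm{IF}} \neq \mathbf{V}\}$.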
The main idea behind the decoder above is the simple observation that for any vector $\mathbf{a} = [a_1,\ldots,a_K]^T \in \mathbb{Z}^K$ and any vector $\mathbf{h} = [h_1,\ldots,h_K]^T \in \mathbb{R}^K$ we have that
$$\left[\sum_{k=1}^{K} a_k\,[h_k] \bmod 2^R\right] \bmod 2^R = \left[\sum_{k=1}^{K} a_k h_k\right] \bmod 2^R. \tag{23}$$
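The identity (23) is easy to check numerically. The short sketch below draws a random integer vector and a random real vector and compares both sides; the function name `identity_gap` and all parameter values are ours. Because floating-point rounding can place the two sides on opposite ends of the wrap-around point, the comparison uses the circular distance modulo $2^R$.

```python
import numpy as np

def identity_gap(R=4, K=5, seed=0):
    """Circular distance between the two sides of identity (23) for random a and h."""
    rng = np.random.default_rng(seed)
    mod = 2.0 ** R
    a = rng.integers(-5, 6, size=K)          # integer coefficients a_k
    h = rng.normal(scale=50.0, size=K)       # real values h_k, typically far outside [0, 2^R)
    lhs = np.mod(a @ np.mod(h, mod), mod)    # fold each h_k first, then combine and fold
    rhs = np.mod(a @ h, mod)                 # combine first, then fold once
    return abs(np.mod(lhs - rhs + mod / 2, mod) - mod / 2)

print("largest gap over 100 trials:", max(identity_gap(seed=s) for s in range(100)))
```

The identity relies on the coefficients being integers; with non-integer coefficients the two sides generally differ.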
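Proof: By the identity (23), we have that the quantities $\bar{g}_k$, computed in step 2 of the algorithm, satisfy
$$\bar{g}_k = \left[\mathbf{a}_k^T\left(\mathbf{Y} + \mathbf{\tfrac{1}{2}}\right)\right] \bmod 2^R = [g_k] \bmod 2^R,$$
where
$$g_k \triangleq \mathbf{a}_k^T\left(\mathbf{V} + \mathbf{\tfrac{1}{2}}\right).$$
Furthermore, $\tilde{g}_k \in [-\tfrac{1}{2}2^R, \tfrac{1}{2}2^R)$ is merely a cyclically shifted version of $\bar{g}_k \in [0, 2^R)$. Thus, $\tilde{g}_k = g_k$ if and only if $g_k \in [-\tfrac{1}{2}2^R, \tfrac{1}{2}2^R)$. Consequently, $\hat{\mathbf{V}}_{\mathrm{IF}} \neq \mathbf{V}$ if and only if the event
$$E_{\mathrm{OL}} = \bigcup_{k=1}^{K}\left\{|g_k| \geq \tfrac{1}{2}2^R\right\}$$
occurs. Thus, by the union bound,
$$\Pr(E_{\mathrm{OL}}) = \Pr(\hat{\mathbf{V}}_{\mathrm{IF}} \neq \mathbf{V}) \leq \sum_{k=1}^{K}\Pr\left(|g_k| \geq \tfrac{1}{2}2^R\right). \tag{24}$$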
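The random variable $g_k$ has zero mean, variance $\sigma_k^2 = \mathbf{a}_k^T\left(\alpha^2\Sigma + \tfrac{1}{12}\mathbf{I}\right)\mathbf{a}_k$, and satisfies the conditions of Lemma 1. We therefore have that
$$\Pr\left(|g_k| \geq \tfrac{1}{2}2^R\right) \leq 2\exp\left\{-\frac{2^{2R}}{8\sigma_k^2}\right\} = 2\exp\left\{-\frac{3}{2}\cdot 2^{2\left(R - \frac{1}{2}\log(12\sigma_k^2)\right)}\right\} = 2\exp\left\{-\frac{3}{2}\cdot 2^{2\left(R - \frac{1}{2}\log\left(\mathbf{a}_k^T(\mathbf{I} + 12\alpha^2\Sigma)\mathbf{a}_k\right)\right)}\right\}.$$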
Substituting this into (24) and recalling the definition of $R_{\mathrm{IFSC}}(\mathbf{A})$ gives
$$P_e \leq 2K\exp\left\{-\frac{3}{2}\cdot 2^{2(R - R_{\mathrm{IFSC}}(\mathbf{A}))}\right\}. \tag{25}$$
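Conditioned on the event $\overline{E}_{\mathrm{OL}}$, i.e., the event that $E_{\mathrm{OL}}$ did not occur, we have that for all $k = 1,\ldots,K$
$$D_k = \mathbb{E}\left[\left(X_k - \hat{X}_{k,\mathrm{IF}}\right)^2\,\Big|\,\overline{E}_{\mathrm{OL}}\right] = \mathbb{E}\left[\left(\frac{Z_k + \frac{1}{2}}{\alpha}\right)^2\,\Big|\,\overline{E}_{\mathrm{OL}}\right] \leq \frac{1}{12\alpha^2\left(1 - \Pr(E_{\mathrm{OL}})\right)},$$
where the last inequality follows similarly to (11).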
As in the previous subsection, we set
$$R = R_{\mathrm{IFSC}}(\mathbf{A}) + \delta, \tag{26}$$
such that
$$\Pr(E_{\mathrm{OL}}) \leq 2K\exp\left\{-\frac{3}{2}\cdot 2^{2\delta}\right\}, \tag{27}$$
and set $D = 1/(12\alpha^2)$, which is a good approximation for the upper bound we derived on $D_k$, provided that $\delta$ is not too small. Consequently, we can write
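$$R_{\mathrm{IFSC}}(\mathbf{A}, D) \triangleq \max_{k=1,\ldots,K}\frac{1}{2}\log\left(\mathbf{a}_k^T\left(\mathbf{I} + \frac{1}{D}\Sigma\right)\mathbf{a}_k\right). \tag{28}$$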
The tradeoff between rate, distortion and error probability achieved by the $(R,\alpha)$ mod-ADC with an integer-forcing decoder is therefore characterized by equations (26), (27), and (28). To put this result in context, we recall the information theoretic benchmark [11]
$$R^{\mathrm{BT}}_{\mathrm{bench}}(D) \triangleq \frac{1}{2K}\log\left|\mathbf{I} + \frac{1}{D}\Sigma\right|,$$
that approximates the minimal quantization rate, per quantizer, required by any computationally and delay unlimited system in order to achieve MSE of at most $D$ in the reconstructions of each $X_k$, $k = 1,\ldots,K$. Thus,
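$$R_{\mathrm{IFSC}}(\mathbf{A}, D) - R^{\mathrm{BT}}_{\mathrm{bench}}(D) = \frac{1}{2}\log\left(\frac{\max_{k=1,\ldots,K}\mathbf{a}_k^T\left(\mathbf{I} + \frac{1}{D}\Sigma\right)\mathbf{a}_k}{\left|\mathbf{I} + \frac{1}{D}\Sigma\right|^{\frac{1}{K}}}\right). \tag{29}$$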
It is easy to show that the right hand side of (29) is non-negative [11]. However, typically the gap is quite small, and under certain distributions of practical interest on $\Sigma$, the cumulative distribution function (CDF) of this gap can be characterized [21]. A comprehensive comparison between $R_{\mathrm{IFSC}}(D)$ and $R^{\mathrm{BT}}_{\mathrm{bench}}(D)$ was performed in [11], and it was demonstrated that they are usually quite close.
We further remark that the integer-forcing source decoder
is merely one sub-optimal algorithm for solving (19). It is an
interesting avenue for future research to develop alternative
algorithms, with firm performance guarantees (as in (26), (27)
and (28)), for the same problem, or more ambitiously, for
solving (18).
III. OVERSAMPLED MODULO-ADC
In Section II-A we have demonstrated the effectiveness
of the modulo ADC architecture for acquiring stochastic
processes that are correlated in time. In particular, we have
shown that the performance of a modulo ADC depends on
the variance of the prediction error of the process {Vn =
αXn + Zn}, rather than the variance of Vn itself. However,
when designing an ADC, it is desirable to impose as few
constraints as possible on the signals that will be fed to the
ADC. Therefore, assuming that {Xn} is such that {Vn} is
highly predictable may be too restrictive.
Nevertheless, recalling that the process {Xn} is obtained
by sampling a continuous-time process X(t), we observe
that if the sampling rate is higher than Nyquist’s rate, {Xn}
will be bandlimited,³ and consequently, {Vn} will be highly
predictable no matter what the precise PSD of {Xn} happens
to be. In fact, this observation can be viewed as the rationale
underlying Σ∆-conversion. In particular, a Σ∆-converter is
information theoretically equivalent to a differential pulse-code
modulator (DPCM) whose input is a bandlimited signal with
flat spectrum [35].
While having many advantages, the implementation of Σ∆
converters is more involved than that of traditional scalar
uniform quantizers. The main challenge in the design of Σ∆
converters is the need to produce the quantization error, and
then apply a filter to this analog signal. A major obstacle is
that generating the quantization error requires first
quantizing the current sample, then applying a digital-to-analog
converter (DAC) to produce the analog representation of the
quantizer’s output, and finally subtracting this representation
from the original sample. See Figure 2. The quantizer and
the DAC need to be matched as otherwise the produced
quantization error is inaccurate. This, however, turns out to
be quite difficult to achieve, unless the quantizer is a simple
sign detector (1-bit quantizer).
To circumvent the challenges listed above, we develop an oversampled modulo ADC, as an alternative to Σ∆-conversion. The only assumptions made on the input process $\{X(t)\}$ are that it is bandlimited with maximal frequency at most $B$, and that its variance is at most $\sigma^2$. The developed universal architecture is as follows. See Figure 3.
Analog-to-digital conversion: The process $X(t)$ is uniformly sampled every $T_s = 1/(2LB)$ seconds, $L > 1$, such that the sampling rate is $L$ times above Nyquist’s rate. Each sample of the obtained discrete-time process $\{X_n\}$ is then discretized using an $(R,\alpha)$ mod-ADC, resulting in the quantized process $\{Y_n = [\alpha X_n + Z_n] \bmod 2^R\}$.
As above, we define the unfolded process {Vn = αXn +
Zn}. The decoding procedure assumes {Vn−1, . . . , Vn−p} are
given, and computes an estimate for Vn, based on Yn.
Inputs: Yn, {Vn−1, . . . , Vn−p}, σ2, L, R, α.
Outputs: Estimates Vˆn and Xˆn for Vn and Xn, respectively.
Algorithm: The algorithm is exactly the same as that in Section II-A, with only one difference. Here $\{C_X[r]\}$ is unknown. Thus, for the computation of the $p$-tap prediction filter $\{h_n\}$, we assume the PSD of $\{X_n\}$ is
$$S_X(e^{j\omega}) = \begin{cases} L\sigma^2 & \omega \in [-\frac{\pi}{L}, \frac{\pi}{L}) \\ 0 & \omega \notin [-\frac{\pi}{L}, \frac{\pi}{L}) \end{cases}, \tag{30}$$
even though this assumption may, and is most likely to, be wrong.

³ We say that a discrete-time process $\{X_n\}$ is bandlimited if there exists some $\gamma < \pi$ such that $S_X(e^{j\omega}) = 0$ for all $\omega \in (-\pi,-\gamma) \cup (\gamma,\pi)$. Since our analysis takes quantization noise into account, it is quite robust to slight deviations from the assumption that $S_X(e^{j\omega})$ is strictly bandlimited. In particular, as long as $S_X(e^{j\omega}) \ll D$ for all $\omega \in (-\pi,-\gamma) \cup (\gamma,\pi)$, where $D$ is the target MSE distortion, our analysis remains valid.
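Final post-processing: After collecting a long sequence of estimates $\{\hat{X}_1,\ldots,\hat{X}_N\}$ we apply a non-causal low pass filter
$$G(e^{j\omega}) = \begin{cases} \dfrac{12\alpha^2 L\sigma^2}{1 + 12\alpha^2 L\sigma^2} & \text{if } \omega \in [-\frac{\pi}{L}, \frac{\pi}{L}] \\ 0 & \text{if } \omega \notin [-\frac{\pi}{L}, \frac{\pi}{L}] \end{cases}$$
on them, to obtain the sequence $\{\hat{X}^{\mathrm{LPF}}_1,\ldots,\hat{X}^{\mathrm{LPF}}_N\}$.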
The advantages over Σ∆ conversion are clear: the only
processing done in the analog domain is sampling and apply-
ing a modulo ADC, whereas all filtering operations are done
digitally at the decoder.
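The conversion, decoding and post-processing steps described above can be illustrated with the following sketch. It is a simplified simulation under several assumptions of ours: the input is a synthetic bandlimited Gaussian sequence generated by masking the FFT of white noise, the quantization noise is modeled as $Z_n = \lfloor\alpha X_n\rfloor - \alpha X_n$, the first $p$ unfolded samples are handed to the decoder as initialization, the predictor is applied to the centered samples $V_n + \tfrac{1}{2}$ (our reading of the Section II-A algorithm, which is not reproduced here), and the low pass filter is applied in the FFT domain. All function names and parameter values are illustrative.

```python
import numpy as np

def flat_psd_predictor(p, L, alpha, sigma2):
    """p-tap MMSE predictor for V_n + 1/2, designed for the flat PSD (30)."""
    r = np.arange(p + 1)
    c = alpha**2 * sigma2 * np.sinc(r / L)   # autocorrelation of alpha*X_n under (30)
    c[0] += 1.0 / 12.0                       # variance of the uniform noise Z_n
    toeplitz = c[np.abs(r[:p, None] - r[None, :p])]
    return np.linalg.solve(toeplitz, c[1:])

def centered_mod(x, R):
    """Reduce x modulo 2^R into the interval [-2^R/2, 2^R/2)."""
    half = 2.0 ** (R - 1)
    return np.mod(x + half, 2.0 ** R) - half

def decode(Y, V_init, h, R):
    """Sample-by-sample unwrapping: predict, fold the prediction error, add back."""
    p = len(h)
    V_hat = np.empty(len(Y))
    V_hat[:p] = V_init                       # initialization with p unfolded samples
    for n in range(p, len(Y)):
        past = V_hat[n - p:n][::-1] + 0.5    # centered past estimates, most recent first
        pred = h @ past
        err = centered_mod(Y[n] + 0.5 - pred, R)
        V_hat[n] = np.round(pred + err - 0.5)
    return V_hat

def run_demo(N=4096, L=4, alpha=16.0, R=5, p=20, seed=0):
    rng = np.random.default_rng(seed)
    in_band = np.abs(np.fft.fftfreq(N)) <= 1.0 / (2 * L)
    X = np.fft.ifft(np.fft.fft(rng.standard_normal(N)) * in_band).real
    X /= X.std()                             # unit variance, so sigma^2 = 1
    V = np.floor(alpha * X)                  # unfolded samples, Z_n in (-1, 0]
    Y = np.mod(V, 2 ** R)                    # what the (R, alpha) mod-ADC outputs
    h = flat_psd_predictor(p, L, alpha, 1.0)
    V_hat = decode(Y, V[:p], h, R)
    gain = 12 * alpha**2 * L / (1 + 12 * alpha**2 * L)
    X_lpf = np.fft.ifft(np.fft.fft((V_hat + 0.5) / alpha) * in_band * gain).real
    return V, Y, V_hat, np.mean((X - X_lpf) ** 2)

V, Y, V_hat, mse = run_demo()
print("max |V_n| =", np.max(np.abs(V)), " modulo size =", 2 ** 5)
print("all samples unwrapped:", bool(np.array_equal(V_hat, V)), " MSE after LPF =", mse)
```

In this run the unfolded samples span a range several times larger than the modulo size, so a standard uniform quantizer with the same step would need more bits per sample than the modulo ADC uses.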
Proposition 1 provides an upper bound on the error probability $\Pr(E_{\mathrm{OL}_n}) = \Pr(\hat{V}_n \neq V_n)$ in terms of $R - \frac{1}{2}\log(12\sigma_p^2)$. However, Proposition 2, which characterizes the scaling of $\frac{1}{2}\log(12\sigma_p^2)$ with $D$, does not apply here for two reasons. The first is that we use a mismatched prediction filter here, due to the unknown PSD of $\{X_n\}$, and the second is that whatever the exact PSD turns out to be, it is assumed to be supported on the frequency interval $[-\frac{\pi}{L}, \frac{\pi}{L}]$, such that $h(X_n|X_{n-1},\ldots) = -\infty$, and the high-resolution assumption never holds. Instead, we prove the following.
Proposition 4: Let $\{X_n\}$ be a zero-mean stationary process with variance $\mathbb{E}(X_n^2) \leq \sigma^2$ and PSD supported in the frequency interval $[-\frac{\pi}{L}, \frac{\pi}{L}]$. Let $V_n = \alpha X_n + Z_n$ where $Z_n \sim \mathrm{Unif}([-1,0))$, and $\hat{V}^p_n$ be as in (6), where $\{h_n\}$ is the optimal linear MMSE $p$-tap prediction filter for $V_n$, from its past samples $\{V_{n-1},\ldots,V_{n-p}\}$, designed under the assumption that $S_X(e^{j\omega})$ is as in (30). Then
$$\lim_{p\to\infty} 12\sigma_p^2 \leq \left(1 + 12\alpha^2 L\sigma^2\right)^{\frac{1}{L}}.$$
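Proof: Let
$$S_{\tilde{V}}(e^{j\omega}) = \begin{cases} \alpha^2 L\sigma^2 + 1/12 & \omega \in [-\frac{\pi}{L}, \frac{\pi}{L}] \\ 1/12 & \omega \notin [-\frac{\pi}{L}, \frac{\pi}{L}] \end{cases}, \tag{31}$$
and let $H_p(e^{j\omega})$ be the frequency response of the prediction filter $\{h_n\}$, which is designed with respect to (31). Further, let $H(e^{j\omega}) = \lim_{p\to\infty} H_p(e^{j\omega})$. By the basic principles of optimal linear MMSE prediction, we have that
$$S_{\tilde{V}}(e^{j\omega})\,|1 - H(e^{j\omega})|^2 = 2^{\frac{1}{2\pi}\int_{-\pi}^{\pi}\log\left(S_{\tilde{V}}(e^{j\omega})\right)d\omega}. \tag{32}$$
Therefore, combining (31) and (32), we see that
$$|1 - H(e^{j\omega})|^2 = \begin{cases} \left(1 + 12\alpha^2 L\sigma^2\right)^{\frac{1}{L} - 1} & \omega \in [-\frac{\pi}{L}, \frac{\pi}{L}] \\ \left(1 + 12\alpha^2 L\sigma^2\right)^{\frac{1}{L}} & \omega \notin [-\frac{\pi}{L}, \frac{\pi}{L}] \end{cases}. \tag{33}$$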
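Applying this filter on the “actual” process $V_n = \alpha X_n + Z_n$, whose PSD is
$$S_V(e^{j\omega}) = \begin{cases} \alpha^2 S_X(e^{j\omega}) + 1/12 & \omega \in [-\frac{\pi}{L}, \frac{\pi}{L}] \\ 1/12 & \omega \notin [-\frac{\pi}{L}, \frac{\pi}{L}] \end{cases},$$
we get
$$\begin{aligned}
\lim_{p\to\infty} 12\sigma_p^2 &= \lim_{p\to\infty} 12\,\mathbb{E}(V_n - \hat{V}^p_n)^2 \\
&= \frac{12}{2\pi}\int_{-\pi}^{\pi} S_V(e^{j\omega})\,|1 - H(e^{j\omega})|^2\,d\omega \\
&= \frac{\left(1 + 12\alpha^2 L\sigma^2\right)^{\frac{1}{L}}}{2\pi}\left[\int_{\omega\notin[-\pi/L,\pi/L]} 1\,d\omega + \int_{-\pi/L}^{\pi/L}\left(1 + 12\alpha^2 L\sigma^2\right)^{-1}\left(1 + 12\alpha^2 S_X(e^{j\omega})\right)d\omega\right] \\
&\leq \left(1 + 12\alpha^2 L\sigma^2\right)^{\frac{1}{L}},
\end{aligned} \tag{34}$$
where the last inequality follows from our assumption that $\frac{1}{2\pi}\int_{-\pi/L}^{\pi/L} S_X(e^{j\omega})\,d\omega = \mathbb{E}(X_n^2) \leq \sigma^2$.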
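To get a feel for Proposition 4 at finite filter length, the sketch below designs the $p$-tap predictor for the flat PSD (30) and evaluates $12\sigma_p^2$ when the same filter is applied to a process with a different autocorrelation. The helper `prediction_error_x12`, the parameter values, and the sinusoidal test process (random phase, power $\sigma^2$, so that its autocorrelation is $\sigma^2\cos(\omega_0 r)$) are our own choices. Note that the proposition is a statement about $p \to \infty$, so finite-$p$ values need not lie below the bound.

```python
import numpy as np

def prediction_error_x12(p, L, alpha, sigma2, c_x_actual):
    """12 * sigma_p^2 when the filter designed for (30) is applied to a process
    {X_n} whose autocorrelation function is c_x_actual(r)."""
    r = np.arange(p + 1)
    design = alpha**2 * sigma2 * np.sinc(r / L)   # autocorrelation assumed by the designer
    design[0] += 1.0 / 12.0
    h = np.linalg.solve(design[np.abs(r[:p, None] - r[None, :p])], design[1:])
    actual = alpha**2 * np.asarray(c_x_actual(r), dtype=float)
    actual[0] += 1.0 / 12.0
    w = np.concatenate(([1.0], -h))               # taps of the error filter 1 - H_p
    cov = actual[np.abs(r[:, None] - r[None, :])]  # covariance of (V_n, ..., V_{n-p})
    return 12.0 * (w @ cov @ w)

p, L, alpha, sigma2 = 30, 3, 8.0, 1.0
bound = (1 + 12 * alpha**2 * L * sigma2) ** (1.0 / L)
flat = prediction_error_x12(p, L, alpha, sigma2, lambda r: sigma2 * np.sinc(r / L))
omega0 = 0.5 * np.pi / L                          # a frequency inside the band
sine = prediction_error_x12(p, L, alpha, sigma2, lambda r: sigma2 * np.cos(omega0 * r))
print(f"bound = {bound:.3f}, flat PSD: {flat:.3f}, sinusoid: {sine:.3f}")
```

For the matched flat PSD, the finite-$p$ value can only approach the bound from above, since the bound coincides with the infinite-order prediction error in that case.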
It follows from Proposition 1 combined with Proposition 4, that for a quantization rate of
$$R = \delta + \frac{1}{L}\cdot\frac{1}{2}\log\left(1 + 12\alpha^2 L\sigma^2\right), \tag{35}$$
the proposed system achieves $\Pr(E_{\mathrm{OL}_n}) \leq 2\exp\{-\frac{3}{2}2^{2\delta}\}$, for all input processes with bandwidth $\leq B$ and variance $\leq \sigma^2$.
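After low-pass filtering with $G(e^{j\omega})$, we get by a similar analysis to that done in Section II-A and in [35], that for long enough $N$ such that the LPF can be treated as ideal, we have that
$$\begin{aligned}
D &= \mathbb{E}\left[(X_n - \hat{X}^{\mathrm{LPF}}_n)^2\,\Big|\,\bigcap_{n=1}^{N}\{\hat{V}_n = V_n\}\right] \\
&\leq \frac{\sigma^2}{1 + 12\alpha^2 L\sigma^2}\cdot\frac{1}{1 - \Pr\left(\bigcup_{n=1}^{N}\{\hat{V}_n \neq V_n\}\right)} \\
&\leq \frac{\sigma^2}{1 + 12\alpha^2 L\sigma^2}\cdot\frac{1}{1 - N\Pr(E_{\mathrm{OL}_n})} \\
&\leq \frac{\sigma^2}{1 + 12\alpha^2 L\sigma^2}\cdot\frac{1}{1 - 2N\exp\{-\frac{3}{2}2^{2\delta}\}}.
\end{aligned} \tag{36}$$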
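Thus, for large enough $\delta$ such that the total overload probability is small, i.e.,
$$2N\exp\left\{-\frac{3}{2}2^{2\delta}\right\} \ll 1, \tag{37}$$
we have that our system achieves distortion $\approx D$ with
$$R = \frac{1}{L}\cdot\frac{1}{2}\log\left(\frac{\sigma^2}{D}\right) + \delta. \tag{38}$$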
The term $\frac{1}{L}\cdot\frac{1}{2}\log\left(\frac{\sigma^2}{D}\right)$ is the rate-distortion function of a source with PSD as in (30). Thus, up to the loss of $\delta$ bits per sample, due to the one dimensional quantizer we are using, whose size is dictated by (37), our system is optimal in the following minimax sense: no system can attain a better tradeoff between $R$ and $D$ simultaneously for all processes with bandwidth at most $B$ and variance at most $\sigma^2$.
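A small numerical reading of (38) may help: the sketch below computes the rate needed for a target SNR $\sigma^2/D$ (given in dB) at oversampling ratio $L$ and backoff $\delta$. The function name and the example values are ours; logarithms are base 2, so rates are in bits per sample.

```python
import math

def oversampled_rate(snr_db: float, L: float, delta_bits: float) -> float:
    """R from (38): (1/L) * (1/2) * log2(sigma^2 / D) + delta, in bits per sample."""
    snr = 10.0 ** (snr_db / 10.0)
    return 0.5 * math.log2(snr) / L + delta_bits

# Each additional bit per sample buys about 6.02 * L dB of SNR.
db_per_bit = 20.0 * math.log10(2.0)
for L in (1, 3, 8):
    print(f"L = {L}: R(60 dB, delta = 2) = {oversampled_rate(60.0, L, 2.0):.3f} bits,"
          f" slope = {db_per_bit * L:.2f} dB/bit")
```

The backoff $\delta$ is paid once per sample regardless of $L$, which is the source of the multiplicative rate increase relative to the rate-distortion limit.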
The multiplicative increase in quantization rate of the developed system, with respect to the fundamental rate-distortion limit, is $\left(\frac{1}{2}\log\left(\frac{\sigma^2}{D}\right) + L\delta\right)/\left(\frac{1}{2}\log\left(\frac{\sigma^2}{D}\right)\right)$. If $X(t)$ were sampled at its Nyquist rate, rather than $L$ times above it, standard uniform scalar quantization would have achieved similar overload probability and distortion with only a $\left(\frac{1}{2}\log\left(\frac{\sigma^2}{D}\right) + \delta\right)/\left(\frac{1}{2}\log\left(\frac{\sigma^2}{D}\right)\right)$ multiplicative increase in rate with respect to the fundamental limit. The disadvantage of the latter approach is that it requires a high-resolution quantizer for each sample, whereas the scheme developed here reduces the number of quantization bits per sample, at the expense of an increased sampling rate. Thus, just like Σ∆ conversion, the scheme developed here allows replacing slow but high-resolution ADCs with fast low-resolution ones.
IV. IMPLEMENTATION VIA RING OSCILLATORS
In this Section we develop an architecture for a circuit
implementing a modulo ADC, and provide a mathematical
model for its input-output characteristic. Our implementation
is essentially based on converting the input voltage into phase,
which can naturally only be observed modulo 2π, and then
quantizing the phase. To that end, we use ring oscillator ADCs,
as described next.
Consider a closed-loop cascade of N inverters, where N is
an odd number, all controlled with the same voltage Vdd, see
Figure 4. This circuit, which is referred to as a ring oscillator
can act as an ADC with sampling period Ts, when Vdd is set
to Vin(t) = g(X(t)), where X(t) is the analog signal to be
converted to a digital one and g(·) is a function to be specified,
and the state (‘0’ or ‘1’, corresponding to ‘low’ or ‘high’) of
each inverter is measured every Ts seconds.
It is well known that the time it takes for a non-ideal
inverter’s output to respond to a change in its input is a
function of Vdd [40], which we denote by ∆(Vdd) > 0. Taking
this delay into account, a moment of reflection reveals that at
each time instance, exactly one pair of adjacent inverters are
at the same state whereas all other pairs of adjacent inverters
are at distinct states. Denote by $I \in \{1,\ldots,N\}$ the index of the first inverter within the pair that shares the same state, and denote its state by $B \in \{0,1\}$, i.e., the adjacent pair of inverters with the same state are inverter $I$ and inverter $[I+1] \bmod N$, and their state is $B$. With this notation, we can uniquely identify the states of all $N$ inverters at time $t$ with the number $Q_t = (I_t - 1) + N\cdot\left([I_t + B_t] \bmod 2\right) \in \{0,\ldots,2N-1\}$. See Figure 5.
A crucial observation is that the process $Q_t$ cyclically oscillates in increments of $+1$ modulo $2N$. More formally stated, if $t' > t$ is the earliest time where $Q_{t'} \neq Q_t$, then $Q_{t'} = [Q_t + 1] \bmod 2N$. We designate by $V_n$ the number of increments that occurred in the process $\{Q_t\}$ within the time interval $[nT_s, (n+1)T_s)$, and define the output of the induced modulo ADC as
$$Y_n \triangleq [V_n] \bmod 2N = \left[Q_{(n+1)T_s} - Q_{nT_s}\right] \bmod 2N.$$
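Next, we relate $V_n$ to the process $V_{\mathrm{in}}(t)$. To this end, we make the simplifying assumption that $X(t)$ is constant within each time interval $[nT_s, (n+1)T_s)$, and consequently, so is $V_{\mathrm{in}}(t)$. This assumption can be made exact by adding a sample-and-hold circuit to the system. Assuming the function $\Delta(V_{dd})$ is identical for all $N$ inverters, we have that
$$Q_{nT_s} = \left[\left\lfloor\sum_{k=-\infty}^{n-1}\frac{T_s}{\Delta(V_{\mathrm{in}}(kT_s))}\right\rfloor\right] \bmod 2N,$$
and consequently,
$$\begin{aligned}
Y_n &= \left[\left[\left\lfloor\sum_{k=-\infty}^{n}\frac{T_s}{\Delta(V_{\mathrm{in}}(kT_s))}\right\rfloor\right] \bmod 2N - \left[\left\lfloor\sum_{k=-\infty}^{n-1}\frac{T_s}{\Delta(V_{\mathrm{in}}(kT_s))}\right\rfloor\right] \bmod 2N\right] \bmod 2N \\
&= \left[\left\lfloor\sum_{k=-\infty}^{n}\frac{T_s}{\Delta(V_{\mathrm{in}}(kT_s))}\right\rfloor - \left\lfloor\sum_{k=-\infty}^{n-1}\frac{T_s}{\Delta(V_{\mathrm{in}}(kT_s))}\right\rfloor\right] \bmod 2N,
\end{aligned}$$
where the last equality follows from the modulo distributive law (2). Defining the “quantization error”
$$Z_n = \left\lfloor\sum_{k=-\infty}^{n}\frac{T_s}{\Delta(V_{\mathrm{in}}(kT_s))}\right\rfloor - \sum_{k=-\infty}^{n}\frac{T_s}{\Delta(V_{\mathrm{in}}(kT_s))} \in (-1,0],$$
we can write
$$\begin{aligned}
Y_n &= \left[\sum_{k=-\infty}^{n}\frac{T_s}{\Delta(V_{\mathrm{in}}(kT_s))} + Z_n - \sum_{k=-\infty}^{n-1}\frac{T_s}{\Delta(V_{\mathrm{in}}(kT_s))} - Z_{n-1}\right] \bmod 2N \\
&= \left[\frac{T_s}{\Delta(V_{\mathrm{in}}(nT_s))} + Z_n - Z_{n-1}\right] \bmod 2N.
\end{aligned}$$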
Let us now define the function
$$f(x) = \frac{1}{\Delta(x)},$$
which corresponds to the oscillation frequency of our circuit, and is dictated by the characteristics of the inverters at hand, and let us also take the function $g(\cdot)$ to be affine, such that $V_{\mathrm{in}}(t) = a + bX(t)$. We further define the discrete time process $X_n = X(nT_s)$, for all $n \in \mathbb{N}$. We have therefore obtained the model
$$Y_n = \left[T_s\cdot f(a + bX_n) + Z_n - Z_{n-1}\right] \bmod 2N. \tag{39}$$
In general, the quantization noise process $\{Z_n\}$ is a deterministic function of the process $\{X_n\}$. Nevertheless, as in the analysis of the ideal modulo ADC, in the sequel we make the simplifying assumption that it is an iid process with $Z_n \sim \mathrm{Unif}((-1,0])$.
If $f(\cdot)$ were an affine function itself, with an appropriate choice of the parameters $a$, $b$ we could have induced the model
$$Y_n = [\alpha X_n + Z_n - Z_{n-1}] \bmod 2^R,$$
where $R = \log(2N)$, which is identical to the ideal $(R,\alpha)$ mod-ADC, up to the fact that the quantization noise $Z_n - Z_{n-1}$ is now a first order moving-average (MA) process rather than a white process. In practice, however, it is difficult to construct inverters for which $f(\cdot)$ is approximately affine within a large range. The effect of nonlinearities of $f(\cdot)$ on the performance of the modulo ADC is numerically studied in the next section.
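The derivation of (39) can be checked with a short simulation. The sketch below counts state increments directly from the accumulated quantity $\sum_k T_s f(a + bX_k)$ and compares the resulting outputs with the right hand side of (39). Everything specific in it is an assumption of ours: the voltage-to-frequency map `f` is a hypothetical affine function, the oscillator is assumed to start in state 0 at time 0, and the input is a slowly varying sinusoid held constant over each sampling interval.

```python
import numpy as np

def ring_oscillator_outputs(x, Ts, f, a, b, n_inverters):
    """Simulate Y_n = [Q_{(n+1)Ts} - Q_{nTs}] mod 2N and the model (39) side by side."""
    two_n = 2 * n_inverters
    increments = Ts * f(a + b * x)                # Ts / Delta(V_in(nTs)), may be fractional
    total = np.cumsum(increments)                 # accumulated state increments
    Q = np.mod(np.floor(total), two_n)            # oscillator state at the end of each interval
    Q_prev = np.concatenate(([0.0], Q[:-1]))      # assumed initial state 0
    Y = np.mod(Q - Q_prev, two_n)
    Z = np.floor(total) - total                   # "quantization error" in (-1, 0]
    Z_prev = np.concatenate(([0.0], Z[:-1]))
    # Right hand side of (39); rounding removes floating-point residue before folding.
    Y_model = np.mod(np.round(increments + Z - Z_prev), two_n)
    return Y, Y_model

def run_ro_demo():
    n = np.arange(500)
    x = np.sin(0.07 * n)                          # input held constant over each interval
    f = lambda v: 1.0e9 * v                       # hypothetical affine map, in Hz
    return ring_oscillator_outputs(x, Ts=1.0e-8, f=f, a=3.0, b=0.3, n_inverters=5)

Y, Y_model = run_ro_demo()
print("direct count matches model (39):", bool(np.array_equal(Y, Y_model)))
```

Here roughly 27 to 33 increments occur per sampling interval while only $2N = 10$ states exist, so the count is observed modulo 10, as in the text.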
V. NUMERICAL EXPERIMENTS
We have conducted numerical simulations for the performance of a ring oscillator based modulo ADC, where the input is an oversampled process, as in Section III. In our simulations, we have assumed that the inverters were produced using a CMOS technology. The corresponding function $f(V_{\mathrm{in}})$ relating the input voltage to the output frequency of the oscillator, which was introduced in Section IV, is shown in Figure 8, as obtained using a PSpice simulation.

[Fig. 8. The voltage to output frequency function $f(V_{\mathrm{in}})$. Horizontal axis: $V_{\mathrm{in}}$ [V], from 0 to 5; vertical axis: $f(V_{\mathrm{in}})$, from 0 to $5\times 10^9$.]
A. Design of System Parameters
In all our simulations, we have designed the modulo ADC and the corresponding decoder as described in Section III, i.e., under the assumption that the input signal $X(t)$ is a Gaussian stationary process with zero mean and variance $\sigma^2$, whose PSD is flat within the frequency interval $[-B,B]$ and zero outside this interval. The sampling rate is a factor of $L > 1$ above the Nyquist rate, such that the sampling period is $T_s = \frac{1}{2LB}$ seconds.
Given the oversampling ratio L, the number of inverters N ,
and the above assumptions on the statistics of X(t), the design
of the modulo ADC and its corresponding decoder consists of:
1) Choosing the shift and scaling parameters a and b for the
modulo ADC such that Vin(t) = a+ bX(t);
2) Designing the p-tap prediction filter {hn} for Vn =
Tsf(a + bXn) + Zn − Zn−1 given the past samples
{Vn−1, . . . , Vn−p};
3) Designing a 2k + 1-tap noncausal smoothing filter {gn}
for estimating Xn from {Vn−k, . . . , Vn+k}.
The decoding procedure consists of recovering an estimate $\{\hat{V}_n\}$ for $\{V_n\}$ from the modulo ADC’s outputs $\{Y_n = [T_s f(a + bX_n) + Z_n - Z_{n-1}] \bmod 2N\}$, by applying the decoding procedure described in Section III with the prediction filter $\{h_n\}$. Then, the estimate $\{\hat{X}_n\}$ is produced by applying the smoothing filter $\{g_n\}$ to the process $\{\hat{V}_n\}$, which is referred to as final post-processing in Section III. The filters $\{h_n\}$ and $\{g_n\}$ are chosen as the MMSE-optimal linear prediction and smoothing filters, respectively. Calculating the coefficients of $\{h_n\}$ requires knowledge of the second-order statistics of the process $\{V_n\}$. This, in turn, can be (numerically) calculated from the pairwise distribution of $\{X_n, X_{n-m}\}$, $m = 0,\ldots,p$, which is fully characterized by our assumption that $\{X_n\}$ is a Gaussian process with PSD $S_X(e^{j\omega})$ as in (30). Calculating the coefficients of $\{g_n\}$ requires, in addition, the joint second-order statistics of the processes $\{X_n, V_n\}$, which can either be calculated numerically, or via Bussgang’s Theorem [41].
We apply the developed modulo ADC architecture to processes of length $T$ discrete samples. The parameters $a$ and $b$ are chosen as follows: Let $P_e = \Pr\left(\bigcup_{t=1}^{T}\{\hat{V}_t \neq V_t\}\right)$ be the block error probability of our decoder, and let $\epsilon$ be our target block error probability. For every $a$ and $b$, we find the filters $\{h_n\}$ and $\{g_n\}$ as described above, and compute the corresponding $P_e = P_e(a,b)$ via Monte Carlo simulation for a Gaussian input process with PSD as in (30). Among all $(a,b)$ for which $P_e(a,b) < \epsilon$, we choose the pair that results in the smallest MSE distortion $\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}(X_t - \hat{X}_t)^2$. The target block error probability for all of the setups we consider is $\epsilon = 10^{-3}$, and the block length we consider is $T = 2^{11}$. Roughly, these parameters correspond to allowing a per-sample overload error probability of $10^{-3}\cdot 2^{-11} \approx 4.88\cdot 10^{-7}$.
B. Evaluation Method
The system was designed for a bandlimited Gaussian pro-
cess with a flat PSD. Nevertheless, we would like it to achieve
approximately the same MSE distortion and error probability
for all bandlimited processes with the same variance, regard-
less of the PSD within that band. For an ideal modulo ADC
and large p, this is indeed the case, as shown in Section III.
To test to what extent this remains the case also for the ring
oscillator based modulo ADC, we apply our system on two
types of processes: 1) A Gaussian process with variance $\sigma^2$ and bandwidth $B$, whose PSD is flat within this band, for which the system was designed; 2) A sinusoidal waveform, whose frequency is chosen at random, uniformly on $[0,B)$, and whose amplitude is $\sqrt{2\sigma^2}$, such that its power is $\sigma^2$.
For each experiment, we also plot the theoretical perfor-
mance of an ideal (R,α) mod-ADC, as well as those of a
first-order Σ∆ (with the optimal 1-tap noise shaping filter)
converter, both designed to achieve the same target block error
probability for the bandlimited Gaussian stochastic process
X(t). Although overload errors have a different effect on Σ∆
converters and modulo ADCs, both systems fail to achieve
their target distortions unless those are avoided.
In the ADC literature, it is quite common to measure the performance of a particular ADC for a sinusoidal input. One drawback of this approach is that the deterministic nature of the input signal allows the ADC to be designed such that overload errors never occur, without significantly increasing its dynamic range above the standard deviation of its input. For stochastic processes, even if Gaussianity is assumed, the dynamic range must be as large as multiple standard deviations of its input, in order to ensure a small overload probability. In our derivations, this is manifested through the rate backoff parameter $\delta$, which dictates the ratio between the quantizer’s dynamic range $2^R$ and the standard deviation of its input (which in our case is the prediction error process).
In order to allow a unified presentation of the results for both Gaussian and sinusoidal processes, rather than plotting the rate $R_{\text{mod-ADC}}(D)$ required by the modulo ADC in order to achieve an MSE distortion $D$ with target block error probability $\epsilon$, we plot $R_{\text{mod-ADC}}(D) - \delta$, where
$$\delta = \frac{1}{2}\log\left(-\frac{2}{3}\ln\left(\frac{\epsilon}{2T}\right)\right). \tag{40}$$

[Fig. 9. Performance of ring oscillators based modulo ADC (RO-ADC). Panels: (a) B = 100 Hz, L = 3; (b) B = 44.1 kHz, L = 3; (c) B = 100 kHz, L = 3; (d) B = 1 MHz, L = 3. We plot SNR vs. quantization rate for a Gaussian process and for a sinusoidal waveform with a random frequency, uniformly distributed over $[0,B)$. For comparison we also plot the performance of an ideal $(R,\alpha)$ mod-ADC, as well as that of an ideal first-order Σ∆ converter. For all curves, SNR is defined as $\sigma^2/D$. The prediction filter has $p = 25$ taps, whereas the smoothing filter has $2k+1$ taps for $k = 22$.]
This is consistent with traditional converter analyses that separate saturation effects from granularity ones [4], [37]. For our parameters $T = 2^{11}$, $\epsilon = 10^{-3}$, (40) evaluates to $\delta \approx 1.6717$ bits. Note that by (12), $\delta$ is the rate backoff required in order to attain block error probability below $\epsilon$ by an ideal modulo ADC, when the input process is Gaussian. A similar analysis reveals that the same rate backoff is also required for a Σ∆ converter to attain the same block error probability, under the same assumptions on the input process [35]. Thus, in all figures we also plot $R_{\Sigma\Delta}(D) - \delta$ rather than $R_{\Sigma\Delta}(D)$, where $R_{\Sigma\Delta}(D)$ is the rate needed by the Σ∆ converter to attain distortion $D$ with block error probability below $\epsilon$.
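The value of $\delta$ quoted above follows directly from (40). The following sketch evaluates it and checks that plugging the result back into the union bound $2T\exp\{-\frac{3}{2}2^{2\delta}\}$ returns the target $\epsilon$; the function names are ours.

```python
import math

def rate_backoff(eps: float, T: int) -> float:
    """delta from (40), in bits: (1/2) * log2(-(2/3) * ln(eps / (2T)))."""
    return 0.5 * math.log2(-(2.0 / 3.0) * math.log(eps / (2 * T)))

def block_error_bound(delta_bits: float, T: int) -> float:
    """Union bound 2T * exp(-(3/2) * 2^(2*delta)) on the block error probability."""
    return 2 * T * math.exp(-1.5 * 2.0 ** (2.0 * delta_bits))

delta = rate_backoff(1e-3, 2 ** 11)
print(f"delta = {delta:.4f} bits, implied block error bound = {block_error_bound(delta, 2 ** 11):.3e}")
```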
C. Results and Discussion
We have performed experiments for the parameters $L = 3$ and four different values of $B$: 100 Hz, 44.1 kHz, 100 kHz and 1 MHz. The value of $\sigma^2$ is immaterial, as it can be absorbed in the parameter $b$. The results are depicted in Figures 9a, 9b, 9c and 9d, respectively. The results are based on Monte Carlo simulation, with $10^3$ independent trials for each point in each figure. No overload errors were observed for the choices of $a$, $b$, $\{h_n\}$ and $\{g_n\}$ that correspond to each point in the figures, neither for the Gaussian processes nor for the sinusoidal processes.
In general, the results indicate that the ring oscillator implementation of a modulo ADC is closer to the ideal modulo ADC for small bandwidths $B$ and quantization rates $R$. In all figures we observe the same trend: for small enough $R$ the curve of the SNR as a function of $R$ for the ring oscillator modulo ADC is parallel to that of the ideal modulo ADC, and has a slope of $\approx 6L = 18$ dB/bit, in agreement with (38). Then, for large enough $R$ the system’s non-linearities “kick in” and the slope significantly decreases. Eventually, for large enough $R$, the first-order Σ∆ converter outperforms the ring oscillator modulo ADC, as can be observed in Figure 9d. Nevertheless, for moderate values of $R$, even for $B = 1$ MHz, the improvement over the Σ∆ converter can be as large as 17 dB.
The trends above are to be expected. Recall that the output of the corresponding modulo ADC is given by (39). If $b\cdot\sigma$ is small enough, the function $f(a + bX_n)$ resides in a small interval around $f(a)$ with high probability, and is well approximated by the linear function $f(a) + bf'(a)X_n$. Consequently, the output of the modulo ADC can be well approximated as
$$Y_n \approx \left[T_s b f'(a) X_n + Z_n - Z_{n-1} + T_s f(a)\right] \bmod 2N.$$
Since $T_s f(a)$ is known and can be removed, this is equivalent to an $(R,\alpha)$ mod-ADC with $R = \log(2N)$ and $\alpha = T_s b f'(a)$, albeit with quantization noise $Z_n - Z_{n-1}$ rather than $Z_n$.
Typically, however, in order to get a large gain from using
a modulo ADC rather than a standard uniform quantizer, we
would like to use an $(R, \alpha)$ mod-ADC with $\alpha \sigma \gg \frac{1}{2} 2^R$.
Thus, in order to get a “useful” modulo ADC that is close to
ideal, the two conditions (i) $b \sigma \ll 1$; (ii) $T_s f'(a) \, b \sigma \gg N$
should hold. These two conditions can only be satisfied
simultaneously if $T_s f'(a) \gg 1$, i.e., when the sampling rate
is low relative to $f'(a)$.
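The step from the two conditions to the requirement on $T_s f'(a)$ can be spelled out as follows. This is only a restatement of the argument above, identifying $\alpha = T_s b f'(a)$ and $2^R = 2N$ in the linearized model, and assuming $N \geq 1$:

```latex
\begin{align*}
\alpha\sigma \gg \tfrac{1}{2}2^{R}
  \;&\Longleftrightarrow\; T_s f'(a)\, b\sigma \gg N
  && (\alpha = T_s b f'(a),\; 2^{R} = 2N),\\
T_s f'(a) \;&\gg\; \frac{N}{b\sigma} \;\gg\; N \;\geq\; 1
  && (\text{since } b\sigma \ll 1).
\end{align*}
```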
For an ideal (R,α) mod-ADC with a given target overload
error probability, as R increases α can also increase, resulting
in a smaller distortion. Similarly, for the ring oscillator modulo
ADC, the optimal choice of b should, in general, increase
with R. For small rates, the optimal value of b is also small,
such that the linear approximation for the function f(·) is not
too bad. However, as R, and consequently b, increases, the
nonlinearities start becoming significant and the slope of the
SNR as a function of R becomes smaller.
VI. MODULO ADCS FOR JOINTLY STATIONARY
PROCESSES
In this section we develop a scheme that uses K parallel
modulo ADCs for digitizing K jointly stationary processes,
provide a corresponding low-complexity decoding algorithm,
and characterize its performance.
Let $\{X^1_n\}, \ldots, \{X^K_n\}$ be $K$ discrete-time jointly Gaussian
stationary random processes, obtained by sampling the jointly
Gaussian stationary processes $X^1(t), \ldots, X^K(t)$ every $T_s$
seconds. Let
\[ Y^k_n = \left[ \alpha X^k_n + Z^k_n \right] \bmod 2^R, \quad k = 1, \ldots, K, \; n = 1, 2, \ldots \]
be the processes obtained by applying $K$ parallel $(R, \alpha)$ mod-ADCs
to $\{X^1_n\}, \ldots, \{X^K_n\}$, where the input to the $k$th modulo
ADC is the process $\{X^k_n\}$, and $\{Z^k_n\}$ is $\mathrm{Unif}((-1, 0])$
noise, iid in space and in time. Let
\[ V^k_n = \alpha X^k_n + Z^k_n, \quad k = 1, \ldots, K, \; n = 1, 2, \ldots \]
be the non-folded version of $Y^k_n$. Let $\mathbf{X}_n = [X^1_n, \ldots, X^K_n]^T$,
and define $\mathbf{Y}_n$, $\mathbf{Z}_n$ and $\mathbf{V}_n$ similarly. Our goal is to recover
the process $\{\mathbf{V}_n\}$ from the outputs of the modulo ADCs with
high probability.
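As a rough illustration of this signal model, the following sketch draws the noise $Z^k_n$ explicitly and produces both the folded outputs and their non-folded versions. The function name `parallel_mod_adc`, the toy input statistics and the parameter values are assumptions made for the example only; they are not taken from the paper.

```python
import numpy as np

def parallel_mod_adc(x, R, alpha, rng):
    """Model of K parallel (R, alpha) mod-ADCs.

    x has shape (K, n). Returns (y, v): the folded outputs
    Y = [alpha*X + Z] mod 2^R and the non-folded V = alpha*X + Z.
    """
    x = np.asarray(x, dtype=float)
    z = -rng.random(x.shape)          # iid Unif((-1, 0]) noise
    v = alpha * x + z                 # non-folded samples V_n^k
    y = np.mod(v, 2.0 ** R)           # folded samples, reduced modulo 2^R
    return y, v

rng = np.random.default_rng(0)
x = rng.normal(scale=10.0, size=(2, 1000))   # K = 2 toy inputs (illustrative)
y, v = parallel_mod_adc(x, R=3, alpha=1.0, rng=rng)
```

Here `v - y` is an integer multiple of $2^R$ in every coordinate, which is exactly the information the decoder has to recover.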
To achieve this goal, we employ a two-step procedure,
combining the schemes from Section II-A and Section II-B.
First, we compute a predictor $\hat{\mathbf{V}}^p_n$ based on the previous $p$ samples
$\{\mathbf{V}_{n-1}, \ldots, \mathbf{V}_{n-p}\}$, whose error is the vector $\mathbf{E}^p_n = \mathbf{V}_n - \hat{\mathbf{V}}^p_n$.
By the same derivation as in Section II-A, we can produce
$[\mathbf{E}^p_n] \bmod 2^R$ from $\mathbf{Y}_n$ and $\{\mathbf{V}_{n-1}, \ldots, \mathbf{V}_{n-p}\}$, where the
modulo operation applied to a vector is to be understood
as reducing each coordinate modulo $2^R$. Second, we
decode the resulting modulo-folded correlated random vector, which
can be done via the integer-forcing decoder described in
Section II-B. This relatively simple decoding procedure
efficiently exploits both temporal and spatial correlations.
Below we describe it in more detail. See Figure 6. For all
$\ell, m \in \{1, \ldots, K\}$, let $C_{\ell m}[r] = E(X^\ell_n X^m_{n-r})$.
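Inputs: $\mathbf{Y}_n$, $\{\mathbf{V}_{n-1}, \ldots, \mathbf{V}_{n-p}\}$, $\{C_{\ell m}[r]\}$ for all $\ell, m \in
\{1, \ldots, K\}$, $R$, $\alpha$.
Outputs: Estimates $\hat{\mathbf{V}}_n$ and $\hat{\mathbf{X}}_n$ for $\mathbf{V}_n$ and $\mathbf{X}_n$, respectively.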
Algorithm:
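1) Compute the optimal linear MMSE predictor for $\mathbf{V}_n$ from
its last $p$ samples
\[ \hat{\mathbf{V}}^p_n = \sum_{i=1}^{p} H_i \cdot \left( \mathbf{V}_{n-i} + \tfrac{1}{2} \right) - \tfrac{1}{2}, \]
where $\{H_n\}$ is a $p$-tap matrix prediction filter, $H_i \in \mathbb{R}^{K \times K}$
for $i = 1, \ldots, p$, computed based on $\{C_{\ell m}[r]\}$
for all $\ell, m \in \{1, \ldots, K\}$ and $\alpha$, and the shift by $\tfrac{1}{2}$
compensates for $E(\mathbf{Z}_n)$.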
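2) Compute
\[ \mathbf{W}_n = \left[ \mathbf{Y}_n - \hat{\mathbf{V}}^p_n \right] \bmod 2^R, \]
where the modulo reduction is to be understood as taken
component-wise.
3) Define the $p$th order prediction error $\mathbf{E}^p_n \triangleq \mathbf{V}_n - \hat{\mathbf{V}}^p_n$,
and compute its covariance matrix $\Sigma_p = E\left[ \mathbf{E}^p_n (\mathbf{E}^p_n)^T \right]$
based on $\{C_{\ell m}[r]\}$ for all $\ell, m \in \{1, \ldots, K\}$ and $\alpha$.
Note that $\Sigma_p$ is indeed invariant with respect to $n$ due to
stationarity.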
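4) Solve
\[ A = [\mathbf{a}_1 | \cdots | \mathbf{a}_K]^T = \operatorname*{argmin}_{\bar{A} \in \mathbb{Z}^{K \times K},\; |\bar{A}| \neq 0} \; \max_{k = 1, \ldots, K} \frac{1}{2} \log\left( 12\, \bar{\mathbf{a}}_k^T \Sigma_p \bar{\mathbf{a}}_k \right). \tag{41} \]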
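5) For $k = 1, \ldots, K$, compute
\[ \bar{g}^k_n \triangleq \left[ \mathbf{a}_k^T \mathbf{W}_n \right] \bmod 2^R, \]
\[ \tilde{g}^k_n \triangleq \left[ \bar{g}^k_n + \tfrac{1}{2} 2^R \right] \bmod 2^R - \tfrac{1}{2} 2^R, \]
and set $\tilde{\mathbf{g}}_n = [\tilde{g}^1_n, \ldots, \tilde{g}^K_n]^T$.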
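6) Compute
\[ \hat{\mathbf{E}}^p_n = A^{-1} \tilde{\mathbf{g}}_n, \qquad \hat{\mathbf{V}}_n = \hat{\mathbf{V}}^p_n + \hat{\mathbf{E}}^p_n, \qquad \hat{\mathbf{X}}_n = \frac{\hat{\mathbf{V}}_n + \tfrac{1}{2}}{\alpha}. \]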
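Steps 2, 4, 5 and 6 of the algorithm above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used for the reported results: the predictor output $\hat{\mathbf{V}}^p_n$ is taken as given, the integer matrix $A$ is found by a brute-force search over a small box of integer vectors (a stand-in for whatever solver is used for (41)), and the names `find_A` and `ifsc_decode` as well as all numbers are hypothetical.

```python
import itertools
import numpy as np

def find_A(sigma, max_entry=3):
    """Brute-force stand-in for step 4: greedily pick the K shortest
    linearly independent integer vectors (in the a^T Sigma a sense)
    with entries in [-max_entry, max_entry]."""
    K = sigma.shape[0]
    cands = [np.array(a) for a in
             itertools.product(range(-max_entry, max_entry + 1), repeat=K)
             if any(a)]
    cands.sort(key=lambda a: float(a @ sigma @ a))
    rows = []
    for a in cands:
        if np.linalg.matrix_rank(np.array(rows + [a])) == len(rows) + 1:
            rows.append(a)
        if len(rows) == K:
            break
    return np.array(rows)

def ifsc_decode(y, v_pred, A, R, alpha):
    """Steps 2, 5 and 6 for one time instant n."""
    M = 2.0 ** R
    w = np.mod(y - v_pred, M)                    # step 2: [E_n^p] mod 2^R
    g_bar = np.mod(A @ w, M)                     # step 5: values in [0, 2^R)
    g_tilde = np.mod(g_bar + M / 2, M) - M / 2   # cyclic shift to [-2^R/2, 2^R/2)
    e_hat = np.linalg.solve(A, g_tilde)          # step 6: A^{-1} g_tilde
    v_hat = v_pred + e_hat
    x_hat = (v_hat + 0.5) / alpha
    return v_hat, x_hat

# Toy example (all numbers are illustrative assumptions).
sigma_p = np.array([[1.0, 2.0], [2.0, 4.01]])    # error covariance, E2 close to 2*E1
A = find_A(sigma_p)
e = np.array([2.5, 5.3])                         # one realization of E_n^p
v_pred = np.array([100.2, -37.6])                # stand-in for the predictor output
v = v_pred + e
y = np.mod(v, 2.0 ** 3)                          # R = 3, and |e[1]| > 2^R / 2
v_hat, x_hat = ifsc_decode(y, v_pred, A, R=3, alpha=1.0)
```

In this example the second error coordinate exceeds $\frac{1}{2} 2^R$, so unwrapping each coordinate separately would fail, while the integer combinations selected by `find_A` stay inside the modulo interval and `v_hat` matches `v`.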
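Proposition 5: Let $A = [\mathbf{a}_1 | \cdots | \mathbf{a}_K]^T$ be the matrix found
in step 4 of the algorithm above, and define
\[ R^{\mathrm{ST}}_{\mathrm{IFSC}}(A) = \max_{k = 1, \ldots, K} \frac{1}{2} \log\left( 12\, \mathbf{a}_k^T \Sigma_p \mathbf{a}_k \right). \tag{42} \]
We have that
\[ \Pr(E_{\mathrm{OL}_n}) = \Pr(\hat{\mathbf{V}}_n \neq \mathbf{V}_n) \leq 2K \exp\left\{ -\frac{3}{2} \cdot 2^{2\left(R - R^{\mathrm{ST}}_{\mathrm{IFSC}}(A)\right)} \right\}, \]
and
\[ D^k_n = E\left[ \left( X^k_n - \hat{X}^k_n \right)^2 \,\middle|\, \overline{E}_{\mathrm{OL}_n} \right] \leq \frac{1}{12 \alpha^2 \left( 1 - \Pr(E_{\mathrm{OL}_n}) \right)}, \]
for all $k = 1, \ldots, K$, where the event $\overline{E}_{\mathrm{OL}_n} = \{\hat{\mathbf{V}}_n = \mathbf{V}_n\}$
is the complement of the event $E_{\mathrm{OL}_n} = \{\hat{\mathbf{V}}_n \neq \mathbf{V}_n\}$.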
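Proof: We first note that
\begin{align*}
\mathbf{W}_n &= \left[ \mathbf{Y}_n - \hat{\mathbf{V}}^p_n \right] \bmod 2^R \\
&= \left[ [\mathbf{V}_n] \bmod 2^R - \hat{\mathbf{V}}^p_n \right] \bmod 2^R \\
&= \left[ \mathbf{V}_n - \hat{\mathbf{V}}^p_n \right] \bmod 2^R \\
&= [\mathbf{E}^p_n] \bmod 2^R,
\end{align*}
where the second equality follows from the modulo distributive
law (2). By (23), we have that
\[ \bar{g}^k_n \triangleq \left[ \mathbf{a}_k^T \mathbf{W}_n \right] \bmod 2^R = \left[ \mathbf{a}_k^T \mathbf{E}^p_n \right] \bmod 2^R = [g^k_n] \bmod 2^R, \]
where
\[ g^k_n = \mathbf{a}_k^T \mathbf{E}^p_n. \tag{43} \]
Furthermore, $\tilde{g}^k_n \in [-\tfrac{1}{2} 2^R, \tfrac{1}{2} 2^R)$ is merely a cyclically shifted
version of $\bar{g}^k_n \in [0, 2^R)$. Thus, $\tilde{g}^k_n = g^k_n$ if and only if $g^k_n \in
[-\tfrac{1}{2} 2^R, \tfrac{1}{2} 2^R)$. Consequently, $\hat{\mathbf{E}}^p_n \neq \mathbf{E}^p_n$, and therefore $\hat{\mathbf{V}}_n \neq
\mathbf{V}_n$, if and only if the event
\[ E_{\mathrm{OL}_n} = \bigcup_{k=1}^{K} \left\{ |g^k_n| \geq \tfrac{1}{2} 2^R \right\} \]
occurs. Now, repeating the same steps from the proof of
Proposition 3, we arrive at the claimed bounds.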
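To get a feel for the bounds of Proposition 5, they can be evaluated numerically. The sketch below assumes that the logarithm in (42) is taken to base 2 (so that rates are in bits); the function names and the example numbers are illustrative assumptions.

```python
import math
import numpy as np

def rate_st_ifsc(A, sigma_p):
    """R^ST_IFSC(A) from (42), in bits (log base 2 assumed)."""
    return max(0.5 * math.log2(12.0 * float(a @ sigma_p @ a))
               for a in np.asarray(A))

def overload_bound(R, r_ifsc, K):
    """Upper bound of Proposition 5 on the overload probability."""
    return 2 * K * math.exp(-1.5 * 2.0 ** (2 * (R - r_ifsc)))

def distortion_bound(alpha, p_ol):
    """Upper bound of Proposition 5 on D_n^k, given overload probability p_ol."""
    return 1.0 / (12.0 * alpha ** 2 * (1.0 - p_ol))

# Illustrative numbers: a normalized error covariance and two extra bits.
sigma_p = np.eye(2) / 12.0
A = np.eye(2, dtype=int)
r = rate_st_ifsc(A, sigma_p)              # about 0 bits for this normalization
p = overload_bound(R=r + 2, r_ifsc=r, K=2)
```

The double exponential in `overload_bound` means that each additional bit of margin $R - R^{\mathrm{ST}}_{\mathrm{IFSC}}(A)$ multiplies the exponent by four, so a margin of a few bits already makes the bound very small.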
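Using Shannon’s lower bound, and applying similar arguments
as in [42], one can show that any quantization scheme
for the source $\{\mathbf{X}_n\}$ that produces $R$ bits/sample/coordinate
and attains $E(X^k_n - \hat{X}^k_n)^2 \leq D$, $k = 1, \ldots, K$, $n = 1, \ldots$,
must have $R \geq \frac{1}{K} h(\mathbf{X}_n | \mathbf{X}_{n-1}, \ldots) - \frac{1}{2} \log(2 \pi e D)$. Let
$\mathbf{E}^{p*}_n = \mathbf{X}_n - \hat{\mathbf{X}}^p_n$, where $\hat{\mathbf{X}}^p_n$ is the optimal $p$th order MMSE
(linear) predictor of $\mathbf{X}_n$ from $\{\mathbf{X}_{n-1}, \ldots, \mathbf{X}_{n-p}\}$, and let
$\Sigma^*_p = E\left[ \mathbf{E}^{p*}_n (\mathbf{E}^{p*}_n)^T \right]$. We have that
\begin{align*}
h(\mathbf{X}_n | \mathbf{X}_{n-1}, \ldots, \mathbf{X}_{n-p}) &= h(\mathbf{E}^{p*}_n | \mathbf{X}_{n-1}, \ldots, \mathbf{X}_{n-p}) \\
&\overset{(a)}{=} h(\mathbf{E}^{p*}_n) \\
&\overset{(b)}{=} \frac{1}{2} \log\left( (2 \pi e)^K |\Sigma^*_p| \right),
\end{align*}
where (a) follows from the orthogonality principle of MMSE
estimation [37], and (b) from the fact that $\mathbf{E}^{p*}_n$ is a Gaussian
random vector [5]. Thus, for any quantization scheme we must
have
\[ R(D) \geq R_{\mathrm{SLB}}(D) \triangleq \frac{1}{2} \log\left( \frac{\lim_{p \to \infty} |\Sigma^*_p|^{\frac{1}{K}}}{D} \right). \]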
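Similarly to previous subsections, we set $D = 1/(12 \alpha^2)$,
which is a good approximation for $D^k_n$, $k = 1, \ldots, K$,
provided that $\delta = R - R^{\mathrm{ST}}_{\mathrm{IFSC}}(A)$ is not too small. The rate
required by our scheme, as given in Proposition 5, depends
on $12 \Sigma_p$, which corresponds to the prediction error covariance
of the process $\tilde{\mathbf{X}}_n = \sqrt{12 \alpha^2}\, \mathbf{X}_n + \tilde{\mathbf{Z}}_n = \frac{1}{\sqrt{D}} (\mathbf{X}_n + \sqrt{D}\, \tilde{\mathbf{Z}}_n)$,
where $\tilde{\mathbf{Z}}_n = \sqrt{12}\, \mathbf{Z}_n$ is a random vector with unit variance iid
entries. Let $\tilde{\Sigma}_p$ be the $p$th order prediction error covariance of
the process $\mathbf{X}_n + \sqrt{D}\, \tilde{\mathbf{Z}}_n$. We can rewrite the rate required by
our scheme as
\[ R^{\mathrm{ST}}_{\mathrm{IFSC}}(A, D) \triangleq \frac{1}{2} \log\left( \frac{\max_{k = 1, \ldots, K} \mathbf{a}_k^T \tilde{\Sigma}_p \mathbf{a}_k}{D} \right). \]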
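The rewriting of the rate can be checked directly from the definitions. The lines below only restate the scaling argument, using $D = 1/(12\alpha^2)$ and the fact that scaling a process by a constant $c$ scales its prediction error covariance by $c^2$ (the constant shift by $\tfrac{1}{2}$ in the predictor does not affect covariances):

```latex
\begin{align*}
\sqrt{12}\,\mathbf{V}_n
  &= \sqrt{12\alpha^2}\,\mathbf{X}_n + \tilde{\mathbf{Z}}_n
   = \frac{1}{\sqrt{D}}\left(\mathbf{X}_n + \sqrt{D}\,\tilde{\mathbf{Z}}_n\right)
  \;\Longrightarrow\; 12\,\Sigma_p = \frac{1}{D}\,\tilde{\Sigma}_p,\\
\max_{k}\frac{1}{2}\log\left(12\,\mathbf{a}_k^T\Sigma_p\mathbf{a}_k\right)
  &= \frac{1}{2}\log\left(\frac{\max_{k}\mathbf{a}_k^T\tilde{\Sigma}_p\mathbf{a}_k}{D}\right).
\end{align*}
```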
Fig. 10. Comparison between the average quantization rates $R^{\mathrm{ST}}_{\mathrm{IFSC}}(D)$,
$R_{\mathrm{SLB}}(D)$, and $R_{\mathrm{naive}}(D)$. The setup is that of quantizing the vector of stationary
processes $\{X^1_n\}, \{X^2_n\}$ described at the end of Section VI, with $L = 5$ and
$p = 24$. Horizontal axis: Average Rate [bits]; vertical axis: (1/D) [dB]; curves: SLB, IFSC, Naive.
Now, noting that if $h(\mathbf{X}_n | \mathbf{X}_{n-1}, \ldots) > -\infty$, we have that
$\tilde{\Sigma}_p \to \Sigma^*_p$ as $D \to 0$, we obtain the following proposition.
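Proposition 6: Assume $h(\mathbf{X}_n | \mathbf{X}_{n-1}, \ldots) > -\infty$, and let
$\Sigma^* \triangleq \lim_{p \to \infty} \Sigma^*_p$. We have that
\[ \lim_{D \to 0} \lim_{p \to \infty} R^{\mathrm{ST}}_{\mathrm{IFSC}}(A, D) - R_{\mathrm{SLB}}(D) = \frac{1}{2} \log\left( \frac{\max_{k = 1, \ldots, K} \mathbf{a}_k^T \Sigma^* \mathbf{a}_k}{|\Sigma^*|^{\frac{1}{K}}} \right). \tag{44} \]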
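The gap in (44) is easy to evaluate for a given covariance matrix. The following sketch assumes base-2 logarithms and uses $\Sigma^*$ directly in place of the limits; the function names, the example covariance and the integer matrix are illustrative assumptions rather than values from the paper.

```python
import math
import numpy as np

def rate_slb(sigma_star, D):
    """Shannon lower bound R_SLB(D) in bits, with Sigma* in place of the limit."""
    K = sigma_star.shape[0]
    return 0.5 * math.log2(np.linalg.det(sigma_star) ** (1.0 / K) / D)

def rate_ifsc_high_res(A, sigma_star, D):
    """R^ST_IFSC(A, D) with the error covariance replaced by its D -> 0 limit."""
    return 0.5 * math.log2(max(float(a @ sigma_star @ a) for a in A) / D)

def ifsc_gap(A, sigma_star):
    """Right-hand side of (44), in bits."""
    K = sigma_star.shape[0]
    num = max(float(a @ sigma_star @ a) for a in A)
    return 0.5 * math.log2(num / np.linalg.det(sigma_star) ** (1.0 / K))

sigma_star = np.array([[1.0, 0.9], [0.9, 1.0]])   # illustrative covariance
A = np.array([[1, -1], [1, 0]])                   # illustrative integer matrix
gap = ifsc_gap(A, sigma_star)
```

For this example the gap is about 0.6 bits and does not depend on $D$, since the same factor $1/D$ appears in both rates.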
Thus, in the high-resolution regime, when taking large
enough $p$, the gap between $R^{\mathrm{ST}}_{\mathrm{IFSC}}(A, D)$ and the information
theoretic lower bound is dictated by the loss of IFSC for a
source whose covariance matrix is $\Sigma^*$. The right hand side
of (44) is non-negative [11], but is typically quite small. To
illustrate this, we generate two correlated processes $\{X^1_n\}$
and $\{X^2_n\}$ as follows: let $\{W^1_n\}$, $\{W^2_n\}$, $\{W^3_n\}$ be three iid
$\mathcal{N}(0, 1)$ random processes. Let $X^1_n = \sum_{i=0}^{L-1} h_i W^3_{n-i} + W^1_n$
and $X^2_n = \sum_{i=0}^{L-1} g_i W^3_{n-i} + W^2_n$, where $\{h_n\}$ and $\{g_n\}$ are
two filters, each with $L$ taps. Clearly, when the filters have
sufficiently strong taps the process $\{\mathbf{X}_n\} = [\{X^1_n\}, \{X^2_n\}]^T$
will be highly correlated in time and in space. In Figure 10 we
plot the average rate required by the developed scheme, as well
as $R_{\mathrm{SLB}}(D)$ and the rate required by a standard ADC, denoted
$R_{\mathrm{naive}}(D)$, where the average is taken with respect to an iid
$\mathcal{N}(0, 100)$ distribution on the $2L$ taps of $\{h_n\}$ and $\{g_n\}$.
In the simulations performed, we took $L = 5$ and $p = 24$.
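The construction of the two correlated processes can be sketched as follows, together with the covariances $C_{\ell m}[r] = E(X^\ell_n X^m_{n-r})$ that it implies. The function names `generate_processes` and `cov`, the sequence length and the random seed are assumptions made for illustration.

```python
import numpy as np

def generate_processes(h, g, n, rng):
    """X^1_n = sum_i h_i W^3_{n-i} + W^1_n,  X^2_n = sum_i g_i W^3_{n-i} + W^2_n."""
    h, g = np.asarray(h, float), np.asarray(g, float)
    L = len(h)
    assert len(g) == L, "both filters are assumed to have L taps"
    w1, w2 = rng.standard_normal(n), rng.standard_normal(n)
    w3 = rng.standard_normal(n + L - 1)           # extra samples for filter memory
    x1 = np.convolve(w3, h, mode="valid") + w1
    x2 = np.convolve(w3, g, mode="valid") + w2
    return np.vstack([x1, x2])

def cov(h, g, r, same_process):
    """C_lm[r] = E(X^l_n X^m_{n-r}) for filters h (process l) and g (process m)."""
    common = sum(h[j + r] * g[j] for j in range(len(g)) if 0 <= j + r < len(h))
    return common + (1.0 if same_process and r == 0 else 0.0)

rng = np.random.default_rng(1)
L = 5
h = rng.normal(scale=10.0, size=L)                # N(0, 100) taps
g = rng.normal(scale=10.0, size=L)
x = generate_processes(h, g, n=10_000, rng=rng)
```

The exact covariances returned by `cov` are what a prediction filter design such as the one in step 1 of the algorithm would take as input; the simulated samples are only needed to test the resulting decoder.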
VII. CONCLUSIONS AND OUTLOOK
We have studied the modulo ADC architecture as an alter-
native approach for analog-to-digital conversion. The modulo
ADC allows exploitation of the statistical structure of the input
process digitally at the decoder without requiring the ADC
to adapt itself to the input statistics. We have demonstrated
the effectiveness of oversampled modulo ADCs as a simple
substitute for Σ∆ converters, allowing an increase in the
filter’s order far beyond that which is possible in current Σ∆
converters, since for modulo ADCs the filtering is done digitally.
Moreover, we have shown that, when used for digitizing jointly
stationary processes, parallel modulo ADCs can efficiently
exploit both temporal and spatial correlations.
An implementation of modulo ADCs via ring oscillators
was developed, and the corresponding input-output function
for the obtained modulo ADC was characterized in terms of
the delay–Vdd profile of the inverters that make up the ring
oscillator. We have then numerically studied the performance this
implementation can attain for oversampled input processes,
and compared it to that of Σ∆ converters.
There are several important challenges for future research.
Perhaps most important is building a modulo ADC chip proto-
type. Although our simulations are based on the function f(·)
measured from an actual (PSpice model of a) ring oscillator
device, a hardware implementation is needed to fully assess
the benefits of modulo ADCs. Furthermore, we would like
to see whether it is possible to construct inverters with more
favorable properties for ring oscillator-based modulo ADCs. In
particular, we would like them to have a larger range where
they are well approximated by an affine function. Another
interesting avenue for future research is finding functions g(·)
that can be implemented in the analog domain, such that the
composition f ◦ g = f(g(·)) is closer to linear.
ACKNOWLEDGEMENT
The authors are deeply grateful to Uri Erez, whose hum-
bleness is the only reason for his absence from the authors
list.
REFERENCES
[1] R. Walden, “Analog-to-digital converter survey and analysis,” IEEE
Journal on Selected Areas in Communications, vol. 17, no. 4, pp. 539–
550, 1999.
[2] B. Le, T. W. Rondeau, J. H. Reed, and C. W. Bostian, “Analog-to-digital
converters,” IEEE Signal Processing Magazine, vol. 22, no. 6, pp. 69–77,
Nov 2005.
[3] T. Berger, Rate distortion theory: A mathematical basis for data com-
pression. Prentice-Hall, 1971.
[4] N. S. Jayant and P. Noll, Digital coding of waveforms: principles and
applications to speech and video. Englewood Cliffs, NJ: Prentice-Hall,
1984.
[5] T. Cover and J. Thomas, Elements of Information Theory, 2nd ed.
Hoboken, NJ: Wiley-Interscience, 2006.
[6] M. Hovin, A. Olsen, T. S. Lande, and C. Toumazou, “Delta-sigma mod-
ulators using frequency-modulated intermediate values,” IEEE Journal
of Solid-State Circuits, vol. 32, no. 1, pp. 13–22, Jan 1997.
[7] M. Z. Straayer and M. H. Perrott, “A 12-bit, 10-MHz bandwidth,
continuous-time Σ∆ ADC with a 5-bit, 950-MS/s VCO-based quan-
tizer,” IEEE Journal of Solid-State Circuits, vol. 43, no. 4, pp. 805–814,
Apr. 2008.
[8] E. Telatar, “Capacity of multi-antenna Gaussian channels,” European
Transactions on Telecommunications, vol. 10, no. 6, pp. 585–595,
November - December 1999.
[9] D. Tse and P. Viswanath, Fundamentals of Wireless Communication.
Cambridge: Cambridge University Press, 2005.
[10] E. G. Larsson, O. Edfors, F. Tufvesson, and T. L. Marzetta, “Massive
MIMO for next generation wireless systems,” IEEE Communications
Magazine, vol. 52, no. 2, pp. 186–195, February 2014.
[11] O. Ordentlich and U. Erez, “Integer-forcing source coding,” IEEE
Transactions on Information Theory, vol. 63, no. 2, pp. 1253–1269,
Feb 2017.
[12] T. Ericson and V. Ramamoorthy, “Modulo-PCM: A new source coding
scheme,” in ICASSP ’79. IEEE International Conference on Acoustics,
Speech, and Signal Processing, vol. 4, Apr 1979, pp. 419–422.
[13] G. Forney, “Maximum-likelihood sequence estimation of digital se-
quences in the presence of intersymbol interference,” IEEE Transactions
on Information Theory, vol. 18, no. 3, pp. 363–378, May 1972.
[14] V. Ramamoorthy, “A novel speech coder for medium and high bit
rate applications using modulo-PCM principles,” IEEE Transactions on
Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 356–368,
Apr 1985.
[15] P. Noll, “On predictive quantizing schemes,” The Bell System Technical
Journal, vol. 57, no. 5, pp. 1499–1532, May 1978.
[16] R. Zamir, Y. Kochman, and U. Erez, “Achieving the Gaussian rate-
distortion function by prediction,” IEEE Transactions on Information
Theory, vol. 54, no. 7, pp. 3354–3364, July 2008.
[17] P. T. Boufounos, “Universal rate-efficient scalar quantization,” IEEE
Transactions on Information Theory, vol. 58, no. 3, pp. 1861–1872,
March 2012.
[18] D. Valsesia and P. T. Boufounos, “Universal encoding of multispectral
images,” in 2016 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP), March 2016, pp. 4453–4457.
[19] A. Bhandari, F. Krahmer, and R. Raskar, “On unlimited sampling,” arXiv
preprint arXiv:1707.06340, 2017.
[20] ——, “Unlimited sampling of sparse signals,” in IEEE Intl. Conf. on
Acoustics, Speech and Signal Processing (ICASSP), 2018.
[21] E. Domanovitz and U. Erez, “Outage probability bounds for integer-
forcing source coding,” in Proceedings of the IEEE Information Theory
Workshop (ITW 2017), Kaohsiung, Taiwan, Nov. 2017.
[22] R. Zamir, S. Shamai (Shitz), and U. Erez, “Nested linear/lattice codes
for structured multiterminal binning,” IEEE Transactions on Information
Theory, vol. 48, no. 6, pp. 1250–1276, June 2002.
[23] M. Tomlinson, “New automatic equalizer employing modulo arithmetic,”
Electron. Lett., vol. 7, pp. 138–139, Mar. 1971.
[24] H. Harashima and H. Miyakawa, “Matched-transmission technique for
channels with intersymbol interference,” IEEE Transactions on Commu-
nications, vol. 20, no. 4, pp. 774–780, Aug. 1972.
[25] S.-N. Hong and G. Caire, “Compute-and-forward strategies for cooperative
distributed antenna systems,” IEEE Transactions on Information
Theory, vol. 59, no. 9, pp. 5227–5243, Sept 2013.
[26] B. Nazer and M. Gastpar, “Compute-and-forward: Harnessing inter-
ference through structured codes,” IEEE Transactions on Information
Theory, vol. 57, no. 10, pp. 6463–6486, Oct. 2011.
[27] J. van Valburg and R. J. van de Plassche, “An 8-b 650-MHz folding
ADC,” IEEE Journal of Solid-State Circuits, vol. 27, no. 12, pp. 1662–
1666, Dec 1992.
[28] R. Venkataramani and Y. Bresler, “Perfect reconstruction formulas
and bounds on aliasing error in sub-Nyquist nonuniform sampling of
multiband signals,” IEEE Transactions on Information Theory, vol. 46,
no. 6, pp. 2173–2183, 2000.
[29] M. Vetterli, P. Marziliano, and T. Blu, “Sampling signals with finite rate
of innovation,” IEEE Transactions on Signal Processing, vol. 50, no. 6,
pp. 1417–1428, 2002.
[30] M. Mishali and Y. C. Eldar, “From theory to practice: Sub-Nyquist
sampling of sparse wideband analog signals,” IEEE Journal of Selected
Topics in Signal Processing, vol. 4, no. 2, pp. 375–391, April 2010.
[31] U. Erez and R. Zamir, “Achieving $\frac{1}{2}\log(1 + \mathrm{SNR})$ on the AWGN
channel with lattice encoding and decoding,” IEEE Transactions on
Information Theory, vol. 50, no. 10, pp. 2293–2314, Oct. 2004.
[32] R. M. Gray, “Quantization noise spectra,” IEEE Transactions on Infor-
mation Theory, vol. 36, no. 6, pp. 1220–1244, Nov 1990.
[33] O. Ordentlich and U. Erez, “Precoded integer-forcing universally
achieves the MIMO capacity to within a constant gap,” IEEE Trans-
actions on Information Theory, vol. 61, no. 1, pp. 323–340, Jan 2015.
[34] C. Feng, D. Silva, and F. Kschischang, “An algebraic approach to
physical-layer network coding,” IEEE Transactions on Information
Theory, vol. 59, no. 11, pp. 7576–7596, Nov 2013.
[35] O. Ordentlich and U. Erez, “Performance analysis and optimal filter
design for sigma-delta modulation via duality with DPCM,” IEEE
Transactions on Information Theory, Submitted Jun., under revision
2015.
[36] R. M. Gray et al., “Toeplitz and circulant matrices: A review,” Foundations
and Trends® in Communications and Information Theory, vol. 2,
no. 3, pp. 155–239, 2006.
[37] A. Gersho and R. M. Gray, Vector quantization and signal compression.
Springer Science & Business Media, 2012, vol. 159.
[38] D. Micciancio and S. Goldwasser, Complexity of Lattice Problems:
A Cryptographic Perspective. Cambridge, UK: Kluwer Academic
Publishers, 2002, vol. 671 of The Kluwer International
Series in Engineering and Computer Science.
[39] A. K. Lenstra, H. W. Lenstra, and L. Lovász, “Factoring polynomials
with rational coefficients,” Mathematische Annalen, vol. 261, no. 4, pp.
515–534, 1982.
[40] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital integrated
circuits: a design perspective. Pearson Education, 2003.
[41] J. J. Bussgang, “Crosscorrelation functions of amplitude-distorted Gaus-
sian signals,” 1952.
[42] R. Zamir and T. Berger, “Multiterminal source coding with high reso-
lution,” IEEE Transactions on Information Theory, vol. 45, no. 1, pp.
106–117, 1999.
