Abstract-Systems that capture and process analog signals must first acquire them through an analog-to-digital converter. While subsequent digital processing can remove statistical correlations present in the acquired data, the dynamic range of the converter is typically scaled to match that of the input analog signal. The present paper develops an approach for analog-to-digital conversion that aims at minimizing the number of bits per sample at the output of the converter. This is attained by reducing the dynamic range of the analog signal by performing a modulo operation on its amplitude, and then quantizing the result. While the converter itself is universal and agnostic of the statistics of the signal, the decoder operation on the output of the quantizer can exploit the statistical structure in order to unwrap the modulo folding. The performance of this method is shown to approach information-theoretic limits, as captured by the rate-distortion function, in various settings. An architecture for modulo analog-to-digital conversion via ring oscillators is suggested, and its merits are numerically demonstrated.
I. INTRODUCTION
Analog-to-digital converters (ADCs) are an essential component in any device that manipulates analog signals in a digital manner. While digital systems have benefited tremendously from scaling, their analog counterparts have become increasingly challenging. Consequently, it is often the case that the ADC constitutes the main bottleneck in a system, both in terms of power consumption and real estate, and in terms of the quality of the system's output. Developing more efficient ADCs is therefore of great interest [1], [2].
The quality of an ADC is measured via the tradeoff between various parameters such as power consumption, size, cost of manufacturing, and the distortion between the input signal and its digitally-based representation. For the sake of a unified, technology-independent, discussion, it is convenient to restrict the characterization of an ADC quality to three basic parameters: 1) The number of analog samples per second F S ; 2) The number of "raw" output bits R the ADC produces per sample (before subsequent possible compression); 3) The mean squared error (MSE) distortion D between the input signal and a reconstruction that is based on the output of the ADC.
While different applications may require different tradeoffs between F S , R and D, it is always desirable to design the ADC such that all three parameters are as small as possible.
O. Ordentlich is with the Hebrew University of Jerusalem, Israel (email: or.ordentlich@mail.huji.ac.il). G. Tabak, P. K. Hanumolu and A. C. Singer are with the University of Illinois, Urbana-Champaign, USA (emails: {tabak2,hanumolu,acsinger}@illinois.edu). G. W. Wornell is with the Massachusetts Institute of Technology, MA, USA (email: gww@mit.edu)
The focus of this work is on the quantization rate R. For a given sampling frequency F S , and a given target distortion D, our goal is to design ADCs that use the smallest possible number of raw output bits per sample.
The problem of analog-to-digital conversion can be seen as an instance of the lossy source coding/lossy compression problem [3]-[5], as the output of an ADC is a binary sequence, which represents the analog source. A key feature unique to the analog-to-digital conversion problem is that the encoding of the source is carried out in the analog domain, while the decoding procedure is purely digital. Given the limitations of analog processing, it is therefore generally only practical to exploit the source structure at the decoder. Hence, the source coding schemes that are suitable for data conversion are those that approach fundamental limits without requiring knowledge of the source structure at the encoder. In addition, latency and complexity constraints in data conversion typically preclude the use of schemes other than those based on scalar quantization.
The input signal to an ADC is often known to have structure that could be exploited to reduce the overall bit rate of its representation, R. In our analysis, it will be convenient to express this structure using a stochastic model for the input. Consequently, throughout the paper, we will model the input to the ADC as a stationary stochastic Gaussian process X(t), whose power spectral density (PSD) encapsulates the assumed structure. More generally, we will sometimes also consider the problem of analog-to-digital conversion of a vector X(t) = {X 1 (t), . . . , X K (t)} of jointly stationary stochastic Gaussian processes, via K parallel ADCs, each of which takes one of the K processes as its input.
Under such stochastic modeling, rate-distortion theory [3] provides the fundamental lower bound F s · R > R X (D) for any ADC (and corresponding decoder) that achieves distortion D, where R X (D) is the rate-distortion function of the process X(t) in bits per second. In general, achieving the rate-distortion function of a source requires using sophisticated high-dimensional quantizers, whereas analog-to-digital conversion is invariably done via scalar uniform quantizers. Thus, achieving this lower bound with ADCs seems overly optimistic. Nevertheless, as we shall see, approaching the rate-distortion bound, up to some inevitable loss due to the one-dimensional nature of the quantization, is sometimes possible by a simple modification of the scalar uniform quantizer, namely, a modulo ADC, followed by a digital decoder that efficiently exploits the source structure.
Instead of sampling and quantizing the process X(t), a modulo ADC samples and quantizes the process [X(t)] mod ∆, where the modulo size ∆ is a design parameter. See Figure 1. Equivalently, a modulo ADC can be thought of as a standard uniform scalar ADC with step-size δ and an arbitrarily large dynamic range/support, but that outputs only the R least significant bits in the description of each sample, where $2^R = \Delta/\delta$. The benefit of applying the modulo operation on X(t) is in reducing its dynamic range/support, which in turn enables a reduction of the number of bits per sample produced by the ADC, without increasing the quantizer's step-size. This operation, which corresponds to disregarding coarse information about X(t), will otherwise substantially degrade the source reconstruction. However, by properly accounting for the modulo operation and appropriately choosing its parameter ∆, we can unwrap the modulo operation with high probability using previous samples of X(t) and exploiting the (redundant) structure in the signal.
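The equivalence between folding the input and discarding the most significant bits of a fine quantizer can be checked numerically. The following is a minimal sketch, assuming a floor-type uniform quantizer; the function names and the parameter values (step 1/4, R = 3) are illustrative assumptions and do not come from the paper.

```python
import math

def fold_then_quantize(x, delta, R):
    """Reduce x modulo Delta = 2**R * delta, then quantize with step delta."""
    big_delta = (2 ** R) * delta
    return math.floor((x % big_delta) / delta)

def quantize_then_keep_lsbs(x, delta, R):
    """Quantize x with step delta over an unbounded range, keep the R LSBs."""
    return math.floor(x / delta) % (2 ** R)

# Illustrative parameters (assumptions): step 1/4 and R = 3 bits, so Delta = 2.
DELTA_STEP, R_BITS = 0.25, 3
# Dyadic test inputs keep the floating-point arithmetic exact.
samples = [k / 64 for k in range(-640, 641)]
codes_a = [fold_then_quantize(x, DELTA_STEP, R_BITS) for x in samples]
codes_b = [quantize_then_keep_lsbs(x, DELTA_STEP, R_BITS) for x in samples]
print(codes_a == codes_b)
```

Both routes yield the same R-bit code for every test input, although the inputs span a range ten times wider than ∆.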
Following standard system design methodology, in the performance analysis of a modulo ADC, we distinguish between two events: 1) The no-overload event Ē OL where the decoder was able to correctly unwrap the modulo operation. We require the MSE distortion, conditioned on this event, to be at most D; 2) The overload event E OL where the decoder fails in unwrapping the modulo operation. We require the probability of this event Pr(E OL ) to be small, but do not concern ourselves with the MSE distortion conditioned on the occurrence of this event.
A. Our Contributions
This work further develops the modulo ADC framework in three complementary directions, as specified below.
1) Oversampled Modulo ADC:
We show that a modulo ADC can be used as an alternative to Σ∆ converters. A Σ∆ converter is based on oversampling the input process X(t), i.e., sampling above the Nyquist rate, in conjunction with noise-shaping, which pushes much of the energy of the quantization noise to high frequencies, where there is no signal content. See Figure 2 . The noise shaping operation requires incorporating an elaborate mixed signal feedback circuit. In particular, the circuit first generates the quantization noise, which necessitates using not only an ADC, but also an accurately-matched digital-to-analog converter (DAC), and then applies an analog filter. The analog nature of the signal processing makes it challenging to use filters of high-orders, which in turn limits performance.
We develop an alternative architecture (Section III) that shifts much of the complexity to the decoder, whereas the "encoder" is simply a modulo ADC. See Figure 3 . The parameter ∆ in the modulo ADC, as well as the coefficients of the prediction filter in Figure 3 , depend only on the bandwidth B of the input process X(t) and on its variance σ 2 , and not on the other details of its PSD. Similarly, the MSE distortion between the input process and its reconstruction, depends only on B and σ 2 . Thus, the developed architecture is as agnostic as Σ∆ converters to the statistics of the input process. Furthermore, for a flat-spectrum process, the distortion is within a small gap, due to one-dimensionality of the encoder, from the information theoretic limit.
2) A Phase-Domain Implementation of Modulo ADC via Ring Oscillators: We develop a modulo ADC implementation that performs the modulo reduction inherently as part of the analog signal acquisition process. As the phase of a periodic waveform is always measured modulo 2π, a natural class of candidates are ADCs that first convert the input voltage into phase, and then quantize that phase. A notable representative within this class, which has been extensively studied in the literature [6] , [7] , is the ring oscillator ADC.
Consider a closed-loop cascade of N inverters, where N is an odd number, all controlled with the same voltage V dd = V in , see Figure 4 . This circuit, which will be described in detail in Section IV, oscillates between 2N states, corresponding to the values ('low' or 'high', represented by '0' or '1') of each of the N inverters. See Figure 5 . The oscillation frequency is controlled by V dd . Due to the oscillating nature of the circuit, if we sample its state every T S seconds, we cannot tell how many "state changes" occurred between two consecutive samples, but we are able to determine this number modulo 2N . Thus, by setting V dd to V dd (t) = g(X(t)), where X(t) is the analog signal to be converted to a digital one and g(·) is a function to be specified, we obtain a modulo ADC. The input-output relation of this modulo ADC is characterized in Section IV, and depends on the response time of the inverters to change in their input, as a function of V dd . In practice, the modulo operation realized in this way deviates from the ideal characteristic of Figure 1 in a variety of ways. Accordingly, we perform several numerical experiments to evaluate and optimize the performance of an oversampled ring oscillator modulo ADC, and compare it to the performance of an ideal modulo ADC as well as to a Σ∆ converter. The results demonstrate that despite the non-idealities in the ring oscillator implementation, in some regimes, this architecture holds substantial potential for improvement over existing ADCs.
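The modulo ambiguity of the sampled oscillator state can be illustrated with a toy model. The sketch below is an abstraction made for illustration only: it represents the oscillator state by an index in a cycle of 2N states and takes the number of state changes per sampling interval as given, without modeling the inverter delays or the mapping g(·). All names and numbers are hypothetical.

```python
NUM_INVERTERS = 5                      # N, an odd number
NUM_STATES = 2 * NUM_INVERTERS         # the ring cycles through 2N states

def sample_states(transitions_per_interval, initial_state=0):
    """Return the state index seen at each sampling instant."""
    states = [initial_state]
    for count in transitions_per_interval:
        states.append((states[-1] + count) % NUM_STATES)
    return states

def transitions_mod_2n(states):
    """State changes between consecutive samples, known only modulo 2N."""
    return [(b - a) % NUM_STATES for a, b in zip(states, states[1:])]

true_counts = [3, 11, 25, 0, 14]       # hypothetical state changes per T_S
observed = transitions_mod_2n(sample_states(true_counts))
print(observed)
```

In this toy model the counts 11, 25 and 14 are observed as 1, 5 and 4, which is the folding that the decoder later has to unwrap.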
3) Modulo ADCs for Jointly Stationary Processes:
In many applications the number of sensors/antennas observing a particular process is greater than the number of degrees-of-freedom (per time unit) governing its behavior. Thus, there is a redundancy at the receiver that can be exploited. However, as this redundancy can be spread across time and space, traditional ADC architectures, as well as the modulo ADC architectures described in Section II-A and II-B, are insufficient. In this part of the paper, we show how to address this problem via a natural extension of the modulo ADC framework.
As an example we will consider the problem of wireless communication. It is by now well established that using receivers, as well as transmitters, with multiple antennas, dramatically increases the achievable communication rates over wireless channels [8] , [9] . However, adding antennas comes with the price of requiring multiple expensive and power hungry RF chains. For traditional ADC architectures, power and cost scale linearly with the number of receive antennas, which motivates an alternative solution.
It is often the case that the signals observed by the different receive antennas are highly correlated, in time and in space. As an illustrative example, consider the case where the transmitter has one antenna, whereas the receiver has K > 1 antennas. We can model the signal observed at each of the antennas, after sampling, as
$$Y^k_n = \sum_{m} h^k_m X_{n-m} + Z^k_n, \qquad k = 1, \ldots, K, \tag{1}$$
where {X n } is the process emitted by the transmitter, {h k n } is the kth channel impulse response, and {Z k n } are independent additive white Gaussian noise (AWGN) processes.
Since all K output processes {Y 1 n }, . . . , {Y K n } in (1) are noisy and filtered versions of the same input process, they will typically be highly correlated. However, this correlation may be spread in time (the n-axis) and in space (the k-axis). As an extreme example, assume {X n } is an iid process, and the filters simply incur different delays, i.e., h
This phenomenon, where the signals observed by the different ADCs are highly correlated, is not unique to the wireless communication setup, and appears in many other applications, e.g., multi-array radar. It is, however, taken to the extreme in massive MIMO [10], where the number of antennas at the base station is of the order of tens or even hundreds, while the number of users it supports may be substantially fewer.
In Section VI we develop an architecture that uses modulo ADCs, one for each receive antenna, in order to exploit the space-time correlation of the processes. We develop a low-complexity decoding algorithm for unwrapping the modulo operations. This algorithm combines the idea of performing prediction in time, of the quantized vector process from its past, with that of integer-forcing source decoding [11], which is used for exploiting spatial correlations in the prediction error vector. See Figure 6. In the limit of small D, the loss of the developed analog-to-digital conversion scheme with respect to the information theoretic lower bound on D, is shown to reduce to that of the integer-forcing source decoder.
B. Related Work
The idea of using modulo ADCs/quantizers for exploiting temporal correlations within the input process X(t) towards reducing the quantization rate R, dates back, at least, to [12] , where a quantization scheme, called modulo-PCM, was introduced. A decoding scheme for unwrapping the modulo operation, based on maximum-likelihood sequence detection [13] , was further proposed in [12] , and a heuristic analysis was performed, based on prediction of X(t) from its past, which shows that modulo-PCM can approach the Shannon lower bound under the high-resolution assumptions. In Section II-A, we develop a more complete analysis of modulo quantization, the details of which are required for the application we discuss in Section III.
The architecture from Figure 3 is based on using a prediction filter at the decoder, as a part of the modulo unwrapping process, as was hinted at in [12] (see also [14]). In agreement with the literature on differential pulse-code modulation (DPCM) of the late 1970s (see e.g. [15]), the authors in [12] proposed to design the prediction filter as the optimal one-step predictor of the unquantized process {X n } from its past. As shown in [16], this design criterion is sub-optimal, and the "correct" design criterion is to take this filter as the one-step predictor of the quantized process from its past. The difference between the two design criteria is significant for oversampled processes, which are the focus of Section III, whose PSD is zero at high frequencies, as in those frequencies the signal-to-distortion ratio is zero, no matter how small the quantization noise is. Our analysis in Section III reveals that designing the modulo size ∆ and the prediction filter with respect to a quantized flat-spectrum input process results in a universal system. This means that this system attains the same distortion D for all input processes that share the same support for the PSD and the same variance.
The use of modulo ADCs/quantizers was also studied by Boufounos in the context of quantization of oversampled signals [17] (see also [18] ). In particular, it is shown in [17] that by randomly embedding a measurement vector in R K onto an M ≫ K dimensional subspace, and using a modulo ADC for quantizing each of the coordinates of the result, one can attain a distortion that decreases exponentially with the oversampling ratio, with high probability. In Section III we consider a similar setup, where an oversampled analog signal, with oversampling ratio L > 1, i.e. F s is L times greater than the Nyquist frequency, is digitized by a modulo ADC. In the language of [17] , this corresponds to embedding X ∈ R K to an M = LK dimensional space by zero-padding followed by interpolation, which is indeed a linear operation. We show that for this particular "embedding" not only is the decay of MSE distortion exponential in the oversampling ratio, but the attained distortion is information-theoretically optimal, up to a constant loss, which is explicitly characterized, due to the scalar nature of the quantizer. Moreover, under this "embedding", a simple low-complexity decoding algorithm exists, whereas for the random projection case studied in [17] , no computationally efficient decoding algorithm was given. One advantage, on the other hand, of the approach from [17] , is that it is applicable to 1-bit modulo ADCs, whereas the performance of the scheme from Section III typically becomes attractive starting from R 2 bits per sample.
Very recently, Bhandari et al. have addressed the question of what is the minimal sampling rate that allows for exact recovery of a bandlimited finite-energy signal, from its modulo-reduced sampled version [19] (see also [20]). They have found that a sufficient condition for correct reconstruction is sampling above the Nyquist rate by a factor of 2πe, regardless of the size of the modulo interval. The analysis in [19] did not take quantization noise into account, which corresponds to R = ∞ and D = 0 in our setup.
The merits of a modulo ADC for distributed analog-to-digital conversion of signals correlated in space, but not in time, were demonstrated in [11]. A low-complexity decoding algorithm, for unwrapping the modulo operation, was proposed and its performance was analyzed. It was demonstrated via numerical experiments that the performance is usually quite close to the information theoretic lower bounds (see also [21]). In Section II-B, we summarize the decoding scheme from [11] and the corresponding performance analysis, as those will be needed in Section VI, where we develop a modulo ADC architecture for analog-to-digital conversion of jointly stationary processes. The decoding algorithm for this setup, as well as its performance analysis, is inspired by the ideas and techniques from Sections II-A and II-B.
In a broader sense, modulo quantization is closely related to Wyner-Ziv's source coding with side information setup and to its channel coding dual, which is the Gel'fand-Pinsker setup [22]. In the latter context, we further note that modulo quantization is widely used for communication over intersymbol interference channels [23], [24]. Recently, Hong and Caire [25] considered modulo ADCs as potential candidates for the front end of receivers in a cloud radio access network (CRAN), employing compute-and-forward [26] based protocols.
Note that although the concept of a modulo ADC is reminiscent of folding ADCs [27], an important difference is that, unlike the latter, the former does not keep track of the number of folds that occurred and, moreover, its functionality does not depend on this number, i.e., it does not saturate for large inputs. In unwrapping the modulo operation at the decoder, the missing information about the number of folds is recovered, and we are able to attain the same D with a smaller rate.
Finally, another related line of work, is that of compressed sampling, see, e.g., [28] - [30] , where the goal is to design universal and efficient ADCs with a small sampling frequency F S , under the assumption that the input signal occupies only a small portion of its total bandwidth, but the exact support is unknown.
C. Organization
The rest of the paper is organized as follows. In Section II we formally define the modulo ADC and study its performance for stationary scalar input processes, and for random vectors (spatial correlation). Section III develops the use of oversampled modulo ADCs as a substitute for Σ∆ converters, and analyzes the tradeoffs this architecture achieves. In Section IV we introduce an implementation of modulo ADCs via ring oscillators and establish the corresponding input-output mathematical model. Numerical experiments for evaluating the performance of ring oscillators based oversampled modulo ADCs are performed in Section V. Section VI proposes to use parallel modulo ADCs for digitizing jointly stationary processes. The paper concludes in Section VII.
II. PRELIMINARIES ON IDEAL MODULO ADC
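Let ∆ ∈ R + be a positive number, and define the mod ∆ operation as
$$[x] \bmod \Delta \triangleq x - \Delta \cdot \left\lfloor \frac{x}{\Delta} \right\rfloor \in [0, \Delta),$$
where the floor operation ⌊x⌋ returns the largest integer smaller than or equal to x. By definition, we have that for any x, y ∈ R
$$\big[[x] \bmod \Delta + y\big] \bmod \Delta = [x + y] \bmod \Delta. \tag{2}$$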
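An R-bit modulo ADC with resolution parameter α, referred to as an (R, α) mod-ADC, computes
$$[x]_{R,\alpha} \triangleq \big[\lfloor \alpha x \rfloor\big] \bmod 2^R \in \{0, 1, \ldots, 2^R - 1\},$$
where we have assumed that $2^R$ is an integer. In case R itself is an integer, each sample of $[x]_{R,\alpha}$ can be represented by R bits. Otherwise, we can buffer n consecutive samples $[x_1]_{R,\alpha}, \ldots, [x_n]_{R,\alpha}$ and represent them by ⌈nR⌉ ≤ nR + 1 bits, such that the average number of bits per sample is at most R + 1/n. The role of α here is to scale the input prior to quantization. We can write
$$[x]_{R,\alpha} = \big[\alpha x + (\lfloor \alpha x \rfloor - \alpha x)\big] \bmod 2^R. \tag{3}$$
The error term $\lfloor \alpha x \rfloor - \alpha x \in (-1, 0]$ in (3) is clearly a deterministic function of x. Nevertheless, throughout this paper we will model this error term as additive uniform noise Z ∼ Unif((−1, 0]) statistically independent of x, such that the (R, α) mod-ADC will be treated as a stochastic channel with input x and output Y, related as
$$Y = [\alpha x + Z] \bmod 2^R. \tag{4}$$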
The approximation of the (R, α) mod-ADC by the additive modulo channel (4) can be made exact via the use of subtractive dithers. Specifically, we can use a random variable U ∼ Unif([0, 1)), statistically independent of x, which we refer to as a dither, and feed $\tilde{x} = x + U/\alpha$ to the (R, α) mod-ADC instead of feeding x. The output of the modulo ADC in this case will be
$$[\tilde{x}]_{R,\alpha} = \big[\lfloor \alpha x + U \rfloor\big] \bmod 2^R.$$
Subtracting U from $[\tilde{x}]_{R,\alpha}$ and reducing the result modulo $2^R$, we obtain
$$\big[[\tilde{x}]_{R,\alpha} - U\big] \bmod 2^R = \Big[\big[\lfloor \alpha x + U \rfloor\big] \bmod 2^R - U\Big] \bmod 2^R = \big[\alpha x + \big(\lfloor \alpha x + U \rfloor - (\alpha x + U)\big)\big] \bmod 2^R,$$
where the last equality follows from the distributive law of modulo (2). Note that for every x ∈ R, the random variable $Z = \lfloor \alpha x + U \rfloor - (\alpha x + U)$ is uniformly distributed over (−1, 0], and is therefore independent of x [31, Lemma 1]. Thus, with subtractive dithers, the additive noise model (4) is exact. We note that even when dithering is not used, under suitable conditions this approximation is quite accurate [32]. Although the modulo operation entails loss of information in general, in many situations it is possible to unwrap it, i.e., to recover αx + Z from $Y = [\alpha x + Z] \bmod 2^R$ with high probability.¹ In particular, let
$$\tilde{Y} \triangleq \big[Y + \tfrac{1}{2} 2^R\big] \bmod 2^R - \tfrac{1}{2} 2^R, \tag{5}$$
and note that conditioned on the no-overload event
$$\bar{E}_{OL} \triangleq \big\{ |\alpha x + Z| < \tfrac{1}{2} 2^R \big\}$$
we have that $\tilde{Y} = \alpha x + Z$. Thus, if $\Pr(\bar{E}_{OL})$ is close to 1, the modulo operation has no effect with high probability. Note that $\Pr(E_{OL}) = \Pr\big(|\alpha x + Z| \geq \tfrac{1}{2} 2^R\big)$ is identical to the probability that a standard uniform quantizer with dynamic range (support) $2^R/\alpha$ is in overload. Thus, when thinking of x as a single observation, it is unclear what the advantages of a modulo ADC are with respect to a traditional uniform ADC. However, as we illustrate below, the modulo ADC allows exploitation of the statistical structure of the acquired signal in a much more efficient manner than the standard ADC.
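The mod-ADC, the subtractive dither, and the centered unwrapping can be exercised in a few lines. The following is a minimal sketch under stated assumptions: the mod-ADC is taken to compute floor(αx) modulo 2^R, the unwrapping maps the output to the centered interval of length 2^R, and the values R = 4 and α = 8 are arbitrary illustrative choices. The helper names are hypothetical.

```python
import math
import random

def mod_adc(x, R, alpha):
    """(R, alpha) mod-ADC: floor(alpha * x) reduced modulo 2**R."""
    return math.floor(alpha * x) % (2 ** R)

def centered_mod(y, R):
    """Reduce y modulo 2**R into the interval [-2**R / 2, 2**R / 2)."""
    m = 2 ** R
    return (y + m / 2) % m - m / 2

R_BITS, ALPHA = 4, 8.0                 # illustrative values (assumptions)
rng = random.Random(0)

# No dither: when floor(alpha*x) lies in the centered interval, unwrapping is exact.
inputs = [k / 16 for k in range(-15, 16)]          # alpha*x ranges over [-7.5, 7.5]
unwrapped = [centered_mod(mod_adc(x, R_BITS, ALPHA), R_BITS) for x in inputs]

def dithered_error(x):
    """Feed x + U/alpha, subtract U at the output, and return the noise term Z."""
    u = rng.random()                                # dither U ~ Unif[0, 1)
    y = (mod_adc(x + u / ALPHA, R_BITS, ALPHA) - u) % (2 ** R_BITS)
    return centered_mod(y, R_BITS) - ALPHA * x      # lies in (-1, 0] without overload

errors = [dithered_error(rng.uniform(-0.8, 0.8)) for _ in range(2000)]
mean_error = sum(errors) / len(errors)
print(min(errors), max(errors), mean_error)
```

With the dither, the recovered error term stays within (−1, 0] for every input in the no-overload range, in line with the additive noise model.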
The following lemma is proved using Chernoff's bound, and will be useful in the sequel for bounding Pr(E OL ) in various scenarios.
Lemma 1 ([33, Lemma 4], [34, Theorem 7]):
are iid Gaussian random variables with zero mean and some variance σ
A. Modulo ADCs for Scalar Stationary Processes
Let {X n } be a zero-mean discrete-time stationary Gaussian stochastic process, obtained by sampling a stationary Gaussian process X(t) every T S seconds. Let
$$Y_n = [\alpha X_n + Z_n] \bmod 2^R, \qquad n = 1, 2, \ldots$$
be the process obtained by applying an (R, α) mod-ADC on the process {X n }, where {Z n } is a Unif((−1, 0]) iid noise, and let
$$V_n = \alpha X_n + Z_n, \qquad n = 1, 2, \ldots$$
be its non-folded version. Our goal is to design a decoder that recovers V n from the outputs of the modulo ADC, {Y n }, with high probability. To that end, we assume the decoder has access to {V n−1 , . . . , V n−p }, an assumption that will be justified in the sequel, and that it knows the autocovariance function C X [r] = E[X n X n−r ] of {X n }. We apply the following algorithm (see also Figure 3 for a schematic illustration):
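Inputs: $Y_n$, $\{V_{n-1}, \ldots, V_{n-p}\}$, $\{C_X[r]\}$, α, R.
Output: Estimates $\hat{V}_n$, $\hat{X}_n$, for V n and X n , respectively.
Algorithm:
1) Compute the optimal linear MMSE predictor for V n from its last p samples
$$\hat{V}^p_n = -\frac{1}{2} + \sum_{i=1}^{p} h_i \left( V_{n-i} + \frac{1}{2} \right),$$
where {h n } is a p-tap prediction filter, computed based on {C X [r]} and α, and the shift by 1/2 compensates for $E[Z_n] = -1/2$.
2) Compute
$$W_n = \big[Y_n - \hat{V}^p_n\big] \bmod 2^R, \qquad \tilde{W}_n = \big[W_n + \tfrac{1}{2} 2^R\big] \bmod 2^R - \tfrac{1}{2} 2^R.$$
3) Output $\hat{V}_n = \hat{V}^p_n + \tilde{W}_n$ and $\hat{X}_n = (\hat{V}_n + \tfrac{1}{2})/\alpha$.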
Remark 1: Note that {h n } is the p-tap prediction filter for the quantized process {V n } from its past, rather than for {X n } from its past. While the loss for using the latter, instead of the former, becomes insignificant when high-resolution assumptions apply, it can be arbitrarily large for oversampled processes, for which high-resolution assumptions never hold [16], [35]. The filter coefficients {h n } need only be computed once, and can then be used for all times.
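The prediction-based unwrapping can be illustrated with a small simulation. The following is a minimal sketch under several assumptions that are not taken from the paper: a single-tap predictor (p = 1), a unit-variance Gaussian AR(1) input with coefficient 0.99, and the parameters α = 64 and R = 7. The decoder is assumed to predict the next non-folded sample, reduce the difference between the ADC output and the prediction to the centered modulo interval, and add the prediction back; the first sample is assumed known as initialization.

```python
import math
import random

def centered_mod(y, m):
    """Reduce y modulo m into the interval [-m/2, m/2)."""
    return (y + m / 2) % m - m / 2

# Hypothetical parameters chosen for illustration only.
RHO, ALPHA, R_BITS, N = 0.99, 64.0, 7, 5000
M = 2 ** R_BITS
rng = random.Random(1)

# AR(1) Gaussian input with unit variance: C_X[0] = 1, C_X[1] = RHO.
x = [rng.gauss(0.0, 1.0)]
for _ in range(N - 1):
    x.append(RHO * x[-1] + math.sqrt(1 - RHO ** 2) * rng.gauss(0.0, 1.0))

v = [math.floor(ALPHA * xn) for xn in x]   # non-folded samples V_n = alpha*X_n + Z_n
y = [vn % M for vn in v]                   # mod-ADC outputs Y_n = [V_n] mod 2^R

# One-tap predictor of the quantized process: C_V[0] = alpha^2 * C_X[0] + 1/12.
h1 = (ALPHA ** 2 * RHO) / (ALPHA ** 2 + 1 / 12)

v_hat = [v[0]]                             # initialization: first sample assumed known
for n in range(1, N):
    pred = -0.5 + h1 * (v_hat[n - 1] + 0.5)   # predict, compensating E[Z] = -1/2
    w = centered_mod(y[n] - pred, M)          # centered, folded prediction error
    v_hat.append(round(pred + w))             # add the prediction back
x_hat = [(vn + 0.5) / ALPHA for vn in v_hat]
num_errors = sum(a != b for a, b in zip(v, v_hat))
max_abs_err = max(abs(a - b) for a, b in zip(x, x_hat))
print(num_errors, max_abs_err)
```

Each decoding step feeds its own estimate back into the predictor, so a single overload would propagate; with these parameters the prediction error is roughly seven standard deviations inside the modulo interval.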
The following proposition characterizes the performance of the algorithm above. All logarithms in this paper are taken to base 2, unless stated otherwise.
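Proposition 1: Let $\hat{V}^p_n$, $\hat{V}_n$ and $\hat{X}_n$ be as defined in the algorithm above, and let $\sigma_p^2 \triangleq E\big[(V_n - \hat{V}^p_n)^2\big]$. We have that
$$\Pr(E_{OL_n}) \leq 2 \exp\left\{ -\frac{3}{2} \cdot 2^{2\left(R - \frac{1}{2}\log(12\sigma_p^2)\right)} \right\}$$
and
$$D \triangleq E\big[(X_n - \hat{X}_n)^2 \,\big|\, \bar{E}_{OL_n}\big] \leq \frac{1}{12\alpha^2} \cdot \frac{1}{1 - \Pr(E_{OL_n})},$$
where the event $\bar{E}_{OL_n} = \{\hat{V}_n = V_n\}$ is the complement of the event $E_{OL_n} = \{\hat{V}_n \neq V_n\}$.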
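Proof: Let $E^p_n \triangleq V_n - \hat{V}^p_n$ be the pth order prediction error of the process {V n }, and note that its variance $\sigma_p^2 = E\big[(E^p_n)^2\big]$ is invariant to n due to stationarity. We have that
$$W_n = \big[Y_n - \hat{V}^p_n\big] \bmod 2^R = \big[[V_n] \bmod 2^R - \hat{V}^p_n\big] \bmod 2^R = \big[V_n - \hat{V}^p_n\big] \bmod 2^R = [E^p_n] \bmod 2^R, \tag{9}$$
where equation (9) follows from the modulo distributive law (2), and constitutes the key advantage of the modulo operation for exploiting temporal correlations. Note that $\tilde{W}_n = \big[W_n + \tfrac{1}{2} 2^R\big] \bmod 2^R - \tfrac{1}{2} 2^R$, as in (5). Therefore, conditioned on the event $\{|E^p_n| < \tfrac{1}{2} 2^R\}$, we have that $\tilde{W}_n = E^p_n$ and hence $\hat{V}_n = V_n$. The prediction error $E^p_n$ is a zero-mean linear combination of statistically independent Gaussian and uniform random variables, such that Lemma 1 applies, and we have that
$$\Pr(E_{OL_n}) \leq \Pr\big(|E^p_n| \geq \tfrac{1}{2} 2^R\big) \leq 2 \exp\left\{ -\frac{3}{2} \cdot 2^{2\left(R - \frac{1}{2}\log(12\sigma_p^2)\right)} \right\}.$$
Whenever $\bar{E}_{OL_n}$ occurs, we have that $\hat{V}_n = V_n$, and consequently $\hat{X}_n - X_n = (Z_n + \tfrac{1}{2})/\alpha$, whose second moment, conditioned on $\bar{E}_{OL_n}$, is at most $\frac{1}{12\alpha^2} \cdot \frac{1}{1 - \Pr(E_{OL_n})}$.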
Proposition 1 shows that we can make $\Pr(E_{OL_n})$ as small as $2e^{-\frac{3}{2} 2^{2\delta}}$ by setting $R = \frac{1}{2}\log(12\sigma_p^2) + \delta$. For example, taking δ = 2 bits results in an overload probability smaller than $10^{-10}$. In particular, unless we take a very small δ, we have that $1 - \Pr(E_{OL_n}) \approx 1$, and consequently, by Proposition 1, we will have $D \approx 1/(12\alpha^2)$. Thus, to simplify expressions in the analysis that follows, we assume $D = 1/(12\alpha^2)$. We note the tradeoff in choosing α: on the one hand, increasing α decreases the MSE distortion D, but on the other hand the prediction error variance $\sigma_p^2$ of the process V n = αX n + Z n increases with α such that the required rate R for avoiding overload errors increases. Thus, the tradeoff between D and the required quantization rate is controlled through the parameter α. We now turn to characterize the tradeoff the developed scheme achieves.
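The dependence of the overload probability on the rate margin δ is easy to tabulate. The sketch below assumes that the bound has the form 2·exp(−(3/2)·2^(2δ)) with R = ½·log2(12σ_p²) + δ, which is consistent with the figure of 10^−10 quoted above for δ = 2 bits; the function names are hypothetical.

```python
import math

def overload_bound(delta_bits):
    """Assumed bound 2*exp(-(3/2) * 2**(2*delta)) on the per-sample overload probability."""
    return 2.0 * math.exp(-1.5 * 2.0 ** (2.0 * delta_bits))

def required_rate(sigma_p_sq, delta_bits):
    """Rate R = (1/2)*log2(12 * sigma_p^2) + delta, in bits per sample."""
    return 0.5 * math.log2(12.0 * sigma_p_sq) + delta_bits

for delta in (0.5, 1.0, 1.5, 2.0):
    print(f"delta = {delta:3.1f} bits -> overload bound {overload_bound(delta):.3e}")
```

The bound decays doubly exponentially in δ, so each additional half bit of margin reduces the overload probability by many orders of magnitude.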
Let h(A) denote the differential entropy of the random variable A, and h(A|B) the conditional differential entropy of A given the random variable B [5]. Recall that for a stationary Gaussian process {X n } with PSD S X (e jω ) we have that [36]
$$h(X_n | X_{n-1}, \ldots) = \frac{1}{2\pi} \int_{-\pi}^{\pi} \frac{1}{2} \log\big(2\pi e \, S_X(e^{j\omega})\big) \, d\omega,$$
and in particular h(X n |X n−1 , . . .) = −∞ if and only if S X (e jω ) = 0 over a measurable subset of [−π, π). Shannon's lower bound [3] states that the number of bits per sample R produced by any quantizer that attains an MSE distortion D must satisfy
$$R \geq h(X_n | X_{n-1}, \ldots) - \frac{1}{2} \log(2\pi e D).$$
It is well-known that for Gaussian processes with finite h(X n |X n−1 , . . .), Shannon's lower bound is asymptotically tight, i.e.,
Proof: We can write
where E ′p n is the pth order prediction error of the process
For a Gaussian process {X n }, the condition h(X n |X n−1 , . . .) > −∞ is equivalent to
$$\frac{1}{2\pi} \int_{-\pi}^{\pi} \log S_X(e^{j\omega}) \, d\omega > -\infty.$$
As a consequence of (15), we have that
By Paley-Wiener's theorem [37] , we have that
Combining (16) and (17), we obtain that lim
for processes with finite entropy rate h(X n |X n−1 , . . .). The result now follows by rearranging terms. For the practically important case where {X n } is obtained by oversampling the process {X(t)}, which is studied in Section III, the assumption h(X n |X n−1 , . . .) > −∞ of Proposition 2 does not hold. Nevertheless, we will show that the modulo ADC achieves performance that is close to the information theoretic limits in this case as well.
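The spectral formula for the entropy rate of a stationary Gaussian process, and Shannon's lower bound built on it, can be checked numerically. The following is a minimal sketch that uses a Gaussian AR(1) process as a hypothetical example, for which the entropy rate is known to equal the differential entropy of the innovation; the function names and parameter values are assumptions made for illustration.

```python
import math

def entropy_rate_from_psd(psd, num_points=4096):
    """Midpoint-rule value of (1/2pi) * integral over [-pi, pi) of 0.5*log2(2*pi*e*S(w))."""
    step = 2 * math.pi / num_points
    total = 0.0
    for k in range(num_points):
        w = -math.pi + (k + 0.5) * step
        total += 0.5 * math.log2(2 * math.pi * math.e * psd(w))
    return total * step / (2 * math.pi)

RHO, SIGMA_W_SQ = 0.9, 0.19   # hypothetical AR(1): X_n = RHO * X_{n-1} + W_n

def ar1_psd(w):
    """PSD of the AR(1) process: sigma_w^2 / |1 - RHO * exp(-jw)|^2."""
    return SIGMA_W_SQ / (1 - 2 * RHO * math.cos(w) + RHO ** 2)

h_numeric = entropy_rate_from_psd(ar1_psd)
h_closed = 0.5 * math.log2(2 * math.pi * math.e * SIGMA_W_SQ)   # innovation entropy

def shannon_lower_bound(h_rate, distortion):
    """Bits per sample: h(X_n | past) - 0.5*log2(2*pi*e*D)."""
    return h_rate - 0.5 * math.log2(2 * math.pi * math.e * distortion)

print(h_numeric, h_closed, shannon_lower_bound(h_closed, 0.01))
```

For this example the bound reduces to ½·log2(σ_w²/D), so it reaches zero when the target distortion equals the innovation variance.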
Above, we have assumed that the decoder has access to the non-folded samples {V n−1 , . . . , V n−p }. To justify this assumption, an initialization step is needed, where the decoder acquires the first p consecutive samples {V 1 , . . . , V p }, or estimates of these samples. Once those are obtained, we can apply the algorithm described above, sample-by-sample, and assume the estimate $\hat{V}_n$ produced by the algorithm at time n is correct, and can be used as an input for the algorithm in the next p steps. All samples V p+1 , . . . , V T will be recovered correctly, as long as no overload error occurred within the T − p decoding steps. Thus, by the union bound, we see that the first T − p samples are recovered correctly with probability at least $1 - 2Te^{-\frac{3}{2} 2^{2\delta}}$.
One conceptually simple way of performing the initialization, i.e., obtaining {V 1 , . . . , V p }, is by using a standard scalar quantizer with high rate for the first p samples. Although the high power consumption of such a quantizer will have a negligible effect on the total power consumption, because it is used only for a small fraction of the time, this approach has the disadvantage of having to include two ADCs, a high-rate standard ADC and a modulo ADC, within the system. Alternatively, one can perform the initialization using only an R-bit modulo ADC in one of the two following ways:
1) Increase α gradually until it reaches its final value. For the first sample, α 1 will be chosen such that V 1 = α 1 X 1 + Z 1 is w.h.p. within the modulo interval, such that no prediction is needed. Next, we can use V 1 in order to predict V 2 = α 2 X 2 + Z 2 , which allows using α 2 > α 1 such that the prediction error is still within the modulo interval. Continuing this way, we can keep increasing α until convergence.
2) We can collect a long vector of outputs from the modulo ADC, say {Y 1 , . . . , Y K }, K > p, and unwrap the modulo operation via the integer-forcing source coding scheme described in the next subsection. The amount of computation per sample required in this method is greater than that of the "steady state", i.e., after initialization is complete, but since initialization is rarely performed, the effect on the total complexity is negligible.
B. Modulo ADCs for Random Vectors
Let X ∼ N (0, Σ) be a K-dimensional Gaussian random vector with zero mean and covariance matrix Σ. Let
$$Y_k = [\alpha X_k + Z_k] \bmod 2^R, \qquad k = 1, \ldots, K,$$
be obtained by applying K identical (R, α) mod-ADCs, each applied to a different coordinate of the vector X, where the quantization noises Z k ∼ Unif((−1, 0]), k = 1, . . . , K, are iid, and let
$$V_k = \alpha X_k + Z_k, \qquad k = 1, \ldots, K,$$
be its non-folded version. Our goal is to recover $V = [V_1, \ldots, V_K]^T$ from the outputs $Y = [Y_1, \ldots, Y_K]^T$ of the modulo ADCs with high probability.
² Note that conditioning on the event that no overload error occurred until time n changes the statistics of $E^p_n$. Thus, applying the union bound correctly here requires some more care. See [35] for more details.
By definition of the modulo operation, we have that $V \in Y + 2^R \cdot \mathbb{Z}^K$. Consequently, the optimal decoder for V from the measurement Y, in terms of minimizing $\Pr(\hat{V} \neq V)$, is
$$\hat{V} = \arg\max_{v \in Y + 2^R \mathbb{Z}^K} f_V(v),$$
where f V is the probability density function (PDF) of the random vector V. Although f V can be expressed as the convolution of the K-dimensional Gaussian PDF of αX and the cubic PDF of
The matrix (α 2 Σ + 
Thus, the problem of finding $\hat{V}_{Gauss}$ is equivalent to that of finding the closest point to $2^{-R} \cdot s$ in the lattice generated by the basis $L^T$. Solving this problem, in general, is known to require running time exponential in K [38], unless P=NP. Thus, for large K, finding $\hat{V}_{Gauss}$ is computationally prohibitive. One therefore needs to seek an alternative, low-complexity, decoder for V from Y. Next, we review such a decoder, proposed in [11], dubbed the integer-forcing (IF) source decoder, see Figure 7. The decoding algorithm works as follows.
Inputs: Y, Σ, R, α. Outputs: Estimates $\hat{V}_{IF}$ and $\hat{X}_{IF}$ for V and X, respectively. Algorithm:
where |A| denotes the absolute value of det(A).
The optimization problem (20) requires, in general, computational complexity exponential in K (unless P=NP). However, the problem of finding the optimal integer matrix A need only be solved once for each covariance matrix Σ and α. Thus, even if the solution to this problem is computationally expensive, its cost is amortized over the number of times this solution is used. In practice, one can apply the LLL algorithm [39] in order to obtain a sub-optimal A with polynomial complexity in K.
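To make the roles of the integer matrix A and of the modulo unwrapping concrete, the following is a minimal sketch for K = 2. It rests on several assumptions that are not taken from the text: the objective is assumed to be the largest row variance $a_k^T(\alpha^2\Sigma + \frac{1}{12}I)a_k$, a brute-force search over small integer entries stands in for LLL, and the decoder reduces $A(Y + \frac{1}{2}\mathbf{1})$ modulo $2^R$ into a centered interval before inverting A. All function names, the entry bound `max_entry`, and the toy covariance are hypothetical.

```python
import itertools
import numpy as np

def find_integer_matrix(M, max_entry=3):
    """Brute-force search for a full-rank integer matrix A whose rows a_k
    minimize max_k a_k^T M a_k (assumed objective; only viable for tiny K)."""
    K = M.shape[0]
    rows = [np.array(a)
            for a in itertools.product(range(-max_entry, max_entry + 1), repeat=K)
            if any(a)]
    best, best_cost = None, np.inf
    for combo in itertools.combinations(rows, K):
        A = np.array(combo)
        cost = max(a @ M @ a for a in combo)
        # An integer matrix is full rank iff |det A| >= 1.
        if cost < best_cost and abs(np.linalg.det(A)) > 0.5:
            best, best_cost = A, cost
    return best, best_cost

def centered_mod(x, m):
    """Reduce x modulo m into the interval [-m/2, m/2)."""
    return np.mod(x + m / 2, m) - m / 2

def if_decode(Y, A, R):
    """Unwrap the folded outputs (columns of Y) via integer combinations."""
    # a_k^T (Y + 1/2) equals a_k^T (V + 1/2) modulo 2^R for integer a_k.
    g = centered_mod(A @ (Y + 0.5), 2.0 ** R)
    # V is integer-valued in this model, so rounding removes float error.
    return np.round(np.linalg.solve(A, g) - 0.5)

rng = np.random.default_rng(0)
n, R, alpha = 2000, 6, 10.0
S, W = rng.standard_normal(n), rng.standard_normal(n)
X = np.vstack([S, 1.5 * S + 0.05 * W])          # two highly correlated coordinates
Sigma = np.array([[1.0, 1.5], [1.5, 1.5 ** 2 + 0.05 ** 2]])

V = np.floor(alpha * X)                          # V = alpha*X + Z with Z in (-1, 0]
Y = np.mod(V, 2 ** R)                            # outputs of the two mod-ADCs

A, cost = find_integer_matrix(alpha ** 2 * Sigma + np.eye(2) / 12)
V_if = if_decode(Y, A, R)
V_naive = np.round(centered_mod(Y + 0.5, 2.0 ** R) - 0.5)   # per-coordinate unwrapping
print(A, np.mean(V_if != V), np.mean(V_naive != V))
```

In this toy setting the search returns rows proportional to (3, -2) and (1, -1), whose combinations of V have far smaller variance than either coordinate alone, which is why the integer-forcing unwrapping can succeed where per-coordinate unwrapping overloads.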
The next proposition, adapted from [11, Theorem 2], characterizes the performance of modulo ADCs with the decoder above.
Proposition 3:
Let $A = [a_1 | \cdots | a_K]^T$ be the matrix found in step 1 of the algorithm above, and define
We have that
, for all $k = 1, \ldots, K$, where the event $\bar{E}_{OL} = \{\hat{V}_{IF} = V\}$ is the complement of the event $E_{OL} = \{\hat{V}_{IF} \neq V\}$.
The main idea behind the decoder above is the simple observation that for any vector
Proof: By the identity (23), we have that the quantities $g_k$, computed in step 2 of the algorithm, satisfy
where
Consequently, $\hat{V}_{IF} = V$ if and only if the event
occurs. Thus, by the union bound,
The random variable $g_k$ has zero mean and variance $\sigma_k^2 = a_k^T\left(\alpha^2\Sigma + \frac{1}{12} I\right) a_k$, and satisfies the conditions of Lemma 1. We therefore have that
Substituting this into (24) and recalling the definition of
Conditioned on the event $\bar{E}_{OL}$, i.e., the event that $E_{OL}$ did not occur, we have that for all $k = 1, \ldots, K$
where the last inequality follows similarly to (11) . As in the previous subsection, we set
such that
and set $D = 1/(12\alpha^2)$, which is a good approximation for the upper bound we derived on $D_k$, provided that δ is not too small. Consequently, we can write
The tradeoff between rate, distortion and error probability achieved by the (R, α) mod-ADC with an integer-forcing decoder is therefore characterized by equations (26), (27) , and (28). To put this result in context, we recall the information theoretic benchmark [11] 
that approximates the minimal quantization rate, per quantizer, required by any computationally and delay unlimited system in order to achieve MSE of at most D in the reconstructions of each X k , k = 1, . . . , K. Thus,
It is easy to show that the right hand side of (29) is nonnegative [11] . However, typically the gap is quite small, and under certain distributions of practical interest on Σ, the cumulative distribution function (CDF) of this gap can be characterized [21] . A comprehensive comparison between R IFSC (D) and R BT bench (D) was performed in [11] , and it was demonstrated that they are usually quite close.
We further remark that the integer-forcing source decoder is merely one sub-optimal algorithm for solving (19) . It is an interesting avenue for future research to develop alternative algorithms, with firm performance guarantees (as in (26) , (27) and (28)), for the same problem, or more ambitiously, for solving (18) .
III. OVERSAMPLED MODULO-ADC
In Section II-A we have demonstrated the effectiveness of the modulo ADC architecture for acquiring stochastic processes that are correlated in time. In particular, we have shown that the performance of a modulo ADC depends on the variance of the prediction error of the process {V n = αX n + Z n }, rather than the variance of V n itself. However, when designing an ADC, it is desirable to impose as few constraints as possible on the signals that will be fed to the ADC. Therefore, assuming that {X n } is such that {V n } is highly predictable may be too restrictive.
Nevertheless, recalling that the process {X n } is obtained by sampling a continuous-time process X(t), we observe that if the sampling rate is higher than Nyquist's rate, {X n } will be bandlimited, 3 and consequently, {V n } will be highly predictable no matter what the precise PSD of {X n } happens to be. In fact, this observation can be viewed as the rationale underlying Σ∆-conversion. In particular, a Σ∆-converter is information theoretically equivalent to a differential pulse-code modulator (DPCM) whose input is a bandlimited signal with flat spectrum [35] .
While having many advantages, the implementation of Σ∆ converters is more involved than that of traditional scalar uniform quantizers. The main challenge in the design of Σ∆ converters is the need to produce the quantization error, and then apply a filter to this analog signal. A major obstacle is that generating the quantization error requires first quantizing the current sample, then applying a digital-to-analog converter (DAC) to produce the analog representation of the quantizer's output, and finally subtracting this representation from the original sample. See Figure 2. The quantizer and the DAC need to be matched, as otherwise the produced quantization error is inaccurate. This, however, turns out to be quite difficult to achieve, unless the quantizer is a simple sign detector (1-bit quantizer).
To circumvent the challenges listed above, we develop an oversampled modulo ADC, as an alternative to Σ∆-conversion. The only assumptions made on the input process {X(t)} are that it is bandlimited with maximal frequency at most B, and that its variance is at most $\sigma^2$. The developed universal architecture is as follows. See Figure 3.
Analog-to-digital conversion: The process X(t) is uniformly sampled every $T_s = 1/(2LB)$ seconds, L > 1, such that the sampling rate is L times above Nyquist's rate. Each sample of the obtained discrete-time process $\{X_n\}$ is then discretized using an (R, α) mod-ADC, resulting in the quantized process
As above, we define the unfolded process {V n = αX n + Z n }. The decoding procedure assumes {V n−1 , . . . , V n−p } are given, and computes an estimate for V n , based on Y n .
Inputs:
Outputs: Estimates $\hat{V}_n$ and $\hat{X}_n$ for $V_n$ and $X_n$, respectively. Algorithm: The algorithm is exactly the same as that in Section II-A, with only one difference: here $\{C_X[r]\}$ is unknown. Thus, for the computation of the p-tap prediction filter $\{h_n\}$, we assume the PSD of $\{X_n\}$ is
even though this assumption may, and is most likely to, be wrong.
Final post-processing: After collecting a long sequence of estimates $\{\hat{X}_1, \ldots, \hat{X}_N\}$ we apply a non-causal low-pass filter
on them, to obtain the sequence $\{\hat{X}^{LPF}_1, \ldots, \hat{X}^{LPF}_N\}$. The advantages over Σ∆ conversion are clear: the only processing done in the analog domain is sampling and applying a modulo ADC, whereas all filtering operations are done digitally at the decoder.
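The conversion and decoding steps of the oversampled architecture can be sketched as follows. This is an illustrative simulation under stated assumptions rather than the exact procedure of Section II-A: the predictor is designed for a flat PSD on $|\omega| \le \pi/L$ (autocorrelation $\sigma^2\,\mathrm{sinc}(r/L)$) plus white quantization noise of variance 1/12, the first p unfolded samples are taken as known, and the prediction error is unwrapped with a centered modulo reduction. The function names and the parameter values (L = 8, p = 10, α = 8, R = 4) are hypothetical, and the final low-pass filtering step is omitted.

```python
import numpy as np

def flat_psd_predictor(p, L, alpha, sigma2=1.0):
    """p-tap linear MMSE predictor of V_n = alpha*X_n + Z_n from its past,
    assuming X has a flat PSD on |w| <= pi/L and Z is white with variance 1/12."""
    lags = np.arange(p + 1)
    c = alpha ** 2 * sigma2 * np.sinc(lags / L)   # np.sinc(x) = sin(pi x)/(pi x)
    c[0] += 1.0 / 12.0
    idx = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    return np.linalg.solve(c[idx], c[1:])         # normal equations C h = c[1..p]

def bandlimited_gaussian(N, L, rng):
    """Unit-variance Gaussian sequence occupying 1/L of the digital band."""
    spec = np.fft.rfft(rng.standard_normal(N))
    spec[np.fft.rfftfreq(N) > 0.5 / L] = 0.0
    x = np.fft.irfft(spec, n=N)
    return x / x.std()

def mod_adc(x, R, alpha):
    """(R, alpha) mod-ADC: returns folded output Y_n and unfolded V_n."""
    v = np.floor(alpha * x)                       # V_n = alpha*X_n + Z_n, Z_n in (-1, 0]
    return np.mod(v, 2 ** R), v

def decode(y, v_init, h, R):
    """Sequentially unwrap Y_n using a prediction of V_n from p past values."""
    p, M = len(h), 2.0 ** R
    v_hat = np.empty_like(y)
    v_hat[:p] = v_init                            # initialization assumed given
    for n in range(p, len(y)):
        past = v_hat[n - p:n][::-1]               # V_{n-1}, ..., V_{n-p}
        pred = h @ (past + 0.5) - 0.5             # shift by 1/2 compensates E(Z_n)
        err = np.mod(y[n] - pred + M / 2, M) - M / 2
        v_hat[n] = np.round(pred + err)           # V_n is an integer in this model
    return v_hat

rng = np.random.default_rng(1)
L, p, alpha, R = 8, 10, 8.0, 4
x = bandlimited_gaussian(4096, L, rng)
h = flat_psd_predictor(p, L, alpha)
y, v = mod_adc(x, R, alpha)
v_hat = decode(y, v[:p], h, R)
x_hat = (v_hat + 0.5) / alpha                     # estimate of X_n before low-pass filtering
print(np.mean(v_hat != v), np.mean(y != v))
```

Here the unfolded signal spans several multiples of $2^R$, so most outputs are folded, yet the decoder only needs the prediction error (not the signal itself) to fit within the modulo interval.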
Proposition 1 provides an upper bound on the error probability $\Pr(E_{OL_n}) = \Pr(\hat{V}_n \neq V_n)$ in terms of R − The first is that we use a mismatched prediction filter here, due to the unknown PSD of $\{X_n\}$, and the second is that whatever the exact PSD turns out to be, it is assumed to be supported on the frequency interval [− (6), where $\{h_n\}$ is the optimal linear MMSE p-tap prediction filter for $V_n$ from its past samples $\{V_{n-1}, \ldots, V_{n-p}\}$, designed under the assumption that $S_X(e^{j\omega})$ is as in (30). Then
and let $H_p(e^{j\omega})$ be the frequency response of the prediction filter $\{h_n\}$, which is designed with respect to (31). Further, let $H(e^{j\omega}) = \lim_{p\to\infty} H_p(e^{j\omega})$. By the basic principles of optimal linear MMSE prediction, we have that
Therefore, combining (31) and (32), we see that
(33)
Applying this filter on the "actual" process $V_n = \alpha X_n + Z_n$, whose PSD is
where the last inequality follows from our assumption that 1 2π
It follows from Proposition 1 combined with Proposition 4, that for a quantization rate of
the proposed system achieves $\Pr(E_{OL_n}) \le 2\exp\{-\frac{3}{2} 2^{2\delta}\}$ for all input processes with bandwidth ≤ B and variance ≤ $\sigma^2$.
After low-pass filtering with G(e jω ), we get by a similar analysis to that done in Section II-A and in [35] , that for long enough N such that the LPF can be treated as ideal, we have that
Thus, for large enough δ such that the total overload probability is small, i.e.,
we have that our system achieves distortion ≈ D with
The term
is the rate-distortion function of a source with PSD as in (30) . Thus, up to the loss of δ bits per sample, due to the one dimensional quantizer we are using, whose size is dictated by (37), our system is optimal in the following minimax sense: no system can attain a better tradeoff between R and D simultaneously for all processes with bandwidth at most B and variance at most σ 2 .
The multiplicative increase in quantization rate of the developed system, with respect to the fundamental rate-distortion limit, is ( . If X(t) were sampled at its Nyquist rate, rather than L times above it, standard uniform scalar quantization would have achieved similar overload probability and distortion with only a (
multiplicative increase in rate with respect to the fundamental limit. The disadvantage of the latter approach is that it requires a high-resolution quantizer for each sample, whereas the scheme developed here reduces the number of quantization bits per sample, at the expense of an increased sampling rate. Thus, just like Σ∆ conversion, the scheme developed here allows replacing slow but high-resolution ADCs with fast low-resolution ones.
IV. IMPLEMENTATION VIA RING OSCILLATORS
In this section we develop an architecture for a circuit implementing a modulo ADC, and provide a mathematical model for its input-output characteristic. Our implementation is essentially based on converting the input voltage into phase, which can naturally only be observed modulo 2π, and then quantizing the phase. To that end, we use ring oscillator ADCs, as described next.
Consider a closed-loop cascade of N inverters, where N is an odd number, all controlled with the same voltage $V_{dd}$; see Figure 4. This circuit, which is referred to as a ring oscillator, can act as an ADC with sampling period $T_s$ when $V_{dd}$ is set to $V_{in}(t) = g(X(t))$, where X(t) is the analog signal to be converted to a digital one and g(·) is a function to be specified, and the state ('0' or '1', corresponding to 'low' or 'high') of each inverter is measured every $T_s$ seconds.
It is well known that the time it takes for a non-ideal inverter's output to respond to a change in its input is a function of $V_{dd}$ [40], which we denote by $\Delta(V_{dd}) > 0$. Taking this delay into account, a moment of reflection reveals that at each time instance, exactly one pair of adjacent inverters is in the same state, whereas all other pairs of adjacent inverters are in distinct states. Denote by $I \in \{1, \ldots, N\}$ the index of the first inverter within the pair that shares the same state, and denote its state by $B \in \{0, 1\}$, i.e., the adjacent inverters with the same state are inverter I and inverter [I + 1] mod N, and their state is B. With this notation, we can uniquely identify the states of all N inverters at time t with the number $Q_t = (I_t - 1) + N \cdot ([I_t + B_t] \bmod 2) \in \{0, \ldots, 2N - 1\}$. See Figure 5.
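The state encoding above can be checked with a small simulation. The sketch below assumes an idealized ring in which, at each step, only the second inverter of the equal-state pair flips (since it inverts the output of the first); the helper names are hypothetical and the model ignores all analog effects. Under these assumptions the index advances by one modulo 2N at every step and visits all 2N values.

```python
def ring_state_index(states):
    """Map the N inverter states to Q = (I - 1) + N * ((I + B) mod 2)."""
    N = len(states)
    same = [i for i in range(N) if states[i] == states[(i + 1) % N]]
    assert len(same) == 1, "expected exactly one adjacent pair in the same state"
    i = same[0]                      # zero-based position, so I = i + 1
    B = states[i]
    return i + N * ((i + 1 + B) % 2)

def advance(states):
    """Move the wavefront by one inverter: the second inverter of the
    equal-state pair flips (idealized, one transition at a time)."""
    N = len(states)
    i = next(k for k in range(N) if states[k] == states[(k + 1) % N])
    nxt = list(states)
    nxt[(i + 1) % N] ^= 1
    return nxt

N = 5                                # number of inverters (odd)
states = [k % 2 for k in range(N)]   # 0,1,0,1,0: inverters N and 1 share state 0
q = []
for _ in range(4 * N):
    q.append(ring_state_index(states))
    states = advance(states)
print(q)
```

Counting how many such increments occur within one sampling interval is what turns the ring into a converter whose output is only observable modulo 2N.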
A crucial observation is that the process $Q_t$ cyclically oscillates in increments of +1 modulo 2N. More formally stated, if $t' > t$ is the earliest time where $Q_{t'} \neq Q_t$, then $Q_{t'} = [Q_t + 1] \bmod 2N$. We designate by $V_n$ the number of increments that occurred in the process $\{Q_t\}$ within the time interval $[nT_s, (n+1)T_s)$, and define the output of the induced modulo ADC as
Next, we relate V n to the process V in (t). To this end, we make the simplifying assumption that X(t) is constant within each time interval [nT s , (n + 1)T s ), and consequently, so is V in (t). This assumption can be made exact by adding a sample-and-hold circuit to the system. Assuming the function ∆(V dd ) is identical for all N inverters, we have that
and consequently,
where the last equality follows from the modulo distributive law (2) . Defining the "quantization error"
we can write
Let us now define the function
, which corresponds to the oscillation frequency of our circuit, and is dictated by the characteristics of the inverters at hand, and let us also take the function g(·) to be affine, such that V in (t) = a+bX(t). We further define the discrete time process X n = X(nT s ), for all n ∈ N. We have therefore obtained the model
In general, the quantization noise process {Z n } is a deterministic function of the process {X n }. Nevertheless, as in the analysis of the ideal modulo ADC, in the sequel we make the simplifying assumption that it is an iid process with
If $f(\cdot)$ were an affine function itself, with an appropriate choice of the parameters a, b we could have induced the model
where $R = \log(2N)$, which is identical to the ideal (R, α) mod-ADC, up to the fact that the quantization noise $Z_n - Z_{n-1}$ is now a first-order moving-average (MA) process rather than a white process. In practice, however, it is difficult to construct inverters for which f(·) is approximately affine within a large range. The effect of nonlinearities of f(·) on the performance of the modulo ADC is numerically studied in the next section.
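The input-output model of the ring oscillator converter can be illustrated numerically. The sketch below is a simplified model under assumptions of our own: the input is held constant within each sampling interval, the state index advances at a rate of f(a + bX_n) increments per second, and f is taken to be a hypothetical affine function; the values of N, T_s, a and b are arbitrary. The number of increments per interval then equals $T_s f(a + bX_n)$ plus a difference of two consecutive rounding errors, which is the first-order MA noise structure described above.

```python
import numpy as np

def ring_oscillator_adc(x, N, Ts, f, a, b):
    """Idealized ring-oscillator modulo ADC: count the increments of the state
    index within each sampling interval and report the count modulo 2N."""
    rate = f(a + b * np.asarray(x))                 # increments per second in each interval
    phase = np.concatenate([[0.0], np.cumsum(Ts * rate)])
    counts = np.floor(phase)                        # increments completed by each sampling instant
    v = np.diff(counts)                             # V_n: increments within interval n
    y = np.mod(v, 2 * N)                            # observed output, folded modulo 2N
    return y, v, Ts * rate

N, Ts = 7, 1e-3
f = lambda u: 2.0e4 * u                             # hypothetical affine frequency profile
x = np.sin(2 * np.pi * 0.01 * np.arange(500))       # slowly varying test input
y, v, tsf = ring_oscillator_adc(x, N, Ts, f, a=1.0, b=0.2)

noise = v - tsf                                     # equals Z_n - Z_{n-1}, each Z in (-1, 0]
print(y[:10], noise.min(), noise.max())
```

With these values every interval contains more than 2N increments, so the unfolded count is never observed directly and must be recovered by the decoder.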
V. NUMERICAL EXPERIMENTS
We have conducted numerical simulations for the performance of a ring oscillator based modulo ADC, where the input is an oversampled process, as in Section III. In our simulations, we have assumed that the inverters were produced using a CMOS technology. The corresponding function f (V in ) relating the input voltage to the output frequency of the oscillator, which was introduced in Section IV, is shown in Figure 8 , as obtained using a PSpice simulation.
A. Design of System Parameters
In all our simulations, we have designed the modulo ADC and the corresponding decoder as described in Section III, i.e., under the assumption that the input signal X(t) is a Gaussian stationary process with zero mean and variance $\sigma^2$, whose PSD is flat within the frequency interval [−B, B] and zero outside this interval. The sampling rate is a factor of L > 1 above the Nyquist rate, such that the sampling period is $T_s = 1/(2LB)$ seconds.
Given the oversampling ratio L, the number of inverters N, and the above assumptions on the statistics of X(t), the design of the modulo ADC and its corresponding decoder consists of: 1) Choosing the shift and scaling parameters a and b for the modulo ADC such that $V_{in}(t) = a + bX(t)$; 2) Designing the p-tap prediction filter $\{h_n\}$ for $V_n = T_s f(a + bX_n) + Z_n - Z_{n-1}$ given the past samples $\{V_{n-1}, \ldots, V_{n-p}\}$; 3) Designing a $(2k+1)$-tap noncausal smoothing filter $\{g_n\}$ for estimating $X_n$ from $\{V_{n-k}, \ldots, V_{n+k}\}$. The decoding procedure consists of recovering an estimate $\{\hat{V}_n\}$ of $\{V_n\}$ from the modulo ADC's outputs $\{Y_n = [T_s f(a + bX_n) + Z_n - Z_{n-1}] \bmod 2N\}$, by applying the decoding procedure described in Section III with the prediction filter $\{h_n\}$. Then, the estimate $\{\hat{X}_n\}$ is produced by applying the smoothing filter $\{g_n\}$ to the process $\{\hat{V}_n\}$, which is referred to as final post-processing in Section III. The filters $\{h_n\}$ and $\{g_n\}$ are chosen as the MMSE-optimal linear prediction and smoothing filters, respectively. Calculating the coefficients of $\{h_n\}$ requires knowledge of the second-order statistics of the process $\{V_n\}$. These, in turn, can be (numerically) calculated from the pairwise distribution of $\{X_n, X_{n-m}\}$, $m = 0, \ldots, p$, which is fully characterized by our assumption that $\{X_n\}$ is a Gaussian process with PSD $S_X(e^{j\omega})$ as in (30). Calculating the coefficients of $\{g_n\}$ requires, in addition, the joint second-order statistics of the processes $\{X_n, V_n\}$, which can either be calculated numerically, or via Bussgang's Theorem [41].
We apply the developed modulo ADC architecture to processes of length T discrete samples. The parameters a and b are chosen as follows: Let $P_e = \Pr\left(\bigcup_{t=1}^{T} \{\hat{V}_t \neq V_t\}\right)$ be the block error probability of our decoder, and let ε be our target block error probability. For every a and b, we find the filters $\{h_n\}$ and $\{g_n\}$ as described above, and compute the corresponding $P_e = P_e(a, b)$ via Monte Carlo simulation for a Gaussian input process with PSD as in (30). Among all (a, b) for which $P_e(a, b) < \epsilon$, we choose the pair that results in the smallest MSE distortion. The target block error probability for all of the setups we consider is $\epsilon = 10^{-3}$, and the block length we consider is $T = 2^{11}$. Roughly, these parameters correspond to allowing a per-sample overload error probability of $10^{-3} \cdot 2^{-11} \approx 4.88 \cdot 10^{-7}$.
B. Evaluation Method
The system was designed for a bandlimited Gaussian process with a flat PSD. Nevertheless, we would like it to achieve approximately the same MSE distortion and error probability for all bandlimited processes with the same variance, regardless of the PSD within that band. For an ideal modulo ADC and large p, this is indeed the case, as shown in Section III.
To test to what extent this remains the case also for the ring oscillator based modulo ADC, we apply our system to two types of processes: 1) A Gaussian process with variance $\sigma^2$ and bandwidth B, whose PSD is flat within this band, for which the system was designed; 2) A sinusoidal waveform, whose frequency is chosen at random, uniformly on [0, B), and whose amplitude is $\sqrt{2\sigma^2}$, such that its power is $\sigma^2$.
For each experiment, we also plot the theoretical performance of an ideal (R, α) mod-ADC, as well as that of a first-order Σ∆ converter (with the optimal 1-tap noise shaping filter), both designed to achieve the same target block error probability for the bandlimited Gaussian stochastic process X(t). Although overload errors have a different effect on Σ∆ converters and modulo ADCs, both systems fail to achieve their target distortions unless such errors are avoided.
In the ADC literature, it is quite common to measure the performance of a particular ADC for a sinusoidal input. One drawback of this approach is that the deterministic nature of the input signal allows designing the ADC such that overload errors never occur, without significantly increasing its dynamic range above the standard deviation of its input. For stochastic processes, even if Gaussianity is assumed, the dynamic range must be as large as multiple standard deviations of its input in order to ensure a small overload probability. In our derivations, this is manifested through the rate backoff parameter δ, which dictates the ratio between the quantizer's dynamic range $2^R$ and the standard deviation of its input (which in our case is the prediction error process).
Fig. 9. Performance of the ring oscillator based modulo ADC (RO-ADC). We plot SNR vs. quantization rate for a Gaussian process and for a sinusoidal waveform with a random frequency, uniformly distributed over [0, B). For comparison we also plot the performance of an ideal (R, α) mod-ADC, as well as that of an ideal first-order Σ∆ converter. For all curves, SNR is defined as $\sigma^2/D$. The prediction filter has p = 25 taps, whereas the smoothing filter has 2k + 1 taps for k = 22.
In order to allow a unified presentation of the results for both Gaussian and sinusoidal processes, rather than plotting the rate $R_{\text{mod-ADC}}(D)$ required by the modulo ADC in order to achieve an MSE distortion D with target block error probability ε, we plot $R_{\text{mod-ADC}}(D) - \delta$, where
$$\delta = \frac{1}{2}\log_2\left(\frac{2}{3}\ln\frac{2T}{\epsilon}\right). \qquad (40)$$
This is consistent with traditional converter analyses that separate saturation effects from granularity ones [4], [37]. For our parameters $T = 2^{11}$, $\epsilon = 10^{-3}$, (40) evaluates to δ ≈ 1.6717 bits. Note that by (12), δ is the rate backoff required in order to attain block error probability below ε by an ideal modulo ADC, when the input process is Gaussian. A similar analysis reveals that the same rate backoff is also required for a Σ∆ converter to attain the same block error probability, under the same assumptions on the input process [35]. Thus, in all figures we also plot $R_{\Sigma\Delta}(D) - \delta$ rather than $R_{\Sigma\Delta}(D)$, where $R_{\Sigma\Delta}(D)$ is the rate needed by the Σ∆ converter to attain distortion D with block error probability below ε.
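The quoted value of δ can be reproduced with a few lines of arithmetic. The sketch below assumes that the backoff is obtained by setting the per-sample overload bound $2\exp\{-\frac{3}{2} 2^{2\delta}\}$ from Section III equal to ε/T (a union bound over the T samples of a block) and solving for δ; the function name is hypothetical.

```python
import math

def rate_backoff(T, eps):
    """Backoff delta (in bits) for which T * 2*exp(-1.5 * 2**(2*delta)) equals eps."""
    return 0.5 * math.log2((2.0 / 3.0) * math.log(2.0 * T / eps))

T, eps = 2 ** 11, 1e-3
delta = rate_backoff(T, eps)
per_sample = 2.0 * math.exp(-1.5 * 2.0 ** (2.0 * delta))   # per-sample overload bound
print(round(delta, 4), per_sample, eps / T)
```

For T = 2^11 and ε = 10^-3 this gives δ ≈ 1.6717 bits and a per-sample overload probability of about 4.88 · 10^-7, in line with the values used in the simulations.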
C. Results and Discussion
We have performed experiments for the parameters L = 3 and four different values of B: 100 Hz, 44.1 kHz, 100 kHz and 1 MHz. The value of $\sigma^2$ is immaterial, as it can be absorbed in the parameter b. The results are depicted in Figures 9a, 9b, 9c and 9d, respectively. The results are based on Monte Carlo simulation, with $10^3$ independent trials for each point in each figure. No overload errors were observed for the choices of a, b, $\{h_n\}$ and $\{g_n\}$ that correspond to each point in the figures, neither for the Gaussian processes nor for the sinusoidal processes.
In general, the results indicate that the ring oscillator implementation of a modulo ADC is closer to the ideal modulo ADC for small bandwidths B and quantization rates R. In all figures we observe the same trend: for small enough R, the curve of the SNR as a function of R for the ring oscillator modulo ADC is parallel to that of the ideal modulo ADC, and has a slope of ≈ 6L = 18 dB/bit, in agreement with (38). Then, for large enough R, the system's non-linearities "kick in" and the slope significantly decreases. Eventually, for large enough R, the first-order Σ∆ converter outperforms the ring oscillator modulo ADC, as can be observed in Figure 9d. Nevertheless, for moderate values of R, even for B = 1 MHz, the improvement over the Σ∆ converter can be as large as 17 dB.
The trends above are to be expected. Recall that the output of the corresponding modulo ADC is given by (39) . If b · σ is small enough, the function f (a + bX n ) resides in a small interval around f (a) with high probability, and is well approximated by the linear function f (a) + bf ′ (a)X n . Consequently, the output of the modulo ADC can be well approximated as
Since $T_s f(a)$ is known and can be removed, this is equivalent to a $(\log(2N), T_s b f'(a))$ mod-ADC, albeit with quantization noise $Z_n - Z_{n-1}$ rather than $Z_n$.
Typically, however, in order to get a large gain from using a modulo ADC rather than a standard uniform quantizer, we would like to use an (R, α) mod-ADC with α · σ ≫ R . Thus, in order to get a "useful" modulo ADC that is close to ideal, the two conditions
should hold. These two conditions can only be satisfied simultaneously if T s f ′ (a) ≫ 1, i.e., when the sampling rate is low, relative to f ′ (a). For an ideal (R, α) mod-ADC with a given target overload error probability, as R increases α can also increase, resulting in a smaller distortion. Similarly, for the ring oscillator modulo ADC, the optimal choice of b should, in general, increase with R. For small rates, the optimal value of b is also small, such that the linear approximation for the function f (·) is not too bad. However, as R, and consequently b, increases, the nonlinearities start becoming significant and the slope of the SNR as a function of R becomes smaller.
VI. MODULO ADCS FOR JOINTLY STATIONARY PROCESSES
In this section we develop a scheme that uses K parallel modulo ADCs for digitizing K jointly stationary processes, provide a corresponding low-complexity decoding algorithm, and characterize its performance.
Let {X 1 n }, . . . , {X K n } be K discrete-time jointly Gaussian stationary random processes, obtained by sampling the jointly Gaussian stationary processes T , and define Y n , Z n and V n similarly. Our goal is to recover the process {V n } from the outputs of the modulo ADCs with high probability.
To achieve this goal, we employ a two-step procedure, combining the schemes from Section II-A and Section II-B: first we compute a predictor $\hat{V}^p_n$ based on the previous p samples $\{V_{n-1}, \ldots, V_{n-p}\}$, whose error is the vector $E^p_n = V_n - \hat{V}^p_n$. By the same derivation as in Section II-A, we can produce
R from $Y_n$ and $\{V_{n-1}, \ldots, V_{n-p}\}$, where the modulo operation applied to a vector is to be understood as reducing each coordinate modulo $2^R$. Now, our task is to decode a modulo-folded correlated random vector, which can be done via the integer-forcing decoder described in Section II-B. This relatively simple decoding procedure makes it possible to efficiently exploit both temporal and spatial correlations. Below we describe it in more detail. See Figure 6. For all $\ell, m \in \{1, \ldots, K\}$, let $C_{\ell m}[r] = E(X^{\ell}_n X^{m}_{n-r})$.
Inputs: $Y_n$, $\{V_{n-1}, \ldots, V_{n-p}\}$, $\{C_{\ell m}[r]\}$ for all $\ell, m \in \{1, \ldots, K\}$, R, α.
Outputs: Estimates $\hat{V}_n$ and $\hat{X}_n$ for $V_n$ and $X_n$, respectively.
Algorithm:
where $\{H_n\}$ is a p-tap matrix prediction filter, $H_i \in \mathbb{R}^{K \times K}$, for $i = 1, \ldots, p$, computed based on $\{C_{\ell m}[r]\}$ for all $\ell, m \in \{1, \ldots, K\}$ and α, and the shift by $\frac{1}{2}$ compensates for $E(Z_n)$.
2) Compute
where the modulo reduction is to be understood as taken component-wise.
3) Define the pth order prediction error $E^p_n \triangleq V_n - \hat{V}^p_n$, and compute its covariance matrix
} for all ℓ, m ∈ {1, . . . , K} and α. Note that Σ p is indeed invariant with respect to n due to stationarity.
4) Solve
Let $A = [a_1 | \cdots | a_K]^T$ be the matrix found in step 4 of the algorithm above, and define
, for all $k = 1, \ldots, K$, where the event $\bar{E}_{OL_n} = \{\hat{V}_n = V_n\}$ is the complement of the event $E_{OL_n} = \{\hat{V}_n \neq V_n\}$.
Proof: We first note that
where the second equality follows from the modulo distributive law (2) . By (23), we have that g 
occurs. Now, repeating the same steps from the proof of Proposition 3, we arrive at the claimed bounds. Using Shannon's lower bound, and applying similar arguments as in [42] , one can show that any quantization scheme for the source {X n } that produces R bits/sample/coordinate and attains E(X = 1 2 log (2πe) K |Σ * p | , where (a) follows from the orthogonality principle of MMSE estimation [37] , and (b) from the fact that E p * n is a Gaussian random vector [5] . Thus, for any quantization scheme we must have
Similarly to previous subsections, we set $D = 1/(12\alpha^2)$, which is a good approximation for D n k , $k = 1, \ldots, K$, provided that $\delta = R - R^{\text{ST}}_{\text{IFSC}}(A)$ is not too small. The rate required by our scheme, as given in Proposition 5, depends on $12\Sigma_p$, which corresponds to the prediction error covariance of the process $\tilde{X}_n = \sqrt{12\alpha^2}\, X_n + \tilde{Z}_n = \frac{1}{\sqrt{D}}(X_n + \sqrt{D}\tilde{Z}_n)$, where $\tilde{Z}_n = \sqrt{12} Z_n$ is a random vector with unit-variance iid entries. Let $\tilde{\Sigma}_p$ be the pth order prediction error covariance of the process $X_n + \sqrt{D}\tilde{Z}_n$. We can rewrite the rate required by our scheme as
Now, noting that if $h(X_n | X_{n-1}, \ldots) > -\infty$, we have that
Thus, in the high-resolution regime, when taking large enough p, the gap between $R^{\text{ST}}_{\text{IFSC}}(A, D)$ and the information theoretic lower bound is dictated by the loss of IFSC for a source whose covariance matrix is Σ*. The right hand side of (44) is non-negative [11], but is typically quite small. To illustrate this, we generate two correlated processes {X T will be highly correlated in time and in space. In Figure 10 we plot the average rate required by the developed scheme, as well as $R_{\text{SLB}}(D)$, and the rate required by a standard ADC, denoted $R_{\text{naive}}(D)$, with respect to an iid N(0, 100) distribution on the 2L taps of $\{h_n\}$ and $\{g_n\}$. In the simulations performed, we took L = 5 and p = 24.
VII. CONCLUSIONS AND OUTLOOK
We have studied the modulo ADC architecture as an alternative approach for analog-to-digital conversion. The modulo ADC allows exploitation of the statistical structure of the input process digitally at the decoder, without requiring the ADC to adapt itself to the input statistics. We have demonstrated the effectiveness of oversampled modulo ADCs as a simple substitute for Σ∆ converters, allowing an increase in the filter's order far beyond that which is possible in current Σ∆ converters, since for modulo ADCs the filtering is done digitally. Moreover, we have shown that, when used for digitizing jointly stationary processes, parallel modulo ADCs can efficiently exploit both temporal and spatial correlations.
An implementation of modulo ADCs via ring oscillators was developed, and the corresponding input-output function for the obtained modulo ADC was characterized in terms of the delay-$V_{dd}$ profile of the inverters that construct the ring oscillator. We have then numerically studied the performance this implementation can attain for oversampled input processes, and compared it to that of Σ∆ converters.
There are several important challenges for future research. Perhaps most important is building a modulo ADC chip prototype. Although our simulations are based on the function f(·) measured from an actual (PSpice model of a) ring oscillator device, a hardware implementation is needed to fully assess the benefits of modulo ADCs. Furthermore, we would like to see whether it is possible to construct inverters with more favorable properties for ring oscillator-based modulo ADCs. In particular, we would like them to have a larger range where they are well approximated by an affine function. Another interesting avenue for future research is finding functions g(·) that can be implemented in the analog domain, such that the composition $f \circ g = f(g(\cdot))$ is more linear.
