spurious signals. The level of the spurious signals was analyzed, and the relative mismatch errors should be smaller than $10^{-\mathrm{SFDR}/20}$, where SFDR is the required dynamic range. In practice, this easily limits the dynamic range to $-40$ to $-50$ dBc unless the incompletely cancelled images are hidden underneath the fundamental signals by choosing a carrier frequency of $f_s/4$.
I. INTRODUCTION
High-performance data conversion can be achieved either by expending power and area to achieve high precision in a single analog architecture or by distributing the architecture over multiple low-resolution quantization tasks, each implemented with relatively imprecise analog circuits, and combined in the digital domain. Delta-sigma modulation has proven to be superior in attaining very high precision by distributing the quantization process over time [1]. Both high speed and high resolution can be achieved by distributing the quantization process in space [2].
The highest speeds in analog-to-digital conversion are obtained with flash and folding converter architectures. Compared to a flash analog-to-digital converter (ADC), a folding ADC offers fewer comparators and reduced decoding logic, thus allowing higher speed at lower power [3]. A folding interpolating ADC further interpolates the folded output signals to increase resolution or reduce folding rate in a multistage conversion architecture. Fig. 1(a) depicts an example 3-bit folding ADC [4], [5]. Each of the three folding circuits comprises identical saturating difference units at linearly spaced inflection voltages, whose outputs are combined in alternating fashion. The folded output signals are zero-thresholded by high-gain comparators, producing the Gray-code outputs illustrated in Fig. 1(b). The comparator outputs are latched and digitally postprocessed to generate binary-coded outputs.
Conventionally, the saturating difference unit of a folding circuit is implemented as a bipolar junction transistor (BJT) or MOS differential pair. The differential output current is a saturating, monotonic, smooth (sigmoid) function of the differential input voltage. For instance, for the MOS differential pair shown in Fig. 2, biased in the subthreshold region, one of the complementary output currents can be expressed as [6], [7]

$$I_{out}^{+} = I_b \, \sigma\big(A \, (V_{in} - V_{ref})\big) \qquad (1)$$

where $I_b$ is the bias tail current, $A = \kappa / V_t$ is a voltage range factor set by the Boltzmann thermal voltage $V_t = kT/q$ and the gate coupling coefficient $\kappa$, and

$$\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (2)$$
is the canonical logistic sigmoid function. The complementary output currents are differentially combined, with alternating polarity, to construct the folded output current illustrated in Fig. 1(b). Proper folding operation relies on precise addition and subtraction of sigmoid functions with identical saturation level $I_b$ and with linearly spaced points of inflection $V_{ref}$. MOS transistor mismatch in the differential pair and tail current supply contributes variability in the amplitude and offset of the implemented sigmoids, as illustrated in Fig. 2(b). To maintain linearity in the ADC characteristic, relative variations in amplitude and offset cannot exceed the least significant bit (LSB) level ($2^{-n}$ for n-bit conversion). Offset and amplitude mismatch can be reduced by enlarging the components, at the cost of increased power dissipation. Adaptive autozeroing techniques can compensate for offset [8], but not for amplitude mismatch.
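As an illustration of the sigmoid characteristic of a subthreshold differential pair, the following minimal sketch evaluates the two complementary output currents as logistic functions of the differential input voltage. The function names, the default value `kappa=0.7`, and the 300 K temperature are assumptions made for illustration and are not taken from the text.

```python
import math

K_BOLTZMANN = 1.380649e-23      # J/K
Q_ELECTRON = 1.602176634e-19    # C


def thermal_voltage(temp_kelvin=300.0):
    """Boltzmann thermal voltage V_t = kT/q in volts."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def logistic(x):
    """Canonical logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def diff_pair_currents(v_diff, i_bias, kappa=0.7, temp_kelvin=300.0):
    """Complementary currents of an idealized subthreshold differential pair.

    v_diff is the differential input voltage (V) and i_bias the tail current (A).
    kappa (gate coupling coefficient) is an assumed, illustrative value.
    """
    a = kappa / thermal_voltage(temp_kelvin)   # voltage range factor, 1/V
    i_plus = i_bias * logistic(a * v_diff)
    i_minus = i_bias * logistic(-a * v_diff)
    return i_plus, i_minus


if __name__ == "__main__":
    for v in (-0.1, 0.0, 0.1):
        print(v, diff_pair_currents(v, 1e-6))
```

In this idealized model the two currents always sum to the tail current, and their difference follows a hyperbolic tangent of the scaled input, which is why amplitude (tail current) and offset (inflection point) are the two quantities that mismatch can disturb.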
We present a compact, offset- and amplitude-compensated folding circuit utilizing dynamic differential sigmoid units, each implemented with one capacitor and four nMOS transistors. Section II describes the sigmoid difference unit, and Section III presents the folding circuit comprising these units. Section IV summarizes measured results from a densely integrated bank of 128 parallel 4-bit folding Gray-code ADCs fabricated in a 0.5-µm CMOS process.
II. CORRELATED DOUBLE-SAMPLING SIGMOID UNIT

A. MOS-C Diode-Integrator
This section demonstrates that a differencing and folding unit can be implemented using a single active element, with precisely controlled sigmoid amplitude and offset. The circuit consists of a capacitor and an exponential element, such as a diode [9] or a MOS transistor operating in the subthreshold regime [10], where the differential voltage is presented as a step in input voltage on the MOS gate. Offset compensation is achieved in the charge domain, similar to the CMOS charge-transfer comparator described in [11].
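In the circuit of Fig. 3(a), the nMOS transistor is source-coupled to a capacitor. In the subthreshold and saturation regions, the drain current is exponential in gate and source voltage, and the large-signal dynamics of the integrator are described by

$$C \frac{dV_s}{dt} = I_o(t) = I_0 \, e^{(\kappa V_g - V_s)/V_t}$$

which, by integrating over $t$ and $V_s$ at constant $V_g$, yields

$$I_o(t) = \frac{I_o(0^+)}{1 + I_o(0^+) \, t / (C V_t)}$$

Interestingly, for $t \gg C V_t / I_o(0^+)$, $I_o(t)$ becomes independent of initial conditions [9]:

$$I_o(t) \approx \frac{C V_t}{t} \qquad (8)$$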
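The loss of memory of the initial condition can be checked numerically. The sketch below integrates the source voltage of an idealized capacitor-loaded exponential element with a fixed-step Runge-Kutta scheme and compares the result with the hyperbolic closed form and with the asymptote $C V_t / t$. The model (a current that drops by a factor of $e$ for every $V_t$ of rise in source voltage), the component values, and all function names are assumptions chosen for illustration.

```python
import math


def closed_form_current(t, i_start, cap, v_t):
    """Hyperbolic decay I(t) = I(0+) / (1 + I(0+) t / (C V_t))."""
    return i_start / (1.0 + i_start * t / (cap * v_t))


def simulate_current(t_end, i_start, cap, v_t, steps=20000):
    """Integrate C dVs/dt = I(Vs) with classical RK4 and return I(t_end).

    The source voltage is measured relative to its value at t = 0, so that
    I(Vs) = i_start * exp(-Vs / v_t) for a fixed gate voltage.
    """
    def dvs_dt(v_s):
        return i_start * math.exp(-v_s / v_t) / cap

    h = t_end / steps
    v_s = 0.0
    for _ in range(steps):
        k1 = dvs_dt(v_s)
        k2 = dvs_dt(v_s + 0.5 * h * k1)
        k3 = dvs_dt(v_s + 0.5 * h * k2)
        k4 = dvs_dt(v_s + h * k3)
        v_s += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return i_start * math.exp(-v_s / v_t)


if __name__ == "__main__":
    cap, v_t, t_end = 1e-12, 0.02585, 1e-3   # 1 pF, ~300 K, 1 ms (assumed values)
    print("asymptote C*Vt/t =", cap * v_t / t_end)
    for i_start in (1e-8, 1e-7):
        print(i_start,
              simulate_current(t_end, i_start, cap, v_t),
              closed_form_current(t_end, i_start, cap, v_t))
```

With these assumed values, initial currents that differ by a factor of ten end up within a fraction of a percent of each other after 1 ms, which is the property the sigmoid unit relies on for amplitude matching.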
B. Differential Sigmoid Unit
Saturation of the output current of the circuit in Fig. 3(a) as a function of a change in the input voltage $V_g$ is utilized in the design of the sigmoid difference unit shown in Fig. 3(b). The nMOS capacitor is initially charged by pulsing reset (RST). A gate voltage transition then produces a change in source current according to (7). By combining (6) and (7), the input-output characteristic of the sigmoid difference unit is expressed as a sigmoid with time-dependent offset $V_o(t)$ (10) and amplitude

$$I_{sat}(t) = \frac{C V_t}{t} \qquad (11)$$

where the time origin $t = 0$ is taken at the onset of inSel (Fig. 4).
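As a hedged sketch of how a sigmoid with time-dependent amplitude and offset can arise from the diode-integrator dynamics, the derivation below makes three assumptions that are not taken verbatim from the text: the current decays hyperbolically after a gate step, the unit has settled for an interval $\Delta t_1$ after reset before the input is applied, and a gate step $\Delta V_g$ multiplies the subthreshold current by $e^{A \Delta V_g}$ with $A = \kappa / V_t$. The symbol $\Delta V_g$ is introduced here for illustration only.

```latex
% Assumed model: settled current before the step, exponential gate coupling,
% and hyperbolic decay afterwards.
\begin{align*}
I_o(0^-) &\approx \frac{C V_t}{\Delta t_1}, \qquad
I_o(0^+) = I_o(0^-)\, e^{A \Delta V_g}, \qquad A = \frac{\kappa}{V_t} \\
I_o(t) &= \frac{I_o(0^+)}{1 + I_o(0^+)\, t / (C V_t)}
        = \frac{C V_t}{t} \cdot \frac{1}{1 + (\Delta t_1 / t)\, e^{-A \Delta V_g}} \\
       &= I_{sat}(t)\, \sigma\!\big(A\,(\Delta V_g - V_o(t))\big), \\
I_{sat}(t) &= \frac{C V_t}{t}, \qquad
V_o(t) = \frac{1}{A} \ln\frac{\Delta t_1}{t}
\end{align*}
```

Under these assumptions the amplitude depends only on $C$, $V_t$, and elapsed time, and the offset depends only on a ratio of time intervals, so neither carries a transistor-specific mismatch term.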
The time dependence of offset and amplitude is inconsequential to the folding characteristic, as time is common to all sigmoid cells.¹ If desired, the residual uniform (systematic) offset can be eliminated by controlling the timing of $\Delta t_2$ and $\Delta t_3$ relative to $\Delta t_1$. In particular, at the onset of the INT integration interval ($t = \Delta t_2$), the voltage offset (10) reduces to zero when $\Delta t_2$ equals $\Delta t_1$.²

¹ For large input voltage steps, the nMOS may initially enter the strong inversion region. This affects the timing but not the operation of the circuit, since once the source voltage has risen far enough for the transistor to reach subthreshold, the asymptotic relationship (8) holds again.

² This choice results in zero offset for folding ADC operation, assuming that the thresholding comparison of the folding output takes place primarily at the onset of the INT interval through a regenerative amplification process, as described in Section III-B. In the interpolative mode of folding ADC operation, the folding current is integrated over the entire INT interval, resulting in a broadened sigmoid whose input-referred voltage offset likewise depends logarithmically on the timing intervals and reduces to zero for a suitable choice of those intervals.
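The zero-offset timing condition can be illustrated numerically. The sketch below assumes a unit current of the form $C V_t / (t + \Delta t_1 e^{-\kappa \Delta V / V_t})$, where $\Delta t_1$ is the settling interval before the input step and $t$ is the time since the step. This closed form, the parameter defaults, and the function names are assumptions for illustration rather than the published expressions.

```python
import math


def saturation_current(t, cap=1e-12, v_t=0.02585):
    """Assumed sigmoid amplitude C*V_t/t at time t after the input step."""
    return cap * v_t / t


def unit_current(dv_gate, t, dt1, cap=1e-12, kappa=0.7, v_t=0.02585):
    """Assumed unit output current for a gate step dv_gate (V).

    dt1 is the settling interval before the step, t the time since the step.
    """
    return cap * v_t / (t + dt1 * math.exp(-kappa * dv_gate / v_t))


def offset_voltage(t, dt1, kappa=0.7, v_t=0.02585):
    """Input step at which the assumed unit delivers half its amplitude."""
    return (v_t / kappa) * math.log(dt1 / t)


if __name__ == "__main__":
    dt1 = 50e-9                      # assumed settling interval
    for t in (25e-9, 50e-9, 100e-9):
        half = 0.5 * saturation_current(t)
        print(t, offset_voltage(t, dt1), unit_current(0.0, t, dt1) / half)
```

When the comparison instant equals the settling interval, the printed offset is zero and a zero input step yields exactly half the amplitude. At other instants every unit shifts by the same offset, since the timing is shared by all cells.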
III. MOS-C FOLDING CIRCUIT AND GRAY-CODED FOLDING ADC
A dynamic Gray-coded folding ADC is realized by constructing folding circuits from the sigmoid units just described. Folding circuits and comparators in the architecture of Fig. 1(a) combine to produce the Gray-coded output bits shown in Fig. 1(b). The folding currents could, in principle, be integrated for continued interpolating conversion to further increase resolution. For brevity, the interpolating functionality of the folding architecture for multistage ADC operation is not elaborated on in what follows.
In general, an n-bit folding ADC comprises $2^n - 1$ sigmoid units, arranged in n folding circuits, each folding circuit feeding into a single comparator to generate the Gray-coded bits $D_j$, $j = 0, \ldots, n - 1$. An additional sigmoid unit, supplying half the tail current $I_{sat}/2$, is needed in each folding circuit (n total) as a reference bias in the comparison.
A. MOS-C Folding Circuit
The sigmoid units produce output currents that are combined with alternating polarity in each folding circuit. For instance, in a 4-bit (n = 4) Gray-code ADC, the LSB (j = 3) differential folding output currents contain eight sigmoid contributions. Theoretical and simulated output currents of a 4-bit LSB folding circuit are plotted in Fig. 7 as a function of input voltage. In the actual implementation, two additional sigmoid cells are used to avoid side effects visible at the lower and higher ends of the conversion interval.
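The folding arrangement can be sketched behaviorally. The model below places the inflection points of folding circuit j at odd multiples of $2^{n-1-j}$ LSB, which is where bit j of a reflected Gray code toggles. It then sums ideal logistic units with alternating sign and subtracts a half-amplitude reference before thresholding at zero. The threshold placement, the normalized units (1 LSB = 1, unit amplitude = 1), the `gain` parameter, and all function names are assumptions for illustration. The sketch omits the two additional edge cells mentioned above.

```python
import math


def logistic(x):
    """Canonical logistic sigmoid, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def folding_thresholds(n_bits, j):
    """Inflection points (in LSB) of the 2**j units in folding circuit j (j = 0 is the MSB)."""
    step = 2 ** (n_bits - 1 - j)
    return [step * (2 * m + 1) for m in range(2 ** j)]


def folded_output(v_in, n_bits, j, gain=8.0):
    """Normalized folded current minus the half-amplitude reference."""
    total = 0.0
    for m, v_ref in enumerate(folding_thresholds(n_bits, j)):
        sign = 1.0 if m % 2 == 0 else -1.0      # alternating polarity
        total += sign * logistic(gain * (v_in - v_ref))
    return total - 0.5


def gray_folding_adc(v_in, n_bits, gain=8.0):
    """Gray-coded output bits [D_0, ..., D_{n-1}] for an input given in LSB units."""
    return [1 if folded_output(v_in, n_bits, j, gain) > 0 else 0
            for j in range(n_bits)]


if __name__ == "__main__":
    n = 4
    print("LSB folding circuit thresholds:", folding_thresholds(n, n - 1))
    for k in range(2 ** n):
        bits = gray_folding_adc(k + 0.5, n)
        print(k, "".join(str(b) for b in bits))
```

For n = 4 the unit counts per folding circuit are 1, 2, 4, and 8, for a total of 15, and the LSB circuit contains eight units, in line with the counts given in the text.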
B. Integrating Sense-Amplifying Comparator
Bit decisions are made on integrated, differentially folded sigmoid-unit currents using a correlated double-sampling sense-amplifying comparator. As discussed in Section II-B, synchronous and properly chosen timing of the control signals eliminates the offset $V_o(t)$.
The integrating and latching sense-amplifying comparator is shown in Fig. 8. A cascode stage M9-M10, controlled by the bias voltage $V_{casc}$, provides a low-impedance input to the sense amplifier to improve the conversion speed and reduce the effect of the output conductance of the sigmoid cells. Current-domain correlated double sampling is achieved by swapping the differential current inputs to the comparator at the start of integration, from precharge to evaluate mode, using multiplexers M5-M6 and M7-M8. In precharge mode (time interval $\Delta t_2$ in Fig. 4
IV. EXPERIMENTAL RESULTS
A prototype 128-channel bank of 4-bit dynamic Gray-code folding ADCs was fabricated in a 0.5-µm CMOS process. The die micrograph is shown in Fig. 9. The parallel bank of folding ADCs serves to quantize analog outputs from a massively parallel mixed-signal matrix-vector multiplier (MVM) [13]. The ADC bank measures 0.75 mm × 2 mm and dissipates 82 mW of power at a 6-MHz clock, for a combined conversion rate of 768 Msps ($7.68 \times 10^8$ samples per second). The portion of the power consumed by the folding circuits (excluding sense amplifiers and output drivers) is 15 mW. The resistor string contributes 1 mW to the total power. The measured characteristics are summarized in Table I.
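The aggregate figures follow from simple arithmetic on the reported numbers. The short sketch below recomputes the combined conversion rate and derives two quantities that are not stated in the text, namely the energy per conversion and the share of power drawn by the folding circuits. These derived values are plain ratios of the reported figures. The function and key names are illustrative.

```python
def bank_figures(channels=128, clock_hz=6e6, power_w=82e-3,
                 folding_power_w=15e-3, width_mm=0.75, height_mm=2.0):
    """Derive aggregate figures from the reported per-bank measurements."""
    rate_sps = channels * clock_hz                 # one conversion per channel per clock
    return {
        "rate_sps": rate_sps,
        "energy_per_conversion_j": power_w / rate_sps,
        "folding_power_fraction": folding_power_w / power_w,
        "area_mm2": width_mm * height_mm,
    }


if __name__ == "__main__":
    for name, value in bank_figures().items():
        print(f"{name}: {value:.4g}")
```

With the reported values this gives 768 Msps, an area of 1.5 mm², and roughly 107 pJ per 4-bit conversion across the whole bank, including sense amplifiers and output drivers.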
In mixed-signal MVM array processing, multiple results from array-parallel low-resolution folding ADCs are combined in the digital domain to digitize a single output vector component. Parallel use of multiple quantizers boosts the overall quantization resolution by almost 2 bits beyond the resolution limit of each ADC channel [13], [14], to approximately 6 bits.
V. CONCLUSION
A compact, offset- and amplitude-compensated sigmoid differencing and folding circuit has been reported. Each sigmoid difference unit performs correlated double sampling of the inputs to avoid mismatch errors. The circuit operates in weak inversion and offers both high speed and low power. The design is suited for parallel data conversion on mixed-signal computational arrays, active pixel imagers, and other distributed charge- or voltage-mode circuits. A 128-channel parallel bank of 4-bit Gray-code folding ADCs has been implemented in a 0.5-µm CMOS process, delivering 128 × 6 Msps at 82-mW power dissipation.
The results are meant to illustrate the principle and not to indicate performance limits of the approach. Resolution can be enhanced using interpolation by integrating the folding currents onto capacitors and presenting the resulting differential voltage to subsequent folding stages. Speed is limited mainly by the time required for the MOS-C diode-integrators to enter the subthreshold region. It is straightforward to extend the circuit with BJTs, which attain exponential I-V characteristics at elevated current levels, for higher speed ADCs.
